@wentorai/research-plugins 1.3.2 → 1.4.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +32 -56
- package/curated/analysis/README.md +1 -13
- package/curated/domains/README.md +1 -5
- package/curated/literature/README.md +3 -12
- package/curated/research/README.md +1 -18
- package/curated/tools/README.md +1 -12
- package/curated/writing/README.md +2 -6
- package/index.ts +88 -5
- package/openclaw.plugin.json +3 -12
- package/package.json +3 -5
- package/skills/analysis/statistics/SKILL.md +1 -1
- package/skills/analysis/statistics/meta-analysis-guide/SKILL.md +1 -1
- package/skills/domains/ai-ml/SKILL.md +3 -2
- package/skills/domains/ai-ml/generative-ai-guide/SKILL.md +1 -0
- package/skills/domains/ai-ml/huggingface-api/SKILL.md +251 -0
- package/skills/domains/biomedical/SKILL.md +9 -2
- package/skills/domains/biomedical/alphafold-api/SKILL.md +227 -0
- package/skills/domains/biomedical/biothings-api/SKILL.md +296 -0
- package/skills/domains/biomedical/clinicaltrials-api-v2/SKILL.md +216 -0
- package/skills/domains/biomedical/enrichr-api/SKILL.md +264 -0
- package/skills/domains/biomedical/ensembl-rest-api/SKILL.md +204 -0
- package/skills/domains/biomedical/medical-data-api/SKILL.md +197 -0
- package/skills/domains/biomedical/pdb-structure-api/SKILL.md +219 -0
- package/skills/domains/business/SKILL.md +2 -3
- package/skills/domains/chemistry/SKILL.md +3 -2
- package/skills/domains/chemistry/catalysis-hub-api/SKILL.md +171 -0
- package/skills/domains/education/SKILL.md +2 -3
- package/skills/domains/law/SKILL.md +3 -2
- package/skills/domains/law/uk-legislation-api/SKILL.md +179 -0
- package/skills/literature/discovery/SKILL.md +1 -1
- package/skills/literature/discovery/citation-alert-guide/SKILL.md +2 -2
- package/skills/literature/discovery/conference-proceedings-guide/SKILL.md +2 -2
- package/skills/literature/discovery/literature-mapping-guide/SKILL.md +1 -1
- package/skills/literature/discovery/paper-recommendation-guide/SKILL.md +8 -14
- package/skills/literature/discovery/rss-paper-feeds/SKILL.md +20 -14
- package/skills/literature/discovery/semantic-paper-radar/SKILL.md +8 -8
- package/skills/literature/discovery/semantic-scholar-recs-guide/SKILL.md +103 -86
- package/skills/literature/fulltext/SKILL.md +3 -2
- package/skills/literature/fulltext/arxiv-latex-source/SKILL.md +195 -0
- package/skills/literature/fulltext/open-access-guide/SKILL.md +1 -1
- package/skills/literature/fulltext/open-access-mining-guide/SKILL.md +5 -5
- package/skills/literature/metadata/citation-network-guide/SKILL.md +3 -3
- package/skills/literature/metadata/h-index-guide/SKILL.md +0 -27
- package/skills/literature/search/SKILL.md +3 -4
- package/skills/literature/search/citation-chaining-guide/SKILL.md +42 -32
- package/skills/literature/search/database-comparison-guide/SKILL.md +1 -1
- package/skills/literature/search/semantic-scholar-api/SKILL.md +56 -53
- package/skills/research/automation/SKILL.md +2 -3
- package/skills/research/automation/datagen-research-guide/SKILL.md +1 -0
- package/skills/research/automation/mle-agent-guide/SKILL.md +1 -0
- package/skills/research/automation/paper-to-agent-guide/SKILL.md +2 -1
- package/skills/research/deep-research/auto-deep-research-guide/SKILL.md +1 -0
- package/skills/research/deep-research/in-depth-research-guide/SKILL.md +1 -1
- package/skills/research/deep-research/kosmos-scientist-guide/SKILL.md +3 -3
- package/skills/research/deep-research/llm-scientific-discovery-guide/SKILL.md +1 -1
- package/skills/research/deep-research/local-deep-research-guide/SKILL.md +6 -6
- package/skills/research/deep-research/open-researcher-guide/SKILL.md +3 -3
- package/skills/research/deep-research/tongyi-deep-research-guide/SKILL.md +4 -4
- package/skills/research/methodology/SKILL.md +1 -1
- package/skills/research/methodology/claude-scientific-guide/SKILL.md +1 -0
- package/skills/research/methodology/grad-school-guide/SKILL.md +1 -1
- package/skills/research/methodology/qualitative-research-guide/SKILL.md +1 -1
- package/skills/research/paper-review/SKILL.md +1 -1
- package/skills/research/paper-review/automated-review-guide/SKILL.md +1 -1
- package/skills/research/paper-review/peer-review-guide/SKILL.md +1 -1
- package/skills/tools/diagram/excalidraw-diagram-guide/SKILL.md +1 -1
- package/skills/tools/diagram/mermaid-architect-guide/SKILL.md +1 -1
- package/skills/tools/diagram/plantuml-guide/SKILL.md +1 -1
- package/skills/tools/document/grobid-pdf-parsing/SKILL.md +1 -1
- package/skills/tools/document/paper-parse-guide/SKILL.md +2 -2
- package/skills/tools/knowledge-graph/SKILL.md +2 -3
- package/skills/tools/knowledge-graph/citation-network-builder/SKILL.md +5 -5
- package/skills/tools/knowledge-graph/knowledge-graph-construction/SKILL.md +1 -1
- package/skills/tools/ocr-translate/zotero-pdf2zh-guide/SKILL.md +1 -0
- package/skills/tools/scraping/academic-web-scraping/SKILL.md +1 -2
- package/skills/tools/scraping/google-scholar-scraper/SKILL.md +7 -7
- package/skills/writing/citation/SKILL.md +1 -1
- package/skills/writing/citation/academic-citation-manager/SKILL.md +20 -17
- package/skills/writing/citation/citation-assistant-skill/SKILL.md +72 -58
- package/skills/writing/citation/obsidian-citation-guide/SKILL.md +1 -0
- package/skills/writing/citation/obsidian-zotero-guide/SKILL.md +1 -0
- package/skills/writing/citation/onecite-reference-guide/SKILL.md +1 -1
- package/skills/writing/citation/papersgpt-zotero-guide/SKILL.md +1 -0
- package/skills/writing/citation/zotero-mdnotes-guide/SKILL.md +1 -0
- package/skills/writing/citation/zotero-reference-guide/SKILL.md +2 -1
- package/skills/writing/citation/zotero-scholar-guide/SKILL.md +1 -1
- package/skills/writing/composition/scientific-writing-resources/SKILL.md +1 -0
- package/skills/writing/latex/latex-drawing-collection/SKILL.md +1 -0
- package/skills/writing/latex/latex-templates-collection/SKILL.md +1 -0
- package/skills/writing/templates/novathesis-guide/SKILL.md +1 -0
- package/src/tools/arxiv.ts +81 -30
- package/src/tools/biorxiv.ts +158 -0
- package/src/tools/crossref.ts +63 -22
- package/src/tools/datacite.ts +191 -0
- package/src/tools/dblp.ts +125 -0
- package/src/tools/doaj.ts +82 -0
- package/src/tools/europe-pmc.ts +159 -0
- package/src/tools/hal.ts +118 -0
- package/src/tools/inspire-hep.ts +165 -0
- package/src/tools/openaire.ts +158 -0
- package/src/tools/openalex.ts +26 -15
- package/src/tools/opencitations.ts +112 -0
- package/src/tools/orcid.ts +139 -0
- package/src/tools/osf-preprints.ts +104 -0
- package/src/tools/pubmed.ts +22 -13
- package/src/tools/ror.ts +118 -0
- package/src/tools/unpaywall.ts +15 -6
- package/src/tools/util.ts +141 -0
- package/src/tools/zenodo.ts +157 -0
- package/mcp-configs/academic-db/ChatSpatial.json +0 -17
- package/mcp-configs/academic-db/academia-mcp.json +0 -17
- package/mcp-configs/academic-db/academic-paper-explorer.json +0 -17
- package/mcp-configs/academic-db/academic-search-mcp-server.json +0 -17
- package/mcp-configs/academic-db/agentinterviews-mcp.json +0 -17
- package/mcp-configs/academic-db/all-in-mcp.json +0 -17
- package/mcp-configs/academic-db/alphafold-mcp.json +0 -20
- package/mcp-configs/academic-db/apple-health-mcp.json +0 -17
- package/mcp-configs/academic-db/arxiv-latex-mcp.json +0 -17
- package/mcp-configs/academic-db/arxiv-mcp-server.json +0 -17
- package/mcp-configs/academic-db/bgpt-mcp.json +0 -17
- package/mcp-configs/academic-db/biomcp.json +0 -17
- package/mcp-configs/academic-db/biothings-mcp.json +0 -17
- package/mcp-configs/academic-db/brightspace-mcp.json +0 -21
- package/mcp-configs/academic-db/catalysishub-mcp-server.json +0 -17
- package/mcp-configs/academic-db/climatiq-mcp.json +0 -20
- package/mcp-configs/academic-db/clinicaltrialsgov-mcp-server.json +0 -17
- package/mcp-configs/academic-db/deep-research-mcp.json +0 -17
- package/mcp-configs/academic-db/dicom-mcp.json +0 -17
- package/mcp-configs/academic-db/enrichr-mcp-server.json +0 -17
- package/mcp-configs/academic-db/fec-mcp-server.json +0 -17
- package/mcp-configs/academic-db/fhir-mcp-server-themomentum.json +0 -17
- package/mcp-configs/academic-db/fhir-mcp.json +0 -19
- package/mcp-configs/academic-db/gget-mcp.json +0 -17
- package/mcp-configs/academic-db/gibs-mcp.json +0 -20
- package/mcp-configs/academic-db/gis-mcp-server.json +0 -22
- package/mcp-configs/academic-db/google-earth-engine-mcp.json +0 -21
- package/mcp-configs/academic-db/google-researcher-mcp.json +0 -17
- package/mcp-configs/academic-db/idea-reality-mcp.json +0 -17
- package/mcp-configs/academic-db/legiscan-mcp.json +0 -19
- package/mcp-configs/academic-db/lex.json +0 -17
- package/mcp-configs/academic-db/m4-clinical-mcp.json +0 -21
- package/mcp-configs/academic-db/medical-mcp.json +0 -21
- package/mcp-configs/academic-db/nexonco-mcp.json +0 -20
- package/mcp-configs/academic-db/omop-mcp.json +0 -20
- package/mcp-configs/academic-db/onekgpd-mcp.json +0 -20
- package/mcp-configs/academic-db/openedu-mcp.json +0 -20
- package/mcp-configs/academic-db/opengenes-mcp.json +0 -20
- package/mcp-configs/academic-db/openstax-mcp.json +0 -21
- package/mcp-configs/academic-db/openstreetmap-mcp.json +0 -21
- package/mcp-configs/academic-db/opentargets-mcp.json +0 -21
- package/mcp-configs/academic-db/pdb-mcp.json +0 -21
- package/mcp-configs/academic-db/smithsonian-mcp.json +0 -20
- package/mcp-configs/ai-platform/Adaptive-Graph-of-Thoughts-MCP-server.json +0 -17
- package/mcp-configs/ai-platform/ai-counsel.json +0 -17
- package/mcp-configs/ai-platform/atlas-mcp-server.json +0 -17
- package/mcp-configs/ai-platform/counsel-mcp.json +0 -17
- package/mcp-configs/ai-platform/cross-llm-mcp.json +0 -17
- package/mcp-configs/ai-platform/gptr-mcp.json +0 -17
- package/mcp-configs/ai-platform/magi-researchers.json +0 -21
- package/mcp-configs/ai-platform/mcp-academic-researcher.json +0 -22
- package/mcp-configs/ai-platform/open-paper-machine.json +0 -21
- package/mcp-configs/ai-platform/paper-intelligence.json +0 -21
- package/mcp-configs/ai-platform/paper-reader.json +0 -21
- package/mcp-configs/ai-platform/paperdebugger.json +0 -21
- package/mcp-configs/browser/decipher-research-agent.json +0 -17
- package/mcp-configs/browser/deep-research.json +0 -17
- package/mcp-configs/browser/everything-claude-code.json +0 -17
- package/mcp-configs/browser/exa-mcp.json +0 -20
- package/mcp-configs/browser/gpt-researcher.json +0 -17
- package/mcp-configs/browser/heurist-agent-framework.json +0 -17
- package/mcp-configs/browser/mcp-searxng.json +0 -21
- package/mcp-configs/browser/mcp-webresearch.json +0 -20
- package/mcp-configs/cloud-docs/confluence-mcp.json +0 -37
- package/mcp-configs/cloud-docs/google-drive-mcp.json +0 -35
- package/mcp-configs/cloud-docs/notion-mcp.json +0 -29
- package/mcp-configs/communication/discord-mcp.json +0 -29
- package/mcp-configs/communication/discourse-mcp.json +0 -21
- package/mcp-configs/communication/slack-mcp.json +0 -29
- package/mcp-configs/communication/telegram-mcp.json +0 -28
- package/mcp-configs/data-platform/4everland-hosting-mcp.json +0 -17
- package/mcp-configs/data-platform/automl-stat-mcp.json +0 -21
- package/mcp-configs/data-platform/context-keeper.json +0 -17
- package/mcp-configs/data-platform/context7.json +0 -19
- package/mcp-configs/data-platform/contextstream-mcp.json +0 -17
- package/mcp-configs/data-platform/email-mcp.json +0 -17
- package/mcp-configs/data-platform/jefferson-stats-mcp.json +0 -22
- package/mcp-configs/data-platform/mcp-excel-server.json +0 -21
- package/mcp-configs/data-platform/mcp-stata.json +0 -21
- package/mcp-configs/data-platform/mcpstack-jupyter.json +0 -21
- package/mcp-configs/data-platform/ml-mcp.json +0 -21
- package/mcp-configs/data-platform/nasdaq-data-link-mcp.json +0 -20
- package/mcp-configs/data-platform/numpy-mcp.json +0 -21
- package/mcp-configs/database/neo4j-mcp.json +0 -37
- package/mcp-configs/database/postgres-mcp.json +0 -28
- package/mcp-configs/database/sqlite-mcp.json +0 -29
- package/mcp-configs/dev-platform/geogebra-mcp.json +0 -21
- package/mcp-configs/dev-platform/github-mcp.json +0 -31
- package/mcp-configs/dev-platform/gitlab-mcp.json +0 -34
- package/mcp-configs/dev-platform/latex-mcp-server.json +0 -21
- package/mcp-configs/dev-platform/manim-mcp.json +0 -20
- package/mcp-configs/dev-platform/mcp-echarts.json +0 -20
- package/mcp-configs/dev-platform/panel-viz-mcp.json +0 -20
- package/mcp-configs/dev-platform/paperbanana.json +0 -20
- package/mcp-configs/dev-platform/texflow-mcp.json +0 -20
- package/mcp-configs/dev-platform/texmcp.json +0 -20
- package/mcp-configs/dev-platform/typst-mcp.json +0 -21
- package/mcp-configs/dev-platform/vizro-mcp.json +0 -20
- package/mcp-configs/email/email-mcp.json +0 -40
- package/mcp-configs/email/gmail-mcp.json +0 -37
- package/mcp-configs/note-knowledge/ApeRAG.json +0 -17
- package/mcp-configs/note-knowledge/In-Memoria.json +0 -17
- package/mcp-configs/note-knowledge/agent-memory.json +0 -17
- package/mcp-configs/note-knowledge/aimemo.json +0 -17
- package/mcp-configs/note-knowledge/biel-mcp.json +0 -19
- package/mcp-configs/note-knowledge/cognee.json +0 -17
- package/mcp-configs/note-knowledge/context-awesome.json +0 -17
- package/mcp-configs/note-knowledge/context-mcp.json +0 -17
- package/mcp-configs/note-knowledge/conversation-handoff-mcp.json +0 -17
- package/mcp-configs/note-knowledge/cortex.json +0 -17
- package/mcp-configs/note-knowledge/devrag.json +0 -17
- package/mcp-configs/note-knowledge/easy-obsidian-mcp.json +0 -17
- package/mcp-configs/note-knowledge/engram.json +0 -17
- package/mcp-configs/note-knowledge/gnosis-mcp.json +0 -17
- package/mcp-configs/note-knowledge/graphlit-mcp-server.json +0 -19
- package/mcp-configs/note-knowledge/local-faiss-mcp.json +0 -21
- package/mcp-configs/note-knowledge/mcp-memory-service.json +0 -21
- package/mcp-configs/note-knowledge/mcp-obsidian.json +0 -23
- package/mcp-configs/note-knowledge/mcp-ragdocs.json +0 -20
- package/mcp-configs/note-knowledge/mcp-summarizer.json +0 -21
- package/mcp-configs/note-knowledge/mediawiki-mcp.json +0 -21
- package/mcp-configs/note-knowledge/openzim-mcp.json +0 -20
- package/mcp-configs/note-knowledge/zettelkasten-mcp.json +0 -21
- package/mcp-configs/reference-mgr/academic-paper-mcp-http.json +0 -20
- package/mcp-configs/reference-mgr/academix.json +0 -20
- package/mcp-configs/reference-mgr/arxiv-cli.json +0 -17
- package/mcp-configs/reference-mgr/arxiv-research-mcp.json +0 -21
- package/mcp-configs/reference-mgr/arxiv-search-mcp.json +0 -17
- package/mcp-configs/reference-mgr/chiken.json +0 -17
- package/mcp-configs/reference-mgr/claude-scholar.json +0 -17
- package/mcp-configs/reference-mgr/devonthink-mcp.json +0 -17
- package/mcp-configs/reference-mgr/google-scholar-abstract-mcp.json +0 -19
- package/mcp-configs/reference-mgr/google-scholar-mcp.json +0 -20
- package/mcp-configs/reference-mgr/mcp-paperswithcode.json +0 -21
- package/mcp-configs/reference-mgr/mcp-scholarly.json +0 -20
- package/mcp-configs/reference-mgr/mcp-simple-arxiv.json +0 -20
- package/mcp-configs/reference-mgr/mcp-simple-pubmed.json +0 -20
- package/mcp-configs/reference-mgr/mcp-zotero.json +0 -21
- package/mcp-configs/reference-mgr/mendeley-mcp.json +0 -20
- package/mcp-configs/reference-mgr/ncbi-mcp-server.json +0 -22
- package/mcp-configs/reference-mgr/onecite.json +0 -21
- package/mcp-configs/reference-mgr/paper-search-mcp.json +0 -21
- package/mcp-configs/reference-mgr/pubmed-search-mcp.json +0 -21
- package/mcp-configs/reference-mgr/scholar-mcp.json +0 -21
- package/mcp-configs/reference-mgr/scholar-multi-mcp.json +0 -21
- package/mcp-configs/reference-mgr/seerai.json +0 -21
- package/mcp-configs/reference-mgr/semantic-scholar-fastmcp.json +0 -21
- package/mcp-configs/reference-mgr/sourcelibrary.json +0 -20
- package/mcp-configs/registry.json +0 -476
- package/mcp-configs/repository/dataverse-mcp.json +0 -33
- package/mcp-configs/repository/huggingface-mcp.json +0 -29
- package/skills/domains/business/xpert-bi-guide/SKILL.md +0 -84
- package/skills/domains/education/edumcp-guide/SKILL.md +0 -74
- package/skills/literature/search/paper-search-mcp-guide/SKILL.md +0 -107
- package/skills/research/automation/mcp-server-guide/SKILL.md +0 -211
- package/skills/tools/knowledge-graph/paperpile-notion-guide/SKILL.md +0 -84
- package/src/tools/semantic-scholar.ts +0 -66
package/skills/literature/discovery/semantic-scholar-recs-guide/SKILL.md

@@ -1,6 +1,6 @@
 ---
 name: semantic-scholar-recs-guide
-description: "
+description: "Paper discovery via recommendation APIs (OpenAlex, CrossRef citation networks)"
 metadata:
   openclaw:
     emoji: "🤖"
@@ -10,70 +10,72 @@ metadata:
     source: "wentor-research-plugins"
 ---
 
-# 
+# Paper Discovery via OpenAlex & CrossRef
 
-Leverage the 
+Leverage the OpenAlex and CrossRef APIs to discover related papers, traverse citation networks, and build comprehensive reading lists programmatically.
 
 ## Overview
 
-
+OpenAlex indexes over 250 million academic works and provides a free, no-key-required API that supports:
 
--
-- Recommendations based on positive and negative seed papers
+- Work search by title, keyword, or DOI
 - Citation and reference graph traversal
 - Author profiles and publication histories
--
+- Concept-based discovery across disciplines
+- Institutional and venue filtering
 
-Base URL: `https://api.
-
+Base URL: `https://api.openalex.org`
+CrossRef URL: `https://api.crossref.org`
 
-## 
+## Finding Related Papers
 
-
+Use OpenAlex's concept graph and citation data to discover related work from seed papers.
 
-### 
+### Concept-Based Discovery
 
 ```python
 import requests
 
-
+HEADERS = {"User-Agent": "ResearchPlugins/1.0 (https://wentor.ai)"}
+WORK_ID = "W2741809807"  # OpenAlex work ID
 
+# Get the seed paper's concepts
 response = requests.get(
-    f"https://api.
-
-        "fields": "title,authors,year,citationCount,abstract,externalIds",
-        "limit": 20
-    },
-    headers={"x-api-key": "YOUR_API_KEY"}  # optional, increases rate limit
+    f"https://api.openalex.org/works/{WORK_ID}",
+    headers=HEADERS
 )
-
-for 
-
+paper = response.json()
+concepts = [c["id"] for c in paper.get("concepts", [])[:3]]
+
+# Find works sharing the same concepts, sorted by citations
+for concept_id in concepts:
+    related = requests.get(
+        "https://api.openalex.org/works",
+        params={"filter": f"concepts.id:{concept_id}", "sort": "cited_by_count:desc", "per_page": 10},
+        headers=HEADERS
+    )
+    for w in related.json().get("results", []):
+        print(f"[{w.get('publication_year')}] {w.get('title')} (citations: {w.get('cited_by_count')})")
 ```
 
-### 
+### CrossRef Subject-Based Discovery
 
 ```python
 import requests
 
-
-    "
-
-    "
-
-
-
-]
-
-
-
-    "
-
-    params={"fields": "title,year,citationCount,url,abstract", "limit": 30}
-)
-
-results = response.json()["recommendedPapers"]
-print(f"Found {len(results)} recommended papers")
+def search_crossref(query, limit=10, sort="is-referenced-by-count"):
+    """Search CrossRef for papers sorted by citation count."""
+    resp = requests.get(
+        "https://api.crossref.org/works",
+        params={"query": query, "rows": limit, "sort": sort, "order": "desc"},
+        headers={"User-Agent": "ResearchPlugins/1.0 (https://wentor.ai; mailto:dev@wentor.ai)"}
+    )
+    return resp.json().get("message", {}).get("items", [])
+
+results = search_crossref("transformer attention mechanism")
+for w in results:
+    title = w.get("title", [""])[0] if w.get("title") else ""
+    print(f"  {title} — Cited by: {w.get('is-referenced-by-count', 0)}")
 ```
 
 ## Citation Network Traversal
@@ -83,48 +85,49 @@ Walk the citation graph to discover foundational and derivative works.
 ### Forward Citations (Who Cited This Paper?)
 
 ```python
-
+work_id = "W2741809807"
 
 response = requests.get(
-
+    "https://api.openalex.org/works",
     params={
-        "
-        "
-        "
-    }
+        "filter": f"cites:{work_id}",
+        "sort": "cited_by_count:desc",
+        "per_page": 20
+    },
+    headers=HEADERS
 )
 
-
-
-citations.sort(key=lambda x: x["citingPaper"]["citationCount"], reverse=True)
-for c in citations[:10]:
-    p = c["citingPaper"]
-    print(f"  [{p['year']}] {p['title']} ({p['citationCount']} cites)")
+for w in response.json().get("results", []):
+    print(f"  [{w.get('publication_year')}] {w.get('title')} ({w.get('cited_by_count')} cites)")
 ```
 
 ### Backward References (What Did This Paper Cite?)
 
 ```python
 response = requests.get(
-    f"https://api.
-
+    f"https://api.openalex.org/works/{work_id}",
+    headers=HEADERS
 )
+paper = response.json()
+ref_ids = paper.get("referenced_works", [])
 
-
-
+# Fetch details for referenced works
+for ref_id in ref_ids[:20]:
+    ref = requests.get(f"https://api.openalex.org/works/{ref_id.split('/')[-1]}", headers=HEADERS).json()
+    print(f"  [{ref.get('publication_year')}] {ref.get('title')} ({ref.get('cited_by_count')} cites)")
 ```
 
 ## Building a Reading List Pipeline
 
-Combine search, 
+Combine search, concept discovery, and citation traversal into a discovery pipeline:
 
 | Step | Method | Purpose |
 |------|--------|---------|
 | 1. Seed selection | Manual or keyword search | Identify 3-5 highly relevant papers |
-| 2. Expand via 
-| 3. Forward citation | 
-| 4. Backward citation | 
-| 5. Deduplicate | 
+| 2. Expand via concepts | OpenAlex concept graph | Find thematically related work |
+| 3. Forward citation | OpenAlex cites filter | Find recent derivative works |
+| 4. Backward citation | referenced_works field | Find foundational papers |
+| 5. Deduplicate | OpenAlex work ID matching | Remove duplicates across steps |
 | 6. Rank & filter | Sort by year, citations, relevance | Prioritize reading order |
 
 ```python
@@ -133,32 +136,46 @@ def build_reading_list(seed_ids, max_papers=50):
     seen = set()
     candidates = []
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+    for seed_id in seed_ids:
+        # Get concepts from seed paper
+        paper = requests.get(f"https://api.openalex.org/works/{seed_id}", headers=HEADERS).json()
+        concept_ids = [c["id"] for c in paper.get("concepts", [])[:2]]
+
+        # Find related works via concepts
+        for cid in concept_ids:
+            related = requests.get(
+                "https://api.openalex.org/works",
+                params={"filter": f"concepts.id:{cid}", "sort": "cited_by_count:desc", "per_page": 20},
+                headers=HEADERS
+            ).json().get("results", [])
+            for w in related:
+                wid = w.get("id", "").split("/")[-1]
+                if wid not in seen:
+                    seen.add(wid)
+                    candidates.append(w)
+
+        # Get citing works
+        citing = requests.get(
+            "https://api.openalex.org/works",
+            params={"filter": f"cites:{seed_id}", "sort": "cited_by_count:desc", "per_page": 20},
+            headers=HEADERS
+        ).json().get("results", [])
+        for w in citing:
+            wid = w.get("id", "").split("/")[-1]
+            if wid not in seen:
+                seen.add(wid)
+                candidates.append(w)
+
+    # Rank by citation count and recency
+    candidates.sort(key=lambda p: (p.get("publication_year", 0), p.get("cited_by_count", 0)), reverse=True)
     return candidates[:max_papers]
 ```
 
-## 
+## Best Practices
 
--
--
-- Always include only the fields you need to reduce payload size
-- Use `
+- OpenAlex is free with no API key required; use a polite `User-Agent` header
+- CrossRef requires a polite pool user agent with contact info for higher rate limits
+- Always include only the fields you need via `select` parameter to reduce payload size
+- Use `page` and `per_page` for pagination on large result sets
 - Cache responses locally to avoid redundant requests
-- Use DOI
+- Use DOI as the universal identifier for cross-system compatibility
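The `select` and pagination advice added to the Best Practices list can be sketched as a small helper. This is an illustration only: `paged_params`, `fetch_all`, and the injected `fetch` callable are hypothetical names introduced here, though `select`, `page`, and `per_page` are real OpenAlex query parameters.

```python
def paged_params(query_filter, select_fields, per_page=25):
    """Yield params dicts for successive OpenAlex result pages."""
    page = 1
    while True:
        yield {
            "filter": query_filter,
            "select": ",".join(select_fields),  # request only the fields you need
            "per_page": per_page,
            "page": page,
        }
        page += 1

def fetch_all(fetch, query_filter, select_fields, max_pages=3):
    """Collect results across pages until a page comes back empty."""
    works = []
    for params in paged_params(query_filter, select_fields):
        if params["page"] > max_pages:
            break
        batch = fetch(params).get("results", [])
        if not batch:
            break
        works.extend(batch)
    return works

# Demo with a stubbed fetch (no network): two pages of results, then an empty page.
_pages = {
    1: {"results": [{"id": "W1"}, {"id": "W2"}]},
    2: {"results": [{"id": "W3"}]},
    3: {"results": []},
}
works = fetch_all(lambda p: _pages[p["page"]], "concepts.id:C41008148", ["id", "display_name"])
print([w["id"] for w in works])  # ['W1', 'W2', 'W3']
```

In real use, `fetch` would wrap `requests.get("https://api.openalex.org/works", params=...).json()`; the stub keeps the pagination logic testable offline.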
package/skills/literature/fulltext/SKILL.md

@@ -1,14 +1,15 @@
 ---
 name: fulltext-skills
-description: "
+description: "16 full-text access skills. Trigger: accessing paper PDFs, bulk downloading, open access, text mining. Design: legal full-text retrieval from open repositories, archives, and preprint servers."
 ---
 
-# Full-Text Access — 
+# Full-Text Access — 16 Skills
 
 Select the skill matching the user's need, then `read` its SKILL.md.
 
 | Skill | Description |
 |-------|-------------|
+| [arxiv-latex-source](./arxiv-latex-source/SKILL.md) | Download and parse LaTeX source files from arXiv preprints |
 | [bioc-pmc-api](./bioc-pmc-api/SKILL.md) | Access PMC Open Access articles in BioC format for text mining |
 | [core-api-guide](./core-api-guide/SKILL.md) | Search and retrieve open access research papers via CORE aggregator |
 | [dataverse-api](./dataverse-api/SKILL.md) | Deposit and discover research datasets via Harvard Dataverse API |
@@ -0,0 +1,195 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: arxiv-latex-source
|
|
3
|
+
description: "Download and parse LaTeX source files from arXiv preprints"
|
|
4
|
+
metadata:
|
|
5
|
+
openclaw:
|
|
6
|
+
emoji: "📜"
|
|
7
|
+
category: "literature"
|
|
8
|
+
subcategory: "fulltext"
|
|
9
|
+
keywords: ["arXiv", "LaTeX source", "paper parsing", "formula extraction", "full text", "preprint"]
|
|
10
|
+
source: "https://info.arxiv.org/help/bulk_data_s3.html"
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
# arXiv LaTeX Source Access Guide
|
|
14
|
+
|
|
15
|
+
## Overview
|
|
16
|
+
|
|
17
|
+
arXiv stores the original LaTeX source files for the vast majority of its 2.4 million+ preprints. Accessing LaTeX source provides major advantages over PDF parsing: exact mathematical notation as written by the author, structured sections and labels, machine-readable bibliography entries, and intact figure captions, table data, and cross-references.
|
|
18
|
+
|
|
19
|
+
For formula extraction, citation graph construction, section-level text analysis, or training data curation for scientific language models, LaTeX source is the gold standard. PDF parsing introduces OCR errors in equations, loses structural hierarchy, and mangles complex tables.
|
|
20
|
+
|
|
21
|
+
The e-print endpoint serves source bundles as gzip-compressed tarballs (`.tar.gz`) containing `.tex` files, figures, `.bib`/`.bbl` bibliography files, style files, and supplementary materials. No authentication is required.
|
|
22
|
+
|
|
23
|
+
## Authentication
|
|
24
|
+
|
|
25
|
+
No authentication or API key is required. The e-print endpoint is publicly accessible. However, arXiv asks that automated tools set a descriptive `User-Agent` header and comply with rate limits.
|
|
26
|
+
|
|
27
|
+
## Core Endpoints
|
|
28
|
+
|
|
29
|
+
### Download LaTeX Source
|
|
30
|
+
|
|
31
|
+
- **URL**: `GET https://arxiv.org/e-print/{arxiv_id}`
|
|
32
|
+
- **Response**: `application/gzip` — a `.tar.gz` archive containing the source files
|
|
33
|
+
- **Parameters**:
|
|
34
|
+
| Param | Type | Required | Description |
|
|
35
|
+
|-------|------|----------|-------------|
|
|
36
|
+
| arxiv_id | string | Yes | arXiv identifier, e.g. `2301.00001` or `2301.00001v2` for a specific version |
|
|
37
|
+
|
|
38
|
+
- **Example**:
|
|
39
|
+
```bash
|
|
40
|
+
# Download source archive (response: 200, application/gzip, ~1.3 MB)
|
|
41
|
+
curl -sL -o source.tar.gz "https://arxiv.org/e-print/2301.00001"
|
|
42
|
+
|
|
43
|
+
# List archive contents
|
|
44
|
+
tar tz -f source.tar.gz | head -10
|
|
45
|
+
# ACM-Reference-Format.bbx
|
|
46
|
+
# ACM-Reference-Format.bst
|
|
47
|
+
# Image_1.jpg
|
|
48
|
+
# README.txt
|
|
49
|
+
# acmart.cls
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
- **Content-Disposition header**: `attachment; filename="arXiv-2301.00001v1.tar.gz"`
|
|
53
|
+
- **ETag**: SHA-256 hash provided for caching: `sha256:f1ffe8ec...`
|
|
54
|
+
|
|
55
|
+
### Format Detection
|
|
56
|
+
|
|
57
|
+
The endpoint almost always returns a gzip-compressed tar archive. Rare cases (very old or single-file submissions) may return a single gzip-compressed `.tex` file without tar wrapper. Always verify format before extracting:
|
|
58
|
+
|
|
59
|
+
```bash
|
|
60
|
+
curl -sL "https://arxiv.org/e-print/{arxiv_id}" -o source.gz
|
|
61
|
+
file source.gz # "gzip compressed data, was 'XXXX.tar', ..."
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
### Metadata API (Companion)
|
|
65
|
+
|
|
66
|
+
Pair source downloads with the arXiv Atom API for structured metadata:
|
|
67
|
+
|
|
68
|
+
- **URL**: `GET https://export.arxiv.org/api/query?id_list={arxiv_id}`
|
|
69
|
+
- **Response**: Atom XML with `<title>`, `<author>`, `<summary>`, `<category>`, `<published>`
|
|
70
|
+
- **Example**: `curl -s "https://export.arxiv.org/api/query?id_list=2301.00001"`
|
|
71
|
+
|
|
72
|
+
## LaTeX Source Parsing Guide
|
|
73
|
+
|
|
74
|
+
### Locating the Main .tex File
|
|
75
|
+
|
|
76
|
+
A source archive typically contains multiple files. To find the main document:
|
|
77
|
+
|
|
78
|
+
1. Look for `\documentclass` in `.tex` files — this marks the root document
|
|
79
|
+
2. Check for a `README.txt` that may specify the main file
|
|
80
|
+
3. If multiple `.tex` files contain `\documentclass`, prefer the one with `\begin{document}`
|
|
81
|
+
|
|
82
|
+
```python
import tarfile, re

def find_main_tex(tar_path):
    with tarfile.open(tar_path, 'r:gz') as tar:
        tex_files = [m for m in tar.getmembers() if m.name.endswith('.tex')]
        for member in tex_files:
            content = tar.extractfile(member).read().decode('utf-8', errors='ignore')
            if r'\documentclass' in content and r'\begin{document}' in content:
                return member.name, content
    return None, None
```

### Extracting Sections

LaTeX sections follow a predictable hierarchy:

```python
import re

def extract_sections(tex_content):
    pattern = r'\\(section|subsection|subsubsection)\{([^}]+)\}'
    sections = re.findall(pattern, tex_content)
    return [(level, title) for level, title in sections]

# [('section', 'Introduction'), ('section', 'Related Work'), ...]
```

### Extracting Equations

```python
def extract_equations(tex_content):
    patterns = [
        r'\\\[(.+?)\\\]',
        r'\\begin\{equation\}(.+?)\\end\{equation\}',
        r'\\begin\{align\*?\}(.+?)\\end\{align\*?\}',
    ]
    equations = []
    for pat in patterns:
        equations.extend(re.findall(pat, tex_content, re.DOTALL))
    return equations
```

### Extracting Bibliography

Parse `.bib` files (BibTeX entries) or `.bbl` files (compiled `\bibitem` commands):

```python
def extract_bibliography(tar_path):
    refs = []
    with tarfile.open(tar_path, 'r:gz') as tar:
        for member in tar.getmembers():
            if member.name.endswith('.bib'):
                content = tar.extractfile(member).read().decode('utf-8', errors='ignore')
                refs.extend(re.findall(r'@\w+\{([^,]+),(.+?)\n\}', content, re.DOTALL))
            elif member.name.endswith('.bbl'):
                content = tar.extractfile(member).read().decode('utf-8', errors='ignore')
                refs.extend(re.findall(r'\\bibitem.*?\{(.+?)\}', content))
    return refs
```

## Rate Limits

- **Maximum**: 4 requests per second for automated access
- **Recommended**: 1 request/second with delays between sequential downloads
- **Bulk access**: For 1000+ papers, use the arXiv S3 bulk data mirror instead
- **HTTP 429**: Rate limit exceeded; implement exponential backoff
- **User-Agent**: Required — set a descriptive string: `MyTool/1.0 (mailto:user@university.edu)`
- Persistent abuse may result in IP-level blocks

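The backoff rule above can be sketched with the standard library. The `polite_get` name, retry count, and delay values are illustrative choices, not arXiv-mandated parameters:

```python
import time
import urllib.request
from urllib.error import HTTPError

def polite_get(url, max_retries=5, base_delay=1.0, opener=urllib.request.urlopen):
    """Fetch with a fixed inter-request delay and exponential backoff on HTTP 429."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "ResearchTool/1.0 (mailto:user@example.com)"})
    for attempt in range(max_retries):
        try:
            data = opener(req).read()
            time.sleep(base_delay)                 # stay near 1 request/second
            return data
        except HTTPError as e:
            if e.code != 429:
                raise
            time.sleep(base_delay * 2 ** attempt)  # back off: base_delay * 1, 2, 4, ...
    raise RuntimeError(f"rate-limited after {max_retries} retries: {url}")
```

The injectable `opener` keeps the retry logic testable without touching the network.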
## Academic Use Cases

- **Formula extraction for ML training** — Build equation datasets with ground-truth LaTeX notation, free of OCR noise from PDF parsing
- **Citation network analysis** — Parse `.bib`/`.bbl` files for exact reference keys to construct citation graphs
- **Section-level text analysis** — Extract specific sections (e.g., all "Related Work" across a subfield) for systematic reviews
- **Reproducibility auditing** — Examine algorithm environments, hyperparameter tables, and methodology sections
- **Cross-paper notation alignment** — Compare and normalize equation environments across papers in a subfield

## Complete Python Example

```python
import requests, tarfile, io, re, time, gzip

def download_arxiv_source(arxiv_id, delay=1.0):
    """Download and extract all .tex files from an arXiv paper's source."""
    url = f"https://arxiv.org/e-print/{arxiv_id}"
    headers = {"User-Agent": "ResearchTool/1.0 (mailto:user@example.com)"}
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    time.sleep(delay)

    buf = io.BytesIO(resp.content)
    try:
        with tarfile.open(fileobj=buf, mode='r:gz') as tar:
            return {m.name: tar.extractfile(m).read().decode('utf-8', errors='ignore')
                    for m in tar.getmembers() if m.name.endswith('.tex') and m.isfile()}
    except tarfile.ReadError:
        buf.seek(0)
        return {"main.tex": gzip.decompress(buf.read()).decode('utf-8', errors='ignore')}

# Usage
sources = download_arxiv_source("2301.00001")
for fname, content in sources.items():
    if r'\documentclass' in content:
        sections = re.findall(r'\\section\{([^}]+)\}', content)
        equations = re.findall(r'\\begin\{equation\}(.+?)\\end\{equation\}', content, re.DOTALL)
        print(f"{fname}: {len(sections)} sections, {len(equations)} equations")
```

## References

- arXiv e-print access: https://info.arxiv.org/help/bulk_data_s3.html
- arXiv API documentation: https://info.arxiv.org/help/api/index.html
- arXiv terms of use: https://info.arxiv.org/help/api/tou.html
- arXiv S3 bulk data: https://info.arxiv.org/help/bulk_data_s3.html

@@ -84,7 +84,7 @@ else:
 | SSRN | Preprint server | Social sciences, law, economics | ssrn.com |
 | Zenodo | Repository | All disciplines | zenodo.org |
 | CORE | Aggregator | 300M+ papers from repositories | core.ac.uk |
-
+| OpenAlex | Search + OA links | Cross-disciplinary | openalex.org |
 | BASE (Bielefeld) | Aggregator | 400M+ documents | base-search.net |
 
 ### Batch OA Lookup

@@ -93,11 +93,11 @@ Unpaywall / OpenAlex:
 - Use: Find OA versions of any DOI
 - Best for: Locating freely available versions of papers
 
-
-- Coverage:
-- Access: Free API,
-- Features:
-- Best for:
+OpenAlex:
+- Coverage: 250M+ works, all disciplines
+- Access: Free API, no key required
+- Features: Concepts, citation counts, author profiles, institution data
+- Best for: Cross-disciplinary metadata and OA discovery
 ```
 
 ## Full-Text Retrieval and Parsing

@@ -49,7 +49,7 @@ Whether you are conducting a systematic literature review, mapping a new researc
 
 | Source | Coverage | API | Cost |
 |--------|----------|-----|------|
-
+| OpenAlex | 250M+ works, all disciplines | REST API, free | Free (no key required) |
 | OpenAlex | 250M+ works, all disciplines | REST API, free | Free |
 | Crossref | 140M+ DOIs | REST API | Free |
 | Web of Science | Curated, multi-disciplinary | Institutional | Licensed |

@@ -219,7 +219,7 @@ Traditional citations take years to accumulate. Altmetrics capture immediate att
 
 ## Best Practices
 
-- **Combine multiple data sources.** No single database has complete coverage. Merge OpenAlex and
+- **Combine multiple data sources.** No single database has complete coverage. Merge OpenAlex and CrossRef for best results.
 - **Normalize by field and age.** A 2024 paper in biology and a 2024 paper in mathematics have very different citation rate baselines.
 - **Use relative indicators.** Field-Weighted Citation Impact (FWCI) accounts for disciplinary differences.
 - **Do not equate citations with quality.** Retracted papers sometimes have high citation counts. Controversial papers accumulate criticism citations.

@@ -229,7 +229,7 @@ Traditional citations take years to accumulate. Altmetrics capture immediate att
 ## References
 
 - [OpenAlex API](https://docs.openalex.org/) -- Free, open bibliographic data
-- [
+- [CrossRef API](https://api.crossref.org/) -- DOI resolution and metadata
 - [VOSviewer](https://www.vosviewer.com/) -- Bibliometric visualization tool
 - [bibliometrix R package](https://www.bibliometrix.org/) -- Comprehensive bibliometric analysis
 - [Altmetric](https://www.altmetric.com/) -- Alternative impact metrics

@@ -115,33 +115,6 @@ for source in results:
 
 Google Scholar profiles automatically display h-index and i10-index. No calculation needed, but coverage is the broadest (includes non-peer-reviewed sources).
 
-### From Semantic Scholar API
-
-```python
-def get_author_h_index(author_name):
-    """Calculate h-index for an author using Semantic Scholar."""
-    # Search for author
-    search_resp = requests.get(
-        "https://api.semanticscholar.org/graph/v1/author/search",
-        params={"query": author_name, "limit": 1}
-    )
-    authors = search_resp.json().get("data", [])
-    if not authors:
-        return None
-
-    author_id = authors[0]["authorId"]
-
-    # Get all papers with citation counts
-    papers_resp = requests.get(
-        f"https://api.semanticscholar.org/graph/v1/author/{author_id}/papers",
-        params={"fields": "citationCount", "limit": 1000}
-    )
-    papers = papers_resp.json().get("data", [])
-    citation_counts = [p.get("citationCount", 0) for p in papers]
-
-    return calculate_h_index(citation_counts)
-```
-
 ### From OpenAlex
 
 ```python

@@ -1,9 +1,9 @@
 ---
 name: search-skills
-description: "
+description: "31 database search skills. Trigger: finding papers, search strategies, querying academic databases. Design: one skill per database/tool with API details, query syntax, and rate limits."
 ---
 
-# Database Search —
+# Database Search — 31 Skills
 
 Select the skill matching the user's need, then `read` its SKILL.md.
 

@@ -33,11 +33,10 @@ Select the skill matching the user's need, then `read` its SKILL.md.
 | [open-semantic-search-guide](./open-semantic-search-guide/SKILL.md) | Self-hosted semantic search and text mining platform |
 | [openaire-api](./openaire-api/SKILL.md) | Search EU-funded research outputs via the OpenAIRE Graph API |
 | [openalex-api](./openalex-api/SKILL.md) | Query the OpenAlex catalog of scholarly works, authors, and institutions |
-| [paper-search-mcp-guide](./paper-search-mcp-guide/SKILL.md) | MCP server for searching papers across arXiv, PubMed, bioRxiv |
 | [plos-open-access-api](./plos-open-access-api/SKILL.md) | Search PLOS open access journals with full-text Solr-powered API |
 | [pubmed-api](./pubmed-api/SKILL.md) | Search biomedical literature and retrieve records via PubMed E-utilities |
 | [scielo-api](./scielo-api/SKILL.md) | Access Latin American and developing world research via SciELO API |
-| [semantic-scholar-api](./semantic-scholar-api/SKILL.md) | Search papers and analyze citation graphs via
+| [semantic-scholar-api](./semantic-scholar-api/SKILL.md) | Search papers and analyze citation graphs via OpenAlex and CrossRef APIs |
 | [share-research-api](./share-research-api/SKILL.md) | Discover open access research outputs via the SHARE notification API |
 | [systematic-search-strategy](./systematic-search-strategy/SKILL.md) | Construct rigorous systematic search strategies for literature reviews |
 | [worldcat-search-api](./worldcat-search-api/SKILL.md) | Search the world's largest library catalog via OCLC WorldCat API |