@wentorai/research-plugins 1.4.0 → 1.4.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.en.md +143 -0
- package/README.md +98 -131
- package/curated/literature/README.md +2 -2
- package/curated/writing/README.md +1 -1
- package/openclaw.plugin.json +1 -1
- package/package.json +1 -1
- package/skills/literature/discovery/SKILL.md +1 -1
- package/skills/literature/discovery/citation-alert-guide/SKILL.md +2 -2
- package/skills/literature/discovery/conference-proceedings-guide/SKILL.md +2 -2
- package/skills/literature/discovery/literature-mapping-guide/SKILL.md +1 -1
- package/skills/literature/discovery/paper-recommendation-guide/SKILL.md +8 -14
- package/skills/literature/discovery/rss-paper-feeds/SKILL.md +20 -14
- package/skills/literature/discovery/semantic-paper-radar/SKILL.md +8 -8
- package/skills/literature/discovery/semantic-scholar-recs-guide/SKILL.md +103 -86
- package/skills/literature/fulltext/open-access-guide/SKILL.md +1 -1
- package/skills/literature/fulltext/open-access-mining-guide/SKILL.md +5 -5
- package/skills/literature/metadata/citation-network-guide/SKILL.md +3 -3
- package/skills/literature/metadata/h-index-guide/SKILL.md +0 -27
- package/skills/literature/search/SKILL.md +1 -1
- package/skills/literature/search/citation-chaining-guide/SKILL.md +42 -32
- package/skills/literature/search/database-comparison-guide/SKILL.md +1 -1
- package/skills/literature/search/semantic-scholar-api/SKILL.md +56 -53
- package/skills/research/automation/paper-to-agent-guide/SKILL.md +1 -1
- package/skills/research/deep-research/in-depth-research-guide/SKILL.md +1 -1
- package/skills/research/deep-research/kosmos-scientist-guide/SKILL.md +3 -3
- package/skills/research/deep-research/llm-scientific-discovery-guide/SKILL.md +1 -1
- package/skills/research/deep-research/local-deep-research-guide/SKILL.md +6 -6
- package/skills/research/deep-research/open-researcher-guide/SKILL.md +3 -3
- package/skills/research/deep-research/tongyi-deep-research-guide/SKILL.md +4 -4
- package/skills/research/methodology/grad-school-guide/SKILL.md +1 -1
- package/skills/research/paper-review/automated-review-guide/SKILL.md +1 -1
- package/skills/tools/diagram/excalidraw-diagram-guide/SKILL.md +1 -1
- package/skills/tools/diagram/mermaid-architect-guide/SKILL.md +1 -1
- package/skills/tools/diagram/plantuml-guide/SKILL.md +1 -1
- package/skills/tools/document/grobid-pdf-parsing/SKILL.md +1 -1
- package/skills/tools/document/paper-parse-guide/SKILL.md +2 -2
- package/skills/tools/knowledge-graph/citation-network-builder/SKILL.md +5 -5
- package/skills/tools/knowledge-graph/knowledge-graph-construction/SKILL.md +1 -1
- package/skills/tools/scraping/academic-web-scraping/SKILL.md +1 -2
- package/skills/tools/scraping/google-scholar-scraper/SKILL.md +7 -7
- package/skills/writing/citation/SKILL.md +1 -1
- package/skills/writing/citation/academic-citation-manager/SKILL.md +20 -17
- package/skills/writing/citation/citation-assistant-skill/SKILL.md +72 -58
- package/skills/writing/citation/onecite-reference-guide/SKILL.md +1 -1
- package/skills/writing/citation/zotero-reference-guide/SKILL.md +1 -1
- package/skills/writing/citation/zotero-scholar-guide/SKILL.md +1 -1
- package/src/tools/arxiv.ts +13 -3
- package/src/tools/biorxiv.ts +21 -5
- package/src/tools/crossref.ts +13 -6
- package/src/tools/datacite.ts +7 -3
- package/src/tools/doaj.ts +3 -2
- package/src/tools/europe-pmc.ts +4 -3
- package/src/tools/hal.ts +6 -4
- package/src/tools/inspire-hep.ts +3 -2
- package/src/tools/openaire.ts +11 -6
- package/src/tools/openalex.ts +17 -2
- package/src/tools/opencitations.ts +9 -0
- package/src/tools/orcid.ts +3 -0
- package/src/tools/osf-preprints.ts +3 -2
- package/src/tools/pubmed.ts +12 -5
- package/src/tools/unpaywall.ts +3 -0
- package/src/tools/util.ts +33 -0
- package/src/tools/zenodo.ts +10 -4
@@ -31,16 +31,16 @@ Key components:
 - **Vector database**: Stores and indexes embeddings for fast similarity search. Options include ChromaDB (local), Qdrant, Pinecone, or Weaviate.
 - **Similarity metric**: Cosine similarity is standard for comparing text embeddings.
 
-### Using
+### Using OpenAlex's Search API
 
-
+OpenAlex indexes 250M+ works and supports search queries across all disciplines:
 
 ```bash
-#
-curl "https://api.
+# Search works via the OpenAlex API
+curl "https://api.openalex.org/works?search=attention+mechanisms+for+graph+neural+networks&per_page=20"
 ```
 
-The search endpoint uses
+The search endpoint uses relevance-ranked matching. Combine with concept filters and citation data for more targeted discovery. For true semantic matching, build a local embedding index (see below).
 
 ### Building a Personal Semantic Index
 
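The curl query added in this hunk can be composed programmatically; a minimal sketch (the helper name `openalex_search_url` is ours, not part of the package):

```python
from urllib.parse import urlencode

def openalex_search_url(query, per_page=20):
    """Compose an OpenAlex /works search URL like the curl example above."""
    params = {"search": query, "per_page": per_page}
    return f"https://api.openalex.org/works?{urlencode(params)}"

url = openalex_search_url("attention mechanisms for graph neural networks")
# urlencode turns spaces into '+', matching the hand-written query string
```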
@@ -84,7 +84,7 @@ This local index lets you search across all papers you have collected using natu
 Use semantic search to expand your awareness beyond your current reading:
 
 1. **Seed**: Take the abstract of your current paper (or a paragraph describing your research question).
-2. **Search**: Run it as a semantic query against a large corpus (
+2. **Search**: Run it as a semantic query against a large corpus (OpenAlex, CrossRef, or your local index).
 3. **Filter**: Remove papers you have already read. Sort by a combination of semantic similarity and recency.
 4. **Cluster**: Group the top 50 results into thematic clusters using k-means or HDBSCAN on their embeddings.
 5. **Explore clusters**: Each cluster represents a related subtopic. Read the most-cited paper in each cluster to understand the connection to your work.
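The filter-and-sort step in this hunk can be sketched without any API dependency; the 0.8/0.2 similarity-recency blend and the helper names below are illustrative assumptions, not part of the package:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_unread(seed_vec, papers, read_ids, year_now=2025):
    """Drop already-read papers, then blend semantic similarity with recency."""
    def score(p):
        recency = max(0.0, 1.0 - (year_now - p["year"]) / 10.0)
        return 0.8 * cosine(seed_vec, p["embedding"]) + 0.2 * recency
    return sorted((p for p in papers if p["id"] not in read_ids), key=score, reverse=True)
```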
@@ -103,7 +103,7 @@ Semantic search excels at finding papers from other fields that address similar
 Set up periodic semantic searches to detect new papers in your area:
 
 1. Define 3-5 "concept vectors" by encoding descriptions of your core research interests.
-2. Weekly, search against newly published papers (last 7 days) from arXiv or
+2. Weekly, search against newly published papers (last 7 days) from arXiv or OpenAlex.
 3. Rank new papers by maximum similarity to any of your concept vectors.
 4. Papers above your similarity threshold enter your reading queue automatically.
 
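The ranking in steps 3-4 of this hunk reduces to a max-over-concept-vectors score; a self-contained sketch (threshold and names are illustrative assumptions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def radar(new_papers, concept_vecs, threshold=0.6):
    """Queue papers whose best similarity to any concept vector clears the threshold."""
    scored = [(max(cosine(p["embedding"], v) for v in concept_vecs), p["title"])
              for p in new_papers]
    return sorted([s for s in scored if s[0] >= threshold], reverse=True)
```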
@@ -137,7 +137,7 @@ Compare your research question against the semantic landscape of existing work.
 
 ## References
 
--
+- OpenAlex API: https://api.openalex.org
 - SPECTER2 model: https://huggingface.co/allenai/specter2
 - ChromaDB: https://www.trychroma.com
 - ResearchGPT: https://github.com/mukulpatnaik/researchgpt
@@ -1,6 +1,6 @@
 ---
 name: semantic-scholar-recs-guide
-description: "
+description: "Paper discovery via recommendation APIs (OpenAlex, CrossRef citation networks)"
 metadata:
   openclaw:
     emoji: "🤖"
@@ -10,70 +10,72 @@ metadata:
     source: "wentor-research-plugins"
 ---
 
-#
+# Paper Discovery via OpenAlex & CrossRef
 
-Leverage the
+Leverage the OpenAlex and CrossRef APIs to discover related papers, traverse citation networks, and build comprehensive reading lists programmatically.
 
 ## Overview
 
-
+OpenAlex indexes over 250 million academic works and provides a free, no-key-required API that supports:
 
--
-- Recommendations based on positive and negative seed papers
+- Work search by title, keyword, or DOI
 - Citation and reference graph traversal
 - Author profiles and publication histories
--
+- Concept-based discovery across disciplines
+- Institutional and venue filtering
 
-Base URL: `https://api.
-
+Base URL: `https://api.openalex.org`
+CrossRef URL: `https://api.crossref.org`
 
-##
+## Finding Related Papers
 
-
+Use OpenAlex's concept graph and citation data to discover related work from seed papers.
 
-###
+### Concept-Based Discovery
 
 ```python
 import requests
 
-
+HEADERS = {"User-Agent": "ResearchPlugins/1.0 (https://wentor.ai)"}
+WORK_ID = "W2741809807"  # OpenAlex work ID
 
+# Get the seed paper's concepts
 response = requests.get(
-    f"https://api.
-
-        "fields": "title,authors,year,citationCount,abstract,externalIds",
-        "limit": 20
-    },
-    headers={"x-api-key": "YOUR_API_KEY"}  # optional, increases rate limit
+    f"https://api.openalex.org/works/{WORK_ID}",
+    headers=HEADERS
 )
-
-for
-
+paper = response.json()
+concepts = [c["id"] for c in paper.get("concepts", [])[:3]]
+
+# Find works sharing the same concepts, sorted by citations
+for concept_id in concepts:
+    related = requests.get(
+        "https://api.openalex.org/works",
+        params={"filter": f"concepts.id:{concept_id}", "sort": "cited_by_count:desc", "per_page": 10},
+        headers=HEADERS
+    )
+    for w in related.json().get("results", []):
+        print(f"[{w.get('publication_year')}] {w.get('title')} (citations: {w.get('cited_by_count')})")
 ```
 
-###
+### CrossRef Subject-Based Discovery
 
 ```python
 import requests
 
-
-    "
-
-    "
-
-
-
-]
-
-
-
-    "
-
-    params={"fields": "title,year,citationCount,url,abstract", "limit": 30}
-)
-
-results = response.json()["recommendedPapers"]
-print(f"Found {len(results)} recommended papers")
+def search_crossref(query, limit=10, sort="is-referenced-by-count"):
+    """Search CrossRef for papers sorted by citation count."""
+    resp = requests.get(
+        "https://api.crossref.org/works",
+        params={"query": query, "rows": limit, "sort": sort, "order": "desc"},
+        headers={"User-Agent": "ResearchPlugins/1.0 (https://wentor.ai; mailto:dev@wentor.ai)"}
+    )
+    return resp.json().get("message", {}).get("items", [])
+
+results = search_crossref("transformer attention mechanism")
+for w in results:
+    title = w.get("title", [""])[0] if w.get("title") else ""
+    print(f"  {title} — Cited by: {w.get('is-referenced-by-count', 0)}")
 ```
 
 ## Citation Network Traversal
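OpenAlex filter expressions like the ones added in this hunk are comma-joined `key:value` clauses; a small helper can build them (the helper name is ours, and the concept ID below is a placeholder, not taken from the package):

```python
def works_filter(**clauses):
    """Join filter clauses the way the OpenAlex /works endpoint expects.

    Double underscores in keyword names stand in for dots (concepts__id -> concepts.id).
    """
    return ",".join(f"{key.replace('__', '.')}:{value}" for key, value in clauses.items())

# e.g. pass as params={"filter": ...} alongside sort and per_page
combined = works_filter(concepts__id="C154945302", from_publication_date="2023-01-01")
```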
@@ -83,48 +85,49 @@ Walk the citation graph to discover foundational and derivative works.
 ### Forward Citations (Who Cited This Paper?)
 
 ```python
-
+work_id = "W2741809807"
 
 response = requests.get(
-
+    "https://api.openalex.org/works",
     params={
-        "
-        "
-        "
-    }
+        "filter": f"cites:{work_id}",
+        "sort": "cited_by_count:desc",
+        "per_page": 20
+    },
+    headers=HEADERS
 )
 
-
-
-citations.sort(key=lambda x: x["citingPaper"]["citationCount"], reverse=True)
-for c in citations[:10]:
-    p = c["citingPaper"]
-    print(f"  [{p['year']}] {p['title']} ({p['citationCount']} cites)")
+for w in response.json().get("results", []):
+    print(f"  [{w.get('publication_year')}] {w.get('title')} ({w.get('cited_by_count')} cites)")
 ```
 
 ### Backward References (What Did This Paper Cite?)
 
 ```python
 response = requests.get(
-    f"https://api.
-
+    f"https://api.openalex.org/works/{work_id}",
+    headers=HEADERS
 )
+paper = response.json()
+ref_ids = paper.get("referenced_works", [])
 
-
-
+# Fetch details for referenced works
+for ref_id in ref_ids[:20]:
+    ref = requests.get(f"https://api.openalex.org/works/{ref_id.split('/')[-1]}", headers=HEADERS).json()
+    print(f"  [{ref.get('publication_year')}] {ref.get('title')} ({ref.get('cited_by_count')} cites)")
 ```
 
 ## Building a Reading List Pipeline
 
-Combine search,
+Combine search, concept discovery, and citation traversal into a discovery pipeline:
 
 | Step | Method | Purpose |
 |------|--------|---------|
 | 1. Seed selection | Manual or keyword search | Identify 3-5 highly relevant papers |
-| 2. Expand via
-| 3. Forward citation |
-| 4. Backward citation |
-| 5. Deduplicate |
+| 2. Expand via concepts | OpenAlex concept graph | Find thematically related work |
+| 3. Forward citation | OpenAlex cites filter | Find recent derivative works |
+| 4. Backward citation | referenced_works field | Find foundational papers |
+| 5. Deduplicate | OpenAlex work ID matching | Remove duplicates across steps |
 | 6. Rank & filter | Sort by year, citations, relevance | Prioritize reading order |
 
 ```python
@@ -133,32 +136,46 @@ def build_reading_list(seed_ids, max_papers=50):
     seen = set()
     candidates = []
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+    for seed_id in seed_ids:
+        # Get concepts from seed paper
+        paper = requests.get(f"https://api.openalex.org/works/{seed_id}", headers=HEADERS).json()
+        concept_ids = [c["id"] for c in paper.get("concepts", [])[:2]]
+
+        # Find related works via concepts
+        for cid in concept_ids:
+            related = requests.get(
+                "https://api.openalex.org/works",
+                params={"filter": f"concepts.id:{cid}", "sort": "cited_by_count:desc", "per_page": 20},
+                headers=HEADERS
+            ).json().get("results", [])
+            for w in related:
+                wid = w.get("id", "").split("/")[-1]
+                if wid not in seen:
+                    seen.add(wid)
+                    candidates.append(w)
+
+        # Get citing works
+        citing = requests.get(
+            "https://api.openalex.org/works",
+            params={"filter": f"cites:{seed_id}", "sort": "cited_by_count:desc", "per_page": 20},
+            headers=HEADERS
+        ).json().get("results", [])
+        for w in citing:
+            wid = w.get("id", "").split("/")[-1]
+            if wid not in seen:
+                seen.add(wid)
+                candidates.append(w)
+
+    # Rank by citation count and recency
+    candidates.sort(key=lambda p: (p.get("publication_year", 0), p.get("cited_by_count", 0)), reverse=True)
     return candidates[:max_papers]
 ```
 
-##
+## Best Practices
 
--
--
-- Always include only the fields you need to reduce payload size
-- Use `
+- OpenAlex is free with no API key required; use a polite `User-Agent` header
+- CrossRef requires a polite pool user agent with contact info for higher rate limits
+- Always include only the fields you need via `select` parameter to reduce payload size
+- Use `page` and `per_page` for pagination on large result sets
 - Cache responses locally to avoid redundant requests
-- Use DOI
+- Use DOI as the universal identifier for cross-system compatibility
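The "cache responses locally" bullet in this hunk can be as simple as a dict keyed by URL; a minimal sketch with a stub fetch function (names are ours, no package code involved):

```python
def cached(fetch):
    """Wrap a fetch function so repeated URLs hit an in-memory cache."""
    store = {}
    def wrapper(url):
        if url not in store:
            store[url] = fetch(url)
        return store[url]
    return wrapper

calls = []

def fake_fetch(url):
    # Stand-in for requests.get(url).json(); records each real "network" call
    calls.append(url)
    return {"id": url}

get = cached(fake_fetch)
get("https://api.openalex.org/works/W2741809807")
get("https://api.openalex.org/works/W2741809807")  # second call served from the cache
```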
@@ -84,7 +84,7 @@ else:
 | SSRN | Preprint server | Social sciences, law, economics | ssrn.com |
 | Zenodo | Repository | All disciplines | zenodo.org |
 | CORE | Aggregator | 300M+ papers from repositories | core.ac.uk |
-|
+| OpenAlex | Search + OA links | Cross-disciplinary | openalex.org |
 | BASE (Bielefeld) | Aggregator | 400M+ documents | base-search.net |
 
 ### Batch OA Lookup
@@ -93,11 +93,11 @@ Unpaywall / OpenAlex:
 - Use: Find OA versions of any DOI
 - Best for: Locating freely available versions of papers
 
-
-- Coverage:
-- Access: Free API,
-- Features:
-- Best for:
+OpenAlex:
+- Coverage: 250M+ works, all disciplines
+- Access: Free API, no key required
+- Features: Concepts, citation counts, author profiles, institution data
+- Best for: Cross-disciplinary metadata and OA discovery
 ```
 
 ## Full-Text Retrieval and Parsing
@@ -49,7 +49,7 @@ Whether you are conducting a systematic literature review, mapping a new researc
 
 | Source | Coverage | API | Cost |
 |--------|----------|-----|------|
-|
+| OpenAlex | 250M+ works, all disciplines | REST API, free | Free (no key required) |
 | OpenAlex | 250M+ works, all disciplines | REST API, free | Free |
 | Crossref | 140M+ DOIs | REST API | Free |
 | Web of Science | Curated, multi-disciplinary | Institutional | Licensed |
@@ -219,7 +219,7 @@ Traditional citations take years to accumulate. Altmetrics capture immediate att
 
 ## Best Practices
 
-- **Combine multiple data sources.** No single database has complete coverage. Merge OpenAlex and
+- **Combine multiple data sources.** No single database has complete coverage. Merge OpenAlex and CrossRef for best results.
 - **Normalize by field and age.** A 2024 paper in biology and a 2024 paper in mathematics have very different citation rate baselines.
 - **Use relative indicators.** Field-Weighted Citation Impact (FWCI) accounts for disciplinary differences.
 - **Do not equate citations with quality.** Retracted papers sometimes have high citation counts. Controversial papers accumulate criticism citations.
@@ -229,7 +229,7 @@ Traditional citations take years to accumulate. Altmetrics capture immediate att
 ## References
 
 - [OpenAlex API](https://docs.openalex.org/) -- Free, open bibliographic data
-- [
+- [CrossRef API](https://api.crossref.org/) -- DOI resolution and metadata
 - [VOSviewer](https://www.vosviewer.com/) -- Bibliometric visualization tool
 - [bibliometrix R package](https://www.bibliometrix.org/) -- Comprehensive bibliometric analysis
 - [Altmetric](https://www.altmetric.com/) -- Alternative impact metrics
@@ -115,33 +115,6 @@ for source in results:
 
 Google Scholar profiles automatically display h-index and i10-index. No calculation needed, but coverage is the broadest (includes non-peer-reviewed sources).
 
-### From Semantic Scholar API
-
-```python
-def get_author_h_index(author_name):
-    """Calculate h-index for an author using Semantic Scholar."""
-    # Search for author
-    search_resp = requests.get(
-        "https://api.semanticscholar.org/graph/v1/author/search",
-        params={"query": author_name, "limit": 1}
-    )
-    authors = search_resp.json().get("data", [])
-    if not authors:
-        return None
-
-    author_id = authors[0]["authorId"]
-
-    # Get all papers with citation counts
-    papers_resp = requests.get(
-        f"https://api.semanticscholar.org/graph/v1/author/{author_id}/papers",
-        params={"fields": "citationCount", "limit": 1000}
-    )
-    papers = papers_resp.json().get("data", [])
-    citation_counts = [p.get("citationCount", 0) for p in papers]
-
-    return calculate_h_index(citation_counts)
-```
-
 ### From OpenAlex
 
 ```python
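The deleted Semantic Scholar helper above called a `calculate_h_index` function; for reference, the standard pure-Python definition of that computation is:

```python
def calculate_h_index(citation_counts):
    """h-index: the largest h such that h papers each have at least h citations."""
    h = 0
    for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h
```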
@@ -36,7 +36,7 @@ Select the skill matching the user's need, then `read` its SKILL.md.
 | [plos-open-access-api](./plos-open-access-api/SKILL.md) | Search PLOS open access journals with full-text Solr-powered API |
 | [pubmed-api](./pubmed-api/SKILL.md) | Search biomedical literature and retrieve records via PubMed E-utilities |
 | [scielo-api](./scielo-api/SKILL.md) | Access Latin American and developing world research via SciELO API |
-| [semantic-scholar-api](./semantic-scholar-api/SKILL.md) | Search papers and analyze citation graphs via
+| [semantic-scholar-api](./semantic-scholar-api/SKILL.md) | Search papers and analyze citation graphs via OpenAlex and CrossRef APIs |
 | [share-research-api](./share-research-api/SKILL.md) | Discover open access research outputs via the SHARE notification API |
 | [systematic-search-strategy](./systematic-search-strategy/SKILL.md) | Construct rigorous systematic search strategies for literature reviews |
 | [worldcat-search-api](./worldcat-search-api/SKILL.md) | Search the world's largest library catalog via OCLC WorldCat API |
@@ -40,24 +40,30 @@ Examine the reference list of each seed paper and identify which cited works are
 ```python
 import requests
 
-
-
-
-
-
-
-
-
-
+HEADERS = {"User-Agent": "ResearchPlugins/1.0 (https://wentor.ai)"}
+
+def get_references(work_id):
+    """Get all references of a paper via OpenAlex."""
+    url = f"https://api.openalex.org/works/{work_id}"
+    response = requests.get(url, headers=HEADERS)
+    paper = response.json()
+    ref_ids = paper.get("referenced_works", [])
+
+    references = []
+    for ref_id in ref_ids:
+        ref = requests.get(f"https://api.openalex.org/works/{ref_id.split('/')[-1]}", headers=HEADERS).json()
+        if ref.get("title"):
+            references.append(ref)
+    return references
 
 # Get references of a seed paper
-
-references = get_references(
+seed_id = "W2741809807"
+references = get_references(seed_id)
 
 # Sort by citation count to find the most influential foundations
-references.sort(key=lambda p: p.get("
+references.sort(key=lambda p: p.get("cited_by_count", 0), reverse=True)
 for ref in references[:15]:
-    print(f"[{ref.get('
+    print(f"[{ref.get('publication_year', '?')}] {ref['title']} ({ref.get('cited_by_count', 0)} citations)")
 ```
 
 ### Step 3: Forward Chaining (Citation Tracking)
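Backward and forward chaining rounds inevitably overlap; merging them by OpenAlex work ID (the tail of the `id` URL) keeps the combined list clean. A minimal sketch (the helper name is ours):

```python
def merge_rounds(*rounds):
    """Union several discovery rounds, deduplicating by OpenAlex work ID."""
    seen, merged = set(), []
    for papers in rounds:
        for p in papers:
            wid = p.get("id", "").split("/")[-1]
            if wid and wid not in seen:
                seen.add(wid)
                merged.append(p)
    return merged
```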
@@ -65,28 +71,32 @@ for ref in references[:15]:
 Find all papers that have cited your seed paper.
 
 ```python
-def get_citations(
-    """Get papers citing a given paper via
-    url = f"https://api.semanticscholar.org/graph/v1/paper/{paper_id}/citations"
+def get_citations(work_id, limit=200):
+    """Get papers citing a given paper via OpenAlex."""
     all_citations = []
-
-    while
-        response = requests.get(
-            "
-
-
-
-
+    page = 1
+    while len(all_citations) < limit:
+        response = requests.get(
+            "https://api.openalex.org/works",
+            params={
+                "filter": f"cites:{work_id}",
+                "sort": "cited_by_count:desc",
+                "per_page": min(200, limit - len(all_citations)),
+                "page": page
+            },
+            headers=HEADERS
+        )
+        results = response.json().get("results", [])
+        if not results:
             break
-        all_citations.extend(
-
+        all_citations.extend(results)
+        page += 1
     return all_citations
 
-citations = get_citations(
+citations = get_citations(seed_id)
 # Filter for recent, well-cited papers
-recent_impactful = [c for c in citations if c.get("
-recent_impactful.sort(key=lambda p: p.get("
+recent_impactful = [c for c in citations if c.get("publication_year", 0) >= 2022 and c.get("cited_by_count", 0) >= 5]
+recent_impactful.sort(key=lambda p: p.get("cited_by_count", 0), reverse=True)
 ```
 
 ### Step 4: Co-Citation and Bibliographic Coupling
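The paging loop added above caps each request at 200 results and shrinks the final request via `min(200, limit - len(all_citations))`; the sequence of request sizes it issues can be computed up front (this helper is ours, for illustration only):

```python
def page_sizes(limit, cap=200):
    """Sizes of the successive per_page requests the paging loop would issue."""
    sizes, remaining = [], limit
    while remaining > 0:
        sizes.append(min(cap, remaining))
        remaining -= sizes[-1]
    return sizes
```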
@@ -134,7 +144,7 @@ Repeat the process with the most relevant papers discovered in each round:
 | Google Scholar "Cited by" | Forward chaining | Free |
 | Web of Science "Cited References" / "Times Cited" | Both directions | Subscription |
 | Scopus "References" / "Cited by" | Both directions | Subscription |
-|
+| OpenAlex API | Programmatic, both directions | Free |
 | Connected Papers (connectedpapers.com) | Visual co-citation graph | Free (limited) |
 | Litmaps (litmaps.com) | Visual citation network | Free tier |
 | CoCites (cocites.com) | Co-citation analysis | Free |
@@ -145,4 +155,4 @@ Repeat the process with the most relevant papers discovered in each round:
 - **Citation bias**: Highly cited papers are not always the best or most relevant. Pay attention to less-cited but methodologically sound papers.
 - **Recency bias**: Forward chaining favors recent papers with fewer citations. Allow time for citation accumulation or use Mendeley readership as a proxy.
 - **Field boundaries**: Citation chains tend to stay within disciplinary silos. Combine with keyword searches in adjacent-field databases to break out.
-- **Incomplete coverage**: No single database indexes all citations. Cross-check with at least two sources (e.g.,
+- **Incomplete coverage**: No single database indexes all citations. Cross-check with at least two sources (e.g., OpenAlex + Google Scholar).
@@ -96,5 +96,5 @@ A robust literature search should query multiple databases to maximize recall:
 
 - **Scopus vs. Web of Science**: Scopus has broader coverage (especially post-2000 and non-English journals); WoS has deeper historical archives and the Journal Impact Factor.
 - **Google Scholar** finds the most results but lacks advanced filtering. Use it for snowball searches and finding grey literature, not as your primary systematic search tool.
-- **API access**: PubMed (E-utilities),
+- **API access**: PubMed (E-utilities), OpenAlex, and Crossref all offer free APIs for programmatic searching. Scopus and WoS require institutional API keys.
 - **Alert services**: Set up saved search alerts on PubMed, Scopus, and Google Scholar to stay current in fast-moving fields.