@wentorai/research-plugins 1.4.0 → 1.4.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (53)
  1. package/curated/literature/README.md +2 -2
  2. package/curated/writing/README.md +1 -1
  3. package/package.json +1 -1
  4. package/skills/literature/discovery/SKILL.md +1 -1
  5. package/skills/literature/discovery/citation-alert-guide/SKILL.md +2 -2
  6. package/skills/literature/discovery/conference-proceedings-guide/SKILL.md +2 -2
  7. package/skills/literature/discovery/literature-mapping-guide/SKILL.md +1 -1
  8. package/skills/literature/discovery/paper-recommendation-guide/SKILL.md +8 -14
  9. package/skills/literature/discovery/rss-paper-feeds/SKILL.md +20 -14
  10. package/skills/literature/discovery/semantic-paper-radar/SKILL.md +8 -8
  11. package/skills/literature/discovery/semantic-scholar-recs-guide/SKILL.md +103 -86
  12. package/skills/literature/fulltext/open-access-guide/SKILL.md +1 -1
  13. package/skills/literature/fulltext/open-access-mining-guide/SKILL.md +5 -5
  14. package/skills/literature/metadata/citation-network-guide/SKILL.md +3 -3
  15. package/skills/literature/metadata/h-index-guide/SKILL.md +0 -27
  16. package/skills/literature/search/SKILL.md +1 -1
  17. package/skills/literature/search/citation-chaining-guide/SKILL.md +42 -32
  18. package/skills/literature/search/database-comparison-guide/SKILL.md +1 -1
  19. package/skills/literature/search/semantic-scholar-api/SKILL.md +56 -53
  20. package/skills/research/automation/paper-to-agent-guide/SKILL.md +1 -1
  21. package/skills/research/deep-research/in-depth-research-guide/SKILL.md +1 -1
  22. package/skills/research/deep-research/kosmos-scientist-guide/SKILL.md +3 -3
  23. package/skills/research/deep-research/llm-scientific-discovery-guide/SKILL.md +1 -1
  24. package/skills/research/deep-research/local-deep-research-guide/SKILL.md +6 -6
  25. package/skills/research/deep-research/open-researcher-guide/SKILL.md +3 -3
  26. package/skills/research/deep-research/tongyi-deep-research-guide/SKILL.md +4 -4
  27. package/skills/research/methodology/grad-school-guide/SKILL.md +1 -1
  28. package/skills/research/paper-review/automated-review-guide/SKILL.md +1 -1
  29. package/skills/tools/diagram/excalidraw-diagram-guide/SKILL.md +1 -1
  30. package/skills/tools/diagram/mermaid-architect-guide/SKILL.md +1 -1
  31. package/skills/tools/diagram/plantuml-guide/SKILL.md +1 -1
  32. package/skills/tools/document/grobid-pdf-parsing/SKILL.md +1 -1
  33. package/skills/tools/document/paper-parse-guide/SKILL.md +2 -2
  34. package/skills/tools/knowledge-graph/citation-network-builder/SKILL.md +5 -5
  35. package/skills/tools/knowledge-graph/knowledge-graph-construction/SKILL.md +1 -1
  36. package/skills/tools/scraping/academic-web-scraping/SKILL.md +1 -2
  37. package/skills/tools/scraping/google-scholar-scraper/SKILL.md +7 -7
  38. package/skills/writing/citation/SKILL.md +1 -1
  39. package/skills/writing/citation/academic-citation-manager/SKILL.md +20 -17
  40. package/skills/writing/citation/citation-assistant-skill/SKILL.md +72 -58
  41. package/skills/writing/citation/onecite-reference-guide/SKILL.md +1 -1
  42. package/skills/writing/citation/zotero-reference-guide/SKILL.md +1 -1
  43. package/skills/writing/citation/zotero-scholar-guide/SKILL.md +1 -1
  44. package/src/tools/arxiv.ts +3 -0
  45. package/src/tools/biorxiv.ts +19 -3
  46. package/src/tools/crossref.ts +3 -0
  47. package/src/tools/datacite.ts +3 -0
  48. package/src/tools/openalex.ts +6 -0
  49. package/src/tools/opencitations.ts +9 -0
  50. package/src/tools/orcid.ts +3 -0
  51. package/src/tools/pubmed.ts +3 -0
  52. package/src/tools/unpaywall.ts +3 -0
  53. package/src/tools/zenodo.ts +3 -0
@@ -40,24 +40,30 @@ Examine the reference list of each seed paper and identify which cited works are
  ```python
  import requests

- def get_references(paper_id, limit=100):
-     """Get all references of a paper via Semantic Scholar."""
-     url = f"https://api.semanticscholar.org/graph/v1/paper/{paper_id}/references"
-     response = requests.get(url, params={
-         "fields": "title,year,citationCount,externalIds,abstract",
-         "limit": limit
-     })
-     refs = response.json().get("data", [])
-     return [r["citedPaper"] for r in refs if r["citedPaper"].get("title")]
+ HEADERS = {"User-Agent": "ResearchPlugins/1.0 (https://wentor.ai)"}
+
+ def get_references(work_id):
+     """Get all references of a paper via OpenAlex."""
+     url = f"https://api.openalex.org/works/{work_id}"
+     response = requests.get(url, headers=HEADERS)
+     paper = response.json()
+     ref_ids = paper.get("referenced_works", [])
+
+     references = []
+     for ref_id in ref_ids:
+         ref = requests.get(f"https://api.openalex.org/works/{ref_id.split('/')[-1]}", headers=HEADERS).json()
+         if ref.get("title"):
+             references.append(ref)
+     return references

  # Get references of a seed paper
- seed_doi = "DOI:10.1038/s41586-021-03819-2"
- references = get_references(seed_doi)
+ seed_id = "W2741809807"
+ references = get_references(seed_id)

  # Sort by citation count to find the most influential foundations
- references.sort(key=lambda p: p.get("citationCount", 0), reverse=True)
+ references.sort(key=lambda p: p.get("cited_by_count", 0), reverse=True)

  for ref in references[:15]:
-     print(f"[{ref.get('year', '?')}] {ref['title']} ({ref.get('citationCount', 0)} citations)")
+     print(f"[{ref.get('publication_year', '?')}] {ref['title']} ({ref.get('cited_by_count', 0)} citations)")
  ```

  ### Step 3: Forward Chaining (Citation Tracking)
@@ -65,28 +71,32 @@ for ref in references[:15]:
  Find all papers that have cited your seed paper.

  ```python
- def get_citations(paper_id, limit=200):
-     """Get papers citing a given paper via Semantic Scholar."""
-     url = f"https://api.semanticscholar.org/graph/v1/paper/{paper_id}/citations"
+ def get_citations(work_id, limit=200):
+     """Get papers citing a given paper via OpenAlex."""
      all_citations = []
-     offset = 0
-     while offset < limit:
-         response = requests.get(url, params={
-             "fields": "title,year,citationCount,externalIds,abstract",
-             "limit": min(100, limit - offset),
-             "offset": offset
-         })
-         data = response.json().get("data", [])
-         if not data:
+     page = 1
+     while len(all_citations) < limit:
+         response = requests.get(
+             "https://api.openalex.org/works",
+             params={
+                 "filter": f"cites:{work_id}",
+                 "sort": "cited_by_count:desc",
+                 "per_page": min(200, limit - len(all_citations)),
+                 "page": page
+             },
+             headers=HEADERS
+         )
+         results = response.json().get("results", [])
+         if not results:
              break
-         all_citations.extend([c["citingPaper"] for c in data if c["citingPaper"].get("title")])
-         offset += len(data)
+         all_citations.extend(results)
+         page += 1
      return all_citations

- citations = get_citations(seed_doi)
+ citations = get_citations(seed_id)
  # Filter for recent, well-cited papers
- recent_impactful = [c for c in citations if c.get("year", 0) >= 2022 and c.get("citationCount", 0) >= 5]
- recent_impactful.sort(key=lambda p: p.get("citationCount", 0), reverse=True)
+ recent_impactful = [c for c in citations if c.get("publication_year", 0) >= 2022 and c.get("cited_by_count", 0) >= 5]
+ recent_impactful.sort(key=lambda p: p.get("cited_by_count", 0), reverse=True)
  ```

  ### Step 4: Co-Citation and Bibliographic Coupling
@@ -134,7 +144,7 @@ Repeat the process with the most relevant papers discovered in each round:
  | Google Scholar "Cited by" | Forward chaining | Free |
  | Web of Science "Cited References" / "Times Cited" | Both directions | Subscription |
  | Scopus "References" / "Cited by" | Both directions | Subscription |
- | Semantic Scholar API | Programmatic, both directions | Free |
+ | OpenAlex API | Programmatic, both directions | Free |
  | Connected Papers (connectedpapers.com) | Visual co-citation graph | Free (limited) |
  | Litmaps (litmaps.com) | Visual citation network | Free tier |
  | CoCites (cocites.com) | Co-citation analysis | Free |
@@ -145,4 +155,4 @@ Repeat the process with the most relevant papers discovered in each round:
  - **Citation bias**: Highly cited papers are not always the best or most relevant. Pay attention to less-cited but methodologically sound papers.
  - **Recency bias**: Forward chaining favors recent papers with fewer citations. Allow time for citation accumulation or use Mendeley readership as a proxy.
  - **Field boundaries**: Citation chains tend to stay within disciplinary silos. Combine with keyword searches in adjacent-field databases to break out.
- - **Incomplete coverage**: No single database indexes all citations. Cross-check with at least two sources (e.g., Semantic Scholar + Google Scholar).
+ - **Incomplete coverage**: No single database indexes all citations. Cross-check with at least two sources (e.g., OpenAlex + Google Scholar).
@@ -96,5 +96,5 @@ A robust literature search should query multiple databases to maximize recall:

  - **Scopus vs. Web of Science**: Scopus has broader coverage (especially post-2000 and non-English journals); WoS has deeper historical archives and the Journal Impact Factor.
  - **Google Scholar** finds the most results but lacks advanced filtering. Use it for snowball searches and finding grey literature, not as your primary systematic search tool.
- - **API access**: PubMed (E-utilities), Semantic Scholar, OpenAlex, and Crossref all offer free APIs for programmatic searching. Scopus and WoS require institutional API keys.
+ - **API access**: PubMed (E-utilities), OpenAlex, and Crossref all offer free APIs for programmatic searching. Scopus and WoS require institutional API keys.
  - **Alert services**: Set up saved search alerts on PubMed, Scopus, and Google Scholar to stay current in fast-moving fields.
@@ -1,134 +1,137 @@
  ---
  name: semantic-scholar-api
- description: "Search papers and analyze citation graphs via Semantic Scholar"
+ description: "Search papers and analyze citation graphs via OpenAlex and CrossRef APIs"
  metadata:
    openclaw:
      emoji: "🔍"
      category: "literature"
      subcategory: "search"
      keywords: ["academic database search", "semantic search", "AI-powered literature search", "citation analysis", "citation network"]
-     source: "https://api.semanticscholar.org/"
+     source: "https://api.openalex.org/"
  ---

- # Semantic Scholar API Guide
+ # OpenAlex & CrossRef API Guide

  ## Overview

- Semantic Scholar is a free, AI-powered research tool created by the Allen Institute for AI (AI2) that indexes over 200 million academic papers across all fields of science. Unlike traditional keyword-based search engines, Semantic Scholar uses natural language processing and machine learning to understand paper content, identify influential citations, and surface the most relevant results.
+ OpenAlex is a free, open catalog of the global research system, indexing over 250 million academic works across all fields of science. It provides structured access to papers, authors, institutions, concepts, and citation networks. OpenAlex is the successor to Microsoft Academic Graph and is maintained by OurResearch (the team behind Unpaywall).

- The Semantic Scholar Academic Graph API provides structured access to papers, authors, citations, and references. It distinguishes between influential and non-influential citations using a trained classifier, helping researchers quickly identify the most impactful works in any field. The API also provides TLDR summaries generated by AI for many papers.
+ CrossRef is the official DOI registration agency for scholarly content, providing metadata for over 150 million DOIs across all publishers and disciplines. Together, OpenAlex and CrossRef provide comprehensive coverage for academic search, citation analysis, and bibliometric research.

- The API can be used without authentication for basic access. Registering for a free API key unlocks higher rate limits and is recommended for production applications. The API returns clean JSON responses and supports field selection to minimize response payload size.
+ Both APIs are free to use without authentication. OpenAlex requests a polite `User-Agent` header; CrossRef requests a `User-Agent` with contact email for access to the polite pool (faster rate limits).

  ## Authentication

- No authentication is required for basic usage. For higher rate limits, request a free API key at https://www.semanticscholar.org/product/api and include it as a header:
+ No authentication is required for either API.

+ OpenAlex: Include a `User-Agent` header for polite access:
  ```
- x-api-key: YOUR_API_KEY
+ User-Agent: ResearchPlugins/1.0 (https://wentor.ai)
  ```

- Without an API key, rate limits are 5,000 requests per 5 minutes. With a key, limits are significantly higher (up to 1 request per second sustained).
+ CrossRef: Include a `User-Agent` header with contact email for polite pool:
+ ```
+ User-Agent: ResearchPlugins/1.0 (https://wentor.ai; mailto:dev@wentor.ai)
+ ```

  ## Core Endpoints

- ### Paper Search: Find Papers by Query
+ ### OpenAlex: Search Works

- - **URL**: `GET https://api.semanticscholar.org/graph/v1/paper/search`
+ - **URL**: `GET https://api.openalex.org/works`
  - **Parameters**:
  | Param | Type | Required | Description |
  |-------|------|----------|-------------|
- | query | string | Yes | Search query string |
- | offset | integer | No | Pagination offset (default: 0) |
- | limit | integer | No | Results per page (default: 10, max: 100) |
- | fields | string | No | Comma-separated fields to return (e.g., title,abstract,year,citationCount) |
- | year | string | No | Year range filter (e.g., 2020-2024 or 2024-) |
- | fieldsOfStudy | string | No | Filter by field (e.g., Computer Science, Medicine) |
+ | search | string | No | Full-text search query |
+ | filter | string | No | Filter expression (e.g., `from_publication_date:2024-01-01`) |
+ | sort | string | No | Sort field (e.g., `cited_by_count:desc`, `publication_date:desc`) |
+ | per_page | integer | No | Results per page (default: 25, max: 200) |
+ | page | integer | No | Page number (default: 1) |
  - **Example**:
  ```bash
- curl "https://api.semanticscholar.org/graph/v1/paper/search?query=attention+is+all+you+need&limit=5&fields=title,year,citationCount,authors,tldr"
+ curl "https://api.openalex.org/works?search=attention+is+all+you+need&per_page=5"
  ```
- - **Response**: JSON with `total`, `offset`, and `data` array containing paper objects with requested fields.
+ - **Response**: JSON with `meta` (count, page info) and `results` array containing work objects.

- ### Paper Details: Retrieve Full Paper Metadata
+ ### OpenAlex: Get Work Details

- - **URL**: `GET https://api.semanticscholar.org/graph/v1/paper/{paper_id}`
+ - **URL**: `GET https://api.openalex.org/works/{id}`
  - **Parameters**:
  | Param | Type | Required | Description |
  |-------|------|----------|-------------|
- | paper_id | string | Yes | Semantic Scholar ID, DOI, ArXiv ID, or other identifier (e.g., DOI:10.1234/...) |
- | fields | string | No | Comma-separated fields to return |
+ | id | string | Yes | OpenAlex ID (e.g., `W2741809807`), DOI URL, or other identifier |
  - **Example**:
  ```bash
- curl "https://api.semanticscholar.org/graph/v1/paper/DOI:10.18653/v1/N19-1423?fields=title,abstract,year,citationCount,influentialCitationCount,references,citations"
+ curl "https://api.openalex.org/works/W2741809807"
  ```
- - **Response**: JSON with full paper metadata including `paperId`, `title`, `abstract`, `year`, `citationCount`, `influentialCitationCount`, `references`, and `citations`.
+ - **Response**: JSON with full work metadata including `id`, `title`, `abstract_inverted_index`, `publication_year`, `cited_by_count`, `authorships`, `concepts`, `referenced_works`.

- ### Author Search: Find Researchers
+ ### OpenAlex: Search Authors

- - **URL**: `GET https://api.semanticscholar.org/graph/v1/author/search`
+ - **URL**: `GET https://api.openalex.org/authors`
  - **Parameters**:
  | Param | Type | Required | Description |
  |-------|------|----------|-------------|
- | query | string | Yes | Author name query |
- | offset | integer | No | Pagination offset |
- | limit | integer | No | Results per page (max: 1000) |
- | fields | string | No | Fields to return (e.g., name,paperCount,citationCount,hIndex) |
+ | search | string | No | Author name search |
+ | filter | string | No | Filter expression |
+ | per_page | integer | No | Results per page (max: 200) |
  - **Example**:
  ```bash
- curl "https://api.semanticscholar.org/graph/v1/author/search?query=Yoshua+Bengio&fields=name,paperCount,citationCount,hIndex"
+ curl "https://api.openalex.org/authors?search=Yoshua+Bengio&per_page=5"
  ```
- - **Response**: JSON with author profiles including publication and citation metrics.
+ - **Response**: JSON with author profiles including `works_count`, `cited_by_count`, `summary_stats.h_index`, affiliations.

- ### Dataset Releases: Bulk Data Access
+ ### CrossRef: Resolve DOI

- - **URL**: `GET https://api.semanticscholar.org/datasets/v1/release`
+ - **URL**: `GET https://api.crossref.org/works/{doi}`
  - **Parameters**:
  | Param | Type | Required | Description |
  |-------|------|----------|-------------|
- | (none) | - | - | Returns list of available dataset releases |
+ | doi | string | Yes | DOI to resolve (e.g., `10.1038/nature12373`) |
  - **Example**:
  ```bash
- curl "https://api.semanticscholar.org/datasets/v1/release"
+ curl "https://api.crossref.org/works/10.18653/v1/N19-1423"
  ```
- - **Response**: JSON array of release identifiers (dates) for bulk dataset downloads.
+ - **Response**: JSON with full bibliographic metadata including title, authors, journal, dates, references count, and citation count.

  ## Rate Limits

- Without API key: 5,000 requests per 5 minutes (approximately 16.7 requests per second in bursts). With API key: higher sustained throughput, varies by key tier. The API returns HTTP 429 when limits are exceeded. Use the `Retry-After` header value to determine wait time before retrying. Batch endpoints are available for retrieving multiple papers or authors in a single request, which is more efficient than individual lookups.
+ OpenAlex: No strict rate limit, but use polite `User-Agent` header. Recommended: max 10 requests per second. The API returns HTTP 429 when limits are exceeded.
+
+ CrossRef: Without polite pool: ~50 requests per second. With polite pool (contact email in User-Agent): higher limits. The API returns HTTP 429 when limits are exceeded.

  ## Common Patterns

  ### Build a Citation Network

- Retrieve a paper and its citation tree to map influence:
+ Retrieve a paper and find all works that cite it:

  ```bash
- # Get paper with its references and citations
- curl "https://api.semanticscholar.org/graph/v1/paper/CorpusID:49313245?fields=title,citations.title,citations.citationCount,references.title,references.citationCount"
+ # Get paper details
+ curl "https://api.openalex.org/works/W2741809807"
+
+ # Get works citing this paper, sorted by citation count
+ curl "https://api.openalex.org/works?filter=cites:W2741809807&sort=cited_by_count:desc&per_page=20"
  ```

  ### Find Influential Papers on a Topic

- Search for highly cited and influential works:
+ Search for highly cited works on a topic:

  ```bash
- curl "https://api.semanticscholar.org/graph/v1/paper/search?query=graph+neural+networks&fields=title,year,citationCount,influentialCitationCount&limit=20"
+ curl "https://api.openalex.org/works?search=graph+neural+networks&sort=cited_by_count:desc&per_page=20"
  ```

- ### Batch Paper Lookup
+ ### Batch Paper Lookup via CrossRef

- Retrieve metadata for multiple papers in a single request using the batch endpoint:
+ Search CrossRef for papers matching a query, sorted by citation count:

  ```bash
- curl -X POST "https://api.semanticscholar.org/graph/v1/paper/batch" \
-     -H "Content-Type: application/json" \
-     -d '{"ids": ["DOI:10.1038/s41586-021-03819-2", "CorpusID:49313245"]}' \
-     --url-query "fields=title,year,citationCount"
+ curl "https://api.crossref.org/works?query=graph+neural+networks&sort=is-referenced-by-count&order=desc&rows=20"
  ```

  ## References

- - Official documentation: https://api.semanticscholar.org/
- - API tutorial: https://www.semanticscholar.org/product/api/tutorial
- - Semantic Scholar paper: https://arxiv.org/abs/2301.10140
+ - OpenAlex documentation: https://docs.openalex.org/
+ - CrossRef API documentation: https://api.crossref.org/swagger-ui/index.html
+ - OpenAlex source: https://github.com/ourresearch/openalex-guts
@@ -83,7 +83,7 @@ The skill supports building knowledge graphs from processed papers:

  - Extract entities (methods, datasets, metrics, tools, concepts)
  - Map relationships between entities (uses, extends, contradicts, supports)
- - Link to external knowledge bases (Semantic Scholar, OpenAlex, DOI)
+ - Link to external knowledge bases (OpenAlex, CrossRef, DOI)
  - Track citation chains for key claims
  - Identify research lineages and methodological evolution

@@ -52,7 +52,7 @@ Search systematically across source tiers:

  | Tier | Source Type | Examples | Purpose |
  |------|-----------|---------|---------|
- | **1** | Academic databases | Semantic Scholar, PubMed, Scopus, Web of Science | Peer-reviewed primary research |
+ | **1** | Academic databases | OpenAlex, PubMed, Scopus, Web of Science | Peer-reviewed primary research |
  | **2** | Preprint servers | arXiv, bioRxiv, SSRN, medRxiv | Cutting-edge, not yet reviewed |
  | **3** | Grey literature | WHO reports, World Bank, NBER working papers | Policy and institutional knowledge |
  | **4** | Patents and standards | Google Patents, USPTO, IEEE standards | Technical implementations |
@@ -48,7 +48,7 @@ You are an AI Scientist conducting rigorous research.
  Follow the scientific method strictly:

  1. **Literature Review**: Search for related work before
-    proposing anything new. Use Semantic Scholar API.
+    proposing anything new. Use OpenAlex API.
  2. **Hypothesis**: State falsifiable hypotheses clearly.
  3. **Experiment Design**: Define independent/dependent
     variables, controls, evaluation metrics.
@@ -62,7 +62,7 @@ Follow the scientific method strictly:
  ## Tools Available
  - Python 3.11+ with PyTorch, NumPy, SciPy
  - LaTeX (pdflatex + bibtex)
- - Semantic Scholar API for literature
+ - OpenAlex API for literature
  - W&B for experiment tracking (optional)
  ```

@@ -153,7 +153,7 @@ Analyze results and write paper:
  - Method (formal description)
  - Experiments (setup + results + analysis)
  - Conclusion (summary + limitations + future)
- 5. Verify all citations are real (Semantic Scholar)
+ 5. Verify all citations are real (OpenAlex/CrossRef)
  """
  ```

@@ -62,7 +62,7 @@ from scientific_agent import HypothesisGenerator

  generator = HypothesisGenerator(
      llm_provider="anthropic",
-     knowledge_sources=["pubmed", "semantic_scholar"],
+     knowledge_sources=["pubmed", "openalex"],
  )

  hypotheses = generator.generate(
@@ -16,7 +16,7 @@ metadata:

  Local Deep Research is an open-source deep research tool with over 4,000 GitHub stars that conducts comprehensive multi-source research using either local LLMs (via Ollama, LM Studio, or vLLM) or cloud-based models. It searches across 10+ academic and web sources simultaneously, synthesizes the findings, and produces well-cited research reports. The project is designed for researchers who need thorough, multi-perspective research coverage while maintaining the option to keep everything running locally for privacy.

- What makes Local Deep Research stand out is its breadth of search integration. Rather than relying on a single search API, it queries multiple sources in parallel -- including Google Scholar, Semantic Scholar, arXiv, PubMed, Wikipedia, web search engines, and more -- then cross-references and synthesizes the results. This multi-source approach produces more comprehensive and balanced research outputs compared to single-source tools.
+ What makes Local Deep Research stand out is its breadth of search integration. Rather than relying on a single search API, it queries multiple sources in parallel -- including Google Scholar, OpenAlex, arXiv, PubMed, Wikipedia, web search engines, and more -- then cross-references and synthesizes the results. This multi-source approach produces more comprehensive and balanced research outputs compared to single-source tools.

  The tool is particularly well-suited for academic researchers who need to conduct preliminary literature reviews, verify claims across multiple databases, or explore interdisciplinary topics where relevant work may be scattered across different platforms and publication venues.

@@ -94,7 +94,7 @@ from local_deep_research import DeepResearcher
  researcher = DeepResearcher(
      llm_provider="ollama",
      llm_model="llama3.1:70b",
-     search_sources=["google_scholar", "semantic_scholar",
+     search_sources=["google_scholar", "openalex",
                      "arxiv", "web"],
      max_iterations=10,
  )
@@ -114,7 +114,7 @@ Local Deep Research queries multiple sources in parallel for each research sub-q
  | Source | Type | API Key Required | Best For |
  |--------|------|-----------------|----------|
  | Google Scholar | Academic | No (via scraping) | Broad academic search |
- | Semantic Scholar | Academic | Optional | CS/AI papers, citation data |
+ | OpenAlex | Academic | No | Cross-disciplinary, citation data |
  | arXiv | Academic | No | Preprints, ML/physics/math |
  | PubMed | Academic | No | Biomedical literature |
  | Wikipedia | Encyclopedia | No | Background and definitions |
@@ -128,12 +128,12 @@ Local Deep Research queries multiple sources in parallel for each research sub-q
  # Customize source priorities for your research domain
  researcher = DeepResearcher(
      search_sources={
-         "primary": ["semantic_scholar", "arxiv"],
+         "primary": ["openalex", "arxiv"],
          "secondary": ["google_scholar", "web"],
          "reference": ["wikipedia", "crossref"],
      },
      source_weights={
-         "semantic_scholar": 1.5,  # Prioritize academic sources
+         "openalex": 1.5,  # Prioritize academic sources
          "arxiv": 1.5,
          "web": 0.8,
      },
@@ -249,5 +249,5 @@ local-deep-research "Your sensitive research query here"
  - Repository: https://github.com/LearningCircuit/local-deep-research
  - Ollama: https://ollama.com/
  - SearXNG: https://github.com/searxng/searxng
- - Semantic Scholar API: https://api.semanticscholar.org/
+ - OpenAlex API: https://api.openalex.org/
  - arXiv API: https://info.arxiv.org/help/api/
@@ -43,14 +43,14 @@ result = researcher.research(

  ```python
  # Each sub-question triggers:
- # - Academic search (Semantic Scholar, arXiv)
+ # - Academic search (OpenAlex, arXiv)
  # - Paper reading (abstract + key sections)
  # - Evidence extraction
  # - Follow-up question generation

  # Configuration
  researcher = OpenResearcher(
-     search_backends=["semantic_scholar", "arxiv"],
+     search_backends=["openalex", "arxiv"],
      max_iterations=5,  # Research rounds per sub-question
      papers_per_iteration=10,  # Papers to read per round
      follow_up_questions=True,  # Generate follow-up questions
@@ -96,7 +96,7 @@ researcher = OpenResearcher(
      llm_provider="anthropic",
      model="claude-sonnet-4-20250514",
      search_config={
-         "backends": ["semantic_scholar", "arxiv"],
+         "backends": ["openalex", "arxiv"],
          "max_results_per_query": 20,
      },
      reading_config={
@@ -119,12 +119,12 @@ DeepResearch integrates with multiple search providers to cast a wide net:
  - **Tavily**: AI-optimized search API designed for research agents
  - **Serper**: Fast Google search results API
  - **SearXNG**: Self-hosted meta-search engine for privacy-focused deployments
- - **Semantic Scholar API**: Direct academic paper search (no API key required for basic access)
+ - **OpenAlex API**: Direct academic paper search (free, no API key required)

  ```python
  # Configure multiple search backends for comprehensive coverage
  agent = DeepResearch(
-     search_engines=["bing", "semantic_scholar"],
+     search_engines=["bing", "openalex"],
      search_strategy="parallel",  # Search all engines simultaneously
  )
  ```
@@ -151,7 +151,7 @@ Create research profiles optimized for specific academic domains:
  # Biomedical research profile
  bio_config = {
      "preferred_sources": ["pubmed", "biorxiv", "nature", "science"],
-     "search_engines": ["semantic_scholar", "bing"],
+     "search_engines": ["openalex", "bing"],
      "terminology_mode": "technical",
      "citation_format": "apa",
  }
@@ -214,4 +214,4 @@ The trace includes all search queries, retrieved documents, LLM prompts and resp
  - Repository: https://github.com/Alibaba-NLP/DeepResearch
  - Qwen model family: https://github.com/QwenLM/Qwen
  - Alibaba NLP group: https://github.com/Alibaba-NLP
- - Semantic Scholar API: https://api.semanticscholar.org/
+ - OpenAlex API: https://api.openalex.org/
@@ -30,7 +30,7 @@ A strong research question is the foundation of any good paper. It should be spe
  |-----------|-------------|---------------|
  | **F**easible | Can be answered with available resources | Do you have the data, compute, and time? |
  | **I**nteresting | Engages the research community | Would peers read this at a top venue? |
- | **N**ovel | Not already answered | Has Semantic Scholar search been done? |
+ | **N**ovel | Not already answered | Has OpenAlex/CrossRef search been done? |
  | **E**thical | Follows research ethics standards | Does it require IRB approval? |
  | **R**elevant | Advances the field meaningfully | Does it connect to open problems? |

@@ -274,7 +274,7 @@ Plagiarism and integrity:

  Reference management:
  - scite.ai: smart citation analysis (supporting/contrasting)
- - Semantic Scholar: related work discovery
+ - OpenAlex: related work discovery
  - Connected Papers: citation graph visualization
  ```
 
@@ -86,7 +86,7 @@ For software or experimental system diagrams, use grouped rectangles with labele
  Input: "Draw a system architecture with three layers:
  Frontend (React dashboard),
  Backend (FastAPI + PostgreSQL),
- External (Semantic Scholar API, CrossRef API)"
+ External (OpenAlex API, CrossRef API)"
  ```

  The output places each layer as a dashed-border container with internal component boxes and inter-layer arrows.
@@ -56,7 +56,7 @@ C4Context

  System(platform, "Wentor Platform", "AI-powered research assistant ecosystem")

- System_Ext(scholar, "Semantic Scholar", "Academic paper database")
+ System_Ext(scholar, "OpenAlex", "Academic paper database")
  System_Ext(crossref, "CrossRef", "DOI resolution and metadata")
  System_Ext(github, "GitHub", "Code and skill repositories")
 
@@ -225,7 +225,7 @@ package "Data Layer" {
  }

  package "External APIs" {
- [Semantic Scholar] as S2
+ [Unpaywall] as UP
  [CrossRef] as CR
  [OpenAlex] as OA
  }
@@ -16,7 +16,7 @@ metadata:

  Academic PDFs are the primary format for distributing research, yet extracting structured data from them remains challenging. PDFs encode visual layout, not semantic structure -- headings, paragraphs, equations, tables, and citations are all just positioned text and graphics. GROBID (GeneRation Of BIbliographic Data) is the leading open-source tool for parsing academic PDFs into structured XML/TEI format, extracting metadata, body text, references, and figures with high accuracy.

- GROBID is used by major academic platforms including Semantic Scholar, CORE, and ResearchGate for large-scale document processing. It combines machine learning models (CRF and deep learning) with heuristic rules to handle the diverse formatting of academic papers across publishers and disciplines.
+ GROBID is used by major academic platforms including CORE, ResearchGate, and others for large-scale document processing. It combines machine learning models (CRF and deep learning) with heuristic rules to handle the diverse formatting of academic papers across publishers and disciplines.

  This guide covers installing and running GROBID, using its REST API for batch processing, extracting specific elements (metadata, references, body sections), and integrating GROBID output into downstream workflows such as knowledge bases, systematic reviews, and literature analysis pipelines.
 
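GROBID's batch processing mentioned above goes through its REST API; here is a sketch of one full-text parsing call, assuming a local server on GROBID's default port 8070 and an illustrative PDF path (`requests` is a third-party dependency):

```python
import requests

GROBID_URL = "http://localhost:8070/api/processFulltextDocument"

def pdf_to_tei(pdf_path: str) -> str:
    """Send one PDF to a local GROBID server; returns TEI XML
    containing header metadata, body text, and parsed references."""
    with open(pdf_path, "rb") as f:
        resp = requests.post(GROBID_URL, files={"input": f}, timeout=120)
    resp.raise_for_status()
    return resp.text

# tei = pdf_to_tei("paper.pdf")  # requires a running GROBID instance
```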
@@ -32,7 +32,7 @@ Both modes begin by parsing the paper's structure from its PDF or HTML source, e
  | DOI | Resolve via CrossRef/Unpaywall | Auto-fetches open access version |
  | arXiv ID | `https://arxiv.org/pdf/{id}` | Always available |
  | URL | Direct download | May require institutional access |
- | Semantic Scholar ID | S2 API + PDF link | Includes metadata |
+ | OpenAlex ID | OpenAlex API + OA link | Includes metadata |

  ### PDF Parsing Pipeline
 
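The resolution rules in the table above can be sketched as a small dispatcher (the patterns follow the table; the metadata-lookup URLs are simplified and the example IDs are illustrative):

```python
def resolve(ref: str) -> str:
    """Map a paper reference to the fetch route from the table above."""
    if ref.startswith("10."):                      # bare DOI
        return f"https://api.crossref.org/works/{ref}"
    if ref.lower().startswith("arxiv:"):           # arXiv ID
        return f"https://arxiv.org/pdf/{ref.split(':', 1)[1]}"
    if ref.startswith(("http://", "https://")):    # direct URL
        return ref
    if ref.startswith("W") and ref[1:].isdigit():  # OpenAlex work ID
        return f"https://api.openalex.org/works/{ref}"
    raise ValueError(f"unrecognized reference: {ref!r}")

print(resolve("arXiv:2301.00001"))
```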
@@ -238,6 +238,6 @@ comparison = create_comparison_table(summaries,

  - GROBID: https://github.com/kermitt2/grobid
  - PyMuPDF: https://pymupdf.readthedocs.io
- - Semantic Scholar API: https://api.semanticscholar.org
+ - OpenAlex API: https://api.openalex.org
  - Unpaywall API: https://unpaywall.org/products/api
  - S. Keshav, "How to Read a Paper" (2007): http://ccr.sigcomm.org/online/files/p83-keshavA.pdf
@@ -44,12 +44,12 @@ OpenAlex (free):
  - Limits: Reference linking less complete than WoS
  - Best for: Large-scale analysis, reproducible research

- Semantic Scholar (free):
+ CrossRef (free):
  - Format: JSON via REST API
- - Coverage: ~200M papers, strong in CS/Biomed
- - Strengths: Free, citation context, citation intents
- - Limits: Weaker coverage in humanities and social sciences
- - Best for: CS/AI-focused networks, citation intent analysis
+ - Coverage: ~150M DOIs across all publishers
+ - Strengths: Free, authoritative DOI metadata, reference linking
+ - Limits: No abstract text, citation counts may lag
+ - Best for: Cross-publisher networks, DOI resolution
  ```
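For the CrossRef source described above, per-DOI metadata comes from the `/works/{doi}` endpoint, with linked references under `message.reference`; a sketch against a truncated, illustrative response (the DOIs are placeholders):

```python
import json

def crossref_work_url(doi: str) -> str:
    """Build the CrossRef works URL for one DOI."""
    return f"https://api.crossref.org/works/{doi}"

# Truncated, illustrative shape of a CrossRef works response.
sample = json.loads("""{
  "message": {
    "DOI": "10.1000/example",
    "reference": [
      {"DOI": "10.1000/cited-1"},
      {"unstructured": "Smith et al., 2019"}
    ]
  }
}""")

# Keep only references CrossRef could link to a DOI -- the basis of
# the reference linking noted above.
linked = [r["DOI"] for r in sample["message"]["reference"] if "DOI" in r]
print(linked)
```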
  ### Data Cleaning for Network Construction
@@ -293,7 +293,7 @@ Provide a detailed answer citing specific papers, methods, and findings from the
  - **Start with a clear schema.** Define your entity types and relations before extracting data. A schema change later requires re-processing.
  - **Use persistent identifiers.** DOIs for papers, ORCIDs for authors, and canonical names for methods prevent duplicate nodes.
  - **Validate extracted triples.** LLM extraction is imperfect. Sample and manually verify 5-10% of extractions.
- - **Enrich with external data.** Link your KG to OpenAlex, Semantic Scholar, or Wikidata for additional metadata.
+ - **Enrich with external data.** Link your KG to OpenAlex, CrossRef, or Wikidata for additional metadata.
  - **Version your graph.** Export snapshots regularly and track changes over time.
  - **Design queries before building.** Know what questions you want to answer before deciding on the schema.
 
@@ -28,7 +28,6 @@ APIs are always preferable to scraping when available. They provide structured d

  | API | Data | Rate Limit | Auth |
  |-----|------|-----------|------|
- | Semantic Scholar | Papers, authors, citations | 100 req/sec (with key) | API key (free) |
  | OpenAlex | Papers, authors, venues, concepts | 100K req/day | Email in header |
  | Crossref | DOI metadata | 50 req/sec (polite pool) | Email in header |
  | PubMed (Entrez) | Biomedical literature | 10 req/sec (with key) | API key (free) |
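The per-second limits in the table can be honored client-side with a minimal inter-request throttle; a sketch (the limits are taken from the table; the class name and structure are our own):

```python
import time

class Throttle:
    """Enforce a minimum interval between successive API calls."""

    def __init__(self, max_per_sec: float):
        self.min_interval = 1.0 / max_per_sec
        self._last = 0.0

    def wait(self) -> None:
        """Sleep just long enough to respect the configured rate."""
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()

crossref = Throttle(max_per_sec=50)   # polite-pool limit from the table
for _ in range(3):
    crossref.wait()                   # call before each request
```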
@@ -319,7 +318,7 @@ class DataCollector:
  ## References

  - [OpenAlex API Documentation](https://docs.openalex.org/) -- Open bibliographic data API
- - [Semantic Scholar API](https://api.semanticscholar.org/) -- Paper and author data
+ - [CrossRef API](https://api.crossref.org/) -- DOI resolution and metadata
  - [BeautifulSoup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) -- HTML parsing
  - [Scrapy Documentation](https://docs.scrapy.org/) -- Web scraping framework
  - [Playwright Documentation](https://playwright.dev/python/) -- Browser automation