@wentorai/research-plugins 1.4.0 → 1.4.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.en.md +143 -0
- package/README.md +98 -131
- package/curated/literature/README.md +2 -2
- package/curated/writing/README.md +1 -1
- package/openclaw.plugin.json +1 -1
- package/package.json +1 -1
- package/skills/literature/discovery/SKILL.md +1 -1
- package/skills/literature/discovery/citation-alert-guide/SKILL.md +2 -2
- package/skills/literature/discovery/conference-proceedings-guide/SKILL.md +2 -2
- package/skills/literature/discovery/literature-mapping-guide/SKILL.md +1 -1
- package/skills/literature/discovery/paper-recommendation-guide/SKILL.md +8 -14
- package/skills/literature/discovery/rss-paper-feeds/SKILL.md +20 -14
- package/skills/literature/discovery/semantic-paper-radar/SKILL.md +8 -8
- package/skills/literature/discovery/semantic-scholar-recs-guide/SKILL.md +103 -86
- package/skills/literature/fulltext/open-access-guide/SKILL.md +1 -1
- package/skills/literature/fulltext/open-access-mining-guide/SKILL.md +5 -5
- package/skills/literature/metadata/citation-network-guide/SKILL.md +3 -3
- package/skills/literature/metadata/h-index-guide/SKILL.md +0 -27
- package/skills/literature/search/SKILL.md +1 -1
- package/skills/literature/search/citation-chaining-guide/SKILL.md +42 -32
- package/skills/literature/search/database-comparison-guide/SKILL.md +1 -1
- package/skills/literature/search/semantic-scholar-api/SKILL.md +56 -53
- package/skills/research/automation/paper-to-agent-guide/SKILL.md +1 -1
- package/skills/research/deep-research/in-depth-research-guide/SKILL.md +1 -1
- package/skills/research/deep-research/kosmos-scientist-guide/SKILL.md +3 -3
- package/skills/research/deep-research/llm-scientific-discovery-guide/SKILL.md +1 -1
- package/skills/research/deep-research/local-deep-research-guide/SKILL.md +6 -6
- package/skills/research/deep-research/open-researcher-guide/SKILL.md +3 -3
- package/skills/research/deep-research/tongyi-deep-research-guide/SKILL.md +4 -4
- package/skills/research/methodology/grad-school-guide/SKILL.md +1 -1
- package/skills/research/paper-review/automated-review-guide/SKILL.md +1 -1
- package/skills/tools/diagram/excalidraw-diagram-guide/SKILL.md +1 -1
- package/skills/tools/diagram/mermaid-architect-guide/SKILL.md +1 -1
- package/skills/tools/diagram/plantuml-guide/SKILL.md +1 -1
- package/skills/tools/document/grobid-pdf-parsing/SKILL.md +1 -1
- package/skills/tools/document/paper-parse-guide/SKILL.md +2 -2
- package/skills/tools/knowledge-graph/citation-network-builder/SKILL.md +5 -5
- package/skills/tools/knowledge-graph/knowledge-graph-construction/SKILL.md +1 -1
- package/skills/tools/scraping/academic-web-scraping/SKILL.md +1 -2
- package/skills/tools/scraping/google-scholar-scraper/SKILL.md +7 -7
- package/skills/writing/citation/SKILL.md +1 -1
- package/skills/writing/citation/academic-citation-manager/SKILL.md +20 -17
- package/skills/writing/citation/citation-assistant-skill/SKILL.md +72 -58
- package/skills/writing/citation/onecite-reference-guide/SKILL.md +1 -1
- package/skills/writing/citation/zotero-reference-guide/SKILL.md +1 -1
- package/skills/writing/citation/zotero-scholar-guide/SKILL.md +1 -1
- package/src/tools/arxiv.ts +13 -3
- package/src/tools/biorxiv.ts +21 -5
- package/src/tools/crossref.ts +13 -6
- package/src/tools/datacite.ts +7 -3
- package/src/tools/doaj.ts +3 -2
- package/src/tools/europe-pmc.ts +4 -3
- package/src/tools/hal.ts +6 -4
- package/src/tools/inspire-hep.ts +3 -2
- package/src/tools/openaire.ts +11 -6
- package/src/tools/openalex.ts +17 -2
- package/src/tools/opencitations.ts +9 -0
- package/src/tools/orcid.ts +3 -0
- package/src/tools/osf-preprints.ts +3 -2
- package/src/tools/pubmed.ts +12 -5
- package/src/tools/unpaywall.ts +3 -0
- package/src/tools/util.ts +33 -0
- package/src/tools/zenodo.ts +10 -4
@@ -1,134 +1,137 @@
 ---
 name: semantic-scholar-api
-description: "Search papers and analyze citation graphs via
+description: "Search papers and analyze citation graphs via OpenAlex and CrossRef APIs"
 metadata:
   openclaw:
     emoji: "🔍"
     category: "literature"
     subcategory: "search"
     keywords: ["academic database search", "semantic search", "AI-powered literature search", "citation analysis", "citation network"]
-    source: "https://api.
+    source: "https://api.openalex.org/"
 ---
 
-#
+# OpenAlex & CrossRef API Guide
 
 ## Overview
 
-
+OpenAlex is a free, open catalog of the global research system, indexing over 250 million academic works across all fields of science. It provides structured access to papers, authors, institutions, concepts, and citation networks. OpenAlex is the successor to Microsoft Academic Graph and is maintained by OurResearch (the team behind Unpaywall).
 
-
+CrossRef is the official DOI registration agency for scholarly content, providing metadata for over 150 million DOIs across all publishers and disciplines. Together, OpenAlex and CrossRef provide comprehensive coverage for academic search, citation analysis, and bibliometric research.
 
-
+Both APIs are free to use without authentication. OpenAlex requests a polite `User-Agent` header; CrossRef requests a `User-Agent` with contact email for access to the polite pool (faster rate limits).
 
 ## Authentication
 
-No authentication is required for
+No authentication is required for either API.
 
+OpenAlex: Include a `User-Agent` header for polite access:
 ```
-
+User-Agent: ResearchPlugins/1.0 (https://wentor.ai)
 ```
 
-
+CrossRef: Include a `User-Agent` header with contact email for polite pool:
+```
+User-Agent: ResearchPlugins/1.0 (https://wentor.ai; mailto:dev@wentor.ai)
+```
 
 ## Core Endpoints
 
-###
+### OpenAlex: Search Works
 
-- **URL**: `GET https://api.
+- **URL**: `GET https://api.openalex.org/works`
 - **Parameters**:
 | Param | Type | Required | Description |
 |-------|------|----------|-------------|
-
-
-
-
-
-| fieldsOfStudy | string | No | Filter by field (e.g., Computer Science, Medicine) |
+| search | string | No | Full-text search query |
+| filter | string | No | Filter expression (e.g., `from_publication_date:2024-01-01`) |
+| sort | string | No | Sort field (e.g., `cited_by_count:desc`, `publication_date:desc`) |
+| per_page | integer | No | Results per page (default: 25, max: 200) |
+| page | integer | No | Page number (default: 1) |
 - **Example**:
 ```bash
-curl "https://api.
+curl "https://api.openalex.org/works?search=attention+is+all+you+need&per_page=5"
 ```
-- **Response**: JSON with `
+- **Response**: JSON with `meta` (count, page info) and `results` array containing work objects.
 
-###
+### OpenAlex: Get Work Details
 
-- **URL**: `GET https://api.
+- **URL**: `GET https://api.openalex.org/works/{id}`
 - **Parameters**:
 | Param | Type | Required | Description |
 |-------|------|----------|-------------|
-
-| fields | string | No | Comma-separated fields to return |
+| id | string | Yes | OpenAlex ID (e.g., `W2741809807`), DOI URL, or other identifier |
 - **Example**:
 ```bash
-curl "https://api.
+curl "https://api.openalex.org/works/W2741809807"
 ```
-- **Response**: JSON with full
+- **Response**: JSON with full work metadata including `id`, `title`, `abstract_inverted_index`, `publication_year`, `cited_by_count`, `authorships`, `concepts`, `referenced_works`.
 
-###
+### OpenAlex: Search Authors
 
-- **URL**: `GET https://api.
+- **URL**: `GET https://api.openalex.org/authors`
 - **Parameters**:
 | Param | Type | Required | Description |
 |-------|------|----------|-------------|
-
-
-
-| fields | string | No | Fields to return (e.g., name,paperCount,citationCount,hIndex) |
+| search | string | No | Author name search |
+| filter | string | No | Filter expression |
+| per_page | integer | No | Results per page (max: 200) |
 - **Example**:
 ```bash
-curl "https://api.
+curl "https://api.openalex.org/authors?search=Yoshua+Bengio&per_page=5"
 ```
-- **Response**: JSON with author profiles including
+- **Response**: JSON with author profiles including `works_count`, `cited_by_count`, `summary_stats.h_index`, affiliations.
 
-###
+### CrossRef: Resolve DOI
 
-- **URL**: `GET https://api.
+- **URL**: `GET https://api.crossref.org/works/{doi}`
 - **Parameters**:
 | Param | Type | Required | Description |
 |-------|------|----------|-------------|
-
+| doi | string | Yes | DOI to resolve (e.g., `10.1038/nature12373`) |
 - **Example**:
 ```bash
-curl "https://api.
+curl "https://api.crossref.org/works/10.18653/v1/N19-1423"
 ```
-- **Response**: JSON
+- **Response**: JSON with full bibliographic metadata including title, authors, journal, dates, references count, and citation count.
 
 ## Rate Limits
 
-
+OpenAlex: No strict rate limit, but use polite `User-Agent` header. Recommended: max 10 requests per second. The API returns HTTP 429 when limits are exceeded.
+
+CrossRef: Without polite pool: ~50 requests per second. With polite pool (contact email in User-Agent): higher limits. The API returns HTTP 429 when limits are exceeded.
 
 ## Common Patterns
 
 ### Build a Citation Network
 
-Retrieve a paper and
+Retrieve a paper and find all works that cite it:
 
 ```bash
-# Get paper
-curl "https://api.
+# Get paper details
+curl "https://api.openalex.org/works/W2741809807"
+
+# Get works citing this paper, sorted by citation count
+curl "https://api.openalex.org/works?filter=cites:W2741809807&sort=cited_by_count:desc&per_page=20"
 ```
 
 ### Find Influential Papers on a Topic
 
-Search for highly cited
+Search for highly cited works on a topic:
 
 ```bash
-curl "https://api.
+curl "https://api.openalex.org/works?search=graph+neural+networks&sort=cited_by_count:desc&per_page=20"
 ```
 
-### Batch Paper Lookup
+### Batch Paper Lookup via CrossRef
 
-
+Search CrossRef for papers matching a query, sorted by citation count:
 
 ```bash
-curl
--H "Content-Type: application/json" \
--d '{"ids": ["DOI:10.1038/s41586-021-03819-2", "CorpusID:49313245"]}' \
---url-query "fields=title,year,citationCount"
+curl "https://api.crossref.org/works?query=graph+neural+networks&sort=is-referenced-by-count&order=desc&rows=20"
 ```
 
 ## References
 
--
-- API
--
+- OpenAlex documentation: https://docs.openalex.org/
+- CrossRef API documentation: https://api.crossref.org/swagger-ui/index.html
+- OpenAlex source: https://github.com/ourresearch/openalex-guts
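One practical wrinkle with the OpenAlex work objects referenced in this version: abstracts are returned as an `abstract_inverted_index` (word → list of positions), not plain text. A minimal sketch of the rebuild step, assuming you already have a decoded JSON work object (the helper name is ours, not part of the package):

```python
def reconstruct_abstract(inverted_index):
    """Rebuild plain abstract text from OpenAlex's abstract_inverted_index,
    which maps each word to the positions where it occurs."""
    positions = {}
    for word, idxs in inverted_index.items():
        for i in idxs:
            positions[i] = word
    # Emit words in position order
    return " ".join(positions[i] for i in sorted(positions))

# Tiny hand-made index for illustration
idx = {"Attention": [0], "is": [1], "all": [2], "you": [3], "need": [4]}
print(reconstruct_abstract(idx))  # Attention is all you need
```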
@@ -83,7 +83,7 @@ The skill supports building knowledge graphs from processed papers:
 
 - Extract entities (methods, datasets, metrics, tools, concepts)
 - Map relationships between entities (uses, extends, contradicts, supports)
-- Link to external knowledge bases (
+- Link to external knowledge bases (OpenAlex, CrossRef, DOI)
 - Track citation chains for key claims
 - Identify research lineages and methodological evolution
 
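When linking graph nodes to external knowledge bases as the hunk above describes, the same DOI often arrives in several surface forms (URL, `doi:` prefix, mixed case). A small normalization sketch so one paper maps to one node (function name ours):

```python
def normalize_doi(raw):
    """Normalize DOI variants to one canonical lowercase form,
    so duplicate nodes are not created for the same paper."""
    doi = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi

print(normalize_doi("https://doi.org/10.1038/Nature12373"))  # 10.1038/nature12373
```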
@@ -52,7 +52,7 @@ Search systematically across source tiers:
 
 | Tier | Source Type | Examples | Purpose |
 |------|-----------|---------|---------|
-| **1** | Academic databases |
+| **1** | Academic databases | OpenAlex, PubMed, Scopus, Web of Science | Peer-reviewed primary research |
 | **2** | Preprint servers | arXiv, bioRxiv, SSRN, medRxiv | Cutting-edge, not yet reviewed |
 | **3** | Grey literature | WHO reports, World Bank, NBER working papers | Policy and institutional knowledge |
 | **4** | Patents and standards | Google Patents, USPTO, IEEE standards | Technical implementations |
@@ -48,7 +48,7 @@ You are an AI Scientist conducting rigorous research.
 Follow the scientific method strictly:
 
 1. **Literature Review**: Search for related work before
-proposing anything new. Use
+proposing anything new. Use OpenAlex API.
 2. **Hypothesis**: State falsifiable hypotheses clearly.
 3. **Experiment Design**: Define independent/dependent
 variables, controls, evaluation metrics.
@@ -62,7 +62,7 @@ Follow the scientific method strictly:
 ## Tools Available
 - Python 3.11+ with PyTorch, NumPy, SciPy
 - LaTeX (pdflatex + bibtex)
--
+- OpenAlex API for literature
 - W&B for experiment tracking (optional)
 ```
 
@@ -153,7 +153,7 @@ Analyze results and write paper:
 - Method (formal description)
 - Experiments (setup + results + analysis)
 - Conclusion (summary + limitations + future)
-5. Verify all citations are real (
+5. Verify all citations are real (OpenAlex/CrossRef)
 """
 ```
 
@@ -16,7 +16,7 @@ metadata:
 
 Local Deep Research is an open-source deep research tool with over 4,000 GitHub stars that conducts comprehensive multi-source research using either local LLMs (via Ollama, LM Studio, or vLLM) or cloud-based models. It searches across 10+ academic and web sources simultaneously, synthesizes the findings, and produces well-cited research reports. The project is designed for researchers who need thorough, multi-perspective research coverage while maintaining the option to keep everything running locally for privacy.
 
-What makes Local Deep Research stand out is its breadth of search integration. Rather than relying on a single search API, it queries multiple sources in parallel -- including Google Scholar,
+What makes Local Deep Research stand out is its breadth of search integration. Rather than relying on a single search API, it queries multiple sources in parallel -- including Google Scholar, OpenAlex, arXiv, PubMed, Wikipedia, web search engines, and more -- then cross-references and synthesizes the results. This multi-source approach produces more comprehensive and balanced research outputs compared to single-source tools.
 
 The tool is particularly well-suited for academic researchers who need to conduct preliminary literature reviews, verify claims across multiple databases, or explore interdisciplinary topics where relevant work may be scattered across different platforms and publication venues.
 
@@ -94,7 +94,7 @@ from local_deep_research import DeepResearcher
 researcher = DeepResearcher(
     llm_provider="ollama",
     llm_model="llama3.1:70b",
-    search_sources=["google_scholar", "
+    search_sources=["google_scholar", "openalex",
                     "arxiv", "web"],
     max_iterations=10,
 )
@@ -114,7 +114,7 @@ Local Deep Research queries multiple sources in parallel for each research sub-q
 | Source | Type | API Key Required | Best For |
 |--------|------|-----------------|----------|
 | Google Scholar | Academic | No (via scraping) | Broad academic search |
-
+| OpenAlex | Academic | No | Cross-disciplinary, citation data |
 | arXiv | Academic | No | Preprints, ML/physics/math |
 | PubMed | Academic | No | Biomedical literature |
 | Wikipedia | Encyclopedia | No | Background and definitions |
@@ -128,12 +128,12 @@ Local Deep Research queries multiple sources in parallel for each research sub-q
 # Customize source priorities for your research domain
 researcher = DeepResearcher(
     search_sources={
-        "primary": ["
+        "primary": ["openalex", "arxiv"],
         "secondary": ["google_scholar", "web"],
         "reference": ["wikipedia", "crossref"],
     },
     source_weights={
-        "
+        "openalex": 1.5,  # Prioritize academic sources
         "arxiv": 1.5,
         "web": 0.8,
     },
@@ -249,5 +249,5 @@ local-deep-research "Your sensitive research query here"
 - Repository: https://github.com/LearningCircuit/local-deep-research
 - Ollama: https://ollama.com/
 - SearXNG: https://github.com/searxng/searxng
--
+- OpenAlex API: https://api.openalex.org/
 - arXiv API: https://info.arxiv.org/help/api/
@@ -43,14 +43,14 @@ result = researcher.research(
 
 ```python
 # Each sub-question triggers:
-# - Academic search (
+# - Academic search (OpenAlex, arXiv)
 # - Paper reading (abstract + key sections)
 # - Evidence extraction
 # - Follow-up question generation
 
 # Configuration
 researcher = OpenResearcher(
-    search_backends=["
+    search_backends=["openalex", "arxiv"],
     max_iterations=5,  # Research rounds per sub-question
     papers_per_iteration=10,  # Papers to read per round
     follow_up_questions=True,  # Generate follow-up questions
@@ -96,7 +96,7 @@ researcher = OpenResearcher(
     llm_provider="anthropic",
     model="claude-sonnet-4-20250514",
     search_config={
-        "backends": ["
+        "backends": ["openalex", "arxiv"],
         "max_results_per_query": 20,
     },
     reading_config={
@@ -119,12 +119,12 @@ DeepResearch integrates with multiple search providers to cast a wide net:
 - **Tavily**: AI-optimized search API designed for research agents
 - **Serper**: Fast Google search results API
 - **SearXNG**: Self-hosted meta-search engine for privacy-focused deployments
-- **
+- **OpenAlex API**: Direct academic paper search (free, no API key required)
 
 ```python
 # Configure multiple search backends for comprehensive coverage
 agent = DeepResearch(
-    search_engines=["bing", "
+    search_engines=["bing", "openalex"],
     search_strategy="parallel",  # Search all engines simultaneously
 )
 ```
@@ -151,7 +151,7 @@ Create research profiles optimized for specific academic domains:
 # Biomedical research profile
 bio_config = {
     "preferred_sources": ["pubmed", "biorxiv", "nature", "science"],
-    "search_engines": ["
+    "search_engines": ["openalex", "bing"],
     "terminology_mode": "technical",
     "citation_format": "apa",
 }
@@ -214,4 +214,4 @@ The trace includes all search queries, retrieved documents, LLM prompts and resp
 - Repository: https://github.com/Alibaba-NLP/DeepResearch
 - Qwen model family: https://github.com/QwenLM/Qwen
 - Alibaba NLP group: https://github.com/Alibaba-NLP
--
+- OpenAlex API: https://api.openalex.org/
@@ -30,7 +30,7 @@ A strong research question is the foundation of any good paper. It should be spe
 |-----------|-------------|---------------|
 | **F**easible | Can be answered with available resources | Do you have the data, compute, and time? |
 | **I**nteresting | Engages the research community | Would peers read this at a top venue? |
-| **N**ovel | Not already answered | Has
+| **N**ovel | Not already answered | Has OpenAlex/CrossRef search been done? |
 | **E**thical | Follows research ethics standards | Does it require IRB approval? |
 | **R**elevant | Advances the field meaningfully | Does it connect to open problems? |
 
@@ -274,7 +274,7 @@ Plagiarism and integrity:
 
 Reference management:
 - scite.ai: smart citation analysis (supporting/contrasting)
--
+- OpenAlex: related work discovery
 - Connected Papers: citation graph visualization
 ```
 
@@ -86,7 +86,7 @@ For software or experimental system diagrams, use grouped rectangles with labele
 Input: "Draw a system architecture with three layers:
 Frontend (React dashboard),
 Backend (FastAPI + PostgreSQL),
-External (
+External (OpenAlex API, CrossRef API)"
 ```
 
 The output places each layer as a dashed-border container with internal component boxes and inter-layer arrows.
@@ -56,7 +56,7 @@ C4Context
 
 System(platform, "Wentor Platform", "AI-powered research assistant ecosystem")
 
-System_Ext(scholar, "
+System_Ext(scholar, "OpenAlex", "Academic paper database")
 System_Ext(crossref, "CrossRef", "DOI resolution and metadata")
 System_Ext(github, "GitHub", "Code and skill repositories")
 
@@ -16,7 +16,7 @@ metadata:
 
 Academic PDFs are the primary format for distributing research, yet extracting structured data from them remains challenging. PDFs encode visual layout, not semantic structure -- headings, paragraphs, equations, tables, and citations are all just positioned text and graphics. GROBID (GeneRation Of BIbliographic Data) is the leading open-source tool for parsing academic PDFs into structured XML/TEI format, extracting metadata, body text, references, and figures with high accuracy.
 
-GROBID is used by major academic platforms including
+GROBID is used by major academic platforms including CORE, ResearchGate, and others for large-scale document processing. It combines machine learning models (CRF and deep learning) with heuristic rules to handle the diverse formatting of academic papers across publishers and disciplines.
 
 This guide covers installing and running GROBID, using its REST API for batch processing, extracting specific elements (metadata, references, body sections), and integrating GROBID output into downstream workflows such as knowledge bases, systematic reviews, and literature analysis pipelines.
 
@@ -32,7 +32,7 @@ Both modes begin by parsing the paper's structure from its PDF or HTML source, e
 | DOI | Resolve via CrossRef/Unpaywall | Auto-fetches open access version |
 | arXiv ID | `https://arxiv.org/pdf/{id}` | Always available |
 | URL | Direct download | May require institutional access |
-
+| OpenAlex ID | OpenAlex API + OA link | Includes metadata |
 
 ### PDF Parsing Pipeline
 
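The input-resolution table in the hunk above accepts several identifier types (DOI, arXiv ID, URL, OpenAlex ID). A heuristic sketch for routing an input string to the right resolver; the patterns and function name are ours, not taken from the skill:

```python
import re

def classify_paper_id(identifier):
    """Guess which resolution route applies to an input identifier.
    Heuristic only: real inputs may need stricter validation."""
    s = identifier.strip()
    if s.startswith(("http://", "https://")):
        return "url"
    if re.fullmatch(r"10\.\d{4,9}/\S+", s):
        return "doi"
    if re.fullmatch(r"\d{4}\.\d{4,5}(v\d+)?", s):  # new-style arXiv IDs
        return "arxiv"
    if re.fullmatch(r"[Ww]\d+", s):
        return "openalex"
    return "unknown"

print(classify_paper_id("10.1038/nature12373"))  # doi
```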
@@ -238,6 +238,6 @@ comparison = create_comparison_table(summaries,
 
 - GROBID: https://github.com/kermitt2/grobid
 - PyMuPDF: https://pymupdf.readthedocs.io
--
+- OpenAlex API: https://api.openalex.org
 - Unpaywall API: https://unpaywall.org/products/api
 - S. Keshav, "How to Read a Paper" (2007): http://ccr.sigcomm.org/online/files/p83-keshavA.pdf
@@ -44,12 +44,12 @@ OpenAlex (free):
 - Limits: Reference linking less complete than WoS
 - Best for: Large-scale analysis, reproducible research
 
-
+CrossRef (free):
 - Format: JSON via REST API
-- Coverage: ~
-- Strengths: Free,
-- Limits:
-- Best for:
+- Coverage: ~150M DOIs across all publishers
+- Strengths: Free, authoritative DOI metadata, reference linking
+- Limits: No abstract text, citation counts may lag
+- Best for: Cross-publisher networks, DOI resolution
 ```
 
 ### Data Cleaning for Network Construction
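The trade-offs listed in the hunk above (CrossRef lacks abstracts and its citation counts may lag) suggest merging the two sources per paper. A sketch under assumed field names, which combines records by preferring CrossRef's DOI metadata and filling gaps from OpenAlex:

```python
def merge_records(crossref, openalex):
    """Combine metadata for one paper: keep CrossRef fields as the base,
    fill the missing abstract from OpenAlex, and take the fresher
    (larger) citation count. Field names are illustrative assumptions."""
    merged = dict(crossref)
    if not merged.get("abstract"):
        merged["abstract"] = openalex.get("abstract")
    merged["cited_by_count"] = max(
        crossref.get("is-referenced-by-count", 0),
        openalex.get("cited_by_count", 0),
    )
    return merged
```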
@@ -293,7 +293,7 @@ Provide a detailed answer citing specific papers, methods, and findings from the
 - **Start with a clear schema.** Define your entity types and relations before extracting data. A schema change later requires re-processing.
 - **Use persistent identifiers.** DOIs for papers, ORCIDs for authors, and canonical names for methods prevent duplicate nodes.
 - **Validate extracted triples.** LLM extraction is imperfect. Sample and manually verify 5-10% of extractions.
-- **Enrich with external data.** Link your KG to OpenAlex,
+- **Enrich with external data.** Link your KG to OpenAlex, CrossRef, or Wikidata for additional metadata.
 - **Version your graph.** Export snapshots regularly and track changes over time.
 - **Design queries before building.** Know what questions you want to answer before deciding on the schema.
 
@@ -28,7 +28,6 @@ APIs are always preferable to scraping when available. They provide structured d
 
 | API | Data | Rate Limit | Auth |
 |-----|------|-----------|------|
-| Semantic Scholar | Papers, authors, citations | 100 req/sec (with key) | API key (free) |
 | OpenAlex | Papers, authors, venues, concepts | 100K req/day | Email in header |
 | Crossref | DOI metadata | 50 req/sec (polite pool) | Email in header |
 | PubMed (Entrez) | Biomedical literature | 10 req/sec (with key) | API key (free) |
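The per-second limits in the table above are easy to respect with a client-side throttle. A minimal sketch (not part of the package; `requests`-style calls would go inside the loop that calls `wait()`):

```python
import time

class PoliteClient:
    """Space out requests so a per-second API limit is never exceeded,
    e.g. PoliteClient(50) for Crossref's ~50 req/sec polite pool."""

    def __init__(self, max_per_sec):
        self.min_interval = 1.0 / max_per_sec
        self.last = 0.0

    def wait(self):
        # Sleep just long enough since the previous request
        sleep_for = self.min_interval - (time.monotonic() - self.last)
        if sleep_for > 0:
            time.sleep(sleep_for)
        self.last = time.monotonic()
```

Call `client.wait()` before each HTTP request; the first call returns immediately and later calls pace themselves.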
@@ -319,7 +318,7 @@ class DataCollector:
 ## References
 
 - [OpenAlex API Documentation](https://docs.openalex.org/) -- Open bibliographic data API
-- [
+- [CrossRef API](https://api.crossref.org/) -- DOI resolution and metadata
 - [BeautifulSoup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) -- HTML parsing
 - [Scrapy Documentation](https://docs.scrapy.org/) -- Web scraping framework
 - [Playwright Documentation](https://playwright.dev/python/) -- Browser automation
@@ -41,7 +41,7 @@ Ethical guidelines:
 OpenAlex could answer it instead
 
 Official and semi-official alternatives:
--
+- OpenAlex API: free, no key required, excellent coverage
 - OpenAlex API: free, comprehensive, well-documented
 - Crossref API: free, DOI-based metadata and citation counts
 - CORE API: free, full-text open access content
@@ -232,12 +232,12 @@ OpenAlex (openalex.org):
 - Data: titles, abstracts, citations, authors, institutions
 - Best for: large-scale bibliometric analysis
 
-
-- Coverage:
-- API: REST,
-- Rate limit:
-- Data: titles, abstracts, citations,
-- Best for:
+OpenAlex (openalex.org):
+- Coverage: 250M+ works, all disciplines
+- API: REST, no key required
+- Rate limit: ~10 requests/sec polite
+- Data: titles, abstracts, citations, concepts, author profiles
+- Best for: cross-disciplinary analysis, open data research
 
 Crossref (crossref.org):
 - Coverage: 130M+ DOIs
@@ -11,7 +11,7 @@ Select the skill matching the user's need, then `read` its SKILL.md.
 |-------|-------------|
 | [academic-citation-manager](./academic-citation-manager/SKILL.md) | Manage academic citations across BibTeX, APA, MLA, and Chicago formats |
 | [bibtex-management-guide](./bibtex-management-guide/SKILL.md) | Clean, format, deduplicate, and manage BibTeX bibliography files for LaTeX |
-| [citation-assistant-skill](./citation-assistant-skill/SKILL.md) | Claude Code skill for citation workflow via
+| [citation-assistant-skill](./citation-assistant-skill/SKILL.md) | Claude Code skill for citation workflow via OpenAlex and CrossRef |
 | [citation-style-guide](./citation-style-guide/SKILL.md) | APA, MLA, Chicago citation format guide with CSL configuration |
 | [jabref-reference-guide](./jabref-reference-guide/SKILL.md) | Guide to JabRef open-source BibTeX and BibLaTeX reference manager |
 | [jasminum-zotero-guide](./jasminum-zotero-guide/SKILL.md) | Guide to Jasminum for retrieving CNKI Chinese academic metadata in Zotero |
@@ -18,7 +18,7 @@ Manage academic citations across multiple formats (BibTeX, APA 7th, MLA 9th, Chi
 
 Citation management is a persistent friction point in academic writing. Researchers collect references from multiple sources (databases, PDFs, colleagues, web pages), store them in different formats, and must output them in the specific style required by each target journal. Errors in citations -- misspelled author names, incorrect years, broken DOIs, inconsistent formatting -- are among the most common reasons for desk rejection and reviewer criticism.
 
-This skill provides a comprehensive citation management workflow that goes beyond what GUI reference managers offer. It can retrieve complete metadata from a DOI in seconds, convert between any citation format, detect and merge duplicate entries, validate entries against CrossRef and
+This skill provides a comprehensive citation management workflow that goes beyond what GUI reference managers offer. It can retrieve complete metadata from a DOI in seconds, convert between any citation format, detect and merge duplicate entries, validate entries against CrossRef and OpenAlex databases, and generate properly formatted bibliographies for any major citation style.
 
 The approach is text-based and scriptable, making it ideal for integration with LaTeX workflows, Markdown writing pipelines, and automated document generation. All citation data is stored in standard BibTeX format as the canonical source, with on-demand conversion to other formats for specific manuscript requirements.
 
@@ -52,33 +52,36 @@ print(bibtex)
 # }
 ```
 
-### From
+### From OpenAlex
 
 ```python
-def
-    """Retrieve citation data from
-    url = f"https://api.
-
-    response = requests.get(url,
+def get_citation_from_openalex(work_id):
+    """Retrieve citation data from OpenAlex API."""
+    url = f"https://api.openalex.org/works/{work_id}"
+    headers = {"User-Agent": "ResearchPlugins/1.0 (https://wentor.ai)"}
+    response = requests.get(url, headers=headers)
     if response.status_code == 200:
         data = response.json()
         return format_as_bibtex(data)
     return None
 
-def format_as_bibtex(
-    """Convert
-
-    author_str = " and ".join(a["
-    first_author =
-    year =
+def format_as_bibtex(oa_data):
+    """Convert OpenAlex data to BibTeX."""
+    authorships = oa_data.get("authorships", [])
+    author_str = " and ".join(a["author"]["display_name"] for a in authorships)
+    first_author = authorships[0]["author"]["display_name"].split()[-1] if authorships else "Unknown"
+    year = str(oa_data.get("publication_year", ""))
     key = f"{first_author}_{year}"
 
+    venue = oa_data.get("primary_location", {}) or {}
+    journal = (venue.get("source") or {}).get("display_name", "")
+
     return f"""@article{{{key},
-  title={{{
+  title={{{oa_data.get('title', '')}}},
   author={{{author_str}}},
   year={{{year}}},
-  journal={{{
-  doi={{{
+  journal={{{journal}}},
+  doi={{{oa_data.get('doi', '')}}}
 }}"""
 ```
 
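The new `format_as_bibtex` converter in the hunk above can be exercised offline with a hand-made OpenAlex-shaped record (the function body is copied from the added lines so the demo is self-contained; the sample data is invented):

```python
def format_as_bibtex(oa_data):
    """Convert OpenAlex data to BibTeX (copy of the converter added in 1.4.3)."""
    authorships = oa_data.get("authorships", [])
    author_str = " and ".join(a["author"]["display_name"] for a in authorships)
    first_author = authorships[0]["author"]["display_name"].split()[-1] if authorships else "Unknown"
    year = str(oa_data.get("publication_year", ""))
    key = f"{first_author}_{year}"

    venue = oa_data.get("primary_location", {}) or {}
    journal = (venue.get("source") or {}).get("display_name", "")

    return f"""@article{{{key},
  title={{{oa_data.get('title', '')}}},
  author={{{author_str}}},
  year={{{year}}},
  journal={{{journal}}},
  doi={{{oa_data.get('doi', '')}}}
}}"""

# Invented sample record mimicking the OpenAlex work shape
sample = {
    "title": "An Example Paper",
    "publication_year": 2020,
    "doi": "https://doi.org/10.0000/example",
    "authorships": [{"author": {"display_name": "Ada Lovelace"}}],
    "primary_location": {"source": {"display_name": "Journal of Examples"}},
}
print(format_as_bibtex(sample))
```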
@@ -308,7 +311,7 @@ pandoc paper.md --citeproc --bibliography=references.bib \
 ## References
 
 - CrossRef API: https://api.crossref.org
--
+- OpenAlex API: https://api.openalex.org
 - APA 7th Edition Manual: https://apastyle.apa.org/products/publication-manual-7th-edition
 - BibTeX documentation: http://www.bibtex.org
 - CSL styles repository: https://github.com/citation-style-language/styles