openalex-mcp-server 1.0.1
# Overview

[**OpenAlex**](https://openalex.org) is a fully open catalog of the global research system. It's named after the [ancient Library of Alexandria](https://en.wikipedia.org/wiki/Library_of_Alexandria) and made by the nonprofit [OurResearch](https://ourresearch.org/).

This is the **technical documentation for OpenAlex**, including the [**OpenAlex API**](https://docs.openalex.org/how-to-use-the-api/api-overview) and the [**data snapshot**](https://docs.openalex.org/download-all-data/openalex-snapshot). Here, you can learn how to set up your code to access OpenAlex's data. If you want to explore the data as a human, you may be more interested in [**OpenAlex Web**](https://help.openalex.org).

## Data

The OpenAlex dataset describes scholarly [*entities*](https://docs.openalex.org/api-entities/entities-overview) and how those entities are connected to each other. Types of entities include [works](https://docs.openalex.org/api-entities/works), [authors](https://docs.openalex.org/api-entities/authors), [sources](https://docs.openalex.org/api-entities/sources), [institutions](https://docs.openalex.org/api-entities/institutions), [topics](https://docs.openalex.org/api-entities/topics), [publishers](https://docs.openalex.org/api-entities/publishers), and [funders](https://docs.openalex.org/api-entities/funders).

Together, these make a huge web (or, more technically, a heterogeneous directed [graph](https://en.wikipedia.org/wiki/Graph_theory)) of hundreds of millions of entities and billions of connections between them.

Learn more at our general help center article: [About the data](https://help.openalex.org/hc/en-us/articles/24397285563671-About-the-data)

## Access

We offer a fast, modern REST API to get OpenAlex data programmatically. It's free but requires an API key (also free). Get yours at [openalex.org/settings/api](https://openalex.org/settings/api). With your free key, you get 100,000 credits per day. [Learn more](https://docs.openalex.org/how-to-use-the-api/api-overview)

There is also a complete database snapshot available to download. [Learn more about the data snapshot here.](https://docs.openalex.org/download-all-data/openalex-snapshot)

The API has a limit of 100,000 credits per day, and the snapshot is updated monthly. If you need a higher limit or more frequent updates, please look into [**OpenAlex Premium**](https://openalex.org/pricing).

The web interface for OpenAlex, built directly on top of the API, is the quickest and easiest way to [get started with OpenAlex](https://help.openalex.org/getting-started).

## Why OpenAlex?

OpenAlex offers an open replacement for industry-standard scientific knowledge bases like Elsevier's Scopus and Clarivate's Web of Science. [Compared to](https://openalex.org/about#comparison) these paywalled services, OpenAlex offers significant advantages in terms of inclusivity, affordability, and availability.

OpenAlex is:

* *Big —* We have about twice the coverage of the other services, and have significantly better coverage of non-English works and works from the Global South.
* *Easy —* Our service is fast, modern, and well-documented.
* *Open —* Our complete dataset is free under the CC0 license, which allows for transparency and reuse.

Many people and organizations have already found great value using OpenAlex. Have a look at the [Testimonials](https://openalex.org/testimonials) to hear what they've said!

## Contact

For tech support and bug reports, please visit our [help page](https://openalex.org/help). You can also join the [OpenAlex user group](https://groups.google.com/g/openalex-users), and follow us on [Twitter (@OpenAlex\_org)](https://twitter.com/openalex_org) and [Mastodon](https://mastodon.social/@OpenAlex).

## Citation

If you use OpenAlex in research, please cite [this paper](https://arxiv.org/abs/2205.01833):

> Priem, J., Piwowar, H., & Orr, R. (2022). *OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts*. ArXiv. <https://arxiv.org/abs/2205.01833>

---

# API Guide for LLMs

## OpenAlex API Guide for LLM Agents and AI Applications

OpenAlex is a fully open catalog of scholarly works, authors, sources, institutions, topics, publishers, and funders.

* Base URL: <https://api.openalex.org>
* Documentation: <https://docs.openalex.org>
* API key required (free at openalex.org/settings/api); 100,000 credits/day with key

### CRITICAL GOTCHAS - Read These First!

#### ❌ DON'T: Create ad-hoc sampling by using random page numbers

WRONG: `?page=5`, `?page=17`, `?page=42` to get "random" results. This is NOT random sampling and will bias your results!

#### ✅ DO: Use the ?sample parameter for random sampling

CORRECT: <https://api.openalex.org/works?sample=20>. For consistent results, add a seed: `?sample=20&seed=123`.

#### ❌ DON'T: Try to sample large datasets (10k+) in one request

A single `sample` request is capped in size, so a very large random sample cannot be drawn in one call.

#### ✅ DO: Use multiple samples with different seeds, then deduplicate

For large random samples (10k+ records):

1. Make multiple sample requests with different seeds.
2. Combine the results.
3. Deduplicate by ID.

Example:

* `?sample=1000&seed=1`
* `?sample=1000&seed=2`
* `?sample=1000&seed=3`

Then deduplicate the combined results by work ID. See: <https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/sample-entity-lists>
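
The combine-and-deduplicate step can be sketched in Python. This is a minimal illustration and not part of any OpenAlex client. `merge_samples` is a hypothetical helper. It assumes you have already fetched each seeded sample (paging through it if it spans several pages) and kept each response's `results` list. The records below are hand-made stand-ins, not real OpenAlex works.

```python
def merge_samples(sample_batches):
    """Combine several lists of OpenAlex records, keeping the first copy of each ID.

    Each batch is assumed to be the "results" list from one ?sample=...&seed=...
    request; records are dicts with an "id" key such as "https://openalex.org/W123".
    """
    seen = set()
    merged = []
    for batch in sample_batches:
        for record in batch:
            if record["id"] not in seen:
                seen.add(record["id"])
                merged.append(record)
    return merged


# Usage with stand-ins for three seeded sample responses (overlaps on purpose)
seed_1 = [{"id": "https://openalex.org/W1"}, {"id": "https://openalex.org/W2"}]
seed_2 = [{"id": "https://openalex.org/W2"}, {"id": "https://openalex.org/W3"}]
seed_3 = [{"id": "https://openalex.org/W1"}, {"id": "https://openalex.org/W4"}]
unique = merge_samples([seed_1, seed_2, seed_3])
print([r["id"].rsplit("/", 1)[-1] for r in unique])  # ['W1', 'W2', 'W3', 'W4']
```

Keeping the first copy preserves the order in which records were drawn. A set gives constant-time membership checks even for samples of tens of thousands of records.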

#### ❌ DON'T: Search/filter by entity names directly

WRONG: `/works?filter=author_name:Einstein`. Entity names are ambiguous, and this won't work!

#### ✅ DO: Use two-step lookup pattern for related entities

CORRECT two-step process:

1. Find the entity ID: `/authors?search=einstein`. The response shows an ID such as "A5023888391" or the full URI.
2. Use the ID to filter: `/works?filter=authorships.author.id:A5023888391`

Why? Names are ambiguous ("MIT" could match many institutions), while IDs are unique. This applies to authors, institutions, sources, topics, publishers, and funders.
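
Here is a minimal Python sketch of the two-step pattern. It assumes the response shapes shown in this guide: a `results` list whose entries carry a full-URI `id`. The helper names are hypothetical. The filter fields come from the examples elsewhere in this guide. Taking the first search hit is a simplification; for ambiguous names you would normally inspect the candidates before picking one.

```python
from urllib.parse import urlencode

API = "https://api.openalex.org"

# Filter field used on /works for each entity type (from this guide's examples)
WORKS_FILTER_FIELD = {
    "authors": "authorships.author.id",
    "institutions": "authorships.institutions.id",
    "sources": "primary_location.source.id",
    "topics": "topics.id",
}


def short_id(openalex_uri):
    """'https://openalex.org/A5023888391' -> 'A5023888391'."""
    return openalex_uri.rstrip("/").rsplit("/", 1)[-1]


def step1_search_url(entity, name):
    """Step 1: search the entity endpoint by name."""
    return f"{API}/{entity}?{urlencode({'search': name})}"


def step2_works_url(entity, search_response):
    """Step 2: take the top search hit's ID and filter /works by it."""
    top_hit = search_response["results"][0]  # assumes the first hit is the right one
    field = WORKS_FILTER_FIELD[entity]
    return f"{API}/works?filter={field}:{short_id(top_hit['id'])}"


# Usage with a stand-in for the step 1 response
fake_response = {"results": [{"id": "https://openalex.org/A5023888391"}]}
print(step1_search_url("authors", "Heather Piwowar"))
# https://api.openalex.org/authors?search=Heather+Piwowar
print(step2_works_url("authors", fake_response))
# https://api.openalex.org/works?filter=authorships.author.id:A5023888391
```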

#### ❌ DON'T: Try to group by multiple dimensions in one query

WRONG: You cannot do a SQL-style "GROUP BY topic, year" in a single API call.

#### ✅ DO: Make multiple queries and combine results client-side

To analyze by topic AND year (or any two dimensions):

1. Make one query per year: `?filter=publication_year:2020&group_by=topics.id`
2. Repeat for 2021, 2022, etc.
3. Combine the results in your code.

The API only supports one `group_by` per request.
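
Combining the per-year responses client-side might look like the sketch below. It assumes each response has the `group_by` shape shown under "Group By Responses" (`key`, `key_display_name`, `count`). `combine_group_by` is a hypothetical helper, and the counts are invented. A topic missing from a year may only mean it was not among the groups returned for that request, so treat a gap as unknown rather than zero.

```python
from collections import defaultdict


def combine_group_by(responses_by_year):
    """Merge one group_by response per year into {topic_key: {year: count}}.

    responses_by_year maps a year to the parsed JSON of
    /works?filter=publication_year:<year>&group_by=topics.id
    """
    table = defaultdict(dict)
    for year, response in responses_by_year.items():
        for group in response["group_by"]:
            table[group["key"]][year] = group["count"]
    return dict(table)


# Usage with stand-ins for two yearly responses (counts are invented)
responses = {
    2020: {"group_by": [
        {"key": "https://openalex.org/T10001", "key_display_name": "Topic A", "count": 120},
        {"key": "https://openalex.org/T10002", "key_display_name": "Topic B", "count": 80},
    ]},
    2021: {"group_by": [
        {"key": "https://openalex.org/T10001", "key_display_name": "Topic A", "count": 150},
    ]},
}
table = combine_group_by(responses)
print(table["https://openalex.org/T10001"])  # {2020: 120, 2021: 150}
print(table["https://openalex.org/T10002"])  # {2020: 80}
```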

#### ❌ DON'T: Ignore API errors or retry immediately on failure

API errors are common, especially at scale. Immediate retries can make things worse.

#### ✅ DO: Implement exponential backoff for retries

When you get errors (429 rate limit, 500 server error, timeouts):

1. Catch the error.
2. Wait before retrying (1s, 2s, 4s, 8s, etc.).
3. Include a max retry limit (e.g., 5 attempts).
4. Log failures for debugging.

#### ❌ DON'T: Use default page sizes for bulk extraction

The default is only 25 results per page, which is slow for large extracts!

#### ✅ DO: Use maximum page size (200) for bulk data extraction

FAST: `?per-page=200`. This reduces the number of API calls needed by 8x compared to the default (200 / 25 = 8).

#### ❌ DON'T: Make sequential API calls for lists of known IDs

SLOW: Loop through 100 DOIs making 100 separate API calls.

#### ✅ DO: Use the OR filter (pipe |) for batch ID lookups

FAST: Combine up to 50 IDs in one query using the pipe separator, e.g. `/works?filter=doi:https://doi.org/10.1371/journal.pone.0266781|https://doi.org/10.1371/journal.pone.0267149|...`. You can include up to 50 values per filter; use `per-page=50` to get all results in one response. See: <https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/filter-entity-lists#addition-or>
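
This sketch splits a long DOI list into pipe-separated batches of at most 50, following the limits stated above. `doi_batch_urls` is a hypothetical helper, and the DOIs are placeholders. Each DOI is percent-encoded (keeping `:` and `/`) as a precaution for unusual characters; whether the API needs that for a given DOI is an assumption.

```python
from urllib.parse import quote


def doi_batch_urls(dois, batch_size=50):
    """Build one /works URL per batch of up to 50 DOIs, OR-ed together with "|".

    per-page is set to the batch size so every match fits on one page.
    """
    if not 1 <= batch_size <= 50:
        raise ValueError("OpenAlex allows at most 50 values per filter")
    urls = []
    for start in range(0, len(dois), batch_size):
        batch = [quote(d, safe=":/") for d in dois[start:start + batch_size]]
        urls.append(
            "https://api.openalex.org/works?filter=doi:"
            + "|".join(batch)
            + f"&per-page={batch_size}"
        )
    return urls


# Usage: 120 placeholder DOIs become 3 requests (50 + 50 + 20) instead of 120
dois = [f"https://doi.org/10.1234/example-{n}" for n in range(120)]
urls = doi_batch_urls(dois)
print(len(urls))  # 3
```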

#### ❌ DON'T: Ignore rate limits when using concurrency/threading

Using multiple threads WITHOUT respecting rate limits will get you rate-limited or banned.

#### ✅ DO: Respect rate limits even across concurrent requests

* Max 100 requests per second
* Daily limit: 100,000 credits (list requests cost 10 credits, singleton requests cost 1)

When using threading/async:

1. Implement rate limiting across ALL threads
2. Track requests per second globally
3. Get an API key for higher credit limits

### Quick Reference

#### Base URL and Authentication

```
Base: https://api.openalex.org
Auth: API key required (free at openalex.org/settings/api)
Rate: 100k credits/day with key, 100 credits/day without (list=10cr, singleton=1cr)
```

#### Get Higher Credit Limits

For higher daily credit limits, subscribe to [OpenAlex Premium](https://openalex.org/pricing) and use your API key:

```
https://api.openalex.org/works?api_key=YOUR_API_KEY
```

Academic researchers can often get increased limits for free—contact <support@openalex.org>.

#### Entity Endpoints

```
/works - 240M+ scholarly documents (articles, books, datasets)
/authors - Researcher profiles with disambiguated identities
/sources - Journals, repositories, conferences
/institutions - Universities, research organizations
/topics - Subject classifications (3-level hierarchy)
/publishers - Publishing organizations
/funders - Funding agencies
/text - Tag your own text with OpenAlex topics/keywords (POST)
```

#### Essential Query Parameters

```
api_key= - Your API key (required, get free at openalex.org/settings/api)
filter= - Filter results (see filter syntax below)
search= - Full-text search across title/abstract/fulltext
sort= - Sort results (e.g., cited_by_count:desc)
per-page= - Results per page (default: 25, max: 200)
page= - Page number for pagination
sample= - Get random results (e.g., sample=50)
seed= - Seed for reproducible sampling
select= - Limit returned fields (e.g., select=id,title)
group_by= - Aggregate results by a field
```
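
A small URL builder avoids hand-assembling query strings from these parameters. This is a hypothetical helper, not an official client. It assumes the parameter names in the table above and maps the Python keyword `per_page` to `per-page`. It leaves `:`, `,` and `|` unencoded so filters stay readable; `urllib.parse.urlencode` encodes everything else, such as spaces.

```python
from urllib.parse import urlencode


def openalex_url(endpoint, api_key=None, **params):
    """Build an OpenAlex list URL from the query parameters listed above.

    Python keywords can't contain "-", so per_page is sent as "per-page";
    group_by and api_key keep their underscores, as in the table.
    """
    query = {}
    for name, value in params.items():
        if value is None:
            continue
        key = "per-page" if name == "per_page" else name
        query[key] = value
    if api_key:
        query["api_key"] = api_key
    # Keep filter syntax characters readable; everything else is percent-encoded
    return f"https://api.openalex.org/{endpoint}?{urlencode(query, safe=':,|')}"


print(openalex_url("works", filter="publication_year:2020,is_oa:true",
                   sort="cited_by_count:desc", per_page=200, api_key="YOUR_API_KEY"))
# https://api.openalex.org/works?filter=publication_year:2020,is_oa:true&sort=cited_by_count:desc&per-page=200&api_key=YOUR_API_KEY
```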

### Filter Syntax

#### Basic Filtering

```
Single filter: ?filter=publication_year:2020
Multiple (AND): ?filter=publication_year:2020,is_oa:true
Values (OR): ?filter=type:article|book
Negation: ?filter=type:!article
```

#### Comparison Operators

```
Greater than: ?filter=cited_by_count:>100
Less than: ?filter=publication_year:<2020
Range: ?filter=publication_year:2020-2023
```

#### Multiple Values in Same Attribute

You can express AND within a single attribute in two ways:

```
Repeat filter: ?filter=institutions.country_code:us,institutions.country_code:gb
Use + symbol: ?filter=institutions.country_code:us+gb
```

Both mean "works with an author from the US AND an author from GB".

#### OR Queries (Pipe Separator)

```
Any of these: ?filter=institutions.country_code:us|gb|ca
Batch IDs: ?filter=doi:10.1/abc|10.2/def|10.3/ghi
```

You can combine up to 50 values with pipes.

#### Important: OR only works WITHIN a filter, not BETWEEN filters

```
POSSIBLE: ?filter=type:article|book (article OR book)
POSSIBLE: ?filter=publication_year:2020|2021,type:article ((2020 OR 2021) AND article)
NOT POSSIBLE: "(year=2020) OR (type=article)" in a single query
WORKAROUND: Make separate queries and combine results
```
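
The AND/OR rules above can be captured in a small filter-string builder. In this minimal sketch, `build_filter` is hypothetical. A Python list means OR within one attribute, separate keys are AND-ed, and the 50-value cap mirrors the limit stated above. Operators such as `>` or `!` are passed through as part of the value string. Cross-attribute OR is deliberately unsupported, so the usage shows the two-query workaround.

```python
def build_filter(filters):
    """Compose an OpenAlex filter= value.

    filters maps attribute -> value or list of values. A list becomes an OR
    within that attribute ("|"); separate attributes are AND-ed (",").
    """
    parts = []
    for attr, value in filters.items():
        if isinstance(value, (list, tuple)):
            if len(value) > 50:
                raise ValueError(f"{attr}: at most 50 OR-ed values per filter")
            value = "|".join(str(v) for v in value)
        parts.append(f"{attr}:{value}")
    return ",".join(parts)


# (2020 OR 2021) AND article: expressible in one filter
print(build_filter({"publication_year": [2020, 2021], "type": "article"}))
# publication_year:2020|2021,type:article

# "year=2020 OR type=article" is not expressible: run two queries, merge client-side
queries = [build_filter({"publication_year": 2020}), build_filter({"type": "article"})]
print(queries)  # ['publication_year:2020', 'type:article']
```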

### Common Patterns

#### Get Random Sample of Works

```
Small sample:
https://api.openalex.org/works?sample=20

Reproducible sample:
https://api.openalex.org/works?sample=20&seed=42

Large sample (10k+):
1. https://api.openalex.org/works?sample=1000&seed=1
2. https://api.openalex.org/works?sample=1000&seed=2
3. https://api.openalex.org/works?sample=1000&seed=3
...then deduplicate by ID
```

#### Search Works by Title/Abstract

```
Simple search:
https://api.openalex.org/works?search=machine+learning

Search specific field:
https://api.openalex.org/works?filter=title.search:CRISPR

Search + filter:
https://api.openalex.org/works?search=climate&filter=publication_year:2023
```

#### Find Works by Author (Two-Step Pattern)

```
Step 1 - Get author ID:
https://api.openalex.org/authors?search=Heather+Piwowar

Response includes: "id": "https://openalex.org/A5023888391"

Step 2 - Get their works:
https://api.openalex.org/works?filter=authorships.author.id:A5023888391

Alternative - Use ORCID directly:
https://api.openalex.org/works?filter=authorships.author.id:https://orcid.org/0000-0003-1613-5981
```

#### Find Works by Institution (Two-Step Pattern)

```
Step 1 - Get institution ID:
https://api.openalex.org/institutions?search=MIT

Response includes: "id": "https://openalex.org/I136199984"

Step 2 - Get their works:
https://api.openalex.org/works?filter=authorships.institutions.id:I136199984

Alternative - Use ROR directly:
https://api.openalex.org/works?filter=authorships.institutions.id:https://ror.org/042nb2s44
```

#### Get Highly Cited Recent Papers

```
https://api.openalex.org/works?filter=publication_year:>2020&sort=cited_by_count:desc&per-page=200
```

#### Get Open Access Works Only

```
All OA:
https://api.openalex.org/works?filter=is_oa:true

Gold OA only:
https://api.openalex.org/works?filter=open_access.oa_status:gold

Published OA version:
https://api.openalex.org/works?filter=has_oa_published_version:true
```

#### Filter by Multiple Criteria

```
Recent OA works about COVID from top institutions:
https://api.openalex.org/works?filter=publication_year:2022,is_oa:true,title.search:covid,authorships.institutions.id:I136199984|I27837315

Breaking down the filters:
- publication_year:2022 (recent)
- is_oa:true (open access)
- title.search:covid (about COVID)
- authorships.institutions.id:I136199984|I27837315 (MIT or Harvard)
```

#### Bulk Lookup by DOIs

```
Get specific works by DOI (efficient batch method):
https://api.openalex.org/works?filter=doi:https://doi.org/10.1371/journal.pone.0266781|https://doi.org/10.1371/journal.pone.0267149|https://doi.org/10.1371/journal.pone.0267890&per-page=50

Up to 50 DOIs per request. Use per-page=50 to ensure you get all results.
```

#### Get Works from Specific Journal

```
Step 1 - Get source ID:
https://api.openalex.org/sources?search=Nature

Response includes: "id": "https://openalex.org/S137773608"

Step 2 - Get works from that source:
https://api.openalex.org/works?filter=primary_location.source.id:S137773608
```

#### Aggregate/Group Data

```
Top topics by work count:
https://api.openalex.org/works?group_by=topics.id

Papers per year:
https://api.openalex.org/works?group_by=publication_year

Most prolific institutions:
https://api.openalex.org/works?group_by=authorships.institutions.id

Group with filters:
https://api.openalex.org/works?filter=publication_year:>2020&group_by=topics.id
```

#### Pagination for Large Result Sets

```
First page:
https://api.openalex.org/works?filter=publication_year:2023&per-page=200

Next pages:
https://api.openalex.org/works?filter=publication_year:2023&per-page=200&page=2
https://api.openalex.org/works?filter=publication_year:2023&per-page=200&page=3
...

The meta.count field tells you the total number of results.
Calculate pages needed: ceil(meta.count / per-page)
Note: page= only reaches the first 10,000 results of a list; beyond that,
use cursor paging (cursor=* on the first request, then meta.next_cursor).
```
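
Here is a sketch of cursor paging in Python, based on the OpenAlex paging documentation: start with `cursor=*`, then pass each response's `meta.next_cursor` until it is null. `iter_all_results` is hypothetical. The HTTP call is injected as `fetch`, so you can plug in your own client, for example one built on the backoff pattern in this guide. The usage runs against in-memory stand-in pages rather than the live API.

```python
def iter_all_results(fetch, base_params):
    """Yield every record of a list query using OpenAlex cursor paging.

    fetch(params) must perform the GET and return the parsed JSON.
    Paging stops when meta.next_cursor is null or a page comes back empty.
    """
    cursor = "*"  # "*" starts a new cursor
    while cursor:
        page = fetch(dict(base_params, cursor=cursor))
        results = page.get("results", [])
        if not results:
            break
        yield from results
        cursor = page["meta"].get("next_cursor")


# Usage with a fake fetch that serves three pages from memory
pages = {
    "*": {"meta": {"next_cursor": "c2"}, "results": [{"id": "W1"}, {"id": "W2"}]},
    "c2": {"meta": {"next_cursor": "c3"}, "results": [{"id": "W3"}]},
    "c3": {"meta": {"next_cursor": None}, "results": []},
}


def fake_fetch(params):
    return pages[params["cursor"]]


params = {"filter": "publication_year:2023", "per-page": 200}
ids = [r["id"] for r in iter_all_results(fake_fetch, params)]
print(ids)  # ['W1', 'W2', 'W3']
```

Because it is a generator, records stream out page by page. Your code can write them to disk without holding millions of works in memory.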

#### Select Specific Fields Only (Faster Responses)

```
Just IDs and titles:
https://api.openalex.org/works?select=id,title&per-page=200

Multiple fields:
https://api.openalex.org/works?select=id,title,publication_year,cited_by_count
```

#### Autocomplete for Type-Ahead

```
Fast autocomplete endpoint for building search UIs:

Authors:
https://api.openalex.org/autocomplete/authors?q=einst

Institutions:
https://api.openalex.org/autocomplete/institutions?q=stanford

Works:
https://api.openalex.org/autocomplete/works?q=neural+networks

Typically returns in ~200ms
```

#### Tag Your Own Text (/text endpoint)

```
POST or GET to classify your own content:

https://api.openalex.org/text?title=Machine+learning+for+drug+discovery

Returns topics, keywords, and concepts for your text.
Costs 1000 credits per request (limited to 1 req/sec).
Text must be 20-2000 characters.
```
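
This sketch calls `/text` via GET and checks the constraints listed above on the client side. `text_tagging_url` is hypothetical. It assumes the 20-2000 character rule applies to the `title` value passed here. Each call costs 1,000 credits and is limited to 1 request per second, so space out calls, for example with `time.sleep(1)` between them.

```python
from urllib.parse import urlencode


def text_tagging_url(title, api_key=None):
    """Build a GET URL for /text, enforcing the 20-2000 character rule above."""
    if not 20 <= len(title) <= 2000:
        raise ValueError(f"text must be 20-2000 characters, got {len(title)}")
    params = {"title": title}
    if api_key:
        params["api_key"] = api_key
    return "https://api.openalex.org/text?" + urlencode(params)


print(text_tagging_url("Machine learning for drug discovery"))
# https://api.openalex.org/text?title=Machine+learning+for+drug+discovery
```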

### Response Structure

#### List Endpoints

All list endpoints (/works, /authors, etc.) return:

```json
{
  "meta": {
    "count": 240523418,
    "db_response_time_ms": 42,
    "page": 1,
    "per_page": 25
  },
  "results": [
    { /* entity object */ },
    { /* entity object */ },
    ...
  ]
}
```

#### Single Entity Endpoints

Getting a single entity returns the object directly:

```
https://api.openalex.org/works/W2741809807
→ Returns a Work object directly (no meta/results wrapper)
```

#### Group By Responses

```json
{
  "meta": { "count": 100, ... },
  "group_by": [
    {
      "key": "https://openalex.org/T10001",
      "key_display_name": "Artificial Intelligence",
      "count": 15234
    },
    ...
  ]
}
```

### Performance Optimization Tips

#### 1. Use Maximum Page Size

```
SLOW: Default 25 per page = more API calls
FAST: ?per-page=200 (8x fewer API calls)
```

#### 2. Use Batch ID Lookups

```
SLOW: Loop through 50 DOIs, 50 API calls
FAST: One call with pipe-separated DOIs (up to 50)
```

#### 3. Select Only Fields You Need

```
SLOW: Full objects with all fields
FAST: ?select=id,title,publication_year
```

#### 4. Use Concurrent Requests with Rate Limiting

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
import requests

class RateLimiter:  # evenly spaced request slots shared by ALL threads
    def __init__(self, max_per_sec):
        self.interval = 1.0 / max_per_sec
        self.next_slot = time.monotonic()
        self.lock = threading.Lock()
    def wait(self):
        with self.lock:  # reserve the next free slot atomically
            now = time.monotonic()
            slot = max(now, self.next_slot)
            self.next_slot = slot + self.interval
        time.sleep(max(0.0, slot - now))  # sleep outside the lock

rate_limiter = RateLimiter(100)  # max 100 req/sec across all threads
def fetch_page(page_num):  # "..." stands for your list query URL
    rate_limiter.wait()  # block until this request's slot comes up
    return requests.get(f"...&page={page_num}&api_key=YOUR_KEY", timeout=30)

with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(fetch_page, range(1, 101)))
```

#### 5. Get an API Key for Heavy Usage

```
FREE: 100,000 credits/day
PREMIUM: Higher limits (varies by plan)

Get a free API key: https://openalex.org/settings/api
Upgrade to Premium: https://openalex.org/pricing
```

### Handling Errors

#### Common HTTP Status Codes

```
200 OK - Success
400 Bad Request - Invalid parameter (check your filter syntax)
403 Forbidden / 429 Too Many Requests - Rate limit exceeded (slow down, implement backoff)
404 Not Found - Entity doesn't exist
500 Server Error - Temporary issue (retry with backoff)
```

#### Exponential Backoff Pattern

```python
import time

import requests

def fetch_with_retry(url, max_retries=5):
    """GET url; retry rate limits, server errors and timeouts with exponential backoff."""
    for attempt in range(max_retries):
        last_try = attempt == max_retries - 1
        try:
            response = requests.get(url, timeout=30)
        except requests.exceptions.Timeout:
            if last_try:
                raise  # out of attempts: surface the timeout
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s
            continue
        if response.status_code in (403, 429) or response.status_code >= 500:
            # Rate limited (this guide mentions both 403 and 429) or server error
            if not last_try:
                time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s
            continue
        # Other errors (400, 404, ...) won't go away on retry: raise immediately
        response.raise_for_status()
        return response.json()
    raise RuntimeError(f"Failed after {max_retries} retries")
```

### Entity-Specific Filter Examples

#### Works Filters (Most Common)

```
authorships.author.id - Author's OpenAlex ID
authorships.institutions.id - Institution's OpenAlex ID
cited_by_count - Number of citations
is_oa - Is open access (true/false)
publication_year - Year published
primary_location.source.id - Source (journal) ID
topics.id - Topic ID
type - article, book, dataset, etc.
has_doi - Has a DOI (true/false)
has_fulltext - Has fulltext available (true/false)
```

#### Authors Filters

```
last_known_institution.id - Current/last institution
works_count - Number of works authored
cited_by_count - Total citations
orcid - ORCID identifier
```

#### Sources Filters

```
host_organization - Publisher/host
type - journal, repository, etc.
is_oa - Is open access
```

#### Institutions Filters

```
type - education, healthcare, company, etc.
country_code - Two-letter country code
continent - africa, asia, europe, etc.
```

### External ID Support

You can use external IDs directly in the API:

#### Works

```
DOI: /works/https://doi.org/10.7717/peerj.4375
PMID: /works/pmid:29844763
```

#### Authors

```
ORCID: /authors/https://orcid.org/0000-0003-1613-5981
```

#### Institutions

```
ROR: /institutions/https://ror.org/02y3ad647
```

#### Sources

```
ISSN: /sources/issn:0028-0836
```

### Advanced Tips

#### Reproducible Random Samples

Always use a seed for reproducible sampling:

```
https://api.openalex.org/works?sample=100&seed=42
```

Same seed = same results every time.

#### Finding Related Works

```
Get cited works:
1. Get work: /works/W2741809807
2. Response includes: "referenced_works": [...]
3. Fetch those: /works?filter=openalex_id:W123|W456|W789

Get citing works:
1. Get work: /works/W2741809807
2. Response includes: "cited_by_api_url"
3. Follow that URL
```
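
Step 3 of "Get cited works" can be batched the same way as DOI lookups. This hypothetical helper strips each `referenced_works` URI down to its short ID and ORs up to 50 IDs per `openalex_id` filter. This guide lists list requests at 10 credits and singletons at 1. At those costs, 75 references cost 20 credits as two list requests, versus 75 credits as singletons. The work below is a stand-in, not a real record.

```python
def referenced_works_urls(work, batch_size=50):
    """Turn a work's "referenced_works" (full OpenAlex URIs) into batched list URLs."""
    ids = [uri.rsplit("/", 1)[-1] for uri in work.get("referenced_works", [])]
    return [
        "https://api.openalex.org/works?filter=openalex_id:"
        + "|".join(ids[i:i + batch_size])
        + f"&per-page={batch_size}"
        for i in range(0, len(ids), batch_size)
    ]


# Usage with a stand-in for a /works/W2741809807 response holding 75 references
work = {"referenced_works": [f"https://openalex.org/W{n}" for n in range(1, 76)]}
urls = referenced_works_urls(work)
print(len(urls))  # 2 (50 IDs + 25 IDs)
```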

#### Filtering by Date Ranges

```
Exact year: ?filter=publication_year:2020
After: ?filter=publication_year:>2020
Before: ?filter=publication_year:<2020
Range: ?filter=publication_year:2018-2022
```

#### Complex Boolean Searches

The search parameter supports boolean operators:

```
AND: ?search=climate+AND+change
OR: ?search=climate+OR+weather
NOT: ?search=climate+NOT+politics
```

### Rate Limiting Best Practices

#### Without API Key

* 100 credits per day (for testing only)
* Max 100 requests per second
* Not suitable for production use

#### With Free API Key

* 100,000 credits per day
* Max 100 requests per second
* Get your free key at [openalex.org/settings/api](https://openalex.org/settings/api)

#### With Premium API Key

* Higher credit limits (varies by plan)
* Max 100 requests per second
* Contact <support@openalex.org> for academic waivers

#### Credit Costs

* Singleton requests (e.g., `/works/W123`): 1 credit
* List requests (e.g., `/works?filter=...`): 10 credits
* Text/Aboutness requests: 1,000 credits

#### Concurrent Requests Strategy

```
1. Track requests per second globally (not per thread)
2. Use a semaphore or rate limiter across threads
3. Add delays between batches if needed
4. Monitor for 403 or 429 responses (rate limit exceeded)
5. Back off if you hit limits
```

#### Daily Limit Management

With 100k credits/day limit:

* Singleton requests: up to 100,000/day
* List requests: up to 10,000/day
* Plan accordingly for large jobs
* Consider OpenAlex Premium for higher limits
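
Here is a back-of-the-envelope credit calculator based on the costs listed above. The helper names are hypothetical, and the job sizes are examples only. The calculator does not account for retries or failed requests.

```python
import math

# Credit costs as listed under "Credit Costs" above
SINGLETON_COST = 1
LIST_COST = 10
TEXT_COST = 1_000


def job_credits(singleton_requests=0, list_requests=0, text_requests=0):
    """Total credits a job consumes."""
    return (singleton_requests * SINGLETON_COST
            + list_requests * LIST_COST
            + text_requests * TEXT_COST)


def days_needed(credits, daily_limit=100_000):
    """Whole days of quota a job needs at the given daily credit limit."""
    return math.ceil(credits / daily_limit)


# Extract 1,500,000 works as list requests at per-page=200
# (via cursor paging, since page= stops at 10,000 results)
list_calls = math.ceil(1_500_000 / 200)
credits = job_credits(list_requests=list_calls)
print(list_calls, credits, days_needed(credits))  # 7500 75000 1

# The same 1,500,000 works fetched one by one as singletons
print(days_needed(job_credits(singleton_requests=1_500_000)))  # 15
```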

### Common Mistakes to Avoid

1. ❌ Using page numbers for sampling → ✅ Use `?sample=`
2. ❌ Filtering by entity names → ✅ Get IDs first, then filter
3. ❌ Default page size → ✅ Use `per-page=200`
4. ❌ Sequential ID lookups → ✅ Batch with the pipe (`|`) operator
5. ❌ No error handling → ✅ Implement retry with backoff
6. ❌ Ignoring rate limits in threads → ✅ Global rate limiting
7. ❌ Trying to group by multiple fields → ✅ Multiple queries + combine
8. ❌ No API key for heavy usage → ✅ Get an API key for higher credit limits
9. ❌ Fetching all fields → ✅ Use `select=` for needed fields only
10. ❌ Assuming instant responses → ✅ Add timeouts (30s recommended)
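
Several of these fixes (items 3, 4, 9, and 10) combine naturally when looking up many works at once. The sketch below uses only the Python standard library and assumes the `doi` filter on `/works` with pipe-separated values, a batch size of 50 values per filter (check the API docs for the current limit), and the hypothetical helper names `batched_lookup_urls` and `fetch_json`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WORKS = "https://api.openalex.org/works"


def batched_lookup_urls(dois, batch_size=50, fields=("id", "doi", "display_name")):
    """Build one list request per batch of DOIs instead of one request per DOI."""
    urls = []
    for start in range(0, len(dois), batch_size):
        batch = dois[start:start + batch_size]
        params = {
            "filter": "doi:" + "|".join(batch),  # pipe = OR inside one filter (item 4)
            "per-page": 200,                     # large pages (item 3)
            "select": ",".join(fields),          # only the needed fields (item 9)
        }
        # Keep ':', '/', '|' and ',' readable instead of percent-encoding them.
        urls.append(f"{WORKS}?{urlencode(params, safe=':/|,')}")
    return urls


def fetch_json(url, timeout=30):
    """GET a URL with an explicit timeout (item 10) and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Each URL from `batched_lookup_urls` can then be passed to `fetch_json`, ideally through a shared rate limiter with retries as described under Concurrent Requests Strategy.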

### Need More Info?

* Full documentation: <https://docs.openalex.org>
* API Overview: <https://docs.openalex.org/how-to-use-the-api/api-overview>
* Entity schemas: <https://docs.openalex.org/api-entities>
* Help: <https://openalex.org/help>
* User group: <https://groups.google.com/g/openalex-users>

### For Premium Features

If you need:

* More than 100k credits/day
* Faster than daily snapshot updates
* Commercial support
* SLA guarantees

See: <https://openalex.org/pricing>

***

Last updated: 2025-10-13\
Maintained for: LLM agents, AI applications, and automated tools

---

# Quickstart tutorial

Let's use the OpenAlex API to get journal articles and books published by authors at Stanford University. We'll limit our search to works published between 2010 and 2020. Since OpenAlex is free and openly available, these examples work without any login or account creation. :thumbsup:

{% hint style="info" %}
If you open these examples in a web browser, they will look *much* better if you have a browser plug-in such as [JSONVue](https://chrome.google.com/webstore/detail/jsonvue/chklaanhfefbnpoihckbnefhakgolnmc) installed.
{% endhint %}

### 1. Find the institution

You can use the [institutions](https://docs.openalex.org/api-entities/institutions) endpoint to learn about universities and research centers. OpenAlex has a powerful search feature that searches across 108,000 institutions.

Let's use it to search for Stanford University:

* Find Stanford University\
  [`https://api.openalex.org/institutions?search=stanford`](https://api.openalex.org/institutions?search=stanford)

Our first result looks correct (yay!):

```json
{
  "id": "https://openalex.org/I97018004",
  "ror": "https://ror.org/00f54p054",
  "display_name": "Stanford University",
  "country_code": "US",
  "type": "education",
  "homepage_url": "http://www.stanford.edu/"
  // other fields removed
}
```

We can use the ID `https://openalex.org/I97018004` in that result to find out more.
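
If you are scripting these steps rather than clicking through them, the ID can be pulled out of the search response in code. This is a minimal sketch assuming the list response keeps its matches in a `results` array, as OpenAlex list endpoints do; `top_institution_id` is a hypothetical helper, and the input mirrors the trimmed result above.

```python
def top_institution_id(search_response: dict) -> tuple[str, str]:
    """Return (full ID, short ID) for the first hit of an institutions search.

    The top hit is not guaranteed to be the institution you meant, so check
    its display_name before relying on it.
    """
    results = search_response.get("results", [])
    if not results:
        raise ValueError("the search returned no institutions")
    full_id = results[0]["id"]              # e.g. "https://openalex.org/I97018004"
    short_id = full_id.rsplit("/", 1)[-1]   # e.g. "I97018004"
    return full_id, short_id


# Trimmed version of the response from /institutions?search=stanford
response = {
    "results": [
        {
            "id": "https://openalex.org/I97018004",
            "display_name": "Stanford University",
            "country_code": "US",
        }
    ]
}
full_id, short_id = top_institution_id(response)
```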

### 2. Find articles (works) associated with Stanford University

The [works](https://docs.openalex.org/api-entities/works) endpoint contains over 240 million articles, books, and theses :astonished:. We can filter it to show works associated with Stanford.

* Show works where at least one author is associated with Stanford University\
  [`https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004`](https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004)

This is just one of the 50+ ways that you can filter works!

### 3. Filter works by publication year

Right now the list shows records for all years. Let's narrow it down to works published between 2010 and 2020, sorted from newest to oldest.

* Show works with publication years 2010 to 2020, associated with Stanford University\
  [`https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004,publication_year:2010-2020&sort=publication_date:desc`](https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004,publication_year:2010-2020&sort=publication_date:desc)
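
The URLs in steps 2 and 3 follow a simple pattern: each filter is written as `name:value`, filters are joined by commas (all of them must match), and other parameters such as `sort` sit alongside `filter`. Below is a sketch of a URL builder under those assumptions; `works_url` is a hypothetical helper, not part of any OpenAlex library.

```python
from urllib.parse import urlencode

API = "https://api.openalex.org"
STANFORD = "https://openalex.org/I97018004"


def works_url(filters: dict[str, str], **params: str) -> str:
    """Build a /works URL; comma-separated filters must all match (logical AND)."""
    query = {"filter": ",".join(f"{name}:{value}" for name, value in filters.items())}
    query.update(params)
    # Leave ':', '/' and ',' unescaped so the URL matches the ones in this tutorial.
    return f"{API}/works?{urlencode(query, safe=':/,')}"


step2 = works_url({"institutions.id": STANFORD})
step3 = works_url(
    {"institutions.id": STANFORD, "publication_year": "2010-2020"},
    sort="publication_date:desc",
)
```

Parameter names containing a hyphen cannot be written as Python keyword arguments, so a parameter such as `group-by` would be passed as `**{"group-by": "publication_year"}`.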

### 4. Group works by publication year to show counts by year

Finally, we can group the results by publication year to get the answer we were after: the number of works produced by Stanford each year from 2010 to 2020. There are more than 30 ways to group records in OpenAlex, including by publisher, journal, and open access status.

* Group records by publication year\
  [`https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004,publication_year:2010-2020&group-by=publication_year`](https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004,publication_year:2010-2020&group-by=publication_year)

That gives a result like this:

```json
[
  {
    "key": "2020",
    "key_display_name": "2020",
    "count": 18627
  },
  {
    "key": "2019",
    "key_display_name": "2019",
    "count": 15933
  },
  {
    "key": "2017",
    "key_display_name": "2017",
    "count": 14789
  },
  ...
]
```
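
To work with the grouped counts in code, it helps to turn the list into a year-to-count mapping. This sketch assumes entries shaped like the sample above (in the full API response the list sits under a `group_by` key); `counts_by_year` is a hypothetical helper. Because `key` is a string, it is converted to an integer so the years sort chronologically rather than by count.

```python
def counts_by_year(groups: list[dict]) -> dict[int, int]:
    """Map publication year -> number of works, ordered from oldest to newest."""
    return dict(sorted((int(g["key"]), g["count"]) for g in groups))


# The three entries shown in the sample result above.
sample = [
    {"key": "2020", "key_display_name": "2020", "count": 18627},
    {"key": "2019", "key_display_name": "2019", "count": 15933},
    {"key": "2017", "key_display_name": "2017", "count": 14789},
]
by_year = counts_by_year(sample)
total = sum(by_year.values())
```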

There you have it! This same technique can be applied to hundreds of questions about scholarly data. The data you received is under a [CC0 license](https://creativecommons.org/publicdomain/zero/1.0/), so not only did you access it easily, you can share it freely! :tada:

## What's next?

Jump into an area of OpenAlex that interests you:

* [Works](https://docs.openalex.org/api-entities/works)
* [Authors](https://docs.openalex.org/api-entities/authors)
* [Sources](https://docs.openalex.org/api-entities/sources)
* [Institutions](https://docs.openalex.org/api-entities/institutions)
* [Topics](https://docs.openalex.org/api-entities/topics)
* [Publishers](https://docs.openalex.org/api-entities/publishers)
* [Funders](https://docs.openalex.org/api-entities/funders)

And check out our [tutorials](https://docs.openalex.org/additional-help/tutorials) page for some hands-on examples!

---

# Entities overview

The OpenAlex dataset describes scholarly *entities* and how those entities are connected to each other. Together, these make a huge web (or, more technically, a heterogeneous directed [graph](https://en.wikipedia.org/wiki/Graph_theory)) of hundreds of millions of entities and billions of connections between them all.

Learn more about the OpenAlex entities:

* [Works](https://docs.openalex.org/api-entities/works): Scholarly documents like journal articles, books, datasets, and theses
* [Authors](https://docs.openalex.org/api-entities/authors): People who create works
* [Sources](https://docs.openalex.org/api-entities/sources): Where works are hosted (such as journals, conferences, and repositories)
* [Institutions](https://docs.openalex.org/api-entities/institutions): Universities and other organizations to which authors claim affiliations
* [Topics](https://docs.openalex.org/api-entities/topics): Topics assigned to works
* [Publishers](https://docs.openalex.org/api-entities/publishers): Companies and organizations that distribute works
* [Funders](https://docs.openalex.org/api-entities/funders): Organizations that fund research
* [Geo](https://docs.openalex.org/api-entities/geo): Where things are in the world
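
Each entity type above has its own API endpoint, and OpenAlex IDs carry a one-letter prefix for their type (the `I` in `I97018004`, for example). The sketch below maps an ID to its singleton API URL. The prefix table is an assumption based on OpenAlex's ID convention (W, A, S, I, T, P, F), Geo is left out of this sketch, and `entity_api_url` is a hypothetical helper.

```python
# Assumed one-letter ID prefixes for each entity endpoint.
ENTITY_BY_PREFIX = {
    "W": "works",
    "A": "authors",
    "S": "sources",
    "I": "institutions",
    "T": "topics",
    "P": "publishers",
    "F": "funders",
}


def entity_api_url(openalex_id: str) -> str:
    """Turn 'https://openalex.org/I97018004' or 'I97018004' into its API URL."""
    key = openalex_id.rstrip("/").rsplit("/", 1)[-1]  # accept full URL or short ID
    entity = ENTITY_BY_PREFIX.get(key[:1])
    if entity is None:
        raise ValueError(f"unrecognised OpenAlex ID: {openalex_id!r}")
    return f"https://api.openalex.org/{entity}/{key}"
```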

---