openalex-mcp-server 1.0.1

@@ -0,0 +1,973 @@
1
+ # Overview
2
+
3
+ [**OpenAlex**](https://openalex.org) is a fully open catalog of the global research system. It's named after the [ancient Library of Alexandria](https://en.wikipedia.org/wiki/Library_of_Alexandria) and made by the nonprofit [OurResearch](https://ourresearch.org/).
4
+
5
+ This is the **technical documentation for OpenAlex**, including the [**OpenAlex API**](https://docs.openalex.org/how-to-use-the-api/api-overview) and the [**data snapshot**](https://docs.openalex.org/download-all-data/openalex-snapshot). Here, you can learn how to set up your code to access OpenAlex's data. If you want to explore the data as a human, you may be more interested in [**OpenAlex Web**](https://help.openalex.org).
6
+
7
+ ## Data
8
+
9
+ The OpenAlex dataset describes scholarly [*entities*](https://docs.openalex.org/api-entities/entities-overview) and how those entities are connected to each other. Types of entities include [works](https://docs.openalex.org/api-entities/works), [authors](https://docs.openalex.org/api-entities/authors), [sources](https://docs.openalex.org/api-entities/sources), [institutions](https://docs.openalex.org/api-entities/institutions), [topics](https://docs.openalex.org/api-entities/topics), [publishers](https://docs.openalex.org/api-entities/publishers), and [funders](https://docs.openalex.org/api-entities/funders).
10
+
11
+ Together, these make a huge web (or more technically, heterogeneous directed [graph](https://en.wikipedia.org/wiki/Graph_theory)) of hundreds of millions of entities and billions of connections between them all.
12
+
13
+ Learn more at our general help center article: [About the data](https://help.openalex.org/hc/en-us/articles/24397285563671-About-the-data)
14
+
15
+ ## Access
16
+
17
+ We offer a fast, modern REST API to get OpenAlex data programmatically. It's free but requires an API key (also free). Get yours at [openalex.org/settings/api](https://openalex.org/settings/api). With your free key, you get 100,000 credits per day. [Learn more](https://docs.openalex.org/how-to-use-the-api/api-overview)
18
+
19
+ There is also a complete database snapshot available to download. [Learn more about the data snapshot here.](https://docs.openalex.org/download-all-data/openalex-snapshot)
20
+
21
+ The API has a limit of 100,000 credits per day, and the snapshot is updated monthly. If you need a higher limit, or more frequent updates, please look into [**OpenAlex Premium.**](https://openalex.org/pricing)
22
+
23
+ The web interface for OpenAlex, built directly on top of the API, is the quickest and easiest way to [get started with OpenAlex](https://help.openalex.org/getting-started).
24
+
25
+ ## Why OpenAlex?
26
+
27
+ OpenAlex offers an open replacement for industry-standard scientific knowledge bases like Elsevier's Scopus and Clarivate's Web of Science. [Compared to](https://openalex.org/about#comparison) these paywalled services, OpenAlex offers significant advantages in terms of inclusivity, affordability, and availability.
28
+
29
+ OpenAlex is:
30
+
31
+ * *Big —* We have about twice the coverage of the other services, and have significantly better coverage of non-English works and works from the Global South.
32
+ * *Easy —* Our service is fast, modern, and well-documented.
33
+ * *Open —* Our complete dataset is free under the CC0 license, which allows for transparency and reuse.
34
+
35
+ Many people and organizations have already found great value using OpenAlex. Have a look at the [Testimonials](https://openalex.org/testimonials) to hear what they've said!
36
+
37
+ ## Contact
38
+
39
+ For tech support and bug reports, please visit our [help page](https://openalex.org/help). You can also join the [OpenAlex user group](https://groups.google.com/g/openalex-users), and follow us on [Twitter (@OpenAlex\_org)](https://twitter.com/openalex_org) and [Mastodon](https://mastodon.social/@OpenAlex).
40
+
41
+ ## Citation
42
+
43
+ If you use OpenAlex in research, please cite [this paper](https://arxiv.org/abs/2205.01833):
44
+
45
+ > Priem, J., Piwowar, H., & Orr, R. (2022). *OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts*. ArXiv. <https://arxiv.org/abs/2205.01833>
46
+
47
+ ---
48
+
49
+ # API Guide for LLMs
50
+
51
+ ## OpenAlex API Guide for LLM Agents and AI Applications
52
+
53
+ OpenAlex is a fully open catalog of scholarly works, authors, sources, institutions, topics, publishers, and funders.
+
+ * Base URL: <https://api.openalex.org>
+ * Documentation: <https://docs.openalex.org>
+ * API key: required (free at openalex.org/settings/api); 100,000 credits/day with a key
54
+
55
+ ### CRITICAL GOTCHAS - Read These First!
56
+
57
+ #### ❌ DON'T: Create ad-hoc sampling by using random page numbers
58
+
59
+ WRONG: ?page=5, ?page=17, ?page=42 to get "random" results. This is NOT random sampling and will bias your results!
60
+
61
+ #### ✅ DO: Use the ?sample parameter for random sampling
62
+
63
+ CORRECT: <https://api.openalex.org/works?sample=20>
+
+ For consistent results, add a seed: ?sample=20\&seed=123
64
+
65
+ #### ❌ DON'T: Try to sample large datasets (10k+) in one request
66
+
67
+ The sample parameter maxes out at reasonable sizes for a single request.
68
+
69
+ #### ✅ DO: Use multiple samples with different seeds, then deduplicate
70
+
71
+ For large random samples (10k+ records):
72
+
73
+ 1. Make multiple sample requests with different seeds
74
+ 2. Combine results
75
+ 3. Deduplicate by ID
+
+ Example:
+
+ * ?sample=1000\&seed=1
+ * ?sample=1000\&seed=2
+ * ?sample=1000\&seed=3
+
+ Then deduplicate the combined results by checking work IDs. See: <https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/sample-entity-lists>
80
+
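The combine-and-deduplicate step can be sketched as follows. This is a minimal illustration, not an official client: `sample_urls` and `merge_samples` are hypothetical helper names, and fetching each URL (including paging through its results) is left to the caller.

```python
from typing import Iterable

BASE_URL = "https://api.openalex.org/works"


def sample_urls(sample_size: int, seeds: Iterable[int]) -> list[str]:
    """Build one sample URL per seed (hypothetical helper)."""
    return [f"{BASE_URL}?sample={sample_size}&seed={seed}" for seed in seeds]


def merge_samples(batches: Iterable[list[dict]]) -> list[dict]:
    """Combine result batches, keeping the first record seen for each ID."""
    seen: dict[str, dict] = {}
    for batch in batches:
        for record in batch:
            # Each work carries an "id" field; duplicates across seeds are skipped.
            seen.setdefault(record["id"], record)
    return list(seen.values())
```

Pass `merge_samples` the `results` lists from each response. Because different seeds can return overlapping works, the merged list may be smaller than the sum of the batches.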
81
+ #### ❌ DON'T: Search/filter by entity names directly
82
+
83
+ WRONG: /works?filter=author\_name:Einstein
+
+ Entity names are ambiguous, so this won't work!
84
+
85
+ #### ✅ DO: Use two-step lookup pattern for related entities
86
+
87
+ CORRECT two-step process:
88
+
89
+ 1. Find the entity ID: /authors?search=einstein (the response shows an ID like "A5023888391" or a full URI)
90
+ 2. Use ID to filter: /works?filter=authorships.author.id:A5023888391
91
+
92
+ Why? Names are ambiguous. "MIT" could be many institutions. IDs are unique. This applies to: authors, institutions, sources, topics, publishers, funders.
93
+
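A rough sketch of the two-step pattern as URL-building helpers. The function names are illustrative assumptions, and the HTTP calls themselves are omitted so the logic stays easy to check.

```python
from urllib.parse import urlencode

API = "https://api.openalex.org"


def search_url(entity: str, name: str) -> str:
    """Step 1: URL that searches an entity type (e.g. 'authors') by name."""
    return f"{API}/{entity}?{urlencode({'search': name})}"


def short_id(openalex_uri: str) -> str:
    """Turn 'https://openalex.org/A5023888391' into 'A5023888391'."""
    return openalex_uri.rstrip("/").rsplit("/", 1)[-1]


def works_by_author_url(author_id: str) -> str:
    """Step 2: URL that filters works by an already-resolved author ID."""
    return f"{API}/works?filter=authorships.author.id:{short_id(author_id)}"
```

In practice you would fetch `search_url("authors", "Heather Piwowar")`, check that the top result really is the person you mean, and pass its `id` to `works_by_author_url`.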
94
+ #### ❌ DON'T: Try to group by multiple dimensions in one query
95
+
96
+ WRONG: You cannot do SQL-style "GROUP BY topic, year" in a single API call.
97
+
98
+ #### ✅ DO: Make multiple queries and combine results client-side
99
+
100
+ To analyze by topic AND year (or any two dimensions):
101
+
102
+ 1. Make one query per year: ?filter=publication\_year:2020\&group\_by=topics.id
103
+ 2. Repeat for 2021, 2022, etc.
104
+ 3. Combine results in your code
+
+ The API only supports one group\_by per request.
105
+
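One way to combine the per-year queries client-side is sketched below. It assumes each response has the `group_by` list of `key`/`count` objects shown in the Response Structure section; `topic_counts_url` and `combine_by_year` are hypothetical names.

```python
API = "https://api.openalex.org/works"


def topic_counts_url(year: int) -> str:
    """One query per year, grouped by topic."""
    return f"{API}?filter=publication_year:{year}&group_by=topics.id"


def combine_by_year(responses: dict[int, dict]) -> dict[tuple[int, str], int]:
    """Flatten {year: response_json} into {(year, topic_key): count}."""
    table: dict[tuple[int, str], int] = {}
    for year, response in responses.items():
        for group in response["group_by"]:
            table[(year, group["key"])] = group["count"]
    return table
```

The resulting dictionary is effectively the "GROUP BY topic, year" table, built from one request per year.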
106
+ #### ❌ DON'T: Ignore API errors or retry immediately on failure
107
+
108
+ API errors are common, especially at scale. Immediate retries can make things worse.
109
+
110
+ #### ✅ DO: Implement exponential backoff for retries
111
+
112
+ When you get errors (429 rate limit, 500 server error, timeouts):
113
+
114
+ 1. Catch the error
115
+ 2. Wait before retrying (1s, 2s, 4s, 8s, etc.)
116
+ 3. Include a max retry limit (e.g., 5 attempts)
117
+ 4. Log failures for debugging
118
+
119
+ #### ❌ DON'T: Use default page sizes for bulk extraction
120
+
121
+ Default is only 25 results per page. Slow for large extracts!
122
+
123
+ #### ✅ DO: Use maximum page size (200) for bulk data extraction
124
+
125
+ FAST: ?per-page=200 (8x fewer API calls than the default of 25)
126
+
127
+ #### ❌ DON'T: Make sequential API calls for lists of known IDs
128
+
129
+ SLOW: Loop through 100 DOIs making 100 separate API calls.
130
+
131
+ #### ✅ DO: Use the OR filter (pipe |) for batch ID lookups
132
+
133
+ FAST: Combine up to 50 IDs in one query using the pipe separator: `/works?filter=doi:https://doi.org/10.1371/journal.pone.0266781|https://doi.org/10.1371/journal.pone.0267149|...` You can include up to 50 values per filter. Use per-page=50 to get all results. See: <https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/filter-entity-lists#addition-or>
134
+
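For lists longer than 50 IDs, the batching can be sketched like this. The helper names are assumptions for illustration; the 50-value limit and `per-page=50` come from the guidance above.

```python
API = "https://api.openalex.org/works"
MAX_VALUES = 50  # maximum pipe-separated values per filter


def chunked(items: list[str], size: int = MAX_VALUES):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_doi_urls(dois: list[str]) -> list[str]:
    """Build one filter URL per group of up to 50 DOIs."""
    return [
        f"{API}?filter=doi:{'|'.join(chunk)}&per-page={MAX_VALUES}"
        for chunk in chunked(dois)
    ]
```

With 120 DOIs this produces three URLs (50, 50, and 20 DOIs) instead of 120 separate requests.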
135
+ #### ❌ DON'T: Ignore rate limits when using concurrency/threading
136
+
137
+ Using multiple threads WITHOUT respecting rate limits will get you rate-limited or banned.
138
+
139
+ #### ✅ DO: Respect rate limits even across concurrent requests
140
+
141
+ * Max 100 requests per second
142
+ * Daily limit: 100,000 credits (list requests cost 10 credits, singleton requests cost 1)
143
+
144
+ When using threading/async:
145
+
146
+ 1. Implement rate limiting across ALL threads
147
+ 2. Track requests per second globally
148
+ 3. Get an API key for higher credit limits
149
+
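A global limiter shared by all threads could look roughly like the sketch below. `RateLimiter` is a hypothetical helper written for this guide, not part of any OpenAlex library; it spaces calls evenly so the combined request rate stays at or under the configured maximum.

```python
import threading
import time


class RateLimiter:
    """Space calls at least 1/max_per_second apart across all threads."""

    def __init__(self, max_per_second: float):
        self._interval = 1.0 / max_per_second
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it
        # so other threads can reserve their own slots in the meantime.
        with self._lock:
            now = time.monotonic()
            slot = max(self._next_slot, now)
            self._next_slot = slot + self._interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)
```

Create one instance (for example `RateLimiter(100)`) at module level and call `wait()` immediately before every request in every worker thread.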
150
+ ### Quick Reference
151
+
152
+ #### Base URL and Authentication
153
+
154
+ ```
155
+ Base: https://api.openalex.org
156
+ Auth: API key required (free at openalex.org/settings/api)
157
+ Rate: 100k credits/day with key, 100 credits/day without (list=10cr, singleton=1cr)
158
+ ```
159
+
160
+ #### Get Higher Credit Limits
161
+
162
+ For higher daily credit limits, subscribe to [OpenAlex Premium](https://openalex.org/pricing) and use your API key:
163
+
164
+ ```
165
+ https://api.openalex.org/works?api_key=YOUR_API_KEY
166
+ ```
167
+
168
+ Academic researchers can often get increased limits for free—contact <support@openalex.org>.
169
+
170
+ #### Entity Endpoints
171
+
172
+ ```
173
+ /works - 240M+ scholarly documents (articles, books, datasets)
174
+ /authors - Researcher profiles with disambiguated identities
175
+ /sources - Journals, repositories, conferences
176
+ /institutions - Universities, research organizations
177
+ /topics - Subject classifications (3-level hierarchy)
178
+ /publishers - Publishing organizations
179
+ /funders - Funding agencies
180
+ /text - Tag your own text with OpenAlex topics/keywords (POST)
181
+ ```
182
+
183
+ #### Essential Query Parameters
184
+
185
+ ```
186
+ api_key= - Your API key (required, get free at openalex.org/settings/api)
187
+ filter= - Filter results (see filter syntax below)
188
+ search= - Full-text search across title/abstract/fulltext
189
+ sort= - Sort results (e.g., cited_by_count:desc)
190
+ per-page= - Results per page (default: 25, max: 200)
191
+ page= - Page number for pagination
192
+ sample= - Get random results (e.g., sample=50)
193
+ seed= - Seed for reproducible sampling
194
+ select= - Limit returned fields (e.g., select=id,title)
195
+ group_by= - Aggregate results by a field
196
+ ```
197
+
198
+ ### Filter Syntax
199
+
200
+ #### Basic Filtering
201
+
202
+ ```
203
+ Single filter: ?filter=publication_year:2020
204
+ Multiple (AND): ?filter=publication_year:2020,is_oa:true
205
+ Values (OR): ?filter=type:article|book
206
+ Negation: ?filter=type:!article
207
+ ```
208
+
209
+ #### Comparison Operators
210
+
211
+ ```
212
+ Greater than: ?filter=cited_by_count:>100
213
+ Less than: ?filter=publication_year:<2020
214
+ Range: ?filter=publication_year:2020-2023
215
+ ```
216
+
217
+ #### Multiple Values in Same Attribute
218
+
219
+ You can express AND within a single attribute two ways:
220
+
221
+ ```
222
+ Repeat filter: ?filter=institutions.country_code:us,institutions.country_code:gb
223
+ Use + symbol: ?filter=institutions.country_code:us+gb
224
+ ```
225
+
226
+ Both mean: "works with author from US AND author from GB"
227
+
228
+ #### OR Queries (Pipe Separator)
229
+
230
+ ```
231
+ Any of these: ?filter=institutions.country_code:us|gb|ca
232
+ Batch IDs: ?filter=doi:10.1/abc|10.2/def|10.3/ghi
233
+ ```
234
+
235
+ You can combine up to 50 values with pipes.
236
+
237
+ #### Important: OR only works WITHIN a filter, not BETWEEN filters
238
+
239
+ ```
240
+ POSSIBLE: ?filter=type:article|book (article OR book)
241
+ NOT POSSIBLE: Cannot do "(year=2020) OR (type=article)" across two different filters
242
+ WORKAROUND: Make separate queries and combine results
243
+ ```
244
+
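The comma (AND) and pipe (OR) rules can be captured in a small helper. This is only a sketch: `build_filter` is a hypothetical name, and it does no validation of field names or values.

```python
def build_filter(conditions: dict) -> str:
    """Join conditions with commas (AND); list values are joined with '|' (OR)."""
    parts = []
    for field, value in conditions.items():
        if isinstance(value, (list, tuple)):
            # Several values for one field become an OR within that filter.
            value = "|".join(str(v) for v in value)
        elif isinstance(value, bool):
            # The API expects lowercase true/false.
            value = "true" if value else "false"
        parts.append(f"{field}:{value}")
    return ",".join(parts)
```

For example, `build_filter({"institutions.country_code": ["us", "gb", "ca"], "is_oa": True})` gives `institutions.country_code:us|gb|ca,is_oa:true`, which goes after `?filter=`.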
245
+ ### Common Patterns
246
+
247
+ #### Get Random Sample of Works
248
+
249
+ ```
250
+ Small sample:
251
+ https://api.openalex.org/works?sample=20
252
+
253
+ Reproducible sample:
254
+ https://api.openalex.org/works?sample=20&seed=42
255
+
256
+ Large sample (10k+):
257
+ 1. https://api.openalex.org/works?sample=1000&seed=1
258
+ 2. https://api.openalex.org/works?sample=1000&seed=2
259
+ 3. https://api.openalex.org/works?sample=1000&seed=3
260
+ ...then deduplicate by ID
261
+ ```
262
+
263
+ #### Search Works by Title/Abstract
264
+
265
+ ```
266
+ Simple search:
267
+ https://api.openalex.org/works?search=machine+learning
268
+
269
+ Search specific field:
270
+ https://api.openalex.org/works?filter=title.search:CRISPR
271
+
272
+ Search + filter:
273
+ https://api.openalex.org/works?search=climate&filter=publication_year:2023
274
+ ```
275
+
276
+ #### Find Works by Author (Two-Step Pattern)
277
+
278
+ ```
279
+ Step 1 - Get author ID:
280
+ https://api.openalex.org/authors?search=Heather+Piwowar
281
+
282
+ Response includes: "id": "https://openalex.org/A5023888391"
283
+
284
+ Step 2 - Get their works:
285
+ https://api.openalex.org/works?filter=authorships.author.id:A5023888391
286
+
287
+ Alternative - Use ORCID directly:
288
+ https://api.openalex.org/works?filter=authorships.author.id:https://orcid.org/0000-0003-1613-5981
289
+ ```
290
+
291
+ #### Find Works by Institution (Two-Step Pattern)
292
+
293
+ ```
294
+ Step 1 - Get institution ID:
295
+ https://api.openalex.org/institutions?search=MIT
296
+
297
+ Response includes: "id": "https://openalex.org/I136199984"
298
+
299
+ Step 2 - Get their works:
300
+ https://api.openalex.org/works?filter=authorships.institutions.id:I136199984
301
+
302
+ Alternative - Use ROR directly:
303
+ https://api.openalex.org/works?filter=authorships.institutions.id:https://ror.org/042nb2s44
304
+ ```
305
+
306
+ #### Get Highly Cited Recent Papers
307
+
308
+ ```
309
+ https://api.openalex.org/works?filter=publication_year:>2020&sort=cited_by_count:desc&per-page=200
310
+ ```
311
+
312
+ #### Get Open Access Works Only
313
+
314
+ ```
315
+ All OA:
316
+ https://api.openalex.org/works?filter=is_oa:true
317
+
318
+ Gold OA only:
319
+ https://api.openalex.org/works?filter=open_access.oa_status:gold
320
+
321
+ Published OA version:
322
+ https://api.openalex.org/works?filter=has_oa_published_version:true
323
+ ```
324
+
325
+ #### Filter by Multiple Criteria
326
+
327
+ ```
328
+ Recent OA works about COVID from top institutions:
329
+ https://api.openalex.org/works?filter=publication_year:2022,is_oa:true,title.search:covid,authorships.institutions.id:I136199984|I27837315
330
+
331
+ Breaking down the filters:
332
+ - publication_year:2022 (recent)
333
+ - is_oa:true (open access)
334
+ - title.search:covid (about COVID)
335
+ - authorships.institutions.id:I136199984|I27837315 (MIT or Harvard)
336
+ ```
337
+
338
+ #### Bulk Lookup by DOIs
339
+
340
+ ```
341
+ Get specific works by DOI (efficient batch method):
342
+ https://api.openalex.org/works?filter=doi:https://doi.org/10.1371/journal.pone.0266781|https://doi.org/10.1371/journal.pone.0267149|https://doi.org/10.1371/journal.pone.0267890&per-page=50
343
+
344
+ Up to 50 DOIs per request. Use per-page=50 to ensure you get all results.
345
+ ```
346
+
347
+ #### Get Works from Specific Journal
348
+
349
+ ```
350
+ Step 1 - Get source ID:
351
+ https://api.openalex.org/sources?search=Nature
352
+
353
+ Response includes: "id": "https://openalex.org/S137773608"
354
+
355
+ Step 2 - Get works from that source:
356
+ https://api.openalex.org/works?filter=primary_location.source.id:S137773608
357
+ ```
358
+
359
+ #### Aggregate/Group Data
360
+
361
+ ```
362
+ Top topics by work count:
363
+ https://api.openalex.org/works?group_by=topics.id
364
+
365
+ Papers per year:
366
+ https://api.openalex.org/works?group_by=publication_year
367
+
368
+ Most prolific institutions:
369
+ https://api.openalex.org/works?group_by=authorships.institutions.id
370
+
371
+ Group with filters:
372
+ https://api.openalex.org/works?filter=publication_year:>2020&group_by=topics.id
373
+ ```
374
+
375
+ #### Pagination for Large Result Sets
376
+
377
+ ```
378
+ First page:
379
+ https://api.openalex.org/works?filter=publication_year:2023&per-page=200
380
+
381
+ Next pages:
382
+ https://api.openalex.org/works?filter=publication_year:2023&per-page=200&page=2
383
+ https://api.openalex.org/works?filter=publication_year:2023&per-page=200&page=3
384
+ ...
385
+
386
+ The meta.count field tells you total results.
387
+ Calculate pages needed: ceil(meta.count / per-page)
388
+ ```
389
+
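The page calculation above can be sketched in a few lines. `pages_needed` and `page_urls` are illustrative names, and the base URL is assumed to already contain a query string (such as a `filter=`).

```python
import math


def pages_needed(total_count: int, per_page: int = 200) -> int:
    """ceil(meta.count / per-page), as described above."""
    return math.ceil(total_count / per_page)


def page_urls(base_url: str, total_count: int, per_page: int = 200) -> list[str]:
    """Build the URL for every page of a result set."""
    last_page = pages_needed(total_count, per_page)
    return [
        f"{base_url}&per-page={per_page}&page={page}"
        for page in range(1, last_page + 1)
    ]
```

Read `meta.count` from the first response, then request the remaining pages. A result set of 1,001 works needs 6 pages at 200 per page.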
390
+ #### Select Specific Fields Only (Faster Responses)
391
+
392
+ ```
393
+ Just IDs and titles:
394
+ https://api.openalex.org/works?select=id,title&per-page=200
395
+
396
+ Multiple fields:
397
+ https://api.openalex.org/works?select=id,title,publication_year,cited_by_count
398
+ ```
399
+
400
+ #### Autocomplete for Type-Ahead
401
+
402
+ ```
403
+ Fast autocomplete endpoint for building search UIs:
404
+
405
+ Authors:
406
+ https://api.openalex.org/autocomplete/authors?q=einst
407
+
408
+ Institutions:
409
+ https://api.openalex.org/autocomplete/institutions?q=stanford
410
+
411
+ Works:
412
+ https://api.openalex.org/autocomplete/works?q=neural+networks
413
+
414
+ Typically returns in ~200ms
415
+ ```
416
+
417
+ #### Tag Your Own Text (/text endpoint)
418
+
419
+ ```
420
+ POST or GET to classify your own content:
421
+
422
+ https://api.openalex.org/text?title=Machine+learning+for+drug+discovery
423
+
424
+ Returns topics, keywords, and concepts for your text.
425
+ Costs 1000 credits per request (limited to 1 req/sec).
426
+ Text must be 20-2000 characters.
427
+ ```
428
+
429
+ ### Response Structure
430
+
431
+ #### List Endpoints
432
+
433
+ All list endpoints (/works, /authors, etc.) return:
434
+
435
+ ```json
436
+ {
437
+ "meta": {
438
+ "count": 240523418,
439
+ "db_response_time_ms": 42,
440
+ "page": 1,
441
+ "per_page": 25
442
+ },
443
+ "results": [
444
+ { /* entity object */ },
445
+ { /* entity object */ },
446
+ ...
447
+ ]
448
+ }
449
+ ```
450
+
451
+ #### Single Entity Endpoints
452
+
453
+ Getting a single entity returns the object directly:
454
+
455
+ ```
456
+ https://api.openalex.org/works/W2741809807
457
+ → Returns a Work object directly (no meta/results wrapper)
458
+ ```
459
+
460
+ #### Group By Responses
461
+
462
+ ```json
463
+ {
464
+ "meta": { "count": 100, ... },
465
+ "group_by": [
466
+ {
467
+ "key": "https://openalex.org/T10001",
468
+ "key_display_name": "Artificial Intelligence",
469
+ "count": 15234
470
+ },
471
+ ...
472
+ ]
473
+ }
474
+ ```
475
+
476
+ ### Performance Optimization Tips
477
+
478
+ #### 1. Use Maximum Page Size
479
+
480
+ ```
481
+ SLOW: Default 25 per page = more API calls
482
+ FAST: ?per-page=200 (8x fewer API calls)
483
+ ```
484
+
485
+ #### 2. Use Batch ID Lookups
486
+
487
+ ```
488
+ SLOW: Loop through 50 DOIs, 50 API calls
489
+ FAST: One call with pipe-separated DOIs (up to 50)
490
+ ```
491
+
492
+ #### 3. Select Only Fields You Need
493
+
494
+ ```
495
+ SLOW: Full objects with all fields
496
+ FAST: ?select=id,title,publication_year
497
+ ```
498
+
499
+ #### 4. Use Concurrent Requests with Rate Limiting
500
+
501
+ ```python
502
+ # Pseudo-code
503
+ from concurrent.futures import ThreadPoolExecutor
504
+ import time
505
+
506
+ rate_limiter = RateLimiter(100) # max 100 req/sec
507
+
508
+ def fetch_page(page_num):
509
+ rate_limiter.wait() # Ensure we don't exceed rate limit
510
+ return requests.get(f"...&page={page_num}&api_key=YOUR_KEY")
511
+
512
+ with ThreadPoolExecutor(max_workers=10) as executor:
513
+ results = executor.map(fetch_page, range(1, 101))
514
+ ```
515
+
516
+ #### 5. Get an API Key for Heavy Usage
517
+
518
+ ```
519
+ FREE: 100,000 credits/day
520
+ PREMIUM: Higher limits (varies by plan)
521
+
522
+ Free key: https://openalex.org/settings/api | Premium plans: https://openalex.org/pricing
523
+ ```
524
+
525
+ ### Handling Errors
526
+
527
+ #### Common HTTP Status Codes
528
+
529
+ ```
530
+ 200 OK - Success
531
+ 400 Bad Request - Invalid parameter (check your filter syntax)
532
+ 403 Forbidden or 429 Too Many Requests - Rate limit exceeded (slow down, implement backoff)
533
+ 404 Not Found - Entity doesn't exist
534
+ 500 Server Error - Temporary issue (retry with backoff)
535
+ ```
536
+
537
+ #### Exponential Backoff Pattern
538
+
539
+ ```python
540
+ def fetch_with_retry(url, max_retries=5):
541
+ for attempt in range(max_retries):
542
+ try:
543
+ response = requests.get(url, timeout=30)
544
+ if response.status_code == 200:
545
+ return response.json()
546
+ elif response.status_code == 403:
547
+ # Rate limited
548
+ wait_time = 2 ** attempt # 1s, 2s, 4s, 8s, 16s
549
+ time.sleep(wait_time)
550
+ elif response.status_code >= 500:
551
+ # Server error
552
+ wait_time = 2 ** attempt
553
+ time.sleep(wait_time)
554
+ else:
555
+ # Other error, don't retry
556
+ response.raise_for_status()
557
+ except requests.exceptions.Timeout:
558
+ if attempt < max_retries - 1:
559
+ time.sleep(2 ** attempt)
560
+ else:
561
+ raise
562
+ raise Exception(f"Failed after {max_retries} retries")
563
+ ```
564
+
565
+ ### Entity-Specific Filter Examples
566
+
567
+ #### Works Filters (Most Common)
568
+
569
+ ```
570
+ authorships.author.id - Author's OpenAlex ID
571
+ authorships.institutions.id - Institution's OpenAlex ID
572
+ cited_by_count - Number of citations
573
+ is_oa - Is open access (true/false)
574
+ publication_year - Year published
575
+ primary_location.source.id - Source (journal) ID
576
+ topics.id - Topic ID
577
+ type - article, book, dataset, etc.
578
+ has_doi - Has a DOI (true/false)
579
+ has_fulltext - Has fulltext available (true/false)
580
+ ```
581
+
582
+ #### Authors Filters
583
+
584
+ ```
585
+ last_known_institution.id - Current/last institution
586
+ works_count - Number of works authored
587
+ cited_by_count - Total citations
588
+ orcid - ORCID identifier
589
+ ```
590
+
591
+ #### Sources Filters
592
+
593
+ ```
594
+ host_organization - Publisher/host
595
+ type - journal, repository, etc.
596
+ is_oa - Is open access
597
+ ```
598
+
599
+ #### Institutions Filters
600
+
601
+ ```
602
+ type - education, healthcare, company, etc.
603
+ country_code - Two-letter country code
604
+ continent - africa, asia, europe, etc.
605
+ ```
606
+
607
+ ### External ID Support
608
+
609
+ You can use external IDs directly in the API:
610
+
611
+ #### Works
612
+
613
+ ```
614
+ DOI: /works/https://doi.org/10.7717/peerj.4375
615
+ PMID: /works/pmid:29844763
616
+ ```
617
+
618
+ #### Authors
619
+
620
+ ```
621
+ ORCID: /authors/https://orcid.org/0000-0003-1613-5981
622
+ ```
623
+
624
+ #### Institutions
625
+
626
+ ```
627
+ ROR: /institutions/https://ror.org/02y3ad647
628
+ ```
629
+
630
+ #### Sources
631
+
632
+ ```
633
+ ISSN: /sources/issn:0028-0836
634
+ ```
635
+
636
+ ### Advanced Tips
637
+
638
+ #### Reproducible Random Samples
639
+
640
+ Always use a seed for reproducible sampling:
641
+
642
+ ```
643
+ https://api.openalex.org/works?sample=100&seed=42
644
+ ```
645
+
646
+ Same seed = same results every time.
647
+
648
+ #### Finding Related Works
649
+
650
+ ```
651
+ Get cited works:
652
+ 1. Get work: /works/W2741809807
653
+ 2. Response includes: "referenced_works": [...]
654
+ 3. Fetch those: /works?filter=openalex_id:W123|W456|W789
655
+
656
+ Get citing works:
657
+ 1. Get work: /works/W2741809807
658
+ 2. Response includes: "cited_by_api_url"
659
+ 3. Follow that URL
660
+ ```
661
+
662
+ #### Filtering by Date Ranges
663
+
664
+ ```
665
+ Exact year: ?filter=publication_year:2020
666
+ After: ?filter=publication_year:>2020
667
+ Before: ?filter=publication_year:<2020
668
+ Range: ?filter=publication_year:2018-2022
669
+ ```
670
+
671
+ #### Complex Boolean Searches
672
+
673
+ The search parameter supports boolean operators:
674
+
675
+ ```
676
+ AND: ?search=climate+AND+change
677
+ OR: ?search=climate+OR+weather
678
+ NOT: ?search=climate+NOT+politics
679
+ ```
680
+
681
+ ### Rate Limiting Best Practices
682
+
683
+ #### Without API Key
684
+
685
+ * 100 credits per day (for testing only)
686
+ * Max 100 requests per second
687
+ * Not suitable for production use
688
+
689
+ #### With Free API Key
690
+
691
+ * 100,000 credits per day
692
+ * Max 100 requests per second
693
+ * Get your free key at [openalex.org/settings/api](https://openalex.org/settings/api)
694
+
695
+ #### With Premium API Key
696
+
697
+ * Higher credit limits (varies by plan)
698
+ * Max 100 requests per second
699
+ * Contact <support@openalex.org> for academic waivers
700
+
701
+ #### Credit Costs
702
+
703
+ * Singleton requests (e.g., `/works/W123`): 1 credit
704
+ * List requests (e.g., `/works?filter=...`): 10 credits
705
+ * Text/Aboutness requests: 1,000 credits
706
+
707
+ #### Concurrent Requests Strategy
708
+
709
+ ```
710
+ 1. Track requests per second globally (not per thread)
711
+ 2. Use a semaphore or rate limiter across threads
712
+ 3. Add delays between batches if needed
713
+ 4. Monitor for 403 or 429 responses (rate limit exceeded)
714
+ 5. Back off if you hit limits
715
+ ```
716
+
717
+ #### Daily Limit Management
718
+
719
+ With 100k credits/day limit:
720
+
721
+ * Singleton requests: up to 100,000/day
722
+ * List requests: up to 10,000/day
723
+ * Plan accordingly for large jobs
724
+ * Consider OpenAlex Premium for higher limits
725
+
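A quick budget check before a large job might look like this sketch. The credit costs and the daily limit are the figures listed above; the function names are illustrative assumptions.

```python
import math

CREDIT_COSTS = {"singleton": 1, "list": 10, "text": 1000}
DAILY_CREDITS = 100_000  # free-key daily limit


def credits_needed(singleton: int = 0, list_requests: int = 0, text: int = 0) -> int:
    """Total credits for a mix of request types."""
    return (
        singleton * CREDIT_COSTS["singleton"]
        + list_requests * CREDIT_COSTS["list"]
        + text * CREDIT_COSTS["text"]
    )


def days_needed(total_credits: int, daily_credits: int = DAILY_CREDITS) -> int:
    """Whole days required to stay within the daily credit limit."""
    return math.ceil(total_credits / daily_credits)
```

For example, paging through 1,000,000 works at 200 per page takes 5,000 list requests, or 50,000 credits, which fits in one day; 25,000 list requests cost 250,000 credits and need three days on a free key.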
726
+ ### Common Mistakes to Avoid
727
+
728
+ 1. ❌ Using page numbers for sampling → ✅ Use ?sample=
729
+ 2. ❌ Filtering by entity names → ✅ Get IDs first, then filter
730
+ 3. ❌ Default page size → ✅ Use per-page=200
731
+ 4. ❌ Sequential ID lookups → ✅ Batch with pipe (|) operator
732
+ 5. ❌ No error handling → ✅ Implement retry with backoff
733
+ 6. ❌ Ignoring rate limits in threads → ✅ Global rate limiting
734
+ 7. ❌ Trying to group by multiple fields → ✅ Multiple queries + combine
735
+ 8. ❌ No API key for heavy usage → ✅ Get API key for higher credit limits
736
+ 9. ❌ Fetching all fields → ✅ Use select= for needed fields only
737
+ 10. ❌ Assuming instant responses → ✅ Add timeouts (30s recommended)
738
+
739
+ ### Need More Info?
740
+
741
+ * Full documentation: <https://docs.openalex.org>
742
+ * API Overview: <https://docs.openalex.org/how-to-use-the-api/api-overview>
743
+ * Entity schemas: <https://docs.openalex.org/api-entities>
744
+ * Help: <https://openalex.org/help>
745
+ * User group: <https://groups.google.com/g/openalex-users>
746
+
747
+ ### For Premium Features
748
+
749
+ If you need:
750
+
751
+ * More than 100k credits/day
752
+ * More frequent snapshot updates than the standard monthly release
753
+ * Commercial support
754
+ * SLA guarantees
755
+
756
+ See: <https://openalex.org/pricing>
757
+
758
+ ***
759
+
760
+ Last updated: 2025-10-13
+
+ Maintained for: LLM agents, AI applications, and automated tools
761
+
762
+ ---
763
+
764
+ # Quickstart tutorial
765
+
766
+ Let's use the OpenAlex API to get journal articles and books published by authors at Stanford University. We'll limit our search to articles published between 2010 and 2020. Since OpenAlex is free and openly available, these examples work without any login or account creation. :thumbsup:
767
+
768
+ {% hint style="info" %}
769
+ If you open these examples in a web browser, they will look *much* better if you have a browser plug-in such as [JSONVue](https://chrome.google.com/webstore/detail/jsonvue/chklaanhfefbnpoihckbnefhakgolnmc) installed.
770
+ {% endhint %}
771
+
772
+ ### 1. Find the institution
773
+
774
+ You can use the [institutions](https://docs.openalex.org/api-entities/institutions) endpoint to learn about universities and research centers. OpenAlex has a powerful search feature that searches across 108,000 institutions.
775
+
776
+ Let's use it to search for Stanford University:
777
+
778
+ * Find Stanford University\
779
+ [`https://api.openalex.org/institutions?search=stanford`](https://api.openalex.org/institutions?search=stanford)
780
+
781
+ Our first result looks correct (yeah!):
782
+
783
+ ```json
784
+ {
785
+ "id": "https://openalex.org/I97018004",
786
+ "ror": "https://ror.org/00f54p054",
787
+ "display_name": "Stanford University",
788
+ "country_code": "US",
789
+ "type": "education",
790
+ "homepage_url": "http://www.stanford.edu/"
791
+ // other fields removed
792
+ }
793
+ ```
794
+
795
+ We can use the ID `https://openalex.org/I97018004` in that result to find out more.
796
+
797
+ ### 2. Find articles (works) associated with Stanford University
798
+
799
+ The [works](https://docs.openalex.org/api-entities/works) endpoint contains over 240 million articles, books, and theses :astonished:. We can filter to show works associated with Stanford.
800
+
801
+ * Show works where at least one author is associated with Stanford University\
802
+ [`https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004`](https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004)
803
+
804
+ This is just one of the 50+ ways that you can filter works!
805
+
806
+ ### 3. Filter works by publication year
807
+
808
+ Right now the list shows records for all years. Let's narrow it down to works published between 2010 and 2020, and sort from newest to oldest.
809
+
810
+ * Show works with publication years 2010 to 2020, associated with Stanford University\
811
+ <https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004,publication_year:2010-2020&sort=publication_date:desc>
812
+
813
+ ### 4. Group works by publication year to show counts by year
814
+
815
+ Finally, group the results by publication year to get the number of works produced by Stanford in each year from 2010 to 2020. There are more than 30 ways to group records in OpenAlex, including by publisher, journal, and open access status.
816
+
817
+ * Group records by publication year\
818
+ [`https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004,publication_year:2010-2020&group-by=publication_year`](https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004,publication_year:2010-2020&group-by=publication_year)
819
+
820
+ That gives a result like this:
821
+
822
+ ```json
823
+ [
824
+ {
825
+ "key": "2020",
826
+ "key_display_name": "2020",
827
+ "count": 18627
828
+ },
829
+ {
830
+ "key": "2019",
831
+ "key_display_name": "2019",
832
+ "count": 15933
833
+ },
834
+ {
835
+ "key": "2017",
836
+ "key_display_name": "2017",
837
+ "count": 14789
838
+ },
839
+ ...
840
+ ]
841
+ ```
842
+
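If you want the counts as a simple year-to-count mapping, a minimal sketch could look like this. It assumes you pass in the list of group objects shown above; `counts_by_year` is a hypothetical helper name.

```python
def counts_by_year(groups: list[dict]) -> dict[int, int]:
    """Map each publication year to its work count, ordered oldest to newest."""
    ordered = sorted(groups, key=lambda group: int(group["key"]))
    return {int(group["key"]): group["count"] for group in ordered}


example = [
    {"key": "2020", "key_display_name": "2020", "count": 18627},
    {"key": "2019", "key_display_name": "2019", "count": 15933},
    {"key": "2017", "key_display_name": "2017", "count": 14789},
]
yearly = counts_by_year(example)
```

The keys arrive as strings, so converting them to integers makes the result easier to sort and plot.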
843
+ There you have it! This same technique can be applied to hundreds of questions around scholarly data. The data you received is under a [CC0 license](https://creativecommons.org/publicdomain/zero/1.0/), so not only did you access it easily, you can share it freely! :tada:
844
+
845
+ ## What's next?
846
+
847
+ Jump into an area of OpenAlex that interests you:
848
+
849
+ * [Works](https://docs.openalex.org/api-entities/works)
850
+ * [Authors](https://docs.openalex.org/api-entities/authors)
851
+ * [Sources](https://docs.openalex.org/api-entities/sources)
852
+ * [Institutions](https://docs.openalex.org/api-entities/institutions)
853
+ * [Topics](https://docs.openalex.org/api-entities/topics)
854
+ * [Publishers](https://docs.openalex.org/api-entities/publishers)
855
+ * [Funders](https://docs.openalex.org/api-entities/funders)
856
+
857
+ And check out our [tutorials](https://docs.openalex.org/additional-help/tutorials) page for some hands-on examples!
858
+
859
+ ---
860
+
861
+ # Quickstart tutorial
862
+
863
+ Let's use the OpenAlex API to get journal articles and books published by authors at Stanford University. We'll limit our search to works published between 2010 and 2020. Since OpenAlex is free and openly available, these examples work without any login or account creation. :thumbsup:
864
+
865
+ {% hint style="info" %}
866
+ If you open these examples in a web browser, they will look *much* better if you have a browser plug-in such as [JSONVue](https://chrome.google.com/webstore/detail/jsonvue/chklaanhfefbnpoihckbnefhakgolnmc) installed.
867
+ {% endhint %}
868
+
869
+ ### 1. Find the institution
870
+
871
+ You can use the [institutions](https://docs.openalex.org/api-entities/institutions) endpoint to learn about universities and research centers. OpenAlex has a powerful search feature that searches across 108,000 institutions.
872
+
873
+ Let's use it to search for Stanford University:
874
+
875
+ * Find Stanford University\
876
+ [`https://api.openalex.org/institutions?search=stanford`](https://api.openalex.org/institutions?search=stanford)
877
+
878
+ Our first result looks correct (yeah!):
879
+
880
+ ```json
881
+ {
882
+ "id": "https://openalex.org/I97018004",
883
+ "ror": "https://ror.org/00f54p054",
884
+ "display_name": "Stanford University",
885
+ "country_code": "US",
886
+ "type": "education",
887
+ "homepage_url": "http://www.stanford.edu/"
888
+ // other fields removed
889
+ }
890
+ ```
891
+
892
+ We can use the ID `https://openalex.org/I97018004` in that result to find out more.
893
+
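The same search can be run from code. Here is a minimal Python sketch, not an official client: it uses only the standard library, the function names are hypothetical, and it assumes the response lists its matches under a top-level `results` key, which the excerpt above does not show.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.openalex.org"


def institution_search_url(query: str) -> str:
    """Build the institutions search URL for a free-text query."""
    return f"{API_BASE}/institutions?{urllib.parse.urlencode({'search': query})}"


def first_institution(query: str) -> dict:
    """Fetch the search results and return the first match (network call)."""
    with urllib.request.urlopen(institution_search_url(query)) as response:
        payload = json.load(response)
    # Assumption: matches are listed under a top-level "results" key.
    return payload["results"][0]
```

Calling `first_institution("stanford")["id"]` should then return the ID used in the next steps, as long as Stanford University is still the first match.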
894
+ ### 2. Find articles (works) associated with Stanford University
895
+
896
+ The [works](https://docs.openalex.org/api-entities/works) endpoint contains over 240 million articles, books, and theses :astonished:. We can filter to show works associated with Stanford.
897
+
898
+ * Show works where at least one author is associated with Stanford University\
899
+ [`https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004`](https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004)
900
+
901
+ This is just one of the 50+ ways that you can filter works!
902
+
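If you build these URLs in code, a small helper keeps the filter syntax in one place. This is a sketch under assumptions: `works_url` is a hypothetical name, and the comma-separated `name:value` form is taken from the URLs in this tutorial rather than from a full description of the filter syntax.

```python
import urllib.parse

API_BASE = "https://api.openalex.org"
STANFORD_ID = "https://openalex.org/I97018004"  # ID found in step 1


def works_url(filters: dict[str, str]) -> str:
    """Build a works URL; filters become comma-separated `name:value` pairs."""
    query = {"filter": ",".join(f"{name}:{value}" for name, value in filters.items())}
    # Leave ':', '/' and ',' unescaped so the URL reads like the tutorial's.
    return f"{API_BASE}/works?{urllib.parse.urlencode(query, safe=':/,')}"


print(works_url({"institutions.id": STANFORD_ID}))
```

The printed URL is the same one shown in the bullet above, so you can paste it into a browser to check the result.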
903
+ ### 3. Filter works by publication year
904
+
905
+ Right now the list shows records for all years. Let's narrow it down to works that were published between 2010 and 2020, and sort them from newest to oldest.
906
+
907
+ * Show works with publication years 2010 to 2020, associated with Stanford University\
908
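+ [`https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004,publication_year:2010-2020&sort=publication_date:desc`](https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004,publication_year:2010-2020&sort=publication_date:desc)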
909
+
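To make the year range adjustable, the query above can be assembled from its parts. The sketch below is illustrative only: the function name is hypothetical, and it simply reproduces the filter and sort values from the URL in this step.

```python
import urllib.parse

API_BASE = "https://api.openalex.org"
STANFORD_ID = "https://openalex.org/I97018004"  # ID found in step 1


def stanford_works_url(first_year: int, last_year: int) -> str:
    """Works URL filtered by institution and year range, newest first."""
    filters = [
        f"institutions.id:{STANFORD_ID}",
        f"publication_year:{first_year}-{last_year}",
    ]
    query = {"filter": ",".join(filters), "sort": "publication_date:desc"}
    # Leave ':', '/' and ',' unescaped so the URL reads like the tutorial's.
    return f"{API_BASE}/works?{urllib.parse.urlencode(query, safe=':/,')}"


print(stanford_works_url(2010, 2020))
```

Changing the two arguments gives the same query for a different span of years.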
910
+ ### 4. Group works by publication year to show counts by year
911
+
912
+ Finally, you can group the results by publication year to get the final answer: the number of works produced by Stanford each year from 2010 to 2020. There are more than 30 ways to group records in OpenAlex, including by publisher, journal, and open access status.
913
+
914
+ * Group records by publication year\
915
+ [`https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004,publication_year:2010-2020&group-by=publication_year`](https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I97018004,publication_year:2010-2020&group-by=publication_year)
916
+
917
+ That gives a result like this:
918
+
919
+ ```json
920
+ [
921
+ {
922
+ "key": "2020",
923
+ "key_display_name": "2020",
924
+ "count": 18627
925
+ },
926
+ {
927
+ "key": "2019",
928
+ "key_display_name": "2019",
929
+ "count": 15933
930
+ },
931
+ {
932
+ "key": "2017",
933
+ "key_display_name": "2017",
934
+ "count": 14789
935
+ },
936
+ ...
937
+ ]
938
+ ```
939
+
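Once you have the groups, turning them into a year-to-count table takes a few lines. The following Python sketch uses hypothetical function names and works on the three sample groups shown above. It takes the list of groups as input and makes no assumption about where that list sits in the full API response.

```python
import urllib.parse

API_BASE = "https://api.openalex.org"
STANFORD_ID = "https://openalex.org/I97018004"  # ID found in step 1


def group_by_year_url(first_year: int, last_year: int) -> str:
    """Works URL grouped by publication year for the given range."""
    query = {
        "filter": f"institutions.id:{STANFORD_ID},publication_year:{first_year}-{last_year}",
        "group-by": "publication_year",
    }
    return f"{API_BASE}/works?{urllib.parse.urlencode(query, safe=':/,')}"


def counts_by_year(groups: list[dict]) -> dict[int, int]:
    """Map each group's "key" (a year string) to its "count", ordered by year."""
    ordered = sorted(groups, key=lambda group: int(group["key"]))
    return {int(group["key"]): group["count"] for group in ordered}


# The three groups from the sample result above; the rest are elided there.
sample_groups = [
    {"key": "2020", "key_display_name": "2020", "count": 18627},
    {"key": "2019", "key_display_name": "2019", "count": 15933},
    {"key": "2017", "key_display_name": "2017", "count": 14789},
]

print(group_by_year_url(2010, 2020))
print(counts_by_year(sample_groups))  # {2017: 14789, 2019: 15933, 2020: 18627}
```

Sorting by year is a choice made here for readability. The sample result is not in strict year order, so don't rely on the order in which groups arrive.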
940
+ There you have it! This same technique can be applied to hundreds of questions around scholarly data. The data you received is under a [CC0 license](https://creativecommons.org/publicdomain/zero/1.0/), so not only did you access it easily, you can share it freely! :tada:
941
+
942
+ ## What's next?
943
+
944
+ Jump into an area of OpenAlex that interests you:
945
+
946
+ * [Works](https://docs.openalex.org/api-entities/works)
947
+ * [Authors](https://docs.openalex.org/api-entities/authors)
948
+ * [Sources](https://docs.openalex.org/api-entities/sources)
949
+ * [Institutions](https://docs.openalex.org/api-entities/institutions)
950
+ * [Topics](https://docs.openalex.org/api-entities/topics)
951
+ * [Publishers](https://docs.openalex.org/api-entities/publishers)
952
+ * [Funders](https://docs.openalex.org/api-entities/funders)
953
+
954
+ And check out our [tutorials](https://docs.openalex.org/additional-help/tutorials) page for some hands-on examples!
955
+
956
+ ---
957
+
958
+ # Entities overview
959
+
960
+ The OpenAlex dataset describes scholarly *entities* and how those entities are connected to each other. Together, these make a huge web (or more technically, heterogeneous directed [graph](https://en.wikipedia.org/wiki/Graph_theory)) of hundreds of millions of entities and billions of connections between them all.
961
+
962
+ Learn more about the OpenAlex entities:
963
+
964
+ * [Works](https://docs.openalex.org/api-entities/works): Scholarly documents like journal articles, books, datasets, and theses
965
+ * [Authors](https://docs.openalex.org/api-entities/authors): People who create works
966
+ * [Sources](https://docs.openalex.org/api-entities/sources): Where works are hosted (such as journals, conferences, and repositories)
967
+ * [Institutions](https://docs.openalex.org/api-entities/institutions): Universities and other organizations to which authors claim affiliations
968
+ * [Topics](https://docs.openalex.org/api-entities/topics): Topics assigned to works
969
+ * [Publishers](https://docs.openalex.org/api-entities/publishers): Companies and organizations that distribute works
970
+ * [Funders](https://docs.openalex.org/api-entities/funders): Organizations that fund research
971
+ * [Geo](https://docs.openalex.org/api-entities/geo): Where things are in the world
972
+
973
+ ---