arxiv-api-wrapper 1.1.0 → 2.0.0

package/README.md CHANGED
@@ -1,250 +1,318 @@
- # arxiv-api-wrapper
-
- A TypeScript package that provides a convenient wrapper around the arXiv API, enabling easy querying and parsing of arXiv papers.
-
- ## Installation
-
- ```bash
- npm install arxiv-api-wrapper
- ```
-
- ## Quick Start
-
- ```typescript
- import { getArxivEntries, getArxivEntriesById } from 'arxiv-api-wrapper';
-
- // Search for papers
- const result = await getArxivEntries({
- search: {
- title: ['quantum computing'],
- author: ['John Doe'],
- },
- maxResults: 10,
- sortBy: 'submittedDate',
- sortOrder: 'descending',
- });
-
- console.log(`Found ${result.feed.totalResults} papers`);
- result.entries.forEach(entry => {
- console.log(`${entry.arxivId}: ${entry.title}`);
- });
-
- // Or fetch specific papers by ID
- const papers = await getArxivEntriesById(['2101.01234', '2101.05678']);
- ```
-
- ## Features
-
- - **Type-safe**: Full TypeScript support with comprehensive type definitions
- - **Flexible Search**: Support for complex queries with multiple filters, OR groups, and negation
- - **Rate Limiting**: Built-in token bucket rate limiter to respect arXiv API guidelines
- - **Retry Logic**: Automatic retries with exponential backoff for transient failures
- - **Pagination**: Support for paginated results with configurable page size
- - **Sorting**: Multiple sort options (relevance, submission date, last updated)
-
- ## API Reference
-
- For complete API documentation with detailed type information and examples, see the [generated API documentation](https://vagdur.github.io/arxiv-api-wrapper/).
-
- ### `getArxivEntriesById(ids: string[], options?): Promise<ArxivQueryResult>`
-
- Simpler function to fetch arXiv papers by their IDs using the id_list API mode.
-
- **Parameters:**
- - `ids: string[]` - Array of arXiv paper IDs (e.g., `['2101.01234', '2101.05678']`)
- - `options?: object` - Optional request configuration
- - `rateLimit?: { tokensPerInterval: number, intervalMs: number }` - Rate limit configuration
- - `retries?: number` - Number of retry attempts (default: 3)
- - `timeoutMs?: number` - Request timeout in milliseconds (default: 10000)
- - `userAgent?: string` - Custom User-Agent header
-
- **Returns:** Same as `getArxivEntries` - see return type below.
-
- ### `getArxivEntries(options: ArxivQueryOptions): Promise<ArxivQueryResult>`
-
- Main function to query the arXiv API with search filters or ID lists.
-
- **Options:**
- - `idList?: string[]` - List of arXiv IDs to fetch (e.g., `['2101.01234', '2101.05678']`)
- - `search?: ArxivSearchFilters` - Search filters (when used with `idList`, filters the entries from `idList` to only return those matching the search query)
- - `start?: number` - Pagination offset (0-based)
- - `maxResults?: number` - Maximum number of results (≤ 300)
- - `sortBy?: 'relevance' | 'lastUpdatedDate' | 'submittedDate'` - Sort field
- - `sortOrder?: 'ascending' | 'descending'` - Sort direction
- - `timeoutMs?: number` - Request timeout in milliseconds (default: 10000)
- - `retries?: number` - Number of retry attempts (default: 3)
- - `rateLimit?: { tokensPerInterval: number, intervalMs: number }` - Rate limit configuration
- - `userAgent?: string` - Custom User-Agent header
-
- **Search Filters:**
- - `title?: string[]` - Search in titles
- - `author?: string[]` - Search by author names
- - `abstract?: string[]` - Search in abstracts
- - `category?: string[]` - Filter by arXiv categories
- - `submittedDateRange?: { from: string, to: string }` - Date range filter (YYYYMMDDTTTT format)
- - `or?: ArxivSearchFilters[]` - OR group of filters
- - `andNot?: ArxivSearchFilters` - Negated filter (ANDNOT)
-
- **Returns:**
- ```typescript
- {
- feed: {
- id: string;
- updated: string;
- title: string;
- link: string;
- totalResults: number;
- startIndex: number;
- itemsPerPage: number;
- };
- entries: Array<{
- id: string;
- arxivId: string;
- title: string;
- summary: string;
- published: string;
- updated: string;
- authors: Array<{ name: string; affiliation?: string }>;
- categories: string[];
- primaryCategory?: string;
- links: Array<{ href: string; rel?: string; type?: string; title?: string }>;
- doi?: string;
- journalRef?: string;
- comment?: string;
- }>;
- }
- ```
-
- ## Examples
-
- ### Search by title and author
-
- ```typescript
- const result = await getArxivEntries({
- search: {
- title: ['machine learning'],
- author: ['Geoffrey Hinton'],
- },
- maxResults: 5,
- });
- ```
-
- ### Fetch specific papers by ID
-
- Using the simpler `getArxivEntriesById` function:
-
- ```typescript
- const result = await getArxivEntriesById(['2101.01234', '2101.05678']);
- ```
-
- Or using `getArxivEntries`:
-
- ```typescript
- const result = await getArxivEntries({
- idList: ['2101.01234', '2101.05678'],
- });
- ```
-
- ### Complex search with OR and date range
-
- ```typescript
- const result = await getArxivEntries({
- search: {
- or: [
- { title: ['quantum'] },
- { abstract: ['quantum'] },
- ],
- submittedDateRange: {
- from: '202301010600',
- to: '202401010600',
- },
- },
- sortBy: 'submittedDate',
- sortOrder: 'descending',
- });
- ```
-
- ### Fetch papers by ID with rate limiting
-
- ```typescript
- const result = await getArxivEntriesById(
- ['2101.01234', '2101.05678'],
- {
- rateLimit: {
- tokensPerInterval: 1,
- intervalMs: 3000, // 1 request per 3 seconds
- },
- timeoutMs: 15000,
- }
- );
- ```
-
- ### Search with rate limiting
-
- ```typescript
- const result = await getArxivEntries({
- search: { title: ['neural networks'] },
- rateLimit: {
- tokensPerInterval: 1,
- intervalMs: 3000, // 1 request per 3 seconds
- },
- });
- ```
-
- ## Documentation
-
- ### Generating API Documentation
-
- To generate browsable API documentation from the source code:
-
- ```bash
- npm run docs:generate
- ```
-
- This will create HTML documentation in the `docs/` directory. You can then view it locally:
-
- ```bash
- npm run docs:serve
- ```
-
- The generated documentation includes:
- - Complete API reference for all exported functions and types
- - Detailed parameter descriptions and examples
- - Type information and relationships
- - Search functionality
-
- ### IDE IntelliSense
-
- All exported functions and types include JSDoc comments for enhanced IDE IntelliSense support. Hover over any exported symbol in your IDE to see inline documentation.
-
- ## TypeScript Types
-
- All types are exported from the package:
-
- ```typescript
- import type {
- ArxivQueryOptions,
- ArxivQueryResult,
- ArxivSearchFilters,
- ArxivEntry,
- ArxivFeedMeta,
- ArxivAuthor,
- ArxivLink,
- ArxivSortBy,
- ArxivSortOrder,
- ArxivRateLimitConfig,
- ArxivDateRange,
- } from 'arxiv-api-wrapper';
- ```
-
- ## License
-
- ISC
-
- ## Author
-
- Vilhelm Agdur
-
- ## Repository
-
- https://github.com/vagdur/arxiv-api-wrapper
+ # arxiv-api-wrapper
+
+ A TypeScript package that provides a convenient wrapper around the arXiv API, enabling easy querying and parsing of arXiv papers.
+
+ ## Installation
+
+ ```bash
+ npm install arxiv-api-wrapper
+ ```
+
+ ## Quick Start
+
+ ```typescript
+ import { getArxivEntries, getArxivEntriesById } from 'arxiv-api-wrapper';
+
+ // Search for papers
+ const result = await getArxivEntries({
+ search: {
+ title: ['quantum computing'],
+ author: ['John Doe'],
+ },
+ maxResults: 10,
+ sortBy: 'submittedDate',
+ sortOrder: 'descending',
+ });
+
+ console.log(`Found ${result.feed.totalResults} papers`);
+ result.entries.forEach(entry => {
+ console.log(`${entry.arxivId}: ${entry.title}`);
+ });
+
+ // Or fetch specific papers by ID
+ const papers = await getArxivEntriesById(['2101.01234', '2101.05678']);
+ ```
+
+ ## Features
+
+ - **Type-safe**: Full TypeScript support with comprehensive type definitions
+ - **Flexible Search**: Support for complex queries with multiple filters, OR groups, and negation
+ - **Rate Limiting**: Built-in token bucket rate limiter to respect arXiv API guidelines
+ - **Retry Logic**: Automatic retries with exponential backoff for transient failures
+ - **Pagination**: Support for paginated results with configurable page size
+ - **Sorting**: Multiple sort options (relevance, submission date, last updated)
+ - **OAI-PMH**: Support for the [arXiv Open Archives Initiative](https://info.arxiv.org/help/oa/index.html#open-archives-initiative-oai) interface (Identify, ListSets, GetRecord, ListRecords, ListIdentifiers, ListMetadataFormats)
+
+ ## OAI-PMH interface
+
+ The package also supports the arXiv OAI-PMH endpoint (`https://oaipmh.arxiv.org/oai`), which is useful for metadata harvesting and bulk access. See the [arXiv OAI help](https://info.arxiv.org/help/oa/index.html#open-archives-initiative-oai) and the [OAI-PMH v2.0 protocol](https://www.openarchives.org/OAI/openarchivesprotocol.html) for details.
+
+ ```typescript
+ import {
+ oaiIdentify,
+ oaiListRecords,
+ oaiListRecordsAsyncIterator,
+ oaiGetRecord,
+ oaiListSets,
+ oaiListIdentifiers,
+ oaiListMetadataFormats,
+ } from 'arxiv-api-wrapper';
+
+ // Repository info
+ const identify = await oaiIdentify();
+ console.log(identify.repositoryName, identify.protocolVersion);
+
+ // One page of records (e.g. Dublin Core)
+ const result = await oaiListRecords('oai_dc', {
+ from: '2024-01-01',
+ until: '2024-01-31',
+ set: 'math:math:LO', // optional: restrict to a set
+ rateLimit: { tokensPerInterval: 1, intervalMs: 1000 },
+ });
+ result.records.forEach((rec) => {
+ console.log(rec.header.identifier, rec.metadata);
+ });
+ if (result.resumptionToken) {
+ // Fetch next page with result.resumptionToken.value
+ }
+
+ // Single record by identifier (full or short form)
+ const record = await oaiGetRecord('cs/0112017', 'oai_dc');
+ ```
+
+ For an intermediate option between manual page-by-page pagination and `*All` helpers, use async iterators:
+
+ ```typescript
+ for await (const rec of oaiListRecordsAsyncIterator('oai_dc', {
+ from: '2024-01-01',
+ until: '2024-01-02',
+ maxRecords: 50,
+ })) {
+ console.log(rec.header.identifier);
+ }
+ ```
+
+ The `oaiListRecordsAll` / `oaiListIdentifiersAll` / `oaiListSetsAll` helpers are convenience wrappers that collect from the corresponding async iterators.
+
+ All OAI functions accept optional `timeoutMs`, `retries`, `userAgent`, and `rateLimit` (same as the Atom API). OAI errors (e.g. `idDoesNotExist`, `noRecordsMatch`) are thrown as `OaiError` with a `code` and `messageText`.
+
+ ## API Reference
+
+ For complete API documentation with detailed type information and examples, see the [generated API documentation](https://vagdur.github.io/arxiv-api-wrapper/).
+
+ ### `getArxivEntriesById(ids: string[], options?): Promise<ArxivQueryResult>`
+
+ Simpler function to fetch arXiv papers by their IDs using the id_list API mode.
+
+ **Parameters:**
+ - `ids: string[]` - Array of arXiv paper IDs (e.g., `['2101.01234', '2101.05678']`)
+ - `options?: object` - Optional request configuration
+ - `rateLimit?: { tokensPerInterval: number, intervalMs: number }` - Rate limit configuration
+ - `retries?: number` - Number of retry attempts (default: 3)
+ - `timeoutMs?: number` - Request timeout in milliseconds (default: 10000)
+ - `userAgent?: string` - Custom User-Agent header
+
+ **Returns:** Same as `getArxivEntries` - see return type below.
+
+ ### `getArxivEntries(options: ArxivQueryOptions): Promise<ArxivQueryResult>`
+
+ Main function to query the arXiv API with search filters or ID lists.
+
+ **Options:**
+ - `idList?: string[]` - List of arXiv IDs to fetch (e.g., `['2101.01234', '2101.05678']`)
+ - `search?: ArxivSearchFilters` - Search filters (when used with `idList`, filters the entries from `idList` to only return those matching the search query)
+ - `start?: number` - Pagination offset (0-based)
+ - `maxResults?: number` - Maximum number of results (≤ 300)
+ - `sortBy?: 'relevance' | 'lastUpdatedDate' | 'submittedDate'` - Sort field
+ - `sortOrder?: 'ascending' | 'descending'` - Sort direction
+ - `timeoutMs?: number` - Request timeout in milliseconds (default: 10000)
+ - `retries?: number` - Number of retry attempts (default: 3)
+ - `rateLimit?: { tokensPerInterval: number, intervalMs: number }` - Rate limit configuration
+ - `userAgent?: string` - Custom User-Agent header
+
+ **Search Filters:**
+ - `title?: string[]` - Search in titles
+ - `author?: string[]` - Search by author names
+ - `abstract?: string[]` - Search in abstracts
+ - `category?: string[]` - Filter by arXiv categories
+ - `submittedDateRange?: { from: string, to: string }` - Date range filter (YYYYMMDDTTTT format)
+ - `or?: ArxivSearchFilters[]` - OR group of filters
+ - `andNot?: ArxivSearchFilters` - Negated filter (ANDNOT)
+
+ **Returns:**
+ ```typescript
+ {
+ feed: {
+ id: string;
+ updated: string;
+ title: string;
+ link: string;
+ totalResults: number;
+ startIndex: number;
+ itemsPerPage: number;
+ };
+ entries: Array<{
+ id: string;
+ arxivId: string;
+ title: string;
+ summary: string;
+ published: string;
+ updated: string;
+ authors: Array<{ name: string; affiliation?: string }>;
+ categories: string[];
+ primaryCategory?: string;
+ links: Array<{ href: string; rel?: string; type?: string; title?: string }>;
+ doi?: string;
+ journalRef?: string;
+ comment?: string;
+ }>;
+ }
+ ```
+
+ ## Examples
+
+ ### Search by title and author
+
+ ```typescript
+ const result = await getArxivEntries({
+ search: {
+ title: ['machine learning'],
+ author: ['Geoffrey Hinton'],
+ },
+ maxResults: 5,
+ });
+ ```
+
+ ### Fetch specific papers by ID
+
+ Using the simpler `getArxivEntriesById` function:
+
+ ```typescript
+ const result = await getArxivEntriesById(['2101.01234', '2101.05678']);
+ ```
+
+ Or using `getArxivEntries`:
+
+ ```typescript
+ const result = await getArxivEntries({
+ idList: ['2101.01234', '2101.05678'],
+ });
+ ```
+
+ ### Complex search with OR and date range
+
+ ```typescript
+ const result = await getArxivEntries({
+ search: {
+ or: [
+ { title: ['quantum'] },
+ { abstract: ['quantum'] },
+ ],
+ submittedDateRange: {
+ from: '202301010600',
+ to: '202401010600',
+ },
+ },
+ sortBy: 'submittedDate',
+ sortOrder: 'descending',
+ });
+ ```
+
+ ### Fetch papers by ID with rate limiting
+
+ ```typescript
+ const result = await getArxivEntriesById(
+ ['2101.01234', '2101.05678'],
+ {
+ rateLimit: {
+ tokensPerInterval: 1,
+ intervalMs: 3000, // 1 request per 3 seconds
+ },
+ timeoutMs: 15000,
+ }
+ );
+ ```
+
+ ### Search with rate limiting
+
+ ```typescript
+ const result = await getArxivEntries({
+ search: { title: ['neural networks'] },
+ rateLimit: {
+ tokensPerInterval: 1,
+ intervalMs: 3000, // 1 request per 3 seconds
+ },
+ });
+ ```
+
+ ## Documentation
+
+ ### Generating API Documentation
+
+ To generate browsable API documentation from the source code:
+
+ ```bash
+ npm run docs:generate
+ ```
+
+ This will create HTML documentation in the `docs/` directory. You can then view it locally:
+
+ ```bash
+ npm run docs:serve
+ ```
+
+ The generated documentation includes:
+ - Complete API reference for all exported functions and types
+ - Detailed parameter descriptions and examples
+ - Type information and relationships
+ - Search functionality
+
+ ### IDE IntelliSense
+
+ All exported functions and types include JSDoc comments for enhanced IDE IntelliSense support. Hover over any exported symbol in your IDE to see inline documentation.
+
+ ## TypeScript Types
+
+ All types are exported from the package:
+
+ ```typescript
+ import type {
+ ArxivQueryOptions,
+ ArxivQueryResult,
+ ArxivSearchFilters,
+ ArxivEntry,
+ ArxivFeedMeta,
+ ArxivAuthor,
+ ArxivLink,
+ ArxivSortBy,
+ ArxivSortOrder,
+ ArxivRateLimitConfig,
+ ArxivDateRange,
+ // OAI-PMH types
+ OaiIdentifyResponse,
+ OaiRecord,
+ OaiHeader,
+ OaiSet,
+ OaiMetadataFormat,
+ OaiResumptionToken,
+ OaiListRecordsResult,
+ OaiListIdentifiersResult,
+ OaiListSetsResult,
+ OaiRequestOptions,
+ OaiListOptions,
+ OaiErrorCode,
+ OaiError
+ } from 'arxiv-api-wrapper';
+ ```
+
+ ## License
+
+ ISC
+
+ ## Author
+
+ Vilhelm Agdur
+
+ ## Repository
+
+ https://github.com/vagdur/arxiv-api-wrapper
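The new OAI-PMH section fetches a single page with `oaiListRecords` and leaves the next page as a comment about `result.resumptionToken.value`. In OAI-PMH, an incomplete list response carries a resumption token, and the page that completes the list carries an empty one, so a harvester keeps requesting pages until the token runs out. The TypeScript sketch below shows that loop as an async generator, roughly the role the README gives `oaiListRecordsAsyncIterator` and the `*All` helpers. It is not the package's implementation: `paginate`, `collectAll`, `PageResult` and `FetchPage` are hypothetical names, and the page fetcher is left abstract because the README does not show how a token is passed back to `oaiListRecords`.

```typescript
// Hypothetical shape of one OAI-PMH list page; the package's real
// OaiListRecordsResult type may have different or additional fields.
interface PageResult<T> {
  records: T[];
  resumptionToken?: { value: string };
}

// Hypothetical page fetcher: called with no token for the first request
// and with the previous page's token for each following request.
type FetchPage<T> = (token?: string) => Promise<PageResult<T>>;

// Yields records one at a time and follows resumption tokens until a page
// arrives with no token or an empty one (OAI-PMH's end-of-list marker).
// Stops early, without fetching further pages, after `maxRecords` records.
async function* paginate<T>(
  fetchPage: FetchPage<T>,
  maxRecords: number = Infinity,
): AsyncGenerator<T, void, undefined> {
  let token: string | undefined = undefined;
  let yielded = 0;
  do {
    const page: PageResult<T> = await fetchPage(token);
    for (const rec of page.records) {
      if (yielded >= maxRecords) return;
      yield rec;
      yielded++;
    }
    // An empty token string becomes undefined, which ends the loop.
    token = page.resumptionToken?.value || undefined;
  } while (token !== undefined && yielded < maxRecords);
}

// Drains an async iterable into an array, the way the README describes
// the *All helpers collecting from their async iterators.
async function collectAll<T>(items: AsyncIterable<T>): Promise<T[]> {
  const out: T[] = [];
  for await (const item of items) out.push(item);
  return out;
}
```

Keeping the fetcher as a parameter separates the token-following logic from HTTP, rate limiting and XML parsing, so it can be exercised against canned pages. A real harvester would also have to handle OAI errors such as `noRecordsMatch`, which the README says the package throws as `OaiError`.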
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "arxiv-api-wrapper",
- "version": "1.1.0",
+ "version": "2.0.0",
  "description": "Provides functions wrapping the arXiv API",
  "keywords": [
  "arxiv"
@@ -20,16 +20,18 @@
  "types": "./src/index.ts",
  "scripts": {
  "test": "vitest run --config tests/vitest.config.mts",
+ "typecheck": "tsc --noEmit",
+ "check": "npm run typecheck && npm run test && npm audit",
  "docs:generate": "typedoc",
  "docs:serve": "npx serve docs"
  },
  "dependencies": {
- "fast-xml-parser": "^4.3.5"
+ "fast-xml-parser": "^5.3.5"
  },
  "devDependencies": {
  "@types/node": "^25.0.0",
- "typedoc": "^0.26.0",
+ "typedoc": "^0.28.17",
  "typescript": "^5.0.0",
- "vitest": "^1.0.0"
+ "vitest": "^4.0.18"
  }
  }
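Both the Atom functions and the OAI functions in the README accept `rateLimit: { tokensPerInterval, intervalMs }`, and the Features list describes the limiter as a token bucket. The TypeScript sketch below shows one common reading of that configuration: the bucket holds at most `tokensPerInterval` tokens, refills continuously at `tokensPerInterval / intervalMs` tokens per millisecond, and each request spends one token. This is an assumption about the semantics, not the package's code; `TokenBucket`, `tryTake` and `take` are hypothetical names, and the clock is injectable so the behavior can be checked without real waiting.

```typescript
// Hypothetical token bucket sketch; the package's own limiter may differ
// (for example, it could refill in whole intervals instead of continuously).
interface RateLimitConfig {
  tokensPerInterval: number; // bucket capacity and tokens earned per interval
  intervalMs: number; // length of one interval in milliseconds
}

class TokenBucket {
  private readonly cfg: RateLimitConfig;
  private readonly now: () => number;
  private tokens: number;
  private last: number;

  constructor(cfg: RateLimitConfig, now: () => number = () => Date.now()) {
    this.cfg = cfg;
    this.now = now;
    this.tokens = cfg.tokensPerInterval; // start full: first request is not delayed
    this.last = now();
  }

  // Credits the tokens earned since the previous call, capped at capacity.
  private refill(): void {
    const t = this.now();
    const earned = ((t - this.last) * this.cfg.tokensPerInterval) / this.cfg.intervalMs;
    this.tokens = Math.min(this.cfg.tokensPerInterval, this.tokens + earned);
    this.last = t;
  }

  // Spends one token and returns 0 if one is available; otherwise spends
  // nothing and returns the milliseconds until the next token is earned.
  tryTake(): number {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    return Math.ceil(((1 - this.tokens) * this.cfg.intervalMs) / this.cfg.tokensPerInterval);
  }

  // Resolves once a token has been spent, sleeping between attempts.
  async take(): Promise<void> {
    for (let wait = this.tryTake(); wait > 0; wait = this.tryTake()) {
      await new Promise<void>((resolve) => setTimeout(resolve, wait));
    }
  }
}
```

With `{ tokensPerInterval: 1, intervalMs: 3000 }`, as in the README examples, the first request goes through immediately and each later one waits until 3 seconds of refill have accumulated. A larger `tokensPerInterval` allows a burst of that many requests, after which requests are spaced `intervalMs / tokensPerInterval` apart.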