paperplain-mcp 1.2.2 → 1.2.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +105 -18
- package/package.json +1 -1
- package/server.js +38 -20
package/README.md
CHANGED
@@ -1,10 +1,25 @@
 # PaperPlain MCP
 
-
+**Web search gives your agent links. PaperPlain gives it science.**
+
+Give any AI agent instant access to 200M+ peer-reviewed papers from PubMed, ArXiv, and Semantic Scholar — structured, verifiable, and ready for reasoning.
 
 **Free. No API key. No account. No backend.**
 
-
+---
+
+## Why not just use web search?
+
+| Web Search | PaperPlain MCP |
+|---|---|
+| Snippets, SEO noise, blogs | Full abstracts, peer-reviewed only |
+| Returns URLs to scrape | Structured JSON ready for reasoning |
+| Can hallucinate or misattribute sources | Real DOIs, real PMIDs — verifiable |
+| Search engines block bots | PubMed/ArXiv/S2 built for programmatic access |
+| No quality signal | Citation counts included |
+| Mixed sources, no routing | Health → PubMed, CS/AI → ArXiv, general → all three |
+
+---
 
 ## Install
 
@@ -34,10 +49,50 @@ Restart your client. That's it.
 - Cursor: `.cursor/mcp.json`
 - Windsurf: `~/.codeium/windsurf/mcp_config.json`
 
+> Note: PaperPlain is a stdio-based MCP. It works with local clients (Claude Desktop, Cursor, Windsurf, VS Code agents). It does not support Claude.ai web chat, which requires remote HTTP-based MCP servers.
+
+---
+
+## Limitations
+
+PaperPlain uses free public APIs — no backend, no cost. The trade-off is rate limits imposed by each source:
+
+- **PubMed** — generous, rarely an issue for normal agent usage
+- **ArXiv** — strict under parallel load; PaperPlain falls back to Semantic Scholar's ARXIV: endpoint automatically
+- **Semantic Scholar** — ~1 req/s unauthenticated; most likely to cause 429s in batch workflows
+
+When a source is rate-limited, `search_research` returns a `warnings` field explaining which source failed and why. `find_paper_by_title` returns a plain-text error the agent can relay to the user.
+
+### Optional: Semantic Scholar API key
+
+For heavy usage (automated research workflows, batch fetches), you can add a free S2 API key to raise the rate limit from ~1 req/s to 100 req/s.
+
+1. Request a key at [semanticscholar.org/product/api](https://www.semanticscholar.org/product/api) (free, approved within a day)
+2. Add it to your MCP config:
+
+```json
+{
+  "mcpServers": {
+    "paperplain": {
+      "command": "npx",
+      "args": ["-y", "paperplain-mcp"],
+      "env": {
+        "S2_API_KEY": "your-key-here"
+      }
+    }
+  }
+}
+```
+
+Zero-config users are unaffected — the key is entirely optional.
+
+---
+
 ## Tools
 
 ### `search_research`
-
+
+Search PubMed, ArXiv, and Semantic Scholar for peer-reviewed papers. Auto-routes based on topic — health queries go to PubMed + S2, CS/AI queries go to ArXiv + S2, everything else hits all three.
 
 ```
 query        Natural language question or topic
@@ -45,35 +100,67 @@ max_results  1–10 papers (default: 5)
 domain       "auto" | "health" | "cs" | "general"
 ```
 
-Returns
+Returns papers with title, authors, abstract, published date, URL, DOI, citation count, and a `source_status` field so your agent knows if any database was unavailable.
 
 ### `fetch_paper`
-
+
+Fetch full metadata and abstract for a specific paper. Supports:
+
+- **ArXiv IDs** — `"2301.07041"`, `"arxiv:2301.07041v2"`, `"https://arxiv.org/abs/2301.07041"`
+- **PubMed IDs** — `"pubmed:37183813"` or just `"37183813"`
+- **DOIs** — `"10.1145/3290605.3300857"` or `"doi:10.1145/3290605.3300857"` (resolved via Semantic Scholar)
+
+Falls back to Semantic Scholar's ARXIV: endpoint when the ArXiv API is rate-limited.
+
+### `find_paper_by_title`
+
+Find a specific paper when you only know its title. Uses Semantic Scholar's title-match search and returns the closest result.
 
 ```
-
+title  Full or partial paper title, e.g. "Attention Is All You Need"
+year   Publication year to narrow the match (optional)
 ```
 
+Useful for verifying a citation or retrieving an abstract when you have no ID or DOI.
+
+---
+
 ## How it works
 
-1.
-2. PaperPlain routes to
-3. Returns structured JSON
-4.
+1. Agent calls `search_research("agentic AI for home energy management")`
+2. PaperPlain classifies the domain (CS/AI) and routes to ArXiv + Semantic Scholar
+3. Returns structured JSON — full abstracts, authors, dates, DOIs, citation counts
+4. Agent's LLM synthesizes findings from the returned context — no black-box summaries
 
-No LLM calls on our side. No cost. No rate limits beyond what PubMed
+No LLM calls on our side. No cost. No rate limits beyond what PubMed, ArXiv, and Semantic Scholar impose.
 
-
+---
 
-
-User: What does the research say about cold exposure and metabolism?
+## Example output
 
-
-
-
-
+```json
+{
+  "query": "transformer architecture energy forecasting",
+  "domain": "cs",
+  "source_status": { "arxiv": "ok", "semanticscholar": "ok" },
+  "total": 5,
+  "papers": [
+    {
+      "id": "arxiv:2306.05042",
+      "source": "arxiv",
+      "title": "Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting",
+      "authors": ["Bryan Lim", "Sercan Arik"],
+      "published": "2023-06-08",
+      "abstract": "...",
+      "url": "https://arxiv.org/abs/2306.05042",
+      "citations": 1423
+    }
+  ]
+}
 ```
 
+---
+
 ## Self-host
 
 ```bash
package/package.json
CHANGED
package/server.js
CHANGED
@@ -14,6 +14,14 @@ const PUBMED_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils";
 const PUBMED_PARAMS = "tool=paperplain&email=hello@paperplain.io";
 const SEMANTIC_SCHOLAR_BASE = "https://api.semanticscholar.org/graph/v1";
 
+// Optional S2 API key — raises rate limits from ~1 req/s to 100 req/s.
+// Get a free key at semanticscholar.org/product/api
+// Set via MCP env config: { "env": { "S2_API_KEY": "your-key" } }
+const S2_API_KEY = process.env.S2_API_KEY || null;
+function s2Options() {
+  return S2_API_KEY ? { headers: { "x-api-key": S2_API_KEY } } : {};
+}
+
 // ── Domain classifier (keyword-based, no LLM needed) ───────────────────────
 // Note: "energy" intentionally excluded from health — it's more common in
 // CS/engineering contexts (energy management, HEMS, smart grid) than health.
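The optional-key addition above can be sketched standalone. In this sketch, `makeS2Options` is a hypothetical rename that takes the key as a parameter instead of reading `process.env`, so the behavior is easy to see without any environment setup:

```javascript
// Standalone sketch of the optional-header pattern from the hunk above.
// makeS2Options(apiKey) is illustrative; the package reads process.env.S2_API_KEY.
function makeS2Options(apiKey) {
  // No key: empty options object, so fetch behaves exactly as before.
  return apiKey ? { headers: { "x-api-key": apiKey } } : {};
}

console.log(JSON.stringify(makeS2Options(null)));       // {}
console.log(JSON.stringify(makeS2Options("demo-key"))); // {"headers":{"x-api-key":"demo-key"}}
```

Spreading the result into fetch options means unauthenticated users get an unchanged request, while key holders get the `x-api-key` header Semantic Scholar expects.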
@@ -71,11 +79,11 @@ function parseArxivXml(xml) {
   return papers;
 }
 
-async function fetchWithTimeout(url, ms = 10000) {
+async function fetchWithTimeout(url, ms = 10000, options = {}) {
   const controller = new AbortController();
   const timer = setTimeout(() => controller.abort(), ms);
   try {
-    return await fetch(url, { signal: controller.signal });
+    return await fetch(url, { signal: controller.signal, ...options });
   } finally {
     clearTimeout(timer);
   }
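A small sketch of the spread semantics the new signature relies on (stand-in values only, no network): because `...options` is applied after `signal`, caller-supplied keys win on conflict, so a caller passing its own `signal` would replace the timeout controller's.

```javascript
// Object-spread order: keys from `options` are applied last and win on conflict,
// the same shape as { signal: controller.signal, ...options } above.
const controllerSignal = "timeout-signal"; // stand-in for controller.signal
const options = { headers: { "x-api-key": "k" } };
const merged = { signal: controllerSignal, ...options };
console.log(merged.signal);                 // timeout-signal
console.log(Object.keys(merged).join(",")); // signal,headers

// A caller-supplied signal clobbers the timeout signal:
const clobbered = { signal: controllerSignal, ...{ signal: "caller-signal" } };
console.log(clobbered.signal);              // caller-signal
```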
@@ -125,7 +133,8 @@ async function fetchS2ByArxivId(arxivId) {
   try {
     const fields = "title,abstract,authors,year,citationCount,openAccessPdf,externalIds";
     const res = await fetchWithTimeout(
-      `${SEMANTIC_SCHOLAR_BASE}/paper/ARXIV:${encodeURIComponent(clean)}?fields=${fields}
+      `${SEMANTIC_SCHOLAR_BASE}/paper/ARXIV:${encodeURIComponent(clean)}?fields=${fields}`,
+      10000, s2Options()
     );
     if (!res.ok) return null;
     const item = await res.json().catch(() => null);
@@ -218,15 +227,11 @@ async function searchSemanticScholar(query, maxResults) {
   try {
     const fields = "title,abstract,authors,year,citationCount,openAccessPdf,externalIds";
     const url = `${SEMANTIC_SCHOLAR_BASE}/paper/search?query=${encodeURIComponent(query)}&limit=${maxResults}&fields=${fields}`;
-    const
-
-
-
-      response = await fetch(url, { signal: controller.signal });
-    } finally {
-      clearTimeout(timeout);
+    const response = await fetchWithTimeout(url, 10000, s2Options());
+    if (!response.ok) {
+      if (response.status === 429) throw new Error("S2_RATE_LIMITED");
+      return [];
     }
-    if (!response.ok) return [];
     const data = await response.json().catch(() => null);
     if (!data?.data) return [];
     return data.data
@@ -254,7 +259,8 @@ async function searchSemanticScholar(query, maxResults) {
       })
       .filter(Boolean)
       .sort((a, b) => b.citations - a.citations);
-  } catch {
+  } catch (err) {
+    if (err.message === "S2_RATE_LIMITED") throw err;
     return [];
   }
 }
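The rethrow above is a sentinel-error pattern: a 429 becomes a named error that escapes the function's catch-all, while every other failure still degrades to an empty list. A minimal sketch, with a hypothetical `handleStatus` standing in for the real fetch path:

```javascript
// Sentinel-error sketch: only the named error escapes the catch-all.
function handleStatus(status) {
  try {
    if (status === 429) throw new Error("S2_RATE_LIMITED");
    if (status !== 200) return [];
    return ["paper"];
  } catch (err) {
    if (err.message === "S2_RATE_LIMITED") throw err; // let the caller see the 429
    return [];                                        // anything else degrades quietly
  }
}

console.log(JSON.stringify(handleStatus(200))); // ["paper"]
console.log(JSON.stringify(handleStatus(500))); // []
try { handleStatus(429); } catch (e) { console.log(e.message); } // S2_RATE_LIMITED
```

This is what lets the caller distinguish "rate-limited" from a generic failure and report it in `source_status`.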
@@ -262,7 +268,7 @@ async function searchSemanticScholar(query, maxResults) {
 // ── MCP Server ─────────────────────────────────────────────────────────────
 const server = new McpServer({
   name: "paperplain",
-  version: "1.2.
+  version: "1.2.4",
   description:
     "Search 200M+ peer-reviewed papers from PubMed, ArXiv, and Semantic Scholar. Returns papers with full abstracts — use your own model to synthesize findings.",
 });
@@ -320,7 +326,10 @@ Use the returned abstracts to synthesize findings, answer the user's question, o
       const r = await searchSemanticScholar(q, n);
       sourceStatus.semanticscholar = r.length ? "ok" : "empty";
       return r;
-    } catch {
+    } catch (err) {
+      sourceStatus.semanticscholar = err.message === "S2_RATE_LIMITED" ? "rate_limited" : "error";
+      return [];
+    }
   }
 
   try {
@@ -338,8 +347,9 @@ Use the returned abstracts to synthesize findings, answer the user's question, o
       safeS2(query, Math.ceil(max_results / 2)),
     ]);
     const maxArxiv = Math.ceil(max_results * 0.6);
-
-    const
+    // Deduplicate on URL — S2 uses arxiv.org URLs for arXiv papers, matching exactly
+    const arxivUrls = new Set(arxiv.map((p) => p.url));
+    const uniqueS2 = s2.filter((p) => !arxivUrls.has(p.url));
     papers = [
       ...arxiv.slice(0, maxArxiv),
       ...uniqueS2.slice(0, max_results - Math.min(arxiv.length, maxArxiv)),
@@ -350,12 +360,15 @@ Use the returned abstracts to synthesize findings, answer the user's question, o
       safePubMed(query, max_results),
       safeS2(query, Math.ceil(max_results / 2)),
     ]);
+    // Deduplicate S2 against both ArXiv and PubMed URLs
+    const seenUrls = new Set([...arxiv.map((p) => p.url), ...pubmed.map((p) => p.url)]);
+    const uniqueS2 = s2.filter((p) => !seenUrls.has(p.url));
     const maxEach = Math.floor(max_results / 3);
     const remainder = max_results - maxEach * 3;
     papers = [
       ...arxiv.slice(0, maxEach + remainder),
       ...pubmed.slice(0, maxEach),
-      ...
+      ...uniqueS2.slice(0, maxEach),
     ].slice(0, max_results);
   }
 
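The dedup-and-split logic in this hunk can be checked with toy data (the URLs below are illustrative, not real results): for `max_results = 5`, `maxEach = 1` and `remainder = 2`, so ArXiv may fill up to three slots while PubMed and the deduplicated S2 list contribute one each.

```javascript
// Toy check of the URL dedup + three-way split (illustrative data).
const arxiv  = [{ url: "https://arxiv.org/abs/2306.05042" }];
const pubmed = [{ url: "https://pubmed.ncbi.nlm.nih.gov/37183813/" }];
const s2 = [
  { url: "https://arxiv.org/abs/2306.05042" }, // duplicate of the ArXiv hit
  { url: "https://example.org/unique-paper" },
];
const max_results = 5;

const seenUrls = new Set([...arxiv.map((p) => p.url), ...pubmed.map((p) => p.url)]);
const uniqueS2 = s2.filter((p) => !seenUrls.has(p.url));

const maxEach = Math.floor(max_results / 3);  // 1
const remainder = max_results - maxEach * 3;  // 2
const papers = [
  ...arxiv.slice(0, maxEach + remainder),     // ArXiv may take up to 3 slots
  ...pubmed.slice(0, maxEach),
  ...uniqueS2.slice(0, maxEach),
].slice(0, max_results);

console.log(uniqueS2.length); // 1 (duplicate arXiv URL dropped)
console.log(papers.length);   // 3
```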
@@ -368,6 +381,7 @@ Use the returned abstracts to synthesize findings, answer the user's question, o
       : ["arxiv", "pubmed", "semanticscholar"];
   for (const src of expectedSources) {
     if (sourceStatus[src] === "empty") warnings.push(`${src}: returned 0 results (API may be rate-limited or query too specific)`);
+    if (sourceStatus[src] === "rate_limited") warnings.push(`${src}: rate-limited (429) — wait 60s and retry, or add S2_API_KEY to your MCP env config for higher limits`);
     if (sourceStatus[src] === "error") warnings.push(`${src}: request failed (API may be temporarily unavailable)`);
   }
 
@@ -415,7 +429,8 @@ async function fetchS2ByDoi(doi) {
   const clean = doi.replace(/^doi:/i, "").trim();
   const fields = "title,abstract,authors,year,citationCount,openAccessPdf,externalIds";
   const res = await fetchWithTimeout(
-    `${SEMANTIC_SCHOLAR_BASE}/paper/DOI:${encodeURIComponent(clean)}?fields=${fields}
+    `${SEMANTIC_SCHOLAR_BASE}/paper/DOI:${encodeURIComponent(clean)}?fields=${fields}`,
+    10000, s2Options()
   );
   if (!res.ok) return null;
   const item = await res.json().catch(() => null);
@@ -544,10 +559,13 @@ Useful for verifying a citation or retrieving abstract details for a paper you a
   try {
     const fields = "title,abstract,authors,year,citationCount,openAccessPdf,externalIds";
     const url = `${SEMANTIC_SCHOLAR_BASE}/paper/search?query=${encodeURIComponent(title)}&limit=5&fields=${fields}`;
-    const res = await fetchWithTimeout(url);
+    const res = await fetchWithTimeout(url, 10000, s2Options());
     if (!res.ok) {
+      const msg = res.status === 429
+        ? `Rate limited by Semantic Scholar (429). Wait 60 seconds and retry. To avoid this for batch workflows, add S2_API_KEY to your MCP env config.`
+        : `Search failed: Semantic Scholar returned ${res.status}`;
       return {
-        content: [{ type: "text", text:
+        content: [{ type: "text", text: msg }],
         isError: true,
       };
     }