paperplain-mcp 1.2.3 → 1.2.4

Files changed (3)
  1. package/README.md +105 -18
  2. package/package.json +1 -1
  3. package/server.js +31 -17
package/README.md CHANGED
@@ -1,10 +1,25 @@
  # PaperPlain MCP
 
- Give any AI agent access to 200M+ peer-reviewed papers from PubMed, ArXiv, and Semantic Scholar.
+ **Web search gives your agent links. PaperPlain gives it science.**
+
+ Give any AI agent instant access to 200M+ peer-reviewed papers from PubMed, ArXiv, and Semantic Scholar — structured, verifiable, and ready for reasoning.
 
  **Free. No API key. No account. No backend.**
 
- The MCP calls PubMed and ArXiv directly and returns papers with full abstracts. Your agent's own LLM synthesizes the findings — no black-box summaries, no extra cost, full context.
+ ---
+
+ ## Why not just use web search?
+
+ | Web Search | PaperPlain MCP |
+ |---|---|
+ | Snippets, SEO noise, blogs | Full abstracts, peer-reviewed only |
+ | Returns URLs to scrape | Structured JSON ready for reasoning |
+ | Can hallucinate or misattribute sources | Real DOIs, real PMIDs — verifiable |
+ | Search engines block bots | PubMed/ArXiv/S2 built for programmatic access |
+ | No quality signal | Citation counts included |
+ | Mixed sources, no routing | Health → PubMed, CS/AI → ArXiv, general → all three |
+
+ ---
 
  ## Install
 
@@ -34,10 +49,50 @@ Restart your client. That's it.
  - Cursor: `.cursor/mcp.json`
  - Windsurf: `~/.codeium/windsurf/mcp_config.json`
 
+ > Note: PaperPlain is a stdio-based MCP. It works with local clients (Claude Desktop, Cursor, Windsurf, VS Code agents). It does not support Claude.ai web chat, which requires remote HTTP-based MCP servers.
+
+ ---
+
+ ## Limitations
+
+ PaperPlain uses free public APIs — no backend, no cost. The trade-off is rate limits imposed by each source:
+
+ - **PubMed** — generous, rarely an issue for normal agent usage
+ - **ArXiv** — strict under parallel load; PaperPlain falls back to Semantic Scholar's ARXIV: endpoint automatically
+ - **Semantic Scholar** — ~1 req/s unauthenticated; most likely to cause 429s in batch workflows
+
+ When a source is rate-limited, `search_research` returns a `warnings` field explaining which source failed and why. `find_paper_by_title` returns a plain-text error the agent can relay to the user.
+
+ ### Optional: Semantic Scholar API key
+
+ For heavy usage (automated research workflows, batch fetches), you can add a free S2 API key to raise the rate limit from ~1 req/s to 100 req/s.
+
+ 1. Request a key at [semanticscholar.org/product/api](https://www.semanticscholar.org/product/api) (free, approved within a day)
+ 2. Add it to your MCP config:
+
+ ```json
+ {
+   "mcpServers": {
+     "paperplain": {
+       "command": "npx",
+       "args": ["-y", "paperplain-mcp"],
+       "env": {
+         "S2_API_KEY": "your-key-here"
+       }
+     }
+   }
+ }
+ ```
+
+ Zero-config users are unaffected — the key is entirely optional.
+
+ ---
+
  ## Tools
 
  ### `search_research`
- Search PubMed, ArXiv, and Semantic Scholar for peer-reviewed papers. Auto-routes based on topic (health → PubMed + S2, CS/AI → ArXiv + S2, general → all three).
+
+ Search PubMed, ArXiv, and Semantic Scholar for peer-reviewed papers. Auto-routes based on topic — health queries go to PubMed + S2, CS/AI queries go to ArXiv + S2, everything else hits all three.
 
  ```
  query Natural language question or topic
@@ -45,35 +100,67 @@ max_results 1–10 papers (default: 5)
  domain "auto" | "health" | "cs" | "general"
  ```
 
- Returns: array of papers with title, authors, abstract, published date, URL, DOI.
+ Returns papers with title, authors, abstract, published date, URL, DOI, citation count, and a `source_status` field so your agent knows if any database was unavailable.
 
  ### `fetch_paper`
- Fetch full metadata and abstract for a specific paper by ID.
+
+ Fetch full metadata and abstract for a specific paper. Supports:
+
+ - **ArXiv IDs** — `"2301.07041"`, `"arxiv:2301.07041v2"`, `"https://arxiv.org/abs/2301.07041"`
+ - **PubMed IDs** — `"pubmed:37183813"` or just `"37183813"`
+ - **DOIs** — `"10.1145/3290605.3300857"` or `"doi:10.1145/3290605.3300857"` (resolved via Semantic Scholar)
+
+ Falls back to Semantic Scholar's ARXIV: endpoint when the ArXiv API is rate-limited.
+
+ ### `find_paper_by_title`
+
+ Find a specific paper when you only know its title. Uses Semantic Scholar's title-match search and returns the closest result.
 
  ```
- paper_id ArXiv ID ("2301.07041") or PubMed ID ("pubmed:37183813")
+ title Full or partial paper title, e.g. "Attention Is All You Need"
+ year Publication year to narrow the match (optional)
  ```
 
+ Useful for verifying a citation or retrieving an abstract when you have no ID or DOI.
+
+ ---
+
  ## How it works
 
- 1. Your agent calls `search_research("effects of sleep deprivation on memory")`
- 2. PaperPlain routes to PubMed + Semantic Scholar (health topic), fetches abstracts
- 3. Returns structured JSON with papers and full abstracts
- 4. Your agent's LLM synthesizes findings using its full context
+ 1. Agent calls `search_research("agentic AI for home energy management")`
+ 2. PaperPlain classifies the domain (CS/AI) and routes to ArXiv + Semantic Scholar
+ 3. Returns structured JSON full abstracts, authors, dates, DOIs, citation counts
+ 4. Agent's LLM synthesizes findings from the returned context — no black-box summaries
 
- No LLM calls on our side. No cost. No rate limits beyond what PubMed/ArXiv impose.
+ No LLM calls on our side. No cost. No rate limits beyond what PubMed, ArXiv, and Semantic Scholar impose.
 
- ## Example
+ ---
 
- ```
- User: What does the research say about cold exposure and metabolism?
+ ## Example output
 
- Agent calls: search_research("cold exposure brown adipose tissue metabolism")
- → Returns 5 PubMed papers with abstracts
- → Agent synthesizes: "Three RCTs found that regular cold water immersion (14°C,
- 1hr/week for 6 weeks) increased brown adipose tissue activity by 37-42%..."
+ ```json
+ {
+   "query": "transformer architecture energy forecasting",
+   "domain": "cs",
+   "source_status": { "arxiv": "ok", "semanticscholar": "ok" },
+   "total": 5,
+   "papers": [
+     {
+       "id": "arxiv:2306.05042",
+       "source": "arxiv",
+       "title": "Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting",
+       "authors": ["Bryan Lim", "Sercan Arik"],
+       "published": "2023-06-08",
+       "abstract": "...",
+       "url": "https://arxiv.org/abs/2306.05042",
+       "citations": 1423
+     }
+   ]
+ }
  ```
 
+ ---
+
  ## Self-host
 
  ```bash
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "paperplain-mcp",
-   "version": "1.2.3",
+   "version": "1.2.4",
    "description": "MCP server — search 200M+ peer-reviewed papers from PubMed, ArXiv, and Semantic Scholar. Free. No API key.",
    "type": "module",
    "bin": {
package/server.js CHANGED
@@ -14,6 +14,14 @@ const PUBMED_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils";
  const PUBMED_PARAMS = "tool=paperplain&email=hello@paperplain.io";
  const SEMANTIC_SCHOLAR_BASE = "https://api.semanticscholar.org/graph/v1";
 
+ // Optional S2 API key — raises rate limits from ~1 req/s to 100 req/s.
+ // Get a free key at semanticscholar.org/product/api
+ // Set via MCP env config: { "env": { "S2_API_KEY": "your-key" } }
+ const S2_API_KEY = process.env.S2_API_KEY || null;
+ function s2Options() {
+   return S2_API_KEY ? { headers: { "x-api-key": S2_API_KEY } } : {};
+ }
+
  // ── Domain classifier (keyword-based, no LLM needed) ───────────────────────
  // Note: "energy" intentionally excluded from health — it's more common in
  // CS/engineering contexts (energy management, HEMS, smart grid) than health.
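
The optional-key helper added in this hunk can be exercised in isolation. The following is a minimal sketch, not the package's code: `buildS2Options` is a hypothetical name, and it takes the environment as an argument (an assumption made so the behaviour is easy to check) instead of reading `process.env` once at module load.

```javascript
// Hypothetical, testable variant of the s2Options() helper from the diff.
// It returns fetch options carrying the x-api-key header only when a key is set.
function buildS2Options(env) {
  // `||` treats an empty string the same as a missing key.
  const key = env.S2_API_KEY || null;
  return key ? { headers: { "x-api-key": key } } : {};
}

// Zero-config case: no headers are added.
console.log(Object.keys(buildS2Options({})).length); // 0

// Keyed case: the header is attached to every Semantic Scholar request.
console.log(buildS2Options({ S2_API_KEY: "demo-key" }).headers["x-api-key"]); // demo-key
```

Returning an empty object in the zero-config case means callers can pass the result to the fetch wrapper unconditionally.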
@@ -71,11 +79,11 @@ function parseArxivXml(xml) {
    return papers;
  }
 
- async function fetchWithTimeout(url, ms = 10000) {
+ async function fetchWithTimeout(url, ms = 10000, options = {}) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), ms);
    try {
-     return await fetch(url, { signal: controller.signal });
+     return await fetch(url, { signal: controller.signal, ...options });
    } finally {
      clearTimeout(timer);
    }
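
To see how the new `options` parameter merges with the timeout signal, here is a small sketch of the wrapper. The fourth parameter, `fetchImpl`, is hypothetical and exists only so the example runs without network access; the package's function takes three arguments.

```javascript
// Sketch of the 1.2.4 timeout wrapper. `fetchImpl` is a hypothetical extra
// parameter (not in the package) that lets a stand-in replace the real fetch.
async function fetchWithTimeout(url, ms = 10000, options = {}, fetchImpl = globalThis.fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    // `options` is spread last, so a caller-supplied `signal` would replace
    // the timeout signal. Headers and other keys are simply added.
    return await fetchImpl(url, { signal: controller.signal, ...options });
  } finally {
    // Always clear the timer so a finished request does not keep it pending.
    clearTimeout(timer);
  }
}

// Stand-in for fetch that echoes back the init object it received.
const echoFetch = async (url, init) => ({ ok: true, status: 200, init });

fetchWithTimeout("https://example.invalid/paper", 1000, { headers: { "x-api-key": "demo" } }, echoFetch)
  .then((res) => {
    // Both the abort signal and the caller's headers reach the request.
    console.log(Object.keys(res.init));
  });
```

The spread order is worth keeping in mind: passing a `signal` in `options` would silently disable the timeout in this sketch.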
@@ -125,7 +133,8 @@ async function fetchS2ByArxivId(arxivId) {
    try {
      const fields = "title,abstract,authors,year,citationCount,openAccessPdf,externalIds";
      const res = await fetchWithTimeout(
-       `${SEMANTIC_SCHOLAR_BASE}/paper/ARXIV:${encodeURIComponent(clean)}?fields=${fields}`
+       `${SEMANTIC_SCHOLAR_BASE}/paper/ARXIV:${encodeURIComponent(clean)}?fields=${fields}`,
+       10000, s2Options()
      );
      if (!res.ok) return null;
      const item = await res.json().catch(() => null);
@@ -218,15 +227,11 @@ async function searchSemanticScholar(query, maxResults) {
    try {
      const fields = "title,abstract,authors,year,citationCount,openAccessPdf,externalIds";
      const url = `${SEMANTIC_SCHOLAR_BASE}/paper/search?query=${encodeURIComponent(query)}&limit=${maxResults}&fields=${fields}`;
-     const controller = new AbortController();
-     const timeout = setTimeout(() => controller.abort(), 10000);
-     let response;
-     try {
-       response = await fetch(url, { signal: controller.signal });
-     } finally {
-       clearTimeout(timeout);
+     const response = await fetchWithTimeout(url, 10000, s2Options());
+     if (!response.ok) {
+       if (response.status === 429) throw new Error("S2_RATE_LIMITED");
+       return [];
      }
-     if (!response.ok) return [];
      const data = await response.json().catch(() => null);
      if (!data?.data) return [];
      return data.data
@@ -254,7 +259,8 @@ async function searchSemanticScholar(query, maxResults) {
        })
        .filter(Boolean)
        .sort((a, b) => b.citations - a.citations);
-   } catch {
+   } catch (err) {
+     if (err.message === "S2_RATE_LIMITED") throw err;
      return [];
    }
  }
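
The two `searchSemanticScholar` hunks above introduce a sentinel error for HTTP 429 that is re-thrown while every other failure still degrades to an empty list. The sketch below isolates that pattern. `searchWithRateLimitSignal`, `statusFor`, and the injected `doFetch` are hypothetical names used for illustration, not the package's API.

```javascript
// Sketch: surface 429 as a sentinel error; all other failures yield [].
// `doFetch` is a hypothetical injected function standing in for the S2 request.
async function searchWithRateLimitSignal(doFetch) {
  try {
    const response = await doFetch();
    if (!response.ok) {
      if (response.status === 429) throw new Error("S2_RATE_LIMITED");
      return [];
    }
    const data = await response.json().catch(() => null);
    return data?.data ?? [];
  } catch (err) {
    // Re-throw only the sentinel; network errors and timeouts still return [].
    if (err.message === "S2_RATE_LIMITED") throw err;
    return [];
  }
}

// Caller side: translate the outcome into a per-source status string.
async function statusFor(doFetch) {
  try {
    const results = await searchWithRateLimitSignal(doFetch);
    return results.length ? "ok" : "empty";
  } catch (err) {
    return err.message === "S2_RATE_LIMITED" ? "rate_limited" : "error";
  }
}

statusFor(async () => ({ ok: false, status: 429 })).then((s) => console.log(s)); // rate_limited
statusFor(async () => ({ ok: false, status: 500 })).then((s) => console.log(s)); // empty
```

One consequence visible in this sketch: because the inner function swallows everything except the sentinel, a 500 response or a network failure is reported as `"empty"` rather than `"error"`.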
@@ -262,7 +268,7 @@ async function searchSemanticScholar(query, maxResults) {
  // ── MCP Server ─────────────────────────────────────────────────────────────
  const server = new McpServer({
    name: "paperplain",
-   version: "1.2.3",
+   version: "1.2.4",
    description:
      "Search 200M+ peer-reviewed papers from PubMed, ArXiv, and Semantic Scholar. Returns papers with full abstracts — use your own model to synthesize findings.",
  });
@@ -320,7 +326,10 @@ Use the returned abstracts to synthesize findings, answer the user's question, o
        const r = await searchSemanticScholar(q, n);
        sourceStatus.semanticscholar = r.length ? "ok" : "empty";
        return r;
-     } catch { sourceStatus.semanticscholar = "error"; return []; }
+     } catch (err) {
+       sourceStatus.semanticscholar = err.message === "S2_RATE_LIMITED" ? "rate_limited" : "error";
+       return [];
+     }
    }
 
    try {
@@ -372,6 +381,7 @@ Use the returned abstracts to synthesize findings, answer the user's question, o
      : ["arxiv", "pubmed", "semanticscholar"];
    for (const src of expectedSources) {
      if (sourceStatus[src] === "empty") warnings.push(`${src}: returned 0 results (API may be rate-limited or query too specific)`);
+     if (sourceStatus[src] === "rate_limited") warnings.push(`${src}: rate-limited (429) — wait 60s and retry, or add S2_API_KEY to your MCP env config for higher limits`);
      if (sourceStatus[src] === "error") warnings.push(`${src}: request failed (API may be temporarily unavailable)`);
    }
 
@@ -419,7 +429,8 @@ async function fetchS2ByDoi(doi) {
    const clean = doi.replace(/^doi:/i, "").trim();
    const fields = "title,abstract,authors,year,citationCount,openAccessPdf,externalIds";
    const res = await fetchWithTimeout(
-     `${SEMANTIC_SCHOLAR_BASE}/paper/DOI:${encodeURIComponent(clean)}?fields=${fields}`
+     `${SEMANTIC_SCHOLAR_BASE}/paper/DOI:${encodeURIComponent(clean)}?fields=${fields}`,
+     10000, s2Options()
    );
    if (!res.ok) return null;
    const item = await res.json().catch(() => null);
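
The README in this release lists the identifier forms `fetch_paper` accepts, and the hunk above strips a `doi:` prefix before calling Semantic Scholar. The diff does not show how IDs are told apart, so the following is only an illustrative sketch: `classifyPaperId` and its regular expressions are hypothetical and may differ from the package's actual parsing.

```javascript
// Hypothetical classifier for the ID forms listed in the README.
// Old-style arXiv IDs such as "hep-th/9901001" are not handled here.
function classifyPaperId(raw) {
  const id = raw.trim();
  // DOIs: explicit "doi:" prefix, or the registrant form "10.NNNN/...".
  if (/^doi:/i.test(id) || /^10\.\d{4,9}\//.test(id)) {
    return { kind: "doi", value: id.replace(/^doi:/i, "").trim() };
  }
  // ArXiv: bare ID, "arxiv:" prefix, or abs URL, with an optional version suffix.
  const arxiv = id.match(/^(?:arxiv:|https?:\/\/arxiv\.org\/abs\/)?(\d{4}\.\d{4,5})(v\d+)?$/i);
  if (arxiv) return { kind: "arxiv", value: arxiv[1] + (arxiv[2] || "") };
  // PubMed: "pubmed:" prefix or a bare run of digits.
  const pubmed = id.match(/^(?:pubmed:)?(\d+)$/i);
  if (pubmed) return { kind: "pubmed", value: pubmed[1] };
  return { kind: "unknown", value: id };
}

for (const id of ["2301.07041", "arxiv:2301.07041v2", "pubmed:37183813", "doi:10.1145/3290605.3300857"]) {
  console.log(id, "->", classifyPaperId(id).kind);
}
```

Checking the DOI pattern first matters in this sketch, because a DOI contains digits and dots that looser arXiv or PubMed patterns could otherwise match.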
@@ -548,10 +559,13 @@ Useful for verifying a citation or retrieving abstract details for a paper you a
  try {
    const fields = "title,abstract,authors,year,citationCount,openAccessPdf,externalIds";
    const url = `${SEMANTIC_SCHOLAR_BASE}/paper/search?query=${encodeURIComponent(title)}&limit=5&fields=${fields}`;
-   const res = await fetchWithTimeout(url);
+   const res = await fetchWithTimeout(url, 10000, s2Options());
    if (!res.ok) {
+     const msg = res.status === 429
+       ? `Rate limited by Semantic Scholar (429). Wait 60 seconds and retry. To avoid this for batch workflows, add S2_API_KEY to your MCP env config.`
+       : `Search failed: Semantic Scholar returned ${res.status}`;
      return {
-       content: [{ type: "text", text: msg }],
+       content: [{ type: "text", text: msg }],
        isError: true,
      };
    }
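
This release reports rate limiting in two places: the `warnings` array of `search_research` and the plain-text error of `find_paper_by_title`. As a reading aid, the per-source warning logic can be sketched as a lookup table. `STATUS_MESSAGES` and `buildWarnings` are hypothetical names, and the rate-limit wording is shortened from the diff.

```javascript
// Hypothetical table-driven variant of the status-to-warning mapping.
// A Map is used so unexpected status strings never hit Object.prototype keys.
const STATUS_MESSAGES = new Map([
  ["empty", "returned 0 results (API may be rate-limited or query too specific)"],
  ["rate_limited", "rate-limited (429); wait 60s and retry, or add S2_API_KEY for higher limits"],
  ["error", "request failed (API may be temporarily unavailable)"],
]);

function buildWarnings(sourceStatus, expectedSources) {
  const warnings = [];
  for (const src of expectedSources) {
    // "ok" and unknown statuses have no entry, so they produce no warning.
    const message = STATUS_MESSAGES.get(sourceStatus[src]);
    if (message) warnings.push(`${src}: ${message}`);
  }
  return warnings;
}

const demoWarnings = buildWarnings(
  { arxiv: "ok", semanticscholar: "rate_limited" },
  ["arxiv", "semanticscholar"]
);
console.log(demoWarnings.length); // 1
console.log(demoWarnings[0].startsWith("semanticscholar: rate-limited")); // true
```

A table keeps the three messages in one place, which may help if more statuses are added later.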