paperplain-mcp 1.2.2 → 1.2.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +105 -18
- package/package.json +1 -1
- package/server.js +38 -20
package/README.md
CHANGED
@@ -1,10 +1,25 @@
 # PaperPlain MCP
 
-
+**Web search gives your agent links. PaperPlain gives it science.**
+
+Give any AI agent instant access to 200M+ peer-reviewed papers from PubMed, ArXiv, and Semantic Scholar — structured, verifiable, and ready for reasoning.
 
 **Free. No API key. No account. No backend.**
 
-
+---
+
+## Why not just use web search?
+
+| Web Search | PaperPlain MCP |
+|---|---|
+| Snippets, SEO noise, blogs | Full abstracts, peer-reviewed only |
+| Returns URLs to scrape | Structured JSON ready for reasoning |
+| Can hallucinate or misattribute sources | Real DOIs, real PMIDs — verifiable |
+| Search engines block bots | PubMed/ArXiv/S2 built for programmatic access |
+| No quality signal | Citation counts included |
+| Mixed sources, no routing | Health → PubMed, CS/AI → ArXiv, general → all three |
+
+---
 
 ## Install
 
@@ -34,10 +49,50 @@ Restart your client. That's it.
 - Cursor: `.cursor/mcp.json`
 - Windsurf: `~/.codeium/windsurf/mcp_config.json`
 
+> Note: PaperPlain is a stdio-based MCP. It works with local clients (Claude Desktop, Cursor, Windsurf, VS Code agents). It does not support Claude.ai web chat, which requires remote HTTP-based MCP servers.
+
+---
+
+## Limitations
+
+PaperPlain uses free public APIs — no backend, no cost. The trade-off is rate limits imposed by each source:
+
+- **PubMed** — generous, rarely an issue for normal agent usage
+- **ArXiv** — strict under parallel load; PaperPlain falls back to Semantic Scholar's ARXIV: endpoint automatically
+- **Semantic Scholar** — ~1 req/s unauthenticated; most likely to cause 429s in batch workflows
+
+When a source is rate-limited, `search_research` returns a `warnings` field explaining which source failed and why. `find_paper_by_title` returns a plain-text error the agent can relay to the user.
+
+### Optional: Semantic Scholar API key
+
+For heavy usage (automated research workflows, batch fetches), you can add a free S2 API key to raise the rate limit from ~1 req/s to 100 req/s.
+
+1. Request a key at [semanticscholar.org/product/api](https://www.semanticscholar.org/product/api) (free, approved within a day)
+2. Add it to your MCP config:
+
+```json
+{
+  "mcpServers": {
+    "paperplain": {
+      "command": "npx",
+      "args": ["-y", "paperplain-mcp"],
+      "env": {
+        "S2_API_KEY": "your-key-here"
+      }
+    }
+  }
+}
+```
+
+Zero-config users are unaffected — the key is entirely optional.
+
+---
+
 ## Tools
 
 ### `search_research`
-
+
+Search PubMed, ArXiv, and Semantic Scholar for peer-reviewed papers. Auto-routes based on topic — health queries go to PubMed + S2, CS/AI queries go to ArXiv + S2, everything else hits all three.
 
 ```
 query        Natural language question or topic
@@ -45,35 +100,67 @@ max_results  1–10 papers (default: 5)
 domain       "auto" | "health" | "cs" | "general"
 ```
 
-Returns
+Returns papers with title, authors, abstract, published date, URL, DOI, citation count, and a `source_status` field so your agent knows if any database was unavailable.
 
 ### `fetch_paper`
-
+
+Fetch full metadata and abstract for a specific paper. Supports:
+
+- **ArXiv IDs** — `"2301.07041"`, `"arxiv:2301.07041v2"`, `"https://arxiv.org/abs/2301.07041"`
+- **PubMed IDs** — `"pubmed:37183813"` or just `"37183813"`
+- **DOIs** — `"10.1145/3290605.3300857"` or `"doi:10.1145/3290605.3300857"` (resolved via Semantic Scholar)
+
+Falls back to Semantic Scholar's ARXIV: endpoint when the ArXiv API is rate-limited.
+
+### `find_paper_by_title`
+
+Find a specific paper when you only know its title. Uses Semantic Scholar's title-match search and returns the closest result.
 
 ```
-
+title  Full or partial paper title, e.g. "Attention Is All You Need"
+year   Publication year to narrow the match (optional)
 ```
 
+Useful for verifying a citation or retrieving an abstract when you have no ID or DOI.
+
+---
+
 ## How it works
 
-1.
-2. PaperPlain routes to
-3. Returns structured JSON
-4.
+1. Agent calls `search_research("agentic AI for home energy management")`
+2. PaperPlain classifies the domain (CS/AI) and routes to ArXiv + Semantic Scholar
+3. Returns structured JSON — full abstracts, authors, dates, DOIs, citation counts
+4. Agent's LLM synthesizes findings from the returned context — no black-box summaries
 
-No LLM calls on our side. No cost. No rate limits beyond what PubMed
+No LLM calls on our side. No cost. No rate limits beyond what PubMed, ArXiv, and Semantic Scholar impose.
 
-
+---
 
-
-User: What does the research say about cold exposure and metabolism?
+## Example output
 
-
-
-
-
+```json
+{
+  "query": "transformer architecture energy forecasting",
+  "domain": "cs",
+  "source_status": { "arxiv": "ok", "semanticscholar": "ok" },
+  "total": 5,
+  "papers": [
+    {
+      "id": "arxiv:2306.05042",
+      "source": "arxiv",
+      "title": "Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting",
+      "authors": ["Bryan Lim", "Sercan Arik"],
+      "published": "2023-06-08",
+      "abstract": "...",
+      "url": "https://arxiv.org/abs/2306.05042",
+      "citations": 1423
+    }
+  ]
+}
 ```
 
+---
+
 ## Self-host
 
 ```bash
package/package.json
CHANGED
package/server.js
CHANGED
@@ -14,6 +14,14 @@ const PUBMED_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils";
 const PUBMED_PARAMS = "tool=paperplain&email=hello@paperplain.io";
 const SEMANTIC_SCHOLAR_BASE = "https://api.semanticscholar.org/graph/v1";
 
+// Optional S2 API key — raises rate limits from ~1 req/s to 100 req/s.
+// Get a free key at semanticscholar.org/product/api
+// Set via MCP env config: { "env": { "S2_API_KEY": "your-key" } }
+const S2_API_KEY = process.env.S2_API_KEY || null;
+function s2Options() {
+  return S2_API_KEY ? { headers: { "x-api-key": S2_API_KEY } } : {};
+}
+
 // ── Domain classifier (keyword-based, no LLM needed) ───────────────────────
 // Note: "energy" intentionally excluded from health — it's more common in
 // CS/engineering contexts (energy management, HEMS, smart grid) than health.
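The optional-key addition above can be sketched standalone. In this sketch, `makeS2Options` is a hypothetical rename that takes the key as a parameter instead of reading `process.env`, so the behavior is easy to see without any environment setup:

```javascript
// Standalone sketch of the optional-header pattern from the hunk above.
// makeS2Options(apiKey) is illustrative; the package reads process.env.S2_API_KEY.
function makeS2Options(apiKey) {
  // No key: empty options object, so fetch behaves exactly as before.
  return apiKey ? { headers: { "x-api-key": apiKey } } : {};
}

console.log(JSON.stringify(makeS2Options(null)));       // {}
console.log(JSON.stringify(makeS2Options("demo-key"))); // {"headers":{"x-api-key":"demo-key"}}
```

Spreading the result into fetch options means unauthenticated users get an unchanged request, while key holders get the `x-api-key` header Semantic Scholar expects.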
@@ -71,11 +79,11 @@ function parseArxivXml(xml) {
   return papers;
 }
 
-async function fetchWithTimeout(url, ms = 10000) {
+async function fetchWithTimeout(url, ms = 10000, options = {}) {
   const controller = new AbortController();
   const timer = setTimeout(() => controller.abort(), ms);
   try {
-    return await fetch(url, { signal: controller.signal });
+    return await fetch(url, { signal: controller.signal, ...options });
   } finally {
     clearTimeout(timer);
   }
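A small sketch of the spread semantics the new signature relies on (stand-in values only, no network): because `...options` is applied after `signal`, caller-supplied keys win on conflict, so a caller passing its own `signal` would replace the timeout controller's.

```javascript
// Object-spread order: keys from `options` are applied last and win on conflict,
// the same shape as { signal: controller.signal, ...options } above.
const controllerSignal = "timeout-signal"; // stand-in for controller.signal
const options = { headers: { "x-api-key": "k" } };
const merged = { signal: controllerSignal, ...options };
console.log(merged.signal);                 // timeout-signal
console.log(Object.keys(merged).join(",")); // signal,headers

// A caller-supplied signal clobbers the timeout signal:
const clobbered = { signal: controllerSignal, ...{ signal: "caller-signal" } };
console.log(clobbered.signal);              // caller-signal
```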
@@ -125,7 +133,8 @@ async function fetchS2ByArxivId(arxivId) {
   try {
     const fields = "title,abstract,authors,year,citationCount,openAccessPdf,externalIds";
     const res = await fetchWithTimeout(
-      `${SEMANTIC_SCHOLAR_BASE}/paper/ARXIV:${encodeURIComponent(clean)}?fields=${fields}
+      `${SEMANTIC_SCHOLAR_BASE}/paper/ARXIV:${encodeURIComponent(clean)}?fields=${fields}`,
+      10000, s2Options()
     );
     if (!res.ok) return null;
     const item = await res.json().catch(() => null);
@@ -218,15 +227,11 @@ async function searchSemanticScholar(query, maxResults) {
   try {
     const fields = "title,abstract,authors,year,citationCount,openAccessPdf,externalIds";
     const url = `${SEMANTIC_SCHOLAR_BASE}/paper/search?query=${encodeURIComponent(query)}&limit=${maxResults}&fields=${fields}`;
-    const
-
-
-
-      response = await fetch(url, { signal: controller.signal });
-    } finally {
-      clearTimeout(timeout);
+    const response = await fetchWithTimeout(url, 10000, s2Options());
+    if (!response.ok) {
+      if (response.status === 429) throw new Error("S2_RATE_LIMITED");
+      return [];
     }
-    if (!response.ok) return [];
     const data = await response.json().catch(() => null);
     if (!data?.data) return [];
     return data.data
@@ -254,7 +259,8 @@ async function searchSemanticScholar(query, maxResults) {
       })
       .filter(Boolean)
       .sort((a, b) => b.citations - a.citations);
-  } catch {
+  } catch (err) {
+    if (err.message === "S2_RATE_LIMITED") throw err;
     return [];
   }
 }
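The rethrow above is a sentinel-error pattern: a 429 becomes a named error that escapes the function's catch-all, while every other failure still degrades to an empty list. A minimal sketch, with a hypothetical `handleStatus` standing in for the real fetch path:

```javascript
// Sentinel-error sketch: only the named error escapes the catch-all.
function handleStatus(status) {
  try {
    if (status === 429) throw new Error("S2_RATE_LIMITED");
    if (status !== 200) return [];
    return ["paper"];
  } catch (err) {
    if (err.message === "S2_RATE_LIMITED") throw err; // let the caller see the 429
    return [];                                        // anything else degrades quietly
  }
}

console.log(JSON.stringify(handleStatus(200))); // ["paper"]
console.log(JSON.stringify(handleStatus(500))); // []
try { handleStatus(429); } catch (e) { console.log(e.message); } // S2_RATE_LIMITED
```

This is what lets the caller distinguish "rate-limited" from a generic failure and report it in `source_status`.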
@@ -262,7 +268,7 @@ async function searchSemanticScholar(query, maxResults) {
 // ── MCP Server ─────────────────────────────────────────────────────────────
 const server = new McpServer({
   name: "paperplain",
-  version: "1.2.
+  version: "1.2.4",
   description:
     "Search 200M+ peer-reviewed papers from PubMed, ArXiv, and Semantic Scholar. Returns papers with full abstracts — use your own model to synthesize findings.",
 });
@@ -320,7 +326,10 @@ Use the returned abstracts to synthesize findings, answer the user's question, o
       const r = await searchSemanticScholar(q, n);
       sourceStatus.semanticscholar = r.length ? "ok" : "empty";
       return r;
-    } catch {
+    } catch (err) {
+      sourceStatus.semanticscholar = err.message === "S2_RATE_LIMITED" ? "rate_limited" : "error";
+      return [];
+    }
   }
 
   try {
@@ -338,8 +347,9 @@ Use the returned abstracts to synthesize findings, answer the user's question, o
       safeS2(query, Math.ceil(max_results / 2)),
     ]);
     const maxArxiv = Math.ceil(max_results * 0.6);
-
-    const
+    // Deduplicate on URL — S2 uses arxiv.org URLs for arXiv papers, matching exactly
+    const arxivUrls = new Set(arxiv.map((p) => p.url));
+    const uniqueS2 = s2.filter((p) => !arxivUrls.has(p.url));
     papers = [
       ...arxiv.slice(0, maxArxiv),
       ...uniqueS2.slice(0, max_results - Math.min(arxiv.length, maxArxiv)),
@@ -350,12 +360,15 @@ Use the returned abstracts to synthesize findings, answer the user's question, o
       safePubMed(query, max_results),
       safeS2(query, Math.ceil(max_results / 2)),
     ]);
+    // Deduplicate S2 against both ArXiv and PubMed URLs
+    const seenUrls = new Set([...arxiv.map((p) => p.url), ...pubmed.map((p) => p.url)]);
+    const uniqueS2 = s2.filter((p) => !seenUrls.has(p.url));
     const maxEach = Math.floor(max_results / 3);
     const remainder = max_results - maxEach * 3;
     papers = [
       ...arxiv.slice(0, maxEach + remainder),
       ...pubmed.slice(0, maxEach),
-      ...
+      ...uniqueS2.slice(0, maxEach),
     ].slice(0, max_results);
   }
 
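The dedup-and-split logic in this hunk can be checked with toy data (the URLs below are illustrative, not real results): for `max_results = 5`, `maxEach = 1` and `remainder = 2`, so ArXiv may fill up to three slots while PubMed and the deduplicated S2 list contribute one each.

```javascript
// Toy check of the URL dedup + three-way split (illustrative data).
const arxiv  = [{ url: "https://arxiv.org/abs/2306.05042" }];
const pubmed = [{ url: "https://pubmed.ncbi.nlm.nih.gov/37183813/" }];
const s2 = [
  { url: "https://arxiv.org/abs/2306.05042" }, // duplicate of the ArXiv hit
  { url: "https://example.org/unique-paper" },
];
const max_results = 5;

const seenUrls = new Set([...arxiv.map((p) => p.url), ...pubmed.map((p) => p.url)]);
const uniqueS2 = s2.filter((p) => !seenUrls.has(p.url));

const maxEach = Math.floor(max_results / 3);  // 1
const remainder = max_results - maxEach * 3;  // 2
const papers = [
  ...arxiv.slice(0, maxEach + remainder),     // ArXiv may take up to 3 slots
  ...pubmed.slice(0, maxEach),
  ...uniqueS2.slice(0, maxEach),
].slice(0, max_results);

console.log(uniqueS2.length); // 1 (duplicate arXiv URL dropped)
console.log(papers.length);   // 3
```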
@@ -368,6 +381,7 @@ Use the returned abstracts to synthesize findings, answer the user's question, o
       : ["arxiv", "pubmed", "semanticscholar"];
   for (const src of expectedSources) {
     if (sourceStatus[src] === "empty") warnings.push(`${src}: returned 0 results (API may be rate-limited or query too specific)`);
+    if (sourceStatus[src] === "rate_limited") warnings.push(`${src}: rate-limited (429) — wait 60s and retry, or add S2_API_KEY to your MCP env config for higher limits`);
     if (sourceStatus[src] === "error") warnings.push(`${src}: request failed (API may be temporarily unavailable)`);
   }
 
@@ -415,7 +429,8 @@ async function fetchS2ByDoi(doi) {
   const clean = doi.replace(/^doi:/i, "").trim();
   const fields = "title,abstract,authors,year,citationCount,openAccessPdf,externalIds";
   const res = await fetchWithTimeout(
-    `${SEMANTIC_SCHOLAR_BASE}/paper/DOI:${encodeURIComponent(clean)}?fields=${fields}
+    `${SEMANTIC_SCHOLAR_BASE}/paper/DOI:${encodeURIComponent(clean)}?fields=${fields}`,
+    10000, s2Options()
   );
   if (!res.ok) return null;
   const item = await res.json().catch(() => null);
@@ -544,10 +559,13 @@ Useful for verifying a citation or retrieving abstract details for a paper you a
   try {
     const fields = "title,abstract,authors,year,citationCount,openAccessPdf,externalIds";
     const url = `${SEMANTIC_SCHOLAR_BASE}/paper/search?query=${encodeURIComponent(title)}&limit=5&fields=${fields}`;
-    const res = await fetchWithTimeout(url);
+    const res = await fetchWithTimeout(url, 10000, s2Options());
     if (!res.ok) {
+      const msg = res.status === 429
+        ? `Rate limited by Semantic Scholar (429). Wait 60 seconds and retry. To avoid this for batch workflows, add S2_API_KEY to your MCP env config.`
+        : `Search failed: Semantic Scholar returned ${res.status}`;
       return {
-        content: [{ type: "text", text:
+        content: [{ type: "text", text: msg }],
         isError: true,
       };
     }