paperqa-mcp-server 0.1.0__tar.gz
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Menyoung Lee

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,246 @@
Metadata-Version: 2.4
Name: paperqa-mcp-server
Version: 0.1.0
Summary: MCP server exposing PaperQA2 for deep synthesis across scientific papers
Project-URL: Repository, https://github.com/menyoung/paperqa-mcp-server
Project-URL: Issues, https://github.com/menyoung/paperqa-mcp-server/issues
License-Expression: MIT
License-File: LICENSE
Keywords: llm,mcp,paperqa,rag,research,scientific-literature
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.11
Requires-Dist: mcp[cli]>=1.2.0
Requires-Dist: paper-qa<2026.3,>=2026.2
Requires-Dist: pillow
Description-Content-Type: text/markdown

# paperqa-mcp-server

Give Claude the ability to read, search, and synthesize across your
entire PDF library. Built on [PaperQA2](https://github.com/Future-House/paper-qa).

Point it at your Zotero storage folder (or any folder of PDFs) and ask
Claude questions that require deep reading across multiple papers.

## Quick start

### 1. Install uv

[uv](https://docs.astral.sh/uv/) is a Python package manager. If you don't
have it yet:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

After installing, **restart your terminal** so `uv` is on your PATH.

Verify it works:

```bash
uv --version
```

### 2. Get an OpenAI API key

PaperQA2 uses OpenAI for embeddings and internal reasoning. Get a key at
https://platform.openai.com/api-keys

### 3. Test that it runs

This downloads ~90 Python packages the first time — that's normal:

```bash
uvx paperqa-mcp-server --help 2>/dev/null; echo "OK if no Python errors above"
```

### 4. Add to Claude Desktop

1. Open Claude Desktop
2. Go to **Settings → Developer → Edit Config**
3. This opens `claude_desktop_config.json`. Add a `paperqa` entry inside
   `mcpServers` (create `mcpServers` if it doesn't exist):

   First, find your full path to `uvx`:

   ```bash
   which uvx # e.g. /Users/yourname/.local/bin/uvx
   ```

   Then use that path in the config:

   ```json
   {
     "mcpServers": {
       "paperqa": {
         "command": "/FULL/PATH/TO/uvx",
         "args": ["paperqa-mcp-server"],
         "env": {
           "OPENAI_API_KEY": "sk-your-key-here"
         }
       }
     }
   }
   ```

   Replace the two placeholders:
   - `/FULL/PATH/TO/uvx` — paste the output of `which uvx`
   - `sk-your-key-here` — your OpenAI API key from step 2

   If your PDFs are somewhere other than `~/Zotero/storage`, add a
   `PAPER_DIRECTORY` entry to `env`:

   ```json
   "env": {
     "OPENAI_API_KEY": "sk-your-key-here",
     "PAPER_DIRECTORY": "/full/path/to/your/pdfs"
   }
   ```

4. **Quit Claude Desktop completely** (Cmd+Q, not just close the window)
   and reopen it
5. You should see a hammer icon — click it and `paper_qa` should be listed
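
Hand-editing JSON is where most setup mistakes come from, so the `paperqa` entry from step 4 can also be produced programmatically. The following is a minimal sketch and not part of this package: the helper name `add_paperqa_entry` is hypothetical, and it operates on an already-loaded config dict rather than reading or writing `claude_desktop_config.json` itself.

```python
import json
from typing import Optional


def add_paperqa_entry(
    config: dict,
    uvx_path: str,
    api_key: str,
    paper_directory: Optional[str] = None,
) -> dict:
    """Return a copy of a Claude Desktop config with a `paperqa` server entry.

    Hypothetical helper for illustration. Servers already listed under
    `mcpServers` are preserved, and the input dict is not modified.
    """
    env = {"OPENAI_API_KEY": api_key}
    if paper_directory is not None:
        # Only needed when the PDFs are not in ~/Zotero/storage.
        env["PAPER_DIRECTORY"] = paper_directory

    servers = dict(config.get("mcpServers", {}))
    servers["paperqa"] = {
        "command": uvx_path,
        "args": ["paperqa-mcp-server"],
        "env": env,
    }
    return {**config, "mcpServers": servers}


# Example: start from an empty config and print the JSON to paste.
example = add_paperqa_entry({}, "/FULL/PATH/TO/uvx", "sk-your-key-here")
print(json.dumps(example, indent=2))
```

Building the structure in code and serializing it with `json.dumps` rules out the missing-comma class of errors, at the cost of one extra step before pasting the result into the config file.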

### 5. Pre-build the index

Before Claude can search your papers, the server needs to build a search
index. This reads each PDF, splits it into chunks, and sends the chunks
to OpenAI's embedding API. With hundreds of papers this takes a while
and costs a few dollars in API calls.

If you have more than 10 unindexed papers, the server will refuse to
answer queries and tell you to run this step first. A few new papers
will be indexed automatically when you query.

```bash
OPENAI_API_KEY=sk-your-key-here uvx paperqa-mcp-server index
```

**If this crashes** with a rate limit error, just re-run the same command.
It picks up where it left off — each run indexes more files. With a large
library (500+ papers) you may need to run it a few times.

After that, the index is cached at `~/.pqa/indexes/`. Only new or changed
files get re-processed on subsequent runs.
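
Because re-running the same command is the documented recovery from a rate-limit crash, the re-runs can be automated. This is a rough sketch under two assumptions that the package does not state: that a crashed run exits with a non-zero status, and that pausing between runs helps the rate limit recover. The function name `rerun_until_ok` and its parameters are hypothetical.

```python
import subprocess
import sys
import time
from typing import Sequence


def rerun_until_ok(
    cmd: Sequence[str], max_runs: int = 5, pause_seconds: float = 60.0
) -> int:
    """Run `cmd` until it exits with status 0 and return the number of runs.

    Raises RuntimeError if all `max_runs` attempts fail.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(list(cmd))
        if result.returncode == 0:
            return attempt
        print(f"run {attempt} exited with {result.returncode}", file=sys.stderr)
        if attempt < max_runs:
            # Assumed helpful: wait before trying again.
            time.sleep(pause_seconds)
    raise RuntimeError(f"command still failing after {max_runs} runs")


# Intended use, with OPENAI_API_KEY already set in the environment:
#     rerun_until_ok(["uvx", "paperqa-mcp-server", "index"])
```

Since each run resumes from the cached index, repeated invocations should only pay for files that have not been processed yet.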

## Troubleshooting

**"Server disconnected" in Claude Desktop**

Claude Desktop has a short startup timeout. If `uv` needs to download
packages on first launch, it will time out. Fix: run `uvx paperqa-mcp-server`
once from the terminal first so packages are cached.

**"Index incomplete" when querying**

The server checks the index before each query. If too many papers are
unindexed, it returns a diagnostic message instead of trying (and
failing) to index them all on the fly. Fix: run the index command in
step 5.

**Hammer icon doesn't appear**

Make sure you quit Claude Desktop completely (Cmd+Q) and reopened it.
Check for JSON syntax errors in `claude_desktop_config.json` — a
missing comma is the most common mistake.
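
A syntax error in the config can be located with Python's standard `json` module rather than by eye. The sketch below is illustrative only; `find_json_error` is a hypothetical name, and you would pass it the text of your own `claude_desktop_config.json`.

```python
import json
from typing import Optional


def find_json_error(text: str) -> Optional[str]:
    """Return None if `text` is valid JSON, else a message locating the error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno are 1-based positions of the offending character.
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# A config fragment with a comma missing after the first entry.
broken = '{\n  "command": "/FULL/PATH/TO/uvx"\n  "args": ["paperqa-mcp-server"]\n}'
print(find_json_error(broken))
```

The reported position is where the parser gave up, which for a missing comma is the start of the following entry, so the actual fix usually belongs at the end of the previous line.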

## Use a different LLM

By default, PaperQA2 uses `gpt-4o-mini` for its internal reasoning.
This is separate from Claude — Claude calls the tool, PaperQA2 does
its own LLM calls internally to gather and synthesize evidence.

To use a different model, add env vars to your Claude Desktop config:

```json
"env": {
  "OPENAI_API_KEY": "sk-your-key-here",
  "PQA_LLM": "gpt-4o",
  "PQA_SUMMARY_LLM": "gpt-4o-mini"
}
```

## All environment variables

| Variable | Default | Purpose |
|---|---|---|
| `PAPER_DIRECTORY` | `~/Zotero/storage` | Folder containing your PDFs |
| `OPENAI_API_KEY` | — | **Required** for default embeddings |
| `PQA_LLM` | `gpt-4o-mini` | LLM for internal reasoning |
| `PQA_SUMMARY_LLM` | `gpt-4o-mini` | LLM for summarizing chunks |
| `PQA_EMBEDDING` | `text-embedding-3-small` | Embedding model |
| `ANTHROPIC_API_KEY` | — | Only if using Claude as internal LLM |

## Works with zotero-mcp

This pairs well with [zotero-mcp](https://github.com/54yyyu/zotero-mcp):

- **paperqa-mcp-server** — deep reading and synthesis across full paper text
- **zotero-mcp** — browse your library, search metadata, read annotations

Claude can cross-reference between them — for example, finding papers
with PaperQA and then pulling up their Zotero metadata and annotations.
PaperQA2's citations include Zotero storage keys (e.g. `ABC123DE` from
`storage/ABC123DE/paper.pdf`) that Claude can use to look up items via
zotero-mcp.
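
To make the key lookup concrete, here is a small sketch of pulling a storage key out of a cited file path. It is not code from this package: `zotero_key_from_path` is a hypothetical helper, and the pattern assumes a key is exactly eight upper-case letters or digits, as in the `ABC123DE` example. It also assumes POSIX-style paths.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Assumption: a storage key is 8 upper-case letters or digits,
# matching the ABC123DE example.
_KEY_PATTERN = re.compile(r"[A-Z0-9]{8}")


def zotero_key_from_path(path: str) -> Optional[str]:
    """Return the storage key (the PDF's parent folder name), or None."""
    parent = PurePosixPath(path).parent.name
    return parent if _KEY_PATTERN.fullmatch(parent) else None


print(zotero_key_from_path("storage/ABC123DE/paper.pdf"))
```

Returning `None` for paths that do not sit in a key-named folder keeps the helper usable when `PAPER_DIRECTORY` points at a plain folder of PDFs rather than a Zotero library.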

## Index implementation notes

`paperqa-mcp-server index` uses the same `_settings()` function as the MCP
server, so the index it builds is exactly the one the server will look
for. The PaperQA2 index directory name is a hash of the settings
(embedding model, chunk size, paper directory path, etc.). The settings
include:

- **Multimodal OFF** — skip image extraction from PDFs (avoids a crash on
  PDFs with CMYK images)
- **Doc details OFF** — skip Crossref/Semantic Scholar metadata lookups
  (avoids rate limits; Claude can get metadata from Zotero directly via
  zotero-mcp)
- **Concurrency 1** — index one file at a time to stay under OpenAI's
  embedding rate limit

> **Why not `pqa index`?** The `pqa` CLI constructs settings via pydantic's
> `CliSettingsSource`, which produces different defaults than constructing
> `Settings()` directly in Python (e.g. `chunk_chars` of 7000 vs 5000).
> Different settings = different index hash = server can't find the index.
> Always use `paperqa-mcp-server index` to build the index.
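
The "different settings = different index hash" point can be illustrated with a toy naming function. This is deliberately not PaperQA2's real scheme, whose exact inputs and hash are not described here; `toy_index_name` and the settings keys below are stand-ins chosen for the example.

```python
import hashlib
import json


def toy_index_name(settings: dict) -> str:
    """Derive a directory name from a settings dict (illustration only).

    NOT PaperQA2's actual algorithm. It only shows that hashing the
    settings makes any changed value produce a different name.
    """
    # sort_keys gives a canonical form, so key order does not matter.
    canonical = json.dumps(settings, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "toy_index_" + digest[:16]


server_like = {
    "embedding": "text-embedding-3-small",
    "chunk_chars": 5000,
    "paper_directory": "/full/path/to/your/pdfs",
}
cli_like = {**server_like, "chunk_chars": 7000}

print(toy_index_name(server_like))
print(toy_index_name(cli_like))  # differs: one value changed
```

Under any scheme of this shape, an index built with one set of defaults lands in a directory the other configuration never looks in, which is why building and querying must share one settings function.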

## Install from GitHub (latest)

To use the latest version from the main branch instead of PyPI:

```json
{
  "mcpServers": {
    "paperqa": {
      "command": "/FULL/PATH/TO/uvx",
      "args": ["--from", "git+https://github.com/menyoung/paperqa-mcp-server", "paperqa-mcp-server"],
      "env": {
        "OPENAI_API_KEY": "sk-your-key-here"
      }
    }
  }
}
```

To build the index from the latest main branch:

```bash
OPENAI_API_KEY=sk-your-key-here uvx --from git+https://github.com/menyoung/paperqa-mcp-server paperqa-mcp-server index
```

## Development

If you want to contribute or modify the server locally:

```bash
git clone https://github.com/menyoung/paperqa-mcp-server.git
cd paperqa-mcp-server
uv sync
uv run paperqa-mcp-server # run the server
uv run paperqa-mcp-server index # build the index
```
@@ -0,0 +1,33 @@
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "paperqa-mcp-server"
version = "0.1.0"
description = "MCP server exposing PaperQA2 for deep synthesis across scientific papers"
readme = "README.md"
license = "MIT"
requires-python = ">=3.11"
keywords = ["mcp", "paperqa", "scientific-literature", "research", "llm", "rag"]
classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Science/Research",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
]
dependencies = [
    "paper-qa>=2026.2,<2026.3",
    "mcp[cli]>=1.2.0",
    "pillow",
]

[project.scripts]
paperqa-mcp-server = "paperqa_mcp_server:main"

[project.urls]
Repository = "https://github.com/menyoung/paperqa-mcp-server"
Issues = "https://github.com/menyoung/paperqa-mcp-server/issues"

[tool.hatch.build.targets.wheel]
packages = ["src/paperqa_mcp_server"]
@@ -0,0 +1,180 @@
"""MCP server exposing PaperQA2 for deep synthesis across scientific papers."""

from __future__ import annotations

import os
import pathlib
import pickle
import zlib

from mcp.server.fastmcp import FastMCP
from paperqa import Settings, agent_query

mcp = FastMCP("paperqa")


def _settings() -> Settings:
    return Settings(
        llm=os.environ.get("PQA_LLM", "gpt-4o-mini"),
        summary_llm=os.environ.get("PQA_SUMMARY_LLM", "gpt-4o-mini"),
        embedding=os.environ.get("PQA_EMBEDDING", "text-embedding-3-small"),
        temperature=0.1,
        parsing={"multimodal": "OFF", "use_doc_details": False},
        answer={"evidence_k": 15, "answer_max_sources": 10},
        agent={
            "index": {
                "paper_directory": os.environ.get(
                    "PAPER_DIRECTORY",
                    os.path.expanduser("~/Zotero/storage"),
                ),
                "concurrency": 1,
            }
        },
    )


_UNINDEXED_THRESHOLD = 10


def _index_status(settings: Settings | None = None) -> dict:
    """Read the index manifest and compare against files in the paper directory.

    Returns a dict with keys: indexed, errored, unindexed, total, ready, message.
    """
    if settings is None:
        settings = _settings()
    index_name = settings.get_index_name()
    index_dir = pathlib.Path(settings.agent.index.index_directory) / index_name
    paper_dir = pathlib.Path(settings.agent.index.paper_directory)
    files_filter = settings.agent.index.files_filter

    # Discover files PaperQA would try to index (same filter as paperqa)
    total = 0
    if paper_dir.is_dir():
        total = sum(1 for f in paper_dir.rglob("*") if files_filter(f))

    # Read the manifest
    manifest_path = index_dir / "files.zip"
    manifest: dict[str, str] = {}
    manifest_error = False
    if manifest_path.exists():
        try:
            manifest = pickle.loads(zlib.decompress(manifest_path.read_bytes()))
        except Exception:
            manifest_error = True

    errored = sum(1 for v in manifest.values() if v == "ERROR")
    indexed = len(manifest) - errored
    unindexed = max(0, total - len(manifest))

    ready = unindexed <= _UNINDEXED_THRESHOLD and not manifest_error
    if manifest_error:
        message = (
            f"Index manifest is corrupt ({total} files on disk)."
            " Rebuild the index from the terminal"
            " — see the paperqa-mcp-server README, step 5."
        )
    else:
        message = f"{indexed}/{total} papers indexed"
        if errored:
            message += f", {errored} errors"
        if unindexed:
            message += f", {unindexed} unindexed"
        if ready:
            message += ". Ready to query."
        else:
            message += (
                ". Queries will fail or time out."
                " Please finish building the index from the terminal"
                " — see the paperqa-mcp-server README, step 5."
            )

    return {
        "indexed": indexed,
        "errored": errored,
        "unindexed": unindexed,
        "total": total,
        "ready": ready,
        "message": message,
    }


@mcp.tool()
async def index_status() -> str:
    """Check the health of the paper index.

    Returns a summary of how many papers are indexed, how many have
    errors, and how many are unindexed. Use this to diagnose why
    paper_qa queries might be failing or timing out.
    """
    status = _index_status()
    lines = [
        f"Index status: {status['message']}",
        f" Indexed: {status['indexed']}",
        f" Errors: {status['errored']}",
        f" Unindexed: {status['unindexed']}",
        f" Total files: {status['total']}",
    ]
    return "\n".join(lines)


@mcp.tool()
async def paper_qa(query: str) -> str:
    """Search and synthesize across all papers in the library.

    Use this for questions that require deep reading and synthesis
    across multiple scientific papers — e.g. "What methods have been
    used to recycle lithium from spent batteries?" or "Compare the
    thermal stability of PEEK vs PTFE in the literature."

    Returns a detailed answer with inline citations. Each citation
    includes a file path containing an 8-character Zotero storage key
    (e.g. ABC123DE from storage/ABC123DE/paper.pdf). You can use these
    keys with zotero-mcp tools to look up the full bibliographic record,
    read annotations, or find related items.

    Not for quick metadata lookups or library browsing — use Zotero
    tools for that.

    If this tool returns "Index incomplete", the paper index has not
    been fully built yet. Tell the user to run the index build command
    from the terminal (see the paperqa-mcp-server README, step 5).
    Do not retry the query — it will give the same result until the
    index is built.

    This tool can take 30–90 seconds to respond when working normally.
    """
    settings = _settings()
    status = _index_status(settings)
    if not status["ready"]:
        return f"Index incomplete: {status['message']}"

    try:
        response = await agent_query(query=query, settings=settings)
    except Exception as e:
        return f"PaperQA error: {e}"
    if not response.session.formatted_answer:
        return f"PaperQA could not answer (status: {response.status})."
    return response.session.formatted_answer


def _build_index() -> None:
    """Build the search index using the same settings as the MCP server."""
    import asyncio

    from paperqa.agents.search import get_directory_index

    settings = _settings()
    print(f"Building index: {settings.get_index_name()}")
    print(f"Paper directory: {settings.agent.index.paper_directory}")
    asyncio.run(get_directory_index(settings=settings))
    print("Done.")


def main():
    import sys

    if len(sys.argv) > 1 and sys.argv[1] == "index":
        _build_index()
    else:
        mcp.run(transport="stdio")
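
The counting and readiness rules in `_index_status` can be tried without PaperQA2 installed. The sketch below is a standalone restatement, not code from the package: `summarize`, `encode_manifest`, and `decode_manifest` are hypothetical names, the non-`"ERROR"` manifest values are placeholders, and the corrupt-manifest branch is left out for brevity.

```python
import pickle
import zlib

UNINDEXED_THRESHOLD = 10  # same value as _UNINDEXED_THRESHOLD in the module


def summarize(manifest: dict, total_files: int) -> dict:
    """Apply the module's counting rules to an in-memory manifest."""
    errored = sum(1 for v in manifest.values() if v == "ERROR")
    indexed = len(manifest) - errored
    # Files on disk that the manifest has no entry for yet.
    unindexed = max(0, total_files - len(manifest))
    return {
        "indexed": indexed,
        "errored": errored,
        "unindexed": unindexed,
        "ready": unindexed <= UNINDEXED_THRESHOLD,
    }


def encode_manifest(manifest: dict) -> bytes:
    """Pickle, then zlib-compress: the inverse of the module's decode step."""
    return zlib.compress(pickle.dumps(manifest))


def decode_manifest(blob: bytes) -> dict:
    """Decode bytes the same way the module reads its manifest file."""
    return pickle.loads(zlib.decompress(blob))


# Placeholder values: only "ERROR" has a meaning the module relies on.
manifest = {"a.pdf": "placeholder-1", "b.pdf": "ERROR"}
print(summarize(decode_manifest(encode_manifest(manifest)), total_files=13))
```

Note that errored files count as present in the manifest, so they do not push the server over the unindexed threshold. Because the manifest is a pickle, it should only ever be loaded from an index directory you built yourself.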