paperqa-mcp-server 0.1.0 (tar.gz)

@@ -0,0 +1,8 @@
+ __pycache__/
+ *.pyc
+ .venv/
+ .env
+ *.egg-info/
+ dist/
+ build/
+ uv.lock
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Menyoung Lee
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,246 @@
+ Metadata-Version: 2.4
+ Name: paperqa-mcp-server
+ Version: 0.1.0
+ Summary: MCP server exposing PaperQA2 for deep synthesis across scientific papers
+ Project-URL: Repository, https://github.com/menyoung/paperqa-mcp-server
+ Project-URL: Issues, https://github.com/menyoung/paperqa-mcp-server/issues
+ License-Expression: MIT
+ License-File: LICENSE
+ Keywords: llm,mcp,paperqa,rag,research,scientific-literature
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Science/Research
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Requires-Python: >=3.11
+ Requires-Dist: mcp[cli]>=1.2.0
+ Requires-Dist: paper-qa<2026.3,>=2026.2
+ Requires-Dist: pillow
+ Description-Content-Type: text/markdown
+
+ # paperqa-mcp-server
+
+ Give Claude the ability to read, search, and synthesize across your
+ entire PDF library. Built on [PaperQA2](https://github.com/Future-House/paper-qa).
+
+ Point it at your Zotero storage folder (or any folder of PDFs) and ask
+ Claude questions that require deep reading across multiple papers.
+
+ ## Quick start
+
+ ### 1. Install uv
+
+ [uv](https://docs.astral.sh/uv/) is a Python package manager. If you don't
+ have it yet:
+
+ ```bash
+ curl -LsSf https://astral.sh/uv/install.sh | sh
+ ```
+
+ After installing, **restart your terminal** so `uv` is on your PATH.
+
+ Verify it works:
+
+ ```bash
+ uv --version
+ ```
+
+ ### 2. Get an OpenAI API key
+
+ PaperQA2 uses OpenAI for embeddings and internal reasoning. Get a key at
+ https://platform.openai.com/api-keys
+
+ ### 3. Test that it runs
+
+ This downloads ~90 Python packages the first time, which is normal. If no Python errors appear, the server is running and waiting for a client on stdin; press Ctrl+C to stop it:
+
+ ```bash
+ uvx paperqa-mcp-server
+ ```
+
+ ### 4. Add to Claude Desktop
+
+ 1. Open Claude Desktop
+ 2. Go to **Settings → Developer → Edit Config**
+ 3. This opens `claude_desktop_config.json`. Add a `paperqa` entry inside
+    `mcpServers` (create `mcpServers` if it doesn't exist):
+
+ First, find your full path to `uvx`:
+
+ ```bash
+ which uvx # e.g. /Users/yourname/.local/bin/uvx
+ ```
+
+ Then use that path in the config:
+
+ ```json
+ {
+   "mcpServers": {
+     "paperqa": {
+       "command": "/FULL/PATH/TO/uvx",
+       "args": ["paperqa-mcp-server"],
+       "env": {
+         "OPENAI_API_KEY": "sk-your-key-here"
+       }
+     }
+   }
+ }
+ ```
+
+ Replace the two placeholders:
+ - `/FULL/PATH/TO/uvx` — paste the output of `which uvx`
+ - `sk-your-key-here` — your OpenAI API key from step 2
+
+ If your PDFs are somewhere other than `~/Zotero/storage`, add a
+ `PAPER_DIRECTORY` entry to `env`:
+
+ ```json
+ "env": {
+   "OPENAI_API_KEY": "sk-your-key-here",
+   "PAPER_DIRECTORY": "/full/path/to/your/pdfs"
+ }
+ ```
+
+ 4. **Quit Claude Desktop completely** (Cmd+Q, not just close the window)
+    and reopen it
+ 5. You should see a hammer icon. Click it, and both `paper_qa` and `index_status` should be listed
+
+ ### 5. Pre-build the index
+
+ Before Claude can search your papers, the server needs to build a search
+ index. This reads each PDF, splits it into chunks, and sends the chunks
+ to OpenAI's embedding API. With hundreds of papers this takes a while
+ and costs a few dollars in API calls.
+
+ If more than 10 papers are unindexed, the server refuses to answer
+ queries and tells you to run this step first. If 10 or fewer are
+ unindexed, they are indexed automatically when you query.
+
+ ```bash
+ OPENAI_API_KEY=sk-your-key-here uvx paperqa-mcp-server index
+ ```
+
+ **If this crashes** with a rate limit error, just re-run the same command.
+ It picks up where it left off — each run indexes more files. With a large
+ library (500+ papers) you may need to run it a few times.
+
+ After that, the index is cached at `~/.pqa/indexes/`. Only new or changed
+ files get re-processed on subsequent runs.
+
+ ## Troubleshooting
+
+ **"Server disconnected" in Claude Desktop**
+
+ Claude Desktop has a short startup timeout. If `uv` needs to download
+ packages on first launch, it will time out. Fix: run `uvx paperqa-mcp-server`
+ once from the terminal first so packages are cached.
+
+ **"Index incomplete" when querying**
+
+ The server checks the index before each query. If too many papers are
+ unindexed, it returns a diagnostic message instead of trying (and
+ failing) to index them all on the fly. Fix: run the index command in
+ step 5.
+
+ **Hammer icon doesn't appear**
+
+ Make sure you quit Claude Desktop completely (Cmd+Q) and then reopen it.
+ Check for JSON syntax errors in `claude_desktop_config.json` — a
+ missing comma is the most common mistake.
+
+ ## Use a different LLM
+
+ By default, this server configures PaperQA2 to use `gpt-4o-mini` for its
+ internal reasoning. This is separate from Claude: Claude calls the tool, and
+ PaperQA2 makes its own LLM calls internally to gather and synthesize evidence.
+
+ To use a different model, add env vars to your Claude Desktop config:
+
+ ```json
+ "env": {
+   "OPENAI_API_KEY": "sk-your-key-here",
+   "PQA_LLM": "gpt-4o",
+   "PQA_SUMMARY_LLM": "gpt-4o-mini"
+ }
+ ```
+
+ ## All environment variables
+
+ | Variable | Default | Purpose |
+ |---|---|---|
+ | `PAPER_DIRECTORY` | `~/Zotero/storage` | Folder containing your PDFs |
+ | `OPENAI_API_KEY` | — | **Required** for the default OpenAI embedding and LLMs |
+ | `PQA_LLM` | `gpt-4o-mini` | LLM for internal reasoning |
+ | `PQA_SUMMARY_LLM` | `gpt-4o-mini` | LLM for summarizing chunks |
+ | `PQA_EMBEDDING` | `text-embedding-3-small` | Embedding model |
+ | `ANTHROPIC_API_KEY` | — | Only if using Claude as internal LLM |
+
+ ## Works with zotero-mcp
+
+ This pairs well with [zotero-mcp](https://github.com/54yyyu/zotero-mcp):
+
+ - **paperqa-mcp-server** — deep reading and synthesis across full paper text
+ - **zotero-mcp** — browse your library, search metadata, read annotations
+
+ Claude can cross-reference between them — for example, finding papers
+ with PaperQA and then pulling up their Zotero metadata and annotations.
+ PaperQA2's citations include Zotero storage keys (e.g. `ABC123DE` from
+ `storage/ABC123DE/paper.pdf`) that Claude can use to look up items via
+ zotero-mcp.
+
+ ## Index implementation notes
+
+ `paperqa-mcp-server index` uses the same `_settings()` function as the MCP
+ server, so the index it builds is exactly the one the server will look
+ for. The PaperQA2 index directory name is a hash of the settings
+ (embedding model, chunk size, paper directory path, etc.). The settings
+ include:
+
+ - **Multimodal OFF** — skip image extraction from PDFs (avoids a crash on
+   PDFs with CMYK images)
+ - **Doc details OFF** — skip Crossref/Semantic Scholar metadata lookups
+   (avoids rate limits; Claude can get metadata from Zotero directly via
+   zotero-mcp)
+ - **Concurrency 1** — index one file at a time to stay under OpenAI's
+   embedding rate limit
+
+ > **Why not `pqa index`?** The `pqa` CLI constructs settings via pydantic's
+ > `CliSettingsSource`, which produces different defaults than constructing
+ > `Settings()` directly in Python (e.g. `chunk_chars` of 7000 vs 5000).
+ > Different settings = different index hash = server can't find the index.
+ > Always use `paperqa-mcp-server index` to build the index.
+
+ ## Install from GitHub (latest)
+
+ To use the latest version from the main branch instead of PyPI:
+
+ ```json
+ {
+   "mcpServers": {
+     "paperqa": {
+       "command": "/FULL/PATH/TO/uvx",
+       "args": ["--from", "git+https://github.com/menyoung/paperqa-mcp-server", "paperqa-mcp-server"],
+       "env": {
+         "OPENAI_API_KEY": "sk-your-key-here"
+       }
+     }
+   }
+ }
+ ```
+
+ To build the index from the latest main branch:
+
+ ```bash
+ OPENAI_API_KEY=sk-your-key-here uvx --from git+https://github.com/menyoung/paperqa-mcp-server paperqa-mcp-server index
+ ```
+
+ ## Development
+
+ If you want to contribute or modify the server locally:
+
+ ```bash
+ git clone https://github.com/menyoung/paperqa-mcp-server.git
+ cd paperqa-mcp-server
+ uv sync
+ uv run paperqa-mcp-server # run the server
+ uv run paperqa-mcp-server index # build the index
+ ```
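Step 4 of the README above asks you to add a `paperqa` entry inside `mcpServers`, creating `mcpServers` if it does not exist. Below is a minimal Python sketch of that merge. It assumes you have already loaded `claude_desktop_config.json` into a dict. `add_paperqa_entry` is a hypothetical helper written for illustration and is not part of paperqa-mcp-server.

```python
from __future__ import annotations

import json
from typing import Any


def add_paperqa_entry(
    config: dict[str, Any],
    uvx_path: str,
    api_key: str,
    paper_directory: str | None = None,
) -> dict[str, Any]:
    """Hypothetical helper: insert a `paperqa` server into an MCP config dict.

    Mirrors README step 4: create `mcpServers` if missing, then add (or
    overwrite) the `paperqa` entry. Other servers are left untouched.
    """
    env = {"OPENAI_API_KEY": api_key}
    # Only needed when the PDFs are not in ~/Zotero/storage.
    if paper_directory is not None:
        env["PAPER_DIRECTORY"] = paper_directory
    servers = config.setdefault("mcpServers", {})
    servers["paperqa"] = {
        "command": uvx_path,
        "args": ["paperqa-mcp-server"],
        "env": env,
    }
    return config


# Hypothetical existing config that already lists another server.
existing = {"mcpServers": {"zotero": {"command": "zotero-mcp"}}}
updated = add_paperqa_entry(existing, "/Users/yourname/.local/bin/uvx", "sk-your-key-here")
print(json.dumps(updated, indent=2))
```

Using `setdefault` leaves any other servers intact, such as a zotero-mcp entry. `PAPER_DIRECTORY` is written only when you pass a folder, which matches the README's advice for PDFs stored outside `~/Zotero/storage`.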
@@ -0,0 +1,227 @@
+ # paperqa-mcp-server
+
+ Give Claude the ability to read, search, and synthesize across your
+ entire PDF library. Built on [PaperQA2](https://github.com/Future-House/paper-qa).
+
+ Point it at your Zotero storage folder (or any folder of PDFs) and ask
+ Claude questions that require deep reading across multiple papers.
+
+ ## Quick start
+
+ ### 1. Install uv
+
+ [uv](https://docs.astral.sh/uv/) is a Python package manager. If you don't
+ have it yet:
+
+ ```bash
+ curl -LsSf https://astral.sh/uv/install.sh | sh
+ ```
+
+ After installing, **restart your terminal** so `uv` is on your PATH.
+
+ Verify it works:
+
+ ```bash
+ uv --version
+ ```
+
+ ### 2. Get an OpenAI API key
+
+ PaperQA2 uses OpenAI for embeddings and internal reasoning. Get a key at
+ https://platform.openai.com/api-keys
+
+ ### 3. Test that it runs
+
+ This downloads ~90 Python packages the first time, which is normal. If no Python errors appear, the server is running and waiting for a client on stdin; press Ctrl+C to stop it:
+
+ ```bash
+ uvx paperqa-mcp-server
+ ```
+
+ ### 4. Add to Claude Desktop
+
+ 1. Open Claude Desktop
+ 2. Go to **Settings → Developer → Edit Config**
+ 3. This opens `claude_desktop_config.json`. Add a `paperqa` entry inside
+    `mcpServers` (create `mcpServers` if it doesn't exist):
+
+ First, find your full path to `uvx`:
+
+ ```bash
+ which uvx # e.g. /Users/yourname/.local/bin/uvx
+ ```
+
+ Then use that path in the config:
+
+ ```json
+ {
+   "mcpServers": {
+     "paperqa": {
+       "command": "/FULL/PATH/TO/uvx",
+       "args": ["paperqa-mcp-server"],
+       "env": {
+         "OPENAI_API_KEY": "sk-your-key-here"
+       }
+     }
+   }
+ }
+ ```
+
+ Replace the two placeholders:
+ - `/FULL/PATH/TO/uvx` — paste the output of `which uvx`
+ - `sk-your-key-here` — your OpenAI API key from step 2
+
+ If your PDFs are somewhere other than `~/Zotero/storage`, add a
+ `PAPER_DIRECTORY` entry to `env`:
+
+ ```json
+ "env": {
+   "OPENAI_API_KEY": "sk-your-key-here",
+   "PAPER_DIRECTORY": "/full/path/to/your/pdfs"
+ }
+ ```
+
+ 4. **Quit Claude Desktop completely** (Cmd+Q, not just close the window)
+    and reopen it
+ 5. You should see a hammer icon. Click it, and both `paper_qa` and `index_status` should be listed
+
+ ### 5. Pre-build the index
+
+ Before Claude can search your papers, the server needs to build a search
+ index. This reads each PDF, splits it into chunks, and sends the chunks
+ to OpenAI's embedding API. With hundreds of papers this takes a while
+ and costs a few dollars in API calls.
+
+ If more than 10 papers are unindexed, the server refuses to answer
+ queries and tells you to run this step first. If 10 or fewer are
+ unindexed, they are indexed automatically when you query.
+
+ ```bash
+ OPENAI_API_KEY=sk-your-key-here uvx paperqa-mcp-server index
+ ```
+
+ **If this crashes** with a rate limit error, just re-run the same command.
+ It picks up where it left off — each run indexes more files. With a large
+ library (500+ papers) you may need to run it a few times.
+
+ After that, the index is cached at `~/.pqa/indexes/`. Only new or changed
+ files get re-processed on subsequent runs.
+
+ ## Troubleshooting
+
+ **"Server disconnected" in Claude Desktop**
+
+ Claude Desktop has a short startup timeout. If `uv` needs to download
+ packages on first launch, it will time out. Fix: run `uvx paperqa-mcp-server`
+ once from the terminal first so packages are cached.
+
+ **"Index incomplete" when querying**
+
+ The server checks the index before each query. If too many papers are
+ unindexed, it returns a diagnostic message instead of trying (and
+ failing) to index them all on the fly. Fix: run the index command in
+ step 5.
+
+ **Hammer icon doesn't appear**
+
+ Make sure you quit Claude Desktop completely (Cmd+Q) and then reopen it.
+ Check for JSON syntax errors in `claude_desktop_config.json` — a
+ missing comma is the most common mistake.
+
+ ## Use a different LLM
+
+ By default, this server configures PaperQA2 to use `gpt-4o-mini` for its
+ internal reasoning. This is separate from Claude: Claude calls the tool, and
+ PaperQA2 makes its own LLM calls internally to gather and synthesize evidence.
+
+ To use a different model, add env vars to your Claude Desktop config:
+
+ ```json
+ "env": {
+   "OPENAI_API_KEY": "sk-your-key-here",
+   "PQA_LLM": "gpt-4o",
+   "PQA_SUMMARY_LLM": "gpt-4o-mini"
+ }
+ ```
+
+ ## All environment variables
+
+ | Variable | Default | Purpose |
+ |---|---|---|
+ | `PAPER_DIRECTORY` | `~/Zotero/storage` | Folder containing your PDFs |
+ | `OPENAI_API_KEY` | — | **Required** for the default OpenAI embedding and LLMs |
+ | `PQA_LLM` | `gpt-4o-mini` | LLM for internal reasoning |
+ | `PQA_SUMMARY_LLM` | `gpt-4o-mini` | LLM for summarizing chunks |
+ | `PQA_EMBEDDING` | `text-embedding-3-small` | Embedding model |
+ | `ANTHROPIC_API_KEY` | — | Only if using Claude as internal LLM |
+
+ ## Works with zotero-mcp
+
+ This pairs well with [zotero-mcp](https://github.com/54yyyu/zotero-mcp):
+
+ - **paperqa-mcp-server** — deep reading and synthesis across full paper text
+ - **zotero-mcp** — browse your library, search metadata, read annotations
+
+ Claude can cross-reference between them — for example, finding papers
+ with PaperQA and then pulling up their Zotero metadata and annotations.
+ PaperQA2's citations include Zotero storage keys (e.g. `ABC123DE` from
+ `storage/ABC123DE/paper.pdf`) that Claude can use to look up items via
+ zotero-mcp.
+
+ ## Index implementation notes
+
+ `paperqa-mcp-server index` uses the same `_settings()` function as the MCP
+ server, so the index it builds is exactly the one the server will look
+ for. The PaperQA2 index directory name is a hash of the settings
+ (embedding model, chunk size, paper directory path, etc.). The settings
+ include:
+
+ - **Multimodal OFF** — skip image extraction from PDFs (avoids a crash on
+   PDFs with CMYK images)
+ - **Doc details OFF** — skip Crossref/Semantic Scholar metadata lookups
+   (avoids rate limits; Claude can get metadata from Zotero directly via
+   zotero-mcp)
+ - **Concurrency 1** — index one file at a time to stay under OpenAI's
+   embedding rate limit
+
+ > **Why not `pqa index`?** The `pqa` CLI constructs settings via pydantic's
+ > `CliSettingsSource`, which produces different defaults than constructing
+ > `Settings()` directly in Python (e.g. `chunk_chars` of 7000 vs 5000).
+ > Different settings = different index hash = server can't find the index.
+ > Always use `paperqa-mcp-server index` to build the index.
+
+ ## Install from GitHub (latest)
+
+ To use the latest version from the main branch instead of PyPI:
+
+ ```json
+ {
+   "mcpServers": {
+     "paperqa": {
+       "command": "/FULL/PATH/TO/uvx",
+       "args": ["--from", "git+https://github.com/menyoung/paperqa-mcp-server", "paperqa-mcp-server"],
+       "env": {
+         "OPENAI_API_KEY": "sk-your-key-here"
+       }
+     }
+   }
+ }
+ ```
+
+ To build the index from the latest main branch:
+
+ ```bash
+ OPENAI_API_KEY=sk-your-key-here uvx --from git+https://github.com/menyoung/paperqa-mcp-server paperqa-mcp-server index
+ ```
+
+ ## Development
+
+ If you want to contribute or modify the server locally:
+
+ ```bash
+ git clone https://github.com/menyoung/paperqa-mcp-server.git
+ cd paperqa-mcp-server
+ uv sync
+ uv run paperqa-mcp-server # run the server
+ uv run paperqa-mcp-server index # build the index
+ ```
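The README's note on `pqa index` rests on one idea. The index directory name is derived from a hash of the settings, so any difference in a hashed field points the server at a different directory. The sketch below illustrates this with a SHA-256 over canonical JSON. `illustrative_index_name` and the `demo_index_` prefix are hypothetical. PaperQA2's real `Settings.get_index_name()` uses its own inputs and hashing scheme, which are not reproduced here. The two `chunk_chars` values are the pair quoted in the README.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any


def illustrative_index_name(settings: dict[str, Any]) -> str:
    """Hypothetical: derive a directory name from a hash of the settings.

    PaperQA2's real Settings.get_index_name() has its own inputs and hash;
    this only shows why changing any hashed field changes the name.
    """
    # sort_keys makes the result independent of dict insertion order.
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"demo_index_{digest[:16]}"


# Identical except for chunk_chars (the 7000 vs 5000 pair from the README).
settings_a = {
    "embedding": "text-embedding-3-small",
    "chunk_chars": 5000,
    "paper_directory": "/Users/yourname/Zotero/storage",
}
settings_b = {**settings_a, "chunk_chars": 7000}

print(illustrative_index_name(settings_a))
print(illustrative_index_name(settings_b))  # a different name, so a different directory
```

Because the directory name depends on every hashed field, the server and the indexer must build their settings through the same function. This is why the README routes indexing through `paperqa-mcp-server index` instead of `pqa index`.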
@@ -0,0 +1,33 @@
+ [build-system]
+ requires = ["hatchling"]
+ build-backend = "hatchling.build"
+
+ [project]
+ name = "paperqa-mcp-server"
+ version = "0.1.0"
+ description = "MCP server exposing PaperQA2 for deep synthesis across scientific papers"
+ readme = "README.md"
+ license = "MIT"
+ requires-python = ">=3.11"
+ keywords = ["mcp", "paperqa", "scientific-literature", "research", "llm", "rag"]
+ classifiers = [
+     "Development Status :: 4 - Beta",
+     "Intended Audience :: Science/Research",
+     "License :: OSI Approved :: MIT License",
+     "Programming Language :: Python :: 3",
+ ]
+ dependencies = [
+     "paper-qa>=2026.2,<2026.3",
+     "mcp[cli]>=1.2.0",
+     "pillow",
+ ]
+
+ [project.scripts]
+ paperqa-mcp-server = "paperqa_mcp_server:main"
+
+ [project.urls]
+ Repository = "https://github.com/menyoung/paperqa-mcp-server"
+ Issues = "https://github.com/menyoung/paperqa-mcp-server/issues"
+
+ [tool.hatch.build.targets.wheel]
+ packages = ["src/paperqa_mcp_server"]
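The `[project.scripts]` table above maps the `paperqa-mcp-server` command to the object reference `paperqa_mcp_server:main`, meaning the `main` attribute of the `paperqa_mcp_server` module. The sketch below shows how such a `module:attr` value is resolved. It uses the standard library's `importlib.metadata.EntryPoint`, which can load one directly. A stdlib function stands in for `main` so the example runs without the package installed, and `load_console_script` is a hypothetical helper.

```python
import os
from importlib.metadata import EntryPoint


def load_console_script(value: str):
    """Hypothetical helper: resolve a "module:attr" object reference.

    For this package the value is "paperqa_mcp_server:main"; EntryPoint.load()
    imports the module part and then looks up the attribute part on it.
    """
    ep = EntryPoint(name="demo", value=value, group="console_scripts")
    return ep.load()


# A stdlib target, so the sketch runs without paperqa-mcp-server installed.
join = load_console_script("os.path:join")
assert join is os.path.join
print(join("storage", "ABC123DE", "paper.pdf"))
```

When you run `paperqa-mcp-server`, the launcher generated by the installer performs the equivalent lookup and calls `main()`. Per the source below, `main()` starts the stdio server unless the first argument is `index`.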
@@ -0,0 +1,180 @@
+ """MCP server exposing PaperQA2 for deep synthesis across scientific papers."""
+
+ from __future__ import annotations
+
+ import os
+ import pathlib
+ import pickle
+ import zlib
+
+ from mcp.server.fastmcp import FastMCP
+ from paperqa import Settings, agent_query
+
+ mcp = FastMCP("paperqa")
+
+
+ def _settings() -> Settings:
+     return Settings(
+         llm=os.environ.get("PQA_LLM", "gpt-4o-mini"),
+         summary_llm=os.environ.get("PQA_SUMMARY_LLM", "gpt-4o-mini"),
+         embedding=os.environ.get("PQA_EMBEDDING", "text-embedding-3-small"),
+         temperature=0.1,
+         parsing={"multimodal": "OFF", "use_doc_details": False},
+         answer={"evidence_k": 15, "answer_max_sources": 10},
+         agent={
+             "index": {
+                 "paper_directory": os.environ.get(
+                     "PAPER_DIRECTORY",
+                     os.path.expanduser("~/Zotero/storage"),
+                 ),
+                 "concurrency": 1,
+             }
+         },
+     )
+
+
+ _UNINDEXED_THRESHOLD = 10
+
+
+ def _index_status(settings: Settings | None = None) -> dict:
+     """Read the index manifest and compare against files in the paper directory.
+
+     Returns a dict with keys: indexed, errored, unindexed, total, ready, message.
+     """
+     if settings is None:
+         settings = _settings()
+     index_name = settings.get_index_name()
+     index_dir = pathlib.Path(settings.agent.index.index_directory) / index_name
+     paper_dir = pathlib.Path(settings.agent.index.paper_directory)
+     files_filter = settings.agent.index.files_filter
+
+     # Discover files PaperQA would try to index (same filter as paperqa)
+     total = 0
+     if paper_dir.is_dir():
+         total = sum(1 for f in paper_dir.rglob("*") if files_filter(f))
+
+     # Read the manifest
+     manifest_path = index_dir / "files.zip"
+     manifest: dict[str, str] = {}
+     manifest_error = False
+     if manifest_path.exists():
+         try:
+             manifest = pickle.loads(zlib.decompress(manifest_path.read_bytes()))
+         except Exception:
+             manifest_error = True
+
+     errored = sum(1 for v in manifest.values() if v == "ERROR")
+     indexed = len(manifest) - errored
+     unindexed = max(0, total - len(manifest))
+
+     ready = unindexed <= _UNINDEXED_THRESHOLD and not manifest_error
+     if manifest_error:
+         message = (
+             f"Index manifest is corrupt ({total} files on disk)."
+             " Rebuild the index from the terminal"
+             " — see the paperqa-mcp-server README, step 5."
+         )
+     else:
+         message = f"{indexed}/{total} papers indexed"
+         if errored:
+             message += f", {errored} errors"
+         if unindexed:
+             message += f", {unindexed} unindexed"
+         if ready:
+             message += ". Ready to query."
+         else:
+             message += (
+                 ". Queries will fail or time out."
+                 " Please finish building the index from the terminal"
+                 " — see the paperqa-mcp-server README, step 5."
+             )
+
+     return {
+         "indexed": indexed,
+         "errored": errored,
+         "unindexed": unindexed,
+         "total": total,
+         "ready": ready,
+         "message": message,
+     }
+
+
+ @mcp.tool()
+ async def index_status() -> str:
+     """Check the health of the paper index.
+
+     Returns a summary of how many papers are indexed, how many have
+     errors, and how many are unindexed. Use this to diagnose why
+     paper_qa queries might be failing or timing out.
+     """
+     status = _index_status()
+     lines = [
+         f"Index status: {status['message']}",
+         f" Indexed: {status['indexed']}",
+         f" Errors: {status['errored']}",
+         f" Unindexed: {status['unindexed']}",
+         f" Total files: {status['total']}",
+     ]
+     return "\n".join(lines)
+
+
+ @mcp.tool()
+ async def paper_qa(query: str) -> str:
+     """Search and synthesize across all papers in the library.
+
+     Use this for questions that require deep reading and synthesis
+     across multiple scientific papers — e.g. "What methods have been
+     used to recycle lithium from spent batteries?" or "Compare the
+     thermal stability of PEEK vs PTFE in the literature."
+
+     Returns a detailed answer with inline citations. Each citation
+     includes a file path containing an 8-character Zotero storage key
+     (e.g. ABC123DE from storage/ABC123DE/paper.pdf). You can use these
+     keys with zotero-mcp tools to look up the full bibliographic record,
+     read annotations, or find related items.
+
+     Not for quick metadata lookups or library browsing — use Zotero
+     tools for that.
+
+     If this tool returns "Index incomplete", the paper index has not
+     been fully built yet. Tell the user to run the index build command
+     from the terminal (see the paperqa-mcp-server README, step 5).
+     Do not retry the query — it will give the same result until the
+     index is built.
+
+     This tool can take 30–90 seconds to respond when working normally.
+     """
+     settings = _settings()
+     status = _index_status(settings)
+     if not status["ready"]:
+         return f"Index incomplete: {status['message']}"
+
+     try:
+         response = await agent_query(query=query, settings=settings)
+     except Exception as e:
+         return f"PaperQA error: {e}"
+     if not response.session.formatted_answer:
+         return f"PaperQA could not answer (status: {response.status})."
+     return response.session.formatted_answer
+
+
+ def _build_index() -> None:
+     """Build the search index using the same settings as the MCP server."""
+     import asyncio
+
+     from paperqa.agents.search import get_directory_index
+
+     settings = _settings()
+     print(f"Building index: {settings.get_index_name()}")
+     print(f"Paper directory: {settings.agent.index.paper_directory}")
+     asyncio.run(get_directory_index(settings=settings))
+     print("Done.")
+
+
+ def main():
+     import sys
+
+     if len(sys.argv) > 1 and sys.argv[1] == "index":
+         _build_index()
+     else:
+         mcp.run(transport="stdio")
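`_index_status()` above decides readiness from two numbers. The first is the count of files on disk that pass the filter. The second is the entries in `files.zip`, which it decodes as a zlib-compressed pickle, with the value `"ERROR"` marking files that failed. The stdlib-only sketch below reproduces that arithmetic so it can be exercised without PaperQA2. `write_manifest`, `read_manifest` and `summarize` are hypothetical helpers, and the manifest values other than `"ERROR"` are made up. The code also assumes that PaperQA2 writes the manifest with exactly `pickle.dumps` plus `zlib.compress`. That assumption is inferred from the reader above, not confirmed.

```python
from __future__ import annotations

import pathlib
import pickle
import tempfile
import zlib

UNINDEXED_THRESHOLD = 10  # same cut-off as _UNINDEXED_THRESHOLD above


def write_manifest(path: pathlib.Path, manifest: dict[str, str]) -> None:
    """Hypothetical writer: zlib-compressed pickle, the format read above."""
    path.write_bytes(zlib.compress(pickle.dumps(manifest)))


def read_manifest(path: pathlib.Path) -> dict[str, str] | None:
    """Return the manifest, {} if the file is missing, None if undecodable."""
    if not path.exists():
        return {}
    try:
        # pickle can execute arbitrary code: only load files you trust.
        return pickle.loads(zlib.decompress(path.read_bytes()))
    except Exception:
        return None


def summarize(manifest: dict[str, str], total: int) -> dict[str, int | bool]:
    """Apply the same counting rules as _index_status() to a decoded manifest."""
    errored = sum(1 for v in manifest.values() if v == "ERROR")
    unindexed = max(0, total - len(manifest))
    return {
        "indexed": len(manifest) - errored,
        "errored": errored,
        "unindexed": unindexed,
        "ready": unindexed <= UNINDEXED_THRESHOLD,
    }


with tempfile.TemporaryDirectory() as tmp:
    manifest_path = pathlib.Path(tmp) / "files.zip"
    write_manifest(manifest_path, {"a.pdf": "doc-a", "b.pdf": "doc-b", "c.pdf": "ERROR"})
    manifest = read_manifest(manifest_path)
    print(summarize(manifest, total=20))
    # {'indexed': 2, 'errored': 1, 'unindexed': 17, 'ready': False}
```

In the server, a manifest that fails to decode makes the index not ready. Here `read_manifest` returns `None` for that case and leaves the decision to the caller, which keeps the counting logic in `summarize` separate from the error handling.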