@sylphx/pdf-reader-mcp 1.2.0 → 1.3.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +544 -250
- package/dist/index.js +524 -45
- package/package.json +46 -36
- package/dist/handlers/index.js +0 -4
- package/dist/handlers/readPdf.js +0 -168
- package/dist/pdf/extractor.js +0 -265
- package/dist/pdf/loader.js +0 -53
- package/dist/pdf/parser.js +0 -94
- package/dist/schemas/readPdf.js +0 -55
- package/dist/types/pdf.js +0 -2
- package/dist/utils/pathUtils.js +0 -30
package/README.md
CHANGED
|
@@ -1,62 +1,122 @@
|
|
|
1
|
-
|
|
1
|
+
<div align="center">
|
|
2
2
|
|
|
3
|
-
|
|
4
|
-
[](https://github.com/sylphlab/pdf-reader-mcp/actions/workflows/ci.yml)
|
|
5
|
-
[](https://codecov.io/gh/sylphlab/pdf-reader-mcp)
|
|
6
|
-
[](https://badge.fury.io/js/%40sylphlab%2Fpdf-reader-mcp)
|
|
7
|
-
[](https://opensource.org/licenses/MIT)
|
|
8
|
-
[](https://smithery.ai/server/@sylphxltd/pdf-reader-mcp)
|
|
3
|
+
# PDF Reader MCP
|
|
9
4
|
|
|
10
|
-
|
|
11
|
-
<img width="380" height="200" src="https://glama.ai/mcp/servers/@sylphlab/pdf-reader-mcp/badge" alt="PDF Reader Server MCP server" />
|
|
12
|
-
</a>
|
|
5
|
+
**Production-ready PDF processing server for AI agents**
|
|
13
6
|
|
|
14
|
-
|
|
7
|
+
[](https://github.com/SylphxAI/pdf-reader-mcp/actions/workflows/ci.yml)
|
|
8
|
+
[](https://codecov.io/gh/SylphxAI/pdf-reader-mcp)
|
|
9
|
+
[](https://www.npmjs.com/package/@sylphx/pdf-reader-mcp)
|
|
10
|
+
[](https://pdf-reader-msu3esos4-sylphx.vercel.app)
|
|
11
|
+
[](https://www.npmjs.com/package/@sylphx/pdf-reader-mcp)
|
|
12
|
+
[](https://opensource.org/licenses/MIT)
|
|
15
13
|
|
|
16
|
-
|
|
14
|
+
**5-10x faster parallel processing** • **Y-coordinate content ordering** • **94%+ test coverage** • **103 tests passing**
|
|
17
15
|
|
|
18
|
-
-
|
|
19
|
-
-
|
|
20
|
-
|
|
21
|
-
- 🔢 **Count pages** in PDF documents
|
|
22
|
-
- **Support for both local files and URLs**
|
|
23
|
-
- 🛡️ **Secure** - Confines file access to project root directory
|
|
24
|
-
- ⚡ **Fast** - Parallel processing for maximum performance
|
|
25
|
-
- **Batch processing** - Handle multiple PDFs in a single request
|
|
26
|
-
- 📦 **Multiple deployment options** - npm or Smithery
|
|
27
|
-
|
|
28
|
-
## Recent Updates (October 2025)
|
|
29
|
-
|
|
30
|
-
- ✅ **Fixed critical bugs**: Buffer/Uint8Array compatibility for PDF.js v5.x
|
|
31
|
-
- ✅ **Fixed schema validation**: Resolved `exclusiveMinimum` issue affecting Windsurf, Mistral API, and other tools
|
|
32
|
-
- ✅ **Improved metadata extraction**: Robust fallback handling for PDF.js compatibility
|
|
33
|
-
- ✅ **Updated dependencies**: All packages updated to latest versions
|
|
34
|
-
- ✅ **Migrated to Biome**: 50x faster linting and formatting with unified tooling
|
|
35
|
-
- ✅ **Added image extraction**: Extract embedded images from PDF pages
|
|
36
|
-
- ✅ **Performance optimization**: Parallel page processing for 5-10x speedup
|
|
37
|
-
- ✅ **Deep refactoring**: Modular architecture with 98.9% test coverage (90 tests)
|
|
16
|
+
<a href="https://mseep.ai/app/SylphxAI-pdf-reader-mcp">
|
|
17
|
+
<img src="https://mseep.net/pr/SylphxAI-pdf-reader-mcp-badge.png" alt="Security Validated" width="200"/>
|
|
18
|
+
</a>
|
|
38
19
|
|
|
39
|
-
|
|
20
|
+
</div>
|
|
40
21
|
|
|
41
|
-
|
|
22
|
+
---
|
|
42
23
|
|
|
43
|
-
|
|
24
|
+
## Overview
|
|
44
25
|
|
|
45
|
-
|
|
46
|
-
|
|
26
|
+
PDF Reader MCP is a **production-ready** Model Context Protocol server that empowers AI agents with **enterprise-grade PDF processing capabilities**. Extract text, images, and metadata with unmatched performance and reliability.
|
|
27
|
+
|
|
28
|
+
**The Problem:**
|
|
29
|
+
```typescript
|
|
30
|
+
// Traditional PDF processing
|
|
31
|
+
- Sequential page processing (slow)
|
|
32
|
+
- No natural content ordering
|
|
33
|
+
- Complex path handling
|
|
34
|
+
- Poor error isolation
|
|
47
35
|
```
|
|
48
36
|
|
|
49
|
-
|
|
37
|
+
**The Solution:**
|
|
38
|
+
```typescript
|
|
39
|
+
// PDF Reader MCP
|
|
40
|
+
- 5-10x faster parallel processing ⚡
|
|
41
|
+
- Y-coordinate based ordering
|
|
42
|
+
- Flexible path support (absolute/relative) 🎯
|
|
43
|
+
- Per-page error resilience 🛡️
|
|
44
|
+
- 94%+ test coverage ✅
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
**Result: Production-ready PDF processing that scales.**
|
|
48
|
+
|
|
49
|
+
---
|
|
50
|
+
|
|
51
|
+
## ⚡ Key Features
|
|
52
|
+
|
|
53
|
+
### Performance
|
|
54
|
+
|
|
55
|
+
- **5-10x faster** than sequential with automatic parallelization
|
|
56
|
+
- ⚡ **12,933 ops/sec** error handling, 5,575 ops/sec text extraction
|
|
57
|
+
- **Process 50-page PDFs** in seconds with multi-core utilization
|
|
58
|
+
- 📦 **Lightweight** with minimal dependencies
|
|
59
|
+
|
|
60
|
+
### Developer Experience
|
|
61
|
+
|
|
62
|
+
- 🎯 **Path Flexibility** - Absolute & relative paths, Windows/Unix support (v1.3.0)
|
|
63
|
+
- 🖼️ **Smart Ordering** - Y-coordinate based content preserves document layout
|
|
64
|
+
- 🛡️ **Type Safe** - Full TypeScript with strict mode enabled
|
|
65
|
+
- **Battle-tested** - 103 tests, 94%+ coverage, 98%+ function coverage
|
|
66
|
+
- **Simple API** - Single tool handles all operations elegantly
|
|
67
|
+
|
|
68
|
+
---
|
|
69
|
+
|
|
70
|
+
## Performance Benchmarks
|
|
71
|
+
|
|
72
|
+
Real-world performance from production testing:
|
|
73
|
+
|
|
74
|
+
| Operation | Ops/sec | Performance | Use Case |
|
|
75
|
+
|-----------|---------|-------------|----------|
|
|
76
|
+
| **Error handling** | 12,933 | ⚡⚡⚡⚡⚡ | Validation & safety |
|
|
77
|
+
| **Extract full text** | 5,575 | ⚡⚡⚡⚡ | Document analysis |
|
|
78
|
+
| **Extract page** | 5,329 | ⚡⚡⚡⚡ | Single page ops |
|
|
79
|
+
| **Multiple pages** | 5,242 | ⚡⚡⚡⚡ | Batch processing |
|
|
80
|
+
| **Metadata only** | 4,912 | ⚡⚡⚡ | Quick inspection |
|
|
81
|
+
|
|
82
|
+
### Parallel Processing Speedup
|
|
83
|
+
|
|
84
|
+
| Document | Sequential | Parallel | Speedup |
|
|
85
|
+
|----------|-----------|----------|---------|
|
|
86
|
+
| **10-page PDF** | ~2s | ~0.3s | **5-8x faster** |
|
|
87
|
+
| **50-page PDF** | ~10s | ~1s | **10x faster** |
|
|
88
|
+
| **100+ pages** | ~20s | ~2s | **Linear scaling** with CPU cores |
|
|
89
|
+
|
|
90
|
+
*Benchmarks vary based on PDF complexity and system resources.*
|
|
91
|
+
|
|
92
|
+
---
|
|
50
93
|
|
|
51
|
-
|
|
94
|
+
## 📦 Installation
|
|
52
95
|
|
|
53
96
|
```bash
|
|
97
|
+
# Quick start - zero installation
|
|
98
|
+
npx @sylphx/pdf-reader-mcp
|
|
99
|
+
|
|
100
|
+
# Using pnpm (recommended)
|
|
54
101
|
pnpm add @sylphx/pdf-reader-mcp
|
|
55
|
-
|
|
102
|
+
|
|
103
|
+
# Using npm
|
|
56
104
|
npm install @sylphx/pdf-reader-mcp
|
|
105
|
+
|
|
106
|
+
# Using yarn
|
|
107
|
+
yarn add @sylphx/pdf-reader-mcp
|
|
108
|
+
|
|
109
|
+
# For Claude Desktop (easiest)
|
|
110
|
+
npx -y @smithery/cli install @sylphx/pdf-reader-mcp --client claude
|
|
57
111
|
```
|
|
58
112
|
|
|
59
|
-
|
|
113
|
+
---
|
|
114
|
+
|
|
115
|
+
## 🎯 Quick Start
|
|
116
|
+
|
|
117
|
+
### Configuration
|
|
118
|
+
|
|
119
|
+
Add to your MCP client (`claude_desktop_config.json`, Cursor, Cline):
|
|
60
120
|
|
|
61
121
|
```json
|
|
62
122
|
{
|
|
@@ -69,357 +129,591 @@ Configure your MCP client (e.g., Claude Desktop, Cursor):
|
|
|
69
129
|
}
|
|
70
130
|
```
|
|
71
131
|
|
|
72
|
-
|
|
132
|
+
### Basic Usage
|
|
73
133
|
|
|
74
|
-
|
|
75
|
-
|
|
76
|
-
|
|
77
|
-
|
|
78
|
-
|
|
79
|
-
|
|
80
|
-
|
|
134
|
+
```json
|
|
135
|
+
{
|
|
136
|
+
"sources": [{
|
|
137
|
+
"path": "documents/report.pdf"
|
|
138
|
+
}],
|
|
139
|
+
"include_full_text": true,
|
|
140
|
+
"include_metadata": true,
|
|
141
|
+
"include_page_count": true
|
|
142
|
+
}
|
|
81
143
|
```
|
|
82
144
|
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
145
|
+
**Result:**
|
|
146
|
+
- ✅ Full text content extracted
|
|
147
|
+
- ✅ PDF metadata (author, title, dates)
|
|
148
|
+
- ✅ Total page count
|
|
149
|
+
- ✅ Structural sharing - unchanged parts preserved
|
|
88
150
|
|
|
89
|
-
###
|
|
151
|
+
### Extract Specific Pages
|
|
90
152
|
|
|
91
153
|
```json
|
|
92
154
|
{
|
|
93
|
-
"sources": [
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
],
|
|
99
|
-
"include_metadata": true
|
|
155
|
+
"sources": [{
|
|
156
|
+
"path": "documents/manual.pdf",
|
|
157
|
+
"pages": "1-5,10,15-20"
|
|
158
|
+
}],
|
|
159
|
+
"include_full_text": true
|
|
100
160
|
}
|
|
101
161
|
```
|
|
102
162
|
|
|
103
|
-
###
|
|
163
|
+
### Absolute Paths (v1.3.0+)
|
|
104
164
|
|
|
105
165
|
```json
|
|
166
|
+
// Windows - Both formats work!
|
|
106
167
|
{
|
|
107
|
-
"sources": [{
|
|
108
|
-
|
|
109
|
-
|
|
110
|
-
"include_full_text":
|
|
168
|
+
"sources": [{
|
|
169
|
+
"path": "C:\\Users\\John\\Documents\\report.pdf"
|
|
170
|
+
}],
|
|
171
|
+
"include_full_text": true
|
|
111
172
|
}
|
|
112
|
-
```
|
|
113
|
-
|
|
114
|
-
### Example 3: Read from URL
|
|
115
173
|
|
|
116
|
-
|
|
174
|
+
// Unix/Mac
|
|
117
175
|
{
|
|
118
|
-
"sources": [
|
|
119
|
-
|
|
120
|
-
|
|
121
|
-
}
|
|
122
|
-
],
|
|
176
|
+
"sources": [{
|
|
177
|
+
"path": "/home/user/documents/contract.pdf"
|
|
178
|
+
}],
|
|
123
179
|
"include_full_text": true
|
|
124
180
|
}
|
|
125
181
|
```
|
|
126
182
|
|
|
127
|
-
|
|
183
|
+
**No more** `"Absolute paths are not allowed"` **errors!**
|
|
184
|
+
|
|
185
|
+
### Extract Images with Natural Ordering
|
|
128
186
|
|
|
129
187
|
```json
|
|
130
188
|
{
|
|
131
|
-
"sources": [
|
|
132
|
-
|
|
133
|
-
|
|
134
|
-
|
|
135
|
-
|
|
189
|
+
"sources": [{
|
|
190
|
+
"path": "presentation.pdf",
|
|
191
|
+
"pages": [1, 2, 3]
|
|
192
|
+
}],
|
|
193
|
+
"include_images": true,
|
|
136
194
|
"include_full_text": true
|
|
137
195
|
}
|
|
138
196
|
```
|
|
139
197
|
|
|
140
|
-
|
|
198
|
+
**Response includes:**
|
|
199
|
+
- Text and images in **exact document order** (Y-coordinate sorted)
|
|
200
|
+
- Base64-encoded images with metadata (width, height, format)
|
|
201
|
+
- Natural reading flow preserved for AI comprehension
|
|
202
|
+
|
|
203
|
+
### Batch Processing
|
|
141
204
|
|
|
142
205
|
```json
|
|
143
206
|
{
|
|
144
207
|
"sources": [
|
|
145
|
-
{
|
|
146
|
-
|
|
147
|
-
|
|
148
|
-
}
|
|
208
|
+
{ "path": "C:\\Reports\\Q1.pdf", "pages": "1-10" },
|
|
209
|
+
{ "path": "/home/user/Q2.pdf", "pages": "1-10" },
|
|
210
|
+
{ "url": "https://example.com/Q3.pdf" }
|
|
149
211
|
],
|
|
150
|
-
"include_images": true,
|
|
151
212
|
"include_full_text": true
|
|
152
213
|
}
|
|
153
214
|
```
|
|
154
215
|
|
|
155
|
-
**
|
|
156
|
-
- Text content from each page
|
|
157
|
-
- Embedded images as base64-encoded data with metadata (width, height, format)
|
|
158
|
-
- Each image includes page number and index
|
|
159
|
-
|
|
160
|
-
**Note**: Image extraction works best with JPEG and PNG images. Large PDFs with many images may produce large responses.
|
|
216
|
+
⚡ **All PDFs processed in parallel automatically!**
|
|
161
217
|
|
|
162
|
-
|
|
218
|
+
---
|
|
163
219
|
|
|
164
|
-
|
|
220
|
+
## ✨ Features
|
|
165
221
|
|
|
166
|
-
|
|
222
|
+
### Core Capabilities
|
|
223
|
+
- ✅ **Text Extraction** - Full document or specific pages with intelligent parsing
|
|
224
|
+
- ✅ **Image Extraction** - Base64-encoded with complete metadata (width, height, format)
|
|
225
|
+
- ✅ **Content Ordering** - Y-coordinate based layout preservation for natural reading flow
|
|
226
|
+
- ✅ **Metadata Extraction** - Author, title, creation date, and custom properties
|
|
227
|
+
- ✅ **Page Counting** - Fast enumeration without loading full content
|
|
228
|
+
- ✅ **Dual Sources** - Local files (absolute or relative paths) and HTTP/HTTPS URLs
|
|
229
|
+
- ✅ **Batch Processing** - Multiple PDFs processed concurrently
|
|
230
|
+
|
|
231
|
+
### Advanced Features
|
|
232
|
+
- ⚡ **5-10x Performance** - Parallel page processing with Promise.all
|
|
233
|
+
- 🎯 **Smart Pagination** - Extract ranges like "1-5,10-15,20"
|
|
234
|
+
- 🖼️ **Multi-Format Images** - RGB, RGBA, Grayscale with automatic detection
|
|
235
|
+
- 🛡️ **Path Flexibility** - Windows, Unix, and relative paths all supported (v1.3.0)
|
|
236
|
+
- **Error Resilience** - Per-page error isolation with detailed messages
|
|
237
|
+
- **Large File Support** - Efficient streaming and memory management
|
|
238
|
+
- **Type Safe** - Full TypeScript with strict mode enabled
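
The parallel page processing and per-page error isolation listed above could look roughly like the sketch below. It is an illustration under assumptions, not the package's actual code: `extractPage` is a hypothetical callback standing in for the real per-page work, and the `PageResult` shape is invented for this example.

```typescript
// Hypothetical per-page result: a failure is recorded as data instead of being thrown.
type PageResult =
  | { page: number; ok: true; text: string }
  | { page: number; ok: false; error: string };

// Start one extraction per page concurrently with Promise.all. Each page has its
// own try/catch, so one bad page cannot reject the whole batch.
async function extractPagesInParallel(
  pages: number[],
  extractPage: (page: number) => Promise<string>, // assumed signature
): Promise<PageResult[]> {
  return Promise.all(
    pages.map(async (page): Promise<PageResult> => {
      try {
        return { page, ok: true, text: await extractPage(page) };
      } catch (err) {
        const error = err instanceof Error ? err.message : String(err);
        return { page, ok: false, error };
      }
    }),
  );
}
```

Because `Promise.all` preserves input order, the results line up with the requested pages even when the extractions finish out of order.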
|
|
167
239
|
|
|
168
|
-
|
|
169
|
-
- **Range string**: `"1-10"` (extracts pages 1 through 10)
|
|
170
|
-
- **Multiple ranges**: `"1-5,10-15,20"` (commas separate ranges and individual pages)
|
|
171
|
-
- **Omit for all pages**: Don't include the `pages` field to extract all pages
|
|
240
|
+
---
|
|
172
241
|
|
|
173
|
-
|
|
242
|
+
## What's New in v1.3.0
|
|
174
243
|
|
|
175
|
-
|
|
244
|
+
### Absolute Paths Now Supported!
|
|
176
245
|
|
|
177
246
|
```json
|
|
247
|
+
// ✅ Windows
|
|
248
|
+
{ "path": "C:\\Users\\John\\Documents\\report.pdf" }
|
|
249
|
+
{ "path": "C:/Users/John/Documents/report.pdf" }
|
|
250
|
+
|
|
251
|
+
// ✅ Unix/Mac
|
|
252
|
+
{ "path": "/home/john/documents/report.pdf" }
|
|
253
|
+
{ "path": "/Users/john/Documents/report.pdf" }
|
|
254
|
+
|
|
255
|
+
// ✅ Relative (still works)
|
|
256
|
+
{ "path": "documents/report.pdf" }
|
|
257
|
+
```
|
|
258
|
+
|
|
259
|
+
**Other Improvements:**
|
|
260
|
+
- Fixed Zod validation error handling
|
|
261
|
+
- 📦 Updated all dependencies to latest versions
|
|
262
|
+
- ✅ 103 tests passing, 94%+ coverage maintained
|
|
263
|
+
|
|
264
|
+
<details>
|
|
265
|
+
<summary><strong>View Full Changelog</strong></summary>
|
|
266
|
+
|
|
267
|
+
<br/>
|
|
268
|
+
|
|
269
|
+
**v1.2.0 - Content Ordering**
|
|
270
|
+
- Y-coordinate based text and image ordering
|
|
271
|
+
- Natural reading flow for AI models
|
|
272
|
+
- Intelligent line grouping
|
|
273
|
+
|
|
274
|
+
**v1.1.0 - Image Extraction & Performance**
|
|
275
|
+
- Base64-encoded image extraction
|
|
276
|
+
- 10x speedup with parallel processing
|
|
277
|
+
- Comprehensive test coverage (94%+)
|
|
278
|
+
|
|
279
|
+
[View Full Changelog →](./CHANGELOG.md)
|
|
280
|
+
|
|
281
|
+
</details>
|
|
282
|
+
|
|
283
|
+
---
|
|
284
|
+
|
|
285
|
+
## API Reference
|
|
286
|
+
|
|
287
|
+
### `read_pdf` Tool
|
|
288
|
+
|
|
289
|
+
The single tool that handles all PDF operations.
|
|
290
|
+
|
|
291
|
+
#### Parameters
|
|
292
|
+
|
|
293
|
+
| Parameter | Type | Description | Default |
|
|
294
|
+
|-----------|------|-------------|---------|
|
|
295
|
+
| `sources` | Array | List of PDF sources to process | Required |
|
|
296
|
+
| `include_full_text` | boolean | Extract full text content | `false` |
|
|
297
|
+
| `include_metadata` | boolean | Extract PDF metadata | `true` |
|
|
298
|
+
| `include_page_count` | boolean | Include total page count | `true` |
|
|
299
|
+
| `include_images` | boolean | Extract embedded images | `false` |
|
|
300
|
+
|
|
301
|
+
#### Source Object
|
|
302
|
+
|
|
303
|
+
```typescript
|
|
178
304
|
{
|
|
179
|
-
|
|
180
|
-
|
|
181
|
-
|
|
182
|
-
"pages": "1-10"
|
|
183
|
-
}
|
|
184
|
-
]
|
|
305
|
+
path?: string; // Local file path (absolute or relative)
|
|
306
|
+
url?: string; // HTTP/HTTPS URL to PDF
|
|
307
|
+
pages?: string | number[]; // Pages to extract: "1-5,10" or [1,2,3]
|
|
185
308
|
}
|
|
186
309
|
```
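
As a rough illustration of how a caller might sanity-check a source object before sending it, here is a minimal sketch. The rules it enforces (exactly one of `path` or `url`, 1-based page numbers) are assumptions made for this example; the README only states that both fields are optional, and `validateSource` is a hypothetical helper, not part of the package.

```typescript
// Mirrors the Source Object shape shown above.
interface PdfSource {
  path?: string;
  url?: string;
  pages?: string | number[];
}

// Returns a list of human-readable problems; an empty list means the source looks usable.
// Assumption: a source names exactly one location, and page numbers start at 1.
function validateSource(source: PdfSource): string[] {
  const problems: string[] = [];
  const hasPath = typeof source.path === "string" && source.path.length > 0;
  const hasUrl = typeof source.url === "string" && source.url.length > 0;
  if (hasPath === hasUrl) {
    problems.push("provide exactly one of `path` or `url`");
  }
  if (hasUrl && !/^https?:\/\//i.test(source.url as string)) {
    problems.push("`url` must start with http:// or https://");
  }
  if (Array.isArray(source.pages) && source.pages.some((p) => !Number.isInteger(p) || p < 1)) {
    problems.push("`pages` array entries must be positive integers");
  }
  return problems;
}
```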
|
|
187
310
|
|
|
188
|
-
|
|
311
|
+
#### Examples
|
|
189
312
|
|
|
190
|
-
|
|
191
|
-
|
|
192
|
-
|
|
313
|
+
**Metadata only (fast):**
|
|
314
|
+
```json
|
|
315
|
+
{
|
|
316
|
+
"sources": [{ "path": "large.pdf" }],
|
|
317
|
+
"include_metadata": true,
|
|
318
|
+
"include_page_count": true,
|
|
319
|
+
"include_full_text": false
|
|
320
|
+
}
|
|
321
|
+
```
|
|
193
322
|
|
|
323
|
+
**From URL:**
|
|
194
324
|
```json
|
|
195
325
|
{
|
|
196
|
-
"sources": [{
|
|
197
|
-
|
|
326
|
+
"sources": [{
|
|
327
|
+
"url": "https://arxiv.org/pdf/2301.00001.pdf"
|
|
328
|
+
}],
|
|
329
|
+
"include_full_text": true
|
|
198
330
|
}
|
|
199
331
|
```
|
|
200
332
|
|
|
201
|
-
**
|
|
333
|
+
**Page ranges:**
|
|
202
334
|
```json
|
|
203
335
|
{
|
|
204
|
-
"
|
|
205
|
-
|
|
206
|
-
|
|
207
|
-
|
|
208
|
-
"width": 800,
|
|
209
|
-
"height": 600,
|
|
210
|
-
"format": "rgb",
|
|
211
|
-
"data": "base64-encoded-image-data..."
|
|
212
|
-
}
|
|
213
|
-
]
|
|
336
|
+
"sources": [{
|
|
337
|
+
"path": "manual.pdf",
|
|
338
|
+
"pages": "1-5,10-15,20" // Pages 1,2,3,4,5,10,11,12,13,14,15,20
|
|
339
|
+
}]
|
|
214
340
|
}
|
|
215
341
|
```
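
To make the range syntax concrete, the sketch below expands a spec such as `"1-5,10-15,20"` into individual page numbers. `parsePageRanges` is a hypothetical helper written for this example; the package's real parser may validate or order pages differently.

```typescript
// Expand a page spec like "1-5,10-15,20" into a sorted list of unique page numbers.
// Illustrative only: pages are assumed to be 1-based and ranges inclusive.
function parsePageRanges(spec: string): number[] {
  const pages = new Set<number>();
  for (const rawPart of spec.split(",")) {
    const part = rawPart.trim();
    // Accept either a single number ("20") or an inclusive range ("10-15").
    const match = /^(\d+)(?:-(\d+))?$/.exec(part);
    if (!match) {
      throw new Error(`Invalid page range: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${part}"`);
    }
    for (let page = start; page <= end; page++) {
      pages.add(page);
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}
```

For the spec in the example above, this yields pages 1 through 5, 10 through 15, and 20.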
|
|
216
342
|
|
|
217
|
-
|
|
218
|
-
- ✅ **RGB** - Standard color images (most common)
|
|
219
|
-
- ✅ **RGBA** - Images with transparency
|
|
220
|
-
- ✅ **Grayscale** - Black and white images
|
|
221
|
-
- ✅ Works with JPEG, PNG, and other embedded formats
|
|
343
|
+
---
|
|
222
344
|
|
|
223
|
-
|
|
224
|
-
- Image extraction increases response size significantly
|
|
225
|
-
- Useful for AI models with vision capabilities
|
|
226
|
-
- Set `include_images: false` (default) to extract text only
|
|
227
|
-
- Combine with `pages` parameter to limit extraction scope
|
|
345
|
+
## 🔧 Advanced Usage
|
|
228
346
|
|
|
229
|
-
|
|
347
|
+
<details>
|
|
348
|
+
<summary><strong>Y-Coordinate Content Ordering</strong></summary>
|
|
230
349
|
|
|
231
|
-
|
|
350
|
+
<br/>
|
|
232
351
|
|
|
233
|
-
|
|
234
|
-
❌ **Bad**: `"path": "/Users/john/documents/report.pdf"`
|
|
352
|
+
Content is returned in natural reading order based on Y-coordinates:
|
|
235
353
|
|
|
236
|
-
|
|
354
|
+
```
|
|
355
|
+
Document Layout:
|
|
356
|
+
┌─────────────────────┐
|
|
357
|
+
│ [Title]       Y:100 │
|
|
358
|
+
│ [Image]       Y:150 │
|
|
359
|
+
│ [Text]        Y:400 │
|
|
360
|
+
│ [Photo A]     Y:500 │
|
|
361
|
+
│ [Photo B]     Y:550 │
|
|
362
|
+
└─────────────────────┘
|
|
363
|
+
|
|
364
|
+
Response Order:
|
|
365
|
+
[
|
|
366
|
+
{ type: "text", text: "Title..." },
|
|
367
|
+
{ type: "image", data: "..." },
|
|
368
|
+
{ type: "text", text: "..." },
|
|
369
|
+
{ type: "image", data: "..." },
|
|
370
|
+
{ type: "image", data: "..." }
|
|
371
|
+
]
|
|
372
|
+
```
|
|
237
373
|
|
|
238
|
-
|
|
374
|
+
**Benefits:**
|
|
375
|
+
- AI understands spatial relationships
|
|
376
|
+
- Natural document comprehension
|
|
377
|
+
- Perfect for vision-enabled models
|
|
378
|
+
- Automatic multi-line text grouping
|
|
239
379
|
|
|
240
|
-
|
|
380
|
+
</details>
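
The ordering described in the layout diagram above can be sketched as a simple sort. This is a minimal illustration, assuming each item carries an internal `y` value; that field, the `ContentItem` type, and `orderByY` are all hypothetical and do not appear in the documented response.

```typescript
// Hypothetical content item: `y` is an assumed internal value used only for sorting.
type ContentItem =
  | { type: "text"; text: string; y: number }
  | { type: "image"; data: string; y: number };

// Sort items top-to-bottom, following the diagram's convention that a smaller Y
// is higher on the page. Raw PDF coordinates place the origin at the bottom-left,
// so an implementation working on untransformed values might sort descending instead.
function orderByY<T extends { y: number }>(items: T[]): T[] {
  // Copy first so the caller's array is not mutated.
  return items.slice().sort((a, b) => a.y - b.y);
}
```

Applied to the five elements in the diagram, this produces the text, image, text, image, image sequence shown under "Response Order".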
|
|
241
381
|
|
|
242
|
-
|
|
382
|
+
<details>
|
|
383
|
+
<summary><strong>🖼️ Image Extraction</strong></summary>
|
|
243
384
|
|
|
244
|
-
|
|
245
|
-
|
|
246
|
-
|
|
385
|
+
<br/>
|
|
386
|
+
|
|
387
|
+
**Enable extraction:**
|
|
388
|
+
```json
|
|
389
|
+
{
|
|
390
|
+
"sources": [{ "path": "manual.pdf" }],
|
|
391
|
+
"include_images": true
|
|
392
|
+
}
|
|
393
|
+
```
|
|
394
|
+
|
|
395
|
+
**Response format:**
|
|
396
|
+
```json
|
|
397
|
+
{
|
|
398
|
+
"images": [{
|
|
399
|
+
"page": 1,
|
|
400
|
+
"index": 0,
|
|
401
|
+
"width": 1920,
|
|
402
|
+
"height": 1080,
|
|
403
|
+
"format": "rgb",
|
|
404
|
+
"data": "base64-encoded-png..."
|
|
405
|
+
}]
|
|
406
|
+
}
|
|
247
407
|
```
|
|
248
408
|
|
|
249
|
-
|
|
409
|
+
**Supported formats:** RGB, RGBA, Grayscale
|
|
410
|
+
**Auto-detected:** JPEG, PNG, and other embedded formats
|
|
250
411
|
|
|
251
|
-
|
|
412
|
+
</details>
|
|
252
413
|
|
|
253
|
-
|
|
414
|
+
<details>
|
|
415
|
+
<summary><strong>Path Configuration</strong></summary>
|
|
254
416
|
|
|
255
|
-
|
|
256
|
-
2. Incorrect working directory
|
|
417
|
+
<br/>
|
|
257
418
|
|
|
258
|
-
**
|
|
419
|
+
**Absolute paths** (v1.3.0+) - Direct file access:
|
|
420
|
+
```json
|
|
421
|
+
{ "path": "C:\\Users\\John\\file.pdf" }
|
|
422
|
+
{ "path": "/home/user/file.pdf" }
|
|
423
|
+
```
|
|
424
|
+
|
|
425
|
+
**Relative paths** - Workspace files:
|
|
426
|
+
```json
|
|
427
|
+
{ "path": "docs/report.pdf" }
|
|
428
|
+
{ "path": "./2024/Q1.pdf" }
|
|
429
|
+
```
|
|
259
430
|
|
|
431
|
+
**Configure working directory:**
|
|
260
432
|
```json
|
|
261
433
|
{
|
|
262
434
|
"mcpServers": {
|
|
263
435
|
"pdf-reader-mcp": {
|
|
264
436
|
"command": "npx",
|
|
265
437
|
"args": ["@sylphx/pdf-reader-mcp"],
|
|
266
|
-
"cwd": "/path/to/
|
|
438
|
+
"cwd": "/path/to/documents"
|
|
267
439
|
}
|
|
268
440
|
}
|
|
269
441
|
}
|
|
270
442
|
```
|
|
271
443
|
|
|
272
|
-
|
|
444
|
+
</details>
|
|
445
|
+
|
|
446
|
+
<details>
|
|
447
|
+
<summary><strong>Large PDF Strategies</strong></summary>
|
|
448
|
+
|
|
449
|
+
<br/>
|
|
450
|
+
|
|
451
|
+
**Strategy 1: Page ranges**
|
|
452
|
+
```json
|
|
453
|
+
{ "sources": [{ "path": "big.pdf", "pages": "1-20" }] }
|
|
454
|
+
```
|
|
455
|
+
|
|
456
|
+
**Strategy 2: Progressive loading**
|
|
457
|
+
```json
|
|
458
|
+
// Step 1: Get page count
|
|
459
|
+
{ "sources": [{ "path": "big.pdf" }], "include_full_text": false }
|
|
460
|
+
|
|
461
|
+
// Step 2: Extract sections
|
|
462
|
+
{ "sources": [{ "path": "big.pdf", "pages": "50-75" }] }
|
|
463
|
+
```
|
|
464
|
+
|
|
465
|
+
**Strategy 3: Parallel batching**
|
|
466
|
+
```json
|
|
467
|
+
{
|
|
468
|
+
"sources": [
|
|
469
|
+
{ "path": "big.pdf", "pages": "1-50" },
|
|
470
|
+
{ "path": "big.pdf", "pages": "51-100" }
|
|
471
|
+
]
|
|
472
|
+
}
|
|
473
|
+
```
|
|
474
|
+
|
|
475
|
+
</details>
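
Strategies 2 and 3 can be combined: fetch the page count first, then build one `sources` entry per chunk. The helper below is a sketch of that second step only. `buildPageBatches` is a hypothetical name, and the chunk size is an arbitrary choice by the caller rather than a documented recommendation.

```typescript
// Build one `sources` entry per chunk of pages, e.g. "1-50" and "51-100".
// `totalPages` is assumed to come from an earlier request that returned the page count.
function buildPageBatches(
  path: string,
  totalPages: number,
  chunkSize: number,
): { path: string; pages: string }[] {
  if (!Number.isInteger(totalPages) || totalPages < 1) {
    throw new Error("totalPages must be a positive integer");
  }
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new Error("chunkSize must be a positive integer");
  }
  const batches: { path: string; pages: string }[] = [];
  for (let start = 1; start <= totalPages; start += chunkSize) {
    const end = Math.min(start + chunkSize - 1, totalPages);
    // A chunk that covers a single page is written as "101", not "101-101".
    batches.push({ path, pages: start === end ? String(start) : `${start}-${end}` });
  }
  return batches;
}
```

The returned array can be placed directly in the `sources` field of a request.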
|
|
476
|
+
|
|
477
|
+
---
|
|
478
|
+
|
|
479
|
+
## 🔧 Troubleshooting
|
|
480
|
+
|
|
481
|
+
### "Absolute paths are not allowed"
|
|
273
482
|
|
|
274
|
-
**Solution
|
|
483
|
+
**Solution:** Upgrade to v1.3.0+
|
|
275
484
|
|
|
276
485
|
```bash
|
|
277
|
-
npm update @sylphx/pdf-reader-mcp
|
|
486
|
+
npm update @sylphx/pdf-reader-mcp
|
|
487
|
+
```
|
|
488
|
+
|
|
489
|
+
Restart your MCP client completely.
|
|
490
|
+
|
|
491
|
+
---
|
|
492
|
+
|
|
493
|
+
### "File not found"
|
|
494
|
+
|
|
495
|
+
**Causes:**
|
|
496
|
+
- File doesn't exist at path
|
|
497
|
+
- Wrong working directory
|
|
498
|
+
- Permission issues
|
|
499
|
+
|
|
500
|
+
**Solutions:**
|
|
501
|
+
|
|
502
|
+
Use absolute path:
|
|
503
|
+
```json
|
|
504
|
+
{ "path": "C:\\Full\\Path\\file.pdf" }
|
|
278
505
|
```
|
|
279
506
|
|
|
280
|
-
|
|
507
|
+
Or configure `cwd`:
|
|
508
|
+
```json
|
|
509
|
+
{
|
|
510
|
+
"pdf-reader-mcp": {
|
|
511
|
+
"command": "npx",
|
|
512
|
+
"args": ["@sylphx/pdf-reader-mcp"],
|
|
513
|
+
"cwd": "/path/to/docs"
|
|
514
|
+
}
|
|
515
|
+
}
|
|
516
|
+
```
|
|
281
517
|
|
|
282
|
-
|
|
518
|
+
---
|
|
283
519
|
|
|
284
|
-
|
|
520
|
+
### "No tools showing up"
|
|
285
521
|
|
|
286
|
-
|
|
287
|
-
| :------------------------------- | :-------- | :--------- |
|
|
288
|
-
| Handle Non-Existent File | ~12,933 | Fastest |
|
|
289
|
-
| Get Full Text | ~5,575 | |
|
|
290
|
-
| Get Specific Page | ~5,329 | |
|
|
291
|
-
| Get Multiple Pages | ~5,242 | |
|
|
292
|
-
| Get Metadata & Page Count | ~4,912 | Slowest |
|
|
522
|
+
**Solution:**
|
|
293
523
|
|
|
294
|
-
|
|
524
|
+
```bash
|
|
525
|
+
npm cache clean --force
|
|
526
|
+
rm -rf node_modules package-lock.json
|
|
527
|
+
npm install @sylphx/pdf-reader-mcp@latest
|
|
528
|
+
```
|
|
529
|
+
|
|
530
|
+
Restart MCP client completely.
|
|
295
531
|
|
|
296
|
-
|
|
532
|
+
---
|
|
297
533
|
|
|
298
534
|
## 🏗️ Architecture
|
|
299
535
|
|
|
300
536
|
### Tech Stack
|
|
301
537
|
|
|
302
|
-
|
|
303
|
-
|
|
304
|
-
|
|
305
|
-
|
|
306
|
-
|
|
307
|
-
|
|
308
|
-
|
|
309
|
-
|
|
538
|
+
| Component | Technology |
|
|
539
|
+
|:----------|:-----------|
|
|
540
|
+
| **Runtime** | Node.js 22+ ESM |
|
|
541
|
+
| **PDF Engine** | PDF.js (Mozilla) |
|
|
542
|
+
| **Validation** | Zod + JSON Schema |
|
|
543
|
+
| **Protocol** | MCP SDK |
|
|
544
|
+
| **Language** | TypeScript (strict) |
|
|
545
|
+
| **Testing** | Vitest (103 tests) |
|
|
546
|
+
| **Quality** | Biome (50x faster) |
|
|
547
|
+
| **CI/CD** | GitHub Actions |
|
|
310
548
|
|
|
311
549
|
### Design Principles
|
|
312
550
|
|
|
313
|
-
|
|
314
|
-
|
|
315
|
-
|
|
316
|
-
|
|
317
|
-
|
|
551
|
+
- **Security First** - Flexible paths with secure defaults
|
|
552
|
+
- 🎯 **Simple Interface** - One tool, all operations
|
|
553
|
+
- ⚡ **Performance** - Parallel processing, efficient memory
|
|
554
|
+
- 🛡️ **Reliability** - Per-page isolation, detailed errors
|
|
555
|
+
- 🧪 **Quality** - 94%+ coverage, strict TypeScript
|
|
556
|
+
- **Type Safety** - No `any` types, strict mode
|
|
557
|
+
- **Backward Compatible** - Smooth upgrades always
|
|
318
558
|
|
|
319
|
-
|
|
559
|
+
---
|
|
320
560
|
|
|
321
561
|
## 🧪 Development
|
|
322
562
|
|
|
323
|
-
|
|
563
|
+
<details>
|
|
564
|
+
<summary><strong>Setup & Scripts</strong></summary>
|
|
565
|
+
|
|
566
|
+
<br/>
|
|
324
567
|
|
|
568
|
+
**Prerequisites:**
|
|
325
569
|
- Node.js >= 22.0.0
|
|
326
570
|
- pnpm (recommended) or npm
|
|
327
571
|
|
|
328
|
-
|
|
329
|
-
|
|
572
|
+
**Setup:**
|
|
330
573
|
```bash
|
|
331
|
-
git clone https://github.com/
|
|
574
|
+
git clone https://github.com/SylphxAI/pdf-reader-mcp.git
|
|
332
575
|
cd pdf-reader-mcp
|
|
333
|
-
pnpm install
|
|
576
|
+
pnpm install && pnpm build
|
|
334
577
|
```
|
|
335
578
|
|
|
336
|
-
|
|
337
|
-
|
|
579
|
+
**Scripts:**
|
|
338
580
|
```bash
|
|
339
|
-
pnpm run build
|
|
340
|
-
pnpm run
|
|
341
|
-
pnpm run test
|
|
342
|
-
pnpm run
|
|
343
|
-
pnpm run
|
|
344
|
-
pnpm run
|
|
345
|
-
pnpm run check:fix # Fix Biome issues automatically
|
|
346
|
-
pnpm run lint # Lint with Biome
|
|
347
|
-
pnpm run format # Format with Biome
|
|
348
|
-
pnpm run typecheck # TypeScript type checking
|
|
349
|
-
pnpm run benchmark # Run performance benchmarks
|
|
350
|
-
pnpm run validate # Full validation (check + test)
|
|
581
|
+
pnpm run build # Build TypeScript
|
|
582
|
+
pnpm run test # Run 103 tests
|
|
583
|
+
pnpm run test:cov # Coverage (94%+)
|
|
584
|
+
pnpm run check # Lint + format
|
|
585
|
+
pnpm run check:fix # Auto-fix
|
|
586
|
+
pnpm run benchmark # Performance tests
|
|
351
587
|
```
|
|
352
588
|
|
|
353
|
-
|
|
354
|
-
|
|
355
|
-
|
|
589
|
+
**Quality:**
|
|
590
|
+
- ✅ 103 tests
|
|
591
|
+
- ✅ 94%+ coverage
|
|
592
|
+
- ✅ 98%+ function coverage
|
|
593
|
+
- ✅ Zero lint errors
|
|
594
|
+
- ✅ Strict TypeScript
|
|
356
595
|
|
|
357
|
-
|
|
358
|
-
pnpm run test # Run all tests
|
|
359
|
-
pnpm run test:cov # Run with coverage report
|
|
360
|
-
```
|
|
596
|
+
</details>
|
|
361
597
|
|
|
362
|
-
|
|
598
|
+
<details>
|
|
599
|
+
<summary><strong>Contributing</strong></summary>
|
|
363
600
|
|
|
364
|
-
|
|
601
|
+
<br/>
|
|
365
602
|
|
|
366
|
-
|
|
603
|
+
**Quick Start:**
|
|
604
|
+
1. Fork repository
|
|
605
|
+
2. Create branch: `git checkout -b feature/awesome`
|
|
606
|
+
3. Make changes: `pnpm test`
|
|
607
|
+
4. Format: `pnpm run check:fix`
|
|
608
|
+
5. Commit: Use [Conventional Commits](https://www.conventionalcommits.org/)
|
|
609
|
+
6. Open PR
|
|
367
610
|
|
|
368
|
-
|
|
369
|
-
|
|
370
|
-
|
|
611
|
+
**Commit Format:**
|
|
612
|
+
```
|
|
613
|
+
feat(images): add WebP support
|
|
614
|
+
fix(paths): handle UNC paths
|
|
615
|
+
docs(readme): update examples
|
|
371
616
|
```
|
|
372
617
|
|
|
373
|
-
|
|
374
|
-
|
|
375
|
-
We welcome contributions! Please:
|
|
618
|
+
See [CONTRIBUTING.md](./CONTRIBUTING.md)
|
|
376
619
|
|
|
377
|
-
|
|
378
|
-
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
|
|
379
|
-
3. Make your changes and ensure tests pass
|
|
380
|
-
4. Run `pnpm run check:fix` to format code
|
|
381
|
-
5. Commit using [Conventional Commits](https://www.conventionalcommits.org/)
|
|
382
|
-
6. Open a Pull Request
|
|
620
|
+
</details>
|
|
383
621
|
|
|
384
|
-
|
|
622
|
+
---
|
|
385
623
|
|
|
386
624
|
## Documentation
|
|
387
625
|
|
|
388
|
-
-
|
|
389
|
-
-
|
|
390
|
-
-
|
|
391
|
-
-
|
|
392
|
-
-
|
|
393
|
-
-
|
|
626
|
+
- [Full Docs](https://SylphxAI.github.io/pdf-reader-mcp/) - Complete guides
|
|
627
|
+
- [Getting Started](./docs/guide/getting-started.md) - Quick start
|
|
628
|
+
- [API Reference](./docs/api/README.md) - Detailed API
|
|
629
|
+
- 🏗️ [Design](./docs/design/index.md) - Architecture
|
|
630
|
+
- ⚡ [Performance](./docs/performance/index.md) - Benchmarks
|
|
631
|
+
- [Comparison](./docs/comparison/index.md) - vs. alternatives
|
|
632
|
+
|
|
633
|
+
---
|
|
394
634
|
|
|
395
635
|
## 🗺️ Roadmap
|
|
396
636
|
|
|
397
|
-
|
|
398
|
-
- [x]
|
|
399
|
-
- [
|
|
400
|
-
- [
|
|
401
|
-
- [
|
|
402
|
-
- [
|
|
403
|
-
|
|
637
|
+
**✅ Completed**
|
|
638
|
+
- [x] Image extraction (v1.1.0)
|
|
639
|
+
- [x] 5-10x parallel speedup (v1.1.0)
|
|
640
|
+
- [x] Y-coordinate ordering (v1.2.0)
|
|
641
|
+
- [x] Absolute paths (v1.3.0)
|
|
642
|
+
- [x] 94%+ test coverage (v1.3.0)
|
|
643
|
+
|
|
644
|
+
**Next**
|
|
645
|
+
- [ ] OCR for scanned PDFs
|
|
646
|
+
- [ ] Annotation extraction
|
|
647
|
+
- [ ] Form field extraction
|
|
648
|
+
- [ ] Table detection
|
|
649
|
+
- [ ] 100+ MB streaming
|
|
650
|
+
- [ ] Advanced caching
|
|
651
|
+
- [ ] PDF generation
|
|
652
|
+
|
|
653
|
+
Vote at [Discussions](https://github.com/SylphxAI/pdf-reader-mcp/discussions)
|
|
654
|
+
|
|
655
|
+
---
|
|
656
|
+
|
|
657
|
+
## Recognition
|
|
658
|
+
|
|
659
|
+
**Featured on:**
|
|
660
|
+
- [Smithery](https://smithery.ai/server/@sylphx/pdf-reader-mcp) - MCP directory
|
|
661
|
+
- [Glama](https://glama.ai/mcp/servers/@sylphx/pdf-reader-mcp) - AI marketplace
|
|
662
|
+
- [MseeP.ai](https://mseep.ai/app/SylphxAI-pdf-reader-mcp) - Security validated
|
|
663
|
+
|
|
664
|
+
**Trusted worldwide** • **Enterprise adoption** • **Battle-tested**
|
|
665
|
+
|
|
666
|
+
---
|
|
404
667
|
|
|
405
|
-
## 🤝 Support
|
|
668
|
+
## 🤝 Support
|
|
406
669
|
|
|
407
|
-
|
|
408
|
-
|
|
409
|
-
- **Contributing**: [CONTRIBUTING.md](./CONTRIBUTING.md)
|
|
670
|
+
[](https://github.com/SylphxAI/pdf-reader-mcp/issues)
|
|
671
|
+
[](https://discord.gg/sylphx)
|
|
410
672
|
|
|
411
|
-
|
|
673
|
+
- [Bug Reports](https://github.com/SylphxAI/pdf-reader-mcp/issues)
|
|
674
|
+
- 💬 [Discussions](https://github.com/SylphxAI/pdf-reader-mcp/discussions)
|
|
675
|
+
- [Documentation](https://SylphxAI.github.io/pdf-reader-mcp/)
|
|
676
|
+
- 📧 [Email](mailto:hi@sylphx.com)
|
|
412
677
|
|
|
413
|
-
|
|
414
|
-
|
|
415
|
-
|
|
416
|
-
|
|
417
|
-
|
|
678
|
+
**Show Your Support:**
|
|
679
|
+
⭐ Star • Watch • Report bugs • 💡 Suggest features • Contribute
|
|
680
|
+
|
|
681
|
+
---
|
|
682
|
+
|
|
683
|
+
## Stats
|
|
684
|
+
|
|
685
|
+

|
|
686
|
+

|
|
687
|
+

|
|
688
|
+

|
|
689
|
+
|
|
690
|
+
**103 Tests** • **94%+ Coverage** • **Production Ready**
|
|
691
|
+
|
|
692
|
+
---
|
|
418
693
|
|
|
419
694
|
## License
|
|
420
695
|
|
|
421
|
-
|
|
696
|
+
MIT © [Sylphx](https://sylphx.com)
|
|
697
|
+
|
|
698
|
+
---
|
|
699
|
+
|
|
700
|
+
## Credits
|
|
701
|
+
|
|
702
|
+
Built with:
|
|
703
|
+
- [PDF.js](https://mozilla.github.io/pdf.js/) - Mozilla PDF engine
|
|
704
|
+
- [MCP SDK](https://modelcontextprotocol.io) - Model Context Protocol
|
|
705
|
+
- [Vitest](https://vitest.dev) - Fast testing framework
|
|
706
|
+
|
|
707
|
+
Special thanks to the open source community ❤️
|
|
422
708
|
|
|
423
709
|
---
|
|
424
710
|
|
|
425
|
-
|
|
711
|
+
<p align="center">
|
|
712
|
+
<strong>5-10x faster. Production-ready. Battle-tested.</strong>
|
|
713
|
+
<br>
|
|
714
|
+
<sub>The PDF processing server that actually scales</sub>
|
|
715
|
+
<br><br>
|
|
716
|
+
<a href="https://sylphx.com">sylphx.com</a> •
|
|
717
|
+
<a href="https://x.com/SylphxAI">@SylphxAI</a> •
|
|
718
|
+
<a href="mailto:hi@sylphx.com">hi@sylphx.com</a>
|
|
719
|
+
</p>
|