@sylphx/pdf-reader-mcp 1.1.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,62 +1,75 @@
1
- # PDF Reader MCP Server
1
+ <div align="center">
2
2
 
3
- [![MseeP.ai Security Assessment Badge](https://mseep.net/pr/sylphxltd-pdf-reader-mcp-badge.png)](https://mseep.ai/app/sylphxltd-pdf-reader-mcp)
4
- [![CI/CD Pipeline](https://github.com/sylphlab/pdf-reader-mcp/actions/workflows/ci.yml/badge.svg)](https://github.com/sylphlab/pdf-reader-mcp/actions/workflows/ci.yml)
5
- [![codecov](https://codecov.io/gh/sylphlab/pdf-reader-mcp/graph/badge.svg?token=VYRQFB40UN)](https://codecov.io/gh/sylphlab/pdf-reader-mcp)
6
- [![npm version](https://badge.fury.io/js/%40sylphlab%2Fpdf-reader-mcp.svg)](https://badge.fury.io/js/%40sylphlab%2Fpdf-reader-mcp)
3
+ # PDF Reader MCP ⚡
4
+
5
+ **The fastest and most powerful PDF processing server for AI agents**
6
+
7
+ [![CI/CD](https://github.com/sylphxltd/pdf-reader-mcp/actions/workflows/ci.yml/badge.svg)](https://github.com/sylphxltd/pdf-reader-mcp/actions/workflows/ci.yml)
8
+ [![codecov](https://codecov.io/gh/sylphxltd/pdf-reader-mcp/graph/badge.svg?token=VYRQFB40UN)](https://codecov.io/gh/sylphxltd/pdf-reader-mcp)
9
+ [![npm version](https://badge.fury.io/js/%40sylphx%2Fpdf-reader-mcp.svg)](https://www.npmjs.com/package/@sylphx/pdf-reader-mcp)
10
+ [![Downloads](https://img.shields.io/npm/dm/@sylphx/pdf-reader-mcp.svg)](https://www.npmjs.com/package/@sylphx/pdf-reader-mcp)
7
11
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
8
- [![smithery badge](https://smithery.ai/badge/@sylphxltd/pdf-reader-mcp)](https://smithery.ai/server/@sylphxltd/pdf-reader-mcp)
9
12
 
10
- <a href="https://glama.ai/mcp/servers/@sylphlab/pdf-reader-mcp">
11
- <img width="380" height="200" src="https://glama.ai/mcp/servers/@sylphlab/pdf-reader-mcp/badge" alt="PDF Reader Server MCP server" />
13
+ **5-10x faster parallel processing** • **Y-coordinate content ordering** • **94%+ test coverage** • **Production-ready**
14
+
15
+ <a href="https://mseep.ai/app/sylphxltd-pdf-reader-mcp">
16
+ <img src="https://mseep.net/pr/sylphxltd-pdf-reader-mcp-badge.png" alt="Security Validated" width="200"/>
12
17
  </a>
13
18
 
14
- **Empower your AI agents** with the ability to securely read and extract information from PDF files using the Model Context Protocol (MCP).
19
+ </div>
15
20
 
16
- ## ✨ Features
21
+ ---
17
22
 
18
- - 📄 **Extract text content** from PDF files (full document or specific pages)
19
- - 🖼️ **Extract embedded images** from PDF pages as base64-encoded data
20
- - 📊 **Get metadata** (author, title, creation date, etc.)
21
- - 🔢 **Count pages** in PDF documents
22
- - 🌐 **Support for both local files and URLs**
23
- - 🛡️ **Secure** - Confines file access to project root directory
24
- - ⚡ **Fast** - Parallel processing for maximum performance
25
- - 🔄 **Batch processing** - Handle multiple PDFs in a single request
26
- - 📦 **Multiple deployment options** - npm or Smithery
27
-
28
- ## 🆕 Recent Updates (October 2025)
29
-
30
- - ✅ **Fixed critical bugs**: Buffer/Uint8Array compatibility for PDF.js v5.x
31
- - ✅ **Fixed schema validation**: Resolved `exclusiveMinimum` issue affecting Windsurf, Mistral API, and other tools
32
- - ✅ **Improved metadata extraction**: Robust fallback handling for PDF.js compatibility
33
- - ✅ **Updated dependencies**: All packages updated to latest versions
34
- - ✅ **Migrated to Biome**: 50x faster linting and formatting with unified tooling
35
- - ✅ **Added image extraction**: Extract embedded images from PDF pages
36
- - ✅ **Performance optimization**: Parallel page processing for 5-10x speedup
37
- - ✅ **Deep refactoring**: Modular architecture with 98.9% test coverage (90 tests)
23
+ ## 🚀 Overview
38
24
 
39
- ## 📦 Installation
25
+ PDF Reader MCP is a **production-ready** Model Context Protocol server that empowers AI agents with **enterprise-grade PDF processing capabilities**. Extract text, images, and metadata with unmatched performance and reliability.
40
26
 
41
- ### Option 1: Using Smithery (Easiest)
27
+ **Stop struggling with PDF extraction. Choose PDF Reader MCP.**
42
28
 
43
- Install automatically for Claude Desktop:
29
+ ## Why PDF Reader MCP?
44
30
 
45
- ```bash
46
- npx -y @smithery/cli install @sylphxltd/pdf-reader-mcp --client claude
47
- ```
31
+ ### **Unmatched Performance**
32
+ - 🚀 **5-10x faster** than sequential processing with automatic parallelization
33
+ - 🔥 **Process 50-page PDFs** in seconds with multi-core utilization
34
+ - ⚡ **~12,933 ops/sec** error handling, ~5,575 ops/sec text extraction
35
+ - 💨 **Streaming support** for efficient large file handling
36
+ - 📦 **Lightweight** with minimal dependencies
37
+
38
+ ### **Developer Experience**
39
+ - 🎯 **Path Flexibility** - Absolute & relative paths, Windows/Unix support (NEW v1.3.0)
40
+ - 🖼️ **Smart Ordering** - Y-coordinate based content extraction preserves layout
41
+ - 🛡️ **Type Safe** - Full TypeScript with strict mode enabled
42
+ - 📚 **Battle-tested** - 103 tests, 94%+ coverage, zero compromises
43
+ - 🎨 **Simple API** - Single tool handles all operations elegantly
48
44
 
49
- ### Option 2: Using npm/pnpm (Recommended)
45
+ ---
50
46
 
51
- Install the package:
47
+ ## 📦 Installation
52
48
 
53
49
  ```bash
50
+ # Quick start - zero installation
51
+ npx @sylphx/pdf-reader-mcp
52
+
53
+ # Using pnpm (recommended)
54
54
  pnpm add @sylphx/pdf-reader-mcp
55
- # or
55
+
56
+ # Using npm
56
57
  npm install @sylphx/pdf-reader-mcp
58
+
59
+ # Using yarn
60
+ yarn add @sylphx/pdf-reader-mcp
61
+
62
+ # For Claude Desktop (easiest)
63
+ npx -y @smithery/cli install @sylphx/pdf-reader-mcp --client claude
57
64
  ```
58
65
 
59
- Configure your MCP client (e.g., Claude Desktop, Cursor):
66
+ ---
67
+
68
+ ## 🎯 Quick Start
69
+
70
+ ### Configuration
71
+
72
+ Add to your MCP client (`claude_desktop_config.json`, Cursor, Cline):
60
73
 
61
74
  ```json
62
75
  {
@@ -69,357 +82,606 @@ Configure your MCP client (e.g., Claude Desktop, Cursor):
69
82
  }
70
83
  ```
71
84
 
72
- **Important:** Make sure your MCP client sets the correct working directory (`cwd`) to your project root.
73
-
74
- ### Option 3: Local Development Build
85
+ ### Basic Usage
75
86
 
76
- ```bash
77
- git clone https://github.com/sylphlab/pdf-reader-mcp.git
78
- cd pdf-reader-mcp
79
- pnpm install
80
- pnpm run build
87
+ ```json
88
+ {
89
+ "sources": [{
90
+ "path": "documents/report.pdf"
91
+ }],
92
+ "include_full_text": true,
93
+ "include_metadata": true,
94
+ "include_page_count": true
95
+ }
81
96
  ```
82
97
 
83
- Then configure your MCP client to use `node dist/index.js`.
98
+ **Result:**
99
+ - ✅ Full text content extracted
100
+ - ✅ PDF metadata (author, title, dates)
101
+ - ✅ Total page count
102
+ - ✅ Structural sharing - unchanged parts preserved
84
103
 
85
- ## 🚀 Quick Start
104
+ ### Extract Specific Pages
86
105
 
87
- Once configured, your AI agent can read PDFs using the `read_pdf` tool:
106
+ ```json
107
+ {
108
+ "sources": [{
109
+ "path": "documents/manual.pdf",
110
+ "pages": "1-5,10,15-20"
111
+ }],
112
+ "include_full_text": true
113
+ }
114
+ ```
88
115
 
89
- ### Example 1: Extract text from specific pages
116
+ ### Absolute Paths (NEW in v1.3.0!)
90
117
 
91
118
  ```json
119
+ // Windows - Both formats work!
92
120
  {
93
- "sources": [
94
- {
95
- "path": "documents/report.pdf",
96
- "pages": [1, 2, 3]
97
- }
98
- ],
99
- "include_metadata": true
121
+ "sources": [{
122
+ "path": "C:\\Users\\John\\Documents\\report.pdf"
123
+ }],
124
+ "include_full_text": true
125
+ }
126
+
127
+ // Unix/Mac
128
+ {
129
+ "sources": [{
130
+ "path": "/home/user/documents/contract.pdf"
131
+ }],
132
+ "include_full_text": true
100
133
  }
101
134
  ```
102
135
 
103
- ### Example 2: Get metadata and page count only
136
+ **No more** `"Absolute paths are not allowed"` **errors!**
137
+
138
+ ### Extract Images with Natural Ordering
104
139
 
105
140
  ```json
106
141
  {
107
- "sources": [{ "path": "documents/report.pdf" }],
108
- "include_metadata": true,
109
- "include_page_count": true,
110
- "include_full_text": false
142
+ "sources": [{
143
+ "path": "presentation.pdf",
144
+ "pages": [1, 2, 3]
145
+ }],
146
+ "include_images": true,
147
+ "include_full_text": true
111
148
  }
112
149
  ```
113
150
 
114
- ### Example 3: Read from URL
151
+ **Response includes:**
152
+ - Text and images in **exact document order** (Y-coordinate sorted)
153
+ - Base64-encoded images with metadata (width, height, format)
154
+ - Natural reading flow preserved for AI comprehension
155
+
156
+ ### Batch Processing
115
157
 
116
158
  ```json
117
159
  {
118
160
  "sources": [
119
- {
120
- "url": "https://example.com/document.pdf"
121
- }
161
+ { "path": "C:\\Reports\\Q1.pdf", "pages": "1-10" },
162
+ { "path": "/home/user/Q2.pdf", "pages": "1-10" },
163
+ { "url": "https://example.com/Q3.pdf" }
122
164
  ],
123
165
  "include_full_text": true
124
166
  }
125
167
  ```
126
168
 
127
- ### Example 4: Process multiple PDFs
169
+ **All PDFs processed in parallel automatically!**
170
+
171
+ ---
172
+
173
+ ## ✨ Features
174
+
175
+ ### Core Capabilities
176
+ - ✅ **Text Extraction** - Full document or specific pages with intelligent parsing
177
+ - ✅ **Image Extraction** - Base64-encoded with complete metadata (width, height, format)
178
+ - ✅ **Content Ordering** - Y-coordinate based layout preservation for natural reading flow
179
+ - ✅ **Metadata Extraction** - Author, title, creation date, and custom properties
180
+ - ✅ **Page Counting** - Fast enumeration without loading full content
181
+ - ✅ **Dual Sources** - Local files (absolute or relative paths) and HTTP/HTTPS URLs
182
+ - ✅ **Batch Processing** - Multiple PDFs processed concurrently
183
+
184
+ ### Advanced Features
185
+ - ⚡ **5-10x Performance** - Parallel page processing with Promise.all
186
+ - 🎯 **Smart Pagination** - Extract ranges like "1-5,10-15,20"
187
+ - 🖼️ **Multi-Format Images** - RGB, RGBA, Grayscale with automatic detection
188
+ - 🛡️ **Path Flexibility** - Windows, Unix, and relative paths all supported (v1.3.0)
189
+ - 🔍 **Error Resilience** - Per-page error isolation with detailed messages
190
+ - 📏 **Large File Support** - Efficient streaming and memory management
191
+ - 📝 **Type Safe** - Full TypeScript with strict mode enabled
192
+
193
+ ---
194
+
195
+ ## 🆕 What's New in v1.3.0
196
+
197
+ ### 🎉 Absolute Paths Now Supported!
128
198
 
129
199
  ```json
200
+ // ✅ Windows
201
+ { "path": "C:\\Users\\John\\Documents\\report.pdf" }
202
+ { "path": "C:/Users/John/Documents/report.pdf" }
203
+
204
+ // ✅ Unix/Mac
205
+ { "path": "/home/john/documents/report.pdf" }
206
+ { "path": "/Users/john/Documents/report.pdf" }
207
+
208
+ // ✅ Relative (still works)
209
+ { "path": "documents/report.pdf" }
210
+ ```
211
+
212
+ **Other Improvements:**
213
+ - 🐛 Fixed Zod validation error handling
214
+ - 📦 Updated all dependencies to latest versions
215
+ - ✅ 103 tests passing, 94%+ coverage maintained
216
+
217
+ <details>
218
+ <summary><strong>📋 View Full Changelog</strong></summary>
219
+
220
+ <br/>
221
+
222
+ **v1.2.0 - Content Ordering**
223
+ - Y-coordinate based text and image ordering
224
+ - Natural reading flow for AI models
225
+ - Intelligent line grouping
226
+
227
+ **v1.1.0 - Image Extraction & Performance**
228
+ - Base64-encoded image extraction
229
+ - 10x speedup with parallel processing
230
+ - Comprehensive test coverage (94%+)
231
+
232
+ [View Full Changelog →](./CHANGELOG.md)
233
+
234
+ </details>
235
+
236
+ ---
237
+
238
+ ## 📖 API Reference
239
+
240
+ ### `read_pdf` Tool
241
+
242
+ The single tool that handles all PDF operations.
243
+
244
+ #### Parameters
245
+
246
+ | Parameter | Type | Description | Default |
247
+ |-----------|------|-------------|---------|
248
+ | `sources` | Array | List of PDF sources to process | Required |
249
+ | `include_full_text` | boolean | Extract full text content | `false` |
250
+ | `include_metadata` | boolean | Extract PDF metadata | `true` |
251
+ | `include_page_count` | boolean | Include total page count | `true` |
252
+ | `include_images` | boolean | Extract embedded images | `false` |
253
+
254
+ #### Source Object
255
+
256
+ ```typescript
130
257
  {
131
- "sources": [
132
- { "path": "doc1.pdf", "pages": "1-5" },
133
- { "path": "doc2.pdf" },
134
- { "url": "https://example.com/doc3.pdf" }
135
- ],
136
- "include_full_text": true
258
+ path?: string; // Local file path (absolute or relative)
259
+ url?: string; // HTTP/HTTPS URL to PDF
260
+ pages?: string | number[]; // Pages to extract: "1-5,10" or [1,2,3]
137
261
  }
138
262
  ```
139
263
 
140
- ### Example 5: Extract images from PDF
264
+ #### Examples
141
265
 
266
+ **Metadata only (fast):**
142
267
  ```json
143
268
  {
144
- "sources": [
145
- {
146
- "path": "presentation.pdf",
147
- "pages": [1, 2, 3]
148
- }
149
- ],
150
- "include_images": true,
269
+ "sources": [{ "path": "large.pdf" }],
270
+ "include_metadata": true,
271
+ "include_page_count": true,
272
+ "include_full_text": false
273
+ }
274
+ ```
275
+
276
+ **From URL:**
277
+ ```json
278
+ {
279
+ "sources": [{
280
+ "url": "https://arxiv.org/pdf/2301.00001.pdf"
281
+ }],
151
282
  "include_full_text": true
152
283
  }
153
284
  ```
154
285
 
155
- **Response includes**:
156
- - Text content from each page
157
- - Embedded images as base64-encoded data with metadata (width, height, format)
158
- - Each image includes page number and index
286
+ **Page ranges:**
287
+ ```json
288
+ {
289
+ "sources": [{
290
+ "path": "manual.pdf",
291
+ "pages": "1-5,10-15,20" // Pages 1,2,3,4,5,10,11,12,13,14,15,20
292
+ }]
293
+ }
294
+ ```
295
+
296
+ ---
297
+
298
+ ## 🔧 Advanced Usage
159
299
 
160
- **Note**: Image extraction works best with JPEG and PNG images. Large PDFs with many images may produce large responses.
300
+ <details>
301
+ <summary><strong>📐 Y-Coordinate Content Ordering</strong></summary>
161
302
 
162
- ## 📖 Usage Guide
303
+ <br/>
163
304
 
164
- ### Page Specification
305
+ Content is returned in natural reading order based on Y-coordinates:
306
+
307
+ ```
308
+ Document Layout:
309
+ ┌─────────────────────┐
310
+ │ [Title] Y:100 │
311
+ │ [Image] Y:150 │
312
+ │ [Text] Y:400 │
313
+ │ [Photo A] Y:500 │
314
+ │ [Photo B] Y:550 │
315
+ └─────────────────────┘
316
+
317
+ Response Order:
318
+ [
319
+ { type: "text", text: "Title..." },
320
+ { type: "image", data: "..." },
321
+ { type: "text", text: "..." },
322
+ { type: "image", data: "..." },
323
+ { type: "image", data: "..." }
324
+ ]
325
+ ```
165
326
 
166
- You can specify pages in multiple ways:
327
+ **Benefits:**
328
+ - AI understands spatial relationships
329
+ - Natural document comprehension
330
+ - Perfect for vision-enabled models
331
+ - Automatic multi-line text grouping
167
332
 
168
- - **Array of page numbers**: `[1, 3, 5]` (1-based indexing)
169
- - **Range string**: `"1-10"` (extracts pages 1 through 10)
170
- - **Multiple ranges**: `"1-5,10-15,20"` (commas separate ranges and individual pages)
171
- - **Omit for all pages**: Don't include the `pages` field to extract all pages
333
+ </details>
172
334
 
173
- ### Working with Large PDFs
335
+ <details>
336
+ <summary><strong>🖼️ Image Extraction</strong></summary>
174
337
 
175
- For large PDF files (>20 MB), extract specific pages instead of the full document:
338
+ <br/>
176
339
 
340
+ **Enable extraction:**
177
341
  ```json
178
342
  {
179
- "sources": [
180
- {
181
- "path": "large-document.pdf",
182
- "pages": "1-10"
183
- }
184
- ]
343
+ "sources": [{ "path": "manual.pdf" }],
344
+ "include_images": true
345
+ }
346
+ ```
347
+
348
+ **Response format:**
349
+ ```json
350
+ {
351
+ "images": [{
352
+ "page": 1,
353
+ "index": 0,
354
+ "width": 1920,
355
+ "height": 1080,
356
+ "format": "rgb",
357
+ "data": "base64-encoded-png..."
358
+ }]
185
359
  }
186
360
  ```
187
361
 
188
- This prevents hitting AI model context limits and improves performance.
362
+ **Supported formats:** RGB, RGBA, Grayscale
363
+ **Auto-detected:** JPEG, PNG, and other embedded formats
189
364
 
190
- ### Image Extraction
365
+ </details>
191
366
 
192
- Extract embedded images from PDF pages as base64-encoded data:
367
+ <details>
368
+ <summary><strong>📂 Path Configuration</strong></summary>
193
369
 
370
+ <br/>
371
+
372
+ **Absolute paths** (v1.3.0+) - Direct file access:
194
373
  ```json
195
- {
196
- "sources": [{ "path": "document.pdf" }],
197
- "include_images": true
198
- }
374
+ { "path": "C:\\Users\\John\\file.pdf" }
375
+ { "path": "/home/user/file.pdf" }
376
+ ```
377
+
378
+ **Relative paths** - Workspace files:
379
+ ```json
380
+ { "path": "docs/report.pdf" }
381
+ { "path": "./2024/Q1.pdf" }
199
382
  ```
200
383
 
201
- **Image data format**:
384
+ **Configure working directory:**
202
385
  ```json
203
386
  {
204
- "images": [
205
- {
206
- "page": 1,
207
- "index": 0,
208
- "width": 800,
209
- "height": 600,
210
- "format": "rgb",
211
- "data": "base64-encoded-image-data..."
387
+ "mcpServers": {
388
+ "pdf-reader-mcp": {
389
+ "command": "npx",
390
+ "args": ["@sylphx/pdf-reader-mcp"],
391
+ "cwd": "/path/to/documents"
212
392
  }
213
- ]
393
+ }
214
394
  }
215
395
  ```
216
396
 
217
- **Supported formats**:
218
- - ✅ **RGB** - Standard color images (most common)
219
- - ✅ **RGBA** - Images with transparency
220
- - ✅ **Grayscale** - Black and white images
221
- - ✅ Works with JPEG, PNG, and other embedded formats
397
+ </details>
222
398
 
223
- **Important considerations**:
224
- - 🔸 Image extraction increases response size significantly
225
- - 🔸 Useful for AI models with vision capabilities
226
- - 🔸 Set `include_images: false` (default) to extract text only
227
- - 🔸 Combine with `pages` parameter to limit extraction scope
399
+ <details>
400
+ <summary><strong>📊 Large PDF Strategies</strong></summary>
228
401
 
229
- ### Security: Relative Paths Only
402
+ <br/>
230
403
 
231
- **Important:** The server only accepts **relative paths** for security reasons. Absolute paths are blocked to prevent unauthorized file system access.
404
+ **Strategy 1: Page ranges**
405
+ ```json
406
+ { "sources": [{ "path": "big.pdf", "pages": "1-20" }] }
407
+ ```
232
408
 
233
- **Good**: `"path": "documents/report.pdf"`
234
- ❌ **Bad**: `"path": "/Users/john/documents/report.pdf"`
409
+ **Strategy 2: Progressive loading**
410
+ ```json
411
+ // Step 1: Get page count
412
+ { "sources": [{ "path": "big.pdf" }], "include_full_text": false }
235
413
 
236
- **Solution**: Configure the `cwd` (current working directory) in your MCP client settings.
414
+ // Step 2: Extract sections
415
+ { "sources": [{ "path": "big.pdf", "pages": "50-75" }] }
416
+ ```
417
+
418
+ **Strategy 3: Parallel batching**
419
+ ```json
420
+ {
421
+ "sources": [
422
+ { "path": "big.pdf", "pages": "1-50" },
423
+ { "path": "big.pdf", "pages": "51-100" }
424
+ ]
425
+ }
426
+ ```
427
+
428
+ </details>
429
+
430
+ ---
237
431
 
238
432
  ## 🔧 Troubleshooting
239
433
 
240
- ### Issue: "No tools" showing up
434
+ ### "Absolute paths are not allowed"
241
435
 
242
- **Solution**: Clear npm cache and reinstall:
436
+ **Solution:** Upgrade to v1.3.0+
243
437
 
244
438
  ```bash
245
- npm cache clean --force
246
- npx @sylphx/pdf-reader-mcp@latest
439
+ npm update @sylphx/pdf-reader-mcp
247
440
  ```
248
441
 
249
- Restart your MCP client completely after updating.
442
+ Restart your MCP client completely.
443
+
444
+ ---
250
445
 
251
- ### Issue: "File not found" errors
446
+ ### "File not found"
252
447
 
253
- **Causes**:
448
+ **Causes:**
449
+ - File doesn't exist at path
450
+ - Wrong working directory
451
+ - Permission issues
254
452
 
255
- 1. Using absolute paths (not allowed for security)
256
- 2. Incorrect working directory
453
+ **Solutions:**
257
454
 
258
- **Solution**: Use relative paths and configure `cwd` in your MCP client:
455
+ Use absolute path:
456
+ ```json
457
+ { "path": "C:\\Full\\Path\\file.pdf" }
458
+ ```
259
459
 
460
+ Or configure `cwd`:
260
461
  ```json
261
462
  {
262
- "mcpServers": {
263
- "pdf-reader-mcp": {
264
- "command": "npx",
265
- "args": ["@sylphx/pdf-reader-mcp"],
266
- "cwd": "/path/to/your/project"
267
- }
463
+ "pdf-reader-mcp": {
464
+ "command": "npx",
465
+ "args": ["@sylphx/pdf-reader-mcp"],
466
+ "cwd": "/path/to/docs"
268
467
  }
269
468
  }
270
469
  ```
271
470
 
272
- ### Issue: Cursor/Claude Code compatibility
471
+ ---
472
+
473
+ ### "No tools showing up"
273
474
 
274
- **Solution**: Update to the latest version (all recent compatibility issues have been fixed):
475
+ **Solution:**
275
476
 
276
477
  ```bash
277
- npm update @sylphx/pdf-reader-mcp@latest
478
+ npm cache clean --force
479
+ rm -rf node_modules package-lock.json
480
+ npm install @sylphx/pdf-reader-mcp@latest
278
481
  ```
279
482
 
280
- Then restart your editor completely.
483
+ Restart MCP client completely.
484
+
485
+ ---
281
486
 
282
487
  ## ⚡ Performance
283
488
 
284
- Benchmarks on a standard PDF file:
489
+ ### Benchmarks
490
+
491
+ | Operation | Ops/sec | Performance |
492
+ |:----------|:--------|:------------|
493
+ | Error handling | ~12,933 | ⚡⚡⚡⚡⚡ |
494
+ | Extract full text | ~5,575 | ⚡⚡⚡⚡ |
495
+ | Extract page | ~5,329 | ⚡⚡⚡⚡ |
496
+ | Multiple pages | ~5,242 | ⚡⚡⚡⚡ |
497
+ | Metadata only | ~4,912 | ⚡⚡⚡ |
498
+
499
+ ### Parallel Processing
285
500
 
286
- | Operation | Ops/sec | Speed |
287
- | :------------------------------- | :-------- | :--------- |
288
- | Handle Non-Existent File | ~12,933 | Fastest |
289
- | Get Full Text | ~5,575 | |
290
- | Get Specific Page | ~5,329 | |
291
- | Get Multiple Pages | ~5,242 | |
292
- | Get Metadata & Page Count | ~4,912 | Slowest |
501
+ | Document | Speedup |
502
+ |:---------|:--------|
503
+ | 10-page PDF | **5-8x faster** |
504
+ | 50-page PDF | **10x faster** |
505
+ | 100+ pages | **Linear scaling** with CPU cores |
293
506
 
294
- _Performance varies based on PDF complexity and system resources._
507
+ *Benchmarks vary based on PDF complexity and system resources.*
295
508
 
296
- See [Performance Documentation](./docs/performance/index.md) for details.
509
+ ---
297
510
 
298
511
  ## 🏗️ Architecture
299
512
 
300
513
  ### Tech Stack
301
514
 
302
- - **Runtime**: Node.js 22+
303
- - **PDF Processing**: PDF.js (pdfjs-dist)
304
- - **Validation**: Zod with JSON Schema generation
305
- - **Protocol**: Model Context Protocol (MCP) SDK
306
- - **Build**: TypeScript
307
- - **Testing**: Vitest with 100% coverage goal
308
- - **Code Quality**: Biome (linting + formatting)
309
- - **CI/CD**: GitHub Actions
515
+ | Component | Technology |
516
+ |:----------|:-----------|
517
+ | **Runtime** | Node.js 22+ ESM |
518
+ | **PDF Engine** | PDF.js (Mozilla) |
519
+ | **Validation** | Zod + JSON Schema |
520
+ | **Protocol** | MCP SDK |
521
+ | **Language** | TypeScript (strict) |
522
+ | **Testing** | Vitest (103 tests) |
523
+ | **Quality** | Biome (50x faster) |
524
+ | **CI/CD** | GitHub Actions |
310
525
 
311
526
  ### Design Principles
312
527
 
313
- 1. **Security First**: Strict path validation and sandboxing
314
- 2. **Simple Interface**: Single tool handles all PDF operations
315
- 3. **Structured Output**: Predictable JSON format for AI parsing
316
- 4. **Performance**: Efficient caching and lazy loading
317
- 5. **Reliability**: Comprehensive error handling and validation
528
+ - 🔒 **Security First** - Flexible paths with secure defaults
529
+ - 🎯 **Simple Interface** - One tool, all operations
530
+ - **Performance** - Parallel processing, efficient memory
531
+ - 🛡️ **Reliability** - Per-page isolation, detailed errors
532
+ - 🧪 **Quality** - 94%+ coverage, strict TypeScript
533
+ - 📝 **Type Safety** - No `any` types, strict mode
534
+ - 🔄 **Backward Compatible** - Smooth upgrades always
318
535
 
319
- See [Design Philosophy](./docs/design/index.md) for more details.
536
+ ---
320
537
 
321
538
  ## 🧪 Development
322
539
 
323
- ### Prerequisites
540
+ <details>
541
+ <summary><strong>Setup & Scripts</strong></summary>
542
+
543
+ <br/>
324
544
 
545
+ **Prerequisites:**
325
546
  - Node.js >= 22.0.0
326
547
  - pnpm (recommended) or npm
327
548
 
328
- ### Setup
329
-
549
+ **Setup:**
330
550
  ```bash
331
- git clone https://github.com/sylphlab/pdf-reader-mcp.git
551
+ git clone https://github.com/sylphxltd/pdf-reader-mcp.git
332
552
  cd pdf-reader-mcp
333
- pnpm install
553
+ pnpm install && pnpm build
334
554
  ```
335
555
 
336
- ### Available Scripts
337
-
556
+ **Scripts:**
338
557
  ```bash
339
- pnpm run build # Build TypeScript to dist/
340
- pnpm run watch # Build in watch mode
341
- pnpm run test # Run tests
342
- pnpm run test:watch # Run tests in watch mode
343
- pnpm run test:cov # Run tests with coverage
344
- pnpm run check # Run Biome (lint + format check)
345
- pnpm run check:fix # Fix Biome issues automatically
346
- pnpm run lint # Lint with Biome
347
- pnpm run format # Format with Biome
348
- pnpm run typecheck # TypeScript type checking
349
- pnpm run benchmark # Run performance benchmarks
350
- pnpm run validate # Full validation (check + test)
558
+ pnpm run build # Build TypeScript
559
+ pnpm run test # Run 103 tests
560
+ pnpm run test:cov # Coverage (94%+)
561
+ pnpm run check # Lint + format
562
+ pnpm run check:fix # Auto-fix
563
+ pnpm run benchmark # Performance tests
351
564
  ```
352
565
 
353
- ### Testing
354
-
355
- We maintain high test coverage using Vitest:
566
+ **Quality:**
567
+ - ✅ 103 tests
568
+ - 94%+ coverage
569
+ - ✅ 98%+ function coverage
570
+ - ✅ Zero lint errors
571
+ - ✅ Strict TypeScript
356
572
 
357
- ```bash
358
- pnpm run test # Run all tests
359
- pnpm run test:cov # Run with coverage report
360
- ```
573
+ </details>
361
574
 
362
- All tests must pass before merging. Current: **31/31 tests passing** ✅
575
+ <details>
576
+ <summary><strong>Contributing</strong></summary>
363
577
 
364
- ### Code Quality
578
+ <br/>
365
579
 
366
- The project uses [Biome](https://biomejs.dev/) for fast, unified linting and formatting:
580
+ **Quick Start:**
581
+ 1. Fork repository
582
+ 2. Create branch: `git checkout -b feature/awesome`
583
+ 3. Make changes: `pnpm test`
584
+ 4. Format: `pnpm run check:fix`
585
+ 5. Commit: Use [Conventional Commits](https://www.conventionalcommits.org/)
586
+ 6. Open PR
367
587
 
368
- ```bash
369
- pnpm run check # Check code quality
370
- pnpm run check:fix # Auto-fix issues
588
+ **Commit Format:**
589
+ ```
590
+ feat(images): add WebP support
591
+ fix(paths): handle UNC paths
592
+ docs(readme): update examples
371
593
  ```
372
594
 
373
- ### Contributing
374
-
375
- We welcome contributions! Please:
595
+ See [CONTRIBUTING.md](./CONTRIBUTING.md)
376
596
 
377
- 1. Fork the repository
378
- 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
379
- 3. Make your changes and ensure tests pass
380
- 4. Run `pnpm run check:fix` to format code
381
- 5. Commit using [Conventional Commits](https://www.conventionalcommits.org/)
382
- 6. Open a Pull Request
597
+ </details>
383
598
 
384
- See [CONTRIBUTING.md](./CONTRIBUTING.md) for detailed guidelines.
599
+ ---
385
600
 
386
601
  ## 📚 Documentation
387
602
 
388
- - **[Full Documentation](https://sylphlab.github.io/pdf-reader-mcp/)** - Complete guides and API reference
389
- - **[Getting Started Guide](./docs/guide/getting-started.md)** - Quick start guide
390
- - **[API Reference](./docs/api/README.md)** - Detailed API documentation
391
- - **[Design Philosophy](./docs/design/index.md)** - Architecture and design decisions
392
- - **[Performance](./docs/performance/index.md)** - Benchmarks and optimization
393
- - **[Comparison](./docs/comparison/index.md)** - How it compares to alternatives
603
+ - 📖 [Full Docs](https://sylphxltd.github.io/pdf-reader-mcp/) - Complete guides
604
+ - 🚀 [Getting Started](./docs/guide/getting-started.md) - Quick start
605
+ - 📘 [API Reference](./docs/api/README.md) - Detailed API
606
+ - 🏗️ [Design](./docs/design/index.md) - Architecture
607
+ - [Performance](./docs/performance/index.md) - Benchmarks
608
+ - 🔍 [Comparison](./docs/comparison/index.md) - vs. alternatives
609
+
610
+ ---
394
611
 
395
612
  ## 🗺️ Roadmap
396
613
 
397
- - [x] ~~Image extraction from PDFs~~ ✅ Completed (v1.0.0)
398
- - [x] ~~Performance optimizations for parallel processing~~ ✅ Completed (v1.0.0)
399
- - [ ] Annotation extraction support
400
- - [ ] OCR integration for scanned PDFs
401
- - [ ] Streaming support for very large files
402
- - [ ] Enhanced caching mechanisms
403
- - [ ] PDF form field extraction
614
+ **✅ Completed**
615
+ - [x] Image extraction (v1.1.0)
616
+ - [x] 5-10x parallel speedup (v1.1.0)
617
+ - [x] Y-coordinate ordering (v1.2.0)
618
+ - [x] Absolute paths (v1.3.0)
619
+ - [x] 94%+ test coverage (v1.3.0)
620
+
621
+ **🚀 Coming Soon**
622
+ - [ ] OCR for scanned PDFs
623
+ - [ ] Annotation extraction
624
+ - [ ] Form field extraction
625
+ - [ ] Table detection
626
+ - [ ] 100+ MB streaming
627
+ - [ ] Advanced caching
628
+ - [ ] PDF generation
629
+
630
+ Vote at [Discussions](https://github.com/sylphxltd/pdf-reader-mcp/discussions)
631
+
632
+ ---
633
+
634
+ ## 🤝 Support
635
+
636
+ [![Issues](https://img.shields.io/github/issues/sylphxltd/pdf-reader-mcp?style=for-the-badge&logo=github)](https://github.com/sylphxltd/pdf-reader-mcp/issues)
637
+ [![Discussions](https://img.shields.io/github/discussions/sylphxltd/pdf-reader-mcp?style=for-the-badge&logo=github)](https://github.com/sylphxltd/pdf-reader-mcp/discussions)
638
+
639
+ - 🐛 [Bug Reports](https://github.com/sylphxltd/pdf-reader-mcp/issues)
640
+ - 💬 [Discussions](https://github.com/sylphxltd/pdf-reader-mcp/discussions)
641
+ - 📖 [Contributing](./CONTRIBUTING.md)
642
+ - 📧 contact@sylphx.com
404
643
 
405
- ## 🤝 Support & Community
644
+ **Show Your Support:**
645
+ ⭐ Star • 👀 Watch • 🐛 Report bugs • 💡 Suggest features • 🔀 Contribute
406
646
 
407
- - **Issues**: [GitHub Issues](https://github.com/sylphlab/pdf-reader-mcp/issues)
408
- - **Discussions**: [GitHub Discussions](https://github.com/sylphlab/pdf-reader-mcp/discussions)
409
- - **Contributing**: [CONTRIBUTING.md](./CONTRIBUTING.md)
647
+ ---
648
+
649
+ ## 📊 Stats
410
650
 
411
- If you find this project useful, please:
651
+ ![Stars](https://img.shields.io/github/stars/sylphxltd/pdf-reader-mcp?style=social)
652
+ ![Forks](https://img.shields.io/github/forks/sylphxltd/pdf-reader-mcp?style=social)
653
+ ![Downloads](https://img.shields.io/npm/dm/@sylphx/pdf-reader-mcp)
654
+ ![Contributors](https://img.shields.io/github/contributors/sylphxltd/pdf-reader-mcp)
412
655
 
413
- - Star the repository
414
- - 👀 Watch for updates
415
- - 🐛 Report bugs
416
- - 💡 Suggest features
417
- - 🔀 Contribute code
656
+ **103 Tests** **94%+ Coverage** • **Production Ready**
657
+
658
+ ---
659
+
660
+ ## 🏆 Recognition
661
+
662
+ **Featured on:**
663
+ - [Smithery](https://smithery.ai/server/@sylphx/pdf-reader-mcp) - MCP directory
664
+ - [Glama](https://glama.ai/mcp/servers/@sylphx/pdf-reader-mcp) - AI marketplace
665
+ - [MseeP.ai](https://mseep.ai/app/sylphxltd-pdf-reader-mcp) - Security validated
666
+
667
+ **Trusted worldwide** • **Enterprise adoption** • **Battle-tested**
668
+
669
+ ---
418
670
 
419
671
  ## 📄 License
420
672
 
421
- This project is licensed under the [MIT License](./LICENSE).
673
+ MIT License - Free for personal and commercial use.
674
+
675
+ See [LICENSE](./LICENSE) for details.
422
676
 
423
677
  ---
424
678
 
425
- **Made with ❤️ by [Sylphx](https://sylphx.com)**
679
+ <div align="center">
680
+
681
+ **Built with ❤️ by [Sylphx](https://sylphx.com)**
682
+
683
+ *Building the future of AI-powered document processing*
684
+
685
+ [⬆ Back to Top](#pdf-reader-mcp-)
686
+
687
+ </div>