@shuji-bonji/pdf-spec-mcp 0.2.1 → 0.2.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.ja.md +299 -42
- package/README.md +296 -42
- package/package.json +1 -1
package/README.ja.md
CHANGED
````diff
@@ -7,6 +7,14 @@
 
 An MCP (Model Context Protocol) server that provides structured access to the ISO 32000 (PDF) specification documents. It provides tools that let LLMs navigate, search, and analyze PDF specifications.
 
+> [!IMPORTANT]
+> **PDF specification files are not bundled.**
+> To use this server, you must obtain the PDF specifications separately and place them in a local directory.
+>
+> **Where to get them:** [PDF Association — Sponsored Standards](https://pdfa.org/sponsored-standards/)
+>
+> See "[Setup](#setup)" for details.
+
 ## Features
 
 - **Multi-spec support** — Auto-detects up to 17 PDF-related documents (ISO 32000-2, PDF/UA, Tagged PDF guides, etc.)
````
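The Features list says the server auto-detects up to 17 documents. The README's setup section also pairs each example file name with a spec ID. The TypeScript sketch below shows one way such detection could work: it maps file names to spec IDs with regular expressions. `discoverSpecs`, `SPEC_PATTERNS`, and the patterns themselves are hypothetical. They were derived only from a subset of the example file names, and the package's real matching rules may differ.

```typescript
// Hypothetical sketch: map PDF file names found in PDF_SPEC_DIR to spec IDs.
// The patterns are illustrative guesses based on the example file names in the
// README's directory tree; they are not the package's actual matching rules.
interface SpecPattern {
  id: string;
  pattern: RegExp;
}

// More specific patterns come first, so the 2020 edition is not claimed
// by the broader ISO 32000-2 rule.
const SPEC_PATTERNS: SpecPattern[] = [
  { id: "iso32000-2-2020", pattern: /^ISO_32000-2-2020.*\.pdf$/i },
  { id: "iso32000-2", pattern: /^ISO_32000-2.*\.pdf$/i },
  { id: "pdf17", pattern: /^PDF32000_2008\.pdf$/i },
  { id: "pdf17old", pattern: /^pdfreference1\.7old\.pdf$/i },
  { id: "ts32002", pattern: /^ISO[_-]TS[_-]32002.*\.pdf$/i },
  { id: "pdfua2", pattern: /^ISO-14289-2.*\.pdf$/i },
];

// Returns a map of spec ID -> file name. Files that match no pattern are ignored,
// and the first file found for a given ID wins.
function discoverSpecs(fileNames: readonly string[]): Map<string, string> {
  const found = new Map<string, string>();
  for (const name of fileNames) {
    const rule = SPEC_PATTERNS.find((p) => p.pattern.test(name));
    if (rule && !found.has(rule.id)) {
      found.set(rule.id, name);
    }
  }
  return found;
}
```

In practice, the input list would come from something like `readdirSync(dir)` from `node:fs`. The pattern order is the key design choice. Checking the narrower `2020` pattern before the general one keeps the two ISO 32000-2 editions apart.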
````diff
@@ -18,57 +26,161 @@
 - **Version comparison** — Section-structure diff between PDF 1.7 and PDF 2.0
 - **Concurrent processing** — Chunked concurrent page processing for large documents
 
-##
+## Architecture
 
-
-
-
-
-
-
-
-
-
-
+```mermaid
+graph LR
+subgraph Client["MCP Client"]
+LLM["LLM<br/>(Claude, etc.)"]
+end
+
+subgraph Server["PDF Spec MCP Server"]
+direction TB
+MCP["MCP Server<br/>index.ts"]
+
+subgraph Tools["Tools Layer"]
+direction LR
+T1["list_specs"]
+T2["get_structure"]
+T3["get_section"]
+T4["search_spec"]
+T5["get_requirements"]
+T6["get_definitions"]
+T7["get_tables"]
+T8["compare_versions"]
+end
+
+subgraph Services["Services Layer"]
+direction LR
+REG["Registry<br/>PDF auto-detection"]
+LOADER["Loader<br/>LRU cache"]
+SVC["PDFService<br/>Orchestration"]
+CMP["CompareService<br/>Version comparison"]
+end
+
+subgraph Extractors["Extractors"]
+direction LR
+OUTLINE["OutlineResolver<br/>TOC & section index"]
+CONTENT["ContentExtractor<br/>Structured extraction"]
+SEARCH["SearchIndex<br/>Full-text search"]
+REQ["RequirementExtractor"]
+DEF["DefinitionExtractor"]
+end
+
+subgraph Utils["Utils"]
+direction LR
+CACHE["LRU Cache"]
+CONC["Concurrency"]
+VALID["Validation"]
+end
+end
+
+subgraph PDFs["PDF specification files (obtained separately)"]
+direction LR
+PDF1["ISO 32000-2<br/>(PDF 2.0)"]
+PDF2["ISO 32000-1<br/>(PDF 1.7)"]
+PDF3["TS 32001–32005<br/>PDF/UA, etc."]
+end
+
+LLM <-->|"stdio / JSON-RPC"| MCP
+MCP --> Tools
+Tools --> Services
+Services --> Extractors
+Services --> Utils
+LOADER --> PDFs
+REG -->|"Filename pattern<br/>auto-detection"| PDFs
+
+style Client fill:#e8f4f8,stroke:#2196F3
+style PDFs fill:#fff3e0,stroke:#FF9800
+style Tools fill:#e8f5e9,stroke:#4CAF50
+style Services fill:#f3e5f5,stroke:#9C27B0
+style Extractors fill:#fce4ec,stroke:#E91E63
+style Utils fill:#f5f5f5,stroke:#9E9E9E
+```
 
-
+### Layer Structure
 
-
+| Layer | Role |
+| -------------- | ---------------------------------------------------------------------------------- |
+| **Tools** | MCP tool schema definitions & handlers (input validation) |
+| **Services** | Business logic (PDF registry, loader, orchestration) |
+| **Extractors** | Information extraction from PDFs (TOC, content, search, requirements, definitions) |
+| **Utils** | Shared utilities (cache, concurrency, validation) |
 
-
-
-
-
-
-
-
+## Setup
+
+### 1. Obtain the PDF Specifications
+
+> [!WARNING]
+> The PDF specifications are **copyrighted documents** and are not included in this package.
+> Obtain them from the sources below and place them in a local directory of your choice.
+
+| Document | Source |
+| ---------------------------- | ----------------------------------------------------------------------------------------------- |
+| ISO 32000-2 (PDF 2.0) | [PDF Association](https://pdfa.org/resource/iso-32000-pdf/) |
+| ISO 32000-1 (PDF 1.7) | [Adobe (free)](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf) |
+| TS 32001–32005, PDF/UA, etc. | [PDF Association — Sponsored Standards](https://pdfa.org/sponsored-standards/) |
 
-
+All 17 files below are supported. You do not need all of them; the server works with just the specs you place there (ISO 32000-2 is recommended at a minimum).
 
-
--
+```
+pdf-specs/
+│
+│ ── Standards ─────────────────────────────
+├── ISO_32000-2_sponsored-ec2.pdf # iso32000-2 : PDF 2.0 EC2 (recommended, primary target)
+├── ISO_32000-2-2020_sponsored.pdf # iso32000-2-2020 : PDF 2.0 original edition
+├── PDF32000_2008.pdf # pdf17 : PDF 1.7 (for version comparison)
+├── pdfreference1.7old.pdf # pdf17old : Adobe PDF Reference 1.7
+│
+│ ── Technical Specifications (TS) ─────────
+├── ISO_TS_32001-2022_sponsored.pdf # ts32001 : Hash extensions (SHA-3)
+├── ISO_TS_32002-2022_sponsored.pdf # ts32002 : Digital signature extensions (ECC/PAdES)
+├── ISO_TS_32003-2023_sponsored.pdf # ts32003 : AES-GCM encryption
+├── ISO-TS-32004-2024_sponsored.pdf # ts32004 : Integrity protection
+├── ISO-TS-32005-2023-sponsored.pdf # ts32005 : Namespace mapping
+│
+│ ── PDF/UA (Accessibility) ────────────────
+├── ISO-14289-1-2014-sponsored.pdf # pdfua1 : PDF/UA-1
+├── ISO-14289-2-2024-sponsored.pdf # pdfua2 : PDF/UA-2
+│
+│ ── Guides ────────────────────────────────
+├── Tagged-PDF-Best-Practice-Guide.pdf # tagged-bpg : Tagged PDF Best Practices
+├── Well-Tagged-PDF-WTPDF-1.0.pdf # wtpdf : Well-Tagged PDF
+├── PDF-Declarations.pdf # declarations: PDF Declarations
+│
+│ ── Application Notes ─────────────────────
+├── PDF20_AN001-BPC.pdf # an001 : Black Point Compensation
+├── PDF20_AN002-AF.pdf # an002 : Associated Files
+└── PDF20_AN003-ObjectMetadataLocations.pdf # an003 : Object Metadata
+```
+
+### 2. Install
+
+This package provides a CLI binary (`pdf-spec-mcp`) that is launched by an MCP client.
+**You normally do not need to install it manually** — just specify `npx @shuji-bonji/pdf-spec-mcp` on the MCP client side, as in the next step.
 
-
+If you want to check it directly from the shell (e.g., for debugging):
 
 ```bash
-
+PDF_SPEC_DIR=/path/to/pdf-specs npx -y @shuji-bonji/pdf-spec-mcp
 ```
 
-
+If you want to install it globally (optional):
 
 ```bash
-
+npm install -g @shuji-bonji/pdf-spec-mcp
+PDF_SPEC_DIR=/path/to/pdf-specs pdf-spec-mcp
 ```
 
-
+### 3. MCP Client Configuration
 
-
+#### Environment Variables
 
 | Variable | Description | Default |
 | -------------- | ------------------------------------------------ | ---------- |
 | `PDF_SPEC_DIR` | Directory containing the PDF specification files | (required) |
 
-
+#### Claude Desktop
 
 Add to `claude_desktop_config.json`:
 
````
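The architecture diagram labels the Loader with an LRU cache. This suggests that loaded PDFs stay in memory and the least recently used one is evicted when a limit is reached. Below is a minimal sketch of that eviction policy, assuming a plain key/value cache (for example, spec ID to loaded document). The `LRUCache` class and its methods are illustrative, not the package's API. The sketch relies on JavaScript's `Map` iterating in insertion order.

```typescript
// Minimal LRU cache sketch. Hypothetical: the package's actual LRU Cache and
// Loader APIs are not shown in the README and may differ.
class LRUCache<K, V> {
  private readonly capacity: number;
  // Map iterates in insertion order, so the first key is the least recently used.
  private readonly entries = new Map<K, V>();

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    // Deleting first moves an existing key to the most-recent position.
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Every operation here is O(1), which is the usual reason to build an LRU cache on an insertion-ordered map rather than a sorted list.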
````diff
@@ -86,7 +198,7 @@ PDF_SPEC_DIR=/path/to/pdf-specs npx @shuji-bonji/pdf-spec-mcp
 }
 ```
 
-
+#### Cursor / VS Code
 
 Add to `.cursor/mcp.json` or to the VS Code MCP settings:
 
````
````diff
@@ -104,22 +216,165 @@ PDF_SPEC_DIR=/path/to/pdf-specs npx @shuji-bonji/pdf-spec-mcp
 }
 ```
 
-##
+## Available Tools
+
+In every tool, the `spec` parameter switches the target specification (default: `iso32000-2`).
+
+> **Note:** `get_ts_section`, which had been under consideration, was merged into `get_section` by adding the `spec` parameter.
+> TS specifications and PDF/UA can be queried with the same tool set.
+
+| Tool | Description |
+| ------------------ | ------------------------------------------------------------------ |
+| `list_specs` | Get a list of all discovered PDF specifications with metadata |
+| `get_structure` | Get the section hierarchy (table of contents) to a specified depth |
+| `get_section` | Get the structured content of a specified section |
+| `search_spec` | Full-text keyword search within a specification |
+| `get_requirements` | Extract normative requirements (shall/must/may) |
+| `get_definitions` | Search term definitions |
+| `get_tables` | Extract table structures within a section |
+| `compare_versions` | Compare the section structures of PDF 1.7 and PDF 2.0 |
+
+### `list_specs` — Spec List
+
+Gets a list of all available specifications. Use it to look up the `spec` IDs used by the other tools.
 
-
+```jsonc
+// List all specs
+{ }
 
+// Filter by category
+{ "category": "ts" } // Technical specs only
+{ "category": "pdfua" } // PDF/UA only
+{ "category": "guide" } // Guide documents only
 ```
-
-
-
-
-
-
-
-
+
+### `get_structure` — Table of Contents
+
+Gets the section hierarchy (TOC tree) of a specification.
+
+```jsonc
+// PDF 2.0 top-level sections only
+{ "max_depth": 1 }
+
+// Expand to 2 levels
+{ "max_depth": 2 }
+
+// Full structure of TS 32002 (digital signatures)
+{ "spec": "ts32002" }
+
+// PDF/UA-2 structure
+{ "spec": "pdfua2", "max_depth": 2 }
 ```
 
-
+### `get_section` — Section Content
+
+Gets the structured content (headings, paragraphs, lists, tables, notes) of a specified section.
+
+```jsonc
+// PDF 2.0 section 7.3.4 (String Objects)
+{ "section": "7.3.4" }
+
+// PDF 2.0 Annex A
+{ "section": "Annex A" }
+
+// TS 32002 section 5
+{ "spec": "ts32002", "section": "5" }
+
+// PDF/UA-2 section 8 (Tagged PDF)
+{ "spec": "pdfua2", "section": "8" }
+```
+
+### `search_spec` — Full-text Search
+
+Searches a specification by keyword and returns results with section context.
+
+```jsonc
+// Search PDF 2.0 for "digital signature"
+{ "query": "digital signature" }
+
+// Limit the number of results
+{ "query": "font", "max_results": 5 }
+
+// Search within TS 32002
+{ "spec": "ts32002", "query": "CMS" }
+```
+
+### `get_requirements` — Requirement Extraction
+
+Extracts normative requirements (shall / must / may) based on ISO conventions.
+
+```jsonc
+// All requirements in section 12.8
+{ "section": "12.8" }
+
+// "shall" requirements only
+{ "section": "12.8", "level": "shall" }
+
+// "shall not" requirements only
+{ "section": "7.3", "level": "shall not" }
+
+// PDF/UA-2 requirements
+{ "spec": "pdfua2", "section": "8", "level": "shall" }
+```
+
+### `get_definitions` — Term Definition Lookup
+
+Looks up terms in Section 3 (term definitions).
+
+```jsonc
+// Search for definitions related to "font"
+{ "term": "font" }
+
+// List all definitions
+{ }
+
+// PDF/UA term definitions
+{ "spec": "pdfua2", "term": "artifact" }
+```
+
+### `get_tables` — Table Extraction
+
+Extracts table structures (headers, rows, captions) within a section. Tables that span multiple pages are merged automatically.
+
+```jsonc
+// All tables in section 7.3.4
+{ "section": "7.3.4" }
+
+// A specific table only (0-based index)
+{ "section": "7.3.4", "table_index": 0 }
+
+// Tables in a TS specification
+{ "spec": "ts32002", "section": "5" }
+```
+
+### `compare_versions` — Version Comparison
+
+Compares the section structures of PDF 1.7 (ISO 32000-1) and PDF 2.0 (ISO 32000-2). Title-based automatic matching detects matched, added, and removed sections.
+
+> [!NOTE]
+> This tool requires both PDF 1.7 (`PDF32000_2008.pdf`) and PDF 2.0 in `PDF_SPEC_DIR`.
+
+```jsonc
+// Diff of section 12.8 (Digital Signatures)
+{ "section": "12.8" }
+
+// Compare all top-level sections
+{ }
+```
+
+## Supported Specifications
+
+PDF files in `PDF_SPEC_DIR` are auto-detected by filename pattern:
+
+| Category | Spec ID | Documents |
+| ---------------------- | ---------------------------------------------------- | ------------------------------------------------------------------ |
+| **Standard** | `iso32000-2`, `iso32000-2-2020`, `pdf17`, `pdf17old` | ISO 32000-2 (PDF 2.0), ISO 32000-1 (PDF 1.7) |
+| **Technical Spec** | `ts32001` – `ts32005` | Hash, Digital Signatures, AES-GCM, Integrity Protection, Namespace |
+| **PDF/UA** | `pdfua1`, `pdfua2` | Accessibility (ISO 14289-1, 14289-2) |
+| **Guide** | `tagged-bpg`, `wtpdf`, `declarations` | Tagged PDF, Well-Tagged PDF, Declarations |
+| **Application Note** | `an001` – `an003` | BPC, Associated Files, Object Metadata |
+
+## Directory Structure
 
 ```
 src/
````
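`get_requirements` is described as extracting normative requirements (shall / must / may) and filtering them by `level`. The following is a simplified, hypothetical sketch of keyword-based classification. `classifyRequirement`, `extractRequirements`, the naive sentence splitter, and the precedence order are all assumptions. The real extractor presumably works on the structured section content rather than on a plain string.

```typescript
// Hypothetical sketch of keyword-based requirement tagging. The levels come from
// the README (shall / shall not / must / may); the precedence order and the naive
// sentence splitter are assumptions, not the package's actual extractor.
type RequirementLevel = "shall not" | "shall" | "must" | "may";

// "shall not" is checked before "shall" so the negative form wins.
const LEVELS: readonly RequirementLevel[] = ["shall not", "shall", "must", "may"];

interface Requirement {
  level: RequirementLevel;
  text: string;
}

function classifyRequirement(sentence: string): RequirementLevel | null {
  const lower = sentence.toLowerCase();
  for (const level of LEVELS) {
    // Word boundaries keep "may" from matching inside "mayor"; \s+ allows any spacing.
    const re = new RegExp(`\\b${level.replace(" ", "\\s+")}\\b`);
    if (re.test(lower)) return level;
  }
  return null;
}

function extractRequirements(text: string, only?: RequirementLevel): Requirement[] {
  const results: Requirement[] = [];
  // Split after sentence-ending punctuation that is followed by whitespace.
  for (const sentence of text.split(/(?<=[.!?])\s+/)) {
    const level = classifyRequirement(sentence);
    if (level !== null && (only === undefined || level === only)) {
      results.push({ level, text: sentence.trim() });
    }
  }
  return results;
}
```

A sentence that contains several keywords gets the first level in `LEVELS`. That choice is arbitrary and worth revisiting for clauses such as "may ... but shall".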
````diff
@@ -140,6 +395,8 @@ src/
 ├── tools/
 │ ├── definitions.ts # MCP tool schemas
 │ └── handlers.ts # Tool implementations
+├── types/
+│ └── index.ts # Shared type definitions
 └── utils/
   ├── concurrency.ts # mapConcurrent (bounded-concurrency Promise.all)
   ├── text.ts # Text normalization
````
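`src/utils/concurrency.ts` is described as providing `mapConcurrent`, a bounded-concurrency `Promise.all`. A common way to implement that idea is a small worker pool. The sketch below assumes an `(items, limit, fn)` signature, which is a guess rather than the file's actual API.

```typescript
// Hypothetical sketch of a bounded-concurrency map ("bounded Promise.all").
// The real mapConcurrent signature in src/utils/concurrency.ts may differ.
async function mapConcurrent<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index. JavaScript runs
  // this code on a single thread, so `next++` cannot race between workers.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  }

  // Start at most `limit` workers; results keep the input order.
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

With pages as `items`, at most `limit` pages are processed at once, which caps memory use on large specifications. If any call rejects, the returned promise rejects as well, just as `Promise.all` does.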
package/README.md
CHANGED
````diff
@@ -7,6 +7,14 @@
 
 An MCP (Model Context Protocol) server that provides structured access to ISO 32000 (PDF) specification documents. Enables LLMs to navigate, search, and analyze PDF specifications through well-defined tools.
 
+> [!IMPORTANT]
+> **PDF specification files are NOT included in this package.**
+> You must obtain the PDF specification documents separately and place them in a local directory.
+>
+> **Download from:** [PDF Association — Sponsored Standards](https://pdfa.org/sponsored-standards/)
+>
+> See "[Setup](#setup)" for details.
+
 ## Features
 
 - **Multi-spec support** — Auto-discovers and manages up to 17 PDF-related documents (ISO 32000-2, PDF/UA, Tagged PDF guides, etc.)
````
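The specification PDFs do not ship with the package, so the server has to be pointed at the local directory. This is done with the `PDF_SPEC_DIR` environment variable described under Setup. A fail-fast check like the sketch below makes a missing or mistyped path obvious at startup. `resolveSpecDir` is a hypothetical helper, and the package may report this error differently.

```typescript
import { statSync } from "node:fs";
import { resolve } from "node:path";

// Hypothetical fail-fast validation of PDF_SPEC_DIR; not the package's actual code.
// Takes the environment as a parameter so it can be tested without touching process.env.
function resolveSpecDir(env: Record<string, string | undefined>): string {
  const raw = env.PDF_SPEC_DIR;
  if (!raw || raw.trim() === "") {
    throw new Error("PDF_SPEC_DIR is not set; point it at the directory holding the spec PDFs.");
  }
  const dir = resolve(raw);
  let isDir = false;
  try {
    isDir = statSync(dir).isDirectory();
  } catch {
    // statSync throws when the path does not exist or cannot be accessed.
    isDir = false;
  }
  if (!isDir) {
    throw new Error(`PDF_SPEC_DIR does not point to a directory: ${dir}`);
  }
  return dir;
}
```

At startup it would be called as `resolveSpecDir(process.env)`, and the returned absolute path would be used for every later file lookup.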
````diff
@@ -18,57 +26,161 @@ An MCP (Model Context Protocol) server that provides structured access to ISO 32
 - **Version comparison** — Diff PDF 1.7 vs PDF 2.0 section structures
 - **Bounded-concurrency processing** — Parallel page processing for large documents
 
-##
+## Architecture
 
-
-
-
-
-
-
-
-
-
-
+```mermaid
+graph LR
+subgraph Client["MCP Client"]
+LLM["LLM<br/>(Claude, etc.)"]
+end
+
+subgraph Server["PDF Spec MCP Server"]
+direction TB
+MCP["MCP Server<br/>index.ts"]
+
+subgraph Tools["Tools Layer"]
+direction LR
+T1["list_specs"]
+T2["get_structure"]
+T3["get_section"]
+T4["search_spec"]
+T5["get_requirements"]
+T6["get_definitions"]
+T7["get_tables"]
+T8["compare_versions"]
+end
+
+subgraph Services["Services Layer"]
+direction LR
+REG["Registry<br/>Auto-discovery"]
+LOADER["Loader<br/>LRU Cache"]
+SVC["PDFService<br/>Orchestration"]
+CMP["CompareService<br/>Version Diff"]
+end
+
+subgraph Extractors["Extractors"]
+direction LR
+OUTLINE["OutlineResolver<br/>TOC & Section Index"]
+CONTENT["ContentExtractor<br/>Structured Extraction"]
+SEARCH["SearchIndex<br/>Full-text Search"]
+REQ["RequirementExtractor"]
+DEF["DefinitionExtractor"]
+end
+
+subgraph Utils["Utils"]
+direction LR
+CACHE["LRU Cache"]
+CONC["Concurrency"]
+VALID["Validation"]
+end
+end
+
+subgraph PDFs["PDF Spec Files (obtained separately)"]
+direction LR
+PDF1["ISO 32000-2<br/>(PDF 2.0)"]
+PDF2["ISO 32000-1<br/>(PDF 1.7)"]
+PDF3["TS 32001–32005<br/>PDF/UA, etc."]
+end
+
+LLM <-->|"stdio / JSON-RPC"| MCP
+MCP --> Tools
+Tools --> Services
+Services --> Extractors
+Services --> Utils
+LOADER --> PDFs
+REG -->|"Filename pattern<br/>auto-discovery"| PDFs
+
+style Client fill:#e8f4f8,stroke:#2196F3
+style PDFs fill:#fff3e0,stroke:#FF9800
+style Tools fill:#e8f5e9,stroke:#4CAF50
+style Services fill:#f3e5f5,stroke:#9C27B0
+style Extractors fill:#fce4ec,stroke:#E91E63
+style Utils fill:#f5f5f5,stroke:#9E9E9E
+```
 
-
+### Layer Overview
 
-
+| Layer | Responsibility |
+| -------------- | ---------------------------------------------------------------------------------- |
+| **Tools** | MCP tool schema definitions & handlers (input validation) |
+| **Services** | Business logic (PDF registry, loader, orchestration) |
+| **Extractors** | Information extraction from PDFs (TOC, content, search, requirements, definitions) |
+| **Utils** | Shared utilities (cache, concurrency, validation) |
 
-
-| ------------------ | ---------------------------------------------------- | ------------------------------------------------------- |
-| **Standard** | `iso32000-2`, `iso32000-2-2020`, `pdf17`, `pdf17old` | ISO 32000-2 (PDF 2.0), ISO 32000-1 (PDF 1.7) |
-| **Technical Spec** | `ts32001` – `ts32005` | Hash, Digital Signatures, AES-GCM, Integrity, Namespace |
-| **PDF/UA** | `pdfua1`, `pdfua2` | Accessibility (ISO 14289-1, 14289-2) |
-| **Guide** | `tagged-bpg`, `wtpdf`, `declarations` | Tagged PDF, Well-Tagged PDF, Declarations |
-| **App Note** | `an001` – `an003` | BPC, Associated Files, Object Metadata |
+## Setup
 
-
+### 1. Obtain PDF Specification Files
 
-
-
+> [!WARNING]
+> PDF specifications are **copyrighted documents** and are not included in this package.
+> Download them from the sources below and place them in a local directory.
 
-
+| Document | Source |
+| ---------------------------- | ----------------------------------------------------------------------------------------------- |
+| ISO 32000-2 (PDF 2.0) | [PDF Association](https://pdfa.org/resource/iso-32000-pdf/) |
+| ISO 32000-1 (PDF 1.7) | [Adobe (free)](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf) |
+| TS 32001–32005, PDF/UA, etc. | [PDF Association — Sponsored Standards](https://pdfa.org/sponsored-standards/) |
+
+All 17 files below are supported. You do not need all of them — place only the specs you need (at minimum, ISO 32000-2 is recommended).
+
+```
+pdf-specs/
+│
+│ ── Standards ─────────────────────────────
+├── ISO_32000-2_sponsored-ec2.pdf # iso32000-2 : PDF 2.0 EC2 (recommended)
+├── ISO_32000-2-2020_sponsored.pdf # iso32000-2-2020 : PDF 2.0 original
+├── PDF32000_2008.pdf # pdf17 : PDF 1.7 (for version comparison)
+├── pdfreference1.7old.pdf # pdf17old : Adobe PDF Reference 1.7
+│
+│ ── Technical Specifications (TS) ─────────
+├── ISO_TS_32001-2022_sponsored.pdf # ts32001 : Hash extensions (SHA-3)
+├── ISO_TS_32002-2022_sponsored.pdf # ts32002 : Digital signature extensions (ECC/PAdES)
+├── ISO_TS_32003-2023_sponsored.pdf # ts32003 : AES-GCM encryption
+├── ISO-TS-32004-2024_sponsored.pdf # ts32004 : Integrity protection
+├── ISO-TS-32005-2023-sponsored.pdf # ts32005 : Namespace mapping
+│
+│ ── PDF/UA (Accessibility) ────────────────
+├── ISO-14289-1-2014-sponsored.pdf # pdfua1 : PDF/UA-1
+├── ISO-14289-2-2024-sponsored.pdf # pdfua2 : PDF/UA-2
+│
+│ ── Guides ────────────────────────────────
+├── Tagged-PDF-Best-Practice-Guide.pdf # tagged-bpg : Tagged PDF Best Practice
+├── Well-Tagged-PDF-WTPDF-1.0.pdf # wtpdf : Well-Tagged PDF
+├── PDF-Declarations.pdf # declarations: PDF Declarations
+│
+│ ── Application Notes ─────────────────────
+├── PDF20_AN001-BPC.pdf # an001 : Black Point Compensation
+├── PDF20_AN002-AF.pdf # an002 : Associated Files
+└── PDF20_AN003-ObjectMetadataLocations.pdf # an003 : Object Metadata
+```
+
+### 2. Install
+
+This package ships a CLI binary (`pdf-spec-mcp`) intended to be launched by an MCP client.
+**You do not need to install it manually** — just point your MCP client to `npx @shuji-bonji/pdf-spec-mcp` as shown in the next step.
+
+If you want to run it directly from the shell (e.g. for debugging):
 
 ```bash
-
+PDF_SPEC_DIR=/path/to/pdf-specs npx -y @shuji-bonji/pdf-spec-mcp
 ```
 
-Or
+Or install it globally (optional):
 
 ```bash
-
+npm install -g @shuji-bonji/pdf-spec-mcp
+PDF_SPEC_DIR=/path/to/pdf-specs pdf-spec-mcp
 ```
 
-
+### 3. Configure MCP Client
 
-
+#### Environment Variable
 
 | Variable | Description | Default |
 | -------------- | -------------------------------------------- | ---------- |
 | `PDF_SPEC_DIR` | Directory containing PDF specification files | (required) |
 
-
+#### Claude Desktop
 
 Add to `claude_desktop_config.json`:
 
````
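The Extractors layer in the architecture diagram includes a `SearchIndex` for full-text search. The `search_spec` tool in this README also accepts a `max_results` limit. A simple way to picture such an index is an inverted index from tokens to section IDs. This is a hypothetical sketch: the `SectionSearchIndex` class, its tokenizer, and the AND semantics are assumptions, not the package's implementation. The test data is made-up sample text.

```typescript
// Hypothetical sketch of a section-level inverted index (token -> section IDs).
// The class name, tokenizer, and AND semantics are assumptions; the README only
// says SearchIndex provides full-text search.
class SectionSearchIndex {
  private readonly postings = new Map<string, Set<string>>();

  // Lower-cases and splits on anything that is not an ASCII letter or digit.
  private static tokenize(text: string): string[] {
    return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  }

  add(sectionId: string, text: string): void {
    for (const token of SectionSearchIndex.tokenize(text)) {
      let ids = this.postings.get(token);
      if (!ids) {
        ids = new Set<string>();
        this.postings.set(token, ids);
      }
      ids.add(sectionId);
    }
  }

  // Returns IDs of sections that contain every query token, in the order the
  // sections were indexed for the first token, capped at maxResults.
  search(query: string, maxResults = 10): string[] {
    const tokens = SectionSearchIndex.tokenize(query);
    if (tokens.length === 0) return [];
    const sets = tokens.map((t) => this.postings.get(t) ?? new Set<string>());
    const [first, ...rest] = sets;
    return Array.from(first)
      .filter((id) => rest.every((set) => set.has(id)))
      .slice(0, maxResults);
  }
}
```

There is no stemming here, so `signatures` and `signature` are different tokens. A production index would likely also rank hits and return context snippets.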
````diff
@@ -86,7 +198,7 @@ Add to `claude_desktop_config.json`:
 }
 ```
 
-
+#### Cursor / VS Code
 
 Add to `.cursor/mcp.json` or VS Code MCP settings:
 
````
````diff
@@ -104,22 +216,162 @@ Add to `.cursor/mcp.json` or VS Code MCP settings:
 }
 ```
 
-##
+## Available Tools
+
+All tools accept an optional `spec` parameter to target a specific specification (default: `iso32000-2`).
+
+| Tool | Description |
+| ------------------ | ----------------------------------------------------------------- |
+| `list_specs` | List all discovered PDF specifications with metadata |
+| `get_structure` | Get section hierarchy (table of contents) with configurable depth |
+| `get_section` | Get structured content of a specific section |
+| `search_spec` | Full-text keyword search across a specification |
+| `get_requirements` | Extract normative requirements (shall/must/may) |
+| `get_definitions` | Look up term definitions |
+| `get_tables` | Extract table structures from a section |
+| `compare_versions` | Compare PDF 1.7 and PDF 2.0 section structures |
+
+### `list_specs` — Discover Specifications
+
+List all available specification documents. Use the returned IDs as the `spec` parameter in other tools.
+
+```jsonc
+// List all specs
+{ }
+
+// Filter by category
+{ "category": "ts" } // Technical specs only
+{ "category": "pdfua" } // PDF/UA only
+{ "category": "guide" } // Guide documents only
+```
+
+### `get_structure` — Table of Contents
+
+Get the section hierarchy (TOC tree) of a specification.
+
+```jsonc
+// PDF 2.0 top-level sections only
+{ "max_depth": 1 }
+
+// Expand to 2 levels
+{ "max_depth": 2 }
+
+// TS 32002 (Digital Signatures) full structure
+{ "spec": "ts32002" }
+
+// PDF/UA-2 structure
+{ "spec": "pdfua2", "max_depth": 2 }
+```
+
+### `get_section` — Section Content
+
+Get structured content (headings, paragraphs, lists, tables, notes) of a specific section.
+
+```jsonc
+// PDF 2.0 Section 7.3.4 (String Objects)
+{ "section": "7.3.4" }
+
+// PDF 2.0 Annex A
+{ "section": "Annex A" }
+
+// TS 32002 Section 5
+{ "spec": "ts32002", "section": "5" }
+
+// PDF/UA-2 Section 8 (Tagged PDF)
+{ "spec": "pdfua2", "section": "8" }
+```
+
+### `search_spec` — Full-text Search
+
+Search across a specification with section-aware context snippets.
 
-
+```jsonc
+// Search PDF 2.0 for "digital signature"
+{ "query": "digital signature" }
 
+// Limit results
+{ "query": "font", "max_results": 5 }
+
+// Search within TS 32002
+{ "spec": "ts32002", "query": "CMS" }
 ```
-
-
-
-
-
-
-
-
+
+### `get_requirements` — Normative Requirements
+
+Extract normative requirements (shall / must / may) per ISO conventions.
+
+```jsonc
+// All requirements in section 12.8
+{ "section": "12.8" }
+
+// Only "shall" requirements
+{ "section": "12.8", "level": "shall" }
+
+// Only "shall not" requirements
+{ "section": "7.3", "level": "shall not" }
+
+// PDF/UA-2 requirements
+{ "spec": "pdfua2", "section": "8", "level": "shall" }
 ```
 
-
+### `get_definitions` — Term Definitions
+
+Look up term definitions from Section 3 (Definitions).
+
+```jsonc
+// Search for "font" definitions
+{ "term": "font" }
+
+// List all definitions
+{ }
+
+// PDF/UA definitions
+{ "spec": "pdfua2", "term": "artifact" }
+```
+
+### `get_tables` — Table Extraction
+
+Extract table structures (headers, rows, captions) from a section. Multi-page tables are automatically merged.
+
+```jsonc
+// All tables in section 7.3.4
+{ "section": "7.3.4" }
+
+// Specific table only (0-based index)
+{ "section": "7.3.4", "table_index": 0 }
+
+// TS spec tables
+{ "spec": "ts32002", "section": "5" }
+```
+
+### `compare_versions` — Version Comparison
+
+Compare section structures between PDF 1.7 (ISO 32000-1) and PDF 2.0 (ISO 32000-2). Uses title-based automatic matching to detect matched, added, and removed sections.
+
+> [!NOTE]
+> This tool requires both PDF 1.7 (`PDF32000_2008.pdf`) and PDF 2.0 files in `PDF_SPEC_DIR`.
+
+```jsonc
+// Diff section 12.8 (Digital Signatures)
+{ "section": "12.8" }
+
+// Compare all top-level sections
+{ }
+```
+
+## Supported Specifications
+
+The server auto-discovers PDF files in `PDF_SPEC_DIR` by filename pattern matching:
+
+| Category | Spec IDs | Documents |
+| ------------------ | ---------------------------------------------------- | ------------------------------------------------------- |
+| **Standard** | `iso32000-2`, `iso32000-2-2020`, `pdf17`, `pdf17old` | ISO 32000-2 (PDF 2.0), ISO 32000-1 (PDF 1.7) |
+| **Technical Spec** | `ts32001` – `ts32005` | Hash, Digital Signatures, AES-GCM, Integrity, Namespace |
+| **PDF/UA** | `pdfua1`, `pdfua2` | Accessibility (ISO 14289-1, 14289-2) |
+| **Guide** | `tagged-bpg`, `wtpdf`, `declarations` | Tagged PDF, Well-Tagged PDF, Declarations |
+| **App Note** | `an001` – `an003` | BPC, Associated Files, Object Metadata |
+
+## Directory Structure
 
 ```
 src/
````
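`compare_versions` is said to match sections between PDF 1.7 and PDF 2.0 by title and to report matched, added, and removed sections. The sketch below shows title-based matching under two assumptions. First, each outline is a flat list of `{ id, title }` entries. Second, titles are compared after case and punctuation normalization. `compareOutlines` and `normalizeTitle` are hypothetical names.

```typescript
// Hypothetical sketch of title-based section matching between two outlines.
// The flat { id, title } shape and the normalization rule are assumptions.
interface SectionInfo {
  id: string;
  title: string;
}

interface VersionDiff {
  matched: Array<{ before: SectionInfo; after: SectionInfo }>;
  added: SectionInfo[];
  removed: SectionInfo[];
}

// Ignore case, punctuation, and spacing differences when comparing titles.
function normalizeTitle(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

function compareOutlines(before: SectionInfo[], after: SectionInfo[]): VersionDiff {
  // Index the newer outline by normalized title (first occurrence wins).
  const afterByTitle = new Map<string, SectionInfo>();
  for (const section of after) {
    const key = normalizeTitle(section.title);
    if (!afterByTitle.has(key)) afterByTitle.set(key, section);
  }

  const used = new Set<SectionInfo>();
  const matched: VersionDiff["matched"] = [];
  const removed: SectionInfo[] = [];
  for (const section of before) {
    const candidate = afterByTitle.get(normalizeTitle(section.title));
    if (candidate && !used.has(candidate)) {
      matched.push({ before: section, after: candidate });
      used.add(candidate);
    } else {
      removed.push(section);
    }
  }

  // Anything in the newer outline that was never matched counts as added.
  const added = after.filter((section) => !used.has(section));
  return { matched, added, removed };
}
```

Exact matching on normalized titles misses renamed sections. A renamed section shows up as one removed entry plus one added entry, which fuzzy matching could reduce at the cost of false pairings.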
````diff
@@ -140,6 +392,8 @@ src/
 ├── tools/
 │ ├── definitions.ts # MCP tool schemas
 │ └── handlers.ts # Tool implementations
+├── types/
+│ └── index.ts # Shared type definitions
 └── utils/
   ├── concurrency.ts # mapConcurrent (bounded Promise.all)
   ├── text.ts # Text normalization
````
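`src/utils/text.ts` is listed as handling text normalization, but the README does not say which rules it applies. The following is only a plausible sketch for text pulled out of PDFs. It applies Unicode NFKC folding, which expands ligatures such as U+FB01. It then rejoins words hyphenated across line breaks and collapses whitespace. `normalizeText` is a hypothetical name.

```typescript
// Hypothetical sketch of normalizing text extracted from a PDF page. The rules
// below are assumptions; the README only says text.ts does "text normalization".
function normalizeText(raw: string): string {
  return (
    raw
      // NFKC folds compatibility characters: the ligature U+FB01 becomes the letters "fi".
      .normalize("NFKC")
      // Rejoin a word hyphenated across a line break: "signa-\nture" -> "signature".
      .replace(/(\w)-\r?\n(\w)/g, "$1$2")
      // Collapse runs of whitespace (spaces, tabs, newlines) into a single space.
      .replace(/\s+/g, " ")
      .trim()
  );
}
```

The hyphen rule is a trade-off. It also merges genuine compounds that happen to break at a hyphen, such as `PDF-` at the end of one line followed by `based` on the next.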