@noedgeai-org/doc2x-mcp 0.1.3-dev.6.1 → 0.1.4-dev.8.1
- package/README.md +179 -57
- package/README_EN.md +181 -59
- package/dist/doc2x/materialize.d.ts +15 -0
- package/dist/doc2x/materialize.js +52 -0
- package/dist/doc2x/pdf.d.ts +12 -1
- package/dist/doc2x/pdf.js +19 -10
- package/dist/mcp/registerPdfTools.js +66 -4
- package/dist/mcp/registerToolsShared.d.ts +2 -0
- package/dist/mcp/registerToolsShared.js +7 -2
- package/package.json +2 -2
- package/skills/doc2x-mcp/SKILL.md +78 -135
package/README.md
CHANGED
|
@@ -1,29 +1,47 @@
|
|
|
1
1
|
# Doc2x MCP Server
|
|
2
2
|
|
|
3
|
+
[](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/ci.yml)
|
|
4
|
+
[](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/publish.yml)
|
|
5
|
+
[](https://www.npmjs.com/package/@noedgeai-org/doc2x-mcp)
|
|
6
|
+
|
|
3
7
|
简体中文 | [English](./README_EN.md)
|
|
4
8
|
|
|
5
|
-
|
|
9
|
+
将 Doc2x v2 PDF/图片能力封装为基于 stdio 的 MCP Server,提供稳定、可组合的语义化 tools。
|
|
10
|
+
|
|
11
|
+
## 目录
|
|
12
|
+
|
|
13
|
+
- [项目定位](#项目定位)
|
|
14
|
+
- [版本与环境](#版本与环境)
|
|
15
|
+
- [快速开始](#快速开始)
|
|
16
|
+
- [配置参考](#配置参考)
|
|
17
|
+
- [Tool API 总览](#tool-api-总览)
|
|
18
|
+
- [常见工作流](#常见工作流)
|
|
19
|
+
- [本地开发](#本地开发)
|
|
20
|
+
- [CI / 发布流水线](#ci--发布流水线)
|
|
21
|
+
- [如何发布](#如何发布)
|
|
22
|
+
- [安装本仓库 Skill(可选)](#安装本仓库-skill可选)
|
|
23
|
+
- [安全与排错](#安全与排错)
|
|
24
|
+
- [问题反馈](#问题反馈)
|
|
25
|
+
- [License](#license)
|
|
6
26
|
|
|
7
|
-
##
|
|
27
|
+
## 项目定位
|
|
8
28
|
|
|
9
|
-
-
|
|
29
|
+
- 面向 MCP 客户端(Codex CLI / Claude Code / 自定义 Agent)提供 Doc2x 能力。
|
|
30
|
+
- 以 submit/status/wait 统一异步任务模型,便于自动化编排。
|
|
31
|
+
- 提供可控超时、轮询、下载白名单等运行时安全边界。
|
|
10
32
|
|
|
11
|
-
##
|
|
33
|
+
## 版本与环境
|
|
12
34
|
|
|
13
|
-
|
|
35
|
+
- 本地运行:Node.js `>=18` 即可。
|
|
36
|
+
- CI 校验:Node.js `18`、`20` 都会跑构建。
|
|
37
|
+
- 发布环境:GitHub Actions 中发布任务使用 Node.js `24`。
|
|
38
|
+
- 包管理器:统一用 pnpm(锁文件 `pnpm-lock.yaml`)。
|
|
14
39
|
|
|
15
|
-
|
|
16
|
-
- `DOC2X_BASE_URL`:可选,默认 `https://v2.doc2x.noedgeai.com`
|
|
17
|
-
- `DOC2X_HTTP_TIMEOUT_MS`:可选,默认 `60000`
|
|
18
|
-
- `DOC2X_POLL_INTERVAL_MS`:可选,默认 `2000`
|
|
19
|
-
- `DOC2X_MAX_WAIT_MS`:可选,默认 `600000`
|
|
20
|
-
- `DOC2X_PARSE_PDF_MAX_OUTPUT_CHARS`:可选,默认 `5000`;限制 `doc2x_parse_pdf_wait_text` 返回文本的最大字符数,避免大模型上下文超限(设为 `0` 表示不限制)
|
|
21
|
-
- `DOC2X_PARSE_PDF_MAX_OUTPUT_PAGES`:可选,默认 `10`;限制 `doc2x_parse_pdf_wait_text` 合并的最大页数(设为 `0` 表示不限制)
|
|
22
|
-
- `DOC2X_DOWNLOAD_URL_ALLOWLIST`:可选,默认 `".amazonaws.com.cn,.aliyuncs.com,.noedgeai.com"`;设为 `*` 可允许任意 host(不推荐)
|
|
40
|
+
## 快速开始
|
|
23
41
|
|
|
24
|
-
|
|
42
|
+
### 方式 A:通过 npx(推荐)
|
|
25
43
|
|
|
26
|
-
|
|
44
|
+
在 MCP client 配置中添加:
|
|
27
45
|
|
|
28
46
|
```json
|
|
29
47
|
{
|
|
@@ -40,11 +58,13 @@
|
|
|
40
58
|
|
|
41
59
|
```bash
|
|
42
60
|
cd doc2x-mcp
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
DOC2X_API_KEY=sk-xxx
|
|
61
|
+
pnpm install --frozen-lockfile
|
|
62
|
+
pnpm run build
|
|
63
|
+
DOC2X_API_KEY=sk-xxx pnpm start
|
|
46
64
|
```
|
|
47
65
|
|
|
66
|
+
MCP client 指向本地构建产物:
|
|
67
|
+
|
|
48
68
|
```json
|
|
49
69
|
{
|
|
50
70
|
"command": "node",
|
|
@@ -56,27 +76,34 @@ DOC2X_API_KEY=sk-xxx npm start
|
|
|
56
76
|
}
|
|
57
77
|
```
|
|
58
78
|
|
|
59
|
-
##
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
|
|
68
|
-
|
|
69
|
-
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
|
|
73
|
-
|
|
79
|
+
## 配置参考
|
|
80
|
+
|
|
81
|
+
| 环境变量 | 必填 | 默认值 | 说明 |
|
|
82
|
+
| --- | --- | --- | --- |
|
|
83
|
+
| `DOC2X_API_KEY` | 是 | - | Doc2x API Key(`sk-xxx`) |
|
|
84
|
+
| `DOC2X_BASE_URL` | 否 | `https://v2.doc2x.noedgeai.com` | Doc2x API 基础地址 |
|
|
85
|
+
| `DOC2X_HTTP_TIMEOUT_MS` | 否 | `60000` | 单次 HTTP 超时(毫秒) |
|
|
86
|
+
| `DOC2X_POLL_INTERVAL_MS` | 否 | `2000` | 轮询间隔(毫秒) |
|
|
87
|
+
| `DOC2X_MAX_WAIT_MS` | 否 | `600000` | wait 类工具最大等待时长(毫秒) |
|
|
88
|
+
| `DOC2X_PARSE_PDF_MAX_OUTPUT_CHARS` | 否 | `5000` | `doc2x_parse_pdf_wait_text` 最大返回字符数;`0`=不限制 |
|
|
89
|
+
| `DOC2X_PARSE_PDF_MAX_OUTPUT_PAGES` | 否 | `10` | `doc2x_parse_pdf_wait_text` 最大合并页数;`0`=不限制 |
|
|
90
|
+
| `DOC2X_DOWNLOAD_URL_ALLOWLIST` | 否 | `.amazonaws.com.cn,.aliyuncs.com,.noedgeai.com` | 下载 URL 白名单;`*` 允许任意 host(不推荐) |
|
|
91
|
+
|
|
92
|
+
## Tool API 总览
|
|
93
|
+
|
|
94
|
+
| 阶段 | Tools | 说明 |
|
|
95
|
+
| --- | --- | --- |
|
|
96
|
+
| PDF 解析 | `doc2x_parse_pdf_submit` / `doc2x_parse_pdf_status` / `doc2x_parse_pdf_wait_text` / `doc2x_materialize_pdf_layout_json` | 提交任务、查询状态、等待并取文本,或将 v3 layout 结果落盘为本地 JSON |
|
|
97
|
+
| 结果导出 | `doc2x_convert_export_submit` / `doc2x_convert_export_result` / `doc2x_convert_export_wait` | 发起导出、查结果、等待导出完成 |
|
|
98
|
+
| 下载落盘 | `doc2x_download_url_to_file` / `doc2x_materialize_convert_zip` | 下载 URL 到本地、解包 convert zip |
|
|
99
|
+
| 图片版面解析 | `doc2x_parse_image_layout_sync` / `doc2x_parse_image_layout_submit` / `doc2x_parse_image_layout_status` / `doc2x_parse_image_layout_wait_text` | 同步/异步图片 OCR 与版面解析 |
|
|
100
|
+
| 诊断 | `doc2x_debug_config` | 返回配置解析与 API key 来源,便于排错 |
|
|
74
101
|
|
|
75
102
|
### PDF 解析模型(`doc2x_parse_pdf_submit` / `doc2x_parse_pdf_wait_text`)
|
|
76
103
|
|
|
77
104
|
- 可选参数:`model`
|
|
78
|
-
-
|
|
79
|
-
-
|
|
105
|
+
- 可选值:`v2`(默认) / `v3-2026`(最新模型)
|
|
106
|
+
- 不传时默认 `v2`
|
|
80
107
|
|
|
81
108
|
```json
|
|
82
109
|
{
|
|
@@ -84,22 +111,110 @@ DOC2X_API_KEY=sk-xxx npm start
|
|
|
84
111
|
}
|
|
85
112
|
```
|
|
86
113
|
|
|
114
|
+
### PDF Layout JSON 落盘(`doc2x_materialize_pdf_layout_json`)
|
|
115
|
+
|
|
116
|
+
- 必选参数:`output_path`
|
|
117
|
+
- `uid` 与 `pdf_path` 二选一
|
|
118
|
+
- `v2` 不支持 `layout`;需要 `pages[].layout` 时请使用 `v3-2026`
|
|
119
|
+
- 若传 `pdf_path` 但不传 `model`,该工具默认使用 `v3-2026`
|
|
120
|
+
- 成功时将原始 `result` JSON 写到本地
|
|
121
|
+
|
|
122
|
+
`layout` 是页面块结构和坐标信息,适合 figure/table 裁剪、区域高亮、结构化抽取和版面分析;如果只想看正文内容,优先使用 Markdown / DOCX 导出。
|
|
123
|
+
|
|
124
|
+
```json
|
|
125
|
+
{
|
|
126
|
+
"pdf_path": "/absolute/path/to/input.pdf",
|
|
127
|
+
"output_path": "/absolute/path/to/input_v3.layout.json"
|
|
128
|
+
}
|
|
129
|
+
```
|
|
130
|
+
|
|
87
131
|
### 导出公式参数(`doc2x_convert_export_submit` / `doc2x_convert_export_wait`)
|
|
88
132
|
|
|
89
133
|
- 必选参数:`formula_mode`(`normal` / `dollar`)
|
|
90
|
-
- 可选参数:`formula_level
|
|
134
|
+
- 可选参数:`formula_level`(仅源解析任务为 `model=v3-2026` 时生效)
|
|
91
135
|
- 取值说明:
|
|
92
|
-
- `0
|
|
93
|
-
- `1
|
|
94
|
-
- `2
|
|
136
|
+
- `0`:保留公式
|
|
137
|
+
- `1`:仅退化行内公式(`\\(...\\)`、`$...$`)
|
|
138
|
+
- `2`:退化全部公式(`\\(...\\)`、`$...$`、`\\[...\\]`、`$$...$$`)
|
|
95
139
|
|
|
96
|
-
##
|
|
140
|
+
## 常见工作流
|
|
97
141
|
|
|
98
|
-
|
|
142
|
+
### 工作流 1:PDF -> Markdown 本地文件
|
|
143
|
+
|
|
144
|
+
1. `doc2x_parse_pdf_submit` 提交 PDF 解析。
|
|
145
|
+
2. `doc2x_convert_export_wait` 等待导出(`to=md`,并指定 `formula_mode`)。
|
|
146
|
+
3. `doc2x_convert_export_result` 获取下载 URL。
|
|
147
|
+
4. `doc2x_download_url_to_file` 下载到目标路径。
|
|
148
|
+
|
|
149
|
+
### 工作流 2:图片版面 OCR 快速结果
|
|
150
|
+
|
|
151
|
+
1. `doc2x_parse_image_layout_sync` 直接同步解析。
|
|
152
|
+
2. 若需要稳态轮询,改用 submit/status/wait 组合。
|
|
153
|
+
|
|
154
|
+
### 工作流 3:PDF -> v3 layout JSON 本地文件
|
|
155
|
+
|
|
156
|
+
1. 调用 `doc2x_materialize_pdf_layout_json`,传入 `pdf_path` 和 `output_path`。
|
|
157
|
+
2. 工具会等待 parse 成功,并将原始 `result` JSON 落到本地。
|
|
158
|
+
3. 该 JSON 可直接提供给后续 figure/table 裁剪脚本使用。
|
|
159
|
+
|
|
160
|
+
## 本地开发
|
|
161
|
+
|
|
162
|
+
### 环境要求
|
|
163
|
+
|
|
164
|
+
- Node.js `>=18`
|
|
165
|
+
- pnpm(与仓库锁文件一致)
|
|
166
|
+
|
|
167
|
+
### 常用命令
|
|
168
|
+
|
|
169
|
+
```bash
|
|
170
|
+
pnpm install --frozen-lockfile
|
|
171
|
+
pnpm run build
|
|
172
|
+
pnpm run test
|
|
173
|
+
pnpm run format:check
|
|
174
|
+
```
|
|
175
|
+
|
|
176
|
+
运行服务:
|
|
177
|
+
|
|
178
|
+
```bash
|
|
179
|
+
DOC2X_API_KEY=sk-xxx pnpm start
|
|
180
|
+
```
|
|
99
181
|
|
|
100
|
-
##
|
|
182
|
+
## CI / 发布流水线
|
|
101
183
|
|
|
102
|
-
|
|
184
|
+
仓库使用 GitHub Actions:
|
|
185
|
+
|
|
186
|
+
- CI:`.github/workflows/ci.yml`
|
|
187
|
+
- 触发:`push` 到 `main`、`pull_request`、`workflow_dispatch`、每周一 UTC `03:17`
|
|
188
|
+
- 文档-only 变更(`**/*.md`、`LICENSE`)自动跳过
|
|
189
|
+
- 任务:
|
|
190
|
+
- `dependency-review`(仅 PR)
|
|
191
|
+
- `build`(Node.js `18/20` 矩阵)
|
|
192
|
+
- `package-smoke`(`npm pack --dry-run`)
|
|
193
|
+
- `security-audit`(仅手动/定时)
|
|
194
|
+
|
|
195
|
+
- Publish:`.github/workflows/publish.yml`
|
|
196
|
+
- `dev` 分支 push:发布 npm `dev` tag
|
|
197
|
+
- `v*.*.*` tag push:发布 npm `latest`
|
|
198
|
+
- 发布前校验 tag 版本与 `package.json` 版本一致
|
|
199
|
+
- 发布命令:`npm publish --provenance`
|
|
200
|
+
|
|
201
|
+
建议提交前本地对齐:
|
|
202
|
+
|
|
203
|
+
```bash
|
|
204
|
+
pnpm install --frozen-lockfile
|
|
205
|
+
pnpm run build
|
|
206
|
+
npm pack --dry-run
|
|
207
|
+
pnpm audit --prod --audit-level high
|
|
208
|
+
```
|
|
209
|
+
|
|
210
|
+
## 如何发布
|
|
211
|
+
|
|
212
|
+
- 开发预发布(`dev`):push 到 `dev` 分支后自动发布到 npm `dev` tag。版本会自动改成 `x.y.z-dev.<run>.<attempt>`。
|
|
213
|
+
- 正式发布(`latest`):push `v*.*.*` tag 后发布到 npm `latest`。tag 版本必须和 `package.json` 版本一致。
|
|
214
|
+
|
|
215
|
+
## 安装本仓库 Skill(可选)
|
|
216
|
+
|
|
217
|
+
用于给 Codex CLI / Claude Code 增加一个“教大模型如何使用 doc2x-mcp tools 的 Skill”。
|
|
103
218
|
|
|
104
219
|
不需要 clone 仓库的一键安装(推荐):
|
|
105
220
|
|
|
@@ -107,28 +222,35 @@ MIT License,详见 `LICENSE`。
|
|
|
107
222
|
curl -fsSL https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.sh | sh
|
|
108
223
|
```
|
|
109
224
|
|
|
110
|
-
重复执行同一条命令即可覆盖安装(默认会覆盖已存在目录)。
|
|
111
|
-
|
|
112
225
|
在本仓库源码目录安装:
|
|
113
226
|
|
|
114
227
|
```bash
|
|
115
|
-
|
|
228
|
+
pnpm run skill:install
|
|
116
229
|
```
|
|
117
230
|
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
脚本默认安装到:
|
|
231
|
+
默认安装目录:
|
|
121
232
|
|
|
122
|
-
- Codex CLI:`~/.codex/skills/public/doc2x-mcp
|
|
123
|
-
- Claude Code:`~/.claude/skills/doc2x-mcp
|
|
233
|
+
- Codex CLI:`~/.codex/skills/public/doc2x-mcp`(可用 `CODEX_HOME` 覆盖)
|
|
234
|
+
- Claude Code:`~/.claude/skills/doc2x-mcp`(可用 `CLAUDE_HOME` 覆盖)
|
|
124
235
|
|
|
125
236
|
说明:
|
|
126
237
|
|
|
127
|
-
- `--target auto`(默认)会同时安装到 Codex + Claude
|
|
128
|
-
- PowerShell 7
|
|
129
|
-
- Windows PowerShell 5.1
|
|
238
|
+
- `--target auto`(默认)会同时安装到 Codex + Claude。
|
|
239
|
+
- PowerShell 7+:`irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.ps1 | iex`
|
|
240
|
+
- Windows PowerShell 5.1:`irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill-winps.ps1 | iex`
|
|
130
241
|
|
|
131
|
-
|
|
242
|
+
## 安全与排错
|
|
132
243
|
|
|
133
|
-
-
|
|
134
|
-
-
|
|
244
|
+
- 不要在仓库提交真实 `DOC2X_API_KEY`。
|
|
245
|
+
- 白名单默认限制下载域名;如需放开,评估风险后再使用 `DOC2X_DOWNLOAD_URL_ALLOWLIST=*`。
|
|
246
|
+
- 配置异常时优先调用 `doc2x_debug_config` 定位环境变量来源与解析结果。
|
|
247
|
+
|
|
248
|
+
## 问题反馈
|
|
249
|
+
|
|
250
|
+
- 使用问题或缺陷反馈:GitHub Issues
|
|
251
|
+
[https://github.com/NoEdgeAI/doc2x-mcp/issues](https://github.com/NoEdgeAI/doc2x-mcp/issues)
|
|
252
|
+
- 建议在 issue 中附上最小复现输入、触发的 tool 名称、以及 `doc2x_debug_config` 结果(可脱敏)。
|
|
253
|
+
|
|
254
|
+
## License
|
|
255
|
+
|
|
256
|
+
MIT License,详见 `LICENSE`。
|
package/README_EN.md
CHANGED
|
@@ -1,34 +1,52 @@
|
|
|
1
1
|
# Doc2x MCP Server
|
|
2
2
|
|
|
3
|
+
[](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/ci.yml)
|
|
4
|
+
[](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/publish.yml)
|
|
5
|
+
[](https://www.npmjs.com/package/@noedgeai-org/doc2x-mcp)
|
|
6
|
+
|
|
3
7
|
English | [简体中文](./README.md)
|
|
4
8
|
|
|
5
|
-
|
|
9
|
+
A stdio-based MCP Server that wraps Doc2x v2 PDF/image capabilities into stable, composable semantic tools.
|
|
10
|
+
|
|
11
|
+
## Table of Contents
|
|
12
|
+
|
|
13
|
+
- [Project Scope](#project-scope)
|
|
14
|
+
- [Runtime Quick Facts](#runtime-quick-facts)
|
|
15
|
+
- [Quick Start](#quick-start)
|
|
16
|
+
- [Configuration Reference](#configuration-reference)
|
|
17
|
+
- [Tool API Overview](#tool-api-overview)
|
|
18
|
+
- [Common Workflows](#common-workflows)
|
|
19
|
+
- [Local Development](#local-development)
|
|
20
|
+
- [CI / Release Pipelines](#ci--release-pipelines)
|
|
21
|
+
- [Publishing Flow](#publishing-flow)
|
|
22
|
+
- [Install Repo Skill (Optional)](#install-repo-skill-optional)
|
|
23
|
+
- [Security and Troubleshooting](#security-and-troubleshooting)
|
|
24
|
+
- [Getting Help](#getting-help)
|
|
25
|
+
- [License](#license)
|
|
6
26
|
|
|
7
|
-
##
|
|
27
|
+
## Project Scope
|
|
8
28
|
|
|
9
|
-
-
|
|
29
|
+
- Exposes Doc2x capabilities to MCP clients (Codex CLI / Claude Code / custom agents).
|
|
30
|
+
- Uses a unified async contract (`submit/status/wait`) for predictable automation.
|
|
31
|
+
- Provides runtime safety boundaries (timeouts, polling controls, download allowlist).
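The `submit/status/wait` contract can be pictured as a plain polling loop. The sketch below is illustrative only: `waitForSuccess`, `checkStatus`, and the `'failed'` status string are assumptions made for this example, not the server's actual API. The default interval and maximum wait mirror the documented `DOC2X_POLL_INTERVAL_MS` and `DOC2X_MAX_WAIT_MS` defaults.

```javascript
// Hypothetical polling helper. `checkStatus` is a caller-supplied async function
// that returns an object such as { status: 'success', result: ... }.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitForSuccess(checkStatus, { pollIntervalMs = 2000, maxWaitMs = 600000 } = {}) {
  const startedAt = Date.now();
  for (;;) {
    const st = await checkStatus();
    // Stop as soon as the task reaches a terminal state.
    if (st.status === 'success') return st;
    if (st.status === 'failed') throw new Error('task failed'); // assumed status value
    // Give up once the overall budget is spent.
    if (Date.now() - startedAt >= maxWaitMs) throw new Error('timed out waiting for task');
    await sleep(pollIntervalMs);
  }
}
```

A caller would pass a function that invokes a status tool for a given `uid`. Keeping the loop separate from the transport makes the timeout behavior easy to test.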
|
|
10
32
|
|
|
11
|
-
##
|
|
33
|
+
## Runtime Quick Facts
|
|
12
34
|
|
|
13
|
-
|
|
35
|
+
- Local run: Node.js `>=18` is enough.
|
|
36
|
+
- CI checks: builds run on Node.js `18` and `20`.
|
|
37
|
+
- Release environment: publish jobs run on Node.js `24` in GitHub Actions.
|
|
38
|
+
- Package manager: pnpm with lockfile `pnpm-lock.yaml`.
|
|
14
39
|
|
|
15
|
-
|
|
16
|
-
- `DOC2X_BASE_URL`: optional, default `https://v2.doc2x.noedgeai.com`
|
|
17
|
-
- `DOC2X_HTTP_TIMEOUT_MS`: optional, default `60000`
|
|
18
|
-
- `DOC2X_POLL_INTERVAL_MS`: optional, default `2000`
|
|
19
|
-
- `DOC2X_MAX_WAIT_MS`: optional, default `600000`
|
|
20
|
-
- `DOC2X_PARSE_PDF_MAX_OUTPUT_CHARS`: optional, default `5000`; limit the returned text size of `doc2x_parse_pdf_wait_text` (set `0` for unlimited)
|
|
21
|
-
- `DOC2X_PARSE_PDF_MAX_OUTPUT_PAGES`: optional, default `10`; limit merged pages of `doc2x_parse_pdf_wait_text` (set `0` for unlimited)
|
|
22
|
-
- `DOC2X_DOWNLOAD_URL_ALLOWLIST`: optional, default `".amazonaws.com.cn,.aliyuncs.com,.noedgeai.com"`; set to `*` to allow any host (not recommended)
|
|
40
|
+
## Quick Start
|
|
23
41
|
|
|
24
|
-
|
|
42
|
+
### Option A: via npx (recommended)
|
|
25
43
|
|
|
26
|
-
|
|
44
|
+
Add this to your MCP client config:
|
|
27
45
|
|
|
28
46
|
```json
|
|
29
47
|
{
|
|
30
48
|
"command": "npx",
|
|
31
|
-
"args": ["-y", "@noedgeai-org/doc2x-mcp"],
|
|
49
|
+
"args": ["-y", "@noedgeai-org/doc2x-mcp@latest"],
|
|
32
50
|
"env": {
|
|
33
51
|
"DOC2X_API_KEY": "sk-xxx",
|
|
34
52
|
"DOC2X_BASE_URL": "https://v2.doc2x.noedgeai.com"
|
|
@@ -40,11 +58,13 @@ Configure via environment variables:
|
|
|
40
58
|
|
|
41
59
|
```bash
|
|
42
60
|
cd doc2x-mcp
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
DOC2X_API_KEY=sk-xxx
|
|
61
|
+
pnpm install --frozen-lockfile
|
|
62
|
+
pnpm run build
|
|
63
|
+
DOC2X_API_KEY=sk-xxx pnpm start
|
|
46
64
|
```
|
|
47
65
|
|
|
66
|
+
Point your MCP client at the local build output:
|
|
67
|
+
|
|
48
68
|
```json
|
|
49
69
|
{
|
|
50
70
|
"command": "node",
|
|
@@ -56,27 +76,34 @@ DOC2X_API_KEY=sk-xxx npm start
|
|
|
56
76
|
}
|
|
57
77
|
```
|
|
58
78
|
|
|
59
|
-
##
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
- `
|
|
64
|
-
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
|
|
68
|
-
|
|
69
|
-
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
|
|
73
|
-
|
|
79
|
+
## Configuration Reference
|
|
80
|
+
|
|
81
|
+
| Environment Variable | Required | Default | Description |
|
|
82
|
+
| --- | --- | --- | --- |
|
|
83
|
+
| `DOC2X_API_KEY` | Yes | - | Doc2x API key (`sk-xxx`) |
|
|
84
|
+
| `DOC2X_BASE_URL` | No | `https://v2.doc2x.noedgeai.com` | Doc2x API base URL |
|
|
85
|
+
| `DOC2X_HTTP_TIMEOUT_MS` | No | `60000` | Per-request HTTP timeout in ms |
|
|
86
|
+
| `DOC2X_POLL_INTERVAL_MS` | No | `2000` | Polling interval in ms |
|
|
87
|
+
| `DOC2X_MAX_WAIT_MS` | No | `600000` | Max wait duration for wait tools in ms |
|
|
88
|
+
| `DOC2X_PARSE_PDF_MAX_OUTPUT_CHARS` | No | `5000` | Max returned chars for `doc2x_parse_pdf_wait_text`; `0` = unlimited |
|
|
89
|
+
| `DOC2X_PARSE_PDF_MAX_OUTPUT_PAGES` | No | `10` | Max merged pages for `doc2x_parse_pdf_wait_text`; `0` = unlimited |
|
|
90
|
+
| `DOC2X_DOWNLOAD_URL_ALLOWLIST` | No | `.amazonaws.com.cn,.aliyuncs.com,.noedgeai.com` | URL host allowlist for downloads; `*` allows any host (not recommended) |
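To make the defaults in the table above concrete, here is a minimal sketch of resolving these variables from an environment-like object. `resolveConfig` and its fallback behavior for invalid numbers are assumptions for illustration; the package's own config loader may behave differently.

```javascript
// Defaults copied from the configuration table; everything else is hypothetical.
const DEFAULTS = {
  DOC2X_BASE_URL: 'https://v2.doc2x.noedgeai.com',
  DOC2X_HTTP_TIMEOUT_MS: 60000,
  DOC2X_POLL_INTERVAL_MS: 2000,
  DOC2X_MAX_WAIT_MS: 600000,
  DOC2X_PARSE_PDF_MAX_OUTPUT_CHARS: 5000,
  DOC2X_PARSE_PDF_MAX_OUTPUT_PAGES: 10,
  DOC2X_DOWNLOAD_URL_ALLOWLIST: '.amazonaws.com.cn,.aliyuncs.com,.noedgeai.com',
};

function resolveConfig(env) {
  // The API key is the only required variable.
  const apiKey = (env.DOC2X_API_KEY || '').trim();
  if (!apiKey) throw new Error('DOC2X_API_KEY is required');

  const config = { DOC2X_API_KEY: apiKey };
  for (const [name, fallback] of Object.entries(DEFAULTS)) {
    const raw = env[name];
    if (raw === undefined || raw === '') {
      config[name] = fallback;
    } else if (typeof fallback === 'number') {
      const n = Number(raw);
      // Assumption: invalid or negative numbers fall back to the default.
      config[name] = Number.isFinite(n) && n >= 0 ? n : fallback;
    } else {
      config[name] = raw;
    }
  }
  return config;
}
```

For example, `resolveConfig({ DOC2X_API_KEY: 'sk-xxx', DOC2X_PARSE_PDF_MAX_OUTPUT_CHARS: '0' })` keeps every other default and sets the character limit to `0`, which the table describes as unlimited.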
|
|
91
|
+
|
|
92
|
+
## Tool API Overview
|
|
93
|
+
|
|
94
|
+
| Stage | Tools | Purpose |
|
|
95
|
+
| --- | --- | --- |
|
|
96
|
+
| PDF parse | `doc2x_parse_pdf_submit` / `doc2x_parse_pdf_status` / `doc2x_parse_pdf_wait_text` / `doc2x_materialize_pdf_layout_json` | Submit parse tasks, check status, wait and fetch text, or materialize v3 layout JSON locally |
|
|
97
|
+
| Export | `doc2x_convert_export_submit` / `doc2x_convert_export_result` / `doc2x_convert_export_wait` | Start export, read export result, wait for completion |
|
|
98
|
+
| Download | `doc2x_download_url_to_file` / `doc2x_materialize_convert_zip` | Download export URL to local path, materialize convert zip |
|
|
99
|
+
| Image layout parse | `doc2x_parse_image_layout_sync` / `doc2x_parse_image_layout_submit` / `doc2x_parse_image_layout_status` / `doc2x_parse_image_layout_wait_text` | Sync/async OCR and layout parse for images |
|
|
100
|
+
| Diagnostics | `doc2x_debug_config` | Show resolved config and API key source |
|
|
74
101
|
|
|
75
102
|
### PDF Parse Model (`doc2x_parse_pdf_submit` / `doc2x_parse_pdf_wait_text`)
|
|
76
103
|
|
|
77
104
|
- Optional parameter: `model`
|
|
78
|
-
- Supported values:
|
|
79
|
-
-
|
|
105
|
+
- Supported values: `v2` (default) / `v3-2026` (latest model)
|
|
106
|
+
- Default (when omitted): `v2`
|
|
80
107
|
|
|
81
108
|
```json
|
|
82
109
|
{
|
|
@@ -84,22 +111,110 @@ DOC2X_API_KEY=sk-xxx npm start
|
|
|
84
111
|
}
|
|
85
112
|
```
|
|
86
113
|
|
|
114
|
+
### PDF Layout JSON Materialization (`doc2x_materialize_pdf_layout_json`)
|
|
115
|
+
|
|
116
|
+
- Required: `output_path`
|
|
117
|
+
- Provide either `uid` or `pdf_path`
|
|
118
|
+
- `v2` does not support `layout`; use `v3-2026` when `pages[].layout` is required
|
|
119
|
+
- When `pdf_path` is used and `model` is omitted, this tool defaults to `v3-2026`
|
|
120
|
+
- On success it writes the raw parse `result` JSON locally
|
|
121
|
+
|
|
122
|
+
`layout` contains page block structure and coordinates, which is useful for figure/table crops, region highlighting, structured extraction, and layout analysis. If the goal is readable full text, prefer Markdown / DOCX export.
|
|
123
|
+
|
|
124
|
+
```json
|
|
125
|
+
{
|
|
126
|
+
"pdf_path": "/absolute/path/to/input.pdf",
|
|
127
|
+
"output_path": "/absolute/path/to/input_v3.layout.json"
|
|
128
|
+
}
|
|
129
|
+
```
|
|
130
|
+
|
|
87
131
|
### Export Formula Parameters (`doc2x_convert_export_submit` / `doc2x_convert_export_wait`)
|
|
88
132
|
|
|
89
|
-
- Required
|
|
90
|
-
- Optional
|
|
133
|
+
- Required: `formula_mode` (`normal` / `dollar`)
|
|
134
|
+
- Optional: `formula_level` (effective only when source parse used `model=v3-2026`)
|
|
91
135
|
- Value mapping:
|
|
92
|
-
- `0`: keep formulas
|
|
93
|
-
- `1`: degrade inline formulas
|
|
94
|
-
- `2`: degrade all formulas
|
|
136
|
+
- `0`: keep formulas
|
|
137
|
+
- `1`: degrade inline formulas (`\\(...\\)`, `$...$`)
|
|
138
|
+
- `2`: degrade all formulas (`\\(...\\)`, `$...$`, `\\[...\\]`, `$$...$$`)
|
|
95
139
|
|
|
96
|
-
##
|
|
140
|
+
## Common Workflows
|
|
97
141
|
|
|
98
|
-
|
|
142
|
+
### Workflow 1: PDF -> Markdown local file
|
|
143
|
+
|
|
144
|
+
1. Submit parse via `doc2x_parse_pdf_submit`.
|
|
145
|
+
2. Wait for the export via `doc2x_convert_export_wait` (`to=md` and `formula_mode` are required).
|
|
146
|
+
3. Read result URL via `doc2x_convert_export_result`.
|
|
147
|
+
4. Download to local path via `doc2x_download_url_to_file`.
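The four steps above could be chained roughly as follows. This is a sketch under stated assumptions: `callTool(name, args)` stands in for whatever tool-call method your MCP client exposes, and the `pdf_path`, `url`, and `output_path` argument names as well as the `{ url }` result shape are guesses made for illustration.

```javascript
// Hypothetical orchestration of Workflow 1. Only the tool names, `to`,
// `formula_mode`, and the `{ uid }` submit result come from the README.
async function pdfToMarkdown(callTool, pdfPath, outputPath) {
  // 1. Submit the parse task and keep its uid.
  const { uid } = await callTool('doc2x_parse_pdf_submit', { pdf_path: pdfPath });
  // 2. Wait for the Markdown export to finish.
  await callTool('doc2x_convert_export_wait', { uid, to: 'md', formula_mode: 'normal' });
  // 3. Read the export result to get a download URL (assumed field name).
  const { url } = await callTool('doc2x_convert_export_result', { uid });
  // 4. Download the file to the target path.
  return callTool('doc2x_download_url_to_file', { url, output_path: outputPath });
}
```

Injecting `callTool` keeps the sequence independent of any particular client library, so the flow can be exercised with a stub before wiring it to a real server.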
|
|
148
|
+
|
|
149
|
+
### Workflow 2: Fast image OCR/layout result
|
|
150
|
+
|
|
151
|
+
1. Use `doc2x_parse_image_layout_sync` for direct parse.
|
|
152
|
+
2. For more robust polling, switch to the submit/status/wait flow.
|
|
153
|
+
|
|
154
|
+
### Workflow 3: PDF -> local v3 layout JSON
|
|
155
|
+
|
|
156
|
+
1. Call `doc2x_materialize_pdf_layout_json` with `pdf_path` and `output_path`.
|
|
157
|
+
2. The tool waits for parse success and writes the raw `result` JSON locally.
|
|
158
|
+
3. The saved JSON can be consumed directly by downstream figure/table crop scripts.
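As a rough illustration of step 3, a downstream script might first check that every page carries a `layout` object before cropping anything. `summarizeLayoutResult` is a hypothetical helper; only the `pages[].layout` shape is taken from the description above, and the internal structure of `layout` is deliberately left untouched.

```javascript
// Hypothetical consumer of the saved layout JSON (already parsed into an object).
function summarizeLayoutResult(result) {
  if (result === null || typeof result !== 'object' || !Array.isArray(result.pages)) {
    throw new Error('expected an object with a pages array');
  }
  const isObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);
  // Collect the indexes of pages that have no usable layout object.
  const pagesMissingLayout = [];
  result.pages.forEach((page, index) => {
    if (!isObject(page) || !isObject(page.layout)) pagesMissingLayout.push(index);
  });
  return { pageCount: result.pages.length, pagesMissingLayout };
}
```

In practice the input would come from `JSON.parse` applied to the file written at `output_path`.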
|
|
159
|
+
|
|
160
|
+
## Local Development
|
|
161
|
+
|
|
162
|
+
### Requirements
|
|
163
|
+
|
|
164
|
+
- Node.js `>=18`
|
|
165
|
+
- pnpm (aligned with lockfile)
|
|
166
|
+
|
|
167
|
+
### Common commands
|
|
168
|
+
|
|
169
|
+
```bash
|
|
170
|
+
pnpm install --frozen-lockfile
|
|
171
|
+
pnpm run build
|
|
172
|
+
pnpm run test
|
|
173
|
+
pnpm run format:check
|
|
174
|
+
```
|
|
175
|
+
|
|
176
|
+
Run server:
|
|
177
|
+
|
|
178
|
+
```bash
|
|
179
|
+
DOC2X_API_KEY=sk-xxx pnpm start
|
|
180
|
+
```
|
|
99
181
|
|
|
100
|
-
##
|
|
182
|
+
## CI / Release Pipelines
|
|
101
183
|
|
|
102
|
-
|
|
184
|
+
This repository uses GitHub Actions:
|
|
185
|
+
|
|
186
|
+
- CI: `.github/workflows/ci.yml`
|
|
187
|
+
- Triggers: `push` to `main`, `pull_request`, `workflow_dispatch`, weekly on Monday at UTC `03:17`
|
|
188
|
+
- Doc-only changes (`**/*.md`, `LICENSE`) are skipped
|
|
189
|
+
- Jobs:
|
|
190
|
+
- `dependency-review` (PR only)
|
|
191
|
+
- `build` (Node.js `18/20` matrix)
|
|
192
|
+
- `package-smoke` (`npm pack --dry-run`)
|
|
193
|
+
- `security-audit` (manual/scheduled only)
|
|
194
|
+
|
|
195
|
+
- Publish: `.github/workflows/publish.yml`
|
|
196
|
+
- Push to `dev`: publish npm package with `dev` tag
|
|
197
|
+
- Push tag `v*.*.*`: publish npm package as `latest`
|
|
198
|
+
- Verifies tag version matches `package.json` version before publish
|
|
199
|
+
- Publish command: `npm publish --provenance`
|
|
200
|
+
|
|
201
|
+
Recommended local parity checks before pushing:
|
|
202
|
+
|
|
203
|
+
```bash
|
|
204
|
+
pnpm install --frozen-lockfile
|
|
205
|
+
pnpm run build
|
|
206
|
+
npm pack --dry-run
|
|
207
|
+
pnpm audit --prod --audit-level high
|
|
208
|
+
```
|
|
209
|
+
|
|
210
|
+
## Publishing Flow
|
|
211
|
+
|
|
212
|
+
- Dev pre-release (`dev`): pushing to the `dev` branch publishes to the npm `dev` tag. The version is automatically rewritten to `x.y.z-dev.<run>.<attempt>`.
|
|
213
|
+
- Production release (`latest`): pushing a `v*.*.*` tag publishes to the npm `latest` tag. The tag version must match the version in `package.json`.
|
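The `x.y.z-dev.<run>.<attempt>` scheme is easy to take apart in tooling. The helper below is a hypothetical sketch (the name `parseDevVersion` and the returned field names are invented for this example), checked against the `0.1.4-dev.8.1` version shown in this diff's title.

```javascript
// Hypothetical parser for dev pre-release versions of the form x.y.z-dev.<run>.<attempt>.
function parseDevVersion(version) {
  const m = /^(\d+)\.(\d+)\.(\d+)-dev\.(\d+)\.(\d+)$/.exec(version);
  if (!m) return null; // not a dev pre-release
  const [major, minor, patch, run, attempt] = m.slice(1).map(Number);
  return { base: `${major}.${minor}.${patch}`, run, attempt };
}
```

A plain release such as `0.1.4` does not match the pattern and yields `null`.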
|
214
|
+
|
|
215
|
+
## Install Repo Skill (Optional)
|
|
216
|
+
|
|
217
|
+
Installs a reusable skill for Codex CLI / Claude Code to guide tool usage with the standard `submit/status/wait/export/download` workflow.
|
|
103
218
|
|
|
104
219
|
One-command install without cloning (recommended):
|
|
105
220
|
|
|
@@ -107,28 +222,35 @@ One-command install without cloning (recommended):
|
|
|
107
222
|
curl -fsSL https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.sh | sh
|
|
108
223
|
```
|
|
109
224
|
|
|
110
|
-
Re-run the same command to overwrite (default behavior overwrites an existing destination directory).
|
|
111
|
-
|
|
112
225
|
Install from this repo source directory:
|
|
113
226
|
|
|
114
227
|
```bash
|
|
115
|
-
|
|
228
|
+
pnpm run skill:install
|
|
116
229
|
```
|
|
117
230
|
|
|
118
|
-
Default
|
|
119
|
-
|
|
120
|
-
The script installs to:
|
|
231
|
+
Default destinations:
|
|
121
232
|
|
|
122
|
-
- Codex CLI: `~/.codex/skills/public/doc2x-mcp` (override
|
|
123
|
-
- Claude Code: `~/.claude/skills/doc2x-mcp` (override
|
|
233
|
+
- Codex CLI: `~/.codex/skills/public/doc2x-mcp` (override with `CODEX_HOME`)
|
|
234
|
+
- Claude Code: `~/.claude/skills/doc2x-mcp` (override with `CLAUDE_HOME`)
|
|
124
235
|
|
|
125
236
|
Notes:
|
|
126
237
|
|
|
127
|
-
- `--target auto` (default) installs to both Codex + Claude
|
|
128
|
-
- PowerShell 7
|
|
129
|
-
- Windows PowerShell 5.1
|
|
238
|
+
- `--target auto` (default) installs to both Codex + Claude.
|
|
239
|
+
- PowerShell 7+: `irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.ps1 | iex`
|
|
240
|
+
- Windows PowerShell 5.1: `irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill-winps.ps1 | iex`
|
|
130
241
|
|
|
131
|
-
|
|
242
|
+
## Security and Troubleshooting
|
|
132
243
|
|
|
133
|
-
-
|
|
134
|
-
-
|
|
244
|
+
- Never commit a real `DOC2X_API_KEY` to the repository.
|
|
245
|
+
- The download allowlist is restrictive by default; evaluate risk before using `DOC2X_DOWNLOAD_URL_ALLOWLIST=*`.
|
|
246
|
+
- Use `doc2x_debug_config` first when diagnosing config/environment issues.
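To illustrate what a download allowlist of this shape implies, here is a minimal host check. It assumes that entries beginning with `.` match by host suffix and that `*` allows any host; `isDownloadHostAllowed` is a hypothetical name, and the server's real matching rules may differ.

```javascript
// Hypothetical allowlist check for a download URL.
function isDownloadHostAllowed(url, allowlist) {
  const host = new URL(url).hostname.toLowerCase();
  const entries = allowlist
    .split(',')
    .map((entry) => entry.trim().toLowerCase())
    .filter(Boolean);
  return entries.some((entry) => {
    if (entry === '*') return true; // wildcard: any host (not recommended)
    if (entry.startsWith('.')) return host.endsWith(entry); // assumed suffix match
    return host === entry; // assumed exact match for bare hosts
  });
}
```

Under these assumptions, a host such as `bucket.oss-cn-beijing.aliyuncs.com` passes the default list, while a look-alike like `evilaliyuncs.com` does not because the leading dot is part of the suffix.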
|
|
247
|
+
|
|
248
|
+
## Getting Help
|
|
249
|
+
|
|
250
|
+
- Usage questions or bug reports: GitHub Issues
|
|
251
|
+
[https://github.com/NoEdgeAI/doc2x-mcp/issues](https://github.com/NoEdgeAI/doc2x-mcp/issues)
|
|
252
|
+
- Include minimal reproduction input, affected tool name, and sanitized `doc2x_debug_config` output when possible.
|
|
253
|
+
|
|
254
|
+
## License
|
|
255
|
+
|
|
256
|
+
MIT License. See `LICENSE`.
|
package/dist/doc2x/materialize.d.ts
CHANGED
@@ -1,3 +1,8 @@
+export declare function validatePdfLayoutResult(result: unknown, uid?: string): {
+    result: Record<string, unknown>;
+    pageCount: number;
+    hasLayout: true;
+};
 export declare function materializeConvertZip(args: {
     convert_zip_base64: string;
     output_dir: string;
@@ -6,3 +11,13 @@ export declare function materializeConvertZip(args: {
     zip_path: string;
     extracted: boolean;
 }>;
+export declare function materializePdfLayoutJson(args: {
+    result: unknown;
+    output_path: string;
+    uid?: string;
+}): Promise<{
+    uid: string;
+    output_path: string;
+    page_count: number;
+    has_layout: true;
+}>;
|
package/dist/doc2x/materialize.js
CHANGED
@@ -1,6 +1,8 @@
 import fsp from 'node:fs/promises';
 import path from 'node:path';
 import { spawn } from 'node:child_process';
+import { ToolError } from '#errors';
+import { TOOL_ERROR_CODE_INVALID_JSON } from '#errorCodes';
 function spawnUnzip(zipPath, outputDir) {
     return new Promise((resolve) => {
         const child = spawn('unzip', ['-o', zipPath, '-d', outputDir], { stdio: 'ignore' });
@@ -8,6 +10,44 @@ function spawnUnzip(zipPath, outputDir) {
         child.on('exit', (code) => resolve(code === 0));
     });
 }
+function isRecord(value) {
+    return value !== null && typeof value === 'object' && !Array.isArray(value);
+}
+export function validatePdfLayoutResult(result, uid) {
+    if (!isRecord(result))
+        throw new ToolError({
+            code: TOOL_ERROR_CODE_INVALID_JSON,
+            message: 'parse result must be a JSON object',
+            retryable: false,
+            uid,
+        });
+    const pages = result.pages;
+    if (!Array.isArray(pages) || pages.length === 0)
+        throw new ToolError({
+            code: TOOL_ERROR_CODE_INVALID_JSON,
+            message: 'parse result must contain a non-empty pages array',
+            retryable: false,
+            uid,
+        });
+    for (let i = 0; i < pages.length; i++) {
+        const page = pages[i];
+        if (!isRecord(page))
+            throw new ToolError({
+                code: TOOL_ERROR_CODE_INVALID_JSON,
+                message: `pages[${i}] must be an object`,
+                retryable: false,
+                uid,
+            });
+        if (!isRecord(page.layout))
+            throw new ToolError({
+                code: TOOL_ERROR_CODE_INVALID_JSON,
+                message: `pages[${i}].layout must be an object`,
+                retryable: false,
+                uid,
+            });
+    }
+    return { result, pageCount: pages.length, hasLayout: true };
+}
 export async function materializeConvertZip(args) {
     const outDir = path.resolve(args.output_dir);
     await fsp.mkdir(outDir, { recursive: true });
@@ -17,3 +57,15 @@ export async function materializeConvertZip(args) {
     const extracted = await spawnUnzip(zipPath, outDir);
     return { output_dir: outDir, zip_path: zipPath, extracted };
 }
+export async function materializePdfLayoutJson(args) {
+    const validated = validatePdfLayoutResult(args.result, args.uid);
+    const outputPath = path.resolve(args.output_path);
+    await fsp.mkdir(path.dirname(outputPath), { recursive: true });
+    await fsp.writeFile(outputPath, `${JSON.stringify(validated.result, null, 2)}\n`, 'utf8');
+    return {
+        uid: args.uid ?? '',
+        output_path: outputPath,
+        page_count: validated.pageCount,
+        has_layout: validated.hasLayout,
+    };
+}
package/dist/doc2x/pdf.d.ts
CHANGED
@@ -1,4 +1,6 @@
-export declare const
+export declare const PARSE_PDF_MODEL_V2: "v2";
+export declare const PARSE_PDF_MODEL_V3: "v3-2026";
+export declare const PARSE_PDF_MODELS: readonly ["v2", "v3-2026"];
 export type ParsePdfModel = (typeof PARSE_PDF_MODELS)[number];
 export declare function parsePdfSubmit(pdfPath: string, opts?: {
     model?: ParsePdfModel;
@@ -12,6 +14,15 @@ export declare function parsePdfStatus(uid: string): Promise<{
     detail: string;
     result: {} | null;
 }>;
+export declare function parsePdfWaitResultByUid(args: {
+    uid: string;
+    poll_interval_ms?: number;
+    max_wait_ms?: number;
+}): Promise<{
+    uid: string;
+    status: "success";
+    result: {} | null;
+}>;
 export declare function parsePdfWaitTextByUid(args: {
     uid: string;
     poll_interval_ms?: number;
package/dist/doc2x/pdf.js
CHANGED
@@ -9,7 +9,9 @@ import { doc2xRequestJson, putToSignedUrl } from '#doc2x/client';
 import { DOC2X_TASK_STATUS_FAILED, DOC2X_TASK_STATUS_SUCCESS } from '#doc2x/constants';
 import { HTTP_METHOD_GET, HTTP_METHOD_POST } from '#doc2x/http';
 import { v2 } from '#doc2x/paths';
-export const
+export const PARSE_PDF_MODEL_V2 = 'v2';
+export const PARSE_PDF_MODEL_V3 = 'v3-2026';
+export const PARSE_PDF_MODELS = [PARSE_PDF_MODEL_V2, PARSE_PDF_MODEL_V3];
 function mergePagesToTextWithLimit(result, joinWith, limits) {
     const parsed = result ?? null;
     const sourcePages = _.isArray(parsed?.pages) ? parsed.pages : [];
@@ -112,10 +114,9 @@ export async function parsePdfStatus(uid) {
         result: data.result ?? null,
     };
 }
-
+async function waitForParsePdfSuccessByUid(args) {
     const pollInterval = args.poll_interval_ms ?? CONFIG.pollIntervalMs;
     const maxWait = args.max_wait_ms ?? CONFIG.maxWaitMs;
-    const joinWith = args.join_with ?? '\n\n---\n\n';
     const uid = String(args.uid || '').trim();
     if (!uid)
         throw new ToolError({
@@ -145,13 +146,8 @@ export async function parsePdfWaitTextByUid(args) {
             }
             throw e;
         }
-        if (st.status === DOC2X_TASK_STATUS_SUCCESS)
-
-            maxOutputChars: args.max_output_chars,
-            maxOutputPages: args.max_output_pages,
-        });
-        return { uid, status: DOC2X_TASK_STATUS_SUCCESS, ...merged };
-        }
+        if (st.status === DOC2X_TASK_STATUS_SUCCESS)
+            return st;
         if (st.status === DOC2X_TASK_STATUS_FAILED)
             throw new ToolError({
                 code: TOOL_ERROR_CODE_PARSE_FAILED,
@@ -162,3 +158,16 @@ export async function parsePdfWaitTextByUid(args) {
         await sleep(pollInterval);
     }
 }
+export async function parsePdfWaitResultByUid(args) {
+    const st = await waitForParsePdfSuccessByUid(args);
+    return { uid: st.uid, status: DOC2X_TASK_STATUS_SUCCESS, result: st.result };
+}
+export async function parsePdfWaitTextByUid(args) {
+    const joinWith = args.join_with ?? '\n\n---\n\n';
+    const st = await waitForParsePdfSuccessByUid(args);
+    const merged = mergePagesToTextWithLimit(st.result, joinWith, {
+        maxOutputChars: args.max_output_chars,
+        maxOutputPages: args.max_output_pages,
+    });
+    return { uid: st.uid, status: DOC2X_TASK_STATUS_SUCCESS, ...merged };
+}
|
|
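The refactor above routes both exported waiters through one shared success waiter; they differ only in how the final status is projected. A minimal sketch of those two projections follows. It assumes each page carries its text in a hypothetical `md` field and uses a simplified page limit in place of `mergePagesToTextWithLimit`, whose real truncation behaviour is not shown in this diff.

```javascript
// Sketch only: the `md` page field and the simplified limit handling are
// assumptions for illustration, not the package's actual implementation.
const SUCCESS = 'success';

// Projection for a "wait result" style call: pass the raw result through.
function toResultPayload(st) {
  return { uid: st.uid, status: SUCCESS, result: st.result };
}

// Projection for a "wait text" style call: merge page text into one string.
function toTextPayload(st, joinWith = '\n\n---\n\n', maxPages = 0) {
  const pages = Array.isArray(st.result?.pages) ? st.result.pages : [];
  const kept = maxPages > 0 ? pages.slice(0, maxPages) : pages; // 0 = unlimited
  const text = kept.map((p) => String(p.md ?? '')).join(joinWith);
  return { uid: st.uid, status: SUCCESS, text, returned_pages: kept.length };
}

const st = { uid: 'demo-uid', status: SUCCESS, result: { pages: [{ md: 'A' }, { md: 'B' }] } };
console.log(toResultPayload(st).result.pages.length);
console.log(toTextPayload(st, ' | ').text);
```

Keeping the wait loop separate from the projection is what lets the layout tool reuse the same polling and timeout handling while returning the untouched `result`.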
@@ -1,14 +1,15 @@
|
|
|
1
1
|
import { CONFIG } from '#config';
|
|
2
2
|
import { isRetryableError } from '#errors';
|
|
3
|
-
import { parsePdfStatus, parsePdfSubmit, parsePdfWaitTextByUid, } from '#doc2x/pdf';
|
|
3
|
+
import { PARSE_PDF_MODEL_V2, PARSE_PDF_MODEL_V3, parsePdfStatus, parsePdfSubmit, parsePdfWaitResultByUid, parsePdfWaitTextByUid, } from '#doc2x/pdf';
|
|
4
|
+
import { materializePdfLayoutJson } from '#doc2x/materialize';
|
|
4
5
|
import { asJsonResult, asTextResult } from '#mcp/results';
|
|
5
|
-
import { deleteUidCache, fileSig, getSubmittedUidFromCache, joinWithSchema, makePdfUidCacheKey, missingEitherFieldError, nonNegativeIntSchema, parsePdfModelSchema, parsePdfUidSchema, pdfPathForWaitSchema, pdfPathSchema, positiveIntMsSchema, setFailedUidCache, setSubmittedUidCache, withToolErrorHandling, } from '#mcp/registerToolsShared';
|
|
6
|
+
import { deleteUidCache, fileSig, getSubmittedUidFromCache, jsonOutputPathSchema, joinWithSchema, makePdfUidCacheKey, missingEitherFieldError, nonNegativeIntSchema, parsePdfModelSchema, parsePdfUidSchema, pdfPathForWaitSchema, pdfPathSchema, positiveIntMsSchema, setFailedUidCache, setSubmittedUidCache, withToolErrorHandling, } from '#mcp/registerToolsShared';
|
|
6
7
|
export function registerPdfTools(server, ctx) {
|
|
7
8
|
server.registerTool('doc2x_parse_pdf_submit', {
|
|
8
9
|
description: 'Create a Doc2x PDF parse task for a local file and return {uid}. Prefer calling doc2x_parse_pdf_status to monitor progress/result; only call doc2x_parse_pdf_wait_text if the user explicitly asks to wait/return merged text.',
|
|
9
10
|
inputSchema: {
|
|
10
11
|
pdf_path: pdfPathSchema,
|
|
11
|
-
model: parsePdfModelSchema.describe(
|
|
12
|
+
model: parsePdfModelSchema.describe(`Optional parse model. Supported values: '${PARSE_PDF_MODEL_V2}' and '${PARSE_PDF_MODEL_V3}'. Omit this field to use default ${PARSE_PDF_MODEL_V2}.`),
|
|
12
13
|
},
|
|
13
14
|
}, withToolErrorHandling(async ({ pdf_path, model }) => {
|
|
14
15
|
const sig = await fileSig(pdf_path);
|
|
@@ -45,7 +46,7 @@ export function registerPdfTools(server, ctx) {
|
|
|
45
46
|
.optional()
|
|
46
47
|
.describe('Max pages to merge into returned text (0 = unlimited). Default can be set via env DOC2X_PARSE_PDF_MAX_OUTPUT_PAGES.'),
|
|
47
48
|
model: parsePdfModelSchema
|
|
48
|
-
.describe(
|
|
49
|
+
.describe(`Optional parse model used only when submitting from pdf_path. Supported values: '${PARSE_PDF_MODEL_V2}' and '${PARSE_PDF_MODEL_V3}'. Omit this field to use default ${PARSE_PDF_MODEL_V2}.`),
|
|
49
50
|
},
|
|
50
51
|
}, withToolErrorHandling(async (args) => {
|
|
51
52
|
const maxOutputChars = args.max_output_chars ?? CONFIG.parsePdfMaxOutputChars;
|
|
@@ -120,4 +121,65 @@ export function registerPdfTools(server, ctx) {
|
|
|
120
121
|
}
|
|
121
122
|
}
|
|
122
123
|
}));
|
|
124
|
+
server.registerTool('doc2x_materialize_pdf_layout_json', {
|
|
125
|
+
description: `Wait for a PDF parse task and write the raw Doc2x result JSON (with page layout) to output_path. Prefer passing uid. If only pdf_path is provided, this tool reuses a cached uid or submits a new parse with model='${PARSE_PDF_MODEL_V3}' by default.`,
|
|
126
|
+
inputSchema: {
|
|
127
|
+
uid: parsePdfUidSchema.optional(),
|
|
128
|
+
pdf_path: pdfPathForWaitSchema.optional(),
|
|
129
|
+
output_path: jsonOutputPathSchema,
|
|
130
|
+
poll_interval_ms: positiveIntMsSchema.optional(),
|
|
131
|
+
max_wait_ms: positiveIntMsSchema.optional(),
|
|
132
|
+
model: parsePdfModelSchema
|
|
133
|
+
.describe(`Optional parse model used only when submitting from pdf_path. Supported values: '${PARSE_PDF_MODEL_V2}' and '${PARSE_PDF_MODEL_V3}'. Defaults to '${PARSE_PDF_MODEL_V3}' for this tool because ${PARSE_PDF_MODEL_V2} does not return layout.`),
|
|
134
|
+
},
|
|
135
|
+
}, withToolErrorHandling(async (args) => {
|
|
136
|
+
const materializeByUid = async (uid) => {
|
|
137
|
+
const out = await parsePdfWaitResultByUid({
|
|
138
|
+
uid,
|
|
139
|
+
poll_interval_ms: args.poll_interval_ms,
|
|
140
|
+
max_wait_ms: args.max_wait_ms,
|
|
141
|
+
});
|
|
142
|
+
return await materializePdfLayoutJson({
|
|
143
|
+
uid: out.uid,
|
|
144
|
+
result: out.result,
|
|
145
|
+
output_path: args.output_path,
|
|
146
|
+
});
|
|
147
|
+
};
|
|
148
|
+
const uid = String(args.uid || '').trim();
|
|
149
|
+
if (uid)
|
|
150
|
+
return asJsonResult(await materializeByUid(uid));
|
|
151
|
+
const pdfPath = String(args.pdf_path || '').trim();
|
|
152
|
+
if (!pdfPath)
|
|
153
|
+
throw missingEitherFieldError('uid', 'pdf_path');
|
|
154
|
+
const sig = await fileSig(pdfPath);
|
|
155
|
+
const model = args.model ?? PARSE_PDF_MODEL_V3;
|
|
156
|
+
const cacheKey = makePdfUidCacheKey(sig.absPath, model);
|
|
157
|
+
const resolvedUid = getSubmittedUidFromCache(ctx, { kind: 'pdf', key: cacheKey, sig });
|
|
158
|
+
const finalUid = resolvedUid || (await parsePdfSubmit(pdfPath, { model })).uid;
|
|
159
|
+
setSubmittedUidCache(ctx, { kind: 'pdf', key: cacheKey, sig, uid: finalUid });
|
|
160
|
+
const markFailed = (failedUid) => setFailedUidCache(ctx, { kind: 'pdf', key: cacheKey, sig, uid: failedUid });
|
|
161
|
+
try {
|
|
162
|
+
return asJsonResult(await materializeByUid(finalUid));
|
|
163
|
+
}
|
|
164
|
+
catch (e) {
|
|
165
|
+
if (!resolvedUid) {
|
|
166
|
+
markFailed(finalUid);
|
|
167
|
+
throw e;
|
|
168
|
+
}
|
|
169
|
+
deleteUidCache(ctx, { kind: 'pdf', key: cacheKey });
|
|
170
|
+
if (!isRetryableError(e)) {
|
|
171
|
+
markFailed(finalUid);
|
|
172
|
+
throw e;
|
|
173
|
+
}
|
|
174
|
+
const retryUid = (await parsePdfSubmit(pdfPath, { model })).uid;
|
|
175
|
+
setSubmittedUidCache(ctx, { kind: 'pdf', key: cacheKey, sig, uid: retryUid });
|
|
176
|
+
try {
|
|
177
|
+
return asJsonResult(await materializeByUid(retryUid));
|
|
178
|
+
}
|
|
179
|
+
catch (retryErr) {
|
|
180
|
+
markFailed(retryUid);
|
|
181
|
+
throw retryErr;
|
|
182
|
+
}
|
|
183
|
+
}
|
|
184
|
+
}));
|
|
123
185
|
}
|
|
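The `catch` branch of the new `doc2x_materialize_pdf_layout_json` handler can be hard to follow in diff form. The sketch below restates its decisions as a pure function; the function name and the returned field names are illustrative labels, not identifiers from the package.

```javascript
// Sketch of the error branch only; names here are hypothetical labels.
function decideAfterFailure({ usedCachedUid, retryable }) {
  // A freshly submitted uid failed: record the failure and rethrow.
  if (!usedCachedUid) return { dropCache: false, markFailed: true, resubmit: false };
  // A cached uid failed: the stale cache entry is dropped in every case.
  if (!retryable) return { dropCache: true, markFailed: true, resubmit: false };
  // Cached uid plus retryable error: submit once more and wait again.
  return { dropCache: true, markFailed: false, resubmit: true };
}

console.log(decideAfterFailure({ usedCachedUid: true, retryable: true }));
```

Only a failure on a reused (cached) uid with a retryable error leads to a second submission; if that second attempt also fails, the handler marks the retry uid as failed and rethrows.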
@@ -82,6 +82,7 @@ export declare const convertToSchema: z.ZodEnum<{
|
|
|
82
82
|
docx: "docx";
|
|
83
83
|
}>;
|
|
84
84
|
export declare const parsePdfModelSchema: z.ZodOptional<z.ZodEnum<{
|
|
85
|
+
v2: "v2";
|
|
85
86
|
"v3-2026": "v3-2026";
|
|
86
87
|
}>>;
|
|
87
88
|
export declare const convertFormulaModeSchema: z.ZodEnum<{
|
|
@@ -98,6 +99,7 @@ export declare const imagePathSchema: z.ZodString;
|
|
|
98
99
|
export declare const imagePathForWaitSchema: z.ZodString;
|
|
99
100
|
export declare const pdfPathForWaitSchema: z.ZodString;
|
|
100
101
|
export declare const outputPathSchema: z.ZodString;
|
|
102
|
+
export declare const jsonOutputPathSchema: z.ZodString;
|
|
101
103
|
export declare const doc2xDownloadUrlSchema: z.ZodPipe<z.ZodString, z.ZodURL>;
|
|
102
104
|
export declare const convertZipBase64Schema: z.ZodString;
|
|
103
105
|
export declare const outputDirSchema: z.ZodString;
|
|
@@ -5,7 +5,7 @@ import path from 'node:path';
|
|
|
5
5
|
import { LRUCache } from 'lru-cache';
|
|
6
6
|
import { z } from 'zod';
|
|
7
7
|
import { CONVERT_FORMULA_LEVELS } from '#doc2x/convert';
|
|
8
|
-
import { PARSE_PDF_MODELS } from '#doc2x/pdf';
|
|
8
|
+
import { PARSE_PDF_MODEL_V2, PARSE_PDF_MODELS } from '#doc2x/pdf';
|
|
9
9
|
import { ToolError } from '#errors';
|
|
10
10
|
import { TOOL_ERROR_CODE_INVALID_ARGUMENT } from '#errorCodes';
|
|
11
11
|
import { asErrorResult } from '#mcp/results';
|
|
@@ -101,7 +101,7 @@ export function sameSig(a, b) {
|
|
|
101
101
|
return a.md5 === b.md5;
|
|
102
102
|
}
|
|
103
103
|
function normalizeParsePdfModel(model) {
|
|
104
|
-
return model ??
|
|
104
|
+
return model ?? PARSE_PDF_MODEL_V2;
|
|
105
105
|
}
|
|
106
106
|
export function makePdfUidCacheKey(absPath, model) {
|
|
107
107
|
return JSON.stringify([absPath, normalizeParsePdfModel(model)]);
|
|
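One consequence of the normalisation shown above is worth spelling out: an omitted model and an explicit `'v2'` share a cache key, while the layout tool's `'v3-2026'` default gets its own key. The sketch copies the two functions from the diff into a standalone snippet; the sample path is made up.

```javascript
// Standalone copy of the cache-key logic shown in the diff.
const PARSE_PDF_MODEL_V2 = 'v2';
const PARSE_PDF_MODEL_V3 = 'v3-2026';

function normalizeParsePdfModel(model) {
  return model ?? PARSE_PDF_MODEL_V2;
}

function makePdfUidCacheKey(absPath, model) {
  return JSON.stringify([absPath, normalizeParsePdfModel(model)]);
}

const pdf = '/tmp/example.pdf'; // hypothetical path
console.log(makePdfUidCacheKey(pdf) === makePdfUidCacheKey(pdf, PARSE_PDF_MODEL_V2));
console.log(makePdfUidCacheKey(pdf) === makePdfUidCacheKey(pdf, PARSE_PDF_MODEL_V3));
```

Because the keys differ per model, a uid cached from a `v2` text parse would not be reused when the layout tool later asks for `v3-2026` on the same file.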
@@ -174,6 +174,11 @@ export const imagePathSchema = imagePathBaseSchema.describe("Absolute path to a
|
|
|
174
174
|
export const imagePathForWaitSchema = imagePathBaseSchema.describe('Absolute path to a local image file (png/jpg). Used to reuse cached uid or submit a new async task.');
|
|
175
175
|
export const pdfPathForWaitSchema = pdfPathSchema.describe('Absolute path to a local PDF file. If uid is not provided, this tool will reuse cached uid (if any) or submit a new task.');
|
|
176
176
|
export const outputPathSchema = absolutePathSchema.describe('Absolute path for the output file. The file will be overwritten if it exists.');
|
|
177
|
+
export const jsonOutputPathSchema = absolutePathSchema
|
|
178
|
+
.refine((v) => v.toLowerCase().endsWith('.json'), {
|
|
179
|
+
message: "Path must end with '.json'.",
|
|
180
|
+
})
|
|
181
|
+
.describe('Absolute path for the output JSON file. The file will be overwritten if it exists.');
|
|
177
182
|
export const doc2xDownloadUrlSchema = z
|
|
178
183
|
.string()
|
|
179
184
|
.trim()
|
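The suffix rule added by `jsonOutputPathSchema` is case-insensitive. The sketch below mirrors only that `.refine` predicate in plain JavaScript, without zod; the absolute-path check inherited from `absolutePathSchema` is not shown in this diff and is deliberately left out. `checkJsonOutputPath` is a hypothetical helper name.

```javascript
// Mirrors the refine predicate only; not the package's schema object.
function checkJsonOutputPath(v) {
  if (typeof v !== 'string' || !v.toLowerCase().endsWith('.json')) {
    return { ok: false, message: "Path must end with '.json'." };
  }
  return { ok: true };
}

console.log(checkJsonOutputPath('/tmp/layout.JSON'));
console.log(checkJsonOutputPath('/tmp/layout.txt'));
```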
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@noedgeai-org/doc2x-mcp",
|
|
3
|
-
"version": "0.1.
|
|
3
|
+
"version": "0.1.4-dev.8.1",
|
|
4
4
|
"description": "Doc2x MCP server (stdio, MCP SDK).",
|
|
5
5
|
"license": "MIT",
|
|
6
6
|
"engines": {
|
|
@@ -31,7 +31,7 @@
|
|
|
31
31
|
"skill:install:ps": "pwsh -NoProfile -ExecutionPolicy Bypass -File scripts/install-skill.ps1",
|
|
32
32
|
"skill:install:winps": "powershell -NoProfile -ExecutionPolicy Bypass -File scripts/install-skill-winps.ps1",
|
|
33
33
|
"start": "node dist/index.js",
|
|
34
|
-
"test:unit": "npm run build && node --test test/unit/registerToolsShared.test.js",
|
|
34
|
+
"test:unit": "npm run build && node --test test/unit/registerToolsShared.test.js test/unit/materialize.test.js",
|
|
35
35
|
"test:e2e": "npm run build && node --test test/e2e/mcpServer.e2e.test.js",
|
|
36
36
|
"test": "npm run test:unit && npm run test:e2e",
|
|
37
37
|
"prepublishOnly": "pnpm run build"
|
|
@@ -1,173 +1,116 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: doc2x-mcp
|
|
3
|
-
description: Use Doc2x MCP
|
|
3
|
+
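description: Use the Doc2x MCP tools to process PDFs, scanned documents, and images: submit parses, query status, wait for text, export Markdown/LaTeX/DOCX, download to disk, and write PDF v3 layout results to a local JSON file. Use when the user mentions PDF, OCR, scan/scanned PDF, image-to-text, extract text/tables, table extraction, layout, Markdown, LaTeX/TeX, DOCX, doc2x, doc2x-mcp, MCP, figure/table crop, or v3 JSON.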
|
|
4
4
|
---
|
|
5
5
|
|
|
6
|
-
# Doc2x MCP
|
|
6
|
+
# Doc2x MCP
|
|
7
7
|
|
|
8
|
-
##
|
|
8
|
+
## Purpose
|
|
9
9
|
|
|
10
|
-
|
|
10
|
+
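Any request to parse PDFs/images, extract text/tables, export documents, download results, or obtain v3 layout JSON should be carried out as real operations through the `doc2x-mcp` tools; do not fabricate a `uid`, `url`, file contents, or export results.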
|
|
11
11
|
|
|
12
|
-
|
|
13
|
-
- Do not skip tool steps and directly output content that merely "looks plausible"
|
|
12
|
+
## Mandatory rules
|
|
14
13
|
|
|
15
|
-
|
|
14
|
+
1. Use absolute paths for every file path: `pdf_path`, `image_path`, `output_path`, `output_dir`.
|
|
15
|
+
2. Do not forge download URLs; only use the `url` returned by `doc2x_convert_export_*`.
|
|
16
|
+
3. Do not submit the same set of export parameters for the same `uid` concurrently.
|
|
17
|
+
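4. When comparing several export settings for the same `uid`, follow "export succeeds -> download immediately -> export the next setting" so results are not overwritten.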
|
|
18
|
+
5. Do not echo `DOC2X_API_KEY`; for troubleshooting, use only the summary information from `doc2x_debug_config`.
|
|
19
|
+
6. `model` applies only to PDF parse submission; `formula_level` applies only to export, and takes effect only when the source parse used `v3-2026`.
|
|
20
|
+
7. `doc2x_parse_pdf_wait_text` is suitable only for previews or summaries; when the full result is needed, prefer exporting a file.
|
|
21
|
+
8. When PDF v3 block/layout coordinates are needed, do not infer them from text results; use `doc2x_materialize_pdf_layout_json` directly.
|
|
16
22
|
|
|
17
|
-
|
|
18
|
-
`pdf_path` / `image_path` / `output_path` / `output_dir` should all be absolute paths; a relative path may be resolved by the server against an unexpected cwd and fail.
|
|
23
|
+
## Parameter constraints
|
|
19
24
|
|
|
20
|
-
|
|
21
|
-
|
|
25
|
+
- PDF parsing: `doc2x_parse_pdf_submit` and `doc2x_parse_pdf_wait_text` (the `pdf_path` branch) accept `model: "v2" | "v3-2026"`; the default when omitted is `v2`.
|
|
26
|
+
- PDF layout JSON: `doc2x_materialize_pdf_layout_json` defaults to `v3-2026` on the `pdf_path` branch and requires the returned result to contain `pages[].layout`.
|
|
27
|
+
- Export: passing `formula_mode` explicitly every time is recommended.
|
|
28
|
+
- `formula_level` must be a number `0 | 1 | 2`; do not pass a string.
|
|
29
|
+
- Image parse paths accept only `png/jpg/jpeg`; PDF paths must end with `.pdf`; layout JSON output paths should end with `.json`.
|
|
22
30
|
|
|
23
|
-
|
|
24
|
-
Do not submit in parallel the same export configuration (`to + formula_mode + formula_level (+ filename + filename_mode + merge_cross_page_forms...)`) for the same `uid`.
|
|
25
|
-
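Note: the export result for the same `uid + to` may be overwritten by a later export; for multi-setting comparisons (e.g. `formula_level=0/1/2`), you must follow the order **export succeeds → download to disk immediately → export the next setting**.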
|
|
31
|
+
## Choosing a tool by goal
|
|
26
32
|
|
|
27
|
-
|
|
28
|
-
|
|
33
|
+
- Submit a PDF parse: `doc2x_parse_pdf_submit`
|
|
34
|
+
- Check PDF status: `doc2x_parse_pdf_status`
|
|
35
|
+
- Get a PDF text preview: `doc2x_parse_pdf_wait_text`
|
|
36
|
+
- Export a PDF to `md/tex/docx`: `doc2x_convert_export_wait`
|
|
37
|
+
- Download an exported file: `doc2x_download_url_to_file`
|
|
38
|
+
- Write PDF v3 layout JSON to disk: `doc2x_materialize_pdf_layout_json`
|
|
39
|
+
- Raw image layout parse result: `doc2x_parse_image_layout_sync`
|
|
40
|
+
- Image layout parse, waiting for the first-screen Markdown: `doc2x_parse_image_layout_submit` -> `doc2x_parse_image_layout_wait_text`
|
|
41
|
+
- Write `convert_zip` to disk: `doc2x_materialize_convert_zip`
|
|
42
|
+
- Configuration troubleshooting: `doc2x_debug_config`
|
|
29
43
|
|
|
30
|
-
|
|
31
|
-
Downloads must use the `url` returned by `doc2x_convert_export_*`; do not construct one yourself.
|
|
44
|
+
## Standard workflows
|
|
32
45
|
|
|
33
|
-
|
|
34
|
-
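`model` is used only for PDF parse submission (default `v2`, optionally `v3-2026`); `formula_level` is used only for export (`doc2x_convert_export_*`) and takes effect only when the source parse task used `v3-2026` (it has no effect under `v2`).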
|
|
46
|
+
### 1. PDF -> complete file
|
|
35
47
|
|
|
36
|
-
|
|
48
|
+
When the user wants the complete Markdown / TeX / DOCX, prefer this workflow:
|
|
37
49
|
|
|
38
|
-
|
|
39
|
-
|
|
40
|
-
|
|
41
|
-
|
|
42
|
-
- `formula_level`: `0 | 1 | 2` (optional, **number type**; do not pass the strings `"0"|"1"|"2"`)
|
|
43
|
-
- `0`: do not degrade formulas (keep the original Markdown)
|
|
44
|
-
- `1`: degrade inline formulas to plain text (`\(...\)`, `$...$`)
|
|
45
|
-
- `2`: degrade both inline and block formulas to plain text (`\(...\)`, `$...$`, `\[...\]`, `$$...$$`)
|
|
50
|
+
1. `doc2x_parse_pdf_submit({ pdf_path, model? })`
|
|
51
|
+
2. Poll `doc2x_parse_pdf_status({ uid })` until it succeeds
|
|
52
|
+
3. `doc2x_convert_export_wait({ uid, to, formula_mode, formula_level?, filename?, filename_mode? })`
|
|
53
|
+
4. `doc2x_download_url_to_file({ url, output_path })`
|
|
46
54
|
|
|
47
|
-
|
|
55
|
+
Notes:
|
|
48
56
|
|
|
49
|
-
-
|
|
50
|
-
-
|
|
51
|
-
-
|
|
52
|
-
- **Download to disk**: `doc2x_download_url_to_file`
|
|
53
|
-
- **Image layout parsing**: `doc2x_parse_image_layout_sync` or `doc2x_parse_image_layout_submit` → `doc2x_parse_image_layout_wait_text`
|
|
54
|
-
- **Unpack the resource zip**: `doc2x_materialize_convert_zip`
|
|
55
|
-
- **Configuration troubleshooting**: `doc2x_debug_config`
|
|
57
|
+
- `md/docx` typically use `formula_mode: "normal"`
|
|
58
|
+
- `tex` typically uses `formula_mode: "dollar"`
|
|
59
|
+
- When the full content is needed, do not use `doc2x_parse_pdf_wait_text` as a substitute for export
|
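The four steps and the `formula_mode` notes above can be sketched as a small plan builder. This is an illustration under stated assumptions: `buildExportPlan` is a hypothetical helper, the `<uid ...>` and `<url ...>` strings are placeholders that a real client would fill from earlier tool responses, and nothing here calls the MCP server.

```javascript
// Hypothetical helper: lists the ordered tool calls for one PDF export.
function buildExportPlan({ pdfPath, to, outputPath, model }) {
  // Per the notes above: tex commonly uses "dollar", md/docx use "normal".
  const formulaMode = to === 'tex' ? 'dollar' : 'normal';
  const submitArgs = model ? { pdf_path: pdfPath, model } : { pdf_path: pdfPath };
  return [
    { tool: 'doc2x_parse_pdf_submit', args: submitArgs },
    { tool: 'doc2x_parse_pdf_status', args: { uid: '<uid from step 1>' } },
    { tool: 'doc2x_convert_export_wait', args: { uid: '<uid from step 1>', to, formula_mode: formulaMode } },
    { tool: 'doc2x_download_url_to_file', args: { url: '<url from step 3>', output_path: outputPath } },
  ];
}

const plan = buildExportPlan({ pdfPath: '/abs/in.pdf', to: 'tex', outputPath: '/abs/out.tex' });
console.log(plan.map((step) => step.tool).join(' -> '));
```

Passing `formula_mode` explicitly in the plan follows the recommendation in the parameter section rather than relying on a caller-side default.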
|
56
60
|
|
|
57
|
-
|
|
61
|
+
### 2. PDF -> text preview
|
|
58
62
|
|
|
59
|
-
|
|
63
|
+
Use only when the user wants a quick preview, a summary, or a small amount of text:
|
|
60
64
|
|
|
61
|
-
|
|
65
|
+
- `doc2x_parse_pdf_wait_text({ pdf_path | uid, max_output_chars?, max_output_pages?, model? })`
|
|
62
66
|
|
|
63
|
-
|
|
64
|
-
- `doc2x_parse_pdf_status` can run in parallel (batch polling)
|
|
65
|
-
- **Pipelined parallelism**: as soon as a `uid` parses successfully, start export + download for that `uid` (no need to wait for every PDF to finish parsing)
|
|
66
|
-
- Export and download for different `uid`s can run in parallel
|
|
67
|
-
- **Do not submit in parallel the same export configuration (`to + formula_mode + formula_level (+ filename + filename_mode + merge_cross_page_forms...)`) for the same `uid`**
|
|
68
|
-
- To export several formats for the same `uid` (e.g. md + docx + tex), **run the formats serially**; different `uid`s can still run in parallel
|
|
67
|
+
If a truncation notice appears, switch back to the "PDF -> complete file" workflow.
|
|
69
68
|
|
|
70
|
-
|
|
69
|
+
### 3. PDF -> v3 layout JSON
|
|
71
70
|
|
|
72
|
-
|
|
71
|
+
Use when the user wants figure/table coordinates, block bboxes, layout blocks, or input for a later cropping script:
|
|
73
72
|
|
|
74
|
-
|
|
73
|
+
- Preferred: `doc2x_materialize_pdf_layout_json({ uid | pdf_path, output_path, model? })`
|
|
75
74
|
|
|
76
|
-
|
|
77
|
-
- If `status="failed"`: report `detail` and stop further steps for that file
|
|
75
|
+
Explain to the user what `layout` is for:
|
|
78
76
|
|
|
79
|
-
|
|
77
|
+
- `Markdown/text` suits reading the body text; `layout` suits programs that go on to process page structure
|
|
78
|
+
- `layout.blocks[].bbox` can be used for figure/table cropping, region screenshots, box highlighting, and visual debugging
|
|
79
|
+
- `layout.blocks[].type` can be used to distinguish headings, body text, tables, images, and other blocks for structured extraction
|
|
80
|
+
- `layout` works well as input to later scripts, e.g. figure/table crop, block alignment, layout analysis
|
|
81
|
+
- If the user only wants to "see the content", prefer Markdown / DOCX; if the user needs to "know where the content sits on the page", use `layout`
|
|
80
82
|
|
|
81
|
-
|
|
83
|
+
Required behavior:
|
|
82
84
|
|
|
83
|
-
-
|
|
84
|
-
-
|
|
85
|
-
-
|
|
85
|
+
- On the `pdf_path` branch, `v3-2026` is used by default
|
|
86
|
+
- The output is the raw parse `result` JSON, not condensed text
|
|
87
|
+
- If the returned result lacks `pages[].layout`, treat it as a failure rather than degrading silently
|
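A downstream script consuming the layout JSON might look like the sketch below. Only the `pages[].layout.blocks[].type` and `bbox` field paths come from the text above; the concrete `type` values (`'table'`, `'figure'`), the four-number `bbox` shape, and the helper name `collectBlocks` are assumptions made for illustration.

```javascript
// Sketch: collect blocks of the wanted types, failing loudly when a page
// has no layout instead of silently returning nothing.
function collectBlocks(result, wantedTypes) {
  const pages = Array.isArray(result?.pages) ? result.pages : [];
  const out = [];
  pages.forEach((page, pageIndex) => {
    const blocks = page?.layout?.blocks;
    if (!Array.isArray(blocks)) {
      throw new Error(`page ${pageIndex} has no layout.blocks`);
    }
    for (const block of blocks) {
      if (wantedTypes.includes(block.type)) {
        out.push({ pageIndex, type: block.type, bbox: block.bbox });
      }
    }
  });
  return out;
}

// Hand-written sample; real Doc2x output may use different values.
const sample = {
  pages: [
    { layout: { blocks: [{ type: 'text', bbox: [0, 0, 10, 10] }, { type: 'table', bbox: [5, 20, 90, 60] }] } },
    { layout: { blocks: [{ type: 'figure', bbox: [10, 10, 50, 50] }] } },
  ],
};
console.log(collectBlocks(sample, ['table', 'figure']));
```

Throwing on a missing `layout` mirrors the behavior requirement above: a result without layout is treated as a failure, not as an empty list.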
|
86
88
|
|
|
87
|
-
|
|
89
|
+
### 4. Image -> layout result
|
|
88
90
|
|
|
89
|
-
|
|
91
|
+
- Get the raw result directly: `doc2x_parse_image_layout_sync({ image_path })`
|
|
92
|
+
- Wait and get the first-screen Markdown: `doc2x_parse_image_layout_submit({ image_path })` -> `doc2x_parse_image_layout_wait_text({ uid })`
|
|
93
|
+
- When the result contains `convert_zip` and the user wants the resources written to disk: `doc2x_materialize_convert_zip({ convert_zip_base64, output_dir })`
|
|
90
94
|
|
|
91
|
-
|
|
92
|
-
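- When formula degradation is needed, pass `formula_level` explicitly (`0/1/2`); if it is not needed, passing `0` explicitly is recommended to avoid ambiguity about the caller's default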
|
|
93
|
-
- `filename`/`filename_mode` are mainly for `md/tex`: pass a basename without extension together with `filename_mode: "auto"` (to avoid `name.md.md` / `name.tex.tex`)
|
|
94
|
-
- For multi-format export of the same `uid`, decide the order first (e.g. md, then docx) and finish each format before starting the next
|
|
95
|
-
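- For multi-setting comparisons of the same format for the same `uid` (e.g. `formula_level`), download each setting's result before moving to the next, so overwriting does not lead to wrong conclusions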
|
|
95
|
+
### 5. Batch PDFs
|
|
96
96
|
|
|
97
|
-
|
|
97
|
+
For batches, use a pipeline rather than running everything serially:
|
|
98
98
|
|
|
99
|
-
|
|
100
|
-
|
|
99
|
+
1. Multiple `pdf_path`s can call `doc2x_parse_pdf_submit` in parallel
|
|
100
|
+
2. Multiple `uid`s can call `doc2x_parse_pdf_status` in parallel
|
|
101
|
+
3. As soon as a `uid` parses successfully, start its own export and download
|
|
102
|
+
4. Different `uid`s can export in parallel
|
|
103
|
+
5. Do not run the same export configuration for the same `uid` concurrently
|
|
101
104
|
|
|
102
|
-
|
|
105
|
+
## Reporting to the user
|
|
103
106
|
|
|
104
|
-
-
|
|
107
|
+
- On success, report: input file, `uid`, output path, and `bytes_written` when relevant
|
|
108
|
+
- On failure, report: error code, error message, the related `uid`, and which files were unaffected
|
|
109
|
+
- When the user's goal is a "local file", report the on-disk result first rather than only pasting long text
|
|
105
110
|
|
|
106
|
-
|
|
111
|
+
## Common error handling
|
|
107
112
|
|
|
108
|
-
|
|
109
|
-
|
|
110
|
-
|
|
111
|
-
|
|
112
|
-
|
|
113
|
-
When the user's goal is "get the complete Markdown / write it to disk", the main path should be export and download; do not rely on `doc2x_parse_pdf_wait_text`.
|
|
114
|
-
|
|
115
|
-
**Submit the parse task**
|
|
116
|
-
|
|
117
|
-
- `doc2x_parse_pdf_submit({ pdf_path, model? })` → `{ uid }`
|
|
118
|
-
|
|
119
|
-
**Wait for parsing to finish**
|
|
120
|
-
|
|
121
|
-
- Poll `doc2x_parse_pdf_status({ uid })` until `status="success"` (on failure, report with `detail`)
|
|
122
|
-
|
|
123
|
-
**Export Markdown**
|
|
124
|
-
|
|
125
|
-
- `doc2x_convert_export_wait({ uid, to: "md", formula_mode: "normal", formula_level?, filename?, filename_mode? })` → `{ status: "success", url }`
|
|
126
|
-
|
|
127
|
-
**Download to disk**
|
|
128
|
-
|
|
129
|
-
- `doc2x_download_url_to_file({ url, output_path })` → `{ output_path, bytes_written }`
|
|
130
|
-
|
|
131
|
-
**Report to the user**
|
|
132
|
-
|
|
133
|
-
- Tell the user: save path, file size, `uid` (and `url` if needed)
|
|
134
|
-
|
|
135
|
-
### Workflow C: PDF → text preview (controllable length)
|
|
136
|
-
|
|
137
|
-
Use only when the user needs just a "summary / short preview":
|
|
138
|
-
|
|
139
|
-
- `doc2x_parse_pdf_wait_text({ pdf_path | uid, max_output_chars?, max_output_pages? })`
|
|
140
|
-
|
|
141
|
-
If the response contains a truncation notice (`[doc2x-mcp] Output truncated ...`), switch to "Workflow B" and export md to get the full content.
|
|
142
|
-
|
|
143
|
-
### Workflow D: PDF export formats (MD / TEX / DOCX)
|
|
144
|
-
|
|
145
|
-
- Markdown: `to="md"` (for a complete Markdown export, see "Workflow B" first)
|
|
146
|
-
- LaTeX: `to="tex"`
|
|
147
|
-
- Word: `to="docx"`
|
|
148
|
-
- The call chain is the same as "Workflow A / B" (parse → export → download); adjust `to` for the target format (and set `formula_mode/formula_level/filename` as needed)
|
|
149
|
-
- Note: `doc2x_convert_export_submit.formula_mode` is required (`"normal"` or `"dollar"`); `formula_level` is optional (`0/1/2`)
|
|
150
|
-
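- To compare different `formula_level` values, run them in order and download immediately after each successful export before moving to the next, so a later result does not overwrite an earlier one.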
|
|
151
|
-
|
|
152
|
-
### Workflow E: image → Markdown (layout parsing)
|
|
153
|
-
|
|
154
|
-
- Result only (synchronous): `doc2x_parse_image_layout_sync({ image_path })` (returns raw JSON, which may contain `convert_zip`)
|
|
155
|
-
- First-screen markdown (asynchronous): `doc2x_parse_image_layout_submit({ image_path })` → `doc2x_parse_image_layout_wait_text({ uid })`
|
|
156
|
-
|
|
157
|
-
If the result contains `convert_zip` (base64) and the user wants the resource files written to disk:
|
|
158
|
-
|
|
159
|
-
- `doc2x_materialize_convert_zip({ convert_zip_base64, output_dir })` → `{ output_dir, zip_path, extracted }`
|
|
160
|
-
|
|
161
|
-
## Failures and troubleshooting (how you should handle them)
|
|
162
|
-
|
|
163
|
-
1. Authentication/configuration errors
|
|
164
|
-
First call `doc2x_debug_config()` and confirm that `apiKeyLen > 0` and that `baseUrl/httpTimeoutMs/pollIntervalMs/maxWaitMs` are reasonable.
|
|
165
|
-
|
|
166
|
-
2. Wait timeout
|
|
167
|
-
Suggest that the user increase `DOC2X_MAX_WAIT_MS` or adjust `DOC2X_POLL_INTERVAL_MS` as needed (without polling too frequently).
|
|
168
|
-
|
|
169
|
-
3. Download blocked (security policy)
|
|
170
|
-
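`doc2x_download_url_to_file` allows only `https` and requires the host to be in `DOC2X_DOWNLOAD_URL_ALLOWLIST`; when a download is blocked, explain why and let the user choose between "add to the allowlist" and "keep the default security policy".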
|
|
171
|
-
|
|
172
|
-
4. The user gives a relative or uncertain path
|
|
173
|
-
Ask the user for an absolute path; do not guess.
|
|
113
|
+
1. Missing arguments or invalid path: ask the user for an absolute path; do not guess at relative paths.
|
|
114
|
+
2. Wait timeout: explain that `DOC2X_MAX_WAIT_MS` can be increased or the polling interval adjusted moderately.
|
|
115
|
+
3. Download blocked by policy: explain that this is the `DOC2X_DOWNLOAD_URL_ALLOWLIST` restriction; do not bypass it.
|
|
116
|
+
4. Authentication or configuration problem: call `doc2x_debug_config` and report only summary fields such as `apiKeySource/apiKeyPrefix/apiKeyLen`.
|