@noedgeai-org/doc2x-mcp 0.1.2 → 0.1.3-dev.2.2
- package/README.md +58 -1
- package/README_EN.md +57 -0
- package/dist/doc2x/convert.js +3 -0
- package/dist/doc2x/pdf.js +11 -5
- package/dist/mcp/registerTools.js +54 -24
- package/package.json +12 -4
- package/scripts/install-skill-winps.ps1 +172 -0
- package/scripts/install-skill.ps1 +147 -0
- package/scripts/install-skill.sh +195 -0
- package/skills/doc2x-mcp/SKILL.md +169 -0
package/README.md
CHANGED
|
@@ -28,7 +28,7 @@
|
|
|
28
28
|
```json
|
|
29
29
|
{
|
|
30
30
|
"command": "npx",
|
|
31
|
-
"args": ["-y", "@noedgeai-org/doc2x-mcp"],
|
|
31
|
+
"args": ["-y", "@noedgeai-org/doc2x-mcp@latest"],
|
|
32
32
|
"env": {
|
|
33
33
|
"DOC2X_API_KEY": "sk-xxx",
|
|
34
34
|
"DOC2X_BASE_URL": "https://v2.doc2x.noedgeai.com"
|
|
@@ -72,6 +72,63 @@ DOC2X_API_KEY=sk-xxx npm start
|
|
|
72
72
|
- `doc2x_materialize_convert_zip`
|
|
73
73
|
- `doc2x_debug_config`
|
|
74
74
|
|
|
75
|
+
### PDF Parse Model (`doc2x_parse_pdf_submit` / `doc2x_parse_pdf_wait_text`)
|
|
76
|
+
|
|
77
|
+
- Optional parameter: `model`
|
|
78
|
+
- Supported values: only `v3-2026` (latest model)
|
|
79
|
+
- Notes: when `model` is omitted, `v2` is used by default; to try the latest model, pass:
|
|
80
|
+
|
|
81
|
+
```json
|
|
82
|
+
{
|
|
83
|
+
"model": "v3-2026"
|
|
84
|
+
}
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
### Export Formula Parameters (`doc2x_convert_export_submit` / `doc2x_convert_export_wait`)
|
|
88
|
+
|
|
89
|
+
- Required parameter: `formula_mode` (`normal` / `dollar`)
|
|
90
|
+
- Optional parameter: `formula_level` (`int32`; takes effect only when the source parse task used `model=v3-2026`, and has no effect under `v2`)
|
|
91
|
+
- Value mapping:
|
|
92
|
+
- `0`: do not degrade formulas (keep the original Markdown)
|
|
93
|
+
- `1`: inline formulas become plain text (degrades `\\(...\\)` and `$...$`)
|
|
94
|
+
- `2`: all formulas become plain text (degrades `\\(...\\)`, `$...$`, `\\[...\\]`, `$$...$$`)
|
|
95
|
+
|
|
75
96
|
## 5) License
|
|
76
97
|
|
|
77
98
|
MIT License. See `LICENSE`.
|
|
99
|
+
|
|
100
|
+
## 6) Install This Repo's Skill (Optional)
|
|
101
|
+
|
|
102
|
+
Adds a Skill to Codex CLI / Claude Code that teaches the LLM how to use the doc2x-mcp tools (so it can call the tools in a fixed workflow, export and download results, and troubleshoot).
|
|
103
|
+
|
|
104
|
+
One-command install without cloning the repo (recommended):
|
|
105
|
+
|
|
106
|
+
```bash
|
|
107
|
+
curl -fsSL https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.sh | sh
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
Re-run the same command to reinstall over an existing copy (an existing directory is overwritten by default).
|
|
111
|
+
|
|
112
|
+
Install from this repo's source directory:
|
|
113
|
+
|
|
114
|
+
```bash
|
|
115
|
+
npm run skill:install
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
Default destination:
|
|
119
|
+
|
|
120
|
+
By default the script installs to:
|
|
121
|
+
|
|
122
|
+
- Codex CLI: `~/.codex/skills/public/doc2x-mcp` (override via `CODEX_HOME`)
|
|
123
|
+
- Claude Code: `~/.claude/skills/doc2x-mcp` (override via `CLAUDE_HOME`)
|
|
124
|
+
|
|
125
|
+
Notes:
|
|
126
|
+
|
|
127
|
+
- `--target auto` (default) installs to both Codex + Claude; to install only one of them, use `--target codex|claude`.
|
|
128
|
+
- PowerShell 7+ one-command install: `irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.ps1 | iex`
|
|
129
|
+
- Windows PowerShell 5.1 one-command install: `irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill-winps.ps1 | iex`
|
|
130
|
+
|
|
131
|
+
Examples of overriding the install directory:
|
|
132
|
+
|
|
133
|
+
- mac/linux: `CODEX_HOME=/custom/.codex curl -fsSL https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.sh | sh -s -- --target codex`
|
|
134
|
+
- Windows: `$env:CODEX_HOME="C:\\path\\.codex"; irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.ps1 | iex`
|
package/README_EN.md
CHANGED
|
@@ -72,6 +72,63 @@ DOC2X_API_KEY=sk-xxx npm start
|
|
|
72
72
|
- `doc2x_materialize_convert_zip`
|
|
73
73
|
- `doc2x_debug_config`
|
|
74
74
|
|
|
75
|
+
### PDF Parse Model (`doc2x_parse_pdf_submit` / `doc2x_parse_pdf_wait_text`)
|
|
76
|
+
|
|
77
|
+
- Optional parameter: `model`
|
|
78
|
+
- Supported values: only `v3-2026` (latest model)
|
|
79
|
+
- Notes: omit `model` to use default `v2`; to try the latest model, pass:
|
|
80
|
+
|
|
81
|
+
```json
|
|
82
|
+
{
|
|
83
|
+
"model": "v3-2026"
|
|
84
|
+
}
|
|
85
|
+
```
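As a rough illustration of the `model` parameter, a client could assemble the tool arguments as sketched here. The helper name `buildParsePdfArgs` and the sample path are hypothetical; only `pdf_path`, `model`, the value `v3-2026`, and the `.pdf` suffix requirement come from this package.

```javascript
// Hypothetical helper: builds the arguments object for doc2x_parse_pdf_submit.
// Only `pdf_path`, `model`, and 'v3-2026' are taken from the package; the rest is illustrative.
const SUPPORTED_MODELS = ['v3-2026'];

function buildParsePdfArgs(pdfPath, model) {
  if (typeof pdfPath !== 'string' || !pdfPath.toLowerCase().endsWith('.pdf')) {
    throw new Error("pdf_path must be a string ending with '.pdf'");
  }
  const args = { pdf_path: pdfPath };
  // Omitting `model` leaves the server on its default (v2 per the README).
  if (model !== undefined) {
    if (!SUPPORTED_MODELS.includes(model)) {
      throw new Error(`unsupported model: ${model}`);
    }
    args.model = model;
  }
  return args;
}

console.log(buildParsePdfArgs('/data/paper.pdf'));
console.log(buildParsePdfArgs('/data/paper.pdf', 'v3-2026'));
```

Note that `v2` is not an accepted value for `model` in this sketch: the README lists `v3-2026` as the only supported value, so the default is selected by leaving the field out.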
|
|
86
|
+
|
|
87
|
+
### Export Formula Parameters (`doc2x_convert_export_submit` / `doc2x_convert_export_wait`)
|
|
88
|
+
|
|
89
|
+
- Required parameter: `formula_mode` (`normal` / `dollar`)
|
|
90
|
+
- Optional parameter: `formula_level` (`int32`, effective only when the source parse task uses `model=v3-2026`; ignored by `v2`)
|
|
91
|
+
- Value mapping:
|
|
92
|
+
- `0`: keep formulas as-is (preserve original Markdown)
|
|
93
|
+
- `1`: degrade inline formulas to plain text (`\\(...\\)` and `$...$`)
|
|
94
|
+
- `2`: degrade all formulas to plain text (`\\(...\\)`, `$...$`, `\\[...\\]`, `$$...$$`)
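The value mapping above can be summarized as a small lookup. This is only a sketch: `degradedDelimiters` is a hypothetical name, and it assumes the doubled backslashes in the README stand for the usual `\(...\)` and `\[...\]` delimiters. It does not reproduce how the Doc2x service actually rewrites formulas.

```javascript
// Illustrative lookup of which formula delimiters each formula_level degrades,
// following the README's value mapping. Assumption: "\\(" in the README means "\(".
const INLINE_DELIMITERS = ['\\(...\\)', '$...$'];
const BLOCK_DELIMITERS = ['\\[...\\]', '$$...$$'];

function degradedDelimiters(formulaLevel) {
  switch (formulaLevel) {
    case 0:
      return []; // formulas are kept as-is
    case 1:
      return [...INLINE_DELIMITERS]; // inline formulas only
    case 2:
      return [...INLINE_DELIMITERS, ...BLOCK_DELIMITERS]; // inline and block formulas
    default:
      throw new RangeError(`formula_level must be 0, 1, or 2 (got ${formulaLevel})`);
  }
}

console.log(degradedDelimiters(1));
console.log(degradedDelimiters(2));
```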
|
|
95
|
+
|
|
75
96
|
## 5) License
|
|
76
97
|
|
|
77
98
|
MIT License. See `LICENSE`.
|
|
99
|
+
|
|
100
|
+
## 6) Install Repo Skill (Optional)
|
|
101
|
+
|
|
102
|
+
Installs a tool-use skill for Codex CLI / Claude Code (teaches the LLM how to use doc2x-mcp tools with a standard workflow: submit/status/wait/export/download).
|
|
103
|
+
|
|
104
|
+
One-command install without cloning (recommended):
|
|
105
|
+
|
|
106
|
+
```bash
|
|
107
|
+
curl -fsSL https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.sh | sh
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
Re-run the same command to overwrite (default behavior overwrites an existing destination directory).
|
|
111
|
+
|
|
112
|
+
Install from this repo source directory:
|
|
113
|
+
|
|
114
|
+
```bash
|
|
115
|
+
npm run skill:install
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
Default destination:
|
|
119
|
+
|
|
120
|
+
The script installs to:
|
|
121
|
+
|
|
122
|
+
- Codex CLI: `~/.codex/skills/public/doc2x-mcp` (override via `CODEX_HOME`)
|
|
123
|
+
- Claude Code: `~/.claude/skills/doc2x-mcp` (override via `CLAUDE_HOME`)
|
|
124
|
+
|
|
125
|
+
Notes:
|
|
126
|
+
|
|
127
|
+
- `--target auto` (default) installs to both Codex + Claude; use `--target codex|claude` to install only one.
|
|
128
|
+
- PowerShell 7+ one-command install: `irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.ps1 | iex`
|
|
129
|
+
- Windows PowerShell 5.1 one-command install: `irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill-winps.ps1 | iex`
|
|
130
|
+
|
|
131
|
+
Override install dir examples:
|
|
132
|
+
|
|
133
|
+
- mac/linux: `CODEX_HOME=/custom/.codex curl -fsSL https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.sh | sh -s -- --target codex`
|
|
134
|
+
- Windows: `$env:CODEX_HOME="C:\\path\\.codex"; irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.ps1 | iex`
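The default destinations and the `CODEX_HOME` / `CLAUDE_HOME` overrides described above can be sketched as follows. `resolveSkillDirs` is a hypothetical helper written for illustration (POSIX-style paths only); the actual installers are the shell and PowerShell scripts shipped in `scripts/`, and their `auto` behavior may differ in detail.

```javascript
// Sketch of the documented destination rules; not the installer itself.
// Assumes POSIX-style paths and the default skill name "doc2x-mcp".
function resolveSkillDirs(env, homeDir, target = 'auto') {
  const codexHome = env.CODEX_HOME || `${homeDir}/.codex`;
  const claudeHome = env.CLAUDE_HOME || `${homeDir}/.claude`;
  const dirs = {
    // Codex skills live under a category directory ("public" by default).
    codex: `${codexHome}/skills/public/doc2x-mcp`,
    // Claude skills sit directly under the skills root.
    claude: `${claudeHome}/skills/doc2x-mcp`,
  };
  if (target === 'auto') return dirs;
  if (target !== 'codex' && target !== 'claude') {
    throw new Error(`invalid target: ${target} (expected auto|codex|claude)`);
  }
  return { [target]: dirs[target] };
}

console.log(resolveSkillDirs({}, '/home/me'));
console.log(resolveSkillDirs({ CODEX_HOME: '/custom/.codex' }, '/home/me', 'codex'));
```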
|
package/dist/doc2x/convert.js
CHANGED
|
@@ -7,6 +7,7 @@ import { doc2xRequestJson, normalizeUrl } from '#doc2x/client';
|
|
|
7
7
|
import { DOC2X_TASK_STATUS_FAILED, DOC2X_TASK_STATUS_SUCCESS } from '#doc2x/constants';
|
|
8
8
|
import { HTTP_METHOD_GET, HTTP_METHOD_POST } from '#doc2x/http';
|
|
9
9
|
import { v2 } from '#doc2x/paths';
|
|
10
|
+
export const CONVERT_FORMULA_LEVELS = [0, 1, 2];
|
|
10
11
|
function normalizeExportFilename(filename, to, mode) {
|
|
11
12
|
const v = String(filename).trim();
|
|
12
13
|
if (!v)
|
|
@@ -36,6 +37,8 @@ export async function convertExportSubmit(args) {
|
|
|
36
37
|
to: args.to,
|
|
37
38
|
formula_mode: args.formula_mode,
|
|
38
39
|
};
|
|
40
|
+
if (args.formula_level != null)
|
|
41
|
+
body.formula_level = args.formula_level;
|
|
39
42
|
if (args.merge_cross_page_forms != null)
|
|
40
43
|
body.merge_cross_page_forms = args.merge_cross_page_forms;
|
|
41
44
|
if (args.filename != null)
|
package/dist/doc2x/pdf.js
CHANGED
|
@@ -9,6 +9,7 @@ import { doc2xRequestJson, putToSignedUrl } from '#doc2x/client';
|
|
|
9
9
|
import { DOC2X_TASK_STATUS_FAILED, DOC2X_TASK_STATUS_SUCCESS } from '#doc2x/constants';
|
|
10
10
|
import { HTTP_METHOD_GET, HTTP_METHOD_POST } from '#doc2x/http';
|
|
11
11
|
import { v2 } from '#doc2x/paths';
|
|
12
|
+
export const PARSE_PDF_MODELS = ['v3-2026'];
|
|
12
13
|
function mergePagesToTextWithLimit(result, joinWith, limits) {
|
|
13
14
|
const pages = _.sortBy(_.isArray(result?.pages) ? result.pages : [], (p) => Number(p?.page_idx ?? 0));
|
|
14
15
|
const maxPages = (limits?.maxOutputPages ?? 0) > 0 ? Number(limits?.maxOutputPages) : Number.POSITIVE_INFINITY;
|
|
@@ -53,11 +54,15 @@ function mergePagesToTextWithLimit(result, joinWith, limits) {
|
|
|
53
54
|
truncated = true;
|
|
54
55
|
return { text: parts.join(''), truncated, returnedPages, totalPages: pages.length };
|
|
55
56
|
}
|
|
56
|
-
async function preuploadPdfWithRetry() {
|
|
57
|
+
async function preuploadPdfWithRetry(model) {
|
|
58
|
+
const body = {};
|
|
59
|
+
if (model)
|
|
60
|
+
body.model = model;
|
|
61
|
+
const payload = Object.keys(body).length > 0 ? { body } : undefined;
|
|
57
62
|
let attempt = 0;
|
|
58
63
|
while (true) {
|
|
59
64
|
try {
|
|
60
|
-
const data = await doc2xRequestJson(HTTP_METHOD_POST, v2('/parse/preupload'));
|
|
65
|
+
const data = await doc2xRequestJson(HTTP_METHOD_POST, v2('/parse/preupload'), payload);
|
|
61
66
|
return { uid: String(data.uid), url: String(data.url) };
|
|
62
67
|
}
|
|
63
68
|
catch (e) {
|
|
@@ -69,7 +74,7 @@ async function preuploadPdfWithRetry() {
|
|
|
69
74
|
}
|
|
70
75
|
}
|
|
71
76
|
}
|
|
72
|
-
export async function parsePdfSubmit(pdfPath) {
|
|
77
|
+
export async function parsePdfSubmit(pdfPath, opts) {
|
|
73
78
|
const p = path.resolve(pdfPath);
|
|
74
79
|
if (!p.toLowerCase().endsWith('.pdf'))
|
|
75
80
|
throw new ToolError({
|
|
@@ -78,12 +83,13 @@ export async function parsePdfSubmit(pdfPath) {
|
|
|
78
83
|
retryable: false,
|
|
79
84
|
});
|
|
80
85
|
await fsp.access(p);
|
|
81
|
-
|
|
86
|
+
const model = opts?.model;
|
|
87
|
+
let data = await preuploadPdfWithRetry(model);
|
|
82
88
|
try {
|
|
83
89
|
await putToSignedUrl(String(data.url), p);
|
|
84
90
|
}
|
|
85
91
|
catch {
|
|
86
|
-
data = await preuploadPdfWithRetry();
|
|
92
|
+
data = await preuploadPdfWithRetry(model);
|
|
87
93
|
await putToSignedUrl(String(data.url), p);
|
|
88
94
|
}
|
|
89
95
|
return { uid: String(data.uid) };
|
package/dist/mcp/registerTools.js
CHANGED
|
@@ -3,11 +3,11 @@ import path from 'node:path';
|
|
|
3
3
|
import _ from 'lodash';
|
|
4
4
|
import { z } from 'zod';
|
|
5
5
|
import { CONFIG, RESOLVED_KEY, parseDownloadUrlAllowlist } from '#config';
|
|
6
|
-
import { convertExportResult, convertExportSubmit, convertExportWaitByUid } from '#doc2x/convert';
|
|
6
|
+
import { CONVERT_FORMULA_LEVELS, convertExportResult, convertExportSubmit, convertExportWaitByUid, } from '#doc2x/convert';
|
|
7
7
|
import { downloadUrlToFile } from '#doc2x/download';
|
|
8
8
|
import { parseImageLayoutStatus, parseImageLayoutSubmit, parseImageLayoutSync, parseImageLayoutWaitTextByUid, } from '#doc2x/image';
|
|
9
9
|
import { materializeConvertZip } from '#doc2x/materialize';
|
|
10
|
-
import { parsePdfStatus, parsePdfSubmit, parsePdfWaitTextByUid } from '#doc2x/pdf';
|
|
10
|
+
import { PARSE_PDF_MODELS, parsePdfStatus, parsePdfSubmit, parsePdfWaitTextByUid, } from '#doc2x/pdf';
|
|
11
11
|
import { ToolError } from '#errors';
|
|
12
12
|
import { TOOL_ERROR_CODE_INVALID_ARGUMENT } from '#errorCodes';
|
|
13
13
|
import { asErrorResult, asJsonResult, asTextResult } from '#mcp/results';
|
|
@@ -19,6 +19,23 @@ async function fileSig(p) {
|
|
|
19
19
|
function sameSig(a, b) {
|
|
20
20
|
return a.absPath === b.absPath && a.size === b.size && a.mtimeMs === b.mtimeMs;
|
|
21
21
|
}
|
|
22
|
+
function normalizeParsePdfModel(model) {
|
|
23
|
+
return model ?? 'v2';
|
|
24
|
+
}
|
|
25
|
+
function makePdfUidCacheKey(absPath, model) {
|
|
26
|
+
return JSON.stringify([absPath, normalizeParsePdfModel(model)]);
|
|
27
|
+
}
|
|
28
|
+
function makeConvertSubmitKey(args) {
|
|
29
|
+
return JSON.stringify({
|
|
30
|
+
uid: args.uid,
|
|
31
|
+
to: args.to,
|
|
32
|
+
formula_mode: args.formula_mode ?? null,
|
|
33
|
+
formula_level: args.formula_level ?? null,
|
|
34
|
+
filename: args.filename ?? null,
|
|
35
|
+
filename_mode: args.filename_mode ?? null,
|
|
36
|
+
merge_cross_page_forms: args.merge_cross_page_forms ?? null,
|
|
37
|
+
});
|
|
38
|
+
}
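The two PDF cache-key helpers added above can be exercised in isolation. The sketch below copies `normalizeParsePdfModel` and `makePdfUidCacheKey` from the diff and uses a made-up path and made-up uids to show why the key now includes the model: the same file parsed with the default model and with `v3-2026` gets two separate cache entries, so a uid from one model is not reused for the other.

```javascript
// Copied from the diff above.
function normalizeParsePdfModel(model) {
  return model ?? 'v2';
}
function makePdfUidCacheKey(absPath, model) {
  return JSON.stringify([absPath, normalizeParsePdfModel(model)]);
}

// Illustrative cache contents; the path and uids are placeholders.
const pdfUidCache = new Map();
pdfUidCache.set(makePdfUidCacheKey('/tmp/a.pdf', undefined), { uid: 'uid-default' });
pdfUidCache.set(makePdfUidCacheKey('/tmp/a.pdf', 'v3-2026'), { uid: 'uid-v3' });

console.log(pdfUidCache.size); // 2: one entry per (path, model) pair
console.log(pdfUidCache.get(makePdfUidCacheKey('/tmp/a.pdf', 'v3-2026')).uid); // uid-v3
```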
|
|
22
39
|
export function registerTools(server) {
|
|
23
40
|
const pdfUidCache = new Map();
|
|
24
41
|
const imageUidCache = new Map();
|
|
@@ -30,12 +47,16 @@ export function registerTools(server) {
|
|
|
30
47
|
.string()
|
|
31
48
|
.min(1)
|
|
32
49
|
.describe("Absolute path to a local PDF file. Use an absolute path (relative paths are resolved from the MCP server process cwd, which may be '/'). Must end with '.pdf'."),
|
|
50
|
+
model: z
|
|
51
|
+
.enum(PARSE_PDF_MODELS)
|
|
52
|
+
.optional()
|
|
53
|
+
.describe("Optional parse model. Use 'v3-2026' to try the latest model. Omit this field to use default v2."),
|
|
33
54
|
},
|
|
34
|
-
}, async ({ pdf_path }) => {
|
|
55
|
+
}, async ({ pdf_path, model }) => {
|
|
35
56
|
try {
|
|
36
57
|
const sig = await fileSig(pdf_path);
|
|
37
|
-
const res = await parsePdfSubmit(pdf_path);
|
|
38
|
-
pdfUidCache.set(sig.absPath, { sig, uid: res.uid });
|
|
58
|
+
const res = await parsePdfSubmit(pdf_path, { model });
|
|
59
|
+
pdfUidCache.set(makePdfUidCacheKey(sig.absPath, model), { sig, uid: res.uid });
|
|
39
60
|
return asJsonResult(res);
|
|
40
61
|
}
|
|
41
62
|
catch (e) {
|
|
@@ -84,6 +105,10 @@ export function registerTools(server) {
|
|
|
84
105
|
.min(0)
|
|
85
106
|
.optional()
|
|
86
107
|
.describe('Max pages to merge into returned text (0 = unlimited). Default can be set via env DOC2X_PARSE_PDF_MAX_OUTPUT_PAGES.'),
|
|
108
|
+
model: z
|
|
109
|
+
.enum(PARSE_PDF_MODELS)
|
|
110
|
+
.optional()
|
|
111
|
+
.describe("Optional parse model used only when submitting from pdf_path. Use 'v3-2026' to try latest model. Omit this field to use default v2."),
|
|
87
112
|
},
|
|
88
113
|
}, async (args) => {
|
|
89
114
|
try {
|
|
@@ -122,10 +147,12 @@ export function registerTools(server) {
|
|
|
122
147
|
}));
|
|
123
148
|
}
|
|
124
149
|
const sig = await fileSig(pdfPath);
|
|
125
|
-
const
|
|
150
|
+
const model = args.model;
|
|
151
|
+
const cacheKey = makePdfUidCacheKey(sig.absPath, model);
|
|
152
|
+
const cached = pdfUidCache.get(cacheKey);
|
|
126
153
|
const resolvedUid = cached && sameSig(cached.sig, sig) ? cached.uid : '';
|
|
127
|
-
const finalUid = resolvedUid || (await parsePdfSubmit(pdfPath)).uid;
|
|
128
|
-
pdfUidCache.set(
|
|
154
|
+
const finalUid = resolvedUid || (await parsePdfSubmit(pdfPath, { model })).uid;
|
|
155
|
+
pdfUidCache.set(cacheKey, { sig, uid: finalUid });
|
|
129
156
|
const out = await parsePdfWaitTextByUid({
|
|
130
157
|
uid: finalUid,
|
|
131
158
|
poll_interval_ms: args.poll_interval_ms,
|
|
@@ -147,6 +174,14 @@ export function registerTools(server) {
|
|
|
147
174
|
uid: z.string().min(1).describe('Doc2x parse task uid returned by doc2x_parse_pdf_submit.'),
|
|
148
175
|
to: z.enum(['md', 'tex', 'docx']),
|
|
149
176
|
formula_mode: z.enum(['normal', 'dollar']),
|
|
177
|
+
formula_level: z
|
|
178
|
+
.union([
|
|
179
|
+
z.literal(CONVERT_FORMULA_LEVELS[0]),
|
|
180
|
+
z.literal(CONVERT_FORMULA_LEVELS[1]),
|
|
181
|
+
z.literal(CONVERT_FORMULA_LEVELS[2]),
|
|
182
|
+
])
|
|
183
|
+
.optional()
|
|
184
|
+
.describe('Optional formula degradation level. Effective only when source parse uses model=v3-2026 (ignored by v2). 0: keep formulas, 1: degrade inline formulas, 2: degrade inline and block formulas.'),
|
|
150
185
|
filename: z
|
|
151
186
|
.string()
|
|
152
187
|
.describe("Optional output filename (for md/tex only). Tip: pass a basename WITHOUT extension to avoid getting 'name.md.md' / 'name.tex.tex'.")
|
|
@@ -159,14 +194,7 @@ export function registerTools(server) {
|
|
|
159
194
|
},
|
|
160
195
|
}, async (args) => {
|
|
161
196
|
try {
|
|
162
|
-
const key =
|
|
163
|
-
uid: args.uid,
|
|
164
|
-
to: args.to,
|
|
165
|
-
formula_mode: args.formula_mode,
|
|
166
|
-
filename: args.filename ?? null,
|
|
167
|
-
filename_mode: args.filename_mode ?? null,
|
|
168
|
-
merge_cross_page_forms: args.merge_cross_page_forms ?? null,
|
|
169
|
-
});
|
|
197
|
+
const key = makeConvertSubmitKey(args);
|
|
170
198
|
const res = await convertExportSubmit(args);
|
|
171
199
|
convertSubmitCache.add(key);
|
|
172
200
|
return asJsonResult(res);
|
|
@@ -196,6 +224,14 @@ export function registerTools(server) {
|
|
|
196
224
|
.enum(['md', 'tex', 'docx'])
|
|
197
225
|
.describe('Expected target format. Used to verify the result URL.'),
|
|
198
226
|
formula_mode: z.enum(['normal', 'dollar']).optional(),
|
|
227
|
+
formula_level: z
|
|
228
|
+
.union([
|
|
229
|
+
z.literal(CONVERT_FORMULA_LEVELS[0]),
|
|
230
|
+
z.literal(CONVERT_FORMULA_LEVELS[1]),
|
|
231
|
+
z.literal(CONVERT_FORMULA_LEVELS[2]),
|
|
232
|
+
])
|
|
233
|
+
.optional()
|
|
234
|
+
.describe('Optional formula degradation level used when this tool auto-submits export (formula_mode must be provided). Effective only when source parse uses model=v3-2026 (ignored by v2).'),
|
|
199
235
|
filename: z.string().optional(),
|
|
200
236
|
filename_mode: z.enum(['auto', 'raw']).optional(),
|
|
201
237
|
merge_cross_page_forms: z.boolean().optional(),
|
|
@@ -205,19 +241,13 @@ export function registerTools(server) {
|
|
|
205
241
|
}, async (args) => {
|
|
206
242
|
try {
|
|
207
243
|
if (args.formula_mode) {
|
|
208
|
-
const key =
|
|
209
|
-
uid: args.uid,
|
|
210
|
-
to: args.to,
|
|
211
|
-
formula_mode: args.formula_mode,
|
|
212
|
-
filename: args.filename ?? null,
|
|
213
|
-
filename_mode: args.filename_mode ?? null,
|
|
214
|
-
merge_cross_page_forms: args.merge_cross_page_forms ?? null,
|
|
215
|
-
});
|
|
244
|
+
const key = makeConvertSubmitKey(args);
|
|
216
245
|
if (!convertSubmitCache.has(key)) {
|
|
217
246
|
await convertExportSubmit({
|
|
218
247
|
uid: args.uid,
|
|
219
248
|
to: args.to,
|
|
220
249
|
formula_mode: args.formula_mode,
|
|
250
|
+
formula_level: args.formula_level,
|
|
221
251
|
filename: args.filename,
|
|
222
252
|
filename_mode: args.filename_mode,
|
|
223
253
|
merge_cross_page_forms: args.merge_cross_page_forms,
|
package/package.json
CHANGED
|
@@ -1,11 +1,12 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@noedgeai-org/doc2x-mcp",
|
|
3
|
-
"version": "0.1.2",
|
|
3
|
+
"version": "0.1.3-dev.2.2",
|
|
4
4
|
"description": "Doc2x MCP server (stdio, MCP SDK).",
|
|
5
5
|
"license": "MIT",
|
|
6
6
|
"engines": {
|
|
7
7
|
"node": ">=18"
|
|
8
8
|
},
|
|
9
|
+
"packageManager": "pnpm@10.26.2",
|
|
9
10
|
"type": "module",
|
|
10
11
|
"main": "dist/index.js",
|
|
11
12
|
"bin": {
|
|
@@ -16,19 +17,26 @@
|
|
|
16
17
|
"./LICENSE",
|
|
17
18
|
"./README.md",
|
|
18
19
|
"./README_EN.md",
|
|
20
|
+
"./scripts/install-skill.sh",
|
|
21
|
+
"./scripts/install-skill.ps1",
|
|
22
|
+
"./scripts/install-skill-winps.ps1",
|
|
23
|
+
"./skills/doc2x-mcp/SKILL.md",
|
|
19
24
|
"./package.json"
|
|
20
25
|
],
|
|
21
26
|
"scripts": {
|
|
22
27
|
"build": "node ./node_modules/typescript/bin/tsc -p tsconfig.json",
|
|
23
28
|
"format": "prettier --write .",
|
|
24
29
|
"format:check": "prettier --check .",
|
|
30
|
+
"skill:install": "sh scripts/install-skill.sh",
|
|
31
|
+
"skill:install:ps": "pwsh -NoProfile -ExecutionPolicy Bypass -File scripts/install-skill.ps1",
|
|
32
|
+
"skill:install:winps": "powershell -NoProfile -ExecutionPolicy Bypass -File scripts/install-skill-winps.ps1",
|
|
25
33
|
"start": "node dist/index.js",
|
|
26
|
-
"prepublishOnly": "
|
|
34
|
+
"prepublishOnly": "pnpm run build"
|
|
27
35
|
},
|
|
28
36
|
"dependencies": {
|
|
29
|
-
"@modelcontextprotocol/sdk": "
|
|
37
|
+
"@modelcontextprotocol/sdk": "1.26.0",
|
|
30
38
|
"@types/lodash": "^4.17.23",
|
|
31
|
-
"lodash": "
|
|
39
|
+
"lodash": "4.17.23",
|
|
32
40
|
"zod": "latest"
|
|
33
41
|
},
|
|
34
42
|
"devDependencies": {
|
package/scripts/install-skill-winps.ps1
ADDED
|
@@ -0,0 +1,172 @@
|
|
|
1
|
+
[CmdletBinding()]
|
|
2
|
+
Param(
|
|
3
|
+
[ValidateSet("auto", "codex", "claude")]
|
|
4
|
+
[string]$Target = "auto",
|
|
5
|
+
|
|
6
|
+
[string]$Category = "public",
|
|
7
|
+
[string]$Name = "doc2x-mcp",
|
|
8
|
+
[string]$Dest = "",
|
|
9
|
+
|
|
10
|
+
[switch]$Force,
|
|
11
|
+
[switch]$DryRun
|
|
12
|
+
)
|
|
13
|
+
|
|
14
|
+
Set-StrictMode -Version 2.0
|
|
15
|
+
$ErrorActionPreference = "Stop"
|
|
16
|
+
|
|
17
|
+
# -------------------------------------------------------------------
|
|
18
|
+
# Best-effort TLS defaults for GitHub (Windows PowerShell 5.1)
|
|
19
|
+
# -------------------------------------------------------------------
|
|
20
|
+
|
|
21
|
+
try {
|
|
22
|
+
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
|
|
23
|
+
} catch {
|
|
24
|
+
# ignore
|
|
25
|
+
}
|
|
26
|
+
|
|
27
|
+
# -------------------------------------------------------------------
|
|
28
|
+
# Environment & paths
|
|
29
|
+
# -------------------------------------------------------------------
|
|
30
|
+
|
|
31
|
+
$userHome = $HOME
|
|
32
|
+
if (-not $userHome) {
|
|
33
|
+
$userHome = $env:USERPROFILE
|
|
34
|
+
}
|
|
35
|
+
if (-not $userHome) {
|
|
36
|
+
throw 'Cannot resolve home directory ($HOME / $env:USERPROFILE).'
|
|
37
|
+
}
|
|
38
|
+
|
|
39
|
+
$codexHome = $env:CODEX_HOME
|
|
40
|
+
if (-not $codexHome) {
|
|
41
|
+
$codexHome = Join-Path $userHome ".codex"
|
|
42
|
+
}
|
|
43
|
+
|
|
44
|
+
$claudeHome = $env:CLAUDE_HOME
|
|
45
|
+
if (-not $claudeHome) {
|
|
46
|
+
$claudeHome = Join-Path $userHome ".claude"
|
|
47
|
+
}
|
|
48
|
+
|
|
49
|
+
$codexRoot = Join-Path $codexHome "skills"
|
|
50
|
+
$claudeRoot = Join-Path $claudeHome "skills"
|
|
51
|
+
|
|
52
|
+
# -------------------------------------------------------------------
|
|
53
|
+
# Helper functions
|
|
54
|
+
# -------------------------------------------------------------------
|
|
55
|
+
|
|
56
|
+
function Get-InstallRoots {
|
|
57
|
+
param(
|
|
58
|
+
[string]$Target,
|
|
59
|
+
[string]$CodexRoot,
|
|
60
|
+
[string]$ClaudeRoot
|
|
61
|
+
)
|
|
62
|
+
|
|
63
|
+
switch ($Target) {
|
|
64
|
+
"codex" { return @($CodexRoot) }
|
|
65
|
+
"claude" { return @($ClaudeRoot) }
|
|
66
|
+
default {
|
|
67
|
+
$roots = @()
|
|
68
|
+
if (Test-Path $CodexRoot) { $roots += $CodexRoot }
|
|
69
|
+
if (Test-Path $ClaudeRoot) { $roots += $ClaudeRoot }
|
|
70
|
+
if ($roots.Count -gt 0) {
|
|
71
|
+
return $roots
|
|
72
|
+
}
|
|
73
|
+
return @($CodexRoot)
|
|
74
|
+
}
|
|
75
|
+
}
|
|
76
|
+
}
|
|
77
|
+
|
|
78
|
+
function New-TempFilePath {
|
|
79
|
+
return [System.IO.Path]::Combine(
|
|
80
|
+
[System.IO.Path]::GetTempPath(),
|
|
81
|
+
([System.Guid]::NewGuid().ToString() + ".md")
|
|
82
|
+
)
|
|
83
|
+
}
|
|
84
|
+
|
|
85
|
+
# -------------------------------------------------------------------
|
|
86
|
+
# Resolve install roots
|
|
87
|
+
# -------------------------------------------------------------------
|
|
88
|
+
|
|
89
|
+
$roots = Get-InstallRoots -Target $Target -CodexRoot $codexRoot -ClaudeRoot $claudeRoot
|
|
90
|
+
|
|
91
|
+
if ($Dest -and $roots.Count -gt 1) {
|
|
92
|
+
throw "-Dest cannot be used when installing to multiple targets."
|
|
93
|
+
}
|
|
94
|
+
|
|
95
|
+
# -------------------------------------------------------------------
|
|
96
|
+
# Resolve SKILL.md source
|
|
97
|
+
# -------------------------------------------------------------------
|
|
98
|
+
|
|
99
|
+
$rawBase = $env:DOC2X_MCP_RAW_BASE
|
|
100
|
+
if (-not $rawBase) {
|
|
101
|
+
$rawBase = "https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main"
|
|
102
|
+
}
|
|
103
|
+
$remoteSkillMdUrl = "$rawBase/skills/doc2x-mcp/SKILL.md"
|
|
104
|
+
|
|
105
|
+
$localSkillMdPath = ""
|
|
106
|
+
if (Test-Path ".\skills\doc2x-mcp\SKILL.md") {
|
|
107
|
+
$localSkillMdPath = ".\skills\doc2x-mcp\SKILL.md"
|
|
108
|
+
}
|
|
109
|
+
|
|
110
|
+
# -------------------------------------------------------------------
|
|
111
|
+
# Dry run
|
|
112
|
+
# -------------------------------------------------------------------
|
|
113
|
+
|
|
114
|
+
if ($DryRun) {
|
|
115
|
+
[pscustomobject]@{
|
|
116
|
+
roots = $roots
|
|
117
|
+
remote_skill_md_url = $remoteSkillMdUrl
|
|
118
|
+
local_skill_md_path = $localSkillMdPath
|
|
119
|
+
category = $Category
|
|
120
|
+
name = $Name
|
|
121
|
+
dest = $Dest
|
|
122
|
+
} | ConvertTo-Json -Depth 4
|
|
123
|
+
return
|
|
124
|
+
}
|
|
125
|
+
|
|
126
|
+
# -------------------------------------------------------------------
|
|
127
|
+
# Download / install
|
|
128
|
+
# -------------------------------------------------------------------
|
|
129
|
+
|
|
130
|
+
$tempSkillMd = ""
|
|
131
|
+
$tempIsTemp = $false
|
|
132
|
+
|
|
133
|
+
try {
|
|
134
|
+
if ($localSkillMdPath) {
|
|
135
|
+
$tempSkillMd = $localSkillMdPath
|
|
136
|
+
} else {
|
|
137
|
+
$tempSkillMd = New-TempFilePath
|
|
138
|
+
$tempIsTemp = $true
|
|
139
|
+
Invoke-WebRequest -UseBasicParsing -Uri $remoteSkillMdUrl -OutFile $tempSkillMd | Out-Null
|
|
140
|
+
}
|
|
141
|
+
|
|
142
|
+
foreach ($root in $roots) {
|
|
143
|
+
if ($Dest) {
|
|
144
|
+
$destDir = $Dest
|
|
145
|
+
} elseif ($root -eq $codexRoot) {
|
|
146
|
+
$destDir = Join-Path (Join-Path $root $Category) $Name
|
|
147
|
+
} elseif ($root -eq $claudeRoot) {
|
|
148
|
+
$destDir = Join-Path $root $Name
|
|
149
|
+
} else {
|
|
150
|
+
$destDir = Join-Path (Join-Path $root $Category) $Name
|
|
151
|
+
}
|
|
152
|
+
|
|
153
|
+
$skillMdDest = Join-Path $destDir "SKILL.md"
|
|
154
|
+
|
|
155
|
+
if (Test-Path $destDir) {
|
|
156
|
+
if ($Dest -and (-not $Force)) {
|
|
157
|
+
throw "Destination already exists: $destDir`nRe-run with -Force to overwrite an explicit -Dest."
|
|
158
|
+
}
|
|
159
|
+
Remove-Item -Recurse -Force $destDir
|
|
160
|
+
}
|
|
161
|
+
|
|
162
|
+
New-Item -ItemType Directory -Force -Path $destDir | Out-Null
|
|
163
|
+
Copy-Item -Force $tempSkillMd $skillMdDest
|
|
164
|
+
|
|
165
|
+
Write-Output "Installed skill to: $destDir"
|
|
166
|
+
}
|
|
167
|
+
}
|
|
168
|
+
finally {
|
|
169
|
+
if ($tempIsTemp -and $tempSkillMd -and (Test-Path $tempSkillMd)) {
|
|
170
|
+
Remove-Item -Force $tempSkillMd
|
|
171
|
+
}
|
|
172
|
+
}
|
package/scripts/install-skill.ps1
ADDED
|
@@ -0,0 +1,147 @@
|
|
|
1
|
+
[CmdletBinding()]
|
|
2
|
+
Param(
|
|
3
|
+
[ValidateSet("auto", "codex", "claude")]
|
|
4
|
+
[string]$Target = "auto",
|
|
5
|
+
|
|
6
|
+
[string]$Category = "public",
|
|
7
|
+
[string]$Name = "doc2x-mcp",
|
|
8
|
+
[string]$Dest = "",
|
|
9
|
+
|
|
10
|
+
[switch]$Force,
|
|
11
|
+
[switch]$DryRun
|
|
12
|
+
)
|
|
13
|
+
|
|
14
|
+
Set-StrictMode -Version Latest
|
|
15
|
+
$ErrorActionPreference = "Stop"
|
|
16
|
+
|
|
17
|
+
# -------------------------------------------------------------------
|
|
18
|
+
# Environment & paths
|
|
19
|
+
# -------------------------------------------------------------------
|
|
20
|
+
|
|
21
|
+
$userHome = $HOME
|
|
22
|
+
if (-not $userHome) {
|
|
23
|
+
throw '$HOME is not set'
|
|
24
|
+
}
|
|
25
|
+
|
|
26
|
+
$codexHome = $env:CODEX_HOME ?? (Join-Path $userHome ".codex")
|
|
27
|
+
$claudeHome = $env:CLAUDE_HOME ?? (Join-Path $userHome ".claude")
|
|
28
|
+
|
|
29
|
+
$codexRoot = Join-Path $codexHome "skills"
|
|
30
|
+
$claudeRoot = Join-Path $claudeHome "skills"
|
|
31
|
+
|
|
32
|
+
# -------------------------------------------------------------------
|
|
33
|
+
# Helper functions
|
|
34
|
+
# -------------------------------------------------------------------
|
|
35
|
+
|
|
36
|
+
function Get-InstallRoots {
|
|
37
|
+
param(
|
|
38
|
+
[string]$Target,
|
|
39
|
+
[string]$CodexRoot,
|
|
40
|
+
[string]$ClaudeRoot
|
|
41
|
+
)
|
|
42
|
+
|
|
43
|
+
switch ($Target) {
|
|
44
|
+
"codex" { return @($CodexRoot) }
|
|
45
|
+
"claude" { return @($ClaudeRoot) }
|
|
46
|
+
default {
|
|
47
|
+
$roots = @()
|
|
48
|
+
if (Test-Path $CodexRoot) { $roots += $CodexRoot }
|
|
49
|
+
if (Test-Path $ClaudeRoot) { $roots += $ClaudeRoot }
|
|
50
|
+
return ($roots.Count -gt 0) ? $roots : @($CodexRoot)
|
|
51
|
+
}
|
|
52
|
+
}
|
|
53
|
+
}
|
|
54
|
+
|
|
55
|
+
function New-TempFilePath {
|
|
56
|
+
return [System.IO.Path]::Combine(
|
|
57
|
+
[System.IO.Path]::GetTempPath(),
|
|
58
|
+
([System.Guid]::NewGuid().ToString() + ".md")
|
|
59
|
+
)
|
|
60
|
+
}
|
|
61
|
+
|
|
62
|
+
# -------------------------------------------------------------------
|
|
63
|
+
# Resolve install roots
|
|
64
|
+
# -------------------------------------------------------------------
|
|
65
|
+
|
|
66
|
+
$roots = Get-InstallRoots -Target $Target -CodexRoot $codexRoot -ClaudeRoot $claudeRoot
|
|
67
|
+
|
|
68
|
+
if ($Dest -and $roots.Count -gt 1) {
|
|
69
|
+
throw "-Dest cannot be used when installing to multiple targets."
|
|
70
|
+
}
|
|
71
|
+
|
|
72
|
+
# -------------------------------------------------------------------
|
|
73
|
+
# Resolve SKILL.md source
|
|
74
|
+
# -------------------------------------------------------------------
|
|
75
|
+
|
|
76
|
+
$rawBase = $env:DOC2X_MCP_RAW_BASE ?? "https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main"
|
|
77
|
+
$remoteSkillMdUrl = "$rawBase/skills/doc2x-mcp/SKILL.md"
|
|
78
|
+
|
|
79
|
+
$localSkillMdPath = ""
|
|
80
|
+
if (Test-Path ".\skills\doc2x-mcp\SKILL.md") {
|
|
81
|
+
$localSkillMdPath = ".\skills\doc2x-mcp\SKILL.md"
|
|
82
|
+
}
|
|
83
|
+
|
|
84
|
+
# -------------------------------------------------------------------
|
|
85
|
+
# Dry run
|
|
86
|
+
# -------------------------------------------------------------------
|
|
87
|
+
|
|
88
|
+
if ($DryRun) {
|
|
89
|
+
[pscustomobject]@{
|
|
90
|
+
roots = $roots
|
|
91
|
+
remote_skill_md_url = $remoteSkillMdUrl
|
|
92
|
+
local_skill_md_path = $localSkillMdPath
|
|
93
|
+
category = $Category
|
|
94
|
+
name = $Name
|
|
95
|
+
dest = $Dest
|
|
96
|
+
} | ConvertTo-Json -Depth 4
|
|
97
|
+
return
|
|
98
|
+
}
|
|
99
|
+
|
|
100
|
+
# -------------------------------------------------------------------
|
|
101
|
+
# Download / install
|
|
102
|
+
# -------------------------------------------------------------------
|
|
103
|
+
|
|
104
|
+
$tempSkillMd = ""
|
|
105
|
+
$tempIsTemp = $false
|
|
106
|
+
|
|
107
|
+
try {
|
|
108
|
+
if ($localSkillMdPath) {
|
|
109
|
+
$tempSkillMd = $localSkillMdPath
|
|
110
|
+
} else {
|
|
111
|
+
$tempSkillMd = New-TempFilePath
|
|
112
|
+
$tempIsTemp = $true
|
|
113
|
+
Invoke-WebRequest -Uri $remoteSkillMdUrl -OutFile $tempSkillMd | Out-Null
|
|
114
|
+
}
|
|
115
|
+
|
|
116
|
+
foreach ($root in $roots) {
|
|
117
|
+
|
|
118
|
+
if ($Dest) {
|
|
119
|
+
$destDir = $Dest
|
|
120
|
+
} elseif ($root -eq $codexRoot) {
|
|
121
|
+
$destDir = Join-Path (Join-Path $root $Category) $Name
|
|
122
|
+
} elseif ($root -eq $claudeRoot) {
|
|
123
|
+
$destDir = Join-Path $root $Name
|
|
124
|
+
} else {
|
|
125
|
+
$destDir = Join-Path (Join-Path $root $Category) $Name
|
|
126
|
+
}
|
|
127
|
+
|
|
128
|
+
$skillMdDest = Join-Path $destDir "SKILL.md"
|
|
129
|
+
|
|
130
|
+
if (Test-Path $destDir) {
|
|
131
|
+
if ($Dest -and (-not $Force)) {
|
|
132
|
+
throw "Destination already exists: $destDir`nRe-run with -Force to overwrite an explicit -Dest."
|
|
133
|
+
}
|
|
134
|
+
Remove-Item -Recurse -Force $destDir
|
|
135
|
+
}
|
|
136
|
+
|
|
137
|
+
New-Item -ItemType Directory -Force -Path $destDir | Out-Null
|
|
138
|
+
Copy-Item -Force $tempSkillMd $skillMdDest
|
|
139
|
+
|
|
140
|
+
Write-Output "Installed skill to: $destDir"
|
|
141
|
+
}
|
|
142
|
+
}
|
|
143
|
+
finally {
|
|
144
|
+
if ($tempIsTemp -and $tempSkillMd -and (Test-Path $tempSkillMd)) {
|
|
145
|
+
Remove-Item -Force $tempSkillMd
|
|
146
|
+
}
|
|
147
|
+
}
|
package/scripts/install-skill.sh
ADDED
|
@@ -0,0 +1,195 @@
|
|
|
1
|
+
#!/bin/sh
|
|
2
|
+
set -eu
|
|
3
|
+
|
|
4
|
+
usage() {
|
|
5
|
+
cat <<'EOF'
|
|
6
|
+
Install doc2x-mcp skill into Codex CLI / Claude Code skills directory.
|
|
7
|
+
|
|
8
|
+
Usage:
|
|
9
|
+
sh install-skill.sh [--target auto|codex|claude]
|
|
10
|
+
curl -fsSL <URL>/scripts/install-skill.sh | sh
|
|
11
|
+
|
|
12
|
+
Options:
|
|
13
|
+
--target auto|codex|claude (default: auto; auto installs to both)
|
|
14
|
+
--category Codex category under skills root (default: public; ignored for Claude)
|
|
15
|
+
--name skill directory name (default: doc2x-mcp)
|
|
16
|
+
--dest explicit destination directory (overrides target/category/name)
|
|
17
|
+
--force allow overwriting when --dest points to an existing directory
|
|
18
|
+
--dry-run print planned paths only
|
|
19
|
+
|
|
20
|
+
Env:
|
|
21
|
+
CODEX_HOME override Codex home (default: ~/.codex)
|
|
22
|
+
CLAUDE_HOME override Claude home (default: ~/.claude)
|
|
23
|
+
DOC2X_MCP_RAW_BASE raw base URL (default: https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main)
|
|
24
|
+
EOF
|
|
25
|
+
}
|
|
26
|
+
|
|
27
|
+
TARGET="auto"
|
|
28
|
+
CATEGORY="public"
|
|
29
|
+
NAME="doc2x-mcp"
|
|
30
|
+
FORCE="0"
|
|
31
|
+
DRY_RUN="0"
|
|
32
|
+
DEST=""
|
|
33
|
+
|
|
34
|
+
while [ "$#" -gt 0 ]; do
|
|
35
|
+
case "$1" in
|
|
36
|
+
--target)
|
|
37
|
+
TARGET="${2:-}"
|
|
38
|
+
shift 2
|
|
39
|
+
;;
|
|
40
|
+
--category)
|
|
41
|
+
CATEGORY="${2:-}"
|
|
42
|
+
shift 2
|
|
43
|
+
;;
|
|
44
|
+
--name)
|
|
45
|
+
NAME="${2:-}"
|
|
46
|
+
shift 2
|
|
47
|
+
;;
|
|
48
|
+
--dest)
|
|
49
|
+
DEST="${2:-}"
|
|
50
|
+
shift 2
|
|
51
|
+
;;
|
|
52
|
+
--force)
|
|
53
|
+
FORCE="1"
|
|
54
|
+
shift 1
|
|
55
|
+
;;
|
|
56
|
+
--dry-run)
|
|
57
|
+
DRY_RUN="1"
|
|
58
|
+
shift 1
|
|
59
|
+
;;
|
|
60
|
+
-h|--help)
|
|
61
|
+
usage
|
|
62
|
+
exit 0
|
|
63
|
+
;;
|
|
64
|
+
*)
|
|
65
|
+
echo "Unknown arg: $1" >&2
|
|
66
|
+
usage >&2
|
|
67
|
+
exit 1
|
|
68
|
+
;;
|
|
69
|
+
esac
|
|
70
|
+
done
|
|
71
|
+
|
|
72
|
+
if [ "$TARGET" != "auto" ] && [ "$TARGET" != "codex" ] && [ "$TARGET" != "claude" ]; then
|
|
73
|
+
echo "Invalid --target: $TARGET (expected auto|codex|claude)" >&2
|
|
74
|
+
exit 1
|
|
75
|
+
fi
|
|
76
|
+
|
|
77
|
+
HOME_DIR="${HOME:-}"
|
|
78
|
+
if [ -z "$HOME_DIR" ]; then
|
|
79
|
+
echo "\$HOME is not set" >&2
|
|
80
|
+
exit 1
|
|
81
|
+
fi
|
|
82
|
+
|
|
83
|
+
CODEX_HOME_DIR="${CODEX_HOME:-$HOME_DIR/.codex}"
|
|
84
|
+
CLAUDE_HOME_DIR="${CLAUDE_HOME:-$HOME_DIR/.claude}"
|
|
85
|
+
|
|
86
|
+
CODEX_SKILLS_ROOT="$CODEX_HOME_DIR/skills"
|
|
87
|
+
CLAUDE_SKILLS_ROOT="$CLAUDE_HOME_DIR/skills"
|
|
88
|
+
|
|
89
|
+
pick_skills_roots() {
|
|
90
|
+
if [ "$TARGET" = "codex" ]; then
|
|
91
|
+
printf '%s\n' "$CODEX_SKILLS_ROOT"
|
|
92
|
+
return
|
|
93
|
+
fi
|
|
94
|
+
if [ "$TARGET" = "claude" ]; then
|
|
95
|
+
printf '%s\n' "$CLAUDE_SKILLS_ROOT"
|
|
96
|
+
return
|
|
97
|
+
fi
|
|
98
|
+
|
|
99
|
+
printf '%s\n' "$CODEX_SKILLS_ROOT"
|
|
100
|
+
printf '%s\n' "$CLAUDE_SKILLS_ROOT"
|
|
101
|
+
}
|
|
102
|
+
|
|
103
|
+
SKILLS_ROOTS="$(pick_skills_roots)"
|
|
104
|
+
|
|
105
|
+
RAW_BASE="${DOC2X_MCP_RAW_BASE:-https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main}"
|
|
106
|
+
REMOTE_SKILL_MD_URL="$RAW_BASE/skills/doc2x-mcp/SKILL.md"
|
|
107
|
+
|
|
108
|
+
LOCAL_SKILL_MD_PATH=""
|
|
109
|
+
if [ -f "./skills/doc2x-mcp/SKILL.md" ]; then
|
|
110
|
+
LOCAL_SKILL_MD_PATH="./skills/doc2x-mcp/SKILL.md"
|
|
111
|
+
fi
|
|
112
|
+
|
|
113
|
+
count_lines() {
|
|
114
|
+
echo "$1" | awk 'NF{c++} END{print c+0}'
|
|
115
|
+
}
|
|
116
|
+
|
|
117
|
+
ROOTS_COUNT="$(count_lines "$SKILLS_ROOTS")"
|
|
118
|
+
if [ -n "$DEST" ] && [ "$ROOTS_COUNT" -gt 1 ]; then
|
|
119
|
+
echo "--dest cannot be used when installing to multiple targets (auto found both Codex/Claude)." >&2
|
|
120
|
+
exit 1
|
|
121
|
+
fi
|
|
122
|
+
|
|
123
|
+
if [ "$DRY_RUN" = "1" ]; then
|
|
124
|
+
echo "skills_roots=$(echo "$SKILLS_ROOTS" | tr '\n' ' ' | sed 's/[[:space:]]*$//')"
|
|
125
|
+
echo "remote_skill_md_url=$REMOTE_SKILL_MD_URL"
|
|
126
|
+
echo "local_skill_md_path=$LOCAL_SKILL_MD_PATH"
|
|
127
|
+
echo "category=$CATEGORY"
|
|
128
|
+
echo "name=$NAME"
|
|
129
|
+
echo "dest=$DEST"
|
|
130
|
+
exit 0
|
|
131
|
+
fi
|
|
132
|
+
|
|
133
|
+
tmp_skill_md=""
|
|
134
|
+
tmp_skill_md_is_temp="0"
|
|
135
|
+
cleanup() {
|
|
136
|
+
if [ "${tmp_skill_md_is_temp:-0}" = "1" ] && [ -n "${tmp_skill_md:-}" ] && [ -f "$tmp_skill_md" ]; then
|
|
137
|
+
rm -f "$tmp_skill_md"
|
|
138
|
+
fi
|
|
139
|
+
}
|
|
140
|
+
trap cleanup EXIT INT TERM
|
|
141
|
+
|
|
142
|
+
prepare_skill_md() {
|
|
143
|
+
if [ -n "$LOCAL_SKILL_MD_PATH" ]; then
|
|
144
|
+
tmp_skill_md="$LOCAL_SKILL_MD_PATH"
|
|
145
|
+
tmp_skill_md_is_temp="0"
|
|
146
|
+
return
|
|
147
|
+
fi
|
|
148
|
+
tmp_skill_md="$(mktemp -t doc2x-mcp-skill.XXXXXX 2>/dev/null || mktemp)"
|
|
149
|
+
tmp_skill_md_is_temp="1"
|
|
150
|
+
if command -v curl >/dev/null 2>&1; then
|
|
151
|
+
curl -fsSL "$REMOTE_SKILL_MD_URL" -o "$tmp_skill_md"
|
|
152
|
+
return
|
|
153
|
+
fi
|
|
154
|
+
if command -v wget >/dev/null 2>&1; then
|
|
155
|
+
wget -qO "$tmp_skill_md" "$REMOTE_SKILL_MD_URL"
|
|
156
|
+
return
|
|
157
|
+
fi
|
|
158
|
+
echo "Neither curl nor wget found; cannot download SKILL.md" >&2
|
|
159
|
+
exit 1
|
|
160
|
+
}
|
|
161
|
+
|
|
162
|
+
install_to_root() {
|
|
163
|
+
root="$1"
|
|
164
|
+
if [ -n "$DEST" ]; then
|
|
165
|
+
dest_dir="$DEST"
|
|
166
|
+
elif [ "$root" = "$CODEX_SKILLS_ROOT" ]; then
|
|
167
|
+
dest_dir="$root/$CATEGORY/$NAME"
|
|
168
|
+
elif [ "$root" = "$CLAUDE_SKILLS_ROOT" ]; then
|
|
169
|
+
dest_dir="$root/$NAME"
|
|
170
|
+
else
|
|
171
|
+
dest_dir="$root/$CATEGORY/$NAME"
|
|
172
|
+
fi
|
|
173
|
+
|
|
174
|
+
if [ -e "$dest_dir" ]; then
|
|
175
|
+
if [ -n "$DEST" ] && [ "$FORCE" != "1" ]; then
|
|
176
|
+
echo "Destination already exists: $dest_dir" >&2
|
|
177
|
+
echo "Re-run with --force to overwrite an explicit --dest." >&2
|
|
178
|
+
exit 1
|
|
179
|
+
fi
|
|
180
|
+
rm -rf "$dest_dir"
|
|
181
|
+
fi
|
|
182
|
+
|
|
183
|
+
mkdir -p "$dest_dir"
|
|
184
|
+
cp "$tmp_skill_md" "$dest_dir/SKILL.md"
|
|
185
|
+
echo "Installed skill to: $dest_dir"
|
|
186
|
+
}
|
|
187
|
+
|
|
188
|
+
prepare_skill_md
|
|
189
|
+
|
|
190
|
+
while IFS= read -r root; do
|
|
191
|
+
[ -n "$root" ] || continue
|
|
192
|
+
install_to_root "$root"
|
|
193
|
+
done <<EOF
|
|
194
|
+
$SKILLS_ROOTS
|
|
195
|
+
EOF
|
|
@@ -0,0 +1,169 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: doc2x-mcp
|
|
3
|
+
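description: Use the Doc2x MCP tools to parse and convert documents. Run OCR and layout analysis on PDFs, scanned documents, and images, extract text and tables, export to Markdown/LaTeX (TeX)/DOCX, and download the results to disk (submit/status/wait/export/download). Use this skill when the user mentions PDF/pdfs, scanned PDF, OCR, image-to-text, extract text/tables, table extraction, document conversion/convert, export, Markdown, LaTeX/TeX, DOCX, doc2x, doc2x-mcp, or MCP.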
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Doc2x MCP Tool-Use Skill (for LLM)
|
|
7
|
+
|
|
8
|
+
## What you need to do
|
|
9
|
+
|
|
10
|
+
You are an assistant that calls MCP tools. Any "parse/extract/export/download" operation on a PDF or image must be carried out as a real operation through the `doc2x-mcp` tools:
|
|
11
|
+
|
|
12
|
+
- Do not guess or fabricate a `uid`, a `url`, file contents, or export results
|
|
13
|
+
- Do not skip tool steps and output content that merely "looks plausible"
|
|
14
|
+
|
|
15
|
+
## Global constraints (mandatory)
|
|
16
|
+
|
|
17
|
+
1. Paths must be absolute
|
|
18
|
+
`pdf_path` / `image_path` / `output_path` / `output_dir` should all be absolute paths; the server may resolve a relative path against an unexpected cwd, which causes the call to fail.
|
|
19
|
+
|
|
20
|
+
2. File extension constraints
|
|
21
|
+
`doc2x_parse_pdf_submit.pdf_path` must end with `.pdf`; image parsing uses `png/jpg`.
|
|
22
|
+
|
|
23
|
+
3. Do not submit duplicate exports concurrently
|
|
24
|
+
For a given `uid`, do not submit the same export configuration (`to + formula_mode + formula_level (+ filename + filename_mode + merge_cross_page_forms...)`) more than once in parallel.
|
|
25
|
+
|
|
26
|
+
4. Do not leak the API key
|
|
27
|
+
Never echo or log `DOC2X_API_KEY`. For troubleshooting, use only the `apiKeyLen/apiKeyPrefix/apiKeySource` fields from `doc2x_debug_config`.
|
|
28
|
+
|
|
29
|
+
5. Do not fabricate download URLs
|
|
30
|
+
Downloads must use the `url` returned by `doc2x_convert_export_*`; do not construct one yourself.
|
|
31
|
+
|
|
32
|
+
6. Where each parameter takes effect
|
|
33
|
+
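`model` applies only to PDF parse submission (default `v2`, optional `v3-2026`); `formula_level` applies only to export (`doc2x_convert_export_*`) and takes effect only when the source parse task used `v3-2026` (it has no effect under `v2`).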
|
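Constraints 1 and 2 can be checked before any tool call is made. The following is a minimal sketch in POSIX shell, assuming POSIX-style paths (a leading `/`) and a case-sensitive `.pdf` suffix. `is_valid_pdf_path` is a hypothetical helper, not part of doc2x-mcp, and Windows paths would need different handling.

```shell
# Hypothetical pre-flight check (not part of doc2x-mcp).
# Succeeds only if the path is absolute (leading "/") and ends in ".pdf".
is_valid_pdf_path() {
  case "$1" in
    /*.pdf) return 0 ;;
    *)      return 1 ;;
  esac
}

if is_valid_pdf_path "/home/user/docs/paper.pdf"; then
  echo "ok: safe to pass as pdf_path"
fi
if ! is_valid_pdf_path "docs/paper.pdf"; then
  echo "rejected: ask the user for an absolute .pdf path"
fi
```

A failed check is a reason to ask the user for a corrected path, not to guess one.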
|
34
|
+
|
|
35
|
+
## Key parameter semantics (to avoid misuse)
|
|
36
|
+
|
|
37
|
+
- `doc2x_parse_pdf_submit` / `doc2x_parse_pdf_wait_text(pdf_path submission branch)`
|
|
38
|
+
- Optional `model: "v3-2026"`; defaults to `v2` when omitted.
|
|
39
|
+
- `doc2x_convert_export_submit` / `doc2x_convert_export_wait`
|
|
40
|
+
- `formula_mode`: `"normal"` or `"dollar"` (a key parameter; always passing it explicitly is recommended).
|
|
41
|
+
- `formula_level`: `0 | 1 | 2` (optional)
|
|
42
|
+
- `0`: do not degrade formulas (keep the original Markdown)
|
|
43
|
+
- `1`: degrade inline formulas to plain text (`\(...\)`, `$...$`)
|
|
44
|
+
- `2`: degrade both inline and block formulas to plain text (`\(...\)`, `$...$`, `\[...\]`, `$$...$$`)
|
|
45
|
+
|
|
46
|
+
## Tool selection (by user goal)
|
|
47
|
+
|
|
48
|
+
- **PDF parse task**: `doc2x_parse_pdf_submit` → `doc2x_parse_pdf_status`
|
|
49
|
+
- **Short preview/summary**: `doc2x_parse_pdf_wait_text` (output may be truncated; export a file to get the full content)
|
|
50
|
+
- **Export a file (md/tex/docx)**: `doc2x_convert_export_submit` → `doc2x_convert_export_wait` (or call `doc2x_convert_export_wait` directly to use the compatibility-mode one-step export)
|
|
51
|
+
- **Download to disk**: `doc2x_download_url_to_file`
|
|
52
|
+
- **Image layout parsing**: `doc2x_parse_image_layout_sync`, or `doc2x_parse_image_layout_submit` → `doc2x_parse_image_layout_wait_text`
|
|
53
|
+
- **Unpack the resource zip**: `doc2x_materialize_convert_zip`
|
|
54
|
+
- **Configuration troubleshooting**: `doc2x_debug_config`
|
|
55
|
+
|
|
56
|
+
## Standard workflows (follow them)
|
|
57
|
+
|
|
58
|
+
### Workflow A: batch PDF → exported files (MD/TEX/DOCX, efficient parallel version)
|
|
59
|
+
|
|
60
|
+
Use this for "export multiple PDFs in bulk and save them to disk (.md / .tex / .docx)". Core principles:
|
|
61
|
+
|
|
62
|
+
- `doc2x_parse_pdf_submit` can run in parallel (batch submission)
|
|
63
|
+
- `doc2x_parse_pdf_status` can run in parallel (batch polling)
|
|
64
|
+
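- **Pipelined parallelism**: as soon as a `uid` has parsed successfully, start its export and download immediately (no need to wait for every PDF to finish parsing)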
|
|
65
|
+
- Exports and downloads for different `uid`s can run in parallel
|
|
66
|
+
- **Do not submit the same export configuration (`to + formula_mode + formula_level (+ filename + filename_mode + merge_cross_page_forms...)`) for the same `uid` more than once in parallel**
|
|
67
|
+
- To export one `uid` in several formats (for example md + docx + tex), run the formats **serially**; different `uid`s can still run in parallel
|
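The "no duplicate export submission" rule amounts to keeping a set of already-submitted configurations. The sketch below shows one way to do that in POSIX shell. `claim_export` and `SUBMITTED_EXPORTS` are hypothetical names, not part of doc2x-mcp, and the key covers only `uid`, `to`, `formula_mode`, and `formula_level`; a fuller key would also include `filename`, `filename_mode`, and the other export options.

```shell
# Hypothetical bookkeeping (not part of doc2x-mcp): remember which export
# configurations have already been submitted in this session.
# Assumption: no field contains a space or "|".
SUBMITTED_EXPORTS=""

# Usage: claim_export UID TO FORMULA_MODE FORMULA_LEVEL
# Returns 0 and records the configuration the first time it is seen,
# returns 1 if the same configuration was already claimed.
claim_export() {
  key="$1|$2|$3|$4"
  case " $SUBMITTED_EXPORTS " in
    *" $key "*) return 1 ;;
  esac
  SUBMITTED_EXPORTS="$SUBMITTED_EXPORTS $key"
  return 0
}

claim_export uid-123 md normal 0 && echo "submit md export"
claim_export uid-123 md normal 0 || echo "skip: md export already submitted"
claim_export uid-123 docx normal 0 && echo "submit docx export"
```

Call `claim_export` directly rather than inside `$(...)`, because a subshell would discard the updated `SUBMITTED_EXPORTS` value.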
|
68
|
+
|
|
69
|
+
**Submit parse tasks in bulk (parallel)**
|
|
70
|
+
|
|
71
|
+
- For each `pdf_path`, call `doc2x_parse_pdf_submit({ pdf_path, model? })` → `{ uid }`
|
|
72
|
+
|
|
73
|
+
**Wait for parsing to finish (parallel)**
|
|
74
|
+
|
|
75
|
+
- For each `uid`, poll `doc2x_parse_pdf_status({ uid })` until `status="success"`
|
|
76
|
+
- If `status="failed"`: report `detail` and stop all further steps for that file
|
|
77
|
+
|
|
78
|
+
**Export the target format (parallel, per uid)**
|
|
79
|
+
|
|
80
|
+
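Prefer `doc2x_convert_export_wait` in "compatibility-mode one-step export" (when you pass `formula_mode` and this process has not yet submitted that export, it submits once automatically and then waits), so you do not have to split the call into submit + wait yourself: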
|
|
81
|
+
|
|
82
|
+
- DOCX: `doc2x_convert_export_wait({ uid, to: "docx", formula_mode: "normal", formula_level? })` → `{ status: "success", url }`
|
|
83
|
+
- Markdown: `doc2x_convert_export_wait({ uid, to: "md", formula_mode: "normal", formula_level?, filename?, filename_mode? })` → `{ status: "success", url }`
|
|
84
|
+
- LaTeX: `doc2x_convert_export_wait({ uid, to: "tex", formula_mode: "dollar", formula_level? })` → `{ status: "success", url }`
|
|
85
|
+
|
|
86
|
+
(Or use two explicit steps: `doc2x_convert_export_submit(...)` → `doc2x_convert_export_wait({ uid, to })`)
|
|
87
|
+
|
|
88
|
+
**Additional recommendations**
|
|
89
|
+
|
|
90
|
+
- `formula_mode` is a key parameter: always pass it explicitly (`"normal"` / `"dollar"`, chosen by user preference; commonly `md/docx` use `"normal"` and `tex` uses `"dollar"`)
|
|
91
|
+
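- When formula degradation is needed, pass `formula_level` explicitly (`0/1/2`); when it is not needed, passing `0` explicitly is recommended, to avoid ambiguity about the caller's default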
|
|
92
|
+
- `filename`/`filename_mode` are mainly for `md/tex`: pass a basename without an extension, together with `filename_mode: "auto"` (this avoids `name.md.md` / `name.tex.tex`)
|
|
93
|
+
- When exporting one `uid` in several formats, decide the order first (for example md, then docx) and finish each format before starting the next
|
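The `filename` recommendation comes down to stripping a known extension before the value is passed on. A minimal sketch in POSIX shell follows. `export_basename` is a hypothetical helper, not part of doc2x-mcp, and the list of extensions it strips is an assumption chosen for illustration.

```shell
# Hypothetical helper (not part of doc2x-mcp): derive an extension-free
# basename suitable for the `filename` argument, so that a name such as
# report.md is not passed on and turned into report.md.md.
export_basename() {
  name=$(basename "$1")
  case "$name" in
    *.pdf|*.md|*.tex|*.docx) name=${name%.*} ;;
  esac
  printf '%s\n' "$name"
}

export_basename "/data/in/report.pdf"   # prints: report
export_basename "notes.md"              # prints: notes
```

Names without one of the listed extensions pass through unchanged, so a value such as `v1.2-draft` keeps its dot.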
|
94
|
+
|
|
95
|
+
**Batch download (parallel)**
|
|
96
|
+
|
|
97
|
+
- `doc2x_download_url_to_file({ url, output_path })` → `{ output_path, bytes_written }`
|
|
98
|
+
- `output_path` must be an absolute path and unique per file (suggested: the original file name plus the matching extension, `.md` / `.tex` / `.docx`)
|
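Deriving `output_path` from the input file name can produce collisions when two PDFs in different directories share a name. The sketch below, in POSIX shell, builds candidate paths and detects duplicates before any download starts. `output_path_for` and `has_duplicates` are hypothetical helpers, not part of doc2x-mcp, and the output directory is assumed to be absolute already.

```shell
# Hypothetical helpers (not part of doc2x-mcp).

# Usage: output_path_for PDF_PATH OUTPUT_DIR EXT
# Prints OUTPUT_DIR/<pdf basename without .pdf>.EXT
output_path_for() {
  printf '%s/%s.%s\n' "$2" "$(basename "$1" .pdf)" "$3"
}

# Reads one path per line on stdin; succeeds if any path occurs twice.
has_duplicates() {
  [ -n "$(sort | uniq -d)" ]
}

paths=$(
  output_path_for /in/2023/report.pdf /out md
  output_path_for /in/2024/report.pdf /out md
)
if printf '%s\n' "$paths" | has_duplicates; then
  echo "collision: pick distinct output_path values before downloading"
fi
```

Here both inputs map to `/out/report.md`, so the collision branch runs; adding a distinguishing prefix or suffix to one of the names would resolve it.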
|
99
|
+
|
|
100
|
+
**Concurrency recommendations**
|
|
101
|
+
|
|
102
|
+
- Up to 10 PDFs can usually be processed in parallel directly; for more files, batch or throttle the work (to avoid timeouts and rate limiting)
|
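Splitting a long input list into batches can be sketched as follows in POSIX shell. `assign_batches` and `BATCH_SIZE` are hypothetical names, not part of doc2x-mcp, and the value 10 simply mirrors the rule of thumb in this section rather than any documented limit.

```shell
# Hypothetical batching helper (not part of doc2x-mcp).
# Reads one PDF path per line on stdin and prints "<batch number><TAB><path>".
BATCH_SIZE=10

assign_batches() {
  awk -v size="$BATCH_SIZE" '{ printf "%d\t%s\n", int((NR - 1) / size) + 1, $0 }'
}

printf '%s\n' /in/a.pdf /in/b.pdf /in/c.pdf | assign_batches
```

With 25 input paths this yields batches 1, 2, and 3; each batch can be submitted and polled in parallel before the next one starts.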
|
103
|
+
|
|
104
|
+
**Report to the user (summarized per file)**
|
|
105
|
+
|
|
106
|
+
- Success: list the `output_path` and `bytes_written` for each input file
|
|
107
|
+
- Failure: list the failed files and the reason for each (including `uid` and `detail`/error code), and state that the other files are unaffected
|
|
108
|
+
|
|
109
|
+
### Workflow B: PDF → Markdown file (recommended)
|
|
110
|
+
|
|
111
|
+
When the user's goal is "get the complete Markdown / save it to disk", the main path should be export plus download; do not rely on `doc2x_parse_pdf_wait_text`.
|
|
112
|
+
|
|
113
|
+
**Submit the parse task**
|
|
114
|
+
|
|
115
|
+
- `doc2x_parse_pdf_submit({ pdf_path, model? })` → `{ uid }`
|
|
116
|
+
|
|
117
|
+
**Wait for parsing to finish**
|
|
118
|
+
|
|
119
|
+
- Poll `doc2x_parse_pdf_status({ uid })` until `status="success"` (on failure, report it together with `detail`)
|
|
120
|
+
|
|
121
|
+
**Export Markdown**
|
|
122
|
+
|
|
123
|
+
- `doc2x_convert_export_wait({ uid, to: "md", formula_mode: "normal", formula_level?, filename?, filename_mode? })` → `{ status: "success", url }`
|
|
124
|
+
|
|
125
|
+
**Download to disk**
|
|
126
|
+
|
|
127
|
+
- `doc2x_download_url_to_file({ url, output_path })` → `{ output_path, bytes_written }`
|
|
128
|
+
|
|
129
|
+
**Report to the user**
|
|
130
|
+
|
|
131
|
+
- Tell the user the saved path, the file size, and the `uid` (attach the `url` when necessary)
|
|
132
|
+
|
|
133
|
+
### Workflow C: PDF → text preview (controllable length)
|
|
134
|
+
|
|
135
|
+
Use this only when the user needs a "summary / short preview":
|
|
136
|
+
|
|
137
|
+
- `doc2x_parse_pdf_wait_text({ pdf_path | uid, max_output_chars?, max_output_pages? })`
|
|
138
|
+
|
|
139
|
+
If the response contains a truncation notice (`[doc2x-mcp] Output truncated ...`), switch to Workflow B and export md to get the full content.
|
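Detecting the truncation notice is a plain substring match on the returned text. A minimal POSIX shell sketch follows. `is_truncated` is a hypothetical helper, not part of doc2x-mcp, and it assumes the notice begins with the literal text quoted in this section.

```shell
# Hypothetical check (not part of doc2x-mcp): does the returned text carry
# the "[doc2x-mcp] Output truncated" notice?
is_truncated() {
  case "$1" in
    *"[doc2x-mcp] Output truncated"*) return 0 ;;
    *) return 1 ;;
  esac
}

preview='# Title
first paragraph
[doc2x-mcp] Output truncated ...'
if is_truncated "$preview"; then
  echo "truncated: export md (Workflow B) for the full content"
fi
```

The pattern is quoted, so the square brackets are matched literally instead of being read as a bracket expression.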
|
140
|
+
|
|
141
|
+
### Workflow D: PDF → LaTeX / DOCX
|
|
142
|
+
|
|
143
|
+
- LaTeX: set `to` to `"tex"`
|
|
144
|
+
- Word: set `to` to `"docx"`
|
|
145
|
+
- The call chain is the same as in Workflow A / B (parse → export → download); only `to` changes (adjust `formula_mode/formula_level/filename` when necessary)
|
|
146
|
+
- Note: `doc2x_convert_export_submit.formula_mode` is required (`"normal"` or `"dollar"`); `formula_level` is optional (`0/1/2`)
|
|
147
|
+
|
|
148
|
+
### Workflow E: image → Markdown (layout parsing)
|
|
149
|
+
|
|
150
|
+
- Result only (synchronous): `doc2x_parse_image_layout_sync({ image_path })` (returns the raw JSON, which may include `convert_zip`)
|
|
151
|
+
- Markdown preview (first screen, asynchronous): `doc2x_parse_image_layout_submit({ image_path })` → `doc2x_parse_image_layout_wait_text({ uid })`
|
|
152
|
+
|
|
153
|
+
If the result contains `convert_zip` (base64) and the user wants the resource files saved to disk:
|
|
154
|
+
|
|
155
|
+
- `doc2x_materialize_convert_zip({ convert_zip_base64, output_dir })` → `{ output_dir, zip_path, extracted }`
|
|
156
|
+
|
|
157
|
+
## Failures and troubleshooting (how you should handle them)
|
|
158
|
+
|
|
159
|
+
1. Authentication/configuration errors
|
|
160
|
+
Run `doc2x_debug_config()` first and confirm that `apiKeyLen > 0` and that `baseUrl/httpTimeoutMs/pollIntervalMs/maxWaitMs` are reasonable.
|
|
161
|
+
|
|
162
|
+
2. Wait timeout
|
|
163
|
+
Advise the user to increase `DOC2X_MAX_WAIT_MS`, or to adjust `DOC2X_POLL_INTERVAL_MS` as needed (do not poll too frequently).
|
|
164
|
+
|
|
165
|
+
3. Download blocked (security policy)
|
|
166
|
+
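`doc2x_download_url_to_file` allows only `https` and requires the host to be in `DOC2X_DOWNLOAD_URL_ALLOWLIST`; when a download is blocked, explain why and let the user choose between "adding the host to the allowlist" and "keeping the default security policy".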
|
|
167
|
+
|
|
168
|
+
4. The user supplied a relative or uncertain path
|
|
169
|
+
Ask the user for an absolute path; do not guess.
|
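To explain a blocked download to the user, it can help to reproduce the two conditions from item 3 (scheme is `https`, host is on the allowlist) on the client side. The sketch below is in POSIX shell and is only an illustration: `url_allowed` is a hypothetical helper, not part of doc2x-mcp, and it assumes the allowlist is a comma-separated list of lowercase host names, which may not match the real format of `DOC2X_DOWNLOAD_URL_ALLOWLIST`. The server's own check remains authoritative.

```shell
# Hypothetical client-side pre-check (not part of doc2x-mcp).
# Assumptions: the allowlist is a comma-separated list of host names, hosts
# are compared case-sensitively, and IPv6 literals are out of scope.
# Usage: url_allowed URL ALLOWLIST
url_allowed() {
  case "$1" in
    https://*) ;;
    *) return 1 ;;
  esac
  host=${1#https://}
  host=${host%%/*}     # drop the path, if any
  host=${host%%\?*}    # drop a query string that follows the host directly
  host=${host##*@}     # drop userinfo, if any
  host=${host%%:*}     # drop the port, if any
  [ -n "$host" ] || return 1
  case ",$2," in
    *",$host,"*) return 0 ;;
    *) return 1 ;;
  esac
}

allow="cdn.example.com,files.example.com"
url_allowed "https://files.example.com/exports/a.zip" "$allow" && echo "allowed"
url_allowed "http://files.example.com/exports/a.zip" "$allow" || echo "blocked: not https"
url_allowed "https://other.example.net/a.zip" "$allow" || echo "blocked: host not in allowlist"
```

The example host names are placeholders; the actual hosts should come from the `url` returned by the export tools and from the user's own configuration.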