npm - @noedgeai-org/doc2x-mcp - Versions diffs - 0.1.3-dev.6.1 → 0.1.4-dev.8.1 - Mend

@noedgeai-org/doc2x-mcp 0.1.3-dev.6.1 → 0.1.4-dev.8.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11) hide show

package/README.md +179 -57
package/README_EN.md +181 -59
package/dist/doc2x/materialize.d.ts +15 -0
package/dist/doc2x/materialize.js +52 -0
package/dist/doc2x/pdf.d.ts +12 -1
package/dist/doc2x/pdf.js +19 -10
package/dist/mcp/registerPdfTools.js +66 -4
package/dist/mcp/registerToolsShared.d.ts +2 -0
package/dist/mcp/registerToolsShared.js +7 -2
package/package.json +2 -2
package/skills/doc2x-mcp/SKILL.md +78 -135

package/README.md CHANGED Viewed

@@ -1,29 +1,47 @@
 # Doc2x MCP Server
+[![CI](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/ci.yml/badge.svg)](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/ci.yml)
+[![Publish](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/publish.yml/badge.svg)](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/publish.yml)
+[![npm version](https://img.shields.io/npm/v/%40noedgeai-org%2Fdoc2x-mcp)](https://www.npmjs.com/package/@noedgeai-org/doc2x-mcp)
 简体中文 | [English](./README_EN.md)
-本项目提供一个基于 stdio 的 MCP Server，把 Doc2x v2 的 PDF/图片接口封装成语义化 tools。
+将 Doc2x v2 PDF/图片能力封装为基于 stdio 的 MCP Server，提供稳定、可组合的语义化 tools。
+## 目录
+- [项目定位](#项目定位)
+- [版本与环境](#版本与环境)
+- [快速开始](#快速开始)
+- [配置参考](#配置参考)
+- [Tool API 总览](#tool-api-总览)
+- [常见工作流](#常见工作流)
+- [本地开发](#本地开发)
+- [CI / 发布流水线](#ci--发布流水线)
+- [如何发布](#如何发布)
+- [安装本仓库 Skill（可选）](#安装本仓库-skill可选)
+- [安全与排错](#安全与排错)
+- [问题反馈](#问题反馈)
+- [License](#license)
-## 1) 运行环境
+## 项目定位
-- Node.js >= 18
+- 面向 MCP 客户端（Codex CLI / Claude Code / 自定义 Agent）提供 Doc2x 能力。
+- 以 submit/status/wait 统一异步任务模型，便于自动化编排。
+- 提供可控超时、轮询、下载白名单等运行时安全边界。
-## 2) 配置
+## 版本与环境
-通过环境变量配置：
+- 本地运行：Node.js `>=18` 即可。
+- CI 校验：Node.js `18`、`20` 都会跑构建。
+- 发布环境：GitHub Actions 中发布任务使用 Node.js `24`。
+- 包管理器：统一用 pnpm（锁文件 `pnpm-lock.yaml`）。
-- `DOC2X_API_KEY`：必填（形如 `sk-xxx`）
-- `DOC2X_BASE_URL`：可选，默认 `https://v2.doc2x.noedgeai.com`
-- `DOC2X_HTTP_TIMEOUT_MS`：可选，默认 `60000`
-- `DOC2X_POLL_INTERVAL_MS`：可选，默认 `2000`
-- `DOC2X_MAX_WAIT_MS`：可选，默认 `600000`
-- `DOC2X_PARSE_PDF_MAX_OUTPUT_CHARS`：可选，默认 `5000`；限制 `doc2x_parse_pdf_wait_text` 返回文本的最大字符数，避免大模型上下文超限（设为 `0` 表示不限制）
-- `DOC2X_PARSE_PDF_MAX_OUTPUT_PAGES`：可选，默认 `10`；限制 `doc2x_parse_pdf_wait_text` 合并的最大页数（设为 `0` 表示不限制）
-- `DOC2X_DOWNLOAD_URL_ALLOWLIST`：可选，默认 `".amazonaws.com.cn,.aliyuncs.com,.noedgeai.com"`；设为 `*` 可允许任意 host（不推荐）
+## 快速开始
-## 3) 启动
+### 方式 A：通过 npx（推荐）
-### 方式 A：通过 npx
+在 MCP client 配置中添加：
 ```json
 {
@@ -40,11 +58,13 @@
 ```bash
 cd doc2x-mcp
-npm install
-npm run build
-DOC2X_API_KEY=sk-xxx npm start
+pnpm install --frozen-lockfile
+pnpm run build
+DOC2X_API_KEY=sk-xxx pnpm start
 ```
+MCP client 指向本地构建产物：
 ```json
 {
   "command": "node",
@@ -56,27 +76,34 @@ DOC2X_API_KEY=sk-xxx npm start
 }
 ```
-## 4) Tools
-- `doc2x_parse_pdf_submit`
-- `doc2x_parse_pdf_status`
-- `doc2x_parse_pdf_wait_text`
-- `doc2x_convert_export_submit`
-- `doc2x_convert_export_result`
-- `doc2x_convert_export_wait`
-- `doc2x_download_url_to_file`
-- `doc2x_parse_image_layout_sync`
-- `doc2x_parse_image_layout_submit`
-- `doc2x_parse_image_layout_status`
-- `doc2x_parse_image_layout_wait_text`
-- `doc2x_materialize_convert_zip`
-- `doc2x_debug_config`
+## 配置参考
+| 环境变量 | 必填 | 默认值 | 说明 |
+| --- | --- | --- | --- |
+| `DOC2X_API_KEY` | 是 | - | Doc2x API Key（`sk-xxx`） |
+| `DOC2X_BASE_URL` | 否 | `https://v2.doc2x.noedgeai.com` | Doc2x API 基础地址 |
+| `DOC2X_HTTP_TIMEOUT_MS` | 否 | `60000` | 单次 HTTP 超时（毫秒） |
+| `DOC2X_POLL_INTERVAL_MS` | 否 | `2000` | 轮询间隔（毫秒） |
+| `DOC2X_MAX_WAIT_MS` | 否 | `600000` | wait 类工具最大等待时长（毫秒） |
+| `DOC2X_PARSE_PDF_MAX_OUTPUT_CHARS` | 否 | `5000` | `doc2x_parse_pdf_wait_text` 最大返回字符数；`0`=不限制 |
+| `DOC2X_PARSE_PDF_MAX_OUTPUT_PAGES` | 否 | `10` | `doc2x_parse_pdf_wait_text` 最大合并页数；`0`=不限制 |
+| `DOC2X_DOWNLOAD_URL_ALLOWLIST` | 否 | `.amazonaws.com.cn,.aliyuncs.com,.noedgeai.com` | 下载 URL 白名单；`*` 允许任意 host（不推荐） |
+## Tool API 总览
+| 阶段 | Tools | 说明 |
+| --- | --- | --- |
+| PDF 解析 | `doc2x_parse_pdf_submit` / `doc2x_parse_pdf_status` / `doc2x_parse_pdf_wait_text` / `doc2x_materialize_pdf_layout_json` | 提交任务、查询状态、等待并取文本，或将 v3 layout 结果落盘为本地 JSON |
+| 结果导出 | `doc2x_convert_export_submit` / `doc2x_convert_export_result` / `doc2x_convert_export_wait` | 发起导出、查结果、等待导出完成 |
+| 下载落盘 | `doc2x_download_url_to_file` / `doc2x_materialize_convert_zip` | 下载 URL 到本地、解包 convert zip |
+| 图片版面解析 | `doc2x_parse_image_layout_sync` / `doc2x_parse_image_layout_submit` / `doc2x_parse_image_layout_status` / `doc2x_parse_image_layout_wait_text` | 同步/异步图片 OCR 与版面解析 |
+| 诊断 | `doc2x_debug_config` | 返回配置解析与 API key 来源，便于排错 |
 ### PDF 解析模型（`doc2x_parse_pdf_submit` / `doc2x_parse_pdf_wait_text`）
 - 可选参数：`model`
-- 可选值：仅 `v3-2026`（最新模型）
-- 说明：不传 `model` 时默认使用 `v2`；若想体验最新模型，传：
+- 可选值：`v2`（默认） / `v3-2026`（最新模型）
+- 不传时默认 `v2`
 ```json
 {
@@ -84,22 +111,110 @@ DOC2X_API_KEY=sk-xxx npm start
 }
 ```
+### PDF Layout JSON 落盘（`doc2x_materialize_pdf_layout_json`）
+- 必选参数：`output_path`
+- `uid` 与 `pdf_path` 二选一
+- `v2` 不支持 `layout`；需要 `pages[].layout` 时请使用 `v3-2026`
+- 若传 `pdf_path` 但不传 `model`，该工具默认使用 `v3-2026`
+- 成功时将原始 `result` JSON 写到本地
+`layout` 是页面块结构和坐标信息，适合 figure/table 裁剪、区域高亮、结构化抽取和版面分析；如果只想看正文内容，优先使用 Markdown / DOCX 导出。
+```json
+{
+  "pdf_path": "/absolute/path/to/input.pdf",
+  "output_path": "/absolute/path/to/input_v3.layout.json"
+}
+```
 ### 导出公式参数（`doc2x_convert_export_submit` / `doc2x_convert_export_wait`）
 - 必选参数：`formula_mode`（`normal` / `dollar`）
-- 可选参数：`formula_level`（`int32`，仅源解析任务为 `model=v3-2026` 时生效，`v2` 下无效）
+- 可选参数：`formula_level`（仅源解析任务为 `model=v3-2026` 时生效）
 - 取值说明：
-  - `0`：不退化公式（保留原始 Markdown）
-  - `1`：行内公式变为普通文本（退化 `\\(...\\)` 和 `$...$`）
-  - `2`：全部公式变为普通文本（退化 `\\(...\\)`、`$...$`、`\\[...\\]`、`$$...$$`）
+  - `0`：保留公式
+  - `1`：仅退化行内公式（`\\(...\\)`、`$...$`）
+  - `2`：退化全部公式（`\\(...\\)`、`$...$`、`\\[...\\]`、`$$...$$`）
-## 5) 协议
+## 常见工作流
-MIT License，详见 `LICENSE`。
+### 工作流 1：PDF -> Markdown 本地文件
+1. `doc2x_parse_pdf_submit` 提交 PDF 解析。
+2. `doc2x_convert_export_wait` 等待导出（`to=md`，并指定 `formula_mode`）。
+3. `doc2x_convert_export_result` 获取下载 URL。
+4. `doc2x_download_url_to_file` 下载到目标路径。
+### 工作流 2：图片版面 OCR 快速结果
+1. `doc2x_parse_image_layout_sync` 直接同步解析。
+2. 若需要稳态轮询，改用 submit/status/wait 组合。
+### 工作流 3：PDF -> v3 layout JSON 本地文件
+1. 调用 `doc2x_materialize_pdf_layout_json`，传入 `pdf_path` 和 `output_path`。
+2. 工具会等待 parse 成功，并将原始 `result` JSON 落到本地。
+3. 该 JSON 可直接提供给后续 figure/table 裁剪脚本使用。
+## 本地开发
+### 环境要求
+- Node.js `>=18`
+- pnpm（与仓库锁文件一致）
+### 常用命令
+```bash
+pnpm install --frozen-lockfile
+pnpm run build
+pnpm run test
+pnpm run format:check
+```
+运行服务：
+```bash
+DOC2X_API_KEY=sk-xxx pnpm start
+```
-## 6) 安装本仓库 Skill（可选）
+## CI / 发布流水线
-用于给 Codex CLI / Claude Code 增加一个“教大模型如何使用 doc2x-mcp tools 的 Skill”（便于按固定工作流调用 tools、导出与下载、以及排错）。
+仓库使用 GitHub Actions：
+- CI：`.github/workflows/ci.yml`
+  - 触发：`push` 到 `main`、`pull_request`、`workflow_dispatch`、每周一 UTC `03:17`
+  - 文档-only 变更（`**/*.md`、`LICENSE`）自动跳过
+  - 任务：
+    - `dependency-review`（仅 PR）
+    - `build`（Node.js `18/20` 矩阵）
+    - `package-smoke`（`npm pack --dry-run`）
+    - `security-audit`（仅手动/定时）
+- Publish：`.github/workflows/publish.yml`
+  - `dev` 分支 push：发布 npm `dev` tag
+  - `v*.*.*` tag push：发布 npm `latest`
+  - 发布前校验 tag 版本与 `package.json` 版本一致
+  - 发布命令：`npm publish --provenance`
+建议提交前本地对齐：
+```bash
+pnpm install --frozen-lockfile
+pnpm run build
+npm pack --dry-run
+pnpm audit --prod --audit-level high
+```
+## 如何发布
+- 开发预发布（`dev`）：push 到 `dev` 分支后自动发布到 npm `dev` tag。版本会自动改成 `x.y.z-dev.<run>.<attempt>`。
+- 正式发布（`latest`）：push `v*.*.*` tag 后发布到 npm `latest`。tag 版本必须和 `package.json` 版本一致。
+## 安装本仓库 Skill（可选）
+用于给 Codex CLI / Claude Code 增加一个“教大模型如何使用 doc2x-mcp tools 的 Skill”。
 不需要 clone 仓库的一键安装（推荐）：
@@ -107,28 +222,35 @@ MIT License，详见 `LICENSE`。
 curl -fsSL https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.sh | sh
 ```
-重复执行同一条命令即可覆盖安装（默认会覆盖已存在目录）。
 在本仓库源码目录安装：
 ```bash
-npm run skill:install
+pnpm run skill:install
 ```
-默认安装到：
-脚本默认安装到：
+默认安装目录：
-- Codex CLI：`~/.codex/skills/public/doc2x-mcp`（用 `CODEX_HOME` 覆盖）
-- Claude Code：`~/.claude/skills/doc2x-mcp`（用 `CLAUDE_HOME` 覆盖）
+- Codex CLI：`~/.codex/skills/public/doc2x-mcp`（可用 `CODEX_HOME` 覆盖）
+- Claude Code：`~/.claude/skills/doc2x-mcp`（可用 `CLAUDE_HOME` 覆盖）
 说明：
-- `--target auto`（默认）会同时安装到 Codex + Claude；如只想装其中一个，用 `--target codex|claude`。
-- PowerShell 7+ 一键安装：`irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.ps1 | iex`
-- Windows PowerShell 5.1 一键安装：`irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill-winps.ps1 | iex`
+- `--target auto`（默认）会同时安装到 Codex + Claude。
+- PowerShell 7+：`irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.ps1 | iex`
+- Windows PowerShell 5.1：`irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill-winps.ps1 | iex`
-覆盖安装目录示例：
+## 安全与排错
-- mac/linux：`CODEX_HOME=/custom/.codex curl -fsSL https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.sh | sh -s -- --target codex`
-- Windows：`$env:CODEX_HOME="C:\\path\\.codex"; irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.ps1 | iex`
+- 不要在仓库提交真实 `DOC2X_API_KEY`。
+- 白名单默认限制下载域名；如需放开，评估风险后再使用 `DOC2X_DOWNLOAD_URL_ALLOWLIST=*`。
+- 配置异常时优先调用 `doc2x_debug_config` 定位环境变量来源与解析结果。
+## 问题反馈
+- 使用问题或缺陷反馈：GitHub Issues
+  [https://github.com/NoEdgeAI/doc2x-mcp/issues](https://github.com/NoEdgeAI/doc2x-mcp/issues)
+- 建议在 issue 中附上最小复现输入、触发的 tool 名称、以及 `doc2x_debug_config` 结果（可脱敏）。
+## License
+MIT License，详见 `LICENSE`。

package/README_EN.md CHANGED Viewed

@@ -1,34 +1,52 @@
 # Doc2x MCP Server
+[![CI](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/ci.yml/badge.svg)](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/ci.yml)
+[![Publish](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/publish.yml/badge.svg)](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/publish.yml)
+[![npm version](https://img.shields.io/npm/v/%40noedgeai-org%2Fdoc2x-mcp)](https://www.npmjs.com/package/@noedgeai-org/doc2x-mcp)
 English | [简体中文](./README.md)
-This project provides a stdio-based MCP Server that wraps Doc2x v2 PDF/image APIs into semantic tools.
+A stdio-based MCP Server that wraps Doc2x v2 PDF/image capabilities into stable, composable semantic tools.
+## Table of Contents
+- [Project Scope](#project-scope)
+- [Runtime Quick Facts](#runtime-quick-facts)
+- [Quick Start](#quick-start)
+- [Configuration Reference](#configuration-reference)
+- [Tool API Overview](#tool-api-overview)
+- [Common Workflows](#common-workflows)
+- [Local Development](#local-development)
+- [CI / Release Pipelines](#ci--release-pipelines)
+- [Publishing Flow](#publishing-flow)
+- [Install Repo Skill (Optional)](#install-repo-skill-optional)
+- [Security and Troubleshooting](#security-and-troubleshooting)
+- [Getting Help](#getting-help)
+- [License](#license)
-## 1) Requirements
+## Project Scope
-- Node.js >= 18
+- Exposes Doc2x capabilities to MCP clients (Codex CLI / Claude Code / custom agents).
+- Uses a unified async contract (`submit/status/wait`) for predictable automation.
+- Provides runtime safety boundaries (timeouts, polling controls, download allowlist).
-## 2) Configuration
+## Runtime Quick Facts
-Configure via environment variables:
+- Local run: Node.js `>=18` is enough.
+- CI checks: builds run on Node.js `18` and `20`.
+- Release environment: publish jobs run on Node.js `24` in GitHub Actions.
+- Package manager: pnpm with lockfile `pnpm-lock.yaml`.
-- `DOC2X_API_KEY`: required (e.g. `sk-xxx`)
-- `DOC2X_BASE_URL`: optional, default `https://v2.doc2x.noedgeai.com`
-- `DOC2X_HTTP_TIMEOUT_MS`: optional, default `60000`
-- `DOC2X_POLL_INTERVAL_MS`: optional, default `2000`
-- `DOC2X_MAX_WAIT_MS`: optional, default `600000`
-- `DOC2X_PARSE_PDF_MAX_OUTPUT_CHARS`: optional, default `5000`; limit the returned text size of `doc2x_parse_pdf_wait_text` (set `0` for unlimited)
-- `DOC2X_PARSE_PDF_MAX_OUTPUT_PAGES`: optional, default `10`; limit merged pages of `doc2x_parse_pdf_wait_text` (set `0` for unlimited)
-- `DOC2X_DOWNLOAD_URL_ALLOWLIST`: optional, default `".amazonaws.com.cn,.aliyuncs.com,.noedgeai.com"`; set to `*` to allow any host (not recommended)
+## Quick Start
-## 3) Run
+### Option A: via npx (recommended)
-### Option A: via npx
+Add this in your MCP client config:
 ```json
 {
   "command": "npx",
-  "args": ["-y", "@noedgeai-org/doc2x-mcp"],
+  "args": ["-y", "@noedgeai-org/doc2x-mcp@latest"],
   "env": {
     "DOC2X_API_KEY": "sk-xxx",
     "DOC2X_BASE_URL": "https://v2.doc2x.noedgeai.com"
@@ -40,11 +58,13 @@ Configure via environment variables:
 ```bash
 cd doc2x-mcp
-npm install
-npm run build
-DOC2X_API_KEY=sk-xxx npm start
+pnpm install --frozen-lockfile
+pnpm run build
+DOC2X_API_KEY=sk-xxx pnpm start
 ```
+Point MCP client to your local build output:
 ```json
 {
   "command": "node",
@@ -56,27 +76,34 @@ DOC2X_API_KEY=sk-xxx npm start
 }
 ```
-## 4) Tools
-- `doc2x_parse_pdf_submit`
-- `doc2x_parse_pdf_status`
-- `doc2x_parse_pdf_wait_text`
-- `doc2x_convert_export_submit`
-- `doc2x_convert_export_result`
-- `doc2x_convert_export_wait`
-- `doc2x_download_url_to_file`
-- `doc2x_parse_image_layout_sync`
-- `doc2x_parse_image_layout_submit`
-- `doc2x_parse_image_layout_status`
-- `doc2x_parse_image_layout_wait_text`
-- `doc2x_materialize_convert_zip`
-- `doc2x_debug_config`
+## Configuration Reference
+| Environment Variable | Required | Default | Description |
+| --- | --- | --- | --- |
+| `DOC2X_API_KEY` | Yes | - | Doc2x API key (`sk-xxx`) |
+| `DOC2X_BASE_URL` | No | `https://v2.doc2x.noedgeai.com` | Doc2x API base URL |
+| `DOC2X_HTTP_TIMEOUT_MS` | No | `60000` | Per-request HTTP timeout in ms |
+| `DOC2X_POLL_INTERVAL_MS` | No | `2000` | Polling interval in ms |
+| `DOC2X_MAX_WAIT_MS` | No | `600000` | Max wait duration for wait tools in ms |
+| `DOC2X_PARSE_PDF_MAX_OUTPUT_CHARS` | No | `5000` | Max returned chars for `doc2x_parse_pdf_wait_text`; `0` = unlimited |
+| `DOC2X_PARSE_PDF_MAX_OUTPUT_PAGES` | No | `10` | Max merged pages for `doc2x_parse_pdf_wait_text`; `0` = unlimited |
+| `DOC2X_DOWNLOAD_URL_ALLOWLIST` | No | `.amazonaws.com.cn,.aliyuncs.com,.noedgeai.com` | URL host allowlist for downloads; `*` allows any host (not recommended) |
+## Tool API Overview
+| Stage | Tools | Purpose |
+| --- | --- | --- |
+| PDF parse | `doc2x_parse_pdf_submit` / `doc2x_parse_pdf_status` / `doc2x_parse_pdf_wait_text` / `doc2x_materialize_pdf_layout_json` | Submit parse tasks, check status, wait and fetch text, or materialize v3 layout JSON locally |
+| Export | `doc2x_convert_export_submit` / `doc2x_convert_export_result` / `doc2x_convert_export_wait` | Start export, read export result, wait for completion |
+| Download | `doc2x_download_url_to_file` / `doc2x_materialize_convert_zip` | Download export URL to local path, materialize convert zip |
+| Image layout parse | `doc2x_parse_image_layout_sync` / `doc2x_parse_image_layout_submit` / `doc2x_parse_image_layout_status` / `doc2x_parse_image_layout_wait_text` | Sync/async OCR and layout parse for images |
+| Diagnostics | `doc2x_debug_config` | Show resolved config and API key source |
 ### PDF Parse Model (`doc2x_parse_pdf_submit` / `doc2x_parse_pdf_wait_text`)
 - Optional parameter: `model`
-- Supported values: only `v3-2026` (latest model)
-- Notes: omit `model` to use default `v2`; to try the latest model, pass:
+- Supported values: `v2` (default) / `v3-2026` (latest model)
+- Default (when omitted): `v2`
 ```json
 {
@@ -84,22 +111,110 @@ DOC2X_API_KEY=sk-xxx npm start
 }
 ```
+### PDF Layout JSON Materialization (`doc2x_materialize_pdf_layout_json`)
+- Required: `output_path`
+- Provide either `uid` or `pdf_path`
+- `v2` does not support `layout`; use `v3-2026` when `pages[].layout` is required
+- When `pdf_path` is used and `model` is omitted, this tool defaults to `v3-2026`
+- On success it writes the raw parse `result` JSON locally
+`layout` contains page block structure and coordinates, which is useful for figure/table crops, region highlighting, structured extraction, and layout analysis. If the goal is readable full text, prefer Markdown / DOCX export.
+```json
+{
+  "pdf_path": "/absolute/path/to/input.pdf",
+  "output_path": "/absolute/path/to/input_v3.layout.json"
+}
+```
 ### Export Formula Parameters (`doc2x_convert_export_submit` / `doc2x_convert_export_wait`)
-- Required parameter: `formula_mode` (`normal` / `dollar`)
-- Optional parameter: `formula_level` (`int32`, effective only when the source parse task uses `model=v3-2026`; ignored by `v2`)
+- Required: `formula_mode` (`normal` / `dollar`)
+- Optional: `formula_level` (effective only when source parse used `model=v3-2026`)
 - Value mapping:
-  - `0`: keep formulas as-is (preserve original Markdown)
-  - `1`: degrade inline formulas to plain text (`\\(...\\)` and `$...$`)
-  - `2`: degrade all formulas to plain text (`\\(...\\)`, `$...$`, `\\[...\\]`, `$$...$$`)
+  - `0`: keep formulas
+  - `1`: degrade inline formulas (`\\(...\\)`, `$...$`)
+  - `2`: degrade all formulas (`\\(...\\)`, `$...$`, `\\[...\\]`, `$$...$$`)
-## 5) License
+## Common Workflows
-MIT License. See `LICENSE`.
+### Workflow 1: PDF -> Markdown local file
+1. Submit parse via `doc2x_parse_pdf_submit`.
+2. Wait export via `doc2x_convert_export_wait` (`to=md` and `formula_mode` required).
+3. Read result URL via `doc2x_convert_export_result`.
+4. Download to local path via `doc2x_download_url_to_file`.
+### Workflow 2: Fast image OCR/layout result
+1. Use `doc2x_parse_image_layout_sync` for direct parse.
+2. For robust polling behavior, switch to submit/status/wait flow.
+### Workflow 3: PDF -> local v3 layout JSON
+1. Call `doc2x_materialize_pdf_layout_json` with `pdf_path` and `output_path`.
+2. The tool waits for parse success and writes the raw `result` JSON locally.
+3. The saved JSON can be consumed directly by downstream figure/table crop scripts.
+## Local Development
+### Requirements
+- Node.js `>=18`
+- pnpm (aligned with lockfile)
+### Common commands
+```bash
+pnpm install --frozen-lockfile
+pnpm run build
+pnpm run test
+pnpm run format:check
+```
+Run server:
+```bash
+DOC2X_API_KEY=sk-xxx pnpm start
+```
-## 6) Install Repo Skill (Optional)
+## CI / Release Pipelines
-Installs a tool-use skill for Codex CLI / Claude Code (teaches the LLM how to use doc2x-mcp tools with a standard workflow: submit/status/wait/export/download).
+This repository uses GitHub Actions:
+- CI: `.github/workflows/ci.yml`
+  - Triggers: `push` to `main`, `pull_request`, `workflow_dispatch`, weekly on Monday at UTC `03:17`
+  - Doc-only changes (`**/*.md`, `LICENSE`) are skipped
+  - Jobs:
+    - `dependency-review` (PR only)
+    - `build` (Node.js `18/20` matrix)
+    - `package-smoke` (`npm pack --dry-run`)
+    - `security-audit` (manual/scheduled only)
+- Publish: `.github/workflows/publish.yml`
+  - Push to `dev`: publish npm package with `dev` tag
+  - Push tag `v*.*.*`: publish npm package as `latest`
+  - Verifies tag version matches `package.json` version before publish
+  - Publish command: `npm publish --provenance`
+Recommended local parity checks before pushing:
+```bash
+pnpm install --frozen-lockfile
+pnpm run build
+npm pack --dry-run
+pnpm audit --prod --audit-level high
+```
+## Publishing Flow
+- Dev pre-release (`dev`): push to `dev` branch to publish npm `dev` tag. Version is auto rewritten to `x.y.z-dev.<run>.<attempt>`.
+- Production release (`latest`): push `v*.*.*` tag to publish npm `latest`. Tag version must match `package.json`.
+## Install Repo Skill (Optional)
+Installs a reusable skill for Codex CLI / Claude Code to guide tool usage with the standard `submit/status/wait/export/download` workflow.
 One-command install without cloning (recommended):
@@ -107,28 +222,35 @@ One-command install without cloning (recommended):
 curl -fsSL https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.sh | sh
 ```
-Re-run the same command to overwrite (default behavior overwrites an existing destination directory).
 Install from this repo source directory:
 ```bash
-npm run skill:install
+pnpm run skill:install
 ```
-Default destination:
-The script installs to:
+Default destinations:
-- Codex CLI: `~/.codex/skills/public/doc2x-mcp` (override via `CODEX_HOME`)
-- Claude Code: `~/.claude/skills/doc2x-mcp` (override via `CLAUDE_HOME`)
+- Codex CLI: `~/.codex/skills/public/doc2x-mcp` (override with `CODEX_HOME`)
+- Claude Code: `~/.claude/skills/doc2x-mcp` (override with `CLAUDE_HOME`)
 Notes:
-- `--target auto` (default) installs to both Codex + Claude; use `--target codex|claude` to install only one.
-- PowerShell 7+ one-command install: `irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.ps1 | iex`
-- Windows PowerShell 5.1 one-command install: `irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill-winps.ps1 | iex`
+- `--target auto` (default) installs to both Codex + Claude.
+- PowerShell 7+: `irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.ps1 | iex`
+- Windows PowerShell 5.1: `irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill-winps.ps1 | iex`
-Override install dir examples:
+## Security and Troubleshooting
-- mac/linux: `CODEX_HOME=/custom/.codex curl -fsSL https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.sh | sh -s -- --target codex`
-- Windows: `$env:CODEX_HOME="C:\\path\\.codex"; irm https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.ps1 | iex`
+- Never commit real `DOC2X_API_KEY` to the repository.
+- The download allowlist is restrictive by default; evaluate risk before using `DOC2X_DOWNLOAD_URL_ALLOWLIST=*`.
+- Use `doc2x_debug_config` first when diagnosing config/environment issues.
+## Getting Help
+- Usage questions or bug reports: GitHub Issues
+  [https://github.com/NoEdgeAI/doc2x-mcp/issues](https://github.com/NoEdgeAI/doc2x-mcp/issues)
+- Include minimal reproduction input, affected tool name, and sanitized `doc2x_debug_config` output when possible.
+## License
+MIT License. See `LICENSE`.

package/dist/doc2x/materialize.d.ts CHANGED Viewed

@@ -1,3 +1,8 @@
+export declare function validatePdfLayoutResult(result: unknown, uid?: string): {
+    result: Record<string, unknown>;
+    pageCount: number;
+    hasLayout: true;
+};
 export declare function materializeConvertZip(args: {
     convert_zip_base64: string;
     output_dir: string;
@@ -6,3 +11,13 @@ export declare function materializeConvertZip(args: {
     zip_path: string;
     extracted: boolean;
 }>;
+export declare function materializePdfLayoutJson(args: {
+    result: unknown;
+    output_path: string;
+    uid?: string;
+}): Promise<{
+    uid: string;
+    output_path: string;
+    page_count: number;
+    has_layout: true;
+}>;

package/dist/doc2x/materialize.js CHANGED Viewed

@@ -1,6 +1,8 @@
 import fsp from 'node:fs/promises';
 import path from 'node:path';
 import { spawn } from 'node:child_process';
+import { ToolError } from '#errors';
+import { TOOL_ERROR_CODE_INVALID_JSON } from '#errorCodes';
 function spawnUnzip(zipPath, outputDir) {
     return new Promise((resolve) => {
         const child = spawn('unzip', ['-o', zipPath, '-d', outputDir], { stdio: 'ignore' });
@@ -8,6 +10,44 @@ function spawnUnzip(zipPath, outputDir) {
         child.on('exit', (code) => resolve(code === 0));
     });
 }
+function isRecord(value) {
+    return value !== null && typeof value === 'object' && !Array.isArray(value);
+}
+export function validatePdfLayoutResult(result, uid) {
+    if (!isRecord(result))
+        throw new ToolError({
+            code: TOOL_ERROR_CODE_INVALID_JSON,
+            message: 'parse result must be a JSON object',
+            retryable: false,
+            uid,
+        });
+    const pages = result.pages;
+    if (!Array.isArray(pages) || pages.length === 0)
+        throw new ToolError({
+            code: TOOL_ERROR_CODE_INVALID_JSON,
+            message: 'parse result must contain a non-empty pages array',
+            retryable: false,
+            uid,
+        });
+    for (let i = 0; i < pages.length; i++) {
+        const page = pages[i];
+        if (!isRecord(page))
+            throw new ToolError({
+                code: TOOL_ERROR_CODE_INVALID_JSON,
+                message: `pages[${i}] must be an object`,
+                retryable: false,
+                uid,
+            });
+        if (!isRecord(page.layout))
+            throw new ToolError({
+                code: TOOL_ERROR_CODE_INVALID_JSON,
+                message: `pages[${i}].layout must be an object`,
+                retryable: false,
+                uid,
+            });
+    }
+    return { result, pageCount: pages.length, hasLayout: true };
+}
 export async function materializeConvertZip(args) {
     const outDir = path.resolve(args.output_dir);
     await fsp.mkdir(outDir, { recursive: true });
@@ -17,3 +57,15 @@ export async function materializeConvertZip(args) {
     const extracted = await spawnUnzip(zipPath, outDir);
     return { output_dir: outDir, zip_path: zipPath, extracted };
 }
+export async function materializePdfLayoutJson(args) {
+    const validated = validatePdfLayoutResult(args.result, args.uid);
+    const outputPath = path.resolve(args.output_path);
+    await fsp.mkdir(path.dirname(outputPath), { recursive: true });
+    await fsp.writeFile(outputPath, `${JSON.stringify(validated.result, null, 2)}\n`, 'utf8');
+    return {
+        uid: args.uid ?? '',
+        output_path: outputPath,
+        page_count: validated.pageCount,
+        has_layout: validated.hasLayout,
+    };
+}

package/dist/doc2x/pdf.d.ts CHANGED Viewed

@@ -1,4 +1,6 @@
-export declare const PARSE_PDF_MODELS: readonly ["v3-2026"];
+export declare const PARSE_PDF_MODEL_V2: "v2";
+export declare const PARSE_PDF_MODEL_V3: "v3-2026";
+export declare const PARSE_PDF_MODELS: readonly ["v2", "v3-2026"];
 export type ParsePdfModel = (typeof PARSE_PDF_MODELS)[number];
 export declare function parsePdfSubmit(pdfPath: string, opts?: {
     model?: ParsePdfModel;
@@ -12,6 +14,15 @@ export declare function parsePdfStatus(uid: string): Promise<{
     detail: string;
     result: {} | null;
 }>;
+export declare function parsePdfWaitResultByUid(args: {
+    uid: string;
+    poll_interval_ms?: number;
+    max_wait_ms?: number;
+}): Promise<{
+    uid: string;
+    status: "success";
+    result: {} | null;
+}>;
 export declare function parsePdfWaitTextByUid(args: {
     uid: string;
     poll_interval_ms?: number;

package/dist/doc2x/pdf.js CHANGED Viewed

@@ -9,7 +9,9 @@ import { doc2xRequestJson, putToSignedUrl } from '#doc2x/client';
 import { DOC2X_TASK_STATUS_FAILED, DOC2X_TASK_STATUS_SUCCESS } from '#doc2x/constants';
 import { HTTP_METHOD_GET, HTTP_METHOD_POST } from '#doc2x/http';
 import { v2 } from '#doc2x/paths';
-export const PARSE_PDF_MODELS = ['v3-2026'];
+export const PARSE_PDF_MODEL_V2 = 'v2';
+export const PARSE_PDF_MODEL_V3 = 'v3-2026';
+export const PARSE_PDF_MODELS = [PARSE_PDF_MODEL_V2, PARSE_PDF_MODEL_V3];
 function mergePagesToTextWithLimit(result, joinWith, limits) {
     const parsed = result ?? null;
     const sourcePages = _.isArray(parsed?.pages) ? parsed.pages : [];
@@ -112,10 +114,9 @@ export async function parsePdfStatus(uid) {
         result: data.result ?? null,
     };
 }
-export async function parsePdfWaitTextByUid(args) {
+async function waitForParsePdfSuccessByUid(args) {
     const pollInterval = args.poll_interval_ms ?? CONFIG.pollIntervalMs;
     const maxWait = args.max_wait_ms ?? CONFIG.maxWaitMs;
-    const joinWith = args.join_with ?? '\n\n---\n\n';
     const uid = String(args.uid || '').trim();
     if (!uid)
         throw new ToolError({
@@ -145,13 +146,8 @@ export async function parsePdfWaitTextByUid(args) {
             }
             throw e;
         }
-        if (st.status === DOC2X_TASK_STATUS_SUCCESS) {
-            const merged = mergePagesToTextWithLimit(st.result, joinWith, {
-                maxOutputChars: args.max_output_chars,
-                maxOutputPages: args.max_output_pages,
-            });
-            return { uid, status: DOC2X_TASK_STATUS_SUCCESS, ...merged };
-        }
+        if (st.status === DOC2X_TASK_STATUS_SUCCESS)
+            return st;
         if (st.status === DOC2X_TASK_STATUS_FAILED)
             throw new ToolError({
                 code: TOOL_ERROR_CODE_PARSE_FAILED,
@@ -162,3 +158,16 @@ export async function parsePdfWaitTextByUid(args) {
         await sleep(pollInterval);
     }
 }
+export async function parsePdfWaitResultByUid(args) {
+    const st = await waitForParsePdfSuccessByUid(args);
+    return { uid: st.uid, status: DOC2X_TASK_STATUS_SUCCESS, result: st.result };
+}
+export async function parsePdfWaitTextByUid(args) {
+    const joinWith = args.join_with ?? '\n\n---\n\n';
+    const st = await waitForParsePdfSuccessByUid(args);
+    const merged = mergePagesToTextWithLimit(st.result, joinWith, {
+        maxOutputChars: args.max_output_chars,
+        maxOutputPages: args.max_output_pages,
+    });
+    return { uid: st.uid, status: DOC2X_TASK_STATUS_SUCCESS, ...merged };
+}

package/dist/mcp/registerPdfTools.js CHANGED Viewed

@@ -1,14 +1,15 @@
 import { CONFIG } from '#config';
 import { isRetryableError } from '#errors';
-import { parsePdfStatus, parsePdfSubmit, parsePdfWaitTextByUid, } from '#doc2x/pdf';
+import { PARSE_PDF_MODEL_V2, PARSE_PDF_MODEL_V3, parsePdfStatus, parsePdfSubmit, parsePdfWaitResultByUid, parsePdfWaitTextByUid, } from '#doc2x/pdf';
+import { materializePdfLayoutJson } from '#doc2x/materialize';
 import { asJsonResult, asTextResult } from '#mcp/results';
-import { deleteUidCache, fileSig, getSubmittedUidFromCache, joinWithSchema, makePdfUidCacheKey, missingEitherFieldError, nonNegativeIntSchema, parsePdfModelSchema, parsePdfUidSchema, pdfPathForWaitSchema, pdfPathSchema, positiveIntMsSchema, setFailedUidCache, setSubmittedUidCache, withToolErrorHandling, } from '#mcp/registerToolsShared';
+import { deleteUidCache, fileSig, getSubmittedUidFromCache, jsonOutputPathSchema, joinWithSchema, makePdfUidCacheKey, missingEitherFieldError, nonNegativeIntSchema, parsePdfModelSchema, parsePdfUidSchema, pdfPathForWaitSchema, pdfPathSchema, positiveIntMsSchema, setFailedUidCache, setSubmittedUidCache, withToolErrorHandling, } from '#mcp/registerToolsShared';
 export function registerPdfTools(server, ctx) {
     server.registerTool('doc2x_parse_pdf_submit', {
         description: 'Create a Doc2x PDF parse task for a local file and return {uid}. Prefer calling doc2x_parse_pdf_status to monitor progress/result; only call doc2x_parse_pdf_wait_text if the user explicitly asks to wait/return merged text.',
         inputSchema: {
             pdf_path: pdfPathSchema,
-            model: parsePdfModelSchema.describe("Optional parse model. Use 'v3-2026' to try the latest model. Omit this field to use default v2."),
+            model: parsePdfModelSchema.describe(`Optional parse model. Supported values: '${PARSE_PDF_MODEL_V2}' and '${PARSE_PDF_MODEL_V3}'. Omit this field to use default ${PARSE_PDF_MODEL_V2}.`),
         },
     }, withToolErrorHandling(async ({ pdf_path, model }) => {
         const sig = await fileSig(pdf_path);
@@ -45,7 +46,7 @@ export function registerPdfTools(server, ctx) {
                 .optional()
                 .describe('Max pages to merge into returned text (0 = unlimited). Default can be set via env DOC2X_PARSE_PDF_MAX_OUTPUT_PAGES.'),
             model: parsePdfModelSchema
-                .describe("Optional parse model used only when submitting from pdf_path. Use 'v3-2026' to try latest model. Omit this field to use default v2."),
+                .describe(`Optional parse model used only when submitting from pdf_path. Supported values: '${PARSE_PDF_MODEL_V2}' and '${PARSE_PDF_MODEL_V3}'. Omit this field to use default ${PARSE_PDF_MODEL_V2}.`),
         },
     }, withToolErrorHandling(async (args) => {
         const maxOutputChars = args.max_output_chars ?? CONFIG.parsePdfMaxOutputChars;
@@ -120,4 +121,65 @@ export function registerPdfTools(server, ctx) {
             }
         }
     }));
+    server.registerTool('doc2x_materialize_pdf_layout_json', {
+        description: `Wait for a PDF parse task and write the raw Doc2x result JSON (with page layout) to output_path. Prefer passing uid. If only pdf_path is provided, this tool reuses a cached uid or submits a new parse with model='${PARSE_PDF_MODEL_V3}' by default.`,
+        inputSchema: {
+            uid: parsePdfUidSchema.optional(),
+            pdf_path: pdfPathForWaitSchema.optional(),
+            output_path: jsonOutputPathSchema,
+            poll_interval_ms: positiveIntMsSchema.optional(),
+            max_wait_ms: positiveIntMsSchema.optional(),
+            model: parsePdfModelSchema
+                .describe(`Optional parse model used only when submitting from pdf_path. Supported values: '${PARSE_PDF_MODEL_V2}' and '${PARSE_PDF_MODEL_V3}'. Defaults to '${PARSE_PDF_MODEL_V3}' for this tool because ${PARSE_PDF_MODEL_V2} does not return layout.`),
+        },
+    }, withToolErrorHandling(async (args) => {
+        const materializeByUid = async (uid) => {
+            const out = await parsePdfWaitResultByUid({
+                uid,
+                poll_interval_ms: args.poll_interval_ms,
+                max_wait_ms: args.max_wait_ms,
+            });
+            return await materializePdfLayoutJson({
+                uid: out.uid,
+                result: out.result,
+                output_path: args.output_path,
+            });
+        };
+        const uid = String(args.uid || '').trim();
+        if (uid)
+            return asJsonResult(await materializeByUid(uid));
+        const pdfPath = String(args.pdf_path || '').trim();
+        if (!pdfPath)
+            throw missingEitherFieldError('uid', 'pdf_path');
+        const sig = await fileSig(pdfPath);
+        const model = args.model ?? PARSE_PDF_MODEL_V3;
+        const cacheKey = makePdfUidCacheKey(sig.absPath, model);
+        const resolvedUid = getSubmittedUidFromCache(ctx, { kind: 'pdf', key: cacheKey, sig });
+        const finalUid = resolvedUid || (await parsePdfSubmit(pdfPath, { model })).uid;
+        setSubmittedUidCache(ctx, { kind: 'pdf', key: cacheKey, sig, uid: finalUid });
+        const markFailed = (failedUid) => setFailedUidCache(ctx, { kind: 'pdf', key: cacheKey, sig, uid: failedUid });
+        try {
+            return asJsonResult(await materializeByUid(finalUid));
+        }
+        catch (e) {
+            if (!resolvedUid) {
+                markFailed(finalUid);
+                throw e;
+            }
+            deleteUidCache(ctx, { kind: 'pdf', key: cacheKey });
+            if (!isRetryableError(e)) {
+                markFailed(finalUid);
+                throw e;
+            }
+            const retryUid = (await parsePdfSubmit(pdfPath, { model })).uid;
+            setSubmittedUidCache(ctx, { kind: 'pdf', key: cacheKey, sig, uid: retryUid });
+            try {
+                return asJsonResult(await materializeByUid(retryUid));
+            }
+            catch (retryErr) {
+                markFailed(retryUid);
+                throw retryErr;
+            }
+        }
+    }));
 }

package/dist/mcp/registerToolsShared.d.ts CHANGED Viewed

@@ -82,6 +82,7 @@ export declare const convertToSchema: z.ZodEnum<{
     docx: "docx";
 }>;
 export declare const parsePdfModelSchema: z.ZodOptional<z.ZodEnum<{
+    v2: "v2";
     "v3-2026": "v3-2026";
 }>>;
 export declare const convertFormulaModeSchema: z.ZodEnum<{
@@ -98,6 +99,7 @@ export declare const imagePathSchema: z.ZodString;
 export declare const imagePathForWaitSchema: z.ZodString;
 export declare const pdfPathForWaitSchema: z.ZodString;
 export declare const outputPathSchema: z.ZodString;
+export declare const jsonOutputPathSchema: z.ZodString;
 export declare const doc2xDownloadUrlSchema: z.ZodPipe<z.ZodString, z.ZodURL>;
 export declare const convertZipBase64Schema: z.ZodString;
 export declare const outputDirSchema: z.ZodString;

package/dist/mcp/registerToolsShared.js CHANGED Viewed

@@ -5,7 +5,7 @@ import path from 'node:path';
 import { LRUCache } from 'lru-cache';
 import { z } from 'zod';
 import { CONVERT_FORMULA_LEVELS } from '#doc2x/convert';
-import { PARSE_PDF_MODELS } from '#doc2x/pdf';
+import { PARSE_PDF_MODEL_V2, PARSE_PDF_MODELS } from '#doc2x/pdf';
 import { ToolError } from '#errors';
 import { TOOL_ERROR_CODE_INVALID_ARGUMENT } from '#errorCodes';
 import { asErrorResult } from '#mcp/results';
@@ -101,7 +101,7 @@ export function sameSig(a, b) {
     return a.md5 === b.md5;
 }
 function normalizeParsePdfModel(model) {
-    return model ?? 'v2';
+    return model ?? PARSE_PDF_MODEL_V2;
 }
 export function makePdfUidCacheKey(absPath, model) {
     return JSON.stringify([absPath, normalizeParsePdfModel(model)]);
@@ -174,6 +174,11 @@ export const imagePathSchema = imagePathBaseSchema.describe("Absolute path to a
 export const imagePathForWaitSchema = imagePathBaseSchema.describe('Absolute path to a local image file (png/jpg). Used to reuse cached uid or submit a new async task.');
 export const pdfPathForWaitSchema = pdfPathSchema.describe('Absolute path to a local PDF file. If uid is not provided, this tool will reuse cached uid (if any) or submit a new task.');
 export const outputPathSchema = absolutePathSchema.describe('Absolute path for the output file. The file will be overwritten if it exists.');
+export const jsonOutputPathSchema = absolutePathSchema
+    .refine((v) => v.toLowerCase().endsWith('.json'), {
+    message: "Path must end with '.json'.",
+})
+    .describe('Absolute path for the output JSON file. The file will be overwritten if it exists.');
 export const doc2xDownloadUrlSchema = z
     .string()
     .trim()

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@noedgeai-org/doc2x-mcp",
-  "version": "0.1.3-dev.6.1",
+  "version": "0.1.4-dev.8.1",
   "description": "Doc2x MCP server (stdio, MCP SDK).",
   "license": "MIT",
   "engines": {
@@ -31,7 +31,7 @@
     "skill:install:ps": "pwsh -NoProfile -ExecutionPolicy Bypass -File scripts/install-skill.ps1",
     "skill:install:winps": "powershell -NoProfile -ExecutionPolicy Bypass -File scripts/install-skill-winps.ps1",
     "start": "node dist/index.js",
-    "test:unit": "npm run build && node --test test/unit/registerToolsShared.test.js",
+    "test:unit": "npm run build && node --test test/unit/registerToolsShared.test.js test/unit/materialize.test.js",
     "test:e2e": "npm run build && node --test test/e2e/mcpServer.e2e.test.js",
     "test": "npm run test:unit && npm run test:e2e",
     "prepublishOnly": "pnpm run build"

package/skills/doc2x-mcp/SKILL.md CHANGED Viewed

@@ -1,173 +1,116 @@
 ---
 name: doc2x-mcp
-description: 使用 Doc2x MCP 工具完成文档解析与转换：对 PDF/扫描件/图片做 OCR 与版面解析，抽取文本/表格，导出为 Markdown/LaTeX(TeX)/DOCX 并下载落盘（submit/status/wait/export/download）。当用户提到 PDF/pdfs、scanned PDF、OCR、image-to-text、extract text/tables、表格抽取、文档转换/convert、导出/export、Markdown、LaTeX/TeX、DOCX、doc2x、doc2x-mcp、MCP 时使用。
+description: 使用 Doc2x MCP 工具处理 PDF、扫描件和图片：提交解析、查询状态、等待文本、导出 Markdown/LaTeX/DOCX、下载落盘，以及将 PDF v3 layout 结果写为本地 JSON。用户提到 PDF、OCR、scan/scanned PDF、image-to-text、extract text/tables、表格抽取、layout、Markdown、LaTeX/TeX、DOCX、doc2x、doc2x-mcp、MCP、figure/table crop、v3 JSON 时使用。
 ---
-# Doc2x MCP Tool-Use Skill (for LLM)
+# Doc2x MCP
-## 你要做什么
+## 目的
-你是一个会调用 MCP tools 的助手。凡是涉及 PDF/图片的“解析/抽取/导出/下载”，必须通过 `doc2x-mcp` tools 执行真实操作：
+凡是“解析 PDF/图片、抽取文本/表格、导出文档、下载结果、获取 v3 layout JSON”的请求，都应通过 `doc2x-mcp` tools 执行真实操作，不要臆造 `uid`、`url`、文件内容或导出结果。
-- 不要臆测/伪造 `uid`、`url`、文件内容或导出结果
-- 不要跳过工具步骤直接输出“看起来合理”的内容
+## 必须遵守
-## 全局约束（必须遵守）
+1. 所有文件路径都用绝对路径：`pdf_path`、`image_path`、`output_path`、`output_dir`。
+2. 不要伪造下载 URL；只能使用 `doc2x_convert_export_*` 返回的 `url`。
+3. 同一个 `uid` 的同一组导出参数不要并发重复提交。
+4. 同一个 `uid` 做多档导出对比时，必须按“导出成功 -> 立即下载 -> 再导出下一档”执行，避免结果覆盖。
+5. 不要回显 `DOC2X_API_KEY`；排错只用 `doc2x_debug_config` 的摘要信息。
+6. `model` 只用于 PDF 解析提交；`formula_level` 只用于导出，且仅在源解析为 `v3-2026` 时有效。
+7. `doc2x_parse_pdf_wait_text` 只适合预览或摘要；需要完整结果时优先导出文件。
+8. 需要 PDF v3 block/layout 坐标时，不要从文本结果推断，直接使用 `doc2x_materialize_pdf_layout_json`。
-1. 路径必须是绝对路径
-   `pdf_path` / `image_path` / `output_path` / `output_dir` 都应使用绝对路径；相对路径可能会被 server 以意外的 cwd 解析导致失败。
+## 参数边界
-2. 扩展名约束
-   `doc2x_parse_pdf_submit.pdf_path` 必须以 `.pdf` 结尾；图片解析使用 `png/jpg`。
+- PDF 解析：`doc2x_parse_pdf_submit` 和 `doc2x_parse_pdf_wait_text(pdf_path 分支)` 可传 `model: "v2" | "v3-2026"`；不传默认 `v2`。
+- PDF layout JSON：`doc2x_materialize_pdf_layout_json` 在 `pdf_path` 分支默认使用 `v3-2026`，并要求返回结果包含 `pages[].layout`。
+- 导出：`formula_mode` 建议总是显式传入。
+- `formula_level` 必须传数字 `0 | 1 | 2`，不要传字符串。
+- 图片解析路径只接受 `png/jpg/jpeg`；PDF 路径必须以 `.pdf` 结尾；layout JSON 输出路径应以 `.json` 结尾。
-3. 不要并发重复提交导出
-   同一个 `uid` 对同一种导出配置（`to + formula_mode + formula_level (+ filename + filename_mode + merge_cross_page_forms...)`）不要并行重复 submit。
-   补充：同一 `uid + to` 的导出结果可能会被后一次覆盖；做“多档对比”（如 `formula_level=0/1/2`）时，必须按 **导出成功 → 立即下载落盘 → 再导出下一档** 的顺序执行。
+## 按目标选 Tool
-4. 不要泄露密钥
-   永远不要回显/记录 `DOC2X_API_KEY`。排错只用 `doc2x_debug_config` 的 `apiKeyLen/apiKeyPrefix/apiKeySource`。
+- 提交 PDF 解析：`doc2x_parse_pdf_submit`
+- 查看 PDF 状态：`doc2x_parse_pdf_status`
+- 取 PDF 文本预览：`doc2x_parse_pdf_wait_text`
+- 导出 PDF 为 `md/tex/docx`：`doc2x_convert_export_wait`
+- 下载导出文件：`doc2x_download_url_to_file`
+- 落盘 PDF v3 layout JSON：`doc2x_materialize_pdf_layout_json`
+- 图片版面解析原始结果：`doc2x_parse_image_layout_sync`
+- 图片版面解析并等待首屏 Markdown：`doc2x_parse_image_layout_submit` -> `doc2x_parse_image_layout_wait_text`
+- 落盘 `convert_zip`：`doc2x_materialize_convert_zip`
+- 配置排错：`doc2x_debug_config`
-5. 不要伪造下载 URL
-   下载必须使用 `doc2x_convert_export_*` 返回的 `url`；不要自己拼接。
+## 标准流程
-6. 参数生效边界
-   `model` 仅用于 PDF 解析提交（默认 `v2`，可选 `v3-2026`）；`formula_level` 仅用于导出（`doc2x_convert_export_*`），并且只在源解析任务使用 `v3-2026` 时生效（`v2` 下无效）。
+### 1. PDF -> 完整文件
-## 关键参数语义（避免误用）
+当用户要完整 Markdown / TeX / DOCX，本流程优先：
-- `doc2x_parse_pdf_submit` / `doc2x_parse_pdf_wait_text(pdf_path 提交分支)`
-  - 可选 `model: "v3-2026"`；不传则默认 `v2`。
-- `doc2x_convert_export_submit` / `doc2x_convert_export_wait`
-  - `formula_mode`：`"normal"` 或 `"dollar"`（关键参数，建议总是显式传入）。
-  - `formula_level`：`0 | 1 | 2`（可选，**数字类型**，不要传字符串 `"0"|"1"|"2"`）
-    - `0`：不退化公式（保留原始 Markdown）
-    - `1`：行内公式退化为普通文本（`\(...\)`、`$...$`）
-    - `2`：行内 + 块级公式全部退化为普通文本（`\(...\)`、`$...$`、`\[...\]`、`$$...$$`）
+1. `doc2x_parse_pdf_submit({ pdf_path, model? })`
+2. 轮询 `doc2x_parse_pdf_status({ uid })` 直到成功
+3. `doc2x_convert_export_wait({ uid, to, formula_mode, formula_level?, filename?, filename_mode? })`
+4. `doc2x_download_url_to_file({ url, output_path })`
-## Tool 选择（按用户目标）
+说明：
-- **PDF 解析任务**：`doc2x_parse_pdf_submit` → `doc2x_parse_pdf_status`
-- **少量预览/摘要**：`doc2x_parse_pdf_wait_text`（可能截断；要完整内容请导出文件）
-- **导出文件（md/tex/docx）**：`doc2x_convert_export_submit` → `doc2x_convert_export_wait`（或直接 `doc2x_convert_export_wait` 走兼容模式一键导出）
-- **下载落盘**：`doc2x_download_url_to_file`
-- **图片版面解析**：`doc2x_parse_image_layout_sync` 或 `doc2x_parse_image_layout_submit` → `doc2x_parse_image_layout_wait_text`
-- **解包资源 zip**：`doc2x_materialize_convert_zip`
-- **配置排错**：`doc2x_debug_config`
+- `md/docx` 常用 `formula_mode: "normal"`
+- `tex` 常用 `formula_mode: "dollar"`
+- 需要完整内容时，不要用 `doc2x_parse_pdf_wait_text` 代替导出
-## 标准工作流（照做）
+### 2. PDF -> 文本预览
-### 工作流 A：批量 PDF → 导出文件（MD/TEX/DOCX，高效并行版）
+仅在用户要快速预览、摘要、少量文本时使用：
-适用于“多个 PDF 批量导出并落盘（.md / .tex / .docx）”。核心原则：
+- `doc2x_parse_pdf_wait_text({ pdf_path | uid, max_output_chars?, max_output_pages?, model? })`
-- `doc2x_parse_pdf_submit` 可并行（批量提交）
-- `doc2x_parse_pdf_status` 可并行（批量轮询）
-- **流水线式并行**：某个 `uid` 一旦解析成功，立刻开始该 `uid` 的导出+下载（不必等所有 PDF 都解析完）
-- 不同 `uid` 的导出与下载可并行
-- **同一个 `uid` 的同一种导出配置（`to + formula_mode + formula_level (+ filename + filename_mode + merge_cross_page_forms...)`）不要并行重复提交**
-- 同一个 `uid` 若要导出多种格式（例如 md + docx + tex），建议**按格式串行**，但不同 `uid` 仍可并行
+若出现截断提示，应切回“PDF -> 完整文件”流程。
-**批量提交解析任务（并行）**
+### 3. PDF -> v3 layout JSON
-- 对每个 `pdf_path` 调用：`doc2x_parse_pdf_submit({ pdf_path, model? })` → `{ uid }`
+当用户要 figure/table 坐标、block bbox、layout blocks、后续裁剪脚本输入时使用：
-**等待解析完成（并行）**
+- 优先：`doc2x_materialize_pdf_layout_json({ uid | pdf_path, output_path, model? })`
-- 对每个 `uid` 轮询：`doc2x_parse_pdf_status({ uid })` 直到 `status="success"`
-- 若 `status="failed"`：汇报 `detail`，该文件停止后续步骤
+要向用户说明 `layout` 的用途：
-**导出目标格式（并行，按 uid）**
+- `Markdown/text` 适合阅读正文；`layout` 适合程序继续处理页面结构
+- `layout.blocks[].bbox` 可用于 figure/table 裁剪、区域截图、框选高亮、可视化调试
+- `layout.blocks[].type` 可用于区分标题、正文、表格、图片等块，做结构化抽取
+- `layout` 适合作为后续脚本输入，例如 figure/table crop、block 对齐、版面分析
+- 如果用户只想“看内容”，优先给 Markdown / DOCX；如果用户要“知道内容在页面哪里”，就用 `layout`
-推荐用 `doc2x_convert_export_wait` 走“兼容模式一键导出”（当你提供 `formula_mode` 且本进程未提交过该导出时，会自动 submit 一次，然后 wait），避免你手动拆成 submit+wait：
+行为要求：
-- DOCX：`doc2x_convert_export_wait({ uid, to: "docx", formula_mode: "normal", formula_level? })` → `{ status: "success", url }`
-- Markdown：`doc2x_convert_export_wait({ uid, to: "md", formula_mode: "normal", formula_level?, filename?, filename_mode? })` → `{ status: "success", url }`
-- LaTeX：`doc2x_convert_export_wait({ uid, to: "tex", formula_mode: "dollar", formula_level? })` → `{ status: "success", url }`
+- 走 `pdf_path` 分支时，默认使用 `v3-2026`
+- 输出的是原始 parse `result` JSON，而不是精简文本
+- 若返回结果缺少 `pages[].layout`，应视为失败而不是静默降级
-（或显式两步：`doc2x_convert_export_submit(...)` → `doc2x_convert_export_wait({ uid, to })`）
+### 4. 图片 -> 版面结果
-**补充建议**
+- 直接拿原始结果：`doc2x_parse_image_layout_sync({ image_path })`
+- 等待并取首屏 Markdown：`doc2x_parse_image_layout_submit({ image_path })` -> `doc2x_parse_image_layout_wait_text({ uid })`
+- 结果包含 `convert_zip` 且用户要资源落盘时：`doc2x_materialize_convert_zip({ convert_zip_base64, output_dir })`
-- `formula_mode` 是关键参数：建议总是显式传入（`"normal"` / `"dollar"`，按用户偏好选择；常见：`md/docx` 用 `"normal"`、`tex` 用 `"dollar"`）
-- 需要做公式退化时显式传 `formula_level`（`0/1/2`）；若不需要退化，建议显式传 `0`，避免调用端默认值歧义
-- `filename`/`filename_mode` 主要用于 `md/tex`：传不带扩展名的 basename，并配合 `filename_mode: "auto"`（避免 `name.md.md` / `name.tex.tex`）
-- 对同一个 `uid` 做多格式导出时，先确定顺序（例如先 md 再 docx），逐个完成再进行下一个格式
-- 对同一个 `uid` 的同一格式做“多档参数对比”（如 `formula_level`），每一档都要先下载再进行下一档，避免覆盖导致误判
+### 5. 批量 PDF
-**批量下载（并行）**
+批量场景采用流水线，不要全串行：
-- `doc2x_download_url_to_file({ url, output_path })` → `{ output_path, bytes_written }`
-- `output_path` 必须为绝对路径，且每个文件应唯一（建议用原文件名 + 对应扩展名：`.md` / `.tex` / `.docx`）
+1. 多个 `pdf_path` 可并行 `doc2x_parse_pdf_submit`
+2. 多个 `uid` 可并行 `doc2x_parse_pdf_status`
+3. 某个 `uid` 一旦 parse 成功，立即开始它自己的导出和下载
+4. 不同 `uid` 可并行导出
+5. 同一个 `uid` 的同一种导出配置不要并发
-**并发建议**
+## 向用户回报
-- 10 个 PDF 以内通常可以直接并行；更多文件建议分批/限流（避免触发超时/限流）
+- 成功时报告：输入文件、`uid`、输出路径、必要时 `bytes_written`
+- 失败时报告：错误码、错误消息、相关 `uid`，并指出哪些文件未受影响
+- 当用户目标是“本地文件”时，优先回报落盘结果，不要只贴长文本
-**向用户回报（按文件汇总）**
+## 常见错误处理
-- 成功：列出每个输入文件对应的 `output_path` 与 `bytes_written`
-- 失败：列出失败文件与错误原因（包含 `uid` 与 `detail`/错误码），并说明其余文件不受影响
-### 工作流 B：PDF → Markdown 文件（推荐）
-当用户目标是“拿到完整 Markdown / 落盘”，主链路应当是导出与下载，不要依赖 `doc2x_parse_pdf_wait_text`。
-**提交解析任务**
-- `doc2x_parse_pdf_submit({ pdf_path, model? })` → `{ uid }`
-**等待解析完成**
-- 轮询 `doc2x_parse_pdf_status({ uid })` 直到 `status="success"`（失败则带 `detail` 汇报）
-**导出 Markdown**
-- `doc2x_convert_export_wait({ uid, to: "md", formula_mode: "normal", formula_level?, filename?, filename_mode? })` → `{ status: "success", url }`
-**下载落盘**
-- `doc2x_download_url_to_file({ url, output_path })` → `{ output_path, bytes_written }`
-**向用户回报**
-- 回复用户：保存路径、文件大小、`uid`（必要时附上 `url`）
-### 工作流 C：PDF → 文本预览（可控长度）
-当用户只需要“摘要/少量预览”时才用：
-- `doc2x_parse_pdf_wait_text({ pdf_path | uid, max_output_chars?, max_output_pages? })`
-如果返回包含截断提示（`[doc2x-mcp] Output truncated ...`），应切换到“工作流 B”导出 md 获取完整内容。
-### 工作流 D：PDF 导出格式（MD / TEX / DOCX）
-- Markdown：`to="md"`（完整 Markdown 导出优先参考“工作流 B”）
-- LaTeX：`to="tex"`
-- Word：`to="docx"`
-- 调用链同“工作流 A / B”（先解析 → 再导出 → 再下载），按目标格式调整 `to`（并按需设置 `formula_mode/formula_level/filename`）
-- 注意：`doc2x_convert_export_submit.formula_mode` 必填（`"normal"` 或 `"dollar"`）；`formula_level` 可选（`0/1/2`）
-- 若需要对比不同 `formula_level`，请按顺序执行并在每次导出成功后立即下载，再进行下一档，避免后一次结果覆盖前一次。
-### 工作流 E：图片 → Markdown（版面解析）
-- 只要结果（同步）：`doc2x_parse_image_layout_sync({ image_path })`（返回原始 JSON，可能包含 `convert_zip`）
-- 要首屏 markdown（异步）：`doc2x_parse_image_layout_submit({ image_path })` → `doc2x_parse_image_layout_wait_text({ uid })`
-如果结果里有 `convert_zip`（base64）且用户希望落盘资源文件：
-- `doc2x_materialize_convert_zip({ convert_zip_base64, output_dir })` → `{ output_dir, zip_path, extracted }`
-## 失败与排错（你应当这样处理）
-1. 鉴权/配置异常
-   先 `doc2x_debug_config()`，确认 `apiKeyLen > 0` 且 `baseUrl/httpTimeoutMs/pollIntervalMs/maxWaitMs` 合理。
-2. 等待超时
-   建议用户调大 `DOC2X_MAX_WAIT_MS` 或按需调 `DOC2X_POLL_INTERVAL_MS`（不要过于频繁）。
-3. 下载被阻止（安全策略）
-   `doc2x_download_url_to_file` 只允许 `https` 且要求 host 在 `DOC2X_DOWNLOAD_URL_ALLOWLIST` 内；被拦截时解释原因，并让用户选择“加 allowlist”或“保持默认安全策略”。
-4. 用户给的是相对路径/不确定路径
-   要求用户提供绝对路径；不要猜。
+1. 缺参数或路径不合法：提示用户提供绝对路径，不要猜测相对路径。
+2. 等待超时：说明可调大 `DOC2X_MAX_WAIT_MS` 或适度调整轮询间隔。
+3. 下载被策略拦截：解释是 `DOC2X_DOWNLOAD_URL_ALLOWLIST` 限制，不要绕过。
+4. 认证或配置问题：调用 `doc2x_debug_config`，只汇报 `apiKeySource/apiKeyPrefix/apiKeyLen` 等摘要。