@echofiles/echo-pdf 0.4.2 → 0.5.0
- package/README.md +232 -16
- package/bin/echo-pdf.js +176 -8
- package/bin/lib/http.js +26 -1
- package/dist/auth.js +16 -4
- package/dist/local/index.d.ts +135 -0
- package/dist/local/index.js +555 -0
- package/dist/mcp-server.js +3 -6
- package/dist/node/pdfium-local.d.ts +8 -0
- package/dist/node/pdfium-local.js +147 -0
- package/dist/node/semantic-local.d.ts +16 -0
- package/dist/node/semantic-local.js +113 -0
- package/dist/pdf-config.js +10 -0
- package/dist/pdf-types.d.ts +4 -0
- package/dist/provider-client.d.ts +8 -0
- package/dist/provider-client.js +39 -0
- package/dist/worker.js +20 -0
- package/package.json +13 -2
package/README.md
CHANGED
|
@@ -1,34 +1,183 @@
|
|
|
1
1
|
# echo-pdf
|
|
2
2
|
|
|
3
|
-
`echo-pdf`
|
|
3
|
+
`echo-pdf` is currently positioned as a local-first PDF context engine for AI agents.
|
|
4
|
+
|
|
5
|
+
One-sentence definition:
|
|
6
|
+
|
|
7
|
+
- Turn local PDFs into reusable CLI outputs, library primitives, and workspace artifacts that local agents and apps can consume.
|
|
8
|
+
|
|
9
|
+
Current primary product surfaces:
|
|
10
|
+
|
|
11
|
+
- npm package: `@echofiles/echo-pdf`
|
|
12
|
+
- CLI: `echo-pdf ...`
|
|
13
|
+
- Local workspace artifacts: `.echo-pdf-workspace/...`
|
|
14
|
+
- Documentation site: covers installation, the CLI, artifacts, and integration contracts only; it does not offer an online processing service
|
|
15
|
+
|
|
16
|
+
Target users and main use cases:
|
|
17
|
+
|
|
18
|
+
- Agent / IDE / app developers who need to process PDFs on their own machine or in a local development environment
|
|
19
|
+
- Downstream integrators who need stable page-level primitives, document context, and cacheable artifacts
|
|
20
|
+
- Local component users who need a clean consumer import plus a CLI workflow
|
|
21
|
+
|
|
22
|
+
Current capabilities:
|
|
4
23
|
|
|
5
24
|
- Page extraction: render selected PDF pages as images
|
|
6
25
|
- OCR: recognize page text
|
|
7
26
|
- Table recognition: extract tables and output LaTeX `tabular`
|
|
8
|
-
-
|
|
27
|
+
- Page-level document index: generate locally reusable document / page artifacts
|
|
28
|
+
- Semantic structure layer: produce a cacheable heading / section structure on top of the page index
|
|
29
|
+
- Page render and OCR artifacts: cache page renders/images and OCR results in the local workspace
|
|
30
|
+
|
|
31
|
+
Current priorities:
|
|
32
|
+
|
|
33
|
+
- Local CLI
|
|
34
|
+
- Local library/client API
|
|
35
|
+
- Local workspace artifacts
|
|
36
|
+
- clean-consumer npm package
|
|
37
|
+
|
|
38
|
+
Not a focus at this stage:
|
|
39
|
+
|
|
40
|
+
- Extending MCP or making MCP the primary entry point
|
|
41
|
+
- Hosted SaaS / multi-tenant platform capabilities
|
|
42
|
+
- Turning the website into an online PDF service
|
|
43
|
+
- Domain-specific logic such as datasheet / EDA handling
|
|
44
|
+
|
|
45
|
+
For more on positioning, see:
|
|
46
|
+
|
|
47
|
+
- [`docs/PRODUCT.md`](./docs/PRODUCT.md)
|
|
48
|
+
- [`docs/PACKAGING.md`](./docs/PACKAGING.md)
|
|
49
|
+
- [`docs/WORKSPACE_CONTRACT.md`](./docs/WORKSPACE_CONTRACT.md)
|
|
50
|
+
- [`docs/DEVELOPMENT.md`](./docs/DEVELOPMENT.md)
|
|
51
|
+
|
|
52
|
+
## Local-first workflow
|
|
53
|
+
|
|
54
|
+
Shortest path:
|
|
55
|
+
|
|
56
|
+
```bash
|
|
57
|
+
npm i -g @echofiles/echo-pdf
|
|
58
|
+
echo-pdf document ./sample.pdf
|
|
59
|
+
echo-pdf structure ./sample.pdf
|
|
60
|
+
echo-pdf semantic ./sample.pdf
|
|
61
|
+
echo-pdf page ./sample.pdf --page 1
|
|
62
|
+
echo-pdf render ./sample.pdf --page 1
|
|
63
|
+
echo-pdf ocr ./sample.pdf --page 1 --model gpt-4.1-mini
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
Local development path from a source checkout:
|
|
67
|
+
|
|
68
|
+
```bash
|
|
69
|
+
npm install
|
|
70
|
+
npm run document:dev -- document ./fixtures/smoke.pdf
|
|
71
|
+
npm run document:dev -- structure ./fixtures/smoke.pdf
|
|
72
|
+
npm run document:dev -- semantic ./fixtures/smoke.pdf
|
|
73
|
+
npm run document:dev -- page ./fixtures/smoke.pdf --page 1
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
Notes:
|
|
77
|
+
|
|
78
|
+
- Published package / built checkout: `echo-pdf document ...` continues to load from `dist/`
|
|
79
|
+
- Source checkout that has not been built yet: use `npm run document:dev -- ...`
|
|
80
|
+
- `document:dev` is for local development only; it explicitly prefers `src/local/index.ts`, even if a stale `dist/` still exists in the repository
|
|
81
|
+
- The published package and a normal `echo-pdf document ...` invocation still load only from `dist/`
|
|
82
|
+
|
|
83
|
+
By default, an inspectable workspace is written to the current directory:
|
|
84
|
+
|
|
85
|
+
```text
|
|
86
|
+
.echo-pdf-workspace/
|
|
87
|
+
documents/<documentId>/
|
|
88
|
+
document.json
|
|
89
|
+
structure.json
|
|
90
|
+
semantic-structure.json
|
|
91
|
+
pages/
|
|
92
|
+
0001.json
|
|
93
|
+
0002.json
|
|
94
|
+
...
|
|
95
|
+
renders/
|
|
96
|
+
0001.scale-2.json
|
|
97
|
+
0001.scale-2.png
|
|
98
|
+
ocr/
|
|
99
|
+
0001.scale-2.provider-openai.model-gpt-4o.prompt-<hash>.json
|
|
100
|
+
```
|
|
101
|
+
|
|
102
|
+
These artifacts are reused while the PDF is unchanged, which lets local downstream products (for example, echo-datasheet) read incrementally.
|
|
103
|
+
|
|
104
|
+
For the formal workspace layout, cache/invalidation rules, detector/strategy metadata, and the boundaries downstream consumers can rely on, see [`docs/WORKSPACE_CONTRACT.md`](./docs/WORKSPACE_CONTRACT.md).
|
|
105
|
+
|
|
106
|
+
## Local library/client API
|
|
107
|
+
|
|
108
|
+
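At this stage the priority is to provide composable local primitives, so downstream products can build document metadata and page-level artifacts directly around a PDF instead of depending on a remote MCP/SaaS first.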
|
|
109
|
+
|
|
110
|
+
### Local-first entrypoints (semver-stable)
|
|
111
|
+
|
|
112
|
+
- `@echofiles/echo-pdf`: core API
|
|
113
|
+
- `@echofiles/echo-pdf/core`: core API equivalent to the root entrypoint
|
|
114
|
+
- `@echofiles/echo-pdf/local`: local document primitives
|
|
115
|
+
- `@echofiles/echo-pdf/worker`: Worker route entrypoint kept for compatibility; not a focus at this stage
|
|
116
|
+
|
|
117
|
+
### Local document primitives
|
|
118
|
+
|
|
119
|
+
```ts
|
|
120
|
+
import {
|
|
121
|
+
get_document,
|
|
122
|
+
get_document_structure,
|
|
123
|
+
get_semantic_document_structure,
|
|
124
|
+
get_page_content,
|
|
125
|
+
get_page_render,
|
|
126
|
+
get_page_ocr,
|
|
127
|
+
} from "@echofiles/echo-pdf/local"
|
|
128
|
+
|
|
129
|
+
const doc = await get_document({ pdfPath: "./sample.pdf" })
|
|
130
|
+
const pageIndex = await get_document_structure({ pdfPath: "./sample.pdf" })
|
|
131
|
+
const semantic = await get_semantic_document_structure({ pdfPath: "./sample.pdf" })
|
|
132
|
+
const page1 = await get_page_content({ pdfPath: "./sample.pdf", pageNumber: 1 })
|
|
133
|
+
const render1 = await get_page_render({ pdfPath: "./sample.pdf", pageNumber: 1 })
|
|
134
|
+
const ocr1 = await get_page_ocr({ pdfPath: "./sample.pdf", pageNumber: 1, model: "gpt-4.1-mini" })
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
These calls write artifacts to the local workspace and reuse existing page results where possible while the PDF is unchanged.
|
|
138
|
+
`get_document_structure()` continues to return the minimal page index: `document -> pages[]`.
|
|
139
|
+
`get_semantic_document_structure()` separately returns the heading / section semantic layer and writes it to `semantic-structure.json`.
|
|
140
|
+
`get_page_render()` generates a reusable PNG plus metadata.
|
|
141
|
+
`get_page_ocr()` writes OCR results to a separate artifact; it requires a locally configured provider key / model and does not depend on MCP or a remote service entrypoint.
|
|
142
|
+
|
|
143
|
+
### Page Index vs Semantic Structure
|
|
9
144
|
|
|
10
|
-
|
|
145
|
+
- `get_document_structure()`
|
|
146
|
+
- Contract: stable `document -> pages[]`
|
|
147
|
+
- Artifact: `structure.json`
|
|
148
|
+
- Purpose: lets downstream consumers traverse pages, locate page artifacts, and read incrementally
|
|
149
|
+
- `get_semantic_document_structure()`
|
|
150
|
+
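- Contract: an explicit heading / section semantic layer; it prefers agent extraction through the locally configured provider/model and falls back to a conservative heuristic when none is configured or extraction fails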
|
|
151
|
+
- Artifact: `semantic-structure.json`
|
|
152
|
+
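- Purpose: lets downstream consumers do chapter navigation and semantic segmentation; it does not replace the page index and does not change the `pages[]` output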
|
|
11
153
|
|
|
12
|
-
|
|
13
|
-
- CLI
|
|
14
|
-
- HTTP API
|
|
154
|
+
The semantic structure currently records the detector explicitly in the artifact:
|
|
15
155
|
|
|
16
|
-
|
|
156
|
+
- `agent-structured-v1`: structured extraction from page text using the locally configured provider/model
|
|
157
|
+
- `heading-heuristic-v1`: conservative fallback used when no model is configured locally or agent extraction fails
|
|
17
158
|
|
|
18
|
-
|
|
159
|
+
Both modes follow the same output contract; when nothing is detected they return an empty structure rather than fabricating a section tree.
|
|
160
|
+
|
|
161
|
+
## Tool library compatibility
|
|
162
|
+
|
|
163
|
+
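Beyond the local-first primitives, `@echofiles/echo-pdf` still exposes the reusable entrypoints for the existing `pdf_extract_pages / pdf_ocr_pages / pdf_tables_to_latex / file_ops` tool implementations, for compatibility with existing integrations.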
|
|
19
164
|
|
|
20
165
|
### Public entrypoints (semver-stable)
|
|
21
166
|
|
|
22
|
-
- `@echofiles/echo-pdf`: core API
|
|
167
|
+
- `@echofiles/echo-pdf`: core API
|
|
23
168
|
- `@echofiles/echo-pdf/core`: core API equivalent to the root entrypoint
|
|
24
|
-
- `@echofiles/echo-pdf/
|
|
169
|
+
- `@echofiles/echo-pdf/local`: local document primitives
|
|
170
|
+
- `@echofiles/echo-pdf/worker`: Worker route entrypoint (kept for compatibility)
|
|
25
171
|
|
|
26
172
|
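Only the `exports` subpaths above are considered public API. Deep-path imports such as `src/*` and `dist/*` are not covered by the compatibility commitment and may change in minor releases.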
|
|
27
173
|
|
|
174
|
+
For the full package entrypoint, runtime, semver, and clean-consumer import guarantees, see [`docs/PACKAGING.md`](./docs/PACKAGING.md).
|
|
175
|
+
|
|
28
176
|
### Runtime expectations
|
|
29
177
|
|
|
30
178
|
- Node.js: `>=20` (matches `package.json#engines`)
|
|
31
179
|
- Requires ESM `import` support and standard `fetch` (native in Node 20+)
|
|
180
|
+
- `@echofiles/echo-pdf/local` targets a local Node/Bun CLI or app runtime
|
|
32
181
|
- A modern bundler/runtime that supports package `exports` is recommended (Vite, Webpack 5, Rspack, esbuild, Wrangler, etc.)
|
|
33
182
|
- Recommended for TypeScript consumers: `module=NodeNext` + `moduleResolution=NodeNext`
|
|
34
183
|
|
|
@@ -41,7 +190,7 @@ tmpdir="$(mktemp -d)"
|
|
|
41
190
|
cd "$tmpdir"
|
|
42
191
|
npm init -y
|
|
43
192
|
npm i /path/to/echofiles-echo-pdf-<version>.tgz
|
|
44
|
-
node --input-type=module -e "await import('@echofiles/echo-pdf'); await import('@echofiles/echo-pdf/core'); await import('@echofiles/echo-pdf/worker'); console.log('ok')"
|
|
193
|
+
node --input-type=module -e "await import('@echofiles/echo-pdf'); await import('@echofiles/echo-pdf/core'); await import('@echofiles/echo-pdf/local'); await import('@echofiles/echo-pdf/worker'); console.log('ok')"
|
|
45
194
|
```
|
|
46
195
|
|
|
47
196
|
### Example
|
|
@@ -87,7 +236,10 @@ console.log(result)
|
|
|
87
236
|
- Breaking changes to the public API ship only in major releases
|
|
88
237
|
- New exports and backward-compatible parameter extensions ship in minor/patch releases
|
|
89
238
|
|
|
90
|
-
## 1.
|
|
239
|
+
## 1. Compatibility surfaces (deferred / not primary)
|
|
240
|
+
|
|
241
|
+
The entrypoints below are kept in the repository for compatibility; they are not the primary product surface at this stage.
|
|
242
|
+
The primary surface remains the npm package + CLI + local workspace artifacts; the website is only a documentation site, not an online service.
|
|
91
243
|
|
|
92
244
|
First determine your production address (the Worker domain). This document uses:
|
|
93
245
|
|
|
@@ -159,7 +311,65 @@ echo-pdf config set --key service.storage.maxFileBytes --value 10000000
|
|
|
159
311
|
echo-pdf config set --key service.maxPagesPerRequest --value 20
|
|
160
312
|
```
|
|
161
313
|
|
|
162
|
-
##
|
|
314
|
+
## 2.1 Six core primitives
|
|
315
|
+
|
|
316
|
+
The local CLI's main commands map one-to-one to the six primitives in `@echofiles/echo-pdf/local`:
|
|
317
|
+
|
|
318
|
+
- `document <file.pdf>` -> `get_document`
|
|
319
|
+
- `structure <file.pdf>` -> `get_document_structure`
|
|
320
|
+
- `semantic <file.pdf>` -> `get_semantic_document_structure`
|
|
321
|
+
- `page <file.pdf> --page <N>` -> `get_page_content`
|
|
322
|
+
- `render <file.pdf> --page <N>` -> `get_page_render`
|
|
323
|
+
- `ocr <file.pdf> --page <N>` -> `get_page_ocr`
|
|
324
|
+
|
|
325
|
+
Compatibility boundaries:
|
|
326
|
+
|
|
327
|
+
- The old `document get|index|structure|semantic|page|render|ocr ...` subcommands remain as compatibility aliases
|
|
328
|
+
- The README and `--help` now lead with these six main commands rather than the old subcommand tree
|
|
329
|
+
|
|
330
|
+
Build the local index and print metadata:
|
|
331
|
+
|
|
332
|
+
```bash
|
|
333
|
+
echo-pdf document ./sample.pdf
|
|
334
|
+
```
|
|
335
|
+
|
|
336
|
+
Read the structure tree:
|
|
337
|
+
|
|
338
|
+
```bash
|
|
339
|
+
echo-pdf structure ./sample.pdf
|
|
340
|
+
```
|
|
341
|
+
|
|
342
|
+
Read the semantic structure layer:
|
|
343
|
+
|
|
344
|
+
```bash
|
|
345
|
+
echo-pdf semantic ./sample.pdf
|
|
346
|
+
```
|
|
347
|
+
|
|
348
|
+
Read the content of a specific page:
|
|
349
|
+
|
|
350
|
+
```bash
|
|
351
|
+
echo-pdf page ./sample.pdf --page 1
|
|
352
|
+
```
|
|
353
|
+
|
|
354
|
+
Generate a page render artifact:
|
|
355
|
+
|
|
356
|
+
```bash
|
|
357
|
+
echo-pdf render ./sample.pdf --page 1 --scale 2
|
|
358
|
+
```
|
|
359
|
+
|
|
360
|
+
Generate an OCR artifact (requires a local provider key / model):
|
|
361
|
+
|
|
362
|
+
```bash
|
|
363
|
+
echo-pdf ocr ./sample.pdf --page 1 --model gpt-4.1-mini
|
|
364
|
+
```
|
|
365
|
+
|
|
366
|
+
Use a custom artifact workspace:
|
|
367
|
+
|
|
368
|
+
```bash
|
|
369
|
+
echo-pdf document ./sample.pdf --workspace ./.cache/echo-pdf
|
|
370
|
+
```
|
|
371
|
+
|
|
372
|
+
## 3. Using MCP (kept for compatibility; not a focus at this stage)
|
|
163
373
|
|
|
164
374
|
### 3.1 Check that the MCP service is available
|
|
165
375
|
|
|
@@ -201,7 +411,7 @@ echo-pdf mcp call --tool pdf_extract_pages --args '{
|
|
|
201
411
|
stdio mode automatically uploads a local `path/filePath` and substitutes the resulting `fileId` before calling the remote tool.
|
|
202
412
|
|
|
203
413
|
```bash
|
|
204
|
-
echo-pdf mcp
|
|
414
|
+
echo-pdf mcp-stdio
|
|
205
415
|
```
|
|
206
416
|
|
|
207
417
|
Generate a stdio configuration snippet usable by Claude Desktop, Cursor, and similar clients:
|
|
@@ -287,12 +497,18 @@ curl -sS -X POST https://echo-pdf.echofilesai.workers.dev/tools/call \
|
|
|
287
497
|
}'
|
|
288
498
|
```
|
|
289
499
|
|
|
290
|
-
CLI
|
|
500
|
+
CLI (local files are not uploaded automatically by default; this must be enabled explicitly):
|
|
291
501
|
|
|
292
502
|
```bash
|
|
293
|
-
echo-pdf call --tool pdf_extract_pages --args '{"path":"./sample.pdf","pages":[1],"returnMode":"url"}'
|
|
503
|
+
echo-pdf call --tool pdf_extract_pages --auto-upload --args '{"path":"./sample.pdf","pages":[1],"returnMode":"url"}'
|
|
294
504
|
```
|
|
295
505
|
|
|
506
|
+
Notes:
|
|
507
|
+
|
|
508
|
+
- `echo-pdf call` disables automatic upload of local files by default, to avoid the footgun of uploading a file by accident.
|
|
509
|
+
- When automatic upload is needed, pass `--auto-upload` explicitly; the CLI echoes the upload list (local path -> fileId).
|
|
510
|
+
- For local agent/IDE scenarios, prefer `echo-pdf mcp-stdio`, which handles automatic `path/filePath` upload following the MCP stdio convention.
|
|
511
|
+
|
|
296
512
|
Download artifacts:
|
|
297
513
|
|
|
298
514
|
```bash
|
package/bin/echo-pdf.js
CHANGED
|
@@ -4,7 +4,7 @@ import fs from "node:fs"
|
|
|
4
4
|
import os from "node:os"
|
|
5
5
|
import path from "node:path"
|
|
6
6
|
import { fileURLToPath } from "node:url"
|
|
7
|
-
import { downloadFile, postJson, uploadFile, withUploadedLocalFile } from "./lib/http.js"
|
|
7
|
+
import { downloadFile, postJson, prepareArgsWithLocalUploads, uploadFile, withUploadedLocalFile } from "./lib/http.js"
|
|
8
8
|
import { runMcpStdio } from "./lib/mcp-stdio.js"
|
|
9
9
|
|
|
10
10
|
const CONFIG_DIR = path.join(os.homedir(), ".config", "echo-pdf-cli")
|
|
@@ -208,10 +208,25 @@ const runDevServer = (port, host) => {
|
|
|
208
208
|
})
|
|
209
209
|
}
|
|
210
210
|
|
|
211
|
-
const
|
|
211
|
+
const printLocalServiceHints = (host, port) => {
|
|
212
|
+
const resolvedHost = host === "0.0.0.0" ? "127.0.0.1" : host
|
|
213
|
+
const baseUrl = `http://${resolvedHost}:${port}`
|
|
214
|
+
const mcpUrl = `${baseUrl}/mcp`
|
|
215
|
+
process.stdout.write(`\nLocal component endpoints:\n`)
|
|
216
|
+
process.stdout.write(` ECHO_PDF_BASE_URL=${baseUrl}\n`)
|
|
217
|
+
process.stdout.write(` ECHO_PDF_MCP_URL=${mcpUrl}\n`)
|
|
218
|
+
process.stdout.write(`\nExport snippet:\n`)
|
|
219
|
+
process.stdout.write(` export ECHO_PDF_BASE_URL=${baseUrl}\n`)
|
|
220
|
+
process.stdout.write(` export ECHO_PDF_MCP_URL=${mcpUrl}\n\n`)
|
|
221
|
+
}
|
|
222
|
+
|
|
223
|
+
const runMcpStdioCommand = async (serviceUrlOverride) => {
|
|
212
224
|
const config = loadConfig()
|
|
225
|
+
const serviceUrl = typeof serviceUrlOverride === "string" && serviceUrlOverride.trim().length > 0
|
|
226
|
+
? serviceUrlOverride.trim()
|
|
227
|
+
: config.serviceUrl
|
|
213
228
|
await runMcpStdio({
|
|
214
|
-
serviceUrl
|
|
229
|
+
serviceUrl,
|
|
215
230
|
headers: buildMcpHeaders(),
|
|
216
231
|
postJson,
|
|
217
232
|
withUploadedLocalFile,
|
|
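The hunk above adds two small pieces of resolution logic: a `--service-url` override that falls back to the configured URL, and a rewrite of the wildcard bind host for the printed hints. A minimal sketch of both, assuming nothing beyond what the diff shows; `resolveServiceUrl` and `localEndpoints` are illustrative names, not functions exported by the package:

```javascript
// Sketch only: mirrors the expressions visible in the diff.
// `resolveServiceUrl` and `localEndpoints` are hypothetical helper names.

// A non-empty string override wins (after trimming); anything else falls back.
const resolveServiceUrl = (override, configuredUrl) =>
  typeof override === "string" && override.trim().length > 0
    ? override.trim()
    : configuredUrl

// The printed hints swap the wildcard bind address for loopback.
const localEndpoints = (host, port) => {
  const resolvedHost = host === "0.0.0.0" ? "127.0.0.1" : host
  const baseUrl = `http://${resolvedHost}:${port}`
  return { baseUrl, mcpUrl: `${baseUrl}/mcp` }
}

console.log(resolveServiceUrl("  http://localhost:8788  ", "https://example.invalid"))
console.log(localEndpoints("0.0.0.0", 8788))
```

Only the exact string `0.0.0.0` is rewritten; any other host value is passed through unchanged.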
@@ -308,9 +323,134 @@ const writeDevVarsConfigJson = (devVarsPath, configJson) => {
|
|
|
308
323
|
fs.writeFileSync(devVarsPath, lines.join("\n"))
|
|
309
324
|
}
|
|
310
325
|
|
|
326
|
+
const LOCAL_DOCUMENT_DIST_ENTRY = new URL("../dist/local/index.js", import.meta.url)
|
|
327
|
+
const LOCAL_DOCUMENT_SOURCE_ENTRY = new URL("../src/local/index.ts", import.meta.url)
|
|
328
|
+
const IS_BUN_RUNTIME = typeof process.versions?.bun === "string"
|
|
329
|
+
const SHOULD_PREFER_SOURCE_DOCUMENT_API = process.env.ECHO_PDF_SOURCE_DEV === "1"
|
|
330
|
+
|
|
331
|
+
const loadLocalDocumentApi = async () => {
|
|
332
|
+
if (SHOULD_PREFER_SOURCE_DOCUMENT_API) {
|
|
333
|
+
if (IS_BUN_RUNTIME && fs.existsSync(fileURLToPath(LOCAL_DOCUMENT_SOURCE_ENTRY))) {
|
|
334
|
+
return import(LOCAL_DOCUMENT_SOURCE_ENTRY.href)
|
|
335
|
+
}
|
|
336
|
+
throw new Error(
|
|
337
|
+
"Source-checkout document dev mode requires Bun and src/local/index.ts. " +
|
|
338
|
+
"Use `npm run document:dev -- <command> ...` from a source checkout."
|
|
339
|
+
)
|
|
340
|
+
}
|
|
341
|
+
try {
|
|
342
|
+
return await import(LOCAL_DOCUMENT_DIST_ENTRY.href)
|
|
343
|
+
} catch (error) {
|
|
344
|
+
const code = error && typeof error === "object" ? error.code : ""
|
|
345
|
+
if (code === "ERR_MODULE_NOT_FOUND") {
|
|
346
|
+
throw new Error(
|
|
347
|
+
"Local document commands require built artifacts in a source checkout. " +
|
|
348
|
+
"Run `npm run build` first, use `npm run document:dev -- <command> ...` in a source checkout, or install the published package."
|
|
349
|
+
)
|
|
350
|
+
}
|
|
351
|
+
throw error
|
|
352
|
+
}
|
|
353
|
+
}
|
|
354
|
+
|
|
355
|
+
const LOCAL_PRIMITIVE_COMMANDS = ["document", "structure", "semantic", "page", "render", "ocr"]
|
|
356
|
+
const LEGACY_DOCUMENT_SUBCOMMANDS = ["index", "get", "structure", "semantic", "page", "render", "ocr"]
|
|
357
|
+
|
|
358
|
+
const isLegacyDocumentSubcommand = (value) => typeof value === "string" && LEGACY_DOCUMENT_SUBCOMMANDS.includes(value)
|
|
359
|
+
|
|
360
|
+
const readDocumentPrimitiveArgs = (command, subcommand, rest) => {
|
|
361
|
+
if (command === "document" && isLegacyDocumentSubcommand(subcommand)) {
|
|
362
|
+
const primitive = subcommand === "index" || subcommand === "get" ? "document" : subcommand
|
|
363
|
+
return {
|
|
364
|
+
primitive,
|
|
365
|
+
pdfPath: rest[0],
|
|
366
|
+
}
|
|
367
|
+
}
|
|
368
|
+
return {
|
|
369
|
+
primitive: command,
|
|
370
|
+
pdfPath: command === "document" ? subcommand : rest[0],
|
|
371
|
+
}
|
|
372
|
+
}
|
|
373
|
+
|
|
374
|
+
const runLocalPrimitiveCommand = async (command, subcommand, rest, flags) => {
|
|
375
|
+
const local = await loadLocalDocumentApi()
|
|
376
|
+
const { primitive, pdfPath } = readDocumentPrimitiveArgs(command, subcommand, rest)
|
|
377
|
+
const workspaceDir = typeof flags.workspace === "string" ? flags.workspace : undefined
|
|
378
|
+
const forceRefresh = flags["force-refresh"] === true
|
|
379
|
+
const renderScale = typeof flags.scale === "string" ? Number(flags.scale) : undefined
|
|
380
|
+
|
|
381
|
+
if (typeof pdfPath !== "string" || pdfPath.length === 0 || pdfPath.startsWith("--")) {
|
|
382
|
+
throw new Error(`${primitive} requires a pdf path argument`)
|
|
383
|
+
}
|
|
384
|
+
|
|
385
|
+
if (primitive === "document") {
|
|
386
|
+
const data = await local.get_document({ pdfPath, workspaceDir, forceRefresh })
|
|
387
|
+
print(data)
|
|
388
|
+
return
|
|
389
|
+
}
|
|
390
|
+
|
|
391
|
+
if (primitive === "structure") {
|
|
392
|
+
const data = await local.get_document_structure({ pdfPath, workspaceDir, forceRefresh })
|
|
393
|
+
print(data)
|
|
394
|
+
return
|
|
395
|
+
}
|
|
396
|
+
|
|
397
|
+
if (primitive === "semantic") {
|
|
398
|
+
const data = await local.get_semantic_document_structure({
|
|
399
|
+
pdfPath,
|
|
400
|
+
workspaceDir,
|
|
401
|
+
forceRefresh,
|
|
402
|
+
provider: typeof flags.provider === "string" ? flags.provider : undefined,
|
|
403
|
+
model: typeof flags.model === "string" ? flags.model : undefined,
|
|
404
|
+
})
|
|
405
|
+
print(data)
|
|
406
|
+
return
|
|
407
|
+
}
|
|
408
|
+
|
|
409
|
+
const pageNumber = typeof flags.page === "string" ? Number(flags.page) : NaN
|
|
410
|
+
if (!Number.isInteger(pageNumber) || pageNumber < 1) {
|
|
411
|
+
throw new Error(`${primitive} requires --page <positive integer>`)
|
|
412
|
+
}
|
|
413
|
+
|
|
414
|
+
if (primitive === "page") {
|
|
415
|
+
const data = await local.get_page_content({ pdfPath, workspaceDir, forceRefresh, pageNumber })
|
|
416
|
+
print(data)
|
|
417
|
+
return
|
|
418
|
+
}
|
|
419
|
+
|
|
420
|
+
if (primitive === "render") {
|
|
421
|
+
const data = await local.get_page_render({ pdfPath, workspaceDir, forceRefresh, pageNumber, renderScale })
|
|
422
|
+
print(data)
|
|
423
|
+
return
|
|
424
|
+
}
|
|
425
|
+
|
|
426
|
+
if (primitive === "ocr") {
|
|
427
|
+
const data = await local.get_page_ocr({
|
|
428
|
+
pdfPath,
|
|
429
|
+
workspaceDir,
|
|
430
|
+
forceRefresh,
|
|
431
|
+
pageNumber,
|
|
432
|
+
renderScale,
|
|
433
|
+
provider: typeof flags.provider === "string" ? flags.provider : undefined,
|
|
434
|
+
model: typeof flags.model === "string" ? flags.model : undefined,
|
|
435
|
+
prompt: typeof flags.prompt === "string" ? flags.prompt : undefined,
|
|
436
|
+
})
|
|
437
|
+
print(data)
|
|
438
|
+
return
|
|
439
|
+
}
|
|
440
|
+
|
|
441
|
+
throw new Error(`Unsupported local primitive command: ${primitive}`)
|
|
442
|
+
}
|
|
443
|
+
|
|
311
444
|
const usage = () => {
|
|
312
445
|
process.stdout.write(`echo-pdf CLI\n\n`)
|
|
313
446
|
process.stdout.write(`Commands:\n`)
|
|
447
|
+
process.stdout.write(` document <file.pdf> [--workspace DIR] [--force-refresh]\n`)
|
|
448
|
+
process.stdout.write(` structure <file.pdf> [--workspace DIR] [--force-refresh]\n`)
|
|
449
|
+
process.stdout.write(` semantic <file.pdf> [--provider alias] [--model model] [--workspace DIR] [--force-refresh]\n`)
|
|
450
|
+
process.stdout.write(` page <file.pdf> --page <N> [--workspace DIR] [--force-refresh]\n`)
|
|
451
|
+
process.stdout.write(` render <file.pdf> --page <N> [--scale N] [--workspace DIR] [--force-refresh]\n`)
|
|
452
|
+
process.stdout.write(` ocr <file.pdf> --page <N> [--scale N] [--provider alias] [--model model] [--prompt text] [--workspace DIR] [--force-refresh]\n`)
|
|
453
|
+
process.stdout.write(`\nCompatibility / existing service commands:\n`)
|
|
314
454
|
process.stdout.write(` init [--service-url URL]\n`)
|
|
315
455
|
process.stdout.write(` dev [--port 8788] [--host 127.0.0.1]\n`)
|
|
316
456
|
process.stdout.write(` provider set --provider <${PROVIDER_SET_NAMES.join("|")}> --api-key <KEY> [--profile name]\n`)
|
|
@@ -322,12 +462,19 @@ const usage = () => {
|
|
|
322
462
|
process.stdout.write(` model get [--provider alias] [--profile name]\n`)
|
|
323
463
|
process.stdout.write(` model list [--profile name]\n`)
|
|
324
464
|
process.stdout.write(` tools\n`)
|
|
325
|
-
process.stdout.write(` call --tool <name> --args '<json>' [--provider alias] [--model model] [--profile name]\n`)
|
|
465
|
+
process.stdout.write(` call --tool <name> --args '<json>' [--provider alias] [--model model] [--profile name] [--auto-upload]\n`)
|
|
466
|
+
process.stdout.write(` document get <file.pdf> [--workspace DIR] [--force-refresh]\n`)
|
|
467
|
+
process.stdout.write(` document structure <file.pdf> [--workspace DIR] [--force-refresh]\n`)
|
|
468
|
+
process.stdout.write(` document semantic <file.pdf> [--provider alias] [--model model] [--workspace DIR] [--force-refresh]\n`)
|
|
469
|
+
process.stdout.write(` document page <file.pdf> --page <N> [--workspace DIR] [--force-refresh]\n`)
|
|
470
|
+
process.stdout.write(` document render <file.pdf> --page <N> [--scale N] [--workspace DIR] [--force-refresh]\n`)
|
|
471
|
+
process.stdout.write(` document ocr <file.pdf> --page <N> [--scale N] [--provider alias] [--model model] [--prompt text] [--workspace DIR] [--force-refresh]\n`)
|
|
326
472
|
process.stdout.write(` file upload <local.pdf>\n`)
|
|
327
473
|
process.stdout.write(` file get --file-id <id> --out <path>\n`)
|
|
328
474
|
process.stdout.write(` mcp initialize\n`)
|
|
329
475
|
process.stdout.write(` mcp tools\n`)
|
|
330
476
|
process.stdout.write(` mcp call --tool <name> --args '<json>'\n`)
|
|
477
|
+
process.stdout.write(` mcp-stdio [--service-url URL]\n`)
|
|
331
478
|
process.stdout.write(` mcp stdio\n`)
|
|
332
479
|
process.stdout.write(` setup add <claude-desktop|claude-code|cursor|cline|windsurf|gemini|json>\n`)
|
|
333
480
|
}
|
|
@@ -338,7 +485,7 @@ const setupSnippet = (tool, serviceUrl, mode = "http") => {
|
|
|
338
485
|
mcpServers: {
|
|
339
486
|
"echo-pdf": {
|
|
340
487
|
command: "echo-pdf",
|
|
341
|
-
args: ["mcp
|
|
488
|
+
args: ["mcp-stdio"],
|
|
342
489
|
env: {
|
|
343
490
|
ECHO_PDF_SERVICE_URL: serviceUrl,
|
|
344
491
|
},
|
|
@@ -416,7 +563,7 @@ const main = async () => {
|
|
|
416
563
|
const [command, ...raw] = argv
|
|
417
564
|
let subcommand = ""
|
|
418
565
|
let rest = raw
|
|
419
|
-
if (["provider", "mcp", "setup", "model", "config"].includes(command)) {
|
|
566
|
+
if (["provider", "mcp", "setup", "model", "config", "document"].includes(command)) {
|
|
420
567
|
subcommand = raw[0] || ""
|
|
421
568
|
rest = raw.slice(1)
|
|
422
569
|
}
|
|
@@ -436,10 +583,16 @@ const main = async () => {
|
|
|
436
583
|
const port = typeof flags.port === "string" ? Number(flags.port) : 8788
|
|
437
584
|
const host = typeof flags.host === "string" ? flags.host : "127.0.0.1"
|
|
438
585
|
if (!Number.isFinite(port) || port <= 0) throw new Error("dev --port must be positive number")
|
|
586
|
+
printLocalServiceHints(host, Math.floor(port))
|
|
439
587
|
runDevServer(Math.floor(port), host)
|
|
440
588
|
return
|
|
441
589
|
}
|
|
442
590
|
|
|
591
|
+
if (command === "mcp-stdio") {
|
|
592
|
+
await runMcpStdioCommand(typeof flags["service-url"] === "string" ? flags["service-url"] : undefined)
|
|
593
|
+
return
|
|
594
|
+
}
|
|
595
|
+
|
|
443
596
|
if (command === "provider" && subcommand === "set") {
|
|
444
597
|
const providerAlias = resolveProviderAliasInput(flags.provider)
|
|
445
598
|
const apiKey = flags["api-key"]
|
|
@@ -566,6 +719,11 @@ const main = async () => {
|
|
|
566
719
|
return
|
|
567
720
|
}
|
|
568
721
|
|
|
722
|
+
if (LOCAL_PRIMITIVE_COMMANDS.includes(command) || (command === "document" && isLegacyDocumentSubcommand(subcommand))) {
|
|
723
|
+
await runLocalPrimitiveCommand(command, subcommand, rest, flags)
|
|
724
|
+
return
|
|
725
|
+
}
|
|
726
|
+
|
|
569
727
|
if (command === "call") {
|
|
570
728
|
const config = loadConfig()
|
|
571
729
|
const profileName = getProfileName(config, flags.profile)
|
|
@@ -573,7 +731,17 @@ const main = async () => {
|
|
|
573
731
|
const tool = flags.tool
|
|
574
732
|
if (typeof tool !== "string") throw new Error("call requires --tool")
|
|
575
733
|
const args = typeof flags.args === "string" ? JSON.parse(flags.args) : {}
|
|
576
|
-
const
|
|
734
|
+
const autoUpload = flags["auto-upload"] === true
|
|
735
|
+
const prepared = await prepareArgsWithLocalUploads(config.serviceUrl, tool, args, {
|
|
736
|
+
autoUpload,
|
|
737
|
+
})
|
|
738
|
+
if (prepared.uploads.length > 0) {
|
|
739
|
+
process.stderr.write(`[echo-pdf] auto-uploaded local files:\n`)
|
|
740
|
+
for (const item of prepared.uploads) {
|
|
741
|
+
process.stderr.write(` - ${item.localPath} -> ${item.fileId} (${item.tool})\n`)
|
|
742
|
+
}
|
|
743
|
+
}
|
|
744
|
+
const preparedArgs = prepared.args
|
|
577
745
|
const provider = resolveProviderAlias(profile, flags.provider)
|
|
578
746
|
const model = typeof flags.model === "string" ? flags.model : resolveDefaultModel(profile, provider)
|
|
579
747
|
const providerApiKeys = buildProviderApiKeys(config, profileName)
|
|
@@ -638,7 +806,7 @@ const main = async () => {
|
|
|
638
806
|
}
|
|
639
807
|
|
|
640
808
|
if (command === "mcp" && subcommand === "stdio") {
|
|
641
|
-
await runMcpStdioCommand()
|
|
809
|
+
await runMcpStdioCommand(typeof flags["service-url"] === "string" ? flags["service-url"] : undefined)
|
|
642
810
|
return
|
|
643
811
|
}
|
|
644
812
|
|
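The routing added to `bin/echo-pdf.js` can be hard to follow across hunks, because `document` is both a top-level primitive and the prefix of the legacy subcommand tree. The sketch below combines the argv split from `main()` with `readDocumentPrimitiveArgs` into one function. `routeLocalCommand` is a hypothetical name, and the sketch assumes flags have already been stripped from `argv`, which the diff does not show:

```javascript
// Sketch only: `routeLocalCommand` is a hypothetical helper, and flag parsing is assumed
// to have happened elsewhere. The two lists are copied from the diff.
const SUBCOMMAND_COMMANDS = ["provider", "mcp", "setup", "model", "config", "document"]
const LEGACY_DOCUMENT_SUBCOMMANDS = ["index", "get", "structure", "semantic", "page", "render", "ocr"]

const routeLocalCommand = (argv) => {
  const [command, ...raw] = argv
  // Commands in SUBCOMMAND_COMMANDS consume the next token as a subcommand.
  const takesSubcommand = SUBCOMMAND_COMMANDS.includes(command)
  const subcommand = takesSubcommand ? raw[0] || "" : ""
  const rest = takesSubcommand ? raw.slice(1) : raw

  if (command === "document" && LEGACY_DOCUMENT_SUBCOMMANDS.includes(subcommand)) {
    // Legacy form: `document get|index <pdf>` maps to the `document` primitive.
    const primitive = subcommand === "index" || subcommand === "get" ? "document" : subcommand
    return { primitive, pdfPath: rest[0] }
  }
  // New form: for `document <pdf>` the path was consumed as the subcommand token.
  return { primitive: command, pdfPath: command === "document" ? subcommand : rest[0] }
}

console.log(routeLocalCommand(["document", "./sample.pdf"]))
console.log(routeLocalCommand(["document", "ocr", "./sample.pdf"]))
console.log(routeLocalCommand(["render", "./sample.pdf"]))
```

As far as the shown logic goes, a file whose name is literally one of the legacy subcommands (for example `page`) would be read as a subcommand when passed directly after `document`; prefixing the path with `./` avoids the ambiguity.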
package/bin/lib/http.js
CHANGED
|
@@ -53,20 +53,45 @@ export const downloadFile = async (serviceUrl, fileId, outputPath) => {
|
|
|
53
53
|
return absOut
|
|
54
54
|
}
|
|
55
55
|
|
|
56
|
-
|
|
56
|
+
const parseAutoUploadFlag = (value) => {
|
|
57
|
+
if (value === true) return true
|
|
58
|
+
if (typeof value === "string") {
|
|
59
|
+
const normalized = value.trim().toLowerCase()
|
|
60
|
+
return normalized === "1" || normalized === "true" || normalized === "yes" || normalized === "on"
|
|
61
|
+
}
|
|
62
|
+
return false
|
|
63
|
+
}
|
|
64
|
+
|
|
65
|
+
export const prepareArgsWithLocalUploads = async (serviceUrl, tool, args, options = {}) => {
|
|
57
66
|
const nextArgs = { ...(args || {}) }
|
|
67
|
+
const uploads = []
|
|
68
|
+
const autoUploadEnabled = options.autoUpload !== false
|
|
58
69
|
if (tool.startsWith("pdf_")) {
|
|
59
70
|
const localPath = typeof nextArgs.path === "string"
|
|
60
71
|
? nextArgs.path
|
|
61
72
|
: (typeof nextArgs.filePath === "string" ? nextArgs.filePath : "")
|
|
62
73
|
if (localPath && !nextArgs.fileId && !nextArgs.url && !nextArgs.base64) {
|
|
74
|
+
if (!autoUploadEnabled) {
|
|
75
|
+
throw new Error(
|
|
76
|
+
"Local file auto-upload is disabled for `echo-pdf call`. " +
|
|
77
|
+
"Use --auto-upload, or upload first (`echo-pdf file upload`) and pass fileId, or use `echo-pdf mcp-stdio`."
|
|
78
|
+
)
|
|
79
|
+
}
|
|
63
80
|
const upload = await uploadFile(serviceUrl, localPath)
|
|
64
81
|
const fileId = upload?.file?.id
|
|
65
82
|
if (!fileId) throw new Error(`upload failed for local path: ${localPath}`)
|
|
66
83
|
nextArgs.fileId = fileId
|
|
67
84
|
delete nextArgs.path
|
|
68
85
|
delete nextArgs.filePath
|
|
86
|
+
uploads.push({ tool, localPath, fileId })
|
|
69
87
|
}
|
|
70
88
|
}
|
|
89
|
+
return { args: nextArgs, uploads }
|
|
90
|
+
}
|
|
91
|
+
|
|
92
|
+
export const withUploadedLocalFile = async (serviceUrl, tool, args, options = {}) => {
|
|
93
|
+
const { args: nextArgs } = await prepareArgsWithLocalUploads(serviceUrl, tool, args, {
|
|
94
|
+
autoUpload: parseAutoUploadFlag(options.autoUpload ?? true),
|
|
95
|
+
})
|
|
71
96
|
return nextArgs
|
|
72
97
|
}
|
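The upload gate in `bin/lib/http.js` has two different defaults depending on the caller, which is easy to miss in the diff. The sketch below separates the decision from the network call so it can be read in isolation. `parseAutoUploadFlag` is copied from the diff; `planLocalUpload` is a hypothetical helper that returns a label instead of uploading anything:

```javascript
// Copied from the diff: accepts `true` or a truthy-looking string.
const parseAutoUploadFlag = (value) => {
  if (value === true) return true
  if (typeof value === "string") {
    const normalized = value.trim().toLowerCase()
    return normalized === "1" || normalized === "true" || normalized === "yes" || normalized === "on"
  }
  return false
}

// Hypothetical helper: same conditions as prepareArgsWithLocalUploads, but it only
// reports the outcome ("pass-through", "upload", or "reject") and performs no I/O.
const planLocalUpload = (tool, args = {}, options = {}) => {
  // Only an explicit `false` disables uploading; an omitted option leaves it enabled.
  const autoUploadEnabled = options.autoUpload !== false
  const localPath = typeof args.path === "string"
    ? args.path
    : (typeof args.filePath === "string" ? args.filePath : "")
  const needsUpload =
    tool.startsWith("pdf_") && Boolean(localPath) && !args.fileId && !args.url && !args.base64
  if (!needsUpload) return "pass-through"
  return autoUploadEnabled ? "upload" : "reject"
}

// `echo-pdf call` passes `flags["auto-upload"] === true`, so without the flag it is `false`.
console.log(planLocalUpload("pdf_extract_pages", { path: "./sample.pdf" }, { autoUpload: false }))
// withUploadedLocalFile passes parseAutoUploadFlag(options.autoUpload ?? true), so it stays enabled.
console.log(planLocalUpload("pdf_extract_pages", { path: "./sample.pdf" }, { autoUpload: parseAutoUploadFlag(true) }))
```

Read this way, the opt-in behavior of `echo-pdf call` comes from the CLI always passing an explicit boolean, while the stdio path keeps uploading because it defaults the option to `true` before parsing it.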
package/dist/auth.js
CHANGED
|
@@ -1,7 +1,19 @@
|
|
|
1
1
|
export const checkHeaderAuth = (request, env, options) => {
|
|
2
|
-
|
|
2
|
+
const authHeader = typeof options.authHeader === "string" ? options.authHeader.trim() : "";
|
|
3
|
+
const authEnv = typeof options.authEnv === "string" ? options.authEnv.trim() : "";
|
|
4
|
+
const hasHeader = authHeader.length > 0;
|
|
5
|
+
const hasEnv = authEnv.length > 0;
|
|
6
|
+
if (!hasHeader && !hasEnv)
|
|
3
7
|
return { ok: true };
|
|
4
|
-
|
|
8
|
+
if (!hasHeader || !hasEnv) {
|
|
9
|
+
return {
|
|
10
|
+
ok: false,
|
|
11
|
+
status: 500,
|
|
12
|
+
code: options.misconfiguredCode,
|
|
13
|
+
message: `${options.contextName} auth must configure both authHeader and authEnv`,
|
|
14
|
+
};
|
|
15
|
+
}
|
|
16
|
+
const required = env[authEnv];
|
|
5
17
|
if (typeof required !== "string" || required.length === 0) {
|
|
6
18
|
if (options.allowMissingSecret === true)
|
|
7
19
|
return { ok: true };
|
|
@@ -9,10 +21,10 @@ export const checkHeaderAuth = (request, env, options) => {
|
|
|
9
21
|
ok: false,
|
|
10
22
|
status: 500,
|
|
11
23
|
code: options.misconfiguredCode,
|
|
12
|
-
message: `${options.contextName} auth is configured but env "${
|
|
24
|
+
message: `${options.contextName} auth is configured but env "${authEnv}" is missing`,
|
|
13
25
|
};
|
|
14
26
|
}
|
|
15
|
-
if (request.headers.get(
|
|
27
|
+
if (request.headers.get(authHeader) !== required) {
|
|
16
28
|
return {
|
|
17
29
|
ok: false,
|
|
18
30
|
status: 401,
|