@chenchaolong/plugin-mineru 0.0.12 → 1.1.0

package/README.md CHANGED
@@ -1,113 +1,101 @@
- # Xpert Plugin: MinerU
-
- `@chenchaolong/plugin-mineru` is a MinerU document converter plugin for the [Xpert AI](https://github.com/xpert-ai/xpert) platform, providing extraction capabilities from PDF to Markdown and structured JSON. The plugin includes built-in MinerU integration strategies, document conversion strategies, and result parsing services, enabling secure access to the MinerU API in automated workflows, polling task status, and writing parsed content and attachment resources to the platform file system.
-
- ## Installation
-
- ```bash
- pnpm add @chenchaolong/plugin-mineru
- # or
- npm install @chenchaolong/plugin-mineru
- ```
-
- > **Note**: This plugin depends on `@xpert-ai/plugin-sdk`, `@nestjs/common@^11`, `@nestjs/config@^4`, `@metad/contracts`, `axios@1`, `chalk@4`, `@langchain/core@^0.3.72`, and `uuid@8` as peerDependencies. Please ensure these packages are installed in your host project.
-
- ## Quick Start
-
- 1. **Prepare MinerU Credentials**
-    Obtain a valid API Key from the MinerU dashboard and confirm the service address (default: `https://mineru.net/api/v4`).
-
- 2. **Configure Integration in Xpert**
-    - Via Xpert Console: Create a MinerU integration and fill in the following fields.
-    - Or set environment variables in your deployment environment:
-      - `MINERU_API_BASE_URL`: Optional, defaults to `https://mineru.net/api/v4`.
-      - `MINERU_API_TOKEN`: Required, used as a fallback credential if no integration is configured.
-
-    Example integration configuration (JSON):
-
-    ```json
-    {
-      "provider": "mineru",
-      "options": {
-        "apiUrl": "https://mineru.net/api/v4",
-        "apiKey": "your-mineru-api-key"
-      }
-    }
-    ```
-
- 3. **Register the Plugin**
-    Configure the plugin in your host service's plugin registration process:
-
-    ```sh .env
-    PLUGINS=@chenchaolong/plugin-mineru
-    ```
-
-    The plugin returns the NestJS module `MinerUPlugin` in the `register` hook and logs messages during the `onStart`/`onStop` lifecycle.
-
- ## MinerU Integration Options
-
- | Field | Type | Description | Required | Default |
- | -------- | ------ | ------------------------------------- | -------- | ---------------------------- |
- | apiUrl | string | MinerU API base URL | No | `https://mineru.net/api/v4` |
- | apiKey | string | MinerU service API Key (keep secret) | Yes | — |
-
- > If both integration configuration and environment variables are provided, options from the integration configuration take precedence.
-
- ## Document Conversion Parameters
-
- `MinerUTransformerStrategy` supports the following configuration options (passed to the MinerU API when starting a workflow):
-
- | Field | Type | Default | Description |
- | ---------------- | ------- | ------------ | --------------------------------------------------- |
- | `isOcr` | boolean | `true` | Enable OCR for image-based PDFs. |
- | `enableFormula` | boolean | `true` | Recognize mathematical formulas and output tags. |
- | `enableTable` | boolean | `true` | Recognize tables and output structured tags. |
- | `language` | string | `"ch"` | Main document language, per MinerU API (`en`/`ch`). |
- | `modelVersion` | string | `"pipeline"` | MinerU model version (`pipeline`, `vlm`, etc.). |
-
- By default, the plugin creates MinerU tasks for each file to be processed, polls until `full_zip_url` is returned, then downloads and parses the zip package in memory.
-
- ## Permissions
-
- - **Integration**: Access MinerU integration configuration to read API address and credentials.
- - **File System**: Perform `read/write/list` on `XpFileSystem` to store image resources from MinerU results.
-
- Ensure the plugin is granted these permissions in your authorization policy, or it will not be able to retrieve results or write attachments.
-
- ## Output Content
-
- The parser generates:
-
- - Full Markdown: Resource links are automatically replaced to point to actual URLs written via `XpFileSystem`.
- - Structured metadata: Includes MinerU task ID, layout JSON (`layout.json`), content list (`content_list.json`), original PDF filename, etc.
- - Attachment asset list: Records written image resources for easy association by callers.
-
- The returned `Document<ChunkMetadata>` array currently defaults to a single chunk containing the full Markdown; you can split it as needed.
-
- ## Local Deployment
-
- For self-hosted MinerU deployments, see [LOCAL_SETUP.md](./LOCAL_SETUP.md) for detailed instructions on:
- - Starting MinerU server using Docker
- - Installing from source code
- - Configuration and troubleshooting
-
- Quick start with Docker:
- ```bash
- docker run -d --name mineru -p 9960:9960 opendatalab/mineru:latest
- ```
-
- ## Development & Debugging
-
- Run the following commands in the repository root to build and test locally:
-
- ```bash
- npm install
- npx nx build @chenchaolong/plugin-mineru
- npx nx test @chenchaolong/plugin-mineru
- ```
-
- TypeScript build artifacts are output to `packages/mineru/dist`. Before publishing, ensure `package.json`, type declarations, and runtime files are in sync.
-
- ## License
-
- This project follows the [AGPL-3.0 License](../../../LICENSE) in the repository root.
+ # Xpert Plugin: MinerU
+
+ `@chenchaolong/plugin-mineru` is a MinerU document converter plugin for the [Xpert AI](https://github.com/xpert-ai/xpert) platform, providing extraction capabilities from PDF to Markdown and structured JSON. The plugin includes built-in MinerU integration strategies, document conversion strategies, and result parsing services, enabling secure access to the MinerU API in automated workflows, polling task status, and writing parsed content and attachment resources to the platform file system.
+
+ ## Installation
+
+ ```bash
+ pnpm add @chenchaolong/plugin-mineru
+ # or
+ npm install @chenchaolong/plugin-mineru
+ ```
+
+ > **Note**: This plugin depends on `@xpert-ai/plugin-sdk`, `@nestjs/common@^11`, `@nestjs/config@^4`, `@metad/contracts`, `axios@1`, `chalk@4`, `@langchain/core@^0.3.72`, and `uuid@8` as peerDependencies. Please ensure these packages are installed in your host project.
+
+ ## Quick Start
+
+ 1. **Prepare MinerU Credentials**
+    Obtain a valid API Key from the MinerU dashboard and confirm the service address (default: `https://mineru.net/api/v4`).
+
+ 2. **Configure Integration in Xpert**
+    - Via Xpert Console: Create a MinerU integration and fill in the following fields.
+    - Or set environment variables in your deployment environment:
+      - `MINERU_API_BASE_URL`: Optional, defaults to `https://mineru.net/api/v4`.
+      - `MINERU_API_TOKEN`: Required, used as a fallback credential if no integration is configured.
+
+    Example integration configuration (JSON):
+
+    ```json
+    {
+      "provider": "mineru",
+      "options": {
+        "apiUrl": "https://mineru.net/api/v4",
+        "apiKey": "your-mineru-api-key"
+      }
+    }
+    ```
+
+ 3. **Register the Plugin**
+    Configure the plugin in your host service's plugin registration process:
+
+    ```sh .env
+    PLUGINS=@chenchaolong/plugin-mineru
+    ```
+
+    The plugin returns the NestJS module `MinerUPlugin` in the `register` hook and logs messages during the `onStart`/`onStop` lifecycle.
+
+ ## MinerU Integration Options
+
+ | Field | Type | Description | Required | Default |
+ | -------- | ------ | ------------------------------------- | -------- | ---------------------------- |
+ | apiUrl | string | MinerU API base URL | No | `https://mineru.net/api/v4` |
+ | apiKey | string | MinerU service API Key (keep secret) | Yes | — |
+
+ > If both integration configuration and environment variables are provided, options from the integration configuration take precedence.
+
+ ## Document Conversion Parameters
+
+ `MinerUTransformerStrategy` supports the following configuration options (passed to the MinerU API when starting a workflow):
+
+ | Field | Type | Default | Description |
+ | ---------------- | ------- | ------------ | --------------------------------------------------- |
+ | `isOcr` | boolean | `true` | Enable OCR for image-based PDFs. |
+ | `enableFormula` | boolean | `true` | Recognize mathematical formulas and output tags. |
+ | `enableTable` | boolean | `true` | Recognize tables and output structured tags. |
+ | `language` | string | `"ch"` | Main document language, per MinerU API (`en`/`ch`). |
+ | `modelVersion` | string | `"pipeline"` | MinerU model version (`pipeline`, `vlm`, etc.). |
+
+ By default, the plugin creates MinerU tasks for each file to be processed, polls until `full_zip_url` is returned, then downloads and parses the zip package in memory.
+
+ ## Permissions
+
+ - **Integration**: Access MinerU integration configuration to read API address and credentials.
+ - **File System**: Perform `read/write/list` on `XpFileSystem` to store image resources from MinerU results.
+
+ Ensure the plugin is granted these permissions in your authorization policy, or it will not be able to retrieve results or write attachments.
+
+ ## Output Content
+
+ The parser generates:
+
+ - Full Markdown: Resource links are automatically replaced to point to actual URLs written via `XpFileSystem`.
+ - Structured metadata: Includes MinerU task ID, layout JSON (`layout.json`), content list (`content_list.json`), original PDF filename, etc.
+ - Attachment asset list: Records written image resources for easy association by callers.
+
+ The returned `Document<ChunkMetadata>` array currently defaults to a single chunk containing the full Markdown; you can split it as needed.
+
+ ## Development & Debugging
+
+ Run the following commands in the repository root to build and test locally:
+
+ ```bash
+ npm install
+ npx nx build @chenchaolong/plugin-mineru
+ npx nx test @chenchaolong/plugin-mineru
+ ```
+
+ TypeScript build artifacts are output to `packages/mineru/dist`. Before publishing, ensure `package.json`, type declarations, and runtime files are in sync.
+
+ ## License
+
+ This project follows the [AGPL-3.0 License](../../../LICENSE) in the repository root.
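The README's precedence note (integration options override environment variables, with `https://mineru.net/api/v4` as the default base URL) can be sketched as a small resolver. This is a minimal illustration under stated assumptions, not the plugin's code: `resolveMinerUConfig`, its parameter shapes, and the per-field fallback order are hypothetical; only the option names `apiUrl`/`apiKey` and the variables `MINERU_API_BASE_URL`/`MINERU_API_TOKEN` come from the README.

```javascript
const DEFAULT_API_URL = 'https://mineru.net/api/v4';

// Hypothetical helper: integration options win, environment variables are
// the fallback, and the documented default URL is used last.
function resolveMinerUConfig(integrationOptions = {}, env = {}) {
  const apiUrl =
    integrationOptions.apiUrl || env.MINERU_API_BASE_URL || DEFAULT_API_URL;
  const apiKey = integrationOptions.apiKey || env.MINERU_API_TOKEN;
  if (!apiKey) {
    throw new Error(
      'MinerU API key is required (integration apiKey or MINERU_API_TOKEN)'
    );
  }
  return { apiUrl, apiKey };
}

// The integration supplies the key; the URL falls back to the environment.
const config = resolveMinerUConfig(
  { apiKey: 'from-integration' },
  { MINERU_API_BASE_URL: 'http://localhost:9960', MINERU_API_TOKEN: 'from-env' }
);
console.log(config);
```

Passing the environment in as a plain object, rather than reading `process.env` directly, keeps the sketch easy to exercise in isolation.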
@@ -1 +1 @@
1
- {"version":3,"file":"mineru.client.d.ts","sourceRoot":"","sources":["../../src/lib/mineru.client.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,YAAY,EAAE,MAAM,kBAAkB,CAAC;AAEhD,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAC/C,OAAO,EAAmB,YAAY,EAAE,MAAM,sBAAsB,CAAC;AACrE,OAAc,EAAE,aAAa,EAAE,MAAM,OAAO,CAAC;AAK7C,OAAO,EAIL,wBAAwB,EAExB,0BAA0B,EAC1B,gBAAgB,EACjB,MAAM,YAAY,CAAC;AAIpB,UAAU,iBAAiB;IACzB,GAAG,CAAC,EAAE,MAAM,CAAC;IACb,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,YAAY,CAAC,EAAE,MAAM,EAAE,CAAC;IACxB,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,mEAAmE;IACnE,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,yEAAyE;IACzE,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,2EAA2E;IAC3E,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,4EAA4E;IAC5E,gBAAgB,CAAC,EAAE,OAAO,CAAC;CAC5B;AAED,UAAU,mBAAmB;IAC3B,GAAG,EAAE,MAAM,CAAC;IACZ,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,UAAU,CAAC,EAAE,MAAM,CAAC;CACrB;AAED,UAAU,sBAAsB;IAC9B,KAAK,EAAE,mBAAmB,EAAE,CAAC;IAC7B,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,YAAY,CAAC,EAAE,MAAM,EAAE,CAAC;IACxB,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,IAAI,CAAC,EAAE,MAAM,CAAC;CACf;AAED,UAAU,iBAAiB;IACzB,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;CACnB;AASD,qBAAa,YAAY;IAWrB,OAAO,CAAC,QAAQ,CAAC,aAAa;IAC9B,OAAO,CAAC,QAAQ,CAAC,WAAW,CAAC;IAX/B,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAiC;IACxD,OAAO,CAAC,QAAQ,CAAC,OAAO,CAAS;IACjC,OAAO,CAAC,QAAQ,CAAC,KAAK,CAAC,CAAS;IAChC,SAAgB,UAAU,EAAE,gBAAgB,CAAC;IAC7C,OAAO,CAAC,QAAQ,CAAC,UAAU,CAAiD;IAE5E,IAAI,UAAU,IAAI,YAAY,GAAG,SAAS,CAEzC;gBAEkB,aAAa,EAAE,aAAa,EAC5B,WAAW,CAAC,EAAE;QACvB,UAAU,CAAC,EAAE,YAAY,CAAC;QAC1B,WAAW,CAAC,EAAE,OAAO,CAAC,YAAY,CAAC,wBAAwB,CAAC,CAAC,CAAC;KACjE;IAkBP;;;OAGG;IACG,UAAU,CAAC,OAAO,EAAE,
iBAAiB,GAAG,OAAO,CAAC;QAAE,MAAM,EAAE,MAAM,CAAA;KAAE,CAAC;IAYzE;;OAEG;IACG,eAAe,CAAC,OAAO,EAAE,sBAAsB,GAAG,OAAO,CAAC;QAAE,OAAO,EAAE,MAAM,CAAC;QAAC,QAAQ,CAAC,EAAE,MAAM,EAAE,CAAA;KAAE,CAAC;IA+CzG,iBAAiB,CAAC,MAAM,EAAE,MAAM,GAAG,0BAA0B,GAAG,SAAS;IAOzE;;OAEG;IACG,aAAa,CAAC,MAAM,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,iBAAiB,GAAG,OAAO,CAAC;QACxE,YAAY,CAAC,EAAE,MAAM,CAAC;QACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;QAClB,OAAO,CAAC,EAAE,MAAM,CAAC;QACjB,MAAM,CAAC,EAAE,MAAM,CAAC;KACjB,CAAC;IAoBF;;OAEG;IACG,cAAc,CAAC,OAAO,EAAE,MAAM,GAAG,OAAO,CAAC,GAAG,CAAC;IAiBnD;;OAEG;IACG,WAAW,CAAC,MAAM,EAAE,MAAM,EAAE,SAAS,SAAgB,EAAE,UAAU,SAAO,GAAG,OAAO,CAAC,GAAG,CAAC;IAsB7F,OAAO,CAAC,cAAc;IAMtB,OAAO,CAAC,iBAAiB;IAczB,OAAO,CAAC,kBAAkB;IAyB1B,OAAO,CAAC,sBAAsB;IAI9B,OAAO,CAAC,gBAAgB;IAIxB,OAAO,CAAC,WAAW;IAQnB,OAAO,CAAC,kBAAkB;IAO1B,OAAO,CAAC,oBAAoB;YAYd,kBAAkB;YAmClB,oBAAoB;YASpB,qBAAqB;YA0DrB,uBAAuB;IA+CrC,OAAO,CAAC,iBAAiB;IAgBzB,OAAO,CAAC,2BAA2B;IAenC,OAAO,CAAC,6BAA6B;IAcrC,OAAO,CAAC,iBAAiB;IAQzB,OAAO,CAAC,aAAa;IAcrB,OAAO,CAAC,iBAAiB;IAQzB,OAAO,CAAC,eAAe;YAIT,YAAY;IAkB1B,OAAO,CAAC,eAAe;IA0BvB,wBAAwB,IAAI,OAAO,CAAC,aAAa,CAAC,GAAG,EAAE,GAAG,CAAC,CAAC;IAKtD,wBAAwB;CAU/B"}
1
+ {"version":3,"file":"mineru.client.d.ts","sourceRoot":"","sources":["../../src/lib/mineru.client.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,YAAY,EAAE,MAAM,kBAAkB,CAAC;AAEhD,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAC/C,OAAO,EAAmB,YAAY,EAAE,MAAM,sBAAsB,CAAC;AACrE,OAAc,EAAE,aAAa,EAAE,MAAM,OAAO,CAAC;AAK7C,OAAO,EAIL,wBAAwB,EAExB,0BAA0B,EAC1B,gBAAgB,EACjB,MAAM,YAAY,CAAC;AAIpB,UAAU,iBAAiB;IACzB,GAAG,CAAC,EAAE,MAAM,CAAC;IACb,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,YAAY,CAAC,EAAE,MAAM,EAAE,CAAC;IACxB,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,mEAAmE;IACnE,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,yEAAyE;IACzE,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,2EAA2E;IAC3E,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,4EAA4E;IAC5E,gBAAgB,CAAC,EAAE,OAAO,CAAC;CAC5B;AAED,UAAU,mBAAmB;IAC3B,GAAG,EAAE,MAAM,CAAC;IACZ,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,UAAU,CAAC,EAAE,MAAM,CAAC;CACrB;AAED,UAAU,sBAAsB;IAC9B,KAAK,EAAE,mBAAmB,EAAE,CAAC;IAC7B,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,YAAY,CAAC,EAAE,MAAM,EAAE,CAAC;IACxB,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,IAAI,CAAC,EAAE,MAAM,CAAC;CACf;AAED,UAAU,iBAAiB;IACzB,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;CACnB;AASD,qBAAa,YAAY;IAWrB,OAAO,CAAC,QAAQ,CAAC,aAAa;IAC9B,OAAO,CAAC,QAAQ,CAAC,WAAW,CAAC;IAX/B,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAiC;IACxD,OAAO,CAAC,QAAQ,CAAC,OAAO,CAAS;IACjC,OAAO,CAAC,QAAQ,CAAC,KAAK,CAAC,CAAS;IAChC,SAAgB,UAAU,EAAE,gBAAgB,CAAC;IAC7C,OAAO,CAAC,QAAQ,CAAC,UAAU,CAAiD;IAE5E,IAAI,UAAU,IAAI,YAAY,GAAG,SAAS,CAEzC;gBAEkB,aAAa,EAAE,aAAa,EAC5B,WAAW,CAAC,EAAE;QACvB,UAAU,CAAC,EAAE,YAAY,CAAC;QAC1B,WAAW,CAAC,EAAE,OAAO,CAAC,YAAY,CAAC,wBAAwB,CAAC,CAAC,CAAC;KACjE;IAkBP;;;OAGG;IACG,UAAU,CAAC,OAAO,EAAE,
iBAAiB,GAAG,OAAO,CAAC;QAAE,MAAM,EAAE,MAAM,CAAA;KAAE,CAAC;IAYzE;;OAEG;IACG,eAAe,CAAC,OAAO,EAAE,sBAAsB,GAAG,OAAO,CAAC;QAAE,OAAO,EAAE,MAAM,CAAC;QAAC,QAAQ,CAAC,EAAE,MAAM,EAAE,CAAA;KAAE,CAAC;IAmCzG,iBAAiB,CAAC,MAAM,EAAE,MAAM,GAAG,0BAA0B,GAAG,SAAS;IAOzE;;OAEG;IACG,aAAa,CAAC,MAAM,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,iBAAiB,GAAG,OAAO,CAAC;QACxE,YAAY,CAAC,EAAE,MAAM,CAAC;QACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;QAClB,OAAO,CAAC,EAAE,MAAM,CAAC;QACjB,MAAM,CAAC,EAAE,MAAM,CAAC;KACjB,CAAC;IAoBF;;OAEG;IACG,cAAc,CAAC,OAAO,EAAE,MAAM,GAAG,OAAO,CAAC,GAAG,CAAC;IAiBnD;;OAEG;IACG,WAAW,CAAC,MAAM,EAAE,MAAM,EAAE,SAAS,SAAgB,EAAE,UAAU,SAAO,GAAG,OAAO,CAAC,GAAG,CAAC;IAsB7F,OAAO,CAAC,cAAc;IAMtB,OAAO,CAAC,iBAAiB;IAczB,OAAO,CAAC,kBAAkB;IAyB1B,OAAO,CAAC,sBAAsB;IAI9B,OAAO,CAAC,gBAAgB;IAIxB,OAAO,CAAC,WAAW;IAQnB,OAAO,CAAC,kBAAkB;IAO1B,OAAO,CAAC,oBAAoB;YAYd,kBAAkB;YA4BlB,oBAAoB;YA6BpB,qBAAqB;YAoErB,uBAAuB;IAsDrC,OAAO,CAAC,iBAAiB;IAgBzB,OAAO,CAAC,2BAA2B;IAenC,OAAO,CAAC,6BAA6B;IAcrC,OAAO,CAAC,iBAAiB;IAQzB,OAAO,CAAC,aAAa;IAcrB,OAAO,CAAC,iBAAiB;IAQzB,OAAO,CAAC,eAAe;YAIT,YAAY;IAkB1B,OAAO,CAAC,eAAe;IA0BvB,wBAAwB,IAAI,OAAO,CAAC,aAAa,CAAC,GAAG,EAAE,GAAG,CAAC,CAAC;IAKtD,wBAAwB;CAU/B"}
@@ -46,10 +46,6 @@ export class MinerUClient {
       */
      async createBatchTask(options) {
          this.ensureOfficial('createBatchTask');
-         // Validate files is an array
-         if (!Array.isArray(options.files)) {
-             throw new Error('MinerU createBatchTask requires files to be an array');
-         }
          const url = this.buildApiUrl('extract', 'task', 'batch');
          const body = {
              files: options.files.map((file) => {
@@ -71,15 +67,8 @@ export class MinerUClient {
              body.language = options.language;
          if (options.modelVersion)
              body.model_version = options.modelVersion;
-         // Ensure extraFormats is an array if provided
-         if (options.extraFormats) {
-             if (Array.isArray(options.extraFormats)) {
-                 body.extra_formats = options.extraFormats;
-             }
-             else {
-                 this.logger.warn('extraFormats is not an array, ignoring');
-             }
-         }
+         if (options.extraFormats)
+             body.extra_formats = options.extraFormats;
          if (options.callbackUrl)
              body.callback = options.callbackUrl;
          if (options.seed)
@@ -242,15 +231,8 @@ export class MinerUClient {
              body.data_id = options.dataId;
          if (options.pageRanges)
              body.page_ranges = options.pageRanges;
-         // Ensure extraFormats is an array if provided
-         if (options.extraFormats) {
-             if (Array.isArray(options.extraFormats)) {
-                 body.extra_formats = options.extraFormats;
-             }
-             else {
-                 this.logger.warn('extraFormats is not an array, ignoring');
-             }
-         }
+         if (options.extraFormats)
+             body.extra_formats = options.extraFormats;
          if (options.callbackUrl)
              body.callback = options.callbackUrl;
          if (options.seed)
@@ -269,15 +251,33 @@ export class MinerUClient {
          }
      }
      async createSelfHostedTask(options) {
+         // Validate fileSystem is available for self-hosted mode
+         if (!this.fileSystem) {
+             throw new Error('MinerU self-hosted mode requires fileSystem permission');
+         }
+         // Validate filePath is provided
+         if (!options.filePath) {
+             throw new Error('MinerU self-hosted mode requires filePath to be provided');
+         }
+         // Get absolute file path from fileSystem
          const filePath = this.fileSystem.fullPath(options.filePath);
+         // Validate file exists before attempting to parse
+         try {
+             await fs.promises.access(filePath, fs.constants.F_OK);
+         }
+         catch (error) {
+             this.logger.error(`File not found: ${filePath}`, error instanceof Error ? error.stack : error);
+             throw new Error(`File not found: ${filePath}`);
+         }
          const taskId = randomUUID();
-         const result = await this.invokeSelfHostedParse(filePath, options.fileName, options);
+         const result = await this.invokeSelfHostedParse(filePath, options.fileName || basename(filePath), options);
          this.localTasks.set(taskId, { ...result, sourceUrl: options.url });
          return { taskId };
      }
      async invokeSelfHostedParse(filePath, fileName, options) {
          const parseUrl = this.buildApiUrl('file_parse');
          const form = new FormData();
+         // Create file read stream (file existence is already validated in createSelfHostedTask)
          form.append('files', fs.createReadStream(filePath), {
              filename: fileName,
          });
@@ -308,11 +308,14 @@ export class MinerUClient {
              return this.invokeSelfHostedParseV1(filePath, fileName, options);
          }
          if (response.status === 400) {
-             throw new BadRequestException(`MinerU self-hosted parse failed: ${response.status} ${getErrorMessage(response.data)}`);
+             const errorMessage = getErrorMessage(response.data);
+             this.logger.error(`MinerU self-hosted parse failed with 400: ${errorMessage}`, JSON.stringify(response.data));
+             throw new BadRequestException(`MinerU self-hosted parse failed: ${response.status} ${errorMessage}`);
          }
          if (response.status !== 200) {
-             console.error(response.data);
-             throw new Error(`MinerU self-hosted parse failed: ${response.status} ${response.statusText}`);
+             const errorMessage = getErrorMessage(response.data) || response.statusText;
+             this.logger.error(`MinerU self-hosted parse failed with ${response.status}: ${errorMessage}`, JSON.stringify(response.data));
+             throw new Error(`MinerU self-hosted parse failed: ${response.status} ${response.statusText}. ${errorMessage}`);
          }
          return this.normalizeSelfHostedResponse(response.data);
      }
@@ -341,7 +344,9 @@ export class MinerUClient {
              validateStatus: () => true,
          });
          if (response.status !== 200) {
-             throw new Error(`MinerU self-hosted legacy parse failed: ${response.status} ${response.statusText}`);
+             const errorMessage = getErrorMessage(response.data) || response.statusText;
+             this.logger.error(`MinerU self-hosted legacy parse failed with ${response.status}: ${errorMessage}`, JSON.stringify(response.data));
+             throw new Error(`MinerU self-hosted legacy parse failed: ${response.status} ${response.statusText}. ${errorMessage}`);
          }
          return this.normalizeSelfHostedResponse(response.data);
      }
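The error-handling hunks above compose the thrown message from the response body's message, falling back to the HTTP status text. A minimal sketch of that composition follows. The body of the package's `getErrorMessage` helper is not shown in this diff, so `extractErrorMessage` below is a hypothetical stand-in, and the field names it probes (`detail`, `message`, `error`) are assumptions; `buildParseError` is likewise an illustrative name.

```javascript
// Hypothetical stand-in for the package's getErrorMessage helper.
// The probed field names are assumptions, not taken from the diff.
function extractErrorMessage(data) {
  if (!data) return '';
  if (typeof data === 'string') return data;
  for (const key of ['detail', 'message', 'error']) {
    if (typeof data[key] === 'string' && data[key]) return data[key];
  }
  return '';
}

// Mirrors the 1.1.0 message format for non-200 responses:
// "<status> <statusText>. <body message or statusText>".
function buildParseError(response) {
  const errorMessage =
    extractErrorMessage(response.data) || response.statusText;
  return new Error(
    `MinerU self-hosted parse failed: ${response.status} ${response.statusText}. ${errorMessage}`
  );
}

const err = buildParseError({
  status: 500,
  statusText: 'Internal Server Error',
  data: { detail: 'model not loaded' },
});
console.log(err.message);
```

Under this format, a response with no usable body message repeats the status text after the period, since the fallback and the prefix draw on the same field.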
@@ -1 +1 @@
1
- {"version":3,"file":"result-parser.service.d.ts","sourceRoot":"","sources":["../../src/lib/result-parser.service.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,MAAM,2BAA2B,CAAC;AACrD,OAAO,EAAE,kBAAkB,EAAE,MAAM,kBAAkB,CAAC;AAEtD,OAAO,EACL,aAAa,EAEb,YAAY,EACb,MAAM,sBAAsB,CAAC;AAK9B,OAAO,EAEL,sBAAsB,EACtB,0BAA0B,EAC3B,MAAM,YAAY,CAAC;AAEpB,qBACa,yBAAyB;IACpC,OAAO,CAAC,QAAQ,CAAC,MAAM,CAA8C;IAE/D,YAAY,CAChB,UAAU,EAAE,MAAM,EAClB,MAAM,EAAE,MAAM,EACd,QAAQ,EAAE,OAAO,CAAC,kBAAkB,CAAC,EACrC,UAAU,EAAE,YAAY,GACvB,OAAO,CAAC;QACT,EAAE,CAAC,EAAE,MAAM,CAAC;QACZ,MAAM,EAAE,QAAQ,CAAC,aAAa,CAAC,EAAE,CAAC;QAClC,QAAQ,EAAE,sBAAsB,CAAC;KAClC,CAAC;IAqFI,cAAc,CAClB,MAAM,EAAE,0BAA0B,EAClC,MAAM,EAAE,MAAM,EACd,QAAQ,EAAE,OAAO,CAAC,kBAAkB,CAAC,EACrC,UAAU,EAAE,YAAY,GACvB,OAAO,CAAC;QACT,EAAE,CAAC,EAAE,MAAM,CAAC;QACZ,MAAM,EAAE,QAAQ,CAAC,aAAa,CAAC,EAAE,CAAC;QAClC,QAAQ,EAAE,sBAAsB,CAAC;KAClC,CAAC;CAoDH"}
1
+ {"version":3,"file":"result-parser.service.d.ts","sourceRoot":"","sources":["../../src/lib/result-parser.service.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,MAAM,2BAA2B,CAAC;AACrD,OAAO,EAAE,kBAAkB,EAAE,MAAM,kBAAkB,CAAC;AAEtD,OAAO,EACL,aAAa,EAEb,YAAY,EACb,MAAM,sBAAsB,CAAC;AAK9B,OAAO,EAEL,sBAAsB,EACtB,0BAA0B,EAC3B,MAAM,YAAY,CAAC;AAEpB,qBACa,yBAAyB;IACpC,OAAO,CAAC,QAAQ,CAAC,MAAM,CAA8C;IAE/D,YAAY,CAChB,UAAU,EAAE,MAAM,EAClB,MAAM,EAAE,MAAM,EACd,QAAQ,EAAE,OAAO,CAAC,kBAAkB,CAAC,EACrC,UAAU,EAAE,YAAY,GACvB,OAAO,CAAC;QACT,EAAE,CAAC,EAAE,MAAM,CAAC;QACZ,MAAM,EAAE,QAAQ,CAAC,aAAa,CAAC,EAAE,CAAC;QAClC,QAAQ,EAAE,sBAAsB,CAAC;KAClC,CAAC;IAqFI,cAAc,CAClB,MAAM,EAAE,0BAA0B,EAClC,MAAM,EAAE,MAAM,EACd,QAAQ,EAAE,OAAO,CAAC,kBAAkB,CAAC,EACrC,UAAU,EAAE,YAAY,GACvB,OAAO,CAAC;QACT,EAAE,CAAC,EAAE,MAAM,CAAC;QACZ,MAAM,EAAE,QAAQ,CAAC,aAAa,CAAC,EAAE,CAAC;QAClC,QAAQ,EAAE,sBAAsB,CAAC;KAClC,CAAC;CAkDH"}
@@ -99,9 +99,7 @@ let MinerUResultParserService = MinerUResultParserService_1 = class MinerUResult
          };
          const assets = [];
          const pathMap = new Map();
-         // Ensure images is an array before iterating
-         const images = Array.isArray(result.images) ? result.images : [];
-         for (const image of images) {
+         for (const image of result.images) {
              const filePath = join(document.folder || '', 'images', image.name);
              const url = await fileSystem.writeFile(filePath, Buffer.from(image.dataUrl.split(',')[1], 'base64'));
              pathMap.set(`images/${image.name}`, url);
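With the `Array.isArray` guard removed in 1.1.0, the loop above iterates `result.images` directly, so a result without an `images` array would throw a `TypeError` at the `for...of`. The sketch below shows the 0.0.12-style guard alongside the data-URL decoding step used in the loop. The helper names `listImages` and `decodeDataUrl` are hypothetical, and the sample data URL is made up for illustration.

```javascript
// Mirrors the 0.0.12 guard: a missing or non-array `images` field is
// treated as an empty list instead of failing at iteration time.
function listImages(result) {
  return Array.isArray(result.images) ? result.images : [];
}

// Decode the base64 payload of a data URL such as
// "data:image/png;base64,....". Same split-and-decode step as the diff.
function decodeDataUrl(dataUrl) {
  const payload = dataUrl.split(',')[1];
  if (payload === undefined) {
    throw new Error('Not a data URL: missing "," separator');
  }
  return Buffer.from(payload, 'base64');
}

// Illustrative input; a text payload stands in for real image bytes.
const result = {
  images: [{ name: 'a.png', dataUrl: 'data:text/plain;base64,aGVsbG8=' }],
};
for (const image of listImages(result)) {
  const bytes = decodeDataUrl(image.dataUrl);
  console.log(`images/${image.name}`, bytes.length, 'bytes');
}
```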
@@ -1 +1 @@
1
- {"version":3,"file":"transformer-mineru.strategy.d.ts","sourceRoot":"","sources":["../../src/lib/transformer-mineru.strategy.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,kBAAkB,EAAE,MAAM,kBAAkB,CAAA;AAG/D,OAAO,EACL,aAAa,EAEb,oBAAoB,EACpB,4BAA4B,EAC5B,qBAAqB,EACtB,MAAM,sBAAsB,CAAA;AAI7B,OAAO,EAAgB,wBAAwB,EAAE,MAAM,YAAY,CAAA;AAEnE,qBAEa,yBAA0B,YAAW,4BAA4B,CAAC,wBAAwB,CAAC;IAEtG,OAAO,CAAC,QAAQ,CAAC,YAAY,CAA2B;IAGxD,OAAO,CAAC,QAAQ,CAAC,aAAa,CAAe;IAE7C,QAAQ,CAAC,WAAW,mDAWnB;IAED,QAAQ,CAAC,IAAI;;;;;;;;;;;kBAWM,QAAQ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;MAwE1B;IAED,cAAc,CAAC,MAAM,EAAE,GAAG,GAAG,OAAO,CAAC,IAAI,CAAC;IAIpC,kBAAkB,CACtB,SAAS,EAAE,OAAO,CAAC,kBAAkB,CAAC,EAAE,EACxC,MAAM,EAAE,wBAAwB,GAC/B,OAAO,CAAC,OAAO,CAAC,kBAAkB,CAAC,aAAa,CAAC,CAAC,EAAE,CAAC;CAsDzD"}
1
+ {"version":3,"file":"transformer-mineru.strategy.d.ts","sourceRoot":"","sources":["../../src/lib/transformer-mineru.strategy.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,kBAAkB,EAAE,MAAM,kBAAkB,CAAA;AAG/D,OAAO,EACL,aAAa,EAEb,oBAAoB,EACpB,4BAA4B,EAC5B,qBAAqB,EACtB,MAAM,sBAAsB,CAAA;AAI7B,OAAO,EAAgB,wBAAwB,EAAE,MAAM,YAAY,CAAA;AAEnE,qBAEa,yBAA0B,YAAW,4BAA4B,CAAC,wBAAwB,CAAC;IAEtG,OAAO,CAAC,QAAQ,CAAC,YAAY,CAA2B;IAGxD,OAAO,CAAC,QAAQ,CAAC,aAAa,CAAe;IAE7C,QAAQ,CAAC,WAAW,mDAWnB;IAED,QAAQ,CAAC,IAAI;;;;;;;;;;;kBAWM,QAAQ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;MAwE1B;IAED,cAAc,CAAC,MAAM,EAAE,GAAG,GAAG,OAAO,CAAC,IAAI,CAAC;IAIpC,kBAAkB,CACtB,SAAS,EAAE,OAAO,CAAC,kBAAkB,CAAC,EAAE,EACxC,MAAM,EAAE,wBAAwB,GAC/B,OAAO,CAAC,OAAO,CAAC,kBAAkB,CAAC,aAAa,CAAC,CAAC,EAAE,CAAC;CA8DzD"}
@@ -125,8 +125,12 @@ let MinerUTransformerStrategy = class MinerUTransformerStrategy {
                  });
                  const result = mineru.getSelfHostedTask(taskId);
                  const parsedResult = await this.resultParser.parseLocalTask(result, taskId, document, config.permissions.fileSystem);
-                 parsedResult.id = document.id;
-                 parsedResults.push(parsedResult);
+                 // Convert parsedResult to IKnowledgeDocument format
+                 parsedResults.push({
+                     id: document.id,
+                     chunks: parsedResult.chunks,
+                     metadata: parsedResult.metadata
+                 });
              }
              else {
                  const { taskId } = await mineru.createTask({
@@ -141,8 +145,12 @@ let MinerUTransformerStrategy = class MinerUTransformerStrategy {
                  // Waiting for completion
                  const result = await mineru.waitForTask(taskId, 5 * 60 * 1000, 5000);
                  const parsedResult = await this.resultParser.parseFromUrl(result.full_zip_url, taskId, document, config.permissions.fileSystem);
-                 parsedResult.id = document.id;
-                 parsedResults.push(parsedResult);
+                 // Convert parsedResult to IKnowledgeDocument format
+                 parsedResults.push({
+                     id: document.id,
+                     chunks: parsedResult.chunks,
+                     metadata: parsedResult.metadata
+                 });
              }
          }
          return parsedResults;
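Both branches of the strategy now push a freshly built `{ id, chunks, metadata }` object instead of mutating `parsedResult` and pushing it whole. A minimal sketch of the difference, assuming nothing about the real types: `toKnowledgeDocument` is a hypothetical name, and the sample field values (including the extra `internal` field) are invented for illustration.

```javascript
// Build a new object carrying only the three fields pushed in 1.1.0.
// The parser's result is left untouched and any extra fields are dropped.
function toKnowledgeDocument(document, parsedResult) {
  return {
    id: document.id,
    chunks: parsedResult.chunks,
    metadata: parsedResult.metadata,
  };
}

// Illustrative data only.
const parsed = {
  chunks: [{ pageContent: '# Title' }],
  metadata: { taskId: 't-1' },
  internal: 'not forwarded',
};
const doc = toKnowledgeDocument({ id: 'doc-42' }, parsed);

console.log(Object.keys(doc));
console.log('id' in parsed);
```

The explicit mapping narrows what callers receive and avoids writing `id` onto the parser's return value, at the cost of having to extend the mapping if the parser later returns additional fields that callers need.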
package/dist/lib/types.js CHANGED
@@ -2,26 +2,26 @@ export const MinerU = 'mineru';
2
2
  export const ENV_MINERU_API_BASE_URL = 'MINERU_API_BASE_URL';
3
3
  export const ENV_MINERU_API_TOKEN = 'MINERU_API_TOKEN';
4
4
  export const ENV_MINERU_SERVER_TYPE = 'MINERU_SERVER_TYPE';
5
- export const icon = `<svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
6
- <path d="M19.7238 3.86898C19.7238 4.57597 19.1502 5.1491 18.4427 5.1491C17.7352 5.1491 17.1616 4.57597 17.1616 3.86898C17.1616 3.16199 17.7352 2.58887 18.4427 2.58887C19.1502 2.58887 19.7238 3.16199 19.7238 3.86898Z" fill="url(#paint0_linear_8609_1645)"/>
7
- <path d="M19.7238 3.86898C19.7238 4.57597 19.1502 5.1491 18.4427 5.1491C17.7352 5.1491 17.1616 4.57597 17.1616 3.86898C17.1616 3.16199 17.7352 2.58887 18.4427 2.58887C19.1502 2.58887 19.7238 3.16199 19.7238 3.86898Z" fill="#010101"/>
8
- <path d="M15.3681 5.1491C15.3681 5.85609 14.7945 6.42921 14.087 6.42921C13.3794 6.42921 12.8059 5.85609 12.8059 5.1491C12.8059 4.44211 13.3794 3.86898 14.087 3.86898C14.7945 3.86898 15.3681 4.44211 15.3681 5.1491Z" fill="url(#paint1_linear_8609_1645)"/>
9
- <path d="M15.3681 5.1491C15.3681 5.85609 14.7945 6.42921 14.087 6.42921C13.3794 6.42921 12.8059 5.85609 12.8059 5.1491C12.8059 4.44211 13.3794 3.86898 14.087 3.86898C14.7945 3.86898 15.3681 4.44211 15.3681 5.1491Z" fill="#010101"/>
10
- <path fill-rule="evenodd" clip-rule="evenodd" d="M8.05175 11.2368C8.05175 13.4605 9.14375 15.4293 10.8211 16.6371C11.8241 15.7389 12.4551 14.4345 12.4551 12.9828V9.39673C12.4551 8.85661 12.8197 8.38448 13.3426 8.24757L19.8924 6.53265C20.6459 6.33534 21.3826 6.90341 21.3826 7.6818L21.3826 12.0452C21.3826 17.2179 17.1861 21.4111 12.0095 21.4111L11.9942 21.4111C6.81758 21.4111 2.62109 17.2179 2.62109 12.0452V9.03388C2.62109 8.49175 2.9884 8.01839 3.51385 7.88336L6.56677 7.09882C7.31904 6.9055 8.05175 7.47318 8.05175 8.24934V11.2368ZM3.9798 12.0452C3.9798 13.8476 4.57565 15.5108 5.58124 16.849C6.04996 17.4728 6.7655 17.8884 7.54573 17.8884V17.8884C8.28848 17.8884 8.9927 17.7236 9.62376 17.4286C7.83439 15.9596 6.69304 13.7314 6.69304 11.2368V8.46821L3.9798 9.16546V12.0452Z" fill="url(#paint2_linear_8609_1645)"/>
11
- <path fill-rule="evenodd" clip-rule="evenodd" d="M8.05175 11.2368C8.05175 13.4605 9.14375 15.4293 10.8211 16.6371C11.8241 15.7389 12.4551 14.4345 12.4551 12.9828V9.39673C12.4551 8.85661 12.8197 8.38448 13.3426 8.24757L19.8924 6.53265C20.6459 6.33534 21.3826 6.90341 21.3826 7.6818L21.3826 12.0452C21.3826 17.2179 17.1861 21.4111 12.0095 21.4111L11.9942 21.4111C6.81758 21.4111 2.62109 17.2179 2.62109 12.0452V9.03388C2.62109 8.49175 2.9884 8.01839 3.51385 7.88336L6.56677 7.09882C7.31904 6.9055 8.05175 7.47318 8.05175 8.24934V11.2368ZM3.9798 12.0452C3.9798 13.8476 4.57565 15.5108 5.58124 16.849C6.04996 17.4728 6.7655 17.8884 7.54573 17.8884V17.8884C8.28848 17.8884 8.9927 17.7236 9.62376 17.4286C7.83439 15.9596 6.69304 13.7314 6.69304 11.2368V8.46821L3.9798 9.16546V12.0452Z" fill="#010101"/>
12
- <defs>
13
- <linearGradient id="paint0_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
14
- <stop stop-color="white"/>
15
- <stop offset="1" stop-color="#2E2E2E"/>
16
- </linearGradient>
17
- <linearGradient id="paint1_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
18
- <stop stop-color="white"/>
19
- <stop offset="1" stop-color="#2E2E2E"/>
20
- </linearGradient>
21
- <linearGradient id="paint2_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
22
- <stop stop-color="white"/>
23
- <stop offset="1" stop-color="#2E2E2E"/>
24
- </linearGradient>
25
- </defs>
26
- </svg>
5
+ export const icon = `<svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
6
+ <path d="M19.7238 3.86898C19.7238 4.57597 19.1502 5.1491 18.4427 5.1491C17.7352 5.1491 17.1616 4.57597 17.1616 3.86898C17.1616 3.16199 17.7352 2.58887 18.4427 2.58887C19.1502 2.58887 19.7238 3.16199 19.7238 3.86898Z" fill="url(#paint0_linear_8609_1645)"/>
+ <path d="M19.7238 3.86898C19.7238 4.57597 19.1502 5.1491 18.4427 5.1491C17.7352 5.1491 17.1616 4.57597 17.1616 3.86898C17.1616 3.16199 17.7352 2.58887 18.4427 2.58887C19.1502 2.58887 19.7238 3.16199 19.7238 3.86898Z" fill="#010101"/>
+ <path d="M15.3681 5.1491C15.3681 5.85609 14.7945 6.42921 14.087 6.42921C13.3794 6.42921 12.8059 5.85609 12.8059 5.1491C12.8059 4.44211 13.3794 3.86898 14.087 3.86898C14.7945 3.86898 15.3681 4.44211 15.3681 5.1491Z" fill="url(#paint1_linear_8609_1645)"/>
+ <path d="M15.3681 5.1491C15.3681 5.85609 14.7945 6.42921 14.087 6.42921C13.3794 6.42921 12.8059 5.85609 12.8059 5.1491C12.8059 4.44211 13.3794 3.86898 14.087 3.86898C14.7945 3.86898 15.3681 4.44211 15.3681 5.1491Z" fill="#010101"/>
+ <path fill-rule="evenodd" clip-rule="evenodd" d="M8.05175 11.2368C8.05175 13.4605 9.14375 15.4293 10.8211 16.6371C11.8241 15.7389 12.4551 14.4345 12.4551 12.9828V9.39673C12.4551 8.85661 12.8197 8.38448 13.3426 8.24757L19.8924 6.53265C20.6459 6.33534 21.3826 6.90341 21.3826 7.6818L21.3826 12.0452C21.3826 17.2179 17.1861 21.4111 12.0095 21.4111L11.9942 21.4111C6.81758 21.4111 2.62109 17.2179 2.62109 12.0452V9.03388C2.62109 8.49175 2.9884 8.01839 3.51385 7.88336L6.56677 7.09882C7.31904 6.9055 8.05175 7.47318 8.05175 8.24934V11.2368ZM3.9798 12.0452C3.9798 13.8476 4.57565 15.5108 5.58124 16.849C6.04996 17.4728 6.7655 17.8884 7.54573 17.8884V17.8884C8.28848 17.8884 8.9927 17.7236 9.62376 17.4286C7.83439 15.9596 6.69304 13.7314 6.69304 11.2368V8.46821L3.9798 9.16546V12.0452Z" fill="url(#paint2_linear_8609_1645)"/>
+ <path fill-rule="evenodd" clip-rule="evenodd" d="M8.05175 11.2368C8.05175 13.4605 9.14375 15.4293 10.8211 16.6371C11.8241 15.7389 12.4551 14.4345 12.4551 12.9828V9.39673C12.4551 8.85661 12.8197 8.38448 13.3426 8.24757L19.8924 6.53265C20.6459 6.33534 21.3826 6.90341 21.3826 7.6818L21.3826 12.0452C21.3826 17.2179 17.1861 21.4111 12.0095 21.4111L11.9942 21.4111C6.81758 21.4111 2.62109 17.2179 2.62109 12.0452V9.03388C2.62109 8.49175 2.9884 8.01839 3.51385 7.88336L6.56677 7.09882C7.31904 6.9055 8.05175 7.47318 8.05175 8.24934V11.2368ZM3.9798 12.0452C3.9798 13.8476 4.57565 15.5108 5.58124 16.849C6.04996 17.4728 6.7655 17.8884 7.54573 17.8884V17.8884C8.28848 17.8884 8.9927 17.7236 9.62376 17.4286C7.83439 15.9596 6.69304 13.7314 6.69304 11.2368V8.46821L3.9798 9.16546V12.0452Z" fill="#010101"/>
+ <defs>
+ <linearGradient id="paint0_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
+ <stop stop-color="white"/>
+ <stop offset="1" stop-color="#2E2E2E"/>
+ </linearGradient>
+ <linearGradient id="paint1_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
+ <stop stop-color="white"/>
+ <stop offset="1" stop-color="#2E2E2E"/>
+ </linearGradient>
+ <linearGradient id="paint2_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
+ <stop stop-color="white"/>
+ <stop offset="1" stop-color="#2E2E2E"/>
+ </linearGradient>
+ </defs>
+ </svg>
  `;
package/package.json CHANGED
@@ -1,52 +1,49 @@
- {
-   "name": "@chenchaolong/plugin-mineru",
-   "version": "0.0.12",
-   "repository": {
-     "type": "git",
-     "url": "https://github.com/xpert-ai/xpert-plugins.git"
-   },
-   "bugs": {
-     "url": "https://github.com/xpert-ai/xpert-plugins/issues"
-   },
-   "type": "module",
-   "main": "./dist/index.js",
-   "module": "./dist/index.js",
-   "types": "./dist/index.d.ts",
-   "exports": {
-     "./package.json": "./package.json",
-     ".": {
-       "@xpert-plugins-starter/source": "./src/index.ts",
-       "types": "./dist/index.d.ts",
-       "import": "./dist/index.js",
-       "default": "./dist/index.js"
-     }
-   },
-   "files": [
-     "dist",
-     "!**/*.tsbuildinfo"
-   ],
-   "dependencies": {
-     "form-data": "^4.0.0",
-     "tslib": "^2.3.0",
-     "unzipper": "0.12.3"
-   },
-   "peerDependencies": {
-     "@nestjs/config": "^4.0.2",
-     "zod": "3.25.67",
-     "@xpert-ai/plugin-sdk": "^3.6.2",
-     "@metad/contracts": "^3.6.2",
-     "@nestjs/common": "^11.1.6",
-     "axios": "1.12.2",
-     "nestjs-i18n": "10.5.1",
-     "chalk": "4.1.2",
-     "@langchain/core": "^0.3.72",
-     "lodash-es": "4.17.21",
-     "uuid": "8.3.2"
-   },
-   "devDependencies": {
-     "@types/unzipper": "^0.10.11"
-   },
-   "publishConfig": {
-     "access": "public"
-   }
- }
+ {
+   "name": "@chenchaolong/plugin-mineru",
+   "version": "1.1.0",
+   "repository": {
+     "type": "git",
+     "url": "https://github.com/xpert-ai/xpert-plugins.git"
+   },
+   "bugs": {
+     "url": "https://github.com/xpert-ai/xpert-plugins/issues"
+   },
+   "type": "module",
+   "main": "./dist/index.js",
+   "module": "./dist/index.js",
+   "types": "./dist/index.d.ts",
+   "exports": {
+     "./package.json": "./package.json",
+     ".": {
+       "@xpert-plugins-starter/source": "./src/index.ts",
+       "types": "./dist/index.d.ts",
+       "import": "./dist/index.js",
+       "default": "./dist/index.js"
+     }
+   },
+   "files": [
+     "dist",
+     "!**/*.tsbuildinfo"
+   ],
+   "dependencies": {
+     "form-data": "^4.0.0",
+     "tslib": "^2.3.0",
+     "unzipper": "0.12.3"
+   },
+   "peerDependencies": {
+     "@nestjs/config": "^4.0.2",
+     "zod": "3.25.67",
+     "@xpert-ai/plugin-sdk": "^3.6.2",
+     "@metad/contracts": "^3.6.2",
+     "@nestjs/common": "^11.1.6",
+     "axios": "1.12.2",
+     "nestjs-i18n": "10.5.1",
+     "chalk": "4.1.2",
+     "@langchain/core": "^0.3.72",
+     "lodash-es": "4.17.21",
+     "uuid": "8.3.2"
+   },
+   "devDependencies": {
+     "@types/unzipper": "^0.10.11"
+   }
+ }
@@ -1,10 +0,0 @@
- import { StructuredToolInterface, ToolSchemaBase } from '@langchain/core/tools';
- import { BuiltinToolset } from '@xpert-ai/plugin-sdk';
- import { ConfigService } from '@nestjs/config';
- import { MinerUResultParserService } from './result-parser.service.js';
- export declare function setMinerUToolsetServices(configService: ConfigService, resultParser: MinerUResultParserService): void;
- export declare class MinerUToolset extends BuiltinToolset<StructuredToolInterface, Record<string, never>> {
-     _validateCredentials(credentials: Record<string, never>): Promise<void>;
-     initTools(): Promise<StructuredToolInterface<ToolSchemaBase, any, any>[]>;
- }
- //# sourceMappingURL=mineru-toolset.d.ts.map
@@ -1 +0,0 @@
- {"version":3,"file":"mineru-toolset.d.ts","sourceRoot":"","sources":["../../src/lib/mineru-toolset.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,uBAAuB,EAAE,cAAc,EAAE,MAAM,uBAAuB,CAAC;AAChF,OAAO,EAAE,cAAc,EAAE,MAAM,sBAAsB,CAAC;AACtD,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAC/C,OAAO,EAAE,yBAAyB,EAAE,MAAM,4BAA4B,CAAC;AAOvE,wBAAgB,wBAAwB,CACtC,aAAa,EAAE,aAAa,EAC5B,YAAY,EAAE,yBAAyB,QAIxC;AAED,qBAAa,aAAc,SAAQ,cAAc,CAAC,uBAAuB,EAAE,MAAM,CAAC,MAAM,EAAE,KAAK,CAAC,CAAC;IAChF,oBAAoB,CAAC,WAAW,EAAE,MAAM,CAAC,MAAM,EAAE,KAAK,CAAC,GAAG,OAAO,CAAC,IAAI,CAAC;IAIvE,SAAS,IAAI,OAAO,CAAC,uBAAuB,CAAC,cAAc,EAAE,GAAG,EAAE,GAAG,CAAC,EAAE,CAAC;CASzF"}
@@ -1,23 +0,0 @@
- import { BuiltinToolset } from '@xpert-ai/plugin-sdk';
- import { buildPdfToMarkdownTool } from './pdf-to-markdown.tool.js';
- // Store services globally for tool access
- let globalConfigService;
- let globalResultParser;
- export function setMinerUToolsetServices(configService, resultParser) {
-     globalConfigService = configService;
-     globalResultParser = resultParser;
- }
- export class MinerUToolset extends BuiltinToolset {
-     async _validateCredentials(credentials) {
-         // No credentials needed for mineru toolset (uses integration permissions)
-     }
-     async initTools() {
-         if (!globalConfigService || !globalResultParser) {
-             throw new Error('MinerU services not initialized. Call setMinerUToolsetServices first.');
-         }
-         this.tools = [
-             buildPdfToMarkdownTool(globalConfigService, globalResultParser),
-         ];
-         return this.tools;
-     }
- }
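The removed `mineru-toolset.js` above relies on module-level variables that must be populated before `initTools()` runs. A minimal sketch of that registration-then-guard pattern follows; `ConfigLike`, `ParserLike`, `registerServices`, and `initToolNames` are hypothetical stand-ins for illustration, not names from the package.

```typescript
// Hypothetical stand-ins for the services the real plugin receives from NestJS.
interface ConfigLike {
  get(key: string): string | undefined;
}
interface ParserLike {
  parse(raw: string): string;
}

// Module-level slots, analogous to globalConfigService / globalResultParser.
let registeredConfig: ConfigLike | undefined;
let registeredParser: ParserLike | undefined;

// Analogous to setMinerUToolsetServices(): called once by whoever owns the services.
function registerServices(config: ConfigLike, parser: ParserLike): void {
  registeredConfig = config;
  registeredParser = parser;
}

// Analogous to the guard in initTools(): fail loudly if nothing was registered yet.
function initToolNames(): string[] {
  if (!registeredConfig || !registeredParser) {
    throw new Error('Services not initialized. Call registerServices first.');
  }
  return ['pdf_to_markdown'];
}
```

The trade-off of this pattern is hidden ordering: the tool list can only be built after some other component has called the setter, which is why the guard throws instead of returning an empty list.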
@@ -1,34 +0,0 @@
- import { ConfigService } from '@nestjs/config';
- import { BuiltinToolset, IToolsetStrategy } from '@xpert-ai/plugin-sdk';
- import { MinerUResultParserService } from './result-parser.service.js';
- export declare class MinerUToolsetStrategy implements IToolsetStrategy<any> {
-     private readonly configService;
-     private readonly resultParser;
-     constructor(configService: ConfigService, resultParser: MinerUResultParserService);
-     meta: {
-         author: string;
-         tags: string[];
-         name: string;
-         label: {
-             en_US: string;
-             zh_Hans: string;
-         };
-         description: {
-             en_US: string;
-             zh_Hans: string;
-         };
-         icon: {
-             svg: string;
-             color: string;
-         };
-         configSchema: {
-             type: string;
-             properties: {};
-             required: any[];
-         };
-     };
-     validateConfig(config: any): Promise<void>;
-     create(config: any): Promise<BuiltinToolset>;
-     createTools(): any[];
- }
- //# sourceMappingURL=mineru-toolset.strategy.d.ts.map
@@ -1 +0,0 @@
- {"version":3,"file":"mineru-toolset.strategy.d.ts","sourceRoot":"","sources":["../../src/lib/mineru-toolset.strategy.ts"],"names":[],"mappings":"AACA,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAC/C,OAAO,EAAE,cAAc,EAAE,gBAAgB,EAAmB,MAAM,sBAAsB,CAAC;AAGzF,OAAO,EAAE,yBAAyB,EAAE,MAAM,4BAA4B,CAAC;AAEvE,qBAEa,qBAAsB,YAAW,gBAAgB,CAAC,GAAG,CAAC;IAG/D,OAAO,CAAC,QAAQ,CAAC,aAAa;IAE9B,OAAO,CAAC,QAAQ,CAAC,YAAY;gBAFZ,aAAa,EAAE,aAAa,EAE5B,YAAY,EAAE,yBAAyB;IAM1D,IAAI;;;;;;;;;;;;;;;;;;;;;MAqBF;IAEF,cAAc,CAAC,MAAM,EAAE,GAAG,GAAG,OAAO,CAAC,IAAI,CAAC;IAKpC,MAAM,CAAC,MAAM,EAAE,GAAG,GAAG,OAAO,CAAC,cAAc,CAAC;IAIlD,WAAW;CAKZ"}
@@ -1,58 +0,0 @@
- import { __decorate, __metadata, __param } from "tslib";
- import { Injectable, forwardRef, Inject } from '@nestjs/common';
- import { ConfigService } from '@nestjs/config';
- import { ToolsetStrategy } from '@xpert-ai/plugin-sdk';
- import { MinerU, icon } from './types.js';
- import { MinerUToolset, setMinerUToolsetServices } from './mineru-toolset.js';
- import { MinerUResultParserService } from './result-parser.service.js';
- let MinerUToolsetStrategy = class MinerUToolsetStrategy {
-     constructor(configService, resultParser) {
-         this.configService = configService;
-         this.resultParser = resultParser;
-         this.meta = {
-             author: 'Xpert AI',
-             tags: ['mineru', 'pdf', 'markdown', 'conversion', 'tool'],
-             name: MinerU,
-             label: {
-                 en_US: 'MinerU',
-                 zh_Hans: 'MinerU',
-             },
-             description: {
-                 en_US: 'Convert PDF files to Markdown and JSON format using MinerU. Supports OCR, formula recognition, and table extraction.',
-                 zh_Hans: '使用MinerU将PDF文件转换为Markdown和JSON格式。支持OCR、公式识别和表格提取。',
-             },
-             icon: {
-                 svg: icon,
-                 color: '#14b8a6',
-             },
-             configSchema: {
-                 type: 'object',
-                 properties: {},
-                 required: [],
-             },
-         };
-         // Initialize global services for tool access
-         setMinerUToolsetServices(this.configService, this.resultParser);
-     }
-     validateConfig(config) {
-         // No validation needed - uses integration permissions
-         return Promise.resolve();
-     }
-     async create(config) {
-         return new MinerUToolset(config || {});
-     }
-     createTools() {
-         // Tools are created dynamically in MinerUToolset.initTools()
-         // This method is not used when using BuiltinToolset
-         return [];
-     }
- };
- MinerUToolsetStrategy = __decorate([
-     Injectable(),
-     ToolsetStrategy(MinerU),
-     __param(0, Inject(forwardRef(() => ConfigService))),
-     __param(1, Inject(MinerUResultParserService)),
-     __metadata("design:paramtypes", [ConfigService,
-         MinerUResultParserService])
- ], MinerUToolsetStrategy);
- export { MinerUToolsetStrategy };
@@ -1,90 +0,0 @@
- import { z } from 'zod';
- import { ConfigService } from '@nestjs/config';
- import { MinerUResultParserService } from './result-parser.service.js';
- export declare function buildPdfToMarkdownTool(configService: ConfigService, resultParser: MinerUResultParserService): import("@langchain/core/tools").DynamicStructuredTool<z.ZodObject<{
-     file: z.ZodObject<{
-         name: z.ZodOptional<z.ZodString>;
-         filename: z.ZodOptional<z.ZodString>;
-         content: z.ZodOptional<z.ZodUnion<[z.ZodString, z.ZodType<Buffer<ArrayBufferLike>, z.ZodTypeDef, Buffer<ArrayBufferLike>>, z.ZodType<Uint8Array<ArrayBuffer>, z.ZodTypeDef, Uint8Array<ArrayBuffer>>]>>;
-         filePath: z.ZodOptional<z.ZodString>;
-         fileUrl: z.ZodOptional<z.ZodString>;
-     }, "strip", z.ZodTypeAny, {
-         name?: string;
-         filePath?: string;
-         fileUrl?: string;
-         filename?: string;
-         content?: string | Uint8Array<ArrayBuffer> | Buffer<ArrayBufferLike>;
-     }, {
-         name?: string;
-         filePath?: string;
-         fileUrl?: string;
-         filename?: string;
-         content?: string | Uint8Array<ArrayBuffer> | Buffer<ArrayBufferLike>;
-     }>;
-     isOcr: z.ZodOptional<z.ZodBoolean>;
-     enableFormula: z.ZodOptional<z.ZodBoolean>;
-     enableTable: z.ZodOptional<z.ZodBoolean>;
-     language: z.ZodOptional<z.ZodEnum<["en", "ch"]>>;
-     modelVersion: z.ZodOptional<z.ZodEnum<["pipeline", "vlm"]>>;
- }, "strip", z.ZodTypeAny, {
-     isOcr?: boolean;
-     enableFormula?: boolean;
-     enableTable?: boolean;
-     language?: "ch" | "en";
-     modelVersion?: "pipeline" | "vlm";
-     file?: {
-         name?: string;
-         filePath?: string;
-         fileUrl?: string;
-         filename?: string;
-         content?: string | Uint8Array<ArrayBuffer> | Buffer<ArrayBufferLike>;
-     };
- }, {
-     isOcr?: boolean;
-     enableFormula?: boolean;
-     enableTable?: boolean;
-     language?: "ch" | "en";
-     modelVersion?: "pipeline" | "vlm";
-     file?: {
-         name?: string;
-         filePath?: string;
-         fileUrl?: string;
-         filename?: string;
-         content?: string | Uint8Array<ArrayBuffer> | Buffer<ArrayBufferLike>;
-     };
- }>, {
-     isOcr?: boolean;
-     enableFormula?: boolean;
-     enableTable?: boolean;
-     language?: "ch" | "en";
-     modelVersion?: "pipeline" | "vlm";
-     file?: {
-         name?: string;
-         filePath?: string;
-         fileUrl?: string;
-         filename?: string;
-         content?: string | Uint8Array<ArrayBuffer> | Buffer<ArrayBufferLike>;
-     };
- }, {
-     isOcr?: boolean;
-     enableFormula?: boolean;
-     enableTable?: boolean;
-     language?: "ch" | "en";
-     modelVersion?: "pipeline" | "vlm";
-     file?: {
-         name?: string;
-         filePath?: string;
-         fileUrl?: string;
-         filename?: string;
-         content?: string | Uint8Array<ArrayBuffer> | Buffer<ArrayBufferLike>;
-     };
- }, (string | {
-     files: {
-         mimeType: string;
-         fileName: string;
-         filePath: string;
-         fileUrl: string;
-         extension: string;
-     }[];
- })[]>;
- //# sourceMappingURL=pdf-to-markdown.tool.d.ts.map
@@ -1 +0,0 @@
- {"version":3,"file":"pdf-to-markdown.tool.d.ts","sourceRoot":"","sources":["../../src/lib/pdf-to-markdown.tool.ts"],"names":[],"mappings":"AAGA,OAAO,EAAE,CAAC,EAAE,MAAM,KAAK,CAAC;AACxB,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAE/C,OAAO,EAAE,yBAAyB,EAAE,MAAM,4BAA4B,CAAC;AAIvE,wBAAgB,sBAAsB,CACpC,aAAa,EAAE,aAAa,EAC5B,YAAY,EAAE,yBAAyB;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;MAqKxC"}
@@ -1,146 +0,0 @@
- import { tool } from '@langchain/core/tools';
- import { getCurrentTaskInput } from '@langchain/langgraph';
- import { getErrorMessage } from '@xpert-ai/plugin-sdk';
- import { z } from 'zod';
- import { MinerUClient } from './mineru.client.js';
- export function buildPdfToMarkdownTool(configService, resultParser) {
-     return tool(async (input) => {
-         try {
-             const { file, isOcr, enableFormula, enableTable, language, modelVersion } = input;
-             if (!file) {
-                 throw new Error('No file provided');
-             }
-             const currentState = getCurrentTaskInput();
-             const workspacePath = currentState?.[`sys`]?.['volume'] ?? '/tmp/xpert';
-             const baseUrl = currentState?.[`sys`]?.['workspace_url'] ?? 'http://localhost:3000';
-             // Get permissions from current state
-             const permissions = currentState?.[`sys`]?.['permissions'];
-             if (!permissions?.fileSystem) {
-                 throw new Error('File system permission is required for MinerU tool');
-             }
-             // Get file content
-             let fileContent;
-             let fileName;
-             let filePath;
-             let fileUrl;
-             if (file.content) {
-                 if (typeof file.content === 'string') {
-                     // Base64 string
-                     fileContent = Buffer.from(file.content, 'base64');
-                 }
-                 else if (Buffer.isBuffer(file.content)) {
-                     fileContent = file.content;
-                 }
-                 else if (file.content instanceof Uint8Array) {
-                     fileContent = Buffer.from(file.content);
-                 }
-                 else {
-                     throw new Error('Invalid file content format');
-                 }
-                 fileName = file.name || file.filename || 'document.pdf';
-             }
-             else if (file.filePath) {
-                 filePath = file.filePath;
-                 fileContent = await permissions.fileSystem.readFile(filePath);
-                 fileName = file.name || file.filename || filePath.split('/').pop() || 'document.pdf';
-             }
-             else if (file.fileUrl) {
-                 fileUrl = file.fileUrl;
-                 const response = await fetch(fileUrl);
-                 if (!response.ok) {
-                     throw new Error(`Failed to download file from URL: ${response.statusText}`);
-                 }
-                 const arrayBuffer = await response.arrayBuffer();
-                 fileContent = Buffer.from(arrayBuffer);
-                 fileName = file.name || file.filename || fileUrl.split('/').pop() || 'document.pdf';
-             }
-             else {
-                 throw new Error('File must provide content, filePath, or fileUrl');
-             }
-             // Save file to workspace if not already there
-             if (!filePath) {
-                 const relativePath = `mineru-input/${fileName}`;
-                 filePath = relativePath;
-                 fileUrl = await permissions.fileSystem.writeFile(relativePath, fileContent);
-             }
-             // Create MinerU client
-             const mineruClient = new MinerUClient(configService, {
-                 fileSystem: permissions.fileSystem,
-                 integration: permissions.integration,
-             });
-             // Create task
-             const { taskId } = await mineruClient.createTask({
-                 url: fileUrl || file.fileUrl,
-                 filePath: filePath,
-                 fileName: fileName,
-                 isOcr: isOcr ?? true,
-                 enableFormula: enableFormula ?? true,
-                 enableTable: enableTable ?? true,
-                 language: language || 'ch',
-                 modelVersion: modelVersion || 'pipeline',
-             });
-             // Get result
-             let result;
-             if (mineruClient.serverType === 'self-hosted') {
-                 result = mineruClient.getSelfHostedTask(taskId);
-                 if (!result) {
-                     throw new Error('Failed to get MinerU task result');
-                 }
-             }
-             else {
-                 result = await mineruClient.waitForTask(taskId, 5 * 60 * 1000, 5000);
-             }
-             // Parse result
-             const parsedResult = mineruClient.serverType === 'self-hosted'
-                 ? await resultParser.parseLocalTask(result, taskId, { folder: 'mineru-output', name: fileName }, permissions.fileSystem)
-                 : await resultParser.parseFromUrl(result.full_zip_url, taskId, { folder: 'mineru-output', name: fileName }, permissions.fileSystem);
-             // Get markdown content
-             const markdownContent = parsedResult.chunks[0]?.pageContent || '';
-             const outputFileName = fileName.replace(/\.pdf$/i, '.md');
-             const outputPath = `mineru-output/${outputFileName}`;
-             const outputUrl = await permissions.fileSystem.writeFile(outputPath, Buffer.from(markdownContent, 'utf-8'));
-             return [
-                 `Successfully converted PDF to Markdown: ${outputFileName}`,
-                 {
-                     files: [
-                         {
-                             mimeType: 'text/markdown',
-                             fileName: outputPath,
-                             filePath: permissions.fileSystem.fullPath(outputPath),
-                             fileUrl: outputUrl,
-                             extension: 'md',
-                         },
-                         ...(parsedResult.metadata.assets || []).map((asset) => ({
-                             mimeType: asset.type === 'image' ? 'image/png' : 'application/json',
-                             fileName: asset.filePath,
-                             filePath: permissions.fileSystem.fullPath(asset.filePath),
-                             fileUrl: asset.url,
-                             extension: asset.type === 'image' ? 'png' : 'json',
-                         })),
-                     ],
-                 },
-             ];
-         }
-         catch (error) {
-             throw new Error(`Error converting PDF to Markdown: ${getErrorMessage(error)}`);
-         }
-     }, {
-         name: 'pdf_to_markdown',
-         description: `Convert PDF file to Markdown format using MinerU. Supports OCR, formula recognition, and table extraction.`,
-         schema: z.object({
-             file: z.object({
-                 name: z.string().optional(),
-                 filename: z.string().optional(),
-                 content: z.union([z.string(), z.instanceof(Buffer), z.instanceof(Uint8Array)]).optional(),
-                 filePath: z.string().optional(),
-                 fileUrl: z.string().optional(),
-             }),
-             isOcr: z.boolean().optional().describe('Enable OCR for image-based PDFs'),
-             enableFormula: z.boolean().optional().describe('Enable recognition of mathematical formulas'),
-             enableTable: z.boolean().optional().describe('Enable recognition of tables'),
-             language: z.enum(['en', 'ch']).optional().describe('Document language (en for English, ch for Chinese)'),
-             modelVersion: z.enum(['pipeline', 'vlm']).optional().describe('MinerU model version'),
-         }),
-         responseFormat: 'content_and_artifact',
-     });
- }
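The removed `pdf-to-markdown.tool.js` above picks its input source in a fixed order (`content`, then `filePath`, then `fileUrl`) and applies defaults with `??` for booleans and `||` for strings. The sketch below isolates that selection and defaulting logic as pure functions so it can be read without the file-system and network calls; `resolveSource`, `toMarkdownName`, and `withDefaults` are hypothetical helper names introduced only for illustration and do not exist in the package.

```typescript
// Shape of the `file` argument, reduced to what the selection logic needs.
interface FileInput {
  name?: string;
  filename?: string;
  content?: string | Uint8Array;
  filePath?: string;
  fileUrl?: string;
}

type Source = { kind: 'content' | 'filePath' | 'fileUrl'; fileName: string };

// Hypothetical helper: same precedence and file-name fallbacks as the diffed code.
function resolveSource(file: FileInput): Source {
  const explicit = file.name || file.filename;
  if (file.content) {
    return { kind: 'content', fileName: explicit || 'document.pdf' };
  }
  if (file.filePath) {
    return { kind: 'filePath', fileName: explicit || file.filePath.split('/').pop() || 'document.pdf' };
  }
  if (file.fileUrl) {
    return { kind: 'fileUrl', fileName: explicit || file.fileUrl.split('/').pop() || 'document.pdf' };
  }
  throw new Error('File must provide content, filePath, or fileUrl');
}

// Hypothetical helper: the output name swaps a trailing .pdf (any case) for .md.
function toMarkdownName(fileName: string): string {
  return fileName.replace(/\.pdf$/i, '.md');
}

interface ConvertOptions {
  isOcr?: boolean;
  language?: 'en' | 'ch';
}

// `??` keeps an explicit `false`; `||` only needs to catch a missing string.
function withDefaults(options: ConvertOptions): { isOcr: boolean; language: 'en' | 'ch' } {
  return { isOcr: options.isOcr ?? true, language: options.language || 'ch' };
}
```

One consequence of this ordering is that a caller who passes both `content` and `filePath` gets the `content` branch, where the file name falls back to `document.pdf` rather than the path's basename unless `name` or `filename` is set.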