@chenchaolong/plugin-mineru 0.0.13 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,113 +1,101 @@
- # Xpert Plugin: MinerU
-
- `@chenchaolong/plugin-mineru` is a MinerU document converter plugin for the [Xpert AI](https://github.com/xpert-ai/xpert) platform, providing extraction capabilities from PDF to Markdown and structured JSON. The plugin includes built-in MinerU integration strategies, document conversion strategies, and result parsing services, enabling secure access to the MinerU API in automated workflows, polling task status, and writing parsed content and attachment resources to the platform file system.
-
- ## Installation
-
- ```bash
- pnpm add @chenchaolong/plugin-mineru
- # or
- npm install @chenchaolong/plugin-mineru
- ```
-
- > **Note**: This plugin depends on `@xpert-ai/plugin-sdk`, `@nestjs/common@^11`, `@nestjs/config@^4`, `@metad/contracts`, `axios@1`, `chalk@4`, `@langchain/core@^0.3.72`, and `uuid@8` as peerDependencies. Please ensure these packages are installed in your host project.
-
- ## Quick Start
-
- 1. **Prepare MinerU Credentials**
- Obtain a valid API Key from the MinerU dashboard and confirm the service address (default: `https://mineru.net/api/v4`).
-
- 2. **Configure Integration in Xpert**
- - Via Xpert Console: Create a MinerU integration and fill in the following fields.
- - Or set environment variables in your deployment environment:
- - `MINERU_API_BASE_URL`: Optional, defaults to `https://mineru.net/api/v4`.
- - `MINERU_API_TOKEN`: Required, used as a fallback credential if no integration is configured.
-
- Example integration configuration (JSON):
-
- ```json
- {
- "provider": "mineru",
- "options": {
- "apiUrl": "https://mineru.net/api/v4",
- "apiKey": "your-mineru-api-key"
- }
- }
- ```
-
- 3. **Register the Plugin**
- Configure the plugin in your host service's plugin registration process:
-
- ```sh .env
- PLUGINS=@chenchaolong/plugin-mineru
- ```
-
- The plugin returns the NestJS module `MinerUPlugin` in the `register` hook and logs messages during the `onStart`/`onStop` lifecycle.
-
- ## MinerU Integration Options
-
- | Field | Type | Description | Required | Default |
- | -------- | ------ | ------------------------------------- | -------- | ---------------------------- |
- | apiUrl | string | MinerU API base URL | No | `https://mineru.net/api/v4` |
- | apiKey | string | MinerU service API Key (keep secret) | Yes | — |
-
- > If both integration configuration and environment variables are provided, options from the integration configuration take precedence.
-
- ## Document Conversion Parameters
-
- `MinerUTransformerStrategy` supports the following configuration options (passed to the MinerU API when starting a workflow):
-
- | Field | Type | Default | Description |
- | ---------------- | ------- | ------------ | --------------------------------------------------- |
- | `isOcr` | boolean | `true` | Enable OCR for image-based PDFs. |
- | `enableFormula` | boolean | `true` | Recognize mathematical formulas and output tags. |
- | `enableTable` | boolean | `true` | Recognize tables and output structured tags. |
- | `language` | string | `"ch"` | Main document language, per MinerU API (`en`/`ch`). |
- | `modelVersion` | string | `"pipeline"` | MinerU model version (`pipeline`, `vlm`, etc.). |
-
- By default, the plugin creates MinerU tasks for each file to be processed, polls until `full_zip_url` is returned, then downloads and parses the zip package in memory.
-
- ## Permissions
-
- - **Integration**: Access MinerU integration configuration to read API address and credentials.
- - **File System**: Perform `read/write/list` on `XpFileSystem` to store image resources from MinerU results.
-
- Ensure the plugin is granted these permissions in your authorization policy, or it will not be able to retrieve results or write attachments.
-
- ## Output Content
-
- The parser generates:
-
- - Full Markdown: Resource links are automatically replaced to point to actual URLs written via `XpFileSystem`.
- - Structured metadata: Includes MinerU task ID, layout JSON (`layout.json`), content list (`content_list.json`), original PDF filename, etc.
- - Attachment asset list: Records written image resources for easy association by callers.
-
- The returned `Document<ChunkMetadata>` array currently defaults to a single chunk containing the full Markdown; you can split it as needed.
-
- ## Local Deployment
-
- For self-hosted MinerU deployments, see [LOCAL_SETUP.md](./LOCAL_SETUP.md) for detailed instructions on:
- - Starting MinerU server using Docker
- - Installing from source code
- - Configuration and troubleshooting
-
- Quick start with Docker:
- ```bash
- docker run -d --name mineru -p 9960:9960 opendatalab/mineru:latest
- ```
-
- ## Development & Debugging
-
- Run the following commands in the repository root to build and test locally:
-
- ```bash
- npm install
- npx nx build @chenchaolong/plugin-mineru
- npx nx test @chenchaolong/plugin-mineru
- ```
-
- TypeScript build artifacts are output to `packages/mineru/dist`. Before publishing, ensure `package.json`, type declarations, and runtime files are in sync.
-
- ## License
-
- This project follows the [AGPL-3.0 License](../../../LICENSE) in the repository root.
+ # Xpert Plugin: MinerU
+
+ `@chenchaolong/plugin-mineru` is a MinerU document converter plugin for the [Xpert AI](https://github.com/xpert-ai/xpert) platform, providing extraction capabilities from PDF to Markdown and structured JSON. The plugin includes built-in MinerU integration strategies, document conversion strategies, and result parsing services, enabling secure access to the MinerU API in automated workflows, polling task status, and writing parsed content and attachment resources to the platform file system.
+
+ ## Installation
+
+ ```bash
+ pnpm add @chenchaolong/plugin-mineru
+ # or
+ npm install @chenchaolong/plugin-mineru
+ ```
+
+ > **Note**: This plugin depends on `@xpert-ai/plugin-sdk`, `@nestjs/common@^11`, `@nestjs/config@^4`, `@metad/contracts`, `axios@1`, `chalk@4`, `@langchain/core@^0.3.72`, and `uuid@8` as peerDependencies. Please ensure these packages are installed in your host project.
+
+ ## Quick Start
+
+ 1. **Prepare MinerU Credentials**
+ Obtain a valid API Key from the MinerU dashboard and confirm the service address (default: `https://mineru.net/api/v4`).
+
+ 2. **Configure Integration in Xpert**
+ - Via Xpert Console: Create a MinerU integration and fill in the following fields.
+ - Or set environment variables in your deployment environment:
+ - `MINERU_API_BASE_URL`: Optional, defaults to `https://mineru.net/api/v4`.
+ - `MINERU_API_TOKEN`: Required, used as a fallback credential if no integration is configured.
+
+ Example integration configuration (JSON):
+
+ ```json
+ {
+ "provider": "mineru",
+ "options": {
+ "apiUrl": "https://mineru.net/api/v4",
+ "apiKey": "your-mineru-api-key"
+ }
+ }
+ ```
+
+ 3. **Register the Plugin**
+ Configure the plugin in your host service's plugin registration process:
+
+ ```sh .env
+ PLUGINS=@chenchaolong/plugin-mineru
+ ```
+
+ The plugin returns the NestJS module `MinerUPlugin` in the `register` hook and logs messages during the `onStart`/`onStop` lifecycle.
+
+ ## MinerU Integration Options
+
+ | Field | Type | Description | Required | Default |
+ | -------- | ------ | ------------------------------------- | -------- | ---------------------------- |
+ | apiUrl | string | MinerU API base URL | No | `https://mineru.net/api/v4` |
+ | apiKey | string | MinerU service API Key (keep secret) | Yes | — |
+
+ > If both integration configuration and environment variables are provided, options from the integration configuration take precedence.
+
+ ## Document Conversion Parameters
+
+ `MinerUTransformerStrategy` supports the following configuration options (passed to the MinerU API when starting a workflow):
+
+ | Field | Type | Default | Description |
+ | ---------------- | ------- | ------------ | --------------------------------------------------- |
+ | `isOcr` | boolean | `true` | Enable OCR for image-based PDFs. |
+ | `enableFormula` | boolean | `true` | Recognize mathematical formulas and output tags. |
+ | `enableTable` | boolean | `true` | Recognize tables and output structured tags. |
+ | `language` | string | `"ch"` | Main document language, per MinerU API (`en`/`ch`). |
+ | `modelVersion` | string | `"pipeline"` | MinerU model version (`pipeline`, `vlm`, etc.). |
+
+ By default, the plugin creates MinerU tasks for each file to be processed, polls until `full_zip_url` is returned, then downloads and parses the zip package in memory.
+
+ ## Permissions
+
+ - **Integration**: Access MinerU integration configuration to read API address and credentials.
+ - **File System**: Perform `read/write/list` on `XpFileSystem` to store image resources from MinerU results.
+
+ Ensure the plugin is granted these permissions in your authorization policy, or it will not be able to retrieve results or write attachments.
+
+ ## Output Content
+
+ The parser generates:
+
+ - Full Markdown: Resource links are automatically replaced to point to actual URLs written via `XpFileSystem`.
+ - Structured metadata: Includes MinerU task ID, layout JSON (`layout.json`), content list (`content_list.json`), original PDF filename, etc.
+ - Attachment asset list: Records written image resources for easy association by callers.
+
+ The returned `Document<ChunkMetadata>` array currently defaults to a single chunk containing the full Markdown; you can split it as needed.
+
+ ## Development & Debugging
+
+ Run the following commands in the repository root to build and test locally:
+
+ ```bash
+ npm install
+ npx nx build @chenchaolong/plugin-mineru
+ npx nx test @chenchaolong/plugin-mineru
+ ```
+
+ TypeScript build artifacts are output to `packages/mineru/dist`. Before publishing, ensure `package.json`, type declarations, and runtime files are in sync.
+
+ ## License
+
+ This project follows the [AGPL-3.0 License](../../../LICENSE) in the repository root.
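The README's precedence rule (integration options first, then the `MINERU_API_BASE_URL` / `MINERU_API_TOKEN` environment variables, then the default base URL) can be sketched as follows. This is a minimal illustration, not the plugin's code. `resolveMinerUConfig` is a hypothetical name, and resolving each field independently is an assumption; the plugin may instead take the integration's options as a whole.

```javascript
// Hypothetical helper illustrating the documented precedence:
// integration options > environment variables > built-in default (base URL only).
const DEFAULT_API_URL = 'https://mineru.net/api/v4';

function resolveMinerUConfig(integrationOptions = {}, env = {}) {
  // Each field is resolved on its own (an assumption, see lead-in).
  const apiUrl =
    integrationOptions.apiUrl || env.MINERU_API_BASE_URL || DEFAULT_API_URL;
  const apiKey = integrationOptions.apiKey || env.MINERU_API_TOKEN;
  if (!apiKey) {
    // Neither source supplied a credential, so there is nothing to authenticate with.
    throw new Error('MinerU apiKey missing: configure an integration or set MINERU_API_TOKEN');
  }
  return { apiUrl, apiKey };
}

// Usage: the integration's key wins; with no integration apiUrl, the env base URL is used.
const cfg = resolveMinerUConfig(
  { apiKey: 'integration-key' },
  { MINERU_API_TOKEN: 'env-token', MINERU_API_BASE_URL: 'http://localhost:9960' }
);
console.log(cfg); // logs { apiUrl: 'http://localhost:9960', apiKey: 'integration-key' }
```

In a real host, `env` would be `process.env`; passing it in explicitly keeps the function easy to test.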
@@ -1 +1 @@
- {"version":3,"file":"mineru.client.d.ts","sourceRoot":"","sources":["../../src/lib/mineru.client.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,YAAY,EAAE,MAAM,kBAAkB,CAAC;AAEhD,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAC/C,OAAO,EAAmB,YAAY,EAAE,MAAM,sBAAsB,CAAC;AACrE,OAAc,EAAE,aAAa,EAAE,MAAM,OAAO,CAAC;AAK7C,OAAO,EAIL,wBAAwB,EAExB,0BAA0B,EAC1B,gBAAgB,EACjB,MAAM,YAAY,CAAC;AAIpB,UAAU,iBAAiB;IACzB,GAAG,CAAC,EAAE,MAAM,CAAC;IACb,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,YAAY,CAAC,EAAE,MAAM,EAAE,CAAC;IACxB,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,mEAAmE;IACnE,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,yEAAyE;IACzE,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,2EAA2E;IAC3E,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,4EAA4E;IAC5E,gBAAgB,CAAC,EAAE,OAAO,CAAC;CAC5B;AAED,UAAU,mBAAmB;IAC3B,GAAG,EAAE,MAAM,CAAC;IACZ,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,UAAU,CAAC,EAAE,MAAM,CAAC;CACrB;AAED,UAAU,sBAAsB;IAC9B,KAAK,EAAE,mBAAmB,EAAE,CAAC;IAC7B,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,YAAY,CAAC,EAAE,MAAM,EAAE,CAAC;IACxB,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,IAAI,CAAC,EAAE,MAAM,CAAC;CACf;AAED,UAAU,iBAAiB;IACzB,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;CACnB;AASD,qBAAa,YAAY;IAWrB,OAAO,CAAC,QAAQ,CAAC,aAAa;IAC9B,OAAO,CAAC,QAAQ,CAAC,WAAW,CAAC;IAX/B,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAiC;IACxD,OAAO,CAAC,QAAQ,CAAC,OAAO,CAAS;IACjC,OAAO,CAAC,QAAQ,CAAC,KAAK,CAAC,CAAS;IAChC,SAAgB,UAAU,EAAE,gBAAgB,CAAC;IAC7C,OAAO,CAAC,QAAQ,CAAC,UAAU,CAAiD;IAE5E,IAAI,UAAU,IAAI,YAAY,GAAG,SAAS,CAEzC;gBAEkB,aAAa,EAAE,aAAa,EAC5B,WAAW,CAAC,EAAE;QACvB,UAAU,CAAC,EAAE,YAAY,CAAC;QAC1B,WAAW,CAAC,EAAE,OAAO,CAAC,YAAY,CAAC,wBAAwB,CAAC,CAAC,CAAC;KACjE;IAkBP;;;OAGG;IACG,UAAU,CAAC,OAAO,EAAE,
iBAAiB,GAAG,OAAO,CAAC;QAAE,MAAM,EAAE,MAAM,CAAA;KAAE,CAAC;IAYzE;;OAEG;IACG,eAAe,CAAC,OAAO,EAAE,sBAAsB,GAAG,OAAO,CAAC;QAAE,OAAO,EAAE,MAAM,CAAC;QAAC,QAAQ,CAAC,EAAE,MAAM,EAAE,CAAA;KAAE,CAAC;IA+CzG,iBAAiB,CAAC,MAAM,EAAE,MAAM,GAAG,0BAA0B,GAAG,SAAS;IAOzE;;OAEG;IACG,aAAa,CAAC,MAAM,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,iBAAiB,GAAG,OAAO,CAAC;QACxE,YAAY,CAAC,EAAE,MAAM,CAAC;QACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;QAClB,OAAO,CAAC,EAAE,MAAM,CAAC;QACjB,MAAM,CAAC,EAAE,MAAM,CAAC;KACjB,CAAC;IAoBF;;OAEG;IACG,cAAc,CAAC,OAAO,EAAE,MAAM,GAAG,OAAO,CAAC,GAAG,CAAC;IAiBnD;;OAEG;IACG,WAAW,CAAC,MAAM,EAAE,MAAM,EAAE,SAAS,SAAgB,EAAE,UAAU,SAAO,GAAG,OAAO,CAAC,GAAG,CAAC;IAsB7F,OAAO,CAAC,cAAc;IAMtB,OAAO,CAAC,iBAAiB;IAczB,OAAO,CAAC,kBAAkB;IAyB1B,OAAO,CAAC,sBAAsB;IAI9B,OAAO,CAAC,gBAAgB;IAIxB,OAAO,CAAC,WAAW;IAQnB,OAAO,CAAC,kBAAkB;IAO1B,OAAO,CAAC,oBAAoB;YAYd,kBAAkB;YAmClB,oBAAoB;YAcpB,qBAAqB;YA0DrB,uBAAuB;IA+CrC,OAAO,CAAC,iBAAiB;IAgBzB,OAAO,CAAC,2BAA2B;IAenC,OAAO,CAAC,6BAA6B;IAcrC,OAAO,CAAC,iBAAiB;IAQzB,OAAO,CAAC,aAAa;IAcrB,OAAO,CAAC,iBAAiB;IAQzB,OAAO,CAAC,eAAe;YAIT,YAAY;IAkB1B,OAAO,CAAC,eAAe;IA0BvB,wBAAwB,IAAI,OAAO,CAAC,aAAa,CAAC,GAAG,EAAE,GAAG,CAAC,CAAC;IAKtD,wBAAwB;CAU/B"}
+ {"version":3,"file":"mineru.client.d.ts","sourceRoot":"","sources":["../../src/lib/mineru.client.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,YAAY,EAAE,MAAM,kBAAkB,CAAC;AAEhD,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAC/C,OAAO,EAAmB,YAAY,EAAE,MAAM,sBAAsB,CAAC;AACrE,OAAc,EAAE,aAAa,EAAE,MAAM,OAAO,CAAC;AAK7C,OAAO,EAIL,wBAAwB,EAExB,0BAA0B,EAC1B,gBAAgB,EACjB,MAAM,YAAY,CAAC;AAIpB,UAAU,iBAAiB;IACzB,GAAG,CAAC,EAAE,MAAM,CAAC;IACb,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,YAAY,CAAC,EAAE,MAAM,EAAE,CAAC;IACxB,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,mEAAmE;IACnE,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,yEAAyE;IACzE,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,2EAA2E;IAC3E,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,4EAA4E;IAC5E,gBAAgB,CAAC,EAAE,OAAO,CAAC;CAC5B;AAED,UAAU,mBAAmB;IAC3B,GAAG,EAAE,MAAM,CAAC;IACZ,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,UAAU,CAAC,EAAE,MAAM,CAAC;CACrB;AAED,UAAU,sBAAsB;IAC9B,KAAK,EAAE,mBAAmB,EAAE,CAAC;IAC7B,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,YAAY,CAAC,EAAE,MAAM,EAAE,CAAC;IACxB,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,IAAI,CAAC,EAAE,MAAM,CAAC;CACf;AAED,UAAU,iBAAiB;IACzB,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;CACnB;AASD,qBAAa,YAAY;IAWrB,OAAO,CAAC,QAAQ,CAAC,aAAa;IAC9B,OAAO,CAAC,QAAQ,CAAC,WAAW,CAAC;IAX/B,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAiC;IACxD,OAAO,CAAC,QAAQ,CAAC,OAAO,CAAS;IACjC,OAAO,CAAC,QAAQ,CAAC,KAAK,CAAC,CAAS;IAChC,SAAgB,UAAU,EAAE,gBAAgB,CAAC;IAC7C,OAAO,CAAC,QAAQ,CAAC,UAAU,CAAiD;IAE5E,IAAI,UAAU,IAAI,YAAY,GAAG,SAAS,CAEzC;gBAEkB,aAAa,EAAE,aAAa,EAC5B,WAAW,CAAC,EAAE;QACvB,UAAU,CAAC,EAAE,YAAY,CAAC;QAC1B,WAAW,CAAC,EAAE,OAAO,CAAC,YAAY,CAAC,wBAAwB,CAAC,CAAC,CAAC;KACjE;IAkBP;;;OAGG;IACG,UAAU,CAAC,OAAO,EAAE,
iBAAiB,GAAG,OAAO,CAAC;QAAE,MAAM,EAAE,MAAM,CAAA;KAAE,CAAC;IAYzE;;OAEG;IACG,eAAe,CAAC,OAAO,EAAE,sBAAsB,GAAG,OAAO,CAAC;QAAE,OAAO,EAAE,MAAM,CAAC;QAAC,QAAQ,CAAC,EAAE,MAAM,EAAE,CAAA;KAAE,CAAC;IAmCzG,iBAAiB,CAAC,MAAM,EAAE,MAAM,GAAG,0BAA0B,GAAG,SAAS;IAOzE;;OAEG;IACG,aAAa,CAAC,MAAM,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,iBAAiB,GAAG,OAAO,CAAC;QACxE,YAAY,CAAC,EAAE,MAAM,CAAC;QACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;QAClB,OAAO,CAAC,EAAE,MAAM,CAAC;QACjB,MAAM,CAAC,EAAE,MAAM,CAAC;KACjB,CAAC;IAoBF;;OAEG;IACG,cAAc,CAAC,OAAO,EAAE,MAAM,GAAG,OAAO,CAAC,GAAG,CAAC;IAiBnD;;OAEG;IACG,WAAW,CAAC,MAAM,EAAE,MAAM,EAAE,SAAS,SAAgB,EAAE,UAAU,SAAO,GAAG,OAAO,CAAC,GAAG,CAAC;IAsB7F,OAAO,CAAC,cAAc;IAMtB,OAAO,CAAC,iBAAiB;IAczB,OAAO,CAAC,kBAAkB;IAyB1B,OAAO,CAAC,sBAAsB;IAI9B,OAAO,CAAC,gBAAgB;IAIxB,OAAO,CAAC,WAAW;IAQnB,OAAO,CAAC,kBAAkB;IAO1B,OAAO,CAAC,oBAAoB;YAYd,kBAAkB;YA4BlB,oBAAoB;YA6BpB,qBAAqB;YAoErB,uBAAuB;IAsDrC,OAAO,CAAC,iBAAiB;IAgBzB,OAAO,CAAC,2BAA2B;IAenC,OAAO,CAAC,6BAA6B;IAcrC,OAAO,CAAC,iBAAiB;IAQzB,OAAO,CAAC,aAAa;IAcrB,OAAO,CAAC,iBAAiB;IAQzB,OAAO,CAAC,eAAe;YAIT,YAAY;IAkB1B,OAAO,CAAC,eAAe;IA0BvB,wBAAwB,IAAI,OAAO,CAAC,aAAa,CAAC,GAAG,EAAE,GAAG,CAAC,CAAC;IAKtD,wBAAwB;CAU/B"}
@@ -3,7 +3,7 @@ import { getErrorMessage } from '@xpert-ai/plugin-sdk';
  import axios from 'axios';
  import FormData from 'form-data';
  import { randomUUID } from 'crypto';
- import { basename, normalize, resolve } from 'path';
+ import { basename } from 'path';
  import fs from 'fs';
  import { ENV_MINERU_API_BASE_URL, ENV_MINERU_API_TOKEN, ENV_MINERU_SERVER_TYPE, } from './types.js';
  const DEFAULT_OFFICIAL_BASE_URL = 'https://mineru.net/api/v4';
@@ -46,10 +46,6 @@ export class MinerUClient {
  */
  async createBatchTask(options) {
  this.ensureOfficial('createBatchTask');
- // Validate files is an array
- if (!Array.isArray(options.files)) {
- throw new Error('MinerU createBatchTask requires files to be an array');
- }
  const url = this.buildApiUrl('extract', 'task', 'batch');
  const body = {
  files: options.files.map((file) => {
@@ -71,15 +67,8 @@ export class MinerUClient {
  body.language = options.language;
  if (options.modelVersion)
  body.model_version = options.modelVersion;
- // Ensure extraFormats is an array if provided
- if (options.extraFormats) {
- if (Array.isArray(options.extraFormats)) {
- body.extra_formats = options.extraFormats;
- }
- else {
- this.logger.warn('extraFormats is not an array, ignoring');
- }
- }
+ if (options.extraFormats)
+ body.extra_formats = options.extraFormats;
  if (options.callbackUrl)
  body.callback = options.callbackUrl;
  if (options.seed)
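The two guards removed from `createBatchTask` can be sketched as a standalone body builder: one rejects a non-array `files`, and the other skips a non-array `extraFormats` with a warning. `buildBatchBody` and its `warn` parameter are hypothetical names. The output keys (`model_version`, `extra_formats`, `callback`) follow the snake_case fields visible in the diff. The real client also transforms each `files` entry in a way this diff does not show, so entries are only copied here.

```javascript
// Hypothetical sketch: batch-task request body with the guards this release drops.
function buildBatchBody(options, warn = console.warn) {
  // Fail early with a descriptive message instead of a TypeError from .map().
  if (!Array.isArray(options.files)) {
    throw new Error('MinerU createBatchTask requires files to be an array');
  }
  const body = {
    // Entries are shallow-copied; the real per-file mapping is not shown in the diff.
    files: options.files.map((file) => ({ ...file })),
  };
  if (options.language) body.language = options.language;
  if (options.modelVersion) body.model_version = options.modelVersion;
  if (options.extraFormats) {
    if (Array.isArray(options.extraFormats)) {
      body.extra_formats = options.extraFormats;
    } else {
      // A bare string such as 'docx' would otherwise be forwarded unchanged.
      warn('extraFormats is not an array, ignoring');
    }
  }
  if (options.callbackUrl) body.callback = options.callbackUrl;
  return body;
}
```

Without the `files` check, a string or missing `files` value fails at `options.files.map` with a generic `TypeError` instead of the descriptive message.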
@@ -242,15 +231,8 @@ export class MinerUClient {
  body.data_id = options.dataId;
  if (options.pageRanges)
  body.page_ranges = options.pageRanges;
- // Ensure extraFormats is an array if provided
- if (options.extraFormats) {
- if (Array.isArray(options.extraFormats)) {
- body.extra_formats = options.extraFormats;
- }
- else {
- this.logger.warn('extraFormats is not an array, ignoring');
- }
- }
+ if (options.extraFormats)
+ body.extra_formats = options.extraFormats;
  if (options.callbackUrl)
  body.callback = options.callbackUrl;
  if (options.seed)
@@ -269,20 +251,33 @@ export class MinerUClient {
  }
  }
  async createSelfHostedTask(options) {
+ // Validate fileSystem is available for self-hosted mode
+ if (!this.fileSystem) {
+ throw new Error('MinerU self-hosted mode requires fileSystem permission');
+ }
+ // Validate filePath is provided
  if (!options.filePath) {
- throw new Error('MinerU createSelfHostedTask requires a filePath');
+ throw new Error('MinerU self-hosted mode requires filePath to be provided');
+ }
+ // Get absolute file path from fileSystem
+ const filePath = this.fileSystem.fullPath(options.filePath);
+ // Validate file exists before attempting to parse
+ try {
+ await fs.promises.access(filePath, fs.constants.F_OK);
+ }
+ catch (error) {
+ this.logger.error(`File not found: ${filePath}`, error instanceof Error ? error.stack : error);
+ throw new Error(`File not found: ${filePath}`);
  }
- // Normalize path for cross-platform compatibility (Windows/Linux)
- const rawPath = this.fileSystem.fullPath(options.filePath);
- const filePath = normalize(resolve(rawPath));
  const taskId = randomUUID();
- const result = await this.invokeSelfHostedParse(filePath, options.fileName, options);
+ const result = await this.invokeSelfHostedParse(filePath, options.fileName || basename(filePath), options);
  this.localTasks.set(taskId, { ...result, sourceUrl: options.url });
  return { taskId };
  }
  async invokeSelfHostedParse(filePath, fileName, options) {
  const parseUrl = this.buildApiUrl('file_parse');
  const form = new FormData();
+ // Create file read stream (file existence is already validated in createSelfHostedTask)
  form.append('files', fs.createReadStream(filePath), {
  filename: fileName,
  });
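The new `createSelfHostedTask` runs three checks in order: a file system must be available, `filePath` must be set, and the resolved file must exist. It also falls back to the file's own name when `fileName` is omitted. The sketch below reproduces that order under stated assumptions. `prepareSelfHostedUpload` and `lastPathSegment` are hypothetical names. `exists` stands in for `fs.promises.access(filePath, fs.constants.F_OK)` so the sketch runs without disk I/O. `lastPathSegment` splits on both `/` and `\`, whereas Node's `path.basename` on POSIX splits only on `/`.

```javascript
// Hypothetical stand-in for path.basename that also splits on backslashes.
function lastPathSegment(p) {
  const parts = p.split(/[\\/]/);
  return parts[parts.length - 1];
}

// Hypothetical sketch of the validation order used by createSelfHostedTask.
// `fileSystem.fullPath` mirrors the call in the diff; `exists` is injected.
async function prepareSelfHostedUpload(fileSystem, options, exists) {
  if (!fileSystem) {
    throw new Error('MinerU self-hosted mode requires fileSystem permission');
  }
  if (!options.filePath) {
    throw new Error('MinerU self-hosted mode requires filePath to be provided');
  }
  const filePath = fileSystem.fullPath(options.filePath);
  // Check existence up front so the multipart upload never opens a missing file.
  if (!(await exists(filePath))) {
    throw new Error(`File not found: ${filePath}`);
  }
  // Fall back to the file's own name when the caller supplies none.
  return { filePath, fileName: options.fileName || lastPathSegment(filePath) };
}
```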
@@ -313,11 +308,14 @@ export class MinerUClient {
  return this.invokeSelfHostedParseV1(filePath, fileName, options);
  }
  if (response.status === 400) {
- throw new BadRequestException(`MinerU self-hosted parse failed: ${response.status} ${getErrorMessage(response.data)}`);
+ const errorMessage = getErrorMessage(response.data);
+ this.logger.error(`MinerU self-hosted parse failed with 400: ${errorMessage}`, JSON.stringify(response.data));
+ throw new BadRequestException(`MinerU self-hosted parse failed: ${response.status} ${errorMessage}`);
  }
  if (response.status !== 200) {
- console.error(response.data);
- throw new Error(`MinerU self-hosted parse failed: ${response.status} ${response.statusText}`);
+ const errorMessage = getErrorMessage(response.data) || response.statusText;
+ this.logger.error(`MinerU self-hosted parse failed with ${response.status}: ${errorMessage}`, JSON.stringify(response.data));
+ throw new Error(`MinerU self-hosted parse failed: ${response.status} ${response.statusText}. ${errorMessage}`);
  }
  return this.normalizeSelfHostedResponse(response.data);
  }
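These branches inspect `response.status` directly. That only works if axios is told not to reject on non-2xx responses, which is what the `validateStatus: () => true` option on the legacy call in this diff does. The updated branches build a message from the response body and fall back to `statusText`. A minimal sketch of that pattern follows. `extractMessage` is a simplified, hypothetical stand-in for `getErrorMessage` from `@xpert-ai/plugin-sdk`, and the `detail`/`message` keys it reads are assumptions about the server's error body.

```javascript
// Hypothetical, simplified stand-in for getErrorMessage; the real helper's
// behaviour is not shown in this diff.
function extractMessage(data) {
  if (!data) return '';
  if (typeof data === 'string') return data;
  if (typeof data.detail === 'string') return data.detail;   // assumed key
  if (typeof data.message === 'string') return data.message; // assumed key
  return '';
}

// Builds the thrown message in the same format as the new non-200 branch.
function describeParseFailure(response) {
  const errorMessage = extractMessage(response.data) || response.statusText;
  return `MinerU self-hosted parse failed: ${response.status} ${response.statusText}. ${errorMessage}`;
}
```

When the body carries no message, `errorMessage` equals `statusText`, so the status text appears twice (for example `... 502 Bad Gateway. Bad Gateway`).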
@@ -346,7 +344,9 @@ export class MinerUClient {
  validateStatus: () => true,
  });
  if (response.status !== 200) {
- throw new Error(`MinerU self-hosted legacy parse failed: ${response.status} ${response.statusText}`);
+ const errorMessage = getErrorMessage(response.data) || response.statusText;
+ this.logger.error(`MinerU self-hosted legacy parse failed with ${response.status}: ${errorMessage}`, JSON.stringify(response.data));
+ throw new Error(`MinerU self-hosted legacy parse failed: ${response.status} ${response.statusText}. ${errorMessage}`);
  }
  return this.normalizeSelfHostedResponse(response.data);
  }
@@ -1 +1 @@
- {"version":3,"file":"result-parser.service.d.ts","sourceRoot":"","sources":["../../src/lib/result-parser.service.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,MAAM,2BAA2B,CAAC;AACrD,OAAO,EAAE,kBAAkB,EAAE,MAAM,kBAAkB,CAAC;AAEtD,OAAO,EACL,aAAa,EAEb,YAAY,EACb,MAAM,sBAAsB,CAAC;AAK9B,OAAO,EAEL,sBAAsB,EACtB,0BAA0B,EAC3B,MAAM,YAAY,CAAC;AAEpB,qBACa,yBAAyB;IACpC,OAAO,CAAC,QAAQ,CAAC,MAAM,CAA8C;IAE/D,YAAY,CAChB,UAAU,EAAE,MAAM,EAClB,MAAM,EAAE,MAAM,EACd,QAAQ,EAAE,OAAO,CAAC,kBAAkB,CAAC,EACrC,UAAU,EAAE,YAAY,GACvB,OAAO,CAAC;QACT,EAAE,CAAC,EAAE,MAAM,CAAC;QACZ,MAAM,EAAE,QAAQ,CAAC,aAAa,CAAC,EAAE,CAAC;QAClC,QAAQ,EAAE,sBAAsB,CAAC;KAClC,CAAC;IAwFI,cAAc,CAClB,MAAM,EAAE,0BAA0B,EAClC,MAAM,EAAE,MAAM,EACd,QAAQ,EAAE,OAAO,CAAC,kBAAkB,CAAC,EACrC,UAAU,EAAE,YAAY,GACvB,OAAO,CAAC;QACT,EAAE,CAAC,EAAE,MAAM,CAAC;QACZ,MAAM,EAAE,QAAQ,CAAC,aAAa,CAAC,EAAE,CAAC;QAClC,QAAQ,EAAE,sBAAsB,CAAC;KAClC,CAAC;CAoDH"}
+ {"version":3,"file":"result-parser.service.d.ts","sourceRoot":"","sources":["../../src/lib/result-parser.service.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,MAAM,2BAA2B,CAAC;AACrD,OAAO,EAAE,kBAAkB,EAAE,MAAM,kBAAkB,CAAC;AAEtD,OAAO,EACL,aAAa,EAEb,YAAY,EACb,MAAM,sBAAsB,CAAC;AAK9B,OAAO,EAEL,sBAAsB,EACtB,0BAA0B,EAC3B,MAAM,YAAY,CAAC;AAEpB,qBACa,yBAAyB;IACpC,OAAO,CAAC,QAAQ,CAAC,MAAM,CAA8C;IAE/D,YAAY,CAChB,UAAU,EAAE,MAAM,EAClB,MAAM,EAAE,MAAM,EACd,QAAQ,EAAE,OAAO,CAAC,kBAAkB,CAAC,EACrC,UAAU,EAAE,YAAY,GACvB,OAAO,CAAC;QACT,EAAE,CAAC,EAAE,MAAM,CAAC;QACZ,MAAM,EAAE,QAAQ,CAAC,aAAa,CAAC,EAAE,CAAC;QAClC,QAAQ,EAAE,sBAAsB,CAAC;KAClC,CAAC;IAqFI,cAAc,CAClB,MAAM,EAAE,0BAA0B,EAClC,MAAM,EAAE,MAAM,EACd,QAAQ,EAAE,OAAO,CAAC,kBAAkB,CAAC,EACrC,UAAU,EAAE,YAAY,GACvB,OAAO,CAAC;QACT,EAAE,CAAC,EAAE,MAAM,CAAC;QACZ,MAAM,EAAE,QAAQ,CAAC,aAAa,CAAC,EAAE,CAAC;QAClC,QAAQ,EAAE,sBAAsB,CAAC;KAClC,CAAC;CAkDH"}
@@ -3,7 +3,7 @@ import { __decorate } from "tslib";
  import { Document } from '@langchain/core/documents';
  import { Injectable, Logger } from '@nestjs/common';
  import axios from 'axios';
- import { join, normalize } from 'path';
+ import { join } from 'path';
  import unzipper from 'unzipper';
  import { v4 as uuidv4 } from 'uuid';
  import { MinerU, } from './types.js';
@@ -34,11 +34,8 @@ let MinerUResultParserService = MinerUResultParserService_1 = class MinerUResult
  continue;
  const data = await entry.buffer();
  zipEntries.push({ entryName: entry.path, data });
- // Normalize ZIP entry path (ZIP files use POSIX format with '/' separator)
- // Convert to platform-specific path format for cross-platform compatibility
- const normalizedEntryPath = entry.path.replace(/\\/g, '/'); // Normalize to POSIX format first
- const fileName = normalizedEntryPath;
- const filePath = normalize(join(document.folder || '', normalizedEntryPath));
+ const fileName = entry.path;
+ const filePath = join(document.folder || '', entry.path);
  const url = await fileSystem.writeFile(filePath, data);
  pathMap.set(fileName, url);
  // Write images to local file system
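The removed lines converted backslashes in ZIP entry names to `/` before joining them under `document.folder`. The new code passes `entry.path` straight to `path.join`, which uses `\` separators on Windows. A platform-independent version of the old normalization is sketched below. `toStoragePath` is a hypothetical name. It joins with `/` by hand rather than calling `path.posix.join`, and it rejects `..` segments as an added precaution that the diff itself does not perform.

```javascript
// Hypothetical helper: maps a ZIP entry name to a storage path under `folder`,
// always using '/' as the separator regardless of the host OS.
function toStoragePath(folder, entryPath) {
  // ZIP entries should use '/', but some archivers write '\'; normalize first.
  const posixEntry = entryPath.replace(/\\/g, '/');
  const base = folder || '';
  const prefix = base.startsWith('/') ? '/' : ''; // keep an absolute folder absolute
  const out = [];
  for (const seg of [...base.split('/'), ...posixEntry.split('/')]) {
    if (seg === '' || seg === '.') continue; // drop empty and current-dir segments
    if (seg === '..') {
      // Added precaution (not in the diff): refuse to climb out of `folder`.
      throw new Error(`Refusing ZIP entry outside target folder: ${entryPath}`);
    }
    out.push(seg);
  }
  return prefix + out.join('/');
}

// Usage: toStoragePath('kb/doc1', 'images\\fig1.png') returns 'kb/doc1/images/fig1.png'
```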
@@ -102,9 +99,7 @@ let MinerUResultParserService = MinerUResultParserService_1 = class MinerUResult
  };
  const assets = [];
  const pathMap = new Map();
- // Ensure images is an array before iterating
- const images = Array.isArray(result.images) ? result.images : [];
- for (const image of images) {
+ for (const image of result.images) {
  const filePath = join(document.folder || '', 'images', image.name);
  const url = await fileSystem.writeFile(filePath, Buffer.from(image.dataUrl.split(',')[1], 'base64'));
  pathMap.set(`images/${image.name}`, url);
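The removed guard fell back to an empty list when `result.images` was not an array. The new loop iterates `result.images` directly, so a missing field throws `TypeError: ... is not iterable`. A sketch of the guarded decoding step follows, assuming the `{ name, dataUrl }` shape read in the diff. `decodeImages` is a hypothetical name, and it returns buffers instead of writing them through `XpFileSystem`.

```javascript
// Hypothetical sketch: decode each base64 `data:` URL, tolerating a missing
// or non-array `images` field the way the removed lines did.
function decodeImages(images) {
  const list = Array.isArray(images) ? images : [];
  return list.map((image) => {
    // The payload follows the first comma, e.g. "data:image/png;base64,<payload>".
    const commaIndex = image.dataUrl.indexOf(',');
    if (commaIndex < 0) {
      throw new Error(`Image ${image.name} is not a data URL`);
    }
    return {
      path: `images/${image.name}`, // same key format the parser stores in pathMap
      data: Buffer.from(image.dataUrl.slice(commaIndex + 1), 'base64'),
    };
  });
}
```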
@@ -1 +1 @@
- {"version":3,"file":"transformer-mineru.strategy.d.ts","sourceRoot":"","sources":["../../src/lib/transformer-mineru.strategy.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,kBAAkB,EAAE,MAAM,kBAAkB,CAAA;AAG/D,OAAO,EACL,aAAa,EAEb,oBAAoB,EACpB,4BAA4B,EAC5B,qBAAqB,EACtB,MAAM,sBAAsB,CAAA;AAI7B,OAAO,EAAgB,wBAAwB,EAAE,MAAM,YAAY,CAAA;AAEnE,qBAEa,yBAA0B,YAAW,4BAA4B,CAAC,wBAAwB,CAAC;IAEtG,OAAO,CAAC,QAAQ,CAAC,YAAY,CAA2B;IAGxD,OAAO,CAAC,QAAQ,CAAC,aAAa,CAAe;IAE7C,QAAQ,CAAC,WAAW,mDAWnB;IAED,QAAQ,CAAC,IAAI;;;;;;;;;;;kBAWM,QAAQ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;MAwE1B;IAED,cAAc,CAAC,MAAM,EAAE,GAAG,GAAG,OAAO,CAAC,IAAI,CAAC;IAIpC,kBAAkB,CACtB,SAAS,EAAE,OAAO,CAAC,kBAAkB,CAAC,EAAE,EACxC,MAAM,EAAE,wBAAwB,GAC/B,OAAO,CAAC,OAAO,CAAC,kBAAkB,CAAC,aAAa,CAAC,CAAC,EAAE,CAAC;CAsDzD"}
+ {"version":3,"file":"transformer-mineru.strategy.d.ts","sourceRoot":"","sources":["../../src/lib/transformer-mineru.strategy.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,kBAAkB,EAAE,MAAM,kBAAkB,CAAA;AAG/D,OAAO,EACL,aAAa,EAEb,oBAAoB,EACpB,4BAA4B,EAC5B,qBAAqB,EACtB,MAAM,sBAAsB,CAAA;AAI7B,OAAO,EAAgB,wBAAwB,EAAE,MAAM,YAAY,CAAA;AAEnE,qBAEa,yBAA0B,YAAW,4BAA4B,CAAC,wBAAwB,CAAC;IAEtG,OAAO,CAAC,QAAQ,CAAC,YAAY,CAA2B;IAGxD,OAAO,CAAC,QAAQ,CAAC,aAAa,CAAe;IAE7C,QAAQ,CAAC,WAAW,mDAWnB;IAED,QAAQ,CAAC,IAAI;;;;;;;;;;;kBAWM,QAAQ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;MAwE1B;IAED,cAAc,CAAC,MAAM,EAAE,GAAG,GAAG,OAAO,CAAC,IAAI,CAAC;IAIpC,kBAAkB,CACtB,SAAS,EAAE,OAAO,CAAC,kBAAkB,CAAC,EAAE,EACxC,MAAM,EAAE,wBAAwB,GAC/B,OAAO,CAAC,OAAO,CAAC,kBAAkB,CAAC,aAAa,CAAC,CAAC,EAAE,CAAC;CA8DzD"}
@@ -125,8 +125,12 @@ let MinerUTransformerStrategy = class MinerUTransformerStrategy {
  });
  const result = mineru.getSelfHostedTask(taskId);
  const parsedResult = await this.resultParser.parseLocalTask(result, taskId, document, config.permissions.fileSystem);
- parsedResult.id = document.id;
- parsedResults.push(parsedResult);
+ // Convert parsedResult to IKnowledgeDocument format
+ parsedResults.push({
+ id: document.id,
+ chunks: parsedResult.chunks,
+ metadata: parsedResult.metadata
+ });
  }
  else {
  const { taskId } = await mineru.createTask({
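Instead of setting `id` on the parser result and pushing that object, the strategy now pushes a new object containing only `id`, `chunks`, and `metadata`. A minimal sketch of that mapping follows. `toKnowledgeDocument` is a hypothetical name, and the three fields are the ones copied in the diff rather than the full `IKnowledgeDocument` type.

```javascript
// Hypothetical sketch: build a fresh result object rather than mutating the
// parser's output. Only the three fields copied in the diff are included.
function toKnowledgeDocument(documentId, parsedResult) {
  return {
    id: documentId,                  // the source document's id, not the parser's
    chunks: parsedResult.chunks,     // same array reference, not a copy
    metadata: parsedResult.metadata,
  };
}
```

As a result, other properties on the parser result are no longer passed through, and the parser's own object keeps its original `id`.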
@@ -141,8 +145,12 @@ let MinerUTransformerStrategy = class MinerUTransformerStrategy {
  // Waiting for completion
  const result = await mineru.waitForTask(taskId, 5 * 60 * 1000, 5000);
  const parsedResult = await this.resultParser.parseFromUrl(result.full_zip_url, taskId, document, config.permissions.fileSystem);
- parsedResult.id = document.id;
- parsedResults.push(parsedResult);
+ // Convert parsedResult to IKnowledgeDocument format
+ parsedResults.push({
+ id: document.id,
+ chunks: parsedResult.chunks,
+ metadata: parsedResult.metadata
+ });
  }
  }
  return parsedResults;
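The remote branch waits with `mineru.waitForTask(taskId, 5 * 60 * 1000, 5000)`, that is, a five-minute timeout and a five-second polling interval, and then reads `result.full_zip_url`. A generic polling loop with that shape is sketched below. It is not the client's implementation: `pollUntilDone` and the injected `getStatus` function are hypothetical. The `state` values `'done'`/`'failed'` and the `err_msg` field are assumptions about MinerU's task payload.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical polling helper: calls getStatus() every intervalMs until the
// task reports done (with a zip URL), reports failure, or the timeout elapses.
async function pollUntilDone(getStatus, timeoutMs = 5 * 60 * 1000, intervalMs = 5000) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await getStatus();
    // Assumed payload shape: { state, full_zip_url?, err_msg? }.
    if (status.state === 'done' && status.full_zip_url) return status;
    if (status.state === 'failed') {
      throw new Error(`MinerU task failed: ${status.err_msg || 'unknown error'}`);
    }
    // Stop if the next wait would overshoot the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`MinerU task did not finish within ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```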
package/dist/lib/types.js CHANGED
@@ -2,26 +2,26 @@ export const MinerU = 'mineru';
  export const ENV_MINERU_API_BASE_URL = 'MINERU_API_BASE_URL';
  export const ENV_MINERU_API_TOKEN = 'MINERU_API_TOKEN';
  export const ENV_MINERU_SERVER_TYPE = 'MINERU_SERVER_TYPE';
- export const icon = `<svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
- <path d="M19.7238 3.86898C19.7238 4.57597 19.1502 5.1491 18.4427 5.1491C17.7352 5.1491 17.1616 4.57597 17.1616 3.86898C17.1616 3.16199 17.7352 2.58887 18.4427 2.58887C19.1502 2.58887 19.7238 3.16199 19.7238 3.86898Z" fill="url(#paint0_linear_8609_1645)"/>
- <path d="M19.7238 3.86898C19.7238 4.57597 19.1502 5.1491 18.4427 5.1491C17.7352 5.1491 17.1616 4.57597 17.1616 3.86898C17.1616 3.16199 17.7352 2.58887 18.4427 2.58887C19.1502 2.58887 19.7238 3.16199 19.7238 3.86898Z" fill="#010101"/>
- <path d="M15.3681 5.1491C15.3681 5.85609 14.7945 6.42921 14.087 6.42921C13.3794 6.42921 12.8059 5.85609 12.8059 5.1491C12.8059 4.44211 13.3794 3.86898 14.087 3.86898C14.7945 3.86898 15.3681 4.44211 15.3681 5.1491Z" fill="url(#paint1_linear_8609_1645)"/>
- <path d="M15.3681 5.1491C15.3681 5.85609 14.7945 6.42921 14.087 6.42921C13.3794 6.42921 12.8059 5.85609 12.8059 5.1491C12.8059 4.44211 13.3794 3.86898 14.087 3.86898C14.7945 3.86898 15.3681 4.44211 15.3681 5.1491Z" fill="#010101"/>
- <path fill-rule="evenodd" clip-rule="evenodd" d="M8.05175 11.2368C8.05175 13.4605 9.14375 15.4293 10.8211 16.6371C11.8241 15.7389 12.4551 14.4345 12.4551 12.9828V9.39673C12.4551 8.85661 12.8197 8.38448 13.3426 8.24757L19.8924 6.53265C20.6459 6.33534 21.3826 6.90341 21.3826 7.6818L21.3826 12.0452C21.3826 17.2179 17.1861 21.4111 12.0095 21.4111L11.9942 21.4111C6.81758 21.4111 2.62109 17.2179 2.62109 12.0452V9.03388C2.62109 8.49175 2.9884 8.01839 3.51385 7.88336L6.56677 7.09882C7.31904 6.9055 8.05175 7.47318 8.05175 8.24934V11.2368ZM3.9798 12.0452C3.9798 13.8476 4.57565 15.5108 5.58124 16.849C6.04996 17.4728 6.7655 17.8884 7.54573 17.8884V17.8884C8.28848 17.8884 8.9927 17.7236 9.62376 17.4286C7.83439 15.9596 6.69304 13.7314 6.69304 11.2368V8.46821L3.9798 9.16546V12.0452Z" fill="url(#paint2_linear_8609_1645)"/>
11
- <path fill-rule="evenodd" clip-rule="evenodd" d="M8.05175 11.2368C8.05175 13.4605 9.14375 15.4293 10.8211 16.6371C11.8241 15.7389 12.4551 14.4345 12.4551 12.9828V9.39673C12.4551 8.85661 12.8197 8.38448 13.3426 8.24757L19.8924 6.53265C20.6459 6.33534 21.3826 6.90341 21.3826 7.6818L21.3826 12.0452C21.3826 17.2179 17.1861 21.4111 12.0095 21.4111L11.9942 21.4111C6.81758 21.4111 2.62109 17.2179 2.62109 12.0452V9.03388C2.62109 8.49175 2.9884 8.01839 3.51385 7.88336L6.56677 7.09882C7.31904 6.9055 8.05175 7.47318 8.05175 8.24934V11.2368ZM3.9798 12.0452C3.9798 13.8476 4.57565 15.5108 5.58124 16.849C6.04996 17.4728 6.7655 17.8884 7.54573 17.8884V17.8884C8.28848 17.8884 8.9927 17.7236 9.62376 17.4286C7.83439 15.9596 6.69304 13.7314 6.69304 11.2368V8.46821L3.9798 9.16546V12.0452Z" fill="#010101"/>
12
- <defs>
13
- <linearGradient id="paint0_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
14
- <stop stop-color="white"/>
15
- <stop offset="1" stop-color="#2E2E2E"/>
16
- </linearGradient>
17
- <linearGradient id="paint1_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
18
- <stop stop-color="white"/>
19
- <stop offset="1" stop-color="#2E2E2E"/>
20
- </linearGradient>
21
- <linearGradient id="paint2_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
22
- <stop stop-color="white"/>
23
- <stop offset="1" stop-color="#2E2E2E"/>
24
- </linearGradient>
25
- </defs>
26
- </svg>
5
+ export const icon = `<svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
6
+ <path d="M19.7238 3.86898C19.7238 4.57597 19.1502 5.1491 18.4427 5.1491C17.7352 5.1491 17.1616 4.57597 17.1616 3.86898C17.1616 3.16199 17.7352 2.58887 18.4427 2.58887C19.1502 2.58887 19.7238 3.16199 19.7238 3.86898Z" fill="url(#paint0_linear_8609_1645)"/>
7
+ <path d="M19.7238 3.86898C19.7238 4.57597 19.1502 5.1491 18.4427 5.1491C17.7352 5.1491 17.1616 4.57597 17.1616 3.86898C17.1616 3.16199 17.7352 2.58887 18.4427 2.58887C19.1502 2.58887 19.7238 3.16199 19.7238 3.86898Z" fill="#010101"/>
8
+ <path d="M15.3681 5.1491C15.3681 5.85609 14.7945 6.42921 14.087 6.42921C13.3794 6.42921 12.8059 5.85609 12.8059 5.1491C12.8059 4.44211 13.3794 3.86898 14.087 3.86898C14.7945 3.86898 15.3681 4.44211 15.3681 5.1491Z" fill="url(#paint1_linear_8609_1645)"/>
9
+ <path d="M15.3681 5.1491C15.3681 5.85609 14.7945 6.42921 14.087 6.42921C13.3794 6.42921 12.8059 5.85609 12.8059 5.1491C12.8059 4.44211 13.3794 3.86898 14.087 3.86898C14.7945 3.86898 15.3681 4.44211 15.3681 5.1491Z" fill="#010101"/>
10
+ <path fill-rule="evenodd" clip-rule="evenodd" d="M8.05175 11.2368C8.05175 13.4605 9.14375 15.4293 10.8211 16.6371C11.8241 15.7389 12.4551 14.4345 12.4551 12.9828V9.39673C12.4551 8.85661 12.8197 8.38448 13.3426 8.24757L19.8924 6.53265C20.6459 6.33534 21.3826 6.90341 21.3826 7.6818L21.3826 12.0452C21.3826 17.2179 17.1861 21.4111 12.0095 21.4111L11.9942 21.4111C6.81758 21.4111 2.62109 17.2179 2.62109 12.0452V9.03388C2.62109 8.49175 2.9884 8.01839 3.51385 7.88336L6.56677 7.09882C7.31904 6.9055 8.05175 7.47318 8.05175 8.24934V11.2368ZM3.9798 12.0452C3.9798 13.8476 4.57565 15.5108 5.58124 16.849C6.04996 17.4728 6.7655 17.8884 7.54573 17.8884V17.8884C8.28848 17.8884 8.9927 17.7236 9.62376 17.4286C7.83439 15.9596 6.69304 13.7314 6.69304 11.2368V8.46821L3.9798 9.16546V12.0452Z" fill="url(#paint2_linear_8609_1645)"/>
11
+ <path fill-rule="evenodd" clip-rule="evenodd" d="M8.05175 11.2368C8.05175 13.4605 9.14375 15.4293 10.8211 16.6371C11.8241 15.7389 12.4551 14.4345 12.4551 12.9828V9.39673C12.4551 8.85661 12.8197 8.38448 13.3426 8.24757L19.8924 6.53265C20.6459 6.33534 21.3826 6.90341 21.3826 7.6818L21.3826 12.0452C21.3826 17.2179 17.1861 21.4111 12.0095 21.4111L11.9942 21.4111C6.81758 21.4111 2.62109 17.2179 2.62109 12.0452V9.03388C2.62109 8.49175 2.9884 8.01839 3.51385 7.88336L6.56677 7.09882C7.31904 6.9055 8.05175 7.47318 8.05175 8.24934V11.2368ZM3.9798 12.0452C3.9798 13.8476 4.57565 15.5108 5.58124 16.849C6.04996 17.4728 6.7655 17.8884 7.54573 17.8884V17.8884C8.28848 17.8884 8.9927 17.7236 9.62376 17.4286C7.83439 15.9596 6.69304 13.7314 6.69304 11.2368V8.46821L3.9798 9.16546V12.0452Z" fill="#010101"/>
12
+ <defs>
13
+ <linearGradient id="paint0_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
14
+ <stop stop-color="white"/>
15
+ <stop offset="1" stop-color="#2E2E2E"/>
16
+ </linearGradient>
17
+ <linearGradient id="paint1_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
18
+ <stop stop-color="white"/>
19
+ <stop offset="1" stop-color="#2E2E2E"/>
20
+ </linearGradient>
21
+ <linearGradient id="paint2_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
22
+ <stop stop-color="white"/>
23
+ <stop offset="1" stop-color="#2E2E2E"/>
24
+ </linearGradient>
25
+ </defs>
26
+ </svg>
27
27
  `;
package/package.json CHANGED
@@ -1,52 +1,49 @@
1
- {
2
- "name": "@chenchaolong/plugin-mineru",
3
- "version": "0.0.13",
4
- "repository": {
5
- "type": "git",
6
- "url": "https://github.com/xpert-ai/xpert-plugins.git"
7
- },
8
- "bugs": {
9
- "url": "https://github.com/xpert-ai/xpert-plugins/issues"
10
- },
11
- "type": "module",
12
- "main": "./dist/index.js",
13
- "module": "./dist/index.js",
14
- "types": "./dist/index.d.ts",
15
- "exports": {
16
- "./package.json": "./package.json",
17
- ".": {
18
- "@xpert-plugins-starter/source": "./src/index.ts",
19
- "types": "./dist/index.d.ts",
20
- "import": "./dist/index.js",
21
- "default": "./dist/index.js"
22
- }
23
- },
24
- "files": [
25
- "dist",
26
- "!**/*.tsbuildinfo"
27
- ],
28
- "dependencies": {
29
- "form-data": "^4.0.0",
30
- "tslib": "^2.3.0",
31
- "unzipper": "0.12.3"
32
- },
33
- "peerDependencies": {
34
- "@nestjs/config": "^4.0.2",
35
- "zod": "3.25.67",
36
- "@xpert-ai/plugin-sdk": "^3.6.2",
37
- "@metad/contracts": "^3.6.2",
38
- "@nestjs/common": "^11.1.6",
39
- "axios": "1.12.2",
40
- "nestjs-i18n": "10.5.1",
41
- "chalk": "4.1.2",
42
- "@langchain/core": "^0.3.72",
43
- "lodash-es": "4.17.21",
44
- "uuid": "8.3.2"
45
- },
46
- "devDependencies": {
47
- "@types/unzipper": "^0.10.11"
48
- },
49
- "publishConfig": {
50
- "access": "public"
51
- }
52
- }
1
+ {
2
+ "name": "@chenchaolong/plugin-mineru",
3
+ "version": "1.1.0",
4
+ "repository": {
5
+ "type": "git",
6
+ "url": "https://github.com/xpert-ai/xpert-plugins.git"
7
+ },
8
+ "bugs": {
9
+ "url": "https://github.com/xpert-ai/xpert-plugins/issues"
10
+ },
11
+ "type": "module",
12
+ "main": "./dist/index.js",
13
+ "module": "./dist/index.js",
14
+ "types": "./dist/index.d.ts",
15
+ "exports": {
16
+ "./package.json": "./package.json",
17
+ ".": {
18
+ "@xpert-plugins-starter/source": "./src/index.ts",
19
+ "types": "./dist/index.d.ts",
20
+ "import": "./dist/index.js",
21
+ "default": "./dist/index.js"
22
+ }
23
+ },
24
+ "files": [
25
+ "dist",
26
+ "!**/*.tsbuildinfo"
27
+ ],
28
+ "dependencies": {
29
+ "form-data": "^4.0.0",
30
+ "tslib": "^2.3.0",
31
+ "unzipper": "0.12.3"
32
+ },
33
+ "peerDependencies": {
34
+ "@nestjs/config": "^4.0.2",
35
+ "zod": "3.25.67",
36
+ "@xpert-ai/plugin-sdk": "^3.6.2",
37
+ "@metad/contracts": "^3.6.2",
38
+ "@nestjs/common": "^11.1.6",
39
+ "axios": "1.12.2",
40
+ "nestjs-i18n": "10.5.1",
41
+ "chalk": "4.1.2",
42
+ "@langchain/core": "^0.3.72",
43
+ "lodash-es": "4.17.21",
44
+ "uuid": "8.3.2"
45
+ },
46
+ "devDependencies": {
47
+ "@types/unzipper": "^0.10.11"
48
+ }
49
+ }
@@ -1,10 +0,0 @@
1
- import { StructuredToolInterface, ToolSchemaBase } from '@langchain/core/tools';
2
- import { BuiltinToolset } from '@xpert-ai/plugin-sdk';
3
- import { ConfigService } from '@nestjs/config';
4
- import { MinerUResultParserService } from './result-parser.service.js';
5
- export declare function setMinerUToolsetServices(configService: ConfigService, resultParser: MinerUResultParserService): void;
6
- export declare class MinerUToolset extends BuiltinToolset<StructuredToolInterface, Record<string, never>> {
7
- _validateCredentials(credentials: Record<string, never>): Promise<void>;
8
- initTools(): Promise<StructuredToolInterface<ToolSchemaBase, any, any>[]>;
9
- }
10
- //# sourceMappingURL=mineru-toolset.d.ts.map
@@ -1 +0,0 @@
1
- {"version":3,"file":"mineru-toolset.d.ts","sourceRoot":"","sources":["../../src/lib/mineru-toolset.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,uBAAuB,EAAE,cAAc,EAAE,MAAM,uBAAuB,CAAC;AAChF,OAAO,EAAE,cAAc,EAAE,MAAM,sBAAsB,CAAC;AACtD,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAC/C,OAAO,EAAE,yBAAyB,EAAE,MAAM,4BAA4B,CAAC;AAOvE,wBAAgB,wBAAwB,CACtC,aAAa,EAAE,aAAa,EAC5B,YAAY,EAAE,yBAAyB,QAIxC;AAED,qBAAa,aAAc,SAAQ,cAAc,CAAC,uBAAuB,EAAE,MAAM,CAAC,MAAM,EAAE,KAAK,CAAC,CAAC;IAChF,oBAAoB,CAAC,WAAW,EAAE,MAAM,CAAC,MAAM,EAAE,KAAK,CAAC,GAAG,OAAO,CAAC,IAAI,CAAC;IAIvE,SAAS,IAAI,OAAO,CAAC,uBAAuB,CAAC,cAAc,EAAE,GAAG,EAAE,GAAG,CAAC,EAAE,CAAC;CASzF"}
@@ -1,23 +0,0 @@
1
- import { BuiltinToolset } from '@xpert-ai/plugin-sdk';
2
- import { buildPdfToMarkdownTool } from './pdf-to-markdown.tool.js';
3
- // Store services globally for tool access
4
- let globalConfigService;
5
- let globalResultParser;
6
- export function setMinerUToolsetServices(configService, resultParser) {
7
- globalConfigService = configService;
8
- globalResultParser = resultParser;
9
- }
10
- export class MinerUToolset extends BuiltinToolset {
11
- async _validateCredentials(credentials) {
12
- // No credentials needed for mineru toolset (uses integration permissions)
13
- }
14
- async initTools() {
15
- if (!globalConfigService || !globalResultParser) {
16
- throw new Error('MinerU services not initialized. Call setMinerUToolsetServices first.');
17
- }
18
- this.tools = [
19
- buildPdfToMarkdownTool(globalConfigService, globalResultParser),
20
- ];
21
- return this.tools;
22
- }
23
- }
@@ -1,34 +0,0 @@
1
- import { ConfigService } from '@nestjs/config';
2
- import { BuiltinToolset, IToolsetStrategy } from '@xpert-ai/plugin-sdk';
3
- import { MinerUResultParserService } from './result-parser.service.js';
4
- export declare class MinerUToolsetStrategy implements IToolsetStrategy<any> {
5
- private readonly configService;
6
- private readonly resultParser;
7
- constructor(configService: ConfigService, resultParser: MinerUResultParserService);
8
- meta: {
9
- author: string;
10
- tags: string[];
11
- name: string;
12
- label: {
13
- en_US: string;
14
- zh_Hans: string;
15
- };
16
- description: {
17
- en_US: string;
18
- zh_Hans: string;
19
- };
20
- icon: {
21
- svg: string;
22
- color: string;
23
- };
24
- configSchema: {
25
- type: string;
26
- properties: {};
27
- required: any[];
28
- };
29
- };
30
- validateConfig(config: any): Promise<void>;
31
- create(config: any): Promise<BuiltinToolset>;
32
- createTools(): any[];
33
- }
34
- //# sourceMappingURL=mineru-toolset.strategy.d.ts.map
@@ -1 +0,0 @@
1
- {"version":3,"file":"mineru-toolset.strategy.d.ts","sourceRoot":"","sources":["../../src/lib/mineru-toolset.strategy.ts"],"names":[],"mappings":"AACA,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAC/C,OAAO,EAAE,cAAc,EAAE,gBAAgB,EAAmB,MAAM,sBAAsB,CAAC;AAGzF,OAAO,EAAE,yBAAyB,EAAE,MAAM,4BAA4B,CAAC;AAEvE,qBAEa,qBAAsB,YAAW,gBAAgB,CAAC,GAAG,CAAC;IAG/D,OAAO,CAAC,QAAQ,CAAC,aAAa;IAE9B,OAAO,CAAC,QAAQ,CAAC,YAAY;gBAFZ,aAAa,EAAE,aAAa,EAE5B,YAAY,EAAE,yBAAyB;IAM1D,IAAI;;;;;;;;;;;;;;;;;;;;;MAqBF;IAEF,cAAc,CAAC,MAAM,EAAE,GAAG,GAAG,OAAO,CAAC,IAAI,CAAC;IAKpC,MAAM,CAAC,MAAM,EAAE,GAAG,GAAG,OAAO,CAAC,cAAc,CAAC;IAIlD,WAAW;CAKZ"}
@@ -1,58 +0,0 @@
1
- import { __decorate, __metadata, __param } from "tslib";
2
- import { Injectable, forwardRef, Inject } from '@nestjs/common';
3
- import { ConfigService } from '@nestjs/config';
4
- import { ToolsetStrategy } from '@xpert-ai/plugin-sdk';
5
- import { MinerU, icon } from './types.js';
6
- import { MinerUToolset, setMinerUToolsetServices } from './mineru-toolset.js';
7
- import { MinerUResultParserService } from './result-parser.service.js';
8
- let MinerUToolsetStrategy = class MinerUToolsetStrategy {
9
- constructor(configService, resultParser) {
10
- this.configService = configService;
11
- this.resultParser = resultParser;
12
- this.meta = {
13
- author: 'Xpert AI',
14
- tags: ['mineru', 'pdf', 'markdown', 'conversion', 'tool'],
15
- name: MinerU,
16
- label: {
17
- en_US: 'MinerU',
18
- zh_Hans: 'MinerU',
19
- },
20
- description: {
21
- en_US: 'Convert PDF files to Markdown and JSON format using MinerU. Supports OCR, formula recognition, and table extraction.',
22
- zh_Hans: '使用MinerU将PDF文件转换为Markdown和JSON格式。支持OCR、公式识别和表格提取。',
23
- },
24
- icon: {
25
- svg: icon,
26
- color: '#14b8a6',
27
- },
28
- configSchema: {
29
- type: 'object',
30
- properties: {},
31
- required: [],
32
- },
33
- };
34
- // Initialize global services for tool access
35
- setMinerUToolsetServices(this.configService, this.resultParser);
36
- }
37
- validateConfig(config) {
38
- // No validation needed - uses integration permissions
39
- return Promise.resolve();
40
- }
41
- async create(config) {
42
- return new MinerUToolset(config || {});
43
- }
44
- createTools() {
45
- // Tools are created dynamically in MinerUToolset.initTools()
46
- // This method is not used when using BuiltinToolset
47
- return [];
48
- }
49
- };
50
- MinerUToolsetStrategy = __decorate([
51
- Injectable(),
52
- ToolsetStrategy(MinerU),
53
- __param(0, Inject(forwardRef(() => ConfigService))),
54
- __param(1, Inject(MinerUResultParserService)),
55
- __metadata("design:paramtypes", [ConfigService,
56
- MinerUResultParserService])
57
- ], MinerUToolsetStrategy);
58
- export { MinerUToolsetStrategy };
@@ -1,90 +0,0 @@
1
- import { z } from 'zod';
2
- import { ConfigService } from '@nestjs/config';
3
- import { MinerUResultParserService } from './result-parser.service.js';
4
- export declare function buildPdfToMarkdownTool(configService: ConfigService, resultParser: MinerUResultParserService): import("@langchain/core/tools").DynamicStructuredTool<z.ZodObject<{
5
- file: z.ZodObject<{
6
- name: z.ZodOptional<z.ZodString>;
7
- filename: z.ZodOptional<z.ZodString>;
8
- content: z.ZodOptional<z.ZodUnion<[z.ZodString, z.ZodType<Buffer<ArrayBufferLike>, z.ZodTypeDef, Buffer<ArrayBufferLike>>, z.ZodType<Uint8Array<ArrayBuffer>, z.ZodTypeDef, Uint8Array<ArrayBuffer>>]>>;
9
- filePath: z.ZodOptional<z.ZodString>;
10
- fileUrl: z.ZodOptional<z.ZodString>;
11
- }, "strip", z.ZodTypeAny, {
12
- name?: string;
13
- filePath?: string;
14
- fileUrl?: string;
15
- filename?: string;
16
- content?: string | Uint8Array<ArrayBuffer> | Buffer<ArrayBufferLike>;
17
- }, {
18
- name?: string;
19
- filePath?: string;
20
- fileUrl?: string;
21
- filename?: string;
22
- content?: string | Uint8Array<ArrayBuffer> | Buffer<ArrayBufferLike>;
23
- }>;
24
- isOcr: z.ZodOptional<z.ZodBoolean>;
25
- enableFormula: z.ZodOptional<z.ZodBoolean>;
26
- enableTable: z.ZodOptional<z.ZodBoolean>;
27
- language: z.ZodOptional<z.ZodEnum<["en", "ch"]>>;
28
- modelVersion: z.ZodOptional<z.ZodEnum<["pipeline", "vlm"]>>;
29
- }, "strip", z.ZodTypeAny, {
30
- isOcr?: boolean;
31
- enableFormula?: boolean;
32
- enableTable?: boolean;
33
- language?: "ch" | "en";
34
- modelVersion?: "pipeline" | "vlm";
35
- file?: {
36
- name?: string;
37
- filePath?: string;
38
- fileUrl?: string;
39
- filename?: string;
40
- content?: string | Uint8Array<ArrayBuffer> | Buffer<ArrayBufferLike>;
41
- };
42
- }, {
43
- isOcr?: boolean;
44
- enableFormula?: boolean;
45
- enableTable?: boolean;
46
- language?: "ch" | "en";
47
- modelVersion?: "pipeline" | "vlm";
48
- file?: {
49
- name?: string;
50
- filePath?: string;
51
- fileUrl?: string;
52
- filename?: string;
53
- content?: string | Uint8Array<ArrayBuffer> | Buffer<ArrayBufferLike>;
54
- };
55
- }>, {
56
- isOcr?: boolean;
57
- enableFormula?: boolean;
58
- enableTable?: boolean;
59
- language?: "ch" | "en";
60
- modelVersion?: "pipeline" | "vlm";
61
- file?: {
62
- name?: string;
63
- filePath?: string;
64
- fileUrl?: string;
65
- filename?: string;
66
- content?: string | Uint8Array<ArrayBuffer> | Buffer<ArrayBufferLike>;
67
- };
68
- }, {
69
- isOcr?: boolean;
70
- enableFormula?: boolean;
71
- enableTable?: boolean;
72
- language?: "ch" | "en";
73
- modelVersion?: "pipeline" | "vlm";
74
- file?: {
75
- name?: string;
76
- filePath?: string;
77
- fileUrl?: string;
78
- filename?: string;
79
- content?: string | Uint8Array<ArrayBuffer> | Buffer<ArrayBufferLike>;
80
- };
81
- }, (string | {
82
- files: {
83
- mimeType: string;
84
- fileName: string;
85
- filePath: string;
86
- fileUrl: string;
87
- extension: string;
88
- }[];
89
- })[]>;
90
- //# sourceMappingURL=pdf-to-markdown.tool.d.ts.map
@@ -1 +0,0 @@
1
- {"version":3,"file":"pdf-to-markdown.tool.d.ts","sourceRoot":"","sources":["../../src/lib/pdf-to-markdown.tool.ts"],"names":[],"mappings":"AAGA,OAAO,EAAE,CAAC,EAAE,MAAM,KAAK,CAAC;AACxB,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAE/C,OAAO,EAAE,yBAAyB,EAAE,MAAM,4BAA4B,CAAC;AAIvE,wBAAgB,sBAAsB,CACpC,aAAa,EAAE,aAAa,EAC5B,YAAY,EAAE,yBAAyB;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;MAqKxC"}
@@ -1,146 +0,0 @@
1
- import { tool } from '@langchain/core/tools';
2
- import { getCurrentTaskInput } from '@langchain/langgraph';
3
- import { getErrorMessage } from '@xpert-ai/plugin-sdk';
4
- import { z } from 'zod';
5
- import { MinerUClient } from './mineru.client.js';
6
- export function buildPdfToMarkdownTool(configService, resultParser) {
7
- return tool(async (input) => {
8
- try {
9
- const { file, isOcr, enableFormula, enableTable, language, modelVersion } = input;
10
- if (!file) {
11
- throw new Error('No file provided');
12
- }
13
- const currentState = getCurrentTaskInput();
14
- const workspacePath = currentState?.[`sys`]?.['volume'] ?? '/tmp/xpert';
15
- const baseUrl = currentState?.[`sys`]?.['workspace_url'] ?? 'http://localhost:3000';
16
- // Get permissions from current state
17
- const permissions = currentState?.[`sys`]?.['permissions'];
18
- if (!permissions?.fileSystem) {
19
- throw new Error('File system permission is required for MinerU tool');
20
- }
21
- // Get file content
22
- let fileContent;
23
- let fileName;
24
- let filePath;
25
- let fileUrl;
26
- if (file.content) {
27
- if (typeof file.content === 'string') {
28
- // Base64 string
29
- fileContent = Buffer.from(file.content, 'base64');
30
- }
31
- else if (Buffer.isBuffer(file.content)) {
32
- fileContent = file.content;
33
- }
34
- else if (file.content instanceof Uint8Array) {
35
- fileContent = Buffer.from(file.content);
36
- }
37
- else {
38
- throw new Error('Invalid file content format');
39
- }
40
- fileName = file.name || file.filename || 'document.pdf';
41
- }
42
- else if (file.filePath) {
43
- filePath = file.filePath;
44
- fileContent = await permissions.fileSystem.readFile(filePath);
45
- fileName = file.name || file.filename || filePath.split('/').pop() || 'document.pdf';
46
- }
47
- else if (file.fileUrl) {
48
- fileUrl = file.fileUrl;
49
- const response = await fetch(fileUrl);
50
- if (!response.ok) {
51
- throw new Error(`Failed to download file from URL: ${response.statusText}`);
52
- }
53
- const arrayBuffer = await response.arrayBuffer();
54
- fileContent = Buffer.from(arrayBuffer);
55
- fileName = file.name || file.filename || fileUrl.split('/').pop() || 'document.pdf';
56
- }
57
- else {
58
- throw new Error('File must provide content, filePath, or fileUrl');
59
- }
60
- // Save file to workspace if not already there
61
- if (!filePath) {
62
- const relativePath = `mineru-input/${fileName}`;
63
- filePath = relativePath;
64
- fileUrl = await permissions.fileSystem.writeFile(relativePath, fileContent);
65
- }
66
- // Create MinerU client
67
- const mineruClient = new MinerUClient(configService, {
68
- fileSystem: permissions.fileSystem,
69
- integration: permissions.integration,
70
- });
71
- // Create task
72
- const { taskId } = await mineruClient.createTask({
73
- url: fileUrl || file.fileUrl,
74
- filePath: filePath,
75
- fileName: fileName,
76
- isOcr: isOcr ?? true,
77
- enableFormula: enableFormula ?? true,
78
- enableTable: enableTable ?? true,
79
- language: language || 'ch',
80
- modelVersion: modelVersion || 'pipeline',
81
- });
82
- // Get result
83
- let result;
84
- if (mineruClient.serverType === 'self-hosted') {
85
- result = mineruClient.getSelfHostedTask(taskId);
86
- if (!result) {
87
- throw new Error('Failed to get MinerU task result');
88
- }
89
- }
90
- else {
91
- result = await mineruClient.waitForTask(taskId, 5 * 60 * 1000, 5000);
92
- }
93
- // Parse result
94
- const parsedResult = mineruClient.serverType === 'self-hosted'
95
- ? await resultParser.parseLocalTask(result, taskId, { folder: 'mineru-output', name: fileName }, permissions.fileSystem)
96
- : await resultParser.parseFromUrl(result.full_zip_url, taskId, { folder: 'mineru-output', name: fileName }, permissions.fileSystem);
97
- // Get markdown content
98
- const markdownContent = parsedResult.chunks[0]?.pageContent || '';
99
- const outputFileName = fileName.replace(/\.pdf$/i, '.md');
100
- const outputPath = `mineru-output/${outputFileName}`;
101
- const outputUrl = await permissions.fileSystem.writeFile(outputPath, Buffer.from(markdownContent, 'utf-8'));
102
- return [
103
- `Successfully converted PDF to Markdown: ${outputFileName}`,
104
- {
105
- files: [
106
- {
107
- mimeType: 'text/markdown',
108
- fileName: outputPath,
109
- filePath: permissions.fileSystem.fullPath(outputPath),
110
- fileUrl: outputUrl,
111
- extension: 'md',
112
- },
113
- ...(parsedResult.metadata.assets || []).map((asset) => ({
114
- mimeType: asset.type === 'image' ? 'image/png' : 'application/json',
115
- fileName: asset.filePath,
116
- filePath: permissions.fileSystem.fullPath(asset.filePath),
117
- fileUrl: asset.url,
118
- extension: asset.type === 'image' ? 'png' : 'json',
119
- })),
120
- ],
121
- },
122
- ];
123
- }
124
- catch (error) {
125
- throw new Error(`Error converting PDF to Markdown: ${getErrorMessage(error)}`);
126
- }
127
- }, {
128
- name: 'pdf_to_markdown',
129
- description: `Convert PDF file to Markdown format using MinerU. Supports OCR, formula recognition, and table extraction.`,
130
- schema: z.object({
131
- file: z.object({
132
- name: z.string().optional(),
133
- filename: z.string().optional(),
134
- content: z.union([z.string(), z.instanceof(Buffer), z.instanceof(Uint8Array)]).optional(),
135
- filePath: z.string().optional(),
136
- fileUrl: z.string().optional(),
137
- }),
138
- isOcr: z.boolean().optional().describe('Enable OCR for image-based PDFs'),
139
- enableFormula: z.boolean().optional().describe('Enable recognition of mathematical formulas'),
140
- enableTable: z.boolean().optional().describe('Enable recognition of tables'),
141
- language: z.enum(['en', 'ch']).optional().describe('Document language (en for English, ch for Chinese)'),
142
- modelVersion: z.enum(['pipeline', 'vlm']).optional().describe('MinerU model version'),
143
- }),
144
- responseFormat: 'content_and_artifact',
145
- });
146
- }