@chenchaolong/plugin-mineru 0.0.13 → 1.1.0
- package/README.md +101 -113
- package/dist/lib/mineru.client.d.ts.map +1 -1
- package/dist/lib/mineru.client.js +32 -32
- package/dist/lib/result-parser.service.d.ts.map +1 -1
- package/dist/lib/result-parser.service.js +4 -9
- package/dist/lib/transformer-mineru.strategy.d.ts.map +1 -1
- package/dist/lib/transformer-mineru.strategy.js +12 -4
- package/dist/lib/types.js +22 -22
- package/package.json +49 -52
- package/dist/lib/mineru-toolset.d.ts +0 -10
- package/dist/lib/mineru-toolset.d.ts.map +0 -1
- package/dist/lib/mineru-toolset.js +0 -23
- package/dist/lib/mineru-toolset.strategy.d.ts +0 -34
- package/dist/lib/mineru-toolset.strategy.d.ts.map +0 -1
- package/dist/lib/mineru-toolset.strategy.js +0 -58
- package/dist/lib/pdf-to-markdown.tool.d.ts +0 -90
- package/dist/lib/pdf-to-markdown.tool.d.ts.map +0 -1
- package/dist/lib/pdf-to-markdown.tool.js +0 -146
package/README.md
CHANGED
|
@@ -1,113 +1,101 @@
|
|
|
1
|
-
# Xpert Plugin: MinerU
|
|
2
|
-
|
|
3
|
-
`@chenchaolong/plugin-mineru` is a MinerU document converter plugin for the [Xpert AI](https://github.com/xpert-ai/xpert) platform, providing extraction capabilities from PDF to Markdown and structured JSON. The plugin includes built-in MinerU integration strategies, document conversion strategies, and result parsing services, enabling secure access to the MinerU API in automated workflows, polling task status, and writing parsed content and attachment resources to the platform file system.
|
|
4
|
-
|
|
5
|
-
## Installation
|
|
6
|
-
|
|
7
|
-
```bash
|
|
8
|
-
pnpm add @chenchaolong/plugin-mineru
|
|
9
|
-
# or
|
|
10
|
-
npm install @chenchaolong/plugin-mineru
|
|
11
|
-
```
|
|
12
|
-
|
|
13
|
-
> **Note**: This plugin depends on `@xpert-ai/plugin-sdk`, `@nestjs/common@^11`, `@nestjs/config@^4`, `@metad/contracts`, `axios@1`, `chalk@4`, `@langchain/core@^0.3.72`, and `uuid@8` as peerDependencies. Please ensure these packages are installed in your host project.
|
|
14
|
-
|
|
15
|
-
## Quick Start
|
|
16
|
-
|
|
17
|
-
1. **Prepare MinerU Credentials**
|
|
18
|
-
Obtain a valid API Key from the MinerU dashboard and confirm the service address (default: `https://mineru.net/api/v4`).
|
|
19
|
-
|
|
20
|
-
2. **Configure Integration in Xpert**
|
|
21
|
-
- Via Xpert Console: Create a MinerU integration and fill in the following fields.
|
|
22
|
-
- Or set environment variables in your deployment environment:
|
|
23
|
-
- `MINERU_API_BASE_URL`: Optional, defaults to `https://mineru.net/api/v4`.
|
|
24
|
-
- `MINERU_API_TOKEN`: Required, used as a fallback credential if no integration is configured.
|
|
25
|
-
|
|
26
|
-
Example integration configuration (JSON):
|
|
27
|
-
|
|
28
|
-
```json
|
|
29
|
-
{
|
|
30
|
-
"provider": "mineru",
|
|
31
|
-
"options": {
|
|
32
|
-
"apiUrl": "https://mineru.net/api/v4",
|
|
33
|
-
"apiKey": "your-mineru-api-key"
|
|
34
|
-
}
|
|
35
|
-
}
|
|
36
|
-
```
|
|
37
|
-
|
|
38
|
-
3. **Register the Plugin**
|
|
39
|
-
Configure the plugin in your host service's plugin registration process:
|
|
40
|
-
|
|
41
|
-
```sh .env
|
|
42
|
-
PLUGINS=@chenchaolong/plugin-mineru
|
|
43
|
-
```
|
|
44
|
-
|
|
45
|
-
The plugin returns the NestJS module `MinerUPlugin` in the `register` hook and logs messages during the `onStart`/`onStop` lifecycle.
|
|
46
|
-
|
|
47
|
-
## MinerU Integration Options
|
|
48
|
-
|
|
49
|
-
| Field | Type | Description | Required | Default |
|
|
50
|
-
| -------- | ------ | ------------------------------------- | -------- | ---------------------------- |
|
|
51
|
-
| apiUrl | string | MinerU API base URL | No | `https://mineru.net/api/v4` |
|
|
52
|
-
| apiKey | string | MinerU service API Key (keep secret) | Yes | — |
|
|
53
|
-
|
|
54
|
-
> If both integration configuration and environment variables are provided, options from the integration configuration take precedence.
|
|
55
|
-
|
|
56
|
-
## Document Conversion Parameters
|
|
57
|
-
|
|
58
|
-
`MinerUTransformerStrategy` supports the following configuration options (passed to the MinerU API when starting a workflow):
|
|
59
|
-
|
|
60
|
-
| Field | Type | Default | Description |
|
|
61
|
-
| ---------------- | ------- | ------------ | --------------------------------------------------- |
|
|
62
|
-
| `isOcr` | boolean | `true` | Enable OCR for image-based PDFs. |
|
|
63
|
-
| `enableFormula` | boolean | `true` | Recognize mathematical formulas and output tags. |
|
|
64
|
-
| `enableTable` | boolean | `true` | Recognize tables and output structured tags. |
|
|
65
|
-
| `language` | string | `"ch"` | Main document language, per MinerU API (`en`/`ch`). |
|
|
66
|
-
| `modelVersion` | string | `"pipeline"` | MinerU model version (`pipeline`, `vlm`, etc.). |
|
|
67
|
-
|
|
68
|
-
By default, the plugin creates MinerU tasks for each file to be processed, polls until `full_zip_url` is returned, then downloads and parses the zip package in memory.
|
|
69
|
-
|
|
70
|
-
## Permissions
|
|
71
|
-
|
|
72
|
-
- **Integration**: Access MinerU integration configuration to read API address and credentials.
|
|
73
|
-
- **File System**: Perform `read/write/list` on `XpFileSystem` to store image resources from MinerU results.
|
|
74
|
-
|
|
75
|
-
Ensure the plugin is granted these permissions in your authorization policy, or it will not be able to retrieve results or write attachments.
|
|
76
|
-
|
|
77
|
-
## Output Content
|
|
78
|
-
|
|
79
|
-
The parser generates:
|
|
80
|
-
|
|
81
|
-
- Full Markdown: Resource links are automatically replaced to point to actual URLs written via `XpFileSystem`.
|
|
82
|
-
- Structured metadata: Includes MinerU task ID, layout JSON (`layout.json`), content list (`content_list.json`), original PDF filename, etc.
|
|
83
|
-
- Attachment asset list: Records written image resources for easy association by callers.
|
|
84
|
-
|
|
85
|
-
The returned `Document<ChunkMetadata>` array currently defaults to a single chunk containing the full Markdown; you can split it as needed.
|
|
86
|
-
|
|
87
|
-
##
|
|
88
|
-
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
|
|
94
|
-
|
|
95
|
-
```
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
|
|
99
|
-
##
|
|
100
|
-
|
|
101
|
-
|
|
102
|
-
|
|
103
|
-
```bash
|
|
104
|
-
npm install
|
|
105
|
-
npx nx build @chenchaolong/plugin-mineru
|
|
106
|
-
npx nx test @chenchaolong/plugin-mineru
|
|
107
|
-
```
|
|
108
|
-
|
|
109
|
-
TypeScript build artifacts are output to `packages/mineru/dist`. Before publishing, ensure `package.json`, type declarations, and runtime files are in sync.
|
|
110
|
-
|
|
111
|
-
## License
|
|
112
|
-
|
|
113
|
-
This project follows the [AGPL-3.0 License](../../../LICENSE) in the repository root.
|
|
1
|
+
# Xpert Plugin: MinerU
|
|
2
|
+
|
|
3
|
+
`@chenchaolong/plugin-mineru` is a MinerU document converter plugin for the [Xpert AI](https://github.com/xpert-ai/xpert) platform, providing extraction capabilities from PDF to Markdown and structured JSON. The plugin includes built-in MinerU integration strategies, document conversion strategies, and result parsing services, enabling secure access to the MinerU API in automated workflows, polling task status, and writing parsed content and attachment resources to the platform file system.
|
|
4
|
+
|
|
5
|
+
## Installation
|
|
6
|
+
|
|
7
|
+
```bash
|
|
8
|
+
pnpm add @chenchaolong/plugin-mineru
|
|
9
|
+
# or
|
|
10
|
+
npm install @chenchaolong/plugin-mineru
|
|
11
|
+
```
|
|
12
|
+
|
|
13
|
+
> **Note**: This plugin depends on `@xpert-ai/plugin-sdk`, `@nestjs/common@^11`, `@nestjs/config@^4`, `@metad/contracts`, `axios@1`, `chalk@4`, `@langchain/core@^0.3.72`, and `uuid@8` as peerDependencies. Please ensure these packages are installed in your host project.
|
|
14
|
+
|
|
15
|
+
## Quick Start
|
|
16
|
+
|
|
17
|
+
1. **Prepare MinerU Credentials**
|
|
18
|
+
Obtain a valid API Key from the MinerU dashboard and confirm the service address (default: `https://mineru.net/api/v4`).
|
|
19
|
+
|
|
20
|
+
2. **Configure Integration in Xpert**
|
|
21
|
+
- Via Xpert Console: Create a MinerU integration and fill in the following fields.
|
|
22
|
+
- Or set environment variables in your deployment environment:
|
|
23
|
+
- `MINERU_API_BASE_URL`: Optional, defaults to `https://mineru.net/api/v4`.
|
|
24
|
+
- `MINERU_API_TOKEN`: Required, used as a fallback credential if no integration is configured.
|
|
25
|
+
|
|
26
|
+
Example integration configuration (JSON):
|
|
27
|
+
|
|
28
|
+
```json
|
|
29
|
+
{
|
|
30
|
+
"provider": "mineru",
|
|
31
|
+
"options": {
|
|
32
|
+
"apiUrl": "https://mineru.net/api/v4",
|
|
33
|
+
"apiKey": "your-mineru-api-key"
|
|
34
|
+
}
|
|
35
|
+
}
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
3. **Register the Plugin**
|
|
39
|
+
Configure the plugin in your host service's plugin registration process:
|
|
40
|
+
|
|
41
|
+
```sh .env
|
|
42
|
+
PLUGINS=@chenchaolong/plugin-mineru
|
|
43
|
+
```
|
|
44
|
+
|
|
45
|
+
The plugin returns the NestJS module `MinerUPlugin` in the `register` hook and logs messages during the `onStart`/`onStop` lifecycle.
|
|
46
|
+
|
|
47
|
+
## MinerU Integration Options
|
|
48
|
+
|
|
49
|
+
| Field | Type | Description | Required | Default |
|
|
50
|
+
| -------- | ------ | ------------------------------------- | -------- | ---------------------------- |
|
|
51
|
+
| apiUrl | string | MinerU API base URL | No | `https://mineru.net/api/v4` |
|
|
52
|
+
| apiKey | string | MinerU service API Key (keep secret) | Yes | — |
|
|
53
|
+
|
|
54
|
+
> If both integration configuration and environment variables are provided, options from the integration configuration take precedence.
|
|
55
|
+
|
|
56
|
+
## Document Conversion Parameters
|
|
57
|
+
|
|
58
|
+
`MinerUTransformerStrategy` supports the following configuration options (passed to the MinerU API when starting a workflow):
|
|
59
|
+
|
|
60
|
+
| Field | Type | Default | Description |
|
|
61
|
+
| ---------------- | ------- | ------------ | --------------------------------------------------- |
|
|
62
|
+
| `isOcr` | boolean | `true` | Enable OCR for image-based PDFs. |
|
|
63
|
+
| `enableFormula` | boolean | `true` | Recognize mathematical formulas and output tags. |
|
|
64
|
+
| `enableTable` | boolean | `true` | Recognize tables and output structured tags. |
|
|
65
|
+
| `language` | string | `"ch"` | Main document language, per MinerU API (`en`/`ch`). |
|
|
66
|
+
| `modelVersion` | string | `"pipeline"` | MinerU model version (`pipeline`, `vlm`, etc.). |
|
|
67
|
+
|
|
68
|
+
By default, the plugin creates MinerU tasks for each file to be processed, polls until `full_zip_url` is returned, then downloads and parses the zip package in memory.
|
|
69
|
+
|
|
70
|
+
## Permissions
|
|
71
|
+
|
|
72
|
+
- **Integration**: Access MinerU integration configuration to read API address and credentials.
|
|
73
|
+
- **File System**: Perform `read/write/list` on `XpFileSystem` to store image resources from MinerU results.
|
|
74
|
+
|
|
75
|
+
Ensure the plugin is granted these permissions in your authorization policy, or it will not be able to retrieve results or write attachments.
|
|
76
|
+
|
|
77
|
+
## Output Content
|
|
78
|
+
|
|
79
|
+
The parser generates:
|
|
80
|
+
|
|
81
|
+
- Full Markdown: Resource links are automatically replaced to point to actual URLs written via `XpFileSystem`.
|
|
82
|
+
- Structured metadata: Includes MinerU task ID, layout JSON (`layout.json`), content list (`content_list.json`), original PDF filename, etc.
|
|
83
|
+
- Attachment asset list: Records written image resources for easy association by callers.
|
|
84
|
+
|
|
85
|
+
The returned `Document<ChunkMetadata>` array currently defaults to a single chunk containing the full Markdown; you can split it as needed.
|
|
86
|
+
|
|
87
|
+
## Development & Debugging
|
|
88
|
+
|
|
89
|
+
Run the following commands in the repository root to build and test locally:
|
|
90
|
+
|
|
91
|
+
```bash
|
|
92
|
+
npm install
|
|
93
|
+
npx nx build @chenchaolong/plugin-mineru
|
|
94
|
+
npx nx test @chenchaolong/plugin-mineru
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
TypeScript build artifacts are output to `packages/mineru/dist`. Before publishing, ensure `package.json`, type declarations, and runtime files are in sync.
|
|
98
|
+
|
|
99
|
+
## License
|
|
100
|
+
|
|
101
|
+
This project follows the [AGPL-3.0 License](../../../LICENSE) in the repository root.
|
|
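The README states that the plugin polls each MinerU task until `full_zip_url` is returned, and the strategy code in this diff calls `waitForTask(taskId, 5 * 60 * 1000, 5000)`. A minimal sketch of a polling loop of that shape follows. The names `pollUntilDone` and `fetchStatus` are hypothetical, and this is an illustration under those assumptions, not the plugin's actual `waitForTask` implementation.

```javascript
// Hypothetical helper: polls a status function until the result carries
// a `full_zip_url`, or gives up once the timeout would be exceeded.
// The defaults mirror the values passed to waitForTask in the diff
// (5 minute timeout, 5 second interval).
async function pollUntilDone(fetchStatus, timeoutMs = 5 * 60 * 1000, intervalMs = 5000) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await fetchStatus();
    if (status && status.full_zip_url) {
      return status;
    }
    // Stop before sleeping if the next attempt would land past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Task did not finish within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A caller would pass a function that fetches the current task status, for example one that wraps an HTTP request to the MinerU API.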
package/dist/lib/mineru.client.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"mineru.client.d.ts","sourceRoot":"","sources":["../../src/lib/mineru.client.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,YAAY,EAAE,MAAM,kBAAkB,CAAC;AAEhD,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAC/C,OAAO,EAAmB,YAAY,EAAE,MAAM,sBAAsB,CAAC;AACrE,OAAc,EAAE,aAAa,EAAE,MAAM,OAAO,CAAC;AAK7C,OAAO,EAIL,wBAAwB,EAExB,0BAA0B,EAC1B,gBAAgB,EACjB,MAAM,YAAY,CAAC;AAIpB,UAAU,iBAAiB;IACzB,GAAG,CAAC,EAAE,MAAM,CAAC;IACb,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,YAAY,CAAC,EAAE,MAAM,EAAE,CAAC;IACxB,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,mEAAmE;IACnE,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,yEAAyE;IACzE,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,2EAA2E;IAC3E,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,4EAA4E;IAC5E,gBAAgB,CAAC,EAAE,OAAO,CAAC;CAC5B;AAED,UAAU,mBAAmB;IAC3B,GAAG,EAAE,MAAM,CAAC;IACZ,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,UAAU,CAAC,EAAE,MAAM,CAAC;CACrB;AAED,UAAU,sBAAsB;IAC9B,KAAK,EAAE,mBAAmB,EAAE,CAAC;IAC7B,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,YAAY,CAAC,EAAE,MAAM,EAAE,CAAC;IACxB,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,IAAI,CAAC,EAAE,MAAM,CAAC;CACf;AAED,UAAU,iBAAiB;IACzB,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;CACnB;AASD,qBAAa,YAAY;IAWrB,OAAO,CAAC,QAAQ,CAAC,aAAa;IAC9B,OAAO,CAAC,QAAQ,CAAC,WAAW,CAAC;IAX/B,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAiC;IACxD,OAAO,CAAC,QAAQ,CAAC,OAAO,CAAS;IACjC,OAAO,CAAC,QAAQ,CAAC,KAAK,CAAC,CAAS;IAChC,SAAgB,UAAU,EAAE,gBAAgB,CAAC;IAC7C,OAAO,CAAC,QAAQ,CAAC,UAAU,CAAiD;IAE5E,IAAI,UAAU,IAAI,YAAY,GAAG,SAAS,CAEzC;gBAEkB,aAAa,EAAE,aAAa,EAC5B,WAAW,CAAC,EAAE;QACvB,UAAU,CAAC,EAAE,YAAY,CAAC;QAC1B,WAAW,CAAC,EAAE,OAAO,CAAC,YAAY,CAAC,wBAAwB,CAAC,CAAC,CAAC;KACjE;IAkBP;;;OAGG;IACG,UAAU,CAAC,OAAO,EAAE,iB
AAiB,GAAG,OAAO,CAAC;QAAE,MAAM,EAAE,MAAM,CAAA;KAAE,CAAC;IAYzE;;OAEG;IACG,eAAe,CAAC,OAAO,EAAE,sBAAsB,GAAG,OAAO,CAAC;QAAE,OAAO,EAAE,MAAM,CAAC;QAAC,QAAQ,CAAC,EAAE,MAAM,EAAE,CAAA;KAAE,CAAC;
|
|
1
|
+
{"version":3,"file":"mineru.client.d.ts","sourceRoot":"","sources":["../../src/lib/mineru.client.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,YAAY,EAAE,MAAM,kBAAkB,CAAC;AAEhD,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAC/C,OAAO,EAAmB,YAAY,EAAE,MAAM,sBAAsB,CAAC;AACrE,OAAc,EAAE,aAAa,EAAE,MAAM,OAAO,CAAC;AAK7C,OAAO,EAIL,wBAAwB,EAExB,0BAA0B,EAC1B,gBAAgB,EACjB,MAAM,YAAY,CAAC;AAIpB,UAAU,iBAAiB;IACzB,GAAG,CAAC,EAAE,MAAM,CAAC;IACb,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,YAAY,CAAC,EAAE,MAAM,EAAE,CAAC;IACxB,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,mEAAmE;IACnE,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,yEAAyE;IACzE,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,2EAA2E;IAC3E,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,4EAA4E;IAC5E,gBAAgB,CAAC,EAAE,OAAO,CAAC;CAC5B;AAED,UAAU,mBAAmB;IAC3B,GAAG,EAAE,MAAM,CAAC;IACZ,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,UAAU,CAAC,EAAE,MAAM,CAAC;CACrB;AAED,UAAU,sBAAsB;IAC9B,KAAK,EAAE,mBAAmB,EAAE,CAAC;IAC7B,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,YAAY,CAAC,EAAE,MAAM,EAAE,CAAC;IACxB,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,IAAI,CAAC,EAAE,MAAM,CAAC;CACf;AAED,UAAU,iBAAiB;IACzB,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;CACnB;AASD,qBAAa,YAAY;IAWrB,OAAO,CAAC,QAAQ,CAAC,aAAa;IAC9B,OAAO,CAAC,QAAQ,CAAC,WAAW,CAAC;IAX/B,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAiC;IACxD,OAAO,CAAC,QAAQ,CAAC,OAAO,CAAS;IACjC,OAAO,CAAC,QAAQ,CAAC,KAAK,CAAC,CAAS;IAChC,SAAgB,UAAU,EAAE,gBAAgB,CAAC;IAC7C,OAAO,CAAC,QAAQ,CAAC,UAAU,CAAiD;IAE5E,IAAI,UAAU,IAAI,YAAY,GAAG,SAAS,CAEzC;gBAEkB,aAAa,EAAE,aAAa,EAC5B,WAAW,CAAC,EAAE;QACvB,UAAU,CAAC,EAAE,YAAY,CAAC;QAC1B,WAAW,CAAC,EAAE,OAAO,CAAC,YAAY,CAAC,wBAAwB,CAAC,CAAC,CAAC;KACjE;IAkBP;;;OAGG;IACG,UAAU,CAAC,OAAO,EAAE,iB
AAiB,GAAG,OAAO,CAAC;QAAE,MAAM,EAAE,MAAM,CAAA;KAAE,CAAC;IAYzE;;OAEG;IACG,eAAe,CAAC,OAAO,EAAE,sBAAsB,GAAG,OAAO,CAAC;QAAE,OAAO,EAAE,MAAM,CAAC;QAAC,QAAQ,CAAC,EAAE,MAAM,EAAE,CAAA;KAAE,CAAC;IAmCzG,iBAAiB,CAAC,MAAM,EAAE,MAAM,GAAG,0BAA0B,GAAG,SAAS;IAOzE;;OAEG;IACG,aAAa,CAAC,MAAM,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,iBAAiB,GAAG,OAAO,CAAC;QACxE,YAAY,CAAC,EAAE,MAAM,CAAC;QACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;QAClB,OAAO,CAAC,EAAE,MAAM,CAAC;QACjB,MAAM,CAAC,EAAE,MAAM,CAAC;KACjB,CAAC;IAoBF;;OAEG;IACG,cAAc,CAAC,OAAO,EAAE,MAAM,GAAG,OAAO,CAAC,GAAG,CAAC;IAiBnD;;OAEG;IACG,WAAW,CAAC,MAAM,EAAE,MAAM,EAAE,SAAS,SAAgB,EAAE,UAAU,SAAO,GAAG,OAAO,CAAC,GAAG,CAAC;IAsB7F,OAAO,CAAC,cAAc;IAMtB,OAAO,CAAC,iBAAiB;IAczB,OAAO,CAAC,kBAAkB;IAyB1B,OAAO,CAAC,sBAAsB;IAI9B,OAAO,CAAC,gBAAgB;IAIxB,OAAO,CAAC,WAAW;IAQnB,OAAO,CAAC,kBAAkB;IAO1B,OAAO,CAAC,oBAAoB;YAYd,kBAAkB;YA4BlB,oBAAoB;YA6BpB,qBAAqB;YAoErB,uBAAuB;IAsDrC,OAAO,CAAC,iBAAiB;IAgBzB,OAAO,CAAC,2BAA2B;IAenC,OAAO,CAAC,6BAA6B;IAcrC,OAAO,CAAC,iBAAiB;IAQzB,OAAO,CAAC,aAAa;IAcrB,OAAO,CAAC,iBAAiB;IAQzB,OAAO,CAAC,eAAe;YAIT,YAAY;IAkB1B,OAAO,CAAC,eAAe;IA0BvB,wBAAwB,IAAI,OAAO,CAAC,aAAa,CAAC,GAAG,EAAE,GAAG,CAAC,CAAC;IAKtD,wBAAwB;CAU/B"}
|
|
package/dist/lib/mineru.client.js
CHANGED
|
@@ -3,7 +3,7 @@ import { getErrorMessage } from '@xpert-ai/plugin-sdk';
|
|
|
3
3
|
import axios from 'axios';
|
|
4
4
|
import FormData from 'form-data';
|
|
5
5
|
import { randomUUID } from 'crypto';
|
|
6
|
-
import { basename
|
|
6
|
+
import { basename } from 'path';
|
|
7
7
|
import fs from 'fs';
|
|
8
8
|
import { ENV_MINERU_API_BASE_URL, ENV_MINERU_API_TOKEN, ENV_MINERU_SERVER_TYPE, } from './types.js';
|
|
9
9
|
const DEFAULT_OFFICIAL_BASE_URL = 'https://mineru.net/api/v4';
|
|
@@ -46,10 +46,6 @@ export class MinerUClient {
|
|
|
46
46
|
*/
|
|
47
47
|
async createBatchTask(options) {
|
|
48
48
|
this.ensureOfficial('createBatchTask');
|
|
49
|
-
// Validate files is an array
|
|
50
|
-
if (!Array.isArray(options.files)) {
|
|
51
|
-
throw new Error('MinerU createBatchTask requires files to be an array');
|
|
52
|
-
}
|
|
53
49
|
const url = this.buildApiUrl('extract', 'task', 'batch');
|
|
54
50
|
const body = {
|
|
55
51
|
files: options.files.map((file) => {
|
|
@@ -71,15 +67,8 @@ export class MinerUClient {
|
|
|
71
67
|
body.language = options.language;
|
|
72
68
|
if (options.modelVersion)
|
|
73
69
|
body.model_version = options.modelVersion;
|
|
74
|
-
|
|
75
|
-
|
|
76
|
-
if (Array.isArray(options.extraFormats)) {
|
|
77
|
-
body.extra_formats = options.extraFormats;
|
|
78
|
-
}
|
|
79
|
-
else {
|
|
80
|
-
this.logger.warn('extraFormats is not an array, ignoring');
|
|
81
|
-
}
|
|
82
|
-
}
|
|
70
|
+
if (options.extraFormats)
|
|
71
|
+
body.extra_formats = options.extraFormats;
|
|
83
72
|
if (options.callbackUrl)
|
|
84
73
|
body.callback = options.callbackUrl;
|
|
85
74
|
if (options.seed)
|
|
@@ -242,15 +231,8 @@ export class MinerUClient {
|
|
|
242
231
|
body.data_id = options.dataId;
|
|
243
232
|
if (options.pageRanges)
|
|
244
233
|
body.page_ranges = options.pageRanges;
|
|
245
|
-
|
|
246
|
-
|
|
247
|
-
if (Array.isArray(options.extraFormats)) {
|
|
248
|
-
body.extra_formats = options.extraFormats;
|
|
249
|
-
}
|
|
250
|
-
else {
|
|
251
|
-
this.logger.warn('extraFormats is not an array, ignoring');
|
|
252
|
-
}
|
|
253
|
-
}
|
|
234
|
+
if (options.extraFormats)
|
|
235
|
+
body.extra_formats = options.extraFormats;
|
|
254
236
|
if (options.callbackUrl)
|
|
255
237
|
body.callback = options.callbackUrl;
|
|
256
238
|
if (options.seed)
|
|
@@ -269,20 +251,33 @@ export class MinerUClient {
|
|
|
269
251
|
}
|
|
270
252
|
}
|
|
271
253
|
async createSelfHostedTask(options) {
|
|
254
|
+
// Validate fileSystem is available for self-hosted mode
|
|
255
|
+
if (!this.fileSystem) {
|
|
256
|
+
throw new Error('MinerU self-hosted mode requires fileSystem permission');
|
|
257
|
+
}
|
|
258
|
+
// Validate filePath is provided
|
|
272
259
|
if (!options.filePath) {
|
|
273
|
-
throw new Error('MinerU
|
|
260
|
+
throw new Error('MinerU self-hosted mode requires filePath to be provided');
|
|
261
|
+
}
|
|
262
|
+
// Get absolute file path from fileSystem
|
|
263
|
+
const filePath = this.fileSystem.fullPath(options.filePath);
|
|
264
|
+
// Validate file exists before attempting to parse
|
|
265
|
+
try {
|
|
266
|
+
await fs.promises.access(filePath, fs.constants.F_OK);
|
|
267
|
+
}
|
|
268
|
+
catch (error) {
|
|
269
|
+
this.logger.error(`File not found: ${filePath}`, error instanceof Error ? error.stack : error);
|
|
270
|
+
throw new Error(`File not found: ${filePath}`);
|
|
274
271
|
}
|
|
275
|
-
// Normalize path for cross-platform compatibility (Windows/Linux)
|
|
276
|
-
const rawPath = this.fileSystem.fullPath(options.filePath);
|
|
277
|
-
const filePath = normalize(resolve(rawPath));
|
|
278
272
|
const taskId = randomUUID();
|
|
279
|
-
const result = await this.invokeSelfHostedParse(filePath, options.fileName, options);
|
|
273
|
+
const result = await this.invokeSelfHostedParse(filePath, options.fileName || basename(filePath), options);
|
|
280
274
|
this.localTasks.set(taskId, { ...result, sourceUrl: options.url });
|
|
281
275
|
return { taskId };
|
|
282
276
|
}
|
|
283
277
|
async invokeSelfHostedParse(filePath, fileName, options) {
|
|
284
278
|
const parseUrl = this.buildApiUrl('file_parse');
|
|
285
279
|
const form = new FormData();
|
|
280
|
+
// Create file read stream (file existence is already validated in createSelfHostedTask)
|
|
286
281
|
form.append('files', fs.createReadStream(filePath), {
|
|
287
282
|
filename: fileName,
|
|
288
283
|
});
|
|
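In 1.1.0, `createSelfHostedTask` checks that the file exists before it is uploaded. In isolation the same check might look like the sketch below. `assertFileExists` is a hypothetical name, and attaching the original error as `cause` is an addition that does not appear in the diff.

```javascript
const fs = require('node:fs');

// Hypothetical helper: rejects with a readable error when the path is missing.
// fs.promises.access with F_OK only tests that the path is visible to the
// process, so the file can still disappear before it is read.
async function assertFileExists(filePath) {
  try {
    await fs.promises.access(filePath, fs.constants.F_OK);
  } catch (error) {
    // `cause` (Node 16.9+) keeps the underlying ENOENT/EACCES details.
    throw new Error(`File not found: ${filePath}`, { cause: error });
  }
}
```

Because the check and the later `fs.createReadStream` call are separate steps, the stream can still fail if the file is removed in between; the check mainly produces a clearer error message.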
@@ -313,11 +308,14 @@ export class MinerUClient {
|
|
|
313
308
|
return this.invokeSelfHostedParseV1(filePath, fileName, options);
|
|
314
309
|
}
|
|
315
310
|
if (response.status === 400) {
|
|
316
|
-
|
|
311
|
+
const errorMessage = getErrorMessage(response.data);
|
|
312
|
+
this.logger.error(`MinerU self-hosted parse failed with 400: ${errorMessage}`, JSON.stringify(response.data));
|
|
313
|
+
throw new BadRequestException(`MinerU self-hosted parse failed: ${response.status} ${errorMessage}`);
|
|
317
314
|
}
|
|
318
315
|
if (response.status !== 200) {
|
|
319
|
-
|
|
320
|
-
|
|
316
|
+
const errorMessage = getErrorMessage(response.data) || response.statusText;
|
|
317
|
+
this.logger.error(`MinerU self-hosted parse failed with ${response.status}: ${errorMessage}`, JSON.stringify(response.data));
|
|
318
|
+
throw new Error(`MinerU self-hosted parse failed: ${response.status} ${response.statusText}. ${errorMessage}`);
|
|
321
319
|
}
|
|
322
320
|
return this.normalizeSelfHostedResponse(response.data);
|
|
323
321
|
}
|
|
@@ -346,7 +344,9 @@ export class MinerUClient {
|
|
|
346
344
|
validateStatus: () => true,
|
|
347
345
|
});
|
|
348
346
|
if (response.status !== 200) {
|
|
349
|
-
|
|
347
|
+
const errorMessage = getErrorMessage(response.data) || response.statusText;
|
|
348
|
+
this.logger.error(`MinerU self-hosted legacy parse failed with ${response.status}: ${errorMessage}`, JSON.stringify(response.data));
|
|
349
|
+
throw new Error(`MinerU self-hosted legacy parse failed: ${response.status} ${response.statusText}. ${errorMessage}`);
|
|
350
350
|
}
|
|
351
351
|
return this.normalizeSelfHostedResponse(response.data);
|
|
352
352
|
}
|
|
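The legacy request shown above passes `validateStatus: () => true`, so axios resolves on any HTTP status instead of rejecting, and the caller has to inspect `response.status` itself. The new error branches do exactly that. A reduced sketch of the pattern follows. `ensureOk` and `describeBody` are hypothetical names. `describeBody` is only a stand-in for `getErrorMessage` from `@xpert-ai/plugin-sdk`, whose behaviour is not shown in this diff.

```javascript
// Hypothetical stand-in for getErrorMessage: pulls a readable message
// out of a response body if one is present.
function describeBody(data) {
  if (typeof data === 'string') return data;
  if (data && typeof data.message === 'string') return data.message;
  if (data && typeof data.detail === 'string') return data.detail;
  return '';
}

// Returns the response body on HTTP 200, otherwise throws an Error that
// carries the status code and the best available message.
function ensureOk(response, label) {
  if (response.status === 200) {
    return response.data;
  }
  const message = describeBody(response.data) || response.statusText || 'unknown error';
  const error = new Error(`${label} failed: ${response.status} ${message}`);
  error.status = response.status;
  throw error;
}
```

The published code additionally distinguishes status 400, which it raises as a NestJS `BadRequestException`, from other failures; the sketch folds both cases into one plain `Error` to stay dependency-free.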
package/dist/lib/result-parser.service.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"result-parser.service.d.ts","sourceRoot":"","sources":["../../src/lib/result-parser.service.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,MAAM,2BAA2B,CAAC;AACrD,OAAO,EAAE,kBAAkB,EAAE,MAAM,kBAAkB,CAAC;AAEtD,OAAO,EACL,aAAa,EAEb,YAAY,EACb,MAAM,sBAAsB,CAAC;AAK9B,OAAO,EAEL,sBAAsB,EACtB,0BAA0B,EAC3B,MAAM,YAAY,CAAC;AAEpB,qBACa,yBAAyB;IACpC,OAAO,CAAC,QAAQ,CAAC,MAAM,CAA8C;IAE/D,YAAY,CAChB,UAAU,EAAE,MAAM,EAClB,MAAM,EAAE,MAAM,EACd,QAAQ,EAAE,OAAO,CAAC,kBAAkB,CAAC,EACrC,UAAU,EAAE,YAAY,GACvB,OAAO,CAAC;QACT,EAAE,CAAC,EAAE,MAAM,CAAC;QACZ,MAAM,EAAE,QAAQ,CAAC,aAAa,CAAC,EAAE,CAAC;QAClC,QAAQ,EAAE,sBAAsB,CAAC;KAClC,CAAC;
|
|
1
|
+
{"version":3,"file":"result-parser.service.d.ts","sourceRoot":"","sources":["../../src/lib/result-parser.service.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,MAAM,2BAA2B,CAAC;AACrD,OAAO,EAAE,kBAAkB,EAAE,MAAM,kBAAkB,CAAC;AAEtD,OAAO,EACL,aAAa,EAEb,YAAY,EACb,MAAM,sBAAsB,CAAC;AAK9B,OAAO,EAEL,sBAAsB,EACtB,0BAA0B,EAC3B,MAAM,YAAY,CAAC;AAEpB,qBACa,yBAAyB;IACpC,OAAO,CAAC,QAAQ,CAAC,MAAM,CAA8C;IAE/D,YAAY,CAChB,UAAU,EAAE,MAAM,EAClB,MAAM,EAAE,MAAM,EACd,QAAQ,EAAE,OAAO,CAAC,kBAAkB,CAAC,EACrC,UAAU,EAAE,YAAY,GACvB,OAAO,CAAC;QACT,EAAE,CAAC,EAAE,MAAM,CAAC;QACZ,MAAM,EAAE,QAAQ,CAAC,aAAa,CAAC,EAAE,CAAC;QAClC,QAAQ,EAAE,sBAAsB,CAAC;KAClC,CAAC;IAqFI,cAAc,CAClB,MAAM,EAAE,0BAA0B,EAClC,MAAM,EAAE,MAAM,EACd,QAAQ,EAAE,OAAO,CAAC,kBAAkB,CAAC,EACrC,UAAU,EAAE,YAAY,GACvB,OAAO,CAAC;QACT,EAAE,CAAC,EAAE,MAAM,CAAC;QACZ,MAAM,EAAE,QAAQ,CAAC,aAAa,CAAC,EAAE,CAAC;QAClC,QAAQ,EAAE,sBAAsB,CAAC;KAClC,CAAC;CAkDH"}
|
|
package/dist/lib/result-parser.service.js
CHANGED
|
@@ -3,7 +3,7 @@ import { __decorate } from "tslib";
|
|
|
3
3
|
import { Document } from '@langchain/core/documents';
|
|
4
4
|
import { Injectable, Logger } from '@nestjs/common';
|
|
5
5
|
import axios from 'axios';
|
|
6
|
-
import { join
|
|
6
|
+
import { join } from 'path';
|
|
7
7
|
import unzipper from 'unzipper';
|
|
8
8
|
import { v4 as uuidv4 } from 'uuid';
|
|
9
9
|
import { MinerU, } from './types.js';
|
|
@@ -34,11 +34,8 @@ let MinerUResultParserService = MinerUResultParserService_1 = class MinerUResult
|
|
|
34
34
|
continue;
|
|
35
35
|
const data = await entry.buffer();
|
|
36
36
|
zipEntries.push({ entryName: entry.path, data });
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
const normalizedEntryPath = entry.path.replace(/\\/g, '/'); // Normalize to POSIX format first
|
|
40
|
-
const fileName = normalizedEntryPath;
|
|
41
|
-
const filePath = normalize(join(document.folder || '', normalizedEntryPath));
|
|
37
|
+
const fileName = entry.path;
|
|
38
|
+
const filePath = join(document.folder || '', entry.path);
|
|
42
39
|
const url = await fileSystem.writeFile(filePath, data);
|
|
43
40
|
pathMap.set(fileName, url);
|
|
44
41
|
// Write images to local file system
|
|
@@ -102,9 +99,7 @@ let MinerUResultParserService = MinerUResultParserService_1 = class MinerUResult
|
|
|
102
99
|
};
|
|
103
100
|
const assets = [];
|
|
104
101
|
const pathMap = new Map();
|
|
105
|
-
|
|
106
|
-
const images = Array.isArray(result.images) ? result.images : [];
|
|
107
|
-
for (const image of images) {
|
|
102
|
+
for (const image of result.images) {
|
|
108
103
|
const filePath = join(document.folder || '', 'images', image.name);
|
|
109
104
|
const url = await fileSystem.writeFile(filePath, Buffer.from(image.dataUrl.split(',')[1], 'base64'));
|
|
110
105
|
pathMap.set(`images/${image.name}`, url);
|
|
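Version 0.0.13 converted backslashes in zip entry paths to forward slashes before using them as map keys, while 1.1.0 uses `entry.path` unchanged. The sketch below shows how the two approaches can differ. It uses Node's `path.posix` and `path.win32` explicitly so the result does not depend on the host operating system. `toPosixKey` and `targetPath` are hypothetical names, and the remark about Markdown links is an assumption about why the normalization existed.

```javascript
const path = require('node:path');

// 0.0.13-style key: force forward slashes, presumably so the key matches
// the relative links found in the generated Markdown.
function toPosixKey(entryPath) {
  return entryPath.replace(/\\/g, '/');
}

// Join the document folder and the entry path with an explicit path
// flavour (path.posix or path.win32) to keep the example deterministic.
function targetPath(folder, entryPath, flavour) {
  return flavour.join(folder || '', entryPath);
}

console.log(toPosixKey('images\\fig1.png'));                      // images/fig1.png
console.log(targetPath('docs', 'images/fig1.png', path.win32));   // docs\images\fig1.png
console.log(targetPath('docs', 'images\\fig1.png', path.posix));  // docs/images\fig1.png
```

On POSIX hosts a backslash is an ordinary filename character, so an entry path containing one would no longer match a forward-slash link without the normalization step.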
package/dist/lib/transformer-mineru.strategy.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"transformer-mineru.strategy.d.ts","sourceRoot":"","sources":["../../src/lib/transformer-mineru.strategy.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,kBAAkB,EAAE,MAAM,kBAAkB,CAAA;AAG/D,OAAO,EACL,aAAa,EAEb,oBAAoB,EACpB,4BAA4B,EAC5B,qBAAqB,EACtB,MAAM,sBAAsB,CAAA;AAI7B,OAAO,EAAgB,wBAAwB,EAAE,MAAM,YAAY,CAAA;AAEnE,qBAEa,yBAA0B,YAAW,4BAA4B,CAAC,wBAAwB,CAAC;IAEtG,OAAO,CAAC,QAAQ,CAAC,YAAY,CAA2B;IAGxD,OAAO,CAAC,QAAQ,CAAC,aAAa,CAAe;IAE7C,QAAQ,CAAC,WAAW,mDAWnB;IAED,QAAQ,CAAC,IAAI;;;;;;;;;;;kBAWM,QAAQ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;MAwE1B;IAED,cAAc,CAAC,MAAM,EAAE,GAAG,GAAG,OAAO,CAAC,IAAI,CAAC;IAIpC,kBAAkB,CACtB,SAAS,EAAE,OAAO,CAAC,kBAAkB,CAAC,EAAE,EACxC,MAAM,EAAE,wBAAwB,GAC/B,OAAO,CAAC,OAAO,CAAC,kBAAkB,CAAC,aAAa,CAAC,CAAC,EAAE,CAAC;
|
|
1
|
+
{"version":3,"file":"transformer-mineru.strategy.d.ts","sourceRoot":"","sources":["../../src/lib/transformer-mineru.strategy.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,kBAAkB,EAAE,MAAM,kBAAkB,CAAA;AAG/D,OAAO,EACL,aAAa,EAEb,oBAAoB,EACpB,4BAA4B,EAC5B,qBAAqB,EACtB,MAAM,sBAAsB,CAAA;AAI7B,OAAO,EAAgB,wBAAwB,EAAE,MAAM,YAAY,CAAA;AAEnE,qBAEa,yBAA0B,YAAW,4BAA4B,CAAC,wBAAwB,CAAC;IAEtG,OAAO,CAAC,QAAQ,CAAC,YAAY,CAA2B;IAGxD,OAAO,CAAC,QAAQ,CAAC,aAAa,CAAe;IAE7C,QAAQ,CAAC,WAAW,mDAWnB;IAED,QAAQ,CAAC,IAAI;;;;;;;;;;;kBAWM,QAAQ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;MAwE1B;IAED,cAAc,CAAC,MAAM,EAAE,GAAG,GAAG,OAAO,CAAC,IAAI,CAAC;IAIpC,kBAAkB,CACtB,SAAS,EAAE,OAAO,CAAC,kBAAkB,CAAC,EAAE,EACxC,MAAM,EAAE,wBAAwB,GAC/B,OAAO,CAAC,OAAO,CAAC,kBAAkB,CAAC,aAAa,CAAC,CAAC,EAAE,CAAC;CA8DzD"}
|
|
package/dist/lib/transformer-mineru.strategy.js
CHANGED
|
@@ -125,8 +125,12 @@ let MinerUTransformerStrategy = class MinerUTransformerStrategy {
|
|
|
125
125
|
});
|
|
126
126
|
const result = mineru.getSelfHostedTask(taskId);
|
|
127
127
|
const parsedResult = await this.resultParser.parseLocalTask(result, taskId, document, config.permissions.fileSystem);
|
|
128
|
-
parsedResult
|
|
129
|
-
parsedResults.push(
|
|
128
|
+
// Convert parsedResult to IKnowledgeDocument format
|
|
129
|
+
parsedResults.push({
|
|
130
|
+
id: document.id,
|
|
131
|
+
chunks: parsedResult.chunks,
|
|
132
|
+
metadata: parsedResult.metadata
|
|
133
|
+
});
|
|
130
134
|
}
|
|
131
135
|
else {
|
|
132
136
|
const { taskId } = await mineru.createTask({
|
|
@@ -141,8 +145,12 @@ let MinerUTransformerStrategy = class MinerUTransformerStrategy {
|
|
|
141
145
|
// Waiting for completion
|
|
142
146
|
const result = await mineru.waitForTask(taskId, 5 * 60 * 1000, 5000);
|
|
143
147
|
const parsedResult = await this.resultParser.parseFromUrl(result.full_zip_url, taskId, document, config.permissions.fileSystem);
|
|
144
|
-
parsedResult
|
|
145
|
-
parsedResults.push(
|
|
148
|
+
// Convert parsedResult to IKnowledgeDocument format
|
|
149
|
+
parsedResults.push({
|
|
150
|
+
id: document.id,
|
|
151
|
+
chunks: parsedResult.chunks,
|
|
152
|
+
metadata: parsedResult.metadata
|
|
153
|
+
});
|
|
146
154
|
}
|
|
147
155
|
}
|
|
148
156
|
return parsedResults;
|
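Both branches in 1.1.0 build the pushed object explicitly from `document.id`, `parsedResult.chunks` and `parsedResult.metadata`. Factored out, that mapping could look like the sketch below. `toKnowledgeDocument` is a hypothetical name; only the three field names come from the diff.

```javascript
// Hypothetical helper: keeps the source document's id and takes only the
// chunks and metadata from the parser result, dropping any other fields.
function toKnowledgeDocument(document, parsedResult) {
  return {
    id: document.id,
    chunks: parsedResult.chunks,
    metadata: parsedResult.metadata,
  };
}
```

With such a helper, each branch would reduce to `parsedResults.push(toKnowledgeDocument(document, parsedResult))`.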
package/dist/lib/types.js
CHANGED
|
@@ -2,26 +2,26 @@ export const MinerU = 'mineru';
|
|
|
2
2
|
export const ENV_MINERU_API_BASE_URL = 'MINERU_API_BASE_URL';
|
|
3
3
|
export const ENV_MINERU_API_TOKEN = 'MINERU_API_TOKEN';
|
|
4
4
|
export const ENV_MINERU_SERVER_TYPE = 'MINERU_SERVER_TYPE';
|
|
5
|
-
export const icon = `<svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
|
|
6
|
-
<path d="M19.7238 3.86898C19.7238 4.57597 19.1502 5.1491 18.4427 5.1491C17.7352 5.1491 17.1616 4.57597 17.1616 3.86898C17.1616 3.16199 17.7352 2.58887 18.4427 2.58887C19.1502 2.58887 19.7238 3.16199 19.7238 3.86898Z" fill="url(#paint0_linear_8609_1645)"/>
|
|
7
|
-
<path d="M19.7238 3.86898C19.7238 4.57597 19.1502 5.1491 18.4427 5.1491C17.7352 5.1491 17.1616 4.57597 17.1616 3.86898C17.1616 3.16199 17.7352 2.58887 18.4427 2.58887C19.1502 2.58887 19.7238 3.16199 19.7238 3.86898Z" fill="#010101"/>
|
|
8
|
-
<path d="M15.3681 5.1491C15.3681 5.85609 14.7945 6.42921 14.087 6.42921C13.3794 6.42921 12.8059 5.85609 12.8059 5.1491C12.8059 4.44211 13.3794 3.86898 14.087 3.86898C14.7945 3.86898 15.3681 4.44211 15.3681 5.1491Z" fill="url(#paint1_linear_8609_1645)"/>
|
|
9
|
-
<path d="M15.3681 5.1491C15.3681 5.85609 14.7945 6.42921 14.087 6.42921C13.3794 6.42921 12.8059 5.85609 12.8059 5.1491C12.8059 4.44211 13.3794 3.86898 14.087 3.86898C14.7945 3.86898 15.3681 4.44211 15.3681 5.1491Z" fill="#010101"/>
|
|
10
|
-
<path fill-rule="evenodd" clip-rule="evenodd" d="M8.05175 11.2368C8.05175 13.4605 9.14375 15.4293 10.8211 16.6371C11.8241 15.7389 12.4551 14.4345 12.4551 12.9828V9.39673C12.4551 8.85661 12.8197 8.38448 13.3426 8.24757L19.8924 6.53265C20.6459 6.33534 21.3826 6.90341 21.3826 7.6818L21.3826 12.0452C21.3826 17.2179 17.1861 21.4111 12.0095 21.4111L11.9942 21.4111C6.81758 21.4111 2.62109 17.2179 2.62109 12.0452V9.03388C2.62109 8.49175 2.9884 8.01839 3.51385 7.88336L6.56677 7.09882C7.31904 6.9055 8.05175 7.47318 8.05175 8.24934V11.2368ZM3.9798 12.0452C3.9798 13.8476 4.57565 15.5108 5.58124 16.849C6.04996 17.4728 6.7655 17.8884 7.54573 17.8884V17.8884C8.28848 17.8884 8.9927 17.7236 9.62376 17.4286C7.83439 15.9596 6.69304 13.7314 6.69304 11.2368V8.46821L3.9798 9.16546V12.0452Z" fill="url(#paint2_linear_8609_1645)"/>
|
|
11
|
-
<path fill-rule="evenodd" clip-rule="evenodd" d="M8.05175 11.2368C8.05175 13.4605 9.14375 15.4293 10.8211 16.6371C11.8241 15.7389 12.4551 14.4345 12.4551 12.9828V9.39673C12.4551 8.85661 12.8197 8.38448 13.3426 8.24757L19.8924 6.53265C20.6459 6.33534 21.3826 6.90341 21.3826 7.6818L21.3826 12.0452C21.3826 17.2179 17.1861 21.4111 12.0095 21.4111L11.9942 21.4111C6.81758 21.4111 2.62109 17.2179 2.62109 12.0452V9.03388C2.62109 8.49175 2.9884 8.01839 3.51385 7.88336L6.56677 7.09882C7.31904 6.9055 8.05175 7.47318 8.05175 8.24934V11.2368ZM3.9798 12.0452C3.9798 13.8476 4.57565 15.5108 5.58124 16.849C6.04996 17.4728 6.7655 17.8884 7.54573 17.8884V17.8884C8.28848 17.8884 8.9927 17.7236 9.62376 17.4286C7.83439 15.9596 6.69304 13.7314 6.69304 11.2368V8.46821L3.9798 9.16546V12.0452Z" fill="#010101"/>
|
|
12
|
-
<defs>
|
|
13
|
-
<linearGradient id="paint0_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
|
|
14
|
-
<stop stop-color="white"/>
|
|
15
|
-
<stop offset="1" stop-color="#2E2E2E"/>
|
|
16
|
-
</linearGradient>
|
|
17
|
-
<linearGradient id="paint1_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
|
|
18
|
-
<stop stop-color="white"/>
|
|
19
|
-
<stop offset="1" stop-color="#2E2E2E"/>
|
|
20
|
-
</linearGradient>
|
|
21
|
-
<linearGradient id="paint2_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
|
|
22
|
-
<stop stop-color="white"/>
|
|
23
|
-
<stop offset="1" stop-color="#2E2E2E"/>
|
|
24
|
-
</linearGradient>
|
|
25
|
-
</defs>
|
|
26
|
-
</svg>
|
|
5
|
+
export const icon = `<svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
|
|
6
|
+
<path d="M19.7238 3.86898C19.7238 4.57597 19.1502 5.1491 18.4427 5.1491C17.7352 5.1491 17.1616 4.57597 17.1616 3.86898C17.1616 3.16199 17.7352 2.58887 18.4427 2.58887C19.1502 2.58887 19.7238 3.16199 19.7238 3.86898Z" fill="url(#paint0_linear_8609_1645)"/>
|
|
7
|
+
<path d="M19.7238 3.86898C19.7238 4.57597 19.1502 5.1491 18.4427 5.1491C17.7352 5.1491 17.1616 4.57597 17.1616 3.86898C17.1616 3.16199 17.7352 2.58887 18.4427 2.58887C19.1502 2.58887 19.7238 3.16199 19.7238 3.86898Z" fill="#010101"/>
|
|
8
|
+
<path d="M15.3681 5.1491C15.3681 5.85609 14.7945 6.42921 14.087 6.42921C13.3794 6.42921 12.8059 5.85609 12.8059 5.1491C12.8059 4.44211 13.3794 3.86898 14.087 3.86898C14.7945 3.86898 15.3681 4.44211 15.3681 5.1491Z" fill="url(#paint1_linear_8609_1645)"/>
|
|
9
|
+
<path d="M15.3681 5.1491C15.3681 5.85609 14.7945 6.42921 14.087 6.42921C13.3794 6.42921 12.8059 5.85609 12.8059 5.1491C12.8059 4.44211 13.3794 3.86898 14.087 3.86898C14.7945 3.86898 15.3681 4.44211 15.3681 5.1491Z" fill="#010101"/>
|
|
10
|
+
<path fill-rule="evenodd" clip-rule="evenodd" d="M8.05175 11.2368C8.05175 13.4605 9.14375 15.4293 10.8211 16.6371C11.8241 15.7389 12.4551 14.4345 12.4551 12.9828V9.39673C12.4551 8.85661 12.8197 8.38448 13.3426 8.24757L19.8924 6.53265C20.6459 6.33534 21.3826 6.90341 21.3826 7.6818L21.3826 12.0452C21.3826 17.2179 17.1861 21.4111 12.0095 21.4111L11.9942 21.4111C6.81758 21.4111 2.62109 17.2179 2.62109 12.0452V9.03388C2.62109 8.49175 2.9884 8.01839 3.51385 7.88336L6.56677 7.09882C7.31904 6.9055 8.05175 7.47318 8.05175 8.24934V11.2368ZM3.9798 12.0452C3.9798 13.8476 4.57565 15.5108 5.58124 16.849C6.04996 17.4728 6.7655 17.8884 7.54573 17.8884V17.8884C8.28848 17.8884 8.9927 17.7236 9.62376 17.4286C7.83439 15.9596 6.69304 13.7314 6.69304 11.2368V8.46821L3.9798 9.16546V12.0452Z" fill="url(#paint2_linear_8609_1645)"/>
|
|
11
|
+
<path fill-rule="evenodd" clip-rule="evenodd" d="M8.05175 11.2368C8.05175 13.4605 9.14375 15.4293 10.8211 16.6371C11.8241 15.7389 12.4551 14.4345 12.4551 12.9828V9.39673C12.4551 8.85661 12.8197 8.38448 13.3426 8.24757L19.8924 6.53265C20.6459 6.33534 21.3826 6.90341 21.3826 7.6818L21.3826 12.0452C21.3826 17.2179 17.1861 21.4111 12.0095 21.4111L11.9942 21.4111C6.81758 21.4111 2.62109 17.2179 2.62109 12.0452V9.03388C2.62109 8.49175 2.9884 8.01839 3.51385 7.88336L6.56677 7.09882C7.31904 6.9055 8.05175 7.47318 8.05175 8.24934V11.2368ZM3.9798 12.0452C3.9798 13.8476 4.57565 15.5108 5.58124 16.849C6.04996 17.4728 6.7655 17.8884 7.54573 17.8884V17.8884C8.28848 17.8884 8.9927 17.7236 9.62376 17.4286C7.83439 15.9596 6.69304 13.7314 6.69304 11.2368V8.46821L3.9798 9.16546V12.0452Z" fill="#010101"/>
|
|
12
|
+
<defs>
|
|
13
|
+
<linearGradient id="paint0_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
|
|
14
|
+
<stop stop-color="white"/>
|
|
15
|
+
<stop offset="1" stop-color="#2E2E2E"/>
|
|
16
|
+
</linearGradient>
|
|
17
|
+
<linearGradient id="paint1_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
|
|
18
|
+
<stop stop-color="white"/>
|
|
19
|
+
<stop offset="1" stop-color="#2E2E2E"/>
|
|
20
|
+
</linearGradient>
|
|
21
|
+
<linearGradient id="paint2_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
|
|
22
|
+
<stop stop-color="white"/>
|
|
23
|
+
<stop offset="1" stop-color="#2E2E2E"/>
|
|
24
|
+
</linearGradient>
|
|
25
|
+
</defs>
|
|
26
|
+
</svg>
|
|
27
27
|
`;
|
package/package.json
CHANGED
|
@@ -1,52 +1,49 @@
|
|
|
1
|
-
{
|
|
2
|
-
"name": "@chenchaolong/plugin-mineru",
|
|
3
|
-
"version": "
|
|
4
|
-
"repository": {
|
|
5
|
-
"type": "git",
|
|
6
|
-
"url": "https://github.com/xpert-ai/xpert-plugins.git"
|
|
7
|
-
},
|
|
8
|
-
"bugs": {
|
|
9
|
-
"url": "https://github.com/xpert-ai/xpert-plugins/issues"
|
|
10
|
-
},
|
|
11
|
-
"type": "module",
|
|
12
|
-
"main": "./dist/index.js",
|
|
13
|
-
"module": "./dist/index.js",
|
|
14
|
-
"types": "./dist/index.d.ts",
|
|
15
|
-
"exports": {
|
|
16
|
-
"./package.json": "./package.json",
|
|
17
|
-
".": {
|
|
18
|
-
"@xpert-plugins-starter/source": "./src/index.ts",
|
|
19
|
-
"types": "./dist/index.d.ts",
|
|
20
|
-
"import": "./dist/index.js",
|
|
21
|
-
"default": "./dist/index.js"
|
|
22
|
-
}
|
|
23
|
-
},
|
|
24
|
-
"files": [
|
|
25
|
-
"dist",
|
|
26
|
-
"!**/*.tsbuildinfo"
|
|
27
|
-
],
|
|
28
|
-
"dependencies": {
|
|
29
|
-
"form-data": "^4.0.0",
|
|
30
|
-
"tslib": "^2.3.0",
|
|
31
|
-
"unzipper": "0.12.3"
|
|
32
|
-
},
|
|
33
|
-
"peerDependencies": {
|
|
34
|
-
"@nestjs/config": "^4.0.2",
|
|
35
|
-
"zod": "3.25.67",
|
|
36
|
-
"@xpert-ai/plugin-sdk": "^3.6.2",
|
|
37
|
-
"@metad/contracts": "^3.6.2",
|
|
38
|
-
"@nestjs/common": "^11.1.6",
|
|
39
|
-
"axios": "1.12.2",
|
|
40
|
-
"nestjs-i18n": "10.5.1",
|
|
41
|
-
"chalk": "4.1.2",
|
|
42
|
-
"@langchain/core": "^0.3.72",
|
|
43
|
-
"lodash-es": "4.17.21",
|
|
44
|
-
"uuid": "8.3.2"
|
|
45
|
-
},
|
|
46
|
-
"devDependencies": {
|
|
47
|
-
"@types/unzipper": "^0.10.11"
|
|
48
|
-
}
|
|
49
|
-
|
|
50
|
-
"access": "public"
|
|
51
|
-
}
|
|
52
|
-
}
|
|
1
|
+
{
|
|
2
|
+
"name": "@chenchaolong/plugin-mineru",
|
|
3
|
+
"version": "1.1.0",
|
|
4
|
+
"repository": {
|
|
5
|
+
"type": "git",
|
|
6
|
+
"url": "https://github.com/xpert-ai/xpert-plugins.git"
|
|
7
|
+
},
|
|
8
|
+
"bugs": {
|
|
9
|
+
"url": "https://github.com/xpert-ai/xpert-plugins/issues"
|
|
10
|
+
},
|
|
11
|
+
"type": "module",
|
|
12
|
+
"main": "./dist/index.js",
|
|
13
|
+
"module": "./dist/index.js",
|
|
14
|
+
"types": "./dist/index.d.ts",
|
|
15
|
+
"exports": {
|
|
16
|
+
"./package.json": "./package.json",
|
|
17
|
+
".": {
|
|
18
|
+
"@xpert-plugins-starter/source": "./src/index.ts",
|
|
19
|
+
"types": "./dist/index.d.ts",
|
|
20
|
+
"import": "./dist/index.js",
|
|
21
|
+
"default": "./dist/index.js"
|
|
22
|
+
}
|
|
23
|
+
},
|
|
24
|
+
"files": [
|
|
25
|
+
"dist",
|
|
26
|
+
"!**/*.tsbuildinfo"
|
|
27
|
+
],
|
|
28
|
+
"dependencies": {
|
|
29
|
+
"form-data": "^4.0.0",
|
|
30
|
+
"tslib": "^2.3.0",
|
|
31
|
+
"unzipper": "0.12.3"
|
|
32
|
+
},
|
|
33
|
+
"peerDependencies": {
|
|
34
|
+
"@nestjs/config": "^4.0.2",
|
|
35
|
+
"zod": "3.25.67",
|
|
36
|
+
"@xpert-ai/plugin-sdk": "^3.6.2",
|
|
37
|
+
"@metad/contracts": "^3.6.2",
|
|
38
|
+
"@nestjs/common": "^11.1.6",
|
|
39
|
+
"axios": "1.12.2",
|
|
40
|
+
"nestjs-i18n": "10.5.1",
|
|
41
|
+
"chalk": "4.1.2",
|
|
42
|
+
"@langchain/core": "^0.3.72",
|
|
43
|
+
"lodash-es": "4.17.21",
|
|
44
|
+
"uuid": "8.3.2"
|
|
45
|
+
},
|
|
46
|
+
"devDependencies": {
|
|
47
|
+
"@types/unzipper": "^0.10.11"
|
|
48
|
+
}
|
|
49
|
+
}
|
|
@@ -1,10 +0,0 @@
|
|
|
1
|
-
import { StructuredToolInterface, ToolSchemaBase } from '@langchain/core/tools';
|
|
2
|
-
import { BuiltinToolset } from '@xpert-ai/plugin-sdk';
|
|
3
|
-
import { ConfigService } from '@nestjs/config';
|
|
4
|
-
import { MinerUResultParserService } from './result-parser.service.js';
|
|
5
|
-
export declare function setMinerUToolsetServices(configService: ConfigService, resultParser: MinerUResultParserService): void;
|
|
6
|
-
export declare class MinerUToolset extends BuiltinToolset<StructuredToolInterface, Record<string, never>> {
|
|
7
|
-
_validateCredentials(credentials: Record<string, never>): Promise<void>;
|
|
8
|
-
initTools(): Promise<StructuredToolInterface<ToolSchemaBase, any, any>[]>;
|
|
9
|
-
}
|
|
10
|
-
//# sourceMappingURL=mineru-toolset.d.ts.map
|
|
@@ -1 +0,0 @@
|
|
|
1
|
-
{"version":3,"file":"mineru-toolset.d.ts","sourceRoot":"","sources":["../../src/lib/mineru-toolset.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,uBAAuB,EAAE,cAAc,EAAE,MAAM,uBAAuB,CAAC;AAChF,OAAO,EAAE,cAAc,EAAE,MAAM,sBAAsB,CAAC;AACtD,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAC/C,OAAO,EAAE,yBAAyB,EAAE,MAAM,4BAA4B,CAAC;AAOvE,wBAAgB,wBAAwB,CACtC,aAAa,EAAE,aAAa,EAC5B,YAAY,EAAE,yBAAyB,QAIxC;AAED,qBAAa,aAAc,SAAQ,cAAc,CAAC,uBAAuB,EAAE,MAAM,CAAC,MAAM,EAAE,KAAK,CAAC,CAAC;IAChF,oBAAoB,CAAC,WAAW,EAAE,MAAM,CAAC,MAAM,EAAE,KAAK,CAAC,GAAG,OAAO,CAAC,IAAI,CAAC;IAIvE,SAAS,IAAI,OAAO,CAAC,uBAAuB,CAAC,cAAc,EAAE,GAAG,EAAE,GAAG,CAAC,EAAE,CAAC;CASzF"}
|
|
@@ -1,23 +0,0 @@
|
|
|
1
|
-
import { BuiltinToolset } from '@xpert-ai/plugin-sdk';
|
|
2
|
-
import { buildPdfToMarkdownTool } from './pdf-to-markdown.tool.js';
|
|
3
|
-
// Store services globally for tool access
|
|
4
|
-
let globalConfigService;
|
|
5
|
-
let globalResultParser;
|
|
6
|
-
export function setMinerUToolsetServices(configService, resultParser) {
|
|
7
|
-
globalConfigService = configService;
|
|
8
|
-
globalResultParser = resultParser;
|
|
9
|
-
}
|
|
10
|
-
export class MinerUToolset extends BuiltinToolset {
|
|
11
|
-
async _validateCredentials(credentials) {
|
|
12
|
-
// No credentials needed for mineru toolset (uses integration permissions)
|
|
13
|
-
}
|
|
14
|
-
async initTools() {
|
|
15
|
-
if (!globalConfigService || !globalResultParser) {
|
|
16
|
-
throw new Error('MinerU services not initialized. Call setMinerUToolsetServices first.');
|
|
17
|
-
}
|
|
18
|
-
this.tools = [
|
|
19
|
-
buildPdfToMarkdownTool(globalConfigService, globalResultParser),
|
|
20
|
-
];
|
|
21
|
-
return this.tools;
|
|
22
|
-
}
|
|
23
|
-
}
|
|
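Note on the removed module above: it kept its services in module-level variables, set through `setMinerUToolsetServices` and checked in `initTools()`. The sketch below shows the same pattern in isolation, assuming generic service types. The names `Services`, `setServices`, and `requireServices` are hypothetical and are not part of the package.

```typescript
// Hypothetical container for whatever services the tools need.
interface Services {
  config: unknown;
  parser: unknown;
}

// Module-level storage, unset until setServices() is called.
let services: Services | undefined;

function setServices(config: unknown, parser: unknown): void {
  services = { config, parser };
}

// Fails fast when a caller uses the registry before initialization,
// as the removed initTools() did.
function requireServices(): Services {
  if (!services) {
    throw new Error('Services not initialized. Call setServices first.');
  }
  return services;
}
```

Module-level state of this kind is shared by every consumer of the module, which is one possible reason to prefer constructor injection; the diff itself does not state why the file was removed.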
@@ -1,34 +0,0 @@
|
|
|
1
|
-
import { ConfigService } from '@nestjs/config';
|
|
2
|
-
import { BuiltinToolset, IToolsetStrategy } from '@xpert-ai/plugin-sdk';
|
|
3
|
-
import { MinerUResultParserService } from './result-parser.service.js';
|
|
4
|
-
export declare class MinerUToolsetStrategy implements IToolsetStrategy<any> {
|
|
5
|
-
private readonly configService;
|
|
6
|
-
private readonly resultParser;
|
|
7
|
-
constructor(configService: ConfigService, resultParser: MinerUResultParserService);
|
|
8
|
-
meta: {
|
|
9
|
-
author: string;
|
|
10
|
-
tags: string[];
|
|
11
|
-
name: string;
|
|
12
|
-
label: {
|
|
13
|
-
en_US: string;
|
|
14
|
-
zh_Hans: string;
|
|
15
|
-
};
|
|
16
|
-
description: {
|
|
17
|
-
en_US: string;
|
|
18
|
-
zh_Hans: string;
|
|
19
|
-
};
|
|
20
|
-
icon: {
|
|
21
|
-
svg: string;
|
|
22
|
-
color: string;
|
|
23
|
-
};
|
|
24
|
-
configSchema: {
|
|
25
|
-
type: string;
|
|
26
|
-
properties: {};
|
|
27
|
-
required: any[];
|
|
28
|
-
};
|
|
29
|
-
};
|
|
30
|
-
validateConfig(config: any): Promise<void>;
|
|
31
|
-
create(config: any): Promise<BuiltinToolset>;
|
|
32
|
-
createTools(): any[];
|
|
33
|
-
}
|
|
34
|
-
//# sourceMappingURL=mineru-toolset.strategy.d.ts.map
|
|
@@ -1 +0,0 @@
|
|
|
1
|
-
{"version":3,"file":"mineru-toolset.strategy.d.ts","sourceRoot":"","sources":["../../src/lib/mineru-toolset.strategy.ts"],"names":[],"mappings":"AACA,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAC/C,OAAO,EAAE,cAAc,EAAE,gBAAgB,EAAmB,MAAM,sBAAsB,CAAC;AAGzF,OAAO,EAAE,yBAAyB,EAAE,MAAM,4BAA4B,CAAC;AAEvE,qBAEa,qBAAsB,YAAW,gBAAgB,CAAC,GAAG,CAAC;IAG/D,OAAO,CAAC,QAAQ,CAAC,aAAa;IAE9B,OAAO,CAAC,QAAQ,CAAC,YAAY;gBAFZ,aAAa,EAAE,aAAa,EAE5B,YAAY,EAAE,yBAAyB;IAM1D,IAAI;;;;;;;;;;;;;;;;;;;;;MAqBF;IAEF,cAAc,CAAC,MAAM,EAAE,GAAG,GAAG,OAAO,CAAC,IAAI,CAAC;IAKpC,MAAM,CAAC,MAAM,EAAE,GAAG,GAAG,OAAO,CAAC,cAAc,CAAC;IAIlD,WAAW;CAKZ"}
|
|
@@ -1,58 +0,0 @@
|
|
|
1
|
-
import { __decorate, __metadata, __param } from "tslib";
|
|
2
|
-
import { Injectable, forwardRef, Inject } from '@nestjs/common';
|
|
3
|
-
import { ConfigService } from '@nestjs/config';
|
|
4
|
-
import { ToolsetStrategy } from '@xpert-ai/plugin-sdk';
|
|
5
|
-
import { MinerU, icon } from './types.js';
|
|
6
|
-
import { MinerUToolset, setMinerUToolsetServices } from './mineru-toolset.js';
|
|
7
|
-
import { MinerUResultParserService } from './result-parser.service.js';
|
|
8
|
-
let MinerUToolsetStrategy = class MinerUToolsetStrategy {
|
|
9
|
-
constructor(configService, resultParser) {
|
|
10
|
-
this.configService = configService;
|
|
11
|
-
this.resultParser = resultParser;
|
|
12
|
-
this.meta = {
|
|
13
|
-
author: 'Xpert AI',
|
|
14
|
-
tags: ['mineru', 'pdf', 'markdown', 'conversion', 'tool'],
|
|
15
|
-
name: MinerU,
|
|
16
|
-
label: {
|
|
17
|
-
en_US: 'MinerU',
|
|
18
|
-
zh_Hans: 'MinerU',
|
|
19
|
-
},
|
|
20
|
-
description: {
|
|
21
|
-
en_US: 'Convert PDF files to Markdown and JSON format using MinerU. Supports OCR, formula recognition, and table extraction.',
|
|
22
|
-
zh_Hans: '使用MinerU将PDF文件转换为Markdown和JSON格式。支持OCR、公式识别和表格提取。',
|
|
23
|
-
},
|
|
24
|
-
icon: {
|
|
25
|
-
svg: icon,
|
|
26
|
-
color: '#14b8a6',
|
|
27
|
-
},
|
|
28
|
-
configSchema: {
|
|
29
|
-
type: 'object',
|
|
30
|
-
properties: {},
|
|
31
|
-
required: [],
|
|
32
|
-
},
|
|
33
|
-
};
|
|
34
|
-
// Initialize global services for tool access
|
|
35
|
-
setMinerUToolsetServices(this.configService, this.resultParser);
|
|
36
|
-
}
|
|
37
|
-
validateConfig(config) {
|
|
38
|
-
// No validation needed - uses integration permissions
|
|
39
|
-
return Promise.resolve();
|
|
40
|
-
}
|
|
41
|
-
async create(config) {
|
|
42
|
-
return new MinerUToolset(config || {});
|
|
43
|
-
}
|
|
44
|
-
createTools() {
|
|
45
|
-
// Tools are created dynamically in MinerUToolset.initTools()
|
|
46
|
-
// This method is not used when using BuiltinToolset
|
|
47
|
-
return [];
|
|
48
|
-
}
|
|
49
|
-
};
|
|
50
|
-
MinerUToolsetStrategy = __decorate([
|
|
51
|
-
Injectable(),
|
|
52
|
-
ToolsetStrategy(MinerU),
|
|
53
|
-
__param(0, Inject(forwardRef(() => ConfigService))),
|
|
54
|
-
__param(1, Inject(MinerUResultParserService)),
|
|
55
|
-
__metadata("design:paramtypes", [ConfigService,
|
|
56
|
-
MinerUResultParserService])
|
|
57
|
-
], MinerUToolsetStrategy);
|
|
58
|
-
export { MinerUToolsetStrategy };
|
|
@@ -1,90 +0,0 @@
|
|
|
1
|
-
import { z } from 'zod';
|
|
2
|
-
import { ConfigService } from '@nestjs/config';
|
|
3
|
-
import { MinerUResultParserService } from './result-parser.service.js';
|
|
4
|
-
export declare function buildPdfToMarkdownTool(configService: ConfigService, resultParser: MinerUResultParserService): import("@langchain/core/tools").DynamicStructuredTool<z.ZodObject<{
|
|
5
|
-
file: z.ZodObject<{
|
|
6
|
-
name: z.ZodOptional<z.ZodString>;
|
|
7
|
-
filename: z.ZodOptional<z.ZodString>;
|
|
8
|
-
content: z.ZodOptional<z.ZodUnion<[z.ZodString, z.ZodType<Buffer<ArrayBufferLike>, z.ZodTypeDef, Buffer<ArrayBufferLike>>, z.ZodType<Uint8Array<ArrayBuffer>, z.ZodTypeDef, Uint8Array<ArrayBuffer>>]>>;
|
|
9
|
-
filePath: z.ZodOptional<z.ZodString>;
|
|
10
|
-
fileUrl: z.ZodOptional<z.ZodString>;
|
|
11
|
-
}, "strip", z.ZodTypeAny, {
|
|
12
|
-
name?: string;
|
|
13
|
-
filePath?: string;
|
|
14
|
-
fileUrl?: string;
|
|
15
|
-
filename?: string;
|
|
16
|
-
content?: string | Uint8Array<ArrayBuffer> | Buffer<ArrayBufferLike>;
|
|
17
|
-
}, {
|
|
18
|
-
name?: string;
|
|
19
|
-
filePath?: string;
|
|
20
|
-
fileUrl?: string;
|
|
21
|
-
filename?: string;
|
|
22
|
-
content?: string | Uint8Array<ArrayBuffer> | Buffer<ArrayBufferLike>;
|
|
23
|
-
}>;
|
|
24
|
-
isOcr: z.ZodOptional<z.ZodBoolean>;
|
|
25
|
-
enableFormula: z.ZodOptional<z.ZodBoolean>;
|
|
26
|
-
enableTable: z.ZodOptional<z.ZodBoolean>;
|
|
27
|
-
language: z.ZodOptional<z.ZodEnum<["en", "ch"]>>;
|
|
28
|
-
modelVersion: z.ZodOptional<z.ZodEnum<["pipeline", "vlm"]>>;
|
|
29
|
-
}, "strip", z.ZodTypeAny, {
|
|
30
|
-
isOcr?: boolean;
|
|
31
|
-
enableFormula?: boolean;
|
|
32
|
-
enableTable?: boolean;
|
|
33
|
-
language?: "ch" | "en";
|
|
34
|
-
modelVersion?: "pipeline" | "vlm";
|
|
35
|
-
file?: {
|
|
36
|
-
name?: string;
|
|
37
|
-
filePath?: string;
|
|
38
|
-
fileUrl?: string;
|
|
39
|
-
filename?: string;
|
|
40
|
-
content?: string | Uint8Array<ArrayBuffer> | Buffer<ArrayBufferLike>;
|
|
41
|
-
};
|
|
42
|
-
}, {
|
|
43
|
-
isOcr?: boolean;
|
|
44
|
-
enableFormula?: boolean;
|
|
45
|
-
enableTable?: boolean;
|
|
46
|
-
language?: "ch" | "en";
|
|
47
|
-
modelVersion?: "pipeline" | "vlm";
|
|
48
|
-
file?: {
|
|
49
|
-
name?: string;
|
|
50
|
-
filePath?: string;
|
|
51
|
-
fileUrl?: string;
|
|
52
|
-
filename?: string;
|
|
53
|
-
content?: string | Uint8Array<ArrayBuffer> | Buffer<ArrayBufferLike>;
|
|
54
|
-
};
|
|
55
|
-
}>, {
|
|
56
|
-
isOcr?: boolean;
|
|
57
|
-
enableFormula?: boolean;
|
|
58
|
-
enableTable?: boolean;
|
|
59
|
-
language?: "ch" | "en";
|
|
60
|
-
modelVersion?: "pipeline" | "vlm";
|
|
61
|
-
file?: {
|
|
62
|
-
name?: string;
|
|
63
|
-
filePath?: string;
|
|
64
|
-
fileUrl?: string;
|
|
65
|
-
filename?: string;
|
|
66
|
-
content?: string | Uint8Array<ArrayBuffer> | Buffer<ArrayBufferLike>;
|
|
67
|
-
};
|
|
68
|
-
}, {
|
|
69
|
-
isOcr?: boolean;
|
|
70
|
-
enableFormula?: boolean;
|
|
71
|
-
enableTable?: boolean;
|
|
72
|
-
language?: "ch" | "en";
|
|
73
|
-
modelVersion?: "pipeline" | "vlm";
|
|
74
|
-
file?: {
|
|
75
|
-
name?: string;
|
|
76
|
-
filePath?: string;
|
|
77
|
-
fileUrl?: string;
|
|
78
|
-
filename?: string;
|
|
79
|
-
content?: string | Uint8Array<ArrayBuffer> | Buffer<ArrayBufferLike>;
|
|
80
|
-
};
|
|
81
|
-
}, (string | {
|
|
82
|
-
files: {
|
|
83
|
-
mimeType: string;
|
|
84
|
-
fileName: string;
|
|
85
|
-
filePath: string;
|
|
86
|
-
fileUrl: string;
|
|
87
|
-
extension: string;
|
|
88
|
-
}[];
|
|
89
|
-
})[]>;
|
|
90
|
-
//# sourceMappingURL=pdf-to-markdown.tool.d.ts.map
|
|
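Note on the removed declaration above: the `file.content` field accepted a string, a `Buffer`, or a `Uint8Array`, and the removed implementation treated a string as base64. A minimal sketch of that normalization, together with the file-name fallback, follows. The helper names `toBuffer` and `deriveFileName` are hypothetical and do not appear in the package.

```typescript
import { Buffer } from 'node:buffer';

// Normalizes the accepted content forms to a Buffer.
// Strings are assumed to be base64, matching the removed tool's behaviour.
function toBuffer(content: unknown): Buffer {
  if (typeof content === 'string') {
    return Buffer.from(content, 'base64');
  }
  if (Buffer.isBuffer(content)) {
    return content;
  }
  if (content instanceof Uint8Array) {
    return Buffer.from(content);
  }
  throw new Error('Invalid file content format');
}

// Picks an explicit name first, then the last path or URL segment,
// then a generic default.
function deriveFileName(explicit: string | undefined, location: string | undefined): string {
  return explicit || location?.split('/').pop() || 'document.pdf';
}
```

The `Buffer.isBuffer` check comes before the `Uint8Array` check because a `Buffer` is also a `Uint8Array`; checking in this order returns an existing buffer without copying it.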
@@ -1 +0,0 @@
|
|
|
1
|
-
{"version":3,"file":"pdf-to-markdown.tool.d.ts","sourceRoot":"","sources":["../../src/lib/pdf-to-markdown.tool.ts"],"names":[],"mappings":"AAGA,OAAO,EAAE,CAAC,EAAE,MAAM,KAAK,CAAC;AACxB,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAE/C,OAAO,EAAE,yBAAyB,EAAE,MAAM,4BAA4B,CAAC;AAIvE,wBAAgB,sBAAsB,CACpC,aAAa,EAAE,aAAa,EAC5B,YAAY,EAAE,yBAAyB;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;MAqKxC"}
|
|
@@ -1,146 +0,0 @@
|
|
|
1
|
-
import { tool } from '@langchain/core/tools';
|
|
2
|
-
import { getCurrentTaskInput } from '@langchain/langgraph';
|
|
3
|
-
import { getErrorMessage } from '@xpert-ai/plugin-sdk';
|
|
4
|
-
import { z } from 'zod';
|
|
5
|
-
import { MinerUClient } from './mineru.client.js';
|
|
6
|
-
export function buildPdfToMarkdownTool(configService, resultParser) {
|
|
7
|
-
return tool(async (input) => {
|
|
8
|
-
try {
|
|
9
|
-
const { file, isOcr, enableFormula, enableTable, language, modelVersion } = input;
|
|
10
|
-
if (!file) {
|
|
11
|
-
throw new Error('No file provided');
|
|
12
|
-
}
|
|
13
|
-
const currentState = getCurrentTaskInput();
|
|
14
|
-
const workspacePath = currentState?.[`sys`]?.['volume'] ?? '/tmp/xpert';
|
|
15
|
-
const baseUrl = currentState?.[`sys`]?.['workspace_url'] ?? 'http://localhost:3000';
|
|
16
|
-
// Get permissions from current state
|
|
17
|
-
const permissions = currentState?.[`sys`]?.['permissions'];
|
|
18
|
-
if (!permissions?.fileSystem) {
|
|
19
|
-
throw new Error('File system permission is required for MinerU tool');
|
|
20
|
-
}
|
|
21
|
-
// Get file content
|
|
22
|
-
let fileContent;
|
|
23
|
-
let fileName;
|
|
24
|
-
let filePath;
|
|
25
|
-
let fileUrl;
|
|
26
|
-
if (file.content) {
|
|
27
|
-
if (typeof file.content === 'string') {
|
|
28
|
-
// Base64 string
|
|
29
|
-
fileContent = Buffer.from(file.content, 'base64');
|
|
30
|
-
}
|
|
31
|
-
else if (Buffer.isBuffer(file.content)) {
|
|
32
|
-
fileContent = file.content;
|
|
33
|
-
}
|
|
34
|
-
else if (file.content instanceof Uint8Array) {
|
|
35
|
-
fileContent = Buffer.from(file.content);
|
|
36
|
-
}
|
|
37
|
-
else {
|
|
38
|
-
throw new Error('Invalid file content format');
|
|
39
|
-
}
|
|
40
|
-
fileName = file.name || file.filename || 'document.pdf';
|
|
41
|
-
}
|
|
42
|
-
else if (file.filePath) {
|
|
43
|
-
filePath = file.filePath;
|
|
44
|
-
fileContent = await permissions.fileSystem.readFile(filePath);
|
|
45
|
-
fileName = file.name || file.filename || filePath.split('/').pop() || 'document.pdf';
|
|
46
|
-
}
|
|
47
|
-
else if (file.fileUrl) {
|
|
48
|
-
fileUrl = file.fileUrl;
|
|
49
|
-
const response = await fetch(fileUrl);
|
|
50
|
-
if (!response.ok) {
|
|
51
|
-
throw new Error(`Failed to download file from URL: ${response.statusText}`);
|
|
52
|
-
}
|
|
53
|
-
const arrayBuffer = await response.arrayBuffer();
|
|
54
|
-
fileContent = Buffer.from(arrayBuffer);
|
|
55
|
-
fileName = file.name || file.filename || fileUrl.split('/').pop() || 'document.pdf';
|
|
56
|
-
}
|
|
57
|
-
else {
|
|
58
|
-
throw new Error('File must provide content, filePath, or fileUrl');
|
|
59
|
-
}
|
|
60
|
-
// Save file to workspace if not already there
|
|
61
|
-
if (!filePath) {
|
|
62
|
-
const relativePath = `mineru-input/${fileName}`;
|
|
63
|
-
filePath = relativePath;
|
|
64
|
-
fileUrl = await permissions.fileSystem.writeFile(relativePath, fileContent);
|
|
65
|
-
}
|
|
66
|
-
// Create MinerU client
|
|
67
|
-
const mineruClient = new MinerUClient(configService, {
|
|
68
|
-
fileSystem: permissions.fileSystem,
|
|
69
|
-
integration: permissions.integration,
|
|
70
|
-
});
|
|
71
|
-
// Create task
|
|
72
|
-
const { taskId } = await mineruClient.createTask({
|
|
73
|
-
url: fileUrl || file.fileUrl,
|
|
74
|
-
filePath: filePath,
|
|
75
|
-
fileName: fileName,
|
|
76
|
-
isOcr: isOcr ?? true,
|
|
77
|
-
enableFormula: enableFormula ?? true,
|
|
78
|
-
enableTable: enableTable ?? true,
|
|
79
|
-
language: language || 'ch',
|
|
80
|
-
modelVersion: modelVersion || 'pipeline',
|
|
81
|
-
});
|
|
82
|
-
// Get result
|
|
83
|
-
let result;
|
|
84
|
-
if (mineruClient.serverType === 'self-hosted') {
|
|
85
|
-
result = mineruClient.getSelfHostedTask(taskId);
|
|
86
|
-
if (!result) {
|
|
87
|
-
throw new Error('Failed to get MinerU task result');
|
|
88
|
-
}
|
|
89
|
-
}
|
|
90
|
-
else {
|
|
91
|
-
result = await mineruClient.waitForTask(taskId, 5 * 60 * 1000, 5000);
|
|
92
|
-
}
|
|
93
|
-
// Parse result
|
|
94
|
-
const parsedResult = mineruClient.serverType === 'self-hosted'
|
|
95
|
-
? await resultParser.parseLocalTask(result, taskId, { folder: 'mineru-output', name: fileName }, permissions.fileSystem)
|
|
96
|
-
: await resultParser.parseFromUrl(result.full_zip_url, taskId, { folder: 'mineru-output', name: fileName }, permissions.fileSystem);
|
|
97
|
-
// Get markdown content
|
|
98
|
-
const markdownContent = parsedResult.chunks[0]?.pageContent || '';
|
|
99
|
-
const outputFileName = fileName.replace(/\.pdf$/i, '.md');
|
|
100
|
-
const outputPath = `mineru-output/${outputFileName}`;
|
|
101
|
-
const outputUrl = await permissions.fileSystem.writeFile(outputPath, Buffer.from(markdownContent, 'utf-8'));
|
|
102
|
-
return [
|
|
103
|
-
`Successfully converted PDF to Markdown: ${outputFileName}`,
|
|
104
|
-
{
|
|
105
|
-
files: [
|
|
106
|
-
{
|
|
107
|
-
mimeType: 'text/markdown',
|
|
108
|
-
fileName: outputPath,
|
|
109
|
-
filePath: permissions.fileSystem.fullPath(outputPath),
|
|
110
|
-
fileUrl: outputUrl,
|
|
111
|
-
extension: 'md',
|
|
112
|
-
},
|
|
113
|
-
...(parsedResult.metadata.assets || []).map((asset) => ({
|
|
114
|
-
mimeType: asset.type === 'image' ? 'image/png' : 'application/json',
|
|
115
|
-
fileName: asset.filePath,
|
|
116
|
-
filePath: permissions.fileSystem.fullPath(asset.filePath),
|
|
117
|
-
fileUrl: asset.url,
|
|
118
|
-
extension: asset.type === 'image' ? 'png' : 'json',
|
|
119
|
-
})),
|
|
120
|
-
],
|
|
121
|
-
},
|
|
122
|
-
];
|
|
123
|
-
}
|
|
124
|
-
catch (error) {
|
|
125
|
-
throw new Error(`Error converting PDF to Markdown: ${getErrorMessage(error)}`);
|
|
126
|
-
}
|
|
127
|
-
}, {
|
|
128
|
-
name: 'pdf_to_markdown',
|
|
129
|
-
description: `Convert PDF file to Markdown format using MinerU. Supports OCR, formula recognition, and table extraction.`,
|
|
130
|
-
schema: z.object({
|
|
131
|
-
file: z.object({
|
|
132
|
-
name: z.string().optional(),
|
|
133
|
-
filename: z.string().optional(),
|
|
134
|
-
content: z.union([z.string(), z.instanceof(Buffer), z.instanceof(Uint8Array)]).optional(),
|
|
135
|
-
filePath: z.string().optional(),
|
|
136
|
-
fileUrl: z.string().optional(),
|
|
137
|
-
}),
|
|
138
|
-
isOcr: z.boolean().optional().describe('Enable OCR for image-based PDFs'),
|
|
139
|
-
enableFormula: z.boolean().optional().describe('Enable recognition of mathematical formulas'),
|
|
140
|
-
enableTable: z.boolean().optional().describe('Enable recognition of tables'),
|
|
141
|
-
language: z.enum(['en', 'ch']).optional().describe('Document language (en for English, ch for Chinese)'),
|
|
142
|
-
modelVersion: z.enum(['pipeline', 'vlm']).optional().describe('MinerU model version'),
|
|
143
|
-
}),
|
|
144
|
-
responseFormat: 'content_and_artifact',
|
|
145
|
-
});
|
|
146
|
-
}
|
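Note on the removed tool above: it called `waitForTask(taskId, 5 * 60 * 1000, 5000)`, which reads as a five-minute timeout with a five-second polling interval. The client's implementation is not part of this section, so the following is only a sketch of how such a polling loop might look. The `TaskStatus` shape, its state names, and the `pollUntilDone` helper are assumptions for illustration; only `full_zip_url` appears in the diff.

```typescript
// Hypothetical status shape; only full_zip_url is taken from the diff.
type TaskStatus = {
  state: 'pending' | 'running' | 'done' | 'failed';
  full_zip_url?: string;
};

// Calls fetchStatus repeatedly until the task finishes, fails, or the
// next wait would pass the deadline.
async function pollUntilDone(
  fetchStatus: () => Promise<TaskStatus>,
  timeoutMs: number,
  intervalMs: number,
): Promise<TaskStatus> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const status = await fetchStatus();
    if (status.state === 'done') {
      return status;
    }
    if (status.state === 'failed') {
      throw new Error('Task failed');
    }
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Task did not finish within ${timeoutMs} ms`);
    }
    // Sleep for one interval before asking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Passing the status fetcher in as a function keeps the loop independent of any HTTP client, which makes it straightforward to exercise with a stub.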