@helloxiaohu/plugin-mineru 0.0.20 → 0.1.2
- package/README.md +101 -101
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +7 -8
- package/dist/lib/integration.strategy.d.ts.map +1 -1
- package/dist/lib/integration.strategy.js +19 -4
- package/dist/lib/mineru.client.d.ts.map +1 -1
- package/dist/lib/mineru.client.js +157 -11
- package/dist/lib/mineru.plugin.d.ts.map +1 -1
- package/dist/lib/mineru.plugin.js +0 -2
- package/dist/lib/result-parser.service.d.ts.map +1 -1
- package/dist/lib/result-parser.service.js +1 -0
- package/dist/lib/transformer-mineru.strategy.d.ts +11 -0
- package/dist/lib/transformer-mineru.strategy.d.ts.map +1 -1
- package/dist/lib/transformer-mineru.strategy.js +31 -9
- package/dist/lib/types.d.ts +3 -13
- package/dist/lib/types.d.ts.map +1 -1
- package/dist/lib/types.js +22 -35
- package/package.json +54 -62
- package/dist/lib/mineru-toolset.strategy.d.ts +0 -167
- package/dist/lib/mineru-toolset.strategy.d.ts.map +0 -1
- package/dist/lib/mineru-toolset.strategy.js +0 -216
- package/dist/lib/mineru.tool.d.ts +0 -70
- package/dist/lib/mineru.tool.d.ts.map +0 -1
- package/dist/lib/mineru.tool.js +0 -145
- package/dist/lib/mineru.toolset.d.ts +0 -51
- package/dist/lib/mineru.toolset.d.ts.map +0 -1
- package/dist/lib/mineru.toolset.js +0 -52
package/README.md
CHANGED
@@ -1,101 +1,101 @@
-# Xpert Plugin: MinerU
-
-`@xpert-ai/plugin-mineru` is a MinerU document converter plugin for the [Xpert AI](https://github.com/xpert-ai/xpert) platform, providing extraction capabilities from PDF to Markdown and structured JSON. The plugin includes built-in MinerU integration strategies, document conversion strategies, and result parsing services, enabling secure access to the MinerU API in automated workflows, polling task status, and writing parsed content and attachment resources to the platform file system.
-
-## Installation
-
-```bash
-pnpm add @xpert-ai/plugin-mineru
-# or
-npm install @xpert-ai/plugin-mineru
-```
-
-> **Note**: This plugin depends on `@xpert-ai/plugin-sdk`, `@nestjs/common@^11`, `@nestjs/config@^4`, `@metad/contracts`, `axios@1`, `chalk@4`, `@langchain/core@^0.3.72`, and `uuid@8` as peerDependencies. Please ensure these packages are installed in your host project.
-
-## Quick Start
-
-1. **Prepare MinerU Credentials**
-Obtain a valid API Key from the MinerU dashboard and confirm the service address (default: `https://mineru.net/api/v4`).
-
-2. **Configure Integration in Xpert**
-- Via Xpert Console: Create a MinerU integration and fill in the following fields.
-- Or set environment variables in your deployment environment:
-- `MINERU_API_BASE_URL`: Optional, defaults to `https://mineru.net/api/v4`.
-- `MINERU_API_TOKEN`: Required, used as a fallback credential if no integration is configured.
-
-Example integration configuration (JSON):
-
-```json
-{
-"provider": "mineru",
-"options": {
-"apiUrl": "https://mineru.net/api/v4",
-"apiKey": "your-mineru-api-key"
-}
-}
-```
-
-3. **Register the Plugin**
-Configure the plugin in your host service's plugin registration process:
-
-```sh .env
-PLUGINS=@xpert-ai/plugin-mineru
-```
-
-The plugin returns the NestJS module `MinerUPlugin` in the `register` hook and logs messages during the `onStart`/`onStop` lifecycle.
-
-## MinerU Integration Options
-
-| Field | Type | Description | Required | Default |
-| -------- | ------ | ------------------------------------- | -------- | ---------------------------- |
-| apiUrl | string | MinerU API base URL | No | `https://mineru.net/api/v4` |
-| apiKey | string | MinerU service API Key (keep secret) | Yes | — |
-
-> If both integration configuration and environment variables are provided, options from the integration configuration take precedence.
-
-## Document Conversion Parameters
-
-`MinerUTransformerStrategy` supports the following configuration options (passed to the MinerU API when starting a workflow):
-
-| Field | Type | Default | Description |
-| ---------------- | ------- | ------------ | --------------------------------------------------- |
-| `isOcr` | boolean | `true` | Enable OCR for image-based PDFs. |
-| `enableFormula` | boolean | `true` | Recognize mathematical formulas and output tags. |
-| `enableTable` | boolean | `true` | Recognize tables and output structured tags. |
-| `language` | string | `"ch"` | Main document language, per MinerU API (`en`/`ch`). |
-| `modelVersion` | string | `"pipeline"` | MinerU model version (`pipeline`, `vlm`, etc.). |
-
-By default, the plugin creates MinerU tasks for each file to be processed, polls until `full_zip_url` is returned, then downloads and parses the zip package in memory.
-
-## Permissions
-
-- **Integration**: Access MinerU integration configuration to read API address and credentials.
-- **File System**: Perform `read/write/list` on `XpFileSystem` to store image resources from MinerU results.
-
-Ensure the plugin is granted these permissions in your authorization policy, or it will not be able to retrieve results or write attachments.
-
-## Output Content
-
-The parser generates:
-
-- Full Markdown: Resource links are automatically replaced to point to actual URLs written via `XpFileSystem`.
-- Structured metadata: Includes MinerU task ID, layout JSON (`layout.json`), content list (`content_list.json`), original PDF filename, etc.
-- Attachment asset list: Records written image resources for easy association by callers.
-
-The returned `Document<ChunkMetadata>` array currently defaults to a single chunk containing the full Markdown; you can split it as needed.
-
-## Development & Debugging
-
-Run the following commands in the repository root to build and test locally:
-
-```bash
-npm install
-npx nx build @xpert-ai/plugin-mineru
-npx nx test @xpert-ai/plugin-mineru
-```
-
-TypeScript build artifacts are output to `packages/mineru/dist`. Before publishing, ensure `package.json`, type declarations, and runtime files are in sync.
-
-## License
-
-This project follows the [AGPL-3.0 License](../../../LICENSE) in the repository root.
Added lines 1-101 are textually identical to removed lines 1-101 above; no visible content changes.
|
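The defaults listed in the README's "Document Conversion Parameters" table can be read as caller options merged over a default object. The sketch below is illustrative only: `DEFAULT_TRANSFORM_OPTIONS` and `withDefaults` are hypothetical names, not exports of the plugin. Only the field names and default values come from the README table.

```javascript
// Field names and defaults are taken from the README table; the constant and
// function names are illustrative assumptions, not part of the package.
const DEFAULT_TRANSFORM_OPTIONS = Object.freeze({
  isOcr: true,
  enableFormula: true,
  enableTable: true,
  language: 'ch',
  modelVersion: 'pipeline',
});

// Caller-supplied values win; undefined values fall back to the defaults.
function withDefaults(options = {}) {
  const merged = { ...DEFAULT_TRANSFORM_OPTIONS };
  for (const [key, value] of Object.entries(options)) {
    if (value !== undefined) merged[key] = value;
  }
  return merged;
}

console.log(withDefaults({ language: 'en', isOcr: undefined }));
```

Skipping `undefined` values keeps a partially filled form from silently disabling OCR or table recognition.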
package/dist/index.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,CAAC,EAAE,MAAM,KAAK,CAAC;AACxB,OAAO,KAAK,EAAE,WAAW,EAAE,MAAM,sBAAsB,CAAC;
|
|
1
|
+
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,CAAC,EAAE,MAAM,KAAK,CAAC;AACxB,OAAO,KAAK,EAAE,WAAW,EAAE,MAAM,sBAAsB,CAAC;AAcxD,QAAA,MAAM,YAAY,gDAChB,CAAC;AAEH,QAAA,MAAM,MAAM,EAAE,WAAW,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,YAAY,CAAC,CA4BrD,CAAC;AAEF,eAAe,MAAM,CAAC"}
|
package/dist/index.js
CHANGED
@@ -1,25 +1,24 @@
 import { z } from 'zod';
 import { readFileSync } from 'fs';
-import {
-import { dirname, join } from 'path';
+import { join } from 'path';
 import { MinerUPlugin } from './lib/mineru.plugin.js';
 import { icon } from './lib/types.js';
-
-const
-const packageJson = JSON.parse(readFileSync(join(
+import { getModuleMeta } from './lib/path-meta.js';
+const { __filename, __dirname } = getModuleMeta(import.meta);
+const packageJson = JSON.parse(readFileSync(join(__dirname, '../package.json'), 'utf8'));
 const ConfigSchema = z.object({});
 const plugin = {
     meta: {
         name: packageJson.name,
         version: packageJson.version,
-        category: '
+        category: 'set',
         icon: {
             type: 'svg',
             value: icon
         },
         displayName: 'MinerU Transformer',
-        description: 'Provide
-        keywords: ['integration', 'pdf', 'markdown', 'json', 'transformer'],
+        description: 'Provide document to Markdown and JSON transformation functionality',
+        keywords: ['integration', 'document', 'pdf', 'docx', 'ppt', 'image', 'markdown', 'json', 'transformer'],
         author: 'XpertAI Team',
         homepage: 'https://www.npmjs.com/package/@xpert-ai/plugin-mineru',
     },
package/dist/lib/integration.strategy.d.ts.map
CHANGED
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"integration.strategy.d.ts","sourceRoot":"","sources":["../../src/lib/integration.strategy.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,KAAK,YAAY,EAAE,oBAAoB,EAAE,MAAM,kBAAkB,CAAC;AAO3E,OAAO,EACL,mBAAmB,EAGnB,0BAA0B,EAC3B,MAAM,sBAAsB,CAAC;AAE9B,OAAO,
|
|
1
|
+
{"version":3,"file":"integration.strategy.d.ts","sourceRoot":"","sources":["../../src/lib/integration.strategy.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,KAAK,YAAY,EAAE,oBAAoB,EAAE,MAAM,kBAAkB,CAAC;AAO3E,OAAO,EACL,mBAAmB,EAGnB,0BAA0B,EAC3B,MAAM,sBAAsB,CAAC;AAE9B,OAAO,EAAgB,wBAAwB,EAAE,MAAM,YAAY,CAAC;AAEpE,qBAEa,yBACX,YAAW,mBAAmB,CAAC,wBAAwB,CAAC;IAExD,QAAQ,CAAC,IAAI,EAAE,oBAAoB,CAuFjC;IAGF,OAAO,CAAC,QAAQ,CAAC,aAAa,CAAgB;IAExC,OAAO,CACX,WAAW,EAAE,YAAY,CAAC,wBAAwB,CAAC,EACnD,OAAO,EAAE,0BAA0B,GAClC,OAAO,CAAC,GAAG,CAAC;IAIT,cAAc,CAAC,MAAM,EAAE,wBAAwB,GAAG,OAAO,CAAC,IAAI,CAAC;CA2BtE"}
package/dist/lib/integration.strategy.js
CHANGED
@@ -3,11 +3,11 @@ import { forwardRef, Inject, Injectable, } from '@nestjs/common';
 import { ConfigService } from '@nestjs/config';
 import { IntegrationStrategyKey, } from '@xpert-ai/plugin-sdk';
 import { MinerUClient } from './mineru.client.js';
-import { icon,
+import { icon, MinerU } from './types.js';
 let MinerUIntegrationStrategy = class MinerUIntegrationStrategy {
     constructor() {
         this.meta = {
-            name:
+            name: MinerU,
             label: {
                 en_US: 'MinerU',
             },
@@ -68,6 +68,21 @@ let MinerUIntegrationStrategy = class MinerUIntegrationStrategy {
                         enum: ['official', 'self-hosted'],
                         default: 'official',
                     },
+                    extraFormats: {
+                        type: 'array',
+                        title: {
+                            en_US: 'Extra Formats',
+                            zh_Hans: '额外输出格式',
+                        },
+                        description: {
+                            en_US: 'Optional extra output formats (docx, html, latex). Markdown and JSON are always included.',
+                            zh_Hans: '可选额外输出格式(docx、html、latex)。Markdown 和 JSON 默认包含。',
+                        },
+                        items: {
+                            type: 'string',
+                            enum: ['docx', 'html', 'latex'],
+                        },
+                    },
                 },
             },
             features: [],
|
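The `extraFormats` option added in the hunk above restricts each array item to `docx`, `html`, or `latex`. As a rough illustration of that constraint in plain JavaScript, the sketch below uses a hypothetical `normalizeExtraFormats` helper. It is not code from the package, which may rely on its schema layer for this check.

```javascript
// Allowed values copied from `items.enum` in the schema hunk above.
const EXTRA_FORMATS = ['docx', 'html', 'latex'];

// Hypothetical helper: validates and de-duplicates a user-supplied list.
function normalizeExtraFormats(value) {
  if (value === undefined || value === null) return [];
  if (!Array.isArray(value)) {
    throw new TypeError('extraFormats must be an array');
  }
  const unknown = value.filter((item) => !EXTRA_FORMATS.includes(item));
  if (unknown.length > 0) {
    throw new RangeError(`Unsupported extra formats: ${unknown.join(', ')}`);
  }
  // A Set keeps first-seen order while dropping repeats.
  return [...new Set(value)];
}

console.log(normalizeExtraFormats(['html', 'docx', 'html']));
```

Markdown and JSON are not listed because, per the option's description, they are always included.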
@@ -80,7 +95,7 @@ let MinerUIntegrationStrategy = class MinerUIntegrationStrategy {
     async validateConfig(config) {
         const mineruClient = new MinerUClient(this.configService, {
             integration: {
-                provider:
+                provider: MinerU,
                 options: config,
             },
         });
@@ -113,6 +128,6 @@ __decorate([
 ], MinerUIntegrationStrategy.prototype, "configService", void 0);
 MinerUIntegrationStrategy = __decorate([
     Injectable(),
-    IntegrationStrategyKey(
+    IntegrationStrategyKey(MinerU)
 ], MinerUIntegrationStrategy);
 export { MinerUIntegrationStrategy };
package/dist/lib/mineru.client.d.ts.map
CHANGED
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"mineru.client.d.ts","sourceRoot":"","sources":["../../src/lib/mineru.client.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,YAAY,EAAE,MAAM,kBAAkB,CAAC;AAEhD,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAC/C,OAAO,EAAmB,YAAY,EAAE,MAAM,sBAAsB,CAAC;AACrE,OAAc,EAAE,aAAa,EAAE,MAAM,OAAO,CAAC;AAK7C,OAAO,EAIL,wBAAwB,EAExB,0BAA0B,EAC1B,gBAAgB,EACjB,MAAM,YAAY,CAAC;AAIpB,UAAU,iBAAiB;IACzB,GAAG,CAAC,EAAE,MAAM,CAAC;IACb,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,YAAY,CAAC,EAAE,MAAM,EAAE,CAAC;IACxB,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,mEAAmE;IACnE,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,yEAAyE;IACzE,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,2EAA2E;IAC3E,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,4EAA4E;IAC5E,gBAAgB,CAAC,EAAE,OAAO,CAAC;CAC5B;AAED,UAAU,mBAAmB;IAC3B,GAAG,EAAE,MAAM,CAAC;IACZ,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,UAAU,CAAC,EAAE,MAAM,CAAC;CACrB;AAED,UAAU,sBAAsB;IAC9B,KAAK,EAAE,mBAAmB,EAAE,CAAC;IAC7B,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,YAAY,CAAC,EAAE,MAAM,EAAE,CAAC;IACxB,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,IAAI,CAAC,EAAE,MAAM,CAAC;CACf;AAED,UAAU,iBAAiB;IACzB,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;CACnB;AASD,qBAAa,YAAY;IAWrB,OAAO,CAAC,QAAQ,CAAC,aAAa;IAC9B,OAAO,CAAC,QAAQ,CAAC,WAAW,CAAC;IAX/B,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAiC;IACxD,OAAO,CAAC,QAAQ,CAAC,OAAO,CAAS;IACjC,OAAO,CAAC,QAAQ,CAAC,KAAK,CAAC,CAAS;IAChC,SAAgB,UAAU,EAAE,gBAAgB,CAAC;IAC7C,OAAO,CAAC,QAAQ,CAAC,UAAU,CAAiD;IAE5E,IAAI,UAAU,IAAI,YAAY,GAAG,SAAS,CAEzC;gBAEkB,aAAa,EAAE,aAAa,EAC5B,WAAW,CAAC,EAAE;QACvB,UAAU,CAAC,EAAE,YAAY,CAAC;QAC1B,WAAW,CAAC,EAAE,OAAO,CAAC,YAAY,CAAC,wBAAwB,CAAC,CAAC,CAAC;KACjE;IAkBP;;;OAGG;IACG,UAAU,CAAC,OAAO,EAAE,iB
AAiB,GAAG,OAAO,CAAC;QAAE,MAAM,EAAE,MAAM,CAAA;KAAE,CAAC;IAYzE;;OAEG;IACG,eAAe,CAAC,OAAO,EAAE,sBAAsB,GAAG,OAAO,CAAC;QAAE,OAAO,EAAE,MAAM,CAAC;QAAC,QAAQ,CAAC,EAAE,MAAM,EAAE,CAAA;KAAE,CAAC;IAmCzG,iBAAiB,CAAC,MAAM,EAAE,MAAM,GAAG,0BAA0B,GAAG,SAAS;IAOzE;;OAEG;IACG,aAAa,CAAC,MAAM,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,iBAAiB,GAAG,OAAO,CAAC;QACxE,YAAY,CAAC,EAAE,MAAM,CAAC;QACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;QAClB,OAAO,CAAC,EAAE,MAAM,CAAC;QACjB,MAAM,CAAC,EAAE,MAAM,CAAC;KACjB,CAAC;IAoBF;;OAEG;IACG,cAAc,CAAC,OAAO,EAAE,MAAM,GAAG,OAAO,CAAC,GAAG,CAAC;IAiBnD;;OAEG;IACG,WAAW,CAAC,MAAM,EAAE,MAAM,EAAE,SAAS,SAAgB,EAAE,UAAU,SAAO,GAAG,OAAO,CAAC,GAAG,CAAC;IAsB7F,OAAO,CAAC,cAAc;IAMtB,OAAO,CAAC,iBAAiB;IAczB,OAAO,CAAC,kBAAkB;
|
|
1
|
+
{"version":3,"file":"mineru.client.d.ts","sourceRoot":"","sources":["../../src/lib/mineru.client.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,YAAY,EAAE,MAAM,kBAAkB,CAAC;AAEhD,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAC/C,OAAO,EAAmB,YAAY,EAAE,MAAM,sBAAsB,CAAC;AACrE,OAAc,EAAE,aAAa,EAAE,MAAM,OAAO,CAAC;AAK7C,OAAO,EAIL,wBAAwB,EAExB,0BAA0B,EAC1B,gBAAgB,EACjB,MAAM,YAAY,CAAC;AAIpB,UAAU,iBAAiB;IACzB,GAAG,CAAC,EAAE,MAAM,CAAC;IACb,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,YAAY,CAAC,EAAE,MAAM,EAAE,CAAC;IACxB,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,mEAAmE;IACnE,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,yEAAyE;IACzE,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,2EAA2E;IAC3E,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,4EAA4E;IAC5E,gBAAgB,CAAC,EAAE,OAAO,CAAC;CAC5B;AAED,UAAU,mBAAmB;IAC3B,GAAG,EAAE,MAAM,CAAC;IACZ,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,UAAU,CAAC,EAAE,MAAM,CAAC;CACrB;AAED,UAAU,sBAAsB;IAC9B,KAAK,EAAE,mBAAmB,EAAE,CAAC;IAC7B,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,YAAY,CAAC,EAAE,MAAM,EAAE,CAAC;IACxB,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,IAAI,CAAC,EAAE,MAAM,CAAC;CACf;AAED,UAAU,iBAAiB;IACzB,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;CACnB;AASD,qBAAa,YAAY;IAWrB,OAAO,CAAC,QAAQ,CAAC,aAAa;IAC9B,OAAO,CAAC,QAAQ,CAAC,WAAW,CAAC;IAX/B,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAiC;IACxD,OAAO,CAAC,QAAQ,CAAC,OAAO,CAAS;IACjC,OAAO,CAAC,QAAQ,CAAC,KAAK,CAAC,CAAS;IAChC,SAAgB,UAAU,EAAE,gBAAgB,CAAC;IAC7C,OAAO,CAAC,QAAQ,CAAC,UAAU,CAAiD;IAE5E,IAAI,UAAU,IAAI,YAAY,GAAG,SAAS,CAEzC;gBAEkB,aAAa,EAAE,aAAa,EAC5B,WAAW,CAAC,EAAE;QACvB,UAAU,CAAC,EAAE,YAAY,CAAC;QAC1B,WAAW,CAAC,EAAE,OAAO,CAAC,YAAY,CAAC,wBAAwB,CAAC,CAAC,CAAC;KACjE;IAkBP;;;OAGG;IACG,UAAU,CAAC,OAAO,EAAE,iB
AAiB,GAAG,OAAO,CAAC;QAAE,MAAM,EAAE,MAAM,CAAA;KAAE,CAAC;IAYzE;;OAEG;IACG,eAAe,CAAC,OAAO,EAAE,sBAAsB,GAAG,OAAO,CAAC;QAAE,OAAO,EAAE,MAAM,CAAC;QAAC,QAAQ,CAAC,EAAE,MAAM,EAAE,CAAA;KAAE,CAAC;IAmCzG,iBAAiB,CAAC,MAAM,EAAE,MAAM,GAAG,0BAA0B,GAAG,SAAS;IAOzE;;OAEG;IACG,aAAa,CAAC,MAAM,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,iBAAiB,GAAG,OAAO,CAAC;QACxE,YAAY,CAAC,EAAE,MAAM,CAAC;QACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;QAClB,OAAO,CAAC,EAAE,MAAM,CAAC;QACjB,MAAM,CAAC,EAAE,MAAM,CAAC;KACjB,CAAC;IAoBF;;OAEG;IACG,cAAc,CAAC,OAAO,EAAE,MAAM,GAAG,OAAO,CAAC,GAAG,CAAC;IAiBnD;;OAEG;IACG,WAAW,CAAC,MAAM,EAAE,MAAM,EAAE,SAAS,SAAgB,EAAE,UAAU,SAAO,GAAG,OAAO,CAAC,GAAG,CAAC;IAsB7F,OAAO,CAAC,cAAc;IAMtB,OAAO,CAAC,iBAAiB;IAczB,OAAO,CAAC,kBAAkB;IAiC1B,OAAO,CAAC,sBAAsB;IAI9B,OAAO,CAAC,gBAAgB;IAIxB,OAAO,CAAC,WAAW;IAQnB,OAAO,CAAC,kBAAkB;IAO1B,OAAO,CAAC,oBAAoB;YAYd,kBAAkB;YA4BlB,oBAAoB;YAwIpB,qBAAqB;YAyFrB,uBAAuB;IAsDrC,OAAO,CAAC,iBAAiB;IAgBzB,OAAO,CAAC,2BAA2B;IAenC,OAAO,CAAC,6BAA6B;IAcrC,OAAO,CAAC,iBAAiB;IAQzB,OAAO,CAAC,aAAa;IAcrB,OAAO,CAAC,iBAAiB;IAQzB,OAAO,CAAC,eAAe;YAIT,YAAY;IAkB1B,OAAO,CAAC,eAAe;IA0BvB,wBAAwB,IAAI,OAAO,CAAC,aAAa,CAAC,GAAG,EAAE,GAAG,CAAC,CAAC;IAKtD,wBAAwB;CAU/B"}
package/dist/lib/mineru.client.js
CHANGED
@@ -3,7 +3,7 @@ import { getErrorMessage } from '@xpert-ai/plugin-sdk';
 import axios from 'axios';
 import FormData from 'form-data';
 import { randomUUID } from 'crypto';
-import { basename } from 'path';
+import { basename, isAbsolute, join as pathJoin } from 'path';
 import fs from 'fs';
 import { ENV_MINERU_API_BASE_URL, ENV_MINERU_API_TOKEN, ENV_MINERU_SERVER_TYPE, } from './types.js';
 const DEFAULT_OFFICIAL_BASE_URL = 'https://mineru.net/api/v4';
@@ -182,8 +182,13 @@ export class MinerUClient {
         const tokenFromEnv = this.configService.get(tokenEnvKey);
         const baseUrl = baseUrlFromIntegration ||
             baseUrlFromEnv ||
-            (this.serverType === 'official' ? DEFAULT_OFFICIAL_BASE_URL :
+            (this.serverType === 'official' ? DEFAULT_OFFICIAL_BASE_URL : undefined);
         const token = tokenFromIntegration || tokenFromEnv;
+        // Validate baseUrl is provided for self-hosted mode
+        if (this.serverType === 'self-hosted' && !baseUrl) {
+            throw new Error('MinerU self-hosted mode requires apiUrl to be configured in integration options or ' +
+                `${ENV_MINERU_API_BASE_URL} environment variable`);
+        }
         return { baseUrl, token };
     }
     readIntegrationOptions(integration) {
|
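The hunk above changes how the base URL falls back. Integration options still win over environment values, the official default applies only in `official` mode, and `self-hosted` mode now fails fast when no URL is configured. A standalone sketch of that precedence follows, assuming a hypothetical `resolveBaseUrl` function that takes plain arguments in place of the client's `ConfigService` lookups:

```javascript
const DEFAULT_OFFICIAL_BASE_URL = 'https://mineru.net/api/v4';

// Hypothetical standalone version of the precedence shown in the diff:
// integration option, then environment value, then the official default.
function resolveBaseUrl({ serverType, fromIntegration, fromEnv }) {
  const baseUrl =
    fromIntegration ||
    fromEnv ||
    (serverType === 'official' ? DEFAULT_OFFICIAL_BASE_URL : undefined);
  // Self-hosted deployments have no sensible default, so refuse to guess.
  if (serverType === 'self-hosted' && !baseUrl) {
    throw new Error('MinerU self-hosted mode requires apiUrl to be configured');
  }
  return baseUrl;
}

console.log(resolveBaseUrl({ serverType: 'official' }));
console.log(resolveBaseUrl({ serverType: 'self-hosted', fromEnv: 'http://localhost:8000' }));
```

Throwing at resolution time surfaces a missing URL as a configuration error rather than as a failed HTTP request later in the parse flow.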
@@ -251,18 +256,141 @@ export class MinerUClient {
|
|
|
251
256
|
}
|
|
252
257
|
}
|
|
253
258
|
async createSelfHostedTask(options) {
|
|
254
|
-
|
|
259
|
+
// Validate fileSystem is available for self-hosted mode
|
|
260
|
+
if (!this.fileSystem) {
|
|
261
|
+
throw new Error('MinerU self-hosted mode requires fileSystem permission');
|
|
262
|
+
}
|
|
263
|
+
// Validate filePath is provided
|
|
264
|
+
if (!options.filePath) {
|
|
265
|
+
throw new Error('MinerU self-hosted mode requires filePath to be provided');
|
|
266
|
+
}
|
|
267
|
+
// Resolve absolute file path
|
|
268
|
+
// Log original filePath for debugging
|
|
269
|
+
const basePath = this.fileSystem ? this.fileSystem.basePath : 'N/A';
|
|
270
|
+
this.logger.debug(`Resolving file path. Original filePath: ${options.filePath}, basePath: ${basePath}`);
|
|
271
|
+
// Check if filePath is already an absolute path
|
|
272
|
+
const isAbsolutePath = isAbsolute(options.filePath);
|
|
273
|
+
// Also check if it looks like a full path even without leading slash
|
|
274
|
+
const looksLikeFullPath = !isAbsolutePath && (options.filePath.startsWith('Users/') ||
|
|
275
|
+
options.filePath.startsWith('home/'));
|
|
276
|
+
let filePath;
|
|
277
|
+
if (isAbsolutePath) {
|
|
278
|
+
// Use absolute path directly
|
|
279
|
+
filePath = options.filePath;
|
|
280
|
+
this.logger.debug(`Using absolute path directly: ${filePath}`);
|
|
281
|
+
}
|
|
282
|
+
else if (looksLikeFullPath) {
|
|
283
|
+
// If it looks like a full path but doesn't start with /, add it
|
|
284
|
+
filePath = options.filePath.startsWith('/') ? options.filePath : '/' + options.filePath;
|
|
285
|
+
this.logger.debug(`Detected full path pattern, normalized to: ${filePath}`);
|
|
286
|
+
}
|
|
287
|
+
else {
|
|
288
|
+
// Use xpFileSystem.fullPath() to resolve relative path to absolute path
|
|
289
|
+
filePath = this.fileSystem.fullPath(options.filePath);
|
|
290
|
+
this.logger.debug(`Resolved relative path using basePath: ${filePath}`);
|
|
291
|
+
}
|
|
292
|
+
// Validate file exists and is readable before attempting to parse
|
|
293
|
+
try {
|
|
294
|
+
await fs.promises.access(filePath, fs.constants.F_OK | fs.constants.R_OK);
|
|
295
|
+
const stats = await fs.promises.stat(filePath);
|
|
296
|
+
this.logger.debug(`Processing file: ${filePath}, size: ${stats.size} bytes`);
|
|
297
|
+
if (stats.size === 0) {
|
|
298
|
+
throw new Error(`File is empty: ${filePath}`);
|
|
299
|
+
}
|
|
300
|
+
}
|
|
301
|
+
catch (error) {
|
|
302
|
+
// If file not found in the resolved path, try to find it in common alternative locations
|
|
303
|
+
// This handles two scenarios:
|
|
304
|
+
// 1. StorageFile: files/{tenantId}/filename -> apps/api/public/files/{tenantId}/filename (already tried above)
|
|
305
|
+
// 2. VolumeClient: folder/filename or filename -> ~/data/folder/filename or ~/data/filename
|
|
306
|
+
if (error instanceof Error && error.code === 'ENOENT') {
|
|
307
|
+
const homeDir = process.env.HOME || process.env.USERPROFILE;
|
|
308
|
+
const originalFilePath = options.filePath;
|
|
309
|
+
const fileName = basename(originalFilePath);
|
|
310
|
+
// Build alternative paths for VolumeClient storage
|
|
311
|
+
const alternativePaths = [];
|
|
312
|
+
// If original path contains directory separators, try both full path and just filename
|
|
313
|
+
if (originalFilePath.includes('/') || originalFilePath.includes('\\')) {
|
|
314
|
+
// Try full path in ~/data/
|
|
315
|
+
alternativePaths.push(pathJoin(homeDir || '', 'data', originalFilePath));
|
|
316
|
+
// Try just filename in ~/data/ (for VolumeClient files stored directly in root)
|
|
317
|
+
alternativePaths.push(pathJoin(homeDir || '', 'data', fileName));
|
|
318
|
+
}
|
|
319
|
+
else {
|
|
320
|
+
// If original path is just a filename, try in ~/data/ root
|
|
321
|
+
alternativePaths.push(pathJoin(homeDir || '', 'data', originalFilePath));
|
|
322
|
+
}
|
|
323
|
+
// Also try in knowledge base specific paths if we can determine knowledgebaseId
|
|
324
|
+
// Note: We don't have direct access to knowledgebaseId here, but files might be in knowledges subdirectory
|
|
325
|
+
const resolvedPath = this.fileSystem.fullPath(originalFilePath);
|
|
326
|
+
if (resolvedPath.includes('apps/api/public')) {
|
|
327
|
+
// This looks like a StorageFile path, but file not found
|
|
328
|
+
// Try VolumeClient paths as fallback
|
|
329
|
+
this.logger.debug(`File not found in StorageFile path, trying VolumeClient paths...`);
|
|
330
|
+
}
|
|
331
|
+
let foundPath = null;
|
|
332
|
+
for (const altPath of alternativePaths) {
|
|
333
|
+
try {
|
|
334
|
+
await fs.promises.access(altPath, fs.constants.F_OK | fs.constants.R_OK);
|
|
335
|
+
const stats = await fs.promises.stat(altPath);
+                        this.logger.debug(`Found file in alternative location: ${altPath}, size: ${stats.size} bytes`);
+                        foundPath = altPath;
+                        if (stats.size === 0) {
+                            throw new Error(`File is empty: ${foundPath}`);
+                        }
+                        break; // File found, exit loop
+                    }
+                    catch (altError) {
+                        // Continue to next alternative path
+                        continue;
+                    }
+                }
+                // If file found in alternative location, use it
+                if (foundPath) {
+                    filePath = foundPath;
+                }
+                else {
+                    // If still not found after trying alternatives, throw original error
+                    const basePath = this.fileSystem ? this.fileSystem.basePath : 'N/A';
+                    this.logger.error(`File not found or not readable. ` +
+                        `Original path: ${originalFilePath}, ` +
+                        `Resolved path: ${filePath}, ` +
+                        `Base path: ${basePath}, ` +
+                        `Tried alternative paths: ${alternativePaths.join(', ')}`, error instanceof Error ? error.stack : error);
+                    throw new Error(`File not found or not readable: ${filePath}. ` +
+                        `Original path: ${originalFilePath}, ` +
+                        `Base path: ${basePath}. ` +
+                        `Tried alternative locations: ${alternativePaths.join(', ')}`);
+                }
+            }
+            else if (error instanceof Error && error.message.includes('empty')) {
+                this.logger.error(`File is empty: ${filePath}`);
+                throw error;
+            }
+            else {
+                // Re-throw other errors
+                throw error;
+            }
+        }
         const taskId = randomUUID();
-        const result = await this.invokeSelfHostedParse(filePath, options.fileName, options);
+        const result = await this.invokeSelfHostedParse(filePath, options.fileName || basename(filePath), options);
         this.localTasks.set(taskId, { ...result, sourceUrl: options.url });
         return { taskId };
     }
     async invokeSelfHostedParse(filePath, fileName, options) {
         const parseUrl = this.buildApiUrl('file_parse');
+        this.logger.debug(`Sending parse request to: ${parseUrl}, file: ${fileName}`);
         const form = new FormData();
-
-
-
+        // Create file read stream (file existence is already validated in createSelfHostedTask)
+        try {
+            form.append('files', fs.createReadStream(filePath), {
+                filename: fileName,
+            });
+        }
+        catch (error) {
+            this.logger.error(`Failed to create read stream for file: ${filePath}`, error instanceof Error ? error.stack : error);
+            throw new Error(`Failed to read file: ${filePath}. ${error instanceof Error ? error.message : String(error)}`);
+        }
         // form.append('files', fileBuffer, { filename: fileName, contentType: contentType || 'application/pdf' });
         form.append('parse_method', options.parseMethod ?? 'auto');
         form.append('return_md', 'true');
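The alternative-location lookup that this hunk adds to `createSelfHostedTask` can be sketched in isolation. This is a minimal sketch under stated assumptions, not the package's code: `resolveReadablePath`, `SizeLookup`, and the injected `sizeOf` callback are hypothetical names introduced here so the logic runs without touching the disk. One behavioural difference is deliberate: in the hunk, the `File is empty` error thrown inside the inner `try` appears to be swallowed by `catch (altError)`, so the loop moves on to the next candidate; the sketch reports an empty file immediately.

```typescript
// Hypothetical helper; not part of @helloxiaohu/plugin-mineru.
// A lookup returns the file size in bytes, or undefined when the
// file is missing or unreadable at that path.
type SizeLookup = (path: string) => number | undefined;

function resolveReadablePath(
  primary: string,
  alternatives: string[],
  sizeOf: SizeLookup,
): string {
  for (const candidate of [primary, ...alternatives]) {
    const size = sizeOf(candidate);
    if (size === undefined) {
      continue; // Not found here, try the next location.
    }
    if (size === 0) {
      // Assumption of this sketch: an empty file is reported at once
      // instead of being treated like a missing file.
      throw new Error(`File is empty: ${candidate}`);
    }
    return candidate; // First non-empty match wins.
  }
  throw new Error(
    `File not found or not readable: ${primary}. ` +
      `Tried alternative locations: ${alternatives.join(', ')}`,
  );
}
```

In real use, `sizeOf` would wrap a call such as `fs.statSync(path).size` in a `try`/`catch` that returns `undefined` on failure.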
@@ -290,11 +418,27 @@ export class MinerUClient {
             return this.invokeSelfHostedParseV1(filePath, fileName, options);
         }
         if (response.status === 400) {
-
+            const errorMessage = getErrorMessage(response.data);
+            this.logger.error(`MinerU self-hosted parse failed with 400: ${errorMessage}`, JSON.stringify(response.data));
+            throw new BadRequestException(`MinerU self-hosted parse failed: ${response.status} ${errorMessage}`);
         }
         if (response.status !== 200) {
-
-
+            const errorMessage = getErrorMessage(response.data) || response.statusText;
+            const errorDetails = typeof response.data === 'object' ? JSON.stringify(response.data) : String(response.data);
+            this.logger.error(`MinerU self-hosted parse failed with ${response.status}: ${errorMessage}`, `Request URL: ${parseUrl}, File: ${fileName}, Details: ${errorDetails}`);
+            // Provide more helpful error message for common issues
+            let userFriendlyMessage = `MinerU self-hosted parse failed: ${response.status} ${response.statusText}`;
+            if (errorMessage) {
+                userFriendlyMessage += `. ${errorMessage}`;
+            }
+            // Check for specific error patterns
+            if (errorMessage && errorMessage.includes('0 active models')) {
+                userFriendlyMessage += ' Please ensure MinerU service has active models configured.';
+            }
+            else if (errorMessage && errorMessage.includes('NoneType')) {
+                userFriendlyMessage += ' This may indicate a configuration issue with the MinerU service.';
+            }
+            throw new Error(userFriendlyMessage);
         }
         return this.normalizeSelfHostedResponse(response.data);
     }
@@ -323,7 +467,9 @@ export class MinerUClient {
             validateStatus: () => true,
         });
         if (response.status !== 200) {
-
+            const errorMessage = getErrorMessage(response.data) || response.statusText;
+            this.logger.error(`MinerU self-hosted legacy parse failed with ${response.status}: ${errorMessage}`, JSON.stringify(response.data));
+            throw new Error(`MinerU self-hosted legacy parse failed: ${response.status} ${response.statusText}. ${errorMessage}`);
         }
         return this.normalizeSelfHostedResponse(response.data);
     }
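The message-building logic added to the non-200 branch of `invokeSelfHostedParse` can be read as a small pure function. The following is a sketch only: `describeParseFailure` is a hypothetical name, and the extracted error text is passed in directly because `getErrorMessage` is not shown in this diff.

```typescript
// Hypothetical helper mirroring the non-200 branch of invokeSelfHostedParse.
function describeParseFailure(
  status: number,
  statusText: string,
  errorMessage?: string,
): string {
  let message = `MinerU self-hosted parse failed: ${status} ${statusText}`;
  if (errorMessage) {
    message += `. ${errorMessage}`;
    // Append a hint for the two error patterns the hunk checks for.
    if (errorMessage.includes('0 active models')) {
      message += ' Please ensure MinerU service has active models configured.';
    } else if (errorMessage.includes('NoneType')) {
      message += ' This may indicate a configuration issue with the MinerU service.';
    }
  }
  return message;
}
```

Keeping the hint selection in one place makes it easy to add further patterns without touching the HTTP call itself.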
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"mineru.plugin.d.ts","sourceRoot":"","sources":["../../src/lib/mineru.plugin.ts"],"names":[],"mappings":"AACA,OAAO,EAAqB,kBAAkB,EAAE,gBAAgB,EAAE,MAAM,sBAAsB,CAAC;
|
|
1
|
+
{"version":3,"file":"mineru.plugin.d.ts","sourceRoot":"","sources":["../../src/lib/mineru.plugin.ts"],"names":[],"mappings":"AACA,OAAO,EAAqB,kBAAkB,EAAE,gBAAgB,EAAE,MAAM,sBAAsB,CAAC;AAO/F,qBAiBa,YAAa,YAAW,kBAAkB,EAAE,gBAAgB;IAExE,OAAO,CAAC,UAAU,CAAQ;IAE1B;;OAEG;IACH,iBAAiB,IAAI,IAAI,GAAG,OAAO,CAAC,IAAI,CAAC;IAMzC;;OAEG;IACH,eAAe,IAAI,IAAI,GAAG,OAAO,CAAC,IAAI,CAAC;CAKvC"}
|
|
@@ -7,7 +7,6 @@ import { MinerUTransformerStrategy } from './transformer-mineru.strategy.js';
 import { MinerUResultParserService } from './result-parser.service.js';
 import { MinerUIntegrationStrategy } from './integration.strategy.js';
 import { MinerUController } from './mineru.controller.js';
-import { MinerUToolsetStrategy } from './mineru-toolset.strategy.js';
 let MinerUPlugin = MinerUPlugin_1 = class MinerUPlugin {
     constructor() {
         // We disable by default additional logging for each event to avoid cluttering the logs
@@ -42,7 +41,6 @@ MinerUPlugin = MinerUPlugin_1 = __decorate([
             MinerUIntegrationStrategy,
             MinerUTransformerStrategy,
             MinerUResultParserService,
-            MinerUToolsetStrategy,
         ],
         controllers: [
             MinerUController
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"result-parser.service.d.ts","sourceRoot":"","sources":["../../src/lib/result-parser.service.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,MAAM,2BAA2B,CAAC;AACrD,OAAO,EAAE,kBAAkB,EAAE,MAAM,kBAAkB,CAAC;AAEtD,OAAO,EACL,aAAa,EAEb,YAAY,EACb,MAAM,sBAAsB,CAAC;AAK9B,OAAO,EAEL,sBAAsB,EACtB,0BAA0B,EAC3B,MAAM,YAAY,CAAC;AAEpB,qBACa,yBAAyB;IACpC,OAAO,CAAC,QAAQ,CAAC,MAAM,CAA8C;IAE/D,YAAY,CAChB,UAAU,EAAE,MAAM,EAClB,MAAM,EAAE,MAAM,EACd,QAAQ,EAAE,OAAO,CAAC,kBAAkB,CAAC,EACrC,UAAU,EAAE,YAAY,GACvB,OAAO,CAAC;QACT,EAAE,CAAC,EAAE,MAAM,CAAC;QACZ,MAAM,EAAE,QAAQ,CAAC,aAAa,CAAC,EAAE,CAAC;QAClC,QAAQ,EAAE,sBAAsB,CAAC;KAClC,CAAC;
|
|
1
|
+
{"version":3,"file":"result-parser.service.d.ts","sourceRoot":"","sources":["../../src/lib/result-parser.service.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,MAAM,2BAA2B,CAAC;AACrD,OAAO,EAAE,kBAAkB,EAAE,MAAM,kBAAkB,CAAC;AAEtD,OAAO,EACL,aAAa,EAEb,YAAY,EACb,MAAM,sBAAsB,CAAC;AAK9B,OAAO,EAEL,sBAAsB,EACtB,0BAA0B,EAC3B,MAAM,YAAY,CAAC;AAEpB,qBACa,yBAAyB;IACpC,OAAO,CAAC,QAAQ,CAAC,MAAM,CAA8C;IAE/D,YAAY,CAChB,UAAU,EAAE,MAAM,EAClB,MAAM,EAAE,MAAM,EACd,QAAQ,EAAE,OAAO,CAAC,kBAAkB,CAAC,EACrC,UAAU,EAAE,YAAY,GACvB,OAAO,CAAC;QACT,EAAE,CAAC,EAAE,MAAM,CAAC;QACZ,MAAM,EAAE,QAAQ,CAAC,aAAa,CAAC,EAAE,CAAC;QAClC,QAAQ,EAAE,sBAAsB,CAAC;KAClC,CAAC;IAsFI,cAAc,CAClB,MAAM,EAAE,0BAA0B,EAClC,MAAM,EAAE,MAAM,EACd,QAAQ,EAAE,OAAO,CAAC,kBAAkB,CAAC,EACrC,UAAU,EAAE,YAAY,GACvB,OAAO,CAAC;QACT,EAAE,CAAC,EAAE,MAAM,CAAC;QACZ,MAAM,EAAE,QAAQ,CAAC,aAAa,CAAC,EAAE,CAAC;QAClC,QAAQ,EAAE,sBAAsB,CAAC;KAClC,CAAC;CAkDH"}
|
|
@@ -85,6 +85,17 @@ export declare class MinerUTransformerStrategy implements IDocumentTransformerSt
                     enum: string[];
                     default: string;
                 };
+                pageRanges: {
+                    type: string;
+                    title: {
+                        en_US: string;
+                        zh_Hans: string;
+                    };
+                    description: {
+                        en_US: string;
+                        zh_Hans: string;
+                    };
+                };
             };
             required: any[];
         };
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"transformer-mineru.strategy.d.ts","sourceRoot":"","sources":["../../src/lib/transformer-mineru.strategy.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,kBAAkB,EAAE,MAAM,kBAAkB,CAAA;AAG/D,OAAO,EACL,aAAa,EAEb,oBAAoB,EACpB,4BAA4B,EAC5B,qBAAqB,EACtB,MAAM,sBAAsB,CAAA;AAI7B,OAAO,
|
|
1
|
+
{"version":3,"file":"transformer-mineru.strategy.d.ts","sourceRoot":"","sources":["../../src/lib/transformer-mineru.strategy.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,kBAAkB,EAAE,MAAM,kBAAkB,CAAA;AAG/D,OAAO,EACL,aAAa,EAEb,oBAAoB,EACpB,4BAA4B,EAC5B,qBAAqB,EACtB,MAAM,sBAAsB,CAAA;AAI7B,OAAO,EAA0C,wBAAwB,EAAE,MAAM,YAAY,CAAA;AAE7F,qBAEa,yBAA0B,YAAW,4BAA4B,CAAC,wBAAwB,CAAC;IAEtG,OAAO,CAAC,QAAQ,CAAC,YAAY,CAA2B;IAGxD,OAAO,CAAC,QAAQ,CAAC,aAAa,CAAe;IAE7C,QAAQ,CAAC,WAAW,mDAWnB;IAED,QAAQ,CAAC,IAAI;;;;;;;;;;;kBAWM,QAAQ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;MAmF1B;IAED,cAAc,CAAC,MAAM,EAAE,GAAG,GAAG,OAAO,CAAC,IAAI,CAAC;IAIpC,kBAAkB,CACtB,SAAS,EAAE,OAAO,CAAC,kBAAkB,CAAC,EAAE,EACxC,MAAM,EAAE,wBAAwB,GAC/B,OAAO,CAAC,OAAO,CAAC,kBAAkB,CAAC,aAAa,CAAC,CAAC,EAAE,CAAC;CAiEzD"}
|