@helloxiaohu/plugin-mineru 0.0.11
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +101 -0
- package/dist/index.d.ts +6 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +39 -0
- package/dist/lib/integration.strategy.d.ts +10 -0
- package/dist/lib/integration.strategy.d.ts.map +1 -0
- package/dist/lib/integration.strategy.js +118 -0
- package/dist/lib/mineru-toolset.strategy.d.ts +69 -0
- package/dist/lib/mineru-toolset.strategy.d.ts.map +1 -0
- package/dist/lib/mineru-toolset.strategy.js +109 -0
- package/dist/lib/mineru.client.d.ts +120 -0
- package/dist/lib/mineru.client.d.ts.map +1 -0
- package/dist/lib/mineru.client.js +456 -0
- package/dist/lib/mineru.controller.d.ts +9 -0
- package/dist/lib/mineru.controller.d.ts.map +1 -0
- package/dist/lib/mineru.controller.js +41 -0
- package/dist/lib/mineru.plugin.d.ts +13 -0
- package/dist/lib/mineru.plugin.d.ts.map +1 -0
- package/dist/lib/mineru.plugin.js +52 -0
- package/dist/lib/mineru.tool.d.ts +61 -0
- package/dist/lib/mineru.tool.d.ts.map +1 -0
- package/dist/lib/mineru.tool.js +132 -0
- package/dist/lib/mineru.toolset.d.ts +40 -0
- package/dist/lib/mineru.toolset.d.ts.map +1 -0
- package/dist/lib/mineru.toolset.js +47 -0
- package/dist/lib/path-meta.d.ts +5 -0
- package/dist/lib/path-meta.d.ts.map +1 -0
- package/dist/lib/path-meta.js +8 -0
- package/dist/lib/result-parser.service.d.ts +18 -0
- package/dist/lib/result-parser.service.d.ts.map +1 -0
- package/dist/lib/result-parser.service.js +142 -0
- package/dist/lib/transformer-mineru.strategy.d.ts +95 -0
- package/dist/lib/transformer-mineru.strategy.d.ts.map +1 -0
- package/dist/lib/transformer-mineru.strategy.js +163 -0
- package/dist/lib/types.d.ts +40 -0
- package/dist/lib/types.d.ts.map +1 -0
- package/dist/lib/types.js +27 -0
- package/package.json +60 -0
package/README.md
ADDED
@@ -0,0 +1,101 @@
# Xpert Plugin: MinerU

`@xpert-ai/plugin-mineru` is a MinerU document converter plugin for the [Xpert AI](https://github.com/xpert-ai/xpert) platform. It extracts Markdown and structured JSON from PDF files. The plugin ships with a MinerU integration strategy, a document conversion (transformer) strategy, and a result parsing service. Together they let automated workflows call the MinerU API with securely stored credentials, poll task status, and write the parsed content and attachment resources to the platform file system.

## Installation

```bash
pnpm add @xpert-ai/plugin-mineru
# or
npm install @xpert-ai/plugin-mineru
```

> **Note**: This plugin declares `@xpert-ai/plugin-sdk`, `@nestjs/common@^11`, `@nestjs/config@^4`, `@metad/contracts`, `axios@1`, `chalk@4`, `@langchain/core@^0.3.72`, and `uuid@8` as peerDependencies. Make sure these packages are installed in your host project.

## Quick Start

1. **Prepare MinerU Credentials**
   Obtain a valid API Key from the MinerU dashboard and confirm the service address (default: `https://mineru.net/api/v4`).

2. **Configure Integration in Xpert**
   - Via the Xpert Console: create a MinerU integration and fill in the fields described in **MinerU Integration Options** below.
   - Or set environment variables in your deployment environment:
     - `MINERU_API_BASE_URL`: Optional, defaults to `https://mineru.net/api/v4`.
     - `MINERU_API_TOKEN`: Required when no integration is configured, because it is then used as the fallback credential.

   Example integration configuration (JSON):

   ```json
   {
     "provider": "mineru",
     "options": {
       "apiUrl": "https://mineru.net/api/v4",
       "apiKey": "your-mineru-api-key"
     }
   }
   ```

3. **Register the Plugin**
   Add the plugin to your host service's plugin registration, for example through the `PLUGINS` variable in `.env`:

   ```sh .env
   PLUGINS=@xpert-ai/plugin-mineru
   ```

   The plugin's `register` hook returns the NestJS module `MinerUPlugin` as a global module, and its `onStart`/`onStop` hooks log lifecycle messages.
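
The lifecycle in step 3 can be illustrated with a small self-contained sketch. The `PluginContext` and `PluginDefinition` types, the `FakeMinerUModule` placeholder, and the `runLifecycle` host driver are hypothetical stand-ins written for this example. They only mirror the `register`/`onStart`/`onStop` shape visible in `dist/index.js` and are not the `@xpert-ai/plugin-sdk` API.

```typescript
// Hypothetical, simplified stand-ins for the plugin SDK types
// (assumption: the real XpertPlugin type carries more fields, e.g. meta and config).
interface PluginLogger {
  log(message: string): void;
}

interface PluginContext {
  logger: PluginLogger;
}

interface PluginDefinition {
  register(ctx: PluginContext): { module: unknown; global: boolean };
  onStart?(ctx: PluginContext): Promise<void>;
  onStop?(ctx: PluginContext): Promise<void>;
}

// Placeholder for the NestJS module class that `register` hands back.
class FakeMinerUModule {}

const plugin: PluginDefinition = {
  register(ctx) {
    ctx.logger.log('register mineru transformer plugin');
    return { module: FakeMinerUModule, global: true };
  },
  async onStart(ctx) {
    ctx.logger.log('mineru transformer plugin started');
  },
  async onStop(ctx) {
    ctx.logger.log('mineru transformer plugin stopped');
  },
};

// Hypothetical host driver: register, then start, then stop, collecting log lines.
async function runLifecycle(p: PluginDefinition): Promise<{ logs: string[]; global: boolean }> {
  const logs: string[] = [];
  const ctx: PluginContext = { logger: { log: (message) => logs.push(message) } };
  const registration = p.register(ctx);
  await p.onStart?.(ctx);
  await p.onStop?.(ctx);
  return { logs, global: registration.global };
}

runLifecycle(plugin).then(({ logs }) => console.log(logs.join('\n')));
```

A real host also imports the returned module into its NestJS application between `register` and `onStart`. The sketch leaves that step out.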

## MinerU Integration Options

| Field      | Type   | Description                                         | Required | Default                     |
| ---------- | ------ | --------------------------------------------------- | -------- | --------------------------- |
| apiUrl     | string | MinerU API base URL                                 | No       | `https://mineru.net/api/v4` |
| apiKey     | string | MinerU service API Key (keep secret)                | Yes      | —                           |
| serverType | string | Deployment type: `official` API or `self-hosted`    | No       | `official`                  |

> If both integration configuration and environment variables are provided, options from the integration configuration take precedence.
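
The precedence rule above, combined with the environment-variable fallback from the Quick Start, can be sketched as a pure function. This is an illustration under assumptions, not the plugin's `MinerUClient` logic, whose credential resolution is private and not shown here. The `resolveMinerUOptions` name, the `EnvLike` type, the field-by-field precedence, and the trailing-slash normalization are all introduced for this example.

```typescript
// Fields from the "MinerU Integration Options" table.
interface MinerUIntegrationOptions {
  apiUrl?: string;
  apiKey?: string;
  serverType?: 'official' | 'self-hosted';
}

// Hypothetical: a plain record standing in for process.env or ConfigService.
type EnvLike = Record<string, string | undefined>;

interface ResolvedOptions {
  apiUrl: string;
  apiKey: string;
  serverType: 'official' | 'self-hosted';
}

const DEFAULT_API_URL = 'https://mineru.net/api/v4';

function resolveMinerUOptions(
  integration: MinerUIntegrationOptions | undefined,
  env: EnvLike,
): ResolvedOptions {
  // Integration values win. Environment variables are only a fallback
  // (assumption: precedence is applied field by field).
  const apiUrl = integration?.apiUrl || env.MINERU_API_BASE_URL || DEFAULT_API_URL;
  const apiKey = integration?.apiKey || env.MINERU_API_TOKEN;
  if (!apiKey) {
    throw new Error('MinerU API key missing: configure an integration or set MINERU_API_TOKEN');
  }
  return {
    // Assumption: strip trailing slashes so endpoint paths can be appended safely.
    apiUrl: apiUrl.replace(/\/+$/, ''),
    apiKey,
    serverType: integration?.serverType ?? 'official',
  };
}

const resolved = resolveMinerUOptions(
  { apiKey: 'from-integration' },
  { MINERU_API_TOKEN: 'from-env', MINERU_API_BASE_URL: 'https://example.internal/api/' },
);
// The key comes from the integration. The URL falls back to the environment.
console.log(resolved);
```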

## Document Conversion Parameters

`MinerUTransformerStrategy` supports the following configuration options, which are passed to the MinerU API when a task is created:

| Field            | Type    | Default      | Description                                         |
| ---------------- | ------- | ------------ | --------------------------------------------------- |
| `isOcr`          | boolean | `true`       | Enable OCR for image-based PDFs.                    |
| `enableFormula`  | boolean | `true`       | Recognize mathematical formulas and output tags.    |
| `enableTable`    | boolean | `true`       | Recognize tables and output structured tags.        |
| `language`       | string  | `"ch"`       | Main document language, per MinerU API (`en`/`ch`). |
| `modelVersion`   | string  | `"pipeline"` | MinerU model version (`pipeline`, `vlm`, etc.).     |
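
The following caller-side sketch shows one way to merge the defaults in this table with per-file overrides before a task is created. The `TransformerOptions` type and the `withDefaults` helper are hypothetical. Only the field names and default values come from the table.

```typescript
interface TransformerOptions {
  isOcr: boolean;
  enableFormula: boolean;
  enableTable: boolean;
  language: string;
  modelVersion: string;
}

// Default values copied from the "Document Conversion Parameters" table.
const DEFAULT_TRANSFORMER_OPTIONS: Readonly<TransformerOptions> = {
  isOcr: true,
  enableFormula: true,
  enableTable: true,
  language: 'ch',
  modelVersion: 'pipeline',
};

// Hypothetical helper: explicit overrides win, and `undefined` keeps the default.
function withDefaults(overrides: Partial<TransformerOptions> = {}): TransformerOptions {
  const merged: TransformerOptions = { ...DEFAULT_TRANSFORMER_OPTIONS };
  for (const key of Object.keys(overrides) as (keyof TransformerOptions)[]) {
    const value = overrides[key];
    if (value !== undefined) {
      (merged as Record<keyof TransformerOptions, unknown>)[key] = value;
    }
  }
  return merged;
}

const englishVlm = withDefaults({ language: 'en', modelVersion: 'vlm', isOcr: undefined });
console.log(englishVlm);
```

The explicit loop replaces a plain `{ ...defaults, ...overrides }` spread. With a spread, a key that is present but set to `undefined` would erase the default.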

By default, the plugin creates a MinerU task for each file to be processed, polls until the task returns `full_zip_url`, and then downloads and parses the resulting zip package in memory.
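
The polling step can be sketched as a generic loop with an injected status function, so it runs without network access. `pollUntilZipReady` and `FetchResult` are hypothetical names. The result shape loosely mirrors the `getTaskResult` return type in `mineru.client.d.ts`. The `'failed'` status string and the timeout and interval values are assumptions, not documented MinerU behavior.

```typescript
interface TaskResult {
  full_zip_url?: string;
  status?: string;
}

type FetchResult = (taskId: string) => Promise<TaskResult>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until `full_zip_url` appears, the task reports failure, or time runs out.
async function pollUntilZipReady(
  taskId: string,
  fetchResult: FetchResult,
  timeoutMs = 60_000,
  intervalMs = 2_000,
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await fetchResult(taskId);
    if (result.full_zip_url) {
      return result.full_zip_url;
    }
    // Assumption: a 'failed' status is terminal.
    if (result.status === 'failed') {
      throw new Error(`MinerU task ${taskId} failed`);
    }
    // Give up if the next poll would land past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Timed out waiting for MinerU task ${taskId}`);
    }
    await sleep(intervalMs);
  }
}

// Usage with a fake backend that becomes ready on the third call.
let calls = 0;
const fakeFetch: FetchResult = async () => {
  calls += 1;
  return calls < 3
    ? { status: 'running' }
    : { status: 'done', full_zip_url: 'https://example.com/result.zip' };
};

pollUntilZipReady('task-1', fakeFetch, 1_000, 10).then((url) => console.log(url));
```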

## Permissions

- **Integration**: Read the MinerU integration configuration (API address and credentials).
- **File System**: Perform `read`, `write`, and `list` operations on `XpFileSystem` to store image resources from MinerU results.

Make sure your authorization policy grants the plugin these permissions. Otherwise it cannot retrieve results or write attachments.
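
These permissions correspond to the `permissions` array that `MinerUToolsetStrategy` declares in `dist/lib/mineru-toolset.strategy.js`. The sketch below restates that array with simplified local types. It also adds a hypothetical `missingOperations` check that a host policy might run. The types, the `'mineru'` service string, and the checker are assumptions, not the `@xpert-ai/plugin-sdk` definitions.

```typescript
// Simplified local stand-ins for IntegrationPermission and FileSystemPermission.
type FsOperation = 'read' | 'write' | 'list';

interface IntegrationPermission {
  type: 'integration';
  service: string;
  description?: string;
}

interface FileSystemPermission {
  type: 'filesystem';
  operations: FsOperation[];
  scope: string[];
}

type Permission = IntegrationPermission | FileSystemPermission;

// Mirrors the declaration in mineru-toolset.strategy.js. 'mineru' is a
// placeholder for the MinerU constant exported from types.js.
const requested: Permission[] = [
  { type: 'integration', service: 'mineru', description: 'Access to MinerU system integrations' },
  { type: 'filesystem', operations: ['read', 'write', 'list'], scope: [] },
];

// Hypothetical policy check: which requested file-system operations are not granted?
function missingOperations(perms: Permission[], granted: FsOperation[]): FsOperation[] {
  const missing: FsOperation[] = [];
  for (const perm of perms) {
    if (perm.type !== 'filesystem') continue;
    for (const op of perm.operations) {
      if (granted.indexOf(op) === -1 && missing.indexOf(op) === -1) {
        missing.push(op);
      }
    }
  }
  return missing;
}

// Under a read/list-only policy the plugin could not write attachments.
console.log(missingOperations(requested, ['read', 'list']));
```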

## Output Content

The parser generates:

- Full Markdown: resource links are automatically rewritten to point to the actual URLs of the files written via `XpFileSystem`.
- Structured metadata: the MinerU task ID, layout JSON (`layout.json`), content list (`content_list.json`), original PDF filename, and more.
- Attachment asset list: records the image resources that were written, so callers can associate them with the document.
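
The link rewriting in the first bullet can be sketched as a pure string transform. The sketch assumes two things. First, the Markdown references images through relative paths such as `images/a1.jpg`. Second, the file-system writes have already produced a map from those paths to public URLs. `rewriteImageLinks` and the example URLs are hypothetical, and the plugin's own result parser may work differently.

```typescript
// Rewrite `![alt](path)` image links whose path appears in `urls`; leave all others untouched.
// `urls` maps the relative path used in MinerU's Markdown to the URL returned
// after the file was written (hypothetical input).
function rewriteImageLinks(markdown: string, urls: Map<string, string>): string {
  return markdown.replace(/!\[([^\]]*)\]\(([^)\s]+)\)/g, (match: string, alt: string, path: string) => {
    const url = urls.get(path);
    return url !== undefined ? `![${alt}](${url})` : match;
  });
}

const md = '# Report\n\n![figure 1](images/a1.jpg)\n\nSee ![logo](https://cdn.example.com/logo.png).';
const rewritten = rewriteImageLinks(
  md,
  new Map([['images/a1.jpg', 'https://files.example.com/mineru/task-1/images/a1.jpg']]),
);
console.log(rewritten);
```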

The returned `Document<ChunkMetadata>` array currently contains a single chunk holding the full Markdown. Split it further as needed.
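
The full Markdown arrives as one chunk, so callers that need finer retrieval units can split it themselves. The sketch below splits on level-1 and level-2 headings. `Chunk` is a simplified stand-in for LangChain's `Document` type, modeling only `pageContent` and `metadata`. `splitByHeadings` and its `chunkIndex`/`heading` metadata keys are hypothetical. The heading-based strategy is just one option.

```typescript
// Simplified stand-in for a LangChain Document with chunk metadata.
interface Chunk {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Split one full-Markdown chunk at `#` / `##` headings. The original metadata is kept,
// and each part records its section index and heading text.
function splitByHeadings(doc: Chunk): Chunk[] {
  const sections: string[][] = [];
  let current: string[] = [];
  for (const line of doc.pageContent.split('\n')) {
    // Start a new section at an H1/H2 heading, unless nothing has been collected yet.
    if (/^#{1,2}\s/.test(line) && current.length > 0) {
      sections.push(current);
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) {
    sections.push(current);
  }
  return sections
    .map((section) => section.join('\n').trim())
    .filter((text) => text.length > 0)
    .map((text, index) => ({
      pageContent: text,
      metadata: {
        ...doc.metadata,
        chunkIndex: index,
        heading: /^#{1,2}\s(.*)/.exec(text)?.[1] ?? null,
      },
    }));
}

const full: Chunk = {
  pageContent: 'Intro text\n\n# Methods\nStep one.\n\n## Details\nMore.\n',
  metadata: { taskId: 'task-1' },
};
console.log(splitByHeadings(full).map((c) => c.metadata.heading));
```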

## Development & Debugging

Run the following commands in the repository root to build and test locally:

```bash
npm install
npx nx build @xpert-ai/plugin-mineru
npx nx test @xpert-ai/plugin-mineru
```

TypeScript build artifacts are written to `packages/mineru/dist`. Before publishing, make sure `package.json`, the type declarations, and the runtime files are in sync.

## License

This project is licensed under the [AGPL-3.0 License](../../../LICENSE) found in the repository root.
package/dist/index.d.ts
ADDED
@@ -0,0 +1,6 @@
import { z } from 'zod';
import type { XpertPlugin } from '@xpert-ai/plugin-sdk';
declare const ConfigSchema: z.ZodObject<{}, "strip", z.ZodTypeAny, {}, {}>;
declare const plugin: XpertPlugin<z.infer<typeof ConfigSchema>>;
export default plugin;
//# sourceMappingURL=index.d.ts.map
package/dist/index.d.ts.map
ADDED
@@ -0,0 +1 @@
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,CAAC,EAAE,MAAM,KAAK,CAAC;AACxB,OAAO,KAAK,EAAE,WAAW,EAAE,MAAM,sBAAsB,CAAC;AAcxD,QAAA,MAAM,YAAY,gDAChB,CAAC;AAEH,QAAA,MAAM,MAAM,EAAE,WAAW,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,YAAY,CAAC,CA4BrD,CAAC;AAEF,eAAe,MAAM,CAAC"}
package/dist/index.js
ADDED
@@ -0,0 +1,39 @@
import { z } from 'zod';
import { readFileSync } from 'fs';
import { join } from 'path';
import { MinerUPlugin } from './lib/mineru.plugin.js';
import { icon } from './lib/types.js';
import { getModuleMeta } from './lib/path-meta.js';
const { __filename, __dirname } = getModuleMeta(import.meta);
const packageJson = JSON.parse(readFileSync(join(__dirname, '../package.json'), 'utf8'));
const ConfigSchema = z.object({});
const plugin = {
    meta: {
        name: packageJson.name,
        version: packageJson.version,
        category: 'set',
        icon: {
            type: 'svg',
            value: icon
        },
        displayName: 'MinerU Transformer',
        description: 'Provide PDF to Markdown and JSON transformation functionality',
        keywords: ['integration', 'pdf', 'markdown', 'json', 'transformer'],
        author: 'XpertAI Team',
        homepage: 'https://www.npmjs.com/package/@xpert-ai/plugin-mineru',
    },
    config: {
        schema: ConfigSchema,
    },
    register(ctx) {
        ctx.logger.log('register mineru transformer plugin');
        return { module: MinerUPlugin, global: true };
    },
    async onStart(ctx) {
        ctx.logger.log('mineru transformer plugin started');
    },
    async onStop(ctx) {
        ctx.logger.log('mineru transformer plugin stopped');
    },
};
export default plugin;
package/dist/lib/integration.strategy.d.ts
ADDED
@@ -0,0 +1,10 @@
import { type IIntegration, TIntegrationProvider } from '@metad/contracts';
import { IntegrationStrategy, TIntegrationStrategyParams } from '@xpert-ai/plugin-sdk';
import { MinerUIntegrationOptions } from './types.js';
export declare class MinerUIntegrationStrategy implements IntegrationStrategy<MinerUIntegrationOptions> {
    readonly meta: TIntegrationProvider;
    private readonly configService;
    execute(integration: IIntegration<MinerUIntegrationOptions>, payload: TIntegrationStrategyParams): Promise<any>;
    validateConfig(config: MinerUIntegrationOptions): Promise<void>;
}
//# sourceMappingURL=integration.strategy.d.ts.map
package/dist/lib/integration.strategy.d.ts.map
ADDED
@@ -0,0 +1 @@
{"version":3,"file":"integration.strategy.d.ts","sourceRoot":"","sources":["../../src/lib/integration.strategy.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,KAAK,YAAY,EAAE,oBAAoB,EAAE,MAAM,kBAAkB,CAAC;AAO3E,OAAO,EACL,mBAAmB,EAGnB,0BAA0B,EAC3B,MAAM,sBAAsB,CAAC;AAE9B,OAAO,EAAgB,wBAAwB,EAAE,MAAM,YAAY,CAAC;AAEpE,qBAEa,yBACX,YAAW,mBAAmB,CAAC,wBAAwB,CAAC;IAExD,QAAQ,CAAC,IAAI,EAAE,oBAAoB,CAsEjC;IAGF,OAAO,CAAC,QAAQ,CAAC,aAAa,CAAgB;IAExC,OAAO,CACX,WAAW,EAAE,YAAY,CAAC,wBAAwB,CAAC,EACnD,OAAO,EAAE,0BAA0B,GAClC,OAAO,CAAC,GAAG,CAAC;IAIT,cAAc,CAAC,MAAM,EAAE,wBAAwB,GAAG,OAAO,CAAC,IAAI,CAAC;CA2BtE"}
package/dist/lib/integration.strategy.js
ADDED
@@ -0,0 +1,118 @@
import { __decorate, __metadata } from "tslib";
import { forwardRef, Inject, Injectable, } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { IntegrationStrategyKey, } from '@xpert-ai/plugin-sdk';
import { MinerUClient } from './mineru.client.js';
import { icon, MinerU } from './types.js';
let MinerUIntegrationStrategy = class MinerUIntegrationStrategy {
    constructor() {
        this.meta = {
            name: MinerU,
            label: {
                en_US: 'MinerU',
            },
            description: {
                en_US: 'MinerU is a tool that converts PDFs into machine-readable formats (e.g., markdown, JSON), allowing for easy extraction into any format. ',
                zh_Hans: 'MinerU 是一种将 PDF 转换为机器可读格式(例如 markdown、JSON)的工具,可以轻松提取为任何格式。',
            },
            icon: {
                type: 'svg',
                value: icon,
                color: '#4CAF50',
            },
            schema: {
                type: 'object',
                properties: {
                    apiUrl: {
                        type: 'string',
                        title: {
                            en_US: 'Base URL',
                        },
                        description: {
                            en_US: 'https://api.mineru.dev',
                            ja_JP: 'MinerUサーバのBase URLを入力してください',
                            zh_Hans: '请输入你的 MinerU 服务的 Base URL',
                        },
                    },
                    apiKey: {
                        type: 'string',
                        title: {
                            en_US: 'API Key',
                        },
                        description: {
                            en_US: 'The API Key of the MinerU server',
                            ja_JP: 'MinerUサーバのトークンを入力してください',
                            zh_Hans: '请输入你的 MinerU 服务的令牌',
                        },
                        'x-ui': {
                            component: 'secretInput',
                            label: 'API Key',
                            placeholder: 'MinerU API Key',
                            revealable: true,
                            maskSymbol: '*',
                            persist: true,
                        },
                    },
                    serverType: {
                        type: 'string',
                        title: {
                            en_US: 'Server Type',
                            ja_JP: 'サーバータイプ',
                            zh_Hans: '服务类型',
                        },
                        description: {
                            en_US: 'Please select MinerU service type, local deployment or official API',
                            ja_JP: 'MinerUサービスのタイプを選択してください、ローカルデプロイまたは公式API',
                            zh_Hans: '请选择MinerU服务类型,本地部署或官方API',
                        },
                        enum: ['official', 'self-hosted'],
                        default: 'official',
                    },
                },
            },
            features: [],
            helpUrl: 'https://mineru.net/apiManage/docs',
        };
    }
    async execute(integration, payload) {
        throw new Error('Method not implemented.');
    }
    async validateConfig(config) {
        const mineruClient = new MinerUClient(this.configService, {
            integration: {
                provider: MinerU,
                options: config,
            },
        });
        if (mineruClient.serverType === 'official') {
            try {
                await mineruClient.validateOfficialApiToken();
            }
            catch (error) {
                console.error(`MinerU integration validation error:`);
                console.error(error);
                throw error;
            }
        }
        else {
            // Self-hosted MinerU validation logic: access openapi.json
            try {
                await mineruClient.getSelfHostedOpenApiSpec();
            }
            catch (error) {
                console.error(`MinerU self-hosted integration validation error:`);
                console.error(error);
                throw error;
            }
        }
    }
};
__decorate([
    Inject(forwardRef(() => ConfigService)),
    __metadata("design:type", ConfigService)
], MinerUIntegrationStrategy.prototype, "configService", void 0);
MinerUIntegrationStrategy = __decorate([
    Injectable(),
    IntegrationStrategyKey(MinerU)
], MinerUIntegrationStrategy);
export { MinerUIntegrationStrategy };
package/dist/lib/mineru-toolset.strategy.d.ts
ADDED
@@ -0,0 +1,69 @@
import { ConfigService } from '@nestjs/config';
import { BuiltinToolset, IToolsetStrategy, IntegrationPermission, FileSystemPermission } from '@xpert-ai/plugin-sdk';
import { MinerUResultParserService } from './result-parser.service.js';
import { MinerUToolsetConfig } from './mineru.toolset.js';
/**
 * ToolsetStrategy for MinerU PDF parser tool
 * Registers MinerU as a toolset that can be used in agent workflows
 */
export declare class MinerUToolsetStrategy implements IToolsetStrategy<MinerUToolsetConfig> {
    private readonly configService;
    private readonly resultParser;
    /**
     * Metadata for MinerU toolset
     */
    meta: {
        author: string;
        tags: string[];
        name: string;
        label: {
            en_US: string;
            zh_Hans: string;
        };
        description: {
            en_US: string;
            zh_Hans: string;
        };
        icon: {
            svg: string;
            color: string;
        };
        configSchema: {
            type: string;
            properties: {
                integration: {
                    type: string;
                    title: {
                        en_US: string;
                        zh_Hans: string;
                    };
                    description: {
                        en_US: string;
                        zh_Hans: string;
                    };
                };
            };
            required: string[];
        };
    };
    /**
     * Permissions required by MinerU toolset
     */
    readonly permissions: (IntegrationPermission | FileSystemPermission)[];
    constructor(configService: ConfigService, resultParser: MinerUResultParserService);
    /**
     * Validate toolset configuration
     */
    validateConfig(config: MinerUToolsetConfig): Promise<void>;
    /**
     * Create MinerU toolset instance
     */
    create(config: MinerUToolsetConfig): Promise<BuiltinToolset>;
    /**
     * Create tools for MinerU toolset
     * Tools are created dynamically in MinerUToolset.initTools()
     * based on the integration configuration
     */
    createTools(): any[];
}
//# sourceMappingURL=mineru-toolset.strategy.d.ts.map
package/dist/lib/mineru-toolset.strategy.d.ts.map
ADDED
@@ -0,0 +1 @@
{"version":3,"file":"mineru-toolset.strategy.d.ts","sourceRoot":"","sources":["../../src/lib/mineru-toolset.strategy.ts"],"names":[],"mappings":"AACA,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAC/C,OAAO,EACL,cAAc,EACd,gBAAgB,EAEhB,qBAAqB,EACrB,oBAAoB,EACrB,MAAM,sBAAsB,CAAC;AAC9B,OAAO,EAAE,yBAAyB,EAAE,MAAM,4BAA4B,CAAC;AACvE,OAAO,EAAiB,mBAAmB,EAAE,MAAM,qBAAqB,CAAC;AAGzE;;;GAGG;AACH,qBAEa,qBAAsB,YAAW,gBAAgB,CAAC,mBAAmB,CAAC;IA0D/E,OAAO,CAAC,QAAQ,CAAC,aAAa;IAE9B,OAAO,CAAC,QAAQ,CAAC,YAAY;IA3D/B;;OAEG;IACH,IAAI;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;MAkCF;IAEF;;OAEG;IACH,QAAQ,CAAC,WAAW,mDAWlB;gBAIiB,aAAa,EAAE,aAAa,EAE5B,YAAY,EAAE,yBAAyB;IAG1D;;OAEG;IACH,cAAc,CAAC,MAAM,EAAE,mBAAmB,GAAG,OAAO,CAAC,IAAI,CAAC;IAO1D;;OAEG;IACG,MAAM,CAAC,MAAM,EAAE,mBAAmB,GAAG,OAAO,CAAC,cAAc,CAAC;IAUlE;;;;OAIG;IACH,WAAW;CAKZ"}
package/dist/lib/mineru-toolset.strategy.js
ADDED
@@ -0,0 +1,109 @@
import { __decorate, __metadata, __param } from "tslib";
import { Injectable, forwardRef, Inject } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { ToolsetStrategy, } from '@xpert-ai/plugin-sdk';
import { MinerUResultParserService } from './result-parser.service.js';
import { MinerUToolset } from './mineru.toolset.js';
import { MinerU, icon } from './types.js';
/**
 * ToolsetStrategy for MinerU PDF parser tool
 * Registers MinerU as a toolset that can be used in agent workflows
 */
let MinerUToolsetStrategy = class MinerUToolsetStrategy {
    constructor(configService, resultParser) {
        this.configService = configService;
        this.resultParser = resultParser;
        /**
         * Metadata for MinerU toolset
         */
        this.meta = {
            author: 'Xpert AI',
            tags: ['pdf', 'markdown', 'parser', 'ocr', 'mineru', 'document', 'extraction'],
            name: MinerU,
            label: {
                en_US: 'MinerU PDF Parser',
                zh_Hans: 'MinerU PDF 解析器',
            },
            description: {
                en_US: 'Convert PDF files to markdown format using MinerU. Supports OCR, formula recognition, and table extraction.',
                zh_Hans: '使用 MinerU 将 PDF 文件转换为 Markdown 格式。支持 OCR、公式识别和表格提取。',
            },
            icon: {
                svg: icon,
                color: '#14b8a6',
            },
            configSchema: {
                type: 'object',
                properties: {
                    integration: {
                        type: 'object',
                        title: {
                            en_US: 'MinerU Integration',
                            zh_Hans: 'MinerU 集成',
                        },
                        description: {
                            en_US: 'MinerU integration configuration',
                            zh_Hans: 'MinerU 集成配置',
                        },
                    },
                },
                required: ['integration'],
            },
        };
        /**
         * Permissions required by MinerU toolset
         */
        this.permissions = [
            {
                type: 'integration',
                service: MinerU,
                description: 'Access to MinerU system integrations',
            },
            {
                type: 'filesystem',
                operations: ['read', 'write', 'list'],
                scope: [],
            },
        ];
    }
    /**
     * Validate toolset configuration
     */
    validateConfig(config) {
        if (!config.integration) {
            throw new Error('MinerU integration is required');
        }
        return Promise.resolve();
    }
    /**
     * Create MinerU toolset instance
     */
    async create(config) {
        // Inject dependencies into config
        const configWithDependencies = {
            ...config,
            configService: this.configService,
            resultParser: this.resultParser,
        };
        return new MinerUToolset(configWithDependencies);
    }
    /**
     * Create tools for MinerU toolset
     * Tools are created dynamically in MinerUToolset.initTools()
     * based on the integration configuration
     */
    createTools() {
        // Tools are created dynamically in MinerUToolset.initTools()
        // based on the integration configuration
        return [];
    }
};
MinerUToolsetStrategy = __decorate([
    Injectable(),
    ToolsetStrategy(MinerU),
    __param(0, Inject(forwardRef(() => ConfigService))),
    __param(1, Inject(MinerUResultParserService)),
    __metadata("design:paramtypes", [ConfigService,
        MinerUResultParserService])
], MinerUToolsetStrategy);
export { MinerUToolsetStrategy };
package/dist/lib/mineru.client.d.ts
ADDED
@@ -0,0 +1,120 @@
import { IIntegration } from '@metad/contracts';
import { ConfigService } from '@nestjs/config';
import { XpFileSystem } from '@xpert-ai/plugin-sdk';
import { AxiosResponse } from 'axios';
import { MinerUIntegrationOptions, MineruSelfHostedTaskResult, MinerUServerType } from './types.js';
interface CreateTaskOptions {
    url?: string;
    filePath?: string;
    fileName?: string;
    isOcr?: boolean;
    enableFormula?: boolean;
    enableTable?: boolean;
    language?: string;
    modelVersion?: string;
    dataId?: string;
    pageRanges?: string;
    extraFormats?: string[];
    callbackUrl?: string;
    seed?: string;
    /** Optional parse method used by self-hosted MinerU deployments */
    parseMethod?: string;
    /** Optional backend identifier used by self-hosted MinerU deployments */
    backend?: string;
    /** Optional mineru backend server url (used when backend is VLM client) */
    serverUrl?: string;
    /** Whether to request intermediate JSON payloads from self-hosted MinerU */
    returnMiddleJson?: boolean;
}
interface CreateBatchTaskFile {
    url: string;
    isOcr?: boolean;
    dataId?: string;
    pageRanges?: string;
}
interface CreateBatchTaskOptions {
    files: CreateBatchTaskFile[];
    enableFormula?: boolean;
    enableTable?: boolean;
    language?: string;
    modelVersion?: string;
    extraFormats?: string[];
    callbackUrl?: string;
    seed?: string;
}
interface TaskResultOptions {
    enableFormula?: boolean;
    enableTable?: boolean;
    language?: string;
}
export declare class MinerUClient {
    private readonly configService;
    private readonly permissions?;
    private readonly logger;
    private readonly baseUrl;
    private readonly token?;
    readonly serverType: MinerUServerType;
    private readonly localTasks;
    get fileSystem(): XpFileSystem | undefined;
    constructor(configService: ConfigService, permissions?: {
        fileSystem?: XpFileSystem;
        integration?: Partial<IIntegration<MinerUIntegrationOptions>>;
    });
    /**
     * Create a MinerU extraction task. For self-hosted deployments the file will be uploaded immediately
     * and the parsed result cached locally, while official deployments follow the async task lifecycle.
     */
    createTask(options: CreateTaskOptions): Promise<{
        taskId: string;
    }>;
    /**
     * Create a batch MinerU extraction task. Only supported for official MinerU deployments.
     */
    createBatchTask(options: CreateBatchTaskOptions): Promise<{
        batchId: string;
        fileUrls?: string[];
    }>;
    getSelfHostedTask(taskId: string): MineruSelfHostedTaskResult | undefined;
    /**
     * Query official task status or results.
     */
    getTaskResult(taskId: string, options?: TaskResultOptions): Promise<{
        full_zip_url?: string;
        full_url?: string;
        content?: string;
        status?: string;
    }>;
    /**
     * Query batch task results. Only supported for official MinerU deployments.
     */
    getBatchResult(batchId: string): Promise<any>;
    /**
     * Wait for a task to complete and return the result when available.
     */
    waitForTask(taskId: string, timeoutMs?: number, intervalMs?: number): Promise<any>;
    private ensureOfficial;
    private resolveServerType;
    private resolveCredentials;
    private readIntegrationOptions;
    private normalizeBaseUrl;
    private buildApiUrl;
    private getOfficialHeaders;
    private getSelfHostedHeaders;
    private createOfficialTask;
    private createSelfHostedTask;
    private invokeSelfHostedParse;
    private invokeSelfHostedParseV1;
    private isSelfHostedApiV1;
    private normalizeSelfHostedResponse;
    private normalizeSelfHostedFileResult;
    private normalizeImageMap;
    private parseJsonSafe;
    private buildLanguageList;
    private booleanToString;
    private downloadFile;
    private extractFileName;
    getSelfHostedOpenApiSpec(): Promise<AxiosResponse<any, any>>;
    validateOfficialApiToken(): Promise<void>;
}
export {};
//# sourceMappingURL=mineru.client.d.ts.map
package/dist/lib/mineru.client.d.ts.map
ADDED
@@ -0,0 +1 @@
{"version":3,"file":"mineru.client.d.ts","sourceRoot":"","sources":["../../src/lib/mineru.client.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,YAAY,EAAE,MAAM,kBAAkB,CAAC;AAEhD,OAAO,EAAE,aAAa,EAAE,MAAM,gBAAgB,CAAC;AAC/C,OAAO,EAAmB,YAAY,EAAE,MAAM,sBAAsB,CAAC;AACrE,OAAc,EAAE,aAAa,EAAE,MAAM,OAAO,CAAC;AAK7C,OAAO,EAIL,wBAAwB,EAExB,0BAA0B,EAC1B,gBAAgB,EACjB,MAAM,YAAY,CAAC;AAIpB,UAAU,iBAAiB;IACzB,GAAG,CAAC,EAAE,MAAM,CAAC;IACb,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,YAAY,CAAC,EAAE,MAAM,EAAE,CAAC;IACxB,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,mEAAmE;IACnE,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,yEAAyE;IACzE,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,2EAA2E;IAC3E,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,4EAA4E;IAC5E,gBAAgB,CAAC,EAAE,OAAO,CAAC;CAC5B;AAED,UAAU,mBAAmB;IAC3B,GAAG,EAAE,MAAM,CAAC;IACZ,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,UAAU,CAAC,EAAE,MAAM,CAAC;CACrB;AAED,UAAU,sBAAsB;IAC9B,KAAK,EAAE,mBAAmB,EAAE,CAAC;IAC7B,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;IAClB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,YAAY,CAAC,EAAE,MAAM,EAAE,CAAC;IACxB,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,IAAI,CAAC,EAAE,MAAM,CAAC;CACf;AAED,UAAU,iBAAiB;IACzB,aAAa,CAAC,EAAE,OAAO,CAAC;IACxB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;CACnB;AASD,qBAAa,YAAY;IAWrB,OAAO,CAAC,QAAQ,CAAC,aAAa;IAC9B,OAAO,CAAC,QAAQ,CAAC,WAAW,CAAC;IAX/B,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAiC;IACxD,OAAO,CAAC,QAAQ,CAAC,OAAO,CAAS;IACjC,OAAO,CAAC,QAAQ,CAAC,KAAK,CAAC,CAAS;IAChC,SAAgB,UAAU,EAAE,gBAAgB,CAAC;IAC7C,OAAO,CAAC,QAAQ,CAAC,UAAU,CAAiD;IAE5E,IAAI,UAAU,IAAI,YAAY,GAAG,SAAS,CAEzC;gBAEkB,aAAa,EAAE,aAAa,EAC5B,WAAW,CAAC,EAAE;QACvB,UAAU,CAAC,EAAE,YAAY,CAAC;QAC1B,WAAW,CAAC,EAAE,OAAO,CAAC,YAAY,CAAC,wBAAwB,CAAC,CAAC,CAAC;KACjE;IAkBP;;;OAGG;IACG,UAAU,CAAC,OAAO,EAAE,iBAAiB,GAAG,OAAO,CAAC;QAAE,MAAM,EAAE,MAAM,CAAA;KAAE,CAAC;IAYzE;;OAEG;IACG,eAAe,CAAC,OAAO,EAAE,sBAAsB,GAAG,OAAO,CAAC;QAAE,OAAO,EAAE,MAAM,CAAC;QAAC,QAAQ,CAAC,EAAE,MAAM,EAAE,CAAA;KAAE,CAAC;IAmCzG,iBAAiB,CAAC,MAAM,EAAE,MAAM,GAAG,0BAA0B,GAAG,SAAS;IAOzE;;OAEG;IACG,aAAa,CAAC,MAAM,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,iBAAiB,GAAG,OAAO,CAAC;QACxE,YAAY,CAAC,EAAE,MAAM,CAAC;QACtB,QAAQ,CAAC,EAAE,MAAM,CAAC;QAClB,OAAO,CAAC,EAAE,MAAM,CAAC;QACjB,MAAM,CAAC,EAAE,MAAM,CAAC;KACjB,CAAC;IAoBF;;OAEG;IACG,cAAc,CAAC,OAAO,EAAE,MAAM,GAAG,OAAO,CAAC,GAAG,CAAC;IAiBnD;;OAEG;IACG,WAAW,CAAC,MAAM,EAAE,MAAM,EAAE,SAAS,SAAgB,EAAE,UAAU,SAAO,GAAG,OAAO,CAAC,GAAG,CAAC;IAsB7F,OAAO,CAAC,cAAc;IAMtB,OAAO,CAAC,iBAAiB;IAczB,OAAO,CAAC,kBAAkB;IAyB1B,OAAO,CAAC,sBAAsB;IAI9B,OAAO,CAAC,gBAAgB;IAIxB,OAAO,CAAC,WAAW;IAQnB,OAAO,CAAC,kBAAkB;IAO1B,OAAO,CAAC,oBAAoB;YAYd,kBAAkB;YA4BlB,oBAAoB;YASpB,qBAAqB;YA0DrB,uBAAuB;IA+CrC,OAAO,CAAC,iBAAiB;IAgBzB,OAAO,CAAC,2BAA2B;IAenC,OAAO,CAAC,6BAA6B;IAcrC,OAAO,CAAC,iBAAiB;IAQzB,OAAO,CAAC,aAAa;IAcrB,OAAO,CAAC,iBAAiB;IAQzB,OAAO,CAAC,eAAe;YAIT,YAAY;IAkB1B,OAAO,CAAC,eAAe;IA0BvB,wBAAwB,IAAI,OAAO,CAAC,aAAa,CAAC,GAAG,EAAE,GAAG,CAAC,CAAC;IAKtD,wBAAwB;CAU/B"}