multi-modal-mcp 0.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +129 -0
- package/dist/config.js +24 -0
- package/dist/index.js +71 -0
- package/dist/tools/ImageGenerationTool.js +58 -0
- package/dist/tools/MultiModalUnderstandingTool.js +135 -0
- package/dist/tools/TextGenerationTool.js +75 -0
- package/dist/tools/VideoGenerationTool.js +122 -0
- package/dist/utils/http.js +137 -0
- package/package.json +51 -0
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 橘子
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
package/README.md
ADDED
@@ -0,0 +1,129 @@
+# @juzi/multi-modal-mcp
+
+A multi-modal MCP server built on Zhipu AI, providing text generation, image generation, video generation, and multi-modal understanding.
+
+## Features
+
+- **Text generation**: based on the GLM-4.5-Flash model; supports chat, writing, translation, code generation, and other text generation tasks
+- **Multi-modal understanding**: based on the GLM-4.6V-Flash model; supports analysis and understanding of images, videos, documents, and other media types
+- **Image generation**: based on the Cogview-3-Flash model; generates high-quality images from text descriptions
+- **Video generation**: based on the CogVideoX-Flash model; generates high-quality videos from text descriptions
+
+## Installation
+
+```bash
+npm install -g @juzi/multi-modal-mcp
+```
+
+Or run it directly with npx:
+
+```bash
+npx @juzi/multi-modal-mcp@latest
+```
+
+## Configuration
+
+Before use, set your Zhipu AI API key:
+
+```bash
+export KEY="your-api-key"
+```
+
+Or configure the environment variable in MCP Inspector.
+
+## Usage
+
+### Run directly
+
+```bash
+npx @juzi/multi-modal-mcp@latest
+```
+
+### Debug with MCP Inspector
+
+```bash
+npx @modelcontextprotocol/inspector npx @juzi/multi-modal-mcp@latest
+```
+
+### Configure in an MCP client
+
+Add the following to the configuration file of Claude Desktop or another MCP client:
+
+```json
+{
+  "mcpServers": {
+    "multi-modal": {
+      "command": "npx",
+      "args": ["-y", "@juzi/multi-modal-mcp@latest"],
+      "env": {
+        "KEY": "your-api-key"
+      }
+    }
+  }
+}
+```
+
+## Tools
+
+### 1. text_generation
+
+Text generation tool based on the GLM-4.5-Flash model. It supports chat, writing, translation, code generation, and other text generation tasks. Thinking mode is supported and can show the model's reasoning process. The temperature parameter controls the randomness and creativity of the generated text.
+
+**Parameters**:
+
+- `messages`: message text, required
+- `thinking`: whether to enable thinking mode; allowed values: enabled/disabled, default disabled
+- `temperature`: temperature, controls the randomness of the generated text; range 0-1, default 1
+
+### 2. multi_modal_understanding
+
+Multi-modal understanding tool based on the GLM-4.6V-Flash model. It supports analysis and understanding of images, videos, documents, and other media types. It can perform OCR text recognition, table parsing, content analysis, defect detection, image-to-prompt conversion, video tag extraction, keyframe extraction, timeline generation, script generation, video Q&A, document Q&A, document comparison, and other tasks. Several media types can be processed in one call.
+
+**Parameters**:
+
+- `content`: list of media items; supports images, videos, and documents
+- `question`: question about the media content, required
+- `thinking`: whether to enable thinking mode; allowed values: enabled/disabled, default disabled
+- `temperature`: temperature, controls the randomness of the generated text; range 0-1, default 1
+
+### 3. image_generation
+
+Image generation tool based on the Cogview-3-Flash model; generates high-quality images from text descriptions. It supports multiple sizes, including landscape, portrait, and square aspect ratios. Watermarking is optional. Suitable for illustrations, design assets, scene generation, and similar use cases.
+
+**Parameters**:
+
+- `prompt`: text description of the image, required
+- `size`: image size; allowed values: 1024x1024/768x1344/864x1152/1344x768/1152x864/1440x720/720x1440, default 1024x1024
+- `watermark_enabled`: whether to add a watermark, default false
+
+### 4. video_generation
+
+Video generation tool based on the CogVideoX-Flash model; generates high-quality videos from text descriptions. It supports multiple resolutions, including 720p, 1080p, 2K, and 4K. Two output modes are available: quality-first and speed-first. AI sound effects and watermarking are optional. Processing is asynchronous, and the task status is polled automatically until completion.
+
+**Parameters**:
+
+- `prompt`: text description of the video, at most 512 characters, required
+- `quality`: output mode; allowed values: quality/speed, default speed
+- `withAudio`: whether to generate AI sound effects, default false
+- `watermarkEnabled`: whether to add a watermark, default false
+- `size`: video resolution; several options are supported, default 1024x1024
+- `fps`: video frame rate; allowed values: 30/60, default 30
+
+## Getting an API key
+
+1. Visit the [Zhipu AI Open Platform](https://open.bigmodel.cn/)
+2. Register and log in
+3. Create an API key in the console
+
+## License
+
+MIT License
+
+## Author
+
+橘子
+
+## Related links
+
+- [Zhipu AI Open Platform](https://open.bigmodel.cn/)
+- [Model Context Protocol](https://modelcontextprotocol.io/)
package/dist/config.js
ADDED
@@ -0,0 +1,24 @@
+/*
+ * @Author: 橘子
+ * @Project_description: Configuration file
+ * @Description: The code is copied, and I really don't understand it
+ */
+'use strict';
+export const config = {
+    /** API base URL */
+    baseUrl: 'https://open.bigmodel.cn/api/paas/v4',
+    /** Request timeout (milliseconds) */
+    timeout: 10 * 60 * 1000,
+    /** Text generation model */
+    textModel: 'GLM-4.5-Flash',
+    /** Image generation model */
+    imageModel: 'Cogview-3-Flash',
+    /** Visual understanding model */
+    visualModel: 'GLM-4.6V-Flash',
+    /** Video generation model */
+    videoModel: 'CogVideoX-Flash',
+    /** MCP server name */
+    serverName: 'multi-modal-mcp',
+    /** MCP server version */
+    serverVersion: '1.0.0',
+};
package/dist/index.js
ADDED
@@ -0,0 +1,71 @@
+#!/usr/bin/env node
+/*
+ * @Author: 橘子
+ * @Project_description: Main file of the multi-modal MCP server
+ * @Description: The code is copied, and I really don't understand it
+ */
+import { logger, MCPServer } from 'mcp-framework';
+import { config } from './config.js';
+/**
+ * MyMCPServer class
+ * Wraps the startup, run, and shutdown logic of the MCP server
+ */
+class MyMCPServer {
+    /** MCP server instance */
+    server;
+    /** Server name */
+    name = config.serverName;
+    /** Server version */
+    version = config.serverVersion;
+    /**
+     * Constructor
+     * Initializes the MCP server and registers signal handlers
+     */
+    constructor() {
+        // Create the MCP server instance with its name, version, and transport
+        this.server = new MCPServer({
+            name: this.name,
+            version: this.version,
+            transport: { type: 'stdio' },
+        });
+        // Register process signal listeners for graceful shutdown
+        process.on('SIGINT', () => this.shutdown()); // Ctrl+C signal
+        process.on('SIGTERM', () => this.shutdown()); // Termination signal
+    }
+    /**
+     * Start the server
+     * Starts the MCP server asynchronously and handles startup errors
+     */
+    async start() {
+        try {
+            // Start the MCP server
+            await this.server.start();
+            logger.info('Started successfully');
+        }
+        catch (error) {
+            // On startup failure, log the error and exit
+            logger.error(`Startup failed ${String(error)}`);
+            process.exit(1);
+        }
+    }
+    /**
+     * Shut down the server
+     * Shuts down the MCP server gracefully and handles shutdown errors
+     */
+    async shutdown() {
+        logger.info('Shutting down the server...');
+        try {
+            // Stop the MCP server
+            await this.server.stop();
+            // Exit normally
+            process.exit(0);
+        }
+        catch (error) {
+            // On shutdown failure, log the error and exit
+            logger.error(`Shutdown failed ${String(error)}`);
+            process.exit(1);
+        }
+    }
+}
+// Create the server instance and start it, catching any unhandled errors
+new MyMCPServer().start().catch(logger.error);
package/dist/tools/ImageGenerationTool.js
ADDED
@@ -0,0 +1,58 @@
+/*
+ * @Author: 橘子
+ * @Project_description: Zhipu AI image generation tool
+ * @Description: The code is copied, and I really don't understand it
+ */
+import { MCPTool, logger } from 'mcp-framework';
+import { z } from 'zod';
+import { http } from '../utils/http.js';
+import { config } from '../config.js';
+/**
+ * Zhipu AI image generation tool
+ */
+class ImageGenerationTool extends MCPTool {
+    /** Tool name */
+    name = 'image_generation';
+    /** Tool description */
+    description = 'Image generation tool based on the Cogview-3-Flash model; generates high-quality images from text descriptions. Supports multiple sizes, including landscape, portrait, and square aspect ratios. Watermarking is optional. Suitable for illustrations, design assets, scene generation, and similar use cases.';
+    /** Parameter schema definition */
+    schema = z.object({
+        prompt: z.string().describe('Text description of the desired image'),
+        size: z
+            .string()
+            .optional()
+            .default('1024x1024')
+            .describe('Image size. Recommended: 1024x1024 (default), 768x1344, 864x1152, 1344x768, 1152x864, 1440x720, 720x1440. Custom sizes must be between 512px and 2048px, divisible by 16, with a maximum pixel count of 2^21px'),
+        watermark_enabled: z
+            .boolean()
+            .optional()
+            .default(false)
+            .describe('Whether to add a watermark, default false'),
+    });
+    /**
+     * Run image generation
+     */
+    async execute(input) {
+        logger.info(`Starting image generation, prompt: ${input.prompt}, size: ${input.size || '1024x1024'}`);
+        try {
+            const requestData = {
+                model: config.imageModel,
+                prompt: input.prompt,
+                size: input.size || '1024x1024',
+                watermark_enabled: input.watermark_enabled ?? false,
+            };
+            logger.info('Calling the Zhipu AI image generation API');
+            const apiResponse = (await http.post('/images/generations', requestData));
+            const imageUrl = apiResponse.data?.[0]?.url;
+            if (!imageUrl)
+                throw new Error('Image generation failed, no image URL was returned');
+            logger.info('Image generated successfully');
+            return imageUrl;
+        }
+        catch (error) {
+            logger.error(`Image generation failed: ${error}`);
+            throw new Error(`An error occurred while generating the image: ${error}`);
+        }
+    }
+}
+export default ImageGenerationTool;
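The `size` description in the image tool lists constraints for custom sizes (512px to 2048px, divisible by 16, at most 2^21 pixels), while the schema itself accepts any string. A minimal sketch of a pre-flight check follows. The helper name `isValidImageSize` is hypothetical and not part of the package, and it assumes the 512 to 2048 range and the divisibility rule apply to each side.

```javascript
// Hypothetical helper (not part of the package): checks a "WIDTHxHEIGHT"
// string against the constraints quoted in the schema description.
// Assumption: the 512-2048 range and divisibility by 16 apply to each side.
const MAX_PIXELS = 2 ** 21; // 2,097,152 pixels

function isValidImageSize(size) {
    const match = /^(\d+)x(\d+)$/.exec(size);
    if (!match) return false;
    const width = Number(match[1]);
    const height = Number(match[2]);
    const sideOk = (n) => n >= 512 && n <= 2048 && n % 16 === 0;
    return sideOk(width) && sideOk(height) && width * height <= MAX_PIXELS;
}

console.log(isValidImageSize('1024x1024')); // true
console.log(isValidImageSize('2048x2048')); // false: 4,194,304 pixels exceeds 2^21
console.log(isValidImageSize('1000x1000')); // false: 1000 is not divisible by 16
```

Such a check could run before the request is sent, so that an invalid size fails locally instead of as an API parameter error.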
package/dist/tools/MultiModalUnderstandingTool.js
ADDED
@@ -0,0 +1,135 @@
+/*
+ * @Author: 橘子
+ * @Project_description: Zhipu AI multi-modal understanding tool
+ * @Description: The code is copied, and I really don't understand it
+ */
+import { MCPTool, logger } from 'mcp-framework';
+import { z } from 'zod';
+import { http } from '../utils/http.js';
+import { config } from '../config.js';
+/**
+ * Zhipu AI multi-modal understanding tool
+ */
+class MultiModalUnderstandingTool extends MCPTool {
+    /** Tool name */
+    name = 'multi_modal_understanding';
+    /** Tool description */
+    description = 'Multi-modal understanding tool based on the GLM-4.6V-Flash model. Supports analysis and understanding of images, videos, documents, and other media types. Can perform OCR text recognition, table parsing, content analysis, defect detection, image-to-prompt conversion, video tag extraction, keyframe extraction, timeline generation, script generation, video Q&A, document Q&A, document comparison, and other tasks. Several media types can be processed in one call.';
+    /** Parameter schema definition */
+    schema = z.object({
+        content: z
+            .array(z.object({
+                type: z
+                    .enum(['image_url', 'video_url', 'file_url'])
+                    .describe('Media type: image_url for images, video_url for videos, file_url for documents'),
+                image_url: z
+                    .object({
+                        url: z.string().describe('URL or Base64 encoding of the image'),
+                    })
+                    .optional()
+                    .describe('Image URL object, required when type is image_url'),
+                video_url: z
+                    .object({
+                        url: z.string().describe('URL of the video'),
+                    })
+                    .optional()
+                    .describe('Video URL object, required when type is video_url'),
+                file_url: z
+                    .object({
+                        url: z.string().describe('URL of the file'),
+                    })
+                    .optional()
+                    .describe('File URL object, required when type is file_url'),
+            }))
+            .describe('List of media items; images, videos, and documents can be mixed'),
+        question: z
+            .string()
+            .describe('Describe your request in natural language, for example: "recognize the text in the image", "extract the table data", "analyze the video content", "answer a question about the document"'),
+        thinking: z
+            .enum(['enabled', 'disabled'])
+            .default('disabled')
+            .describe('Whether to enable thinking mode; when enabled the model shows its reasoning process, default disabled'),
+        temperature: z
+            .number()
+            .default(1)
+            .describe('Temperature, controls the randomness of the generated text, range 0-1, default 1'),
+    });
+    /**
+     * Run multi-modal understanding
+     */
+    async execute(input) {
+        logger.info(`Starting multi-modal understanding, question: ${input.question}, media count: ${input.content.length}`);
+        try {
+            const messageContent = this.buildMessageContent(input.content, input.question);
+            const requestData = {
+                model: config.visualModel,
+                messages: [
+                    {
+                        role: 'user',
+                        content: messageContent,
+                    },
+                ],
+                thinking: {
+                    type: input.thinking,
+                },
+                temperature: input.temperature,
+                stream: false,
+            };
+            logger.info('Calling the Zhipu AI multi-modal understanding API');
+            const apiResponse = (await http.post('/chat/completions', requestData));
+            const messageContentResult = apiResponse.choices?.[0]?.message?.content;
+            let generatedText = '';
+            if (typeof messageContentResult === 'string') {
+                generatedText = messageContentResult;
+            }
+            else if (Array.isArray(messageContentResult)) {
+                const textItem = messageContentResult.find((item) => item.type === 'text');
+                generatedText = textItem?.text || '';
+            }
+            logger.info('Multi-modal understanding succeeded');
+            return generatedText;
+        }
+        catch (error) {
+            logger.error(`Multi-modal understanding failed: ${error}`);
+            throw new Error(`An error occurred during multi-modal understanding: ${error}`);
+        }
+    }
+    /**
+     * Build the message content
+     */
+    buildMessageContent(content, question) {
+        const result = [];
+        for (const item of content) {
+            if (item.type === 'image_url' && item.image_url) {
+                result.push({
+                    type: 'image_url',
+                    image_url: {
+                        url: item.image_url.url,
+                    },
+                });
+            }
+            else if (item.type === 'video_url' && item.video_url) {
+                result.push({
+                    type: 'video_url',
+                    video_url: {
+                        url: item.video_url.url,
+                    },
+                });
+            }
+            else if (item.type === 'file_url' && item.file_url) {
+                result.push({
+                    type: 'file_url',
+                    file_url: {
+                        url: item.file_url.url,
+                    },
+                });
+            }
+        }
+        result.push({
+            type: 'text',
+            text: question,
+        });
+        return result;
+    }
+}
+export default MultiModalUnderstandingTool;
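In the multi-modal tool, each `content` item carries a `type` plus a matching `image_url`, `video_url`, or `file_url` object. The schema marks all three objects as optional, and `buildMessageContent` skips items whose matching object is missing without reporting it. A minimal sketch of a check that surfaces such items is shown here; `findIncompleteItems` and the sample URLs are hypothetical and not part of the package.

```javascript
// Hypothetical helper (not part of the package): returns the indexes of
// content items whose `type` has no matching `{ url }` object.
function findIncompleteItems(content) {
    const incomplete = [];
    content.forEach((item, index) => {
        const payload = item[item.type]; // e.g. item.image_url when type is 'image_url'
        if (!payload || typeof payload.url !== 'string' || payload.url === '') {
            incomplete.push(index);
        }
    });
    return incomplete;
}

// Illustrative input; the URLs are placeholders.
const sample = [
    { type: 'image_url', image_url: { url: 'https://example.com/a.png' } },
    { type: 'video_url' }, // missing video_url object
    { type: 'file_url', image_url: { url: 'https://example.com/b.pdf' } }, // wrong key
];
console.log(findIncompleteItems(sample)); // [ 1, 2 ]
```

A caller could throw when the returned list is non-empty, so that a malformed item becomes an explicit error rather than a silently shortened request.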
package/dist/tools/TextGenerationTool.js
ADDED
@@ -0,0 +1,75 @@
+/*
+ * @Author: 橘子
+ * @Project_description: Zhipu AI text generation tool
+ * @Description: The code is copied, and I really don't understand it
+ */
+import { MCPTool, logger } from 'mcp-framework';
+import { z } from 'zod';
+import { http } from '../utils/http.js';
+import { config } from '../config.js';
+/**
+ * Zhipu AI text generation tool
+ */
+class TextGenerationTool extends MCPTool {
+    /** Tool name */
+    name = 'text_generation';
+    /** Tool description */
+    description = 'Text generation tool based on the GLM-4.5-Flash model. Supports chat, writing, translation, code generation, and other text generation tasks. Thinking mode is supported and can show the reasoning process of the model. The temperature parameter controls the randomness and creativity of the generated text.';
+    /** Parameter schema definition */
+    schema = z.object({
+        messages: z.string().describe('Message text, required'),
+        thinking: z
+            .enum(['enabled', 'disabled'])
+            .default('disabled')
+            .describe('Whether to enable thinking mode; when enabled the model shows its reasoning process, default disabled'),
+        temperature: z
+            .number()
+            .default(1)
+            .describe('Temperature, controls the randomness of the generated text, range 0-1, default 1'),
+    });
+    /**
+     * Run text generation
+     */
+    async execute(input) {
+        logger.info(`Starting text generation, input: ${input.messages}`);
+        try {
+            // Build the request parameters
+            const requestData = {
+                model: config.textModel,
+                messages: [
+                    {
+                        role: 'user',
+                        content: input.messages,
+                    },
+                ],
+                thinking: {
+                    type: input.thinking,
+                },
+                temperature: input.temperature,
+                stream: false, // Ensure a non-streaming response
+            };
+            logger.info('Calling the Zhipu AI text generation API');
+            // Call the Zhipu AI API
+            const apiResponse = (await http.post('/chat/completions', requestData));
+            // Extract the generated text
+            const messageContent = apiResponse.choices?.[0]?.message?.content;
+            // Handle content that may be either a string or an array
+            let generatedText = '';
+            if (typeof messageContent === 'string') {
+                generatedText = messageContent;
+            }
+            else if (Array.isArray(messageContent)) {
+                // If it is an array, take the item whose type is 'text'
+                const textItem = messageContent.find((item) => item.type === 'text');
+                generatedText = textItem?.text || '';
+            }
+            logger.info('Text generated successfully');
+            return generatedText;
+        }
+        catch (error) {
+            logger.error(`Text generation failed: ${error}`);
+            throw new Error(`An error occurred while generating text: ${error}`);
+        }
+    }
+}
+export default TextGenerationTool;
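The text and multi-modal tools both unpack `choices[0].message.content` with the same string-or-array handling. A minimal sketch of that logic as one shared function follows, assuming a hypothetical helper named `extractText` (not part of the package). The response objects in the example are made-up shapes for illustration, not captured API output.

```javascript
// Hypothetical helper (not part of the package): normalizes the `content`
// field of the first choice, which may be a string or an array of parts.
function extractText(apiResponse) {
    const content = apiResponse?.choices?.[0]?.message?.content;
    if (typeof content === 'string') return content;
    if (Array.isArray(content)) {
        // Mirrors the tools: only the first part of type 'text' is used.
        const textItem = content.find((item) => item.type === 'text');
        return textItem?.text || '';
    }
    return ''; // missing or unexpected content
}

// Made-up response shapes for illustration.
console.log(extractText({ choices: [{ message: { content: 'hello' } }] })); // hello
console.log(extractText({ choices: [{ message: { content: [{ type: 'text', text: 'hi' }] } }] })); // hi
console.log(extractText({}) === ''); // true
```

Like the original code, this returns an empty string when no text is found, so a caller that needs to tell "empty answer" from "no answer" would have to check the response separately.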
package/dist/tools/VideoGenerationTool.js
ADDED
@@ -0,0 +1,122 @@
+/*
+ * @Author: 橘子
+ * @Project_description: Zhipu AI video generation tool
+ * @Description: The code is copied, and I really don't understand it
+ */
+import { MCPTool, logger } from 'mcp-framework';
+import { z } from 'zod';
+import { http } from '../utils/http.js';
+import { config } from '../config.js';
+/**
+ * Zhipu AI video generation tool
+ */
+class VideoGenerationTool extends MCPTool {
+    /** Tool name */
+    name = 'video_generation';
+    /** Tool description */
+    description = 'Video generation tool based on the CogVideoX-Flash model; generates high-quality videos from text descriptions. Supports multiple resolutions, including 720p, 1080p, 2K, and 4K. Two output modes are available: quality-first and speed-first. AI sound effects and watermarking are optional. Processing is asynchronous, and the task status is polled automatically until completion.';
+    /** Parameter schema definition */
+    schema = z.object({
+        prompt: z
+            .string()
+            .describe('Text description of the video, at most 512 characters, required'),
+        quality: z
+            .enum(['quality', 'speed'])
+            .optional()
+            .default('speed')
+            .describe('Output mode: quality for quality-first, speed for speed-first, default speed'),
+        withAudio: z
+            .boolean()
+            .optional()
+            .default(false)
+            .describe('Whether to generate AI sound effects, default false'),
+        watermarkEnabled: z
+            .boolean()
+            .optional()
+            .default(false)
+            .describe('Controls whether a watermark is added, default false'),
+        // imageUrl: z
+        //     .string()
+        //     .optional()
+        //     .describe('Base image for content generation; supports a URL or a Base64-encoded image'),
+        size: z
+            .enum([
+                '720x480',
+                '1024x1024',
+                '1280x960',
+                '960x1280',
+                '1920x1080',
+                '1080x1920',
+                '2048x1080',
+                '3840x2160',
+            ])
+            .optional()
+            .default('1024x1024')
+            .describe('Video resolution, default 1024x1024'),
+        fps: z
+            .union([z.literal(30), z.literal(60)])
+            .optional()
+            .default(30)
+            .describe('Video frame rate (FPS), either 30 or 60, default 30'),
+    });
+    /**
+     * Run video generation
+     */
+    async execute(input) {
+        logger.info(`Starting video generation, prompt: ${input.prompt}, size: ${input.size || '1024x1024'}, frame rate: ${input.fps || 30}`);
+        try {
+            const requestData = {
+                model: config.videoModel,
+                prompt: input.prompt,
+                quality: input.quality || 'speed',
+                with_audio: input.withAudio ?? false,
+                watermark_enabled: input.watermarkEnabled ?? false, // fallback aligned with the schema default
+                image_url: input.imageUrl,
+                size: input.size,
+                fps: input.fps || 30,
+            };
+            logger.info('Calling the Zhipu AI video generation API to create a task');
+            const apiResponse = (await http.post('/videos/generations', requestData));
+            const taskId = apiResponse.id;
+            if (!taskId)
+                throw new Error('Failed to create the video generation task, no task ID was returned');
+            logger.info(`Video generation task created, task ID: ${taskId}, starting to poll the task status`);
+            const videoUrl = await this.pollTaskStatus(taskId);
+            logger.info('Video generated successfully');
+            return videoUrl;
+        }
+        catch (error) {
+            logger.error(`Video generation failed: ${error}`);
+            throw new Error(`An error occurred during video generation: ${error}`);
+        }
+    }
+    /**
+     * Poll the task status
+     */
+    async pollTaskStatus(taskId) {
+        const startTime = Date.now();
+        const timeout = 60000;
+        logger.info(`Starting to poll the task status, task ID: ${taskId}`);
+        while (true) {
+            const elapsedTime = Date.now() - startTime;
+            if (elapsedTime >= timeout) {
+                logger.error(`Video generation timed out, not finished after 60 seconds, task ID: ${taskId}`);
+                throw new Error('Video generation timed out, not finished after 60 seconds');
+            }
+            const apiResponse = (await http.get(`/async-result/${taskId}`));
+            const taskData = apiResponse;
+            if (taskData.task_status === 'SUCCESS' &&
+                taskData.video_result?.[0]?.url) {
+                logger.info(`Task finished, task ID: ${taskId}, status: ${taskData.task_status}`);
+                return taskData.video_result[0].url;
+            }
+            if (taskData.task_status === 'FAILED') {
+                logger.error(`Task failed, task ID: ${taskId}, status: ${taskData.task_status}`);
+                throw new Error('Video generation failed');
+            }
+            logger.debug(`Task in progress, task ID: ${taskId}, status: ${taskData.task_status}, waited: ${Math.floor(elapsedTime / 1000)}s`);
+            await new Promise((resolve) => setTimeout(resolve, 1000));
+        }
+    }
+}
+export default VideoGenerationTool;
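`pollTaskStatus` in the video tool combines three decisions in one loop: stop on timeout, stop on `SUCCESS` with a URL, and stop on `FAILED`. A minimal sketch that separates the decision from the waiting makes it testable without a network. `evaluatePoll`, `pollUntilDone`, and their parameters are hypothetical names, not part of the package. Unlike the original loop, this sketch checks the deadline after each status fetch rather than before it.

```javascript
// Hypothetical helper (not part of the package): one polling decision.
// Returns { done: true, url } on success, { done: false } while pending,
// and throws on timeout or on a FAILED task.
function evaluatePoll(taskData, elapsedMs, timeoutMs) {
    if (elapsedMs >= timeoutMs) {
        throw new Error(`Video generation timed out after ${timeoutMs / 1000} seconds`);
    }
    const url = taskData.video_result?.[0]?.url;
    if (taskData.task_status === 'SUCCESS' && url) return { done: true, url };
    if (taskData.task_status === 'FAILED') throw new Error('Video generation failed');
    return { done: false }; // includes SUCCESS without a URL, as in the original loop
}

// Loop built on top of it. `fetchStatus` is injected by the caller (for the
// package it would wrap the status request); `sleep` can be replaced in tests.
async function pollUntilDone(fetchStatus, { timeoutMs = 60000, intervalMs = 1000, sleep } = {}) {
    const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
    const startTime = Date.now();
    while (true) {
        const taskData = await fetchStatus();
        const result = evaluatePoll(taskData, Date.now() - startTime, timeoutMs);
        if (result.done) return result.url;
        await wait(intervalMs);
    }
}
```

Making `timeoutMs` a parameter also lets a caller raise the 60-second limit, which in the original is fixed inside the method.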
package/dist/utils/http.js
ADDED
@@ -0,0 +1,137 @@
+/*
+ * @Author: 橘子
+ * @Project_description: API HTTP wrapper
+ * @Description: The code is copied, and I really don't understand it
+ */
+import axios from 'axios';
+import { config } from '../config.js';
+/**
+ * Zhipu AI business error code definitions
+ */
+var ZhipuErrorCode;
+(function (ZhipuErrorCode) {
+    // Authentication errors
+    ZhipuErrorCode["AUTH_FAILED"] = "1000";
+    ZhipuErrorCode["AUTH_MISSING"] = "1001";
+    ZhipuErrorCode["AUTH_INVALID"] = "1002";
+    ZhipuErrorCode["AUTH_EXPIRED"] = "1003";
+    ZhipuErrorCode["AUTH_VERIFY_FAILED"] = "1004";
+    // Account errors
+    ZhipuErrorCode["ACCOUNT_INACTIVE"] = "1110";
+    ZhipuErrorCode["ACCOUNT_NOT_EXIST"] = "1111";
+    ZhipuErrorCode["ACCOUNT_LOCKED"] = "1112";
+    ZhipuErrorCode["ACCOUNT_ARREARS"] = "1113";
+    // API call errors
+    ZhipuErrorCode["API_PARAM_ERROR"] = "1210";
+    ZhipuErrorCode["API_MODEL_NOT_EXIST"] = "1211";
+    ZhipuErrorCode["API_METHOD_NOT_SUPPORTED"] = "1212";
+    ZhipuErrorCode["API_PARAM_MISSING"] = "1213";
+    ZhipuErrorCode["API_PARAM_INVALID"] = "1214";
+    ZhipuErrorCode["API_NO_PERMISSION"] = "1220";
+    ZhipuErrorCode["API_OFFLINE"] = "1221";
+    ZhipuErrorCode["API_NOT_EXIST"] = "1222";
+    // Policy block errors
+    ZhipuErrorCode["POLICY_BLOCKED"] = "1301";
+    ZhipuErrorCode["CONCURRENCY_LIMIT"] = "1302";
+    ZhipuErrorCode["RATE_LIMIT"] = "1303";
+    ZhipuErrorCode["DAILY_LIMIT"] = "1304";
+    ZhipuErrorCode["TRAFFIC_LIMIT"] = "1305";
+})(ZhipuErrorCode || (ZhipuErrorCode = {}));
+/**
+ * Error message mapping
+ */
+const ERROR_MESSAGES = {
+    [ZhipuErrorCode.AUTH_FAILED]: 'Authentication failed',
+    [ZhipuErrorCode.AUTH_MISSING]: 'Authentication parameter not received in the header',
+    [ZhipuErrorCode.AUTH_INVALID]: 'Authorization token is invalid',
+    [ZhipuErrorCode.AUTH_EXPIRED]: 'Authorization token has expired',
+    [ZhipuErrorCode.AUTH_VERIFY_FAILED]: 'Token verification failed',
+    [ZhipuErrorCode.ACCOUNT_INACTIVE]: 'Account is inactive',
+    [ZhipuErrorCode.ACCOUNT_NOT_EXIST]: 'Account does not exist',
+    [ZhipuErrorCode.ACCOUNT_LOCKED]: 'Account is locked',
+    [ZhipuErrorCode.ACCOUNT_ARREARS]: 'Account is in arrears',
+    [ZhipuErrorCode.API_PARAM_ERROR]: 'Invalid API call parameters',
+    [ZhipuErrorCode.API_MODEL_NOT_EXIST]: 'Model does not exist',
+    [ZhipuErrorCode.API_METHOD_NOT_SUPPORTED]: 'The current model does not support this calling method',
+    [ZhipuErrorCode.API_PARAM_MISSING]: 'Required parameter was not received correctly',
+    [ZhipuErrorCode.API_PARAM_INVALID]: 'Invalid parameter',
+    [ZhipuErrorCode.API_NO_PERMISSION]: 'No permission to access this API',
+    [ZhipuErrorCode.API_OFFLINE]: 'API is offline',
+    [ZhipuErrorCode.API_NOT_EXIST]: 'API does not exist',
+    [ZhipuErrorCode.POLICY_BLOCKED]: 'Input or generated content may contain unsafe or sensitive content',
+    [ZhipuErrorCode.CONCURRENCY_LIMIT]: 'Concurrency too high; please reduce concurrency',
+    [ZhipuErrorCode.RATE_LIMIT]: 'Request rate too high; please reduce the rate',
+    [ZhipuErrorCode.DAILY_LIMIT]: 'Daily call limit reached',
+    [ZhipuErrorCode.TRAFFIC_LIMIT]: 'Traffic limit triggered',
+};
+/**
+ * HTTP error class
+ */
+export class HttpError extends Error {
+    code;
+    statusCode;
+    constructor(message, code, statusCode) {
+        super(message);
+        this.code = code;
+        this.statusCode = statusCode;
+        this.name = 'HttpError';
+    }
+}
+/**
+ * Create the axios instance
+ */
+const http = axios.create({
+    baseURL: config.baseUrl,
+    timeout: config.timeout,
+    headers: {
+        'Content-Type': 'application/json',
+        Authorization: `Bearer ${process.env.KEY || ''}`,
+    },
+});
+/**
+ * Request interceptor
+ */
+http.interceptors.request.use((requestConfig) => {
+    // Request logging can be added here
+    return requestConfig;
+}, (error) => {
+    return Promise.reject(error);
+});
+/**
+ * Response interceptor
+ */
+http.interceptors.response.use((response) => {
+    return response.data;
+}, (error) => {
+    const statusCode = error.response?.status;
+    const responseData = error.response?.data;
+    // Handle HTTP status code errors
+    if (statusCode) {
+        switch (statusCode) {
+            case 400:
+                return Promise.reject(new HttpError('Invalid parameters', responseData?.error?.code, statusCode));
+            case 401:
+                return Promise.reject(new HttpError('Authentication failed or token expired', responseData?.error?.code, statusCode));
+            case 404:
+                return Promise.reject(new HttpError('Feature not available or task does not exist', responseData?.error?.code, statusCode));
+            case 429:
+                return Promise.reject(new HttpError('Request rate too high or concurrency quota exceeded', responseData?.error?.code, statusCode));
+            case 500:
+                return Promise.reject(new HttpError('Internal server error', responseData?.error?.code, statusCode));
+            default:
+                return Promise.reject(new HttpError(`HTTP error: ${statusCode}`, responseData?.error?.code, statusCode));
+        }
+    }
+    // Handle business error codes
+    if (responseData?.error?.code) {
+        const errorCode = responseData.error.code;
+        const errorMessage = ERROR_MESSAGES[errorCode] || responseData.error.message || 'Unknown error';
+        return Promise.reject(new HttpError(errorMessage, errorCode, statusCode));
+    }
+    // Network errors
+    if (error.code === 'ECONNABORTED') {
+        return Promise.reject(new HttpError('Request timed out', undefined, undefined));
+    }
+    return Promise.reject(new HttpError(error.message || 'Network error', undefined, undefined));
+});
+export { http };
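In the response interceptor of `http.js`, the status-code `switch` returns on every path. Since `responseData` is read from the same `error.response` as the status, the business-code lookup after the `switch` appears to be reachable only when there is no response at all, in which case there is no `error.code` to look up. A minimal sketch of one way to prefer the business-code message when it is present follows. `resolveErrorMessage` and the two trimmed lookup tables are hypothetical, not part of the package.

```javascript
// Hypothetical helper (not part of the package). Both tables are shortened
// for illustration and are not the full mapping.
const BUSINESS_MESSAGES = {
    '1002': 'Authorization token is invalid',
    '1113': 'Account is in arrears',
    '1302': 'Concurrency too high; please reduce concurrency',
};
const STATUS_MESSAGES = {
    400: 'Invalid parameters',
    401: 'Authentication failed or token expired',
    429: 'Request rate too high or concurrency quota exceeded',
    500: 'Internal server error',
};

function resolveErrorMessage(statusCode, responseData) {
    const code = responseData?.error?.code;
    // 1. Most specific: a known business error code.
    if (code !== undefined && BUSINESS_MESSAGES[String(code)]) {
        return BUSINESS_MESSAGES[String(code)];
    }
    // 2. A message supplied in the response body.
    if (responseData?.error?.message) return responseData.error.message;
    // 3. Generic text for the HTTP status.
    if (statusCode) return STATUS_MESSAGES[statusCode] || `HTTP error: ${statusCode}`;
    // 4. No response at all.
    return 'Network error';
}

console.log(resolveErrorMessage(429, { error: { code: '1302', message: 'raw' } }));
// Concurrency too high; please reduce concurrency
console.log(resolveErrorMessage(401, undefined)); // Authentication failed or token expired
console.log(resolveErrorMessage(503, {}));        // HTTP error: 503
```

With this ordering, a 429 response carrying code `1302` reports the concurrency limit rather than the generic rate-limit text.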
package/package.json
ADDED
@@ -0,0 +1,51 @@
+{
+  "name": "multi-modal-mcp",
+  "version": "0.0.1",
+  "description": "multi-modal MCP server",
+  "type": "module",
+  "bin": {
+    "multi-modal-mcp": "./dist/index.js"
+  },
+  "files": [
+    "dist",
+    "README.md",
+    "LICENSE"
+  ],
+  "scripts": {
+    "build": "tsc && mcp-build",
+    "watch": "tsc --watch",
+    "start": "node dist/index.js",
+    "debug": "npx @modelcontextprotocol/inspector node dist/index.js",
+    "publish": "npx tsx scripts/publish.ts"
+  },
+  "dependencies": {
+    "axios": "^1.13.2",
+    "mcp-framework": "^0.2.2",
+    "zod": "^3.22.4"
+  },
+  "devDependencies": {
+    "@modelcontextprotocol/inspector": "^0.18.0",
+    "@types/node": "^20.11.24",
+    "typescript": "^5.3.3"
+  },
+  "engines": {
+    "node": ">=18.19.0"
+  },
+  "publishConfig": {
+    "access": "public"
+  },
+  "keywords": [
+    "mcp",
+    "multi-modal",
+    "ai",
+    "glm",
+    "cogview",
+    "cogvideo",
+    "text-generation",
+    "image-generation",
+    "video-generation",
+    "model-context-protocol"
+  ],
+  "author": "橘子",
+  "license": "MIT"
+}