npm - @idk500/video-vision-mcp - Versions diffs - 1.2.0 - Mend

@idk500/video-vision-mcp 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/LICENSE +22 -0
package/README.md +136 -0
package/dist/frame-extractor.d.ts +28 -0
package/dist/frame-extractor.js +246 -0
package/dist/hunyuan-client.d.ts +95 -0
package/dist/hunyuan-client.js +319 -0
package/dist/index.d.ts +24 -0
package/dist/index.js +813 -0
package/dist/video-processor.d.ts +68 -0
package/dist/video-processor.js +478 -0
package/package.json +67 -0

package/LICENSE ADDED

@@ -0,0 +1,22 @@
+MIT License
+Copyright (c) 2025 pickstar-2002
+Copyright (c) 2025 idk500 (Bigmodel glm-4.6v-flash backend fork)
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,136 @@
+# 🎬 Video Vision MCP
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Node.js Version](https://img.shields.io/badge/node-%3E%3D18.0.0-brightgreen.svg)](https://nodejs.org/)
+[![TypeScript](https://img.shields.io/badge/TypeScript-007ACC?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
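+> 🚀 A Model Context Protocol (MCP) tool for video analysis and shooting-script generation, powered by an OpenAI-compatible multimodal vision model (default: Zhipu Bigmodel glm-4.6v-flash, free to use)
+## ✨ Overview
+Video Vision MCP is a video analysis and script generation tool that exposes video-processing capabilities to AI assistants over MCP. It extracts key frames from a video, analyzes them with a multimodal vision model, and generates professional shooting scripts.
+> **Forked from [pickstar-2002/video-capture-script-mcp](https://github.com/pickstar-2002/video-capture-script-mcp)**. Main change: the vision backend is switched from Tencent Hunyuan to an OpenAI-compatible API (default: Zhipu Bigmodel `glm-4.6v-flash`, free to use), so no Tencent Cloud credentials are required. Thanks to the original author for open-sourcing the project.
+## 🎯 Features
+- 🖼️ **Frame extraction**: extract key frames from a video using one of several strategies
+  - Uniform interval sampling (uniform)
+  - Keyframe extraction (keyframe)
+  - Scene-change detection (scene_change)
+- 🤖 **AI content analysis**: analyze video and image content with a multimodal vision model
+- 🎬 **Shooting-script generation**: generate a professional shooting script from the video analysis results
+  - Supported types: commercials, documentaries, tutorials, and narrative videos
+  - Customizable target audience, shooting style, and duration
+- 📊 **Batch image analysis**: analyze the content of multiple images at once
+- 📹 **Video metadata**: get duration, resolution, frame rate, and other metadata
+## 📦 Installation
+### Configuring an MCP-compatible tool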
+```json
+{
+  "mcpServers": {
+    "video-vision-mcp": {
+      "command": "npx",
+      "args": ["-y", "@idk500/video-vision-mcp@latest"],
+      "env": {
+        "VISION_API_KEY": "your_api_key_here"
+      }
+    }
+  }
+}
+```
+### Local development
+```bash
+git clone https://github.com/idk500/video-capture-script-mcp.git
+cd video-capture-script-mcp
+npm install
+npm run build
+```
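+## 🔑 Configuration (vision model API key)
+Vision analysis requires an API key for a multimodal model that accepts image input. **The default is Zhipu Bigmodel `glm-4.6v-flash` (free).**
+### Getting a key
+1. Visit https://open.bigmodel.cn/usercenter/apikeys
+2. Sign up for or log in to the Zhipu open platform
+3. Create an API key (format: `xxxxxxxxxxxxxxxx.xxxxxxxxxxxxxxxx`)
+### Configuration methods (choose one)
+**Environment variables (recommended):**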
+```bash
+export VISION_API_KEY=your_api_key_here
+# Optional: custom endpoint and models
+export VISION_ENDPOINT=https://open.bigmodel.cn/api/paas/v4
+export VISION_MODEL=glm-4.6v-flash
+export TEXT_MODEL=glm-4.6
+```
+**Command-line argument:**
+```bash
+node dist/index.js --secret-id your_api_key_here
+```
+### Switching to another OpenAI-compatible backend
+Any backend that implements the OpenAI Chat Completions API with `image_url` content parts can be used:
+```bash
+export VISION_ENDPOINT=https://your-openai-compatible-endpoint/v1
+export VISION_MODEL=your-vision-model
+export TEXT_MODEL=your-text-model
+export VISION_API_KEY=your-key
+```
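+## 🛠️ Available tools
+| Tool | Description | Requires vision model |
+|------|------|:---:|
+| `extract_video_frames` | Extract key frames from a video (runs locally, requires FFmpeg) | No |
+| `get_video_info` | Get video metadata such as duration, resolution, and frame rate | No |
+| `analyze_video_content` | Extract frames, send them to the vision model, and summarize the video | Yes |
+| `analyze_image_batch` | Analyze the content of multiple images | Yes |
+| `generate_video_script` | Extract frames → vision analysis → generate a professional shooting script | Yes |
+| `generate_image_script` | Generate a shooting script from multiple images | Yes |
+
+**Script types**: `commercial` (advertising) / `documentary` / `tutorial` / `narrative` / `custom`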
+## 📋 System requirements and dependencies
+- **Node.js** >= 18.0.0
+- **FFmpeg** (used for frame extraction and video metadata)
+  - Windows: `choco install ffmpeg`
+  - macOS: `brew install ffmpeg`
+  - Linux: `sudo apt install ffmpeg`
+## 📝 Development
+```bash
+npm run dev      # development mode (tsx)
+npm run build    # compile to dist/
+npm run lint     # lint the code
+```
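+## 📄 License
+[MIT](LICENSE) License
+Copyright (c) 2025 pickstar-2002 (original author)
+Copyright (c) 2025 idk500 (Bigmodel backend rework)
+## 🙏 Acknowledgements
+This project is forked from [pickstar-2002/video-capture-script-mcp](https://github.com/pickstar-2002/video-capture-script-mcp); thanks to the original author for the complete implementation. Change in this fork: the Tencent Hunyuan vision backend is replaced with an OpenAI-compatible API (default: Zhipu Bigmodel glm-4.6v-flash).
+## 🐛 Bug reports
+Please report issues or suggestions in [GitHub Issues](https://github.com/idk500/video-capture-script-mcp/issues).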
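The README states that any backend implementing OpenAI Chat Completions with `image_url` content parts will work, and `hunyuan-client.d.ts` notes that the client POSTs to `/chat/completions` and sends a bare base64 image string for Zhipu. The sketch below shows roughly what such a request could look like when assembled from the README's environment variables. It is a hypothetical `buildVisionRequest` helper, not part of the package, and it assumes the standard OpenAI request shape (`model`, `messages`, typed `content` parts) with bearer-token auth. Whether a bare base64 string or a `data:image/...;base64,` URL is accepted depends on the backend.

```typescript
// Hypothetical helper (not the package's API): builds, but does not send,
// an OpenAI-compatible Chat Completions request with one image and one text part.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionRequest {
  url: string;
  headers: Record<string, string>;
  body: {
    model: string;
    messages: Array<{ role: "user"; content: ContentPart[] }>;
  };
}

function buildVisionRequest(
  env: Record<string, string | undefined>,
  image: string, // bare base64 or a data: URL, depending on the backend
  prompt: string,
): VisionRequest {
  const apiKey = env.VISION_API_KEY;
  if (!apiKey) throw new Error("VISION_API_KEY is not set");
  // Defaults mirror the README; trailing slashes are trimmed before appending the path.
  const endpoint = (env.VISION_ENDPOINT ?? "https://open.bigmodel.cn/api/paas/v4").replace(/\/+$/, "");
  const model = env.VISION_MODEL ?? "glm-4.6v-flash";
  return {
    url: `${endpoint}/chat/completions`,
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: {
      model,
      messages: [
        {
          role: "user",
          content: [
            { type: "image_url", image_url: { url: image } },
            { type: "text", text: prompt },
          ],
        },
      ],
    },
  };
}
```

On Node.js 18+, the result could be sent with the global `fetch`, for example `fetch(req.url, { method: "POST", headers: req.headers, body: JSON.stringify(req.body) })`.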

package/dist/frame-extractor.d.ts ADDED

@@ -0,0 +1,28 @@
+export interface FrameExtractionOptions {
+    maxFrames: number;
+    outputDir?: string;
+    strategy: 'uniform' | 'keyframe' | 'scene_change';
+    quality?: number;
+}
+export interface VideoInfo {
+    duration: number;
+    width: number;
+    height: number;
+    frameRate: number;
+    frameCount: number;
+    format: string;
+}
+export declare class FrameExtractor {
+    private defaultOutputDir;
+    constructor();
+    private ensureOutputDir;
+    getVideoInfo(videoPath: string): Promise<VideoInfo>;
+    private parseFrameRate;
+    extractFrames(videoPath: string, options: FrameExtractionOptions): Promise<string[]>;
+    private calculateTimestamps;
+    private generateUniformTimestamps;
+    private detectKeyframes;
+    private detectSceneChanges;
+    private extractFrameAtTimestamp;
+    cleanupFrames(framePaths: string[]): Promise<void>;
+}
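`FrameExtractionOptions` leaves its constraints implicit. In `frame-extractor.js`, `extractFrames` throws when `maxFrames <= 0`, warns above 100 frames, and defaults `quality` to 90, while `extractFrameAtTimestamp` resets a quality outside 1-100 to 90. A minimal TypeScript sketch that gathers those rules in one place follows; `normalizeOptions` is a hypothetical helper that is not in the package, and requiring `maxFrames` to be an integer is an extra assumption the original code does not make.

```typescript
// Mirrors the FrameExtractionOptions declaration above.
interface FrameExtractionOptions {
  maxFrames: number;
  outputDir?: string;
  strategy: "uniform" | "keyframe" | "scene_change";
  quality?: number;
}

// Hypothetical helper: applies the runtime rules from frame-extractor.js up front.
function normalizeOptions(
  options: FrameExtractionOptions,
  defaultOutputDir = "./temp_frames",
): { options: Required<FrameExtractionOptions>; warnings: string[] } {
  const warnings: string[] = [];
  // Assumption: reject fractional values as well as non-positive ones.
  if (!Number.isInteger(options.maxFrames) || options.maxFrames <= 0) {
    throw new Error(`maxFrames must be a positive integer, got: ${options.maxFrames}`);
  }
  if (options.maxFrames > 100) {
    warnings.push(`maxFrames=${options.maxFrames} may use a lot of disk space and processing time`);
  }
  // Same outcome as the JS: a missing or out-of-range quality becomes 90.
  let quality = options.quality ?? 90;
  if (quality < 1 || quality > 100) {
    warnings.push(`quality ${quality} is out of range; using 90`);
    quality = 90;
  }
  return {
    options: { ...options, outputDir: options.outputDir ?? defaultOutputDir, quality },
    warnings,
  };
}
```

Returning warnings instead of logging them keeps the helper free of side effects, so a caller such as an MCP tool handler can decide whether to surface them.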

package/dist/frame-extractor.js ADDED

@@ -0,0 +1,246 @@
+import ffmpeg from 'fluent-ffmpeg';
+import path from 'path';
+import { promises as fs } from 'fs';
+export class FrameExtractor {
+    defaultOutputDir = './temp_frames';
+    constructor() {
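+        // Make sure the temporary directory exists. A constructor cannot await, so
+        // catch failures here to avoid an unhandled rejection; extractFrames()
+        // calls ensureOutputDir() again before writing any frame.
+        this.ensureOutputDir(this.defaultOutputDir).catch((error) => {
+            console.error(`Failed to create temporary directory ${this.defaultOutputDir}:`, error);
+        });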
+    }
+    async ensureOutputDir(dir) {
+        try {
+            await fs.access(dir);
+        }
+        catch {
+            await fs.mkdir(dir, { recursive: true });
+        }
+    }
+    async getVideoInfo(videoPath) {
+        return new Promise((resolve, reject) => {
+            console.error(`Analyzing video file: ${videoPath}`);
+            ffmpeg.ffprobe(videoPath, (err, metadata) => {
+                if (err) {
+                    let errorMessage = `Failed to get video info: ${err.message}`;
+                    if (err.message.includes('No such file')) {
+                        errorMessage = `Video file not found: ${videoPath}`;
+                    }
+                    else if (err.message.includes('Invalid data')) {
+                        errorMessage = `Invalid or corrupted video file: ${videoPath}`;
+                    }
+                    else if (err.message.includes('Permission denied')) {
+                        errorMessage = `No permission to read video file: ${videoPath}`;
+                    }
+                    else if (err.message.includes('ffprobe')) {
+                        errorMessage = `FFmpeg is not installed or configured correctly. Make sure FFmpeg is installed and available on the command line.\nOriginal error: ${err.message}`;
+                    }
+                    reject(new Error(errorMessage));
+                    return;
+                }
+                if (!metadata || !metadata.streams) {
+                    reject(new Error(`Failed to read video metadata: ${videoPath}`));
+                    return;
+                }
+                const videoStream = metadata.streams.find(s => s.codec_type === 'video');
+                if (!videoStream) {
+                    reject(new Error(`No video stream found; the file may be audio-only: ${videoPath}`));
+                    return;
+                }
+                const duration = metadata.format.duration || 0;
+                if (duration <= 0) {
+                    reject(new Error(`Invalid video duration (${duration}s); the video file may be corrupted: ${videoPath}`));
+                    return;
+                }
+                const frameRate = this.parseFrameRate(videoStream.r_frame_rate || '25/1');
+                const frameCount = Math.floor(duration * frameRate);
+                const videoInfo = {
+                    duration,
+                    width: videoStream.width || 0,
+                    height: videoStream.height || 0,
+                    frameRate,
+                    frameCount,
+                    format: metadata.format.format_name || 'unknown',
+                };
+                console.error(`Video info retrieved - duration: ${duration.toFixed(2)}s, resolution: ${videoInfo.width}x${videoInfo.height}, frame rate: ${frameRate.toFixed(2)}fps`);
+                resolve(videoInfo);
+            });
+        });
+    }
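+    parseFrameRate(frameRateStr) {
+        // r_frame_rate is a rational such as "30000/1001"; ffprobe can report "0/0"
+        // when the rate is unknown, so guard against a zero or invalid denominator.
+        const parts = frameRateStr.split('/');
+        if (parts.length === 2) {
+            const num = Number(parts[0]);
+            const den = Number(parts[1]);
+            return num > 0 && den > 0 ? num / den : 25;
+        }
+        return parseFloat(frameRateStr) || 25;
+    }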
+    async extractFrames(videoPath, options) {
+        try {
+            const outputDir = options.outputDir || this.defaultOutputDir;
+            await this.ensureOutputDir(outputDir);
+            console.error(`Starting frame extraction - output directory: ${outputDir}`);
+            const videoInfo = await this.getVideoInfo(videoPath);
+            // Validate extraction parameters
+            if (options.maxFrames <= 0) {
+                throw new Error(`maxFrames must be greater than 0, got: ${options.maxFrames}`);
+            }
+            if (options.maxFrames > 100) {
+                console.warn(`Warning: requesting a large number of frames (${options.maxFrames}); this may use a lot of disk space and processing time`);
+            }
+            const timestamps = await this.calculateTimestamps(videoPath, videoInfo, options);
+            if (timestamps.length === 0) {
+                throw new Error('Could not compute any valid timestamps; the video may be too short or the parameters may be invalid');
+            }
+            console.error(`Computed ${timestamps.length} extraction timestamps: ${timestamps.map(t => t.toFixed(2)).join(', ')}s`);
+            const framePaths = [];
+            const videoName = path.basename(videoPath, path.extname(videoPath));
+            let successCount = 0;
+            let failureCount = 0;
+            for (let i = 0; i < timestamps.length; i++) {
+                const timestamp = timestamps[i];
+                const framePath = path.join(outputDir, `${videoName}_frame_${i + 1}_${timestamp.toFixed(2)}s.jpg`);
+                try {
+                    console.error(`Extracting frame ${i + 1}/${timestamps.length} - time: ${timestamp.toFixed(2)}s`);
+                    await this.extractFrameAtTimestamp(videoPath, timestamp, framePath, options.quality || 90);
+                    framePaths.push(framePath);
+                    successCount++;
+                    console.error(`Frame ${i + 1} extracted: ${framePath}`);
+                }
+                catch (error) {
+                    failureCount++;
+                    console.error(`Failed to extract frame ${i + 1} (time: ${timestamp.toFixed(2)}s):`, error);
+                    // Give up early if three attempts have failed before any frame succeeded
+                    if (failureCount >= 3 && successCount === 0) {
+                        throw new Error(`Frame extraction failed several times in a row; aborting. Possible causes:\n1. FFmpeg configuration problem\n2. Corrupted video file\n3. Insufficient disk space\n4. No write permission for the output directory`);
+                    }
+                }
+            }
+            console.error(`Frame extraction finished - succeeded: ${successCount}, failed: ${failureCount}`);
+            if (framePaths.length === 0) {
+                throw new Error('All frame extractions failed; check the video file and FFmpeg configuration');
+            }
+            return framePaths;
+        }
+        catch (error) {
+            console.error(`Video frame extraction failed:`, error);
+            throw error;
+        }
+    }
+    async calculateTimestamps(videoPath, videoInfo, options) {
+        const { duration } = videoInfo;
+        const { maxFrames, strategy } = options;
+        switch (strategy) {
+            case 'uniform':
+                return this.generateUniformTimestamps(duration, maxFrames);
+            case 'keyframe':
+                return await this.detectKeyframes(videoPath, maxFrames);
+            case 'scene_change':
+                return await this.detectSceneChanges(videoPath, maxFrames);
+            default:
+                return this.generateUniformTimestamps(duration, maxFrames);
+        }
+    }
+    generateUniformTimestamps(duration, maxFrames) {
+        const timestamps = [];
+        const interval = duration / (maxFrames + 1);
+        for (let i = 1; i <= maxFrames; i++) {
+            timestamps.push(interval * i);
+        }
+        return timestamps;
+    }
+    async detectKeyframes(videoPath, maxFrames) {
+        return new Promise((resolve, reject) => {
+            // Simplified: fall back to uniform sampling to avoid problems with complex FFmpeg arguments
+            console.warn('Using uniform sampling instead of keyframe detection for better compatibility');
+            // Get the video duration first
+            ffmpeg.ffprobe(videoPath, (err, metadata) => {
+                if (err) {
+                    console.warn('Could not get video duration, using default 60s');
+                    resolve(this.generateUniformTimestamps(60, maxFrames));
+                    return;
+                }
+                const duration = metadata.format.duration || 60;
+                resolve(this.generateUniformTimestamps(duration, maxFrames));
+            });
+        });
+    }
+    async detectSceneChanges(videoPath, maxFrames) {
+        return new Promise((resolve, reject) => {
+            // Simplified: fall back to uniform sampling to avoid problems with complex FFmpeg arguments
+            console.warn('Using uniform sampling instead of scene detection for better compatibility');
+            // Get the video duration first
+            ffmpeg.ffprobe(videoPath, (err, metadata) => {
+                if (err) {
+                    console.warn('Could not get video duration, using default 60s');
+                    resolve(this.generateUniformTimestamps(60, maxFrames));
+                    return;
+                }
+                const duration = metadata.format.duration || 60;
+                resolve(this.generateUniformTimestamps(duration, maxFrames));
+            });
+        });
+    }
+    async extractFrameAtTimestamp(videoPath, timestamp, outputPath, quality = 90) {
+        return new Promise((resolve, reject) => {
+            // Validate parameters
+            if (timestamp < 0) {
+                reject(new Error(`Timestamp must not be negative: ${timestamp}`));
+                return;
+            }
+            if (quality < 1 || quality > 100) {
+                console.warn(`Quality (${quality}) is out of range; using the default of 90`);
+                quality = 90;
+            }
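+            // Build the command before arming the timeout so that a timeout can
+            // kill the FFmpeg process instead of leaving it running.
+            const command = ffmpeg(videoPath)
+                .seekInput(timestamp)
+                .frames(1)
+                // Map quality (1-100, higher is better) to -q:v (lower is better): 90 -> 1, 1 -> 9
+                .outputOptions([`-q:v ${Math.floor((100 - quality) / 10)}`])
+                .output(outputPath)
+                .on('start', (commandLine) => {
+                console.error(`FFmpeg command: ${commandLine}`);
+            })
+                .on('end', () => {
+                clearTimeout(timeoutId);
+                resolve();
+            })
+                .on('error', (err) => {
+                clearTimeout(timeoutId);
+                let errorMessage = `Frame extraction failed (time: ${timestamp.toFixed(2)}s): ${err.message}`;
+                if (err.message.includes('Invalid data')) {
+                    errorMessage = `Invalid video data at ${timestamp.toFixed(2)}s; the timestamp may be past the end of the video or the video may be corrupted`;
+                }
+                else if (err.message.includes('No such file')) {
+                    errorMessage = `Video file disappeared during processing: ${videoPath}`;
+                }
+                else if (err.message.includes('Permission denied')) {
+                    errorMessage = `No permission to write output file: ${outputPath}`;
+                }
+                else if (err.message.includes('No space left')) {
+                    errorMessage = `Not enough disk space to save the frame: ${outputPath}`;
+                }
+                reject(new Error(errorMessage));
+            });
+            const timeoutId = setTimeout(() => {
+                reject(new Error(`Frame extraction timed out (${timestamp}s); the video file may be problematic or FFmpeg may be responding slowly`));
+                // Stop the FFmpeg process; any later 'error' event is harmless because the promise has already settled
+                command.kill('SIGKILL');
+            }, 30000); // 30-second timeout
+            command.run();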
+        });
+    }
+    async cleanupFrames(framePaths) {
+        let successCount = 0;
+        let failureCount = 0;
+        console.error(`Cleaning up ${framePaths.length} temporary frame files...`);
+        for (const framePath of framePaths) {
+            try {
+                await fs.unlink(framePath);
+                successCount++;
+            }
+            catch (error) {
+                failureCount++;
+                console.warn(`Failed to delete ${framePath}:`, error);
+            }
+        }
+        console.error(`Cleanup finished - succeeded: ${successCount}, failed: ${failureCount}`);
+        if (failureCount > 0) {
+            console.warn(`Some temporary files could not be deleted and may need to be removed manually`);
+        }
+    }
+}
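In this version, `detectKeyframes` and `detectSceneChanges` both log a warning and fall back to uniform sampling, so every strategy ends up in `generateUniformTimestamps`, which spaces `maxFrames` points at `duration / (maxFrames + 1)` and therefore never samples the very first or last instant. The TypeScript sketch below ports that arithmetic, together with frame-rate parsing and the `-q:v` mapping used by `extractFrameAtTimestamp`, into standalone functions for illustration. The function names are illustrative, and the zero-denominator guard in `parseFrameRate` is an assumption added on top of the original logic.

```typescript
// Port of generateUniformTimestamps: maxFrames evenly spaced points strictly
// inside (0, duration), skipping the very start and end of the video.
function uniformTimestamps(duration: number, maxFrames: number): number[] {
  const interval = duration / (maxFrames + 1);
  const timestamps: number[] = [];
  for (let i = 1; i <= maxFrames; i++) {
    timestamps.push(interval * i);
  }
  return timestamps;
}

// Parse ffprobe's r_frame_rate ("30000/1001", "25/1") or a plain number.
// Assumption: fall back to 25 fps for "0/0", malformed, or non-positive values.
function parseFrameRate(frameRateStr: string, fallback = 25): number {
  const parts = frameRateStr.split("/");
  if (parts.length === 2) {
    const num = Number(parts[0]);
    const den = Number(parts[1]);
    return num > 0 && den > 0 ? num / den : fallback;
  }
  const value = parseFloat(frameRateStr);
  return value > 0 ? value : fallback;
}

// Quality 1-100 (higher is better) mapped to an FFmpeg -q:v value (lower is better),
// using the same formula as extractFrameAtTimestamp.
function qualityToQscale(quality: number): number {
  return Math.floor((100 - quality) / 10);
}
```

For example, `uniformTimestamps(60, 5)` yields `[10, 20, 30, 40, 50]` for a 60-second clip, and the default quality of 90 becomes `-q:v 1`.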

package/dist/hunyuan-client.d.ts ADDED

@@ -0,0 +1,95 @@
+export interface HunyuanConfig {
+    /** OpenAI-compatible API key (sent as `Authorization: Bearer <key>`). */
+    secretId?: string;
+    /** Kept for config-shape compatibility; ignored (auth uses secretId only). */
+    secretKey?: string;
+    /** Unused for OpenAI-compatible auth; kept for config-shape compatibility. */
+    region?: string;
+    /** Base URL, e.g. `https://open.bigmodel.cn/api/paas/v4`. */
+    endpoint?: string;
+    /** Vision model name. */
+    visionModel?: string;
+    /** Text model name. */
+    textModel?: string;
+}
+export interface ImageAnalysisResult {
+    content: string;
+    usage: {
+        promptTokens: number;
+        completionTokens: number;
+        totalTokens: number;
+    };
+}
+export interface TextGenerationResult {
+    content: string;
+    usage: {
+        promptTokens: number;
+        completionTokens: number;
+        totalTokens: number;
+    };
+}
+export interface Message {
+    Role: string;
+    Contents?: Array<{
+        Type: string;
+        Text?: string;
+        ImageUrl?: {
+            Url: string;
+        };
+    }>;
+    Content?: string;
+}
+export declare class HunyuanClient {
+    private apiKey?;
+    private endpoint;
+    private visionModel;
+    private textModel;
+    constructor(config?: HunyuanConfig);
+    /**
+     * Set credentials at runtime. For the OpenAI-compatible backend only the
+     * `secretId` (API key) is meaningful; `secretKey` is accepted for
+     * signature compatibility and ignored.
+     */
+    setCredentials(secretId: string, _secretKey?: string): void;
+    private resolveApiKey;
+    private resolveEndpoint;
+    private resolveVisionModel;
+    private resolveTextModel;
+    /** Map a file extension to its MIME type, defaulting to jpeg. */
+    private mimeFromExt;
+    /**
+     * Read an image and return a base64 string suitable for `image_url.url`.
+     * Returns a bare base64 string (no `data:` prefix) — this is what the
+     * Zhipu Bigmodel endpoint expects; an explicit `data:` URL is *not* added
+     * here because the backend rejects it for the glm-4.6v family.
+     */
+    private imageToBase64;
+    /** Sleep helper for retry backoff. */
+    private sleep;
+    /**
+     * POST to /chat/completions with retry on 429/5xx. The Zhipu free vision
+     * model throttles aggressively under load, so callers benefit from backoff.
+     */
+    private chatCompletions;
+    /**
+     * Analyze a single image. The backend is the OpenAI-compatible Chat
+     * Completions API with an `image_url` content part.
+     */
+    analyzeImage(imagePath: string, prompt?: string, apiKeyOverride?: string): Promise<ImageAnalysisResult>;
+    /**
+     * Analyze each image in sequence. Serial processing keeps us under the free
+     * model's QPS limit; failed images do not abort the batch.
+     */
+    analyzeImageBatch(imagePaths: string[], prompt?: string, apiKeyOverride?: string): Promise<ImageAnalysisResult[]>;
+    /**
+     * Analyze multiple images in a single request. Useful for video frames so
+     * the model can reason about sequence/context. Limited to 4 images to keep
+     * request size and cost bounded (matches the original behavior).
+     */
+    analyzeImagesInSingleRequest(imagePaths: string[], prompt?: string, apiKeyOverride?: string): Promise<ImageAnalysisResult>;
+    /**
+     * Generate text from a prompt. Used for the second pass of script
+     * generation, where we don't need vision — a plain text model is cheaper.
+     */
+    generateText(prompt: string, modelOverride?: string, apiKeyOverride?: string): Promise<TextGenerationResult>;
+}
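The declarations above describe two behaviours without showing them: `chatCompletions` retries on HTTP 429 and 5xx because the free Zhipu vision model throttles under load, and `analyzeImageBatch` processes images one at a time so that a failed image does not abort the batch. Below is a TypeScript sketch of both patterns. `withRetry`, `serialBatch`, the default attempt count, and the exponential delay schedule are assumptions for illustration, not the package's actual implementation or settings.

```typescript
// Minimal response shape the retry loop needs (a subset of fetch's Response).
interface HttpResult {
  status: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry on HTTP 429 or 5xx with exponential backoff (assumed defaults: 3 attempts, 1s base).
// Network errors thrown by send() propagate immediately; only status codes are retried.
async function withRetry<T extends HttpResult>(
  send: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  let attempt = 1;
  while (true) {
    const res = await send();
    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable || attempt >= maxAttempts) return res;
    await wait(baseDelayMs * 2 ** (attempt - 1)); // base, 2x base, 4x base, ...
    attempt++;
  }
}

type Outcome<O> = { ok: true; value: O } | { ok: false; error: string };

// Serial batch: one request at a time (to stay under a QPS limit); a failure
// is recorded for that item instead of aborting the remaining items.
async function serialBatch<I, O>(items: I[], task: (item: I) => Promise<O>): Promise<Outcome<O>[]> {
  const results: Outcome<O>[] = [];
  for (const item of items) {
    try {
      results.push({ ok: true, value: await task(item) });
    } catch (error) {
      results.push({ ok: false, error: error instanceof Error ? error.message : String(error) });
    }
  }
  return results;
}
```

Injecting `wait` keeps the backoff testable without real delays; in production the default `sleep` is used.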