@visionengine/video-recognize 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README-zh.md ADDED
@@ -0,0 +1,104 @@
# VE Video Understanding MCP

Async MCP server for video understanding via the `ve-backend` proxy.

## Environment variables

- `API_URL`: backend proxy address, default `https://api.visionengine-tech.com/api/v1/video`
- `API_KEY`: VisionEngine platform user API key (required; used for submit/query and remote upload)
- `MODEL`: platform model ID, default `@preset/vec-1-0-video-recognize`
- `WORKDIR`: local workspace root path
- `FILE_MODE`: local file handling mode, `local` or `remote`, default `remote`
- `REMOTION_WORK_DIR`: shared mount root in `local` mode, default `/vec`
- `BASE_URL`: public backend base URL used to build `/save` and `/shared` links, default `https://api.visionengine-tech.com`
- The remote-mode upload directory is hard-coded: `public/videos`

## Tools

- `submit`
- `query`

### `submit`

Submits an async video understanding task; on success it returns a `taskId` for later polling.

Supported `taskType` values:

- `understand`
- `cut_effect_points`
- `emotion_analysis`
- `script_generate`
- `style_analyze`

Input uses a single parameter:

- `video`: either a publicly accessible video URL or a local file path

When `video` is a local file path:

- `FILE_MODE=local`: after validating that the file is under `REMOTION_WORK_DIR`, MCP sends the path relative to `REMOTION_WORK_DIR`, and the backend resolves it internally as local file input
- `FILE_MODE=remote` (default): the file is first uploaded via the backend `/save` endpoint, and the returned path is turned into a `/shared/...?...download=true` download link

The first version always uses `stream=false`; `submit` only returns a stable task-submission result, and the final recognition result must be fetched with `query`.

Optional analysis range parameters are supported:

- `analysisRange.type`: `time` or `frame`
- `analysisRange.startSec` / `analysisRange.endSec`: select the analysis interval in seconds
- `analysisRange.startFrame` / `analysisRange.endFrame`: select the analysis interval in frames

Constraints:

- `type=time` only allows `startSec` / `endSec`
- `type=frame` only allows `startFrame` / `endFrame`
- at least one boundary must be provided
- one-sided ranges are supported, for example `{ type: "time", startSec: 30 }`

Example submission:

```json
{
  "video": "https://example.com/demo.mp4",
  "analysisRange": {
    "type": "time",
    "startSec": 5,
    "endSec": 20
  },
  "taskType": "understand",
  "responseFormat": "json_object"
}
```

### `query`

Queries a task's status by `taskId`.

- If the task is still running, the current status is returned with a hint to query again later
- If the task succeeded or partially succeeded, the tool automatically fetches `/task/{taskId}/result` and returns the final structured result as well
- If the task failed or was canceled, the status, message, and error details are returned

Typical flow:

1. Call `submit`
2. Wait a moment
3. Call `query` with the returned `taskId`
4. If not finished, keep calling `query` until the task ends

```json
{
  "mcpServers": {
    "ve-video-recognize": {
      "type": "local",
      "command": "npx",
      "args": ["-y", "@visionengine/video-recognize@latest"],
      "transport": "stdio",
      "env": {
        "API_KEY": "<YOUR_API_KEY>",
        "WORKDIR": "./",
        "FILE_MODE": "remote",
        "REMOTION_WORK_DIR": "/vec"
      }
    }
  }
}
```
package/README.md ADDED
@@ -0,0 +1,105 @@
# VE Video Recognize MCP

Async MCP server for video understanding via the `ve-backend` proxy.

## Environment

- `API_URL`: backend proxy URL, default `https://api.visionengine-tech.com/api/v1/video`
- `API_KEY`: user API key from the VisionEngine backend (required for submit/query and remote upload)
- `MODEL`: platform model ID, default `@preset/vec-1-0-video-recognize`
- `WORKDIR`: local workspace root
- `FILE_MODE`: local file handling mode, `local` or `remote`, default `remote`
- `REMOTION_WORK_DIR`: shared mount root used in `local` mode, default `/vec`
- `BASE_URL`: public backend base URL used for `/save` and `/shared` links, default `https://api.visionengine-tech.com`
- The remote upload path is hard-coded: `public/videos`

## Tools

- `submit`
- `query`

### `submit`

Submit an async video understanding task and receive a `taskId` for later polling.

Supported task types:

- `understand`
- `cut_effect_points`
- `emotion_analysis`
- `script_generate`
- `style_analyze`

Input uses a single parameter:

- `video`: either a public video URL or a local file path

When `video` is a local file path:

- `FILE_MODE=local`: after validating that the file is under `REMOTION_WORK_DIR`, MCP sends a path relative to `REMOTION_WORK_DIR`, and the backend resolves it internally as local file input
- `FILE_MODE=remote` (default): the local file is uploaded to the backend `/save` endpoint, and the returned path is converted into a `/shared/...?...download=true` URL
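
The remote-mode link construction can be sketched as follows. This is an illustrative restatement of `buildSharedDownloadUrl` from `dist/server.js`, not a separate API, and the example path is hypothetical:

```javascript
// The path returned by /save is percent-encoded segment by segment
// and appended to BASE_URL under /shared with download=true.
const BASE_URL = "https://api.visionengine-tech.com"; // default BASE_URL

function buildSharedDownloadUrl(relativePath) {
  const encoded = relativePath
    .replace(/\\/g, "/")      // normalize Windows separators
    .replace(/^\/+/, "")      // drop leading slashes
    .split("/")
    .filter(Boolean)
    .map(encodeURIComponent)  // encode each segment, keeping the slashes
    .join("/");
  return `${BASE_URL}/shared/${encoded}?download=true`;
}

// A hypothetical path as /save might return it:
console.log(buildSharedDownloadUrl("public/videos/my demo.mp4"));
// → https://api.visionengine-tech.com/shared/public/videos/my%20demo.mp4?download=true
```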

First version always submits with `stream=false` and returns a stable task-oriented payload. Use `query` to retrieve the final result.

Supported optional analysis range parameter:

- `analysisRange.type`: `time` or `frame`
- `analysisRange.startSec` / `analysisRange.endSec`: select a time range in seconds
- `analysisRange.startFrame` / `analysisRange.endFrame`: select a frame range

Rules:

- `type=time` only allows `startSec` / `endSec`
- `type=frame` only allows `startFrame` / `endFrame`
- at least one boundary is required
- one-sided ranges are supported, for example `{ type: "time", startSec: 30 }`

Example submit parameters:

```json
{
  "video": "https://example.com/demo.mp4",
  "analysisRange": {
    "type": "time",
    "startSec": 5,
    "endSec": 20
  },
  "taskType": "understand",
  "responseFormat": "json_object"
}
```

### `query`

Query a submitted task by `taskId`.

- If the task is still running, the tool returns the current status and asks the caller to try again later.
- If the task succeeds or partially succeeds, the tool automatically fetches `/task/{taskId}/result` and returns the final structured result.
- If the task failed or was canceled, the tool returns the status and the backend message/error.

Typical flow:

1. Call `submit`
2. Wait a short time
3. Call `query` with the returned `taskId`
4. Repeat `query` until the task finishes
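
The flow above can be sketched as a small client-side polling loop. This is a hypothetical illustration: `callTool` stands in for however your MCP client invokes tools, and the interval/attempt limits are arbitrary defaults:

```javascript
// Hypothetical polling helper: submit once, then query until the task
// reaches a terminal state. Both tools return JSON strings, so parse them.
async function runVideoTask(callTool, submitArgs, { intervalMs = 5000, maxAttempts = 60 } = {}) {
  const submitted = JSON.parse(await callTool("submit", submitArgs));
  if (!submitted.taskId) throw new Error("submit returned no taskId");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = JSON.parse(await callTool("query", { taskId: submitted.taskId }));
    const state = String(status.taskStatus ?? status.status ?? "").toUpperCase();
    // query already attaches the final result for SUCCEEDED / PARTIAL_SUCCESS
    if (state === "SUCCEEDED" || state === "PARTIAL_SUCCESS") return status;
    if (state === "FAILED" || state === "CANCELED" || state === "CANCELLED") {
      throw new Error(status.message || `task ended with status ${state}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("task did not finish within the polling budget");
}
```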

## Example MCP config

```json
{
  "mcpServers": {
    "ve-video-recognize": {
      "command": "npx",
      "args": ["-y", "@visionengine/video-recognize@latest"],
      "transport": "stdio",
      "env": {
        "API_KEY": "<YOUR_API_KEY>",
        "WORKDIR": "./",
        "FILE_MODE": "remote",
        "REMOTION_WORK_DIR": "/vec"
      }
    }
  }
}
```
package/dist/index.js ADDED
@@ -0,0 +1,5 @@
#!/usr/bin/env node
import { server } from "./server.js";
server.start({
  transportType: "stdio",
});
package/dist/server.js ADDED
@@ -0,0 +1,320 @@
import { FastMCP } from "fastmcp";
import { z } from "zod";
import * as fs from "fs";
import * as path from "path";
const server = new FastMCP({
  name: "VE Video Recognize",
  version: "1.0.0",
});
const API_URL = process.env.API_URL || "https://api.visionengine-tech.com/api/v1/video";
const API_KEY = process.env.API_KEY || "";
const MODEL = process.env.MODEL || "@preset/vec-1-0-video-recognize";
const WORKDIR = process.env.WORKDIR || "./";
const FILE_MODE = (process.env.FILE_MODE || "remote").toLowerCase();
const REMOTION_WORK_DIR = process.env.REMOTION_WORK_DIR || "/vec";
const BASE_URL = process.env.BASE_URL || "https://api.visionengine-tech.com";
const REMOTE_UPLOAD_PATH = "public/videos";
function getWorkDir() {
  return path.isAbsolute(WORKDIR) ? WORKDIR : path.resolve(process.cwd(), WORKDIR);
}
function resolveInputPath(inputPath) {
  return path.isAbsolute(inputPath) ? inputPath : path.resolve(getWorkDir(), inputPath);
}
function isHttpUrl(value) {
  return /^https?:\/\//i.test(value);
}
function toPosixPath(value) {
  return value.replace(/\\/g, "/");
}
function normalizeMode(value) {
  return value === "local" ? "local" : "remote";
}
function getBaseUrl() {
  return BASE_URL.replace(/\/+$/, "");
}
function buildSharedDownloadUrl(relativePath) {
  const normalizedRelativePath = toPosixPath(relativePath).replace(/^\/+/, "");
  const encodedRelativePath = normalizedRelativePath
    .split("/")
    .filter(Boolean)
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  return `${getBaseUrl()}/shared/${encodedRelativePath}?download=true`;
}
function resolveRemotionWorkDir() {
  return path.isAbsolute(REMOTION_WORK_DIR)
    ? path.resolve(REMOTION_WORK_DIR)
    : path.resolve(getWorkDir(), REMOTION_WORK_DIR);
}
function toSharedRelativePathFromLocal(fullPath) {
  const mountRoot = resolveRemotionWorkDir();
  const normalizedMountRoot = path.resolve(mountRoot);
  const normalizedFullPath = path.resolve(fullPath);
  const relativePath = path.relative(normalizedMountRoot, normalizedFullPath);
  if (relativePath.startsWith("..") || path.isAbsolute(relativePath)) {
    throw new Error(`File path is outside REMOTION_WORK_DIR: ${fullPath}`);
  }
  return toPosixPath(relativePath).replace(/^\/+/, "");
}
async function uploadFileToBackend(fullPath) {
  if (!API_KEY) {
    throw new Error("API_KEY environment variable is required");
  }
  const fileName = path.basename(fullPath);
  const fileBuffer = await fs.promises.readFile(fullPath);
  const formData = new FormData();
  formData.set("file", new Blob([fileBuffer]), fileName);
  formData.set("file_name", fileName);
  formData.set("path", REMOTE_UPLOAD_PATH);
  const response = await fetch(`${getBaseUrl()}/save`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${API_KEY}`,
    },
    body: formData,
  });
  if (!response.ok) {
    throw new Error(`Backend save API error (${response.status}): ${await response.text()}`);
  }
  const result = await response.json();
  if (!result.success || !result.file?.path) {
    throw new Error("Backend save API response is missing file.path");
  }
  return buildSharedDownloadUrl(result.file.path);
}
async function normalizeVideoInput(video) {
  if (isHttpUrl(video)) {
    return video;
  }
  const fullPath = resolveInputPath(video);
  if (!fs.existsSync(fullPath)) {
    throw new Error(`Video file not found: ${fullPath}`);
  }
  const mode = normalizeMode(FILE_MODE);
  if (mode === "local") {
    toSharedRelativePathFromLocal(fullPath);
    return path.relative(resolveRemotionWorkDir(), fullPath).replace(/\\/g, "/") || path.basename(fullPath);
  }
  return await uploadFileToBackend(fullPath);
}
async function requestJson(url, init) {
  if (!API_KEY) {
    throw new Error("API_KEY environment variable is required");
  }
  const response = await fetch(url, {
    method: init.method,
    headers: {
      Authorization: `Bearer ${API_KEY}`,
      "Content-Type": "application/json",
    },
    body: init.body !== undefined ? JSON.stringify(init.body) : undefined,
  });
  if (!response.ok) {
    throw new Error(`Backend API error (${response.status}): ${await response.text()}`);
  }
  return await response.json();
}
function normalizeAnalysisRange(analysisRange) {
  if (!analysisRange) {
    return undefined;
  }
  if (analysisRange.type === "time") {
    if (analysisRange.startFrame !== undefined || analysisRange.endFrame !== undefined) {
      throw new Error("analysisRange of type=time does not allow startFrame/endFrame");
    }
    if (analysisRange.startSec === undefined && analysisRange.endSec === undefined) {
      throw new Error("analysisRange of type=time requires at least one of startSec or endSec");
    }
    if (analysisRange.startSec !== undefined &&
      analysisRange.endSec !== undefined &&
      analysisRange.endSec <= analysisRange.startSec) {
      throw new Error("analysisRange.endSec must be greater than analysisRange.startSec");
    }
    return {
      type: "time",
      start_sec: analysisRange.startSec,
      end_sec: analysisRange.endSec,
    };
  }
  if (analysisRange.startSec !== undefined || analysisRange.endSec !== undefined) {
    throw new Error("analysisRange of type=frame does not allow startSec/endSec");
  }
  if (analysisRange.startFrame === undefined && analysisRange.endFrame === undefined) {
    throw new Error("analysisRange of type=frame requires at least one of startFrame or endFrame");
  }
  if (analysisRange.startFrame !== undefined &&
    analysisRange.endFrame !== undefined &&
    analysisRange.endFrame <= analysisRange.startFrame) {
    throw new Error("analysisRange.endFrame must be greater than analysisRange.startFrame");
  }
  return {
    type: "frame",
    start_frame: analysisRange.startFrame,
    end_frame: analysisRange.endFrame,
  };
}
function buildSubmitPayload(args) {
  const segmentConfig = {};
  if (args.forceSegment !== undefined)
    segmentConfig.force_segment = args.forceSegment;
  if (args.segmentDuration !== undefined)
    segmentConfig.segment_duration = args.segmentDuration;
  if (args.overlap !== undefined)
    segmentConfig.overlap = args.overlap;
  if (args.maxSegments !== undefined)
    segmentConfig.max_segments = args.maxSegments;
  const normalizedResponseFormat = args.responseFormat ?? "json_object";
  const normalizedAnalysisRange = normalizeAnalysisRange(args.analysisRange);
  return {
    video: args.video,
    analysis_range: normalizedAnalysisRange,
    task_type: args.taskType,
    prompt_mode: args.promptMode ?? "template",
    user_prompt: args.userPrompt,
    stream: args.stream ?? false,
    response_format: normalizedResponseFormat === "json_object" ? { type: "json_object" } : "text",
    model_id: args.modelId || MODEL,
    segment_config: Object.keys(segmentConfig).length > 0 ? segmentConfig : undefined,
  };
}
function extractStatus(payload) {
  return payload.task_status ?? payload.status ?? "UNKNOWN";
}
async function submitVideoRecognizeTask(args) {
  const normalizedVideo = await normalizeVideoInput(args.video);
  const payload = buildSubmitPayload({
    video: normalizedVideo,
    analysisRange: args.analysisRange,
    taskType: args.taskType,
    promptMode: args.promptMode,
    userPrompt: args.userPrompt,
    responseFormat: args.responseFormat,
    stream: false,
    forceSegment: args.forceSegment,
    segmentDuration: args.segmentDuration,
    overlap: args.overlap,
    maxSegments: args.maxSegments,
    modelId: args.modelId,
  });
  const submitResult = await requestJson(`${API_URL}/analyze`, {
    method: "POST",
    body: payload,
  });
  return {
    success: submitResult.success ?? true,
    taskId: submitResult.task_id ?? null,
    status: extractStatus(submitResult),
    taskType: submitResult.task_type ?? args.taskType,
    video: normalizedVideo,
    analysisRange: payload.analysis_range ?? null,
    message: submitResult.message || "Video analysis task submitted successfully.",
    nextAction: "Use the query tool with taskId to get task status or final result.",
    raw: submitResult,
  };
}
async function queryTask(taskId) {
  return requestJson(`${API_URL}/task/${encodeURIComponent(taskId)}`, { method: "GET" });
}
async function getTaskResult(taskId) {
  return requestJson(`${API_URL}/task/${encodeURIComponent(taskId)}/result`, { method: "GET" });
}
async function buildQueryResult(taskId) {
  const statusResult = await queryTask(taskId);
  const normalizedStatus = String(extractStatus(statusResult)).toUpperCase();
  if (normalizedStatus === "SUCCEEDED" || normalizedStatus === "PARTIAL_SUCCESS") {
    const resultPayload = await getTaskResult(taskId);
    return {
      success: resultPayload.success ?? statusResult.success ?? true,
      taskId: resultPayload.task_id ?? statusResult.task_id ?? taskId,
      status: extractStatus(resultPayload) !== "UNKNOWN" ? extractStatus(resultPayload) : extractStatus(statusResult),
      taskType: resultPayload.task_type ?? statusResult.task_type ?? null,
      createdAt: statusResult.created_at ?? null,
      updatedAt: statusResult.updated_at ?? null,
      result: resultPayload.result ?? null,
      segmentResults: resultPayload.segment_results ?? null,
      usage: resultPayload.usage ?? null,
      error: resultPayload.error ?? null,
      message: resultPayload.message || "Video analysis completed successfully.",
      rawStatus: statusResult,
      rawResult: resultPayload,
    };
  }
  return {
    success: statusResult.success ?? true,
    taskId: statusResult.task_id ?? taskId,
    taskStatus: extractStatus(statusResult),
    taskType: statusResult.task_type ?? null,
    createdAt: statusResult.created_at ?? null,
    updatedAt: statusResult.updated_at ?? null,
    error: statusResult.error ?? null,
    message: normalizedStatus === "FAILED"
      ? statusResult.message || "Video analysis task failed. Please check the input video and backend logs before retrying."
      : normalizedStatus === "CANCELED" || normalizedStatus === "CANCELLED"
        ? statusResult.message || "Video analysis task was canceled."
        : statusResult.message || "Task is still running. Please query again later.",
    rawStatus: statusResult,
  };
}
server.addTool({
  annotations: {
    openWorldHint: true,
    readOnlyHint: false,
    title: "Submit Video Recognition Task",
  },
  description: "Submit a video understanding task to the backend proxy. Use a single video parameter: it may be a public URL or a local path. For local files, FILE_MODE=local sends the path relative to REMOTION_WORK_DIR and lets backend resolve it; FILE_MODE=remote uploads the file to /save first, then submits the returned shared URL. Also supports analysisRange so the backend only analyzes the selected time/frame range of the original video. First version always uses stream=false and supports task types: understand, cut_effect_points, emotion_analysis, script_generate, style_analyze.",
  execute: async (args) => {
    const result = await submitVideoRecognizeTask({
      video: args.video,
      analysisRange: args.analysisRange,
      taskType: args.taskType,
      promptMode: args.promptMode,
      userPrompt: args.userPrompt,
      responseFormat: args.responseFormat,
      stream: args.stream,
      forceSegment: args.forceSegment,
      segmentDuration: args.segmentDuration,
      overlap: args.overlap,
      maxSegments: args.maxSegments,
      modelId: args.modelId,
    });
    return JSON.stringify(result, null, 2);
  },
  name: "submit",
  parameters: z.object({
    video: z.string().describe("Single video input. Can be a public http(s) URL or a local file path. In FILE_MODE=local, local path should be under REMOTION_WORK_DIR and may be relative to WORKDIR; MCP will convert it to a path relative to REMOTION_WORK_DIR for backend resolution. In FILE_MODE=remote, local files are uploaded via /save before analysis."),
    analysisRange: z.object({
      type: z.enum(["time", "frame"]).describe("Select analysis range by time or frame."),
      startSec: z.number().min(0).optional().describe("Start time in seconds for type=time."),
      endSec: z.number().positive().optional().describe("End time in seconds for type=time."),
      startFrame: z.number().int().min(0).optional().describe("Start frame index for type=frame."),
      endFrame: z.number().int().positive().optional().describe("End frame index for type=frame."),
    }).optional().describe("Optional hard analysis range. Supports selecting only a specific time segment or frame interval of the original video."),
    taskType: z.enum(["understand", "cut_effect_points", "emotion_analysis", "script_generate", "style_analyze"]).describe("Video understanding task type."),
    promptMode: z.enum(["template", "auto"]).optional().describe("Prompt mode for backend analysis. Default template."),
    userPrompt: z.string().optional().describe("Optional custom prompt appended or used by the backend for the selected task."),
    responseFormat: z.enum(["text", "json_object"]).optional().describe("Desired backend response format. json_object is recommended for structured outputs."),
    stream: z.boolean().optional().describe("Reserved for compatibility. First version still forces non-streaming backend requests."),
    forceSegment: z.boolean().optional().describe("Whether to force backend video segmentation."),
    segmentDuration: z.number().positive().optional().describe("Backend segment duration in seconds."),
    overlap: z.number().min(0).optional().describe("Backend segment overlap in seconds."),
    maxSegments: z.number().int().positive().optional().describe("Maximum number of backend segments."),
    modelId: z.string().optional().describe("Platform model ID override. Defaults to MODEL env."),
  }),
});
server.addTool({
  annotations: {
    openWorldHint: true,
    readOnlyHint: true,
    title: "Query Video Recognition Task",
  },
  description: "Query a previously submitted video understanding task by taskId. If the task is finished, this tool also fetches and returns the final result payload.",
  execute: async (args) => {
    const result = await buildQueryResult(args.taskId);
    return JSON.stringify(result, null, 2);
  },
  name: "query",
  parameters: z.object({
    taskId: z.string().describe("Task ID returned by the submit tool."),
  }),
});
export { server };
package/package.json ADDED
@@ -0,0 +1,48 @@
{
  "name": "@visionengine/video-recognize",
  "version": "1.0.0",
  "description": "VisionEngine Video Recognize MCP Server - Async video understanding via backend proxy",
  "main": "dist/index.js",
  "type": "module",
  "bin": {
    "ve-video-recognize": "./dist/index.js"
  },
  "files": [
    "dist",
    "README.md",
    "README-zh.md"
  ],
  "scripts": {
    "build": "tsc",
    "clear": "node -e \"require('fs').rmSync('dist', { recursive: true, force: true })\"",
    "test": "vitest run",
    "test:watch": "vitest",
    "prepublishOnly": "npm run build"
  },
  "keywords": [
    "mcp",
    "video-recognize",
    "video-understand",
    "visionengine"
  ],
  "author": "team@visionengine-tech.com",
  "license": "MIT",
  "repository": {
    "type": "git",
    "url": "https://github.com/crazyyanchao/ve-mcp.git",
    "directory": "packages/video-recognize"
  },
  "homepage": "https://visionengine-tech.com/mcp",
  "dependencies": {
    "fastmcp": "^3.26.8",
    "zod": "^4.1.12"
  },
  "devDependencies": {
    "@types/node": "^24.10.1",
    "typescript": "^5.8.3",
    "vitest": "^3.1.3"
  },
  "engines": {
    "node": ">=18.0.0"
  }
}