npm - @hsiehchenwei/mcp-gemini-transcriber - Versions diffs - 1.0.0 - Mend

@hsiehchenwei/mcp-gemini-transcriber 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/README.md ADDED

@@ -0,0 +1,217 @@
+# MCP Gemini Transcriber
+An MCP tool that transcribes audio with the Gemini API, with speaker identification and emotion analysis.
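+## Features
+### 🎵 Audio transcription
+- **Automatic segmentation**: long audio files are split into 5-minute chunks
+- **Parallel processing**: up to 25 tasks run concurrently
+- **Timestamp adjustment**: each segment's timestamps are shifted so the merged transcript stays continuous
+- **Speaker identification**: distinguishes speakers and names them automatically
+- **Emotion analysis**: tracks the emotional flow of the conversation and its key moments
+- **Retry on failure**: failed segments are retried automatically (2 times)
+### 🖼️ Image description
+- **Multiple detail levels**: simple, normal, and detailed modes
+- **Markdown output**: structured descriptions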
+## Supported formats
+### Audio
+`.mp3`, `.m4a`, `.wav`, `.webm`, `.ogg`, `.flac`, `.aiff`, `.aac`
+### Images
+`.png`, `.jpg`, `.jpeg`, `.webp`, `.heic`, `.heif`
+## Quick start (recommended: npx)
+No download or installation is needed; run it directly with npx:
+### 1. Configure Cursor MCP
+Edit `~/.cursor/mcp.json`:
+```json
+{
+  "mcpServers": {
+    "gemini-transcriber": {
+      "command": "npx",
+      "args": [
+        "-y",
+        "@hsiehchenwei/mcp-gemini-transcriber"
+      ],
+      "env": {
+        "GEMINI_API_KEY": "your-Gemini-API-key",
+        "CURSOR_WORKSPACE_ROOT": "${workspaceFolder}"
+      }
+    }
+  }
+}
+```
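+### 2. Restart Cursor
+Done! The tool is downloaded and run automatically; no local installation is needed.
+**Environment variables**:
+- `GEMINI_API_KEY` (required): your Gemini API key
+- `CURSOR_WORKSPACE_ROOT` (optional): workspace root directory, used to resolve relative paths
+- `DEFAULT_MODE` (optional): default transcription mode, either `fast` or `speaker`
+  - Defaults to `fast` (quick mode) when unset
+  - When set to `speaker`, speaker identification plus emotion analysis is used by default
+  - The `mode` parameter can still be specified in a conversation to override the default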
+---
+## Local installation (optional)
+If you prefer to install locally:
+```bash
+npm install -g @hsiehchenwei/mcp-gemini-transcriber
+```
+Then use it in `mcp.json`:
+```json
+{
+  "mcpServers": {
+    "gemini-transcriber": {
+      "command": "mcp-gemini-transcriber",
+      "env": {
+        "GEMINI_API_KEY": "your-Gemini-API-key",
+        "CURSOR_WORKSPACE_ROOT": "${workspaceFolder}",
+        "DEFAULT_MODE": "fast"
+      }
+    }
+  }
+}
+```
+## System requirements
+### FFmpeg is required
+```bash
+# macOS
+brew install ffmpeg
+# Ubuntu/Debian
+sudo apt install ffmpeg
+# Windows
+# Download and install: https://ffmpeg.org/download.html
+```
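+## Usage
+### Audio transcription
+```
+Please transcribe this audio file: /path/to/audio.m4a
+```
+**Output in speaker mode includes**:
+- Summary
+- Emotional flow (short paragraphs, including memorable quotes)
+- Speaker information (identified names and characteristics)
+- Transcript (speaker labels replaced with the identified names, with emotional-shift markers)
+### Image description
+```
+Please describe this image: /path/to/image.jpg
+```
+### List supported formats
+```
+Please list the supported file formats
+```
+### Analyze a specific speaker
+```
+Analyze the speaker at 01:18 in the audio
+```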
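+## Tools
+### transcribe_audio
+| Parameter | Required | Description |
+|------|------|------|
+| `audio_path` | ✅ | Path to the audio file |
+| `output_path` | ❌ | Output path (defaults to a .md file in the same directory) |
+| `model` | ❌ | Model (default: gemini-3-flash-preview) |
+| `mode` | ❌ | Mode: `fast` (quick, the default) or `speaker` (speaker identification + emotion analysis) |
+**Modes**:
+- **fast** (default): quick mode; transcribes in parallel without speaker identification or emotion analysis
+- **speaker**: processes segments one at a time, performing speaker identification and emotion analysis together; the output includes speaker information, an emotional-flow summary, and the transcript (with emotional-shift markers)
+**Default mode**:
+- The default mode is `fast`
+- The `DEFAULT_MODE` environment variable can set the default mode (`fast` or `speaker`)
+- Specifying the `mode` parameter in a conversation overrides the default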
+### describe_image
+| Parameter | Required | Description |
+|------|------|------|
+| `image_path` | ✅ | Path to the image |
+| `output_path` | ❌ | Output path |
+| `detail_level` | ❌ | simple/normal/detailed |
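+## Processing flow
+```
+Source audio
+    ↓
+ffprobe reads the total duration
+    ↓
+ffmpeg splits the audio (5 minutes per segment)
+    ↓
+Parallel upload to the Gemini Files API
+    ↓
+Parallel transcription (up to 25 concurrent)
+    ↓
+Adjust timestamps + merge
+    ↓
+Generate summary + keywords
+    ↓
+Output Markdown
+```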
+## Available models
+| Model | Description |
+|------|------|
+| `gemini-3-flash-preview` | Default; fast |
+| `gemini-2.5-flash` | Flash series |
+| `gemini-2.5-pro` | High quality |
+## Release status
+✅ Published to npm: `@hsiehchenwei/mcp-gemini-transcriber`
+It can be run directly with npx; no local installation is needed:
+```json
+{
+  "mcpServers": {
+    "gemini-transcriber": {
+      "command": "npx",
+      "args": ["-y", "@hsiehchenwei/mcp-gemini-transcriber"],
+      "env": {
+        "GEMINI_API_KEY": "your-key"
+      }
+    }
+  }
+}
+```
+## License
+MIT License
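
The README's processing flow (split into 5-minute segments, transcribe with at most 25 parallel tasks, retry failed segments twice, then shift timestamps and merge) could be sketched roughly as below. This is an illustration only, not the package's `server.mjs`: `planSegments`, `shiftTimestamp`, and `runWithLimit` are hypothetical names, the `MM:SS` timestamp format with whole-second offsets is an assumption, and the `worker` callback stands in for the real upload-and-transcribe step.

```javascript
// Hypothetical sketch of the README's flow; not the package's actual code.
const SEGMENT_SECONDS = 5 * 60; // README: 5-minute segments
const MAX_PARALLEL = 25;        // README: up to 25 concurrent tasks
const MAX_RETRIES = 2;          // README: failed segments are retried 2 times

// Split a total duration (in seconds) into consecutive [start, end) segments.
function planSegments(totalSeconds, segmentSeconds = SEGMENT_SECONDS) {
  const segments = [];
  for (let start = 0; start < totalSeconds; start += segmentSeconds) {
    const end = Math.min(start + segmentSeconds, totalSeconds);
    segments.push({ index: segments.length, start, end });
  }
  return segments;
}

// Shift a segment-local "MM:SS" timestamp by the segment's start offset.
// Assumes whole-second offsets; minutes may exceed 59 for long recordings.
function shiftTimestamp(stamp, offsetSeconds) {
  const [minutes, seconds] = stamp.split(":").map(Number);
  const total = minutes * 60 + seconds + offsetSeconds;
  const mm = String(Math.floor(total / 60)).padStart(2, "0");
  const ss = String(total % 60).padStart(2, "0");
  return `${mm}:${ss}`;
}

// Run worker(item) over all items with bounded concurrency.
// Each item is retried up to `retries` times before the error propagates.
async function runWithLimit(items, worker, limit = MAX_PARALLEL, retries = MAX_RETRIES) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item
  async function lane() {
    while (next < items.length) {
      const i = next++;
      for (let attempt = 0; ; attempt++) {
        try {
          results[i] = await worker(items[i]);
          break;
        } catch (err) {
          if (attempt >= retries) throw err;
        }
      }
    }
  }
  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results; // results keep the order of the input items
}
```

Under these assumptions, a 12-minute file yields three segments (0-300, 300-600, 600-720 seconds), and a `01:18` stamp reported inside the second segment becomes `06:18` after shifting by that segment's 300-second start offset.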

package/mcp.json.example ADDED

@@ -0,0 +1,14 @@
+{
+  "mcpServers": {
+    "gemini-transcriber": {
+      "command": "node",
+      "args": [
+        "/Users/chenwei/Documents/GitHub/MCPTools/mcp-gemini-transcriber/server.mjs"
+      ],
+      "env": {
+        "GEMINI_API_KEY": "your-gemini-api-key-here",
+        "CURSOR_WORKSPACE_ROOT": "${workspaceFolder}"
+      }
+    }
+  }
+}

package/mcp.json.npx.example ADDED

@@ -0,0 +1,16 @@
+{
+  "mcpServers": {
+    "gemini-transcriber": {
+      "command": "npx",
+      "args": [
+        "-y",
+        "@hsiehchenwei/mcp-gemini-transcriber"
+      ],
+      "env": {
+        "GEMINI_API_KEY": "your-gemini-api-key-here",
+        "CURSOR_WORKSPACE_ROOT": "${workspaceFolder}",
+        "DEFAULT_MODE": "fast"
+      }
+    }
+  }
+}
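
The `DEFAULT_MODE` entry in this config interacts with the per-call `mode` parameter as the README describes: an explicit `mode` wins, then `DEFAULT_MODE`, then `fast`. A minimal sketch of that precedence follows. `resolveMode` is a hypothetical helper, not taken from the package, and falling back when a value is not `fast` or `speaker` is an assumption the README does not confirm.

```javascript
// Hypothetical helper; precedence follows the README, the invalid-value fallback is assumed.
const VALID_MODES = ["fast", "speaker"];

function resolveMode(requestedMode, env = process.env) {
  // 1. An explicit `mode` parameter from the tool call overrides everything.
  if (VALID_MODES.includes(requestedMode)) return requestedMode;
  // 2. Otherwise use the DEFAULT_MODE environment variable, if it is valid.
  if (VALID_MODES.includes(env.DEFAULT_MODE)) return env.DEFAULT_MODE;
  // 3. Otherwise fall back to the documented default.
  return "fast";
}
```

With the config above (`"DEFAULT_MODE": "fast"`), a call that passes `mode: "speaker"` would still run in speaker mode under this sketch.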

package/package.json ADDED

@@ -0,0 +1,46 @@
+{
+  "name": "@hsiehchenwei/mcp-gemini-transcriber",
+  "version": "1.0.0",
+  "type": "module",
+  "description": "MCP audio transcription tool (using the Gemini API) - supports speaker identification and emotion analysis",
+  "main": "server.mjs",
+  "bin": {
+    "mcp-gemini-transcriber": "./server.mjs"
+  },
+  "scripts": {
+    "start": "node server.mjs"
+  },
+  "keywords": [
+    "mcp",
+    "model-context-protocol",
+    "gemini",
+    "transcription",
+    "audio",
+    "speech-to-text",
+    "speaker-identification",
+    "emotion-analysis"
+  ],
+  "author": "chenwei",
+  "license": "MIT",
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/chenwei/MCPTools.git",
+    "directory": "mcp-gemini-transcriber"
+  },
+  "dependencies": {
+    "@google/genai": "^1.0.0",
+    "@modelcontextprotocol/sdk": "^1.12.1",
+    "dotenv": "^16.4.5",
+    "glob": "^10.3.10",
+    "zod": "^3.22.4"
+  },
+  "engines": {
+    "node": ">=18.0.0"
+  },
+  "files": [
+    "server.mjs",
+    "README.md",
+    "mcp.json.example",
+    "mcp.json.npx.example"
+  ]
+}