PyPI - openspeechapi - Versions diffs - 0.1.0__tar.gz - Mend

openspeechapi 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (247)

openspeechapi-0.1.0/.dockerignore ADDED

@@ -0,0 +1,16 @@
+.venv/
+.git/
+.env
+__pycache__/
+*.pyc
+.pytest_cache/
+.coverage
+dist/
+*.egg-info/
+tests/
+docs/
+examples/
+scripts/
+output.wav
+output/
+.DS_Store

openspeechapi-0.1.0/.env.example ADDED

@@ -0,0 +1,12 @@
+# OpenSpeech API Keys
+# Copy this file to .env and fill in your keys:  cp .env.example .env
+# ── Cloud Provider Keys ──────────────────────────
+OPENAI_API_KEY=sk-...              # OpenAI STT (Whisper) + TTS
+DEEPGRAM_API_KEY=                  # Deepgram STT (real-time streaming)
+ELEVENLABS_API_KEY=                # ElevenLabs TTS
+MINIMAX_API_KEY=                   # Minimax TTS
+MINIMAX_GROUP_ID=                  # Minimax Group ID
+# ── Server Authentication ──────────────────────────────────
+OPENSPEECH_API_KEY=                # Bearer token for the HTTP service (optional)
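As a rough illustration of how entries like these might be read, here is a minimal stdlib-only sketch. `parse_env` and `missing_keys` are hypothetical helpers, not openspeech's own loader and not python-dotenv; the sketch assumes plain unquoted `KEY=value` lines like the ones above, each with an optional trailing `# comment`, and it does not handle quoting or escapes.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines from a .env-style file (hypothetical helper).

    Blank lines and full-line comments are skipped. For unquoted values, an
    inline comment starting with '#' (directly after '=' or after a space)
    is dropped, so 'DEEPGRAM_API_KEY=    # Deepgram STT' yields ''.
    """
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if value.startswith("#"):
            value = ""  # only a comment after '=': key present but empty
        elif " #" in value:
            value = value.split(" #", 1)[0].rstrip()  # drop trailing comment
        env[key.strip()] = value
    return env


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or left empty."""
    return [key for key in required if not env.get(key)]
```

Treating an empty value the same as a missing key matches how the template ships: most keys are present but blank until the user fills them in.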

openspeechapi-0.1.0/.github/workflows/ci.yml ADDED

@@ -0,0 +1,31 @@
+name: CI
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python 3.11
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - name: Install dependencies
+        run: pip install -e ".[dev]"
+      - name: Lint with ruff
+        run: ruff check openspeech/ tests/
+      - name: Run unit and integration tests
+        run: pytest tests/unit tests/integration -v --tb=short
+      - name: Check coverage
+        run: pytest tests/unit tests/integration --cov=openspeech --cov-report=term-missing --cov-fail-under=70

openspeechapi-0.1.0/.gitignore ADDED

@@ -0,0 +1,29 @@
+.superpowers/
+.venv/
+.env
+# Local runtime config (copy providers.example.yaml → providers.yaml on first run)
+providers.yaml
+__pycache__/
+*.egg-info/
+.pytest_cache/
+.coverage
+dist/
+build/
+*.pyc
+*.pyo
+# macOS STT compiled artifacts
+scripts/engines/macos-stt/macos-stt-helper
+scripts/engines/macos-stt/macos-stt-request-auth
+scripts/engines/macos-stt/MacOSSTTHelper.app/
+scripts/engines/macos-stt/Info.plist
+# Runtime logs (structured JSONL + server stdio captures)
+logs/
+server_err.log
+server_stdout.log
+server_stderr.log
+# Claude Code project-local config
+.claude/
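The `.gitignore` comment describes a first-run step: copy `providers.example.yaml` to the git-ignored `providers.yaml`. Below is a minimal sketch of that step, assuming both files live in the same directory; `ensure_local_config` is a hypothetical name, not a function from the package.

```python
import shutil
from pathlib import Path


def ensure_local_config(root: Path,
                        example_name: str = "providers.example.yaml",
                        local_name: str = "providers.yaml") -> Path:
    """Create the local config from the tracked example if it is missing.

    An existing local file is never overwritten, so user edits survive reruns.
    """
    local = root / local_name
    if not local.exists():
        shutil.copyfile(root / example_name, local)
    return local
```

Because an existing file is never overwritten, local edits and real keys stay out of the tracked example. The Dockerfile performs the same copy at build time with `COPY providers.example.yaml providers.yaml`.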

openspeechapi-0.1.0/.tmp/audio/en.aiff ADDED

Binary file

openspeechapi-0.1.0/.tmp/audio/en_16k.wav ADDED

Binary file

openspeechapi-0.1.0/.tmp/audio/en_16k_pad6.wav ADDED

Binary file

openspeechapi-0.1.0/.tmp/audio/en_long.aiff ADDED

Binary file

openspeechapi-0.1.0/.tmp/audio/en_long_16k.wav ADDED

Binary file

openspeechapi-0.1.0/.tmp/audio/en_mid.aiff ADDED

Binary file

openspeechapi-0.1.0/.tmp/audio/en_mid_16k.wav ADDED

Binary file

openspeechapi-0.1.0/.tmp/audio/zh.aiff ADDED

Binary file

openspeechapi-0.1.0/.tmp/audio/zh_16k.wav ADDED

Binary file

openspeechapi-0.1.0/.tmp/openspeech-8600.log ADDED

@@ -0,0 +1,5 @@
+INFO:     Started server process [22552]
+INFO:     Waiting for application startup.
+2026-04-10 11:28:53.753 | INFO     | openspeech.dispatch.watcher:start:35 - Config watcher started: .tmp/providers.webui.stt.yaml
+INFO:     Application startup complete.
+INFO:     Uvicorn running on http://127.0.0.1:8600 (Press CTRL+C to quit)

openspeechapi-0.1.0/.tmp/openspeech-serve.log ADDED

Empty file

openspeechapi-0.1.0/.tmp/webui-server.log ADDED

@@ -0,0 +1,5 @@
+INFO:     Started server process [96229]
+INFO:     Waiting for application startup.
+2026-04-08 16:36:24.899 | INFO     | openspeech.dispatch.watcher:start:35 - Config watcher started: .tmp/providers.webui.yaml
+INFO:     Application startup complete.
+INFO:     Uvicorn running on http://127.0.0.1:8600 (Press CTRL+C to quit)

openspeechapi-0.1.0/.tmp/webui-server.pid ADDED

@@ -0,0 +1 @@
+96229

openspeechapi-0.1.0/.tmp/wlk12101.log ADDED

@@ -0,0 +1,15 @@
+  WhisperLiveKit
+  Backend: mlx-whisper | Model: base | Language: auto
+  Accelerator: MPS (Apple Silicon), MLX
+  Web UI:       http://127.0.0.1:12101/
+  WebSocket:    ws://127.0.0.1:12101/asr
+  Deepgram:     ws://127.0.0.1:12101/v1/listen
+  REST API:     http://127.0.0.1:12101/v1/audio/transcriptions
+  Models:       http://127.0.0.1:12101/v1/models
+  Health:       http://127.0.0.1:12101/health
+INFO:     Started server process [93444]
+INFO:     Waiting for application startup.

openspeechapi-0.1.0/.tmp/wlk12101.pid ADDED

@@ -0,0 +1 @@
+93444

openspeechapi-0.1.0/.tmp/wlk12102.log ADDED

@@ -0,0 +1,14 @@
+  WhisperLiveKit
+  Backend: mlx-whisper | Model: base | Language: auto
+  Accelerator: MPS (Apple Silicon), MLX
+  Web UI:       http://127.0.0.1:12102/
+  WebSocket:    ws://127.0.0.1:12102/asr
+  Deepgram:     ws://127.0.0.1:12102/v1/listen
+  REST API:     http://127.0.0.1:12102/v1/audio/transcriptions
+  Models:       http://127.0.0.1:12102/v1/models
+  Health:       http://127.0.0.1:12102/health
+INFO:     Started server process [23522]
+INFO:     Waiting for application startup.

openspeechapi-0.1.0/.tmp/wlk12102.pid ADDED

@@ -0,0 +1 @@
+23522

openspeechapi-0.1.0/AGENTS.md ADDED

@@ -0,0 +1,36 @@
+# Project Guidelines: Unified Interface for Large Speech Models
+## Agent Dispatch Rules
+**Every Agent tool call must include `mode: "bypassPermissions"`**, for example:
+```json
+{
+  "subagent_type": "general-purpose",
+  "mode": "bypassPermissions",
+  "prompt": "..."
+}
+```
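+### Agent Role Assignment
+1. See [~/.Codex/AGENTS.md](~/.Codex/AGENTS.md) for descriptions of the available Agent roles.
+2. The Project Manager role starts by default. The Project Manager only tracks current project progress and schedules and dispatches tasks; all other work is assigned by starting whichever configured agent role fits the task.
+## Feature Implementation Guidelines
+1. Whenever a conversation raises a bug or a feature change, first analyze the problem and design an implementation approach, and confirm it with the user before starting the actual development or fix.
+2. After a fix or feature update is complete, check whether the corresponding documentation also needs updating.
+## Testing Guidelines
+### Core Requirements
+1. **UI E2E tests (mandatory)**: every UI feature must have complete E2E test cases that run from the user action on the frontend page to the final result shown on the frontend page
+2. **GIVEN → WHEN → THEN → AND**: every test case must cover the precondition, the page action, the backend state change, and verification of the frontend result
+3. **Whenever a UI feature is added or changed, the corresponding UI E2E test cases must be added or updated at the same time**
+## Documentation & Sync Rules
+### Documentation Guidelines
+**By default, store project process and design documents in the [docs/](docs/) directory**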

openspeechapi-0.1.0/CLAUDE.md ADDED

@@ -0,0 +1,73 @@
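+# Project Guidelines: Unified Interface for Large Speech Models
+## System Architecture and General Guidelines
+1. The current system is macOS; when choosing models and engines, prefer ones that target macOS and the MLX framework
+2. Always keep the frontend pages and the backend service in sync, so that a backend failure is never left without any visible status in the frontend
+## Agent Dispatch Rules
+**Every Agent tool call must include `mode: "bypassPermissions"`**, for example: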
+```json
+{
+  "subagent_type": "general-purpose",
+  "mode": "bypassPermissions",
+  "prompt": "..."
+}
+```
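+### Agent Role Assignment
+1. See [~/.claude/AGENTS.md](~/.claude/AGENTS.md) for descriptions of the available Agent roles.
+2. The Project Manager role starts by default. The Project Manager only tracks current project progress and schedules and dispatches tasks; all other work is assigned by starting whichever configured agent role fits the task.
+## Feature Implementation Guidelines
+1. Whenever a conversation raises a bug or a feature change, first analyze the problem and design an implementation approach, and confirm it with the user before starting the actual development or fix.
+2. After a fix or feature update is complete, check whether the corresponding documentation also needs updating.
+## Provider Development Guidelines
+### STT Streaming Recognition Guidelines
+Every STT Provider that declares `Capability.STREAMING` must follow the [STT Streaming Recognition Development Spec](docs/architecture/stt-streaming-spec.md). Core requirements:
+1. **`is_partial` flag**: interim results set `is_partial=True`; the final result (ended by VAD or by the user stopping) sets `is_partial=False`. The server uses this flag to distinguish `partial` from `final` message types
+2. **Yield full-text snapshots**: the `Transcription.text` of every yield must be a complete snapshot of the text so far (not an incremental fragment), so the frontend can simply replace what it displays
+3. **sender/receiver concurrency model**: coordinate through a `_sender_stop` Event, make the sender tolerate `ConnectionClosed`, call `send_task.cancel()` to prevent hangs, and use a Queue sentinel `None` to guarantee the loop exits
+4. **Frontend auto-stop**: after receiving `final`, the frontend automatically stops recording and releases the microphone; the user does not need to click Stop
+5. **Performance logging (milestone timing)**: log WS connection time, first frame sent, first response (including protocol metadata), final result (including response count and a text preview), and total streaming duration
+6. **Inter-frame pacing in batch mode**: when sending pre-recorded audio over WS, add an inter-frame delay (~10 ms) to prevent server-side read timeouts
+7. **When adding a new streaming STT Provider**, check every item in the Checklist at the end of that spec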
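The streaming requirements above can be illustrated with a self-contained asyncio sketch. Everything in it is an assumption made for illustration. `FakeASRConnection` stands in for a real WebSocket, the local `ConnectionClosed` mimics `websockets.exceptions.ConnectionClosed`, the `{"text", "final"}` reply shape and `to_client_message` are invented, and `sender_stop` plays the role of the spec's `_sender_stop`. The sketch covers full-text snapshots, the `is_partial` flag, ~10 ms frame pacing, sender-side `ConnectionClosed` tolerance, `send_task.cancel()`, and a `None` sentinel.

```python
import asyncio
from collections.abc import AsyncIterator, Iterable
from dataclasses import dataclass


class ConnectionClosed(Exception):
    """Stand-in for websockets.exceptions.ConnectionClosed (hypothetical)."""


@dataclass
class Transcription:
    text: str         # full-text snapshot so far, never a delta
    is_partial: bool  # True for interim results, False for the final one


class FakeASRConnection:
    """Toy server: each frame is one "word"; an empty frame ends the audio."""

    def __init__(self) -> None:
        self._replies: asyncio.Queue[dict] = asyncio.Queue()
        self._words: list[str] = []
        self.closed = False

    async def send(self, frame: bytes) -> None:
        if self.closed:
            raise ConnectionClosed()
        if frame:
            self._words.append(frame.decode())
            await self._replies.put({"text": " ".join(self._words), "final": False})
        else:
            await self._replies.put({"text": " ".join(self._words), "final": True})
            self.closed = True

    async def recv(self) -> dict:
        return await self._replies.get()


async def stream_transcribe(
    conn: FakeASRConnection, frames: Iterable[bytes], frame_delay: float = 0.01
) -> AsyncIterator[Transcription]:
    results: asyncio.Queue[Transcription | None] = asyncio.Queue()
    sender_stop = asyncio.Event()  # plays the role of the spec's `_sender_stop`

    async def sender() -> None:
        try:
            for frame in frames:
                if sender_stop.is_set():
                    return  # a final result already arrived
                await conn.send(frame)
                await asyncio.sleep(frame_delay)  # ~10 ms pacing for pre-recorded audio
            if not sender_stop.is_set():
                await conn.send(b"")  # tell the server the audio is complete
        except ConnectionClosed:
            pass  # server closed first (e.g. VAD final): not an error for the sender

    async def receiver() -> None:
        try:
            while True:
                msg = await conn.recv()
                await results.put(Transcription(msg["text"], is_partial=not msg["final"]))
                if msg["final"]:
                    sender_stop.set()
                    return
        except ConnectionClosed:
            pass
        finally:
            results.put_nowait(None)  # sentinel: the consumer loop always exits

    send_task = asyncio.create_task(sender())
    recv_task = asyncio.create_task(receiver())
    try:
        while (item := await results.get()) is not None:
            yield item
    finally:
        send_task.cancel()  # never leave the sender stuck in send() or sleep()
        recv_task.cancel()
        await asyncio.gather(send_task, recv_task, return_exceptions=True)


def to_client_message(t: Transcription) -> dict:
    """Map is_partial onto the partial/final message type (hypothetical shape)."""
    return {"type": "partial" if t.is_partial else "final", "text": t.text}


async def demo() -> list[Transcription]:
    conn = FakeASRConnection()
    return [t async for t in stream_transcribe(conn, [b"hello", b"world"])]
```

Because the receiver always enqueues `None` in `finally`, the consumer exits even if the connection drops before a final result. Cancelling both tasks afterwards keeps a sender blocked in `send()` or `sleep()` from outliving the stream.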
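+### field_options (dropdown options)
+Every Provider must define `field_options` as a class attribute that lists the complete set of allowed values for every enum-type parameter. The Config page and the Lab page use this attribute to build dropdown selectors.
+**Rules:**
+1. **Every settings field with a fixed set of allowed values** (such as model, voice, language, device, format) must be listed in `field_options`
+2. **Boolean, numeric, and free-text fields** (such as speed, temperature, api_url) do not need to be listed
+3. **Vendor-shared credential fields** (such as api_key, api_secret) must not appear in an engine's `default_settings`; the vendor layer injects them
+4. Whenever a Provider is added or modified, update its `field_options` at the same time so that the UI dropdown options match the API documentation
+5. Periodically check that each Provider's `field_options` is still in sync with the upstream API
+**Example:**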
+```python
+class MyTTSProvider(TTSProvider):
+    field_options = {
+        "model": ["model-v1", "model-v2"],
+        "voice": ["alice", "bob", "charlie"],
+        "language": ["en-US", "zh-CN", "ja"],
+    }
+```
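A consumer such as the Config or Lab page could treat `field_options` as the source of truth for enum values. The sketch below shows one hypothetical check built on the example class. `TTSProvider` is a stub standing in for the project's base class, and `invalid_fields` is an invented helper. Fields not listed in `field_options` (booleans, numbers, free text such as `speed` or `api_url`) are deliberately left unchecked.

```python
from typing import Any, ClassVar


class TTSProvider:
    """Hypothetical stub for the project's TTS base class."""
    field_options: ClassVar[dict[str, list[str]]] = {}


class MyTTSProvider(TTSProvider):
    field_options = {
        "model": ["model-v1", "model-v2"],
        "voice": ["alice", "bob", "charlie"],
        "language": ["en-US", "zh-CN", "ja"],
    }


def invalid_fields(provider: type[TTSProvider], settings: dict[str, Any]) -> dict[str, Any]:
    """Return enum-type settings whose value is not among the declared options.

    Keys absent from field_options are skipped, so booleans, numbers and
    free-text fields pass through unchecked.
    """
    options = provider.field_options
    return {
        key: value
        for key, value in settings.items()
        if key in options and value not in options[key]
    }
```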
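+## Testing Guidelines
+### Core Requirements
+1. **UI E2E tests (mandatory)**: every UI feature must have complete E2E test cases that run from the user action on the frontend page to the final result shown on the frontend page
+2. **GIVEN → WHEN → THEN → AND**: every test case must cover the precondition, the page action, the backend state change, and verification of the frontend result
+3. **Whenever a UI feature is added or changed, the corresponding UI E2E test cases must be added or updated at the same time**
+## Documentation & Sync Rules
+### Documentation Guidelines
+**By default, store project process and design documents in the [docs/](docs/) directory**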

openspeechapi-0.1.0/Dockerfile ADDED

@@ -0,0 +1,20 @@
+FROM python:3.11-slim
+WORKDIR /app
+# Install system dependencies for audio processing
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    ffmpeg \
+    && rm -rf /var/lib/apt/lists/*
+# Copy project files
+COPY pyproject.toml .
+COPY openspeech/ openspeech/
+COPY providers.example.yaml providers.yaml
+# Install with all provider deps + server
+RUN pip install --no-cache-dir -e ".[all,server]"
+EXPOSE 8600
+CMD ["openspeech", "serve", "--config", "providers.yaml", "--host", "0.0.0.0", "--port", "8600"]

openspeechapi-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,101 @@
+Metadata-Version: 2.4
+Name: openspeechapi
+Version: 0.1.0
+Summary: Unified speech interface for STT/TTS providers
+Requires-Python: >=3.11
+Requires-Dist: httpx>=0.27
+Requires-Dist: loguru>=0.7
+Requires-Dist: msgpack>=1.0
+Requires-Dist: pydantic>=2.0
+Requires-Dist: pyyaml>=6.0
+Provides-Extra: alibaba
+Provides-Extra: alibaba-stt
+Provides-Extra: alibaba-tts
+Provides-Extra: all
+Requires-Dist: elevenlabs; extra == 'all'
+Requires-Dist: faster-whisper; extra == 'all'
+Requires-Dist: openai; extra == 'all'
+Requires-Dist: openai-whisper; extra == 'all'
+Requires-Dist: piper-tts; extra == 'all'
+Requires-Dist: pyttsx3; (sys_platform == 'win32') and extra == 'all'
+Requires-Dist: torchaudio; extra == 'all'
+Requires-Dist: tts; extra == 'all'
+Requires-Dist: websockets; extra == 'all'
+Provides-Extra: assemblyai-stt
+Provides-Extra: audio
+Requires-Dist: numpy; extra == 'audio'
+Requires-Dist: sounddevice; extra == 'audio'
+Provides-Extra: azure
+Provides-Extra: azure-stt
+Provides-Extra: azure-tts
+Provides-Extra: baidu
+Provides-Extra: baidu-stt
+Provides-Extra: baidu-tts
+Provides-Extra: cloud
+Requires-Dist: websockets; extra == 'cloud'
+Provides-Extra: coqui-tts
+Requires-Dist: tts; extra == 'coqui-tts'
+Provides-Extra: cosyvoice-tts
+Requires-Dist: torchaudio; extra == 'cosyvoice-tts'
+Provides-Extra: deepgram
+Requires-Dist: websockets; extra == 'deepgram'
+Provides-Extra: deepgram-stt
+Requires-Dist: websockets; extra == 'deepgram-stt'
+Provides-Extra: deepgram-tts
+Provides-Extra: dev
+Requires-Dist: pytest-asyncio>=0.24; extra == 'dev'
+Requires-Dist: pytest-cov; extra == 'dev'
+Requires-Dist: pytest-dotenv; extra == 'dev'
+Requires-Dist: pytest>=8.0; extra == 'dev'
+Requires-Dist: ruff; extra == 'dev'
+Provides-Extra: elevenlabs
+Requires-Dist: elevenlabs; extra == 'elevenlabs'
+Requires-Dist: websockets; extra == 'elevenlabs'
+Provides-Extra: elevenlabs-stt
+Requires-Dist: websockets; extra == 'elevenlabs-stt'
+Provides-Extra: elevenlabs-tts
+Requires-Dist: elevenlabs; extra == 'elevenlabs-tts'
+Provides-Extra: faster-whisper-stt
+Requires-Dist: faster-whisper; extra == 'faster-whisper-stt'
+Provides-Extra: fish-speech-tts
+Provides-Extra: google
+Provides-Extra: google-stt
+Provides-Extra: google-tts
+Provides-Extra: iflytek
+Requires-Dist: websockets; extra == 'iflytek'
+Provides-Extra: iflytek-stt
+Requires-Dist: websockets; extra == 'iflytek-stt'
+Provides-Extra: iflytek-tts
+Requires-Dist: websockets; extra == 'iflytek-tts'
+Provides-Extra: macos-native
+Provides-Extra: minimax-tts
+Provides-Extra: openai
+Requires-Dist: openai; extra == 'openai'
+Provides-Extra: openai-stt
+Requires-Dist: openai; extra == 'openai-stt'
+Provides-Extra: openai-tts
+Requires-Dist: openai; extra == 'openai-tts'
+Provides-Extra: piper-tts
+Requires-Dist: piper-tts; extra == 'piper-tts'
+Provides-Extra: server
+Requires-Dist: fastapi; extra == 'server'
+Requires-Dist: python-multipart; extra == 'server'
+Requires-Dist: uvicorn; extra == 'server'
+Requires-Dist: websockets; extra == 'server'
+Provides-Extra: sherpa-onnx-stt
+Requires-Dist: websockets; extra == 'sherpa-onnx-stt'
+Provides-Extra: tencent
+Provides-Extra: tencent-stt
+Provides-Extra: tencent-tts
+Provides-Extra: tracing
+Requires-Dist: opentelemetry-api; extra == 'tracing'
+Requires-Dist: opentelemetry-sdk; extra == 'tracing'
+Provides-Extra: volcengine
+Provides-Extra: volcengine-stt
+Provides-Extra: volcengine-tts
+Provides-Extra: whisper-stt
+Requires-Dist: openai-whisper; extra == 'whisper-stt'
+Provides-Extra: whisperlivekit-stt
+Requires-Dist: websockets; extra == 'whisperlivekit-stt'
+Provides-Extra: windows-native
+Requires-Dist: pyttsx3; (sys_platform == 'win32') and extra == 'windows-native'