PyPI - openspeechapi - Versions diffs - 0.2.9__tar.gz → 0.2.11__tar.gz - Mend

openspeechapi 0.2.9__tar.gz → 0.2.11__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (164)

{openspeechapi-0.2.9 → openspeechapi-0.2.11}/.gitignore RENAMED

@@ -12,6 +12,7 @@ dist/
 build/
 *.pyc
 *.pyo
+.DS_Store
 # macOS STT compiled artifacts
 scripts/engines/macos-stt/macos-stt-helper

openspeechapi-0.2.9/README.md → openspeechapi-0.2.11/PKG-INFO RENAMED

@@ -1,3 +1,172 @@
+Metadata-Version: 2.4
+Name: openspeechapi
+Version: 0.2.11
+Summary: Unified speech interface for STT/TTS providers
+Requires-Python: >=3.11
+Requires-Dist: httpx>=0.27
+Requires-Dist: loguru>=0.7
+Requires-Dist: msgpack>=1.0
+Requires-Dist: pydantic>=2.0
+Requires-Dist: pyyaml>=6.0
+Provides-Extra: alibaba
+Provides-Extra: alibaba-stt
+Provides-Extra: alibaba-tts
+Provides-Extra: all
+Requires-Dist: elevenlabs; extra == 'all'
+Requires-Dist: faster-whisper; extra == 'all'
+Requires-Dist: openai; extra == 'all'
+Requires-Dist: openai-whisper; extra == 'all'
+Requires-Dist: piper-tts; extra == 'all'
+Requires-Dist: pyttsx3; (sys_platform == 'win32') and extra == 'all'
+Requires-Dist: torchaudio; extra == 'all'
+Requires-Dist: tts; extra == 'all'
+Requires-Dist: websockets; extra == 'all'
+Provides-Extra: assemblyai-stt
+Provides-Extra: audio
+Requires-Dist: numpy; extra == 'audio'
+Requires-Dist: sounddevice; extra == 'audio'
+Provides-Extra: azure
+Provides-Extra: azure-stt
+Provides-Extra: azure-tts
+Provides-Extra: baidu
+Provides-Extra: baidu-stt
+Provides-Extra: baidu-tts
+Provides-Extra: canary-qwen-stt
+Provides-Extra: cloud
+Requires-Dist: websockets; extra == 'cloud'
+Provides-Extra: coqui-tts
+Requires-Dist: tts; extra == 'coqui-tts'
+Provides-Extra: cosyvoice-tts
+Requires-Dist: torchaudio; extra == 'cosyvoice-tts'
+Provides-Extra: deepgram
+Requires-Dist: websockets; extra == 'deepgram'
+Provides-Extra: deepgram-stt
+Requires-Dist: websockets; extra == 'deepgram-stt'
+Provides-Extra: deepgram-tts
+Provides-Extra: dev
+Requires-Dist: numpy; extra == 'dev'
+Requires-Dist: pytest-asyncio>=0.24; extra == 'dev'
+Requires-Dist: pytest-cov; extra == 'dev'
+Requires-Dist: pytest-dotenv; extra == 'dev'
+Requires-Dist: pytest>=8.0; extra == 'dev'
+Requires-Dist: ruff==0.15.*; extra == 'dev'
+Provides-Extra: dolphin-stt
+Requires-Dist: dataoceanai-dolphin; extra == 'dolphin-stt'
+Requires-Dist: torchcodec; extra == 'dolphin-stt'
+Provides-Extra: elevenlabs
+Requires-Dist: elevenlabs; extra == 'elevenlabs'
+Requires-Dist: websockets; extra == 'elevenlabs'
+Provides-Extra: elevenlabs-stt
+Requires-Dist: websockets; extra == 'elevenlabs-stt'
+Provides-Extra: elevenlabs-tts
+Requires-Dist: elevenlabs; extra == 'elevenlabs-tts'
+Provides-Extra: faster-whisper-stt
+Requires-Dist: faster-whisper; extra == 'faster-whisper-stt'
+Provides-Extra: fireredasr-stt
+Requires-Dist: fireredasr; extra == 'fireredasr-stt'
+Provides-Extra: fish-speech-tts
+Provides-Extra: funasr-stt
+Requires-Dist: funasr>=1.1.0; extra == 'funasr-stt'
+Requires-Dist: torch; extra == 'funasr-stt'
+Requires-Dist: torchaudio; extra == 'funasr-stt'
+Provides-Extra: gemma4-stt
+Requires-Dist: accelerate; (sys_platform != 'darwin') and extra == 'gemma4-stt'
+Requires-Dist: librosa; (sys_platform != 'darwin') and extra == 'gemma4-stt'
+Requires-Dist: mlx-vlm<0.6.2,>=0.6.1; (sys_platform == 'darwin') and extra == 'gemma4-stt'
+Requires-Dist: torch; (sys_platform != 'darwin') and extra == 'gemma4-stt'
+Requires-Dist: transformers; (sys_platform != 'darwin') and extra == 'gemma4-stt'
+Provides-Extra: google
+Provides-Extra: google-stt
+Provides-Extra: google-tts
+Provides-Extra: iflytek
+Requires-Dist: websockets; extra == 'iflytek'
+Provides-Extra: iflytek-stt
+Requires-Dist: websockets; extra == 'iflytek-stt'
+Provides-Extra: iflytek-tts
+Requires-Dist: websockets; extra == 'iflytek-tts'
+Provides-Extra: kimi-audio-stt
+Requires-Dist: torch; extra == 'kimi-audio-stt'
+Provides-Extra: macos-native
+Provides-Extra: minimax-tts
+Provides-Extra: mlx-whisper-stt
+Requires-Dist: mlx-whisper; (sys_platform == 'darwin') and extra == 'mlx-whisper-stt'
+Provides-Extra: mms-stt
+Requires-Dist: soundfile; extra == 'mms-stt'
+Requires-Dist: torch; extra == 'mms-stt'
+Requires-Dist: transformers; extra == 'mms-stt'
+Provides-Extra: moonshine-stt
+Requires-Dist: soundfile; extra == 'moonshine-stt'
+Requires-Dist: torch; extra == 'moonshine-stt'
+Requires-Dist: transformers; extra == 'moonshine-stt'
+Provides-Extra: openai
+Requires-Dist: openai; extra == 'openai'
+Provides-Extra: openai-stt
+Requires-Dist: openai; extra == 'openai-stt'
+Provides-Extra: openai-tts
+Requires-Dist: openai; extra == 'openai-tts'
+Provides-Extra: paraformer-stt
+Requires-Dist: funasr>=1.1.0; extra == 'paraformer-stt'
+Requires-Dist: torch; extra == 'paraformer-stt'
+Requires-Dist: torchaudio; extra == 'paraformer-stt'
+Provides-Extra: parakeet-stt
+Requires-Dist: parakeet-mlx; (sys_platform == 'darwin') and extra == 'parakeet-stt'
+Provides-Extra: phi4-multimodal-stt
+Requires-Dist: accelerate; extra == 'phi4-multimodal-stt'
+Requires-Dist: backoff; extra == 'phi4-multimodal-stt'
+Requires-Dist: peft; extra == 'phi4-multimodal-stt'
+Requires-Dist: pillow; extra == 'phi4-multimodal-stt'
+Requires-Dist: scipy; extra == 'phi4-multimodal-stt'
+Requires-Dist: soundfile; extra == 'phi4-multimodal-stt'
+Requires-Dist: torch; extra == 'phi4-multimodal-stt'
+Requires-Dist: torchvision; extra == 'phi4-multimodal-stt'
+Requires-Dist: transformers; extra == 'phi4-multimodal-stt'
+Provides-Extra: piper-tts
+Requires-Dist: piper-tts; extra == 'piper-tts'
+Provides-Extra: qwen3-asr-stt
+Requires-Dist: modelscope; extra == 'qwen3-asr-stt'
+Requires-Dist: qwen-asr; extra == 'qwen3-asr-stt'
+Provides-Extra: qwen3-omni-stt
+Requires-Dist: accelerate; extra == 'qwen3-omni-stt'
+Requires-Dist: qwen-omni-utils; extra == 'qwen3-omni-stt'
+Requires-Dist: torch; extra == 'qwen3-omni-stt'
+Requires-Dist: transformers; extra == 'qwen3-omni-stt'
+Provides-Extra: sensevoice-stt
+Requires-Dist: funasr>=1.1.0; extra == 'sensevoice-stt'
+Requires-Dist: torch; extra == 'sensevoice-stt'
+Requires-Dist: torchaudio; extra == 'sensevoice-stt'
+Provides-Extra: server
+Requires-Dist: fastapi; extra == 'server'
+Requires-Dist: python-multipart; extra == 'server'
+Requires-Dist: uvicorn; extra == 'server'
+Requires-Dist: websockets; extra == 'server'
+Provides-Extra: sherpa-onnx-stt
+Requires-Dist: websockets; extra == 'sherpa-onnx-stt'
+Provides-Extra: tencent
+Provides-Extra: tencent-stt
+Provides-Extra: tencent-tts
+Provides-Extra: tracing
+Requires-Dist: opentelemetry-api; extra == 'tracing'
+Requires-Dist: opentelemetry-sdk; extra == 'tracing'
+Provides-Extra: volcengine
+Provides-Extra: volcengine-stt
+Provides-Extra: volcengine-tts
+Provides-Extra: vosk-stt
+Requires-Dist: huggingface-hub; extra == 'vosk-stt'
+Requires-Dist: vosk; extra == 'vosk-stt'
+Provides-Extra: voxtral-stt
+Requires-Dist: accelerate; extra == 'voxtral-stt'
+Requires-Dist: mistral-common[audio]>=1.8.1; extra == 'voxtral-stt'
+Requires-Dist: torch; extra == 'voxtral-stt'
+Requires-Dist: transformers>=4.54.0; extra == 'voxtral-stt'
+Provides-Extra: wenet-stt
+Provides-Extra: whisper-stt
+Requires-Dist: openai-whisper; extra == 'whisper-stt'
+Provides-Extra: whisperlivekit-stt
+Requires-Dist: websockets; extra == 'whisperlivekit-stt'
+Provides-Extra: windows-native
+Requires-Dist: pyttsx3; (sys_platform == 'win32') and extra == 'windows-native'
+Description-Content-Type: text/markdown
 # OpenSpeechAPI
 > Unified speech interface for STT/TTS providers — one API, multiple backends.
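The added header block follows the usual `Key: value` layout of Python package metadata, so the extras can be tabulated with the standard library's email header parser. The sketch below is illustrative only: `requirements_by_extra` and the shortened `PKG_INFO` excerpt are hypothetical names introduced here, not part of openspeechapi, and the regex only extracts the `extra == '...'` clause, ignoring platform markers such as `sys_platform`.

```python
import re
from email.parser import HeaderParser

# A shortened excerpt of the PKG-INFO headers from the hunk above.
PKG_INFO = """\
Metadata-Version: 2.4
Name: openspeechapi
Version: 0.2.11
Requires-Dist: httpx>=0.27
Requires-Dist: pydantic>=2.0
Provides-Extra: server
Requires-Dist: fastapi; extra == 'server'
Requires-Dist: uvicorn; extra == 'server'
Provides-Extra: mlx-whisper-stt
Requires-Dist: mlx-whisper; (sys_platform == 'darwin') and extra == 'mlx-whisper-stt'
Provides-Extra: tencent
"""

# Matches the extra name inside a marker such as: extra == 'server'
EXTRA_RE = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def requirements_by_extra(pkg_info: str) -> dict[str, list[str]]:
    """Group Requires-Dist entries by extra; core requirements use the key ''."""
    headers = HeaderParser().parsestr(pkg_info)
    grouped: dict[str, list[str]] = {}
    # Extras declared without any requirement (e.g. 'tencent') still get a key.
    for extra in headers.get_all("Provides-Extra", []):
        grouped.setdefault(extra, [])
    for line in headers.get_all("Requires-Dist", []):
        requirement, _, marker = line.partition(";")
        match = EXTRA_RE.search(marker)
        key = match.group(1) if match else ""
        grouped.setdefault(key, []).append(requirement.strip())
    return grouped


groups = requirements_by_extra(PKG_INFO)
```

Run against the full header block, a listing like this makes it easy to spot extras that are declared but pull in no additional packages (for example `tencent`, `azure`, or `google` in this release).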
@@ -8,22 +177,24 @@ OpenSpeechAPI 提供统一的语音接口，通过字符串指定 provider 即
 ### Installation
+**Option 1: Install from PyPI (direct use)**
 ```bash
-# Install all providers
-pip install -e ".[all]"
-# Or install as needed
-pip install -e ".[openai]"           # OpenAI Whisper STT + TTS
-pip install -e ".[faster-whisper]"   # Local faster-whisper STT
-pip install -e ".[openai,faster-whisper]"  # Several at once
-# Core package only (no providers)
-pip install -e .
+pip install "openspeechapi[server]"           # [server] (fastapi/uvicorn) is required to run the HTTP service / WebUI
+pip install "openspeechapi[server,openai]"    # Server + a specific provider
+pip install "openspeechapi[server,all]"       # Server + all providers
+pip install openspeechapi                      # Core library only (library mode; no server components, so `serve` will not work)
+```
-# Development environment
-pip install -e ".[dev]"
+**Option 2: Install from source (development, editable)**
+```bash
+git clone https://github.com/wingsfly/OpenSpeechAPI.git
+cd OpenSpeechAPI
+uv venv && uv pip install -e ".[server,dev]"  # Or pip install -e ".[server,dev]"; swap in .[all] etc. as needed
 ```
+> ⚠️ A plain `pip install openspeechapi` (core library) **does not include fastapi/uvicorn** and cannot `serve`; add `[server]` to run the service.
+> For how starting the service differs between the two options, see [Starting the service](#启动服务) below.
 ### 30-second quickstart: TTS
 ```python
@@ -205,7 +376,24 @@ python -m openspeechapi.demo tts -t "Hello world" --play \
 | `whisperlivekit-stt` | STT | WhisperLiveKit local service (Deepgram-compatible WS; supports the MLX backend) | local | `pip install -e ".[whisperlivekit]"` |
 | `elevenlabs-stt` | STT | ElevenLabs Scribe API (cloud; real-time streaming over WS plus batch) | remote | `pip install -e ".[elevenlabs-stt]"` |
 | `deepgram` | STT | Deepgram API (cloud; real-time streaming) | remote | `pip install -e ".[deepgram]"` |
-| `gemma4` | STT | Google Gemma 4 multimodal ASR (local on macOS/MLX; E4B default, 12B optional; audio over 30 s is segmented automatically; supports transcription/translation/understanding) | subprocess | `pip install -e ".[gemma4-stt]"` |
+| `gemma4` | STT | Google Gemma 4 multimodal ASR (local on macOS/MLX; E2B/E4B; audio over 30 s is segmented automatically; tasks: transcription / translation (any target language) / understanding / QA / language identification) | subprocess | `pip install -e ".[gemma4-stt]"` |
+| `sensevoice` | STT | SenseVoice-Small local multilingual ASR (FunASR; zh/yue/en/ja/ko; ~15-50× faster than Whisper) | subprocess | `pip install -e ".[sensevoice-stt]"` |
+| `qwen3-asr` | STT | Qwen3-ASR local multilingual ASR (2026 open-source SOTA; Chinese/dialects/English; 0.6B/1.7B) | subprocess | `pip install -e ".[qwen3-asr-stt]"` |
+| `mlx-whisper` | STT | Whisper on Apple MLX (local; large-v3 / turbo; multilingual incl. Chinese/English; Apple Silicon only) | subprocess | `pip install -e ".[mlx-whisper-stt]"` |
+| `paraformer` | STT | Paraformer local ASR (FunASR; SOTA-level Mandarin; VAD + punctuation; zh/en) | subprocess | `pip install -e ".[paraformer-stt]"` |
+| `funasr` | STT | FunASR general entry point (any model from the model zoo + VAD/punctuation/speaker diarization) | subprocess | `pip install -e ".[funasr-stt]"` |
+| `fireredasr` | STT | Xiaohongshu FireRedASR (SOTA Mandarin + dialects + English; lyrics recognition; AED/LLM) | subprocess | `pip install -e ".[fireredasr-stt]"` |
+| `dolphin` | STT | DataoceanAI Dolphin (40 Eastern languages + 22 Chinese dialects; small/base) | subprocess | `pip install -e ".[dolphin-stt]"` |
+| `wenet` | STT | WeNet U2++ Conformer (production-grade; zh/en presets; streaming to follow) | subprocess | Install via WebUI Engines, or `pip install 'wenet @ git+https://github.com/wenet-e2e/wenet.git'` |
+| `canary-qwen` | STT | NVIDIA Canary-Qwen-2.5B (No. 1 for English on Open ASR; SALM; English only; requires NeMo + GPU) | subprocess | Install via WebUI Engines, or `pip install 'nemo_toolkit[asr] @ git+https://github.com/NVIDIA/NeMo.git'` |
+| `parakeet` | STT | NVIDIA Parakeet-TDT on MLX (fastest; v2 English / v3 European languages; weak on Chinese; Apple Silicon only) | subprocess | `pip install -e ".[parakeet-stt]"` |
+| `qwen3-omni` | STT | Qwen3-Omni-30B omni-modal LLM (ASR + understanding; zh/en+; needs a large GPU, ~60GB) | subprocess | `pip install -e ".[qwen3-omni-stt]"` |
+| `voxtral` | STT | Mistral Voxtral (Mini-3B/Small-24B; transcription + understanding; multilingual; GPU recommended) | subprocess | `pip install -e ".[voxtral-stt]"` |
+| `phi4-multimodal` | STT | Microsoft Phi-4-multimodal (multimodal LLM; ASR + understanding; zh/en+; GPU recommended) | subprocess | `pip install -e ".[phi4-multimodal-stt]"` |
+| `kimi-audio` | STT | Moonshot AI Kimi-Audio-7B (audio foundation model; ASR + understanding; zh/en; requires Linux + CUDA/flash-attn) | subprocess | Install via WebUI Engines, or `pip install 'kimi-audio @ git+https://github.com/MoonshotAI/Kimi-Audio.git'` |
+| `moonshine` | STT | Useful Sensors Moonshine (edge/real-time English ASR; tiny/base; lightweight) | subprocess | `pip install -e ".[moonshine-stt]"` |
+| `vosk` | STT | Vosk (offline, Kaldi-based; 20+ languages incl. zh/en; lightweight, low resource use) | subprocess | `pip install -e ".[vosk-stt]"` |
+| `mms` | STT | Meta MMS (Wav2Vec2-CTC; 1000+ languages incl. zh/en; language selected by ISO 639-3 code) | subprocess | `pip install -e ".[mms-stt]"` |
 | `openai-tts` | TTS | OpenAI Speech API (cloud; streaming supported) | remote | `pip install -e ".[openai]"` |
 | `elevenlabs` | TTS | ElevenLabs high-quality voices (cloud; HTTP/WS streaming) | remote | `pip install -e ".[elevenlabs-tts]"` |
 | `minimax` | TTS | Minimax speech synthesis (cloud) | remote | `pip install -e ".[minimax]"` |
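The `gemma4` row above says that audio longer than 30 s is segmented automatically. The diff does not show how the provider does this, so the following is only a minimal sketch of the simplest strategy, fixed-length windows with no overlap; `split_into_segments` and its parameters are hypothetical, and the real implementation may well use overlap or voice-activity detection instead.

```python
def split_into_segments(
    num_samples: int,
    sample_rate: int = 16_000,
    max_seconds: float = 30.0,
) -> list[tuple[int, int]]:
    """Return (start, end) sample offsets covering the audio in windows of at most max_seconds."""
    window = int(sample_rate * max_seconds)
    return [
        (start, min(start + window, num_samples))
        for start in range(0, num_samples, window)
    ]


# 70 s of 16 kHz audio: two full 30 s windows plus a 10 s remainder.
segments = split_into_segments(70 * 16_000)
```

Each `(start, end)` pair can then be used to slice the sample array and transcribe the pieces one by one.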
@@ -229,6 +417,8 @@ print(list_providers())
 #  'whisperlivekit-stt']
 ```
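+> **Audio input formats**: STT uploads accept WAV/PCM/MP3/FLAC/OGG/WebM and others. Formats the engine cannot handle directly are converted server-side to 16 kHz mono WAV (compressed formats require `ffmpeg`); if ffmpeg is missing and the format is unsupported, the server returns 400, and the Web UI catches this and warns before upload/recording. See [docs/architecture/audio-format-negotiation.md](docs/architecture/audio-format-negotiation.md).
 ## Provider parameters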
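The negotiation rule in this note can be read as a three-way decision. The sketch below is one possible reading, not the server's actual code: `plan_upload`, the outcome strings, and the assumption that only WAV/PCM can be converted without ffmpeg are all hypothetical.

```python
import shutil

# Assumption: uncompressed inputs can be resampled without ffmpeg.
UNCOMPRESSED = {"wav", "pcm"}


def plan_upload(fmt: str, engine_formats: set[str], ffmpeg_available: bool) -> str:
    """Decide what to do with an upload of the given format."""
    fmt = fmt.lower()
    if fmt in engine_formats:
        return "passthrough"  # the engine can read it directly
    if fmt in UNCOMPRESSED or ffmpeg_available:
        return "convert_to_16k_mono_wav"  # server-side conversion
    return "reject_400"  # compressed input, but no ffmpeg to decode it


def ffmpeg_on_path() -> bool:
    """Check whether an ffmpeg binary is discoverable on PATH."""
    return shutil.which("ffmpeg") is not None
```

For example, `plan_upload("mp3", {"wav"}, ffmpeg_on_path())` yields a conversion when ffmpeg is installed and a rejection otherwise.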
 ### `openai-stt`
@@ -263,6 +453,204 @@ create_provider("faster-whisper",
 )
 ```
+### `gemma4`
+```python
+create_provider("gemma4",
+    model="mlx-community/gemma-4-E4B-it-8bit",  # E2B/E4B (8-bit translates more accurately; do not use 12B)
+    task="transcribe",          # transcribe|translate|understand|qa|detect_language
+    target_language="English",  # Target language for task=translate (any language)
+    include_transcript=False,   # task=translate: also output the source transcript alongside the translation
+)
+```
+Local multimodal ASR on macOS / Apple Silicon (mlx-vlm). The five tasks and all fields can be overridden per request under "Advanced Options" in the Web UI Lab. See [docs/architecture/gemma4-stt-provider.md](docs/architecture/gemma4-stt-provider.md).
+### `sensevoice`
+```python
+create_provider("sensevoice",
+    model="FunAudioLLM/SenseVoiceSmall",
+    language="auto",      # auto|zh|en|yue|ja|ko|nospeech
+    device="cpu",         # cpu|mps|cuda
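+    use_itn=True,         # Punctuation/number normalization
+)
+```
+Local multilingual ASR via FunASR (zh/yue/en/ja/ko); non-autoregressive and very fast. All fields can be overridden per request under "Advanced Options" in the Lab. See [docs/architecture/sensevoice-stt-provider.md](docs/architecture/sensevoice-stt-provider.md).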
+### `qwen3-asr`
+```python
+create_provider("qwen3-asr",
+    model="Qwen/Qwen3-ASR-0.6B",  # or Qwen/Qwen3-ASR-1.7B
+    language="auto",              # auto|Chinese|English|Cantonese|Japanese|Korean
+    device="cpu",                 # cpu|mps|cuda
+)
+```
+Alibaba Qwen3-ASR (2026 open-source ASR SOTA; Chinese/dialects/English), run locally via the qwen-asr package. `torch` must be installed separately. See [docs/architecture/qwen3-asr-stt-provider.md](docs/architecture/qwen3-asr-stt-provider.md).
+### `mlx-whisper`
+```python
+create_provider("mlx-whisper",
+    model="mlx-community/whisper-large-v3-turbo",  # or whisper-large-v3-mlx
+    language="auto",                               # auto|en|zh|yue|ja|ko|...
+)
+```
+Native Whisper on Apple Silicon (MLX); large-v3 / turbo; multilingual, including Chinese and English. macOS/Apple Silicon only. See [docs/architecture/mlx-whisper-stt-provider.md](docs/architecture/mlx-whisper-stt-provider.md).
+### `paraformer`
+```python
+create_provider("paraformer",
+    model="funasr/paraformer-zh",  # or funasr/paraformer-en
+    vad=True, punc=True,           # VAD segmentation + punctuation restoration
+)
+```
+Alibaba Paraformer (FunASR): SOTA-level non-autoregressive Mandarin ASR with VAD and punctuation. See [docs/architecture/paraformer-stt-provider.md](docs/architecture/paraformer-stt-provider.md).
+### `funasr`
+```python
+create_provider("funasr",
+    model="funasr/paraformer-zh",  # Any entry from the model zoo
+    spk=True,                      # CAM++ speaker diarization → [spk0]/[spk1] labels
+)
+```
+General-purpose FunASR entry point: any model from the model zoo plus VAD/punctuation/**speaker diarization**. See [docs/architecture/funasr-stt-provider.md](docs/architecture/funasr-stt-provider.md).
+### `fireredasr`
+```python
+create_provider("fireredasr",
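+    model_type="aed",   # aed (≤60s) | llm (≤30s); weights download automatically
+)
+```
+Xiaohongshu FireRedASR: SOTA on public Mandarin benchmarks, plus dialects and English; strong at lyrics recognition. See [docs/architecture/fireredasr-stt-provider.md](docs/architecture/fireredasr-stt-provider.md).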
+### `dolphin`
+```python
+create_provider("dolphin",
+    model_name="small",   # small | base
+    lang_sym="zh", region_sym="CN",   # Leave empty for auto-detection
+)
+```
+DataoceanAI Dolphin: 40 Eastern languages + 22 Chinese dialects. See [docs/architecture/dolphin-stt-provider.md](docs/architecture/dolphin-stt-provider.md).
+### `wenet`
+```python
+create_provider("wenet",
+    model="chinese",   # chinese | english
+)
+```
+WeNet production-grade U2++ Conformer (zh/en presets). Installed from git (not on PyPI). See [docs/architecture/wenet-stt-provider.md](docs/architecture/wenet-stt-provider.md).
+### `canary-qwen`
+```python
+create_provider("canary-qwen",
+    model="nvidia/canary-qwen-2.5b",
+    device="cuda", dtype="bfloat16",   # English only; GPU strongly recommended
+)
+```
+NVIDIA Canary-Qwen-2.5B (No. 1 for English on Open ASR; SALM). **English only**; heavyweight NeMo install, GPU recommended. See [docs/architecture/canary-qwen-stt-provider.md](docs/architecture/canary-qwen-stt-provider.md).
+### `parakeet`
+```python
+create_provider("parakeet",
+    model="mlx-community/parakeet-tdt-0.6b-v2",  # v2: English; v3: adds European languages
+)
+```
+NVIDIA Parakeet-TDT on Apple MLX, the fastest on the leaderboard. Mainly English/European languages, **weak on Chinese**; Apple Silicon only. See [docs/architecture/parakeet-stt-provider.md](docs/architecture/parakeet-stt-provider.md).
+### `qwen3-omni`
+```python
+create_provider("qwen3-omni",
+    model="Qwen/Qwen3-Omni-30B-A3B-Instruct",
+    prompt="Transcribe the audio into text.",   # Replace with a question to do audio QA
+)
+```
+Alibaba Qwen3-Omni-30B-A3B omni-modal LLM (ASR + audio understanding; zh/en+). **Needs a GPU with large memory (~60GB); it will not fit on a laptop**. See [docs/architecture/qwen3-omni-stt-provider.md](docs/architecture/qwen3-omni-stt-provider.md).
+### `voxtral`
+```python
+create_provider("voxtral",
+    model="mistralai/Voxtral-Mini-3B-2507",  # or Voxtral-Small-24B-2507
+    language="en",
+)
+```
+Mistral Voxtral (transcription + audio understanding; multilingual). 3B/24B; GPU recommended. See [docs/architecture/voxtral-stt-provider.md](docs/architecture/voxtral-stt-provider.md).
+### `phi4-multimodal`
+```python
+create_provider("phi4-multimodal",
+    model="microsoft/Phi-4-multimodal-instruct",
+    prompt="Transcribe the audio clip into text.",
+)
+```
+Microsoft Phi-4-multimodal, a compact multimodal LLM (ASR + audio understanding; zh/en+). GPU recommended. See [docs/architecture/phi4-multimodal-stt-provider.md](docs/architecture/phi4-multimodal-stt-provider.md).
+### `kimi-audio`
+```python
+create_provider("kimi-audio",
+    model="moonshotai/Kimi-Audio-7B-Instruct",
+    prompt="Please transcribe the audio into text.",
+)
+```
+Moonshot AI Kimi-Audio-7B audio foundation model (ASR + audio understanding; zh/en). Installed from git; GPU recommended. See [docs/architecture/kimi-audio-stt-provider.md](docs/architecture/kimi-audio-stt-provider.md).
+### `moonshine`
+```python
+create_provider("moonshine",
+    model="UsefulSensors/moonshine-base",  # base | tiny
+)
+```
+Useful Sensors Moonshine: edge/real-time English ASR, lightweight and fast. See [docs/architecture/moonshine-stt-provider.md](docs/architecture/moonshine-stt-provider.md).
+### `vosk`
+```python
+create_provider("vosk",
+    model="vosk-model-small-en-us-0.15",  # Chinese: vosk-model-small-cn-0.22
+)
+```
+Vosk (offline, Kaldi-based): 20+ languages, lightweight and low-resource; models download automatically. See [docs/architecture/vosk-stt-provider.md](docs/architecture/vosk-stt-provider.md).
+### `mms`
+```python
+create_provider("mms",
+    model="facebook/mms-1b-all",
+    language="eng",   # ISO 639-3: eng / cmn / yue / jpn ...
+)
+```
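+Meta MMS (Wav2Vec2-CTC): 1000+ languages including Chinese and English; language adapters are switched by **ISO 639-3** code; CTC output is lowercase with no punctuation. See [docs/architecture/mms-stt-provider.md](docs/architecture/mms-stt-provider.md).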
 ### `openai-tts`
 ```python
@@ -407,10 +795,25 @@ bash scripts/engines/macos-stt/install.sh
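 ### Starting the service
+**After a pip install** (with `[server]` included): the config is resolved or generated automatically, so it runs out of the box:
 ```bash
-openspeechapi serve --config providers.yaml --port 8600
+openspeechapi serve                 # Resolves the config automatically; generates a default if none exists (macos_tts by default on macOS)
+openspeechapi serve --port 8600     # Specify the port
 ```
+**Running from the source directory**:
+```bash
+python -m openspeechapi.cli serve   # Or openspeechapi serve; inside the repo directory, ./providers.yaml takes precedence
+```
+Once started, open the WebUI: **http://127.0.0.1:8600/ui/**
+**Config resolution order** (`--config` may go **before or after** `serve`, e.g. `openspeechapi --config x serve` or `openspeechapi serve --config x`):
+1. An explicit `--config <path>`
+2. `./providers.yaml` in the current directory (takes precedence when running from the source directory)
+3. `~/.config/openspeechapi/providers.yaml` (honors `XDG_CONFIG_HOME`)
+4. None found → a working default config is **generated automatically** at `~/.config/openspeechapi/providers.yaml`
 ### Python Client (same interface as Library mode)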
 ```python
