pipecat-moonshine 0.1.0__tar.gz

@@ -0,0 +1,24 @@
BSD 2-Clause License

Copyright (c) 2026, pipecat-moonshine contributors

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,204 @@
Metadata-Version: 2.4
Name: pipecat-moonshine
Version: 0.1.0
Summary: Moonshine ASR (speech-to-text) community integration for Pipecat
Author: pipecat-moonshine contributors
License: BSD-2-Clause
Project-URL: Homepage, https://github.com/ubopod/pipecat-moonshine
Project-URL: Source, https://github.com/ubopod/pipecat-moonshine
Project-URL: Issues, https://github.com/ubopod/pipecat-moonshine/issues
Project-URL: Changelog, https://github.com/ubopod/pipecat-moonshine/blob/main/CHANGELOG.md
Keywords: pipecat,moonshine,asr,stt,speech-to-text,voice,ai
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pipecat-ai[websockets-base]>=1.2.1
Requires-Dist: useful-moonshine-onnx>=20251121
Requires-Dist: numpy<3,>=1.26.4
Requires-Dist: loguru<1,>=0.7.0
Provides-Extra: examples
Requires-Dist: python-dotenv<2,>=1.0.0; extra == "examples"
Requires-Dist: pyaudio~=0.2.14; extra == "examples"
Provides-Extra: dev
Requires-Dist: ruff<1,>=0.12.11; extra == "dev"
Requires-Dist: pytest<10,>=9.0.0; extra == "dev"
Requires-Dist: pytest-asyncio<2,>=1.0.0; extra == "dev"
Dynamic: license-file

# pipecat-moonshine

[Moonshine ASR](https://github.com/moonshine-ai/moonshine) speech-to-text
integration for [Pipecat](https://github.com/pipecat-ai/pipecat).

Moonshine is a family of small, fast automatic-speech-recognition models
optimized for resource-constrained devices. The Tiny English model has roughly
26 M parameters and the Base English model roughly 58 M; both run on CPU via
ONNX Runtime, with no GPU required. That makes Moonshine an attractive choice
for low-latency, on-device transcription in Pipecat pipelines that already
handle VAD upstream.

This package provides `MoonshineSTTService`, a `SegmentedSTTService`
subclass that plugs straight into any Pipecat pipeline.

## Status

Community-maintained integration. See
[Pipecat's Community Integrations guide](https://github.com/pipecat-ai/pipecat/blob/main/COMMUNITY_INTEGRATIONS.md)
for what that means. In short, the Pipecat team does not maintain or
support this package; please file issues here.

**Tested with Pipecat v1.2.1.**

## Installation

```bash
pip install pipecat-moonshine
```

This pulls in [`useful-moonshine-onnx`](https://pypi.org/project/useful-moonshine-onnx/)
and `pipecat-ai` automatically. The first time you instantiate the service,
the chosen model weights are downloaded from Hugging Face
(`UsefulSensors/moonshine`) and cached locally.

For the included foundational example you also need the local-audio extras:

```bash
pip install 'pipecat-moonshine[examples]'
```

## Usage

```python
from pipecat.pipeline.pipeline import Pipeline
from pipecat_moonshine import MoonshineSTTService, Model

stt = MoonshineSTTService(model=Model.TINY_EN)

# `transport` and `vad_processor` are created elsewhere in your application.
pipeline = Pipeline([
    transport.input(),
    vad_processor,  # must run upstream of the STT (see below)
    stt,
    # ... downstream processors
])
```

`MoonshineSTTService` subclasses `SegmentedSTTService`, so a VAD-driven
processor (e.g. `VADProcessor` with `SileroVADAnalyzer`) must emit
`VADUserStartedSpeakingFrame` / `VADUserStoppedSpeakingFrame` upstream of
it. Each detected speech segment is decoded into a single final
`TranscriptionFrame`; Moonshine does not emit interim results.
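
The one-transcript-per-segment contract can be illustrated without Pipecat at all. In this minimal sketch, `SpeechStart`, `SpeechStop`, `AudioChunk`, `segment_and_transcribe`, and `fake_transcribe` are hypothetical stand-ins for the VAD frames, audio frames, the `SegmentedSTTService` buffering, and the Moonshine call; the real base class handles details the sketch leaves out.

```python
import asyncio
from collections.abc import AsyncIterator, Callable
from dataclasses import dataclass


# Hypothetical stand-ins for Pipecat's VAD and audio frames.
@dataclass
class SpeechStart:
    pass


@dataclass
class SpeechStop:
    pass


@dataclass
class AudioChunk:
    pcm: bytes


async def segment_and_transcribe(
    events: list[object], transcribe: Callable[[bytes], str]
) -> AsyncIterator[str]:
    """Buffer audio between start/stop events and emit one transcript per segment."""
    buffer = bytearray()
    speaking = False
    for event in events:
        if isinstance(event, SpeechStart):
            speaking = True
            buffer.clear()
        elif isinstance(event, AudioChunk) and speaking:
            buffer.extend(event.pcm)
        elif isinstance(event, SpeechStop) and speaking:
            speaking = False
            text = transcribe(bytes(buffer)).strip()
            if text:  # an empty result produces no output
                yield text


def fake_transcribe(pcm: bytes) -> str:
    # Stand-in for Moonshine: reports how many 16-bit samples it received.
    return f"{len(pcm) // 2} samples"


async def main() -> list[str]:
    events = [
        AudioChunk(b"\x00\x00" * 10),  # ignored: no speech has started yet
        SpeechStart(),
        AudioChunk(b"\x00\x00" * 4),
        AudioChunk(b"\x00\x00" * 6),
        SpeechStop(),
        SpeechStart(),
        AudioChunk(b"\x00\x00" * 3),
        SpeechStop(),
    ]
    return [text async for text in segment_and_transcribe(events, fake_transcribe)]


print(asyncio.run(main()))  # ['10 samples', '3 samples']
```

Audio that arrives outside a start/stop pair never reaches the transcriber, and each stop event yields at most one result.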

### Configuring the model

Pass a model through the `model` convenience keyword, or pass a settings
object. When both are given, fields set in `settings` take precedence. The
ONNX model is loaded once, when the service is constructed.

```python
from pipecat_moonshine import MoonshineSTTService, Model

# Convenience kwarg
stt = MoonshineSTTService(model=Model.BASE_EN)

# Or via a Settings object (fields set here override the `model` kwarg)
stt = MoonshineSTTService(
    settings=MoonshineSTTService.Settings(model="moonshine/base"),
)
```

## Running the example

```bash
git clone https://github.com/ubopod/pipecat-moonshine
cd pipecat-moonshine
pip install -e '.[examples]'
python examples/transcription-moonshine.py
```

Speak into your default microphone; a line like `Transcription: hello world`
is printed for each detected utterance.

## Audio format requirements

The service expects **16 kHz, mono, 16-bit signed PCM** from the pipeline and
converts each segment to float32 samples in `[-1, 1]` before passing it to
Moonshine. Pipecat's default `LocalAudioTransport` and most WebRTC transports
already provide this. If your pipeline runs at a different sample rate, the
audio is passed through without resampling: the service logs a warning on
the first segment and transcription quality may degrade. To avoid this, run
the pipeline's audio input at 16 kHz or resample upstream of the STT service.
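
The PCM-to-float conversion can be reproduced with the standard library. The service itself uses NumPy (`np.frombuffer(audio, dtype=np.int16).astype(np.float32) / 32768.0`); this sketch assumes little-endian byte order, which matches NumPy's native `int16` on little-endian machines, and `pcm16_to_float32` is a hypothetical helper name.

```python
import struct


def pcm16_to_float32(pcm: bytes) -> list[float]:
    """Convert little-endian signed 16-bit PCM to floats in [-1.0, 1.0)."""
    if len(pcm) % 2:
        raise ValueError("16-bit PCM needs an even number of bytes")
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm)
    # Dividing by 32768 maps -32768 to exactly -1.0 and 32767 to just below 1.0.
    return [sample / 32768.0 for sample in samples]


pcm = struct.pack("<4h", 0, 16384, -32768, 32767)
print(pcm16_to_float32(pcm))  # [0.0, 0.5, -1.0, 0.999969482421875]
```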

Moonshine also enforces a per-segment duration window: it accepts only
segments strictly longer than 0.1 s and strictly shorter than 64 s. The
service drops segments outside that window without emitting any frame and
logs the skip only at `DEBUG` level.
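
At 16 kHz with 2 bytes per sample, the window runs from 3,200 bytes (exactly 0.1 s, rejected) to 2,048,000 bytes (exactly 64 s, also rejected). A minimal sketch of the same check, assuming mono 16-bit PCM; `segment_duration` and `is_transcribable` are hypothetical helper names.

```python
MOONSHINE_SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2  # mono, 16-bit PCM
MIN_SECONDS = 0.1
MAX_SECONDS = 64.0


def segment_duration(num_bytes: int, sample_rate: int = MOONSHINE_SAMPLE_RATE) -> float:
    """Duration in seconds of a mono 16-bit PCM buffer."""
    return (num_bytes // BYTES_PER_SAMPLE) / sample_rate


def is_transcribable(num_bytes: int, sample_rate: int = MOONSHINE_SAMPLE_RATE) -> bool:
    """Strict bounds, as in the service: exactly 0.1 s or 64 s is rejected."""
    return MIN_SECONDS < segment_duration(num_bytes, sample_rate) < MAX_SECONDS


print(is_transcribable(3_200))  # False: exactly 0.1 s
print(is_transcribable(3_202))  # True: 1,601 samples
print(is_transcribable(2_048_000))  # False: exactly 64 s
```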

## Supported models

| Constant        | Model name       | Params | Notes                               |
| --------------- | ---------------- | ------ | ----------------------------------- |
| `Model.TINY_EN` | `moonshine/tiny` | 26 M   | English-only, MIT-licensed weights. |
| `Model.BASE_EN` | `moonshine/base` | 58 M   | English-only, MIT-licensed weights. |

### Multilingual models: important license note

Moonshine also publishes multilingual checkpoints (Spanish, Japanese,
Arabic, Korean, Mandarin, Vietnamese, Ukrainian, …). Those weights are
released under the **Moonshine Community License**, which is *non-commercial*.

For that reason they are intentionally **not** enumerated in the `Model`
enum. If you want to use one you must:

1. Read and accept the upstream Moonshine Community License.
2. Pass the model name as a string instead of a `Model` constant. The service
   forwards that string unchanged to `MoonshineOnnxModel(model_name=...)`, so
   whether a multilingual checkpoint loads depends on `useful-moonshine-onnx`;
   otherwise you need your own download and loading flow.

```python
from pipecat_moonshine import MoonshineSTTService

# "moonshine/base" is only a placeholder here: substitute the name of the
# multilingual checkpoint you have licensed.
stt = MoonshineSTTService(model="moonshine/base")
```

This package does not bundle, mirror, or auto-download non-commercial
weights, and the maintainers make no representation that using them complies
with your use case.

## Frames

| In                            | Out                                                                   |
| ----------------------------- | --------------------------------------------------------------------- |
| `VADUserStartedSpeakingFrame` | no output; audio is buffered until speech stops                       |
| `VADUserStoppedSpeakingFrame` | one final `TranscriptionFrame` per segment, or nothing (empty result or out-of-range segment) |
| Other audio frames            | buffered/forwarded according to `SegmentedSTTService`                 |

Errors during transcription are pushed as `ErrorFrame`s rather than raised,
so the pipeline is not torn down and other services can continue.

## Metrics

`can_generate_metrics()` returns `True`. Per-segment processing time is
recorded via `start_processing_metrics` / `stop_processing_metrics`, so
enabling metrics on your `PipelineTask` surfaces Moonshine latency
alongside the rest of your pipeline.
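
The service starts the processing timer before converting a segment and stops it on each path that follows (skipped segment, error, or success), while the blocking Moonshine call runs in a worker thread via `asyncio.to_thread`. A minimal sketch of the same pattern, using `try`/`finally` to guarantee the stop call; `SegmentTimer` and `slow_transcribe` are hypothetical and do not model Pipecat's metrics API.

```python
import asyncio
import time


class SegmentTimer:
    """Hypothetical stand-in for start/stop_processing_metrics."""

    def __init__(self) -> None:
        self.durations: list[float] = []
        self._started_at: float | None = None

    def start(self) -> None:
        self._started_at = time.perf_counter()

    def stop(self) -> None:
        if self._started_at is not None:
            self.durations.append(time.perf_counter() - self._started_at)
            self._started_at = None


def slow_transcribe(pcm: bytes) -> str:
    time.sleep(0.01)  # pretend inference takes about 10 ms
    if not pcm:
        raise RuntimeError("empty segment")
    return "ok"


async def timed_segment(timer: SegmentTimer, pcm: bytes) -> str:
    timer.start()
    try:
        # Run the blocking call in a worker thread so the event loop stays responsive.
        return await asyncio.to_thread(slow_transcribe, pcm)
    finally:
        timer.stop()  # recorded on success and on failure alike


timer = SegmentTimer()
print(asyncio.run(timed_segment(timer, b"\x00\x00")))  # ok
print(len(timer.durations))  # 1
```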

## Maintainer

Community-maintained. Not affiliated with Moonshine AI or Daily.

## License

BSD-2-Clause; see [LICENSE](./LICENSE). Note that the Moonshine model
*weights* are governed by their own licenses (MIT for English models,
Moonshine Community License for others); see the multilingual license note
above.

## Versioning and changelog

See [CHANGELOG.md](./CHANGELOG.md). This package follows semantic
versioning.

## Getting help

- Pipecat Discord: <https://discord.gg/pipecat> (`#community-integrations`)
- Pipecat changelog (track upstream changes that may affect this integration):
  <https://github.com/pipecat-ai/pipecat/blob/main/CHANGELOG.md>
- Issues for this integration: file them in this repo.
@@ -0,0 +1,86 @@
[build-system]
requires = ["setuptools>=64"]
build-backend = "setuptools.build_meta"

[project]
name = "pipecat-moonshine"
version = "0.1.0"
description = "Moonshine ASR (speech-to-text) community integration for Pipecat"
readme = "README.md"
license = { text = "BSD-2-Clause" }
requires-python = ">=3.11"
keywords = ["pipecat", "moonshine", "asr", "stt", "speech-to-text", "voice", "ai"]
authors = [
    { name = "pipecat-moonshine contributors" },
]
classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: BSD License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: 3.13",
    "Topic :: Multimedia :: Sound/Audio :: Speech",
    "Topic :: Scientific/Engineering :: Artificial Intelligence",
]
dependencies = [
    # ``websockets-base`` extra is required because pipecat's
    # ``stt_service`` module imports ``websockets.protocol.State`` at module
    # top-level, even for services that don't use websockets themselves.
    "pipecat-ai[websockets-base]>=1.2.1",
    "useful-moonshine-onnx>=20251121",
    "numpy>=1.26.4,<3",
    "loguru>=0.7.0,<1",
]

[project.urls]
Homepage = "https://github.com/ubopod/pipecat-moonshine"
Source = "https://github.com/ubopod/pipecat-moonshine"
Issues = "https://github.com/ubopod/pipecat-moonshine/issues"
Changelog = "https://github.com/ubopod/pipecat-moonshine/blob/main/CHANGELOG.md"

[project.optional-dependencies]
examples = [
    "python-dotenv>=1.0.0,<2",
    "pyaudio~=0.2.14",
]
dev = [
    "ruff>=0.12.11,<1",
    "pytest>=9.0.0,<10",
    "pytest-asyncio>=1.0.0,<2",
]

[tool.setuptools.packages.find]
where = ["src"]

[tool.setuptools.package-data]
"pipecat_moonshine" = ["py.typed"]

[tool.ruff]
exclude = [".git", ".venv"]
line-length = 100

[tool.ruff.lint]
select = [
    "D", # pydocstyle
    "I", # isort
    "UP", # pyupgrade
]
ignore = [
    "D105", # missing docstring in magic methods
]

[tool.ruff.lint.per-file-ignores]
"examples/**/*.py" = ["D"]
"tests/**/*.py" = ["D"]
"**/__init__.py" = ["D104"]

[tool.ruff.lint.pydocstyle]
convention = "google"

[tool.pytest.ini_options]
addopts = "--verbose"
testpaths = ["tests"]
pythonpath = ["src"]
asyncio_default_fixture_loop_scope = "function"
@@ -0,0 +1,4 @@
[egg_info]
tag_build =
tag_date = 0

@@ -0,0 +1,23 @@
#
# Copyright (c) 2026, pipecat-moonshine contributors
#
# SPDX-License-Identifier: BSD-2-Clause
#

"""Moonshine ASR (speech-to-text) integration for Pipecat."""

from pipecat_moonshine.stt import (
    MOONSHINE_SAMPLE_RATE,
    Model,
    MoonshineSTTService,
    MoonshineSTTSettings,
    language_to_moonshine_language,
)

__all__ = [
    "MOONSHINE_SAMPLE_RATE",
    "Model",
    "MoonshineSTTService",
    "MoonshineSTTSettings",
    "language_to_moonshine_language",
]
File without changes
@@ -0,0 +1,287 @@
#
# Copyright (c) 2026, pipecat-moonshine contributors
#
# SPDX-License-Identifier: BSD-2-Clause
#

"""Moonshine speech-to-text service for Pipecat.

This module integrates the Moonshine ASR family of models (via the
``useful-moonshine-onnx`` package) with Pipecat's ``SegmentedSTTService``
interface, allowing fast, on-device transcription of audio segments produced
by VAD-driven pipelines.
"""

import asyncio
from collections.abc import AsyncGenerator
from dataclasses import dataclass
from enum import Enum
from typing import TYPE_CHECKING

import numpy as np
from loguru import logger
from pipecat.frames.frames import ErrorFrame, Frame, TranscriptionFrame
from pipecat.services.settings import STTSettings, assert_given
from pipecat.services.stt_service import SegmentedSTTService
from pipecat.transcriptions.language import Language, resolve_language
from pipecat.utils.time import time_now_iso8601
from pipecat.utils.tracing.service_decorators import traced_stt

if TYPE_CHECKING:
    try:
        from moonshine_onnx import MoonshineOnnxModel
    except ModuleNotFoundError as e:
        logger.error(f"Exception: {e}")
        logger.error(
            "In order to use Moonshine STT, you need to `pip install useful-moonshine-onnx`."
        )
        raise Exception(f"Missing module: {e}") from e


# Moonshine's input contract: 16 kHz, mono, float32 in [-1, 1].
MOONSHINE_SAMPLE_RATE = 16000

# Per moonshine_onnx.transcribe.assert_audio_size: 0.1s < duration < 64s.
_MIN_AUDIO_SECONDS = 0.1
_MAX_AUDIO_SECONDS = 64.0


class Model(Enum):
    """Moonshine model selection options.

    Parameters:
        TINY_EN: 26 M-parameter English-only model, fastest inference.
        BASE_EN: 58 M-parameter English-only model, better accuracy.

    Note:
        Only the English models are enumerated here because their weights are
        released under the permissive MIT license. Multilingual Moonshine
        weights (Spanish, Japanese, Arabic, Korean, Mandarin, etc.) are
        released under the non-commercial **Moonshine Community License** and
        must be opted into explicitly by passing the model-name string
        directly. By doing so you accept the upstream license terms.
    """

    TINY_EN = "moonshine/tiny"
    BASE_EN = "moonshine/base"


def language_to_moonshine_language(language: Language) -> str | None:
    """Map a pipecat ``Language`` enum to a Moonshine language code.

    The ONNX models bundled with ``useful-moonshine-onnx`` are currently
    English-only; this helper returns ``"en"`` for any English variant and
    ``None`` for anything else (logging a warning via ``resolve_language``).

    Args:
        language: A ``Language`` enum value.

    Returns:
        The Moonshine language code, or ``None`` if not supported.
    """
    return resolve_language(language, {Language.EN: "en"}, use_base_code=True)


@dataclass
class MoonshineSTTSettings(STTSettings):
    """Runtime-updatable settings for ``MoonshineSTTService``.

    Inherits the standard ``model`` and ``language`` fields from
    ``STTSettings``. The Moonshine ONNX inference path currently exposes no
    additional runtime knobs; add new fields here as the upstream API grows.
    """


class MoonshineSTTService(SegmentedSTTService):
    """Transcribe audio with a locally-downloaded Moonshine ONNX model.

    Moonshine is a family of small, fast ASR models optimized for on-device
    inference. This service wraps :func:`moonshine_onnx.transcribe` and is
    designed to plug directly into Pipecat pipelines that perform VAD
    upstream: each speech segment yielded by VAD is fed to ``run_stt`` as
    16-bit PCM, converted to float32 in ``[-1, 1]``, and decoded into a
    single final ``TranscriptionFrame``.

    The first time a given model is requested it is downloaded from
    ``huggingface.co/UsefulSensors/moonshine``. Subsequent runs read from the
    local Hugging Face cache.

    Audio format: 16 kHz, mono, 16-bit signed PCM. Pipecat's default
    audio-in sample rate matches this; other rates will trigger a warning
    and may degrade accuracy.

    Example::

        from pipecat_moonshine import MoonshineSTTService, Model

        stt = MoonshineSTTService(model=Model.TINY_EN)
        pipeline = Pipeline([transport.input(), vad_processor, stt, llm, ...])
    """

    Settings = MoonshineSTTSettings
    _settings: Settings

    def __init__(
        self,
        *,
        model: str | Model | None = None,
        settings: Settings | None = None,
        **kwargs,
    ):
        """Initialize the Moonshine STT service.

        Args:
            model: Convenience shortcut for setting the model. Accepts a
                ``Model`` enum value or a model-name string (e.g.
                ``"moonshine/tiny"``). Equivalent to passing
                ``settings=MoonshineSTTService.Settings(model=...)``.
            settings: Runtime-updatable settings. When both ``settings`` and
                ``model`` are provided, fields in ``settings`` win.
            **kwargs: Additional arguments passed to ``SegmentedSTTService``.
        """
        # 1. Hardcoded defaults.
        default_settings = self.Settings(
            model=Model.TINY_EN.value,
            language=Language.EN,
        )

        # 2. Convenience kwarg overrides.
        if model is not None:
            default_settings.model = model.value if isinstance(model, Model) else model

        # 3. Settings delta (canonical API, always wins).
        if settings is not None:
            default_settings.apply_update(settings)

        super().__init__(settings=default_settings, **kwargs)

        self._model_obj: MoonshineOnnxModel | None = None
        self._sample_rate_warned = False
        self._load()

    def can_generate_metrics(self) -> bool:
        """Indicate whether this service emits processing metrics.

        Returns:
            ``True``: Moonshine STT records per-segment processing time.
        """
        return True

    def language_to_service_language(self, language: Language) -> str | None:
        """Map a pipecat ``Language`` to a Moonshine language code.

        Args:
            language: The ``Language`` enum value to convert.

        Returns:
            The Moonshine code, or ``None`` if unsupported.
        """
        return language_to_moonshine_language(language)

    def _load(self):
        """Load the configured Moonshine ONNX model.

        Note:
            The first invocation for a given model downloads weights from the
            Hugging Face Hub (``UsefulSensors/moonshine``); subsequent calls
            reuse the local cache.
        """
        try:
            from moonshine_onnx import MoonshineOnnxModel

            model_name = assert_given(self._settings.model)
            if model_name is None:
                raise ValueError("Moonshine model must be specified")
            logger.debug(f"Loading Moonshine model {model_name}...")
            self._model_obj = MoonshineOnnxModel(model_name=model_name)
            logger.debug(f"Loaded Moonshine model {model_name}")
        except ModuleNotFoundError as e:
            logger.error(f"Exception: {e}")
            logger.error(
                "In order to use Moonshine STT, you need to `pip install useful-moonshine-onnx`."
            )
            self._model_obj = None

    @traced_stt
    async def _handle_transcription(
        self, transcript: str, is_final: bool, language: Language | None = None
    ):
        """Handle a transcription result with tracing.

        Decorated with ``@traced_stt`` so OpenTelemetry spans are emitted
        when Pipecat tracing is enabled. The body is intentionally empty;
        the decorator does the work.
        """
        pass

    async def run_stt(self, audio: bytes) -> AsyncGenerator[Frame, None]:
        """Transcribe a single VAD-delimited speech segment.

        Args:
            audio: Raw 16-bit signed PCM audio bytes for one speech segment.

        Yields:
            Either a single ``TranscriptionFrame`` containing the decoded
            text, or an ``ErrorFrame`` on failure. Out-of-range segments
            (too short or too long for Moonshine) are dropped without a
            frame; the skip is only logged at DEBUG level.
        """
        if self._model_obj is None:
            yield ErrorFrame("Moonshine model not available")
            return

        if not self._sample_rate_warned and self.sample_rate != MOONSHINE_SAMPLE_RATE:
            logger.warning(
                f"Moonshine expects {MOONSHINE_SAMPLE_RATE} Hz audio; pipeline is "
                f"providing {self.sample_rate} Hz. Transcription quality may degrade."
            )
            self._sample_rate_warned = True

        await self.start_processing_metrics()

        # Signed 16-bit PCM -> float32 in [-1, 1].
        audio_float = np.frombuffer(audio, dtype=np.int16).astype(np.float32) / 32768.0

        duration = audio_float.size / max(self.sample_rate, 1)
        if duration <= _MIN_AUDIO_SECONDS or duration >= _MAX_AUDIO_SECONDS:
            logger.debug(
                f"Skipping Moonshine transcription: segment duration {duration:.2f}s "
                f"is outside the supported ({_MIN_AUDIO_SECONDS}, {_MAX_AUDIO_SECONDS}) range."
            )
            await self.stop_processing_metrics()
            return

        try:
            import moonshine_onnx

            # ``transcribe`` expects a [batch, samples] array, so add a leading
            # batch axis to the 1-D segment.
            results = await asyncio.to_thread(
                moonshine_onnx.transcribe, audio_float[np.newaxis, :], self._model_obj
            )
        except Exception as e:
            await self.stop_processing_metrics()
            logger.exception(f"{self} Moonshine transcription error")
            yield ErrorFrame(error=f"Moonshine transcription error: {e}")
            return

        await self.stop_processing_metrics()

        text = (results[0] if results else "").strip()
        if not text:
            return

        # STTSettings normalises ``language`` to a service-code string during
        # init, so ``self._settings.language`` is e.g. ``"en"`` rather than
        # ``Language.EN``. Map common codes back to the enum for the frame;
        # anything we don't recognise (e.g. a user-overridden code) falls
        # through as None, so the frame carries no language.
        language_setting = assert_given(self._settings.language)
        language: Language | None
        if isinstance(language_setting, Language):
            language = language_setting
        elif language_setting == "en":
            language = Language.EN
        else:
            language = None

        await self._handle_transcription(text, True, language)
        logger.debug(f"Transcription: [{text}]")
        yield TranscriptionFrame(text, self._user_id, time_now_iso8601(), language)
@@ -0,0 +1,204 @@
1
+ Metadata-Version: 2.4
2
+ Name: pipecat-moonshine
3
+ Version: 0.1.0
4
+ Summary: Moonshine ASR (speech-to-text) community integration for Pipecat
5
+ Author: pipecat-moonshine contributors
6
+ License: BSD-2-Clause
7
+ Project-URL: Homepage, https://github.com/ubopod/pipecat-moonshine
8
+ Project-URL: Source, https://github.com/ubopod/pipecat-moonshine
9
+ Project-URL: Issues, https://github.com/ubopod/pipecat-moonshine/issues
10
+ Project-URL: Changelog, https://github.com/ubopod/pipecat-moonshine/blob/main/CHANGELOG.md
11
+ Keywords: pipecat,moonshine,asr,stt,speech-to-text,voice,ai
12
+ Classifier: Development Status :: 4 - Beta
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: License :: OSI Approved :: BSD License
15
+ Classifier: Programming Language :: Python :: 3
16
+ Classifier: Programming Language :: Python :: 3.11
17
+ Classifier: Programming Language :: Python :: 3.12
18
+ Classifier: Programming Language :: Python :: 3.13
19
+ Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
20
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
21
+ Requires-Python: >=3.11
22
+ Description-Content-Type: text/markdown
23
+ License-File: LICENSE
24
+ Requires-Dist: pipecat-ai[websockets-base]>=1.2.1
25
+ Requires-Dist: useful-moonshine-onnx>=20251121
26
+ Requires-Dist: numpy<3,>=1.26.4
27
+ Requires-Dist: loguru<1,>=0.7.0
28
+ Provides-Extra: examples
29
+ Requires-Dist: python-dotenv<2,>=1.0.0; extra == "examples"
30
+ Requires-Dist: pyaudio~=0.2.14; extra == "examples"
31
+ Provides-Extra: dev
32
+ Requires-Dist: ruff<1,>=0.12.11; extra == "dev"
33
+ Requires-Dist: pytest<10,>=9.0.0; extra == "dev"
34
+ Requires-Dist: pytest-asyncio<2,>=1.0.0; extra == "dev"
35
+ Dynamic: license-file
36
+
37
+ # pipecat-moonshine
38
+
39
+ [Moonshine ASR](https://github.com/moonshine-ai/moonshine) speech-to-text
40
+ integration for [Pipecat](https://github.com/pipecat-ai/pipecat).
41
+
42
+ Moonshine is a family of small, fast automatic-speech-recognition models
43
+ optimized for resource-constrained devices. The Tiny English model has roughly
44
+ 26 M parameters and the Base English model roughly 58 M; both run on CPU via
45
+ ONNX Runtime, with no GPU required. That makes Moonshine an attractive choice
46
+ for low-latency, on-device transcription in Pipecat pipelines that already
47
+ handle VAD upstream.
48
+
49
+ This package provides `MoonshineSTTService`, a `SegmentedSTTService`
50
+ subclass that plugs straight into any Pipecat pipeline.
51
+
52
+ ## Status
53
+
54
+ Community-maintained integration. See
55
+ [Pipecat's Community Integrations guide](https://github.com/pipecat-ai/pipecat/blob/main/COMMUNITY_INTEGRATIONS.md)
56
+ for what that means — in short, the Pipecat team does not maintain or
57
+ support this package; please file issues here.
58
+
59
+ **Tested with Pipecat v1.2.1.**
60
+
61
+ ## Installation
62
+
63
+ ```bash
64
+ pip install pipecat-moonshine
65
+ ```
66
+
67
+ This pulls in [`useful-moonshine-onnx`](https://pypi.org/project/useful-moonshine-onnx/)
68
+ and `pipecat-ai` automatically. The first time you instantiate the service,
69
+ the chosen model weights are downloaded from Hugging Face
70
+ (`UsefulSensors/moonshine`) and cached locally.
71
+
72
+ For the included foundational example you also need the local-audio extras:
73
+
74
+ ```bash
75
+ pip install 'pipecat-moonshine[examples]'
76
+ ```
77
+
78
+ ## Usage
79
+
80
+ ```python
81
+ from pipecat.pipeline.pipeline import Pipeline
82
+ from pipecat_moonshine import MoonshineSTTService, Model
83
+
84
+ stt = MoonshineSTTService(model=Model.TINY_EN)
85
+
86
+ pipeline = Pipeline([
87
+ transport.input(),  # `transport` and `vad_processor` are defined elsewhere
88
+ vad_processor, # MUST run upstream of the STT — see below
89
+ stt,
90
+ # ... downstream processors
91
+ ])
92
+ ```
93
+
94
+ `MoonshineSTTService` subclasses `SegmentedSTTService`, so a VAD-driven
95
+ processor (e.g. `VADProcessor` with `SileroVADAnalyzer`) must produce
96
+ `VADUserStartedSpeakingFrame` / `VADUserStoppedSpeakingFrame` upstream of
97
+ it. Each detected speech segment is decoded into a single final
98
+ `TranscriptionFrame` — Moonshine does not emit interim results.
99
+
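The segment-at-a-time behaviour described above (buffer audio between the VAD start and stop events, decode once, emit a single final result or nothing) can be modelled in a few lines. This is a toy illustration, not Pipecat's `SegmentedSTTService`: the `SegmentBuffer` class, the `("start" | "audio" | "stop", payload)` event tuples and the `transcribe` callback are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator


@dataclass
class SegmentBuffer:
    """Toy model of VAD-segmented STT (hypothetical; not Pipecat's implementation)."""

    transcribe: Callable[[bytes], str]
    _chunks: list[bytes] = field(default_factory=list, init=False)
    _speaking: bool = field(default=False, init=False)

    def feed(self, events: Iterable[tuple[str, bytes]]) -> Iterator[str]:
        for kind, payload in events:
            if kind == "start":
                # VAD start: begin a fresh segment
                self._speaking = True
                self._chunks.clear()
            elif kind == "audio" and self._speaking:
                # Buffer audio only while the user is speaking
                self._chunks.append(payload)
            elif kind == "stop" and self._speaking:
                # VAD stop: decode the whole segment once and emit one final result
                self._speaking = False
                text = self.transcribe(b"".join(self._chunks)).strip()
                if text:
                    yield text
```

Because decoding happens only at the stop event, there is nothing to report mid-utterance, which is why no interim results appear.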
100
+ ### Configuring the model at runtime
101
+
102
+ Pass an explicit model or a fully-built settings object:
103
+
104
+ ```python
105
+ # Convenience kwarg
106
+ stt = MoonshineSTTService(model=Model.BASE_EN)
107
+
108
+ # Or via Settings (e.g. when you want to update at runtime)
109
+ stt = MoonshineSTTService(
110
+ settings=MoonshineSTTService.Settings(model="moonshine/base"),
111
+ )
112
+ ```
113
+
114
+ ## Running the example
115
+
116
+ ```bash
117
+ git clone https://github.com/ubopod/pipecat-moonshine
118
+ cd pipecat-moonshine
119
+ pip install -e '.[examples]'
120
+ python examples/transcription-moonshine.py
121
+ ```
122
+
123
+ Speak into your default mic; lines like `Transcription: hello world` will be
124
+ printed for each detected utterance.
125
+
126
+ ## Audio format requirements
127
+
128
+ Moonshine expects **16 kHz, mono, 16-bit signed PCM** input. Pipecat's
129
+ default `LocalAudioTransport` and most WebRTC transports already provide
130
+ this. If your pipeline runs at a different sample rate, the service logs
131
+ a warning on the first segment and transcription quality may degrade; add
132
+ a resampler upstream to convert the audio to 16 kHz.
133
+
134
+ Moonshine also enforces a per-segment duration window: speech segments
135
+ shorter than 0.1 s or longer than 64 s are dropped without producing a
136
+ transcription (the service logs this only at `DEBUG` level).
137
+
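To make the format and duration rules above concrete: at 16 kHz, 0.1 s is 1,600 samples (3,200 bytes of 16-bit PCM) and 64 s is 1,024,000 samples. The sketch below shows one common way to decode such PCM into float32, a naive resampler, and the duration check. The helper names are hypothetical, and it is an assumption that the service performs an equivalent int16-to-float32 conversion internally. The linear resampler has no anti-aliasing filter, so a proper resampler is preferable in production.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz, the rate Moonshine expects
MIN_SECONDS, MAX_SECONDS = 0.1, 64.0  # per-segment window described above


def pcm16_to_float32(pcm: bytes) -> np.ndarray:
    """Decode 16-bit signed little-endian mono PCM into float32 in [-1, 1)."""
    samples = np.frombuffer(pcm, dtype="<i2")
    return samples.astype(np.float32) / 32768.0


def resample_linear(audio: np.ndarray, src_rate: int, dst_rate: int = SAMPLE_RATE) -> np.ndarray:
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if src_rate == dst_rate or audio.size == 0:
        return audio
    n_out = int(round(audio.size / src_rate * dst_rate))
    t_in = np.arange(audio.size) / src_rate  # input sample times in seconds
    t_out = np.arange(n_out) / dst_rate  # output sample times in seconds
    return np.interp(t_out, t_in, audio).astype(np.float32)


def within_window(audio: np.ndarray, rate: int = SAMPLE_RATE) -> bool:
    """True if the segment length falls inside the 0.1 s to 64 s window."""
    seconds = audio.size / rate
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```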
138
+ ## Supported models
139
+
140
+ | Constant | Model name | Params | Notes |
141
+ | ---------------- | ------------------ | ------ | ------------------------------------ |
142
+ | `Model.TINY_EN` | `moonshine/tiny` | 26 M | English-only, MIT-licensed weights. |
143
+ | `Model.BASE_EN` | `moonshine/base` | 58 M | English-only, MIT-licensed weights. |
144
+
145
+ ### Multilingual models — important license note
146
+
147
+ Moonshine also publishes multilingual checkpoints (Spanish, Japanese,
148
+ Arabic, Korean, Mandarin, Vietnamese, Ukrainian, …). Those weights are
149
+ released under the **Moonshine Community License**, which is *non-commercial*.
150
+
151
+ For that reason they are intentionally **not** enumerated in the `Model`
152
+ enum. If you want to use one you must:
153
+
154
+ 1. Read and accept the upstream Moonshine Community License.
155
+ 2. Pass the model name as a string explicitly, e.g.:
156
+
157
+ ```python
158
+ stt = MoonshineSTTService(model="moonshine/base")
159
+ # then load the appropriate language checkpoint via your own download flow
160
+ ```
161
+
162
+ This package does not bundle, mirror, or auto-download non-commercial
163
+ weights, and the maintainers make no representation that doing so complies
164
+ with your use case.
165
+
166
+ ## Frames
167
+
168
+ | In | Out |
169
+ | ----------------------------------- | ---------------------------------------------------------- |
170
+ | `VADUserStartedSpeakingFrame` | (no output — buffers audio internally) |
171
+ | `VADUserStoppedSpeakingFrame` | one `TranscriptionFrame` per segment (final), or nothing |
172
+ | Any non-VAD audio | buffered/forwarded according to `SegmentedSTTService` |
173
+
174
+ Errors during transcription are pushed as `ErrorFrame`s; the pipeline is
175
+ not torn down so other services can continue.
176
+
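The error behaviour described above follows the try/except pattern visible in the `stt.py` excerpt: a failure is turned into an error frame that is yielded like any other output, and the generator returns instead of raising. The sketch below reproduces that pattern with stand-in `Transcript` and `Error` classes in place of `TranscriptionFrame` and `ErrorFrame`; `run_segment` and `decode` are hypothetical names, not part of this package.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable, Union


@dataclass
class Transcript:  # stand-in for TranscriptionFrame
    text: str


@dataclass
class Error:  # stand-in for ErrorFrame
    error: str


async def run_segment(
    audio: bytes, decode: Callable[[bytes], str]
) -> AsyncIterator[Union[Transcript, Error]]:
    try:
        text = decode(audio).strip()
    except Exception as e:
        # Report the failure as data instead of raising, so the caller keeps running
        yield Error(error=f"transcription error: {e}")
        return
    if text:
        # Only non-empty results become transcripts
        yield Transcript(text)


async def collect(audio: bytes, decode: Callable[[bytes], str]) -> list:
    return [frame async for frame in run_segment(audio, decode)]
```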
177
+ ## Metrics
178
+
179
+ `can_generate_metrics()` returns `True`. Per-segment processing time is
180
+ recorded via `start_processing_metrics` / `stop_processing_metrics`, so
181
+ enabling metrics on your `PipelineTask` will surface Moonshine latency
182
+ alongside the rest of your pipeline.
183
+
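Per-segment processing time amounts to a start timestamp taken before decoding and an elapsed time recorded afterwards. The sketch below models that with `time.perf_counter`; `SegmentTimer` is a hypothetical helper, not Pipecat's metrics API. In Pipecat, metrics are typically enabled by passing `PipelineParams(enable_metrics=True)` to `PipelineTask`, but check the Pipecat version you run.

```python
import time
from contextlib import contextmanager
from typing import Iterator


class SegmentTimer:
    """Hypothetical per-segment latency recorder (not Pipecat's metrics API)."""

    def __init__(self) -> None:
        self.durations: list[float] = []

    @contextmanager
    def measure(self) -> Iterator[None]:
        start = time.perf_counter()  # analogous to start_processing_metrics()
        try:
            yield
        finally:
            # analogous to stop_processing_metrics(); here it runs even if decoding raised
            self.durations.append(time.perf_counter() - start)
```

One design difference is worth noting: this sketch records a duration even when decoding fails, whereas the `stt.py` excerpt returns early on error before calling `stop_processing_metrics()`, so failed segments are not timed there.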
184
+ ## Maintainer
185
+
186
+ Community-maintained. Not affiliated with Moonshine AI or Daily.
187
+
188
+ ## License
189
+
190
+ BSD-2-Clause — see [LICENSE](./LICENSE). Note that the Moonshine model
191
+ *weights* are governed by their own license (MIT for English models,
192
+ Moonshine Community License for others); see the multilingual license note above.
193
+
194
+ ## Versioning and changelog
195
+
196
+ See [CHANGELOG.md](./CHANGELOG.md). This package follows semantic
197
+ versioning.
198
+
199
+ ## Getting help
200
+
201
+ - Pipecat Discord: <https://discord.gg/pipecat> (`#community-integrations`)
202
+ - Pipecat changelog (track upstream changes that may affect this integration):
203
+ <https://github.com/pipecat-ai/pipecat/blob/main/CHANGELOG.md>
204
+ - Issues for this integration: file them in this repo.
@@ -0,0 +1,11 @@
1
+ LICENSE
2
+ README.md
3
+ pyproject.toml
4
+ src/pipecat_moonshine/__init__.py
5
+ src/pipecat_moonshine/py.typed
6
+ src/pipecat_moonshine/stt.py
7
+ src/pipecat_moonshine.egg-info/PKG-INFO
8
+ src/pipecat_moonshine.egg-info/SOURCES.txt
9
+ src/pipecat_moonshine.egg-info/dependency_links.txt
10
+ src/pipecat_moonshine.egg-info/requires.txt
11
+ src/pipecat_moonshine.egg-info/top_level.txt
@@ -0,0 +1,13 @@
1
+ pipecat-ai[websockets-base]>=1.2.1
2
+ useful-moonshine-onnx>=20251121
3
+ numpy<3,>=1.26.4
4
+ loguru<1,>=0.7.0
5
+
6
+ [dev]
7
+ ruff<1,>=0.12.11
8
+ pytest<10,>=9.0.0
9
+ pytest-asyncio<2,>=1.0.0
10
+
11
+ [examples]
12
+ python-dotenv<2,>=1.0.0
13
+ pyaudio~=0.2.14
@@ -0,0 +1 @@
1
+ pipecat_moonshine