pipecat-moonshine 0.1.0__tar.gz
- pipecat_moonshine-0.1.0/LICENSE +24 -0
- pipecat_moonshine-0.1.0/PKG-INFO +204 -0
- pipecat_moonshine-0.1.0/README.md +168 -0
- pipecat_moonshine-0.1.0/pyproject.toml +86 -0
- pipecat_moonshine-0.1.0/setup.cfg +4 -0
- pipecat_moonshine-0.1.0/src/pipecat_moonshine/__init__.py +23 -0
- pipecat_moonshine-0.1.0/src/pipecat_moonshine/py.typed +0 -0
- pipecat_moonshine-0.1.0/src/pipecat_moonshine/stt.py +287 -0
- pipecat_moonshine-0.1.0/src/pipecat_moonshine.egg-info/PKG-INFO +204 -0
- pipecat_moonshine-0.1.0/src/pipecat_moonshine.egg-info/SOURCES.txt +11 -0
- pipecat_moonshine-0.1.0/src/pipecat_moonshine.egg-info/dependency_links.txt +1 -0
- pipecat_moonshine-0.1.0/src/pipecat_moonshine.egg-info/requires.txt +13 -0
- pipecat_moonshine-0.1.0/src/pipecat_moonshine.egg-info/top_level.txt +1 -0
@@ -0,0 +1,24 @@
+BSD 2-Clause License
+
+Copyright (c) 2026, pipecat-moonshine contributors
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this
+   list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice,
+   this list of conditions and the following disclaimer in the documentation
+   and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,204 @@
+Metadata-Version: 2.4
+Name: pipecat-moonshine
+Version: 0.1.0
+Summary: Moonshine ASR (speech-to-text) community integration for Pipecat
+Author: pipecat-moonshine contributors
+License: BSD-2-Clause
+Project-URL: Homepage, https://github.com/ubopod/pipecat-moonshine
+Project-URL: Source, https://github.com/ubopod/pipecat-moonshine
+Project-URL: Issues, https://github.com/ubopod/pipecat-moonshine/issues
+Project-URL: Changelog, https://github.com/ubopod/pipecat-moonshine/blob/main/CHANGELOG.md
+Keywords: pipecat,moonshine,asr,stt,speech-to-text,voice,ai
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: BSD License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: pipecat-ai[websockets-base]>=1.2.1
+Requires-Dist: useful-moonshine-onnx>=20251121
+Requires-Dist: numpy<3,>=1.26.4
+Requires-Dist: loguru<1,>=0.7.0
+Provides-Extra: examples
+Requires-Dist: python-dotenv<2,>=1.0.0; extra == "examples"
+Requires-Dist: pyaudio~=0.2.14; extra == "examples"
+Provides-Extra: dev
+Requires-Dist: ruff<1,>=0.12.11; extra == "dev"
+Requires-Dist: pytest<10,>=9.0.0; extra == "dev"
+Requires-Dist: pytest-asyncio<2,>=1.0.0; extra == "dev"
+Dynamic: license-file
+
+# pipecat-moonshine
+
+[Moonshine ASR](https://github.com/moonshine-ai/moonshine) speech-to-text
+integration for [Pipecat](https://github.com/pipecat-ai/pipecat).
+
+Moonshine is a family of small, fast automatic-speech-recognition models
+optimized for resource-constrained devices. The Tiny English model is roughly
+26 M parameters, the Base English model roughly 58 M — both run on CPU via
+ONNX Runtime with no GPU required. That makes Moonshine an attractive choice
+for low-latency, on-device transcription in Pipecat pipelines that already
+handle VAD upstream.
+
+This package provides `MoonshineSTTService`, a `SegmentedSTTService`
+subclass that plugs straight into any Pipecat pipeline.
+
+## Status
+
+Community-maintained integration. See
+[Pipecat's Community Integrations guide](https://github.com/pipecat-ai/pipecat/blob/main/COMMUNITY_INTEGRATIONS.md)
+for what that means — in short, the Pipecat team does not maintain or
+support this package; please file issues here.
+
+**Tested with Pipecat v1.2.1.**
+
+## Installation
+
+```bash
+pip install pipecat-moonshine
+```
+
+This pulls in [`useful-moonshine-onnx`](https://pypi.org/project/useful-moonshine-onnx/)
+and `pipecat-ai` automatically. The first time you instantiate the service,
+the chosen model weights are downloaded from Hugging Face
+(`UsefulSensors/moonshine`) and cached locally.
+
+For the included foundational example you also need the local-audio extras:
+
+```bash
+pip install 'pipecat-moonshine[examples]'
+```
+
+## Usage
+
+```python
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat_moonshine import MoonshineSTTService, Model
+
+stt = MoonshineSTTService(model=Model.TINY_EN)
+
+pipeline = Pipeline([
+    transport.input(),
+    vad_processor,        # MUST run upstream of the STT — see below
+    stt,
+    # ... downstream processors
+])
+```
+
+`MoonshineSTTService` subclasses `SegmentedSTTService`, so a VAD-driven
+processor (e.g. `VADProcessor` with `SileroVADAnalyzer`) must produce
+`VADUserStartedSpeakingFrame` / `VADUserStoppedSpeakingFrame` upstream of
+it. Each detected speech segment is decoded into a single final
+`TranscriptionFrame` — Moonshine does not emit interim results.
+
+### Configuring the model at runtime
+
+Pass an explicit model or a fully-built settings object:
+
+```python
+# Convenience kwarg
+stt = MoonshineSTTService(model=Model.BASE_EN)
+
+# Or via Settings (e.g. when you want to update at runtime)
+stt = MoonshineSTTService(
+    settings=MoonshineSTTService.Settings(model="moonshine/base"),
+)
+```
+
+## Running the example
+
+```bash
+git clone https://github.com/ubopod/pipecat-moonshine
+cd pipecat-moonshine
+pip install -e '.[examples]'
+python examples/transcription-moonshine.py
+```
+
+Speak into your default mic; lines like `Transcription: hello world` will be
+printed for each detected utterance.
+
+## Audio format requirements
+
+Moonshine expects **16 kHz, mono, 16-bit signed PCM** input. Pipecat's
+default `LocalAudioTransport` and most WebRTC transports already provide
+this. If your pipeline runs at a different sample rate the service will log
+a warning on the first segment and transcription quality may degrade — add
+a resampler upstream if you need a different rate.
+
+Moonshine also enforces a per-segment duration window: speech segments
+shorter than 0.1 s or longer than 64 s are silently dropped (the service
+logs at `DEBUG` level when this happens).
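
Editor's sketch (not part of the package): the audio contract described above can be illustrated with a few lines of standard-library Python. The sample rate and the 0.1 s / 64 s bounds come from the README and from the constants in `stt.py`; the function names `pcm16_to_floats` and `segment_in_window`, the little-endian byte order, and the `32768.0` divisor are assumptions made for illustration. The package itself depends on NumPy for this step, so treat this as a minimal stand-in rather than the service's actual code.

```python
import struct

MOONSHINE_SAMPLE_RATE = 16000  # Hz, as stated in the README
MIN_AUDIO_SECONDS = 0.1        # lower bound of the duration window
MAX_AUDIO_SECONDS = 64.0       # upper bound of the duration window


def pcm16_to_floats(audio: bytes) -> list[float]:
    """Convert 16-bit signed PCM bytes to floats in [-1, 1).

    Assumes little-endian samples; len(audio) must be a multiple of 2.
    """
    return [sample / 32768.0 for (sample,) in struct.iter_unpack("<h", audio)]


def segment_in_window(audio: bytes, sample_rate: int = MOONSHINE_SAMPLE_RATE) -> bool:
    """Return True when the segment is strictly longer than 0.1 s and shorter than 64 s."""
    duration = (len(audio) // 2) / sample_rate  # two bytes per mono sample
    return MIN_AUDIO_SECONDS < duration < MAX_AUDIO_SECONDS
```

A caller would typically check `segment_in_window(audio)` first and skip the conversion for segments that fall outside the window.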
+
+## Supported models
+
+| Constant | Model name | Params | Notes |
+| ---------------- | ------------------ | ------ | ------------------------------------ |
+| `Model.TINY_EN` | `moonshine/tiny` | 26 M | English-only, MIT-licensed weights. |
+| `Model.BASE_EN` | `moonshine/base` | 58 M | English-only, MIT-licensed weights. |
+
+### Multilingual models — important license note
+
+Moonshine also publishes multilingual checkpoints (Spanish, Japanese,
+Arabic, Korean, Mandarin, Vietnamese, Ukrainian, …). Those weights are
+released under the **Moonshine Community License**, which is *non-commercial*.
+
+For that reason they are intentionally **not** enumerated in the `Model`
+enum. If you want to use one you must:
+
+1. Read and accept the upstream Moonshine Community License.
+2. Pass the model name as a string explicitly, e.g.:
+
+   ```python
+   stt = MoonshineSTTService(model="moonshine/base")
+   # then load the appropriate language checkpoint via your own download flow
+   ```
+
+This package does not bundle, mirror, or auto-download non-commercial
+weights, and the maintainers make no representation that doing so complies
+with your use case.
+
+## Frames
+
+| In | Out |
+| ----------------------------------- | ---------------------------------------------------------- |
+| `VADUserStartedSpeakingFrame` | (no output — buffers audio internally) |
+| `VADUserStoppedSpeakingFrame` | one `TranscriptionFrame` per segment (final), or nothing |
+| Any non-VAD audio | buffered/forwarded according to `SegmentedSTTService` |
+
+Errors during transcription are pushed as `ErrorFrame`s; the pipeline is
+not torn down so other services can continue.
+
+## Metrics
+
+`can_generate_metrics()` returns `True`. Per-segment processing time is
+recorded via `start_processing_metrics` / `stop_processing_metrics`, so
+enabling metrics on your `PipelineTask` will surface Moonshine latency
+alongside the rest of your pipeline.
+
+## Maintainer
+
+Community-maintained. Not affiliated with Moonshine AI or Daily.
+
+## License
+
+BSD-2-Clause — see [LICENSE](./LICENSE). Note that the Moonshine model
+*weights* are governed by their own license (MIT for English models,
+Moonshine Community License for others) — see the section above.
+
+## Versioning and changelog
+
+See [CHANGELOG.md](./CHANGELOG.md). This package follows semantic
+versioning.
+
+## Getting help
+
+- Pipecat Discord: <https://discord.gg/pipecat> (`#community-integrations`)
+- Pipecat changelog (track upstream changes that may affect this integration):
+  <https://github.com/pipecat-ai/pipecat/blob/main/CHANGELOG.md>
+- Issues for this integration: file them in this repo.
@@ -0,0 +1,168 @@
+# pipecat-moonshine
+
+[Moonshine ASR](https://github.com/moonshine-ai/moonshine) speech-to-text
+integration for [Pipecat](https://github.com/pipecat-ai/pipecat).
+
+Moonshine is a family of small, fast automatic-speech-recognition models
+optimized for resource-constrained devices. The Tiny English model is roughly
+26 M parameters, the Base English model roughly 58 M — both run on CPU via
+ONNX Runtime with no GPU required. That makes Moonshine an attractive choice
+for low-latency, on-device transcription in Pipecat pipelines that already
+handle VAD upstream.
+
+This package provides `MoonshineSTTService`, a `SegmentedSTTService`
+subclass that plugs straight into any Pipecat pipeline.
+
+## Status
+
+Community-maintained integration. See
+[Pipecat's Community Integrations guide](https://github.com/pipecat-ai/pipecat/blob/main/COMMUNITY_INTEGRATIONS.md)
+for what that means — in short, the Pipecat team does not maintain or
+support this package; please file issues here.
+
+**Tested with Pipecat v1.2.1.**
+
+## Installation
+
+```bash
+pip install pipecat-moonshine
+```
+
+This pulls in [`useful-moonshine-onnx`](https://pypi.org/project/useful-moonshine-onnx/)
+and `pipecat-ai` automatically. The first time you instantiate the service,
+the chosen model weights are downloaded from Hugging Face
+(`UsefulSensors/moonshine`) and cached locally.
+
+For the included foundational example you also need the local-audio extras:
+
+```bash
+pip install 'pipecat-moonshine[examples]'
+```
+
+## Usage
+
+```python
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat_moonshine import MoonshineSTTService, Model
+
+stt = MoonshineSTTService(model=Model.TINY_EN)
+
+pipeline = Pipeline([
+    transport.input(),
+    vad_processor,        # MUST run upstream of the STT — see below
+    stt,
+    # ... downstream processors
+])
+```
+
+`MoonshineSTTService` subclasses `SegmentedSTTService`, so a VAD-driven
+processor (e.g. `VADProcessor` with `SileroVADAnalyzer`) must produce
+`VADUserStartedSpeakingFrame` / `VADUserStoppedSpeakingFrame` upstream of
+it. Each detected speech segment is decoded into a single final
+`TranscriptionFrame` — Moonshine does not emit interim results.
+
+### Configuring the model at runtime
+
+Pass an explicit model or a fully-built settings object:
+
+```python
+# Convenience kwarg
+stt = MoonshineSTTService(model=Model.BASE_EN)
+
+# Or via Settings (e.g. when you want to update at runtime)
+stt = MoonshineSTTService(
+    settings=MoonshineSTTService.Settings(model="moonshine/base"),
+)
+```
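
Editor's sketch (not part of the package): the constructor in `stt.py` documents a three-step precedence, namely hardcoded defaults, then the `model` convenience kwarg, then any fields given in `settings`, with `settings` winning. The snippet below mimics that order with a plain dataclass so it runs without Pipecat installed. `SketchSettings` and `resolve_settings` are hypothetical names, and using `None` to mean "field not given" is an assumption; the real `STTSettings` class may represent unset fields differently.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Optional


class Model(Enum):
    """Mirror of the two model names listed in the README."""

    TINY_EN = "moonshine/tiny"
    BASE_EN = "moonshine/base"


@dataclass
class SketchSettings:
    """Hypothetical stand-in for MoonshineSTTService.Settings; None means 'not given'."""

    model: Optional[str] = None
    language: Optional[str] = None


def resolve_settings(model=None, settings=None) -> SketchSettings:
    # 1. Hardcoded defaults.
    resolved = SketchSettings(model=Model.TINY_EN.value, language="en")
    # 2. The convenience kwarg overrides the default model.
    if model is not None:
        resolved.model = model.value if isinstance(model, Model) else model
    # 3. Fields that are set on the settings object override everything else.
    if settings is not None:
        for field in fields(settings):
            value = getattr(settings, field.name)
            if value is not None:
                setattr(resolved, field.name, value)
    return resolved
```

For example, `resolve_settings(model=Model.TINY_EN, settings=SketchSettings(model="moonshine/base"))` resolves to the base model, because the settings object is applied last.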
+
+## Running the example
+
+```bash
+git clone https://github.com/ubopod/pipecat-moonshine
+cd pipecat-moonshine
+pip install -e '.[examples]'
+python examples/transcription-moonshine.py
+```
+
+Speak into your default mic; lines like `Transcription: hello world` will be
+printed for each detected utterance.
+
+## Audio format requirements
+
+Moonshine expects **16 kHz, mono, 16-bit signed PCM** input. Pipecat's
+default `LocalAudioTransport` and most WebRTC transports already provide
+this. If your pipeline runs at a different sample rate the service will log
+a warning on the first segment and transcription quality may degrade — add
+a resampler upstream if you need a different rate.
+
+Moonshine also enforces a per-segment duration window: speech segments
+shorter than 0.1 s or longer than 64 s are silently dropped (the service
+logs at `DEBUG` level when this happens).
+
+## Supported models
+
+| Constant | Model name | Params | Notes |
+| ---------------- | ------------------ | ------ | ------------------------------------ |
+| `Model.TINY_EN` | `moonshine/tiny` | 26 M | English-only, MIT-licensed weights. |
+| `Model.BASE_EN` | `moonshine/base` | 58 M | English-only, MIT-licensed weights. |
+
+### Multilingual models — important license note
+
+Moonshine also publishes multilingual checkpoints (Spanish, Japanese,
+Arabic, Korean, Mandarin, Vietnamese, Ukrainian, …). Those weights are
+released under the **Moonshine Community License**, which is *non-commercial*.
+
+For that reason they are intentionally **not** enumerated in the `Model`
+enum. If you want to use one you must:
+
+1. Read and accept the upstream Moonshine Community License.
+2. Pass the model name as a string explicitly, e.g.:
+
+   ```python
+   stt = MoonshineSTTService(model="moonshine/base")
+   # then load the appropriate language checkpoint via your own download flow
+   ```
+
+This package does not bundle, mirror, or auto-download non-commercial
+weights, and the maintainers make no representation that doing so complies
+with your use case.
+
+## Frames
+
+| In | Out |
+| ----------------------------------- | ---------------------------------------------------------- |
+| `VADUserStartedSpeakingFrame` | (no output — buffers audio internally) |
+| `VADUserStoppedSpeakingFrame` | one `TranscriptionFrame` per segment (final), or nothing |
+| Any non-VAD audio | buffered/forwarded according to `SegmentedSTTService` |
+
+Errors during transcription are pushed as `ErrorFrame`s; the pipeline is
+not torn down so other services can continue.
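
Editor's sketch (not part of the package): the frame behaviour in the table above, one final transcription per segment, nothing for an empty result, and an error frame instead of an exception on failure, can be illustrated with an async generator. `SketchTranscription` and `SketchError` are hypothetical stand-ins for Pipecat's `TranscriptionFrame` and `ErrorFrame`, `run_segment` is a hypothetical name, and both the use of `asyncio.to_thread` and the decision to skip whitespace-only text are assumptions about how such a service might be written, not a description of the package's `run_stt`.

```python
import asyncio
from collections.abc import AsyncGenerator, Callable
from dataclasses import dataclass


@dataclass
class SketchTranscription:
    """Stand-in for a final TranscriptionFrame."""

    text: str


@dataclass
class SketchError:
    """Stand-in for an ErrorFrame."""

    error: str


async def run_segment(
    audio: bytes, transcribe: Callable[[bytes], str]
) -> AsyncGenerator[object, None]:
    """Yield at most one frame for one speech segment."""
    try:
        # Run the blocking decoder off the event loop so other tasks keep running.
        text = await asyncio.to_thread(transcribe, audio)
    except Exception as e:
        # Report the failure as a frame instead of raising, so the caller keeps going.
        yield SketchError(error=f"transcription failed: {e}")
        return
    text = text.strip()
    if text:
        yield SketchTranscription(text=text)
```

Because the failure is yielded rather than raised, a consumer iterating with `async for` can log the error frame and move on to the next segment.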
+
+## Metrics
+
+`can_generate_metrics()` returns `True`. Per-segment processing time is
+recorded via `start_processing_metrics` / `stop_processing_metrics`, so
+enabling metrics on your `PipelineTask` will surface Moonshine latency
+alongside the rest of your pipeline.
+
+## Maintainer
+
+Community-maintained. Not affiliated with Moonshine AI or Daily.
+
+## License
+
+BSD-2-Clause — see [LICENSE](./LICENSE). Note that the Moonshine model
+*weights* are governed by their own license (MIT for English models,
+Moonshine Community License for others) — see the section above.
+
+## Versioning and changelog
+
+See [CHANGELOG.md](./CHANGELOG.md). This package follows semantic
+versioning.
+
+## Getting help
+
+- Pipecat Discord: <https://discord.gg/pipecat> (`#community-integrations`)
+- Pipecat changelog (track upstream changes that may affect this integration):
+  <https://github.com/pipecat-ai/pipecat/blob/main/CHANGELOG.md>
+- Issues for this integration: file them in this repo.
@@ -0,0 +1,86 @@
+[build-system]
+requires = ["setuptools>=64"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "pipecat-moonshine"
+version = "0.1.0"
+description = "Moonshine ASR (speech-to-text) community integration for Pipecat"
+readme = "README.md"
+license = { text = "BSD-2-Clause" }
+requires-python = ">=3.11"
+keywords = ["pipecat", "moonshine", "asr", "stt", "speech-to-text", "voice", "ai"]
+authors = [
+    { name = "pipecat-moonshine contributors" },
+]
+classifiers = [
+    "Development Status :: 4 - Beta",
+    "Intended Audience :: Developers",
+    "License :: OSI Approved :: BSD License",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+    "Topic :: Multimedia :: Sound/Audio :: Speech",
+    "Topic :: Scientific/Engineering :: Artificial Intelligence",
+]
+dependencies = [
+    # ``websockets-base`` extra is required because pipecat's
+    # ``stt_service`` module imports ``websockets.protocol.State`` at module
+    # top-level, even for services that don't use websockets themselves.
+    "pipecat-ai[websockets-base]>=1.2.1",
+    "useful-moonshine-onnx>=20251121",
+    "numpy>=1.26.4,<3",
+    "loguru>=0.7.0,<1",
+]
+
+[project.urls]
+Homepage = "https://github.com/ubopod/pipecat-moonshine"
+Source = "https://github.com/ubopod/pipecat-moonshine"
+Issues = "https://github.com/ubopod/pipecat-moonshine/issues"
+Changelog = "https://github.com/ubopod/pipecat-moonshine/blob/main/CHANGELOG.md"
+
+[project.optional-dependencies]
+examples = [
+    "python-dotenv>=1.0.0,<2",
+    "pyaudio~=0.2.14",
+]
+dev = [
+    "ruff>=0.12.11,<1",
+    "pytest>=9.0.0,<10",
+    "pytest-asyncio>=1.0.0,<2",
+]
+
+[tool.setuptools.packages.find]
+where = ["src"]
+
+[tool.setuptools.package-data]
+"pipecat_moonshine" = ["py.typed"]
+
+[tool.ruff]
+exclude = [".git", ".venv"]
+line-length = 100
+
+[tool.ruff.lint]
+select = [
+    "D", # pydocstyle
+    "I", # isort
+    "UP", # pyupgrade
+]
+ignore = [
+    "D105", # missing docstring in magic methods
+]
+
+[tool.ruff.lint.per-file-ignores]
+"examples/**/*.py" = ["D"]
+"tests/**/*.py" = ["D"]
+"**/__init__.py" = ["D104"]
+
+[tool.ruff.lint.pydocstyle]
+convention = "google"
+
+[tool.pytest.ini_options]
+addopts = "--verbose"
+testpaths = ["tests"]
+pythonpath = ["src"]
+asyncio_default_fixture_loop_scope = "function"
@@ -0,0 +1,23 @@
+#
+# Copyright (c) 2026, pipecat-moonshine contributors
+#
+# SPDX-License-Identifier: BSD-2-Clause
+#
+
+"""Moonshine ASR (speech-to-text) integration for Pipecat."""
+
+from pipecat_moonshine.stt import (
+    MOONSHINE_SAMPLE_RATE,
+    Model,
+    MoonshineSTTService,
+    MoonshineSTTSettings,
+    language_to_moonshine_language,
+)
+
+__all__ = [
+    "MOONSHINE_SAMPLE_RATE",
+    "Model",
+    "MoonshineSTTService",
+    "MoonshineSTTSettings",
+    "language_to_moonshine_language",
+]
File without changes
@@ -0,0 +1,287 @@
+#
+# Copyright (c) 2026, pipecat-moonshine contributors
+#
+# SPDX-License-Identifier: BSD-2-Clause
+#
+
+"""Moonshine speech-to-text service for Pipecat.
+
+This module integrates the Moonshine ASR family of models (via the
+``useful-moonshine-onnx`` package) with Pipecat's ``SegmentedSTTService``
+interface, allowing fast, on-device transcription of audio segments produced
+by VAD-driven pipelines.
+"""
+
+import asyncio
+from collections.abc import AsyncGenerator
+from dataclasses import dataclass
+from enum import Enum
+from typing import TYPE_CHECKING
+
+import numpy as np
+from loguru import logger
+from pipecat.frames.frames import ErrorFrame, Frame, TranscriptionFrame
+from pipecat.services.settings import STTSettings, assert_given
+from pipecat.services.stt_service import SegmentedSTTService
+from pipecat.transcriptions.language import Language, resolve_language
+from pipecat.utils.time import time_now_iso8601
+from pipecat.utils.tracing.service_decorators import traced_stt
+
+if TYPE_CHECKING:
+    try:
+        from moonshine_onnx import MoonshineOnnxModel
+    except ModuleNotFoundError as e:
+        logger.error(f"Exception: {e}")
+        logger.error(
+            "In order to use Moonshine STT, you need to `pip install useful-moonshine-onnx`."
+        )
+        raise Exception(f"Missing module: {e}") from e
+
+
+# Moonshine's input contract: 16 kHz, mono, float32 in [-1, 1].
+MOONSHINE_SAMPLE_RATE = 16000
+
+# Per moonshine_onnx.transcribe.assert_audio_size: 0.1s < duration < 64s.
+_MIN_AUDIO_SECONDS = 0.1
+_MAX_AUDIO_SECONDS = 64.0
+
+
+class Model(Enum):
+    """Moonshine model selection options.
+
+    Parameters:
+        TINY_EN: 26 M-parameter English-only model, fastest inference.
+        BASE_EN: 58 M-parameter English-only model, better accuracy.
+
+    Note:
+        Only the English models are enumerated here because their weights are
+        released under the permissive MIT license. Multilingual Moonshine
+        weights (Spanish, Japanese, Arabic, Korean, Mandarin, etc.) are
+        released under the non-commercial **Moonshine Community License** and
+        must be opted into explicitly by passing the model-name string
+        directly. By doing so you accept the upstream license terms.
+    """
+
+    TINY_EN = "moonshine/tiny"
+    BASE_EN = "moonshine/base"
+
+
+def language_to_moonshine_language(language: Language) -> str | None:
+    """Map a pipecat ``Language`` enum to a Moonshine language code.
+
+    The ONNX models bundled with ``useful-moonshine-onnx`` are currently
+    English-only; this helper returns ``"en"`` for any English variant and
+    ``None`` for anything else (logging a warning via ``resolve_language``).
+
+    Args:
+        language: A ``Language`` enum value.
+
+    Returns:
+        The Moonshine language code, or ``None`` if not supported.
+    """
+    return resolve_language(language, {Language.EN: "en"}, use_base_code=True)
+
+
+@dataclass
+class MoonshineSTTSettings(STTSettings):
+    """Runtime-updatable settings for ``MoonshineSTTService``.
+
+    Inherits the standard ``model`` and ``language`` fields from
+    ``STTSettings``. The Moonshine ONNX inference path currently exposes no
+    additional runtime knobs; add new fields here as the upstream API grows.
+    """
+
+
+class MoonshineSTTService(SegmentedSTTService):
+    """Transcribe audio with a locally-downloaded Moonshine ONNX model.
+
+    Moonshine is a family of small, fast ASR models optimized for on-device
+    inference. This service wraps :func:`moonshine_onnx.transcribe` and is
+    designed to plug directly into Pipecat pipelines that perform VAD
+    upstream: each speech segment yielded by VAD is fed to ``run_stt`` as
+    16-bit PCM, converted to float32 in ``[-1, 1]``, and decoded into a
+    single final ``TranscriptionFrame``.
+
+    The first time a given model is requested it is downloaded from
+    ``huggingface.co/UsefulSensors/moonshine``. Subsequent runs read from the
+    local Hugging Face cache.
+
+    Audio format: 16 kHz, mono, 16-bit signed PCM. Pipecat's default
+    audio-in sample rate matches this; other rates will trigger a warning
+    and may degrade accuracy.
+
+    Example::
+
+        from pipecat_moonshine import MoonshineSTTService, Model
+
+        stt = MoonshineSTTService(model=Model.TINY_EN)
+        pipeline = Pipeline([transport.input(), vad_processor, stt, llm, ...])
+    """
+
+    Settings = MoonshineSTTSettings
+    _settings: Settings
+
+    def __init__(
+        self,
+        *,
+        model: str | Model | None = None,
+        settings: Settings | None = None,
+        **kwargs,
+    ):
+        """Initialize the Moonshine STT service.
+
+        Args:
+            model: Convenience shortcut for setting the model. Accepts a
+                ``Model`` enum value or a model-name string (e.g.
+                ``"moonshine/tiny"``). Equivalent to passing
+                ``settings=MoonshineSTTService.Settings(model=...)``.
+            settings: Runtime-updatable settings. When both ``settings`` and
+                ``model`` are provided, fields in ``settings`` win.
+            **kwargs: Additional arguments passed to ``SegmentedSTTService``.
+        """
+        # 1. Hardcoded defaults.
+        default_settings = self.Settings(
+            model=Model.TINY_EN.value,
+            language=Language.EN,
+        )
+
+        # 2. Convenience kwarg overrides.
+        if model is not None:
+            default_settings.model = model.value if isinstance(model, Model) else model
+
+        # 3. Settings delta (canonical API, always wins).
+        if settings is not None:
+            default_settings.apply_update(settings)
+
+        super().__init__(settings=default_settings, **kwargs)
+
+        self._model_obj: MoonshineOnnxModel | None = None
+        self._sample_rate_warned = False
+        self._load()
+
+    def can_generate_metrics(self) -> bool:
+        """Indicate whether this service emits processing metrics.
+
+        Returns:
+            ``True`` — Moonshine STT records per-segment processing time.
+        """
+        return True
+
+    def language_to_service_language(self, language: Language) -> str | None:
+        """Map a pipecat ``Language`` to a Moonshine language code.
+
+        Args:
+            language: The ``Language`` enum value to convert.
+
+        Returns:
+            The Moonshine code, or ``None`` if unsupported.
+        """
+        return language_to_moonshine_language(language)
+
+    def _load(self):
+        """Load the configured Moonshine ONNX model.
+
+        Note:
+            The first invocation for a given model downloads weights from the
+            Hugging Face Hub (``UsefulSensors/moonshine``); subsequent calls
+            reuse the local cache.
+        """
+        try:
+            from moonshine_onnx import MoonshineOnnxModel
+
+            model_name = assert_given(self._settings.model)
+            if model_name is None:
+                raise ValueError("Moonshine model must be specified")
+            logger.debug(f"Loading Moonshine model {model_name}...")
+            self._model_obj = MoonshineOnnxModel(model_name=model_name)
+            logger.debug(f"Loaded Moonshine model {model_name}")
+        except ModuleNotFoundError as e:
+            logger.error(f"Exception: {e}")
+            logger.error(
+                "In order to use Moonshine STT, you need to `pip install useful-moonshine-onnx`."
+            )
+            self._model_obj = None
+
+    @traced_stt
+    async def _handle_transcription(
+        self, transcript: str, is_final: bool, language: Language | None = None
+    ):
+        """Handle a transcription result with tracing.
+
+        Decorated with ``@traced_stt`` so OpenTelemetry spans are emitted
+        when Pipecat tracing is enabled. The body is intentionally empty;
+        the decorator does the work.
+        """
+        pass
+
+    async def run_stt(self, audio: bytes) -> AsyncGenerator[Frame, None]:
+        """Transcribe a single VAD-delimited speech segment.
+
+        Args:
+            audio: Raw 16-bit signed PCM audio bytes for one speech segment.
+
+        Yields:
+            Either a single ``TranscriptionFrame`` containing the decoded
+            text, or an ``ErrorFrame`` on failure. Out-of-range segments
+            (too short or too long for Moonshine) are silently dropped.
+        """
+        if self._model_obj is None:
|
|
229
|
+
yield ErrorFrame("Moonshine model not available")
|
|
230
|
+
return
|
|
231
|
+
|
|
232
|
+
if not self._sample_rate_warned and self.sample_rate != MOONSHINE_SAMPLE_RATE:
|
|
233
|
+
logger.warning(
|
|
234
|
+
f"Moonshine expects {MOONSHINE_SAMPLE_RATE} Hz audio; pipeline is "
|
|
235
|
+
f"providing {self.sample_rate} Hz. Transcription quality may degrade."
|
|
236
|
+
)
|
|
237
|
+
self._sample_rate_warned = True
|
|
238
|
+
|
|
239
|
+
await self.start_processing_metrics()
|
|
240
|
+
|
|
241
|
+
# Signed 16-bit PCM -> float32 in [-1, 1].
|
|
242
|
+
audio_float = np.frombuffer(audio, dtype=np.int16).astype(np.float32) / 32768.0
|
|
243
|
+
|
|
244
|
+
duration = audio_float.size / max(self.sample_rate, 1)
|
|
245
|
+
if duration <= _MIN_AUDIO_SECONDS or duration >= _MAX_AUDIO_SECONDS:
|
|
246
|
+
logger.debug(
|
|
247
|
+
f"Skipping Moonshine transcription: segment duration {duration:.2f}s "
|
|
248
|
+
f"is outside the supported ({_MIN_AUDIO_SECONDS}, {_MAX_AUDIO_SECONDS}) range."
|
|
249
|
+
)
|
|
250
|
+
await self.stop_processing_metrics()
|
|
251
|
+
return
|
|
252
|
+
|
|
253
|
+
try:
|
|
254
|
+
import moonshine_onnx
|
|
255
|
+
|
|
256
|
+
results = await asyncio.to_thread(
|
|
257
|
+
moonshine_onnx.transcribe, audio_float, self._model_obj
|
|
258
|
+
)
|
|
259
|
+
except Exception as e:
|
|
260
|
+
await self.stop_processing_metrics()
|
|
261
|
+
logger.exception(f"{self} Moonshine transcription error")
|
|
262
|
+
yield ErrorFrame(error=f"Moonshine transcription error: {e}")
|
|
263
|
+
return
|
|
264
|
+
|
|
265
|
+
await self.stop_processing_metrics()
|
|
266
|
+
|
|
267
|
+
text = (results[0] if results else "").strip()
|
|
268
|
+
if not text:
|
|
269
|
+
return
|
|
270
|
+
|
|
271
|
+
# STTSettings normalises ``language`` to a service-code string during
|
|
272
|
+
# init, so ``self._settings.language`` is e.g. ``"en"`` rather than
|
|
273
|
+
# ``Language.EN``. Map common codes back to the enum for the frame;
|
|
274
|
+
# anything we don't recognise (e.g. a user-overridden code) falls
|
|
275
|
+
# through as None; the raw code remains available in ``self._settings.language``.
|
|
276
|
+
language_setting = assert_given(self._settings.language)
|
|
277
|
+
language: Language | None
|
|
278
|
+
if isinstance(language_setting, Language):
|
|
279
|
+
language = language_setting
|
|
280
|
+
elif language_setting == "en":
|
|
281
|
+
language = Language.EN
|
|
282
|
+
else:
|
|
283
|
+
language = None
|
|
284
|
+
|
|
285
|
+
await self._handle_transcription(text, True, language)
|
|
286
|
+
logger.debug(f"Transcription: [{text}]")
|
|
287
|
+
yield TranscriptionFrame(text, self._user_id, time_now_iso8601(), language)
|
|
@@ -0,0 +1,204 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: pipecat-moonshine
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: Moonshine ASR (speech-to-text) community integration for Pipecat
|
|
5
|
+
Author: pipecat-moonshine contributors
|
|
6
|
+
License: BSD-2-Clause
|
|
7
|
+
Project-URL: Homepage, https://github.com/ubopod/pipecat-moonshine
|
|
8
|
+
Project-URL: Source, https://github.com/ubopod/pipecat-moonshine
|
|
9
|
+
Project-URL: Issues, https://github.com/ubopod/pipecat-moonshine/issues
|
|
10
|
+
Project-URL: Changelog, https://github.com/ubopod/pipecat-moonshine/blob/main/CHANGELOG.md
|
|
11
|
+
Keywords: pipecat,moonshine,asr,stt,speech-to-text,voice,ai
|
|
12
|
+
Classifier: Development Status :: 4 - Beta
|
|
13
|
+
Classifier: Intended Audience :: Developers
|
|
14
|
+
Classifier: License :: OSI Approved :: BSD License
|
|
15
|
+
Classifier: Programming Language :: Python :: 3
|
|
16
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
17
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
18
|
+
Classifier: Programming Language :: Python :: 3.13
|
|
19
|
+
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
|
|
20
|
+
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
|
|
21
|
+
Requires-Python: >=3.11
|
|
22
|
+
Description-Content-Type: text/markdown
|
|
23
|
+
License-File: LICENSE
|
|
24
|
+
Requires-Dist: pipecat-ai[websockets-base]>=1.2.1
|
|
25
|
+
Requires-Dist: useful-moonshine-onnx>=20251121
|
|
26
|
+
Requires-Dist: numpy<3,>=1.26.4
|
|
27
|
+
Requires-Dist: loguru<1,>=0.7.0
|
|
28
|
+
Provides-Extra: examples
|
|
29
|
+
Requires-Dist: python-dotenv<2,>=1.0.0; extra == "examples"
|
|
30
|
+
Requires-Dist: pyaudio~=0.2.14; extra == "examples"
|
|
31
|
+
Provides-Extra: dev
|
|
32
|
+
Requires-Dist: ruff<1,>=0.12.11; extra == "dev"
|
|
33
|
+
Requires-Dist: pytest<10,>=9.0.0; extra == "dev"
|
|
34
|
+
Requires-Dist: pytest-asyncio<2,>=1.0.0; extra == "dev"
|
|
35
|
+
Dynamic: license-file
|
|
36
|
+
|
|
37
|
+
# pipecat-moonshine
|
|
38
|
+
|
|
39
|
+
[Moonshine ASR](https://github.com/moonshine-ai/moonshine) speech-to-text
|
|
40
|
+
integration for [Pipecat](https://github.com/pipecat-ai/pipecat).
|
|
41
|
+
|
|
42
|
+
Moonshine is a family of small, fast automatic-speech-recognition models
|
|
43
|
+
optimized for resource-constrained devices. The Tiny English model is roughly
|
|
44
|
+
26 M parameters, the Base English model roughly 58 M — both run on CPU via
|
|
45
|
+
ONNX Runtime with no GPU required. That makes Moonshine an attractive choice
|
|
46
|
+
for low-latency, on-device transcription in Pipecat pipelines that already
|
|
47
|
+
handle VAD upstream.
|
|
48
|
+
|
|
49
|
+
This package provides `MoonshineSTTService`, a `SegmentedSTTService`
|
|
50
|
+
subclass that plugs straight into any Pipecat pipeline.
|
|
51
|
+
|
|
52
|
+
## Status
|
|
53
|
+
|
|
54
|
+
Community-maintained integration. See
|
|
55
|
+
[Pipecat's Community Integrations guide](https://github.com/pipecat-ai/pipecat/blob/main/COMMUNITY_INTEGRATIONS.md)
|
|
56
|
+
for what that means — in short, the Pipecat team does not maintain or
|
|
57
|
+
support this package; please file issues here.
|
|
58
|
+
|
|
59
|
+
**Tested with Pipecat v1.2.1.**
|
|
60
|
+
|
|
61
|
+
## Installation
|
|
62
|
+
|
|
63
|
+
```bash
|
|
64
|
+
pip install pipecat-moonshine
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
This pulls in [`useful-moonshine-onnx`](https://pypi.org/project/useful-moonshine-onnx/)
|
|
68
|
+
and `pipecat-ai` automatically. The first time you instantiate the service,
|
|
69
|
+
the chosen model weights are downloaded from Hugging Face
|
|
70
|
+
(`UsefulSensors/moonshine`) and cached locally.
|
|
71
|
+
|
|
72
|
+
For the included foundational example you also need the local-audio extras:
|
|
73
|
+
|
|
74
|
+
```bash
|
|
75
|
+
pip install 'pipecat-moonshine[examples]'
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
## Usage
|
|
79
|
+
|
|
80
|
+
```python
|
|
81
|
+
from pipecat.pipeline.pipeline import Pipeline
|
|
82
|
+
from pipecat_moonshine import MoonshineSTTService, Model
|
|
83
|
+
|
|
84
|
+
stt = MoonshineSTTService(model=Model.TINY_EN)
|
|
85
|
+
|
|
86
|
+
pipeline = Pipeline([
|
|
87
|
+
transport.input(),
|
|
88
|
+
vad_processor, # MUST run upstream of the STT — see below
|
|
89
|
+
stt,
|
|
90
|
+
# ... downstream processors
|
|
91
|
+
])
|
|
92
|
+
```
|
|
93
|
+
|
|
94
|
+
`MoonshineSTTService` subclasses `SegmentedSTTService`, so a VAD-driven
|
|
95
|
+
processor (e.g. `VADProcessor` with `SileroVADAnalyzer`) must produce
|
|
96
|
+
`VADUserStartedSpeakingFrame` / `VADUserStoppedSpeakingFrame` upstream of
|
|
97
|
+
it. Each detected speech segment is decoded into a single final
|
|
98
|
+
`TranscriptionFrame` — Moonshine does not emit interim results.
|
|
99
|
+
|
|
100
|
+
### Configuring the model at runtime
|
|
101
|
+
|
|
102
|
+
Pass an explicit model or a fully-built settings object:
|
|
103
|
+
|
|
104
|
+
```python
|
|
105
|
+
# Convenience kwarg
|
|
106
|
+
stt = MoonshineSTTService(model=Model.BASE_EN)
|
|
107
|
+
|
|
108
|
+
# Or via Settings (e.g. when you want to update at runtime)
|
|
109
|
+
stt = MoonshineSTTService(
|
|
110
|
+
settings=MoonshineSTTService.Settings(model="moonshine/base"),
|
|
111
|
+
)
|
|
112
|
+
```
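
The constructor resolves its configuration in three layers: hardcoded defaults, then the `model` convenience kwarg, then any fields set on `settings`, which always win. The following is a minimal standalone sketch of that precedence. The `Settings` dataclass and the `resolve_settings` helper are hypothetical stand-ins written for illustration; they are not the package's or Pipecat's classes, and the real merge is done by the service itself.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Optional, Union


class Model(Enum):
    # Mirrors the two English model names listed in this README.
    TINY_EN = "moonshine/tiny"
    BASE_EN = "moonshine/base"


@dataclass
class Settings:
    # Hypothetical stand-in: None means "field not given".
    model: Optional[str] = None
    language: Optional[str] = None


def resolve_settings(
    model: Union[Model, str, None] = None,
    settings: Optional[Settings] = None,
) -> Settings:
    # 1. Hardcoded defaults.
    resolved = Settings(model=Model.TINY_EN.value, language="en")
    # 2. Convenience kwarg override (enum value or plain string).
    if model is not None:
        resolved.model = model.value if isinstance(model, Model) else model
    # 3. Settings delta: only fields that were actually given are applied.
    if settings is not None:
        for f in fields(settings):
            value = getattr(settings, f.name)
            if value is not None:
                setattr(resolved, f.name, value)
    return resolved
```

With this sketch, `resolve_settings(model="moonshine/tiny", settings=Settings(model="moonshine/base"))` resolves to `moonshine/base`, because the settings layer is applied last.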
|
|
113
|
+
|
|
114
|
+
## Running the example
|
|
115
|
+
|
|
116
|
+
```bash
|
|
117
|
+
git clone https://github.com/ubopod/pipecat-moonshine
|
|
118
|
+
cd pipecat-moonshine
|
|
119
|
+
pip install -e '.[examples]'
|
|
120
|
+
python examples/transcription-moonshine.py
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
Speak into your default mic; lines like `Transcription: hello world` will be
|
|
124
|
+
printed for each detected utterance.
|
|
125
|
+
|
|
126
|
+
## Audio format requirements
|
|
127
|
+
|
|
128
|
+
Moonshine expects **16 kHz, mono, 16-bit signed PCM** input. Pipecat's
|
|
129
|
+
default `LocalAudioTransport` and most WebRTC transports already provide
|
|
130
|
+
this. If your pipeline runs at a different sample rate the service will log
|
|
131
|
+
a warning on the first segment and transcription quality may degrade — add
|
|
132
|
+
a resampler upstream if your source audio runs at a different rate.
|
|
133
|
+
|
|
134
|
+
Moonshine also enforces a per-segment duration window: speech segments
|
|
135
|
+
of 0.1 s or less, or 64 s or more, are silently dropped (the service
|
|
136
|
+
logs at `DEBUG` level when this happens).
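
To make the format and the duration window concrete, here is a small dependency-free sketch of the two checks. It assumes little-endian samples and takes the 0.1 s and 64 s bounds from the text above. The helper names `pcm16_to_float` and `should_transcribe` are hypothetical, and the service itself performs the conversion with NumPy rather than `struct`.

```python
import struct

SAMPLE_RATE = 16_000  # Hz, the rate stated above
MIN_SECONDS = 0.1     # assumed lower bound, taken from the text above
MAX_SECONDS = 64.0    # assumed upper bound, taken from the text above


def pcm16_to_float(audio: bytes) -> list[float]:
    """Decode little-endian signed 16-bit mono PCM into floats in [-1.0, 1.0)."""
    count = len(audio) // 2  # two bytes per sample; a trailing odd byte is ignored
    samples = struct.unpack(f"<{count}h", audio[: count * 2])
    return [s / 32768.0 for s in samples]


def should_transcribe(audio: bytes, sample_rate: int = SAMPLE_RATE) -> bool:
    """Return True when the segment duration lies strictly inside the window."""
    duration = (len(audio) // 2) / max(sample_rate, 1)
    return MIN_SECONDS < duration < MAX_SECONDS
```

A one-second segment at 16 kHz (32,000 bytes) passes the check, while a segment of exactly 0.1 s (3,200 bytes) does not, because the bounds are exclusive.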
|
|
137
|
+
|
|
138
|
+
## Supported models
|
|
139
|
+
|
|
140
|
+
| Constant | Model name | Params | Notes |
|
|
141
|
+
| ---------------- | ------------------ | ------ | ------------------------------------ |
|
|
142
|
+
| `Model.TINY_EN` | `moonshine/tiny` | 26 M | English-only, MIT-licensed weights. |
|
|
143
|
+
| `Model.BASE_EN` | `moonshine/base` | 58 M | English-only, MIT-licensed weights. |
|
|
144
|
+
|
|
145
|
+
### Multilingual models — important license note
|
|
146
|
+
|
|
147
|
+
Moonshine also publishes multilingual checkpoints (Spanish, Japanese,
|
|
148
|
+
Arabic, Korean, Mandarin, Vietnamese, Ukrainian, …). Those weights are
|
|
149
|
+
released under the **Moonshine Community License**, which is *non-commercial*.
|
|
150
|
+
|
|
151
|
+
For that reason they are intentionally **not** enumerated in the `Model`
|
|
152
|
+
enum. If you want to use one you must:
|
|
153
|
+
|
|
154
|
+
1. Read and accept the upstream Moonshine Community License.
|
|
155
|
+
2. Pass the model name as a string explicitly, e.g.:
|
|
156
|
+
|
|
157
|
+
```python
|
|
158
|
+
stt = MoonshineSTTService(model="moonshine/base")
|
|
159
|
+
# then load the appropriate language checkpoint via your own download flow
|
|
160
|
+
```
|
|
161
|
+
|
|
162
|
+
This package does not bundle, mirror, or auto-download non-commercial
|
|
163
|
+
weights, and the maintainers make no representation that doing so complies
|
|
164
|
+
with your use case.
|
|
165
|
+
|
|
166
|
+
## Frames
|
|
167
|
+
|
|
168
|
+
| In | Out |
|
|
169
|
+
| ----------------------------------- | ---------------------------------------------------------- |
|
|
170
|
+
| `VADUserStartedSpeakingFrame` | (no output — buffers audio internally) |
|
|
171
|
+
| `VADUserStoppedSpeakingFrame` | one `TranscriptionFrame` per segment (final), or nothing |
|
|
172
|
+
| Any non-VAD audio | buffered/forwarded according to `SegmentedSTTService` |
|
|
173
|
+
|
|
174
|
+
Errors during transcription are pushed as `ErrorFrame`s; the pipeline is
|
|
175
|
+
not torn down so other services can continue.
|
|
176
|
+
|
|
177
|
+
## Metrics
|
|
178
|
+
|
|
179
|
+
`can_generate_metrics()` returns `True`. Per-segment processing time is
|
|
180
|
+
recorded via `start_processing_metrics` / `stop_processing_metrics`, so
|
|
181
|
+
enabling metrics on your `PipelineTask` will surface Moonshine latency
|
|
182
|
+
alongside the rest of your pipeline.
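
Every `start_processing_metrics` call needs a matching `stop_processing_metrics` call on each exit path, including the error path. One way to express that pairing is an async context manager, sketched below. `TimedService`, `processing_metrics`, and `transcribe_segment` are hypothetical names used only for illustration; the service in this package calls the two methods explicitly instead.

```python
import asyncio
import time
from contextlib import asynccontextmanager


class TimedService:
    """Hypothetical stand-in that records one elapsed time per segment."""

    def __init__(self) -> None:
        self.durations: list[float] = []
        self._started_at: float | None = None

    async def start_processing_metrics(self) -> None:
        self._started_at = time.perf_counter()

    async def stop_processing_metrics(self) -> None:
        if self._started_at is not None:
            self.durations.append(time.perf_counter() - self._started_at)
            self._started_at = None


@asynccontextmanager
async def processing_metrics(service: TimedService):
    # Pair every start with exactly one stop, even if the body raises.
    await service.start_processing_metrics()
    try:
        yield
    finally:
        await service.stop_processing_metrics()


async def transcribe_segment(service: TimedService, fail: bool = False) -> str:
    async with processing_metrics(service):
        await asyncio.sleep(0)  # placeholder for the real transcription call
        if fail:
            raise RuntimeError("simulated transcription error")
        return "hello world"
```

In this sketch a failing segment still contributes one duration, so the recorded latency covers both successful and failed transcriptions.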
|
|
183
|
+
|
|
184
|
+
## Maintainer
|
|
185
|
+
|
|
186
|
+
Community-maintained. Not affiliated with Moonshine AI or Daily.
|
|
187
|
+
|
|
188
|
+
## License
|
|
189
|
+
|
|
190
|
+
BSD-2-Clause — see [LICENSE](./LICENSE). Note that the Moonshine model
|
|
191
|
+
*weights* are governed by their own license (MIT for English models,
|
|
192
|
+
Moonshine Community License for others) — see the section above.
|
|
193
|
+
|
|
194
|
+
## Versioning and changelog
|
|
195
|
+
|
|
196
|
+
See [CHANGELOG.md](./CHANGELOG.md). This package follows semantic
|
|
197
|
+
versioning.
|
|
198
|
+
|
|
199
|
+
## Getting help
|
|
200
|
+
|
|
201
|
+
- Pipecat Discord: <https://discord.gg/pipecat> (`#community-integrations`)
|
|
202
|
+
- Pipecat changelog (track upstream changes that may affect this integration):
|
|
203
|
+
<https://github.com/pipecat-ai/pipecat/blob/main/CHANGELOG.md>
|
|
204
|
+
- Issues for this integration: file them in this repo.
|
|
@@ -0,0 +1,11 @@
|
|
|
1
|
+
LICENSE
|
|
2
|
+
README.md
|
|
3
|
+
pyproject.toml
|
|
4
|
+
src/pipecat_moonshine/__init__.py
|
|
5
|
+
src/pipecat_moonshine/py.typed
|
|
6
|
+
src/pipecat_moonshine/stt.py
|
|
7
|
+
src/pipecat_moonshine.egg-info/PKG-INFO
|
|
8
|
+
src/pipecat_moonshine.egg-info/SOURCES.txt
|
|
9
|
+
src/pipecat_moonshine.egg-info/dependency_links.txt
|
|
10
|
+
src/pipecat_moonshine.egg-info/requires.txt
|
|
11
|
+
src/pipecat_moonshine.egg-info/top_level.txt
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
pipecat_moonshine
|