@codexstar/pi-listen 1.0.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +283 -0
- package/daemon.py +517 -0
- package/docs/API.md +273 -0
- package/docs/ARCHITECTURE.md +114 -0
- package/docs/backends.md +196 -0
- package/docs/plans/2026-03-12-pi-voice-master-plan.md +613 -0
- package/docs/plans/2026-03-12-pi-voice-model-aware-execution-plan.md +256 -0
- package/docs/plans/2026-03-12-pi-voice-onboarding-remediation-plan.md +391 -0
- package/docs/plans/pi-voice-model-aware-review.md +196 -0
- package/docs/plans/pi-voice-model-detection-qa-plan.md +226 -0
- package/docs/plans/pi-voice-model-detection-research.md +483 -0
- package/docs/plans/pi-voice-onboarding-ux-plan.md +388 -0
- package/docs/plans/pi-voice-release-validation-plan.md +386 -0
- package/docs/plans/pi-voice-remaining-implementation-plan.md +524 -0
- package/docs/plans/pi-voice-review-findings.md +227 -0
- package/docs/plans/pi-voice-technical-remediation-plan.md +613 -0
- package/docs/qa-matrix.md +69 -0
- package/docs/qa-results.md +357 -0
- package/docs/troubleshooting.md +265 -0
- package/extensions/voice/config.ts +206 -0
- package/extensions/voice/diagnostics.ts +212 -0
- package/extensions/voice/install.ts +62 -0
- package/extensions/voice/onboarding.ts +315 -0
- package/extensions/voice.ts +1149 -0
- package/package.json +48 -0
- package/scripts/setup-macos.sh +374 -0
- package/scripts/setup-windows.ps1 +271 -0
- package/transcribe.py +497 -0
package/docs/API.md
ADDED

@@ -0,0 +1,273 @@

# pi-listen — API Documentation

## Extension API

pi-listen registers as a Pi extension via the `pi` field in `package.json`:

```json
{
  "pi": {
    "extensions": ["./extensions/voice.ts"]
  }
}
```

The extension exports a default function that receives the Pi `ExtensionAPI`.

---

## Daemon Protocol

The STT daemon communicates via Unix domain sockets using newline-delimited JSON.

### Socket Location

The socket path is derived from a hash of the config:

```
/tmp/pi-voice-<sha1_12chars>.sock
```
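A minimal sketch of that derivation, assuming the hash is the SHA-1 of the canonical JSON form of the config truncated to 12 hex characters (the exact serialization `daemon.py` uses may differ):

```python
import hashlib
import json

def socket_path(config: dict) -> str:
    # Hash a stable serialization of the config; keep the first 12 hex chars.
    digest = hashlib.sha1(json.dumps(config, sort_keys=True).encode("utf-8")).hexdigest()
    return f"/tmp/pi-voice-{digest[:12]}.sock"
```

The same config always maps to the same socket, so separate configs get separate daemons.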
### Commands

#### `ping`

Check if daemon is alive.

**Request:**
```json
{"cmd": "ping"}
```

**Response:**
```json
{"status": "ok", "pid": 12345}
```

---
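Any of the commands below can be exercised with a short client. This is an illustrative sketch, not the client that `voice.ts` ships, assuming one JSON object per line in each direction as described above:

```python
import json
import socket

def send_command(sock_path: str, request: dict, timeout: float = 10.0) -> dict:
    """Send one newline-delimited JSON request and read one JSON reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect(sock_path)
        s.sendall((json.dumps(request) + "\n").encode("utf-8"))
        buf = b""
        while not buf.endswith(b"\n"):
            chunk = s.recv(4096)
            if not chunk:  # daemon closed the connection
                break
            buf += chunk
    return json.loads(buf)
```

For example, `send_command(path, {"cmd": "ping"})` returns the ping response shown above when the daemon is up.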
#### `status`

Get daemon status and loaded model info.

**Request:**
```json
{"cmd": "status"}
```

**Response:**
```json
{
  "status": "running",
  "pid": 12345,
  "uptime": 120.5,
  "requests": 42,
  "idle": 5.2,
  "backend": "faster-whisper",
  "model": "small",
  "model_loaded": true
}
```

---

#### `load`

Load or switch to a specific backend and model.

**Request:**
```json
{"cmd": "load", "backend": "faster-whisper", "model": "small"}
```

**Response:**
```json
{"status": "loaded", "backend": "faster-whisper", "model": "small", "load_time": 2.34}
```

Or, if already loaded:
```json
{"status": "already_loaded", "backend": "faster-whisper", "model": "small"}
```

---

#### `transcribe`

Transcribe an audio file.

**Request:**
```json
{
  "cmd": "transcribe",
  "audio": "/path/to/file.wav",
  "language": "en",
  "vad": true
}
```

**Response:**
```json
{
  "text": "hello world",
  "duration": 1.23,
  "backend": "faster-whisper",
  "model": "small",
  "language": "en"
}
```

With VAD skip (no speech detected):
```json
{
  "text": "",
  "duration": 0,
  "vad": {"has_speech": false, "vad_available": true, "segments": 0},
  "skipped": true
}
```

---

#### `vad`

Run Voice Activity Detection only (no transcription).

**Request:**
```json
{"cmd": "vad", "audio": "/path/to/file.wav"}
```

**Response:**
```json
{
  "has_speech": true,
  "vad_available": true,
  "segments": 3,
  "speech_duration_ms": 4200
}
```

---

#### `backends`

List all available backends and their status.

**Request:**
```json
{"cmd": "backends"}
```

**Response:**
```json
{
  "backends": [
    {
      "name": "faster-whisper",
      "available": true,
      "type": "local",
      "default_model": "small",
      "models": ["tiny", "base", "small", "medium", "large-v3"]
    }
  ]
}
```

---

#### `shutdown`

Gracefully stop the daemon.

**Request:**
```json
{"cmd": "shutdown"}
```

**Response:**
```json
{"status": "shutting_down"}
```

---

## CLI Usage

### transcribe.py

```bash
# Transcribe with auto-detected backend
python3 transcribe.py audio.wav

# Specify backend and model
python3 transcribe.py --backend faster-whisper --model small audio.wav

# Specify language
python3 transcribe.py --language auto audio.wav

# List available backends
python3 transcribe.py --list-backends

# List models for a backend
python3 transcribe.py --list-models --backend faster-whisper
```

### daemon.py

```bash
# Start daemon
python3 daemon.py start --backend faster-whisper --model small

# Start with custom socket path
python3 daemon.py start --socket /tmp/my-daemon.sock

# Check status
python3 daemon.py status

# Transcribe via daemon
python3 daemon.py transcribe audio.wav --language en --vad

# Load a different model
python3 daemon.py load --backend moonshine --model moonshine/base

# Stop daemon
python3 daemon.py stop

# Ping daemon
python3 daemon.py ping
```

---
## Error Responses

All errors follow this format:

```json
{"error": "Human-readable error message"}
```

Common errors:

| Error | Cause |
|-------|-------|
| `"Audio file not found: /path"` | Audio file doesn't exist |
| `"No model loaded. Send 'load' first."` | Transcribe called before loading a model |
| `"No STT backend found"` | No backend is installed or available |
| `"Unknown command: foo"` | Invalid command sent to daemon |
| `"Message exceeds maximum size"` | Request larger than 1 MB |
| `"Daemon not running"` | Socket connection failed |
| `"Daemon timeout"` | Daemon didn't respond within timeout |

---
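Because every failure uses the same `{"error": ...}` shape, a client can normalize error handling with one small helper. A sketch (the exception class name is illustrative, not part of the package):

```python
class DaemonError(RuntimeError):
    """Raised when the daemon replies with its uniform error shape."""

def check(response: dict) -> dict:
    # The daemon signals all failures as {"error": "<message>"}.
    if "error" in response:
        raise DaemonError(response["error"])
    return response
```

Successful responses pass through unchanged, so `check(...)` can wrap any command's reply.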
## Audio Requirements

| Property | Value |
|----------|-------|
| Format | WAV |
| Sample rate | 16000 Hz |
| Channels | 1 (mono) |
| Bit depth | 16-bit |

Audio is recorded via SoX's `rec` command with these parameters automatically.
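The table maps directly onto SoX flags. A hypothetical sketch of the argument list (`voice.ts` may pass additional flags, e.g. for silence handling):

```python
def rec_args(wav_path: str) -> list[str]:
    # SoX `rec` flags matching the table above:
    # -q quiet, -r sample rate (Hz), -c channels, -b bits per sample.
    return ["rec", "-q", "-r", "16000", "-c", "1", "-b", "16", wav_path]
```

The list can be handed to a process spawner (e.g. `subprocess.Popen`) to start recording into `wav_path`.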
package/docs/ARCHITECTURE.md
ADDED

@@ -0,0 +1,114 @@

# pi-listen Architecture

## System Overview

```
┌──────────────────────────────────────────────────────────────────────────┐
│                             Pi CLI (Node.js)                             │
│                                                                          │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │                         extensions/voice.ts                          │ │
│ │                                                                      │ │
│ │  ┌───────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐     │ │
│ │  │ Recording │  │  Daemon  │  │   BTW    │  │   Hold-to-Talk   │     │ │
│ │  │ (SoX rec) │  │   IPC    │  │ Threads  │  │  (Kitty proto)   │     │ │
│ │  └─────┬─────┘  └────┬─────┘  └────┬─────┘  └────────┬─────────┘     │ │
│ │        │             │             │                 │               │ │
│ └────────┼─────────────┼─────────────┼─────────────────┼───────────────┘ │
│          │             │             │                 │                 │
│ ┌────────┼─────────────┼─────────────┼─────────────────┼───────────────┐ │
│ │        │   voice/config.ts       voice/diagnostics.ts                │ │
│ │        │   voice/onboarding.ts   voice/install.ts                    │ │
│ └────────┼─────────────┼────────────────────────────────────────────────┘ │
│          │             │                                                 │
└──────────┼─────────────┼─────────────────────────────────────────────────┘
           │             │  Unix Domain Socket
           │             │  (newline-delimited JSON)
           │             ▼
┌──────────┼───────────────────────────────────────────────────────────────┐
│          │              daemon.py (Python, persistent)                   │
│          │                                                               │
│ ┌────────▼────────┐  ┌─────────────────┐  ┌──────────────────────────┐   │
│ │   Audio Files   │  │   ModelCache    │  │       DaemonServer       │   │
│ │  (WAV, /tmp)    │  │  (thread-safe)  │  │  (socket, idle timeout)  │   │
│ └─────────────────┘  └────────┬────────┘  └──────────────────────────┘   │
│                               │                                          │
│                      ┌────────▼────────┐                                 │
│                      │  transcribe.py  │                                 │
│                      │  (5 backends)   │                                 │
│                      └────────┬────────┘                                 │
│                               │                                          │
│             ┌─────────────────┼─────────────────────┐                    │
│             ▼                 ▼                     ▼                    │
│  ┌──────────────────┐  ┌──────────────┐    ┌──────────────────┐          │
│  │  faster-whisper  │  │ whisper-cpp  │    │     Deepgram     │          │
│  │  moonshine       │  │              │    │   (cloud API)    │          │
│  │  parakeet        │  │              │    │                  │          │
│  └──────────────────┘  └──────────────┘    └──────────────────┘          │
└──────────────────────────────────────────────────────────────────────────┘
```

## Data Flow

### Voice Input Flow

```
1. User holds SPACE (or Ctrl+Shift+V)
2. voice.ts spawns `rec` (SoX) → writes WAV to /tmp
3. User releases key
4. voice.ts kills `rec`
5. voice.ts sends {"cmd": "transcribe", "audio": "/tmp/pi-voice-XXX.wav"} to daemon
6. daemon.py runs transcription via loaded model
7. daemon.py returns {"text": "...", "duration": 1.23}
8. voice.ts injects text into Pi editor
9. voice.ts deletes temp WAV file
```

### BTW Flow

```
1. User types /btw <message> (or holds Ctrl+Shift+B for voice)
2. voice.ts builds context from system prompt + prior BTW thread
3. voice.ts streams LLM response via pi-ai streamSimple()
4. Response displayed in BTW widget above editor
5. User can /btw:inject to push thread into main agent context
```

### Daemon Lifecycle

```
1. Extension starts → ensureDaemon() checks if daemon is running
2. If not running → spawns python3 daemon.py start --socket <path>
3. Daemon pre-loads configured backend + model
4. Daemon serves requests on Unix socket
5. After 5 minutes of inactivity → auto-shutdown
6. On session_shutdown → extension does NOT kill daemon (persists for reuse)
```

## Thread Safety

- **ModelCache**: Protected by `threading.Lock()` — only one transcription runs at a time
- **DaemonServer**: Each client connection handled in a separate daemon thread
- **voice.ts state**: Single-threaded Node.js event loop — no mutex needed, but async races are possible (documented in bug-hunter audit)

## Configuration Resolution

```
1. Check project settings: <cwd>/.pi/settings.json → voice key
2. Check global settings: ~/.pi/agent/settings.json → voice key
3. Fall back to DEFAULT_CONFIG (auto backend, small model, enabled)
```

Project settings override global settings completely (no merging).
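A sketch of this resolution order in Python, assuming each settings file is plain JSON with a top-level `voice` key (the actual logic lives in `extensions/voice/config.ts`, and the real `DEFAULT_CONFIG` may carry more keys):

```python
import json
import os

# Fallback values from step 3 above (illustrative).
DEFAULT_CONFIG = {"backend": "auto", "model": "small", "enabled": True}

def resolve_voice_config(cwd: str, home: str) -> dict:
    # Project settings win outright over global settings; scopes are not merged.
    candidates = [
        os.path.join(cwd, ".pi", "settings.json"),
        os.path.join(home, ".pi", "agent", "settings.json"),
    ]
    for path in candidates:
        try:
            with open(path) as f:
                voice = json.load(f).get("voice")
        except (OSError, json.JSONDecodeError):
            continue  # missing or malformed file: fall through to next scope
        if voice is not None:
            return voice
    return DEFAULT_CONFIG
```

Note the "completely overrides" semantics: once a `voice` key is found in a scope, lower scopes are not consulted at all.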
## Security Boundaries

| Boundary | Protection |
|----------|------------|
| Unix socket | Local-only (no TCP), file permissions |
| Message size | MAX_MSG_SIZE = 1 MB |
| Audio files | Validated with os.path.exists() |
| Backend names | Validated against BACKENDS registry |
| Error responses | No stack traces (logged to stderr only) |
| Idle timeout | Daemon auto-shuts down after 5 minutes |
| Temp files | Deleted immediately after transcription |
package/docs/backends.md
ADDED

@@ -0,0 +1,196 @@

# pi-voice backend guide

`pi-voice` supports both **local** and **cloud** speech-to-text (STT) backends.

Today, the extension exposes backend selection through `/voice setup` and reads settings from:

- global: `~/.pi/agent/settings.json`
- project: `.pi/settings.json`

Project settings override global settings when both are present.

## Quick recommendation

If you want a conservative default:

- choose **faster-whisper** for a strong local default on macOS and general desktop use
- choose **Deepgram** if you want the fastest route to a working setup and you already have an API key

If onboarding tells you a model is already installed, prefer that ready-now path unless you have a strong reason to switch.

## Backend comparison

| Backend | Mode | Best for | Tradeoffs | Typical install path | Model detection confidence |
|---|---|---|---|---|---|
| `faster-whisper` | Local | Best overall local default, good quality/speed balance | Python dependency, model download time, CPU usage | `pip install faster-whisper` | High |
| `moonshine` | Local | Lower-resource local experiments | Smaller ecosystem, fewer model choices | `pip install useful-moonshine[onnx]` | Medium / heuristic |
| `whisper-cpp` | Local | CLI-oriented local setups, users already invested in whisper.cpp | Model file management is more manual | `brew install whisper-cpp` | Very high |
| `deepgram` | Cloud | Fastest setup if API key already exists | Audio leaves the machine, network required, API billing | set `DEEPGRAM_API_KEY` | API-ready vs missing key |
| `parakeet` | Local | NVIDIA/NeMo-oriented experimentation | Heavy dependency footprint, slower setup | `pip install nemo_toolkit[asr]` | Medium-low / heuristic |

## Backend details

### faster-whisper

**Use this when:**
- you want a high-confidence local default
- privacy/offline use matters
- you are okay with Python-based dependencies

**Pros**
- mature Whisper-family local option
- strong model selection range
- fits the current `pi-voice` architecture well

**Cons**
- first install can take longer than cloud setup
- larger models increase local CPU and startup costs

**Suggested starting models**
- `small` — good conservative starting point
- `small.en` — useful for English-only setups
- `medium` — move here if you want more accuracy and can accept higher cost

### moonshine

**Use this when:**
- you want a lighter local option
- you are willing to trade maturity and flexibility for smaller setup/runtime needs

**Pros**
- lightweight local direction
- fewer model decisions

**Cons**
- narrower model ecosystem
- less common than Whisper-family defaults

**Suggested starting models**
- `moonshine/base` — safer default
- `moonshine/tiny` — lowest-cost experiment path

### whisper-cpp

**Use this when:**
- you already use whisper.cpp
- you prefer CLI-driven local tooling
- you want to manage model files more explicitly

**Pros**
- straightforward local CLI model
- familiar to users already in the whisper.cpp ecosystem

**Cons**
- model file placement matters
- setup can feel more manual than Python-backed alternatives

**Suggested starting models**
- `small` — practical starting point
- `small.en` — good English-only choice

### deepgram

**Use this when:**
- you want the fastest route to a working setup
- you are comfortable with cloud transcription
- you already have or plan to add `DEEPGRAM_API_KEY`

**Pros**
- minimal local dependency burden
- fast setup when credentials are ready
- good fit for users optimizing for time-to-value

**Cons**
- requires network connectivity
- audio is sent to a cloud provider
- subject to account limits and billing

**Suggested starting model**
- `nova-3` — the current default in `transcribe.py`

### parakeet

**Use this when:**
- you specifically want to experiment with NVIDIA NeMo/Parakeet
- you are comfortable with a heavier Python stack

**Pros**
- interesting specialized option
- useful for experimentation in the NeMo ecosystem

**Cons**
- heavier setup than the other paths
- not the best default for a first-time user

## Choosing local vs cloud

### Choose local when
- privacy matters more than setup speed
- you want offline use
- you are comfortable installing dependencies
- you want per-project control without relying on network services

### Choose cloud when
- you want the fastest setup path
- you are okay with a provider API key
- you want to avoid local model/dependency installation
- you are comfortable with network and billing constraints

## Model selection guidance

The current package exposes raw backend model names plus model-aware status where it can detect it.

A conservative way to choose:

- prefer **already installed** models first when they meet your needs
- start small: `small`, `small.en`, or backend defaults
- only move up if the current model is not accurate enough
- prefer English-only variants when your use case is English-only and the backend provides them
- avoid jumping to the biggest model first unless you already know your machine can handle it

## How to read model status

During onboarding and in some diagnostics output, local models can show up as:

- **installed** — the model appears to already exist locally and should not need a fresh download
- **recommended, installed** — same as above, and also the preferred choice for the current path
- **download required** — backend is available, but the selected model does not appear to be present
- **unknown** — backend is present, but model presence could not be confirmed confidently
- **api** — cloud path; no local model download applies

For `unknown`, do not assume the system is broken. It means `pi-voice` is being conservative and avoiding a false claim.

## Configuration scope

`pi-voice` now supports both:

- **Global scope** — applies across projects
- **Project scope** — applies only in the current repository

Use **project scope** when:
- one repo needs a different voice workflow
- you want a safer rollout before using the same config everywhere
- you need local STT in one project but cloud STT in another
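As an illustration, a project-scoped override might look like this in `.pi/settings.json`. The exact keys under `voice` are an assumption based on the defaults described above, not a verified schema:

```json
{
  "voice": {
    "backend": "faster-whisper",
    "model": "small.en"
  }
}
```

The same shape under the `voice` key in `~/.pi/agent/settings.json` would apply globally.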
## Inspect current backend state

Useful commands:

- `/voice info` — show current effective config and selected model status
- `/voice backends` — list available backends, installed models, and install detection hints
- `/voice setup` — choose backend and model interactively
- `/voice test` — do a basic voice environment check plus current model readiness
- `/voice doctor` — show repair steps for the current setup alongside a recommended alternative
- `/voice daemon status` — inspect the warm daemon state

## Current limitations

Be conservative about expectations:

- backend installation is still partly toolchain-dependent
- some setup steps may still require manual `brew` or `pip` work
- not every backend has equally strong model-detection confidence
- `unknown` model status is intentional when the system cannot verify a local cache safely
- a backend being available does **not** always mean every model is already ready locally

If you hit issues, see [`docs/troubleshooting.md`](./troubleshooting.md).