@codexstar/pi-listen 1.0.4

package/docs/API.md ADDED
@@ -0,0 +1,273 @@
# pi-listen — API Documentation

## Extension API

pi-listen registers as a Pi extension via the `pi` field in `package.json`:

```json
{
  "pi": {
    "extensions": ["./extensions/voice.ts"]
  }
}
```

The extension exports a default function that receives the Pi `ExtensionAPI`.

---

## Daemon Protocol

The STT daemon communicates over a Unix domain socket using newline-delimited JSON: each request and each response is a single JSON object terminated by a newline.

### Socket Location

The socket path is derived from a hash of the active config:

```
/tmp/pi-voice-<sha1_12chars>.sock
```
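
The derivation can be sketched in Python. Which config fields feed the hash is an assumption here; only the resulting `/tmp/pi-voice-<sha1_12chars>.sock` shape is documented:

```python
import hashlib
import json

def socket_path(config: dict) -> str:
    """Derive the daemon socket path from a config hash.

    Which config fields feed the hash is an assumption; the documented
    contract is only the /tmp/pi-voice-<sha1_12chars>.sock shape.
    """
    # Canonicalize so the same config always yields the same path.
    canonical = json.dumps(config, sort_keys=True).encode("utf-8")
    digest = hashlib.sha1(canonical).hexdigest()[:12]
    return f"/tmp/pi-voice-{digest}.sock"
```

Because the hash input is canonicalized, two processes with the same effective config resolve to the same socket and can share one daemon.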

### Commands

#### `ping`

Check if daemon is alive.

**Request:**
```json
{"cmd": "ping"}
```

**Response:**
```json
{"status": "ok", "pid": 12345}
```
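
A minimal Python client for this request/response cycle might look like the following. This is a sketch, not the shipped implementation (pi-listen's own client lives in `voice.ts`):

```python
import json
import socket

def send_command(sock_path: str, request: dict, timeout: float = 5.0) -> dict:
    """Send one newline-delimited JSON request and read one JSON reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect(sock_path)
        s.sendall((json.dumps(request) + "\n").encode("utf-8"))
        buf = b""
        while not buf.endswith(b"\n"):
            chunk = s.recv(4096)
            if not chunk:  # daemon closed the connection early
                break
            buf += chunk
        return json.loads(buf)
```

With a daemon listening, `send_command(path, {"cmd": "ping"})` should return the `{"status": "ok", "pid": ...}` shape above.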

---

#### `status`

Get daemon status and loaded model info.

**Request:**
```json
{"cmd": "status"}
```

**Response:**
```json
{
  "status": "running",
  "pid": 12345,
  "uptime": 120.5,
  "requests": 42,
  "idle": 5.2,
  "backend": "faster-whisper",
  "model": "small",
  "model_loaded": true
}
```

---

#### `load`

Load or switch to a specific backend and model.

**Request:**
```json
{"cmd": "load", "backend": "faster-whisper", "model": "small"}
```

**Response:**
```json
{"status": "loaded", "backend": "faster-whisper", "model": "small", "load_time": 2.34}
```

Or, if the requested backend and model are already loaded:
```json
{"status": "already_loaded", "backend": "faster-whisper", "model": "small"}
```

---

#### `transcribe`

Transcribe an audio file.

**Request:**
```json
{
  "cmd": "transcribe",
  "audio": "/path/to/file.wav",
  "language": "en",
  "vad": true
}
```

**Response:**
```json
{
  "text": "hello world",
  "duration": 1.23,
  "backend": "faster-whisper",
  "model": "small",
  "language": "en"
}
```

With a VAD skip (no speech detected):
```json
{
  "text": "",
  "duration": 0,
  "vad": {"has_speech": false, "vad_available": true, "segments": 0},
  "skipped": true
}
```
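
A client can treat the normal and VAD-skip shapes uniformly. A small helper (illustrative, not part of the package) might be:

```python
def transcript_text(resp: dict) -> str:
    """Return the transcribed text, treating a VAD skip as empty."""
    if resp.get("skipped"):
        # Daemon detected no speech and skipped transcription.
        return ""
    return resp.get("text", "")
```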

---

#### `vad`

Run Voice Activity Detection only (no transcription).

**Request:**
```json
{"cmd": "vad", "audio": "/path/to/file.wav"}
```

**Response:**
```json
{
  "has_speech": true,
  "vad_available": true,
  "segments": 3,
  "speech_duration_ms": 4200
}
```

---

#### `backends`

List all available backends and their status.

**Request:**
```json
{"cmd": "backends"}
```

**Response:**
```json
{
  "backends": [
    {
      "name": "faster-whisper",
      "available": true,
      "type": "local",
      "default_model": "small",
      "models": ["tiny", "base", "small", "medium", "large-v3"]
    }
  ]
}
```

---

#### `shutdown`

Gracefully stop the daemon.

**Request:**
```json
{"cmd": "shutdown"}
```

**Response:**
```json
{"status": "shutting_down"}
```

---

## CLI Usage

### transcribe.py

```bash
# Transcribe with auto-detected backend
python3 transcribe.py audio.wav

# Specify backend and model
python3 transcribe.py --backend faster-whisper --model small audio.wav

# Specify language
python3 transcribe.py --language auto audio.wav

# List available backends
python3 transcribe.py --list-backends

# List models for a backend
python3 transcribe.py --list-models --backend faster-whisper
```

### daemon.py

```bash
# Start daemon
python3 daemon.py start --backend faster-whisper --model small

# Start with a custom socket path
python3 daemon.py start --socket /tmp/my-daemon.sock

# Check status
python3 daemon.py status

# Transcribe via the daemon
python3 daemon.py transcribe audio.wav --language en --vad

# Load a different model
python3 daemon.py load --backend moonshine --model moonshine/base

# Stop daemon
python3 daemon.py stop

# Ping daemon
python3 daemon.py ping
```

---

## Error Responses

All errors follow this format:

```json
{"error": "Human-readable error message"}
```

Common errors:

| Error | Cause |
|-------|-------|
| `"Audio file not found: /path"` | Audio file doesn't exist |
| `"No model loaded. Send 'load' first."` | `transcribe` called before loading a model |
| `"No STT backend found"` | No backend is installed or available |
| `"Unknown command: foo"` | Invalid command sent to the daemon |
| `"Message exceeds maximum size"` | Request larger than 1 MB |
| `"Daemon not running"` | Socket connection failed |
| `"Daemon timeout"` | Daemon didn't respond within the timeout |
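
Since all errors share one shape, a client can normalize them into exceptions. A hypothetical helper (the exception class is not part of the package):

```python
class DaemonError(RuntimeError):
    """Raised when the daemon replies with the {"error": "..."} shape."""

def check_response(resp: dict) -> dict:
    """Pass successful responses through; raise on an error response."""
    if "error" in resp:
        raise DaemonError(resp["error"])
    return resp
```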

---

## Audio Requirements

| Property | Value |
|----------|-------|
| Format | WAV |
| Sample rate | 16000 Hz |
| Channels | 1 (mono) |
| Bit depth | 16-bit |

Audio is recorded with these parameters automatically via SoX's `rec` command.
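
The corresponding `rec` invocation can be sketched as follows; the flag spellings (`-r`, `-c`, `-b`, `-q`) are standard SoX options, but the exact argument list pi-listen passes is an assumption:

```python
def rec_command(wav_path: str) -> list[str]:
    """Build a SoX `rec` argv for the documented format:
    16 kHz, mono, 16-bit WAV. -q suppresses SoX's progress output.
    The exact invocation pi-listen uses is an assumption."""
    return ["rec", "-q", "-r", "16000", "-c", "1", "-b", "16", wav_path]
```

The resulting argv could then be handed to a process spawner (e.g. `subprocess.Popen`) and killed when the user releases the key.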
@@ -0,0 +1,114 @@
# pi-listen Architecture

## System Overview

```
┌──────────────────────────────────────────────────────────────────────────┐
│                             Pi CLI (Node.js)                             │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐   │
│  │                        extensions/voice.ts                        │   │
│  │                                                                   │   │
│  │ ┌───────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐   │   │
│  │ │ Recording │  │  Daemon  │  │   BTW    │  │   Hold-to-Talk   │   │   │
│  │ │ (SoX rec) │  │   IPC    │  │ Threads  │  │  (Kitty proto)   │   │   │
│  │ └─────┬─────┘  └────┬─────┘  └────┬─────┘  └────────┬─────────┘   │   │
│  │       │             │             │                 │             │   │
│  └───────┼─────────────┼─────────────┼─────────────────┼─────────────┘   │
│          │             │             │                 │                 │
│  ┌───────┼─────────────┼─────────────┼─────────────────┼─────────────┐   │
│  │       │    voice/config.ts        voice/diagnostics.ts            │   │
│  │       │    voice/onboarding.ts    voice/install.ts                │   │
│  └───────┼─────────────┼─────────────────────────────────────────────┘   │
│          │             │                                                 │
└──────────┼─────────────┼─────────────────────────────────────────────────┘
           │             │  Unix Domain Socket
           │             │  (newline-delimited JSON)
           │             ▼
┌──────────┼───────────────────────────────────────────────────────────────┐
│          │               daemon.py (Python, persistent)                  │
│          │                                                               │
│ ┌────────▼────────┐  ┌─────────────────┐  ┌──────────────────────────┐   │
│ │   Audio Files   │  │   ModelCache    │  │       DaemonServer       │   │
│ │   (WAV, /tmp)   │  │  (thread-safe)  │  │  (socket, idle timeout)  │   │
│ └─────────────────┘  └────────┬────────┘  └──────────────────────────┘   │
│                               │                                          │
│                      ┌────────▼────────┐                                 │
│                      │  transcribe.py  │                                 │
│                      │  (5 backends)   │                                 │
│                      └────────┬────────┘                                 │
│                               │                                          │
│             ┌─────────────────┼───────────────────────┐                  │
│             ▼                 ▼                       ▼                  │
│    ┌──────────────────┐  ┌──────────────┐   ┌──────────────────┐         │
│    │  faster-whisper  │  │ whisper-cpp  │   │     Deepgram     │         │
│    │  moonshine       │  │              │   │   (cloud API)    │         │
│    │  parakeet        │  │              │   │                  │         │
│    └──────────────────┘  └──────────────┘   └──────────────────┘         │
└──────────────────────────────────────────────────────────────────────────┘
```

## Data Flow

### Voice Input Flow

```
1. User holds SPACE (or Ctrl+Shift+V)
2. voice.ts spawns `rec` (SoX) → writes WAV to /tmp
3. User releases key
4. voice.ts kills `rec`
5. voice.ts sends {"cmd": "transcribe", "audio": "/tmp/pi-voice-XXX.wav"} to daemon
6. daemon.py runs transcription via loaded model
7. daemon.py returns {"text": "...", "duration": 1.23}
8. voice.ts injects text into Pi editor
9. voice.ts deletes temp WAV file
```

### BTW Flow

```
1. User types /btw <message> (or holds Ctrl+Shift+B for voice)
2. voice.ts builds context from system prompt + prior BTW thread
3. voice.ts streams LLM response via pi-ai streamSimple()
4. Response displayed in BTW widget above editor
5. User can /btw:inject to push thread into main agent context
```

### Daemon Lifecycle

```
1. Extension starts → ensureDaemon() checks if daemon is running
2. If not running → spawns python3 daemon.py start --socket <path>
3. Daemon pre-loads configured backend + model
4. Daemon serves requests on Unix socket
5. After 5 minutes of inactivity → auto-shutdown
6. On session_shutdown → extension does NOT kill daemon (persists for reuse)
```
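
Step 1, the liveness check, can be sketched as a `ping` over the socket, assuming that a failed connect means no daemon is running (this is a sketch, not the actual `ensureDaemon()` implementation, which lives in TypeScript):

```python
import json
import socket

def daemon_alive(sock_path: str, timeout: float = 1.0) -> bool:
    """Return True if a daemon answers `ping` on the given socket."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect(sock_path)
            s.sendall(b'{"cmd": "ping"}\n')
            reply = s.makefile().readline()
        return json.loads(reply).get("status") == "ok"
    except (OSError, ValueError):
        # No socket, refused connect, timeout, or malformed reply.
        return False
```

If this returns `False`, the caller would spawn `python3 daemon.py start --socket <path>` and retry.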

## Thread Safety

- **ModelCache**: protected by `threading.Lock()` — only one transcription runs at a time
- **DaemonServer**: each client connection is handled in a separate daemon thread
- **voice.ts state**: single-threaded Node.js event loop — no mutex needed, but async races are possible (documented in the bug-hunter audit)

## Configuration Resolution

```
1. Check project settings: <cwd>/.pi/settings.json → voice key
2. Check global settings: ~/.pi/agent/settings.json → voice key
3. Fall back to DEFAULT_CONFIG (auto backend, small model, enabled)
```

Project settings override global settings completely (no merging).
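
The resolution order reduces to a small pure function. A sketch, assuming the settings files have been read into dicts and that `DEFAULT_CONFIG` holds the documented defaults:

```python
# Documented fallback: auto backend, small model, enabled.
DEFAULT_CONFIG = {"backend": "auto", "model": "small", "enabled": True}

def resolve_voice_config(project: dict, global_: dict) -> dict:
    """Project settings win outright over global; no key-level merging."""
    if "voice" in project:
        return project["voice"]
    if "voice" in global_:
        return global_["voice"]
    return DEFAULT_CONFIG
```

Note that a project `voice` key replaces the global one entirely, so a partial project config does not inherit missing keys from the global settings.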

## Security Boundaries

| Boundary | Protection |
|----------|------------|
| Unix socket | Local-only (no TCP), file permissions |
| Message size | `MAX_MSG_SIZE` = 1 MB |
| Audio files | Validated with `os.path.exists()` |
| Backend names | Validated against the `BACKENDS` registry |
| Error responses | No stack traces (logged to stderr only) |
| Idle timeout | Daemon auto-shuts down after 5 minutes |
| Temp files | Deleted immediately after transcription |
@@ -0,0 +1,196 @@
# pi-voice backend guide

`pi-voice` supports both **local** and **cloud** speech-to-text (STT) backends.

Today, the extension exposes backend selection through `/voice setup` and reads settings from:

- global: `~/.pi/agent/settings.json`
- project: `.pi/settings.json`

Project settings override global settings when both are present.

## Quick recommendation

If you want a conservative default:

- choose **faster-whisper** for a strong local default on macOS and general desktop use
- choose **Deepgram** if you want the fastest route to a working setup and you already have an API key

If onboarding tells you a model is already installed, prefer that ready-now path unless you have a strong reason to switch.

## Backend comparison

| Backend | Mode | Best for | Tradeoffs | Typical install path | Model detection confidence |
|---|---|---|---|---|---|
| `faster-whisper` | Local | Best overall local default, good quality/speed balance | Python dependency, model download time, CPU usage | `pip install faster-whisper` | High |
| `moonshine` | Local | Lower-resource local experiments | Smaller ecosystem, fewer model choices | `pip install useful-moonshine[onnx]` | Medium / heuristic |
| `whisper-cpp` | Local | CLI-oriented local setups, users already invested in whisper.cpp | Model file management is more manual | `brew install whisper-cpp` | Very high |
| `deepgram` | Cloud | Fastest setup if an API key already exists | Audio leaves the machine, network required, API billing | set `DEEPGRAM_API_KEY` | API-ready vs. missing key |
| `parakeet` | Local | NVIDIA/NeMo-oriented experimentation | Heavy dependency footprint, slower setup | `pip install nemo_toolkit[asr]` | Medium-low / heuristic |

## Backend details

### faster-whisper

**Use this when:**
- you want a high-confidence local default
- privacy/offline use matters
- you are okay with Python-based dependencies

**Pros**
- mature Whisper-family local option
- wide range of model sizes to choose from
- fits the current `pi-voice` architecture well

**Cons**
- first install can take longer than cloud setup
- larger models increase local CPU and startup costs

**Suggested starting models**
- `small` — a good conservative starting point
- `small.en` — useful for English-only setups
- `medium` — move here if you want more accuracy and can accept the higher cost

### moonshine

**Use this when:**
- you want a lighter local option
- you are willing to trade maturity and flexibility for smaller setup/runtime needs

**Pros**
- lightweight local option
- fewer model decisions to make

**Cons**
- narrower model ecosystem
- less common than Whisper-family defaults

**Suggested starting models**
- `moonshine/base` — the safer default
- `moonshine/tiny` — the lowest-cost experimental path

### whisper-cpp

**Use this when:**
- you already use whisper.cpp
- you prefer CLI-driven local tooling
- you want to manage model files more explicitly

**Pros**
- straightforward local CLI model
- familiar to users already in the whisper.cpp ecosystem

**Cons**
- model file placement matters
- setup can feel more manual than the Python-backed alternatives

**Suggested starting models**
- `small` — a practical starting point
- `small.en` — a good English-only choice

### deepgram

**Use this when:**
- you want the fastest route to a working setup
- you are comfortable with cloud transcription
- you already have, or plan to add, `DEEPGRAM_API_KEY`

**Pros**
- minimal local dependency burden
- fast setup when credentials are ready
- a good fit for users optimizing for time-to-value

**Cons**
- requires network connectivity
- audio is sent to a cloud provider
- subject to account limits and billing

**Suggested starting model**
- `nova-3` — the current default in `transcribe.py`

### parakeet

**Use this when:**
- you specifically want to experiment with NVIDIA NeMo/Parakeet
- you are comfortable with a heavier Python stack

**Pros**
- an interesting specialized option
- useful for experimentation in the NeMo ecosystem

**Cons**
- heavier setup than the other paths
- not the best default for a first-time user

## Choosing local vs cloud

### Choose local when
- privacy matters more than setup speed
- you want offline use
- you are comfortable installing dependencies
- you want per-project control without relying on network services

### Choose cloud when
- you want the fastest setup path
- you are okay with a provider API key
- you want to avoid local model/dependency installation
- you are comfortable with network and billing constraints

## Model selection guidance

The current package exposes raw backend model names, plus model-aware status wherever it can be detected.

A conservative way to choose:

- prefer **already installed** models first when they meet your needs
- start small: `small`, `small.en`, or the backend defaults
- only move up if the current model is not accurate enough
- prefer English-only variants when your use case is English-only and the backend provides them
- avoid jumping to the biggest model first unless you already know your machine can handle it

## How to read model status

During onboarding and in some diagnostics output, local models can show up as:

- **installed** — the model appears to already exist locally and should not need a fresh download
- **recommended, installed** — same as above, and also the preferred choice for the current path
- **download required** — the backend is available, but the selected model does not appear to be present
- **unknown** — the backend is present, but model presence could not be confirmed confidently
- **api** — cloud path; no local model download applies

For `unknown`, do not assume the system is broken. It means `pi-voice` is being conservative and avoiding a false claim.

## Configuration scope

`pi-voice` now supports both:

- **Global scope** — applies across projects
- **Project scope** — applies only in the current repository

Use **project scope** when:
- one repo needs a different voice workflow
- you want a safer rollout before using the same config everywhere
- you need local STT in one project but cloud STT in another

## Inspect current backend state

Useful commands:

- `/voice info` — show the current effective config and selected-model status
- `/voice backends` — list available backends, installed models, and install-detection hints
- `/voice setup` — choose a backend and model interactively
- `/voice test` — run a basic voice environment check plus a current-model readiness check
- `/voice doctor` — show repair steps for the current setup alongside a recommended alternative
- `/voice daemon status` — inspect the warm daemon state

## Current limitations

Be conservative about expectations:

- backend installation is still partly toolchain-dependent
- some setup steps may still require manual `brew` or `pip` work
- not every backend has equally strong model-detection confidence
- `unknown` model status is intentional when the system cannot verify a local cache safely
- a backend being available does **not** always mean every model is already ready locally

If you hit issues, see [`docs/troubleshooting.md`](./troubleshooting.md).