@codexstar/pi-listen 1.0.4
- package/LICENSE +21 -0
- package/README.md +283 -0
- package/daemon.py +517 -0
- package/docs/API.md +273 -0
- package/docs/ARCHITECTURE.md +114 -0
- package/docs/backends.md +196 -0
- package/docs/plans/2026-03-12-pi-voice-master-plan.md +613 -0
- package/docs/plans/2026-03-12-pi-voice-model-aware-execution-plan.md +256 -0
- package/docs/plans/2026-03-12-pi-voice-onboarding-remediation-plan.md +391 -0
- package/docs/plans/pi-voice-model-aware-review.md +196 -0
- package/docs/plans/pi-voice-model-detection-qa-plan.md +226 -0
- package/docs/plans/pi-voice-model-detection-research.md +483 -0
- package/docs/plans/pi-voice-onboarding-ux-plan.md +388 -0
- package/docs/plans/pi-voice-release-validation-plan.md +386 -0
- package/docs/plans/pi-voice-remaining-implementation-plan.md +524 -0
- package/docs/plans/pi-voice-review-findings.md +227 -0
- package/docs/plans/pi-voice-technical-remediation-plan.md +613 -0
- package/docs/qa-matrix.md +69 -0
- package/docs/qa-results.md +357 -0
- package/docs/troubleshooting.md +265 -0
- package/extensions/voice/config.ts +206 -0
- package/extensions/voice/diagnostics.ts +212 -0
- package/extensions/voice/install.ts +62 -0
- package/extensions/voice/onboarding.ts +315 -0
- package/extensions/voice.ts +1149 -0
- package/package.json +48 -0
- package/scripts/setup-macos.sh +374 -0
- package/scripts/setup-windows.ps1 +271 -0
- package/transcribe.py +497 -0
@@ -0,0 +1,388 @@

# pi-voice onboarding UX/product plan

## Objective

Turn pi-voice from a technically capable extension into a polished, enterprise-grade product that feels guided from the first interactive session after installation. The extension should help users choose **cloud API vs local/offline STT**, recommend the right backend and model, validate the setup end to end, and leave the user confident that voice input is ready.

This plan assumes Pi packages do **not** provide a dedicated interactive install hook. The primary onboarding moment is therefore the **first interactive `session_start`** after package installation, with the same flow re-openable via `/voice setup`.

## Current UX gaps

- Install ends with no guidance; the user must discover `/voice setup` manually.
- `session_start` only loads config and starts the daemon if enabled, so first-run users get no onboarding help.
- `/voice setup` is a thin backend/model picker rather than a guided setup flow.
- Unavailable backends only show raw install hints, not guided next steps.
- Setup writes only to global settings even though the extension reads both global and project settings.
- The current language is implementation-centric (`backend`, `model`) rather than user-centric (`privacy`, `speed`, `works offline`).

## UX principles

1. **Lead with outcomes, not implementation.** Ask how the user wants to use voice, not which backend they want.
2. **Make the recommended path obvious.** Every decision screen should identify one recommended option.
3. **Let novices stay simple; let experts go deep.** Keep advanced knobs behind an “Advanced options” branch.
4. **No dead ends.** Every failure state must offer retry, change choice, or exit-with-instructions.
5. **Always end with proof.** Setup is not complete until a mic check and a transcription test both succeed, or the user intentionally skips validation.
6. **Respect scope.** Users should choose whether setup applies globally or only to the current project.

## Primary user journeys

### 1. First-time user, wants fastest path

- Installs package.
- Opens Pi interactively.
- Sees a welcome wizard.
- Chooses “Fastest setup”.
- System recommends cloud/API if no local stack is present, or a ready local backend if already installed.
- User validates with a short voice sample.
- Completion screen shows shortcuts and how to reopen setup.

### 2. Privacy-focused user, wants local/offline

- Opens wizard.
- Chooses “Keep audio on this machine”.
- System recommends the best local backend available, with model suggestions by latency/quality.
- If dependencies are missing, onboarding moves into guided install/remediation.
- Validation confirms offline transcription works.

### 3. Team/project-specific user

- Opens setup from a repo with shared preferences.
- Chooses “Only for this project”.
- Onboarding writes to `.pi/settings.json`.
- Completion screen explains that the project now has voice defaults separate from the user’s global defaults.

### 4. Returning user with broken setup

- `session_start` detects incomplete onboarding or failed validation.
- User sees a lightweight recovery banner:
  - Re-run setup
  - View diagnostics
  - Continue without voice

## Proposed onboarding architecture

## Triggering rules

Show the full onboarding wizard when all are true:

- `ctx.hasUI === true`
- this is the first run for the current onboarding version, or config is incomplete
- voice onboarding has not been explicitly dismissed for the current version

Show a smaller recovery prompt when:

- voice is enabled but dependencies or validation are missing
- previously selected backend/model is no longer available

Do not auto-launch onboarding in:

- print mode
- non-interactive mode
- sessions where the user already completed onboarding for the current version
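
The triggering rules above can be read as one pure decision function. The following is a minimal sketch, not the extension's actual code: the `TriggerInputs` shape and every field name except `hasUI` are hypothetical, and it assumes that an incomplete config found after onboarding was already completed is routed to the recovery prompt rather than back to the full wizard.

```typescript
// Hypothetical input shape; only `hasUI` mirrors a name from this plan (`ctx.hasUI`).
type OnboardingPrompt = "wizard" | "recovery" | "none";

interface TriggerInputs {
  hasUI: boolean;                   // false in print and non-interactive modes
  currentOnboardingVersion: number;
  completedVersion?: number;        // onboarding version the user last completed
  dismissedVersion?: number;        // onboarding version the user last dismissed
  configComplete: boolean;
  voiceEnabled: boolean;
  dependenciesReady: boolean;
  validated: boolean;
  selectionAvailable: boolean;      // saved backend/model can still be used
}

function decideOnboardingPrompt(s: TriggerInputs): OnboardingPrompt {
  // Never auto-launch without an interactive UI.
  if (!s.hasUI) return "none";

  const completed = s.completedVersion === s.currentOnboardingVersion;
  const dismissed = s.dismissedVersion === s.currentOnboardingVersion;

  // Full wizard: not yet completed for this version and not dismissed.
  if (!completed && !dismissed) return "wizard";

  // Smaller recovery prompt: voice is enabled but something needs attention.
  const needsRecovery =
    s.voiceEnabled &&
    (!s.dependenciesReady || !s.validated || !s.selectionAvailable || !s.configComplete);
  return needsRecovery ? "recovery" : "none";
}
```

Keeping the decision free of UI calls means `session_start` can evaluate it cheaply and the rules can be unit-tested without a terminal.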

## State model required for UX

The product layer should persist more than the current `enabled/language/backend/model` fields. UX needs:

- `onboardingVersion`
- `onboardingCompletedAt`
- `onboardingDismissedAt`
- `installMode` (`cloud`, `local`, `auto`)
- `preferenceProfile` (`fastest`, `privacy`, `accuracy`, `balanced`)
- `settingsScope` (`global`, `project`)
- `lastValidatedAt`
- `lastValidationResult`
- `selectedBackend`
- `selectedModel`
- `language`
- `btwEnabled`
- `hotkeyMode`

This state enables re-entry, migration, recovery prompts, and a completion receipt.
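
As an illustration, the fields listed above could be captured in a single TypeScript interface. This is a sketch under assumptions: the field names come from the list, but the timestamp format, the `lastValidationResult` values, the optionality of each field, and the two helper functions are hypothetical.

```typescript
// Field names follow the list above; types and literal sets not given there are assumptions.
type InstallMode = "cloud" | "local" | "auto";
type PreferenceProfile = "fastest" | "privacy" | "accuracy" | "balanced";
type SettingsScope = "global" | "project";

interface VoiceOnboardingState {
  onboardingVersion: number;
  onboardingCompletedAt?: string;   // ISO 8601 timestamp (assumed format)
  onboardingDismissedAt?: string;
  installMode: InstallMode;
  preferenceProfile: PreferenceProfile;
  settingsScope: SettingsScope;
  lastValidatedAt?: string;
  lastValidationResult?: "passed" | "failed" | "skipped"; // assumed values
  selectedBackend?: string;
  selectedModel?: string;
  language: string;
  btwEnabled: boolean;
  hotkeyMode: string;               // allowed values are not specified in this plan
}

// Hypothetical helper: true when onboarding was finished for the given version.
function isOnboardingCurrent(state: VoiceOnboardingState, currentVersion: number): boolean {
  return state.onboardingVersion === currentVersion && state.onboardingCompletedAt !== undefined;
}

// Hypothetical helper: true only when the last validation actually passed.
function isValidated(state: VoiceOnboardingState): boolean {
  return state.lastValidationResult === "passed";
}
```

Leaving the completion and validation fields optional lets a draft state be saved mid-flow and resumed later.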

## End-to-end onboarding flow

### Step 0: Welcome

Purpose: explain value, reassure the user, and set expectations.

Screen content:

- Title: `Set up voice input`
- One-line value prop: “Talk to Pi instead of typing. Voice can run through a cloud API or locally on your machine.”
- Estimated time: “About 2–3 minutes”
- Actions:
  - `Start setup` (recommended)
  - `Remind me later`
  - `Skip and disable voice`

### Step 1: How do you want to use speech-to-text?

This is the key product question.

Options:

- `Use a cloud API`: best for fastest setup, usually strong accuracy, requires network and API credentials
- `Run locally on this machine`: best for privacy/offline use, may require local dependencies and model downloads
- `Help me choose`: guided recommendation path

This screen should use plain language, not backend names.

### Step 2: What matters most?

Shown either after “Help me choose” or as a tuning screen for the other choices.

Options:

- `Fastest setup`
- `Best privacy`
- `Best accuracy`
- `Balanced default` (recommended)

This preference drives recommendation logic and default model choice.

### Step 3: Recommendation screen

The extension should translate user intent into a recommended path.

Example output card:

- `Recommended: faster-whisper / small`
- Why: “Runs locally, is already available on this machine, and offers a good speed/accuracy tradeoff.”
- Secondary choices:
  - `Deepgram / nova-3` for fast cloud setup
  - `faster-whisper / medium` for higher accuracy

This screen should always include:

- recommendation
- one-sentence reasoning
- alternate choices
- an `Advanced options` action for explicit backend/model picking

### Step 4: Scope selection

Ask where to save settings.

Options:

- `Use in all projects` → writes to `~/.pi/agent/settings.json`
- `Only use in this project` → writes to `<cwd>/.pi/settings.json`

Copy should explain the difference in plain English.
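
The scope choice reduces to picking one of the two paths named above. A minimal sketch follows; `resolveSettingsPath` is a hypothetical helper, it takes the home and working directories as plain strings to stay self-contained, and it joins with `/` for simplicity (a real implementation would more likely use the platform's path utilities).

```typescript
type SettingsScope = "global" | "project";

// Hypothetical helper: maps the user's scope choice to the settings file named in this plan.
function resolveSettingsPath(scope: SettingsScope, homeDir: string, cwd: string): string {
  // Drop trailing separators so the join below never produces "//".
  const trim = (p: string): string => p.replace(/[\\\/]+$/, "");
  return scope === "global"
    ? `${trim(homeDir)}/.pi/agent/settings.json`
    : `${trim(cwd)}/.pi/settings.json`;
}

// Example: the path a receipt screen could show after the user picks project scope.
const savedTo = resolveSettingsPath("project", "/home/ana", "/work/repo");
console.log(`Settings saved to ${savedTo}`);
```

Returning the resolved path, rather than writing the file directly, makes it easy for confirmation and receipt screens to repeat where settings were saved.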

### Step 5: Backend/model confirmation

This step is simple for the default path and richer for advanced users.

Default path:

- Show selected backend/model pair with brief tradeoff copy.
- Confirm or edit.

Advanced path:

- Show backend list grouped by `Cloud` and `Local`.
- Each option should include:
  - availability
  - whether install is required
  - quality/speed/privacy hint
- Model picker should include badges such as:
  - `fast`
  - `recommended`
  - `higher accuracy`
  - `larger download`

### Step 6: Readiness and install guidance

This screen should summarize what the extension needs to finish setup.

Examples:

- `Microphone capture tool: ready`
- `Python runtime: ready`
- `Selected backend: needs install`
- `API key: missing`

Actions:

- `Install now` when safe/automatable
- `Show commands` when manual install is required
- `Choose a different option`
- `Continue after I’ve installed it`

This step should use a loader/spinner UI during scans and install attempts. Avoid dropping the user straight into notifications.

### Step 7: Live validation

This is the confidence moment.

Flow:

1. prompt user to say a short sentence
2. record 2–4 seconds
3. transcribe with the chosen backend/model
4. show transcript and latency
5. ask if the result looks correct

Actions:

- `Looks good`
- `Try again`
- `Change model`
- `Open advanced settings`

If validation is skipped, completion should clearly say “Configured but not validated”.

### Step 8: Completion receipt

The final screen should feel premium and operational.

Show:

- selected mode (`cloud` or `local`)
- backend and model
- where settings were saved
- key shortcuts:
  - `Hold SPACE` to talk when the editor is empty
  - `Ctrl+Shift+V` to toggle recording
  - `Ctrl+Shift+B` for voice → BTW
- commands:
  - `/voice setup`
  - `/voice doctor`
  - `/voice info`

Actions:

- `Finish`
- `Open advanced settings`
- `Run another test`
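
To make the receipt contents concrete, here is a small sketch of a formatter that turns a setup summary into the lines a completion screen could render. `CompletionSummary` and `formatCompletionReceipt` are hypothetical names, and the “Voice input is ready” headline is assumed copy; the shortcuts, commands, and the “Configured but not validated” wording come from this plan.

```typescript
// Hypothetical summary shape produced at the end of the wizard.
interface CompletionSummary {
  mode: "cloud" | "local";
  backend: string;
  model: string;
  settingsPath: string;
  validated: boolean; // false when the user skipped live validation
}

function formatCompletionReceipt(s: CompletionSummary): string[] {
  return [
    // Headline for the validated case is assumed copy, not taken from the plan.
    s.validated ? "Voice input is ready" : "Configured but not validated",
    `Mode: ${s.mode}`,
    `Backend / model: ${s.backend} / ${s.model}`,
    `Settings saved to: ${s.settingsPath}`,
    "Shortcuts:",
    "  Hold SPACE to talk when the editor is empty",
    "  Ctrl+Shift+V to toggle recording",
    "  Ctrl+Shift+B for voice → BTW",
    "Commands: /voice setup, /voice doctor, /voice info",
  ];
}
```

Producing plain lines keeps the receipt independent of whichever TUI component ends up rendering it.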

## Recommendation logic

## Input signals

Use these inputs to drive recommendation UX:

- installed backends from `transcribe.py --list-backends`
- presence of API key(s)
- SoX / `rec` availability
- Python availability
- current OS assumptions (Apple Silicon-friendly defaults)
- user’s stated preference (`fastest`, `privacy`, `accuracy`, `balanced`)

## Recommendation rules

### If user chooses `cloud API`

- Recommend `deepgram / nova-3` when API credentials exist.
- If credentials are missing, show it as available-but-needs-auth.
- If the user wants fastest setup and local dependencies are absent, cloud can still be recommended with a clear note about the missing API key step.

### If user chooses `local`

- Prefer a backend that is already installed.
- If `faster-whisper` is installed, make it the default recommendation.
- If nothing local is installed, recommend the easiest reliable local path and explain required setup.
- Use `small` as the default balanced model, `base`/`tiny` for speed-first, `medium` or better for accuracy-first where supported.

### If user chooses `help me choose`

- `privacy` → local first
- `fastest setup` → whichever of cloud-with-key or already-installed local backend minimizes friction
- `accuracy` → best available model/backend with clear tradeoff warning
- `balanced` → installed local backend first, else cloud if credentialed, else easiest local path
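
The rules above can be sketched as a small pure function over the input signals. This is illustrative only: the `Signals` and `Recommendation` shapes are hypothetical, `faster-whisper` as the fallback when nothing local is installed is an assumption (the plan only says “easiest reliable local path”), and the ordering used for `accuracy` is a guess because “best available” is not defined here.

```typescript
type Mode = "cloud" | "local" | "help";
type Preference = "fastest" | "privacy" | "accuracy" | "balanced";

interface Signals {
  installedLocalBackends: string[]; // e.g. parsed from `transcribe.py --list-backends`
  hasCloudCredentials: boolean;
}

interface Recommendation {
  backend: string;
  model: string;
  needsInstall: boolean;
  needsAuth: boolean;
}

// Assumed fallback; the plan does not name the "easiest reliable local path".
const EASIEST_LOCAL_BACKEND = "faster-whisper";

function localModelFor(pref: Preference): string {
  if (pref === "fastest") return "base";
  if (pref === "accuracy") return "medium";
  return "small"; // balanced default; reused for privacy (assumption)
}

function recommendLocal(pref: Preference, s: Signals): Recommendation {
  const installed = s.installedLocalBackends;
  // Prefer faster-whisper when installed, then any installed backend, then the fallback.
  const backend = installed.includes("faster-whisper")
    ? "faster-whisper"
    : installed[0] ?? EASIEST_LOCAL_BACKEND;
  return {
    backend,
    model: localModelFor(pref),
    needsInstall: !installed.includes(backend),
    needsAuth: false,
  };
}

function recommendCloud(s: Signals): Recommendation {
  // Missing credentials surface as "available but needs auth" rather than hiding the option.
  return { backend: "deepgram", model: "nova-3", needsInstall: false, needsAuth: !s.hasCloudCredentials };
}

function recommend(mode: Mode, pref: Preference, s: Signals): Recommendation {
  if (mode === "cloud") return recommendCloud(s);
  if (mode === "local") return recommendLocal(pref, s);

  // "Help me choose": the stated preference drives the decision.
  const hasLocal = s.installedLocalBackends.length > 0;
  if (pref === "privacy") return recommendLocal(pref, s);
  if (pref === "fastest" || pref === "accuracy") {
    // Assumed ordering: an installed local backend first, otherwise cloud.
    return hasLocal ? recommendLocal(pref, s) : recommendCloud(s);
  }
  // balanced: installed local, else credentialed cloud, else easiest local path.
  if (hasLocal) return recommendLocal(pref, s);
  return s.hasCloudCredentials ? recommendCloud(s) : recommendLocal(pref, s);
}
```

The `needsInstall` and `needsAuth` flags give the readiness screen enough information to decide between `Install now`, `Show commands`, and a credentials step.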

## Failure and recovery UX

### Failure class 1: missing dependency before selection

Example: `rec` missing.

UX:

- show a checklist screen
- explain why the dependency matters
- offer install/remediation path
- let the user continue exploring options without losing progress

### Failure class 2: selected backend unavailable

Example: user picks Deepgram without an API key.

UX:

- inline error card: `Deepgram needs an API key before it can be used.`
- actions:
  - `Set up credentials`
  - `Choose another backend`
  - `Go back`

### Failure class 3: test recording fails

Example: no mic data recorded.

UX:

- say exactly what failed: no audio recorded vs device unavailable vs capture tool missing
- offer `Try again`, `Open diagnostics`, `Choose another setup path`

### Failure class 4: transcription succeeds but quality is poor

UX:

- present transcript with confidence framing: “We got a result, but it may not be accurate enough yet.”
- actions:
  - `Use a larger model`
  - `Switch to cloud`
  - `Keep current setup`

### Failure class 5: onboarding abandoned mid-flow

UX:

- save draft onboarding state
- on next session start, show `Continue voice setup` instead of starting from zero
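
One way to enforce the “no dead ends” principle across these failure classes is to keep the recovery actions in a single table and check that no class is left without one. The sketch below is hypothetical: the type names, the recording-failure messages, and reusing the Step 6 actions for failure class 1 are assumptions; the remaining action labels are taken from this section.

```typescript
type FailureClass =
  | "missing-dependency"
  | "backend-unavailable"
  | "recording-failed"
  | "poor-quality"
  | "abandoned";

// Action labels per failure class; class 1 borrows the Step 6 actions (assumption).
const RECOVERY_ACTIONS: Record<FailureClass, readonly string[]> = {
  "missing-dependency": ["Install now", "Show commands", "Choose a different option"],
  "backend-unavailable": ["Set up credentials", "Choose another backend", "Go back"],
  "recording-failed": ["Try again", "Open diagnostics", "Choose another setup path"],
  "poor-quality": ["Use a larger model", "Switch to cloud", "Keep current setup"],
  "abandoned": ["Continue voice setup"],
};

type RecordingFailure = "no-audio" | "device-unavailable" | "capture-tool-missing";

// Hypothetical copy: say exactly what failed instead of a generic error.
function describeRecordingFailure(reason: RecordingFailure): string {
  switch (reason) {
    case "no-audio":
      return "No audio was recorded.";
    case "device-unavailable":
      return "The microphone could not be opened.";
    case "capture-tool-missing":
      return "The audio capture tool is not installed.";
  }
}

// Returns every failure class that would leave the user with nothing to do.
function findDeadEnds(): FailureClass[] {
  return (Object.keys(RECOVERY_ACTIONS) as FailureClass[]).filter(
    (failure) => RECOVERY_ACTIONS[failure].length === 0,
  );
}
```

A check like `findDeadEnds()` could run in a unit test so that a new failure class cannot ship without at least one recovery action.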

## Settings scope UX

Settings scope is important enough to get its own explicit decision.

### Global scope

Copy: “Use these voice settings in every project on this machine.”

Use when:

- the user is setting up a personal default workflow
- the chosen backend is tied to user-level credentials or machine-level installs

### Project scope

Copy: “Use these voice settings only in this project. This is useful for team-shared defaults or repo-specific preferences.”

Use when:

- the project has team norms
- a repo wants a specific STT mode
- the user wants to avoid changing other projects

### UX requirement

Every confirmation/receipt screen must repeat where settings were saved.

## TUI implementation guidance

Use Pi’s richer custom TUI patterns instead of plain `ctx.ui.select()` for the main wizard.

Recommended building blocks:

- `ctx.ui.custom()` for the multi-step wizard shell
- `SelectList` for decision screens
- `BorderedLoader` for scans, installs, and test transcription
- `SettingsList` for advanced toggles
- `setStatus` for persistent `MIC`, `SETUP`, `READY`, `ERROR` state
- `setWidget` for non-modal reminders such as `Voice setup incomplete — /voice setup`

## Suggested UX structure in code

- `showVoiceOnboardingWizard(ctx)`
- `showVoiceRecoveryPrompt(ctx)`
- `showVoiceValidationFlow(ctx, draftConfig)`
- `showVoiceCompletionReceipt(ctx, summary)`
- `showVoiceAdvancedSettings(ctx)`

The same wizard should power both:

- automatic first-run onboarding
- manual `/voice setup`
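
Sharing one wizard between both entry points is easier when the step order lives in one place. The sketch below is a hypothetical step sequencer that deliberately avoids Pi's UI APIs, whose signatures are not shown in this plan: the step names, `startingStep`, and `nextStep` are all illustrative, with the step order following the end-to-end flow (Steps 0 to 8).

```typescript
// Step identifiers follow Steps 0-8 of the end-to-end flow; the names are hypothetical.
const WIZARD_STEPS = [
  "welcome",
  "mode",
  "preference",
  "recommendation",
  "scope",
  "confirm",
  "readiness",
  "validation",
  "receipt",
] as const;

type WizardStep = (typeof WIZARD_STEPS)[number];

// Both first-run onboarding and `/voice setup` start here; a saved draft resumes mid-flow.
function startingStep(draftStep?: WizardStep): WizardStep {
  return draftStep ?? "welcome";
}

// Returns the step after `current`, or null once the receipt has been shown.
function nextStep(current: WizardStep): WizardStep | null {
  const index = WIZARD_STEPS.indexOf(current);
  return WIZARD_STEPS[index + 1] ?? null;
}

// Example: walk the whole flow from a resumed draft at the scope step.
const visited: WizardStep[] = [];
for (let step: WizardStep | null = startingStep("scope"); step !== null; step = nextStep(step)) {
  visited.push(step);
}
console.log(visited.join(" > "));
```

A function such as `showVoiceOnboardingWizard(ctx)` could drive its screens from this sequence, so the automatic and manual paths cannot drift apart.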

## Success metrics / acceptance criteria

The UX is ready when:

- a first-time user can complete setup without reading README instructions
- the user is asked API vs local in plain language
- the user gets a recommendation with reasoning, not just a backend list
- setup clearly distinguishes global vs project scope
- the user can validate mic + transcription before completion
- any failed step offers retry, back, or fallback without losing progress
- `/voice setup` reopens the same polished flow
- the completion screen leaves the user knowing exactly how to start using voice

## Phase-by-phase UX delivery plan

### Phase 1: Product foundation

- Expand config/onboarding state model.
- Add first-run detection and recovery-banner logic.
- Define backend metadata needed for recommendation copy.

### Phase 2: Wizard MVP

- Build welcome, mode choice, preference choice, recommendation, scope, and completion screens.
- Keep advanced settings behind a secondary branch.
- Reuse existing backend scan data.

### Phase 3: Validation and recovery

- Add live mic/transcription test flow.
- Add resumable onboarding state.
- Add incomplete-setup widget/banner on session start.

### Phase 4: Premium polish

- Better copy, badges, progress states, completion receipt, and clearer failure messages.
- Add `/voice doctor` and route broken states there.
- Tighten “recommended” logic based on observed machine readiness.

## Deliverables from the UX track

1. A multi-step first-run onboarding wizard spec.
2. Plain-language decision tree for cloud vs local STT.
3. Recommendation/copy matrix for backend and model suggestions.
4. Recovery UX for missing dependencies, missing auth, failed recording, and failed transcription.
5. Settings-scope UX that clearly supports both global and project-local configuration.