@codexstar/pi-listen 1.0.4

@@ -0,0 +1,388 @@
1
+ # pi-voice onboarding UX/product plan
2
+
3
+ ## Objective
4
+
5
+ Turn pi-voice from a technically capable extension into a polished, enterprise-grade product that feels guided from the first interactive session after installation. The extension should help users choose **cloud API vs local/offline STT**, recommend the right backend and model, validate the setup end to end, and leave the user confident that voice input is ready.
6
+
7
+ This plan assumes Pi packages do **not** provide a dedicated interactive install hook. The primary onboarding moment is therefore the **first interactive `session_start`** after package installation, with the same flow re-openable via `/voice setup`.
8
+
9
+ ## Current UX gaps
10
+
11
+ - Install ends with no guidance; the user must discover `/voice setup` manually.
12
+ - `session_start` only loads config and starts the daemon if enabled, so first-run users get no onboarding help.
13
+ - `/voice setup` is a thin backend/model picker rather than a guided setup flow.
14
+ - Unavailable backends only show raw install hints, not guided next steps.
15
+ - Setup writes only to global settings even though the extension reads both global and project settings.
16
+ - The current language is implementation-centric (`backend`, `model`) rather than user-centric (`privacy`, `speed`, `works offline`).
17
+
18
+ ## UX principles
19
+
20
+ 1. **Lead with outcomes, not implementation.** Ask how the user wants to use voice, not which backend they want.
21
+ 2. **Make the recommended path obvious.** Every decision screen should identify one recommended option.
22
+ 3. **Let novices stay simple; let experts go deep.** Keep advanced knobs behind an “Advanced options” branch.
23
+ 4. **No dead ends.** Every failure state must offer retry, change choice, or exit-with-instructions.
24
+ 5. **Always end with proof.** Setup is not complete until a mic check and a transcription test succeed, or the user intentionally skips validation.
25
+ 6. **Respect scope.** Users should choose whether setup applies globally or only to the current project.
26
+
27
+ ## Primary user journeys
28
+
29
+ ### 1. First-time user, wants fastest path
30
+ - Installs package.
31
+ - Opens Pi interactively.
32
+ - Sees a welcome wizard.
33
+ - Chooses “Fastest setup”.
34
+ - System recommends cloud/API if no local stack is present, or a ready local backend if one is already installed.
35
+ - User validates with a short voice sample.
36
+ - Completion screen shows shortcuts and how to reopen setup.
37
+
38
+ ### 2. Privacy-focused user, wants local/offline
39
+ - Opens wizard.
40
+ - Chooses “Keep audio on this machine”.
41
+ - System recommends the best local backend available, with model suggestions by latency/quality.
42
+ - If dependencies are missing, onboarding moves into guided install/remediation.
43
+ - Validation confirms offline transcription works.
44
+
45
+ ### 3. Team/project-specific user
46
+ - Opens setup from a repo with shared preferences.
47
+ - Chooses “Only for this project”.
48
+ - Onboarding writes to `.pi/settings.json`.
49
+ - Completion screen explains that the project now has voice defaults separate from the user’s global defaults.
50
+
51
+ ### 4. Returning user with broken setup
52
+ - `session_start` detects incomplete onboarding or failed validation.
53
+ - User sees a lightweight recovery banner:
54
+ - Re-run setup
55
+ - View diagnostics
56
+ - Continue without voice
57
+
58
+ ## Proposed onboarding architecture
59
+
60
+ ## Triggering rules
61
+
62
+ Show the full onboarding wizard when all of the following are true:
63
+ - `ctx.hasUI === true`
64
+ - this is the first run for the current onboarding version, or config is incomplete
65
+ - voice onboarding has not been explicitly dismissed for the current version
66
+
67
+ Show a smaller recovery prompt when:
68
+ - voice is enabled but dependencies or validation are missing
69
+ - previously selected backend/model is no longer available
70
+
71
+ Do not auto-launch onboarding in:
72
+ - print mode
73
+ - non-interactive mode
74
+ - sessions where the user has already completed onboarding for the current version
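The triggering rules above can be sketched as a single pure decision function. This is a minimal sketch, not the extension's implementation: every field name in `TriggerInput` is hypothetical (only `hasUI` mirrors `ctx.hasUI` from the plan), and it assumes that an incomplete config still opens the wizard even when onboarding was completed for the current version, since the plan does not say which rule wins.

```typescript
// Hypothetical input shape; only `hasUI` corresponds to something named in the plan.
type OnboardingPrompt = "wizard" | "recovery" | "none";

interface TriggerInput {
  hasUI: boolean;              // mirrors ctx.hasUI; false in print or non-interactive mode
  currentVersion: number;      // onboarding version shipped with this release
  completedVersion?: number;   // version the user last completed, if any
  dismissedVersion?: number;   // version the user explicitly dismissed, if any
  configComplete: boolean;     // all required voice settings are present
  voiceEnabled: boolean;
  dependenciesReady: boolean;  // capture tool, runtime, credentials
  validated: boolean;          // last validation succeeded
  selectionAvailable: boolean; // chosen backend/model still exists
}

function decideOnboardingPrompt(s: TriggerInput): OnboardingPrompt {
  // Never auto-launch without an interactive UI.
  if (!s.hasUI) return "none";

  const completedCurrent = s.completedVersion === s.currentVersion;
  const dismissedCurrent = s.dismissedVersion === s.currentVersion;

  // Full wizard: first run for this version or incomplete config, and not dismissed.
  // Assumption: incomplete config takes precedence over an earlier completion.
  if ((!completedCurrent || !s.configComplete) && !dismissedCurrent) {
    return "wizard";
  }

  // Smaller recovery prompt: voice is on but not ready, or the selection disappeared.
  const notReady = s.voiceEnabled && (!s.dependenciesReady || !s.validated);
  return notReady || !s.selectionAvailable ? "recovery" : "none";
}
```

Keeping the decision free of side effects means the same function can be called from `session_start` and exercised in unit tests without a live session.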
75
+
76
+ ## State model required for UX
77
+
78
+ The product layer should persist more than the current `enabled/language/backend/model` fields. UX needs:
79
+
80
+ - `onboardingVersion`
81
+ - `onboardingCompletedAt`
82
+ - `onboardingDismissedAt`
83
+ - `installMode` (`cloud`, `local`, `auto`)
84
+ - `preferenceProfile` (`fastest`, `privacy`, `accuracy`, `balanced`)
85
+ - `settingsScope` (`global`, `project`)
86
+ - `lastValidatedAt`
87
+ - `lastValidationResult`
88
+ - `selectedBackend`
89
+ - `selectedModel`
90
+ - `language`
91
+ - `btwEnabled`
92
+ - `hotkeyMode`
93
+
94
+ This state enables re-entry, migration, recovery prompts, and a completion receipt.
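One way to picture this state is as a typed record. The field names come from the list above; the field types, the optional markers, the allowed values of `lastValidationResult`, and the defaults in `createDraftState` are all assumptions made for illustration.

```typescript
type InstallMode = "cloud" | "local" | "auto";
type PreferenceProfile = "fastest" | "privacy" | "accuracy" | "balanced";
type SettingsScope = "global" | "project";

// Field names follow the plan; types and optionality are assumed.
interface VoiceOnboardingState {
  onboardingVersion: number;
  onboardingCompletedAt?: string; // ISO 8601 timestamp (assumed format)
  onboardingDismissedAt?: string;
  installMode: InstallMode;
  preferenceProfile: PreferenceProfile;
  settingsScope: SettingsScope;
  lastValidatedAt?: string;
  lastValidationResult?: "passed" | "failed" | "skipped"; // assumed values
  selectedBackend?: string;
  selectedModel?: string;
  language: string;
  btwEnabled: boolean;
  hotkeyMode: string; // allowed values are not specified in the plan
}

// Builds a resumable draft; every default here is a placeholder assumption.
function createDraftState(onboardingVersion: number): VoiceOnboardingState {
  return {
    onboardingVersion,
    installMode: "auto",
    preferenceProfile: "balanced",
    settingsScope: "global",
    language: "en",
    btwEnabled: false,
    hotkeyMode: "hold",
  };
}

// Returns a copy stamped as completed, leaving the draft untouched.
function markCompleted(state: VoiceOnboardingState, now: Date): VoiceOnboardingState {
  return { ...state, onboardingCompletedAt: now.toISOString() };
}
```

A draft without `onboardingCompletedAt` is what would let a later session offer to continue setup instead of starting from zero.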
95
+
96
+ ## End-to-end onboarding flow
97
+
98
+ ### Step 0: Welcome
99
+ Purpose: explain value, reassure the user, and set expectations.
100
+
101
+ Screen content:
102
+ - Title: `Set up voice input`
103
+ - One-line value prop: “Talk to Pi instead of typing. Voice can run through a cloud API or locally on your machine.”
104
+ - Estimated time: “About 2–3 minutes”
105
+ - Actions:
106
+ - `Start setup` (recommended)
107
+ - `Remind me later`
108
+ - `Skip and disable voice`
109
+
110
+ ### Step 1: How do you want to use speech-to-text?
111
+ This is the key product question.
112
+
113
+ Options:
114
+ - `Use a cloud API` — best for fastest setup, usually strong accuracy, requires network and API credentials
115
+ - `Run locally on this machine` — best for privacy/offline use, may require local dependencies and model downloads
116
+ - `Help me choose` — guided recommendation path
117
+
118
+ This screen should use plain language, not backend names.
119
+
120
+ ### Step 2: What matters most?
121
+ Shown either after “Help me choose” or as a tuning screen for the other choices.
122
+
123
+ Options:
124
+ - `Fastest setup`
125
+ - `Best privacy`
126
+ - `Best accuracy`
127
+ - `Balanced default` (recommended)
128
+
129
+ This preference drives recommendation logic and default model choice.
130
+
131
+ ### Step 3: Recommendation screen
132
+ The extension should translate user intent into a recommended path.
133
+
134
+ Example output card:
135
+ - `Recommended: faster-whisper / small`
136
+ - Why: “Runs locally, is already available on this machine, and offers a good speed/accuracy tradeoff.”
137
+ - Secondary choices:
138
+ - `Deepgram / nova-3` for fast cloud setup
139
+ - `faster-whisper / medium` for higher accuracy
140
+
141
+ This screen should always include:
142
+ - recommendation
143
+ - one-sentence reasoning
144
+ - alternate choices
145
+ - an `Advanced options` action for explicit backend/model picking
146
+
147
+ ### Step 4: Scope selection
148
+ Ask where to save settings.
149
+
150
+ Options:
151
+ - `Use in all projects` → writes to `~/.pi/agent/settings.json`
152
+ - `Only use in this project` → writes to `<cwd>/.pi/settings.json`
153
+
154
+ Copy should explain the difference in plain English.
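The mapping from scope choice to file can be sketched as follows. The two paths are the ones listed above; the function names and the `ScopePaths` shape are hypothetical, and the string joining is deliberately simplified (a real implementation would more likely use the platform's path utilities).

```typescript
type SettingsScope = "global" | "project";

// Hypothetical container so the sketch does not read the environment directly.
interface ScopePaths {
  home: string; // the user's home directory, i.e. what `~` expands to
  cwd: string;  // the current project directory
}

function stripTrailingSlash(dir: string): string {
  return dir.replace(/\/+$/, "");
}

// Resolves the settings file for the chosen scope.
function resolveSettingsPath(scope: SettingsScope, paths: ScopePaths): string {
  return scope === "global"
    ? `${stripTrailingSlash(paths.home)}/.pi/agent/settings.json`
    : `${stripTrailingSlash(paths.cwd)}/.pi/settings.json`;
}

// Produces the line a confirmation or receipt screen could repeat.
function describeSavedLocation(scope: SettingsScope, paths: ScopePaths): string {
  const where = scope === "global" ? "all projects" : "this project only";
  return `Saved for ${where}: ${resolveSettingsPath(scope, paths)}`;
}
```

Passing `home` and `cwd` in explicitly keeps the function deterministic, which makes the scope copy easy to verify.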
155
+
156
+ ### Step 5: Backend/model confirmation
157
+ This step is simple for the default path and richer for advanced users.
158
+
159
+ Default path:
160
+ - Show selected backend/model pair with brief tradeoff copy.
161
+ - Confirm or edit.
162
+
163
+ Advanced path:
164
+ - Show backend list grouped by `Cloud` and `Local`.
165
+ - Each option should include:
166
+ - availability
167
+ - whether install is required
168
+ - quality/speed/privacy hint
169
+ - Model picker should include badges such as:
170
+ - `fast`
171
+ - `recommended`
172
+ - `higher accuracy`
173
+ - `larger download`
174
+
175
+ ### Step 6: Readiness and install guidance
176
+ This screen should summarize what the extension needs to finish setup.
177
+
178
+ Examples:
179
+ - `Microphone capture tool: ready`
180
+ - `Python runtime: ready`
181
+ - `Selected backend: needs install`
182
+ - `API key: missing`
183
+
184
+ Actions:
185
+ - `Install now` when safe/automatable
186
+ - `Show commands` when manual install is required
187
+ - `Choose a different option`
188
+ - `Continue after I’ve installed it`
189
+
190
+ This step should use a loader/spinner UI during scans and install attempts. Avoid dropping the user straight into notifications.
191
+
192
+ ### Step 7: Live validation
193
+ This is the confidence moment.
194
+
195
+ Flow:
196
+ 1. prompt user to say a short sentence
197
+ 2. record 2–4 seconds
198
+ 3. transcribe with the chosen backend/model
199
+ 4. show transcript and latency
200
+ 5. ask if the result looks correct
201
+
202
+ Actions:
203
+ - `Looks good`
204
+ - `Try again`
205
+ - `Change model`
206
+ - `Open advanced settings`
207
+
208
+ If validation is skipped, completion should clearly say “Configured but not validated”.
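The validation actions can be modeled as a small transition function. This is a sketch under assumptions: the action and screen identifiers are hypothetical, `skip` stands for the user choosing to bypass validation, and the "Configured and validated" label is invented to pair with the "Configured but not validated" wording given above.

```typescript
type ValidationAction = "looks_good" | "try_again" | "change_model" | "open_advanced" | "skip";
type NextScreen = "completion" | "validation" | "model_picker" | "advanced_settings";

interface ValidationStep {
  next: NextScreen;
  receiptStatus?: string; // only set when the flow moves on to the completion receipt
}

// Keeps the recording inside the 2 to 4 second window from the flow above.
function clampRecordSeconds(requested: number): number {
  return Math.min(4, Math.max(2, requested));
}

// Maps the user's choice on the validation screen to the next screen.
function applyValidationAction(action: ValidationAction): ValidationStep {
  switch (action) {
    case "looks_good":
      return { next: "completion", receiptStatus: "Configured and validated" };
    case "skip":
      return { next: "completion", receiptStatus: "Configured but not validated" };
    case "try_again":
      return { next: "validation" };
    case "change_model":
      return { next: "model_picker" };
    case "open_advanced":
      return { next: "advanced_settings" };
  }
}
```

Because every action maps to a defined next screen, the validation step has no dead ends.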
209
+
210
+ ### Step 8: Completion receipt
211
+ The final screen should feel premium and operational.
212
+
213
+ Show:
214
+ - selected mode (`cloud` or `local`)
215
+ - backend and model
216
+ - where settings were saved
217
+ - key shortcuts:
218
+ - `Hold SPACE` to talk when the editor is empty
219
+ - `Ctrl+Shift+V` to toggle recording
220
+ - `Ctrl+Shift+B` for voice → BTW
221
+ - commands:
222
+ - `/voice setup`
223
+ - `/voice doctor`
224
+ - `/voice info`
225
+
226
+ Actions:
227
+ - `Finish`
228
+ - `Open advanced settings`
229
+ - `Run another test`
230
+
231
+ ## Recommendation logic
232
+
233
+ ## Input signals
234
+ Use these inputs to drive recommendation UX:
235
+ - installed backends from `transcribe.py --list-backends`
236
+ - presence of API key(s)
237
+ - SoX / `rec` availability
238
+ - Python availability
239
+ - the current OS and hardware (for example, Apple Silicon-friendly defaults)
240
+ - user’s stated preference (`fastest`, `privacy`, `accuracy`, `balanced`)
241
+
242
+ ## Recommendation rules
243
+
244
+ ### If user chooses `cloud API`
245
+ - Recommend `deepgram / nova-3` when API credentials exist.
246
+ - If credentials are missing, show it as available-but-needs-auth.
247
+ - If the user wants fastest setup and local dependencies are absent, cloud can still be recommended with a clear note about the missing API key step.
248
+
249
+ ### If user chooses `local`
250
+ - Prefer a backend that is already installed.
251
+ - If `faster-whisper` is installed, make it the default recommendation.
252
+ - If nothing local is installed, recommend the easiest reliable local path and explain required setup.
253
+ - Use `small` as the default balanced model, `base` or `tiny` for speed-first, and `medium` or larger for accuracy-first where the backend supports it.
254
+
255
+ ### If user chooses `help me choose`
256
+ - `privacy` → local first
257
+ - `fastest` → whichever of cloud-with-key or already-installed local backend minimizes friction
258
+ - `accuracy` → best available model/backend with clear tradeoff warning
259
+ - `balanced` → installed local backend first, else cloud if credentialed, else easiest local path
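The rules above can be sketched as one function. Several choices here are assumptions the plan leaves open: `FALLBACK_LOCAL_BACKEND` is a placeholder for "the easiest reliable local path", `fastest` prefers an installed local backend over a credentialed cloud one when both exist, `accuracy` reuses the `balanced` ordering with a larger model, and model names are not checked against what each backend supports.

```typescript
type Mode = "cloud" | "local" | "help";
type Preference = "fastest" | "privacy" | "accuracy" | "balanced";

interface Signals {
  installedLocalBackends: string[]; // e.g. parsed from `transcribe.py --list-backends`
  hasApiKey: boolean;
}

interface Recommendation {
  mode: "cloud" | "local";
  backend: string;
  model: string;
  needsAuth: boolean;
  needsInstall: boolean;
}

// Assumption: stand-in for the unspecified "easiest reliable local path".
const FALLBACK_LOCAL_BACKEND = "faster-whisper";

function localModelFor(pref: Preference): string {
  if (pref === "fastest") return "base";
  if (pref === "accuracy") return "medium";
  return "small"; // balanced and privacy share the default
}

function recommendLocal(s: Signals, pref: Preference): Recommendation {
  const installed = s.installedLocalBackends;
  // Prefer faster-whisper when present, else any installed backend, else the fallback.
  const backend = installed.includes("faster-whisper")
    ? "faster-whisper"
    : installed[0] ?? FALLBACK_LOCAL_BACKEND;
  return { mode: "local", backend, model: localModelFor(pref), needsAuth: false, needsInstall: installed.length === 0 };
}

function recommendCloud(s: Signals): Recommendation {
  // Missing credentials are surfaced as available-but-needs-auth, not hidden.
  return { mode: "cloud", backend: "deepgram", model: "nova-3", needsAuth: !s.hasApiKey, needsInstall: false };
}

function recommend(mode: Mode, pref: Preference, s: Signals): Recommendation {
  if (mode === "cloud") return recommendCloud(s);
  if (mode === "local") return recommendLocal(s, pref);

  const hasLocal = s.installedLocalBackends.length > 0;
  switch (pref) {
    case "privacy":
      return recommendLocal(s, pref);
    case "fastest":
      return hasLocal ? recommendLocal(s, pref) : recommendCloud(s);
    case "accuracy":
    case "balanced":
      if (hasLocal) return recommendLocal(s, pref);
      return s.hasApiKey ? recommendCloud(s) : recommendLocal(s, pref);
  }
}
```

The `needsAuth` and `needsInstall` flags are what a readiness screen could turn into `API key: missing` or `Selected backend: needs install`.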
260
+
261
+ ## Failure and recovery UX
262
+
263
+ ### Failure class 1: missing dependency before selection
264
+ Example: `rec` missing.
265
+
266
+ UX:
267
+ - show a checklist screen
268
+ - explain why the dependency matters
269
+ - offer install/remediation path
270
+ - let the user continue exploring options without losing progress
271
+
272
+ ### Failure class 2: selected backend unavailable
273
+ Example: user picks Deepgram without an API key.
274
+
275
+ UX:
276
+ - inline error card: `Deepgram needs an API key before it can be used.`
277
+ - actions:
278
+ - `Set up credentials`
279
+ - `Choose another backend`
280
+ - `Go back`
281
+
282
+ ### Failure class 3: test recording fails
283
+ Example: no mic data recorded.
284
+
285
+ UX:
286
+ - say exactly what failed: no audio recorded vs device unavailable vs capture tool missing
287
+ - offer `Try again`, `Open diagnostics`, `Choose another setup path`
288
+
289
+ ### Failure class 4: transcription succeeds but quality is poor
290
+ UX:
291
+ - present transcript with confidence framing: “We got a result, but it may not be accurate enough yet.”
292
+ - actions:
293
+ - `Use a larger model`
294
+ - `Switch to cloud`
295
+ - `Keep current setup`
296
+
297
+ ### Failure class 5: onboarding abandoned mid-flow
298
+ UX:
299
+ - save draft onboarding state
300
+ - on next session start, show `Continue voice setup` instead of starting from zero
301
+
302
+ ## Settings scope UX
303
+
304
+ Settings scope is important enough to get its own explicit decision.
305
+
306
+ ### Global scope
307
+ Copy: “Use these voice settings in every project on this machine.”
308
+
309
+ Use when:
310
+ - the user is setting up a personal default workflow
311
+ - the chosen backend is tied to user-level credentials or machine-level installs
312
+
313
+ ### Project scope
314
+ Copy: “Use these voice settings only in this project. This is useful for team-shared defaults or repo-specific preferences.”
315
+
316
+ Use when:
317
+ - the project has team norms
318
+ - a repo wants a specific STT mode
319
+ - the user wants to avoid changing other projects
320
+
321
+ ### UX requirement
322
+ Every confirmation/receipt screen must repeat where settings were saved.
323
+
324
+ ## TUI implementation guidance
325
+
326
+ Use Pi’s richer custom TUI patterns instead of plain `ctx.ui.select()` for the main wizard.
327
+
328
+ Recommended building blocks:
329
+ - `ctx.ui.custom()` for the multi-step wizard shell
330
+ - `SelectList` for decision screens
331
+ - `BorderedLoader` for scans, installs, and test transcription
332
+ - `SettingsList` for advanced toggles
333
+ - `setStatus` for persistent `MIC`, `SETUP`, `READY`, `ERROR` state
334
+ - `setWidget` for non-modal reminders such as `Voice setup incomplete — /voice setup`
335
+
336
+ ## Suggested UX structure in code
337
+
338
+ - `showVoiceOnboardingWizard(ctx)`
339
+ - `showVoiceRecoveryPrompt(ctx)`
340
+ - `showVoiceValidationFlow(ctx, draftConfig)`
341
+ - `showVoiceCompletionReceipt(ctx, summary)`
342
+ - `showVoiceAdvancedSettings(ctx)`
343
+
344
+ The same wizard should power both:
345
+ - automatic first-run onboarding
346
+ - manual `/voice setup`
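A minimal sketch of how both entry points could share one wizard. The `WizardRunner` type and `createSetupEntryPoints` are hypothetical names, and the real wizard would presumably be `showVoiceOnboardingWizard(ctx)` with Pi's own context rather than the stubbed signature used here.

```typescript
type SetupTrigger = "first_run" | "manual";

interface WizardOutcome {
  trigger: SetupTrigger;
  completed: boolean;
}

// Hypothetical stand-in for the shared wizard; the real one would take Pi's ctx.
type WizardRunner = (trigger: SetupTrigger) => WizardOutcome;

function createSetupEntryPoints(runWizard: WizardRunner) {
  return {
    // Called from session start; launches only when the triggering rules say so.
    onSessionStart(shouldAutoLaunch: boolean): WizardOutcome | undefined {
      return shouldAutoLaunch ? runWizard("first_run") : undefined;
    },
    // Called from `/voice setup`; always opens the same wizard.
    onVoiceSetupCommand(): WizardOutcome {
      return runWizard("manual");
    },
  };
}
```

Routing both paths through one runner keeps the automatic and manual flows from drifting apart.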
347
+
348
+ ## Success metrics / acceptance criteria
349
+
350
+ The UX is ready when:
351
+ - a first-time user can complete setup without reading README instructions
352
+ - the user is asked API vs local in plain language
353
+ - the user gets a recommendation with reasoning, not just a backend list
354
+ - setup clearly distinguishes global vs project scope
355
+ - the user can validate mic + transcription before completion
356
+ - any failed step offers retry, back, or fallback without losing progress
357
+ - `/voice setup` reopens the same polished flow
358
+ - the completion screen leaves the user knowing exactly how to start using voice
359
+
360
+ ## Phase-by-phase UX delivery plan
361
+
362
+ ### Phase 1: Product foundation
363
+ - Expand config/onboarding state model.
364
+ - Add first-run detection and recovery-banner logic.
365
+ - Define backend metadata needed for recommendation copy.
366
+
367
+ ### Phase 2: Wizard MVP
368
+ - Build welcome, mode choice, preference choice, recommendation, scope, and completion screens.
369
+ - Keep advanced settings behind a secondary branch.
370
+ - Reuse existing backend scan data.
371
+
372
+ ### Phase 3: Validation and recovery
373
+ - Add live mic/transcription test flow.
374
+ - Add resumable onboarding state.
375
+ - Add incomplete-setup widget/banner on session start.
376
+
377
+ ### Phase 4: Premium polish
378
+ - Better copy, badges, progress states, completion receipt, and clearer failure messages.
379
+ - Add `/voice doctor` and route broken states there.
380
+ - Tighten “recommended” logic based on observed machine readiness.
381
+
382
+ ## Deliverables from the UX track
383
+
384
+ 1. A multi-step first-run onboarding wizard spec.
385
+ 2. Plain-language decision tree for cloud vs local STT.
386
+ 3. Recommendation/copy matrix for backend and model suggestions.
387
+ 4. Recovery UX for missing dependencies, missing auth, failed recording, and failed transcription.
388
+ 5. Settings-scope UX that clearly supports both global and project-local configuration.