@codexstar/pi-listen 1.0.4

@@ -0,0 +1,613 @@
1
+ # pi-voice world-class onboarding master plan
2
+
3
+ ## Context
4
+
5
+ This master plan consolidates the current audit of `pi-voice` and the initial team-agent planning pass into a single end-to-end execution plan.
6
+
7
+ Goal: turn `pi-voice` into a polished Pi package with a premium first-run onboarding experience that asks the user how they want to use speech-to-text (STT):
8
+
9
+ - **Cloud API** vs **Local download**
10
+ - if local: **which backend/model**
11
+ - then provisions, validates, and leaves the user in a known-good state
12
+
13
+ Because Pi packages do not appear to expose a dedicated interactive install hook, the product should treat **first interactive `session_start` after install** as the onboarding moment.
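The gate described above could be expressed as a small pure function. This is a minimal sketch, not Pi's actual API: the `SessionInfo.interactive` flag, the `OnboardingState` shape, and the `decideStartupAction` name are all assumptions made for illustration.

```typescript
// Hypothetical shapes: the real Pi `session_start` payload may look different.
interface SessionInfo {
  interactive: boolean; // assumed flag: true when a user terminal is attached
}

interface OnboardingState {
  completed: boolean;
  skippedAt?: string; // ISO timestamp recorded when the user chose "Skip for now"
}

type StartupAction = "run-wizard" | "remind" | "none";

// Decide what to do at session start without ever blocking non-interactive runs.
function decideStartupAction(
  session: SessionInfo,
  onboarding: OnboardingState | undefined,
): StartupAction {
  if (!session.interactive) return "none"; // never prompt in scripts or CI
  if (onboarding?.completed) return "none"; // already set up
  if (onboarding?.skippedAt) return "remind"; // lightweight reminder, no wizard
  return "run-wizard"; // first interactive session after install
}
```

Keeping the decision free of side effects makes the first-run behavior easy to unit test before any UI exists.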
14
+
15
+ ---
16
+
17
+ ## Product outcome
18
+
19
+ A user should be able to:
20
+
21
+ 1. `pi install ...`
22
+ 2. start Pi normally
23
+ 3. get a guided terminal onboarding wizard automatically on first run
24
+ 4. choose **API** or **Local**
25
+ 5. get a recommendation with tradeoff explanations
26
+ 6. install or validate the required dependencies
27
+ 7. run a real mic + transcription test
28
+ 8. finish with a success receipt and confidence that voice is working
29
+
30
+ Re-entry should always be available via `/voice setup` or `/voice reconfigure`.
31
+
32
+ ---
33
+
34
+ ## Current-state problems to fix
35
+
36
+ ### UX
37
+ - No first-run onboarding; setup is hidden behind `/voice setup`
38
+ - Current setup is a thin backend/model picker, not a product workflow
39
+ - No API-vs-local decision at the top of the flow
40
+ - No recommendation engine, progress UI, recovery UX, or completion receipt
41
+ - No scope selection (global vs project)
42
+
43
+ ### Technical correctness
44
+ - Daemon can keep stale backend/model config across sessions
45
+ - Single hard-coded socket path risks cross-session/project collisions
46
+ - Setup writes only to global settings even though it reads both project and global settings
47
+ - Session startup eagerly tries to start the daemon before configuration is known-good
48
+
49
+ ### Packaging/productization
50
+ - No README or troubleshooting docs
51
+ - No backend matrix or install guide
52
+ - Minimal package metadata
53
+ - No onboarding state versioning or migration strategy
54
+
55
+ ---
56
+
57
+ ## Guiding principles
58
+
59
+ 1. **Correctness before polish** — runtime behavior must match saved settings every time
60
+ 2. **Onboarding is a product feature** — not an afterthought command
61
+ 3. **Automatic where safe, explicit where necessary**
62
+ 4. **Explain tradeoffs in user language** — privacy, speed, quality, setup effort
63
+ 5. **Recovery is part of UX** — failed setup must produce actionable next steps
64
+ 6. **Reconfiguration must be safe** — switching mode/model should be deterministic
65
+
66
+ ---
67
+
68
+ ## Target user journeys
69
+
70
+ ### Journey A — fastest setup
71
+ - User installs package
72
+ - First session starts
73
+ - Wizard recommends **API mode**
74
+ - Detects API key presence or guides user to add it
75
+ - Runs validation
76
+ - Finishes in under 2 minutes
77
+
78
+ ### Journey B — privacy/offline-first
79
+ - User installs package
80
+ - Wizard recommends **Local mode**
81
+ - Explains backend options and suggests a default model
82
+ - Installs SoX + Python/backend dependencies where possible
83
+ - Warms daemon/model
84
+ - Runs local transcription test
85
+
86
+ ### Journey C — project-specific configuration
87
+ - User wants different voice config in one repo
88
+ - Wizard offers **Use everywhere** or **Only in this project**
89
+ - Saves to the correct settings file
90
+ - Status and runtime behavior reflect the selected scope
91
+
92
+ ### Journey D — recovery
93
+ - Existing user has broken deps, stale daemon, or invalid config
94
+ - Startup detects mismatch
95
+ - User sees a non-destructive repair flow
96
+ - `/voice doctor` and `/voice reconfigure` provide guided recovery
97
+
98
+ ---
99
+
100
+ ## Milestone plan
101
+
102
+ ## Milestone 1 — runtime correctness foundation
103
+
104
+ ### Objective
105
+ Eliminate stale-config and lifecycle issues before adding a more ambitious UX layer.
106
+
107
+ ### Deliverables
108
+ - Runtime always uses the selected backend/model/language
109
+ - Daemon startup is gated by valid config
110
+ - Socket strategy is safer than one global hard-coded path
111
+ - Config persistence supports both global and project scopes
112
+
113
+ ### File changes
114
+
115
+ #### `extensions/voice.ts`
116
+ - Stop treating the current in-memory config as the source of truth
117
+ - Pass `backend`, `model`, and `language` in daemon transcription requests
118
+ - Add explicit runtime reload behavior when daemon config differs from desired config
119
+ - Remove unconditional eager warm-start from `session_start` until config readiness is known
120
+ - Split load/save logic into scope-aware helpers
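One way to make every transcription carry its own settings is to build the request from the desired selection each time. The sketch below assumes a newline-delimited JSON wire format and the field names `cmd` and `audioPath`; the daemon's real protocol is not shown in this plan and may differ.

```typescript
interface VoiceSelection {
  backend: string;
  model: string;
  language: string;
}

// Hypothetical wire format: one JSON object per line over the daemon socket.
interface TranscribeRequest extends VoiceSelection {
  cmd: "transcribe";
  audioPath: string;
}

// Serialize a request that always states which backend/model/language to use,
// so the daemon cannot silently fall back to whatever it loaded earlier.
function buildTranscribeRequest(selection: VoiceSelection, audioPath: string): string {
  const request: TranscribeRequest = { ...selection, cmd: "transcribe", audioPath };
  return JSON.stringify(request) + "\n";
}
```

With this shape, a daemon that was started under older settings can compare the request against its loaded state and reload or reject rather than transcribing with the wrong model.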
121
+
122
+ #### `daemon.py`
123
+ - Return richer daemon status including active backend/model and config fingerprint
124
+ - Support safe reload semantics if requested config changes
125
+ - Move away from one universal socket path toward a config-aware or user-scoped socket strategy
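Seen from the extension side, the three daemon changes above might look like the following sketch. The status fields, the fingerprint format, and the socket file naming are illustrative assumptions; `daemon.py` itself would need a matching implementation in Python.

```typescript
interface DesiredConfig {
  backend: string;
  model: string;
  language: string;
}

// Assumed status payload; field names are illustrative, not the daemon's actual protocol.
interface DaemonStatus {
  backend: string;
  model: string;
  fingerprint: string;
}

// Deterministic fingerprint: the same settings always yield the same string.
function configFingerprint(c: DesiredConfig): string {
  return [c.backend, c.model, c.language].join("|");
}

// A running daemon needs a reload only when its fingerprint differs from the desired one.
function needsReload(status: DaemonStatus | null, desired: DesiredConfig): boolean {
  if (status === null) return false; // nothing running, so a fresh start suffices
  return status.fingerprint !== configFingerprint(desired);
}

// User-scoped socket path; the directory and file naming are assumptions.
function socketPathFor(runtimeDir: string, uid: number): string {
  const dir = runtimeDir.endsWith("/") ? runtimeDir.slice(0, -1) : runtimeDir;
  return `${dir}/pi-voice-${uid}.sock`;
}
```

A plain joined string is used here instead of a cryptographic hash because the goal is only equality comparison; a hash would work equally well if the status payload should stay compact.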
126
+
127
+ #### `transcribe.py`
128
+ - Keep backend registry stable and machine-readable
129
+ - Ensure metadata needed by diagnostics/onboarding is available consistently
130
+
131
+ ### Acceptance criteria
132
+ - Switching backend/model in config always affects the next transcription
133
+ - Existing running daemon cannot silently keep old settings
134
+ - Project-level config can override global config cleanly
135
+ - No daemon starts on first run unless onboarding/config permits it
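The project-over-global override in the criteria above can be sketched as a merge that also reports where the effective settings came from. The `VoiceSettings` fields and the `resolveSettings` name are assumptions; the actual settings files and their keys are not specified in this plan.

```typescript
interface VoiceSettings {
  enabled?: boolean;
  backend?: string;
  model?: string;
  language?: string;
}

type Scope = "global" | "project";

interface ResolvedSettings {
  settings: VoiceSettings;
  source: Scope | "default";
}

// Project values win key by key; keys the project leaves unset fall through to global.
function resolveSettings(
  globalCfg: VoiceSettings | undefined,
  projectCfg: VoiceSettings | undefined,
): ResolvedSettings {
  if (projectCfg !== undefined) {
    return { settings: { ...globalCfg, ...projectCfg }, source: "project" };
  }
  if (globalCfg !== undefined) {
    return { settings: { ...globalCfg }, source: "global" };
  }
  return { settings: {}, source: "default" };
}
```

Returning the `source` alongside the merged values lets status output and the save path agree on which file a later write should target.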
136
+
137
+ ### Validation
138
+ - `bunx tsc -p tsconfig.json`
139
+ - `python3 -m py_compile daemon.py transcribe.py`
140
+ - Manual test: start daemon on one config, switch to another config, verify transcription uses the new backend/model
141
+
142
+ ---
143
+
144
+ ## Milestone 2 — config schema + onboarding state
145
+
146
+ ### Objective
147
+ Add a durable, versioned configuration model that supports migrations and first-run logic.
148
+
149
+ ### Deliverables
150
+ - Versioned voice config schema
151
+ - Explicit onboarding state model
152
+ - Migration path from current simple config
153
+ - Scope-aware persistence API
154
+
155
+ ### Proposed config shape
156
+
157
+ ```json
+ {
+   "voice": {
+     "version": 2,
+     "enabled": true,
+     "language": "en",
+     "mode": "local",
+     "backend": "faster-whisper",
+     "model": "small",
+     "scope": "global",
+     "btwEnabled": true,
+     "onboarding": {
+       "completed": true,
+       "schemaVersion": 2,
+       "completedAt": "2026-03-12T00:00:00.000Z",
+       "lastValidatedAt": "2026-03-12T00:00:00.000Z",
+       "source": "first-run"
+     },
+     "local": {
+       "autoInstall": true,
+       "warmDaemonOnStart": true
+     },
+     "cloud": {
+       "provider": "deepgram",
+       "apiKeyEnv": "DEEPGRAM_API_KEY"
+     }
+   }
+ }
+ ```
186
+
187
+ ### New modules
188
+ - `extensions/voice/config.ts`
189
+ - `extensions/voice/types.ts`
190
+
191
+ ### Responsibilities
192
+ - `loadConfigWithSource(cwd)`
193
+ - `saveConfig(config, scope, cwd)`
194
+ - `migrateLegacyConfig(config)`
195
+ - `needsOnboarding(config)`
196
+ - `needsRepair(config, diagnostics)`
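Two of these helpers are simple enough to sketch directly. The legacy shape below is a guess at the "current simple config" (unversioned, with optional `backend`, `model`, and `language`), and the defaults are taken from the example config above; both are assumptions, as is the rule that a `deepgram` backend implies API mode.

```typescript
type Mode = "api" | "local";

// Assumed legacy shape: unversioned and possibly partial.
interface LegacyConfig {
  enabled?: boolean;
  language?: string;
  backend?: string;
  model?: string;
}

interface VoiceConfigV2 {
  version: 2;
  enabled: boolean;
  language: string;
  mode: Mode;
  backend: string;
  model: string;
  onboarding: { completed: boolean; schemaVersion: number; source?: string };
}

const CURRENT_SCHEMA = 2;

// Upgrade a legacy config without losing the user's existing choices.
function migrateLegacyConfig(legacy: LegacyConfig): VoiceConfigV2 {
  const backend = legacy.backend ?? "faster-whisper"; // default is an assumption
  return {
    version: 2,
    enabled: legacy.enabled ?? true,
    language: legacy.language ?? "en",
    mode: backend === "deepgram" ? "api" : "local",
    backend,
    model: legacy.model ?? "small",
    // A legacy user who already picked a backend is treated as onboarded.
    onboarding: {
      completed: legacy.backend !== undefined,
      schemaVersion: CURRENT_SCHEMA,
      source: "migration",
    },
  };
}

// Onboarding is needed when there is no config, it never finished, or the schema is older.
function needsOnboarding(config: VoiceConfigV2 | undefined): boolean {
  if (config === undefined) return true;
  return !config.onboarding.completed || config.onboarding.schemaVersion < CURRENT_SCHEMA;
}
```

Treating an explicit legacy backend choice as completed onboarding is one possible policy; a stricter alternative would mark migrated users as needing a one-time validation pass instead.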
197
+
198
+ ### Acceptance criteria
199
+ - Legacy config continues to work
200
+ - New users get a complete state model
201
+ - Incomplete onboarding can be distinguished from valid onboarding
202
+ - Scope of settings is explicit and reversible
203
+
204
+ ---
205
+
206
+ ## Milestone 3 — diagnostics and recommendation engine
207
+
208
+ ### Objective
209
+ Create the intelligence layer that powers onboarding, repair, and doctor flows.
210
+
211
+ ### Deliverables
212
+ - Structured environment scan
213
+ - Recommendation engine
214
+ - Blocking vs fixable issue classification
215
+ - User-facing explanations for each recommendation
216
+
217
+ ### New module
218
+ - `extensions/voice/diagnostics.ts`
219
+
220
+ ### Inputs to scan
221
+ - `python3`
222
+ - SoX / `rec`
223
+ - Homebrew availability
224
+ - backend availability from `transcribe.py --list-backends`
225
+ - env vars like `DEEPGRAM_API_KEY`
226
+ - architecture / OS hints where useful
227
+ - daemon status
228
+
229
+ ### Outputs
230
+ - recommended mode (`api` | `local`)
231
+ - recommended backend/model
232
+ - blockers
233
+ - auto-fixable issues
234
+ - manual remediation instructions
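The split between blockers, auto-fixable issues, and manual remediation could be modeled as a classification over the scan result. This is a sketch under stated assumptions: the `EnvScan` fields are hypothetical, and the policy that a missing `python3` blocks local setup while a missing SoX is fixable is an illustrative choice, not a confirmed product decision.

```typescript
// Hypothetical scan result; a real scan would probe the system for each field.
interface EnvScan {
  hasPython3: boolean;
  hasSox: boolean;
  hasHomebrew: boolean;
}

type Severity = "blocker" | "auto-fixable" | "manual";

interface Issue {
  id: string;
  severity: Severity;
  remedy: string; // a command or instruction shown to the user
}

// Classify what stands between the current machine and a working local setup.
function classifyLocalIssues(scan: EnvScan): Issue[] {
  const issues: Issue[] = [];
  if (!scan.hasPython3) {
    issues.push({
      id: "python3-missing",
      severity: "blocker",
      remedy: "Install Python 3, then rerun setup.",
    });
  }
  if (!scan.hasSox) {
    const soxIssue: Issue = scan.hasHomebrew
      ? { id: "sox-missing", severity: "auto-fixable", remedy: "brew install sox" }
      : { id: "sox-missing", severity: "manual", remedy: "Install SoX with your system package manager." };
    issues.push(soxIssue);
  }
  return issues;
}
```

Because the function returns data rather than printing, the same result can feed the onboarding wizard, `/voice doctor`, and automated tests.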
235
+
236
+ ### UX requirements
237
+ The engine should produce explanations such as:
238
+ - “You want privacy and faster-whisper is already available, so local is recommended.”
239
+ - “You want the fastest setup and no local stack is installed, so Deepgram is recommended.”
240
+ - “SoX is missing; local setup can continue after installing it.”
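Explanations like these fall out naturally if the engine is an ordered list of rules, each paired with its reason. The sketch below is one possible rule order, not the plan's definitive logic; the `Facts` fields and the `Priority` values (borrowed from the wizard's "What matters most?" step) are assumptions.

```typescript
type Priority = "setup-speed" | "privacy" | "accuracy" | "low-resources";

// Hypothetical inputs distilled from the environment scan and the user's answers.
interface Facts {
  priority: Priority;
  localBackendAvailable: boolean;
  apiKeyPresent: boolean;
}

interface Recommendation {
  mode: "api" | "local";
  reason: string; // one-line explanation shown on the recommendation card
}

// Rules are checked top to bottom, so identical facts always give identical output.
function recommend(facts: Facts): Recommendation {
  if (facts.priority === "privacy") {
    return {
      mode: "local",
      reason: facts.localBackendAvailable
        ? "You want privacy and a local backend is already available, so local is recommended."
        : "You want privacy, so local is recommended; setup will install a local backend.",
    };
  }
  if (facts.localBackendAvailable && !facts.apiKeyPresent) {
    return {
      mode: "local",
      reason: "A local backend is already installed and no API key was found, so local is the shortest path.",
    };
  }
  return {
    mode: "api",
    reason: facts.apiKeyPresent
      ? "An API key is already configured, so API mode is the fastest setup."
      : "No local stack is installed, so API mode is recommended once an API key is added.",
  };
}
```

Because each branch returns its own reason, the recommendation stays explainable in a single line without a separate explanation layer.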
241
+
242
+ ### Acceptance criteria
243
+ - Recommendation logic is deterministic
244
+ - Recommendation is explainable in one or two lines
245
+ - All blocking issues are surfaced before provisioning starts
246
+
247
+ ---
248
+
249
+ ## Milestone 4 — premium onboarding wizard
250
+
251
+ ### Objective
252
+ Replace today’s setup picker with a full first-run guided terminal flow.
253
+
254
+ ### Deliverables
255
+ - Automatic first-run onboarding on interactive session start
256
+ - Re-runnable setup via `/voice setup` and `/voice reconfigure`
257
+ - Multi-step keyboard-first TUI
258
+ - Summary + completion receipt
259
+
260
+ ### New modules
261
+ - `extensions/voice/onboarding.ts`
262
+ - `extensions/voice/ui.ts`
263
+
264
+ ### UX flow
265
+
266
+ #### Step 1 — Welcome
267
+ - What pi-voice does
268
+ - What the setup will configure
269
+ - Estimated time
270
+ - Option to continue or skip for now
271
+
272
+ #### Step 2 — How do you want to use STT?
273
+ - **API mode**: fastest, cloud-based, API key required
274
+ - **Local mode** (download): local/private, more setup, works offline
275
+ - **Recommended**: decide for me
276
+
277
+ #### Step 3 — What matters most?
278
+ - setup speed
279
+ - privacy/offline use
280
+ - best accuracy
281
+ - low resource usage
282
+
283
+ #### Step 4 — Recommendation card
284
+ - recommended mode/backend/model
285
+ - why it is recommended
286
+ - alternatives and tradeoffs
287
+
288
+ #### Step 5A — API branch
289
+ - choose provider
290
+ - detect API key or prompt for setup path
291
+ - choose model
292
+ - explain privacy/cost implications
293
+
294
+ #### Step 5B — Local branch
295
+ - choose backend
296
+ - choose model size/profile
297
+ - explain speed/accuracy/setup implications
298
+ - show expected dependencies
299
+
300
+ #### Step 6 — Scope
301
+ - **Use everywhere**
302
+ - **Only in this project**
303
+
304
+ #### Step 7 — Confirm summary
305
+ - mode
306
+ - backend/model
307
+ - scope
308
+ - dependencies to be installed or validated
309
+
310
+ #### Step 8 — Provision + progress
311
+ - install/prepare dependencies
312
+ - show progress + live status
313
+ - allow graceful cancel
314
+
315
+ #### Step 9 — Validation
316
+ - mic check
317
+ - sample recording
318
+ - sample transcription
319
+ - show actual transcript result
320
+
321
+ #### Step 10 — Completion receipt
322
+ - success state
323
+ - selected config summary
324
+ - shortcuts
325
+ - commands: `/voice doctor`, `/voice reconfigure`
326
+
327
+ ### Behavior requirements
328
+ - support Back, Cancel, Skip for now
329
+ - if skipped, do not hard-block every session; use a lightweight reminder state
330
+ - non-interactive mode must skip wizard safely
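Back, Cancel, and Skip are easier to get right when navigation is a pure state transition that the UI merely renders. The following is a minimal sketch; the step identifiers and the `advance` function are hypothetical names that loosely mirror the ten steps listed above.

```typescript
// Hypothetical step identifiers, in the order of the UX flow.
const STEPS = [
  "welcome", "mode", "priorities", "recommendation", "details",
  "scope", "summary", "provision", "validation", "receipt",
] as const;

type Step = (typeof STEPS)[number];
type Action = "next" | "back" | "cancel" | "skip";

type WizardState =
  | { kind: "active"; step: Step }
  | { kind: "cancelled" }
  | { kind: "skipped" } // user chose "Skip for now"; remind later, do not block
  | { kind: "done" };

// Pure transition function: the current state plus an action yields the next state.
function advance(state: WizardState, action: Action): WizardState {
  if (state.kind !== "active") return state; // terminal states stay terminal
  if (action === "cancel") return { kind: "cancelled" };
  if (action === "skip") return { kind: "skipped" };
  const index = STEPS.indexOf(state.step);
  if (action === "back") {
    // Back on the first step stays on the first step.
    return { kind: "active", step: STEPS[Math.max(0, index - 1)] ?? state.step };
  }
  if (index === STEPS.length - 1) return { kind: "done" };
  return { kind: "active", step: STEPS[index + 1] ?? state.step };
}
```

Keeping `skipped` distinct from `cancelled` gives the startup logic enough information to show a reminder for one and stay silent for the other.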
331
+
332
+ ### Acceptance criteria
333
+ - Fresh install triggers onboarding automatically on first interactive session
334
+ - User is explicitly asked **API or Local** before backend details
335
+ - User can complete setup without prior knowledge of STT backends
336
+ - Successful path ends with real verification, not just config save
337
+
338
+ ---
339
+
340
+ ## Milestone 5 — provisioning and repair engine
341
+
342
+ ### Objective
343
+ Make the selected onboarding path actually succeed.
344
+
345
+ ### Deliverables
346
+ - Automatic dependency provisioning where safe
347
+ - Robust failure capture and recovery guidance
348
+ - Repair mode for broken existing setups
349
+
350
+ ### New module
351
+ - `extensions/voice/install.ts`
352
+
353
+ ### API path requirements
354
+ - detect `DEEPGRAM_API_KEY`
355
+ - validate provider availability
356
+ - fail with clear next-step guidance if key is absent or invalid
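Key detection can be kept testable by passing the environment in as a plain record. This sketch checks presence only; confirming that a key is actually valid would require a request to the provider, which is not shown. The `checkApiKey` name and the wording of the guidance are assumptions.

```typescript
type KeyCheck =
  | { ok: true; envVar: string }
  | { ok: false; envVar: string; nextStep: string };

// Check that the named environment variable is set to a non-blank value.
function checkApiKey(
  env: Record<string, string | undefined>,
  envVar: string = "DEEPGRAM_API_KEY",
): KeyCheck {
  const value = env[envVar];
  if (value !== undefined && value.trim() !== "") {
    return { ok: true, envVar };
  }
  return {
    ok: false,
    envVar,
    nextStep: `Set ${envVar} in your shell profile, then run /voice setup again.`,
  };
}
```

In the extension, this would typically be called with `process.env`; the result never contains the key itself, so it is safe to log or display.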
357
+
358
+ ### Local path requirements
359
+ - detect/install SoX
360
+ - detect/install backend-specific Python packages or Homebrew dependencies
361
+ - warm or load the selected model/backend
362
+ - surface stderr in a clean error screen if install fails
363
+
364
+ ### Safety rules
365
+ - automatic install should be opt-in or clearly confirmed
366
+ - if auto-install is not allowed, present exact commands and explain what each does
367
+ - installation failures should not leave ambiguous config behind
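The last rule suggests ordering the work so that config is written only after every provisioning step succeeds. Below is a minimal synchronous sketch of that ordering; the `ProvisionStep` shape and `provisionThenSave` name are hypothetical, and a real implementation would likely be asynchronous and stream progress.

```typescript
interface StepResult {
  ok: boolean;
  stderr: string;
}

interface ProvisionStep {
  label: string;
  command: string; // the exact command, also shown to users who decline auto-install
  run: () => StepResult;
}

type Outcome =
  | { status: "saved" }
  | { status: "failed"; failedStep: string; stderr: string; manualCommand: string };

// Run steps in order and persist config only if all of them succeed.
function provisionThenSave(steps: ProvisionStep[], save: () => void): Outcome {
  for (const step of steps) {
    const result = step.run();
    if (!result.ok) {
      // Stop before saving so a failed install never leaves half-written config.
      return {
        status: "failed",
        failedStep: step.label,
        stderr: result.stderr,
        manualCommand: step.command,
      };
    }
  }
  save();
  return { status: "saved" };
}
```

The failure outcome carries the failing command and its stderr, which is what an error screen needs to offer a manual fallback.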
368
+
369
+ ### Repair flow
370
+ Add `/voice doctor` with:
371
+ - diagnostics summary
372
+ - fix suggestions
373
+ - rerun test flow
374
+ - optional entry into reconfigure flow
375
+
376
+ ### Acceptance criteria
377
+ - API happy path works end-to-end
378
+ - Local happy path works end-to-end
379
+ - Failure states provide precise remediation
380
+ - Repair flow can recover from at least one broken dependency scenario
381
+
382
+ ---
383
+
384
+ ## Milestone 6 — daily runtime polish
385
+
386
+ ### Objective
387
+ Make the post-onboarding experience feel premium, not just functional.
388
+
389
+ ### Deliverables
390
+ - polished status line behavior
391
+ - better notifications
392
+ - improved test and doctor output
393
+ - clearer voice runtime observability
394
+
395
+ ### Improvements
396
+ - Replace bare `MIC`, `REC`, `STT...` with more informative statuses where helpful
397
+ - Optionally show configured mode/backend in status or info flow
398
+ - Improve first-success notification after the user completes setup
399
+ - Refine `/voice info`, `/voice test`, `/voice doctor`, `/voice daemon status`
400
+ - Persist last-known-good validation timestamp
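As a small illustration of the first two improvements, a status label could combine the runtime state with the configured mode and backend. The state names and label wording here are placeholders, not the extension's actual strings.

```typescript
type VoiceState = "idle" | "recording" | "transcribing";

// Build a status-line label that says what is happening and which backend is in use.
function statusLabel(state: VoiceState, mode: "api" | "local", backend: string): string {
  const base: Record<VoiceState, string> = {
    idle: "MIC ready",
    recording: "REC",
    transcribing: "STT",
  };
  return `${base[state]} (${mode}: ${backend})`;
}
```

For example, `statusLabel("recording", "local", "faster-whisper")` yields `REC (local: faster-whisper)`.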
401
+
402
+ ### Acceptance criteria
403
+ - User can understand current voice state quickly
404
+ - Runtime messages feel calm and professional
405
+ - Debugging common issues does not require code reading
406
+
407
+ ---
408
+
409
+ ## Milestone 7 — package hardening and docs
410
+
411
+ ### Objective
412
+ Make the package installable, understandable, and supportable like a serious product.
413
+
414
+ ### Deliverables
415
+ - `README.md`
416
+ - backend matrix doc
417
+ - troubleshooting doc
418
+ - improved package metadata
419
+ - release checklist
420
+
421
+ ### Files
422
+ - `README.md`
423
+ - `docs/backends.md`
424
+ - `docs/troubleshooting.md`
425
+ - `package.json`
426
+
427
+ ### README contents
428
+ - what pi-voice does
429
+ - what happens on first run
430
+ - cloud vs local comparison
431
+ - supported backends/models
432
+ - shortcuts and commands
433
+ - troubleshooting starter guide
434
+
435
+ ### `package.json` improvements
436
+ - richer description
437
+ - homepage/repository/bugs metadata if available
438
+ - add docs to `files`
439
+ - add scripts for typecheck and Python smoke checks
440
+ - add peer dependency declarations for Pi runtime packages if appropriate for publishing
441
+
442
+ ### Acceptance criteria
443
+ - New user can understand the product from README alone
444
+ - Package metadata is good enough for distribution and discovery
445
+ - Troubleshooting docs cover at least the common install/runtime failures
446
+
447
+ ---
448
+
449
+ ## Milestone 8 — QA, migration, and release
450
+
451
+ ### Objective
452
+ Ship safely and prove the experience works across common scenarios.
453
+
454
+ ### Deliverables
455
+ - automated smoke checks
456
+ - manual QA matrix
457
+ - onboarding migration behavior
458
+ - release readiness checklist
459
+
460
+ ### Automated checks
461
+ - `bunx tsc -p tsconfig.json`
462
+ - `python3 -m py_compile daemon.py transcribe.py`
463
+ - TypeScript tests for config migration and recommendation logic
464
+ - Python tests or smoke scripts for daemon contract and backend metadata
465
+
466
+ ### Manual QA matrix
467
+ 1. fresh install, no config, no backends
468
+ 2. fresh install, local backend already present
469
+ 3. fresh install, API key present
470
+ 4. fresh install, API key absent
471
+ 5. local install path with missing SoX
472
+ 6. project-scope config
473
+ 7. global-scope config
474
+ 8. stale daemon scenario
475
+ 9. reconfigure from API -> Local
476
+ 10. reconfigure from Local -> API
477
+ 11. skipped onboarding -> later resume
478
+ 12. non-interactive session start
479
+
480
+ ### Definition of done
481
+ - First-run onboarding is automatic in interactive mode
482
+ - API vs Local decision is the top-level onboarding choice
483
+ - At least one API happy path and one Local happy path work end-to-end
484
+ - Existing legacy users migrate safely
485
+ - Runtime uses the saved config deterministically
486
+ - Docs and package metadata are production-ready
487
+
488
+ ---
489
+
490
+ ## Work breakdown by file
491
+
492
+ ### `extensions/voice.ts`
493
+ - shrink to orchestration/registration layer
494
+ - startup gating
495
+ - command registration
496
+ - shortcut wiring
497
+
498
+ ### `extensions/voice/config.ts`
499
+ - schema
500
+ - migrations
501
+ - scope-aware load/save
502
+ - onboarding state helpers
503
+
504
+ ### `extensions/voice/types.ts`
505
+ - shared types for config, diagnostics, recommendations, onboarding steps
506
+
507
+ ### `extensions/voice/diagnostics.ts`
508
+ - environment scan
509
+ - recommendation engine
510
+ - doctor report data
511
+
512
+ ### `extensions/voice/onboarding.ts`
513
+ - wizard controller
514
+ - first-run flow
515
+ - reconfigure flow
516
+
517
+ ### `extensions/voice/ui.ts`
518
+ - reusable TUI components and rendering helpers
519
+ - summary cards, progress screens, completion receipt
520
+
521
+ ### `extensions/voice/install.ts`
522
+ - provisioning actions
523
+ - command execution wrappers
524
+ - result parsing
525
+
526
+ ### `extensions/voice/runtime.ts`
527
+ - daemon communication
528
+ - recording/transcription logic
529
+ - runtime correctness helpers
530
+
531
+ ### `extensions/voice/btw.ts`
532
+ - extract BTW feature for separation of concerns
533
+
534
+ ### `daemon.py`
535
+ - explicit config consistency
536
+ - status improvements
537
+ - safer socket strategy
538
+
539
+ ### `transcribe.py`
540
+ - richer backend metadata
541
+ - stable diagnostics contract
542
+
543
+ ### `README.md`, `docs/backends.md`, `docs/troubleshooting.md`
544
+ - user-facing package documentation
545
+
546
+ ---
547
+
548
+ ## Execution priority
549
+
550
+ ### P0 — critical
551
+ 1. Runtime correctness and daemon staleness fixes
552
+ 2. Scope-aware config persistence
553
+ 3. Versioned config + onboarding state
554
+ 4. Diagnostics foundation
555
+ 5. First-run onboarding gate
556
+
557
+ ### P1 — major
558
+ 6. Full onboarding wizard
559
+ 7. Provisioning engine
560
+ 8. `/voice doctor` + repair flow
561
+ 9. Package docs and metadata
562
+
563
+ ### P2 — polish
564
+ 10. Runtime polish and premium messaging
565
+ 11. Gallery/demo assets
566
+ 12. Richer backend recommendation heuristics
567
+
568
+ ---
569
+
570
+ ## Risks and mitigations
571
+
572
+ ### Risk: install automation becomes too platform-specific
573
+ **Mitigation:** keep provisioning modular, detect capabilities first, auto-install only where safe, always provide manual fallback commands.
574
+
575
+ ### Risk: onboarding becomes too invasive
576
+ **Mitigation:** run only for first interactive session or invalid config, support skip/remind later, never block non-interactive usage.
577
+
578
+ ### Risk: daemon complexity increases during refactor
579
+ **Mitigation:** ship Milestone 1 correctness improvements before UI expansion, and keep daemon protocol explicit.
580
+
581
+ ### Risk: local backend ecosystem is inconsistent
582
+ **Mitigation:** use recommendation logic to steer users toward known-good defaults and document tradeoffs clearly.
583
+
584
+ ---
585
+
586
+ ## Suggested shipping order
587
+
588
+ ### Release 1
589
+ - correctness fixes
590
+ - config schema + migration
591
+ - diagnostics foundation
592
+ - first-run onboarding skeleton with API vs Local choice
593
+
594
+ ### Release 2
595
+ - full wizard
596
+ - provisioning engine
597
+ - validation flow
598
+ - `/voice doctor`
599
+
600
+ ### Release 3
601
+ - docs hardening
602
+ - premium UI polish
603
+ - rollout tuning and edge-case cleanup
604
+
605
+ ---
606
+
607
+ ## Final recommendation
608
+
609
+ The right sequence is:
610
+
611
+ **correctness -> config/migration -> diagnostics -> onboarding -> provisioning -> validation -> docs -> polish**
612
+
613
+ If executed in that order, `pi-voice` can move from a promising expert-only extension to a world-class, enterprise-grade Pi package with a first-run experience that feels intentional, reliable, and premium.