@codexstar/pi-listen 1.0.4

# pi-voice QA results

This document records the interactive/manual QA evidence gathered after the onboarding overhaul and the follow-up model-aware onboarding work.

## Verification baseline

Fresh automated verification completed successfully before and after manual QA checks:

```sh
bun run check
```

That run currently covers:
- config migration / scope tests
- onboarding fallback / finalization tests
- diagnostics recommendation tests
- provisioning-plan tests
- model-detection metadata tests
- model-aware recommendation and labeling tests
- model-readiness confidence tests (`installed` / `download required` / `unknown` / `api`)
- TypeScript compilation
- Python compilation

## Manual / RPC-assisted checks completed

The checks below were run against the real Pi RPC mode and the current `extensions/voice.ts` implementation.

### 1. Fresh install prompts for onboarding on startup
**Result:** pass

Observed startup UI requests for a clean HOME / clean cwd:

```json
{
  "case": "fresh-first-run",
  "titles": [
    "setStatus",
    "Set up pi-voice now?"
  ],
  "selectTitle": "Set up pi-voice now?",
  "options": [
    "Start voice setup",
    "Remind me later"
  ]
}
```

### 2. Partial legacy config still re-enters onboarding
**Result:** pass

Seeded legacy config:

```json
{ "voice": { "enabled": true } }
```

Observed startup prompt:

```json
{
  "case": "partial-legacy",
  "selectTitle": "Set up pi-voice now?",
  "options": [
    "Start voice setup",
    "Remind me later"
  ]
}
```

This confirms partial legacy config no longer counts as fully onboarded.

### 3. Remind-me-later suppresses startup prompt and `/voice reconfigure` reopens setup
**Result:** pass

Seeded config with recent `onboarding.skippedAt` and observed no startup onboarding prompt before sending `/voice reconfigure`.

Observed UI requests:

```json
{
  "case": "skipped-then-reconfigure",
  "titles": [
    "setStatus",
    "How do you want to use speech-to-text?"
  ]
}
```

This confirms the defer window suppresses startup prompting while keeping reconfiguration available.

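The defer behavior verified above implies a simple time-window check. A minimal sketch follows; the field names beyond `onboarding.skippedAt`, the helper name, and the 24-hour window are assumptions for illustration, not the extension's actual code:

```typescript
// Hypothetical defer-window check, assuming a config shape like
// { onboarding: { skippedAt?: string } }. The 24-hour window is an assumed value.
const DEFER_WINDOW_MS = 24 * 60 * 60 * 1000;

interface OnboardingState {
  completed?: boolean;
  skippedAt?: string; // ISO timestamp written when the user picks "Remind me later"
}

function shouldPromptOnStartup(state: OnboardingState, now: Date = new Date()): boolean {
  if (state.completed) return false;
  if (state.skippedAt) {
    const skipped = Date.parse(state.skippedAt);
    // Suppress the startup prompt while the defer window is still open.
    if (!Number.isNaN(skipped) && now.getTime() - skipped < DEFER_WINDOW_MS) {
      return false;
    }
  }
  return true;
}
```

`/voice reconfigure` would bypass such a check entirely, which matches the observed behavior.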
### 4. Reconfigure flow saves project-scoped config
**Result:** pass

Executed `/voice reconfigure` in RPC mode and selected:
- Cloud API
- Deepgram
- `nova-3`
- Project scope

Observed notify + saved config:

```json
{
  "case": "project-scope-save",
  "notify": "Voice setup saved, but action is still required.\nMode: Cloud API\nBackend: deepgram\nModel: nova-3\nScope: project\n...",
  "savedVoice": {
    "version": 2,
    "enabled": true,
    "language": "en",
    "mode": "api",
    "backend": "deepgram",
    "model": "nova-3",
    "scope": "project",
    "btwEnabled": true,
    "onboarding": {
      "completed": false,
      "schemaVersion": 2,
      "source": "repair"
    }
  }
}
```

This confirms:
- reconfigure launches the onboarding flow
- project scope writes to `.pi/settings.json`
- incomplete provisioning/validation leaves onboarding in repair-needed state rather than falsely complete

### 5. Local path still exposes backend choices with install hints
**Result:** pass

Observed local backend selection options:

```json
{
  "case": "local-fallback-options",
  "backendOptions": [
    "faster-whisper — available",
    "moonshine — pip install useful-moonshine[onnx]",
    "whisper-cpp — brew install whisper-cpp",
    "parakeet — pip install nemo_toolkit[asr]"
  ]
}
```

This confirms the local flow does not dead-end when only some backends are installed and still surfaces install guidance.

### 6. `/voice doctor` separates current repair from recommended alternative
**Result:** pass

Observed output:

```json
{
  "case": "voice-doctor",
  "notify": "Voice doctor:\n python3: OK\n sox/rec: missing\n brew: OK\n deepgram key: missing\n available backends: faster-whisper (local)\n\nCurrent config: api/deepgram/nova-3\nRepair current setup:\n - brew install sox\n - Set DEEPGRAM_API_KEY before using Deepgram API mode\n\nRecommended alternative: local/faster-whisper/small\nWhy: Recommended local default with good balance of quality and setup effort.\nFixable issues:\n - Install SoX for microphone recording\nSuggested commands for the recommendation:\n - brew install sox"
}
```

This confirms doctor output now distinguishes:
- how to repair the current saved config
- what the system recommends instead

### 7. Model-aware backend metadata is now exposed from `transcribe.py`
**Result:** pass

Observed `--list-backends` metadata snippet:

```json
[
  {
    "name": "faster-whisper",
    "installed_models": [],
    "install_detection": "huggingface-cache"
  },
  {
    "name": "moonshine",
    "installed_models": [],
    "install_detection": "moonshine-cache-heuristic"
  },
  {
    "name": "whisper-cpp",
    "installed_models": [],
    "install_detection": "whisper-cpp-model-paths"
  },
  {
    "name": "deepgram",
    "installed_models": [],
    "install_detection": "api-key"
  },
  {
    "name": "parakeet",
    "installed_models": [],
    "install_detection": "huggingface-or-nemo-cache"
  }
]
```

This confirms the backend scan now exposes machine-readable model-readiness metadata, even when no models are currently detected on disk.

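The readiness states covered in the verification baseline (`installed` / `download required` / `unknown` / `api`) can be derived from metadata of this shape. The sketch below is an assumption about the mapping, not the extension's actual implementation; only the `BackendMeta` shape mirrors the JSON above:

```typescript
// Sketch: map --list-backends metadata plus a selected model to a readiness
// label. The classification rules here are assumptions for illustration.
interface BackendMeta {
  name: string;
  installed_models: string[];
  install_detection: string;
}

type ModelStatus = "installed" | "download required" | "unknown" | "api";

function modelStatus(meta: BackendMeta, model: string): ModelStatus {
  // Cloud backends have no local model cache; readiness hinges on the API key.
  if (meta.install_detection === "api-key") return "api";
  if (meta.installed_models.includes(model)) return "installed";
  // Heuristic detectors cannot confirm absence with confidence.
  if (meta.install_detection.includes("heuristic")) return "unknown";
  return "download required";
}
```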
### 8. Local reconfigure path stays model-aware even when no local model is currently cached
**Result:** pass

Observed reconfigure options in the current environment:

```json
{
  "backendOptions": [
    "faster-whisper — backend ready",
    "moonshine — pip install useful-moonshine[onnx]",
    "whisper-cpp — brew install whisper-cpp",
    "parakeet — pip install nemo_toolkit[asr]"
  ],
  "modelOptions": [
    "tiny",
    "tiny.en",
    "base",
    "base.en",
    "small (recommended)",
    "small.en",
    "medium",
    "medium.en",
    "large-v3",
    "large-v3-turbo",
    "distil-small.en",
    "distil-medium.en",
    "distil-large-v3"
  ]
}
```

This confirms:
- model-aware metadata does not break the local onboarding path when no cached models are found
- the user still gets a sensible backend list and model selection flow
- the recommended model remains visible even without an installed-model hit

### 9. Model-aware logic covers both installed-model and missing-model paths
**Result:** pass (automated coverage), release-machine confirmation still recommended

Automated model-aware checks currently cover:
- installed-model recommendation preference
- installed-model labeling in onboarding
- model readiness states for local vs cloud backends
- provisioning behavior when backend is available but selected model is not installed
- transcribe metadata contract including `installed_models` and `install_detection`

This gives regression safety for the model-aware phase even though the current QA machine does not have local STT model caches populated.

### 10. `/voice info` surfaces current model readiness and scope details
**Result:** pass

Observed output:

```text
Voice config:
  enabled: true
  mode: local
  scope: global
  backend: faster-whisper
  model: small
  model status: download required
  language: en
  state: idle
  setup: complete (setup-command)
  socket: /tmp/.../pi-voice-<hash>.sock
  daemon: stopped
```

This confirms `info` now reports:
- selected mode/backend/model
- model readiness state
- scope
- config-scoped socket path
- setup state

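The `pi-voice-<hash>.sock` path shown above suggests the socket name is derived from a stable hash so that different configs get different sockets. One way this could be done is sketched below; the hash input, truncation length, and filename pattern are all assumptions, not the extension's actual derivation:

```typescript
// Hypothetical sketch: derive a per-config socket path like
// /tmp/.../pi-voice-<hash>.sock by hashing the resolved config location,
// so sockets do not collide across projects.
import { createHash } from "node:crypto";
import { tmpdir } from "node:os";
import { join } from "node:path";

function socketPathFor(configPath: string): string {
  // Truncated sha256 gives a short, stable, filename-safe identifier.
  const hash = createHash("sha256").update(configPath).digest("hex").slice(0, 12);
  return join(tmpdir(), `pi-voice-${hash}.sock`);
}
```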
### 11. `/voice backends` surfaces model-aware backend summaries
**Result:** pass

Observed output:

```text
Backends:
  + faster-whisper  local  no confirmed installed models
      detection: huggingface-cache
  - moonshine       local  install: pip install useful-moonshine[onnx]
      detection: moonshine-cache-heuristic
  - whisper-cpp     local  install: brew install whisper-cpp
      detection: whisper-cpp-model-paths
  - deepgram        cloud  needs setup: Set DEEPGRAM_API_KEY env var (free: deepgram.com)
      detection: api-key
  - parakeet        local  install: pip install nemo_toolkit[asr]
      detection: huggingface-or-nemo-cache
```

This confirms `backends` now distinguishes:
- installed models when present
- no confirmed installed models
- API readiness / setup-needed cloud wording
- install detection source hints

### 12. `/voice test` reports current model status plus missing-model guidance
**Result:** pass

Observed output:

```text
Voice test:
  mode: local
  backend: faster-whisper
  model: small
  model status: download required
  language: en
  onboarding: complete
  python3: OK
  sox/rec: missing
  daemon: not running

Suggested commands:
  - brew install sox

Manual steps:
  - Selected model small is not installed yet and may need to be downloaded on first use
```

This confirms `test` now combines:
- current model readiness
- recording dependency status
- targeted install/manual guidance for the selected model path

## Remaining target-machine checks

The highest-value interactive checks have been exercised. Remaining follow-up checks are mainly environment- or hardware-specific, especially for the new model-aware behavior.

### Strongly recommended on the target release machine
- real microphone capture with user audio input
- real Deepgram end-to-end API validation with a valid key
- local STT success path on a machine with SoX installed and the desired backend available
- at least one path where a local model is **already cached** so onboarding can display an explicit installed-model path (`already installed`, `ready now`, or equivalent)
- at least one path where a backend is installed but the selected model is **not** cached, to confirm the download-required messaging is clear
- `/voice info` and `/voice test` on a machine with real model caches, so the displayed model status can be compared against actual local assets

### Why these are still worth running
The current environment used for QA did not have local STT model caches populated, so the model-aware logic was verified through:
- automated tests
- backend metadata checks
- RPC-assisted onboarding checks

That is sufficient for regression safety and release evidence, but a final pass on a machine with real cached local models would provide the strongest product validation for the installed-model UX.

## Release signoff

Based on:
- `bun run check`
- the manual/RPC-assisted onboarding and repair checks above
- the model-detection metadata and model-aware onboarding checks documented here

`pi-voice` now satisfies the current release-hardening checklist for the onboarding overhaul and the initial model-aware onboarding upgrade, with the remaining checks clearly isolated to target-machine / real-audio validation.
# pi-voice troubleshooting

This guide focuses on the current `pi-voice` behavior and the most likely setup/runtime issues.

## First things to check

Run these built-in commands first:

- `/voice info` — shows the config the extension currently treats as active
- `/voice test` — checks SoX, daemon state, and current model readiness
- `/voice backends` — lists detected STT backends, installed models, and install hints
- `/voice doctor` — compares how to repair the current config vs a recommended alternative
- `/voice daemon status` — shows the current daemon backend/model state
- `/voice setup` — re-run backend/model selection

If you only do one thing, start with `/voice test`.

## Symptom: "Voice requires SoX. Install: brew install sox"

### What it means
`pi-voice` could not find the `rec` command used for audio recording.

### Fix
Install SoX:

```sh
brew install sox
```

Then restart Pi or run `/voice test` again.

### Why this matters
Without SoX, the extension cannot record microphone input, even if the transcription backend itself is installed correctly.

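The missing-`rec` detection amounts to checking whether an executable exists on `PATH`, like `command -v rec` in a shell. A small sketch of that lookup follows; the helper name is hypothetical and the real extension may detect SoX differently:

```typescript
// Sketch: scan PATH directories for an executable (e.g. SoX's `rec`).
// Not the extension's actual detection code.
import { accessSync, constants } from "node:fs";
import { delimiter, join } from "node:path";

function findOnPath(cmd: string, pathVar: string = process.env.PATH ?? ""): string | null {
  for (const dir of pathVar.split(delimiter)) {
    if (!dir) continue;
    const candidate = join(dir, cmd);
    try {
      accessSync(candidate, constants.X_OK); // executable by the current user
      return candidate;
    } catch {
      // not present in this directory; keep scanning
    }
  }
  return null;
}
```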
## Symptom: `/voice backends` shows everything as unavailable

### What it means
No STT backend is currently detected.

### Common fixes
Choose one path:

#### Local default path
```sh
python3 -m pip install faster-whisper
```

#### Lightweight local path
```sh
python3 -m pip install 'useful-moonshine[onnx]'
```

#### whisper.cpp path
```sh
brew install whisper-cpp
```

#### Cloud path
Set a Deepgram API key in your shell environment:

```sh
export DEEPGRAM_API_KEY=your_key_here
```

Then restart Pi so the environment is visible to the extension.

## Symptom: `/voice test` says `SoX (rec): OK` but `Daemon: not running`

### What it means
The warm daemon is not currently running. This is not always fatal because `pi-voice` can still fall back to direct transcription subprocesses.

### Fix
Start it manually:

```text
/voice daemon start
```

Then inspect it:

```text
/voice daemon status
```

### If it still will not start
Check Python availability and backend installation:

```sh
python3 --version
python3 transcribe.py --list-backends
```

## Symptom: `/voice daemon status` shows the wrong backend or model

### What it means
The running daemon does not match the config you expect.

Recent work in this repo is moving toward config-specific sockets and more explicit backend/model requests, but if you still see mismatch behavior, treat it as a runtime desynchronization issue.

### Fixes
1. Re-run setup:
   ```text
   /voice setup
   ```
2. Stop the daemon:
   ```text
   /voice daemon stop
   ```
3. Start it again:
   ```text
   /voice daemon start
   ```
4. Re-check:
   ```text
   /voice daemon status
   ```

## Symptom: recording starts, but transcription is empty or says "No speech detected"

### Likely causes
- recording was too short
- microphone input level is too low
- background noise or device permissions interfered
- the backend is installed but not functioning correctly for the chosen model

### Fixes
- hold the record key a bit longer
- try `/voice test` first to validate microphone capture
- confirm the recorded sample file is not empty
- switch to a more conservative model/backend through `/voice setup`

## Symptom: cloud setup is selected, but transcription still fails

### Likely causes
- `DEEPGRAM_API_KEY` is missing or invalid
- Pi was launched before the shell environment contained the key
- network access is blocked or failing

### Fixes
1. Verify the environment variable exists in the shell that launches Pi:
   ```sh
   echo $DEEPGRAM_API_KEY
   ```
2. Restart Pi after setting the variable.
3. Confirm the backend is detected:
   ```sh
   python3 transcribe.py --list-backends
   ```
4. Re-run `/voice setup` if needed.

## Symptom: backend is installed, but the selected model is still reported as missing

### What it means
`pi-voice` can see the backend package or CLI, but it cannot find the specific model you selected in any local cache.

### Typical examples
- `faster-whisper` installed, but `medium` or `large-v3-turbo` not cached yet
- `whisper-cpp` installed, but no `ggml-<model>.bin` file found
- backend available, but onboarding marks the selected model as **download required**

### Fixes
- choose an **installed** model in onboarding if one is already available
- keep the current model and allow first use to download it if that is acceptable
- use `/voice backends` to inspect installed-model hints
- use `/voice doctor` to compare your current setup with a recommended alternative

## Symptom: backend is installed, but model status is unknown

### What it means
The backend package exists, but `pi-voice` cannot verify local model presence with high confidence for that backend.

This is a conservative result, not necessarily an error.

### Fixes
- try the chosen model anyway if you expect it to already exist
- use `/voice test` and `/voice doctor` to see whether repair is still needed
- if you want a more deterministic local path, prefer a backend with stronger model detection, such as `faster-whisper` or `whisper-cpp`

## Symptom: local backend selected, but transcription is slow

### What it means
The chosen local model may be too heavy for the current machine or use case.

### Fixes
- switch to a smaller model (`small`, `small.en`, or backend default)
- prefer an already-installed smaller model if onboarding shows one
- prefer `faster-whisper` as the conservative local default
- use cloud mode if setup speed and responsiveness matter more than privacy/offline behavior

## Symptom: project config is ignored

### What it means
Either:
- the config was saved globally instead of at project scope, or
- the project does not have `.pi/settings.json`, or
- an older config file is still being read

### Fixes
1. Re-run setup and select **Project only** when prompted.
2. Inspect both files:
   - `~/.pi/agent/settings.json`
   - `.pi/settings.json`
3. Remember that project settings are intended to override global settings.

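The intended precedence can be pictured as a shallow merge in which project values win key by key. This is a sketch of the concept only; the real settings loader may merge differently, and the `VoiceSettings` shape here is illustrative rather than the full schema:

```typescript
// Sketch: project-scope settings shadow global settings key by key.
interface VoiceSettings {
  enabled?: boolean;
  mode?: "local" | "api";
  backend?: string;
  model?: string;
}

function resolveVoiceSettings(global: VoiceSettings, project: VoiceSettings): VoiceSettings {
  // Later spread wins: any key present in the project file overrides the global one,
  // while keys the project omits fall through to the global value.
  return { ...global, ...project };
}
```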
## Symptom: the hold-to-talk shortcut does nothing

### Current behavior to remember
- hold **Space** to talk only when the editor is empty
- `Ctrl+Shift+V` is the fallback toggle shortcut
- `Ctrl+Shift+B` is the BTW voice shortcut

### Fixes
- make sure the editor is empty before using hold-Space
- try `Ctrl+Shift+V` instead
- use `/voice on` if voice was disabled
- run `/voice info` to confirm `enabled: true`

## Symptom: "Recording too short" or "No audio recorded"

### What it means
The audio file was missing, too small, or recording ended before a usable sample was captured.

### Fixes
- hold the key slightly longer
- try a direct microphone test via `/voice test`
- confirm SoX can record in your environment
- avoid tapping the shortcut too quickly

## Manual backend checks

These are useful outside Pi too:

```sh
python3 transcribe.py --list-backends
python3 daemon.py ping
python3 daemon.py status
```

If you are debugging local setup, `--list-backends` is usually the most useful first command because it now includes installed-model hints and detection metadata.

## When to re-run setup

Use `/voice setup` again when:
- switching from cloud to local or vice versa
- changing model sizes
- moving from global to project scope
- recovering from a broken dependency install

## If you are still stuck

Capture these four pieces of information before debugging further:

1. `/voice info` output
2. `/voice test` output
3. `/voice backends` output
4. `/voice daemon status` output

That is usually enough to identify whether the issue is:
- recording
- backend installation
- selected model missing vs already installed
- model status unknown
- API credentials
- daemon state
- config scope