@codexstar/pi-listen 1.0.4

# pi-voice model detection research

## Goal

Figure out how `pi-voice` can detect whether a backend/model is **already available locally** so onboarding can say things like:

- "You already have `small` installed — configure this now"
- "`medium` is not downloaded yet — first use will download it"
- "Deepgram is API-based — no local model download is needed"

This research focuses on practical heuristics that can be implemented now in this repo, with confidence notes.

---
## Bottom-line recommendation

Implement model detection as a **best-effort, backend-specific layer** that returns one of:

- `installed`
- `missing`
- `unknown`

Do **not** force every backend into a binary installed/missing answer if the storage semantics are unclear.

### Recommended rule

- Prefer **runtime/library probes** over hard-coded path guesses when the backend library exposes a reliable local-only lookup.
- Use **filesystem heuristics** where the project already has explicit path logic.
- If confidence is low, return `unknown` and let onboarding say:
  - "Package/backend is installed, but local model presence could not be confirmed."

---
## Suggested output contract

Add a model-detection layer that can return something like:

```json
{
  "backend": "faster-whisper",
  "backendAvailable": true,
  "models": [
    {
      "name": "small",
      "status": "installed",
      "confidence": "high",
      "path": "/Users/.../.cache/huggingface/hub/..."
    },
    {
      "name": "medium",
      "status": "missing",
      "confidence": "high"
    },
    {
      "name": "large-v3-turbo",
      "status": "unknown",
      "confidence": "low",
      "reason": "backend installed, but cache semantics are backend-managed"
    }
  ]
}
```

This is better than a plain boolean because onboarding can explicitly say:

- **Already installed**
- **Download required**
- **Unknown / may download on first use**

---
## Cross-backend cache/env locations to consider first

Before backend-specific logic, normalize these locations:

- `HF_HOME`
- `HUGGINGFACE_HUB_CACHE`
- `XDG_CACHE_HOME`
- `TORCH_HOME`

### Practical default resolution order

#### Hugging Face cache base
1. `HUGGINGFACE_HUB_CACHE`
2. `${HF_HOME}/hub`
3. `${XDG_CACHE_HOME}/huggingface/hub`
4. `~/.cache/huggingface/hub`

#### Torch / NeMo cache base
1. `${TORCH_HOME}/NeMo`
2. `${XDG_CACHE_HOME}/torch/NeMo`
3. `~/.cache/torch/NeMo`
4. `~/.cache/nemo`

These should be centralized in one helper so all model detection uses the same cache roots.
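A minimal sketch of such a helper, following the resolution order above (function names are illustrative, not existing repo code); taking the environment as a parameter keeps it testable:

```python
import os
from pathlib import Path


def hf_hub_cache_root(env=os.environ) -> Path:
    """Resolve the Hugging Face hub cache base in the order listed above."""
    if env.get("HUGGINGFACE_HUB_CACHE"):
        return Path(env["HUGGINGFACE_HUB_CACHE"])
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]) / "hub"
    if env.get("XDG_CACHE_HOME"):
        return Path(env["XDG_CACHE_HOME"]) / "huggingface" / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"


def nemo_cache_roots(env=os.environ) -> list[Path]:
    """Candidate Torch/NeMo cache roots, most specific first."""
    roots: list[Path] = []
    if env.get("TORCH_HOME"):
        roots.append(Path(env["TORCH_HOME"]) / "NeMo")
    if env.get("XDG_CACHE_HOME"):
        roots.append(Path(env["XDG_CACHE_HOME"]) / "torch" / "NeMo")
    roots.append(Path.home() / ".cache" / "torch" / "NeMo")
    roots.append(Path.home() / ".cache" / "nemo")
    return roots
```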

---
## Backend-by-backend findings

## 1. faster-whisper

### Current repo behavior
`transcribe.py` treats `faster-whisper` as available when the Python package can be imported:
- `is_faster_whisper_available()`

At runtime it calls:
- `WhisperModel(model_name, ...)`
### Best detection strategy
**Use the library itself.**

The installed package exposes `faster_whisper.utils.download_model()`, which supports:
- `local_files_only=True`

I verified locally that this works as a pure local probe:

```python
from faster_whisper.utils import download_model

path = download_model("small", local_files_only=True)
```

If present, it returns the local cached snapshot path.
If missing, it raises `LocalEntryNotFoundError`.

### Why this is the best option
- high confidence
- no need to reverse-engineer the exact cache layout
- automatically respects Hugging Face cache behavior
- avoids false positives from partial folders
### Repo mapping data available now
The package exposes a concrete model -> repo mapping via `faster_whisper.utils._MODELS`.

Observed locally:

```python
{
    'small': 'Systran/faster-whisper-small',
    'small.en': 'Systran/faster-whisper-small.en',
    'medium': 'Systran/faster-whisper-medium',
    'large-v3': 'Systran/faster-whisper-large-v3',
    'large-v3-turbo': 'mobiuslabsgmbh/faster-whisper-large-v3-turbo',
    'distil-small.en': 'Systran/faster-distil-whisper-small.en',
    ...
}
```
### Filesystem heuristic (fallback only)
If you need path-only detection, Hugging Face snapshots typically appear under:

```text
~/.cache/huggingface/hub/models--<org>--<repo>/snapshots/<rev>/
```

Examples:
- `models--Systran--faster-whisper-small`
- `models--mobiuslabsgmbh--faster-whisper-large-v3-turbo`

### Confidence
- **Library probe:** high
- **Filesystem-only scan:** medium
### Recommendation
Implement detection for `faster-whisper` via:
1. import `faster_whisper.utils.download_model`
2. call with `local_files_only=True`
3. mark `installed` or `missing`
4. return the resolved path when found
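These steps can be sketched as a single probe (the function name and result shape are assumptions matching the suggested output contract; the broad `except` keeps the probe best-effort in case the underlying exception type changes):

```python
def faster_whisper_model_status(name: str) -> dict:
    """Local-only probe for a faster-whisper model; never triggers a download."""
    try:
        from faster_whisper.utils import download_model
    except ImportError:
        return {"name": name, "status": "unknown", "confidence": "low",
                "reason": "faster-whisper is not importable"}
    try:
        # local_files_only=True only inspects the cache; a missing model raises
        # (LocalEntryNotFoundError from huggingface_hub).
        path = download_model(name, local_files_only=True)
        return {"name": name, "status": "installed", "confidence": "high",
                "path": str(path)}
    except Exception:
        return {"name": name, "status": "missing", "confidence": "high"}
```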

---
## 2. whisper-cpp

### Current repo behavior
`transcribe.py` already contains concrete model path logic:

```python
candidates = [
    os.path.expanduser(f"~/.cache/whisper-cpp/ggml-{model}.bin"),
    f"/opt/homebrew/share/whisper-cpp/models/ggml-{model}.bin",
    f"/usr/local/share/whisper-cpp/models/ggml-{model}.bin",
]
```

It also accepts `model_name` directly if it is already a path.
### Best detection strategy
Use the **same candidate path logic already in the repo**.

For each model:
1. if `model_name` itself is an existing path -> `installed`
2. otherwise check:
   - `~/.cache/whisper-cpp/ggml-{model}.bin`
   - `/opt/homebrew/share/whisper-cpp/models/ggml-{model}.bin`
   - `/usr/local/share/whisper-cpp/models/ggml-{model}.bin`
3. if any exists -> `installed`
4. else -> `missing`
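A sketch of that helper, mirroring the candidate paths from `transcribe.py` (the function name and result shape are assumptions):

```python
from pathlib import Path


def whisper_cpp_model_status(model_name: str) -> dict:
    """installed/missing by checking the same paths the runtime resolves."""
    direct = Path(model_name).expanduser()
    if direct.is_file():
        return {"name": model_name, "status": "installed",
                "confidence": "very-high", "path": str(direct)}
    candidates = [
        Path.home() / ".cache" / "whisper-cpp" / f"ggml-{model_name}.bin",
        Path(f"/opt/homebrew/share/whisper-cpp/models/ggml-{model_name}.bin"),
        Path(f"/usr/local/share/whisper-cpp/models/ggml-{model_name}.bin"),
    ]
    for candidate in candidates:
        if candidate.is_file():
            return {"name": model_name, "status": "installed",
                    "confidence": "very-high", "path": str(candidate)}
    return {"name": model_name, "status": "missing", "confidence": "very-high"}
```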
### Why this is strong
- this is already the runtime resolution behavior
- onboarding can exactly mirror real runtime behavior
- avoids mismatch between UI and transcription runtime

### Confidence
- **Filesystem/path detection:** very high

### Recommendation
Extract the candidate-path logic into a reusable helper so:
- transcription
- doctor
- onboarding

all share the same source of truth.

---
## 3. deepgram

### Current repo behavior
There is no local model download.

Availability is simply:
- `DEEPGRAM_API_KEY` present -> backend available

### Best detection strategy
Treat all models as:
- `installed` in the sense of **ready to use now** if the API key exists
- `missing` if the API key does not exist
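Because readiness is purely an environment check, the probe is trivial; a sketch (function name assumed, environment injectable for testing):

```python
import os


def deepgram_status(env=os.environ) -> dict:
    """Cloud backend: readiness is key presence, never a disk check."""
    ready = bool(env.get("DEEPGRAM_API_KEY"))
    return {"backend": "deepgram",
            "status": "ready" if ready else "needs-api-key",
            "confidence": "very-high"}
```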
### Suggested wording in onboarding
Instead of "already downloaded", say:
- **Ready via API**
- **Requires API key**

### Confidence
- **Env-based readiness:** very high

### Recommendation
Represent cloud models differently in the UI:
- no local disk/download badge
- `Ready now` vs `Needs API key`

---
## 4. moonshine

### Current repo behavior
`transcribe.py` supports two paths:
- Python API via `moonshine_onnx`
- CLI fallback via `moonshine`

Models exposed in the repo:
- `moonshine/tiny`
- `moonshine/base`

### Problem
I do **not** have strong evidence yet for a stable public cache layout for Moonshine model files.

### Practical detection options

#### Option A — package resource scan (recommended first)
If `moonshine_onnx` is installed:
- inspect the installed package directory
- search for `.onnx` or related model artifacts that contain `tiny` / `base`

Example approach:

```python
from pathlib import Path

import moonshine_onnx

# Scan the installed package directory recursively for ONNX artifacts
# whose names mention the model size.
pkg_dir = Path(moonshine_onnx.__file__).resolve().parent
hits = [p for p in pkg_dir.rglob("*.onnx")
        if "tiny" in p.name.lower() or "base" in p.name.lower()]
```
This is implementable now, though still heuristic.

#### Option B — CLI/package-managed cache scan
If the CLI is installed but the Python package is not:
- inspect likely package-managed cache locations under:
  - `~/.cache`
  - `${XDG_CACHE_HOME}`
- search for directories/files matching `moonshine`, `tiny`, `base`, `.onnx`
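Option B could be sketched as a generic scan that takes its cache roots as a parameter, so the same function covers `~/.cache` and `${XDG_CACHE_HOME}` (purely heuristic; the function name and matching rules are assumptions):

```python
from pathlib import Path


def scan_for_moonshine_models(roots: list[Path]) -> list[Path]:
    """Heuristic: ONNX files under the given cache roots that look Moonshine-related."""
    hits: list[Path] = []
    for root in roots:
        if not root.is_dir():
            continue
        for p in root.rglob("*.onnx"):
            text = str(p).lower()
            # "tiny"/"base" alone can over-match; this is best-effort only.
            if "moonshine" in text or "tiny" in p.name.lower() or "base" in p.name.lower():
                hits.append(p)
    return hits
```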
#### Option C — conservative fallback
If the backend package/CLI is installed but no model artifacts can be confidently located:
- return `unknown`

### Confidence
- **Package-resource scan:** medium
- **Generic cache scan:** low
- **Binary installed => model installed:** too risky, do not assume

### Recommendation
For the first implementation:
- detect backend availability normally
- try the package-resource scan
- if inconclusive, return `unknown`
- onboarding should say:
  - `moonshine/base — backend installed, local model presence not confirmed`

This is safer than lying about downloaded state.

---

## 5. parakeet / NeMo

### Current repo behavior
`transcribe.py` uses:

```python
nemo_asr.models.ASRModel.from_pretrained(model_name)
```

Model names are repo-like strings such as:
- `nvidia/parakeet-tdt-0.6b-v2`
- `nvidia/parakeet-ctc-0.6b`
### Problem
NeMo caching can vary by version/environment.
It may use:
- Hugging Face cache
- Torch/NeMo cache
- internal download helpers

### Practical detection options

#### Option A — cache path scan by model slug
Check likely cache roots:
- `${TORCH_HOME}/NeMo`
- `~/.cache/torch/NeMo`
- `~/.cache/nemo`
- Hugging Face cache under `models--nvidia--parakeet-*`

Search for folder/file names containing a normalized slug, e.g.:
- `nvidia--parakeet-tdt-0.6b-v2`
- `parakeet-tdt-0.6b-v2`
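Option A can be sketched as a slug-based scan over the cache roots above (heuristic; the helper name and slug normalization are assumptions):

```python
from pathlib import Path


def parakeet_cache_hits(model_name: str, roots: list[Path]) -> list[Path]:
    """Return cache entries whose names contain the model slug (medium-low confidence)."""
    slug = model_name.replace("/", "--").lower()   # nvidia--parakeet-tdt-0.6b-v2
    short = model_name.split("/")[-1].lower()      # parakeet-tdt-0.6b-v2
    hits: list[Path] = []
    for root in roots:
        if not root.is_dir():
            continue
        for p in root.rglob("*"):
            name = p.name.lower()
            if slug in name or short in name:
                hits.append(p)
    return hits
```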
#### Option B — local-only library probe (if NeMo exposes one)
If there is a NeMo utility for local cache resolution without downloading, that would be preferable. I do not have that confirmed from the current environment.

#### Option C — conservative fallback
If `nemo.collections.asr` imports but the cache cannot be confidently located:
- return `unknown`

### Confidence
- **Torch/NeMo + HF path scan:** medium-low
- **Library-level local-only probe:** unknown until implemented/researched further

### Recommendation
For the first shipping iteration:
- backend availability from import
- model presence from cache-root scan only
- return `unknown` if no confident hit

This is enough for onboarding to say:
- `Parakeet backend is installed, but local model presence could not be confirmed`

---

## Local machine observations from this environment

Observed cache/env state on this machine:

### Cache/env
- `HF_HOME`: unset
- `HUGGINGFACE_HUB_CACHE`: unset
- `XDG_CACHE_HOME`: unset
- `TORCH_HOME`: unset
- `~/.cache/huggingface/hub`: exists
- `~/.cache/whisper-cpp`: absent
- `/opt/homebrew/share/whisper-cpp/models`: absent

### Hugging Face hub contents observed
The current HF hub cache did **not** show any obvious `faster-whisper-*` repos present in the default cache root.

### faster-whisper local-only probe results
I verified locally that these probes report `missing` via `LocalEntryNotFoundError`:
- `small`
- `large-v3-turbo`
- `distil-small.en`

That confirms the local-only probe is usable as an installed/missing signal.

---
## Recommended implementation order

### Phase 1 — high-confidence backends first
Implement model detection for:
1. `faster-whisper`
2. `whisper-cpp`
3. `deepgram`

These give the biggest product value with the least ambiguity.

### Phase 2 — conservative support for the ambiguous backends
Implement best-effort detection for:
4. `moonshine`
5. `parakeet`

Return `unknown` rather than over-claiming installed state.

---
## Suggested onboarding behavior

### For `installed`
Show:
- **Already installed**
- **Ready to configure now**

### For `missing`
Show:
- **Download required**
- backend-specific install guidance

### For `unknown`
Show:
- **Backend installed**
- **Model presence not confirmed**
- **May download or initialize on first use**

This lets the system feel smart without making brittle claims.

---
## Concrete implementation suggestion for this repo

### Add to `transcribe.py`
A machine-readable command such as:

```bash
python3 transcribe.py --list-model-status --backend faster-whisper
```

Output shape:

```json
{
  "backend": "faster-whisper",
  "models": [
    {"name": "small", "status": "installed", "confidence": "high", "path": "..."},
    {"name": "medium", "status": "missing", "confidence": "high"}
  ]
}
```
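A sketch of how the flag could be wired (the probe registry and the per-backend probe shown are hypothetical placeholders; real probes would reuse the backend-specific logic described above):

```python
import argparse
import json
import os


def _probe_deepgram() -> list[dict]:
    # Cloud backend: readiness is just key presence.
    ready = bool(os.environ.get("DEEPGRAM_API_KEY"))
    return [{"name": "default", "status": "installed" if ready else "missing",
             "confidence": "very-high"}]


# Hypothetical registry; each backend would register its own probe function.
PROBES = {"deepgram": _probe_deepgram}


def list_model_status(backend: str) -> dict:
    probe = PROBES.get(backend)
    return {"backend": backend, "models": probe() if probe else []}


def main(argv=None) -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--list-model-status", action="store_true")
    parser.add_argument("--backend", default="faster-whisper")
    args = parser.parse_args(argv)
    if args.list_model_status:
        print(json.dumps(list_model_status(args.backend)))


if __name__ == "__main__":
    main()
```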
### Then wire into
- `extensions/voice/diagnostics.ts`
- `extensions/voice/onboarding.ts`
- `extensions/voice/install.ts`

### Product behavior unlocked
- prefer already-downloaded models in recommendations
- show `Already installed` badges in onboarding
- skip unnecessary download/setup prompts when the model is already present

---

## Confidence summary

| Backend | Best detection method | Confidence |
|---|---|---|
| `faster-whisper` | `download_model(..., local_files_only=True)` | high |
| `whisper-cpp` | exact path checks already used in `transcribe.py` | very high |
| `deepgram` | `DEEPGRAM_API_KEY` presence | very high |
| `moonshine` | package-resource scan + fallback `unknown` | medium / low |
| `parakeet` | Torch/NeMo/HF cache scan + fallback `unknown` | medium-low |

---
## Final recommendation

Implement **installed-model-aware onboarding** in two passes:

### Pass 1 (ship fast, high confidence)
- faster-whisper
- whisper-cpp
- deepgram

### Pass 2 (ship carefully)
- moonshine
- parakeet
- use `unknown` status when confidence is not good enough

That gives `pi-voice` the product behavior you want without introducing misleading model-detection claims.