claude-voice 1.4.2 → 1.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -2,424 +2,122 @@
 
  [![npm version](https://img.shields.io/npm/v/claude-voice)](https://www.npmjs.com/package/claude-voice)
  [![License: PolyForm Noncommercial](https://img.shields.io/badge/License-PolyForm%20Noncommercial-red.svg)](LICENSE)
- [![Node.js](https://img.shields.io/node/v/claude-voice)](https://nodejs.org)
  [![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Linux-blue)]()
 
- Voice interface for Claude Code CLI. Speak commands, hear responses.
-
- <!-- Demo GIF placeholder - replace with actual recording -->
- <!-- ![Demo](docs/demo.gif) -->
-
- **Features:**
-
- - Speaks Claude's responses aloud (Text-to-Speech)
- - Transcribes your voice commands (Speech-to-Text)
- - Hands-free with wake word detection ("Jarvis")
- - Works offline with local providers - no API keys required
- - Deep integration with Claude Code via hooks system
-
- ## Quick Start
+ Voice interface for Claude Code. Speak commands, hear responses.
 
  ```bash
  npm install -g claude-voice
- claude-voice setup
- claude-voice start
  ```
 
- Say **"Jarvis"** followed by your command, or press **Cmd+Shift+Space** (macOS) / **Ctrl+Shift+Space** (Linux).
+ Say **"Hey Jarvis"** followed by your command. The extension auto-starts with Claude Code.
 
  ## How It Works
 
  ```
- You speak             Claude Voice             Claude Code
-     |                       |                       |
-     |--- "Jarvis..." -----> |                       |
-     | (wake word)           |--- transcribe ------> |
-     |                       |        (STT)          |
-     |                       |                       |
-     |                       | <---- response ------ |
-     | <-- speaks aloud ---- |                       |
-     |        (TTS)          |                       |
+ You speak → "Hey Jarvis..." → Wake word detected → STT transcribes → Claude Code receives
+ Claude responds → Hook captures → TTS speaks aloud → You hear the response
  ```
 
- **Claude Code Integration:**
-
- | Hook | Purpose |
- |------|---------|
- | `session-start` | Auto-starts daemon when Claude Code launches |
- | `stop` | Speaks responses when Claude finishes |
- | `post-tool-use` | Announces tool completions (file reads, bash commands) |
- | `notification` | Voice alerts for permission prompts |
+ The extension integrates via Claude Code hooks: auto-start on session, speak responses, announce tool completions, and voice alerts for permission prompts.
 
  ## Providers
 
- Choose local (free, offline) or cloud providers:
-
- | Capability | Local (Free) | Cloud |
- |------------|--------------|-------|
- | Text-to-Speech | Piper, macOS Say, espeak | OpenAI TTS, ElevenLabs |
- | Speech-to-Text | Sherpa-ONNX | OpenAI Whisper |
- | Wake Word | Sherpa-ONNX | Picovoice Porcupine |
-
- <details>
- <summary><strong>TTS Providers</strong></summary>
-
- ### Piper (Default)
-
- Local neural TTS with high-quality voices. No API key required.
-
- ```bash
- claude-voice voice list # See available voices
- claude-voice voice download en_US-amy-medium
- claude-voice config set tts.provider=piper
- ```
-
- ### macOS Say
-
- Built-in macOS speech synthesis.
-
- ```bash
- claude-voice voices # List available voices
- claude-voice config set tts.provider=macos-say
- claude-voice config set tts.macos.voice=Samantha
- ```
-
- ### OpenAI TTS
+ | | Local (Free) | Cloud |
+ |---|---|---|
+ | **TTS** | macOS Say, Piper, espeak | OpenAI, ElevenLabs |
+ | **STT** | Sherpa-ONNX Whisper | OpenAI Whisper |
+ | **Wake Word** | openWakeWord, Sherpa-ONNX | Picovoice |
 
- High-quality neural voices. Requires `OPENAI_API_KEY`.
+ **Quick presets:**
 
  ```bash
- echo "OPENAI_API_KEY=sk-..." >> ~/.claude-voice/.env
- claude-voice config set tts.provider=openai
- claude-voice config set tts.openai.voice=nova
- ```
-
- ### ElevenLabs
-
- Premium voice synthesis. Requires `ELEVENLABS_API_KEY`.
-
- ```bash
- echo "ELEVENLABS_API_KEY=..." >> ~/.claude-voice/.env
- claude-voice config set tts.provider=elevenlabs
- ```
-
- </details>
-
- <details>
- <summary><strong>STT Providers</strong></summary>
-
- ### Sherpa-ONNX (Default)
-
- Local Whisper models. No API key required. Supports 100+ languages.
-
- ```bash
- claude-voice model list # Available models
- claude-voice model download whisper-small # Best accuracy (488MB)
- claude-voice config set stt.provider=sherpa-onnx
- claude-voice config set stt.language=en # or: tr, de, fr, es, etc.
- ```
-
- | Model | Size | Speed | Accuracy |
- |-------|------|-------|----------|
- | whisper-tiny | 75 MB | Fast | Good |
- | whisper-base | 142 MB | Medium | Better |
- | whisper-small | 488 MB | Slower | Best |
-
- ### OpenAI Whisper
-
- Cloud transcription. Requires `OPENAI_API_KEY`.
-
- ```bash
- claude-voice config set stt.provider=openai
- ```
-
- </details>
-
- <details>
- <summary><strong>Wake Word Providers</strong></summary>
-
- ### Sherpa-ONNX (Default)
-
- Local wake word detection. No API key required.
-
- ```bash
- claude-voice config set wakeWord.provider=sherpa-onnx
- claude-voice config set wakeWord.keyword=jarvis # or: claude
- ```
-
- ### Picovoice Porcupine
-
- High-accuracy wake word detection. Requires `PICOVOICE_ACCESS_KEY`.
-
- 1. Get a free access key at [Picovoice Console](https://console.picovoice.ai/)
- 2. Configure:
-
- ```bash
- echo "PICOVOICE_ACCESS_KEY=..." >> ~/.claude-voice/.env
- claude-voice config set wakeWord.provider=picovoice
- claude-voice config set wakeWord.keyword=jarvis # jarvis, computer, alexa, etc.
+ claude-voice setup # Interactive setup wizard
+ claude-voice openai # Cloud TTS + STT (requires API key)
+ claude-voice local --download # Piper TTS + larger Whisper model (offline)
  ```
 
- Built-in keywords: jarvis, computer, alexa, americano, blueberry, bumblebee, grapefruit, grasshopper, hey google, hey siri, ok google, picovoice, porcupine, terminator
-
- </details>
-
  ## Configuration
 
- Config file: `~/.claude-voice/config.json`
-
  ```bash
- claude-voice config # View all settings
- claude-voice config get tts.provider # Get specific value
+ claude-voice config # View all
  claude-voice config set tts.provider=openai # Set value
- claude-voice config edit # Open in editor
- claude-voice config reset # Reset to defaults
+ claude-voice config set stt.language=tr # Change language
+ claude-voice config edit # Open in editor
  ```
 
- <details>
- <summary><strong>TTS Options</strong></summary>
-
- | Option | Default | Description |
- |--------|---------|-------------|
- | `tts.provider` | `piper` | piper, macos-say, openai, elevenlabs, espeak, disabled |
- | `tts.autoSpeak` | `true` | Automatically speak Claude's responses |
- | `tts.maxSpeechLength` | `5000` | Maximum characters to speak |
- | `tts.skipCodeBlocks` | `true` | Skip code blocks when speaking |
-
- </details>
+ Config file: `~/.claude-voice/config.json`
 
  <details>
- <summary><strong>STT Options</strong></summary>
+ <summary><strong>All options</strong></summary>
 
  | Option | Default | Description |
  |--------|---------|-------------|
+ | `tts.provider` | `macos-say` | macos-say, piper, openai, elevenlabs, espeak, disabled |
+ | `tts.autoSpeak` | `true` | Auto-speak Claude responses |
+ | `tts.maxSpeechLength` | `5000` | Max characters to speak |
  | `stt.provider` | `sherpa-onnx` | sherpa-onnx, openai, whisper-local, disabled |
- | `stt.language` | `en` | Language code (en, tr, de, fr, es, ja, zh, etc.) |
-
- </details>
-
- <details>
- <summary><strong>Wake Word Options</strong></summary>
-
- | Option | Default | Description |
- |--------|---------|-------------|
+ | `stt.language` | `en` | Language code (en, tr, de, fr, es, ja, zh...) |
  | `wakeWord.enabled` | `true` | Enable wake word detection |
- | `wakeWord.provider` | `sherpa-onnx` | sherpa-onnx or picovoice |
- | `wakeWord.keyword` | `jarvis` | Wake word: jarvis, claude, computer, etc. |
+ | `wakeWord.provider` | `openwakeword` | openwakeword, sherpa-onnx, picovoice |
  | `wakeWord.sensitivity` | `0.5` | Detection sensitivity (0.0-1.0) |
- | `wakeWord.playSound` | `true` | Play sound on detection |
-
- </details>
-
- <details>
- <summary><strong>Voice Output Options</strong></summary>
-
- When enabled, Claude formats responses with a spoken abstract before technical details.
-
- | Option | Default | Description |
- |--------|---------|-------------|
- | `voiceOutput.enabled` | `false` | Enable TTS-friendly formatting |
- | `voiceOutput.abstractMarker` | `<!-- TTS -->` | Marker separating spoken/technical content |
- | `voiceOutput.maxAbstractLength` | `200` | Max characters for spoken abstract |
-
- ```bash
- claude-voice output enable # Enable voice-friendly formatting
- claude-voice output status # Check current status
- ```
-
- </details>
-
- <details>
- <summary><strong>Tool Announcements</strong></summary>
-
- | Option | Default | Description |
- |--------|---------|-------------|
+ | `voiceOutput.enabled` | `false` | TTS-friendly response formatting |
  | `toolTTS.enabled` | `false` | Announce tool completions |
- | `toolTTS.mode` | `summarize` | summarize or completion |
- | `toolTTS.announceErrors` | `true` | Announce tool errors |
-
- </details>
-
- <details>
- <summary><strong>Keyboard Shortcut</strong></summary>
-
- | Option | Default | Description |
- |--------|---------|-------------|
- | `shortcut.enabled` | `false` | Enable keyboard shortcut |
- | `shortcut.key` | `CommandOrControl+Shift+Space` | Key combination |
-
- **Modifiers:** CommandOrControl, Command, Control, Shift, Alt
-
- </details>
-
- <details>
- <summary><strong>Recording Options</strong></summary>
-
- | Option | Default | Description |
- |--------|---------|-------------|
- | `recording.sampleRate` | `16000` | Audio sample rate (Hz) |
- | `recording.silenceThreshold` | `2500` | Silence duration to stop (ms) |
- | `recording.silenceAmplitude` | `500` | Amplitude threshold |
+ | `recording.silenceThreshold` | `3500` | Silence duration to stop recording (ms) |
  | `recording.maxDuration` | `60000` | Max recording length (ms) |
 
  </details>
 
- <details>
- <summary><strong>Server Options</strong></summary>
-
- | Option | Default | Description |
- |--------|---------|-------------|
- | `server.port` | `3456` | HTTP server port |
- | `server.host` | `127.0.0.1` | Server host |
-
- </details>
-
  ## CLI Commands
 
  ```bash
- # Daemon Management
- claude-voice start # Start daemon
- claude-voice stop # Stop daemon
- claude-voice restart # Restart daemon
- claude-voice status # Check status
+ # Daemon
+ claude-voice start / stop / restart / status
 
- # Setup
- claude-voice setup # Interactive setup wizard
- claude-voice doctor # Diagnose issues
+ # Setup & Diagnostics
+ claude-voice setup # Interactive wizard
+ claude-voice doctor # Diagnose issues
 
  # Models & Voices
- claude-voice model list # List STT models
- claude-voice model download <id>
- claude-voice voice list # List TTS voices
- claude-voice voice download <id>
+ claude-voice model list / download <id> # STT models (whisper-tiny/base/small)
+ claude-voice voice list / download <id> # Piper TTS voices
 
- # Hooks
- claude-voice hooks install # Install Claude Code hooks
- claude-voice hooks status # Check installation
+ # Wake Word
+ claude-voice openwakeword --install # Better wake word detection
 
  # Testing
- claude-voice test-tts "Hello" # Test text-to-speech
- claude-voice test-stt file.wav # Test speech-to-text
+ claude-voice test-tts "Hello"
+ claude-voice test-stt recording.wav
 
  # Utilities
- claude-voice logs # View daemon logs
- claude-voice logs -f # Follow logs
- claude-voice devices # List audio devices
+ claude-voice logs -f # Follow daemon logs
+ claude-voice devices # List audio devices
  ```
 
- Run `claude-voice --help` for all 50+ commands.
-
  ## Platform Support
 
- | Feature | macOS | Linux |
- |---------|-------|-------|
- | TTS | Piper, Say, OpenAI, ElevenLabs | Piper, espeak, OpenAI, ElevenLabs |
+ | | macOS | Linux |
+ |---|---|---|
+ | TTS | Say, Piper, OpenAI, ElevenLabs | espeak, Piper, OpenAI, ElevenLabs |
  | STT | Sherpa-ONNX, OpenAI | Sherpa-ONNX, OpenAI |
- | Wake Word | Sherpa-ONNX, Picovoice | Sherpa-ONNX, Picovoice |
- | Keyboard Shortcut | Cmd+Shift+Space | Ctrl+Shift+Space |
- | Terminal Injection | AppleScript | xdotool (X11), dotool (Wayland) |
+ | Wake Word | openWakeWord, Sherpa-ONNX, Picovoice | openWakeWord, Sherpa-ONNX, Picovoice |
 
- **Requirements:**
- - Node.js 18+
- - Microphone access
+ **Requires:** Node.js 18+, microphone access. Python 3 recommended (for openWakeWord).
 
  ## Troubleshooting
 
- Run diagnostics:
-
  ```bash
- claude-voice doctor
+ claude-voice doctor # Auto-diagnose and fix issues
+ claude-voice logs # Check daemon logs
+ claude-voice start -f # Run in foreground for debugging
  ```
 
- <details>
- <summary><strong>Common Issues</strong></summary>
-
- **Daemon won't start**
- ```bash
- claude-voice logs # Check logs
- claude-voice start -f # Run in foreground for debugging
- ```
-
- **No audio output**
- ```bash
- claude-voice test-tts "Hello"
- claude-voice config get tts.provider
- ```
-
- **Wake word not detecting**
- - Check microphone permissions in System Preferences
- - Run `claude-voice devices` to verify microphone
- - Adjust sensitivity: `claude-voice config set wakeWord.sensitivity=0.7`
-
- **Text not appearing in terminal**
- - macOS: Allow Terminal in System Preferences > Privacy > Accessibility
- - Run `claude-voice doctor` to check terminal injection status
-
- </details>
-
- ## API Reference
-
- <details>
- <summary><strong>HTTP API (port 3456)</strong></summary>
-
- | Endpoint | Method | Description |
- |----------|--------|-------------|
- | `/status` | GET | Daemon status and provider info |
- | `/tts` | POST | Speak text `{"text": "...", "priority": false}` |
- | `/tts/stop` | POST | Stop current playback |
- | `/stt` | POST | Transcribe audio (multipart/form-data) |
- | `/config` | GET | Get configuration |
- | `/config` | POST | Update configuration |
-
- </details>
-
- ## Changelog
-
- ### v1.4.0
- - **License change** - Switched to PolyForm Noncommercial (commercial use restricted)
-
- ### v1.3.19
- - **`shh` command** - Stop TTS instantly
-
- ### v1.3.18
- - **`listen` command** - Manual voice trigger without wake word
-
- ### v1.3.16
- - **`stop talking` command** - Stop speech playback
- - **dotool support** - Alternative input injection for Linux Wayland
- - **Python detection** - Better cross-platform support
-
- ### v1.3.9
- - **Pure Node.js model downloads** - No system dependencies (curl/wget not required)
-
- ### v1.3.3
- - **Linux support** - Full Linux platform support added
- - **OpenWakeWord removed** - Using only sherpa-onnx for wake word detection
-
- ## Contributing
-
- Contributions are welcome.
-
- ```bash
- git clone https://github.com/Menesahin/claude-voice-extension.git
- cd claude-voice-extension
- npm install
- npm run dev
- ```
-
- **Guidelines:**
- - Run `npm run lint` before committing
- - Add tests for new features
- - Follow existing code patterns
+ **Wake word not detecting?** Run `claude-voice openwakeword --install` for better accuracy.
 
  ## License
 
- **PolyForm Noncommercial License 1.0.0**
-
- Free for personal use, research, education, and non-profit organizations. Commercial use requires a separate license. See [LICENSE](LICENSE) for details.
+ [PolyForm Noncommercial 1.0.0](LICENSE) - Free for personal use, research, and education.
 
  ---
 
- [Documentation](https://github.com/Menesahin/claude-voice-extension#readme) |
- [Issues](https://github.com/Menesahin/claude-voice-extension/issues) |
- [Releases](https://github.com/Menesahin/claude-voice-extension/releases)
+ [Issues](https://github.com/Menesahin/claude-voice-extension/issues) | [Releases](https://github.com/Menesahin/claude-voice-extension/releases)
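The 1.4.2 README documented a local HTTP API on the daemon (`POST /tts` with a body of `{"text": "...", "priority": false}`, `POST /tts/stop`, and defaults `server.host` = `127.0.0.1`, `server.port` = `3456`). The 1.5.1 README drops that section, so this diff does not show whether the endpoints are still served. The sketch below assumes the 1.4.2 endpoints and defaults still apply and that `curl` is installed; the function names `cv_say` and `cv_stop` and the `CV_HOST`/`CV_PORT` variables are hypothetical helpers, not part of claude-voice.

```bash
# Sketch only: assumes the daemon still serves the 1.4.2 HTTP API.
# CV_HOST / CV_PORT are hypothetical overrides; defaults mirror the
# 1.4.2 server.host and server.port options.
CV_HOST=${CV_HOST:-127.0.0.1}
CV_PORT=${CV_PORT:-3456}

# cv_say TEXT [PRIORITY]: POST {"text": TEXT, "priority": PRIORITY} to /tts.
cv_say() {
  local text priority
  # Escape backslashes and double quotes so TEXT is a valid JSON string.
  # Newlines and other control characters are NOT escaped: single-line text only.
  text=$(printf '%s' "$1" | sed 's/[\\"]/\\&/g')
  priority=${2:-false}
  curl -sS -X POST "http://${CV_HOST}:${CV_PORT}/tts" \
    -H 'Content-Type: application/json' \
    -d "{\"text\": \"${text}\", \"priority\": ${priority}}"
}

# cv_stop: POST to /tts/stop to cut off the current playback.
cv_stop() {
  curl -sS -X POST "http://${CV_HOST}:${CV_PORT}/tts/stop"
}
```

For example, `cv_say 'Build finished' true` would send the request with `"priority": true`. The `sed` filter is a deliberately small escaper; text containing newlines would need a real JSON encoder.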
@@ -1,7 +1,7 @@
  {
  "version": 1,
  "tts": {
- "provider": "piper",
+ "provider": "macos-say",
  "autoSpeak": true,
  "maxSpeechLength": 5000,
  "skipCodeBlocks": true,
@@ -34,7 +34,7 @@
  "provider": "sherpa-onnx",
  "language": "en",
  "sherpaOnnx": {
- "model": "whisper-base"
+ "model": "whisper-tiny"
  },
  "whisperLocal": {
  "model": "base",
@@ -46,7 +46,7 @@
  },
  "wakeWord": {
  "enabled": true,
- "provider": "sherpa-onnx",
+ "provider": "openwakeword",
  "keyword": "jarvis",
  "sensitivity": 1.0,
  "playSound": true,
@@ -54,16 +54,10 @@
  "jarvis": [
  "▁JA R VI S",
  "▁JA R V I S",
- "J AR VI S",
  "J A R VI S",
  "J A R V I S",
- "▁JA R V IS",
- "▁JAR VIS",
- "JAR VIS",
- "▁J AR VIS",
- "J AR VIS",
- "▁JAR V IS",
- "JAR V IS"
+ "J AR V I S",
+ "J AR VI S"
  ],
  "claude": [
  "▁C L A U DE",
@@ -76,6 +70,11 @@
  },
  "picovoice": {
  "accessKey": ""
+ },
+ "openwakeword": {
+ "model": "hey_jarvis",
+ "threshold": 0.5,
+ "debug": false
  }
  },
  "notifications": {
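Read together, the config hunks and the README option tables change three shipped defaults (`tts.provider` piper → macos-say, `stt.sherpaOnnx.model` whisper-base → whisper-tiny, `wakeWord.provider` sherpa-onnx → openwakeword), and the README table also moves `recording.silenceThreshold` from 2500 to 3500 ms. The sketch below prints, or with `--apply` runs, the `claude-voice config set KEY=VALUE` commands that would restore the 1.4.2 values. It assumes `config set` accepts nested keys such as `stt.sherpaOnnx.model`, mirroring the `config.json` layout the way `tts.openai.voice` does in the 1.4.2 README; `revert_defaults` is a hypothetical helper name.

```bash
# Sketch only: restore the 1.4.2 defaults that 1.5.1 changed.
# Assumption: `claude-voice config set` accepts nested dotted keys
# such as stt.sherpaOnnx.model. Dry run unless called with --apply.
revert_defaults() {
  local mode kv
  mode=${1:-}
  # KEY=VALUE pairs taken from the removed ("-") lines of the diff
  for kv in \
    tts.provider=piper \
    stt.sherpaOnnx.model=whisper-base \
    wakeWord.provider=sherpa-onnx \
    recording.silenceThreshold=2500
  do
    if [ "$mode" = "--apply" ]; then
      claude-voice config set "$kv"
    else
      echo "claude-voice config set $kv"
    fi
  done
}
```

Running `revert_defaults` first shows what would change; `revert_defaults --apply` then calls the CLI. Piper and `whisper-base` may also need local models, for example via `claude-voice model download whisper-base` and `claude-voice voice download <id>` from the CLI section, followed by `claude-voice restart`.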