claude-can-speak 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Ramazan Yavuz
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,144 @@
1
+ # claude-can-speak
2
+
3
+ **Now Claude Code talks back.** Speech-out for Claude Code: a companion to the
4
+ built-in `/voice` speech-in. Turn `/voice` on and Claude can read its replies
5
+ aloud through your speakers; turn it off and you are back to silent, text-only.
6
+ There are two ways to use it, both with a local neural voice and nothing sent to the cloud.
7
+
8
+ - **Firehose mode** - a Stop hook speaks every finished reply while `/voice` is
9
+ on. One switch (`/voice`) controls both directions: you talk to it, it talks
10
+ back.
11
+ - **Deliberate mode** - a `speak` skill lets Claude choose what to voice: a
12
+ spoken "the build is done", a heads-up while you are looking away, a shoutout.
13
+ Selective, on purpose, not a firehose.
14
+
15
+ Speech is synthesised locally by [Kokoro](https://github.com/thewh1teagle/kokoro-onnx)
16
+ (natural English, the default) or [Piper](https://github.com/OHF-Voice/piper1-gpl)
17
+ (multilingual: English, German, Turkish, and more), running in a Docker
18
+ container so they never touch your host Python environment. No API keys, no
19
+ network at speak time (once the models are cached), no telemetry.
20
+
21
+ ## Requirements
22
+
23
+ - **Claude Code** (this is an extension for it).
24
+ - **Node.js >= 16 / npm** (to install the CLI).
25
+ - **Docker** (the TTS engines run in a container). The CLI checks for it and
26
+ tells you how to install it if it is missing.
27
+ - **An audio player**: `pw-play` (PipeWire), `paplay` (PulseAudio), or `aplay`
28
+ (ALSA) on Linux. (macOS/Windows playback support is on the roadmap.)
29
+
30
+ ## Install
31
+
32
+ ```sh
33
+ npm install -g claude-can-speak
34
+
35
+ # one-time: build the local TTS container image (needs Docker)
36
+ claude-can-speak build
37
+ ```
38
+
39
+ > If `npm install -g` fails with `EACCES` (a system-owned npm prefix like
40
+ > `/usr`), either use a user-level prefix once:
41
+ > `npm config set prefix ~/.npm-global && export PATH="$HOME/.npm-global/bin:$PATH"`
42
+ > (add that `export` to your shell profile), or install with `sudo`.
43
+
44
+ Then pick the mode(s) you want:
45
+
46
+ ```sh
47
+ claude-can-speak install-skill # deliberate mode: the 'speak' skill
48
+ claude-can-speak install-hooks # firehose mode: speak every reply on /voice
49
+ ```
50
+
51
+ Models are downloaded on first use into `~/.cache/claude-can-speak/models`
52
+ (nothing model-shaped is bundled in the package; see
53
+ [THIRD_PARTY.md](THIRD_PARTY.md)).
54
+
55
+ Then in Claude Code, toggle `/voice` on. Check everything with:
56
+
57
+ ```sh
58
+ claude-can-speak status
59
+ claude-can-speak test # speak a sample line
60
+ ```
61
+
62
+ ## Usage
63
+
64
+ ```sh
65
+ claude-can-speak status # gate state, container, voice, model cache
66
+ claude-can-speak test [text] # speak a sample (or your text)
67
+ claude-can-speak say <text> # speak text now (ignores the /voice gate)
68
+ claude-can-speak stop # interrupt whatever is being spoken
69
+ claude-can-speak voice <name> # set the default voice (e.g. af_heart)
70
+ claude-can-speak engine kokoro|piper
71
+ claude-can-speak voices # list voices for the current engine
72
+ claude-can-speak skill on|off # enable/disable the 'speak' skill
73
+ claude-can-speak install-hooks | remove-hooks
74
+ claude-can-speak install-skill
75
+ claude-can-speak build # (re)build the TTS container image
76
+ ```
77
+
78
+ ### Interrupting
79
+
80
+ You can stop playback three ways: run `claude-can-speak stop`, just send your
81
+ next message (a `UserPromptSubmit` hook stops the previous reply), or let a new
82
+ reply supersede the old one. Interrupts kill both in-flight synthesis and active
83
+ playback.
84
+
85
+ ### Choosing a voice
86
+
87
+ The default is Kokoro `af_heart` (natural US English, female). List options with
88
+ `claude-can-speak voices`. For German or Turkish, switch to Piper:
89
+
90
+ ```sh
91
+ claude-can-speak engine piper
92
+ claude-can-speak voice de_DE-thorsten-high # German
93
+ claude-can-speak voice tr_TR-dfki-medium # Turkish
94
+ ```
95
+
96
+ ## How it works
97
+
98
+ ```
99
+ Claude Code reply ─▶ Stop hook (gated on /voice) ─▶ strip markdown & code
100
+ ─▶ docker exec synth (Kokoro/Piper)
101
+ ─▶ play WAV on the host
102
+ ```
103
+
104
+ The container is persistent: it starts once and stays warm, so the Python and
105
+ ONNX import cost is paid once, not per reply. Only audio crosses back to the
106
+ host. The `speak` skill drives the same pipeline through `claude-can-speak say`,
107
+ but only when Claude (or you) chooses to speak.
108
+
109
+ ## Configuration
110
+
111
+ Per-user config lives in `~/.config/claude-can-speak/config.env` (written by the
112
+ `voice` / `engine` commands). Environment overrides: `CCS_IMAGE`,
113
+ `CCS_CONTAINER`, `CCS_MODELS_DIR`, `CLAUDE_SETTINGS`.
114
+
115
+ The `/voice` gate is read from `~/.claude/settings.json` (`voiceEnabled` or
116
+ `voice.enabled`). The `speak` skill is toggled via `skillOverrides` there.
117
+
118
+ ## Uninstall
119
+
120
+ ```sh
121
+ claude-can-speak remove-hooks
122
+ claude-can-speak skill off
123
+ claude-can-speak stop-container
124
+ npm uninstall -g claude-can-speak
125
+ rm -rf ~/.config/claude-can-speak ~/.claude/skills/speak
126
+ # optional: reclaim the model cache and image
127
+ rm -rf ~/.cache/claude-can-speak
128
+ docker image rm claude-can-speak:latest
129
+ ```
130
+
131
+ ## Disclaimer
132
+
133
+ This software is provided **AS IS, with NO WARRANTY** of any kind, express or
134
+ implied. The author is not liable for any damage, data loss, or other harm
135
+ arising from its use. It runs background processes, plays audio, builds and runs
136
+ a Docker container, and downloads third-party models from the internet on your
137
+ behalf. **By installing and using it you accept all risk.** You are responsible
138
+ for complying with the licences of the bundled engines and the downloaded models
139
+ (see [THIRD_PARTY.md](THIRD_PARTY.md)).
140
+
141
+ ## Licence
142
+
143
+ MIT - see [LICENSE](LICENSE). Author: Ramazan Yavuz. Part of the public,
144
+ open-source projects at [ra-yavuz.github.io](https://ra-yavuz.github.io/).
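Note on the `/voice` gate: the README's Configuration section says the gate is read from `~/.claude/settings.json` as `voiceEnabled` or `voice.enabled`, and the CLI's `status` command performs that check with `jq`. A minimal Python sketch of the same lookup follows, assuming the two keys are tried in that order and that anything other than a JSON `true` means off. The helper name `voice_gate_enabled` is hypothetical and not part of the package.

```python
import json


def voice_gate_enabled(settings_text):
    """Return True only when the settings JSON switches /voice on.

    Hypothetical helper: tries `voiceEnabled`, then `voice.enabled`,
    and treats missing keys or unparsable JSON as "off".
    """
    try:
        cfg = json.loads(settings_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(cfg, dict):
        return False

    value = cfg.get("voiceEnabled")
    if value is None or value is False:
        # Fall back to the nested form: {"voice": {"enabled": true}}
        voice = cfg.get("voice")
        value = voice.get("enabled") if isinstance(voice, dict) else None
    return value is True


print(voice_gate_enabled('{"voiceEnabled": true}'))
print(voice_gate_enabled('{"voice": {"enabled": true}}'))
print(voice_gate_enabled('{}'))
```

Defaulting to "off" on any parse problem matches the README's promise that turning `/voice` off (or never turning it on) means silence.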
package/THIRD_PARTY.md ADDED
@@ -0,0 +1,44 @@
1
+ # Third-party components
2
+
3
+ claude-can-speak is MIT-licensed and ships only its own code: the CLI, the hook
4
+ and worker scripts, the `speak` skill, and the container recipe. It bundles **no
5
+ third-party model weights**. The TTS models are downloaded on first use, from
6
+ their official upstreams, into a local cache (`~/.cache/claude-can-speak/models`)
7
+ on your machine. Each model therefore reaches you directly from its own source
8
+ under its own licence; this project redistributes none of them.
9
+
10
+ This keeps the project cleanly MIT and avoids redistributing weights whose terms
11
+ differ from ours.
12
+
13
+ ## Engines (installed into the container at build time)
14
+
15
+ | Component | Licence | Source |
16
+ |---|---|---|
17
+ | Piper (`piper-tts`) | GPL-3.0 | https://github.com/OHF-Voice/piper1-gpl |
18
+ | Kokoro runtime (`kokoro-onnx`) | MIT | https://github.com/thewh1teagle/kokoro-onnx |
19
+ | onnxruntime | MIT | https://github.com/microsoft/onnxruntime |
20
+ | soundfile / libsndfile | BSD / LGPL-2.1 | https://github.com/bastibe/python-soundfile |
21
+ | espeak-ng | GPL-3.0 | https://github.com/espeak-ng/espeak-ng |
22
+
23
+ espeak-ng (GPL-3.0) is used inside the container as a grapheme-to-phoneme step.
24
+ It is invoked as a separate program at runtime and is not linked into, or
25
+ redistributed by, this project; it is installed from the distribution's package
26
+ repository when the image is built.
27
+
28
+ ## Models (fetched on first use, not shipped)
29
+
30
+ | Model | Licence | Source |
31
+ |---|---|---|
32
+ | Kokoro-82M weights | Apache-2.0 | https://huggingface.co/hexgrad/Kokoro-82M |
33
+ | Kokoro ONNX + voices (v1.0) | Apache-2.0 (weights) | https://github.com/thewh1teagle/kokoro-onnx/releases |
34
+ | Piper voices (en_US, de_DE, tr_TR, ...) | per-voice (commonly MIT / CC BY 4.0) | https://huggingface.co/rhasspy/piper-voices |
35
+
36
+ Piper voice licences vary per voice; consult the voice's model card on
37
+ `rhasspy/piper-voices` for the exact terms and any attribution requirement. The
38
+ default voice (Kokoro `af_heart`) is covered by the Apache-2.0 Kokoro release.
39
+
40
+ ## No warranty
41
+
42
+ This project, and your use of these third-party components and models, is
43
+ provided AS IS with NO WARRANTY. You are responsible for complying with each
44
+ component's and model's licence. See LICENSE for the full disclaimer.
@@ -0,0 +1,300 @@
1
+ #!/usr/bin/env bash
2
+ # claude-can-speak: speech-out for Claude Code, gated on /voice mode.
3
+ #
4
+ # This CLI manages the TTS container and the playback lifecycle. The speaking
5
+ # itself is driven by a Stop hook (tts-speak.sh); this command is for setup,
6
+ # testing, picking a voice, and interrupting playback.
7
+ #
8
+ # Provided AS IS, with NO WARRANTY of any kind. The author is not liable for
9
+ # any damage or loss arising from its use. By using it you accept all risk.
10
+ set -uo pipefail
11
+
12
+ VERSION="0.1.0"
13
+
14
+ # Install layout: bundled scripts live in ../lib/claude-can-speak relative to
15
+ # this CLI, whether installed via npm (into the global node_modules) or run from
16
+ # a git checkout. SELF resolves through symlinks so a PATH symlink still works.
17
+ SOURCE="${BASH_SOURCE[0]}"
18
+ while [ -h "$SOURCE" ]; do
19
+ DIR="$(cd -P "$(dirname "$SOURCE")" && pwd)"
20
+ SOURCE="$(readlink "$SOURCE")"
21
+ [ "${SOURCE#/}" = "$SOURCE" ] && SOURCE="$DIR/$SOURCE"
22
+ done
23
+ SELF="$(cd -P "$(dirname "$SOURCE")" && pwd)"
24
+ LIBEXEC="$SELF/../lib/claude-can-speak"
25
+ [ -f "$LIBEXEC/tts-speak.sh" ] || LIBEXEC="$SELF/../lib/claude-can-speak"
26
+
27
+ CCS_HOME="${CCS_HOME:-$HOME/.config/claude-can-speak}"
28
+ CONFIG="$CCS_HOME/config.env"
29
+ PIDFILE="$CCS_HOME/speaking.pid"
30
+ CONTAINER="${CCS_CONTAINER:-ccs-tts}"
31
+ IMAGE="${CCS_IMAGE:-claude-can-speak:latest}"
32
+ MODELS_DIR="${CCS_MODELS_DIR:-$HOME/.cache/claude-can-speak/models}"
33
+ SETTINGS_JSON="${CLAUDE_SETTINGS:-$HOME/.claude/settings.json}"
34
+
35
+ mkdir -p "$CCS_HOME" 2>/dev/null || true
36
+
37
+ # Defaults; config.env overrides. (MAX_CHARS is a hook-only concern; the CLI's
38
+ # explicit say/test never truncate, so it is not set here.)
39
+ ENGINE="kokoro"; VOICE="af_heart"; LANG="en-us"; SPEED="1.0"
40
+ # shellcheck source=/dev/null
41
+ [ -f "$CONFIG" ] && . "$CONFIG"
42
+
43
+ die() { echo "claude-can-speak: $*" >&2; exit 1; }
44
+
45
+ # Docker is a runtime requirement (the TTS engines run in a container so they do
46
+ # not touch the host Python env). Check for it with a clear, actionable error.
47
+ require_docker() {
48
+ command -v docker >/dev/null 2>&1 || die \
49
+ "Docker is required but was not found.
50
+ claude-can-speak runs the TTS engines in a container so they never touch your
51
+ host. Install Docker, then retry:
52
+ https://docs.docker.com/get-docker/"
53
+ docker info >/dev/null 2>&1 || die \
54
+ "Docker is installed but the daemon is not reachable.
55
+ Start Docker (e.g. 'systemctl --user start docker' or Docker Desktop) and retry.
56
+ On Linux you may need to be in the 'docker' group: https://docs.docker.com/engine/install/linux-postinstall/"
57
+ }
58
+
59
+ cmd_help() {
60
+ cat <<EOF
61
+ claude-can-speak $VERSION - speech-out for Claude Code (gated on /voice)
62
+
63
+ USAGE
64
+ claude-can-speak <command> [args]
65
+
66
+ COMMANDS
67
+ status Show gate state, container, config, and model cache.
68
+ test [text] Speak a sample (or the given text) with the current voice.
69
+ stop Interrupt any reply currently being spoken.
70
+ say <text> Speak arbitrary text now (ignores the /voice gate).
71
+ start | up Start the persistent TTS container.
72
+ stop-container Stop and remove the TTS container.
73
+ voice <name> Set the default voice (e.g. af_heart, af_bella).
74
+ engine <name> Set the engine: kokoro (English) or piper (multilingual).
75
+ voices List known voices for the current engine.
76
+ install-hooks Register the Stop + interrupt hooks in settings.json.
77
+ remove-hooks Remove this project's hooks from settings.json.
78
+ install-skill Install the 'speak' skill into ~/.claude/skills.
79
+ skill on|off Enable/disable the 'speak' skill (settings skillOverrides).
80
+ build Build the TTS container image locally.
81
+ help | --help This text.
82
+
83
+ TWO MODES
84
+ Firehose : the Stop hook speaks every reply while /voice is on
85
+ (claude-can-speak install-hooks).
86
+ Deliberate: the 'speak' skill lets Claude choose what to voice
87
+ (notifications, shoutouts) via 'claude-can-speak say'
88
+ (claude-can-speak install-skill). Toggle with 'skill on|off'.
89
+
90
+ GATING
91
+ Speech-out only runs while /voice mode is on (voiceEnabled / voice.enabled
92
+ in $SETTINGS_JSON). Toggle /voice in Claude Code to switch both speech-in
93
+ and speech-out at once. Turn it off for full silence.
94
+
95
+ DISCLAIMER
96
+ Provided AS IS, with NO WARRANTY. You accept all risk. See the project
97
+ README for the full text.
98
+ EOF
99
+ }
100
+
101
+ ensure_image() {
102
+ docker image inspect "$IMAGE" >/dev/null 2>&1
103
+ }
104
+
105
+ cmd_build() {
106
+ require_docker
107
+ local ctx="$SELF/../container"
108
+ [ -f "$ctx/Dockerfile" ] || die "container build context not found ($ctx)"
109
+ echo "Building $IMAGE from $ctx ..."
110
+ docker build -t "$IMAGE" "$ctx"
111
+ }
112
+
113
+ cmd_start() {
114
+ require_docker
115
+ ensure_image || die "image $IMAGE missing; run: claude-can-speak build"
116
+ if docker inspect -f '{{.State.Running}}' "$CONTAINER" 2>/dev/null | grep -q true; then
117
+ echo "container $CONTAINER already running"; return 0
118
+ fi
119
+ docker rm -f "$CONTAINER" >/dev/null 2>&1 || true
120
+ mkdir -p "$MODELS_DIR"
121
+ docker run -d --name "$CONTAINER" -v "$MODELS_DIR:/models" "$IMAGE" >/dev/null \
122
+ && echo "started $CONTAINER (models cache: $MODELS_DIR)"
123
+ }
124
+
125
+ cmd_stop_container() {
126
+ docker rm -f "$CONTAINER" >/dev/null 2>&1 && echo "removed $CONTAINER" \
127
+ || echo "no container to remove"
128
+ }
129
+
130
+ cmd_stop() {
131
+ # Interrupt current playback. Mirrors stop_current() in the hook.
132
+ local pid stopped=0
133
+ if [ -f "$PIDFILE" ]; then
134
+ pid="$(cat "$PIDFILE" 2>/dev/null)"
135
+ if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
136
+ kill -TERM "-$pid" 2>/dev/null || kill -TERM "$pid" 2>/dev/null
137
+ stopped=1
138
+ fi
139
+ rm -f "$PIDFILE" 2>/dev/null || true
140
+ fi
141
+ # Belt and suspenders: stop any stray players too.
142
+ pkill -TERM -x pw-play 2>/dev/null && stopped=1
143
+ pkill -TERM -x paplay 2>/dev/null && stopped=1
+ pkill -TERM -x aplay 2>/dev/null && stopped=1
144
+ [ "$stopped" = 1 ] && echo "stopped playback" || echo "nothing playing"
145
+ }
146
+
147
+ _synth_to() { # text on stdin -> WAV bytes on stdout
148
+ ensure_image || die "image $IMAGE missing; run: claude-can-speak build"
149
+ cmd_start >/dev/null
150
+ docker exec -i "$CONTAINER" python3 /app/synth.py \
151
+ --engine "$ENGINE" --voice "$VOICE" --lang "$LANG" --speed "$SPEED"
152
+ }
153
+
154
+ _player() {
155
+ for p in pw-play paplay aplay; do
156
+ command -v "$p" >/dev/null 2>&1 && { echo "$p"; return; }
157
+ done
158
+ return 1
159
+ }
160
+
161
+ cmd_say() {
162
+ local text="$*"
163
+ [ -n "$text" ] || die "usage: claude-can-speak say <text>"
164
+ local player; player="$(_player)" || die "no audio player (pw-play/paplay/aplay)"
165
+ local wav; wav="$(mktemp --suffix=.wav)"
166
+ if printf '%s' "$text" | _synth_to >"$wav" && [ -s "$wav" ]; then
167
+ "$player" "$wav"
168
+ else
169
+ rm -f "$wav"; die "synthesis failed"
170
+ fi
171
+ rm -f "$wav"
172
+ }
173
+
174
+ cmd_test() {
175
+ local text="${*:-Hi Ramazan. This is Claude Code. Voice output is working, using the $VOICE voice.}"
176
+ echo "engine=$ENGINE voice=$VOICE lang=$LANG"
177
+ cmd_say "$text"
178
+ }
179
+
180
+ cmd_status() {
181
+ echo "claude-can-speak $VERSION"
182
+ printf 'voice gate : '
183
+ if [ -f "$SETTINGS_JSON" ] && command -v jq >/dev/null 2>&1 \
184
+ && [ "$(jq -r '(.voiceEnabled // .voice.enabled // false)|tostring' "$SETTINGS_JSON" 2>/dev/null)" = true ]; then
185
+ echo "ON (/voice enabled)"
186
+ else
187
+ echo "off (/voice disabled) - replies will be silent"
188
+ fi
189
+ printf 'engine/voice : %s / %s (%s)\n' "$ENGINE" "$VOICE" "$LANG"
190
+ printf 'image : '; ensure_image && echo "$IMAGE present" || echo "$IMAGE MISSING (run: build)"
191
+ printf 'container : '
192
+ docker inspect -f '{{.State.Running}}' "$CONTAINER" 2>/dev/null | grep -q true \
193
+ && echo "$CONTAINER running" || echo "$CONTAINER not running"
194
+ printf 'models cache : %s' "$MODELS_DIR"
195
+ [ -d "$MODELS_DIR" ] && printf ' (%s)\n' "$(du -sh "$MODELS_DIR" 2>/dev/null | cut -f1)" || printf ' (empty)\n'
196
+ printf 'hooks : '
197
+ if [ -f "$SETTINGS_JSON" ] && grep -q 'tts-speak.sh' "$SETTINGS_JSON" 2>/dev/null; then
198
+ echo "registered"; else echo "NOT registered (run: install-hooks)"; fi
199
+ }
200
+
201
+ _set_config() { # key value
202
+ touch "$CONFIG"
203
+ if grep -q "^$1=" "$CONFIG" 2>/dev/null; then
204
+ sed -i "s|^$1=.*|$1=\"$2\"|" "$CONFIG"
205
+ else
206
+ printf '%s="%s"\n' "$1" "$2" >>"$CONFIG"
207
+ fi
208
+ }
209
+
210
+ cmd_voice() { [ -n "${1:-}" ] || die "usage: claude-can-speak voice <name>"; _set_config VOICE "$1"; echo "voice set to $1"; }
211
+ cmd_engine() {
212
+ case "${1:-}" in
213
+ kokoro) _set_config ENGINE kokoro; _set_config LANG en-us; echo "engine set to kokoro (English)";;
214
+ piper) _set_config ENGINE piper; echo "engine set to piper (multilingual); set a voice with: claude-can-speak voice de_DE-thorsten-high";;
215
+ *) die "engine must be 'kokoro' or 'piper'";;
216
+ esac
217
+ }
218
+
219
+ cmd_voices() {
220
+ if [ "$ENGINE" = piper ]; then
221
+ cat <<EOF
222
+ piper voices (multilingual):
223
+ en_US-amy-medium en_US-lessac-high en_US-libritts_r-medium
224
+ en_US-hfc_female-medium en_US-kristin-medium en_GB-jenny_dioco-medium
225
+ de_DE-thorsten-medium de_DE-thorsten-high tr_TR-dfki-medium
226
+ EOF
227
+ else
228
+ cat <<EOF
229
+ kokoro voices (English; af_=US female, am_=US male, bf_/bm_=British):
230
+ af_heart af_bella af_nicole af_aoede af_kore af_sarah
231
+ af_nova af_sky af_jessica af_river am_michael am_fenrir am_puck
232
+ EOF
233
+ fi
234
+ }
235
+
236
+ cmd_install_skill() {
237
+ # Locate the packaged skill (deb vs git checkout).
238
+ local src
239
+ for cand in "/usr/share/claude-can-speak/skills/speak" "$SELF/../skills/speak"; do
240
+ [ -f "$cand/SKILL.md" ] && { src="$cand"; break; }
241
+ done
242
+ [ -n "${src:-}" ] || die "packaged skill not found"
243
+ local dest="$HOME/.claude/skills/speak"
244
+ mkdir -p "$dest"
245
+ cp "$src/SKILL.md" "$dest/SKILL.md"
246
+ echo "installed 'speak' skill -> $dest/SKILL.md"
247
+ echo "if ~/.claude/skills did not exist before, restart your session to discover it."
248
+ }
249
+
250
+ cmd_skill() {
251
+ case "${1:-}" in
252
+ on) _skill_override on; echo "'speak' skill enabled" ;;
253
+ off) _skill_override off; echo "'speak' skill disabled" ;;
254
+ *) die "usage: claude-can-speak skill on|off" ;;
255
+ esac
256
+ }
257
+
258
+ _skill_override() { # on|off -> settings.json skillOverrides.speak
259
+ [ -f "$SETTINGS_JSON" ] || { mkdir -p "$(dirname "$SETTINGS_JSON")"; echo '{}' >"$SETTINGS_JSON"; }
260
+ python3 - "$SETTINGS_JSON" "$1" <<'PY'
261
+ import json, sys, os
262
+ path, state = sys.argv[1], sys.argv[2]
263
+ with open(path) as f: cfg = json.load(f)
264
+ ov = cfg.setdefault("skillOverrides", {})
265
+ if state == "off": ov["speak"] = "off"
266
+ else: ov.pop("speak", None) # remove override = default 'on'
267
+ tmp = path + ".tmp"
268
+ with open(tmp, "w") as f: json.dump(cfg, f, indent=2); f.write("\n")
269
+ os.replace(tmp, path)
270
+ PY
271
+ }
272
+
273
+ cmd_install_hooks() {
274
+ [ -x "$LIBEXEC/install-hooks.sh" ] || die "install-hooks.sh not found in $LIBEXEC"
275
+ "$LIBEXEC/install-hooks.sh" install
276
+ }
277
+ cmd_remove_hooks() {
278
+ [ -x "$LIBEXEC/install-hooks.sh" ] || die "install-hooks.sh not found in $LIBEXEC"
279
+ "$LIBEXEC/install-hooks.sh" remove
280
+ }
281
+
282
+ case "${1:-help}" in
283
+ status) cmd_status ;;
284
+ test) shift; cmd_test "$@" ;;
285
+ say) shift; cmd_say "$@" ;;
286
+ stop) cmd_stop ;;
287
+ start|up) cmd_start ;;
288
+ stop-container) cmd_stop_container ;;
289
+ voice) shift; cmd_voice "${1:-}" ;;
290
+ engine) shift; cmd_engine "${1:-}" ;;
291
+ voices) cmd_voices ;;
292
+ install-hooks) cmd_install_hooks ;;
293
+ remove-hooks) cmd_remove_hooks ;;
294
+ install-skill) cmd_install_skill ;;
295
+ skill) shift; cmd_skill "${1:-}" ;;
296
+ build) cmd_build ;;
297
+ help|-h|--help) cmd_help ;;
298
+ --version|-V) echo "claude-can-speak $VERSION" ;;
299
+ *) die "unknown command '$1' (try: claude-can-speak help)" ;;
300
+ esac
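Note on `_set_config`: the function keeps `config.env` as `KEY="value"` lines, replacing an existing key in place and appending a new one otherwise. The same upsert can be sketched in Python, working on the file's text. `upsert_config` is a hypothetical name used only for illustration. Unlike the `sed` substitution, this sketch does no pattern replacement, so a value containing `|` or `&` is stored literally.

```python
def upsert_config(text, key, value):
    """Return config text in which every `key=` line reads KEY="value".

    Hypothetical helper: existing lines for the key are rewritten in place;
    if there are none, one line is appended at the end.
    """
    new_line = f'{key}="{value}"'
    lines = text.splitlines()
    found = False
    for i, line in enumerate(lines):
        if line.startswith(key + "="):
            lines[i] = new_line
            found = True
    if not found:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


before = 'ENGINE="kokoro"\nVOICE="af_heart"\n'
print(upsert_config(before, "VOICE", "af_bella"), end="")
```

Because the file is later sourced by the shell, a value containing a double quote would still need escaping; the sketch leaves that out.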
package/bin/cli.js ADDED
@@ -0,0 +1,49 @@
1
+ #!/usr/bin/env node
2
+ // npm entry point for claude-can-speak. This is a thin cross-platform shim: it
3
+ // locates the bundled bash CLI shipped in the package and execs it, passing
4
+ // through all arguments. The real logic lives in bin/claude-can-speak (bash),
5
+ // shared by the npm install and a direct git checkout.
6
+ //
7
+ // Bash is required (the CLI drives Docker + audio via shell). On Linux/macOS it
8
+ // is present; on Windows use WSL or Git Bash. We fail with a clear message if
9
+ // bash cannot be found rather than half-running.
10
+ "use strict";
11
+
12
+ const { spawnSync } = require("node:child_process");
13
+ const path = require("node:path");
14
+ const fs = require("node:fs");
15
+
16
+ const cliPath = path.join(__dirname, "claude-can-speak");
17
+
18
+ if (!fs.existsSync(cliPath)) {
19
+ console.error("claude-can-speak: bundled CLI not found at " + cliPath);
20
+ process.exit(1);
21
+ }
22
+
23
+ // Resolve a bash interpreter. PATH lookup covers Linux/macOS and Git-Bash/WSL
24
+ // shims on Windows.
25
+ function findBash() {
26
+ const candidates =
27
+ process.platform === "win32"
28
+ ? ["bash.exe", "bash"]
29
+ : ["/bin/bash", "/usr/bin/bash", "bash"];
30
+ for (const c of candidates) {
31
+ const r = spawnSync(c, ["-c", "exit 0"], { stdio: "ignore" });
32
+ if (r.status === 0) return c;
33
+ }
34
+ return null;
35
+ }
36
+
37
+ const bash = findBash();
38
+ if (!bash) {
39
+ console.error(
40
+ "claude-can-speak: bash is required but was not found.\n" +
41
+ "Install bash (Linux/macOS have it; on Windows use WSL or Git Bash)."
42
+ );
43
+ process.exit(1);
44
+ }
45
+
46
+ const res = spawnSync(bash, [cliPath, ...process.argv.slice(2)], {
47
+ stdio: "inherit",
48
+ });
49
+ process.exit(res.status === null ? 1 : res.status);
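Note on the shim: `cli.js` does two things. It picks the first working bash, and it hands back the child's exit status, falling back to 1 when there is none (for example when the child was killed by a signal). A rough Python analogue is sketched here for illustration only. `first_available` and `passthrough_status` are hypothetical names, and the lookup uses `shutil.which` rather than the trial `bash -c "exit 0"` run that `cli.js` performs.

```python
import shutil
import subprocess
import sys


def first_available(candidates, which=shutil.which):
    """Return the resolved path of the first candidate found, else None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


def passthrough_status(returncode):
    """Map a child's return code to the code this process should exit with.

    subprocess reports death-by-signal as a negative number; like the shim,
    this reports that case (and a missing code) as a plain failure, 1.
    """
    if returncode is None or returncode < 0:
        return 1
    return returncode


# Demo: run a child that exits with a known code and pass it through.
done = subprocess.run([sys.executable, "-c", "raise SystemExit(3)"])
print(passthrough_status(done.returncode))
```

Injecting `which` as a parameter keeps the lookup testable without depending on what is installed on the machine.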
@@ -0,0 +1,33 @@
1
+ # claude-can-speak TTS container.
2
+ #
3
+ # Bundles the two synthesis engines (Piper + Kokoro) and their Python deps,
4
+ # but NO model weights: models are fetched on first use into /models, which
5
+ # is a host-mounted cache. The image therefore redistributes no third-party
6
+ # model files. See ../THIRD_PARTY.md for per-model licences.
7
+ FROM python:3.12-slim
8
+
9
+ # espeak-ng is required by Kokoro's grapheme-to-phoneme stage; harmless for
10
+ # Piper (which carries its own phonemization). libsndfile backs soundfile.
11
+ RUN apt-get update \
12
+ && apt-get install -y --no-install-recommends espeak-ng libsndfile1 \
13
+ && rm -rf /var/lib/apt/lists/*
14
+
15
+ # Pinned to keep synthesis reproducible. onnxruntime is the CPU build.
16
+ RUN pip install --no-cache-dir \
17
+ piper-tts==1.4.2 \
18
+ kokoro-onnx==0.5.0 \
19
+ soundfile==0.13.1 \
20
+ onnxruntime==1.26.0
21
+
22
+ ENV CCS_MODELS_DIR=/models
23
+ RUN mkdir -p /models
24
+ VOLUME ["/models"]
25
+
26
+ COPY synth.py /app/synth.py
27
+ COPY serve.sh /app/serve.sh
28
+ RUN chmod +x /app/serve.sh
29
+
30
+ # The container stays alive doing nothing; the host drives synthesis via
31
+ # `docker exec ... python3 /app/synth.py`. This keeps the Python import cost
32
+ # (onnxruntime, piper, kokoro) paid once per container, not once per reply.
33
+ ENTRYPOINT ["/app/serve.sh"]
@@ -0,0 +1,6 @@
1
+ #!/bin/sh
2
+ # Keep the container alive so the host can drive synthesis via `docker exec`.
3
+ # Paying Python + onnxruntime import cost once per container (not once per
4
+ # reply) is the whole point of the persistent-container design.
5
+ echo "[claude-can-speak] tts container ready; awaiting docker exec" >&2
6
+ exec tail -f /dev/null
@@ -0,0 +1,169 @@
1
+ #!/usr/bin/env python3
2
+ """claude-can-speak synthesis entrypoint (runs inside the TTS container).
3
+
4
+ Reads text on stdin, writes a WAV stream on stdout. The engine and voice are
5
+ chosen by flags so the host-side hook stays a thin wrapper. Two engines are
6
+ supported:
7
+
8
+ piper VITS, multilingual (English, German, Turkish, ...).
9
+ kokoro Kokoro-82M, English-family only, higher naturalness. The default.
10
+
11
+ Models are NOT bundled in the image. They are fetched on first use into
12
+ /models (a host-mounted cache) from their official upstreams, so the image
13
+ and the .deb redistribute no third-party model weights. See THIRD_PARTY.md
14
+ for the per-model licences.
15
+
16
+ This program prints nothing to stdout except the WAV bytes; all diagnostics
17
+ go to stderr so the audio stream stays clean.
18
+ """
19
+
20
+ import argparse
21
+ import os
22
+ import sys
23
+ import urllib.request
24
+
25
+ MODELS_DIR = os.environ.get("CCS_MODELS_DIR", "/models")
26
+
27
+ # Piper voices we know about: slug -> (subpath on rhasspy/piper-voices, lang).
28
+ # Verified to resolve (HTTP 200) on huggingface.co/rhasspy/piper-voices.
29
+ PIPER_VOICES = {
30
+ "en_US-amy-medium": "en/en_US/amy/medium/en_US-amy-medium",
31
+ "en_US-lessac-high": "en/en_US/lessac/high/en_US-lessac-high",
32
+ "en_US-libritts_r-medium": "en/en_US/libritts_r/medium/en_US-libritts_r-medium",
33
+ "en_US-hfc_female-medium": "en/en_US/hfc_female/medium/en_US-hfc_female-medium",
34
+ "en_US-kristin-medium": "en/en_US/kristin/medium/en_US-kristin-medium",
35
+ "en_GB-jenny_dioco-medium": "en/en_GB/jenny_dioco/medium/en_GB-jenny_dioco-medium",
36
+ "de_DE-thorsten-medium": "de/de_DE/thorsten/medium/de_DE-thorsten-medium",
37
+ "de_DE-thorsten-high": "de/de_DE/thorsten/high/de_DE-thorsten-high",
38
+ "tr_TR-dfki-medium": "tr/tr_TR/dfki/medium/tr_TR-dfki-medium",
39
+ }
40
+ PIPER_BASE = "https://huggingface.co/rhasspy/piper-voices/resolve/main"
41
+
42
+ # Kokoro model + voices pack (single combined ONNX + a voices.bin).
43
+ KOKORO_MODEL_URL = (
44
+ "https://github.com/thewh1teagle/kokoro-onnx/releases/download/"
45
+ "model-files-v1.0/kokoro-v1.0.onnx"
46
+ )
47
+ KOKORO_VOICES_URL = (
48
+ "https://github.com/thewh1teagle/kokoro-onnx/releases/download/"
49
+ "model-files-v1.0/voices-v1.0.bin"
50
+ )
51
+
52
+
53
+ def log(msg):
54
+ print(f"[claude-can-speak/synth] {msg}", file=sys.stderr, flush=True)
55
+
56
+
57
+ def _download(url, dest):
58
+ """Fetch url to dest atomically. Skips if dest already exists."""
59
+ if os.path.exists(dest) and os.path.getsize(dest) > 0:
60
+ return dest
61
+ os.makedirs(os.path.dirname(dest), exist_ok=True)
62
+ # Unique temp per process so concurrent synths never race on the same
63
+ # .part file (os.replace is atomic, so the last writer wins cleanly).
64
+ tmp = f"{dest}.part.{os.getpid()}"
65
+ log(f"fetching {url}")
66
+ req = urllib.request.Request(url, headers={"User-Agent": "claude-can-speak"})
67
+ try:
68
+ with urllib.request.urlopen(req) as r, open(tmp, "wb") as f:
69
+ while True:
70
+ chunk = r.read(1 << 16)
71
+ if not chunk:
72
+ break
73
+ f.write(chunk)
74
+ os.replace(tmp, dest)
75
+ finally:
76
+ if os.path.exists(tmp):
77
+ os.unlink(tmp)
78
+ log(f"cached {dest} ({os.path.getsize(dest)} bytes)")
79
+ return dest
80
+
81
+
82
+ def ensure_piper_voice(voice):
83
+ if voice not in PIPER_VOICES:
84
+ raise SystemExit(
85
+ f"unknown piper voice {voice!r}; known: {', '.join(PIPER_VOICES)}"
86
+ )
87
+ subpath = PIPER_VOICES[voice]
88
+ onnx = os.path.join(MODELS_DIR, "piper", voice + ".onnx")
89
+ conf = onnx + ".json"
90
+ _download(f"{PIPER_BASE}/{subpath}.onnx", onnx)
91
+ _download(f"{PIPER_BASE}/{subpath}.onnx.json", conf)
92
+ return onnx
93
+
94
+
95
+ def ensure_kokoro_model():
96
+ model = os.path.join(MODELS_DIR, "kokoro", "kokoro-v1.0.onnx")
97
+ voices = os.path.join(MODELS_DIR, "kokoro", "voices-v1.0.bin")
98
+ _download(KOKORO_MODEL_URL, model)
99
+ _download(KOKORO_VOICES_URL, voices)
100
+ return model, voices
101
+
102
+
103
+ def synth_piper(text, voice, speed, wav_path):
104
+ from piper import PiperVoice, SynthesisConfig # piper-tts (piper1-gpl)
105
+ import wave
106
+
107
+ onnx = ensure_piper_voice(voice)
108
+ v = PiperVoice.load(onnx)
109
+ # length_scale < 1 speeds speech up; invert the user-facing speed factor.
110
+ cfg = SynthesisConfig(length_scale=(1.0 / speed if speed else 1.0))
111
+ with wave.open(wav_path, "wb") as wf:
112
+ v.synthesize_wav(text, wf, syn_config=cfg)
113
+
114
+
115
+ def synth_kokoro(text, voice, lang, speed, wav_path):
116
+ from kokoro_onnx import Kokoro
117
+ import soundfile as sf
118
+
119
+ model, voices = ensure_kokoro_model()
120
+ k = Kokoro(model, voices)
121
+ samples, sample_rate = k.create(text, voice=voice, speed=speed, lang=lang)
122
+ sf.write(wav_path, samples, sample_rate, format="WAV", subtype="PCM_16")
123
+
124
+
125
+ def main():
126
+     ap = argparse.ArgumentParser(description="claude-can-speak TTS synth")
127
+     ap.add_argument("--engine", choices=["piper", "kokoro"], default="kokoro")
128
+     ap.add_argument("--voice", default="af_heart")
129
+     ap.add_argument("--lang", default="en-us",
130
+                     help="kokoro language tag (e.g. en-us); ignored by piper")
131
+     ap.add_argument("--speed", type=float, default=1.0)
132
+     ap.add_argument("--text", default=None,
133
+                     help="text to speak; if omitted, read from stdin")
134
+     args = ap.parse_args()
135
+
136
+     text = args.text if args.text is not None else sys.stdin.read()
137
+     text = text.strip()
138
+     if not text:
139
+         log("empty text, nothing to synthesize")
140
+         return 0
141
+
142
+     # WAV encoders (wave, soundfile) seek back to patch the header size, so
143
+     # they need a seekable target; a stdout pipe is not seekable. Synthesize
144
+     # to a temp file, then stream the bytes to stdout.
145
+     import tempfile
146
+
147
+     tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
148
+     tmp.close()
149
+     try:
150
+         if args.engine == "piper":
151
+             synth_piper(text, args.voice, args.speed, tmp.name)
152
+         else:
153
+             synth_kokoro(text, args.voice, args.lang, args.speed, tmp.name)
154
+         with open(tmp.name, "rb") as f:
155
+             sys.stdout.buffer.write(f.read())
156
+         sys.stdout.buffer.flush()
157
+     except Exception as e:  # fail loud on stderr, silent on stdout
158
+         log(f"synthesis failed: {e}")
159
+         return 1
160
+     finally:
161
+         try:
162
+             os.unlink(tmp.name)
163
+         except OSError:
164
+             pass
165
+     return 0
166
+
167
+
168
+ if __name__ == "__main__":
169
+     sys.exit(main())
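The temp-file step in `main()` can be reproduced with the standard library alone. The sketch below is not part of the package: `length_scale_for` and `wav_bytes` are hypothetical helper names, the 22050 Hz default is an arbitrary assumption, and silent PCM frames stand in for real synthesizer output. It illustrates the speed-to-`length_scale` inversion and the write-to-a-seekable-file-then-read-back pattern described in the comments above.

```python
import os
import tempfile
import wave


def length_scale_for(speed):
    """Invert a user-facing speed factor; 0 or None falls back to 1.0."""
    return 1.0 / speed if speed else 1.0


def wav_bytes(frames, sample_rate=22050):
    """Write 16-bit mono PCM frames to a temp WAV file and return its bytes."""
    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    tmp.close()
    try:
        with wave.open(tmp.name, "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
            wf.setframerate(sample_rate)
            wf.writeframes(frames)
        with open(tmp.name, "rb") as f:
            return f.read()
    finally:
        # Remove the temp file whether or not encoding succeeded.
        os.unlink(tmp.name)


if __name__ == "__main__":
    data = wav_bytes(b"\x00\x00" * 100)  # 100 silent samples
    print(data[:4], length_scale_for(2.0))
```

The returned bytes could then be written to `sys.stdout.buffer` in one call, which is what makes the pipe's lack of seek support irrelevant.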
@@ -0,0 +1,43 @@
1
+ #!/usr/bin/env python3
2
+ """Strip markdown and code from a Claude reply so it reads naturally aloud.
3
+
4
+ Reads text on stdin, writes cleaned text on stdout. argv[1] (optional) is the
5
+ maximum character count; longer text is truncated on a sentence boundary so a
6
+ long reply does not monologue forever. Fenced code blocks are removed entirely
7
+ (they are unlistenable); inline code keeps its contents.
8
+ """
9
+ import re
10
+ import sys
11
+
12
+
13
+ def clean(text, maxc):
14
+     # Remove fenced code blocks entirely.
15
+     text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)
16
+     text = re.sub(r"~~~.*?~~~", " ", text, flags=re.DOTALL)
17
+     # Inline code: keep contents, drop backticks.
18
+     text = re.sub(r"`([^`]*)`", r"\1", text)
19
+     # Images: drop entirely. Links: keep visible text, drop URL.
20
+     text = re.sub(r"!\[[^\]]*\]\([^)]*\)", " ", text)
21
+     text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
22
+     # Headings, list bullets, blockquote markers.
23
+     text = re.sub(r"^[ \t]*#{1,6}[ \t]*", "", text, flags=re.MULTILINE)
24
+     text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.MULTILINE)
25
+     text = re.sub(r"^[ \t]*>[ \t]?", "", text, flags=re.MULTILINE)
26
+     # Emphasis markers around a run of text.
27
+     text = re.sub(r"[*_]{1,3}([^*_]+)[*_]{1,3}", r"\1", text)
28
+     # Collapse whitespace.
29
+     text = re.sub(r"\s+", " ", text).strip()
30
+     if maxc and len(text) > maxc:
31
+         cut = text[:maxc]
32
+         m = max(cut.rfind(". "), cut.rfind("! "), cut.rfind("? "))
33
+         text = cut[: m + 1] if m > maxc * 0.5 else cut
34
+     return text
35
+
36
+
37
+ def main():
38
+     maxc = int(sys.argv[1]) if len(sys.argv) > 1 else 700
39
+     sys.stdout.write(clean(sys.stdin.read(), maxc))
40
+
41
+
42
+ if __name__ == "__main__":
43
+     main()
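The sentence-boundary cap at the end of `clean()` is easiest to follow in isolation. The sketch below copies only that step into a standalone function; the name `truncate_on_sentence` and the sample strings are illustrative assumptions, not part of the package.

```python
def truncate_on_sentence(text, maxc):
    """Cap text at maxc characters, preferring the last sentence end."""
    if not maxc or len(text) <= maxc:
        return text
    cut = text[:maxc]
    # Index of the last ". ", "! " or "? " inside the cut (-1 if none).
    m = max(cut.rfind(". "), cut.rfind("! "), cut.rfind("? "))
    # Use the boundary only if it keeps more than half of the budget;
    # otherwise fall back to a hard cut at maxc characters.
    return cut[: m + 1] if m > maxc * 0.5 else cut


if __name__ == "__main__":
    sample = "First sentence here. Second one follows! Third is cut off midway."
    print(truncate_on_sentence(sample, 50))
```

With a 50-character budget the third sentence is dropped whole, because the "!" boundary falls past the halfway point. When the only boundary sits in the first half of the budget, the text is cut mid-sentence instead.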
@@ -0,0 +1,85 @@
1
+ #!/usr/bin/env bash
2
+ # claude-can-speak: register (or remove) the Stop + UserPromptSubmit hooks in
3
+ # the user's Claude Code settings.json, merging into any existing hooks rather
4
+ # than overwriting them. Idempotent.
5
+ set -uo pipefail
6
+
7
+ SETTINGS_JSON="${CLAUDE_SETTINGS:-$HOME/.claude/settings.json}"
8
+
9
+ # Resolve where the installed hook scripts live (deb vs git checkout).
10
+ SELF="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
11
+ if [ -f "/usr/lib/claude-can-speak/tts-speak.sh" ]; then
12
+ HOOKDIR="/usr/lib/claude-can-speak"
13
+ else
14
+ HOOKDIR="$SELF"
15
+ fi
16
+ SPEAK_HOOK="$HOOKDIR/tts-speak.sh"
17
+ INTERRUPT_HOOK="$HOOKDIR/interrupt.sh"
18
+
19
+ action="${1:-install}"
20
+
21
+ [ -f "$SETTINGS_JSON" ] || { mkdir -p "$(dirname "$SETTINGS_JSON")"; echo '{}' >"$SETTINGS_JSON"; }
22
+
23
+ python3 - "$SETTINGS_JSON" "$action" "$SPEAK_HOOK" "$INTERRUPT_HOOK" <<'PY'
24
+ import json, sys, os, shutil
25
+
26
+ path, action, speak_hook, interrupt_hook = sys.argv[1:5]
27
+
28
+ with open(path) as f:
29
+     cfg = json.load(f)
30
+
31
+ hooks = cfg.setdefault("hooks", {})
32
+
33
+ MARK = "tts-speak.sh" # how we recognise our Stop hook
34
+ IMARK = "interrupt.sh" # how we recognise our UserPromptSubmit hook
35
+
36
+ def groups(event):
37
+     return hooks.setdefault(event, [])
38
+
39
+ def strip(event, needle):
40
+     """Remove any hook entry whose command mentions needle."""
41
+     kept = []
42
+     for group in hooks.get(event, []):
43
+         group_hooks = [h for h in group.get("hooks", [])
44
+                        if needle not in str(h.get("command", ""))]
45
+         if group_hooks:
46
+             group = dict(group); group["hooks"] = group_hooks
47
+             kept.append(group)
48
+     if kept:
49
+         hooks[event] = kept
50
+     elif event in hooks:
51
+         del hooks[event]
52
+
53
+ # Always strip our previous entries first so the operation is idempotent.
54
+ strip("Stop", MARK)
55
+ strip("UserPromptSubmit", IMARK)
56
+
57
+ if action == "install":
58
+     groups("Stop").append({
59
+         "hooks": [{"type": "command", "command": speak_hook, "timeout": 15}]
60
+     })
61
+     groups("UserPromptSubmit").append({
62
+         "hooks": [{"type": "command", "command": interrupt_hook, "timeout": 5}]
63
+     })
64
+ elif action == "remove":
65
+     pass
66
+ else:
67
+     sys.stderr.write("action must be install or remove\n")
68
+     sys.exit(2)
69
+
70
+ # Back up once, then write atomically.
71
+ if not os.path.exists(path + ".ccs-bak"):
72
+     shutil.copy2(path, path + ".ccs-bak")
73
+ tmp = path + ".tmp"
74
+ with open(tmp, "w") as f:
75
+     json.dump(cfg, f, indent=2)
76
+     f.write("\n")
77
+ os.replace(tmp, path)
78
+ print(f"{'installed' if action == 'install' else 'removed'} claude-can-speak hooks in {path}")
79
+ PY
80
+
81
+ if [ "$action" = install ]; then
82
+ echo "Stop hook -> $SPEAK_HOOK"
83
+ echo "UserPromptSubmit -> $INTERRUPT_HOOK"
84
+ echo "Note: Claude Code loads hooks at session start; restart your session to activate."
85
+ fi
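The strip-then-append approach in the embedded Python is what makes the installer safe to run repeatedly. The sketch below restates it on an in-memory dict so the property can be checked directly. The function names `strip_hooks` and `install`, and the `/opt/other.sh` entry standing in for a pre-existing third-party hook, are assumptions for illustration.

```python
import copy


def strip_hooks(hooks, event, needle):
    """Drop hook entries whose command mentions needle; keep everything else."""
    kept = []
    for group in hooks.get(event, []):
        remaining = [h for h in group.get("hooks", [])
                     if needle not in str(h.get("command", ""))]
        if remaining:
            kept.append({**group, "hooks": remaining})
    if kept:
        hooks[event] = kept
    else:
        hooks.pop(event, None)


def install(cfg, event, command, marker, timeout):
    """Remove any earlier copy of our hook, then append exactly one."""
    hooks = cfg.setdefault("hooks", {})
    strip_hooks(hooks, event, marker)
    hooks.setdefault(event, []).append(
        {"hooks": [{"type": "command", "command": command, "timeout": timeout}]}
    )
    return cfg


if __name__ == "__main__":
    existing = {"hooks": {"Stop": [
        {"hooks": [{"type": "command", "command": "/opt/other.sh"}]}
    ]}}
    speak = "/usr/lib/claude-can-speak/tts-speak.sh"
    once = install(copy.deepcopy(existing), "Stop", speak, "tts-speak.sh", 15)
    twice = install(copy.deepcopy(once), "Stop", speak, "tts-speak.sh", 15)
    print(once == twice, len(once["hooks"]["Stop"]))
```

Because the match is a substring test on the command, any unrelated hook whose command happens to contain `tts-speak.sh` would also be removed. That is a trade-off of recognising entries by marker rather than by exact path.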
@@ -0,0 +1,22 @@
1
+ #!/usr/bin/env bash
2
+ # claude-can-speak: UserPromptSubmit hook. When you send your next message,
3
+ # stop whatever the previous reply was still speaking. This makes "just start
4
+ # typing" the natural interrupt: the moment you move on, the voice goes quiet.
5
+ #
6
+ # Fails silent, returns immediately.
7
+ set -uo pipefail
8
+
9
+ CCS_HOME="${CCS_HOME:-$HOME/.config/claude-can-speak}"
10
+ PIDFILE="$CCS_HOME/speaking.pid"
11
+
12
+ # Drain stdin (the hook payload) so the caller's pipe never blocks.
13
+ cat >/dev/null 2>&1 || true
14
+
15
+ if [ -f "$PIDFILE" ]; then
16
+ pid="$(cat "$PIDFILE" 2>/dev/null)"
17
+ if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
18
+ kill -TERM "-$pid" 2>/dev/null || kill -TERM "$pid" 2>/dev/null
19
+ fi
20
+ rm -f "$PIDFILE" 2>/dev/null || true
21
+ fi
22
+ exit 0
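The `kill -TERM "-$pid"` idiom relies on the worker being a session leader, so that its pid doubles as its process-group id. A minimal Python sketch of the same mechanism follows, assuming a POSIX system with a `sleep` binary. `start_worker` and `interrupt` are hypothetical names, and `sleep 30` stands in for the synthesis and playback pipeline.

```python
import os
import signal
import subprocess


def start_worker(argv):
    """Start argv as a new session leader, so its pid is also its group id."""
    return subprocess.Popen(argv, start_new_session=True)


def interrupt(pid):
    """Send SIGTERM to the whole group; fall back to the single process."""
    try:
        os.killpg(pid, signal.SIGTERM)
    except (ProcessLookupError, PermissionError):
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # already gone, nothing to interrupt


if __name__ == "__main__":
    worker = start_worker(["sleep", "30"])
    interrupt(worker.pid)
    # A negative return code is the number of the signal that ended the child.
    print("exit status:", worker.wait(timeout=5))
```

Signalling the group rather than the single pid means any child the worker has spawned is stopped in the same call.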
@@ -0,0 +1,59 @@
1
+ #!/usr/bin/env bash
2
+ # claude-can-speak speak worker. Launched detached (setsid) by tts-speak.sh as
3
+ # a new session leader, so $$ is this worker's process-group id. It records that
4
+ # pid IMMEDIATELY - before the multi-second synthesis - so an interrupt issued
5
+ # at any point (during synth or during playback) kills the whole group via
6
+ # `kill -TERM -$pid`. Cleaned text arrives on stdin; config via the environment.
7
+ set -uo pipefail
8
+
9
+ : "${CCS_HOME:=$HOME/.config/claude-can-speak}"
10
+ : "${LOG:=$CCS_HOME/claude-can-speak.log}"
11
+ : "${PIDFILE:=$CCS_HOME/speaking.pid}"
12
+ : "${CONTAINER:=ccs-tts}"
13
+ : "${IMAGE:=claude-can-speak:latest}"
14
+ : "${MODELS_DIR:=$HOME/.cache/claude-can-speak/models}"
15
+ : "${PLAYER:=pw-play}"
16
+ : "${ENGINE:=kokoro}"
17
+ : "${VOICE:=af_heart}"
18
+ : "${CCS_LANG:=en-us}"
19
+ : "${SPEED:=1.0}"
20
+
21
+ log() { printf '%s %s\n' "$(date -Is)" "$*" >>"$LOG" 2>/dev/null || true; }
22
+
23
+ # Record our group id up front so we are interruptible during synthesis.
24
+ printf '%s' "$$" >"$PIDFILE" 2>/dev/null || true
25
+ # If we are signalled, drop the pidfile on the way out.
26
+ trap 'rm -f "$PIDFILE" 2>/dev/null; exit 0' TERM INT
27
+
28
+ ensure_container() {
29
+ if docker inspect -f '{{.State.Running}}' "$CONTAINER" 2>/dev/null | grep -q true; then
30
+ return 0
31
+ fi
32
+ docker rm -f "$CONTAINER" >/dev/null 2>&1 || true
33
+ docker image inspect "$IMAGE" >/dev/null 2>&1 || { log "image $IMAGE missing"; return 1; }
34
+ mkdir -p "$MODELS_DIR" 2>/dev/null || true
35
+ docker run -d --name "$CONTAINER" -v "$MODELS_DIR:/models" "$IMAGE" >/dev/null 2>&1 \
36
+ || { log "container start failed"; return 1; }
37
+ log "started container $CONTAINER"
38
+ }
39
+
40
+ CLEAN="$(cat)"
41
+ [ -n "$CLEAN" ] || { rm -f "$PIDFILE" 2>/dev/null; exit 0; }
42
+
43
+ ensure_container || { rm -f "$PIDFILE" 2>/dev/null; exit 0; }
44
+
45
+ wav="$(mktemp --suffix=.wav 2>/dev/null)" || { rm -f "$PIDFILE" 2>/dev/null; exit 0; }
46
+ if printf '%s' "$CLEAN" | docker exec -i "$CONTAINER" \
47
+ python3 /app/synth.py \
48
+ --engine "$ENGINE" --voice "$VOICE" --lang "$CCS_LANG" --speed "$SPEED" \
49
+ >"$wav" 2>>"$LOG" && [ -s "$wav" ]; then
50
+ "$PLAYER" "$wav" >/dev/null 2>&1
51
+ else
52
+ log "synthesis produced no audio (engine=$ENGINE voice=$VOICE)"
53
+ fi
54
+ rm -f "$wav" 2>/dev/null || true
55
+
56
+ # Clear the pidfile only if it still points at us (a newer reply may have
57
+ # replaced it while we were speaking).
58
+ [ "$(cat "$PIDFILE" 2>/dev/null)" = "$$" ] && rm -f "$PIDFILE" 2>/dev/null
59
+ exit 0
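The final pidfile check in the worker guards against deleting a pidfile that a newer reply has already overwritten. The sketch below shows that ownership check in Python. `clear_if_owner` is a hypothetical helper and the pids are made-up values.

```python
import os
import tempfile


def clear_if_owner(pidfile, pid):
    """Remove pidfile only if it still records pid; report whether it did."""
    try:
        with open(pidfile) as f:
            owner = f.read().strip()
    except OSError:
        return False  # no pidfile, nothing to clear
    if owner != str(pid):
        return False  # a newer worker owns the file now
    try:
        os.unlink(pidfile)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        pidfile = os.path.join(d, "speaking.pid")
        with open(pidfile, "w") as f:
            f.write("4242")  # pretend a newer worker wrote its pid
        print(clear_if_owner(pidfile, 1111), os.path.exists(pidfile))
        print(clear_if_owner(pidfile, 4242), os.path.exists(pidfile))
```

The read and the unlink are separate steps, so this check (like the shell original) narrows the race with a newer worker without eliminating it.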
@@ -0,0 +1,126 @@
1
+ #!/usr/bin/env bash
2
+ # claude-can-speak: Claude Code Stop hook that speaks each finished reply.
3
+ #
4
+ # Gated on /voice mode: it reads the same voiceEnabled / voice.enabled flag
5
+ # that the built-in /voice (speech-in) toggles, so one switch controls both
6
+ # directions. Voice off -> this exits silently and Claude Code stays text-only.
7
+ #
8
+ # Design contract:
9
+ # - Non-blocking: synthesis + playback run in a detached background job so
10
+ # the hook returns immediately and never delays the next turn.
11
+ # - Fails silent: any error (no docker, no audio, gate off) exits 0 quietly.
12
+ # - Absolute paths only; no reliance on the caller's PATH or cwd.
13
+ #
14
+ # Provided AS IS, no warranty. See the project README for the full disclaimer.
15
+ set -uo pipefail
16
+
17
+ # --- Resolve config -------------------------------------------------------
18
+ CCS_HOME="${CCS_HOME:-$HOME/.config/claude-can-speak}"
19
+ CCS_CONFIG="$CCS_HOME/config.env"
20
+ SETTINGS_JSON="${CLAUDE_SETTINGS:-$HOME/.claude/settings.json}"
21
+ CONTAINER="${CCS_CONTAINER:-ccs-tts}"
22
+ IMAGE="${CCS_IMAGE:-claude-can-speak:latest}"
23
+ MODELS_DIR="${CCS_MODELS_DIR:-$HOME/.cache/claude-can-speak/models}"
24
+ LOG="$CCS_HOME/claude-can-speak.log"
25
+ PIDFILE="$CCS_HOME/speaking.pid"
26
+
27
+ # Defaults (overridable via config.env). Chosen from a listening test:
28
+ # Kokoro af_heart, US female, is the most natural.
29
+ ENGINE="kokoro"
30
+ VOICE="af_heart"
31
+ LANG="en-us"  # NOTE: LANG is also the POSIX locale variable; this value reaches the worker as CCS_LANG
32
+ SPEED="1.0"
33
+ MAX_CHARS="700"
34
+
35
+ # shellcheck source=/dev/null
36
+ [ -f "$CCS_CONFIG" ] && . "$CCS_CONFIG"
37
+
38
+ mkdir -p "$CCS_HOME" 2>/dev/null || true
39
+
40
+ log() { printf '%s %s\n' "$(date -Is)" "$*" >>"$LOG" 2>/dev/null || true; }
41
+
42
+ # --- Tool availability ----------------------------------------------------
43
+ command -v docker >/dev/null 2>&1 || { log "no docker; silent"; exit 0; }
44
+
45
+ PLAYER=""
46
+ for p in pw-play paplay aplay; do
47
+ if command -v "$p" >/dev/null 2>&1; then PLAYER="$p"; break; fi
48
+ done
49
+ [ -n "$PLAYER" ] || { log "no audio player; silent"; exit 0; }
50
+
51
+ # --- Read the Stop hook payload ------------------------------------------
52
+ PAYLOAD="$(cat)"
53
+
54
+ # --- Gate: only speak when /voice mode is on ------------------------------
55
+ # Read voiceEnabled OR voice.enabled from settings.json. Absent/false = off.
56
+ voice_on() {
57
+ [ -f "$SETTINGS_JSON" ] || return 1
58
+ if command -v jq >/dev/null 2>&1; then
59
+ local v
60
+ v="$(jq -r '(.voiceEnabled // .voice.enabled // false) | tostring' \
61
+ "$SETTINGS_JSON" 2>/dev/null)"
62
+ [ "$v" = "true" ]
63
+ return
64
+ fi
65
+ # jq-less fallback: grep for voiceEnabled, then for any "enabled": true (not only voice.enabled).
66
+ grep -Eq '"voiceEnabled"[[:space:]]*:[[:space:]]*true' "$SETTINGS_JSON" && return 0
67
+ grep -Eq '"enabled"[[:space:]]*:[[:space:]]*true' "$SETTINGS_JSON" && return 0
68
+ return 1
69
+ }
70
+ voice_on || { log "voice gate off; silent"; exit 0; }
71
+
72
+ # --- Extract the reply text ----------------------------------------------
73
+ extract_text() {
74
+ if command -v jq >/dev/null 2>&1; then
75
+ printf '%s' "$PAYLOAD" | jq -r '.last_assistant_message // empty' 2>/dev/null
76
+ else
77
+ # Minimal fallback: pull last_assistant_message with python if present.
78
+ printf '%s' "$PAYLOAD" | python3 -c \
79
+ 'import sys,json; print(json.load(sys.stdin).get("last_assistant_message",""))' \
80
+ 2>/dev/null
81
+ fi
82
+ }
83
+ TEXT="$(extract_text)"
84
+ [ -n "$TEXT" ] || { log "no last_assistant_message; silent"; exit 0; }
85
+
86
+ # --- Clean text for speech ------------------------------------------------
87
+ # Drop fenced code blocks (unlistenable), strip common markdown, collapse
88
+ # whitespace, cap length. Logic lives in clean.py beside this script.
89
+ SELF_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
90
+ CLEAN_PY="${CCS_CLEAN_PY:-$SELF_DIR/clean.py}"
91
+ [ -f "$CLEAN_PY" ] || { log "clean.py not found at $CLEAN_PY; silent"; exit 0; }
92
+ CLEAN="$(printf '%s' "$TEXT" | python3 "$CLEAN_PY" "$MAX_CHARS" 2>>"$LOG")"
93
+ [ -n "$CLEAN" ] || { log "empty after cleaning; silent"; exit 0; }
94
+
95
+ # --- Interrupt any currently-playing reply --------------------------------
96
+ # A new reply supersedes the previous one: stop stale playback before we
97
+ # start. The same logic backs `claude-can-speak stop`.
98
+ stop_current() {
99
+ [ -f "$PIDFILE" ] || return 0
100
+ local pid
101
+ pid="$(cat "$PIDFILE" 2>/dev/null)"
102
+ if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
103
+ # Signal the worker's whole process group (synth + player); fall back to the single pid.
104
+ kill -TERM "-$pid" 2>/dev/null || kill -TERM "$pid" 2>/dev/null
105
+ fi
106
+ rm -f "$PIDFILE" 2>/dev/null || true
107
+ }
108
+
109
+ # --- Speak (detached worker) ----------------------------------------------
110
+ # A fresh reply interrupts whatever was still being spoken.
111
+ stop_current
112
+
113
+ # Launch the worker in a new session (setsid) so the hook returns immediately
114
+ # and the worker is a process-group leader: an interrupt can signal its whole
115
+ # group (in-flight synth + player) by the single recorded pid. Config is passed
116
+ # through the environment; the cleaned text arrives on the worker's stdin.
117
+ SELF_DIR="${SELF_DIR:-$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)}"
118
+ WORKER="${CCS_WORKER:-$SELF_DIR/speak-worker.sh}"
119
+ [ -f "$WORKER" ] || { log "speak-worker.sh not found at $WORKER; silent"; exit 0; }
120
+
121
+ printf '%s' "$CLEAN" | \
122
+ CCS_HOME="$CCS_HOME" LOG="$LOG" PIDFILE="$PIDFILE" \
123
+ CONTAINER="$CONTAINER" IMAGE="$IMAGE" MODELS_DIR="$MODELS_DIR" PLAYER="$PLAYER" \
124
+ ENGINE="$ENGINE" VOICE="$VOICE" CCS_LANG="$LANG" SPEED="$SPEED" \
125
+ setsid bash "$WORKER" >/dev/null 2>&1 &
126
+ exit 0
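The `/voice` gate in `voice_on` can be expressed without `jq` or `grep` by parsing the settings file. The sketch below is a simplified Python equivalent, not the package's implementation. It accepts only a JSON `true` for `voiceEnabled` or `voice.enabled`, whereas the grep fallback above would also accept an `"enabled": true` under any other key. The `mcp` key in the example is made up.

```python
import json


def voice_on(settings_text):
    """Return True only if voiceEnabled or voice.enabled is JSON true."""
    try:
        cfg = json.loads(settings_text)
    except ValueError:
        return False  # unreadable settings count as "off"
    if not isinstance(cfg, dict):
        return False
    if cfg.get("voiceEnabled") is True:
        return True
    voice = cfg.get("voice")
    return isinstance(voice, dict) and voice.get("enabled") is True


if __name__ == "__main__":
    print(voice_on('{"voiceEnabled": true}'))
    print(voice_on('{"mcp": {"enabled": true}}'))
```

As with the `//` chain in the jq expression, a false `voiceEnabled` does not block a true `voice.enabled`: either flag being true opens the gate.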
package/package.json ADDED
@@ -0,0 +1,49 @@
1
+ {
2
+ "name": "claude-can-speak",
3
+ "version": "0.1.0",
4
+ "description": "Speech-out for Claude Code: speak replies aloud (Stop-hook firehose) or let Claude voice deliberate notifications (skill). Local neural TTS via Kokoro/Piper in Docker. Gated on /voice mode.",
5
+ "keywords": [
6
+ "claude",
7
+ "claude-code",
8
+ "tts",
9
+ "text-to-speech",
10
+ "voice",
11
+ "kokoro",
12
+ "piper",
13
+ "speech",
14
+ "skill",
15
+ "hook"
16
+ ],
17
+ "homepage": "https://ra-yavuz.github.io/claude-can-speak/",
18
+ "bugs": {
19
+ "url": "https://github.com/ra-yavuz/claude-can-speak/issues"
20
+ },
21
+ "repository": {
22
+ "type": "git",
23
+ "url": "git+https://github.com/ra-yavuz/claude-can-speak.git"
24
+ },
25
+ "license": "MIT",
26
+ "author": "Ramazan Yavuz <yavuzramazan1994@gmail.com>",
27
+ "type": "commonjs",
28
+ "bin": {
29
+ "claude-can-speak": "bin/cli.js"
30
+ },
31
+ "files": [
32
+ "bin/cli.js",
33
+ "bin/claude-can-speak",
34
+ "lib/claude-can-speak/",
35
+ "container/",
36
+ "skills/",
37
+ "THIRD_PARTY.md",
38
+ "LICENSE",
39
+ "README.md"
40
+ ],
41
+ "engines": {
42
+ "node": ">=16"
43
+ },
44
+ "os": [
45
+ "linux",
46
+ "darwin",
47
+ "win32"
48
+ ]
49
+ }
@@ -0,0 +1,53 @@
1
+ ---
2
+ name: speak
3
+ description: >-
4
+   Speak a short message aloud through the user's speakers using claude-can-speak
5
+   (text-to-speech). Use this for deliberate, selective audio: a spoken
6
+   notification when a long task finishes, a heads-up that needs attention while
7
+   the user is looking away, a brief shoutout or status callout, or when the user
8
+   asks you to say something out loud. This is NOT for reading every reply aloud
9
+   (that is the separate Stop-hook mode); use it only for things genuinely worth
10
+   hearing. Keep spoken text to one or two short sentences.
11
+ argument-hint: "[message to speak]"
12
+ allowed-tools: Bash(claude-can-speak say *)
13
+ ---
14
+
15
+ # Speak aloud
16
+
17
+ You can send a short message to the user's speakers with the `claude-can-speak`
18
+ text-to-speech tool. Use it deliberately, not for everything.
19
+
20
+ ## When to speak
21
+
22
+ Good uses:
23
+ - A finished long-running task the user stepped away from: "The build is done and all tests passed."
24
+ - A notification that wants attention now: "Heads up, the deploy needs your confirmation."
25
+ - A short status callout or shoutout the user asked for out loud.
26
+ - The user explicitly says "tell me out loud", "say this", "read that back", etc.
27
+
28
+ Do NOT use it to narrate routine replies, read long passages, or speak code,
29
+ file paths, commands, or anything awkward to hear. If it is not worth
30
+ interrupting the user's ears for, keep it text-only.
31
+
32
+ ## How to speak
33
+
34
+ Run the CLI with the exact text to say (one or two short sentences, plain
35
+ words, no markdown or code):
36
+
37
+ ```
38
+ claude-can-speak say "Your short message here."
39
+ ```
40
+
41
+ If a message was passed to this skill, speak that:
42
+
43
+ ```
44
+ claude-can-speak say "$ARGUMENTS"
45
+ ```
46
+
47
+ Notes:
48
+ - The user can interrupt playback at any time with `claude-can-speak stop`, or
49
+ simply by sending their next message.
50
+ - `say` speaks regardless of /voice mode (it is a deliberate, explicit call),
51
+ whereas the firehose Stop-hook mode only speaks while /voice is on.
52
+ - If the command reports the image or container is missing, tell the user to run
53
+ `claude-can-speak build` once, then retry; do not keep retrying silently.