claude-can-speak 0.1.0
- package/LICENSE +21 -0
- package/README.md +144 -0
- package/THIRD_PARTY.md +44 -0
- package/bin/claude-can-speak +300 -0
- package/bin/cli.js +49 -0
- package/container/Dockerfile +33 -0
- package/container/serve.sh +6 -0
- package/container/synth.py +169 -0
- package/lib/claude-can-speak/clean.py +43 -0
- package/lib/claude-can-speak/install-hooks.sh +85 -0
- package/lib/claude-can-speak/interrupt.sh +22 -0
- package/lib/claude-can-speak/speak-worker.sh +59 -0
- package/lib/claude-can-speak/tts-speak.sh +126 -0
- package/package.json +49 -0
- package/skills/speak/SKILL.md +53 -0
package/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Ramazan Yavuz
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
package/README.md
ADDED
|
@@ -0,0 +1,144 @@
|
|
|
1
|
+
# claude-can-speak
|
|
2
|
+
|
|
3
|
+
**Now Claude Code talks back.** Speech-out for Claude Code: a companion to the
|
|
4
|
+
built-in `/voice` speech-in. Turn `/voice` on and Claude can read its replies
|
|
5
|
+
aloud through your speakers; turn it off and you are back to silent, text-only output.
|
|
6
|
+
There are two ways to use it, both backed by a local neural voice with nothing sent to the cloud.
|
|
7
|
+
|
|
8
|
+
- **Firehose mode** - a Stop hook speaks every finished reply while `/voice` is
|
|
9
|
+
on. One switch (`/voice`) controls both directions: you talk to it, it talks
|
|
10
|
+
back.
|
|
11
|
+
- **Deliberate mode** - a `speak` skill lets Claude choose what to voice: a
|
|
12
|
+
spoken "the build is done", a heads-up while you are looking away, a shoutout.
|
|
13
|
+
Selective, on purpose, not a firehose.
|
|
14
|
+
|
|
15
|
+
Speech is synthesised locally by [Kokoro](https://github.com/thewh1teagle/kokoro-onnx)
|
|
16
|
+
(natural English, the default) or [Piper](https://github.com/OHF-Voice/piper1-gpl)
|
|
17
|
+
(multilingual: English, German, Turkish, and more), running in a Docker
|
|
18
|
+
container so they never touch your host Python environment. No API keys, no
|
|
19
|
+
network at speak time, no telemetry.
|
|
20
|
+
|
|
21
|
+
## Requirements
|
|
22
|
+
|
|
23
|
+
- **Claude Code** (this is an extension for it).
|
|
24
|
+
- **Node.js >= 16 / npm** (to install the CLI).
|
|
25
|
+
- **Docker** (the TTS engines run in a container). The CLI checks for it and
|
|
26
|
+
tells you how to install it if it is missing.
|
|
27
|
+
- **An audio player**: `pw-play` (PipeWire), `paplay` (PulseAudio), or `aplay`
|
|
28
|
+
(ALSA) on Linux. (macOS/Windows playback support is on the roadmap.)
|
|
29
|
+
|
|
30
|
+
## Install
|
|
31
|
+
|
|
32
|
+
```sh
|
|
33
|
+
npm install -g claude-can-speak
|
|
34
|
+
|
|
35
|
+
# one-time: build the local TTS container image (needs Docker)
|
|
36
|
+
claude-can-speak build
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
> If `npm install -g` fails with `EACCES` (a system-owned npm prefix like
|
|
40
|
+
> `/usr`), either use a user-level prefix once:
|
|
41
|
+
> `npm config set prefix ~/.npm-global && export PATH="$HOME/.npm-global/bin:$PATH"`
|
|
42
|
+
> (add that `export` to your shell profile), or install with `sudo`.
|
|
43
|
+
|
|
44
|
+
Then pick the mode(s) you want:
|
|
45
|
+
|
|
46
|
+
```sh
|
|
47
|
+
claude-can-speak install-skill # deliberate mode: the 'speak' skill
|
|
48
|
+
claude-can-speak install-hooks # firehose mode: speak every reply on /voice
|
|
49
|
+
```
|
|
50
|
+
|
|
51
|
+
Models are downloaded on first use into `~/.cache/claude-can-speak/models`
|
|
52
|
+
(nothing model-shaped is bundled in the package; see
|
|
53
|
+
[THIRD_PARTY.md](THIRD_PARTY.md)).
|
|
54
|
+
|
|
55
|
+
Then in Claude Code, toggle `/voice` on. Check everything with:
|
|
56
|
+
|
|
57
|
+
```sh
|
|
58
|
+
claude-can-speak status
|
|
59
|
+
claude-can-speak test # speak a sample line
|
|
60
|
+
```
|
|
61
|
+
|
|
62
|
+
## Usage
|
|
63
|
+
|
|
64
|
+
```sh
|
|
65
|
+
claude-can-speak status # gate state, container, voice, model cache
|
|
66
|
+
claude-can-speak test [text] # speak a sample (or your text)
|
|
67
|
+
claude-can-speak say <text> # speak text now (ignores the /voice gate)
|
|
68
|
+
claude-can-speak stop # interrupt whatever is being spoken
|
|
69
|
+
claude-can-speak voice <name> # set the default voice (e.g. af_heart)
|
|
70
|
+
claude-can-speak engine kokoro|piper
|
|
71
|
+
claude-can-speak voices # list voices for the current engine
|
|
72
|
+
claude-can-speak skill on|off # enable/disable the 'speak' skill
|
|
73
|
+
claude-can-speak install-hooks | remove-hooks
|
|
74
|
+
claude-can-speak install-skill
|
|
75
|
+
claude-can-speak build # (re)build the TTS container image
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
### Interrupting
|
|
79
|
+
|
|
80
|
+
You can stop playback three ways: run `claude-can-speak stop`, just send your
|
|
81
|
+
next message (a `UserPromptSubmit` hook stops the previous reply), or let a new
|
|
82
|
+
reply supersede the old one. Interrupts kill both in-flight synthesis and active
|
|
83
|
+
playback.
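
The interrupt behaviour described above can be sketched in Python. This is a minimal illustration, not the package's hook code: the function name `stop_current` is borrowed from a comment in the CLI, and the sketch assumes the worker PID recorded in the pidfile is a process-group leader, so that one signal reaches both the synthesis step and the player.

```python
import os
import signal


def stop_current(pidfile: str) -> bool:
    """Send SIGTERM to the process group recorded in pidfile.

    Returns True if a live process was signalled, False otherwise.
    The pidfile is removed in either case.
    """
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        pid = None  # missing or unreadable pidfile

    stopped = False
    if pid is not None and pid > 0:
        try:
            os.kill(pid, 0)  # signal 0 only checks existence, like `kill -0`
        except OSError:
            pass  # stale pidfile: the worker already exited
        else:
            try:
                # Assumption: the worker leads its own process group.
                os.killpg(pid, signal.SIGTERM)
            except OSError:
                try:
                    os.kill(pid, signal.SIGTERM)  # fall back to the single process
                except OSError:
                    pass
            stopped = True

    try:
        os.unlink(pidfile)
    except OSError:
        pass
    return stopped
```

Signalling the group rather than the single PID is what lets one interrupt cover both in-flight synthesis and active playback.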
|
|
84
|
+
|
|
85
|
+
### Choosing a voice
|
|
86
|
+
|
|
87
|
+
The default is Kokoro `af_heart` (natural US English, female). List options with
|
|
88
|
+
`claude-can-speak voices`. For German or Turkish, switch to Piper:
|
|
89
|
+
|
|
90
|
+
```sh
|
|
91
|
+
claude-can-speak engine piper
|
|
92
|
+
claude-can-speak voice de_DE-thorsten-high # German
|
|
93
|
+
claude-can-speak voice tr_TR-dfki-medium # Turkish
|
|
94
|
+
```
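
Piper voice names follow a `locale-name-quality` pattern, and the voice table in `container/synth.py` maps each name to a matching subpath on `rhasspy/piper-voices`. The sketch below derives that subpath from the name instead of looking it up. The helper names are hypothetical, and the `.onnx` / `.onnx.json` file suffixes are an assumption about the upstream repository layout rather than something this package states.

```python
PIPER_BASE = "https://huggingface.co/rhasspy/piper-voices/resolve/main"


def piper_subpath(slug: str) -> str:
    """Map e.g. 'de_DE-thorsten-high' to 'de/de_DE/thorsten/high/de_DE-thorsten-high'."""
    locale, name, quality = slug.split("-")  # raises ValueError on unexpected shapes
    lang = locale.split("_")[0]
    return f"{lang}/{locale}/{name}/{quality}/{slug}"


def piper_urls(slug: str) -> tuple[str, str]:
    """Return (model URL, config URL); the suffixes are assumed, see above."""
    base = f"{PIPER_BASE}/{piper_subpath(slug)}"
    return base + ".onnx", base + ".onnx.json"
```

A fixed table, as the package uses, has the advantage that only voices known to resolve are ever requested; the derived form is shown only to make the naming scheme explicit.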
|
|
95
|
+
|
|
96
|
+
## How it works
|
|
97
|
+
|
|
98
|
+
```
|
|
99
|
+
Claude Code reply ─▶ Stop hook (gated on /voice) ─▶ strip markdown & code
|
|
100
|
+
─▶ docker exec synth (Kokoro/Piper)
|
|
101
|
+
─▶ play WAV on the host
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
The container is persistent: it starts once and stays warm, so the Python and
|
|
105
|
+
ONNX import cost is paid once, not per reply. Only audio crosses back to the
|
|
106
|
+
host. The `speak` skill drives the same pipeline through `claude-can-speak say`,
|
|
107
|
+
but only when Claude (or you) chooses to speak.
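
The first two pipeline stages can be sketched roughly as follows. This is an illustration under assumptions, not the package's `clean.py`: the regular expressions are a simplified guess at what "strip markdown & code" involves, and `strip_for_speech` and `synth_command` are hypothetical names. The `docker exec` arguments mirror the ones the bash CLI passes, with its defaults filled in.

```python
import re


def strip_for_speech(markdown: str) -> str:
    """Reduce a Markdown reply to plain text suitable for TTS."""
    text = re.sub(r"```.*?```", " ", markdown, flags=re.DOTALL)  # fenced code blocks
    text = re.sub(r"`[^`\n]*`", " ", text)                       # inline code
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", " ", text)            # images
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)         # links: keep the label
    text = re.sub(r"^\s{0,3}(#{1,6}|[-*+]|>)\s+", "", text, flags=re.MULTILINE)  # headings, bullets, quotes
    text = re.sub(r"(\*\*|\*)(.+?)\1", r"\2", text)              # bold / italic markers
    return re.sub(r"\s+", " ", text).strip()


def synth_command(container="ccs-tts", engine="kokoro", voice="af_heart",
                  lang="en-us", speed="1.0"):
    """Build the argv that feeds text on stdin and yields WAV bytes on stdout."""
    return ["docker", "exec", "-i", container, "python3", "/app/synth.py",
            "--engine", engine, "--voice", voice, "--lang", lang, "--speed", speed]
```

The cleaned text would then be piped to the command's stdin (for example with `subprocess.run(synth_command(), input=text.encode(), capture_output=True)`) and the resulting WAV bytes handed to a host audio player.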
|
|
108
|
+
|
|
109
|
+
## Configuration
|
|
110
|
+
|
|
111
|
+
Per-user config lives in `~/.config/claude-can-speak/config.env` (written by the
|
|
112
|
+
`voice` / `engine` commands). Environment overrides: `CCS_IMAGE`,
|
|
113
|
+
`CCS_CONTAINER`, `CCS_MODELS_DIR`, `CLAUDE_SETTINGS`.
|
|
114
|
+
|
|
115
|
+
The `/voice` gate is read from `~/.claude/settings.json` (`voiceEnabled` or
|
|
116
|
+
`voice.enabled`). The `speak` skill is toggled via `skillOverrides` there.
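
As a rough Python equivalent of the gate check, the sketch below approximates the `jq` expression the CLI's `status` command uses (`.voiceEnabled // .voice.enabled // false`): the top-level key wins unless it is missing or false, in which case the nested key is consulted. The function names are hypothetical, and treating a missing or malformed settings file as "off" is an assumption that matches the CLI's fallback branch.

```python
import json


def voice_gate_on(settings: dict) -> bool:
    """Return True only when /voice is enabled in the parsed settings."""
    value = settings.get("voiceEnabled")
    if value is None or value is False:
        # jq's `//` falls through on null or false, so try voice.enabled next.
        voice = settings.get("voice")
        value = voice.get("enabled") if isinstance(voice, dict) else None
    # The CLI compares the stringified result with "true".
    return value is True or value == "true"


def voice_gate_from_file(path: str) -> bool:
    """Read settings.json; any read or parse failure counts as 'off'."""
    try:
        with open(path) as f:
            settings = json.load(f)
    except (OSError, ValueError):
        return False
    return isinstance(settings, dict) and voice_gate_on(settings)
```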
|
|
117
|
+
|
|
118
|
+
## Uninstall
|
|
119
|
+
|
|
120
|
+
```sh
|
|
121
|
+
claude-can-speak remove-hooks
|
|
122
|
+
claude-can-speak skill off
|
|
123
|
+
claude-can-speak stop-container
|
|
124
|
+
npm uninstall -g claude-can-speak
|
|
125
|
+
rm -rf ~/.config/claude-can-speak ~/.claude/skills/speak
|
|
126
|
+
# optional: reclaim the model cache and image
|
|
127
|
+
rm -rf ~/.cache/claude-can-speak
|
|
128
|
+
docker image rm claude-can-speak:latest
|
|
129
|
+
```
|
|
130
|
+
|
|
131
|
+
## Disclaimer
|
|
132
|
+
|
|
133
|
+
This software is provided **AS IS, with NO WARRANTY** of any kind, express or
|
|
134
|
+
implied. The author is not liable for any damage, data loss, or other harm
|
|
135
|
+
arising from its use. It runs background processes, plays audio, builds and runs
|
|
136
|
+
a Docker container, and downloads third-party models from the internet on your
|
|
137
|
+
behalf. **By installing and using it you accept all risk.** You are responsible
|
|
138
|
+
for complying with the licences of the bundled engines and the downloaded models
|
|
139
|
+
(see [THIRD_PARTY.md](THIRD_PARTY.md)).
|
|
140
|
+
|
|
141
|
+
## Licence
|
|
142
|
+
|
|
143
|
+
MIT - see [LICENSE](LICENSE). Author: Ramazan Yavuz. Part of the public,
|
|
144
|
+
open-source projects at [ra-yavuz.github.io](https://ra-yavuz.github.io/).
|
package/THIRD_PARTY.md
ADDED
|
@@ -0,0 +1,44 @@
|
|
|
1
|
+
# Third-party components
|
|
2
|
+
|
|
3
|
+
claude-can-speak is MIT-licensed and ships only its own code: the CLI, the hook
|
|
4
|
+
and worker scripts, the `speak` skill, and the container recipe. It bundles **no
|
|
5
|
+
third-party model weights**. The TTS models are downloaded on first use, from
|
|
6
|
+
their official upstreams, into a local cache (`~/.cache/claude-can-speak/models`)
|
|
7
|
+
on your machine. Each model therefore reaches you directly from its own source
|
|
8
|
+
under its own licence; this project redistributes none of them.
|
|
9
|
+
|
|
10
|
+
This keeps the project cleanly MIT and avoids redistributing weights whose terms
|
|
11
|
+
differ from ours.
|
|
12
|
+
|
|
13
|
+
## Engines (installed into the container at build time)
|
|
14
|
+
|
|
15
|
+
| Component | Licence | Source |
|
|
16
|
+
|---|---|---|
|
|
17
|
+
| Piper (`piper-tts`) | MIT | https://github.com/OHF-Voice/piper1-gpl |
|
|
18
|
+
| Kokoro runtime (`kokoro-onnx`) | MIT | https://github.com/thewh1teagle/kokoro-onnx |
|
|
19
|
+
| onnxruntime | MIT | https://github.com/microsoft/onnxruntime |
|
|
20
|
+
| soundfile / libsndfile | BSD / LGPL-2.1 | https://github.com/bastibe/python-soundfile |
|
|
21
|
+
| espeak-ng | GPL-3.0 | https://github.com/espeak-ng/espeak-ng |
|
|
22
|
+
|
|
23
|
+
espeak-ng (GPL-3.0) is used inside the container as a grapheme-to-phoneme step.
|
|
24
|
+
It is invoked as a separate program at runtime and is not linked into, or
|
|
25
|
+
redistributed by, this project; it is installed from the distribution's package
|
|
26
|
+
repository when the image is built.
|
|
27
|
+
|
|
28
|
+
## Models (fetched on first use, not shipped)
|
|
29
|
+
|
|
30
|
+
| Model | Licence | Source |
|
|
31
|
+
|---|---|---|
|
|
32
|
+
| Kokoro-82M weights | Apache-2.0 | https://huggingface.co/hexgrad/Kokoro-82M |
|
|
33
|
+
| Kokoro ONNX + voices (v1.0) | Apache-2.0 (weights) | https://github.com/thewh1teagle/kokoro-onnx/releases |
|
|
34
|
+
| Piper voices (en_US, de_DE, tr_TR, ...) | per-voice (commonly MIT / CC BY 4.0) | https://huggingface.co/rhasspy/piper-voices |
|
|
35
|
+
|
|
36
|
+
Piper voice licences vary per voice; consult the voice's model card on
|
|
37
|
+
`rhasspy/piper-voices` for the exact terms and any attribution requirement. The
|
|
38
|
+
default voice (Kokoro `af_heart`) is covered by the Apache-2.0 Kokoro release.
|
|
39
|
+
|
|
40
|
+
## No warranty
|
|
41
|
+
|
|
42
|
+
This project, and your use of these third-party components and models, is
|
|
43
|
+
provided AS IS with NO WARRANTY. You are responsible for complying with each
|
|
44
|
+
component's and model's licence. See LICENSE for the full disclaimer.
|
package/bin/claude-can-speak
ADDED
|
@@ -0,0 +1,300 @@
|
|
|
1
|
+
#!/usr/bin/env bash
|
|
2
|
+
# claude-can-speak: speech-out for Claude Code, gated on /voice mode.
|
|
3
|
+
#
|
|
4
|
+
# This CLI manages the TTS container and the playback lifecycle. The speaking
|
|
5
|
+
# itself is driven by a Stop hook (tts-speak.sh); this command is for setup,
|
|
6
|
+
# testing, picking a voice, and interrupting playback.
|
|
7
|
+
#
|
|
8
|
+
# Provided AS IS, with NO WARRANTY of any kind. The author is not liable for
|
|
9
|
+
# any damage or loss arising from its use. By using it you accept all risk.
|
|
10
|
+
set -uo pipefail
|
|
11
|
+
|
|
12
|
+
VERSION="0.1.0"
|
|
13
|
+
|
|
14
|
+
# Install layout: bundled scripts live in ../lib/claude-can-speak relative to
|
|
15
|
+
# this CLI, whether installed via npm (into the global node_modules) or run from
|
|
16
|
+
# a git checkout. SELF resolves through symlinks so a PATH symlink still works.
|
|
17
|
+
SOURCE="${BASH_SOURCE[0]}"
|
|
18
|
+
while [ -h "$SOURCE" ]; do
|
|
19
|
+
DIR="$(cd -P "$(dirname "$SOURCE")" && pwd)"
|
|
20
|
+
SOURCE="$(readlink "$SOURCE")"
|
|
21
|
+
[ "${SOURCE#/}" = "$SOURCE" ] && SOURCE="$DIR/$SOURCE"
|
|
22
|
+
done
|
|
23
|
+
SELF="$(cd -P "$(dirname "$SOURCE")" && pwd)"
|
|
24
|
+
LIBEXEC="$SELF/../lib/claude-can-speak"
|
|
25
|
+
[ -f "$LIBEXEC/tts-speak.sh" ] || LIBEXEC="$SELF/../lib/claude-can-speak"
|
|
26
|
+
|
|
27
|
+
CCS_HOME="${CCS_HOME:-$HOME/.config/claude-can-speak}"
|
|
28
|
+
CONFIG="$CCS_HOME/config.env"
|
|
29
|
+
PIDFILE="$CCS_HOME/speaking.pid"
|
|
30
|
+
CONTAINER="${CCS_CONTAINER:-ccs-tts}"
|
|
31
|
+
IMAGE="${CCS_IMAGE:-claude-can-speak:latest}"
|
|
32
|
+
MODELS_DIR="${CCS_MODELS_DIR:-$HOME/.cache/claude-can-speak/models}"
|
|
33
|
+
SETTINGS_JSON="${CLAUDE_SETTINGS:-$HOME/.claude/settings.json}"
|
|
34
|
+
|
|
35
|
+
mkdir -p "$CCS_HOME" 2>/dev/null || true
|
|
36
|
+
|
|
37
|
+
# Defaults; config.env overrides. (MAX_CHARS is a hook-only concern; the CLI's
|
|
38
|
+
# explicit say/test never truncate, so it is not set here.)
|
|
39
|
+
ENGINE="kokoro"; VOICE="af_heart"; LANG="en-us"; SPEED="1.0"
|
|
40
|
+
# shellcheck source=/dev/null
|
|
41
|
+
[ -f "$CONFIG" ] && . "$CONFIG"
|
|
42
|
+
|
|
43
|
+
die() { echo "claude-can-speak: $*" >&2; exit 1; }
|
|
44
|
+
|
|
45
|
+
# Docker is a runtime requirement (the TTS engines run in a container so they do
|
|
46
|
+
# not touch the host Python env). Check for it with a clear, actionable error.
|
|
47
|
+
require_docker() {
|
|
48
|
+
command -v docker >/dev/null 2>&1 || die \
|
|
49
|
+
"Docker is required but was not found.
|
|
50
|
+
claude-can-speak runs the TTS engines in a container so they never touch your
|
|
51
|
+
host. Install Docker, then retry:
|
|
52
|
+
https://docs.docker.com/get-docker/"
|
|
53
|
+
docker info >/dev/null 2>&1 || die \
|
|
54
|
+
"Docker is installed but the daemon is not reachable.
|
|
55
|
+
Start Docker (e.g. 'systemctl --user start docker' or Docker Desktop) and retry.
|
|
56
|
+
On Linux you may need to be in the 'docker' group: https://docs.docker.com/engine/install/linux-postinstall/"
|
|
57
|
+
}
|
|
58
|
+
|
|
59
|
+
cmd_help() {
|
|
60
|
+
cat <<EOF
|
|
61
|
+
claude-can-speak $VERSION - speech-out for Claude Code (gated on /voice)
|
|
62
|
+
|
|
63
|
+
USAGE
|
|
64
|
+
claude-can-speak <command> [args]
|
|
65
|
+
|
|
66
|
+
COMMANDS
|
|
67
|
+
status Show gate state, container, config, and model cache.
|
|
68
|
+
test [text] Speak a sample (or the given text) with the current voice.
|
|
69
|
+
stop Interrupt any reply currently being spoken.
|
|
70
|
+
say <text> Speak arbitrary text now (ignores the /voice gate).
|
|
71
|
+
start | up Start the persistent TTS container.
|
|
72
|
+
stop-container Stop and remove the TTS container.
|
|
73
|
+
voice <name> Set the default voice (e.g. af_heart, af_bella).
|
|
74
|
+
engine <name> Set the engine: kokoro (English) or piper (multilingual).
|
|
75
|
+
voices List known voices for the current engine.
|
|
76
|
+
install-hooks Register the Stop + interrupt hooks in settings.json.
|
|
77
|
+
remove-hooks Remove this project's hooks from settings.json.
|
|
78
|
+
install-skill Install the 'speak' skill into ~/.claude/skills.
|
|
79
|
+
skill on|off Enable/disable the 'speak' skill (settings skillOverrides).
|
|
80
|
+
build Build the TTS container image locally.
|
|
81
|
+
help | --help This text.
|
|
82
|
+
|
|
83
|
+
TWO MODES
|
|
84
|
+
Firehose : the Stop hook speaks every reply while /voice is on
|
|
85
|
+
(claude-can-speak install-hooks).
|
|
86
|
+
Deliberate: the 'speak' skill lets Claude choose what to voice
|
|
87
|
+
(notifications, shoutouts) via 'claude-can-speak say'
|
|
88
|
+
(claude-can-speak install-skill). Toggle with 'skill on|off'.
|
|
89
|
+
|
|
90
|
+
GATING
|
|
91
|
+
Speech-out only runs while /voice mode is on (voiceEnabled / voice.enabled
|
|
92
|
+
in $SETTINGS_JSON). Toggle /voice in Claude Code to switch both speech-in
|
|
93
|
+
and speech-out at once. Turn it off for full silence.
|
|
94
|
+
|
|
95
|
+
DISCLAIMER
|
|
96
|
+
Provided AS IS, with NO WARRANTY. You accept all risk. See the project
|
|
97
|
+
README for the full text.
|
|
98
|
+
EOF
|
|
99
|
+
}
|
|
100
|
+
|
|
101
|
+
ensure_image() {
|
|
102
|
+
docker image inspect "$IMAGE" >/dev/null 2>&1
|
|
103
|
+
}
|
|
104
|
+
|
|
105
|
+
cmd_build() {
|
|
106
|
+
require_docker
|
|
107
|
+
local ctx="$SELF/../container"
|
|
108
|
+
[ -f "$ctx/Dockerfile" ] || die "container build context not found ($ctx)"
|
|
109
|
+
echo "Building $IMAGE from $ctx ..."
|
|
110
|
+
docker build -t "$IMAGE" "$ctx"
|
|
111
|
+
}
|
|
112
|
+
|
|
113
|
+
cmd_start() {
|
|
114
|
+
require_docker
|
|
115
|
+
ensure_image || die "image $IMAGE missing; run: claude-can-speak build"
|
|
116
|
+
if docker inspect -f '{{.State.Running}}' "$CONTAINER" 2>/dev/null | grep -q true; then
|
|
117
|
+
echo "container $CONTAINER already running"; return 0
|
|
118
|
+
fi
|
|
119
|
+
docker rm -f "$CONTAINER" >/dev/null 2>&1 || true
|
|
120
|
+
mkdir -p "$MODELS_DIR"
|
|
121
|
+
docker run -d --name "$CONTAINER" -v "$MODELS_DIR:/models" "$IMAGE" >/dev/null \
|
|
122
|
+
&& echo "started $CONTAINER (models cache: $MODELS_DIR)"
|
|
123
|
+
}
|
|
124
|
+
|
|
125
|
+
cmd_stop_container() {
|
|
126
|
+
docker rm -f "$CONTAINER" >/dev/null 2>&1 && echo "removed $CONTAINER" \
|
|
127
|
+
|| echo "no container to remove"
|
|
128
|
+
}
|
|
129
|
+
|
|
130
|
+
cmd_stop() {
|
|
131
|
+
# Interrupt current playback. Mirrors stop_current() in the hook.
|
|
132
|
+
local pid stopped=0
|
|
133
|
+
if [ -f "$PIDFILE" ]; then
|
|
134
|
+
pid="$(cat "$PIDFILE" 2>/dev/null)"
|
|
135
|
+
if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
|
|
136
|
+
kill -TERM "-$pid" 2>/dev/null || kill -TERM "$pid" 2>/dev/null
|
|
137
|
+
stopped=1
|
|
138
|
+
fi
|
|
139
|
+
rm -f "$PIDFILE" 2>/dev/null || true
|
|
140
|
+
fi
|
|
141
|
+
# Belt and suspenders: stop any stray players too.
|
|
142
|
+
pkill -TERM -x pw-play 2>/dev/null && stopped=1
|
|
143
|
+
pkill -TERM -x paplay 2>/dev/null && stopped=1
|
|
144
|
+
[ "$stopped" = 1 ] && echo "stopped playback" || echo "nothing playing"
|
|
145
|
+
}
|
|
146
|
+
|
|
147
|
+
_synth_to() { # text -> wav path on stdout fd
|
|
148
|
+
ensure_image || die "image $IMAGE missing; run: claude-can-speak build"
|
|
149
|
+
cmd_start >/dev/null
|
|
150
|
+
docker exec -i "$CONTAINER" python3 /app/synth.py \
|
|
151
|
+
--engine "$ENGINE" --voice "$VOICE" --lang "$LANG" --speed "$SPEED"
|
|
152
|
+
}
|
|
153
|
+
|
|
154
|
+
_player() {
|
|
155
|
+
for p in pw-play paplay aplay; do
|
|
156
|
+
command -v "$p" >/dev/null 2>&1 && { echo "$p"; return; }
|
|
157
|
+
done
|
|
158
|
+
return 1
|
|
159
|
+
}
|
|
160
|
+
|
|
161
|
+
cmd_say() {
|
|
162
|
+
local text="$*"
|
|
163
|
+
[ -n "$text" ] || die "usage: claude-can-speak say <text>"
|
|
164
|
+
local player; player="$(_player)" || die "no audio player (pw-play/paplay/aplay)"
|
|
165
|
+
local wav; wav="$(mktemp --suffix=.wav)"
|
|
166
|
+
if printf '%s' "$text" | _synth_to >"$wav" && [ -s "$wav" ]; then
|
|
167
|
+
"$player" "$wav"
|
|
168
|
+
else
|
|
169
|
+
rm -f "$wav"; die "synthesis failed"
|
|
170
|
+
fi
|
|
171
|
+
rm -f "$wav"
|
|
172
|
+
}
|
|
173
|
+
|
|
174
|
+
cmd_test() {
|
|
175
|
+
local text="${*:-Hi Ramazan. This is Claude Code. Voice output is working, using the $VOICE voice.}"
|
|
176
|
+
echo "engine=$ENGINE voice=$VOICE lang=$LANG"
|
|
177
|
+
cmd_say "$text"
|
|
178
|
+
}
|
|
179
|
+
|
|
180
|
+
cmd_status() {
|
|
181
|
+
echo "claude-can-speak $VERSION"
|
|
182
|
+
printf 'voice gate : '
|
|
183
|
+
if [ -f "$SETTINGS_JSON" ] && command -v jq >/dev/null 2>&1 \
|
|
184
|
+
&& [ "$(jq -r '(.voiceEnabled // .voice.enabled // false)|tostring' "$SETTINGS_JSON" 2>/dev/null)" = true ]; then
|
|
185
|
+
echo "ON (/voice enabled)"
|
|
186
|
+
else
|
|
187
|
+
echo "off (/voice disabled) - replies will be silent"
|
|
188
|
+
fi
|
|
189
|
+
printf 'engine/voice : %s / %s (%s)\n' "$ENGINE" "$VOICE" "$LANG"
|
|
190
|
+
printf 'image : '; ensure_image && echo "$IMAGE present" || echo "$IMAGE MISSING (run: build)"
|
|
191
|
+
printf 'container : '
|
|
192
|
+
docker inspect -f '{{.State.Running}}' "$CONTAINER" 2>/dev/null | grep -q true \
|
|
193
|
+
&& echo "$CONTAINER running" || echo "$CONTAINER not running"
|
|
194
|
+
printf 'models cache : %s' "$MODELS_DIR"
|
|
195
|
+
[ -d "$MODELS_DIR" ] && printf ' (%s)\n' "$(du -sh "$MODELS_DIR" 2>/dev/null | cut -f1)" || printf ' (empty)\n'
|
|
196
|
+
printf 'hooks : '
|
|
197
|
+
if [ -f "$SETTINGS_JSON" ] && grep -q 'tts-speak.sh' "$SETTINGS_JSON" 2>/dev/null; then
|
|
198
|
+
echo "registered"; else echo "NOT registered (run: install-hooks)"; fi
|
|
199
|
+
}
|
|
200
|
+
|
|
201
|
+
_set_config() { # key value
|
|
202
|
+
touch "$CONFIG"
|
|
203
|
+
if grep -q "^$1=" "$CONFIG" 2>/dev/null; then
|
|
204
|
+
sed -i "s|^$1=.*|$1=\"$2\"|" "$CONFIG"
|
|
205
|
+
else
|
|
206
|
+
printf '%s="%s"\n' "$1" "$2" >>"$CONFIG"
|
|
207
|
+
fi
|
|
208
|
+
}
|
|
209
|
+
|
|
210
|
+
cmd_voice() { [ -n "${1:-}" ] || die "usage: claude-can-speak voice <name>"; _set_config VOICE "$1"; echo "voice set to $1"; }
|
|
211
|
+
cmd_engine() {
|
|
212
|
+
case "${1:-}" in
|
|
213
|
+
kokoro) _set_config ENGINE kokoro; _set_config LANG en-us; echo "engine set to kokoro (English)";;
|
|
214
|
+
piper) _set_config ENGINE piper; echo "engine set to piper (multilingual); set a voice with: claude-can-speak voice de_DE-thorsten-high";;
|
|
215
|
+
*) die "engine must be 'kokoro' or 'piper'";;
|
|
216
|
+
esac
|
|
217
|
+
}
|
|
218
|
+
|
|
219
|
+
cmd_voices() {
|
|
220
|
+
if [ "$ENGINE" = piper ]; then
|
|
221
|
+
cat <<EOF
|
|
222
|
+
piper voices (multilingual):
|
|
223
|
+
en_US-amy-medium en_US-lessac-high en_US-libritts_r-medium
|
|
224
|
+
en_US-hfc_female-medium en_US-kristin-medium en_GB-jenny_dioco-medium
|
|
225
|
+
de_DE-thorsten-medium de_DE-thorsten-high tr_TR-dfki-medium
|
|
226
|
+
EOF
|
|
227
|
+
else
|
|
228
|
+
cat <<EOF
|
|
229
|
+
kokoro voices (English; af_=US female, am_=US male, bf_/bm_=British):
|
|
230
|
+
af_heart af_bella af_nicole af_aoede af_kore af_sarah
|
|
231
|
+
af_nova af_sky af_jessica af_river am_michael am_fenrir am_puck
|
|
232
|
+
EOF
|
|
233
|
+
fi
|
|
234
|
+
}
|
|
235
|
+
|
|
236
|
+
cmd_install_skill() {
|
|
237
|
+
# Locate the packaged skill (deb vs git checkout).
|
|
238
|
+
local src
|
|
239
|
+
for cand in "/usr/share/claude-can-speak/skills/speak" "$SELF/../skills/speak"; do
|
|
240
|
+
[ -f "$cand/SKILL.md" ] && { src="$cand"; break; }
|
|
241
|
+
done
|
|
242
|
+
[ -n "${src:-}" ] || die "packaged skill not found"
|
|
243
|
+
local dest="$HOME/.claude/skills/speak"
|
|
244
|
+
mkdir -p "$dest"
|
|
245
|
+
cp "$src/SKILL.md" "$dest/SKILL.md"
|
|
246
|
+
echo "installed 'speak' skill -> $dest/SKILL.md"
|
|
247
|
+
echo "if ~/.claude/skills did not exist before, restart your session to discover it."
|
|
248
|
+
}
|
|
249
|
+
|
|
250
|
+
cmd_skill() {
|
|
251
|
+
case "${1:-}" in
|
|
252
|
+
on) _skill_override on; echo "'speak' skill enabled" ;;
|
|
253
|
+
off) _skill_override off; echo "'speak' skill disabled" ;;
|
|
254
|
+
*) die "usage: claude-can-speak skill on|off" ;;
|
|
255
|
+
esac
|
|
256
|
+
}
|
|
257
|
+
|
|
258
|
+
_skill_override() { # on|off -> settings.json skillOverrides.speak
|
|
259
|
+
[ -f "$SETTINGS_JSON" ] || { mkdir -p "$(dirname "$SETTINGS_JSON")"; echo '{}' >"$SETTINGS_JSON"; }
|
|
260
|
+
python3 - "$SETTINGS_JSON" "$1" <<'PY'
|
|
261
|
+
import json, sys, os
|
|
262
|
+
path, state = sys.argv[1], sys.argv[2]
|
|
263
|
+
with open(path) as f: cfg = json.load(f)
|
|
264
|
+
ov = cfg.setdefault("skillOverrides", {})
|
|
265
|
+
if state == "off": ov["speak"] = "off"
|
|
266
|
+
else: ov.pop("speak", None) # remove override = default 'on'
|
|
267
|
+
tmp = path + ".tmp"
|
|
268
|
+
with open(tmp, "w") as f: json.dump(cfg, f, indent=2); f.write("\n")
|
|
269
|
+
os.replace(tmp, path)
|
|
270
|
+
PY
|
|
271
|
+
}
|
|
272
|
+
|
|
273
|
+
cmd_install_hooks() {
|
|
274
|
+
[ -x "$LIBEXEC/install-hooks.sh" ] || die "install-hooks.sh not found in $LIBEXEC"
|
|
275
|
+
"$LIBEXEC/install-hooks.sh" install
|
|
276
|
+
}
|
|
277
|
+
cmd_remove_hooks() {
|
|
278
|
+
[ -x "$LIBEXEC/install-hooks.sh" ] || die "install-hooks.sh not found in $LIBEXEC"
|
|
279
|
+
"$LIBEXEC/install-hooks.sh" remove
|
|
280
|
+
}
|
|
281
|
+
|
|
282
|
+
case "${1:-help}" in
|
|
283
|
+
status) cmd_status ;;
|
|
284
|
+
test) shift; cmd_test "$@" ;;
|
|
285
|
+
say) shift; cmd_say "$@" ;;
|
|
286
|
+
stop) cmd_stop ;;
|
|
287
|
+
start|up) cmd_start ;;
|
|
288
|
+
stop-container) cmd_stop_container ;;
|
|
289
|
+
voice) shift; cmd_voice "${1:-}" ;;
|
|
290
|
+
engine) shift; cmd_engine "${1:-}" ;;
|
|
291
|
+
voices) cmd_voices ;;
|
|
292
|
+
install-hooks) cmd_install_hooks ;;
|
|
293
|
+
remove-hooks) cmd_remove_hooks ;;
|
|
294
|
+
install-skill) cmd_install_skill ;;
|
|
295
|
+
skill) shift; cmd_skill "${1:-}" ;;
|
|
296
|
+
build) cmd_build ;;
|
|
297
|
+
help|-h|--help) cmd_help ;;
|
|
298
|
+
--version|-V) echo "claude-can-speak $VERSION" ;;
|
|
299
|
+
*) die "unknown command '$1' (try: claude-can-speak help)" ;;
|
|
300
|
+
esac
|
package/bin/cli.js
ADDED
|
@@ -0,0 +1,49 @@
|
|
|
1
|
+
#!/usr/bin/env node
|
|
2
|
+
// npm entry point for claude-can-speak. This is a thin cross-platform shim: it
|
|
3
|
+
// locates the bundled bash CLI shipped in the package and execs it, passing
|
|
4
|
+
// through all arguments. The real logic lives in bin/claude-can-speak (bash),
|
|
5
|
+
// shared by the npm install and a direct git checkout.
|
|
6
|
+
//
|
|
7
|
+
// Bash is required (the CLI drives Docker + audio via shell). On Linux/macOS it
|
|
8
|
+
// is present; on Windows use WSL or Git Bash. We fail with a clear message if
|
|
9
|
+
// bash cannot be found rather than half-running.
|
|
10
|
+
"use strict";
|
|
11
|
+
|
|
12
|
+
const { spawnSync } = require("node:child_process");
|
|
13
|
+
const path = require("node:path");
|
|
14
|
+
const fs = require("node:fs");
|
|
15
|
+
|
|
16
|
+
const cliPath = path.join(__dirname, "claude-can-speak");
|
|
17
|
+
|
|
18
|
+
if (!fs.existsSync(cliPath)) {
|
|
19
|
+
console.error("claude-can-speak: bundled CLI not found at " + cliPath);
|
|
20
|
+
process.exit(1);
|
|
21
|
+
}
|
|
22
|
+
|
|
23
|
+
// Resolve a bash interpreter. PATH lookup covers Linux/macOS and Git-Bash/WSL
|
|
24
|
+
// shims on Windows.
|
|
25
|
+
function findBash() {
|
|
26
|
+
const candidates =
|
|
27
|
+
process.platform === "win32"
|
|
28
|
+
? ["bash.exe", "bash"]
|
|
29
|
+
: ["/bin/bash", "/usr/bin/bash", "bash"];
|
|
30
|
+
for (const c of candidates) {
|
|
31
|
+
const r = spawnSync(c, ["-c", "exit 0"], { stdio: "ignore" });
|
|
32
|
+
if (r.status === 0) return c;
|
|
33
|
+
}
|
|
34
|
+
return null;
|
|
35
|
+
}
|
|
36
|
+
|
|
37
|
+
const bash = findBash();
|
|
38
|
+
if (!bash) {
|
|
39
|
+
console.error(
|
|
40
|
+
"claude-can-speak: bash is required but was not found.\n" +
|
|
41
|
+
"Install bash (Linux/macOS have it; on Windows use WSL or Git Bash)."
|
|
42
|
+
);
|
|
43
|
+
process.exit(1);
|
|
44
|
+
}
|
|
45
|
+
|
|
46
|
+
const res = spawnSync(bash, [cliPath, ...process.argv.slice(2)], {
|
|
47
|
+
stdio: "inherit",
|
|
48
|
+
});
|
|
49
|
+
process.exit(res.status === null ? 1 : res.status);
|
package/container/Dockerfile
ADDED
|
@@ -0,0 +1,33 @@
|
|
|
1
|
+
# claude-can-speak TTS container.
|
|
2
|
+
#
|
|
3
|
+
# Bundles the two synthesis engines (Piper + Kokoro) and their Python deps,
|
|
4
|
+
# but NO model weights: models are fetched on first use into /models, which
|
|
5
|
+
# is a host-mounted cache. The image therefore redistributes no third-party
|
|
6
|
+
# model files. See ../THIRD_PARTY.md for per-model licences.
|
|
7
|
+
FROM python:3.12-slim
|
|
8
|
+
|
|
9
|
+
# espeak-ng is required by Kokoro's grapheme-to-phoneme stage; harmless for
|
|
10
|
+
# Piper (which carries its own phonemization). libsndfile backs soundfile.
|
|
11
|
+
RUN apt-get update \
|
|
12
|
+
&& apt-get install -y --no-install-recommends espeak-ng libsndfile1 \
|
|
13
|
+
&& rm -rf /var/lib/apt/lists/*
|
|
14
|
+
|
|
15
|
+
# Pinned to keep synthesis reproducible. onnxruntime is the CPU build.
|
|
16
|
+
RUN pip install --no-cache-dir \
|
|
17
|
+
piper-tts==1.4.2 \
|
|
18
|
+
kokoro-onnx==0.5.0 \
|
|
19
|
+
soundfile==0.13.1 \
|
|
20
|
+
onnxruntime==1.26.0
|
|
21
|
+
|
|
22
|
+
ENV CCS_MODELS_DIR=/models
|
|
23
|
+
RUN mkdir -p /models
|
|
24
|
+
VOLUME ["/models"]
|
|
25
|
+
|
|
26
|
+
COPY synth.py /app/synth.py
|
|
27
|
+
COPY serve.sh /app/serve.sh
|
|
28
|
+
RUN chmod +x /app/serve.sh
|
|
29
|
+
|
|
30
|
+
# The container stays alive doing nothing; the host drives synthesis via
|
|
31
|
+
# `docker exec ... python3 /app/synth.py`. This keeps the Python import cost
|
|
32
|
+
# (onnxruntime, piper, kokoro) paid once per container, not once per reply.
|
|
33
|
+
ENTRYPOINT ["/app/serve.sh"]
|
package/container/serve.sh
ADDED
|
@@ -0,0 +1,6 @@
|
|
|
1
|
+
#!/bin/sh
|
|
2
|
+
# Keep the container alive so the host can drive synthesis via `docker exec`.
|
|
3
|
+
# Paying Python + onnxruntime import cost once per container (not once per
|
|
4
|
+
# reply) is the whole point of the persistent-container design.
|
|
5
|
+
echo "[claude-can-speak] tts container ready; awaiting docker exec" >&2
|
|
6
|
+
exec tail -f /dev/null
|
package/container/synth.py
ADDED
|
@@ -0,0 +1,169 @@
#!/usr/bin/env python3
"""claude-can-speak synthesis entrypoint (runs inside the TTS container).

Reads text on stdin, writes a WAV stream on stdout. The engine and voice are
chosen by flags so the host-side hook stays a thin wrapper. Two engines are
supported:

  piper    VITS2, multilingual (English, German, Turkish, ...).
  kokoro   Kokoro-82M, English-family only, higher naturalness. The default.

Models are NOT bundled in the image. They are fetched on first use into
/models (a host-mounted cache) from their official upstreams, so the image
and the .deb redistribute no third-party model weights. See THIRD_PARTY.md
for the per-model licences.

This program prints nothing to stdout except the WAV bytes; all diagnostics
go to stderr so the audio stream stays clean.
"""

import argparse
import os
import sys
import urllib.request

MODELS_DIR = os.environ.get("CCS_MODELS_DIR", "/models")

# Piper voices we know about: slug -> subpath on rhasspy/piper-voices.
# Verified to resolve (HTTP 200) on huggingface.co/rhasspy/piper-voices.
PIPER_VOICES = {
    "en_US-amy-medium": "en/en_US/amy/medium/en_US-amy-medium",
    "en_US-lessac-high": "en/en_US/lessac/high/en_US-lessac-high",
    "en_US-libritts_r-medium": "en/en_US/libritts_r/medium/en_US-libritts_r-medium",
    "en_US-hfc_female-medium": "en/en_US/hfc_female/medium/en_US-hfc_female-medium",
    "en_US-kristin-medium": "en/en_US/kristin/medium/en_US-kristin-medium",
    "en_GB-jenny_dioco-medium": "en/en_GB/jenny_dioco/medium/en_GB-jenny_dioco-medium",
    "de_DE-thorsten-medium": "de/de_DE/thorsten/medium/de_DE-thorsten-medium",
    "de_DE-thorsten-high": "de/de_DE/thorsten/high/de_DE-thorsten-high",
    "tr_TR-dfki-medium": "tr/tr_TR/dfki/medium/tr_TR-dfki-medium",
}
PIPER_BASE = "https://huggingface.co/rhasspy/piper-voices/resolve/main"

# Kokoro model + voices pack (single combined ONNX + a voices.bin).
KOKORO_MODEL_URL = (
    "https://github.com/thewh1teagle/kokoro-onnx/releases/download/"
    "model-files-v1.0/kokoro-v1.0.onnx"
)
KOKORO_VOICES_URL = (
    "https://github.com/thewh1teagle/kokoro-onnx/releases/download/"
    "model-files-v1.0/voices-v1.0.bin"
)


def log(msg):
    print(f"[claude-can-speak/synth] {msg}", file=sys.stderr, flush=True)


def _download(url, dest):
    """Fetch url to dest atomically. Skips if dest already exists."""
    if os.path.exists(dest) and os.path.getsize(dest) > 0:
        return dest
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    # Unique temp per process so concurrent synths never race on the same
    # .part file (os.replace is atomic, so the last writer wins cleanly).
    tmp = f"{dest}.part.{os.getpid()}"
    log(f"fetching {url}")
    req = urllib.request.Request(url, headers={"User-Agent": "claude-can-speak"})
    try:
        with urllib.request.urlopen(req) as r, open(tmp, "wb") as f:
            while True:
                chunk = r.read(1 << 16)
                if not chunk:
                    break
                f.write(chunk)
        os.replace(tmp, dest)
    finally:
        if os.path.exists(tmp):
            os.unlink(tmp)
    log(f"cached {dest} ({os.path.getsize(dest)} bytes)")
    return dest


def ensure_piper_voice(voice):
    if voice not in PIPER_VOICES:
        raise SystemExit(
            f"unknown piper voice {voice!r}; known: {', '.join(PIPER_VOICES)}"
        )
    subpath = PIPER_VOICES[voice]
    onnx = os.path.join(MODELS_DIR, "piper", voice + ".onnx")
    conf = onnx + ".json"
    _download(f"{PIPER_BASE}/{subpath}.onnx", onnx)
    _download(f"{PIPER_BASE}/{subpath}.onnx.json", conf)
    return onnx


def ensure_kokoro_model():
    model = os.path.join(MODELS_DIR, "kokoro", "kokoro-v1.0.onnx")
    voices = os.path.join(MODELS_DIR, "kokoro", "voices-v1.0.bin")
    _download(KOKORO_MODEL_URL, model)
    _download(KOKORO_VOICES_URL, voices)
    return model, voices


def synth_piper(text, voice, speed, wav_path):
    from piper import PiperVoice, SynthesisConfig  # piper-tts (piper1-gpl)
    import wave

    onnx = ensure_piper_voice(voice)
    v = PiperVoice.load(onnx)
    # length_scale < 1 speeds speech up; invert the user-facing speed factor.
    cfg = SynthesisConfig(length_scale=(1.0 / speed if speed else 1.0))
    with wave.open(wav_path, "wb") as wf:
        v.synthesize_wav(text, wf, syn_config=cfg)


def synth_kokoro(text, voice, lang, speed, wav_path):
    from kokoro_onnx import Kokoro
    import soundfile as sf

    model, voices = ensure_kokoro_model()
    k = Kokoro(model, voices)
    samples, sample_rate = k.create(text, voice=voice, speed=speed, lang=lang)
    sf.write(wav_path, samples, sample_rate, format="WAV", subtype="PCM_16")


def main():
    ap = argparse.ArgumentParser(description="claude-can-speak TTS synth")
    ap.add_argument("--engine", choices=["piper", "kokoro"], default="kokoro")
    ap.add_argument("--voice", default="af_heart")
    ap.add_argument("--lang", default="en-us",
                    help="kokoro language tag (e.g. en-us); ignored by piper")
    ap.add_argument("--speed", type=float, default=1.0)
    ap.add_argument("--text", default=None,
                    help="text to speak; if omitted, read from stdin")
    args = ap.parse_args()

    text = args.text if args.text is not None else sys.stdin.read()
    text = text.strip()
    if not text:
        log("empty text, nothing to synthesize")
        return 0

    # WAV encoders (wave, soundfile) seek back to patch the header size, so
    # they need a seekable target; a stdout pipe is not seekable. Synthesize
    # to a temp file, then stream the bytes to stdout.
    import tempfile

    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    tmp.close()
    try:
        if args.engine == "piper":
            synth_piper(text, args.voice, args.speed, tmp.name)
        else:
            synth_kokoro(text, args.voice, args.lang, args.speed, tmp.name)
        with open(tmp.name, "rb") as f:
            sys.stdout.buffer.write(f.read())
        sys.stdout.buffer.flush()
    except Exception as e:  # fail loud on stderr, silent on stdout
        log(f"synthesis failed: {e}")
        return 1
    finally:
        try:
            os.unlink(tmp.name)
        except OSError:
            pass
    return 0


if __name__ == "__main__":
    sys.exit(main())
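The temp-file-then-rename pattern that `_download` describes can be tried without any network access. The following is a minimal sketch under that assumption: `atomic_write` is a hypothetical helper, not part of the package, and it writes in-memory bytes instead of streaming an HTTP response.

```python
import os
import tempfile


def atomic_write(dest, data):
    """Write bytes to dest via a per-process temp file, then rename over it.

    Hypothetical helper for illustration; it mirrors the idea in _download
    but takes the payload as bytes rather than fetching a URL.
    """
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    tmp = f"{dest}.part.{os.getpid()}"  # unique per process
    try:
        with open(tmp, "wb") as f:
            f.write(data)
        # Readers see either the old file or the complete new one.
        os.replace(tmp, dest)
    finally:
        # The temp file only survives if the write or the rename failed.
        if os.path.exists(tmp):
            os.unlink(tmp)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = atomic_write(os.path.join(d, "cache", "model.bin"), b"abc")
        print(os.path.getsize(path))
```

Because the rename happens only after the file is fully written and closed, a second process that checks for `dest` should never observe a partial file, provided the temp file and the destination live on the same filesystem.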
@@ -0,0 +1,43 @@
#!/usr/bin/env python3
"""Strip markdown and code from a Claude reply so it reads naturally aloud.

Reads text on stdin, writes cleaned text on stdout. argv[1] (optional) is the
maximum character count; longer text is truncated on a sentence boundary so a
long reply does not monologue forever. Fenced code blocks are removed entirely
(they are unlistenable); inline code keeps its contents.
"""
import re
import sys


def clean(text, maxc):
    # Remove fenced code blocks entirely.
    text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)
    text = re.sub(r"~~~.*?~~~", " ", text, flags=re.DOTALL)
    # Inline code: keep contents, drop backticks.
    text = re.sub(r"`([^`]*)`", r"\1", text)
    # Images: drop entirely. Links: keep visible text, drop URL.
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", " ", text)
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Headings, list bullets, blockquote markers.
    text = re.sub(r"^[ \t]*#{1,6}[ \t]*", "", text, flags=re.MULTILINE)
    text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.MULTILINE)
    text = re.sub(r"^[ \t]*>[ \t]?", "", text, flags=re.MULTILINE)
    # Emphasis markers around a run of text.
    text = re.sub(r"[*_]{1,3}([^*_]+)[*_]{1,3}", r"\1", text)
    # Collapse whitespace.
    text = re.sub(r"\s+", " ", text).strip()
    if maxc and len(text) > maxc:
        cut = text[:maxc]
        m = max(cut.rfind(". "), cut.rfind("! "), cut.rfind("? "))
        text = cut[: m + 1] if m > maxc * 0.5 else cut
    return text


def main():
    maxc = int(sys.argv[1]) if len(sys.argv) > 1 else 700
    sys.stdout.write(clean(sys.stdin.read(), maxc))


if __name__ == "__main__":
    main()
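The sentence-boundary truncation at the end of `clean` is easy to misread, so here is a small sketch that isolates it. `truncate_on_sentence` is a hypothetical name introduced for illustration; the logic follows the three truncation lines in `clean`, applied to text that is assumed to be already whitespace-collapsed.

```python
def truncate_on_sentence(text, maxc):
    """Cap text at maxc characters, preferring to end on a sentence boundary.

    Hypothetical helper for illustration. A boundary is only used when it
    falls past the halfway point of the cap; otherwise the text is cut hard.
    """
    if not maxc or len(text) <= maxc:
        return text
    cut = text[:maxc]
    # Index of the last ". ", "! " or "? " inside the capped prefix (-1 if none).
    m = max(cut.rfind(". "), cut.rfind("! "), cut.rfind("? "))
    return cut[: m + 1] if m > maxc * 0.5 else cut


if __name__ == "__main__":
    print(truncate_on_sentence("One. Two. Three.", 12))
```

With a cap of 12, the prefix is `One. Two. Th`; the last boundary sits at index 8, which is past the halfway point, so the result keeps both complete sentences and drops the fragment.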
@@ -0,0 +1,85 @@
#!/usr/bin/env bash
# claude-can-speak: register (or remove) the Stop + UserPromptSubmit hooks in
# the user's Claude Code settings.json, merging into any existing hooks rather
# than overwriting them. Idempotent.
set -uo pipefail

SETTINGS_JSON="${CLAUDE_SETTINGS:-$HOME/.claude/settings.json}"

# Resolve where the installed hook scripts live (deb vs git checkout).
SELF="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
if [ -f "/usr/lib/claude-can-speak/tts-speak.sh" ]; then
  HOOKDIR="/usr/lib/claude-can-speak"
else
  HOOKDIR="$SELF"
fi
SPEAK_HOOK="$HOOKDIR/tts-speak.sh"
INTERRUPT_HOOK="$HOOKDIR/interrupt.sh"

action="${1:-install}"

[ -f "$SETTINGS_JSON" ] || { mkdir -p "$(dirname "$SETTINGS_JSON")"; echo '{}' >"$SETTINGS_JSON"; }

python3 - "$SETTINGS_JSON" "$action" "$SPEAK_HOOK" "$INTERRUPT_HOOK" <<'PY'
import json, sys, os, shutil

path, action, speak_hook, interrupt_hook = sys.argv[1:5]

with open(path) as f:
    cfg = json.load(f)

hooks = cfg.setdefault("hooks", {})

MARK = "tts-speak.sh"  # how we recognise our Stop hook
IMARK = "interrupt.sh"  # how we recognise our UserPromptSubmit hook

def groups(event):
    return hooks.setdefault(event, [])

def strip(event, needle):
    """Remove any hook entry whose command mentions needle."""
    kept = []
    for group in hooks.get(event, []):
        group_hooks = [h for h in group.get("hooks", [])
                       if needle not in str(h.get("command", ""))]
        if group_hooks:
            group = dict(group); group["hooks"] = group_hooks
            kept.append(group)
    if kept:
        hooks[event] = kept
    elif event in hooks:
        del hooks[event]

# Always strip our previous entries first so the operation is idempotent.
strip("Stop", MARK)
strip("UserPromptSubmit", IMARK)

if action == "install":
    groups("Stop").append({
        "hooks": [{"type": "command", "command": speak_hook, "timeout": 15}]
    })
    groups("UserPromptSubmit").append({
        "hooks": [{"type": "command", "command": interrupt_hook, "timeout": 5}]
    })
elif action == "remove":
    pass
else:
    sys.stderr.write("action must be install or remove\n")
    sys.exit(2)

# Back up once, then write atomically.
if not os.path.exists(path + ".ccs-bak"):
    shutil.copy2(path, path + ".ccs-bak")
tmp = path + ".tmp"
with open(tmp, "w") as f:
    json.dump(cfg, f, indent=2)
    f.write("\n")
os.replace(tmp, path)
print(f"{'installed' if action == 'install' else 'removed'} claude-can-speak hooks in {path}")
PY

if [ "$action" = install ]; then
  echo "Stop hook        -> $SPEAK_HOOK"
  echo "UserPromptSubmit -> $INTERRUPT_HOOK"
  echo "Note: Claude Code loads hooks at session start; restart your session to activate."
fi
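The strip-then-append approach is what makes the install step idempotent while leaving other hooks alone. The sketch below replays that idea on an in-memory dict, assuming the same `hooks` layout as the script; `strip_hooks` and `install` are hypothetical names, only the Stop event is handled, and no file is touched.

```python
def strip_hooks(hooks, event, needle):
    """Drop hook entries for event whose command mentions needle."""
    kept = []
    for group in hooks.get(event, []):
        remaining = [h for h in group.get("hooks", [])
                     if needle not in str(h.get("command", ""))]
        if remaining:
            kept.append({**group, "hooks": remaining})
    if kept:
        hooks[event] = kept
    else:
        hooks.pop(event, None)


def install(cfg, speak_hook):
    """Register speak_hook under Stop, replacing any earlier copy of it."""
    hooks = cfg.setdefault("hooks", {})
    strip_hooks(hooks, "Stop", "tts-speak.sh")
    hooks.setdefault("Stop", []).append(
        {"hooks": [{"type": "command", "command": speak_hook, "timeout": 15}]}
    )
    return cfg


if __name__ == "__main__":
    # "/usr/bin/other.sh" stands in for an unrelated, pre-existing hook.
    cfg = {"hooks": {"Stop": [{"hooks": [
        {"type": "command", "command": "/usr/bin/other.sh"}]}]}}
    install(cfg, "/opt/ccs/tts-speak.sh")
    install(cfg, "/opt/ccs/tts-speak.sh")  # second run must not duplicate
    print([g["hooks"][0]["command"] for g in cfg["hooks"]["Stop"]])
```

Running the install twice leaves one entry for the unrelated hook and one for the speak hook, which is the property the script's "Always strip our previous entries first" comment relies on.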
@@ -0,0 +1,22 @@
#!/usr/bin/env bash
# claude-can-speak: UserPromptSubmit hook. When you send your next message,
# stop whatever the previous reply was still speaking. This makes "just start
# typing" the natural interrupt: the moment you move on, the voice goes quiet.
#
# Fails silent, returns immediately.
set -uo pipefail

CCS_HOME="${CCS_HOME:-$HOME/.config/claude-can-speak}"
PIDFILE="$CCS_HOME/speaking.pid"

# Drain stdin (the hook payload) so the caller's pipe never blocks.
cat >/dev/null 2>&1 || true

if [ -f "$PIDFILE" ]; then
  pid="$(cat "$PIDFILE" 2>/dev/null)"
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    kill -TERM "-$pid" 2>/dev/null || kill -TERM "$pid" 2>/dev/null
  fi
  rm -f "$PIDFILE" 2>/dev/null || true
fi
exit 0
@@ -0,0 +1,59 @@
#!/usr/bin/env bash
# claude-can-speak speak worker. Launched detached (setsid) by tts-speak.sh as
# a new session leader, so $$ is this worker's process-group id. It records that
# pid IMMEDIATELY - before the multi-second synthesis - so an interrupt issued
# at any point (during synth or during playback) kills the whole group via
# `kill -TERM -$pid`. Cleaned text arrives on stdin; config via the environment.
set -uo pipefail

: "${CCS_HOME:=$HOME/.config/claude-can-speak}"
: "${LOG:=$CCS_HOME/claude-can-speak.log}"
: "${PIDFILE:=$CCS_HOME/speaking.pid}"
: "${CONTAINER:=ccs-tts}"
: "${IMAGE:=claude-can-speak:latest}"
: "${MODELS_DIR:=$HOME/.cache/claude-can-speak/models}"
: "${PLAYER:=pw-play}"
: "${ENGINE:=kokoro}"
: "${VOICE:=af_heart}"
: "${CCS_LANG:=en-us}"
: "${SPEED:=1.0}"

log() { printf '%s %s\n' "$(date -Is)" "$*" >>"$LOG" 2>/dev/null || true; }

# Record our group id up front so we are interruptible during synthesis.
printf '%s' "$$" >"$PIDFILE" 2>/dev/null || true
# If we are signalled, drop the pidfile on the way out.
trap 'rm -f "$PIDFILE" 2>/dev/null; exit 0' TERM INT

ensure_container() {
  if docker inspect -f '{{.State.Running}}' "$CONTAINER" 2>/dev/null | grep -q true; then
    return 0
  fi
  docker rm -f "$CONTAINER" >/dev/null 2>&1 || true
  docker image inspect "$IMAGE" >/dev/null 2>&1 || { log "image $IMAGE missing"; return 1; }
  mkdir -p "$MODELS_DIR" 2>/dev/null || true
  docker run -d --name "$CONTAINER" -v "$MODELS_DIR:/models" "$IMAGE" >/dev/null 2>&1 \
    || { log "container start failed"; return 1; }
  log "started container $CONTAINER"
}

CLEAN="$(cat)"
[ -n "$CLEAN" ] || { rm -f "$PIDFILE" 2>/dev/null; exit 0; }

ensure_container || { rm -f "$PIDFILE" 2>/dev/null; exit 0; }

wav="$(mktemp --suffix=.wav 2>/dev/null)" || { rm -f "$PIDFILE" 2>/dev/null; exit 0; }
if printf '%s' "$CLEAN" | docker exec -i "$CONTAINER" \
     python3 /app/synth.py \
       --engine "$ENGINE" --voice "$VOICE" --lang "$CCS_LANG" --speed "$SPEED" \
     >"$wav" 2>>"$LOG" && [ -s "$wav" ]; then
  "$PLAYER" "$wav" >/dev/null 2>&1
else
  log "synthesis produced no audio (engine=$ENGINE voice=$VOICE)"
fi
rm -f "$wav" 2>/dev/null || true

# Clear the pidfile only if it still points at us (a newer reply may have
# replaced it while we were speaking).
[ "$(cat "$PIDFILE" 2>/dev/null)" = "$$" ] && rm -f "$PIDFILE" 2>/dev/null
exit 0
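The interrupt scheme depends on the worker being a session leader, so that one recorded pid identifies a whole process group. Below is a rough Python analogue of that mechanism, assuming a POSIX system: `start_worker` and `interrupt` are hypothetical names, and a sleeping Python child stands in for the synth-plus-player pipeline.

```python
import os
import signal
import subprocess
import sys


def start_worker():
    """Start a stand-in worker in its own session (the setsid equivalent)."""
    # start_new_session=True makes the child call setsid(), so the child's
    # pid is also its process-group id.
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        start_new_session=True,
    )


def interrupt(pid):
    """Send SIGTERM to the whole process group led by pid, if it still exists."""
    try:
        os.killpg(pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # already gone; nothing to interrupt


if __name__ == "__main__":
    worker = start_worker()
    interrupt(worker.pid)
    worker.wait(timeout=10)
    print("worker exit status:", worker.returncode)
```

Signalling the group rather than the single pid is what lets an interrupt reach any children the worker has spawned, which in the shell scripts corresponds to `kill -TERM "-$pid"`.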
@@ -0,0 +1,126 @@
#!/usr/bin/env bash
# claude-can-speak: Claude Code Stop hook that speaks each finished reply.
#
# Gated on /voice mode: it reads the same voiceEnabled / voice.enabled flag
# that the built-in /voice (speech-in) toggles, so one switch controls both
# directions. Voice off -> this exits silently and Claude Code stays text-only.
#
# Design contract:
#   - Non-blocking: synthesis + playback run in a detached background job so
#     the hook returns immediately and never delays the next turn.
#   - Fails silent: any error (no docker, no audio, gate off) exits 0 quietly.
#   - Absolute paths only; no reliance on the caller's PATH or cwd.
#
# Provided AS IS, no warranty. See the project README for the full disclaimer.
set -uo pipefail

# --- Resolve config -------------------------------------------------------
CCS_HOME="${CCS_HOME:-$HOME/.config/claude-can-speak}"
CCS_CONFIG="$CCS_HOME/config.env"
SETTINGS_JSON="${CLAUDE_SETTINGS:-$HOME/.claude/settings.json}"
CONTAINER="${CCS_CONTAINER:-ccs-tts}"
IMAGE="${CCS_IMAGE:-claude-can-speak:latest}"
MODELS_DIR="${CCS_MODELS_DIR:-$HOME/.cache/claude-can-speak/models}"
LOG="$CCS_HOME/claude-can-speak.log"
PIDFILE="$CCS_HOME/speaking.pid"

# Defaults (overridable via config.env). Chosen from a listening test:
# Kokoro af_heart, US female, is the most natural.
ENGINE="kokoro"
VOICE="af_heart"
LANG="en-us"
SPEED="1.0"
MAX_CHARS="700"

# shellcheck source=/dev/null
[ -f "$CCS_CONFIG" ] && . "$CCS_CONFIG"

mkdir -p "$CCS_HOME" 2>/dev/null || true

log() { printf '%s %s\n' "$(date -Is)" "$*" >>"$LOG" 2>/dev/null || true; }

# --- Tool availability ----------------------------------------------------
command -v docker >/dev/null 2>&1 || { log "no docker; silent"; exit 0; }

PLAYER=""
for p in pw-play paplay aplay; do
  if command -v "$p" >/dev/null 2>&1; then PLAYER="$p"; break; fi
done
[ -n "$PLAYER" ] || { log "no audio player; silent"; exit 0; }

# --- Read the Stop hook payload ------------------------------------------
PAYLOAD="$(cat)"

# --- Gate: only speak when /voice mode is on ------------------------------
# Read voiceEnabled OR voice.enabled from settings.json. Absent/false = off.
voice_on() {
  [ -f "$SETTINGS_JSON" ] || return 1
  if command -v jq >/dev/null 2>&1; then
    local v
    v="$(jq -r '(.voiceEnabled // .voice.enabled // false) | tostring' \
      "$SETTINGS_JSON" 2>/dev/null)"
    [ "$v" = "true" ]
    return
  fi
  # jq-less fallback: grep for the two keys (any "enabled": true matches).
  grep -Eq '"voiceEnabled"[[:space:]]*:[[:space:]]*true' "$SETTINGS_JSON" && return 0
  grep -Eq '"enabled"[[:space:]]*:[[:space:]]*true' "$SETTINGS_JSON" && return 0
  return 1
}
voice_on || { log "voice gate off; silent"; exit 0; }

# --- Extract the reply text ----------------------------------------------
extract_text() {
  if command -v jq >/dev/null 2>&1; then
    printf '%s' "$PAYLOAD" | jq -r '.last_assistant_message // empty' 2>/dev/null
  else
    # Minimal fallback: pull last_assistant_message with python if present.
    printf '%s' "$PAYLOAD" | python3 -c \
      'import sys,json; print(json.load(sys.stdin).get("last_assistant_message",""))' \
      2>/dev/null
  fi
}
TEXT="$(extract_text)"
[ -n "$TEXT" ] || { log "no last_assistant_message; silent"; exit 0; }

# --- Clean text for speech ------------------------------------------------
# Drop fenced code blocks (unlistenable), strip common markdown, collapse
# whitespace, cap length. Logic lives in clean.py beside this script.
SELF_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
CLEAN_PY="${CCS_CLEAN_PY:-$SELF_DIR/clean.py}"
[ -f "$CLEAN_PY" ] || { log "clean.py not found at $CLEAN_PY; silent"; exit 0; }
CLEAN="$(printf '%s' "$TEXT" | python3 "$CLEAN_PY" "$MAX_CHARS" 2>>"$LOG")"
[ -n "$CLEAN" ] || { log "empty after cleaning; silent"; exit 0; }

# --- Interrupt any currently-playing reply --------------------------------
# A new reply supersedes the previous one: stop stale playback before we
# start. The same logic backs `claude-can-speak stop`.
stop_current() {
  [ -f "$PIDFILE" ] || return 0
  local pid
  pid="$(cat "$PIDFILE" 2>/dev/null)"
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    # Signal the worker's process group (synth + player), else the single pid.
    kill -TERM "-$pid" 2>/dev/null || kill -TERM "$pid" 2>/dev/null
  fi
  rm -f "$PIDFILE" 2>/dev/null || true
}

# --- Speak (detached worker) ----------------------------------------------
# A fresh reply interrupts whatever was still being spoken.
stop_current

# Launch the worker in a new session (setsid) so the hook returns immediately
# and the worker is a process-group leader: an interrupt can signal its whole
# group (in-flight synth + player) by the single recorded pid. Config is passed
# through the environment; the cleaned text arrives on the worker's stdin.
SELF_DIR="${SELF_DIR:-$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)}"
WORKER="${CCS_WORKER:-$SELF_DIR/speak-worker.sh}"
[ -f "$WORKER" ] || { log "speak-worker.sh not found at $WORKER; silent"; exit 0; }

printf '%s' "$CLEAN" | \
  CCS_HOME="$CCS_HOME" LOG="$LOG" PIDFILE="$PIDFILE" \
  CONTAINER="$CONTAINER" IMAGE="$IMAGE" MODELS_DIR="$MODELS_DIR" PLAYER="$PLAYER" \
  ENGINE="$ENGINE" VOICE="$VOICE" CCS_LANG="$LANG" SPEED="$SPEED" \
  setsid bash "$WORKER" >/dev/null 2>&1 &
exit 0
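The voice gate reads one of two keys from settings.json and treats anything else as off. As a hedged illustration, here is a Python approximation of that check; `voice_on` takes the file contents as a string, and it only accepts a boolean `true`, which is a simplifying assumption rather than an exact port of the jq expression.

```python
import json


def voice_on(settings_text):
    """Return True when voiceEnabled or voice.enabled is boolean true.

    Illustrative approximation of the shell gate: unreadable or malformed
    settings count as off, matching the fail-silent contract.
    """
    try:
        cfg = json.loads(settings_text)
    except ValueError:
        return False
    if not isinstance(cfg, dict):
        return False
    voice = cfg.get("voice")
    nested = voice.get("enabled") if isinstance(voice, dict) else None
    return cfg.get("voiceEnabled") is True or nested is True


if __name__ == "__main__":
    print(voice_on('{"voice": {"enabled": true}}'))
    print(voice_on('{"telemetry": {"enabled": true}}'))
```

Unlike a plain text search for `"enabled": true`, this version only looks at the `voice` object, so an unrelated setting such as the hypothetical `telemetry.enabled` above does not open the gate.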
package/package.json
ADDED
@@ -0,0 +1,49 @@
{
  "name": "claude-can-speak",
  "version": "0.1.0",
  "description": "Speech-out for Claude Code: speak replies aloud (Stop-hook firehose) or let Claude voice deliberate notifications (skill). Local neural TTS via Kokoro/Piper in Docker. Gated on /voice mode.",
  "keywords": [
    "claude",
    "claude-code",
    "tts",
    "text-to-speech",
    "voice",
    "kokoro",
    "piper",
    "speech",
    "skill",
    "hook"
  ],
  "homepage": "https://ra-yavuz.github.io/claude-can-speak/",
  "bugs": {
    "url": "https://github.com/ra-yavuz/claude-can-speak/issues"
  },
  "repository": {
    "type": "git",
    "url": "git+https://github.com/ra-yavuz/claude-can-speak.git"
  },
  "license": "MIT",
  "author": "Ramazan Yavuz <yavuzramazan1994@gmail.com>",
  "type": "commonjs",
  "bin": {
    "claude-can-speak": "bin/cli.js"
  },
  "files": [
    "bin/cli.js",
    "bin/claude-can-speak",
    "lib/claude-can-speak/",
    "container/",
    "skills/",
    "THIRD_PARTY.md",
    "LICENSE",
    "README.md"
  ],
  "engines": {
    "node": ">=16"
  },
  "os": [
    "linux",
    "darwin",
    "win32"
  ]
}
@@ -0,0 +1,53 @@
---
name: speak
description: >-
  Speak a short message aloud through the user's speakers using claude-can-speak
  (text-to-speech). Use this for deliberate, selective audio: a spoken
  notification when a long task finishes, a heads-up that needs attention while
  the user is looking away, a brief shoutout or status callout, or when the user
  asks you to say something out loud. This is NOT for reading every reply aloud
  (that is the separate Stop-hook mode); use it only for things genuinely worth
  hearing. Keep spoken text to one or two short sentences.
argument-hint: "[message to speak]"
allowed-tools: Bash(claude-can-speak say *)
---

# Speak aloud

You can send a short message to the user's speakers with the `claude-can-speak`
text-to-speech tool. Use it deliberately, not for everything.

## When to speak

Good uses:
- A finished long-running task the user stepped away from: "The build is done and all tests passed."
- A notification that wants attention now: "Heads up, the deploy needs your confirmation."
- A short status callout or shoutout the user asked for out loud.
- The user explicitly says "tell me out loud", "say this", "read that back", etc.

Do NOT use it to narrate routine replies, read long passages, or speak code,
file paths, commands, or anything awkward to hear. If it is not worth
interrupting the user's ears for, keep it text-only.

## How to speak

Run the CLI with the exact text to say (one or two short sentences, plain
words, no markdown or code):

```
claude-can-speak say "Your short message here."
```

If a message was passed to this skill, speak that:

```
claude-can-speak say "$ARGUMENTS"
```

Notes:
- The user can interrupt playback at any time with `claude-can-speak stop`, or
  simply by sending their next message.
- `say` speaks regardless of /voice mode (it is a deliberate, explicit call),
  whereas the firehose Stop-hook mode only speaks while /voice is on.
- If the command reports the image or container is missing, tell the user to run
  `claude-can-speak build` once, then retry; do not keep retrying silently.