vupai-0.1.0.tar.gz
- vupai-0.1.0/.gitignore +17 -0
- vupai-0.1.0/LICENSE +21 -0
- vupai-0.1.0/PKG-INFO +414 -0
- vupai-0.1.0/README.md +384 -0
- vupai-0.1.0/assets/brand/README.md +34 -0
- vupai-0.1.0/pyproject.toml +85 -0
- vupai-0.1.0/src/vupai/__init__.py +4 -0
- vupai-0.1.0/src/vupai/__main__.py +6 -0
- vupai-0.1.0/src/vupai/asr.py +91 -0
- vupai-0.1.0/src/vupai/audio.py +147 -0
- vupai-0.1.0/src/vupai/board.py +510 -0
- vupai-0.1.0/src/vupai/claude_summarize.py +104 -0
- vupai-0.1.0/src/vupai/cli.py +1443 -0
- vupai-0.1.0/src/vupai/commands.py +1358 -0
- vupai-0.1.0/src/vupai/config.py +658 -0
- vupai-0.1.0/src/vupai/confirm.py +106 -0
- vupai-0.1.0/src/vupai/daemon.py +787 -0
- vupai-0.1.0/src/vupai/feedback.py +155 -0
- vupai-0.1.0/src/vupai/filler.py +65 -0
- vupai-0.1.0/src/vupai/hosts.py +117 -0
- vupai-0.1.0/src/vupai/hotkey.py +189 -0
- vupai-0.1.0/src/vupai/injector.py +118 -0
- vupai-0.1.0/src/vupai/journal.py +75 -0
- vupai-0.1.0/src/vupai/panestate.py +214 -0
- vupai-0.1.0/src/vupai/permissions.py +234 -0
- vupai-0.1.0/src/vupai/platform_guard.py +47 -0
- vupai-0.1.0/src/vupai/recorder.py +84 -0
- vupai-0.1.0/src/vupai/registry.py +92 -0
- vupai-0.1.0/src/vupai/router.py +366 -0
- vupai-0.1.0/src/vupai/speech.py +184 -0
- vupai-0.1.0/src/vupai/summarize.py +349 -0
- vupai-0.1.0/src/vupai/tips.py +144 -0
- vupai-0.1.0/src/vupai/tmuxio.py +805 -0
- vupai-0.1.0/src/vupai/watcher.py +149 -0
vupai-0.1.0/.gitignore
ADDED
|
@@ -0,0 +1,17 @@
|
|
|
1
|
+
# Local-only design/plan docs — never committed (see CLAUDE.md)
|
|
2
|
+
docs/superpowers/
|
|
3
|
+
|
|
4
|
+
# Subagent-driven-development scratch: briefs, reports, review diffs, ledger
|
|
5
|
+
.superpowers/
|
|
6
|
+
|
|
7
|
+
# Python
|
|
8
|
+
__pycache__/
|
|
9
|
+
*.py[cod]
|
|
10
|
+
.venv/
|
|
11
|
+
venv/
|
|
12
|
+
*.egg-info/
|
|
13
|
+
.pytest_cache/
|
|
14
|
+
.ruff_cache/
|
|
15
|
+
|
|
16
|
+
# macOS
|
|
17
|
+
.DS_Store
|
vupai-0.1.0/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 José Andrade
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
vupai-0.1.0/PKG-INFO
ADDED
|
@@ -0,0 +1,414 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: vupai
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: Voice UI for AI panes - push-to-talk voice control over tmux agent panes
|
|
5
|
+
Project-URL: Homepage, https://github.com/itsjrsa/vupai
|
|
6
|
+
Project-URL: Repository, https://github.com/itsjrsa/vupai
|
|
7
|
+
Project-URL: Issues, https://github.com/itsjrsa/vupai/issues
|
|
8
|
+
Author-email: José Andrade <jrsa2012@gmail.com>
|
|
9
|
+
License-Expression: MIT
|
|
10
|
+
License-File: LICENSE
|
|
11
|
+
Keywords: ai-agents,apple-silicon,macos,mlx,parakeet,push-to-talk,speech-to-text,tmux,voice
|
|
12
|
+
Classifier: Development Status :: 4 - Beta
|
|
13
|
+
Classifier: Environment :: Console
|
|
14
|
+
Classifier: Environment :: MacOS X
|
|
15
|
+
Classifier: Intended Audience :: Developers
|
|
16
|
+
Classifier: Operating System :: MacOS :: MacOS X
|
|
17
|
+
Classifier: Programming Language :: Python :: 3
|
|
18
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
19
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
20
|
+
Classifier: Programming Language :: Python :: 3.13
|
|
21
|
+
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
|
|
22
|
+
Classifier: Topic :: Terminals
|
|
23
|
+
Classifier: Topic :: Utilities
|
|
24
|
+
Requires-Python: >=3.11
|
|
25
|
+
Requires-Dist: metaphone
|
|
26
|
+
Requires-Dist: parakeet-mlx; sys_platform == 'darwin' and platform_machine == 'arm64'
|
|
27
|
+
Requires-Dist: pynput
|
|
28
|
+
Requires-Dist: rapidfuzz
|
|
29
|
+
Description-Content-Type: text/markdown
|
|
30
|
+
|
|
31
|
+
<p align="center">
|
|
32
|
+
<picture>
|
|
33
|
+
<source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/itsjrsa/vupai/master/assets/brand/vupai-lockup-dark.png">
|
|
34
|
+
<img alt="vupai" src="https://raw.githubusercontent.com/itsjrsa/vupai/master/assets/brand/vupai-lockup.png" width="260">
|
|
35
|
+
</picture>
|
|
36
|
+
</p>
|
|
37
|
+
|
|
38
|
+
<p align="center">
|
|
39
|
+
<strong>Voice UI for AI panes</strong>: push-to-talk voice control for your tmux agent panes, on macOS, fully local.
|
|
40
|
+
</p>
|
|
41
|
+
|
|
42
|
+
<p align="center">
|
|
43
|
+
<a href="https://pypi.org/project/vupai/"><img alt="PyPI" src="https://img.shields.io/pypi/v/vupai.svg"></a>
|
|
44
|
+
<a href="./LICENSE"><img alt="License: MIT" src="https://img.shields.io/badge/License-MIT-blue.svg"></a>
|
|
45
|
+
<a href="https://www.python.org/"><img alt="Python" src="https://img.shields.io/badge/python-%3E%3D3.11-brightgreen.svg"></a>
|
|
46
|
+
<img alt="Platform" src="https://img.shields.io/badge/platform-macOS%20Apple%20Silicon-black.svg">
|
|
47
|
+
</p>
|
|
48
|
+
|
|
49
|
+
*vupai* (say "voo-pie") is a **V**oice **U**I for your AI **pa**nes.
|
|
50
|
+
|
|
51
|
+
Hold a key, speak, and what you say is typed into the right tmux pane: the one
|
|
52
|
+
you're looking at, or an agent you call by name (*"atlas, run the tests"*).
|
|
53
|
+
Speech-to-text runs on-device with NVIDIA Parakeet (via Apple MLX): no cloud,
|
|
54
|
+
no API keys.
|
|
55
|
+
|
|
56
|
+
Built for a tmux-centric workflow where you keep several coding agents and
|
|
57
|
+
shells open at once and want to drive them by voice without reaching for the
|
|
58
|
+
mouse. New panes launch an agent by default (`claude` out of the box) and should
|
|
59
|
+
work with other agentic coding tools (Codex, Gemini, …), though testing so far
|
|
60
|
+
has focused on Claude Code.
|
|
61
|
+
|
|
62
|
+
## Why not plain tmux?
|
|
63
|
+
|
|
64
|
+
vupai *runs on* tmux: it doesn't replace it, it adds a voice layer on top. tmux
|
|
65
|
+
already gives you panes, splits, and a way to keep many agents on screen. What it
|
|
66
|
+
can't do is let you talk to them. That's the gap vupai fills.
|
|
67
|
+
|
|
68
|
+
| With plain tmux | With vupai |
|
|
69
|
+
|---|---|
|
|
70
|
+
| Switch panes with `<prefix>`-arrow, then type | **Hold a key and talk** to the focused pane |
|
|
71
|
+
| Manually track which pane is which agent | Panes **auto-name themselves**; address them by name (*"atlas, run the tests"*) |
|
|
72
|
+
| Re-type the same command in each pane | **Broadcast by voice** to every agent at once (*"everyone, pull main"*) |
|
|
73
|
+
| Split / resize / re-layout with prefix chords | **Voice commands**: *"create 3 panes"*, *"focus atlas"*, *"swap atlas and sage"*, *"tile"* |
|
|
74
|
+
| Read each pane yourself to see what agents are doing | **Supervision board** + *"read atlas"* speaks a one-line summary aloud |
|
|
75
|
+
| n/a | **On-device speech** (Parakeet via Apple MLX): no cloud, no API keys |
|
|
76
|
+
|
|
77
|
+
If you only have one shell open, you don't need vupai. It earns its keep when you
|
|
78
|
+
are juggling several agents and want to drive them without keeping your hands on the keyboard.
|
|
79
|
+
|
|
80
|
+
## How it works
|
|
81
|
+
|
|
82
|
+
```
|
|
83
|
+
hold dictation key (Right-Option) → record (sox) → transcribe (Parakeet) → route → paste into a tmux pane → Enter
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
- **Routing is hybrid.** By default your speech goes to the **focused** pane. If
|
|
87
|
+
you start with an agent's **name**, it goes there instead — even when it isn't
|
|
88
|
+
focused. Say a **number** (*"two, …"*) to hit a pane by its position in the
|
|
89
|
+
current window.
|
|
90
|
+
- **Injection is safe.** vupai pastes your text and waits until it actually
|
|
91
|
+
appears in the pane before pressing Enter — it never blindly submits.
|
|
92
|
+
- **Fully local, fully private.** Speech-to-text runs entirely on-device via
|
|
93
|
+
Apple MLX (NVIDIA Parakeet). There is no cloud service, no API key, and no
|
|
94
|
+
account: your voice and transcripts never leave your Mac. The only network
|
|
95
|
+
access is a one-time model download (~2 GB) on first use.
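The "paste, verify, then submit" behavior described in the list above can be sketched as follows. This is an illustration under stated assumptions, not vupai's actual injector: the helper names (`wait_until_visible`, `inject`) and the timeout values are hypothetical, and the three callables stand in for whatever really pastes into, reads from, and sends Enter to a tmux pane.

```python
import time
from typing import Callable


def wait_until_visible(capture: Callable[[], str], text: str,
                       timeout: float = 2.0, interval: float = 0.05) -> bool:
    """Poll the pane contents until `text` shows up or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if text in capture():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def inject(text: str, paste: Callable[[str], None], capture: Callable[[], str],
           press_enter: Callable[[], None], timeout: float = 2.0) -> bool:
    """Paste `text`, and press Enter only once the pane echoes it back."""
    paste(text)
    if not wait_until_visible(capture, text, timeout=timeout):
        return False  # never submit text that did not land in the pane
    press_enter()
    return True
```

A real implementation would also have to cope with long text that the terminal wraps across lines, which a plain substring check like this one does not handle.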
|
|
96
|
+
|
|
97
|
+
## Requirements
|
|
98
|
+
|
|
99
|
+
> [!IMPORTANT]
|
|
100
|
+
> vupai is **macOS Apple-Silicon only**: it depends on Apple MLX for on-device
|
|
101
|
+
> speech, plus two Homebrew binaries. It will not run on Linux or Intel Macs.
|
|
102
|
+
> On an unsupported host the CLI fails fast with a clear message instead of a
|
|
103
|
+
> stray import error, and `parakeet-mlx` is skipped at install time.
|
|
104
|
+
|
|
105
|
+
- macOS on **Apple Silicon** (M-series), macOS 13.5+ (developed on macOS 26).
|
|
106
|
+
- [`tmux`](https://github.com/tmux/tmux) and [`sox`](https://sox.sourceforge.net/):
|
|
107
|
+
```bash
|
|
108
|
+
brew install tmux sox
|
|
109
|
+
```
|
|
110
|
+
- Python ≥ 3.11 and [`uv`](https://docs.astral.sh/uv/), used to install the CLI:
|
|
111
|
+
```bash
|
|
112
|
+
brew install uv # or: curl -LsSf https://astral.sh/uv/install.sh | sh
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
## Install
|
|
116
|
+
|
|
117
|
+
After the Homebrew step above, install the `vupai` CLI from PyPI in its own
|
|
118
|
+
isolated environment with [`uv`](https://docs.astral.sh/uv/):
|
|
119
|
+
|
|
120
|
+
```bash
|
|
121
|
+
uv tool install vupai
|
|
122
|
+
```
|
|
123
|
+
|
|
124
|
+
This puts `vupai` on your `PATH`. The Parakeet model (~0.6B parameters, ~2 GB) downloads
|
|
125
|
+
automatically on first transcription.
|
|
126
|
+
|
|
127
|
+
To upgrade later: `uv tool upgrade vupai`.
|
|
128
|
+
|
|
129
|
+
> [!NOTE]
|
|
130
|
+
> Prefer the bleeding edge? Install straight from git instead:
|
|
131
|
+
> `uv tool install git+https://github.com/itsjrsa/vupai`.
|
|
132
|
+
|
|
133
|
+
### From source (development)
|
|
134
|
+
|
|
135
|
+
```bash
|
|
136
|
+
git clone git@github.com:itsjrsa/vupai.git
|
|
137
|
+
cd vupai
|
|
138
|
+
uv sync # creates .venv and installs everything (incl. the MLX runtime)
|
|
139
|
+
```
|
|
140
|
+
|
|
141
|
+
Run the CLI with `uv run vupai …` from the repo, or see the live-reload loop
|
|
142
|
+
(`vupai reload` / `vupai --reload`) in [AGENTS.md](AGENTS.md).
|
|
143
|
+
|
|
144
|
+
> [!NOTE]
|
|
145
|
+
> The examples below use the bare `vupai` command (the installed tool). If you're
|
|
146
|
+
> running from a source checkout, prefix each one with `uv run` (e.g. `uv run
|
|
147
|
+
> vupai setup`).
|
|
148
|
+
|
|
149
|
+
Working on vupai with an AI coding agent (Claude Code, Codex, opencode, Cursor,
|
|
150
|
+
Aider, …)? [AGENTS.md](AGENTS.md) is the single source of truth for repo
|
|
151
|
+
conventions, architecture, and invariants; [CLAUDE.md](CLAUDE.md) just points to
|
|
152
|
+
it.
|
|
153
|
+
|
|
154
|
+
## Set up (once)
|
|
155
|
+
|
|
156
|
+
The fastest path after install is the interactive bootstrap:
|
|
157
|
+
|
|
158
|
+
```bash
|
|
159
|
+
vupai setup
|
|
160
|
+
```
|
|
161
|
+
|
|
162
|
+
It walks you through the whole first run: it checks the Homebrew tools, captures
|
|
163
|
+
consent for the local transcript journal, lets you pick a mic and your push-to-talk key(s)/addressing
|
|
164
|
+
mode, downloads the speech model up front (so the first hotkey press doesn't
|
|
165
|
+
stall on a silent fetch), then deep-links you to each macOS permission pane that
|
|
166
|
+
still needs your terminal app enabled. It's safe to re-run any time.
|
|
167
|
+
|
|
168
|
+
### Grant macOS permissions
|
|
169
|
+
|
|
170
|
+
`setup` handles these, but to check them on their own: vupai needs three
|
|
171
|
+
permissions, granted to **your terminal app** (Ghostty / iTerm / Terminal / …),
|
|
172
|
+
under **System Settings → Privacy & Security**: **Accessibility**, **Input
|
|
173
|
+
Monitoring**, and **Microphone**. Run:
|
|
174
|
+
|
|
175
|
+
```bash
|
|
176
|
+
vupai doctor
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
It probes each one and prints the exact System-Settings path for anything
|
|
180
|
+
missing.
|
|
181
|
+
|
|
182
|
+
> [!WARNING]
|
|
183
|
+
> macOS grants these to the terminal binary, not the script, so the hotkey and
|
|
184
|
+
> mic silently fail until you grant them. If voice seems dead, this is the first
|
|
185
|
+
> thing to check (`vupai doctor`).
|
|
186
|
+
|
|
187
|
+
## Usage
|
|
188
|
+
|
|
189
|
+
Start vupai inside a project, open a few agent panes, and drive them by voice. The
|
|
190
|
+
push-to-talk daemon runs in the background, so you stay in tmux and just hold a key
|
|
191
|
+
to talk. Launch (or re-attach to) a session with:
|
|
192
|
+
|
|
193
|
+
```bash
|
|
194
|
+
vupai # attach-or-create the session named after the cwd
|
|
195
|
+
vupai attach backend # attach to "backend" (create it if absent)
|
|
196
|
+
vupai new backend # create "backend" (error if it already exists)
|
|
197
|
+
vupai kill backend # kill the "backend" session
|
|
198
|
+
```
|
|
199
|
+
|
|
200
|
+
> [!NOTE]
|
|
201
|
+
> **vupai runs on its own tmux server**, so it never touches your existing tmux
|
|
202
|
+
> setup. The trade-off: its sessions don't show in a plain `tmux ls`; reach them
|
|
203
|
+
> with `vupai attach`. (Set `tmux_socket = ""` to share your default server.)
|
|
204
|
+
|
|
205
|
+
Once attached, you talk to vupai with **two push-to-talk keys**:
|
|
206
|
+
|
|
207
|
+
| Key | Config | Default | Hold and speak to… |
|
|
208
|
+
|---|---|---|---|
|
|
209
|
+
| **Dictation key** | `hotkey` | Right-Option | Type your words into the **focused** pane. |
|
|
210
|
+
| **System key** | `command_hotkey` | Right-Command | Run a **voice command** (below). The key is the signal, so there is no spoken control word; vupai acts on the panes instead of typing. |
|
|
211
|
+
|
|
212
|
+
Both defaults are customizable: set `hotkey` / `command_hotkey` in the config (each
|
|
213
|
+
takes a list, so you can bind several keys to one action).
|
|
214
|
+
|
|
215
|
+
> vupai only listens for keyboard keys. To use a mouse button or other input as a
|
|
216
|
+
> push-to-talk key, remap it to a keyboard key (e.g. `F13`) with a tool like
|
|
217
|
+
> [Karabiner-Elements](https://karabiner-elements.pqrs.org/) or BetterTouchTool,
|
|
218
|
+
> then bind that key here.
|
|
219
|
+
|
|
220
|
+
### Voice commands
|
|
221
|
+
|
|
222
|
+
Hold the **system key** and say any of these. Run `vupai voice-commands` for a
|
|
223
|
+
cheat sheet tailored to your config.
|
|
224
|
+
|
|
225
|
+
| Say | What happens |
|
|
226
|
+
|---|---|
|
|
227
|
+
| *"create 3 panes"* | Spin up N auto-named panes, tiled (up to 30; *"create 2 shell panes"* picks the program) |
|
|
228
|
+
| *"focus atlas"* | Focus the **atlas** pane (also *"switch to / go to"*) |
|
|
229
|
+
| *"swap atlas and sage"* | Swap two named panes |
|
|
230
|
+
| *"zoom atlas"* / *"unzoom"* | Maximize a pane / restore the layout |
|
|
231
|
+
| *"tile"* / *"layout …"* | Re-layout the window (tiled, main-vertical, …) |
|
|
232
|
+
| *"close atlas"* / *"kill atlas"* | Close a pane (asks y/n by default) |
|
|
233
|
+
| *"board"* | Open the **supervision board** (one per session) |
|
|
234
|
+
| *"read atlas"* / *"read all"* | Speak a pane's summary aloud (*"read board"* for a digest) |
|
|
235
|
+
| *"clear atlas"* / *"clear all"* | Send a slash command (`/clear`) to a pane or every agent |
|
|
236
|
+
| *"everyone, pull main"* | **Broadcast** the message to every named agent |
|
|
237
|
+
| *"connect to box"* / *"ssh box"* | SSH the focused pane into a configured host |
|
|
238
|
+
| *"mute"* / *"unmute"* / *"stop"* | Silence/restore talk-back, or cut off the current read |
|
|
239
|
+
| *"louder"* / *"quieter"* | Nudge readback volume (`tts_volume`, macOS `say` only) |
|
|
240
|
+
| *"atlas, run the tests"* | Not a command, so it falls through to **name addressing** |
|
|
241
|
+
|
|
242
|
+
## Commands
|
|
243
|
+
|
|
244
|
+
Run `vupai --help` for the full command list (and `vupai <command> --help` for a
|
|
245
|
+
specific one). The everyday ones are in [Usage](#usage) above; a few worth
|
|
246
|
+
knowing: `vupai setup` (first-run bootstrap), `vupai voice-commands` (spoken-command
|
|
247
|
+
cheat sheet for your config), and `vupai board` (the [supervision
|
|
248
|
+
board](#supervision-board)).
|
|
249
|
+
|
|
250
|
+
The push-to-talk daemon runs as a **detached background process** under your
|
|
251
|
+
terminal app (not inside tmux — that's required for the global hotkey to work).
|
|
252
|
+
It logs to `~/.config/vupai/daemon.log` and survives detach/reattach.
|
|
253
|
+
|
|
254
|
+
## Supervision board
|
|
255
|
+
|
|
256
|
+
When you have several agents running at once, you can't watch them all. The
|
|
257
|
+
**supervision board** does it for you: `vupai board` (or just say *"board"*)
|
|
258
|
+
splits a dedicated pane (right, ~40%) that shows, per named agent pane, a
|
|
259
|
+
one-line summary of its main conclusion or pending action — so a glance tells you
|
|
260
|
+
who's done, who's stuck, and who needs you.
|
|
261
|
+
|
|
262
|
+
- **Tool-agnostic.** Works with any agentic CLI, not just Claude Code: pane
|
|
263
|
+
activity is detected from terminal-output churn, and the summarizer is a
|
|
264
|
+
swappable command (`board_summarizer_cmd`), not a fixed model. vupai appends the
|
|
265
|
+
pane's scrollback tail as the command's last argument and takes its last stdout
|
|
266
|
+
line as the summary, so any command that follows that contract works:
|
|
267
|
+
|
|
268
|
+
| To summarize with… | Set `board_summarizer_cmd` to |
|
|
269
|
+
|---|---|
|
|
270
|
+
| Claude Haiku (default, streaming) | `python -m vupai.claude_summarize --model claude-haiku-4-5` |
|
|
271
|
+
| Claude (plain, buffered) | `claude -p --model claude-haiku-4-5` |
|
|
272
|
+
| Codex | `codex exec` |
|
|
273
|
+
| Gemini | `gemini -p` |
|
|
274
|
+
| Ollama (local/remote) | `python scripts/ollama_summarize.py --host http://BOX:11434 --model qwen2.5:7b` |
|
|
275
|
+
|
|
276
|
+
The model is whatever that command uses (e.g. Codex's own config/profile). If the
|
|
277
|
+
command is missing or fails, the board falls back to a non-LLM last-line summary.
|
|
278
|
+
- **Cheap by design.** A pane is summarized only when it *settles* (finishes a
|
|
279
|
+
burst of work), skipped when nothing changed, and throttled per pane
|
|
280
|
+
(`board_min_summary_interval`).
|
|
281
|
+
- **Speak it too.** *"read board"* reads the digest aloud; *"read atlas"* reads a
|
|
282
|
+
single pane.
|
|
283
|
+
|
|
284
|
+
One board per session. Close the pane to stop it.
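The summarizer contract described above (scrollback tail passed as the last argument, last stdout line taken as the summary) is small enough that a custom command fits in a few lines. The following is a hedged sketch of one such command, a non-LLM summarizer that echoes the last non-blank line; the names `summarize` and `main` and the 80-character limit are assumptions for illustration, not part of vupai.

```python
import sys


def summarize(scrollback: str, max_len: int = 80) -> str:
    """Pick the last non-blank line of the scrollback and trim it to one line."""
    lines = [line.strip() for line in scrollback.splitlines() if line.strip()]
    if not lines:
        return "(no output)"
    last = lines[-1]
    return last if len(last) <= max_len else last[: max_len - 3] + "..."


def main(argv: list[str]) -> int:
    # Contract from the README: the pane's scrollback tail arrives as the
    # last argument, and the last line printed to stdout is the summary.
    scrollback = argv[-1] if len(argv) > 1 else ""
    print(summarize(scrollback))
    return 0

# When saved as a script, finish with: raise SystemExit(main(sys.argv))
```

Pointing `board_summarizer_cmd` at a script like this (for example `python /path/to/your_script.py`, a hypothetical path) should satisfy the contract, and the same shape works for a wrapper around any model CLI as long as the summary is the final line printed.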
|
|
285
|
+
|
|
286
|
+
## Configuration
|
|
287
|
+
|
|
288
|
+
vupai reads `~/.config/vupai/config.toml`. `vupai setup` writes it on first run,
|
|
289
|
+
pre-filled with **every key at its default and an inline comment explaining it**, so
|
|
290
|
+
the file itself is the reference. It's left untouched if one already exists, and
|
|
291
|
+
`vupai config --init` tops it up with any keys a newer version added without
|
|
292
|
+
disturbing your edits. Editing is optional; open the file to see them all.
|
|
293
|
+
|
|
294
|
+
**Applying changes.** The daemon reads its config once at startup, so a change
|
|
295
|
+
takes effect only after it respawns. The interactive commands (`vupai setup`,
|
|
296
|
+
`vupai keys`, `vupai mic`) apply their change automatically: if a daemon is
|
|
297
|
+
running, they reload it for you. But edits you make **by hand** to
|
|
298
|
+
`config.toml` or `hosts.toml` are not watched, so run `vupai reload` to pick
|
|
299
|
+
them up (or `vupai --reload` to reload and attach in one step).
|
|
300
|
+
|
|
301
|
+
The keys most people touch:
|
|
302
|
+
|
|
303
|
+
| Key | What it does |
|
|
304
|
+
|---|---|
|
|
305
|
+
| `hotkey` / `command_hotkey` | The dictation and system push-to-talk keys (pynput names; each a list, so you can bind several). |
|
|
306
|
+
| `addressing` | `button` (two keys, default) or `keyword` (legacy single key, no command layer). |
|
|
307
|
+
| `pane_command` | Default program for voice-created panes (e.g. `claude`). |
|
|
308
|
+
| `broadcast_word` | Leading word that injects to every named agent (default `everyone`). |
|
|
309
|
+
| `board_summarizer_cmd` | Command that summarizes panes for the board and `read` (see [Supervision board](#supervision-board)). |
|
|
310
|
+
| `[programs]` / `[aliases]` / `[macros]` / `[slash_commands]` | Spoken-token tables: program names, pane-name aliases, phrase macros, and slash verbs. |
|
|
311
|
+
|
|
312
|
+
**Addressing modes.** In `button` mode (default) you hold one of two keys: the
|
|
313
|
+
dictation key (`hotkey`) types your words verbatim into the focused pane, while the
|
|
314
|
+
system key (`command_hotkey`) interprets them as a command, a broadcast, or a
|
|
315
|
+
name-addressed message ("atlas, are you there?"). The key is the control signal, so
|
|
316
|
+
no spoken control word is needed. Each key field is a list, so you can bind several
|
|
317
|
+
keys to the same action (any one triggers it) and keep one config that works across
|
|
318
|
+
keyboards with different layouts. `keyword` mode is the legacy single-key mode: it
|
|
319
|
+
has no command layer: only the `broadcast_word` ("everyone ...") acts as a leading control word; everything
|
|
320
|
+
else is name-addressed or dictated verbatim to the focused pane.
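The leading-word routing described above (broadcast word, then pane name, then the focused pane) can be sketched as follows. This is a simplified illustration, not vupai's router: `Route`, `route`, and the exact-match comparison are assumptions, and the real package lists `rapidfuzz` and `metaphone` as dependencies, which suggests its name matching is more forgiving than this.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    target: str   # "focused", "broadcast", or a pane name
    message: str  # text to type into the target pane(s)


def route(transcript: str, pane_names: set[str],
          broadcast_word: str = "everyone") -> Route:
    """Route by leading word: broadcast word, known pane name, else focused pane."""
    text = transcript.strip()
    # Split "atlas, run the tests" into a leading word and the remainder.
    match = re.match(r"^(\w+)[,:]?\s+(.*)$", text, flags=re.DOTALL)
    if match:
        head, rest = match.group(1).lower(), match.group(2).strip()
        if head == broadcast_word.lower():
            return Route("broadcast", rest)
        if head in pane_names:  # pane names are assumed to be lowercase
            return Route(head, rest)
    return Route("focused", text)
```

Anything that does not start with a known word falls through unchanged to the focused pane, which mirrors the "dictated verbatim" default in the text.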
|
|
321
|
+
|
|
322
|
+
### Remote machines (SSH)
|
|
323
|
+
|
|
324
|
+
The *"ssh box"* / *"connect to box"* voice command opens a new pane and SSHes into
|
|
325
|
+
a host you name. Hosts live in a separate file, `~/.config/vupai/hosts.toml`. Write
|
|
326
|
+
a commented template with:
|
|
327
|
+
|
|
328
|
+
```bash
|
|
329
|
+
vupai hosts --init # scaffold ~/.config/vupai/hosts.toml
|
|
330
|
+
vupai hosts # list what's configured
|
|
331
|
+
```
|
|
332
|
+
|
|
333
|
+
Each host is one table; only `host` is required (SSH key auth must already work):
|
|
334
|
+
|
|
335
|
+
```toml
|
|
336
|
+
[hosts.box]
|
|
337
|
+
user = "me" # optional; omit to use ~/.ssh/config defaults
|
|
338
|
+
host = "box.example.com" # required: hostname/IP or an ssh-config Host alias
|
|
339
|
+
port = 22 # optional
|
|
340
|
+
program = "claude" # optional; omit to land in a plain login shell (default)
|
|
341
|
+
```
|
|
342
|
+
|
|
343
|
+
Say the table name (*"ssh box"*) to connect. By default you land in a login shell,
|
|
344
|
+
so you can `cd` into a project first; set `program` to auto-start an agent instead.
|
|
345
|
+
|
|
346
|
+
## tmux tips
|
|
347
|
+
|
|
348
|
+
vupai sets the tmux options it needs at startup (`ensure_up`), so **no config is
|
|
349
|
+
required**. A couple of optional settings just make the multi-agent flow nicer:
|
|
350
|
+
|
|
351
|
+
```tmux
|
|
352
|
+
# ~/.tmux.conf (optional, pairs well with vupai)
|
|
353
|
+
set -g mouse on # click a pane to focus, scroll to read history
|
|
354
|
+
bind -T copy-mode-vi WheelUpPane send -X scroll-up
|
|
355
|
+
bind -T copy-mode-vi WheelDownPane send -X scroll-down
|
|
356
|
+
```
|
|
357
|
+
|
|
358
|
+
> [!WARNING]
|
|
359
|
+
> Do **not** enable `extended-keys` (CSI-u) in your tmux config:
|
|
360
|
+
> ```tmux
|
|
361
|
+
> set -s extended-keys on # breaks vupai
|
|
362
|
+
> set -as terminal-features 'xterm*:extkeys' # breaks vupai
|
|
363
|
+
> ```
|
|
364
|
+
> It re-encodes Enter, so vupai's injected text never submits in Claude Code.
|
|
365
|
+
> vupai forces `extended-keys off` at startup, but a later `tmux source-file`
|
|
366
|
+
> (config reload) flips it back on and silently breaks submission. Likewise,
|
|
367
|
+
> don't override `pane-border-format` / `pane-border-status` (clobbers the
|
|
368
|
+
> voice-name border) or rebind `<prefix> + R` (vupai uses it to rename a pane).
|
|
369
|
+
> These apply **inside vupai's own session** (tmux still sources your
|
|
370
|
+
> `~/.tmux.conf` on vupai's dedicated server); your default tmux is untouched.
|
|
371
|
+
|
|
372
|
+
## Roadmap
|
|
373
|
+
|
|
374
|
+
vupai is young and evolving. A few things on the horizon (in no particular
|
|
375
|
+
order):
|
|
376
|
+
|
|
377
|
+
- **Tighter pane-state and activity awareness** so routing and the board react
|
|
378
|
+
faster to what each agent is actually doing.
|
|
379
|
+
- **Broader agent-CLI coverage**: validate the flow end-to-end with Codex,
|
|
380
|
+
opencode, Gemini, and other agentic tools (testing so far has centered on
|
|
381
|
+
Claude Code).
|
|
382
|
+
- **Smarter addressing**: more forgiving name matching and disambiguation when
|
|
383
|
+
several agents answer to similar names.
|
|
384
|
+
- **More voice commands** for everyday tmux moves, so less reaching for the
|
|
385
|
+
prefix key.
|
|
386
|
+
- **Linux support** is a long shot (the speech stack is Apple-MLX-only today),
|
|
387
|
+
but a pluggable transcription backend would open the door.
|
|
388
|
+
|
|
389
|
+
Ideas and contributions are welcome: open an issue or PR.
|
|
390
|
+
|
|
391
|
+
## Uninstall
|
|
392
|
+
|
|
393
|
+
```bash
|
|
394
|
+
vupai down # stop the background daemon
|
|
395
|
+
vupai cleanup # revert any leftover settings on your default tmux server
|
|
396
|
+
uv tool uninstall vupai # remove the CLI
|
|
397
|
+
```
|
|
398
|
+
|
|
399
|
+
That removes the program. To also delete what it created on disk:
|
|
400
|
+
|
|
401
|
+
```bash
|
|
402
|
+
rm -rf ~/.config/vupai # config, hosts, daemon log, journal
|
|
403
|
+
rm -rf ~/.cache/huggingface/hub/models--mlx-community--parakeet-tdt-0.6b-v2 # the ~2 GB speech model
|
|
404
|
+
```
|
|
405
|
+
|
|
406
|
+
The Homebrew tools (`tmux`, `sox`) are general-purpose; remove them only if nothing
|
|
407
|
+
else needs them (`brew uninstall tmux sox`). The macOS permissions were granted to
|
|
408
|
+
your terminal app, not to vupai, so leave them unless you want to revoke them by
|
|
409
|
+
hand under **System Settings → Privacy & Security**.
|
|
410
|
+
|
|
411
|
+
## License
|
|
412
|
+
|
|
413
|
+
[MIT](LICENSE). (Note: `pynput` is LGPL-3.0 and the Parakeet model weights are
|
|
414
|
+
CC-BY-4.0; both are runtime dependencies, not part of this repo's code.)
|