stttui-0.2.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
stttui-0.2.0/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 stttui contributors
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
stttui-0.2.0/PKG-INFO ADDED
@@ -0,0 +1,195 @@
+ Metadata-Version: 2.4
+ Name: stttui
+ Version: 0.2.0
+ Summary: Speech-to-Text TUI — offline transcription powered by faster-whisper
+ Author: Anthony Holten
+ License-Expression: MIT
+ Project-URL: Homepage, https://github.com/aholten/sttui
+ Project-URL: Repository, https://github.com/aholten/sttui
+ Project-URL: Issues, https://github.com/aholten/sttui/issues
+ Keywords: speech-to-text,whisper,tui,transcription,offline
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Environment :: Console
+ Classifier: Intended Audience :: Developers
+ Classifier: Intended Audience :: End Users/Desktop
+ Classifier: Operating System :: OS Independent
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: faster-whisper<2.0.0,>=1.0.0
+ Requires-Dist: sounddevice<1.0.0,>=0.4.0
+ Requires-Dist: soundfile<1.0.0,>=0.12.0
+ Requires-Dist: pynput<2.0.0,>=1.7.0
+ Requires-Dist: pyperclip<2.0.0,>=1.8.0
+ Requires-Dist: numpy<3.0.0,>=1.24.0
+ Requires-Dist: tzdata>=2024.1
+ Requires-Dist: textual<1.0.0,>=0.40.0
+ Dynamic: license-file
+
+ # stttui: Speech To Text Terminal User Interface
+
+ > **v0.2.0** | Python 3.10+ | MIT License
+
+ A local, fully offline speech-to-text tool powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper). Record audio with a global hotkey from any window, transcribe it on-device, and get the result copied straight to your clipboard — no cloud, no API keys, no network latency.
+
+ Comes with a polished terminal UI built on [Textual](https://github.com/Textualize/textual), complete with a live audio level meter, transcription history, and on-the-fly model switching.
+
+ <p align="center">
+   <img src="demo.gif" alt="stttui in action" width="700">
+ </p>
+
+ ---
+
+ ## Features
+
+ - **Fully offline** — runs entirely on your machine using faster-whisper with int8 quantization. No data ever leaves your device.
+ - **Global hotkeys** — record from anywhere without focusing the terminal
+ - **Live audio meter** — real-time visual feedback while recording
+ - **Transcription history** — timestamped, scrollable log of every dictation
+ - **Model switching** — swap between `tiny`, `base`, `small`, `medium`, and `large` Whisper models on the fly
+ - **Clipboard integration** — transcriptions are automatically copied and ready to paste
+ - **Cross-platform** — works on Windows, macOS, and Linux
+ - **Three modes** — rich TUI (default), lightweight CLI (`--cli`), or headless (`--headless`) for scripting/piping
+
+ ## Installation
+
+ ### Homebrew (macOS / Linux)
+
+ ```bash
+ brew tap aholten/tap
+ brew install stttui
+ ```
+
+ ### pip / pipx
+
+ ```bash
+ # With pipx (recommended — isolated environment)
+ pipx install stttui
+
+ # Or with pip
+ pip install stttui
+ ```
+
+ ### From source
+
+ ```bash
+ git clone https://github.com/aholten/sttui.git
+ cd sttui
+ pip install .
+ ```
+
+ > On macOS you may need to grant terminal accessibility permissions for global hotkeys.
+ > On Linux, install `portaudio` (`sudo apt install portaudio19-dev`) and `xclip` for clipboard support.
+
+ ## Usage
+
+ ### Hotkeys
+
+ These work **globally** — even when the terminal is not focused:
+
+ | Hotkey | Action |
+ |---|---|
+ | `Ctrl+Shift+Space` | Start / stop recording |
+ | `Ctrl+C` | Quit |
+
+ ### Workflow
+
+ 1. Launch the tool — the Whisper model loads in the background
+ 2. Switch to any application (browser, editor, chat, etc.)
+ 3. Press **Ctrl+Shift+Space** to start recording
+ 4. Speak
+ 5. Press **Ctrl+Shift+Space** again to stop
+ 6. The transcription is copied to your clipboard — paste with **Ctrl+V**
+
+ ### TUI Mode (default)
+
+ ```bash
+ stttui
+ ```
+
+ The terminal interface includes:
+ - **Model selector** — dropdown to switch Whisper models without restarting
+ - **Status badge** — shows IDLE / RECORDING / TRANSCRIBING state
+ - **Level meter** — live audio input visualization
+ - **Transcription history** — scrollable log with timestamps
+ - **Status log** — model loading progress and system messages
+
+ ### CLI Mode
+
+ ```bash
+ stttui --cli
+ ```
+
+ Minimal output, no UI — just hotkeys and clipboard.
+
+ ### Headless Mode
+
+ Record, transcribe, print to stdout, and exit. No hotkeys or UI — ideal for scripts and piping.
+
+ ```bash
+ # Record until Enter is pressed
+ stttui --headless
+
+ # Record for exactly 10 seconds
+ stttui --headless --duration 10
+
+ # Pipe transcription to another command
+ stttui --headless --duration 5 | xargs echo "You said:"
+ ```
+
+ Status messages go to stderr; the transcription goes to stdout.
+
+ ### Options
+
+ | Flag | Description | Default |
+ |---|---|---|
+ | `--cli` | Run in CLI-only mode (no TUI) | off |
+ | `--headless` | Record, transcribe, print to stdout, and exit | off |
+ | `--duration` | Recording duration in seconds (headless mode; omit to wait for Enter) | — |
+ | `--model` | Whisper model size | `base` |
+ | `--version` | Print version and exit | — |
+
+ ```bash
+ # Example: launch with the small model
+ stttui --model small
+ ```
+
+ ## Model Sizes
+
+ Larger models are more accurate but slower to load and transcribe. All models run on CPU with int8 quantization.
+
+ | Model | Parameters | Speed |
+ |---|---|---|
+ | `tiny` | 39M | Fastest |
+ | `base` | 74M | Fast |
+ | `small` | 244M | Moderate |
+ | `medium` | 769M | Slow |
+ | `large` | 1550M | Slowest |
+
+ In TUI mode you can switch models from the dropdown at any time.
+
+ ## Platform Notes
+
+ | Platform | Notes |
+ |---|---|
+ | **Windows** | Works out of the box via Git Bash or MSYS2. |
+ | **macOS** | Clipboard works via `pbcopy`. You may need to grant terminal accessibility permissions for global hotkeys. |
+ | **Linux** | Install `xclip` or `xsel` for clipboard support (`sudo apt install xclip`). |
+
+ ## Transcription Log
+
+ All transcriptions are automatically saved to `~/.stttui/transcription_log.txt` with timestamps for reference.
+
+ ## License
+
+ MIT
+
+ ## Author
+
+ Anthony Holten (@aholten on GitHub)
stttui-0.2.0/README.md ADDED
@@ -0,0 +1,161 @@
+ # stttui: Speech To Text Terminal User Interface
+
+ > **v0.2.0** | Python 3.10+ | MIT License
+
+ A local, fully offline speech-to-text tool powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper). Record audio with a global hotkey from any window, transcribe it on-device, and get the result copied straight to your clipboard — no cloud, no API keys, no network latency.
+
+ Comes with a polished terminal UI built on [Textual](https://github.com/Textualize/textual), complete with a live audio level meter, transcription history, and on-the-fly model switching.
+
+ <p align="center">
+   <img src="demo.gif" alt="stttui in action" width="700">
+ </p>
+
+ ---
+
+ ## Features
+
+ - **Fully offline** — runs entirely on your machine using faster-whisper with int8 quantization. No data ever leaves your device.
+ - **Global hotkeys** — record from anywhere without focusing the terminal
+ - **Live audio meter** — real-time visual feedback while recording
+ - **Transcription history** — timestamped, scrollable log of every dictation
+ - **Model switching** — swap between `tiny`, `base`, `small`, `medium`, and `large` Whisper models on the fly
+ - **Clipboard integration** — transcriptions are automatically copied and ready to paste
+ - **Cross-platform** — works on Windows, macOS, and Linux
+ - **Three modes** — rich TUI (default), lightweight CLI (`--cli`), or headless (`--headless`) for scripting/piping
+
+ ## Installation
+
+ ### Homebrew (macOS / Linux)
+
+ ```bash
+ brew tap aholten/tap
+ brew install stttui
+ ```
+
+ ### pip / pipx
+
+ ```bash
+ # With pipx (recommended — isolated environment)
+ pipx install stttui
+
+ # Or with pip
+ pip install stttui
+ ```
+
+ ### From source
+
+ ```bash
+ git clone https://github.com/aholten/sttui.git
+ cd sttui
+ pip install .
+ ```
+
+ > On macOS you may need to grant terminal accessibility permissions for global hotkeys.
+ > On Linux, install `portaudio` (`sudo apt install portaudio19-dev`) and `xclip` for clipboard support.
+
+ ## Usage
+
+ ### Hotkeys
+
+ These work **globally** — even when the terminal is not focused:
+
+ | Hotkey | Action |
+ |---|---|
+ | `Ctrl+Shift+Space` | Start / stop recording |
+ | `Ctrl+C` | Quit |
+
+ ### Workflow
+
+ 1. Launch the tool — the Whisper model loads in the background
+ 2. Switch to any application (browser, editor, chat, etc.)
+ 3. Press **Ctrl+Shift+Space** to start recording
+ 4. Speak
+ 5. Press **Ctrl+Shift+Space** again to stop
+ 6. The transcription is copied to your clipboard — paste with **Ctrl+V**
+
+ ### TUI Mode (default)
+
+ ```bash
+ stttui
+ ```
+
+ The terminal interface includes:
+ - **Model selector** — dropdown to switch Whisper models without restarting
+ - **Status badge** — shows IDLE / RECORDING / TRANSCRIBING state
+ - **Level meter** — live audio input visualization
+ - **Transcription history** — scrollable log with timestamps
+ - **Status log** — model loading progress and system messages
+
+ ### CLI Mode
+
+ ```bash
+ stttui --cli
+ ```
+
+ Minimal output, no UI — just hotkeys and clipboard.
+
+ ### Headless Mode
+
+ Record, transcribe, print to stdout, and exit. No hotkeys or UI — ideal for scripts and piping.
+
+ ```bash
+ # Record until Enter is pressed
+ stttui --headless
+
+ # Record for exactly 10 seconds
+ stttui --headless --duration 10
+
+ # Pipe transcription to another command
+ stttui --headless --duration 5 | xargs echo "You said:"
+ ```
+
+ Status messages go to stderr; the transcription goes to stdout.
+
+ ### Options
+
+ | Flag | Description | Default |
+ |---|---|---|
+ | `--cli` | Run in CLI-only mode (no TUI) | off |
+ | `--headless` | Record, transcribe, print to stdout, and exit | off |
+ | `--duration` | Recording duration in seconds (headless mode; omit to wait for Enter) | — |
+ | `--model` | Whisper model size | `base` |
+ | `--version` | Print version and exit | — |
+
+ ```bash
+ # Example: launch with the small model
+ stttui --model small
+ ```
+
+ ## Model Sizes
+
+ Larger models are more accurate but slower to load and transcribe. All models run on CPU with int8 quantization.
+
+ | Model | Parameters | Speed |
+ |---|---|---|
+ | `tiny` | 39M | Fastest |
+ | `base` | 74M | Fast |
+ | `small` | 244M | Moderate |
+ | `medium` | 769M | Slow |
+ | `large` | 1550M | Slowest |
+
+ In TUI mode you can switch models from the dropdown at any time.
+
+ ## Platform Notes
+
+ | Platform | Notes |
+ |---|---|
+ | **Windows** | Works out of the box via Git Bash or MSYS2. |
+ | **macOS** | Clipboard works via `pbcopy`. You may need to grant terminal accessibility permissions for global hotkeys. |
+ | **Linux** | Install `xclip` or `xsel` for clipboard support (`sudo apt install xclip`). |
+
+ ## Transcription Log
+
+ All transcriptions are automatically saved to `~/.stttui/transcription_log.txt` with timestamps for reference.
+
+ ## License
+
+ MIT
+
+ ## Author
+
+ Anthony Holten (@aholten on GitHub)
stttui-0.2.0/pyproject.toml ADDED
@@ -0,0 +1,49 @@
+ [build-system]
+ requires = ["setuptools>=68.0"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "stttui"
+ version = "0.2.0"
+ description = "Speech-to-Text TUI — offline transcription powered by faster-whisper"
+ readme = "README.md"
+ license = "MIT"
+ requires-python = ">=3.10"
+ authors = [
+     { name = "Anthony Holten" },
+ ]
+ keywords = ["speech-to-text", "whisper", "tui", "transcription", "offline"]
+ classifiers = [
+     "Development Status :: 4 - Beta",
+     "Environment :: Console",
+     "Intended Audience :: Developers",
+     "Intended Audience :: End Users/Desktop",
+     "Operating System :: OS Independent",
+     "Programming Language :: Python :: 3",
+     "Programming Language :: Python :: 3.10",
+     "Programming Language :: Python :: 3.11",
+     "Programming Language :: Python :: 3.12",
+     "Programming Language :: Python :: 3.13",
+     "Topic :: Multimedia :: Sound/Audio :: Speech",
+ ]
+ dependencies = [
+     "faster-whisper>=1.0.0,<2.0.0",
+     "sounddevice>=0.4.0,<1.0.0",
+     "soundfile>=0.12.0,<1.0.0",
+     "pynput>=1.7.0,<2.0.0",
+     "pyperclip>=1.8.0,<2.0.0",
+     "numpy>=1.24.0,<3.0.0",
+     "tzdata>=2024.1",
+     "textual>=0.40.0,<1.0.0",
+ ]
+
+ [project.urls]
+ Homepage = "https://github.com/aholten/sttui"
+ Repository = "https://github.com/aholten/sttui"
+ Issues = "https://github.com/aholten/sttui/issues"
+
+ [project.scripts]
+ stttui = "stttui.speech_to_text:main"
+
+ [tool.setuptools.packages.find]
+ where = ["src"]
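The `[project.scripts]` table above is what turns `stttui = "stttui.speech_to_text:main"` into a `stttui` command: at install time the `module:function` string is registered as a console-script entry point, and the generated launcher simply imports the module and calls the function. A minimal sketch of that resolution step using only the standard library (the `json:dumps` target here is a stand-in, not part of the package; once stttui is installed, `"stttui.speech_to_text:main"` resolves the same way):

```python
from importlib.metadata import EntryPoint

# A console-script entry point is just a name plus a "module:function" value.
# We use a stdlib target so the example runs without stttui installed.
ep = EntryPoint(name="demo", value="json:dumps", group="console_scripts")

# .load() imports the module and returns the named attribute,
# exactly what the generated `stttui` launcher script does with main().
func = ep.load()
print(func({"ok": True}))  # serializes the dict with json.dumps
```

Installed-package entry points are normally discovered with `importlib.metadata.entry_points(group="console_scripts")` rather than constructed by hand; the hand-built `EntryPoint` here is only to show the resolution mechanics.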
stttui-0.2.0/setup.cfg ADDED
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
stttui-0.2.0/src/stttui/__init__.py ADDED
@@ -0,0 +1,3 @@
+ """stttui — Speech-to-Text TUI"""
+
+ __version__ = "0.2.0"
stttui-0.2.0/src/stttui/__main__.py ADDED
@@ -0,0 +1,5 @@
+ """Allow running as `python -m stttui`."""
+
+ from stttui.speech_to_text import main
+
+ main()
stttui-0.2.0/src/stttui/speech_to_text.py ADDED
@@ -0,0 +1,538 @@
+ """
+ stttui — Speech-to-Text TUI
+
+ Press Ctrl+Shift+Space to start recording.
+ Press Ctrl+Shift+Space again to stop recording, transcribe, and paste.
+ Press Ctrl+C to quit.
+
+ Run with --cli for CLI-only mode.
+ """
+
+ import argparse
+ import os
+ import signal
+ import sys
+ import tempfile
+ import threading
+ import time
+ from datetime import datetime
+ from pathlib import Path
+ from zoneinfo import ZoneInfo
+
+ if sys.version_info < (3, 10):
+     sys.exit("stttui requires Python 3.10 or later.")
+
+ import numpy as np
+ import pyperclip
+ import sounddevice as sd
+ import soundfile as sf
+ from faster_whisper import WhisperModel
+ from pynput import keyboard as pynput_keyboard
+
+ from stttui import __version__ as VERSION
+
+ HOTKEY_RECORD = "ctrl+shift+space"
+ HOTKEY_QUIT = "ctrl+c"
+ SAMPLE_RATE = 16000
+ CHANNELS = 1
+ LOG_FILE = Path.home() / ".stttui" / "transcription_log.txt"
+ EASTERN = ZoneInfo("America/New_York")
+ MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]
+
+ # RMS level throttle
+ _LEVEL_INTERVAL = 1.0 / 15  # ~15 updates/sec
+
+
+ class SpeechToText:
+     """Core engine with callback hooks for UI integration."""
+
+     def __init__(self, model_name="base"):
+         self.recording = False
+         self.audio_chunks = []
+         self.stream = None
+         self.model = None
+         self.model_name = model_name
+         self.lock = threading.Lock()
+         self._last_level_time = 0.0
+
+         # Callback hooks — default to print for CLI mode
+         self.on_status = lambda msg: print(msg)
+         self.on_transcription = lambda text, ts: print(f"Transcription: {text}")
+         self.on_level = lambda rms: None
+         self.on_state_change = lambda state: None
+
+     def load_model(self):
+         self.on_status(f"Loading faster-whisper model '{self.model_name}' (CPU, int8)...")
+         self.model = WhisperModel(
+             self.model_name, device="cpu", compute_type="int8"
+         )
+         self.on_status("Model loaded. Ready!")
+
+     def audio_callback(self, indata, frames, time_info, status):
+         if status:
+             self.on_status(f"Audio status: {status}")
+         self.audio_chunks.append(indata.copy())
+
+         # Throttled RMS level
+         now = time.monotonic()
+         if now - self._last_level_time >= _LEVEL_INTERVAL:
+             self._last_level_time = now
+             rms = float(np.sqrt(np.mean(indata ** 2)))
+             self.on_level(rms)
+
+     def start_recording(self):
+         self.audio_chunks = []
+         self.stream = sd.InputStream(
+             samplerate=SAMPLE_RATE,
+             channels=CHANNELS,
+             dtype="float32",
+             callback=self.audio_callback,
+         )
+         self.stream.start()
+         self.recording = True
+         self.on_state_change("RECORDING")
+         self.on_status(f"Recording... (press {HOTKEY_RECORD} again to stop)")
+
+     def stop_recording(self):
+         if self.stream:
+             self.stream.stop()
+             self.stream.close()
+             self.stream = None
+         self.recording = False
+         self.on_state_change("TRANSCRIBING")
+         self.on_status("Recording stopped. Transcribing...")
+
+     def transcribe(self):
+         if not self.audio_chunks:
+             self.on_status("No audio recorded.")
+             self.on_state_change("IDLE")
+             return
+
+         audio = np.concatenate(self.audio_chunks, axis=0).flatten()
+         duration = len(audio) / SAMPLE_RATE
+         self.on_status(f"Audio duration: {duration:.1f}s")
+
+         if duration < 0.5:
+             self.on_status("Too short, skipping.")
+             self.on_state_change("IDLE")
+             return
+
+         tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
+         tmp_path = tmp.name
+         tmp.close()
+
+         try:
+             sf.write(tmp_path, audio, SAMPLE_RATE)
+             segments, info = self.model.transcribe(tmp_path)
+             text = " ".join(seg.text.strip() for seg in segments).strip()
+
+             if text:
+                 pyperclip.copy(text)
+                 timestamp = datetime.now(EASTERN).strftime("%Y-%m-%d %I:%M:%S %p %Z")
+                 self.on_transcription(text, timestamp)
+                 self._log_transcription(text, timestamp)
+             else:
+                 self.on_status("No speech detected.")
+         finally:
+             os.unlink(tmp_path)
+             self.on_state_change("IDLE")
+
+     def _log_transcription(self, text, timestamp=None):
+         if timestamp is None:
+             timestamp = datetime.now(EASTERN).strftime("%Y-%m-%d %I:%M:%S %p %Z")
+         LOG_FILE.parent.mkdir(parents=True, exist_ok=True)
+         with open(LOG_FILE, "a", encoding="utf-8") as f:
+             f.write(f"[{timestamp}] {text}\n")
+
+     def toggle_recording(self):
+         with self.lock:
+             if not self.recording:
+                 self.start_recording()
+             else:
+                 self.stop_recording()
+                 self.transcribe()
+
+
+ # ---------------------------------------------------------------------------
+ # Cross-platform hotkey handling (pynput)
+ # ---------------------------------------------------------------------------
+
+ # Canonical key names used in the pressed set
+ _KEY_CTRL = "ctrl"
+ _KEY_SHIFT = "shift"
+ _KEY_SPACE = "space"
+
+ _RECORD_COMBO = frozenset({_KEY_CTRL, _KEY_SHIFT, _KEY_SPACE})
+
+
+ def _normalise_key(key):
+     """Map any pynput key to a canonical string for reliable matching."""
+     # Modifier keys
+     if key in (pynput_keyboard.Key.shift, pynput_keyboard.Key.shift_l, pynput_keyboard.Key.shift_r):
+         return _KEY_SHIFT
+     if key in (pynput_keyboard.Key.ctrl, pynput_keyboard.Key.ctrl_l, pynput_keyboard.Key.ctrl_r):
+         return _KEY_CTRL
+     # Space — can arrive as Key.space, KeyCode(char=' '), or KeyCode(vk=32, char=None)
+     if key == pynput_keyboard.Key.space:
+         return _KEY_SPACE
+     if isinstance(key, pynput_keyboard.KeyCode):
+         if key.char == ' ':
+             return _KEY_SPACE
+         if getattr(key, 'vk', None) == 32:
+             return _KEY_SPACE
+         if key.char:
+             return key.char.lower()
+     return key
+
+
+ def _create_hotkey_listener(on_record):
+     """Create a pynput keyboard listener for the record hotkey.
+
+     The listener never suppresses keys — all other shortcuts continue to work
+     normally while recording. Includes a 300ms debounce to prevent double-fires.
+     """
+     pressed = set()
+     last_fire = [0.0]
+
+     def on_press(key):
+         pressed.add(_normalise_key(key))
+         if _RECORD_COMBO.issubset(pressed):
+             now = time.monotonic()
+             if now - last_fire[0] < 0.3:
+                 return
+             last_fire[0] = now
+             threading.Thread(target=on_record, daemon=True).start()
+
+     def on_release(key):
+         pressed.discard(_normalise_key(key))
+
+     return pynput_keyboard.Listener(on_press=on_press, on_release=on_release, suppress=False)
+
+
+ # ---------------------------------------------------------------------------
+ # CLI mode (original behavior)
+ # ---------------------------------------------------------------------------
+
+ def run_cli(model_name="base"):
+     stt = SpeechToText(model_name=model_name)
+
+     def cli_transcription(text, ts):
+         print(f"Transcription: {text}")
+         print("Copied to clipboard! Use Ctrl+V to paste.")
+
+     stt.on_transcription = cli_transcription
+
+     stt.load_model()
+     print(f" Press {HOTKEY_RECORD} to start/stop recording")
+     print(f" Press {HOTKEY_QUIT} to quit")
+     print("\nListening for hotkey...")
+
+     signal.signal(signal.SIGINT, lambda *_: os._exit(0))
+
+     listener = _create_hotkey_listener(on_record=stt.toggle_recording)
+     listener.start()
+     listener.join()
+
+
+ # ---------------------------------------------------------------------------
+ # TUI mode (Textual)
+ # ---------------------------------------------------------------------------
+
+ def run_tui(model_name="base"):
+     from textual.app import App, ComposeResult
+     from textual.binding import Binding
+     from textual.containers import Horizontal, Vertical
+     from textual.reactive import reactive
+     from textual.widgets import Footer, Header, ProgressBar, RichLog, Select, Static
+
+     APP_CSS = """
+     #top-bar {
+         height: 3;
+         padding: 0 1;
+     }
+
+     #model-select {
+         width: 24;
+     }
+
+     #status-badge {
+         width: 20;
+         content-align: center middle;
+         text-style: bold;
+         margin-left: 2;
+         padding: 0 1;
+     }
+
+     #status-badge.idle {
+         background: $primary-darken-2;
+         color: $text;
+     }
+
+     #status-badge.recording {
+         background: $error;
+         color: $text;
+     }
+
+     #status-badge.transcribing {
+         background: $warning;
+         color: $text;
+     }
+
+     #main-content {
+         height: 1fr;
+     }
+
+     #level-meter {
+         height: 1;
+         margin: 0 1;
+     }
+
+     #transcription-history {
+         height: 1fr;
+         margin: 0 1;
+         border: solid $primary;
+     }
+
+     #status-log {
+         height: 7;
+         margin: 0 1;
+         border: solid $accent;
+     }
+     """
+
+     class SpeechToTextApp(App):
+         TITLE = "stttui"
+         CSS = APP_CSS
+
+         BINDINGS = [
+             Binding("ctrl+shift+space", "toggle_record_binding", "Record", show=True),
+             Binding("ctrl+c", "quit_app", "Quit", show=True, priority=True),
+         ]
+
+         state = reactive("IDLE")
+
+         def __init__(self, initial_model: str = "base"):
+             super().__init__()
+             self.engine = SpeechToText(model_name=initial_model)
+             self._initial_model = initial_model
+             self._listener = None
+             # Threading event for cross-thread signalling
+             self._toggle_event = threading.Event()
+             self._last_toggle_time = 0.0
+             # Queues for engine callbacks → UI
+             self._status_queue = []
+             self._transcription_queue = []
+             self._level_value = 0.0
+             self._pending_state = None
+             self._queue_lock = threading.Lock()
+
+         def compose(self) -> ComposeResult:
+             yield Header()
+             with Horizontal(id="top-bar"):
+                 yield Select(
+                     [(s, s) for s in MODEL_SIZES],
+                     value=self._initial_model,
+                     id="model-select",
+                     allow_blank=False,
+                 )
+                 yield Static("IDLE", id="status-badge", classes="idle")
+             with Vertical(id="main-content"):
+                 yield ProgressBar(id="level-meter", total=100, show_eta=False, show_percentage=False)
+                 yield RichLog(id="transcription-history", highlight=True, markup=True)
+                 yield RichLog(id="status-log", highlight=True, markup=True, max_lines=50)
+             yield Footer()
+
+         def on_mount(self) -> None:
+             # Wire engine callbacks to queue updates (thread-safe)
+             self.engine.on_status = self._enqueue_status
+             self.engine.on_transcription = self._enqueue_transcription
+             self.engine.on_level = self._enqueue_level
+             self.engine.on_state_change = self._enqueue_state
+
+             # Register global record hotkey via pynput
+             self._listener = _create_hotkey_listener(
+                 on_record=self._toggle_event.set,
+             )
+             self._listener.start()
+
+             # Poll events and queues from Textual's own timer (~20 Hz)
+             self.set_interval(0.05, self._poll)
+
+             # Load model in background
+             self.run_worker(self._load_model_worker, thread=True)
+
+         def _load_model_worker(self) -> None:
+             self.engine.load_model()
+
+         # -- Thread-safe enqueue helpers (called from engine/audio threads) --
+
+         def _enqueue_status(self, msg: str) -> None:
+             with self._queue_lock:
+                 self._status_queue.append(msg)
+
+         def _enqueue_transcription(self, text: str, ts: str) -> None:
+             with self._queue_lock:
+                 self._transcription_queue.append((text, ts))
+
+         def _enqueue_level(self, rms: float) -> None:
+             self._level_value = rms
+
+         def _enqueue_state(self, s: str) -> None:
+             self._pending_state = s
+
+         # -- Polling (runs on Textual main thread via set_interval) --
+
+         def _poll(self) -> None:
+             # Check hotkey events
+             if self._toggle_event.is_set():
+                 self._toggle_event.clear()
+                 self._toggle_record()
+
+             # Drain queued UI updates
+             with self._queue_lock:
+                 statuses = self._status_queue[:]
+                 self._status_queue.clear()
+                 transcriptions = self._transcription_queue[:]
+                 self._transcription_queue.clear()
+
+             for msg in statuses:
+                 self._log_status(msg)
+             for text, ts in transcriptions:
+                 self._add_transcription(text, ts)
+
+             # Update level meter
+             rms = self._level_value
+             level = min(100, int(rms * 700))
+             bar = self.query_one("#level-meter", ProgressBar)
+             bar.update(progress=level)
+
+             # Update state
+             pending = self._pending_state
+             if pending is not None:
+                 self._pending_state = None
+                 self.state = pending
+
+         # -- State management --
+
+         def watch_state(self, new_state: str) -> None:
+             badge = self.query_one("#status-badge", Static)
+             badge.update(new_state)
+             badge.remove_class("idle", "recording", "transcribing")
+             badge.add_class(new_state.lower())
+
+             if new_state != "RECORDING":
+                 level = self.query_one("#level-meter", ProgressBar)
+                 level.update(progress=0)
+
+         # -- UI helpers --
+
+         def _log_status(self, msg: str) -> None:
+             log = self.query_one("#status-log", RichLog)
+             log.write(msg)
+
+         def _add_transcription(self, text: str, timestamp: str) -> None:
+             history = self.query_one("#transcription-history", RichLog)
+             history.write(f"[dim]{timestamp}[/dim] {text}")
+             self._log_status("Copied to clipboard!")
+
+         # -- Actions --
+
+         def _toggle_record(self) -> None:
+             # Debounce: both pynput and Textual may fire for the same keypress
+             now = time.monotonic()
+             if now - self._last_toggle_time < 0.3:
+                 return
+             self._last_toggle_time = now
+
+             if self.engine.model is None:
+                 self._log_status("Model still loading, please wait...")
+                 return
+             if self.state == "TRANSCRIBING":
+                 self._log_status("Still transcribing, please wait...")
+                 return
+
+             if self.state == "IDLE":
+                 self.state = "RECORDING"
+                 self.engine.start_recording()
+             else:
+                 self.state = "TRANSCRIBING"
+                 self.engine.stop_recording()
+                 self.run_worker(self._transcribe_worker, thread=True)
+
+         def _transcribe_worker(self) -> None:
+             self.engine.transcribe()
+
+         def action_toggle_record_binding(self) -> None:
+             self._toggle_record()
+
+         def action_quit_app(self) -> None:
+             if self._listener:
+                 self._listener.stop()
+             if self.engine.stream:
+                 self.engine.stream.stop()
+                 self.engine.stream.close()
+             self.exit()
+
+         def on_select_changed(self, event: Select.Changed) -> None:
+             new_model = str(event.value)
+             if new_model == self.engine.model_name:
+                 return
+             if self.state != "IDLE":
+                 self._log_status("Cannot change model while recording/transcribing.")
+                 event.select.value = self.engine.model_name
+                 return
+             self.engine.model_name = new_model
+             self.engine.model = None
+             self._log_status(f"Switching to model '{new_model}'...")
+             self.run_worker(self._load_model_worker, thread=True)
+
+     app = SpeechToTextApp(initial_model=model_name)
+     app.run()
+
+
+ # ---------------------------------------------------------------------------
+ # Entry point
+ # ---------------------------------------------------------------------------
+
+ def run_headless(model_name="base", duration=None):
+     """Record for a fixed duration (or until Enter), transcribe, print to stdout, and exit."""
+     stt = SpeechToText(model_name=model_name)
+
+     # Route status messages to stderr; only the transcription goes to stdout
+     stt.on_status = lambda msg: sys.stderr.write(f"{msg}\n")
+     stt.on_transcription = lambda text, ts: print(text)
+
+     stt.load_model()
+
+     stt.start_recording()
+
+     if duration:
+         sys.stderr.write(f"Recording for {duration}s...\n")
+         time.sleep(duration)
+     else:
+         sys.stderr.write("Recording... press Enter to stop.\n")
+         input()
+
+     stt.stop_recording()
+     stt.transcribe()
+
+
+ def main():
+     parser = argparse.ArgumentParser(description="stttui — Speech-to-Text TUI")
+     parser.add_argument("--version", action="version", version=f"stttui {VERSION}")
+     parser.add_argument("--cli", action="store_true", help="Run in CLI-only mode (no TUI)")
+     parser.add_argument("--headless", action="store_true", help="Record, transcribe, print to stdout, and exit (no hotkeys/UI)")
+     parser.add_argument("--duration", type=float, default=None, help="Recording duration in seconds (headless mode; omit to wait for Enter)")
+     parser.add_argument("--model", default="base", choices=MODEL_SIZES, help="Whisper model size")
+     args = parser.parse_args()
+
+     if args.headless:
+         run_headless(model_name=args.model, duration=args.duration)
+     elif args.cli:
+         run_cli(model_name=args.model)
+     else:
+         run_tui(model_name=args.model)
+
+
+ if __name__ == "__main__":
+     main()
@@ -0,0 +1,195 @@
+ Metadata-Version: 2.4
+ Name: stttui
+ Version: 0.2.0
+ Summary: Speech-to-Text TUI — offline transcription powered by faster-whisper
+ Author: Anthony Holten
+ License-Expression: MIT
+ Project-URL: Homepage, https://github.com/aholten/sttui
+ Project-URL: Repository, https://github.com/aholten/sttui
+ Project-URL: Issues, https://github.com/aholten/sttui/issues
+ Keywords: speech-to-text,whisper,tui,transcription,offline
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Environment :: Console
+ Classifier: Intended Audience :: Developers
+ Classifier: Intended Audience :: End Users/Desktop
+ Classifier: Operating System :: OS Independent
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: faster-whisper<2.0.0,>=1.0.0
+ Requires-Dist: sounddevice<1.0.0,>=0.4.0
+ Requires-Dist: soundfile<1.0.0,>=0.12.0
+ Requires-Dist: pynput<2.0.0,>=1.7.0
+ Requires-Dist: pyperclip<2.0.0,>=1.8.0
+ Requires-Dist: numpy<3.0.0,>=1.24.0
+ Requires-Dist: tzdata>=2024.1
+ Requires-Dist: textual<1.0.0,>=0.40.0
+ Dynamic: license-file
+
+ # stttui: Speech To Text Terminal User Interface
+
+ > **v0.2.0** | Python 3.10+ | MIT License
+
+ A local, fully offline speech-to-text tool powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper). Record audio with a global hotkey from any window, transcribe it on-device, and get the result copied straight to your clipboard — no cloud, no API keys, no latency.
+
+ Comes with a polished terminal UI built on [Textual](https://github.com/Textualize/textual), complete with a live audio level meter, transcription history, and on-the-fly model switching.
+
+ <p align="center">
+   <img src="demo.gif" alt="stttui in action" width="700">
+ </p>
+
+ ---
+
+ ## Features
+
+ - **Fully offline** — runs entirely on your machine using faster-whisper with int8 quantization. No data ever leaves your device.
+ - **Global hotkeys** — record from anywhere without focusing the terminal
+ - **Live audio meter** — real-time visual feedback while recording
+ - **Transcription history** — timestamped, scrollable log of every dictation
+ - **Model switching** — swap between `tiny`, `base`, `small`, `medium`, and `large` Whisper models on the fly
+ - **Clipboard integration** — transcriptions are automatically copied and ready to paste
+ - **Cross-platform** — works on Windows, macOS, and Linux
+ - **Three modes** — rich TUI (default), lightweight CLI (`--cli`), or headless (`--headless`) for scripting/piping
+
+ ## Installation
+
+ ### Homebrew (macOS / Linux)
+
+ ```bash
+ brew tap aholten/tap
+ brew install stttui
+ ```
+
+ ### pip / pipx
+
+ ```bash
+ # With pipx (recommended — isolated environment)
+ pipx install stttui
+
+ # Or with pip
+ pip install stttui
+ ```
+
+ ### From source
+
+ ```bash
+ git clone https://github.com/aholten/sttui.git
+ cd sttui
+ pip install .
+ ```
+
+ > On macOS you may need to grant terminal accessibility permissions for global hotkeys.
+ > On Linux, install `portaudio` (`sudo apt install portaudio19-dev`) and `xclip` for clipboard support.
+
+ ## Usage
+
+ ### Hotkeys
+
+ These work **globally** — even when the terminal is not focused:
+
+ | Hotkey | Action |
+ |---|---|
+ | `Ctrl+Shift+Space` | Start / stop recording |
+ | `Ctrl+C` | Quit |
+
+ ### Workflow
+
+ 1. Launch the tool — the Whisper model loads in the background
+ 2. Switch to any application (browser, editor, chat, etc.)
+ 3. Press **Ctrl+Shift+Space** to start recording
+ 4. Speak
+ 5. Press **Ctrl+Shift+Space** again to stop
+ 6. The transcription is copied to your clipboard — paste with **Ctrl+V**
+
+ ### TUI Mode (default)
+
+ ```bash
+ stttui
+ ```
+
+ The terminal interface includes:
+ - **Model selector** — dropdown to switch Whisper models without restarting
+ - **Status badge** — shows IDLE / RECORDING / TRANSCRIBING state
+ - **Level meter** — live audio input visualization
+ - **Transcription history** — scrollable log with timestamps
+ - **Status log** — model loading progress and system messages
+
+ ### CLI Mode
+
+ ```bash
+ stttui --cli
+ ```
+
+ Minimal output, no UI — just hotkeys and clipboard.
+
+ ### Headless Mode
+
+ Record, transcribe, print to stdout, and exit. No hotkeys or UI — ideal for scripts and piping.
+
+ ```bash
+ # Record until Enter is pressed
+ stttui --headless
+
+ # Record for exactly 10 seconds
+ stttui --headless --duration 10
+
+ # Pipe transcription to another command
+ stttui --headless --duration 5 | xargs echo "You said:"
+ ```
+
+ Status messages go to stderr, transcription goes to stdout.
+
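The stdout/stderr split makes headless mode easy to drive from another program. A minimal sketch of a Python wrapper (the helper names `headless_argv` and `capture_dictation` are illustrative, not part of stttui; it assumes `stttui` is on your `PATH`):

```python
import subprocess


def headless_argv(duration=None, model="base"):
    """Build a stttui headless command line as an argv list."""
    argv = ["stttui", "--headless", "--model", model]
    if duration is not None:
        argv += ["--duration", str(duration)]
    return argv


def capture_dictation(duration=5.0, model="base"):
    """Run stttui headless and return the transcription.

    Status chatter stays on stderr, so stdout holds only the transcription.
    """
    result = subprocess.run(
        headless_argv(duration, model),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Example (requires a microphone and stttui on PATH):
#   print("You said:", capture_dictation(3))
```

Keeping the command-line construction in its own small function makes it easy to test without actually recording.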
+ ### Options
+
+ | Flag | Description | Default |
+ |---|---|---|
+ | `--cli` | Run in CLI-only mode (no TUI) | off |
+ | `--headless` | Record, transcribe, print to stdout, and exit | off |
+ | `--duration` | Recording duration in seconds (headless mode; omit to wait for Enter) | — |
+ | `--model` | Whisper model size | `base` |
+ | `--version` | Print version and exit | — |
+
+ ```bash
+ # Example: launch with the small model
+ stttui --model small
+ ```
+
+ ## Model Sizes
+
+ Larger models are more accurate but slower to load and transcribe. All models run on CPU with int8 quantization.
+
+ | Model | Parameters | Speed |
+ |---|---|---|
+ | `tiny` | 39M | Fastest |
+ | `base` | 74M | Fast |
+ | `small` | 244M | Moderate |
+ | `medium` | 769M | Slow |
+ | `large` | 1550M | Slowest |
+
+ In TUI mode you can switch models from the dropdown at any time.
+
+ ## Platform Notes
+
+ | Platform | Notes |
+ |---|---|
+ | **Windows** | Works out of the box via Git Bash or MSYS2. |
+ | **macOS** | Clipboard works via `pbcopy`. You may need to grant terminal accessibility permissions for global hotkeys. |
+ | **Linux** | Install `xclip` or `xsel` for clipboard support (`sudo apt install xclip`). |
+
+ ## Transcription Log
+
+ All transcriptions are automatically saved to `transcription_log.txt` with timestamps for reference.
+
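The log is plain text, so it is straightforward to post-process. A sketch, assuming each entry is a single line of the form `[timestamp] text` — the exact layout is an assumption here, so check it against your own `transcription_log.txt` before relying on this:

```python
import re

# One entry per line: "[<timestamp>] <text>" (assumed format)
ENTRY = re.compile(r"^\[(?P<ts>[^\]]+)\]\s*(?P<text>.*)$")


def parse_log(raw: str):
    """Return (timestamp, text) pairs, skipping lines that don't match."""
    pairs = []
    for line in raw.splitlines():
        m = ENTRY.match(line)
        if m:
            pairs.append((m.group("ts"), m.group("text")))
    return pairs


sample = "[2026-01-02 10:15:03] hello world\n[2026-01-02 10:16:40] second note\n"
print(parse_log(sample))
```

Skipping non-matching lines keeps the parser robust if the log ever contains blank lines or status messages.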
+ ## License
+
+ MIT
+
+ ## Author
+
+ Anthony Holten ([@aholten](https://github.com/aholten) on GitHub)
@@ -0,0 +1,12 @@
+ LICENSE
+ README.md
+ pyproject.toml
+ src/stttui/__init__.py
+ src/stttui/__main__.py
+ src/stttui/speech_to_text.py
+ src/stttui.egg-info/PKG-INFO
+ src/stttui.egg-info/SOURCES.txt
+ src/stttui.egg-info/dependency_links.txt
+ src/stttui.egg-info/entry_points.txt
+ src/stttui.egg-info/requires.txt
+ src/stttui.egg-info/top_level.txt
@@ -0,0 +1,2 @@
+ [console_scripts]
+ stttui = stttui.speech_to_text:main
@@ -0,0 +1,8 @@
+ faster-whisper<2.0.0,>=1.0.0
+ sounddevice<1.0.0,>=0.4.0
+ soundfile<1.0.0,>=0.12.0
+ pynput<2.0.0,>=1.7.0
+ pyperclip<2.0.0,>=1.8.0
+ numpy<3.0.0,>=1.24.0
+ tzdata>=2024.1
+ textual<1.0.0,>=0.40.0
@@ -0,0 +1 @@
+ stttui