stttui-0.2.0.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- stttui-0.2.0/LICENSE +21 -0
- stttui-0.2.0/PKG-INFO +195 -0
- stttui-0.2.0/README.md +161 -0
- stttui-0.2.0/pyproject.toml +49 -0
- stttui-0.2.0/setup.cfg +4 -0
- stttui-0.2.0/src/stttui/__init__.py +3 -0
- stttui-0.2.0/src/stttui/__main__.py +5 -0
- stttui-0.2.0/src/stttui/speech_to_text.py +538 -0
- stttui-0.2.0/src/stttui.egg-info/PKG-INFO +195 -0
- stttui-0.2.0/src/stttui.egg-info/SOURCES.txt +12 -0
- stttui-0.2.0/src/stttui.egg-info/dependency_links.txt +1 -0
- stttui-0.2.0/src/stttui.egg-info/entry_points.txt +2 -0
- stttui-0.2.0/src/stttui.egg-info/requires.txt +8 -0
- stttui-0.2.0/src/stttui.egg-info/top_level.txt +1 -0
stttui-0.2.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 stttui contributors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
stttui-0.2.0/PKG-INFO
ADDED
@@ -0,0 +1,195 @@
+Metadata-Version: 2.4
+Name: stttui
+Version: 0.2.0
+Summary: Speech-to-Text TUI — offline transcription powered by faster-whisper
+Author: Anthony Holten
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/aholten/sttui
+Project-URL: Repository, https://github.com/aholten/sttui
+Project-URL: Issues, https://github.com/aholten/sttui/issues
+Keywords: speech-to-text,whisper,tui,transcription,offline
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: End Users/Desktop
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: faster-whisper<2.0.0,>=1.0.0
+Requires-Dist: sounddevice<1.0.0,>=0.4.0
+Requires-Dist: soundfile<1.0.0,>=0.12.0
+Requires-Dist: pynput<2.0.0,>=1.7.0
+Requires-Dist: pyperclip<2.0.0,>=1.8.0
+Requires-Dist: numpy<3.0.0,>=1.24.0
+Requires-Dist: tzdata>=2024.1
+Requires-Dist: textual<1.0.0,>=0.40.0
+Dynamic: license-file
+
+# stttui: Speech To Text Terminal User Interface
+
+> **v0.2.0** | Python 3.10+ | MIT License
+
+A local, fully offline speech-to-text tool powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper). Record audio with a global hotkey from any window, transcribe it on-device, and get the result copied straight to your clipboard — no cloud, no API keys, no latency.
+
+Comes with a polished terminal UI built on [Textual](https://github.com/Textualize/textual), complete with a live audio level meter, transcription history, and on-the-fly model switching.
+
+<p align="center">
+  <img src="demo.gif" alt="stttui in action" width="700">
+</p>
+
+---
+
+## Features
+
+- **Fully offline** — runs entirely on your machine using faster-whisper with int8 quantization. No data ever leaves your device.
+- **Global hotkeys** — record from anywhere without focusing the terminal
+- **Live audio meter** — real-time visual feedback while recording
+- **Transcription history** — timestamped, scrollable log of every dictation
+- **Model switching** — swap between `tiny`, `base`, `small`, `medium`, and `large` Whisper models on the fly
+- **Clipboard integration** — transcriptions are automatically copied and ready to paste
+- **Cross-platform** — works on Windows, macOS, and Linux
+- **Three modes** — rich TUI (default), lightweight CLI (`--cli`), or headless (`--headless`) for scripting/piping
+
+## Installation
+
+### Homebrew (macOS / Linux)
+
+```bash
+brew tap aholten/tap
+brew install stttui
+```
+
+### pip / pipx
+
+```bash
+# With pipx (recommended — isolated environment)
+pipx install stttui
+
+# Or with pip
+pip install stttui
+```
+
+### From source
+
+```bash
+git clone https://github.com/aholten/sttui.git
+cd stttui
+pip install .
+```
+
+> On macOS you may need to grant terminal accessibility permissions for global hotkeys.
+> On Linux, install `portaudio` (`sudo apt install portaudio19-dev`) and `xclip` for clipboard support.
+
+## Usage
+
+### Hotkeys
+
+These work **globally** — even when the terminal is not focused:
+
+| Hotkey | Action |
+|---|---|
+| `Ctrl+Shift+Space` | Start / stop recording |
+| `Ctrl+C` | Quit |
+
+### Workflow
+
+1. Launch the tool — the Whisper model loads in the background
+2. Switch to any application (browser, editor, chat, etc.)
+3. Press **Ctrl+Shift+Space** to start recording
+4. Speak
+5. Press **Ctrl+Shift+Space** again to stop
+6. The transcription is copied to your clipboard — paste with **Ctrl+V**
+
+### TUI Mode (default)
+
+```bash
+stttui
+```
+
+The terminal interface includes:
+- **Model selector** — dropdown to switch Whisper models without restarting
+- **Status badge** — shows IDLE / RECORDING / TRANSCRIBING state
+- **Level meter** — live audio input visualization
+- **Transcription history** — scrollable log with timestamps
+- **Status log** — model loading progress and system messages
+
+### CLI Mode
+
+```bash
+stttui --cli
+```
+
+Minimal output, no UI — just hotkeys and clipboard.
+
+### Headless Mode
+
+Record, transcribe, print to stdout, and exit. No hotkeys or UI — ideal for scripts and piping.
+
+```bash
+# Record until Enter is pressed
+stttui --headless
+
+# Record for exactly 10 seconds
+stttui --headless --duration 10
+
+# Pipe transcription to another command
+stttui --headless --duration 5 | xargs echo "You said:"
+```
+
+Status messages go to stderr; the transcription goes to stdout.
+
+### Options
+
+| Flag | Description | Default |
+|---|---|---|
+| `--cli` | Run in CLI-only mode (no TUI) | off |
+| `--headless` | Record, transcribe, print to stdout, and exit | off |
+| `--duration` | Recording duration in seconds (headless mode; omit to wait for Enter) | — |
+| `--model` | Whisper model size | `base` |
+| `--version` | Print version and exit | — |
+
+```bash
+# Example: launch with the small model
+stttui --model small
+```
+
+## Model Sizes
+
+Larger models are more accurate but slower to load and transcribe. All models run on CPU with int8 quantization.
+
+| Model | Parameters | Speed |
+|---|---|---|
+| `tiny` | 39M | Fastest |
+| `base` | 74M | Fast |
+| `small` | 244M | Moderate |
+| `medium` | 769M | Slow |
+| `large` | 1550M | Slowest |
+
+In TUI mode you can switch models from the dropdown at any time.
+
+## Platform Notes
+
+| Platform | Notes |
+|---|---|
+| **Windows** | Works out of the box via Git Bash or MSYS2. |
+| **macOS** | Clipboard works via `pbcopy`. You may need to grant terminal accessibility permissions for global hotkeys. |
+| **Linux** | Install `xclip` or `xsel` for clipboard support (`sudo apt install xclip`). |
+
+## Transcription Log
+
+All transcriptions are automatically saved to `~/.stttui/transcription_log.txt` with timestamps for reference.
+
+## License
+
+MIT
+
+## Author
+
+Anthony Holten (@aholten on GitHub)
stttui-0.2.0/README.md
ADDED
@@ -0,0 +1,161 @@
+# stttui: Speech To Text Terminal User Interface
+
+> **v0.2.0** | Python 3.10+ | MIT License
+
+A local, fully offline speech-to-text tool powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper). Record audio with a global hotkey from any window, transcribe it on-device, and get the result copied straight to your clipboard — no cloud, no API keys, no latency.
+
+Comes with a polished terminal UI built on [Textual](https://github.com/Textualize/textual), complete with a live audio level meter, transcription history, and on-the-fly model switching.
+
+<p align="center">
+  <img src="demo.gif" alt="stttui in action" width="700">
+</p>
+
+---
+
+## Features
+
+- **Fully offline** — runs entirely on your machine using faster-whisper with int8 quantization. No data ever leaves your device.
+- **Global hotkeys** — record from anywhere without focusing the terminal
+- **Live audio meter** — real-time visual feedback while recording
+- **Transcription history** — timestamped, scrollable log of every dictation
+- **Model switching** — swap between `tiny`, `base`, `small`, `medium`, and `large` Whisper models on the fly
+- **Clipboard integration** — transcriptions are automatically copied and ready to paste
+- **Cross-platform** — works on Windows, macOS, and Linux
+- **Three modes** — rich TUI (default), lightweight CLI (`--cli`), or headless (`--headless`) for scripting/piping
+
+## Installation
+
+### Homebrew (macOS / Linux)
+
+```bash
+brew tap aholten/tap
+brew install stttui
+```
+
+### pip / pipx
+
+```bash
+# With pipx (recommended — isolated environment)
+pipx install stttui
+
+# Or with pip
+pip install stttui
+```
+
+### From source
+
+```bash
+git clone https://github.com/aholten/sttui.git
+cd stttui
+pip install .
+```
+
+> On macOS you may need to grant terminal accessibility permissions for global hotkeys.
+> On Linux, install `portaudio` (`sudo apt install portaudio19-dev`) and `xclip` for clipboard support.
+
+## Usage
+
+### Hotkeys
+
+These work **globally** — even when the terminal is not focused:
+
+| Hotkey | Action |
+|---|---|
+| `Ctrl+Shift+Space` | Start / stop recording |
+| `Ctrl+C` | Quit |
+
+### Workflow
+
+1. Launch the tool — the Whisper model loads in the background
+2. Switch to any application (browser, editor, chat, etc.)
+3. Press **Ctrl+Shift+Space** to start recording
+4. Speak
+5. Press **Ctrl+Shift+Space** again to stop
+6. The transcription is copied to your clipboard — paste with **Ctrl+V**
+
+### TUI Mode (default)
+
+```bash
+stttui
+```
+
+The terminal interface includes:
+- **Model selector** — dropdown to switch Whisper models without restarting
+- **Status badge** — shows IDLE / RECORDING / TRANSCRIBING state
+- **Level meter** — live audio input visualization
+- **Transcription history** — scrollable log with timestamps
+- **Status log** — model loading progress and system messages
+
+### CLI Mode
+
+```bash
+stttui --cli
+```
+
+Minimal output, no UI — just hotkeys and clipboard.
+
+### Headless Mode
+
+Record, transcribe, print to stdout, and exit. No hotkeys or UI — ideal for scripts and piping.
+
+```bash
+# Record until Enter is pressed
+stttui --headless
+
+# Record for exactly 10 seconds
+stttui --headless --duration 10
+
+# Pipe transcription to another command
+stttui --headless --duration 5 | xargs echo "You said:"
+```
+
+Status messages go to stderr; the transcription goes to stdout.
+
+### Options
+
+| Flag | Description | Default |
+|---|---|---|
+| `--cli` | Run in CLI-only mode (no TUI) | off |
+| `--headless` | Record, transcribe, print to stdout, and exit | off |
+| `--duration` | Recording duration in seconds (headless mode; omit to wait for Enter) | — |
+| `--model` | Whisper model size | `base` |
+| `--version` | Print version and exit | — |
+
+```bash
+# Example: launch with the small model
+stttui --model small
+```
+
+## Model Sizes
+
+Larger models are more accurate but slower to load and transcribe. All models run on CPU with int8 quantization.
+
+| Model | Parameters | Speed |
+|---|---|---|
+| `tiny` | 39M | Fastest |
+| `base` | 74M | Fast |
+| `small` | 244M | Moderate |
+| `medium` | 769M | Slow |
+| `large` | 1550M | Slowest |
+
+In TUI mode you can switch models from the dropdown at any time.
+
+## Platform Notes
+
+| Platform | Notes |
+|---|---|
+| **Windows** | Works out of the box via Git Bash or MSYS2. |
+| **macOS** | Clipboard works via `pbcopy`. You may need to grant terminal accessibility permissions for global hotkeys. |
+| **Linux** | Install `xclip` or `xsel` for clipboard support (`sudo apt install xclip`). |
+
+## Transcription Log
+
+All transcriptions are automatically saved to `~/.stttui/transcription_log.txt` with timestamps for reference.
+
+## License
+
+MIT
+
+## Author
+
+Anthony Holten (@aholten on GitHub)
stttui-0.2.0/pyproject.toml
ADDED
@@ -0,0 +1,49 @@
+[build-system]
+requires = ["setuptools>=68.0"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "stttui"
+version = "0.2.0"
+description = "Speech-to-Text TUI — offline transcription powered by faster-whisper"
+readme = "README.md"
+license = "MIT"
+requires-python = ">=3.10"
+authors = [
+    { name = "Anthony Holten" },
+]
+keywords = ["speech-to-text", "whisper", "tui", "transcription", "offline"]
+classifiers = [
+    "Development Status :: 4 - Beta",
+    "Environment :: Console",
+    "Intended Audience :: Developers",
+    "Intended Audience :: End Users/Desktop",
+    "Operating System :: OS Independent",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+    "Topic :: Multimedia :: Sound/Audio :: Speech",
+]
+dependencies = [
+    "faster-whisper>=1.0.0,<2.0.0",
+    "sounddevice>=0.4.0,<1.0.0",
+    "soundfile>=0.12.0,<1.0.0",
+    "pynput>=1.7.0,<2.0.0",
+    "pyperclip>=1.8.0,<2.0.0",
+    "numpy>=1.24.0,<3.0.0",
+    "tzdata>=2024.1",
+    "textual>=0.40.0,<1.0.0",
+]
+
+[project.urls]
+Homepage = "https://github.com/aholten/sttui"
+Repository = "https://github.com/aholten/sttui"
+Issues = "https://github.com/aholten/sttui/issues"
+
+[project.scripts]
+stttui = "stttui.speech_to_text:main"
+
+[tool.setuptools.packages.find]
+where = ["src"]
stttui-0.2.0/src/stttui/speech_to_text.py
ADDED
@@ -0,0 +1,538 @@
+"""
+stttui — Speech-to-Text TUI
+
+Press Ctrl+Shift+Space to start recording.
+Press Ctrl+Shift+Space again to stop recording, transcribe, and paste.
+Press Ctrl+C to quit.
+
+Run with --cli for CLI-only mode.
+"""
+
+import argparse
+import os
+import signal
+import sys
+import tempfile
+import threading
+import time
+from datetime import datetime
+from pathlib import Path
+from zoneinfo import ZoneInfo
+
+if sys.version_info < (3, 10):
+    sys.exit("stttui requires Python 3.10 or later.")
+
+import numpy as np
+import pyperclip
+import sounddevice as sd
+import soundfile as sf
+from faster_whisper import WhisperModel
+from pynput import keyboard as pynput_keyboard
+
+from stttui import __version__ as VERSION
+
+HOTKEY_RECORD = "ctrl+shift+space"
+HOTKEY_QUIT = "ctrl+c"
+SAMPLE_RATE = 16000
+CHANNELS = 1
+LOG_FILE = Path.home() / ".stttui" / "transcription_log.txt"
+EASTERN = ZoneInfo("America/New_York")
+MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]
+
+# RMS level throttle
+_LEVEL_INTERVAL = 1.0 / 15  # ~15 updates/sec
+
+
+class SpeechToText:
+    """Core engine with callback hooks for UI integration."""
+
+    def __init__(self, model_name="base"):
+        self.recording = False
+        self.audio_chunks = []
+        self.stream = None
+        self.model = None
+        self.model_name = model_name
+        self.lock = threading.Lock()
+        self._last_level_time = 0.0
+
+        # Callback hooks — default to print for CLI mode
+        self.on_status = lambda msg: print(msg)
+        self.on_transcription = lambda text, ts: print(f"Transcription: {text}")
+        self.on_level = lambda rms: None
+        self.on_state_change = lambda state: None
+
+    def load_model(self):
+        self.on_status(f"Loading faster-whisper model '{self.model_name}' (CPU, int8)...")
+        self.model = WhisperModel(
+            self.model_name, device="cpu", compute_type="int8"
+        )
+        self.on_status("Model loaded. Ready!")
+
+    def audio_callback(self, indata, frames, time_info, status):
+        if status:
+            self.on_status(f"Audio status: {status}")
+        self.audio_chunks.append(indata.copy())
+
+        # Throttled RMS level
+        now = time.monotonic()
+        if now - self._last_level_time >= _LEVEL_INTERVAL:
+            self._last_level_time = now
+            rms = float(np.sqrt(np.mean(indata ** 2)))
+            self.on_level(rms)
+
+    def start_recording(self):
+        self.audio_chunks = []
+        self.stream = sd.InputStream(
+            samplerate=SAMPLE_RATE,
+            channels=CHANNELS,
+            dtype="float32",
+            callback=self.audio_callback,
+        )
+        self.stream.start()
+        self.recording = True
+        self.on_state_change("RECORDING")
+        self.on_status(f"Recording... (press {HOTKEY_RECORD} again to stop)")
+
+    def stop_recording(self):
+        if self.stream:
+            self.stream.stop()
+            self.stream.close()
+            self.stream = None
+        self.recording = False
+        self.on_state_change("TRANSCRIBING")
+        self.on_status("Recording stopped. Transcribing...")
+
+    def transcribe(self):
+        if not self.audio_chunks:
+            self.on_status("No audio recorded.")
+            self.on_state_change("IDLE")
+            return
+
+        audio = np.concatenate(self.audio_chunks, axis=0).flatten()
+        duration = len(audio) / SAMPLE_RATE
+        self.on_status(f"Audio duration: {duration:.1f}s")
+
+        if duration < 0.5:
+            self.on_status("Too short, skipping.")
+            self.on_state_change("IDLE")
+            return
+
+        tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
+        tmp_path = tmp.name
+        tmp.close()
+
+        try:
+            sf.write(tmp_path, audio, SAMPLE_RATE)
+            segments, info = self.model.transcribe(tmp_path)
+            text = " ".join(seg.text.strip() for seg in segments).strip()
+
+            if text:
+                pyperclip.copy(text)
+                timestamp = datetime.now(EASTERN).strftime("%Y-%m-%d %I:%M:%S %p %Z")
+                self.on_transcription(text, timestamp)
+                self._log_transcription(text, timestamp)
+            else:
+                self.on_status("No speech detected.")
+        finally:
+            os.unlink(tmp_path)
+            self.on_state_change("IDLE")
+
+    def _log_transcription(self, text, timestamp=None):
+        if timestamp is None:
+            timestamp = datetime.now(EASTERN).strftime("%Y-%m-%d %I:%M:%S %p %Z")
+        LOG_FILE.parent.mkdir(parents=True, exist_ok=True)
+        with open(LOG_FILE, "a", encoding="utf-8") as f:
+            f.write(f"[{timestamp}] {text}\n")
+
+    def toggle_recording(self):
+        with self.lock:
+            if not self.recording:
+                self.start_recording()
+            else:
+                self.stop_recording()
+                self.transcribe()
+
+
+# ---------------------------------------------------------------------------
+# Cross-platform hotkey handling (pynput)
+# ---------------------------------------------------------------------------
+
+# Canonical key names used in the pressed set
+_KEY_CTRL = "ctrl"
+_KEY_SHIFT = "shift"
+_KEY_SPACE = "space"
+
+_RECORD_COMBO = frozenset({_KEY_CTRL, _KEY_SHIFT, _KEY_SPACE})
+
+
+def _normalise_key(key):
+    """Map any pynput key to a canonical string for reliable matching."""
+    # Modifier keys
+    if key in (pynput_keyboard.Key.shift, pynput_keyboard.Key.shift_l, pynput_keyboard.Key.shift_r):
+        return _KEY_SHIFT
+    if key in (pynput_keyboard.Key.ctrl, pynput_keyboard.Key.ctrl_l, pynput_keyboard.Key.ctrl_r):
+        return _KEY_CTRL
+    # Space — can arrive as Key.space, KeyCode(char=' '), or KeyCode(vk=32, char=None)
+    if key == pynput_keyboard.Key.space:
+        return _KEY_SPACE
+    if isinstance(key, pynput_keyboard.KeyCode):
+        if key.char == ' ':
+            return _KEY_SPACE
+        if getattr(key, 'vk', None) == 32:
+            return _KEY_SPACE
+        if key.char:
+            return key.char.lower()
+    return key
+
+
+def _create_hotkey_listener(on_record):
+    """Create a pynput keyboard listener for the record hotkey.
+
+    The listener never suppresses keys — all other shortcuts continue to work
+    normally while recording. Includes a 300ms debounce to prevent double-fires.
+    """
+    pressed = set()
+    last_fire = [0.0]
+
+    def on_press(key):
+        pressed.add(_normalise_key(key))
+        if _RECORD_COMBO.issubset(pressed):
+            now = time.monotonic()
+            if now - last_fire[0] < 0.3:
+                return
+            last_fire[0] = now
+            threading.Thread(target=on_record, daemon=True).start()
+
+    def on_release(key):
+        pressed.discard(_normalise_key(key))
+
+    return pynput_keyboard.Listener(on_press=on_press, on_release=on_release, suppress=False)
+
+
+# ---------------------------------------------------------------------------
+# CLI mode (original behavior)
+# ---------------------------------------------------------------------------
+
+def run_cli(model_name="base"):
+    stt = SpeechToText(model_name=model_name)
+
+    def cli_transcription(text, ts):
+        print(f"Transcription: {text}")
+        print("Copied to clipboard! Use Ctrl+V to paste.")
+
+    stt.on_transcription = cli_transcription
+
+    stt.load_model()
+    print(f" Press {HOTKEY_RECORD} to start/stop recording")
+    print(f" Press {HOTKEY_QUIT} to quit")
+    print("\nListening for hotkey...")
+
+    signal.signal(signal.SIGINT, lambda *_: os._exit(0))
+
+    listener = _create_hotkey_listener(on_record=stt.toggle_recording)
+    listener.start()
+    listener.join()
+
+
+# ---------------------------------------------------------------------------
+# TUI mode (Textual)
+# ---------------------------------------------------------------------------
+
+def run_tui(model_name="base"):
+    from textual.app import App, ComposeResult
+    from textual.binding import Binding
+    from textual.containers import Horizontal, Vertical
+    from textual.reactive import reactive
+    from textual.widgets import Footer, Header, ProgressBar, RichLog, Select, Static
+
+    APP_CSS = """
+    #top-bar {
+        height: 3;
+        padding: 0 1;
+    }
+
+    #model-select {
+        width: 24;
+    }
+
+    #status-badge {
+        width: 20;
+        content-align: center middle;
+        text-style: bold;
+        margin-left: 2;
+        padding: 0 1;
+    }
+
+    #status-badge.idle {
+        background: $primary-darken-2;
+        color: $text;
+    }
+
+    #status-badge.recording {
+        background: $error;
+        color: $text;
+    }
+
+    #status-badge.transcribing {
+        background: $warning;
+        color: $text;
+    }
+
+    #main-content {
+        height: 1fr;
+    }
+
+    #level-meter {
+        height: 1;
+        margin: 0 1;
+    }
+
+    #transcription-history {
+        height: 1fr;
+        margin: 0 1;
+        border: solid $primary;
+    }
+
+    #status-log {
+        height: 7;
+        margin: 0 1;
+        border: solid $accent;
+    }
+    """
+
+    class SpeechToTextApp(App):
+        TITLE = "stttui"
+        CSS = APP_CSS
+
+        BINDINGS = [
+            Binding("ctrl+shift+space", "toggle_record_binding", "Record", show=True),
+            Binding("ctrl+c", "quit_app", "Quit", show=True, priority=True),
+        ]
+
+        state = reactive("IDLE")
+
+        def __init__(self, initial_model: str = "base"):
+            super().__init__()
+            self.engine = SpeechToText(model_name=initial_model)
+            self._initial_model = initial_model
+            self._listener = None
+            # Threading event for cross-thread signalling
+            self._toggle_event = threading.Event()
+            self._last_toggle_time = 0.0
+            # Queues for engine callbacks → UI
+            self._status_queue = []
+            self._transcription_queue = []
+            self._level_value = 0.0
+            self._pending_state = None
+            self._queue_lock = threading.Lock()
+
+        def compose(self) -> ComposeResult:
+            yield Header()
+            with Horizontal(id="top-bar"):
+                yield Select(
+                    [(s, s) for s in MODEL_SIZES],
+                    value=self._initial_model,
+                    id="model-select",
+                    allow_blank=False,
+                )
+                yield Static("IDLE", id="status-badge", classes="idle")
+            with Vertical(id="main-content"):
+                yield ProgressBar(id="level-meter", total=100, show_eta=False, show_percentage=False)
+                yield RichLog(id="transcription-history", highlight=True, markup=True)
+                yield RichLog(id="status-log", highlight=True, markup=True, max_lines=50)
+            yield Footer()
+
+        def on_mount(self) -> None:
+            # Wire engine callbacks to queue updates (thread-safe)
+            self.engine.on_status = self._enqueue_status
+            self.engine.on_transcription = self._enqueue_transcription
+            self.engine.on_level = self._enqueue_level
+            self.engine.on_state_change = self._enqueue_state
+
+            # Register global record hotkey via pynput
+            self._listener = _create_hotkey_listener(
+                on_record=self._toggle_event.set,
+            )
+            self._listener.start()
+
+            # Poll events and queues from Textual's own timer (~20 Hz)
+            self.set_interval(0.05, self._poll)
+
+            # Load model in background
+            self.run_worker(self._load_model_worker, thread=True)
+
+        async def _load_model_worker(self) -> None:
+            self.engine.load_model()
+
+        # -- Thread-safe enqueue helpers (called from engine/audio threads) --
+
+        def _enqueue_status(self, msg: str) -> None:
+            with self._queue_lock:
+                self._status_queue.append(msg)
+
+        def _enqueue_transcription(self, text: str, ts: str) -> None:
+            with self._queue_lock:
+                self._transcription_queue.append((text, ts))
+
+        def _enqueue_level(self, rms: float) -> None:
+            self._level_value = rms
+
+        def _enqueue_state(self, s: str) -> None:
+            self._pending_state = s
+
+        # -- Polling (runs on Textual main thread via set_interval) --
+
+        def _poll(self) -> None:
+            # Check hotkey events
+            if self._toggle_event.is_set():
+                self._toggle_event.clear()
+                self._toggle_record()
+
+            # Drain queued UI updates
+            with self._queue_lock:
+                statuses = self._status_queue[:]
+                self._status_queue.clear()
+                transcriptions = self._transcription_queue[:]
+                self._transcription_queue.clear()
+
+            for msg in statuses:
+                self._log_status(msg)
+            for text, ts in transcriptions:
+                self._add_transcription(text, ts)
+
+            # Update level meter
+            rms = self._level_value
+            level = min(100, int(rms * 700))
+            bar = self.query_one("#level-meter", ProgressBar)
|
|
407
|
+
bar.update(progress=level)
|
|
408
|
+
|
|
409
|
+
# Update state
|
|
410
|
+
pending = self._pending_state
|
|
411
|
+
if pending is not None:
|
|
412
|
+
self._pending_state = None
|
|
413
|
+
self.state = pending
|
|
414
|
+
|
|
415
|
+
# -- State management --
|
|
416
|
+
|
|
417
|
+
def watch_state(self, new_state: str) -> None:
|
|
418
|
+
badge = self.query_one("#status-badge", Static)
|
|
419
|
+
badge.update(new_state)
|
|
420
|
+
badge.remove_class("idle", "recording", "transcribing")
|
|
421
|
+
badge.add_class(new_state.lower())
|
|
422
|
+
|
|
423
|
+
if new_state != "RECORDING":
|
|
424
|
+
level = self.query_one("#level-meter", ProgressBar)
|
|
425
|
+
level.update(progress=0)
|
|
426
|
+
|
|
427
|
+
# -- UI helpers --
|
|
428
|
+
|
|
429
|
+
def _log_status(self, msg: str) -> None:
|
|
430
|
+
log = self.query_one("#status-log", RichLog)
|
|
431
|
+
log.write(msg)
|
|
432
|
+
|
|
433
|
+
def _add_transcription(self, text: str, timestamp: str) -> None:
|
|
434
|
+
history = self.query_one("#transcription-history", RichLog)
|
|
435
|
+
history.write(f"[dim]{timestamp}[/dim] {text}")
|
|
436
|
+
self._log_status("Copied to clipboard!")
|
|
437
|
+
|
|
438
|
+
# -- Actions --
|
|
439
|
+
|
|
440
|
+
def _toggle_record(self) -> None:
|
|
441
|
+
# Debounce: both pynput and Textual may fire for the same keypress
|
|
442
|
+
now = time.monotonic()
|
|
443
|
+
if now - self._last_toggle_time < 0.3:
|
|
444
|
+
return
|
|
445
|
+
self._last_toggle_time = now
|
|
446
|
+
|
|
447
|
+
if self.engine.model is None:
|
|
448
|
+
self._log_status("Model still loading, please wait...")
|
|
449
|
+
return
|
|
450
|
+
if self.state == "TRANSCRIBING":
|
|
451
|
+
self._log_status("Still transcribing, please wait...")
|
|
452
|
+
return
|
|
453
|
+
|
|
454
|
+
if self.state == "IDLE":
|
|
455
|
+
self.state = "RECORDING"
|
|
456
|
+
self.engine.start_recording()
|
|
457
|
+
else:
|
|
458
|
+
self.state = "TRANSCRIBING"
|
|
459
|
+
self.engine.stop_recording()
|
|
460
|
+
self.run_worker(self._transcribe_worker, thread=True)
|
|
461
|
+
|
|
462
|
+
async def _transcribe_worker(self) -> None:
|
|
463
|
+
self.engine.transcribe()
|
|
464
|
+
|
|
465
|
+
def action_toggle_record_binding(self) -> None:
|
|
466
|
+
self._toggle_record()
|
|
467
|
+
|
|
468
|
+
def action_quit_app(self) -> None:
|
|
469
|
+
if self._listener:
|
|
470
|
+
self._listener.stop()
|
|
471
|
+
if self.engine.stream:
|
|
472
|
+
self.engine.stream.stop()
|
|
473
|
+
self.engine.stream.close()
|
|
474
|
+
self.exit()
|
|
475
|
+
|
|
476
|
+
def on_select_changed(self, event: Select.Changed) -> None:
|
|
477
|
+
new_model = str(event.value)
|
|
478
|
+
if new_model == self.engine.model_name:
|
|
479
|
+
return
|
|
480
|
+
if self.state != "IDLE":
|
|
481
|
+
self._log_status("Cannot change model while recording/transcribing.")
|
|
482
|
+
event.select.value = self.engine.model_name
|
|
483
|
+
return
|
|
484
|
+
self.engine.model_name = new_model
|
|
485
|
+
self.engine.model = None
|
|
486
|
+
self._log_status(f"Switching to model '{new_model}'...")
|
|
487
|
+
self.run_worker(self._load_model_worker, thread=True)
|
|
488
|
+
|
|
489
|
+
app = SpeechToTextApp(initial_model=model_name)
|
|
490
|
+
app.run()
|
|
491
|
+
|
|
492
|
+
|
|
493
|
+
# ---------------------------------------------------------------------------
|
|
494
|
+
# Entry point
|
|
495
|
+
# ---------------------------------------------------------------------------
|
|
496
|
+
|
|
497
|
+
def run_headless(model_name="base", duration=None):
|
|
498
|
+
"""Record for a fixed duration (or until Enter), transcribe, print to stdout, and exit."""
|
|
499
|
+
stt = SpeechToText(model_name=model_name)
|
|
500
|
+
|
|
501
|
+
# Suppress status messages; only output the transcription
|
|
502
|
+
stt.on_status = lambda msg: sys.stderr.write(f"{msg}\n")
|
|
503
|
+
stt.on_transcription = lambda text, ts: print(text)
|
|
504
|
+
|
|
505
|
+
stt.load_model()
|
|
506
|
+
|
|
507
|
+
stt.start_recording()
|
|
508
|
+
|
|
509
|
+
if duration:
|
|
510
|
+
sys.stderr.write(f"Recording for {duration}s...\n")
|
|
511
|
+
time.sleep(duration)
|
|
512
|
+
else:
|
|
513
|
+
sys.stderr.write("Recording... press Enter to stop.\n")
|
|
514
|
+
input()
|
|
515
|
+
|
|
516
|
+
stt.stop_recording()
|
|
517
|
+
stt.transcribe()
|
|
518
|
+
|
|
519
|
+
|
|
520
|
+
def main():
|
|
521
|
+
parser = argparse.ArgumentParser(description="stttui — Speech-to-Text TUI")
|
|
522
|
+
parser.add_argument("--version", action="version", version=f"stttui {VERSION}")
|
|
523
|
+
parser.add_argument("--cli", action="store_true", help="Run in CLI-only mode (no TUI)")
|
|
524
|
+
parser.add_argument("--headless", action="store_true", help="Record, transcribe, print to stdout, and exit (no hotkeys/UI)")
|
|
525
|
+
parser.add_argument("--duration", type=float, default=None, help="Recording duration in seconds (headless mode; omit to wait for Enter)")
|
|
526
|
+
parser.add_argument("--model", default="base", choices=MODEL_SIZES, help="Whisper model size")
|
|
527
|
+
args = parser.parse_args()
|
|
528
|
+
|
|
529
|
+
if args.headless:
|
|
530
|
+
run_headless(model_name=args.model, duration=args.duration)
|
|
531
|
+
elif args.cli:
|
|
532
|
+
run_cli(model_name=args.model)
|
|
533
|
+
else:
|
|
534
|
+
run_tui(model_name=args.model)
|
|
535
|
+
|
|
536
|
+
|
|
537
|
+
if __name__ == "__main__":
|
|
538
|
+
main()
|
|
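The debounce guard in `_toggle_record` is worth isolating: with both pynput and a Textual binding registered, one physical keypress can trigger two calls. A minimal stdlib-only sketch of the same pattern (the 0.3 s window and `time.monotonic` clock mirror the app code; the `Debouncer` class itself is hypothetical, not part of stttui):

```python
import time

class Debouncer:
    """Allow a trigger only if `window` seconds have passed since the last one."""

    def __init__(self, window: float = 0.3):
        self.window = window
        self._last = 0.0  # monotonic timestamp of the last accepted trigger

    def allow(self) -> bool:
        now = time.monotonic()
        if now - self._last < self.window:
            return False  # duplicate event (e.g. pynput and Textual both fired)
        self._last = now
        return True

d = Debouncer(window=0.3)
print(d.allow())  # first press is accepted
print(d.allow())  # immediate repeat is swallowed
```

Using a monotonic clock matters here: it cannot jump backwards if the system time is adjusted mid-session.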
@@ -0,0 +1,195 @@
Metadata-Version: 2.4
Name: stttui
Version: 0.2.0
Summary: Speech-to-Text TUI — offline transcription powered by faster-whisper
Author: Anthony Holten
License-Expression: MIT
Project-URL: Homepage, https://github.com/aholten/sttui
Project-URL: Repository, https://github.com/aholten/sttui
Project-URL: Issues, https://github.com/aholten/sttui/issues
Keywords: speech-to-text,whisper,tui,transcription,offline
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: faster-whisper<2.0.0,>=1.0.0
Requires-Dist: sounddevice<1.0.0,>=0.4.0
Requires-Dist: soundfile<1.0.0,>=0.12.0
Requires-Dist: pynput<2.0.0,>=1.7.0
Requires-Dist: pyperclip<2.0.0,>=1.8.0
Requires-Dist: numpy<3.0.0,>=1.24.0
Requires-Dist: tzdata>=2024.1
Requires-Dist: textual<1.0.0,>=0.40.0
Dynamic: license-file

# stttui: Speech To Text Terminal User Interface

> **v0.2.0** | Python 3.10+ | MIT License

A local, fully offline speech-to-text tool powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper). Record audio with a global hotkey from any window, transcribe it on-device, and get the result copied straight to your clipboard — no cloud, no API keys, no latency.

Comes with a polished terminal UI built on [Textual](https://github.com/Textualize/textual), complete with a live audio level meter, transcription history, and on-the-fly model switching.

<p align="center">
  <img src="demo.gif" alt="stttui in action" width="700">
</p>

---

## Features

- **Fully offline** — runs entirely on your machine using faster-whisper with int8 quantization. No data ever leaves your device.
- **Global hotkeys** — record from anywhere without focusing the terminal
- **Live audio meter** — real-time visual feedback while recording
- **Transcription history** — timestamped, scrollable log of every dictation
- **Model switching** — swap between `tiny`, `base`, `small`, `medium`, and `large` whisper models on the fly
- **Clipboard integration** — transcriptions are automatically copied and ready to paste
- **Cross-platform** — works on Windows, macOS, and Linux
- **Three modes** — rich TUI (default), lightweight CLI (`--cli`), or headless (`--headless`) for scripting/piping

## Installation

### Homebrew (macOS / Linux)

```bash
brew tap aholten/tap
brew install stttui
```

### pip / pipx

```bash
# With pipx (recommended — isolated environment)
pipx install stttui

# Or with pip
pip install stttui
```

### From source

```bash
git clone https://github.com/aholten/sttui.git
cd sttui
pip install .
```

> On macOS you may need to grant terminal accessibility permissions for global hotkeys.
> On Linux, install `portaudio` (`sudo apt install portaudio19-dev`) and `xclip` for clipboard support.

## Usage

### Hotkeys

These work **globally** — even when the terminal is not focused:

| Hotkey | Action |
|---|---|
| `Ctrl+Shift+Space` | Start / stop recording |
| `Ctrl+C` | Quit |

### Workflow

1. Launch the tool — the whisper model loads in the background
2. Switch to any application (browser, editor, chat, etc.)
3. Press **Ctrl+Shift+Space** to start recording
4. Speak
5. Press **Ctrl+Shift+Space** again to stop
6. The transcription is copied to your clipboard — paste with **Ctrl+V**

### TUI Mode (default)

```bash
stttui
```

The terminal interface includes:
- **Model selector** — dropdown to switch whisper models without restarting
- **Status badge** — shows IDLE / RECORDING / TRANSCRIBING state
- **Level meter** — live audio input visualization
- **Transcription history** — scrollable log with timestamps
- **Status log** — model loading progress and system messages

### CLI Mode

```bash
stttui --cli
```

Minimal output, no UI — just hotkeys and clipboard.

### Headless Mode

Record, transcribe, print to stdout, and exit. No hotkeys or UI — ideal for scripts and piping.

```bash
# Record until Enter is pressed
stttui --headless

# Record for exactly 10 seconds
stttui --headless --duration 10

# Pipe transcription to another command
stttui --headless --duration 5 | xargs echo "You said:"
```

Status messages go to stderr; the transcription goes to stdout.

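Because the two streams are separate, a script can keep the transcription and discard the chatter. A sketch of that split, using an inline child process as a stand-in for `stttui --headless` so it runs without a microphone:

```python
import subprocess
import sys

# Stand-in for `stttui --headless`: status chatter on stderr, text on stdout.
proc = subprocess.run(
    [sys.executable, "-c",
     "import sys; sys.stderr.write('Loading model...\\n'); print('hello world')"],
    capture_output=True,
    text=True,
)

transcription = proc.stdout.strip()  # only what was printed to stdout
print(f"You said: {transcription}")
```

In a shell the same split is `text=$(stttui --headless --duration 5 2>/dev/null)`.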
### Options

| Flag | Description | Default |
|---|---|---|
| `--cli` | Run in CLI-only mode (no TUI) | off |
| `--headless` | Record, transcribe, print to stdout, and exit | off |
| `--duration` | Recording duration in seconds (headless mode; omit to wait for Enter) | — |
| `--model` | Whisper model size | `base` |
| `--version` | Print version and exit | — |

```bash
# Example: launch with the small model
stttui --model small
```

## Model Sizes

Larger models are more accurate but slower to load and transcribe. All models run on CPU with int8 quantization.

| Model | Parameters | Speed |
|---|---|---|
| `tiny` | 39M | Fastest |
| `base` | 74M | Fast |
| `small` | 244M | Moderate |
| `medium` | 769M | Slow |
| `large` | 1550M | Slowest |

In TUI mode you can switch models from the dropdown at any time.

## Platform Notes

| Platform | Notes |
|---|---|
| **Windows** | Works out of the box via Git Bash or MSYS2. |
| **macOS** | Clipboard works via `pbcopy`. You may need to grant terminal accessibility permissions for global hotkeys. |
| **Linux** | Install `xclip` or `xsel` for clipboard support (`sudo apt install xclip`). |

## Transcription Log

All transcriptions are automatically saved to `transcription_log.txt` with timestamps for reference.

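Since the log is plain text, it is easy to query from a script. A sketch that pulls back the most recent entries (the filename comes from the README; `last_entries` is a hypothetical helper, and the one-entry-per-line format is an assumption):

```python
from pathlib import Path

def last_entries(path: str = "transcription_log.txt", n: int = 5) -> list[str]:
    """Return the last `n` non-empty lines of the log, oldest first."""
    log = Path(path)
    if not log.exists():
        return []
    lines = [ln for ln in log.read_text(encoding="utf-8").splitlines() if ln.strip()]
    return lines[-n:]

for entry in last_entries(n=3):
    print(entry)
```

If the file is missing (nothing transcribed yet), the helper returns an empty list rather than raising.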
## License

MIT

## Author

Anthony Holten (@aholten on GitHub)
@@ -0,0 +1,12 @@
LICENSE
README.md
pyproject.toml
src/stttui/__init__.py
src/stttui/__main__.py
src/stttui/speech_to_text.py
src/stttui.egg-info/PKG-INFO
src/stttui.egg-info/SOURCES.txt
src/stttui.egg-info/dependency_links.txt
src/stttui.egg-info/entry_points.txt
src/stttui.egg-info/requires.txt
src/stttui.egg-info/top_level.txt
@@ -0,0 +1 @@

@@ -0,0 +1 @@
stttui