scribe-cli 0.8.0__tar.gz → 0.10.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {scribe_cli-0.8.0/scribe_cli.egg-info → scribe_cli-0.10.0}/PKG-INFO +26 -15
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/README.md +20 -14
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/pyproject.toml +2 -1
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/_version.py +2 -2
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/app.py +32 -21
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/install_desktop.py +14 -3
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/models.py +54 -8
- {scribe_cli-0.8.0 → scribe_cli-0.10.0/scribe_cli.egg-info}/PKG-INFO +26 -15
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/requires.txt +6 -0
- scribe_cli-0.10.0/scribe_data/templates/scribe.desktop +8 -0
- scribe_cli-0.8.0/scribe_data/templates/scribe.desktop +0 -8
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/.github/workflows/pypi.yml +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/.gitignore +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/LICENSE +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/icon.xcf +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/__init__.py +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/audio.py +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/keyboard.py +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/models.toml +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/saverecording.py +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/testpynput.py +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/util.py +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/SOURCES.txt +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/dependency_links.txt +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/entry_points.txt +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/top_level.txt +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_data/__init__.py +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_data/share/icon.png +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_data/share/icon_recording.png +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_data/share/icon_writing.png +0 -0
- {scribe_cli-0.8.0 → scribe_cli-0.10.0}/setup.cfg +0 -0
```diff
@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.
+Version: 0.10.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
```
```diff
@@ -55,9 +55,14 @@ Requires-Dist: vosk; extra == "vosk"
 Provides-Extra: app
 Requires-Dist: pystray; extra == "app"
 Requires-Dist: PyGObject; extra == "app"
+Provides-Extra: openai
+Requires-Dist: openai; extra == "openai"
+Requires-Dist: soundfile; extra == "openai"
 Provides-Extra: all
 Requires-Dist: pynput; extra == "all"
 Requires-Dist: openai-whisper; extra == "all"
+Requires-Dist: openai; extra == "all"
+Requires-Dist: soundfile; extra == "all"
 Requires-Dist: vosk; extra == "all"
 Requires-Dist: pystray; extra == "all"
 
```
```diff
@@ -66,7 +71,9 @@ Requires-Dist: pystray; extra == "all"
 
 # Scribe <img src="scribe_data/share/icon.png" width=48px>
 
-`scribe` is a
+`scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+
+It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client to open AI via `openaiapi` backend (API key required).
 
 ## Compatibility
 
```
````diff
@@ -101,12 +108,10 @@ cd scribe
 pip install -e .[all]
 ```
 
-You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-
-The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
-the default is left to the `openai-whisper` package and might change in the future).
+You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).
 
+The language models for local backends `vosk` and `whisper` will download on-the-fly.
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
 
 ## Usage
 
````
````diff
@@ -115,7 +120,7 @@ Just type in the terminal:
 ```bash
 scribe
 ```
-and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+and the script will guide you through the choice of backend (`whisper` or `vosk` or `openaiapi`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
 You can interrupt the recording via Ctrl + C and start again or change model.
````
````diff
@@ -129,9 +134,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
 
-
+The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
 ```bash
-scribe --backend
+scribe --backend openaiapi --api YOURAPIKEY
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
 
````
````diff
@@ -190,7 +195,8 @@ To activate start with:
 ```bash
 scribe --app
 ```
-or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+or toggle the app option in the interactive menu. The scribe icon will show, with Record, Stop or Quit options. The icon will change based on what the app is doing.
+For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
 
 ```bash
````
````diff
@@ -201,15 +207,20 @@ pip install PyGObject
 ## Start as an application in GNOME
 
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
-to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
+`--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
 
 e.g.
 
 ```bash
-scribe-install
+scribe-install
+scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
+scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
 ```
-
-
+This will install three separate apps:
+- `Super + scribe` : will launch the default version with terminal prompt
+- `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard.
+- `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal (you need to press Record in the tray icon menu to start the recording).
 
 
 ## Fine tuning
````
```diff
@@ -3,7 +3,9 @@
 
 # Scribe <img src="scribe_data/share/icon.png" width=48px>
 
-`scribe` is a
+`scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+
+It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client to open AI via `openaiapi` backend (API key required).
 
 ## Compatibility
 
```
````diff
@@ -38,12 +40,10 @@ cd scribe
 pip install -e .[all]
 ```
 
-You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-
-The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
-the default is left to the `openai-whisper` package and might change in the future).
+You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).
 
+The language models for local backends `vosk` and `whisper` will download on-the-fly.
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
 
 ## Usage
 
````
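The install notes above require at least one of the `vosk`, `openai-whisper` or `openai` packages. A minimal sketch of how such a check could be done without actually importing the heavy packages, assuming the import names `vosk`, `whisper` (the module installed by `openai-whisper`) and `openai` plus `soundfile` (the two packages of the new `openai` extra). `BACKEND_MODULES` and `available_backends` are hypothetical names, not part of scribe.

```python
import importlib.util

# Hypothetical mapping from scribe backend names to the modules each one needs.
# Note: the `openai-whisper` distribution installs the importable module `whisper`.
BACKEND_MODULES = {
    "whisper": ["whisper"],
    "vosk": ["vosk"],
    "openaiapi": ["openai", "soundfile"],
}


def available_backends(requirements=BACKEND_MODULES):
    """Return the backends whose required top-level modules can all be located.

    importlib.util.find_spec finds a top-level module without executing it,
    so large packages such as whisper are not loaded by this check.
    """
    return [
        backend
        for backend, modules in requirements.items()
        if all(importlib.util.find_spec(name) is not None for name in modules)
    ]
```

For example, on a machine where only the `vosk` extra is installed, `available_backends()` would be expected to return `['vosk']`.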
````diff
@@ -52,7 +52,7 @@ Just type in the terminal:
 ```bash
 scribe
 ```
-and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+and the script will guide you through the choice of backend (`whisper` or `vosk` or `openaiapi`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
 You can interrupt the recording via Ctrl + C and start again or change model.
````
````diff
@@ -66,9 +66,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
 
-
+The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
 ```bash
-scribe --backend
+scribe --backend openaiapi --api YOURAPIKEY
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
 
````
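In the new `OpenaiAPITranscriber`, the command-line key is forwarded as `api_key or openai.api_key`; when both are empty, the official `openai` Python client (v1 and later) falls back to the `OPENAI_API_KEY` environment variable on its own. A sketch of that precedence as a hypothetical helper, `resolve_api_key`, which is not part of scribe or of the `openai` package:

```python
import os
from typing import Mapping, Optional


def resolve_api_key(cli_key: Optional[str] = None,
                    module_key: Optional[str] = None,
                    environ: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: return the first non-empty API key source.

    Order mirrors the diff plus the openai client's own fallback:
    1. the --api-key value, 2. a module-level default such as openai.api_key,
    3. the OPENAI_API_KEY environment variable.
    """
    for candidate in (cli_key, module_key, environ.get("OPENAI_API_KEY")):
        if candidate:
            return candidate
    raise RuntimeError("No OpenAI API key: pass --api-key or set OPENAI_API_KEY")
```

In practice this should mean the key can be exported once as `OPENAI_API_KEY` instead of being typed after `--api` on every run, which also keeps it out of the shell history.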
````diff
@@ -127,7 +127,8 @@ To activate start with:
 ```bash
 scribe --app
 ```
-or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+or toggle the app option in the interactive menu. The scribe icon will show, with Record, Stop or Quit options. The icon will change based on what the app is doing.
+For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
 
 ```bash
````
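The tray description distinguishes two icon states for vosk and three for whisper. A sketch of one way to derive the icon from the `recording` and `busy` flags that `AbstractTranscriber` keeps, assuming `icon_writing.png` is the "transcribing" icon among the files in `scribe_data/share/`. `TrayState` and `icon_for` are hypothetical; the app code that actually swaps icons is not part of this diff.

```python
from dataclasses import dataclass


@dataclass
class TrayState:
    """Stand-in for the `recording` and `busy` flags kept by AbstractTranscriber."""
    recording: bool = False
    busy: bool = False


def icon_for(state: TrayState) -> str:
    """Hypothetical mapping from transcriber flags to the bundled icon files."""
    if state.recording:
        return "icon_recording.png"  # microphone open (vosk also transcribes here)
    if state.busy:
        return "icon_writing.png"    # recording over, whisper still transcribing
    return "icon.png"                # idle, waiting for Record
```

Under these assumptions a whisper run would step through `icon_recording.png`, `icon_writing.png` and back to `icon.png`, while vosk, which transcribes during recording, would only alternate between `icon_recording.png` and `icon.png`.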
````diff
@@ -138,15 +139,20 @@ pip install PyGObject
 ## Start as an application in GNOME
 
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
-to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
+`--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
 
 e.g.
 
 ```bash
-scribe-install
+scribe-install
+scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
+scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
 ```
-
-
+This will install three separate apps:
+- `Super + scribe` : will launch the default version with terminal prompt
+- `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard.
+- `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal (you need to press Record in the tray icon menu to start the recording).
 
 
 ## Fine tuning
````
```diff
@@ -44,7 +44,8 @@ keyboard = ["pynput"]
 whisper = ["openai-whisper"]
 vosk = ["vosk"]
 app = ["pystray", "PyGObject"]
-
+openai = ["openai", "soundfile"]
+all = ["pynput", "openai-whisper", "openai", "soundfile", "vosk", "pystray"]
 
 
 [tool.setuptools]
```
```diff
@@ -4,8 +4,8 @@ import re
 import time
 import argparse
 from scribe.audio import Microphone
-from scribe.util import print_partial, clear_line, prompt_choices,
-from scribe.models import VoskTranscriber, WhisperTranscriber
+from scribe.util import print_partial, clear_line, prompt_choices, ansi_link, colored
+from scribe.models import VoskTranscriber, WhisperTranscriber, OpenaiAPITranscriber
 
 with open(Path(__file__).parent / "models.toml", "rb") as f:
 language_config_default = tomllib.load(f)
```
```diff
@@ -24,7 +24,7 @@ def get_default_backend():
 except ImportError:
 raise ImportError("Please install either vosk or whisper to use this script.")
 
-BACKENDS = ["whisper", "vosk"]
+BACKENDS = ["whisper", "vosk", "openaiapi"]
 UNAVAILABLE_BACKENDS = []
 
 
```
```diff
@@ -59,6 +59,7 @@ def get_transcriber(o, prompt=True):
 
 whisper_models = ["tiny", "base", "small", "medium", "large", "turbo"]
 whisper_english_models = ["tiny.en", "base.en", "small.en", "medium.en"]
+whisperapi_models = ["whisper-1"]
 
 if o.dummy:
 return DummyTranscriber("whisper", "dummy")
```
```diff
@@ -68,26 +69,17 @@ def get_transcriber(o, prompt=True):
 o.backend = "vosk"
 elif o.model in whisper_models + whisper_english_models:
 o.backend = "whisper"
+elif o.model in whisperapi_models:
+o.backend = "openaiapi"
 
 if o.backend:
-checked_backend = check_dependencies(o.backend)
-if not checked_backend:
-print(f"Backend {o.backend} is not available.")
-exit(1)
 backend = o.backend
 
 elif not prompt:
 backend = BACKENDS[0]
 
 else:
-
-while not checked_backend:
-backend = prompt_choices(BACKENDS, o.backend, "backend", UNAVAILABLE_BACKENDS)
-# raise an error if the user has explicitly selected a backend that is not available
-checked_backend = check_dependencies(backend, raise_error=backend==o.backend)
-if not checked_backend:
-print(f"Backend {o.backend} is not available.")
-UNAVAILABLE_BACKENDS.append(backend)
+backend = prompt_choices(BACKENDS, o.backend, "backend", UNAVAILABLE_BACKENDS)
 
 print(f"Selected backend: {backend}")
 
```
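With this change, `get_transcriber` infers the backend from `--model` before prompting. A sketch of that lookup as a standalone function, reusing the model lists from the hunks above. The condition of the vosk branch is not visible in the diff, so `vosk_models` is a hypothetical parameter standing in for the vosk names (presumably loaded from `models.toml`); `infer_backend` is not a scribe function.

```python
WHISPER_MODELS = ["tiny", "base", "small", "medium", "large", "turbo"]
WHISPER_ENGLISH_MODELS = ["tiny.en", "base.en", "small.en", "medium.en"]
WHISPERAPI_MODELS = ["whisper-1"]


def infer_backend(model, vosk_models=()):
    """Return the backend implied by a model name, or None if the name is unknown.

    `vosk_models` is a placeholder for the vosk model names, which this diff
    does not show.
    """
    if model in vosk_models:
        return "vosk"
    if model in WHISPER_MODELS + WHISPER_ENGLISH_MODELS:
        return "whisper"
    if model in WHISPERAPI_MODELS:
        return "openaiapi"
    return None
```

Because the lists are disjoint, the order of the checks does not change the result. Note that with the `check_dependencies` calls removed from this part of `get_transcriber`, a backend picked this way is no longer checked for installed packages at this point.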
```diff
@@ -131,6 +123,13 @@ def get_transcriber(o, prompt=True):
 
 model = pick_specialist_model(model, o.language, backend)
 
+elif backend == "openaiapi":
+model = o.model or "whisper-1"
+
+else:
+raise ValueError(f"Unknown backend: {backend}")
+
+
 print(f"Selected model: {model}")
 
 if backend == "vosk":
```
```diff
@@ -152,6 +151,12 @@ def get_transcriber(o, prompt=True):
 restart_after_silence=o.restart_after_silence,
 model_kwargs={"download_root": o.download_folder_whisper})
 
+elif backend == "openaiapi":
+transcriber = OpenaiAPITranscriber(model_name=model, samplerate=o.samplerate,
+timeout=o.duration, silence_duration=o.silence, silence_thresh=o.silence_db,
+restart_after_silence=o.restart_after_silence, api_key=o.api_key)
+
+
 else:
 raise ValueError(f"Unknown backend: {backend}")
 
```
```diff
@@ -195,6 +200,10 @@ def get_parser():
 group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in decibel (default %(default)s db)")
 group.add_argument("-a", "--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
 
+group = parser.add_argument_group("whisper api")
+group.add_argument("--api-key",
+help="API key for the Whisper API backend.")
+
 parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
 parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")
 
```
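The README example uses `--api YOURAPIKEY`, while the option added here is `--api-key`. That still works because `argparse` accepts unambiguous prefixes of long options by default (`allow_abbrev=True`). A minimal reproduction, assuming no other scribe option starts with `--api` (the full parser is not visible in this diff):

```python
import argparse

# Minimal parser with only the option group added in this release.
parser = argparse.ArgumentParser(prog="scribe")
group = parser.add_argument_group("whisper api")
group.add_argument("--api-key", help="API key for the Whisper API backend.")

# argparse resolves the unambiguous prefix "--api" to "--api-key";
# the value is stored under the attribute name `api_key`.
ns = parser.parse_args(["--api", "YOURAPIKEY"])
print(ns.api_key)
```

If scribe ever set `allow_abbrev=False` or added another `--api...` option, the README spelling would stop working, so `--api-key` is the safer form to document.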
```diff
@@ -206,11 +215,11 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
 
 if keyboard:
 from scribe.keyboard import type_text
-
+transcriber.log("Change focus to target app during transcription.")
 
 if clipboard:
 import pyperclip
-
+transcriber.log("The full transcription will be copied to clipboard as it becomes available.")
 
 fulltext = ""
 
```
```diff
@@ -301,7 +310,7 @@ def create_app(micro, transcriber, **kwargs):
 def callback_stop_recording(icon, item):
 # Here we need to stop the recording thread
 
-transcriber.
+transcriber.interrupt = True
 if hasattr(icon, "_recording_thread"):
 icon._recording_thread.join()
 if hasattr(icon, "_monitoring_thread"):
```
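Stopping from the tray now works by setting `transcriber.interrupt = True` and joining the recording thread, while `start_recording` loops on `while not self.interrupt`. A self-contained sketch of that cooperative-cancellation pattern, with a hypothetical `FakeTranscriber` in place of the real class; like the diff, it uses a plain attribute rather than a `threading.Event`.

```python
import threading
import time


class FakeTranscriber:
    """Stand-in for AbstractTranscriber, keeping only the interrupt/recording flags."""

    def __init__(self):
        self.interrupt = False
        self.recording = False
        self.chunks = 0

    def start_recording(self):
        self.interrupt = False  # reset at the start of every recording, as in the diff
        self.recording = True
        try:
            while not self.interrupt:  # the flag is polled once per loop iteration
                self.chunks += 1       # placeholder for draining microphone.q
                time.sleep(0.01)
        finally:
            self.recording = False


transcriber = FakeTranscriber()
worker = threading.Thread(target=transcriber.start_recording)
worker.start()
time.sleep(0.05)
transcriber.interrupt = True  # what callback_stop_recording now does
worker.join(timeout=1.0)      # the tray callback calls _recording_thread.join()
```

Because `start_recording` clears the flag when it begins, a Stop issued before the thread reaches that line would be lost; clearing a `threading.Event` in the caller before starting the thread would be one way to avoid that race.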
```diff
@@ -310,7 +319,7 @@ def create_app(micro, transcriber, **kwargs):
 def callback_record(icon, item):
 # kwargs["callback"] = icon.update_menu # NOTE: the thread will finish AFTER the callback is complete
 if transcriber.busy:
-
+transcriber.log("Still busy recording or transcribing.")
 return
 
 if hasattr(icon, "_recording_thread") and icon._recording_thread.is_alive():
```
```diff
@@ -362,13 +371,15 @@ def main(args=None):
 transcriber = get_transcriber(o, prompt=o.prompt)
 print(f"Model [{colored(transcriber.model_name, 'light_blue', attrs=['bold'])}] from [{colored(transcriber.backend, 'light_blue', attrs=['bold'])}] selected.")
 show_output = ["clipboard", "keyboard", "output_file"]
-show_options = ["ascii", "
+show_options = ["ascii", "restart_after_silence"]
 activated_output = [colored(option if type(getattr(o, option)) is bool else f'{option}={getattr(o, option)}', 'light_blue') for option in show_output if getattr(o, option)]
 activated_options = [colored(option if type(getattr(o, option)) is bool else f'{option}={getattr(o, option)}', 'light_blue') for option in show_options if getattr(o, option)]
 if activated_output:
 print(f"Output: {' | '.join(activated_output)}")
 else:
 print(colored(f"No output selected -> terminal only", "light_red"))
+if o.app:
+print(colored("App mode enabled", "light_green"))
 if activated_options:
 print(f"Options: {' | '.join(activated_options)}")
 if o.prompt:
```
```diff
@@ -421,7 +432,7 @@ def main(args=None):
 o.app = not o.app
 continue
 if key == "a":
-transcriber.restart_after_silence = not transcriber.restart_after_silence
+o.restart_after_silence = transcriber.restart_after_silence = not transcriber.restart_after_silence
 continue
 if key == "t":
 ans = input(f"Enter new duration in seconds (current: {transcriber.timeout}): ")
```
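The `a` key previously toggled only `transcriber.restart_after_silence`, leaving `o.restart_after_silence` (which the new `show_options` list reads when printing the active options) at its old value. The chained assignment keeps the two in sync. A minimal sketch with `SimpleNamespace` stand-ins for both objects:

```python
from types import SimpleNamespace

o = SimpleNamespace(restart_after_silence=False)            # stand-in for parsed CLI options
transcriber = SimpleNamespace(restart_after_silence=False)  # stand-in for the transcriber

# Chained assignment evaluates the right-hand side once, then binds the result
# to each target from left to right, so both objects end up with the same value.
o.restart_after_silence = transcriber.restart_after_silence = not transcriber.restart_after_silence
```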
```diff
@@ -10,9 +10,17 @@ def main():
 sys.exit(0)
 
 parser = argparse.ArgumentParser("Install the desktop file for the scribe package. Any arguments to this script will be passed on to `scribe`.")
+parser.add_argument("--name", help="The title of the desktop app", default="Scribe")
+parser.add_argument("--startup-wm-class")
+parser.add_argument("--no-terminal", action="store_false", dest="terminal", help="Don't show the terminal (goes in --app mode)")
 o, rest = parser.parse_known_args()
 o.arguments = rest
 
+if not o.terminal and "--app" not in o.arguments:
+o.arguments.append("--app")
+if not o.terminal and "--no-prompt" not in o.arguments:
+o.arguments.append("--no-prompt")
+
 SOURCE_SCRIBE_DATA = os.path.dirname(scribe_data.__file__)
 
 HOME = os.environ.get('HOME',os.path.expanduser('~'))
```
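The new `scribe-install` options are parsed with `parse_known_args`, so anything the installer does not recognise is forwarded to `scribe`, and `--no-terminal` (stored as `terminal=False`) appends `--app` and `--no-prompt`. A sketch of that handling as a testable function; `parse_install_args` is a hypothetical name. One deliberate difference: the diff passes the help sentence as the first positional argument of `argparse.ArgumentParser`, which is the `prog` parameter, so the sketch passes it as `description=` instead.

```python
import argparse


def parse_install_args(argv):
    """Sketch of the option handling added to install_desktop.main() in this release."""
    parser = argparse.ArgumentParser(
        description="Install the desktop file for the scribe package.")
    parser.add_argument("--name", default="Scribe", help="The title of the desktop app")
    parser.add_argument("--startup-wm-class")
    parser.add_argument("--no-terminal", action="store_false", dest="terminal",
                        help="Don't show the terminal (goes in --app mode)")
    # Unrecognised options are not errors: they are collected, in order, for `scribe`.
    o, rest = parser.parse_known_args(argv)
    o.arguments = rest
    # Without a terminal nobody can answer prompts, so force tray-app, no-prompt mode.
    if not o.terminal:
        for flag in ("--app", "--no-prompt"):
            if flag not in o.arguments:
                o.arguments.append(flag)
    return o
```

For the `Scribe Vosk FR` example in the README, this yields `terminal=False` and the forwarded arguments `--backend vosk --language fr --keyboard --clipboard --app --no-prompt`.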
```diff
@@ -25,15 +33,18 @@ def main():
 with open(os.path.join(SOURCE_SCRIBE_DATA, 'templates', 'scribe.desktop')) as f:
 template = f.read()
 
+simple_name = o.name.lower().replace(' ','-').replace(os.path.sep, '-')
 bin_folder = sysconfig.get_path("scripts")
 icon_folder = os.path.join(SOURCE_SCRIBE_DATA, 'share')
-desktop_filecontent = template.format(icon_folder=icon_folder, bin_folder=bin_folder,
+desktop_filecontent = template.format(icon_folder=icon_folder, bin_folder=bin_folder,
+name=o.name, terminal=str(o.terminal).lower(),
+StartupWMClass=o.startup_wm_class or f"crx_mpnasdanpmm_{simple_name}",
+options=' ' + ' '.join(o.arguments) if o.arguments else '')
 
-desktop_filepath = os.path.join(XDG_APP_DATA, '
+desktop_filepath = os.path.join(XDG_APP_DATA, f'{simple_name}.desktop')
 print("Writing GNOME desktop file:", desktop_filepath)
 with open(desktop_filepath, "w") as f:
 f.write(desktop_filecontent)
 
-
 if __name__ == "__main__":
 main()
```
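The desktop file name is now derived from `--name`. A sketch of the naming and templating steps, reusing the placeholder names passed to `template.format()` in the diff. The `TEMPLATE` string is hypothetical because the contents of `scribe_data/templates/scribe.desktop` are not shown, and `desktop_file` is not a scribe function.

```python
import os

# Hypothetical template: only the placeholder names come from the diff.
TEMPLATE = """[Desktop Entry]
Type=Application
Name={name}
Exec={bin_folder}/scribe{options}
Icon={icon_folder}/icon.png
Terminal={terminal}
StartupWMClass={StartupWMClass}
"""


def desktop_file(name, terminal, arguments, bin_folder, icon_folder, startup_wm_class=None):
    """Return (file name, file content) following the naming scheme in the diff."""
    simple_name = name.lower().replace(" ", "-").replace(os.path.sep, "-")
    content = TEMPLATE.format(
        icon_folder=icon_folder,
        bin_folder=bin_folder,
        name=name,
        terminal=str(terminal).lower(),  # Desktop Entry booleans are "true"/"false"
        StartupWMClass=startup_wm_class or f"crx_mpnasdanpmm_{simple_name}",
        # The conditional expression binds loosest: (" " + joined) if arguments else "".
        options=" " + " ".join(arguments) if arguments else "",
    )
    return f"{simple_name}.desktop", content
```

With the README examples this gives `scribe.desktop`, `scribe-whisper.desktop` and `scribe-vosk-fr.desktop`. Since the file name depends only on `--name`, running `scribe-install` twice with the same name overwrites the earlier file.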
```diff
@@ -22,7 +22,7 @@ class StopRecording(Exception):
 class AbstractTranscriber:
 backend = None
 def __init__(self, model, model_name=None, language=None, samplerate=16000, timeout=None, model_kwargs={},
-silence_thresh=-40, silence_duration=2, restart_after_silence=False):
+silence_thresh=-40, silence_duration=2, restart_after_silence=False, logger=None):
 self.model_name = model_name
 self.language = language
 self.model = model
```
```diff
@@ -35,6 +35,12 @@ class AbstractTranscriber:
 self.recording = False
 self.busy = False
 self.waiting = False
+self.interrupt = False
+if logger is None:
+import logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger("scribe")
+self.logger = logger
 self.reset()
 
 def get_elapsed(self):
```
```diff
@@ -54,11 +60,21 @@ class AbstractTranscriber:
 self.audio_buffer = b''
 self.start_time = time.time()
 
+def log(self, text):
+if text.startswith("\n"):
+print("")
+text = text[1:]
+if self.logger:
+self.logger.info(text)
+else:
+print(f"[{text}]")
+
 def start_recording(self, microphone,
 start_message="Recording... Press Ctrl+C to stop.",
-stop_message="
+stop_message="Exit."):
 
 self.reset()
+self.interrupt = False
 self.recording = True
 self.waiting = True
 self.busy = True
```
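The transcriber now routes its status messages through `self.log`, and when no `logger` is given the constructor calls `logging.basicConfig(level=logging.INFO)`, which configures the process-wide root logger (and does nothing if the root logger already has handlers). A sketch of a dedicated logger that could be passed as `logger=`, assuming the constructor signature shown in the diff. `make_logger` and the logger name `scribe.example` are hypothetical, and `log` restates the new method's rules as a free function so the example runs without scribe installed.

```python
import io
import logging


def make_logger(stream):
    """Build a dedicated logger that writes '[message]' lines to `stream`."""
    logger = logging.getLogger("scribe.example")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False                      # keep records away from the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(message)s]"))
    logger.handlers[:] = [handler]                # idempotent if called more than once
    return logger


def log(logger, text):
    """Same rules as the new AbstractTranscriber.log: a leading newline prints a blank line first."""
    if text.startswith("\n"):
        print("")
        text = text[1:]
    if logger:
        logger.info(text)
    else:
        print(f"[{text}]")


buffer = io.StringIO()
log(make_logger(buffer), "\nTranscribing")
```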
```diff
@@ -71,9 +87,9 @@ class AbstractTranscriber:
 try:
 
 with microphone.open_stream():
-
+self.log(start_message)
 
-while self.
+while not self.interrupt:
 while not microphone.q.empty():
 data = microphone.q.get()
 
```
```diff
@@ -105,7 +121,7 @@ class AbstractTranscriber:
 
 else:
 if not previous_waiting:
-
+self.log("Silence detected...waiting for more audio")
 
 if self.is_overtime():
 raise StopRecording("Overtime: {:.2f} seconds".format(self.get_elapsed()))
```
```diff
@@ -123,7 +139,7 @@ class AbstractTranscriber:
 self.busy = False
 yield result
 
-
+self.log(stop_message)
 
 
 def get_vosk_model(model, download_root=None, url=None):
```
```diff
@@ -198,7 +214,7 @@ class WhisperTranscriber(AbstractTranscriber):
 super().__init__(model, model_name, language, model_kwargs=model_kwargs, **kwargs)
 
 def transcribe_audio(self, audio_bytes):
-
+self.log("\nTranscribing")
 audio_array = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
 return self.model.transcribe(audio_array, fp16=False, language=self.language)
 
```
```diff
@@ -207,4 +223,34 @@ class WhisperTranscriber(AbstractTranscriber):
 return {"text": ""}
 result = self.transcribe_audio(self.audio_buffer)
 self.audio_buffer = b''
-return result
+return result
+
+
+class OpenaiAPITranscriber(WhisperTranscriber):
+backend = "openaiapi"
+
+def __init__(self, model_name="whisper-1", language=None, model_kwargs={}, model=None, api_key=None, **kwargs):
+if model is None:
+import openai
+model = openai.OpenAI(
+api_key=api_key or openai.api_key,
+# 20 seconds (default is 10 minutes)
+timeout=20.0,
+)
+AbstractTranscriber.__init__(self, model, model_name, language, model_kwargs=model_kwargs, **kwargs)
+
+def transcribe_audio(self, audio_bytes):
+self.log("\nTranscribing")
+import io
+import soundfile as sf
+audio_data = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
+# Write the audio data to an in-memory file in WAV format
+buffer = io.BytesIO()
+sf.write(buffer, audio_data, self.samplerate, format='WAV')
+buffer.seek(0)
+buffer.name = "audio.wav" # Set a filename with a valid extension
+transcription = self.model.audio.transcriptions.create(
+model=self.model_name,
+file=buffer,
+)
+return {"text": transcription.text}
```
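`OpenaiAPITranscriber.transcribe_audio` converts the raw int16 buffer to float32 and lets `soundfile` write an in-memory WAV, naming the buffer `audio.wav` so the upload has a recognisable extension. As an alternative technique, the same container can be produced with the standard-library `wave` module while keeping the int16 samples, assuming mono 16-bit input as the diff's `np.int16` interpretation implies. `pcm16_to_wav_buffer` is a hypothetical helper, not part of scribe.

```python
import io
import wave


def pcm16_to_wav_buffer(audio_bytes: bytes, samplerate: int = 16000) -> io.BytesIO:
    """Wrap raw mono 16-bit PCM (native byte order) in an in-memory WAV file.

    Unlike the diff, which converts to float32 and writes with soundfile,
    this uses the standard-library `wave` module and keeps the int16 samples.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:  # closing the writer leaves the BytesIO open
        wav.setnchannels(1)               # assumption: mono microphone stream
        wav.setsampwidth(2)               # 2 bytes per sample = int16
        wav.setframerate(samplerate)
        wav.writeframes(audio_bytes)
    buffer.seek(0)
    buffer.name = "audio.wav"  # as in the diff, give the upload a valid extension
    return buffer
```

Since `soundfile`'s default WAV subtype is `PCM_16`, both routes should end up as 16-bit PCM; the `wave` version simply skips the float round trip and the extra dependency.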
```diff
@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.
+Version: 0.10.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
```
```diff
@@ -55,9 +55,14 @@ Requires-Dist: vosk; extra == "vosk"
 Provides-Extra: app
 Requires-Dist: pystray; extra == "app"
 Requires-Dist: PyGObject; extra == "app"
+Provides-Extra: openai
+Requires-Dist: openai; extra == "openai"
+Requires-Dist: soundfile; extra == "openai"
 Provides-Extra: all
 Requires-Dist: pynput; extra == "all"
 Requires-Dist: openai-whisper; extra == "all"
+Requires-Dist: openai; extra == "all"
+Requires-Dist: soundfile; extra == "all"
 Requires-Dist: vosk; extra == "all"
 Requires-Dist: pystray; extra == "all"
 
```
```diff
@@ -66,7 +71,9 @@ Requires-Dist: pystray; extra == "all"
 
 # Scribe <img src="scribe_data/share/icon.png" width=48px>
 
-`scribe` is a
+`scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+
+It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client to open AI via `openaiapi` backend (API key required).
 
 ## Compatibility
 
```
@@ -101,12 +108,10 @@ cd scribe
 pip install -e .[all]
 ```
 
-You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-
-The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
-the default is left to the `openai-whisper` package and might change in the future).
+You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).
 
+The language models for local backends `vosk` and `whisper` will download on-the-fly.
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
 
 ## Usage
 
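The cache location described in the new README lines can be resolved as below. This is a sketch assuming the XDG Base Directory convention that an unset or empty `$XDG_CACHE_HOME` falls back to `$HOME/.cache`; `default_model_dir` is a hypothetical helper, and the actual lookup inside scribe may differ in details.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def default_model_dir(backend: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return $XDG_CACHE_HOME/{backend}, falling back to $HOME/.cache/{backend}."""
    env = os.environ if env is None else env
    # An unset or empty XDG_CACHE_HOME falls back to $HOME/.cache (XDG convention).
    cache_home = env.get("XDG_CACHE_HOME") or os.path.join(
        env.get("HOME") or str(Path.home()), ".cache"
    )
    return Path(cache_home) / backend
```

Taking `env` as an optional mapping lets the lookup be exercised without touching the real environment.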
@@ -115,7 +120,7 @@ Just type in the terminal:
 ```bash
 scribe
 ```
-and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+and the script will guide you through the choice of backend (`whisper` or `vosk` or `openaiapi`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
 You can interrupt the recording via Ctrl + C and start again or change model.
@@ -129,9 +134,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
 
-
+The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
 ```bash
-scribe --backend
+scribe --backend openaiapi --api YOURAPIKEY
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
 
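The `openaiapi` backend sends the recording to OpenAI's hosted `whisper-1` model, and the hunk ending at new line 256 above adds `return {"text": transcription.text}`. A minimal sketch of that round trip using the `openai` Python SDK (v1 interface, `client.audio.transcriptions.create`) and `soundfile`, both installed by the new `openai` extra. The helpers `encode_wav` and `transcribe_wav` are hypothetical, and passing the client in is a choice made here so the request shape can be checked without a network call.

```python
import io
from typing import Any, Dict


def encode_wav(samples: Any, samplerate: int) -> bytes:
    """Encode recorded samples (e.g. a NumPy array) as WAV bytes in memory."""
    import soundfile  # installed by the "openai" extra

    buf = io.BytesIO()
    soundfile.write(buf, samples, samplerate, format="WAV")
    return buf.getvalue()


def transcribe_wav(client: Any, wav_bytes: bytes, model: str = "whisper-1") -> Dict[str, str]:
    """Send WAV bytes to the transcription endpoint and return {"text": ...}."""
    transcription = client.audio.transcriptions.create(
        model=model,
        file=("recording.wav", wav_bytes),  # (filename, content) tuple
    )
    return {"text": transcription.text}
```

With a real key this would be used as `client = OpenAI(api_key="YOURAPIKEY")` (after `from openai import OpenAI`) followed by `transcribe_wav(client, encode_wav(samples, 16000))`, where `samples` is the recorded audio and 16000 Hz is only an example rate.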
@@ -190,7 +195,8 @@ To activate start with:
 ```bash
 scribe --app
 ```
-or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+or toggle the app option in the interactive menu. The scribe icon will show, with Record, Stop or Quit options. The icon will change based on what the app is doing.
+For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
 
 ```bash
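The new README text describes the tray icon as a small state machine whose visible states depend on the backend. A sketch of that description, with hypothetical state names; scribe's own icon handling is not part of this diff.

```python
from enum import Enum
from typing import Tuple


class TrayState(Enum):
    """Hypothetical tray-icon states, following the README's description."""
    IDLE = "idle"
    RECORDING = "recording"                                # whisper: capture audio only
    TRANSCRIBING = "transcribing"                          # whisper: process the finished clip
    RECORDING_AND_TRANSCRIBING = "recording+transcribing"  # vosk: both at once


def cycle(backend: str) -> Tuple[TrayState, ...]:
    """States the icon shows during one Record -> Stop cycle."""
    if backend == "vosk":
        # vosk transcribes while recording, so only two states are visible
        return (TrayState.IDLE, TrayState.RECORDING_AND_TRANSCRIBING, TrayState.IDLE)
    if backend == "whisper":
        # whisper records first, then transcribes the complete recording
        return (TrayState.IDLE, TrayState.RECORDING, TrayState.TRANSCRIBING, TrayState.IDLE)
    raise ValueError(f"unknown backend: {backend}")
```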
@@ -201,15 +207,20 @@ pip install PyGObject
 ## Start as an application in GNOME
 
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
-to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
+`--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
 
 e.g.
 
 ```bash
-scribe-install
+scribe-install
+scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
+scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
 ```
-
-
+This will install three separate apps:
+- `Super + scribe` : will launch the default version with terminal prompt
+- `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard.
+- `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal (you need to press Record in the tray icon menu to start the recording).
 
 
 ## Fine tuning
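`scribe-install` writes a `.desktop` launcher, and the package ships a template at `scribe_data/templates/scribe.desktop` whose contents are not shown in this diff. As a sketch under stated assumptions, the following builds a comparable entry from scratch with standard Desktop Entry keys, applying the README rule that `--no-terminal` implies `--app --no-prompt` (and, here, `Terminal=false`). `desktop_entry` and `quote_exec_arg` are hypothetical names, and the quoting covers only a simplified subset of the Desktop Entry `Exec` rules.

```python
from typing import List

# Characters the Desktop Entry spec reserves inside Exec= arguments.
_RESERVED = set(" \t\n\"'\\><~|&;$*?#()`")


def quote_exec_arg(arg: str) -> str:
    """Quote one argument for an Exec= line (simplified subset of the spec)."""
    if "\\" in arg:
        # Backslashes need two levels of escaping; out of scope for this sketch.
        raise ValueError("backslashes are not handled in this sketch")
    arg = arg.replace("%", "%%")  # a literal percent sign must be written as %%
    if arg and not _RESERVED.intersection(arg):
        return arg
    # Inside double quotes, escape ", ` and $ with a backslash.
    escaped = "".join("\\" + c if c in '"`$' else c for c in arg)
    return f'"{escaped}"'


def desktop_entry(name: str, scribe_args: List[str], no_terminal: bool = False) -> str:
    """Build the text of a hypothetical scribe launcher."""
    args = list(scribe_args)
    if no_terminal:
        # README: --no-terminal implies --app --no-prompt.
        for flag in ("--app", "--no-prompt"):
            if flag not in args:
                args.append(flag)
    exec_line = " ".join(["scribe"] + [quote_exec_arg(a) for a in args])
    lines = [
        "[Desktop Entry]",
        "Type=Application",
        f"Name={name}",
        f"Exec={exec_line}",
        f"Terminal={'false' if no_terminal else 'true'}",
    ]
    return "\n".join(lines) + "\n"
```

Saving the returned text as a `.desktop` file under `$HOME/.local/share/applications` is what makes GNOME list the launcher; the file name scribe itself uses is not visible here.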
File without changes