scribe-cli 0.4.1__tar.gz → 0.5.1__tar.gz
- {scribe_cli-0.4.1/scribe_cli.egg-info → scribe_cli-0.5.1}/PKG-INFO +5 -3
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/README.md +3 -2
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/pyproject.toml +1 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/_version.py +2 -2
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/models.py +8 -6
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/streamer.py +31 -10
- {scribe_cli-0.4.1 → scribe_cli-0.5.1/scribe_cli.egg-info}/PKG-INFO +5 -3
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe_cli.egg-info/requires.txt +1 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/.github/workflows/pypi.yml +0 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/.gitignore +0 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/LICENSE +0 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/__init__.py +0 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/audio.py +0 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/install_desktop.py +0 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/keyboard.py +0 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/models.toml +0 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/saverecording.py +0 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/testpynput.py +0 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/util.py +0 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe_cli.egg-info/SOURCES.txt +0 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe_cli.egg-info/dependency_links.txt +0 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe_cli.egg-info/entry_points.txt +0 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe_cli.egg-info/top_level.txt +0 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe_data/__init__.py +0 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe_data/share/icon.jpg +0 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe_data/templates/scribe.desktop +0 -0
- {scribe_cli-0.4.1 → scribe_cli-0.5.1}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.
+Version: 0.5.1
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -43,6 +43,7 @@ Requires-Dist: numpy
 Requires-Dist: sounddevice
 Requires-Dist: tqdm
 Requires-Dist: requests
+Requires-Dist: pyperclip
 Provides-Extra: keyboard
 Requires-Dist: pynput; extra == "keyboard"
 Provides-Extra: whisper
@@ -56,7 +57,7 @@ Requires-Dist: vosk; extra == "all"
 
 # Scribe
 
-`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
+`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.
 
 ## Installation
 
@@ -99,7 +100,7 @@ scribe
 and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
-You can interrupt the recording via Ctrl + C and start again or change model.
+You can interrupt the recording via Ctrl + C and start again or change model. The full content of the transcription will be pasted to the clipboard by default, until interruption.
 
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
@@ -118,6 +119,7 @@ where `--no-prompt` jumps right to the recording (after the first interruption,
 
 ### Advanced usage as keyboard replacement
 
+By default the content of the transcription is pasted to the clipboard, but is not propagated further.
 With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
 
 ```bash
@@ -1,6 +1,6 @@
 # Scribe
 
-`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
+`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.
 
 ## Installation
 
@@ -43,7 +43,7 @@ scribe
 and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
-You can interrupt the recording via Ctrl + C and start again or change model.
+You can interrupt the recording via Ctrl + C and start again or change model. The full content of the transcription will be pasted to the clipboard by default, until interruption.
 
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
@@ -62,6 +62,7 @@ where `--no-prompt` jumps right to the recording (after the first interruption,
 
 ### Advanced usage as keyboard replacement
 
+By default the content of the transcription is pasted to the clipboard, but is not propagated further.
 With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
 
 ```bash
@@ -1,6 +1,7 @@
 import os
 import json
 import numpy as np
+import time
 from scribe.util import download_model
 
 VOSK_MODELS_FOLDER = os.path.join(os.environ.get("HOME"),
@@ -9,21 +10,22 @@ VOSK_MODELS_FOLDER = os.path.join(os.environ.get("HOME"),
 
 class AbstractTranscriber:
     backend = None
-    def __init__(self, model, model_name=None, language=None, samplerate=16000,
+    def __init__(self, model, model_name=None, language=None, samplerate=16000, timeout=None, model_kwargs={}):
         self.model_name = model_name
         self.language = language
         self.model = model
         self.model_kwargs = model_kwargs
         self.samplerate = samplerate
-        self.
+        self.timeout = timeout
         self.one_second_bytes = self.samplerate * 2 # 16-bit audio, 1 channel ~ 32000 bytes
         self.audio_buffer = b''
+        self.start_time = time.time()
 
-    def get_elapsed(self
-        return
+    def get_elapsed(self):
+        return time.time() - self.start_time
 
-    def is_overtime(self
-        return
+    def is_overtime(self):
+        return time.time() - self.start_time > self.timeout
 
     def transcribe_realtime_audio(self, audio_bytes=b""):
         self.audio_buffer += audio_bytes
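The `timeout`, `get_elapsed` and `is_overtime` additions in the hunk above amount to a small wall-clock timer. The following is a minimal sketch of that idea, not the package's code: `RecordingTimer` and its injectable `clock` parameter are hypothetical names introduced here for illustration, and `time.monotonic` is swapped in for `time.time`. It also assumes that `timeout=None` should mean "never over time", because comparing a float with `None` raises `TypeError` in Python 3.

```python
import time


class RecordingTimer:
    """Hypothetical helper mirroring the timeout logic of the hunk above."""

    def __init__(self, timeout=None, clock=time.monotonic):
        self.timeout = timeout      # seconds, or None for "no limit" (assumption)
        self._clock = clock         # injectable so the timer can be tested
        self.start_time = clock()

    def get_elapsed(self):
        # Seconds since the timer was created.
        return self._clock() - self.start_time

    def is_overtime(self):
        # `elapsed > None` raises TypeError in Python 3, so guard explicitly.
        if self.timeout is None:
            return False
        return self.get_elapsed() > self.timeout
```

Whether the package ever calls `is_overtime` on a transcriber created with `timeout=None` is not visible in this diff, so the guard may or may not be needed there.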
@@ -12,14 +12,25 @@ language_config = language_config_default.copy()
 
 
 # Start recording
-def start_recording(micro, transcriber, keyboard=False, latency=0):
+def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0):
 
     if keyboard:
         try:
             from scribe.keyboard import type_text
         except ImportError:
             keyboard = False
-
+            print("Keyboard simulation is not available.")
+            return
+
+    if clipboard:
+        try:
+            import pyperclip
+        except ImportError:
+            clipboard = False
+            print("Clipboard simulation is not available.")
+            return
+
+    fulltext = ""
 
     greetings = { k: v for k, v in language_config["_meta"].get(transcriber.language, {}).items()
                  if v is not None and k.startswith(("start", "stop"))
@@ -32,6 +43,11 @@ def start_recording(micro, transcriber, keyboard=False, latency=0):
             print(result.get('text'))
             if keyboard:
                 type_text(result['text'] + " ", interval=latency) # Simulate typing
+
+            if clipboard:
+                fulltext += result['text'] + " "
+                pyperclip.copy(fulltext)
+
         else:
             print_partial(result.get('partial', ''))
 
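The clipboard change above accumulates every finished segment into one string and re-copies the whole string each time, so the clipboard always holds the full transcript so far. A minimal sketch of that pattern follows. The function name `make_clipboard_sink` and the injectable `copy` argument are hypothetical and not part of scribe; only `pyperclip.copy` is taken from the diff, and unlike the diff (which returns early when the import fails) this sketch falls back to a no-op.

```python
def make_clipboard_sink(copy=None):
    """Return a function that appends text and copies the running transcript.

    `copy` defaults to pyperclip.copy when pyperclip is installed; otherwise
    the sink still accumulates text but copies it nowhere (assumed fallback).
    """
    if copy is None:
        try:
            import pyperclip
            copy = pyperclip.copy
        except ImportError:
            print("Clipboard is not available.")
            copy = lambda text: None

    fulltext = ""

    def push(text):
        nonlocal fulltext
        fulltext += text + " "   # same separator as in the hunk above
        copy(fulltext)           # the clipboard always holds the full transcript
        return fulltext

    return push
```

Passing a plain callable such as `seen.append` as `copy` makes the accumulation easy to check without a real clipboard.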
@@ -136,7 +152,7 @@ def get_transcriber(o, prompt=True):
             transcriber = VoskTranscriber(model_name=model,
                                           language=o.language,
                                           samplerate=o.samplerate,
-
+                                          timeout=None, # vosk keeps going (no timeout)
                                           model_kwargs={"data_folder": o.data_folder})
         except Exception as error:
             print(error)
@@ -144,7 +160,7 @@ def get_transcriber(o, prompt=True):
             exit(1)
 
     elif backend == "whisper":
-        transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate,
+        transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate, timeout=o.duration)
 
     else:
         raise ValueError(f"Unknown backend: {backend}")
@@ -170,6 +186,7 @@ def get_parser():
     parser.add_argument("--samplerate", default=16000, type=int, help=argparse.SUPPRESS)
     parser.add_argument("--duration", default=60, type=int, help="duration in seconds before whisper models start transcribing (default %(default)ss)")
     parser.add_argument("--keyboard", action="store_true")
+    parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
     parser.add_argument("--latency", default=0, type=float, help="keyboard latency")
 
     parser.add_argument("--data-folder", help="Folder to store Vosk models.")
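The new `--no-clipboard` flag uses `dest="clipboard"` with `action="store_false"`, which makes `clipboard` default to `True` and flips it to `False` only when the flag is given. A small standalone sketch of just these two options, using the standard `argparse` module (the variable names `defaults` and `opted_out` are illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--keyboard", action="store_true")
# store_false implies a default of True for the destination attribute
parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")

defaults = parser.parse_args([])
opted_out = parser.parse_args(["--no-clipboard"])
print(defaults.clipboard, opted_out.clipboard)  # True False
```

This matches the `clipboard=True` default of `start_recording` in the same release: clipboard copying is opt-out, while keyboard simulation remains opt-in.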
@@ -191,14 +208,15 @@ def main(args=None):
     while True:
         if transcriber is None:
             transcriber = get_transcriber(o, prompt=o.prompt)
-        print(f"[ Model {transcriber.model_name} from {transcriber.backend} selected. ]")
+        print(f"[ Model {transcriber.model_name} from {transcriber.backend} selected. Keyboard [{'on' if o.keyboard else 'off'}]. Clipboard [{'on' if o.clipboard else 'off'}]]")
         if o.prompt:
             print(f"Choose any of the following actions:")
             print(f"[q] quit")
             print(f"[e] change model")
-            print(f"[k] toggle keyboard {'off' if o.keyboard else 'on'}")
+            print(f"[k] toggle keyboard [{'off' if o.keyboard else 'on'}]")
+            print(f"[c] toggle clipboard [{'off' if o.clipboard else 'on'}]")
             if transcriber.backend == "whisper":
-                print(f"[t] change duration (currently {transcriber.
+                print(f"[t] change duration (currently {transcriber.timeout}s)")
             print(colored(f"Press [Enter] or any other key to start recording.", "BOLD"))
 
             key = input()
@@ -210,15 +228,18 @@ def main(args=None):
             if key == "k":
                 o.keyboard = not o.keyboard
                 continue
+            if key == "c":
+                o.clipboard = not o.clipboard
+                continue
             if key == "t":
-                duration = input(f"Enter new duration in seconds (current: {transcriber.
+                duration = input(f"Enter new duration in seconds (current: {transcriber.timeout}): ")
                 try:
-                    o.duration = transcriber.
+                    o.duration = transcriber.timeout = int(duration)
                 except:
                     print("Invalid duration. Must be an integer.")
                 continue
 
-        start_recording(micro, transcriber, keyboard=o.keyboard, latency=o.latency)
+        start_recording(micro, transcriber, clipboard=o.clipboard, keyboard=o.keyboard, latency=o.latency)
 
         # if we arrived so far, that means we pressed Ctrl + C anyway, and need Enter to move on.
         # So we leave the wider range of options to change the model.
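The `[t]` branch above converts the typed duration with `int(...)` inside a bare `except:`, which also swallows `KeyboardInterrupt`, a relevant detail for a tool driven by Ctrl + C. As a hedged alternative, here is a sketch that catches only `ValueError`. The helper name `parse_duration` is hypothetical and does not appear in the package.

```python
def parse_duration(raw, current):
    """Return `raw` as an integer number of seconds, or `current` if invalid."""
    try:
        return int(raw)
    except ValueError:
        # Only malformed numbers end up here; Ctrl + C still propagates.
        print("Invalid duration. Must be an integer.")
        return current
```

Under that assumption the branch could assign `o.duration = transcriber.timeout = parse_duration(duration, transcriber.timeout)`, leaving the previous value untouched on bad input.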
@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.
+Version: 0.5.1
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -43,6 +43,7 @@ Requires-Dist: numpy
 Requires-Dist: sounddevice
 Requires-Dist: tqdm
 Requires-Dist: requests
+Requires-Dist: pyperclip
 Provides-Extra: keyboard
 Requires-Dist: pynput; extra == "keyboard"
 Provides-Extra: whisper
@@ -56,7 +57,7 @@ Requires-Dist: vosk; extra == "all"
 
 # Scribe
 
-`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
+`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.
 
 ## Installation
 
@@ -99,7 +100,7 @@ scribe
 and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
-You can interrupt the recording via Ctrl + C and start again or change model.
+You can interrupt the recording via Ctrl + C and start again or change model. The full content of the transcription will be pasted to the clipboard by default, until interruption.
 
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
@@ -118,6 +119,7 @@ where `--no-prompt` jumps right to the recording (after the first interruption,
 
 ### Advanced usage as keyboard replacement
 
+By default the content of the transcription is pasted to the clipboard, but is not propagated further.
 With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
 
 ```bash
The remaining 19 files, listed above with +0 -0, are unchanged.