scribe-cli 0.4.1__tar.gz → 0.5.1__tar.gz

Files changed (27)
  1. {scribe_cli-0.4.1/scribe_cli.egg-info → scribe_cli-0.5.1}/PKG-INFO +5 -3
  2. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/README.md +3 -2
  3. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/pyproject.toml +1 -0
  4. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/_version.py +2 -2
  5. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/models.py +8 -6
  6. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/streamer.py +31 -10
  7. {scribe_cli-0.4.1 → scribe_cli-0.5.1/scribe_cli.egg-info}/PKG-INFO +5 -3
  8. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe_cli.egg-info/requires.txt +1 -0
  9. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/.github/workflows/pypi.yml +0 -0
  10. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/.gitignore +0 -0
  11. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/LICENSE +0 -0
  12. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/__init__.py +0 -0
  13. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/audio.py +0 -0
  14. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/install_desktop.py +0 -0
  15. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/keyboard.py +0 -0
  16. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/models.toml +0 -0
  17. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/saverecording.py +0 -0
  18. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/testpynput.py +0 -0
  19. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe/util.py +0 -0
  20. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe_cli.egg-info/SOURCES.txt +0 -0
  21. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe_cli.egg-info/dependency_links.txt +0 -0
  22. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe_cli.egg-info/entry_points.txt +0 -0
  23. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe_cli.egg-info/top_level.txt +0 -0
  24. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe_data/__init__.py +0 -0
  25. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe_data/share/icon.jpg +0 -0
  26. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/scribe_data/templates/scribe.desktop +0 -0
  27. {scribe_cli-0.4.1 → scribe_cli-0.5.1}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.2
  Name: scribe-cli
- Version: 0.4.1
+ Version: 0.5.1
  Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
  Author-email: Mahé Perrette <mahe.perrette@gmail.com>
  License: MIT License
@@ -43,6 +43,7 @@ Requires-Dist: numpy
  Requires-Dist: sounddevice
  Requires-Dist: tqdm
  Requires-Dist: requests
+ Requires-Dist: pyperclip
  Provides-Extra: keyboard
  Requires-Dist: pynput; extra == "keyboard"
  Provides-Extra: whisper
@@ -56,7 +57,7 @@ Requires-Dist: vosk; extra == "all"

  # Scribe

- `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
+ `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.

  ## Installation

@@ -99,7 +100,7 @@ scribe
  and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
  After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
  or until after recording is complete (`whisper`).
- You can interrupt the recording via Ctrl + C and start again or change model.
+ You can interrupt the recording via Ctrl + C and start again or change model. The full content of the transcription will be pasted to the clipboard by default, until interruption.

  The default (`whisper`) is excellent at transcribing full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
  but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
@@ -118,6 +119,7 @@ where `--no-prompt` jumps right to the recording (after the first interruption,

  ### Advanced usage as keyboard replacement

+ By default the content of the transcription is pasted to the clipboard, but is not propagated further.
  With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:

  ```bash
@@ -1,6 +1,6 @@
  # Scribe

- `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
+ `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.

  ## Installation

@@ -43,7 +43,7 @@ scribe
  and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
  After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
  or until after recording is complete (`whisper`).
- You can interrupt the recording via Ctrl + C and start again or change model.
+ You can interrupt the recording via Ctrl + C and start again or change model. The full content of the transcription will be pasted to the clipboard by default, until interruption.

  The default (`whisper`) is excellent at transcribing full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
  but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
@@ -62,6 +62,7 @@ where `--no-prompt` jumps right to the recording (after the first interruption,

  ### Advanced usage as keyboard replacement

+ By default the content of the transcription is pasted to the clipboard, but is not propagated further.
  With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:

  ```bash
@@ -17,6 +17,7 @@ dependencies = [
      "sounddevice",
      "tqdm",
      "requests",
+     "pyperclip",
  ]
  optional-dependencies = { keyboard = ["pynput"], whisper = ["openai-whisper"], vosk = ["vosk"], all = ["pynput", "openai-whisper", "vosk"] }

@@ -12,5 +12,5 @@ __version__: str
  __version_tuple__: VERSION_TUPLE
  version_tuple: VERSION_TUPLE

- __version__ = version = '0.4.1'
- __version_tuple__ = version_tuple = (0, 4, 1)
+ __version__ = version = '0.5.1'
+ __version_tuple__ = version_tuple = (0, 5, 1)
@@ -1,6 +1,7 @@
  import os
  import json
  import numpy as np
+ import time
  from scribe.util import download_model

  VOSK_MODELS_FOLDER = os.path.join(os.environ.get("HOME"),
@@ -9,21 +10,22 @@ VOSK_MODELS_FOLDER = os.path.join(os.environ.get("HOME"),

  class AbstractTranscriber:
      backend = None
-     def __init__(self, model, model_name=None, language=None, samplerate=16000, max_duration=None, model_kwargs={}):
+     def __init__(self, model, model_name=None, language=None, samplerate=16000, timeout=None, model_kwargs={}):
          self.model_name = model_name
          self.language = language
          self.model = model
          self.model_kwargs = model_kwargs
          self.samplerate = samplerate
-         self.max_duration = max_duration
+         self.timeout = timeout
          self.one_second_bytes = self.samplerate * 2 # 16-bit audio, 1 channel ~ 32000 bytes
          self.audio_buffer = b''
+         self.start_time = time.time()

-     def get_elapsed(self, size=None):
-         return len(size or self.audio_buffer) / self.one_second_bytes
+     def get_elapsed(self):
+         return time.time() - self.start_time

-     def is_overtime(self, elapsed=None, size=None):
-         return self.max_duration and (elapsed or self.get_elapsed(size)) > self.max_duration
+     def is_overtime(self):
+         return time.time() - self.start_time > self.timeout

      def transcribe_realtime_audio(self, audio_bytes=b""):
          self.audio_buffer += audio_bytes
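Review note on the hunk above: elapsed time is now measured with the wall clock instead of being derived from the audio buffer length. The following is a minimal sketch of that logic, assuming a hypothetical `TimeoutClock` helper (not part of scribe) with an injectable clock so it can be exercised without waiting. The `timeout is not None` guard is an addition of this sketch: the 0.5.1 `is_overtime()` compares against `self.timeout` directly, and the Vosk path passes `timeout=None`, so a call there would raise `TypeError` in Python 3 if that method is ever reached.

```python
import time


class TimeoutClock:
    """Hypothetical helper mirroring the wall-clock timeout logic of the hunk."""

    def __init__(self, timeout=None, clock=time.time):
        self.timeout = timeout          # seconds, or None for "no limit"
        self._clock = clock             # injectable for testing (assumption)
        self.start_time = clock()       # set at construction, as in the diff

    def get_elapsed(self):
        # Wall-clock seconds since start_time.
        return self._clock() - self.start_time

    def is_overtime(self):
        # Guard added in this sketch: `elapsed > None` raises TypeError,
        # so a missing timeout is treated as "never over time".
        return self.timeout is not None and self.get_elapsed() > self.timeout


if __name__ == "__main__":
    now = [100.0]                       # fake clock value, advanced by hand
    clock = TimeoutClock(timeout=5, clock=lambda: now[0])
    now[0] = 106.0
    print(clock.get_elapsed(), clock.is_overtime())
```

Note that `start_time` is taken when the object is built. Whether scribe resets it when a recording actually starts is not visible in this diff.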
@@ -12,14 +12,25 @@ language_config = language_config_default.copy()


  # Start recording
- def start_recording(micro, transcriber, keyboard=False, latency=0):
+ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0):

      if keyboard:
          try:
              from scribe.keyboard import type_text
          except ImportError:
              keyboard = False
-             exit(1)
+             print("Keyboard simulation is not available.")
+             return
+
+     if clipboard:
+         try:
+             import pyperclip
+         except ImportError:
+             clipboard = False
+             print("Clipboard simulation is not available.")
+             return
+
+     fulltext = ""

      greetings = { k: v for k, v in language_config["_meta"].get(transcriber.language, {}).items()
          if v is not None and k.startswith(("start", "stop"))
@@ -32,6 +43,11 @@ def start_recording(micro, transcriber, keyboard=False, latency=0):
              print(result.get('text'))
              if keyboard:
                  type_text(result['text'] + " ", interval=latency) # Simulate typing
+
+             if clipboard:
+                 fulltext += result['text'] + " "
+                 pyperclip.copy(fulltext)
+
          else:
              print_partial(result.get('partial', ''))

@@ -136,7 +152,7 @@ def get_transcriber(o, prompt=True):
              transcriber = VoskTranscriber(model_name=model,
                                            language=o.language,
                                            samplerate=o.samplerate,
-                                           max_duration=None, # vosk keeps going (no timeout)
+                                           timeout=None, # vosk keeps going (no timeout)
                                            model_kwargs={"data_folder": o.data_folder})
          except Exception as error:
              print(error)
@@ -144,7 +160,7 @@ def get_transcriber(o, prompt=True):
              exit(1)

      elif backend == "whisper":
-         transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate, max_duration=o.duration)
+         transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate, timeout=o.duration)

      else:
          raise ValueError(f"Unknown backend: {backend}")
@@ -170,6 +186,7 @@ def get_parser():
      parser.add_argument("--samplerate", default=16000, type=int, help=argparse.SUPPRESS)
      parser.add_argument("--duration", default=60, type=int, help="duration in seconds before whisper models start transcribing (default %(default)ss)")
      parser.add_argument("--keyboard", action="store_true")
+     parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
      parser.add_argument("--latency", default=0, type=float, help="keyboard latency")

      parser.add_argument("--data-folder", help="Folder to store Vosk models.")
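Review note on the new `--no-clipboard` flag: it uses the standard `argparse` pattern of a negative flag writing to a positive destination. The sketch below reproduces only the two switches from the hunk in a standalone parser; the function name `build_parser` and the `prog` value are placeholders for illustration, not scribe's own.

```python
import argparse


def build_parser():
    # Placeholder parser containing only the two switches shown in the hunk.
    parser = argparse.ArgumentParser(prog="scribe-sketch")
    parser.add_argument("--keyboard", action="store_true")
    # store_false defaults to True, so the option is on unless the flag is given.
    parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    print(parser.parse_args([]))
    print(parser.parse_args(["--no-clipboard", "--keyboard"]))
```

Because `store_false` defaults to `True`, `o.clipboard` is on unless `--no-clipboard` is passed, which matches the `clipboard=True` default of `start_recording`.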
@@ -191,14 +208,15 @@ def main(args=None):
      while True:
          if transcriber is None:
              transcriber = get_transcriber(o, prompt=o.prompt)
-         print(f"[ Model {transcriber.model_name} from {transcriber.backend} selected. ]")
+         print(f"[ Model {transcriber.model_name} from {transcriber.backend} selected. Keyboard [{'on' if o.keyboard else 'off'}]. Clipboard [{'on' if o.clipboard else 'off'}]]")
          if o.prompt:
              print(f"Choose any of the following actions:")
              print(f"[q] quit")
              print(f"[e] change model")
-             print(f"[k] toggle keyboard {'off' if o.keyboard else 'on'}")
+             print(f"[k] toggle keyboard [{'off' if o.keyboard else 'on'}]")
+             print(f"[c] toggle clipboard [{'off' if o.clipboard else 'on'}]")
              if transcriber.backend == "whisper":
-                 print(f"[t] change duration (currently {transcriber.max_duration}s)")
+                 print(f"[t] change duration (currently {transcriber.timeout}s)")
              print(colored(f"Press [Enter] or any other key to start recording.", "BOLD"))

              key = input()
@@ -210,15 +228,18 @@ def main(args=None):
              if key == "k":
                  o.keyboard = not o.keyboard
                  continue
+             if key == "c":
+                 o.clipboard = not o.clipboard
+                 continue
              if key == "t":
-                 duration = input(f"Enter new duration in seconds (current: {transcriber.max_duration}): ")
+                 duration = input(f"Enter new duration in seconds (current: {transcriber.timeout}): ")
                  try:
-                     o.duration = transcriber.max_duration = int(duration)
+                     o.duration = transcriber.timeout = int(duration)
                  except:
                      print("Invalid duration. Must be an integer.")
                  continue

-         start_recording(micro, transcriber, keyboard=o.keyboard, latency=o.latency)
+         start_recording(micro, transcriber, clipboard=o.clipboard, keyboard=o.keyboard, latency=o.latency)

          # if we arrived so far, that means we pressed Ctrl + C anyway, and need Enter to move on.
          # So we leave the wider range of options to change the model.
@@ -1,6 +1,6 @@
  Metadata-Version: 2.2
  Name: scribe-cli
- Version: 0.4.1
+ Version: 0.5.1
  Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
  Author-email: Mahé Perrette <mahe.perrette@gmail.com>
  License: MIT License
@@ -43,6 +43,7 @@ Requires-Dist: numpy
  Requires-Dist: sounddevice
  Requires-Dist: tqdm
  Requires-Dist: requests
+ Requires-Dist: pyperclip
  Provides-Extra: keyboard
  Requires-Dist: pynput; extra == "keyboard"
  Provides-Extra: whisper
@@ -56,7 +57,7 @@ Requires-Dist: vosk; extra == "all"

  # Scribe

- `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
+ `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.

  ## Installation

@@ -99,7 +100,7 @@ scribe
  and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
  After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
  or until after recording is complete (`whisper`).
- You can interrupt the recording via Ctrl + C and start again or change model.
+ You can interrupt the recording via Ctrl + C and start again or change model. The full content of the transcription will be pasted to the clipboard by default, until interruption.

  The default (`whisper`) is excellent at transcribing full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
  but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
@@ -118,6 +119,7 @@ where `--no-prompt` jumps right to the recording (after the first interruption,

  ### Advanced usage as keyboard replacement

+ By default the content of the transcription is pasted to the clipboard, but is not propagated further.
  With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:

  ```bash
@@ -2,6 +2,7 @@ numpy
  sounddevice
  tqdm
  requests
+ pyperclip

  [all]
  pynput
File without changes