scribe-cli 0.8.0__tar.gz → 0.10.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31)
  1. {scribe_cli-0.8.0/scribe_cli.egg-info → scribe_cli-0.10.0}/PKG-INFO +26 -15
  2. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/README.md +20 -14
  3. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/pyproject.toml +2 -1
  4. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/_version.py +2 -2
  5. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/app.py +32 -21
  6. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/install_desktop.py +14 -3
  7. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/models.py +54 -8
  8. {scribe_cli-0.8.0 → scribe_cli-0.10.0/scribe_cli.egg-info}/PKG-INFO +26 -15
  9. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/requires.txt +6 -0
  10. scribe_cli-0.10.0/scribe_data/templates/scribe.desktop +8 -0
  11. scribe_cli-0.8.0/scribe_data/templates/scribe.desktop +0 -8
  12. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/.github/workflows/pypi.yml +0 -0
  13. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/.gitignore +0 -0
  14. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/LICENSE +0 -0
  15. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/icon.xcf +0 -0
  16. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/__init__.py +0 -0
  17. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/audio.py +0 -0
  18. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/keyboard.py +0 -0
  19. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/models.toml +0 -0
  20. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/saverecording.py +0 -0
  21. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/testpynput.py +0 -0
  22. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/util.py +0 -0
  23. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/SOURCES.txt +0 -0
  24. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/dependency_links.txt +0 -0
  25. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/entry_points.txt +0 -0
  26. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/top_level.txt +0 -0
  27. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_data/__init__.py +0 -0
  28. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_data/share/icon.png +0 -0
  29. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_data/share/icon_recording.png +0 -0
  30. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_data/share/icon_writing.png +0 -0
  31. {scribe_cli-0.8.0 → scribe_cli-0.10.0}/setup.cfg +0 -0
{scribe_cli-0.8.0/scribe_cli.egg-info → scribe_cli-0.10.0}/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.2
  Name: scribe-cli
- Version: 0.8.0
+ Version: 0.10.0
  Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
  Author-email: Mahé Perrette <mahe.perrette@gmail.com>
  License: MIT License
@@ -55,9 +55,14 @@ Requires-Dist: vosk; extra == "vosk"
  Provides-Extra: app
  Requires-Dist: pystray; extra == "app"
  Requires-Dist: PyGObject; extra == "app"
+ Provides-Extra: openai
+ Requires-Dist: openai; extra == "openai"
+ Requires-Dist: soundfile; extra == "openai"
  Provides-Extra: all
  Requires-Dist: pynput; extra == "all"
  Requires-Dist: openai-whisper; extra == "all"
+ Requires-Dist: openai; extra == "all"
+ Requires-Dist: soundfile; extra == "all"
  Requires-Dist: vosk; extra == "all"
  Requires-Dist: pystray; extra == "all"

@@ -66,7 +71,9 @@ Requires-Dist: pystray; extra == "all"

  # Scribe <img src="scribe_data/share/icon.png" width=48px>

- `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+ `scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+
+ It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client to open AI via `openaiapi` backend (API key required).

  ## Compatibility

@@ -101,12 +108,10 @@ cd scribe
  pip install -e .[all]
  ```

- You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-
- The `vosk` language models will download on-the-fly.
- The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
- the default is left to the `openai-whisper` package and might change in the future).
+ You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).

+ The language models for local backends `vosk` and `whisper` will download on-the-fly.
+ The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.

  ## Usage

@@ -115,7 +120,7 @@ Just type in the terminal:
  ```bash
  scribe
  ```
- and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+ and the script will guide you through the choice of backend (`whisper` or `vosk` or `openaiapi`) and the specific language model.
  After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
  or until after recording is complete (`whisper`).
  You can interrupt the recording via Ctrl + C and start again or change model.
@@ -129,9 +134,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
  It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
  There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).

- To skip the initial selection menu you can do:
+ The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
  ```bash
- scribe --backend whisper --model small --no-prompt
+ scribe --backend openaiapi --api YOURAPIKEY
  ```
  where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).

@@ -190,7 +195,8 @@ To activate start with:
  ```bash
  scribe --app
  ```
- or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+ or toggle the app option in the interactive menu. The scribe icon will show, with Record, Stop or Quit options. The icon will change based on what the app is doing.
+ For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
  That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:

  ```bash
@@ -201,15 +207,20 @@ pip install PyGObject
  ## Start as an application in GNOME

  If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
- to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+ to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
+ `--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.

  e.g.

  ```bash
- scribe-install --backend whisper --model small --clipboard --keyboard --restart-after-silence
+ scribe-install
+ scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
+ scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
  ```
-
- After that just typing Super + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+ This will install three separate apps:
+ - `Super + scribe` : will launch the default version with terminal prompt
+ - `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard.
+ - `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal (you need to press Record in the tray icon menu to start the recording).


  ## Fine tuning
{scribe_cli-0.8.0 → scribe_cli-0.10.0}/README.md
@@ -3,7 +3,9 @@

  # Scribe <img src="scribe_data/share/icon.png" width=48px>

- `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+ `scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+
+ It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client to open AI via `openaiapi` backend (API key required).

  ## Compatibility

@@ -38,12 +40,10 @@ cd scribe
  pip install -e .[all]
  ```

- You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-
- The `vosk` language models will download on-the-fly.
- The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
- the default is left to the `openai-whisper` package and might change in the future).
+ You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).

+ The language models for local backends `vosk` and `whisper` will download on-the-fly.
+ The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.

  ## Usage

@@ -52,7 +52,7 @@ Just type in the terminal:
  ```bash
  scribe
  ```
- and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+ and the script will guide you through the choice of backend (`whisper` or `vosk` or `openaiapi`) and the specific language model.
  After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
  or until after recording is complete (`whisper`).
  You can interrupt the recording via Ctrl + C and start again or change model.
@@ -66,9 +66,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
  It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
  There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).

- To skip the initial selection menu you can do:
+ The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
  ```bash
- scribe --backend whisper --model small --no-prompt
+ scribe --backend openaiapi --api YOURAPIKEY
  ```
  where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).

@@ -127,7 +127,8 @@ To activate start with:
  ```bash
  scribe --app
  ```
- or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+ or toggle the app option in the interactive menu. The scribe icon will show, with Record, Stop or Quit options. The icon will change based on what the app is doing.
+ For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
  That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:

  ```bash
@@ -138,15 +139,20 @@ pip install PyGObject
  ## Start as an application in GNOME

  If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
- to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+ to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
+ `--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.

  e.g.

  ```bash
- scribe-install --backend whisper --model small --clipboard --keyboard --restart-after-silence
+ scribe-install
+ scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
+ scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
  ```
-
- After that just typing Super + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+ This will install three separate apps:
+ - `Super + scribe` : will launch the default version with terminal prompt
+ - `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard.
+ - `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal (you need to press Record in the tray icon menu to start the recording).


  ## Fine tuning
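The README hunk above says the tray icon now reflects what the app is doing: two states for `vosk`, three for `whisper`. One possible way to derive the icon from the transcriber's flags is sketched below. It assumes the `recording` and `busy` attributes of `AbstractTranscriber` (visible in the models.py hunks) drive the choice; `TrayState` and `pick_icon` are hypothetical names and the mapping is an illustration, not scribe's code. Only the three file names come from `scribe_data/share`.

```python
from dataclasses import dataclass


@dataclass
class TrayState:
    # Hypothetical stand-in for the flags set on AbstractTranscriber
    recording: bool = False
    busy: bool = False


def pick_icon(state: TrayState) -> str:
    """Hypothetical mapping from transcriber flags to the shipped icon files."""
    if state.recording:
        # microphone open (with vosk, transcription also happens in this state)
        return "icon_recording.png"
    if state.busy:
        # recording finished but the model is still transcribing (whisper-style)
        return "icon_writing.png"
    # idle, waiting for the user to press Record
    return "icon.png"


print(pick_icon(TrayState(recording=True, busy=True)))  # icon_recording.png
```

With `vosk`, transcription runs while recording, so only the first and last branches would be reached, which matches the two states the README describes.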
{scribe_cli-0.8.0 → scribe_cli-0.10.0}/pyproject.toml
@@ -44,7 +44,8 @@ keyboard = ["pynput"]
  whisper = ["openai-whisper"]
  vosk = ["vosk"]
  app = ["pystray", "PyGObject"]
- all = ["pynput", "openai-whisper", "vosk", "pystray"]
+ openai = ["openai", "soundfile"]
+ all = ["pynput", "openai-whisper", "openai", "soundfile", "vosk", "pystray"]


  [tool.setuptools]
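The new `openai` extra pulls in both `openai` and `soundfile`, and the README asks for at least one backend extra to be installed. This release also drops the `check_dependencies` import from app.py. As an illustration only, here is how one could detect which backends are installed using the standard library's `importlib.util.find_spec`, which locates a module without importing it. `BACKEND_MODULES` and `available_backends` are hypothetical names; the module list assumes that `openai-whisper` is imported as `whisper`.

```python
import importlib.util

# Assumed mapping from scribe backend names to the top-level modules that the
# pyproject extras install ("openai-whisper" is imported as `whisper`).
BACKEND_MODULES = {
    "whisper": ["whisper"],
    "vosk": ["vosk"],
    "openaiapi": ["openai", "soundfile"],
}


def available_backends(backend_modules=BACKEND_MODULES):
    """List backends whose modules are all installed, without importing them."""
    return [
        backend
        for backend, modules in backend_modules.items()
        if all(importlib.util.find_spec(name) is not None for name in modules)
    ]


print(available_backends())  # result depends on which extras are installed
```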
{scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/_version.py
@@ -12,5 +12,5 @@ __version__: str
  __version_tuple__: VERSION_TUPLE
  version_tuple: VERSION_TUPLE

- __version__ = version = '0.8.0'
- __version_tuple__ = version_tuple = (0, 8, 0)
+ __version__ = version = '0.10.0'
+ __version_tuple__ = version_tuple = (0, 10, 0)
{scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/app.py
@@ -4,8 +4,8 @@ import re
  import time
  import argparse
  from scribe.audio import Microphone
- from scribe.util import print_partial, clear_line, prompt_choices, check_dependencies, ansi_link, colored
- from scribe.models import VoskTranscriber, WhisperTranscriber
+ from scribe.util import print_partial, clear_line, prompt_choices, ansi_link, colored
+ from scribe.models import VoskTranscriber, WhisperTranscriber, OpenaiAPITranscriber

  with open(Path(__file__).parent / "models.toml", "rb") as f:
  language_config_default = tomllib.load(f)
@@ -24,7 +24,7 @@ def get_default_backend():
  except ImportError:
  raise ImportError("Please install either vosk or whisper to use this script.")

- BACKENDS = ["whisper", "vosk"]
+ BACKENDS = ["whisper", "vosk", "openaiapi"]
  UNAVAILABLE_BACKENDS = []


@@ -59,6 +59,7 @@ def get_transcriber(o, prompt=True):

  whisper_models = ["tiny", "base", "small", "medium", "large", "turbo"]
  whisper_english_models = ["tiny.en", "base.en", "small.en", "medium.en"]
+ whisperapi_models = ["whisper-1"]

  if o.dummy:
  return DummyTranscriber("whisper", "dummy")
@@ -68,26 +69,17 @@ def get_transcriber(o, prompt=True):
  o.backend = "vosk"
  elif o.model in whisper_models + whisper_english_models:
  o.backend = "whisper"
+ elif o.model in whisperapi_models:
+ o.backend = "openaiapi"

  if o.backend:
- checked_backend = check_dependencies(o.backend)
- if not checked_backend:
- print(f"Backend {o.backend} is not available.")
- exit(1)
  backend = o.backend

  elif not prompt:
  backend = BACKENDS[0]

  else:
- checked_backend = False
- while not checked_backend:
- backend = prompt_choices(BACKENDS, o.backend, "backend", UNAVAILABLE_BACKENDS)
- # raise an error if the user has explicitly selected a backend that is not available
- checked_backend = check_dependencies(backend, raise_error=backend==o.backend)
- if not checked_backend:
- print(f"Backend {o.backend} is not available.")
- UNAVAILABLE_BACKENDS.append(backend)
+ backend = prompt_choices(BACKENDS, o.backend, "backend", UNAVAILABLE_BACKENDS)

  print(f"Selected backend: {backend}")

@@ -131,6 +123,13 @@ def get_transcriber(o, prompt=True):

  model = pick_specialist_model(model, o.language, backend)

+ elif backend == "openaiapi":
+ model = o.model or "whisper-1"
+
+ else:
+ raise ValueError(f"Unknown backend: {backend}")
+
+
  print(f"Selected model: {model}")

  if backend == "vosk":
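In these `get_transcriber()` hunks, a known model name now also selects the backend (`whisper-1` implies `openaiapi`), and the `openaiapi` model defaults to `whisper-1`. The sketch below isolates the name-based lookup as a pure function so it can be tested on its own. The model lists are copied from the diff; `infer_backend` is a hypothetical helper, and the `vosk` condition is omitted because it lies outside the visible hunk. In this sketch a recognized model name takes precedence over the `backend` argument; whether scribe behaves identically depends on the condition just above the hunk.

```python
from typing import Optional

# Model lists as they appear in get_transcriber() (app.py, 0.10.0)
WHISPER_MODELS = ["tiny", "base", "small", "medium", "large", "turbo"]
WHISPER_ENGLISH_MODELS = ["tiny.en", "base.en", "small.en", "medium.en"]
WHISPERAPI_MODELS = ["whisper-1"]


def infer_backend(model: Optional[str], backend: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: pick a backend from a known model name.

    The vosk branch of the real code is not reproduced; unrecognized
    names fall back to the explicitly given backend (possibly None).
    """
    if model in WHISPER_MODELS + WHISPER_ENGLISH_MODELS:
        return "whisper"
    if model in WHISPERAPI_MODELS:
        return "openaiapi"
    return backend


print(infer_backend("whisper-1"))  # openaiapi
```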
@@ -152,6 +151,12 @@ def get_transcriber(o, prompt=True):
  restart_after_silence=o.restart_after_silence,
  model_kwargs={"download_root": o.download_folder_whisper})

+ elif backend == "openaiapi":
+ transcriber = OpenaiAPITranscriber(model_name=model, samplerate=o.samplerate,
+ timeout=o.duration, silence_duration=o.silence, silence_thresh=o.silence_db,
+ restart_after_silence=o.restart_after_silence, api_key=o.api_key)
+
+
  else:
  raise ValueError(f"Unknown backend: {backend}")

@@ -195,6 +200,10 @@ def get_parser():
  group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in decibel (default %(default)s db)")
  group.add_argument("-a", "--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")

+ group = parser.add_argument_group("whisper api")
+ group.add_argument("--api-key",
+ help="API key for the Whisper API backend.")
+
  parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
  parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")
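The README example passes `--api YOURAPIKEY`, while `get_parser()` registers `--api-key`. This still works because `argparse` accepts any unambiguous prefix of a long option by default (`allow_abbrev=True`). A minimal parser that reproduces only the new "whisper api" group shows the effect; `build_parser` is a hypothetical name, not part of scribe.

```python
import argparse


def build_parser(allow_abbrev: bool = True) -> argparse.ArgumentParser:
    # Minimal parser mirroring only the "whisper api" group added in app.py
    parser = argparse.ArgumentParser(prog="scribe", allow_abbrev=allow_abbrev)
    group = parser.add_argument_group("whisper api")
    group.add_argument("--api-key", help="API key for the Whisper API backend.")
    return parser


# "--api" is an unambiguous prefix of "--api-key", so argparse expands it
# and stores the value under the `api_key` attribute.
o = build_parser().parse_args(["--api", "YOURAPIKEY"])
print(o.api_key)  # YOURAPIKEY
```

If another option starting with `--api` were ever added, the abbreviation would become ambiguous and argparse would reject it, so spelling out `--api-key` is the safer form.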

@@ -206,11 +215,11 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=

  if keyboard:
  from scribe.keyboard import type_text
- print("\nChange focus to target app during transcription.")
+ transcriber.log("Change focus to target app during transcription.")

  if clipboard:
  import pyperclip
- print("\nThe full transcription will be copied to clipboard as it becomes available.")
+ transcriber.log("The full transcription will be copied to clipboard as it becomes available.")

  fulltext = ""

@@ -301,7 +310,7 @@ def create_app(micro, transcriber, **kwargs):
  def callback_stop_recording(icon, item):
  # Here we need to stop the recording thread

- transcriber.recording = False
+ transcriber.interrupt = True
  if hasattr(icon, "_recording_thread"):
  icon._recording_thread.join()
  if hasattr(icon, "_monitoring_thread"):
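The tray's Stop callback no longer flips `transcriber.recording`. Instead it sets a dedicated `interrupt` flag, which the recording loop polls (`while not self.interrupt:`), and then joins the recording thread. Below is a self-contained sketch of that pattern. `LoopingRecorder` is a toy class, not part of scribe. Like the diff, it uses a plain `bool` attribute (a `threading.Event` would be the more conventional primitive). The `started` event and the queue's `task_done`/`join` calls are additions that make the demo deterministic.

```python
import queue
import threading
import time


class LoopingRecorder:
    """Toy stand-in for the polling loop in AbstractTranscriber.start_recording()."""

    def __init__(self):
        self.interrupt = False
        self.started = threading.Event()
        self.chunks = queue.Queue()  # plays the role of microphone.q
        self.consumed = 0

    def run(self):
        self.interrupt = False  # reset on every start, as start_recording() now does
        self.started.set()
        while not self.interrupt:
            while not self.chunks.empty():
                self.chunks.get()
                self.consumed += 1
                self.chunks.task_done()
            time.sleep(0.01)  # avoid spinning at 100% CPU between polls


recorder = LoopingRecorder()
worker = threading.Thread(target=recorder.run)
worker.start()
recorder.started.wait()     # a stop issued before the reset in run() would be lost
recorder.chunks.put(b"\x00\x00")
recorder.chunks.join()      # wait until the loop has consumed the chunk
recorder.interrupt = True   # what the tray "Stop" callback now does...
worker.join(timeout=2)      # ...before joining the recording thread
print(recorder.consumed, worker.is_alive())  # 1 False
```

Because the flag is reset at the start of each recording, a stop request that arrives before the loop has started would be overwritten. That is why the sketch waits on `started` before interrupting.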
@@ -310,7 +319,7 @@ def create_app(micro, transcriber, **kwargs):
  def callback_record(icon, item):
  # kwargs["callback"] = icon.update_menu # NOTE: the thread will finish AFTER the callback is complete
  if transcriber.busy:
- print("Still busy recording or transcribing.")
+ transcriber.log("Still busy recording or transcribing.")
  return

  if hasattr(icon, "_recording_thread") and icon._recording_thread.is_alive():
@@ -362,13 +371,15 @@ def main(args=None):
  transcriber = get_transcriber(o, prompt=o.prompt)
  print(f"Model [{colored(transcriber.model_name, 'light_blue', attrs=['bold'])}] from [{colored(transcriber.backend, 'light_blue', attrs=['bold'])}] selected.")
  show_output = ["clipboard", "keyboard", "output_file"]
- show_options = ["ascii", "app"]
+ show_options = ["ascii", "restart_after_silence"]
  activated_output = [colored(option if type(getattr(o, option)) is bool else f'{option}={getattr(o, option)}', 'light_blue') for option in show_output if getattr(o, option)]
  activated_options = [colored(option if type(getattr(o, option)) is bool else f'{option}={getattr(o, option)}', 'light_blue') for option in show_options if getattr(o, option)]
  if activated_output:
  print(f"Output: {' | '.join(activated_output)}")
  else:
  print(colored(f"No output selected -> terminal only", "light_red"))
+ if o.app:
+ print(colored("App mode enabled", "light_green"))
  if activated_options:
  print(f"Options: {' | '.join(activated_options)}")
  if o.prompt:
@@ -421,7 +432,7 @@ def main(args=None):
  o.app = not o.app
  continue
  if key == "a":
- transcriber.restart_after_silence = not transcriber.restart_after_silence
+ o.restart_after_silence = transcriber.restart_after_silence = not transcriber.restart_after_silence
  continue
  if key == "t":
  ans = input(f"Enter new duration in seconds (current: {transcriber.timeout}): ")
{scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/install_desktop.py
@@ -10,9 +10,17 @@ def main():
  sys.exit(0)

  parser = argparse.ArgumentParser("Install the desktop file for the scribe package. Any arguments to this script will be passed on to `scribe`.")
+ parser.add_argument("--name", help="The title of the desktop app", default="Scribe")
+ parser.add_argument("--startup-wm-class")
+ parser.add_argument("--no-terminal", action="store_false", dest="terminal", help="Don't show the terminal (goes in --app mode)")
  o, rest = parser.parse_known_args()
  o.arguments = rest

+ if not o.terminal and "--app" not in o.arguments:
+ o.arguments.append("--app")
+ if not o.terminal and "--no-prompt" not in o.arguments:
+ o.arguments.append("--no-prompt")
+
  SOURCE_SCRIBE_DATA = os.path.dirname(scribe_data.__file__)

  HOME = os.environ.get('HOME',os.path.expanduser('~'))
@@ -25,15 +33,18 @@ def main():
  with open(os.path.join(SOURCE_SCRIBE_DATA, 'templates', 'scribe.desktop')) as f:
  template = f.read()

+ simple_name = o.name.lower().replace(' ','-').replace(os.path.sep, '-')
  bin_folder = sysconfig.get_path("scripts")
  icon_folder = os.path.join(SOURCE_SCRIBE_DATA, 'share')
- desktop_filecontent = template.format(icon_folder=icon_folder, bin_folder=bin_folder, options=' '.join(o.arguments) if o.arguments else '')
+ desktop_filecontent = template.format(icon_folder=icon_folder, bin_folder=bin_folder,
+ name=o.name, terminal=str(o.terminal).lower(),
+ StartupWMClass=o.startup_wm_class or f"crx_mpnasdanpmm_{simple_name}",
+ options=' ' + ' '.join(o.arguments) if o.arguments else '')

- desktop_filepath = os.path.join(XDG_APP_DATA, 'scribe.desktop')
+ desktop_filepath = os.path.join(XDG_APP_DATA, f'{simple_name}.desktop')
  print("Writing GNOME desktop file:", desktop_filepath)
  with open(desktop_filepath, "w") as f:
  f.write(desktop_filecontent)

-
  if __name__ == "__main__":
  main()
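`scribe-install` now derives the `.desktop` file name from `--name`, lets `--no-terminal` imply `--app --no-prompt`, and fills extra template fields (`name`, `terminal`, `StartupWMClass`). The sketch below reproduces that logic as pure functions, so it can be checked without writing to `~/.local/share/applications`. The template text and the folder defaults are assumptions, since the `scribe.desktop` template itself is not shown in this diff. `render_desktop_entry` and the other function names are hypothetical.

```python
import os

# Hypothetical template: the real scribe_data/templates/scribe.desktop is not shown
# in this diff, only the placeholder names that install_desktop.py passes to format().
TEMPLATE = """[Desktop Entry]
Type=Application
Name={name}
Exec={bin_folder}/scribe{options}
Icon={icon_folder}/icon.png
Terminal={terminal}
StartupWMClass={StartupWMClass}
"""


def simple_name(name):
    # Same transformation as in the diff: lowercase, spaces and path separators -> "-"
    return name.lower().replace(" ", "-").replace(os.path.sep, "-")


def scribe_arguments(arguments, terminal):
    # terminal=False (--no-terminal) implies --app and --no-prompt, added only if missing
    arguments = list(arguments)
    if not terminal:
        for flag in ("--app", "--no-prompt"):
            if flag not in arguments:
                arguments.append(flag)
    return arguments


def render_desktop_entry(name, arguments, terminal=True, startup_wm_class=None,
                         bin_folder="/usr/local/bin", icon_folder="/opt/scribe/share"):
    """Return (file name, file content); the folder defaults are placeholders."""
    args = scribe_arguments(arguments, terminal)
    slug = simple_name(name)
    content = TEMPLATE.format(
        name=name,
        terminal=str(terminal).lower(),
        StartupWMClass=startup_wm_class or f"crx_mpnasdanpmm_{slug}",
        # leading space keeps "scribe" and its first option separated
        options=(" " + " ".join(args)) if args else "",
        bin_folder=bin_folder,
        icon_folder=icon_folder,
    )
    return f"{slug}.desktop", content


filename, content = render_desktop_entry(
    "Scribe Vosk FR", ["--backend", "vosk", "--language", "fr"], terminal=False)
print(filename)  # scribe-vosk-fr.desktop
```

The explicit parentheses around `" " + " ".join(args)` only make the evaluation order visible. The diff's unparenthesized version behaves the same, because a conditional expression binds more loosely than `+`.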
{scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/models.py
@@ -22,7 +22,7 @@ class StopRecording(Exception):
  class AbstractTranscriber:
  backend = None
  def __init__(self, model, model_name=None, language=None, samplerate=16000, timeout=None, model_kwargs={},
- silence_thresh=-40, silence_duration=2, restart_after_silence=False):
+ silence_thresh=-40, silence_duration=2, restart_after_silence=False, logger=None):
  self.model_name = model_name
  self.language = language
  self.model = model
@@ -35,6 +35,12 @@ class AbstractTranscriber:
  self.recording = False
  self.busy = False
  self.waiting = False
+ self.interrupt = False
+ if logger is None:
+ import logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger("scribe")
+ self.logger = logger
  self.reset()

  def get_elapsed(self):
@@ -54,11 +60,21 @@ class AbstractTranscriber:
  self.audio_buffer = b''
  self.start_time = time.time()

+ def log(self, text):
+ if text.startswith("\n"):
+ print("")
+ text = text[1:]
+ if self.logger:
+ self.logger.info(text)
+ else:
+ print(f"[{text}]")
+
  def start_recording(self, microphone,
  start_message="Recording... Press Ctrl+C to stop.",
- stop_message="Done transcribing."):
+ stop_message="Exit."):

  self.reset()
+ self.interrupt = False
  self.recording = True
  self.waiting = True
  self.busy = True
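`AbstractTranscriber` now accepts a `logger=` argument and routes status messages through `log()`, which turns a leading `\n` into a blank line before logging the rest. When no logger is given, the diff calls `logging.basicConfig(level=logging.INFO)` and uses the `"scribe"` logger. The sketch below shows how a caller could capture these messages by passing its own logger. `TinyTranscriber` and `ListHandler` are illustrative stand-ins; only the body of `log()` mirrors the diff.

```python
import logging


class ListHandler(logging.Handler):
    """Keeps log messages in memory, e.g. for tests or a GUI status line."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


class TinyTranscriber:
    """Hypothetical stand-in that keeps only the logger handling from the diff."""

    def __init__(self, logger=None):
        # The diff additionally calls logging.basicConfig(level=logging.INFO) here
        self.logger = logger or logging.getLogger("scribe")

    def log(self, text):
        # Same behaviour as AbstractTranscriber.log(): a leading "\n" becomes
        # a blank line on stdout and is stripped from the logged message
        if text.startswith("\n"):
            print("")
            text = text[1:]
        if self.logger:
            self.logger.info(text)
        else:
            print(f"[{text}]")


demo_logger = logging.getLogger("scribe.demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep the demo away from the root logger's handlers
handler = ListHandler()
demo_logger.addHandler(handler)

transcriber = TinyTranscriber(logger=demo_logger)
transcriber.log("\nTranscribing")
transcriber.log("Silence detected...waiting for more audio")
print(handler.messages)  # ['Transcribing', 'Silence detected...waiting for more audio']
```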
@@ -71,9 +87,9 @@ class AbstractTranscriber:
  try:

  with microphone.open_stream():
- print(start_message)
+ self.log(start_message)

- while self.recording:
+ while not self.interrupt:
  while not microphone.q.empty():
  data = microphone.q.get()

@@ -105,7 +121,7 @@ class AbstractTranscriber:

  else:
  if not previous_waiting:
- print("Silence detected...waiting for more audio")
+ self.log("Silence detected...waiting for more audio")

  if self.is_overtime():
  raise StopRecording("Overtime: {:.2f} seconds".format(self.get_elapsed()))
@@ -123,7 +139,7 @@ class AbstractTranscriber:
  self.busy = False
  yield result

- print(stop_message)
+ self.log(stop_message)


  def get_vosk_model(model, download_root=None, url=None):
@@ -198,7 +214,7 @@ class WhisperTranscriber(AbstractTranscriber):
  super().__init__(model, model_name, language, model_kwargs=model_kwargs, **kwargs)

  def transcribe_audio(self, audio_bytes):
- print("\nTranscribing...")
+ self.log("\nTranscribing")
  audio_array = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
  return self.model.transcribe(audio_array, fp16=False, language=self.language)

@@ -207,4 +223,34 @@ class WhisperTranscriber(AbstractTranscriber):
  return {"text": ""}
  result = self.transcribe_audio(self.audio_buffer)
  self.audio_buffer = b''
- return result
+ return result
+
+
+ class OpenaiAPITranscriber(WhisperTranscriber):
+ backend = "openaiapi"
+
+ def __init__(self, model_name="whisper-1", language=None, model_kwargs={}, model=None, api_key=None, **kwargs):
+ if model is None:
+ import openai
+ model = openai.OpenAI(
+ api_key=api_key or openai.api_key,
+ # 20 seconds (default is 10 minutes)
+ timeout=20.0,
+ )
+ AbstractTranscriber.__init__(self, model, model_name, language, model_kwargs=model_kwargs, **kwargs)
+
+ def transcribe_audio(self, audio_bytes):
+ self.log("\nTranscribing")
+ import io
+ import soundfile as sf
+ audio_data = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
+ # Write the audio data to an in-memory file in WAV format
+ buffer = io.BytesIO()
+ sf.write(buffer, audio_data, self.samplerate, format='WAV')
+ buffer.seek(0)
+ buffer.name = "audio.wav" # Set a filename with a valid extension
+ transcription = self.model.audio.transcriptions.create(
+ model=self.model_name,
+ file=buffer,
+ )
+ return {"text": transcription.text}
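`OpenaiAPITranscriber.transcribe_audio()` decodes the 16-bit buffer to float32 with numpy and then lets `soundfile` write it back out as an in-memory WAV (soundfile's default WAV subtype is 16-bit PCM). Because the input is already 16-bit PCM, a stdlib-only alternative is to wrap the bytes directly with the `wave` module. The sketch below assumes mono, 16-bit little-endian samples (the diff decodes the buffer as `np.int16`). `pcm16_to_wav_buffer` is a hypothetical helper, not part of scribe.

```python
import io
import wave


def pcm16_to_wav_buffer(audio_bytes: bytes, samplerate: int = 16000, channels: int = 1) -> io.BytesIO:
    """Wrap raw 16-bit PCM in an in-memory WAV file (no numpy/soundfile needed)."""
    buffer = io.BytesIO()
    # wave does not close file objects it did not open, so the buffer stays usable
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = int16
        wav.setframerate(samplerate)
        wav.writeframes(audio_bytes)
    buffer.seek(0)
    # As in the diff: give the upload a filename with a valid extension
    buffer.name = "audio.wav"
    return buffer


one_tenth_second = b"\x00\x00" * 1600  # 1600 silent samples at 16 kHz
wav_buffer = pcm16_to_wav_buffer(one_tenth_second)
print(wav_buffer.name)  # audio.wav
```

As in the diff, the returned buffer could then be passed as `file=` to `self.model.audio.transcriptions.create(model=self.model_name, ...)`.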
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.2
2
2
  Name: scribe-cli
3
- Version: 0.8.0
3
+ Version: 0.10.0
4
4
  Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
5
5
  Author-email: Mahé Perrette <mahe.perrette@gmail.com>
6
6
  License: MIT License
@@ -55,9 +55,14 @@ Requires-Dist: vosk; extra == "vosk"
55
55
  Provides-Extra: app
56
56
  Requires-Dist: pystray; extra == "app"
57
57
  Requires-Dist: PyGObject; extra == "app"
58
+ Provides-Extra: openai
59
+ Requires-Dist: openai; extra == "openai"
60
+ Requires-Dist: soundfile; extra == "openai"
58
61
  Provides-Extra: all
59
62
  Requires-Dist: pynput; extra == "all"
60
63
  Requires-Dist: openai-whisper; extra == "all"
64
+ Requires-Dist: openai; extra == "all"
65
+ Requires-Dist: soundfile; extra == "all"
61
66
  Requires-Dist: vosk; extra == "all"
62
67
  Requires-Dist: pystray; extra == "all"
63
68
 
@@ -66,7 +71,9 @@ Requires-Dist: pystray; extra == "all"
66
71
 
67
72
  # Scribe <img src="scribe_data/share/icon.png" width=48px>
68
73
 
69
- `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
74
+ `scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
75
+
76
+ It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client to open AI via `openaiapi` backend (API key required).
70
77
 
71
78
  ## Compatibility
72
79
 
@@ -101,12 +108,10 @@ cd scribe
  pip install -e .[all]
  ```
 
- You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-
- The `vosk` language models will download on-the-fly.
- The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
- the default is left to the `openai-whisper` package and might change in the future).
+ You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).
 
+ The language models for local backends `vosk` and `whisper` will download on-the-fly.
+ The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
 
  ## Usage
 
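The cache-folder rule stated in the new README text (`$XDG_CACHE_HOME/{backend}`, with `$XDG_CACHE_HOME` falling back to `$HOME/.cache`) fits in a few lines of Python. This is a minimal sketch of that stated rule only. `default_model_dir` is a hypothetical helper, not a function taken from scribe, and the package's own path handling may differ. Following the XDG convention, an empty `XDG_CACHE_HOME` is treated the same as an unset one.

```python
import os
from pathlib import Path


def default_model_dir(backend: str) -> Path:
    """Hypothetical helper: $XDG_CACHE_HOME/<backend>, else $HOME/.cache/<backend>."""
    # An unset or empty XDG_CACHE_HOME falls back to ~/.cache (XDG convention).
    cache_home = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(cache_home) / backend
```

For example, `default_model_dir("vosk")` gives `~/.cache/vosk` on a system where `XDG_CACHE_HOME` is not set.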
@@ -115,7 +120,7 @@ Just type in the terminal:
  ```bash
  scribe
  ```
- and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+ and the script will guide you through the choice of backend (`whisper` or `vosk` or `openaiapi`) and the specific language model.
  After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
  or until after recording is complete (`whisper`).
  You can interrupt the recording via Ctrl + C and start again or change model.
@@ -129,9 +134,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
  It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
  There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
 
- To skip the initial selection menu you can do:
+ The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
  ```bash
- scribe --backend whisper --model small --no-prompt
+ scribe --backend openaiapi --api YOURAPIKEY
  ```
  where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
 
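The new `openaiapi` backend sends recorded audio to OpenAI's hosted `whisper-1` model, and the `[openai]` extra adds `openai` and `soundfile` for it. The sketch below is one plausible flow, not scribe's implementation. It wraps raw 16-bit PCM in an in-memory WAV file and uploads it with the official `openai` Python client. The function names are hypothetical. The stdlib `wave` module stands in for `soundfile` so the encoding step has no third-party dependency. `openai` is imported lazily, so only the upload step needs the extra installed.

```python
import io
import wave


def pcm16_to_wav_bytes(frames: bytes, samplerate: int = 16000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM frames in a WAV container, in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(samplerate)
        wav.writeframes(frames)
    return buf.getvalue()  # wave does not close a file object it did not open


def transcribe_with_openai(wav_bytes: bytes, api_key: str) -> str:
    """Upload a WAV recording to the OpenAI transcription endpoint (network + API key required)."""
    from openai import OpenAI  # lazy import: only this function needs the [openai] extra

    client = OpenAI(api_key=api_key)
    result = client.audio.transcriptions.create(
        model="whisper-1",
        file=("recording.wav", wav_bytes),  # (filename, content) tuple upload
    )
    return result.text
```

If `api_key` is left out of `OpenAI(...)`, the client reads the `OPENAI_API_KEY` environment variable instead. How scribe maps its `--api` option onto the client is not shown in this diff.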
@@ -190,7 +195,8 @@ To activate start with:
  ```bash
  scribe --app
  ```
- or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+ or toggle the app option in the interactive menu. The scribe icon will show, with Record, Stop or Quit options. The icon will change based on what the app is doing.
+ For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
  That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
 
  ```bash
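The tray-icon states described in the README text above form a small state machine. With `vosk`, transcription runs during recording, so there are two states. With `whisper`, transcription happens after Stop, so there are three. This is a hedged model of that description, not scribe's code. The event names (`"record"`, `"stop"`, `"done"`) are hypothetical. Any backend other than `vosk` is treated like `whisper`, and that is an assumption for `openaiapi`.

```python
from enum import Enum


class TrayState(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    TRANSCRIBING = "transcribing"
    RECORDING_AND_TRANSCRIBING = "recording+transcribing"  # vosk: both at once


def next_state(backend: str, state: TrayState, event: str) -> TrayState:
    """Return the next icon state for a hypothetical event: 'record', 'stop' or 'done'."""
    if backend == "vosk":
        # Real-time backend: only two visible states.
        if event == "record":
            return TrayState.RECORDING_AND_TRANSCRIBING
        if event == "stop":
            return TrayState.IDLE
        return state
    # whisper-like backends: record, then transcribe after Stop, then back to idle.
    if event == "record" and state is TrayState.IDLE:
        return TrayState.RECORDING
    if event == "stop" and state is TrayState.RECORDING:
        return TrayState.TRANSCRIBING
    if event == "done" and state is TrayState.TRANSCRIBING:
        return TrayState.IDLE
    return state  # ignore events that do not apply in the current state
```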
@@ -201,15 +207,20 @@ pip install PyGObject
  ## Start as an application in GNOME
 
  If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
- to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+ to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
+ `--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
 
  e.g.
 
  ```bash
- scribe-install --backend whisper --model small --clipboard --keyboard --restart-after-silence
+ scribe-install
+ scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
+ scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
  ```
-
- After that just typing Super + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+ This will install three separate apps:
+ - `Super + scribe` : will launch the default version with terminal prompt
+ - `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard.
+ - `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal (you need to press Record in the tray icon menu to start the recording).
 
 
  ## Fine tuning
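The README describes `scribe-install` as taking two options of its own, `--name` and `--no-terminal`, and passing everything else on to `scribe`. It also says that `--no-terminal` implies `--app --no-prompt`. A minimal sketch of that split uses `argparse.parse_known_args`. `split_install_args` and the default name `"Scribe"` (borrowed from the old template's `Name=Scribe`) are assumptions, not scribe's actual code. `allow_abbrev=False` keeps a forwarded option from being mistaken for an abbreviation of `--name` or `--no-terminal`.

```python
from __future__ import annotations

import argparse


def split_install_args(argv: list[str]) -> tuple[str, bool, list[str]]:
    """Hypothetical: separate scribe-install's own options from those forwarded to scribe."""
    parser = argparse.ArgumentParser(prog="scribe-install", add_help=False, allow_abbrev=False)
    parser.add_argument("--name", default="Scribe")  # assumed default
    parser.add_argument("--no-terminal", action="store_true")
    own, forwarded = parser.parse_known_args(argv)  # unknown options pass through in order
    if own.no_terminal:
        # Per the README, --no-terminal implies --app --no-prompt.
        for flag in ("--app", "--no-prompt"):
            if flag not in forwarded:
                forwarded.append(flag)
    return own.name, own.no_terminal, forwarded
```

With the third README example, this yields the name `Scribe Vosk FR`, `no_terminal=True`, and the forwarded options `--backend vosk --language fr --keyboard --clipboard --app --no-prompt`.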
@@ -9,6 +9,8 @@ termcolor
  [all]
  pynput
  openai-whisper
+ openai
+ soundfile
  vosk
  pystray
 
@@ -19,6 +21,10 @@ PyGObject
  [keyboard]
  pynput
 
+ [openai]
+ openai
+ soundfile
+
  [vosk]
  vosk
 
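The README requires at least one of the `vosk`, `openai-whisper` or `openai` packages, and the new `[openai]` extra also pulls in `soundfile`. The sketch below reports which backends look usable by checking importability with `importlib.util.find_spec`, which locates a module without importing it. The backend-to-module mapping is an assumption drawn from the extras listed above. Note that `openai-whisper` installs the importable module `whisper`. `available_backends` is a hypothetical helper, not part of scribe.

```python
from importlib.util import find_spec

# Assumed mapping from backend name to the top-level modules it needs.
BACKEND_MODULES = {
    "vosk": ["vosk"],
    "whisper": ["whisper"],  # provided by the openai-whisper package
    "openaiapi": ["openai", "soundfile"],
}


def available_backends(modules: dict = BACKEND_MODULES) -> list:
    """Return backends whose required modules can be found, without importing them."""
    return [
        backend
        for backend, mods in modules.items()
        if all(find_spec(m) is not None for m in mods)
    ]
```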
@@ -0,0 +1,8 @@
+ #!/usr/bin/env xdg-open
+ [Desktop Entry]
+ Terminal={terminal}
+ Type=Application
+ Name={name}
+ Exec={bin_folder}/scribe{options}
+ Icon={icon_folder}/icon.png
+ StartupWMClass={StartupWMClass}
@@ -1,8 +0,0 @@
- #!/usr/bin/env xdg-open
- [Desktop Entry]
- Terminal=true
- Type=Application
- Name=Scribe
- Exec={bin_folder}/scribe{options}
- Icon={icon_folder}/icon.jpg
- StartupWMClass=crx_mpnasdanpmmopoasdjdcgaaiekailkhb
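The template change turns the hard-coded `Terminal=true`, `Name=Scribe` and `StartupWMClass` values into `{terminal}`, `{name}` and `{StartupWMClass}` placeholders. That is what lets `--name` and `--no-terminal` produce distinct launchers. A sketch of filling the new template with `str.format` follows. The function name, argument values and the `"true"`/`"false"` mapping are illustrative assumptions, not scribe's `install_desktop.py`. The sketch also assumes that no forwarded option contains spaces or other characters that the Desktop Entry `Exec` key would require quoting.

```python
TEMPLATE = """#!/usr/bin/env xdg-open
[Desktop Entry]
Terminal={terminal}
Type=Application
Name={name}
Exec={bin_folder}/scribe{options}
Icon={icon_folder}/icon.png
StartupWMClass={StartupWMClass}
"""


def render_desktop_entry(name, forwarded, no_terminal, bin_folder, icon_folder, wm_class):
    """Hypothetical: fill the .desktop template for one launcher."""
    # Each option is prefixed with a space so an empty list leaves "Exec=.../scribe".
    options = "".join(" " + opt for opt in forwarded)
    return TEMPLATE.format(
        terminal="false" if no_terminal else "true",  # Desktop Entry booleans are lowercase
        name=name,
        bin_folder=bin_folder,
        options=options,
        icon_folder=icon_folder,
        StartupWMClass=wm_class,
    )
```

Per the README, the resulting file goes under `$HOME/.local/share/applications`. Installing three launchers side by side implies a distinct file name for each. How scribe derives those names is not visible in this diff.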
File without changes