scribe-cli 0.9.0__tar.gz → 0.11.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (30)
  1. {scribe_cli-0.9.0/scribe_cli.egg-info → scribe_cli-0.11.0}/PKG-INFO +28 -19
  2. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/README.md +22 -18
  3. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/pyproject.toml +2 -1
  4. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe/_version.py +2 -2
  5. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe/app.py +122 -56
  6. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe/models.py +56 -7
  7. {scribe_cli-0.9.0 → scribe_cli-0.11.0/scribe_cli.egg-info}/PKG-INFO +28 -19
  8. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe_cli.egg-info/requires.txt +6 -0
  9. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/.github/workflows/pypi.yml +0 -0
  10. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/.gitignore +0 -0
  11. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/LICENSE +0 -0
  12. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/icon.xcf +0 -0
  13. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe/__init__.py +0 -0
  14. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe/audio.py +0 -0
  15. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe/install_desktop.py +0 -0
  16. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe/keyboard.py +0 -0
  17. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe/models.toml +0 -0
  18. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe/saverecording.py +0 -0
  19. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe/testpynput.py +0 -0
  20. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe/util.py +0 -0
  21. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe_cli.egg-info/SOURCES.txt +0 -0
  22. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe_cli.egg-info/dependency_links.txt +0 -0
  23. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe_cli.egg-info/entry_points.txt +0 -0
  24. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe_cli.egg-info/top_level.txt +0 -0
  25. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe_data/__init__.py +0 -0
  26. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe_data/share/icon.png +0 -0
  27. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe_data/share/icon_recording.png +0 -0
  28. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe_data/share/icon_writing.png +0 -0
  29. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe_data/templates/scribe.desktop +0 -0
  30. {scribe_cli-0.9.0 → scribe_cli-0.11.0}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.9.0
+Version: 0.11.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -55,9 +55,14 @@ Requires-Dist: vosk; extra == "vosk"
 Provides-Extra: app
 Requires-Dist: pystray; extra == "app"
 Requires-Dist: PyGObject; extra == "app"
+Provides-Extra: openai
+Requires-Dist: openai; extra == "openai"
+Requires-Dist: soundfile; extra == "openai"
 Provides-Extra: all
 Requires-Dist: pynput; extra == "all"
 Requires-Dist: openai-whisper; extra == "all"
+Requires-Dist: openai; extra == "all"
+Requires-Dist: soundfile; extra == "all"
 Requires-Dist: vosk; extra == "all"
 Requires-Dist: pystray; extra == "all"

@@ -66,7 +71,9 @@ Requires-Dist: pystray; extra == "all"

 # Scribe <img src="scribe_data/share/icon.png" width=48px>

-`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+`scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+
+It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client to open AI via `openaiapi` backend (API key required).

 ## Compatibility

@@ -101,12 +108,10 @@ cd scribe
 pip install -e .[all]
 ```

-You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-
-The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
-the default is left to the `openai-whisper` package and might change in the future).
+You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).

+The language models for local backends `vosk` and `whisper` will download on-the-fly.
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.

 ## Usage

@@ -115,7 +120,7 @@ Just type in the terminal:
 ```bash
 scribe
 ```
-and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+and the script will guide you through the choice of backend (`whisper` or `vosk` or `openaiapi`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
 You can interrupt the recording via Ctrl + C and start again or change model.
@@ -129,9 +134,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).

-To skip the initial selection menu you can do:
+The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
 ```bash
-scribe --backend whisper --model small --no-prompt
+scribe --backend openaiapi --api YOURAPIKEY
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).

@@ -190,7 +195,9 @@ To activate start with:
 ```bash
 scribe --app
 ```
-or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+or toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon will change based on what the app is doing. It is possible to choose from a set
+of predefined models, or to Quit and choose from the terminal before pressing Enter again.
+For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:

 ```bash
@@ -204,17 +211,19 @@ If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will
 to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
 `--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.

-e.g.
+In a relatively basic form
+
+```bash
+scribe-install --clipboard --api YOUROPENAIAPIKEY
+```
+(`--api` is optional and only useful if you plan to use `openaiapi` backend later on)
+
+And to make an app running outside the terminal:

 ```bash
-scribe-install
-scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
-scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
+scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --api YOUROPENAIAPIKEY
 ```
-This will install three separate apps:
-- `Super + scribe` : will launch the default version with terminal prompt
-- `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard
-- `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal.
+This will install two separate apps (names "Scribe" and "Scribe App")


 ## Fine tuning
@@ -3,7 +3,9 @@

 # Scribe <img src="scribe_data/share/icon.png" width=48px>

-`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+`scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+
+It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client to open AI via `openaiapi` backend (API key required).

 ## Compatibility

@@ -38,12 +40,10 @@ cd scribe
 pip install -e .[all]
 ```

-You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-
-The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
-the default is left to the `openai-whisper` package and might change in the future).
+You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).

+The language models for local backends `vosk` and `whisper` will download on-the-fly.
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.

 ## Usage

@@ -52,7 +52,7 @@ Just type in the terminal:
 ```bash
 scribe
 ```
-and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+and the script will guide you through the choice of backend (`whisper` or `vosk` or `openaiapi`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
 You can interrupt the recording via Ctrl + C and start again or change model.
@@ -66,9 +66,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).

-To skip the initial selection menu you can do:
+The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
 ```bash
-scribe --backend whisper --model small --no-prompt
+scribe --backend openaiapi --api YOURAPIKEY
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).

@@ -127,7 +127,9 @@ To activate start with:
 ```bash
 scribe --app
 ```
-or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+or toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon will change based on what the app is doing. It is possible to choose from a set
+of predefined models, or to Quit and choose from the terminal before pressing Enter again.
+For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:

 ```bash
@@ -141,17 +143,19 @@ If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will
 to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
 `--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.

-e.g.
+In a relatively basic form
+
+```bash
+scribe-install --clipboard --api YOUROPENAIAPIKEY
+```
+(`--api` is optional and only useful if you plan to use `openaiapi` backend later on)
+
+And to make an app running outside the terminal:

 ```bash
-scribe-install
-scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
-scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
+scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --api YOUROPENAIAPIKEY
 ```
-This will install three separate apps:
-- `Super + scribe` : will launch the default version with terminal prompt
-- `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard
-- `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal.
+This will install two separate apps (names "Scribe" and "Scribe App")


 ## Fine tuning
@@ -44,7 +44,8 @@ keyboard = ["pynput"]
 whisper = ["openai-whisper"]
 vosk = ["vosk"]
 app = ["pystray", "PyGObject"]
-all = ["pynput", "openai-whisper", "vosk", "pystray"]
+openai = ["openai", "soundfile"]
+all = ["pynput", "openai-whisper", "openai", "soundfile", "vosk", "pystray"]


 [tool.setuptools]
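The new `openai` extra makes the API backend optional: `openaiapi` needs both `openai` and `soundfile`, while 0.11.0 also drops the `check_dependencies` import from `scribe/app.py`. A minimal sketch, not part of scribe, for checking which backends are usable in the current environment. `BACKEND_MODULES` and `available_backends` are hypothetical names, and the module names assume that `openai-whisper` installs the `whisper` module and that the other extras import under their package names.

```python
from importlib.util import find_spec

# Hypothetical mapping from scribe backend names to the top-level modules
# their extras install (the openai-whisper package is imported as "whisper").
BACKEND_MODULES = {
    "whisper": ["whisper"],
    "vosk": ["vosk"],
    "openaiapi": ["openai", "soundfile"],
}


def available_backends(backend_modules=BACKEND_MODULES):
    """Return the backends whose required modules can all be located."""
    return [
        backend
        for backend, modules in backend_modules.items()
        # find_spec returns None for a missing top-level module
        if all(find_spec(module) is not None for module in modules)
    ]


print(available_backends())
```

`find_spec` only locates the module without importing it, so the check stays cheap even for heavy packages such as `whisper`.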
@@ -12,5 +12,5 @@ __version__: str
 __version_tuple__: VERSION_TUPLE
 version_tuple: VERSION_TUPLE

-__version__ = version = '0.9.0'
-__version_tuple__ = version_tuple = (0, 9, 0)
+__version__ = version = '0.11.0'
+__version_tuple__ = version_tuple = (0, 11, 0)
@@ -4,8 +4,8 @@ import re
 import time
 import argparse
 from scribe.audio import Microphone
-from scribe.util import print_partial, clear_line, prompt_choices, check_dependencies, ansi_link, colored
-from scribe.models import VoskTranscriber, WhisperTranscriber
+from scribe.util import print_partial, clear_line, prompt_choices, ansi_link, colored
+from scribe.models import VoskTranscriber, WhisperTranscriber, OpenaiAPITranscriber

 with open(Path(__file__).parent / "models.toml", "rb") as f:
     language_config_default = tomllib.load(f)
@@ -24,7 +24,7 @@ def get_default_backend():
     except ImportError:
         raise ImportError("Please install either vosk or whisper to use this script.")

-BACKENDS = ["whisper", "vosk"]
+BACKENDS = ["whisper", "vosk", "openaiapi"]
 UNAVAILABLE_BACKENDS = []


@@ -55,57 +55,54 @@ class DummyTranscriber:
     def __getattr__(self, item):
         return None

-def get_transcriber(o, prompt=True):
+whisper_models = ["tiny", "base", "small", "medium", "large", "turbo"]
+whisper_english_models = ["tiny.en", "base.en", "small.en", "medium.en"]
+whisperapi_models = ["whisper-1"]
+vosk_models = [language_config["vosk"][lang]["model"] for lang in language_config["vosk"]]

-    whisper_models = ["tiny", "base", "small", "medium", "large", "turbo"]
-    whisper_english_models = ["tiny.en", "base.en", "small.en", "medium.en"]

-    if o.dummy:
+def get_transcriber(model=None, backend=None, dummy=False, prompt=True, language=None,
+                    samplerate=None, duration=None, silence=None, silence_db=None, restart_after_silence=None,
+                    api_key=None,
+                    download_folder_vosk=None, download_folder_whisper=None, **kwargs):
+
+    if dummy:
         return DummyTranscriber("whisper", "dummy")

-    if o.model and not o.backend:
-        if o.model.startswith("vosk-"):
-            o.backend = "vosk"
-        elif o.model in whisper_models + whisper_english_models:
-            o.backend = "whisper"
+    if model and not backend:
+        if model.startswith("vosk-"):
+            backend = "vosk"
+        elif model in whisper_models + whisper_english_models:
+            backend = "whisper"
+        elif model in whisperapi_models:
+            backend = "openaiapi"

-    if o.backend:
-        checked_backend = check_dependencies(o.backend)
-        if not checked_backend:
-            print(f"Backend {o.backend} is not available.")
-            exit(1)
-        backend = o.backend
+    if backend:
+        backend = backend

     elif not prompt:
         backend = BACKENDS[0]

     else:
-        checked_backend = False
-        while not checked_backend:
-            backend = prompt_choices(BACKENDS, o.backend, "backend", UNAVAILABLE_BACKENDS)
-            # raise an error if the user has explicitly selected a backend that is not available
-            checked_backend = check_dependencies(backend, raise_error=backend==o.backend)
-            if not checked_backend:
-                print(f"Backend {o.backend} is not available.")
-                UNAVAILABLE_BACKENDS.append(backend)
+        backend = prompt_choices(BACKENDS, backend, "backend", UNAVAILABLE_BACKENDS)

     print(f"Selected backend: {backend}")

-    if o.model:
-        model = pick_specialist_model(o.model, o.language, backend)
+    if model:
+        model = pick_specialist_model(model, language, backend)

     else:

         if backend == "vosk":
             available_languages = list(language_config[backend])
-            if o.language:
-                if o.language not in available_languages:
-                    print(f"Language '{o.language}' is not pre-defined (yet) for backend '{backend}'.")
+            if language:
+                if language not in available_languages:
+                    print(f"Language '{language}' is not pre-defined (yet) for backend '{backend}'.")
                     print(f"Yet it may actually exist.")
                     print(f"Please choose the model explictly from {ansi_link('https://alphacephei.com/vosk/models')}.")
                     print(f"Or pick one of the pre-defined languages: ", " ".join(available_languages))
                     exit(1)
-                choices = [language_config[backend][o.language]["model"]]
+                choices = [language_config[backend][language]["model"]]
                 default_model = choices[0] # this is a string

             else:
@@ -129,28 +126,41 @@ def get_transcriber(o, prompt=True):
             else:
                 model = default_model

-            model = pick_specialist_model(model, o.language, backend)
+            model = pick_specialist_model(model, language, backend)
+
+        elif backend == "openaiapi":
+            model = model or "whisper-1"
+
+        else:
+            raise ValueError(f"Unknown backend: {backend}")
+

     print(f"Selected model: {model}")

     if backend == "vosk":
         try:
             transcriber = VoskTranscriber(model_name=model,
-                                          language=o.language,
-                                          samplerate=o.samplerate,
+                                          language=language,
+                                          samplerate=samplerate,
                                           timeout=None, # vosk keeps going (no timeout)
                                           silence_duration=None, # vosk handles silences internally
-                                          model_kwargs={"download_root": o.download_folder_vosk})
+                                          model_kwargs={"download_root": download_folder_vosk})
         except Exception as error:
             print(error)
             print(f"Failed to (down)load model {model}.")
             exit(1)

     elif backend == "whisper":
-        transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate,
-                                         timeout=o.duration, silence_duration=o.silence, silence_thresh=o.silence_db,
-                                         restart_after_silence=o.restart_after_silence,
-                                         model_kwargs={"download_root": o.download_folder_whisper})
+        transcriber = WhisperTranscriber(model_name=model, language=language, samplerate=samplerate,
+                                         timeout=duration, silence_duration=silence, silence_thresh=silence_db,
+                                         restart_after_silence=restart_after_silence,
+                                         model_kwargs={"download_root": download_folder_whisper})
+
+    elif backend == "openaiapi":
+        transcriber = OpenaiAPITranscriber(model_name=model, samplerate=samplerate,
+                                           timeout=duration, silence_duration=silence, silence_thresh=silence_db,
+                                           restart_after_silence=restart_after_silence, api_key=api_key)
+

     else:
         raise ValueError(f"Unknown backend: {backend}")
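In 0.11.0, `get_transcriber` infers the backend from the model name when only a model is given. That rule can be isolated as a pure function for testing; `infer_backend` below is a hypothetical helper restating the logic of the hunks above, not a function scribe exports. A `None` result corresponds to the case where `get_transcriber` falls back to prompting (or to `BACKENDS[0]` when prompting is disabled).

```python
WHISPER_MODELS = ["tiny", "base", "small", "medium", "large", "turbo"]
WHISPER_ENGLISH_MODELS = ["tiny.en", "base.en", "small.en", "medium.en"]
WHISPERAPI_MODELS = ["whisper-1"]


def infer_backend(model=None, backend=None):
    """Restate the model-name rule from get_transcriber (hypothetical helper)."""
    if backend or not model:
        return backend  # an explicit backend always wins; no model means nothing to infer
    if model.startswith("vosk-"):
        return "vosk"
    if model in WHISPER_MODELS + WHISPER_ENGLISH_MODELS:
        return "whisper"
    if model in WHISPERAPI_MODELS:
        return "openaiapi"
    return None  # unknown name: get_transcriber prompts, or uses BACKENDS[0]


print(infer_backend("small.en"))                  # whisper
print(infer_backend("whisper-1"))                 # openaiapi
print(infer_backend("vosk-model-small-fr-0.22"))  # vosk (example vosk-style name)
print(infer_backend("small", backend="vosk"))     # vosk
```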
@@ -195,6 +205,10 @@ def get_parser():
     group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in decibel (default %(default)s db)")
     group.add_argument("-a", "--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")

+    group = parser.add_argument_group("whisper api")
+    group.add_argument("--api-key",
+                       help="API key for the Whisper API backend.")
+
     parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
     parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")
 
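The README examples pass `--api YOURAPIKEY`, while the parser above only defines `--api-key`. This works because `argparse` accepts unambiguous prefixes of long options by default (`allow_abbrev=True`), and the value is stored as `api_key`, the same name as the new `get_transcriber` parameter, so `get_transcriber(**vars(o))` forwards it. A minimal reproduction with only that one option; the real parser has many more, and the abbreviation keeps working only as long as no other option starts with `--api`.

```python
import argparse

# allow_abbrev=True is argparse's default; it is spelled out here for clarity
parser = argparse.ArgumentParser(prog="scribe", allow_abbrev=True)
group = parser.add_argument_group("whisper api")
group.add_argument("--api-key", help="API key for the Whisper API backend.")

# "--api" is an unambiguous prefix of "--api-key", so argparse accepts it
args = parser.parse_args(["--api", "YOURAPIKEY"])
print(args.api_key)  # YOURAPIKEY
```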
@@ -206,11 +220,11 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=

     if keyboard:
         from scribe.keyboard import type_text
-        print("\nChange focus to target app during transcription.")
+        transcriber.log("Change focus to target app during transcription.")

     if clipboard:
         import pyperclip
-        print("\nThe full transcription will be copied to clipboard as it becomes available.")
+        transcriber.log("The full transcription will be copied to clipboard as it becomes available.")

     fulltext = ""

@@ -237,7 +251,7 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
         callback()


-def create_app(micro, transcriber, **kwargs):
+def create_app(micro, transcriber, other_transcribers=None, **kwargs):
     import pystray
     from pystray import Menu as pystrayMenu, MenuItem as Item
     from PIL import Image
@@ -257,6 +271,7 @@ def create_app(micro, transcriber, **kwargs):
     image_recording = Image.alpha_composite(image_recording.convert("RGBA"), image_writing.convert("RGBA"))

     def update_icon(icon, force=False):
+        transcriber = icon._transcriber
         if transcriber.recording and transcriber.waiting:
             # this is the situation with the whisper backend when the microphone is recording
             # but we wait for the speaker to speak (silence)
@@ -284,6 +299,7 @@ def create_app(micro, transcriber, **kwargs):
         icon.update_menu()

     def start_monitoring(icon):
+        transcriber = icon._transcriber
         try:
             while transcriber.busy:
                 update_icon(icon)
@@ -299,8 +315,8 @@ def create_app(micro, transcriber, **kwargs):
         icon.stop()

     def callback_stop_recording(icon, item):
+        transcriber = icon._transcriber
         # Here we need to stop the recording thread
-
         transcriber.interrupt = True
         if hasattr(icon, "_recording_thread"):
             icon._recording_thread.join()
@@ -308,9 +324,9 @@ def create_app(micro, transcriber, **kwargs):
             icon._monitoring_thread.join()

     def callback_record(icon, item):
-        # kwargs["callback"] = icon.update_menu # NOTE: the thread will finish AFTER the callback is complete
+        transcriber = icon._transcriber
         if transcriber.busy:
-            print("Still busy recording or transcribing.")
+            transcriber.log("Still busy recording or transcribing.")
             return

         if hasattr(icon, "_recording_thread") and icon._recording_thread.is_alive():
@@ -325,22 +341,67 @@ def create_app(micro, transcriber, **kwargs):
         icon._monitoring_thread = threading.Thread(target=start_monitoring, args=(icon,))
         icon._monitoring_thread.start()

+    if other_transcribers:
+        other_transcribers_dict = {meta["model"]: meta for meta in other_transcribers}
+    else:
+        other_transcribers_dict = {}
+
+    def callback_set_model(icon, item):
+        transcriber = icon._transcriber
+        callback_stop_recording(icon, item)
+        model_name = str(item)
+        meta = other_transcribers_dict[model_name]
+        icon._transcriber = transcriber = get_transcriber(**meta)
+        icon.title = f"scribe :: {transcriber.backend} :: {transcriber.model_name}"
+        print("Set", transcriber.backend, transcriber.model_name)
+        # icon.menu.items[0].__name__ = f"Record [{str(item)}]"
+        icon._model_selection = False
+        icon.update_menu()
+        icon.notify(f"Set {transcriber.backend} {transcriber.model_name}")
+
+    def callback_info(icon, item):
+        transcriber = icon._transcriber
+        # icon.notify(f"scribe {transcriber.backend} {transcriber.model_name}")
+        title = f"""{transcriber.backend} :: {transcriber.model_name}"""
+        info = [name for name in kwargs if isinstance(kwargs[name], bool) and kwargs[name]]
+        icon.notify(" | ".join(info), title=title)
+
+    def callback_toggle_option(icon, item):
+        kwargs[str(item)] = not kwargs[str(item)]
+        callback_info(icon, item)
+
+    def is_model_selection(item):
+        return icon._model_selection
+
     def is_recording(item):
-        return transcriber.busy
+        return icon._transcriber.busy

     def is_not_recording(item):
-        return not is_recording(item)
+        return not is_recording(item) and not is_model_selection(item)

+    modeltitle = f"{transcriber.backend} :: {transcriber.model_name}"
+    title = f"scribe :: {modeltitle}"

-    # Create a menu
-    menu = pystrayMenu(
-        Item("Record", callback_record, visible=is_not_recording),
-        Item("Stop", callback_stop_recording, visible=is_recording),
-        Item('Quit', callback_quit),
+    menus = []
+    menus.append(Item(f"Record" if len(other_transcribers_dict) <= 1 else f"Record", callback_record, visible=is_not_recording))
+    menus.append(Item("Stop", callback_stop_recording, visible=is_recording))
+    menus.append(Item("Choose Model", pystrayMenu(
+        *(Item(f"{name}", callback_set_model) for name in other_transcribers_dict)))
     )
+    menus.append(Item("Toggle Options", pystrayMenu(
+        *(Item(f"{name}", callback_toggle_option) for name in kwargs if isinstance(kwargs[name], bool))))
+    )
+    menus.append(Item(f"Info", callback_info))
+    menus.append(Item('Quit', callback_quit))
+
+    # Create a menu
+    menu = pystrayMenu(*menus)

     # Create the system tray icon
-    icon = pystray.Icon('scribe', image, "scribe", menu)
+    icon = pystray.Icon('scribe', image, title, menu)
+    icon._model_selection = False
+    icon._transcriber = transcriber
+    del transcriber

     return icon
 
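The new "Toggle Options" submenu lists every boolean keyword argument of `create_app` (e.g. `clipboard`, `keyboard` and `ascii`, assuming those are `store_true` flags), and "Info" shows the enabled ones joined with ` | `. The same bookkeeping on a plain dict, without `pystray`, is sketched below; `toggle_option`, `enabled_options` and the `options` values are hypothetical, standing in for `callback_toggle_option`, the `info` list in `callback_info`, and `kwargs`.

```python
def toggle_option(options, name):
    """Flip one boolean option in place (what callback_toggle_option does to kwargs)."""
    options[name] = not options[name]


def enabled_options(options):
    """Join the names of the boolean options that are True (the Info notification body)."""
    return " | ".join(
        name for name, value in options.items() if isinstance(value, bool) and value
    )


# Hypothetical option set, mimicking the keyword arguments passed to create_app;
# non-boolean values such as latency or start_message are never listed.
options = {"clipboard": True, "keyboard": False, "latency": 0, "ascii": False,
           "start_message": "Listening..."}
toggle_option(options, "keyboard")
print(enabled_options(options))  # clipboard | keyboard
```

Because the options are mutated in place, later recordings started from the same menu see the toggled values without rebuilding the icon.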
@@ -359,7 +420,7 @@ def main(args=None):

     while True:
         if transcriber is None:
-            transcriber = get_transcriber(o, prompt=o.prompt)
+            transcriber = get_transcriber(**vars(o))
         print(f"Model [{colored(transcriber.model_name, 'light_blue', attrs=['bold'])}] from [{colored(transcriber.backend, 'light_blue', attrs=['bold'])}] selected.")
         show_output = ["clipboard", "keyboard", "output_file"]
         show_options = ["ascii", "restart_after_silence"]
@@ -473,7 +534,12 @@ def main(args=None):
         greetings = dict(
             start_message = "Listening... Use the try icon menu to stop.",
         )
-        app = create_app(micro, transcriber, clipboard=o.clipboard, output_file=o.output_file,
+
+        app = create_app(micro, transcriber, other_transcribers=[
+            {**vars(o), "backend": "openaiapi", "model": "whisper-1"},
+            *[{**vars(o), "backend": "whisper", "model": model} for model in whisper_models],
+            *[{**vars(o), "backend": "vosk", "model": model} for model in vosk_models]],
+            clipboard=o.clipboard, output_file=o.output_file,
                          keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
         print("Starting app...")
         app.run()
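The "Choose Model" entries built in `main` are plain dicts: each one copies every parsed option with `{**vars(o), ...}` and overrides only `backend` and `model`, so `get_transcriber(**meta)` receives the same settings as the initial transcriber and absorbs unrelated keys through `**kwargs`. A reduced sketch with a hypothetical `Namespace` and a shortened model list:

```python
from argparse import Namespace

# Hypothetical parsed CLI options; the real Namespace has many more fields
o = Namespace(backend="vosk", model=None, language="fr", clipboard=True)
whisper_models = ["tiny", "base"]  # shortened list for the example

other_transcribers = [
    {**vars(o), "backend": "openaiapi", "model": "whisper-1"},
    *[{**vars(o), "backend": "whisper", "model": model} for model in whisper_models],
]

# create_app indexes the entries by model name for the "Choose Model" submenu
by_model = {meta["model"]: meta for meta in other_transcribers}

print(list(by_model))                # ['whisper-1', 'tiny', 'base']
print(by_model["tiny"]["language"])  # fr
print(o.backend)                     # vosk (vars(o) was copied, not mutated)
```

Since the menu dict is keyed by model name alone, the sketch relies on model names being unique across backends, which holds for the lists in this diff.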
@@ -22,7 +22,7 @@ class StopRecording(Exception):
 class AbstractTranscriber:
     backend = None
     def __init__(self, model, model_name=None, language=None, samplerate=16000, timeout=None, model_kwargs={},
-                 silence_thresh=-40, silence_duration=2, restart_after_silence=False):
+                 silence_thresh=-40, silence_duration=2, restart_after_silence=False, logger=None):
         self.model_name = model_name
         self.language = language
         self.model = model
@@ -36,6 +36,11 @@ class AbstractTranscriber:
         self.busy = False
         self.waiting = False
         self.interrupt = False
+        if logger is None:
+            import logging
+            logging.basicConfig(level=logging.INFO)
+            logger = logging.getLogger("scribe")
+        self.logger = logger
         self.reset()

     def get_elapsed(self):
@@ -55,9 +60,18 @@ class AbstractTranscriber:
         self.audio_buffer = b''
         self.start_time = time.time()

+    def log(self, text):
+        if text.startswith("\n"):
+            print("")
+            text = text[1:]
+        if self.logger:
+            self.logger.info(text)
+        else:
+            print(f"[{text}]")
+
     def start_recording(self, microphone,
                         start_message="Recording... Press Ctrl+C to stop.",
-                        stop_message="Done transcribing."):
+                        stop_message="Exit."):

         self.reset()
         self.interrupt = False
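With this change, status messages go through `self.log`, which by default uses a logger named `scribe` after calling `logging.basicConfig(level=logging.INFO)`. In the standard library, `basicConfig` attaches a `sys.stderr` handler with the format `%(levelname)s:%(name)s:%(message)s`, and does nothing if the root logger already has handlers. So by default a message such as `Transcribing` should appear on stderr as `INFO:scribe:Transcribing`, separate from transcribed text printed to stdout. The sketch below reproduces the method's branching against a hypothetical logger (`scribe.demo`) that writes into a string buffer so the result can be inspected.

```python
import io
import logging

# Hypothetical logger writing to a string buffer; the format string is the
# one logging.basicConfig uses by default.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
demo_logger = logging.getLogger("scribe.demo")
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False  # keep the demo out of the root logger


def log(text, logger=demo_logger):
    """Same branching as AbstractTranscriber.log in the hunk above."""
    if text.startswith("\n"):
        print("")  # the leading newline becomes a blank line on stdout
        text = text[1:]
    if logger:
        logger.info(text)
    else:
        print(f"[{text}]")


log("\nTranscribing")
print(stream.getvalue(), end="")  # INFO:scribe.demo:Transcribing
```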
@@ -73,7 +87,7 @@ class AbstractTranscriber:
         try:

             with microphone.open_stream():
-                print(start_message)
+                self.log(start_message)

                 while not self.interrupt:
                     while not microphone.q.empty():
@@ -107,7 +121,7 @@ class AbstractTranscriber:

                          else:
                              if not previous_waiting:
-                                 print("Silence detected...waiting for more audio")
+                                 self.log("Silence detected...waiting for more audio")

                      if self.is_overtime():
                          raise StopRecording("Overtime: {:.2f} seconds".format(self.get_elapsed()))
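This hunk only changes how the "Silence detected" message is emitted. The detection logic driven by `silence_thresh=-40` and `silence_duration=2` lies outside the diff. The following is a minimal sketch of one common way such a check works. It assumes that `silence_thresh` is an RMS level in dBFS and that `silence_duration` is measured in seconds. `chunk_dbfs`, `is_silent` and `silent_seconds` are hypothetical helpers, not functions from `scribe`.

```python
import numpy as np


def chunk_dbfs(audio_bytes):
    """RMS level of 16-bit PCM audio in dB relative to full scale."""
    samples = np.frombuffer(audio_bytes, dtype=np.int16).astype(np.float64) / 32768.0
    if samples.size == 0:
        return float("-inf")
    rms = np.sqrt(np.mean(samples ** 2))
    # Digital silence has rms == 0; avoid log10(0).
    return float(20 * np.log10(rms)) if rms > 0 else float("-inf")


def is_silent(audio_bytes, silence_thresh=-40.0):
    """Assumed semantics: a chunk is silent if its level is below the threshold."""
    return chunk_dbfs(audio_bytes) < silence_thresh


def silent_seconds(chunks, samplerate=16000, silence_thresh=-40.0):
    """Duration of the trailing run of silent chunks, in seconds."""
    total = 0.0
    for chunk in reversed(chunks):
        if not is_silent(chunk, silence_thresh):
            break
        total += len(chunk) / 2 / samplerate  # 2 bytes per int16 sample
    return total


quiet = np.full(1600, 50, dtype=np.int16).tobytes()   # about -56 dBFS, 0.1 s at 16 kHz
loud = np.full(1600, 8000, dtype=np.int16).tobytes()  # about -12 dBFS
print(is_silent(quiet), is_silent(loud))  # True False
```

With this approach, a caller would compare `silent_seconds(...)` against `silence_duration` to decide when to stop waiting for more audio.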
@@ -125,7 +139,7 @@ class AbstractTranscriber:
                          self.busy = False
                          yield result

-         print(stop_message)
+         self.log(stop_message)


  def get_vosk_model(model, download_root=None, url=None):
@@ -200,7 +214,7 @@ class WhisperTranscriber(AbstractTranscriber):
          super().__init__(model, model_name, language, model_kwargs=model_kwargs, **kwargs)

      def transcribe_audio(self, audio_bytes):
-         print("\nTranscribing...")
+         self.log("\nTranscribing")
          audio_array = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
          return self.model.transcribe(audio_array, fp16=False, language=self.language)

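`transcribe_audio` converts the raw microphone bytes before handing them to Whisper. It reinterprets the bytes as signed 16-bit samples, casts them to `float32` and divides by 32768, so that the samples fall in [-1.0, 1.0). This is the same scaling `whisper.load_audio` applies to decoded audio. The short check below runs that exact expression on a few boundary values, using bytes in the machine's native byte order.

```python
import numpy as np

# Four int16 samples: zero, half scale, the minimum and the maximum value.
pcm = np.array([0, 16384, -32768, 32767], dtype=np.int16).tobytes()

# Same expression as in transcribe_audio.
audio = np.frombuffer(pcm, dtype=np.int16).flatten().astype(np.float32) / 32768.0
print(audio.dtype, audio.tolist())  # float32 [0.0, 0.5, -1.0, 0.999969482421875]
```

Dividing by 32768 rather than 32767 maps the most negative sample exactly to -1.0, so the positive maximum stays just below 1.0.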
@@ -209,4 +223,39 @@ class WhisperTranscriber(AbstractTranscriber):
              return {"text": ""}
          result = self.transcribe_audio(self.audio_buffer)
          self.audio_buffer = b''
-         return result
+         return result
+
+
+ class OpenaiAPITranscriber(WhisperTranscriber):
+     backend = "openaiapi"
+
+     def __init__(self, model_name="whisper-1", language=None, model_kwargs={}, model=None, api_key=None, **kwargs):
+         if model is None:
+             import openai
+             model = openai.OpenAI(
+                 api_key=api_key or openai.api_key,
+                 # 20 seconds (default is 10 minutes)
+                 timeout=20.0,
+             )
+         AbstractTranscriber.__init__(self, model, model_name, language, model_kwargs=model_kwargs, **kwargs)
+
+     def transcribe_audio(self, audio_bytes):
+         self.log("\nTranscribing")
+         import io
+         import openai
+         import soundfile as sf
+         audio_data = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
+         # Write the audio data to an in-memory file in WAV format
+         buffer = io.BytesIO()
+         sf.write(buffer, audio_data, self.samplerate, format='WAV')
+         buffer.seek(0)
+         buffer.name = "audio.wav" # Set a filename with a valid extension
+         try:
+             transcription = self.model.audio.transcriptions.create(
+                 model=self.model_name,
+                 file=buffer,
+             )
+         except openai.BadRequestError as e:
+             self.log(f"Error: {e}")
+             return {"text": ""}
+         return {"text": transcription.text}
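`OpenaiAPITranscriber` cannot pass a NumPy array to the API. Instead, it wraps the recording in a WAV container held in memory and sets `buffer.name = "audio.wav"` so that the upload carries a recognizable file extension. The sketch below does the same wrapping with the standard-library `wave` module instead of `soundfile`. This is a swapped-in technique: it writes the int16 samples as-is, skipping the float32 round trip. `pcm16_to_wav_buffer` is a hypothetical helper name.

```python
import io
import wave


def pcm16_to_wav_buffer(audio_bytes, samplerate=16000, channels=1):
    """Wrap raw 16-bit PCM in an in-memory WAV file ready for upload."""
    buffer = io.BytesIO()
    # wave does not close file objects it did not open, so buffer stays usable.
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(samplerate)
        wav.writeframes(audio_bytes)
    buffer.seek(0)  # rewind so the reader starts at the RIFF header
    buffer.name = "audio.wav"  # same trick as the diff: give the upload a .wav name
    return buffer


pcm = bytes(2 * 16000)  # one second of silence at 16 kHz, mono
buf = pcm16_to_wav_buffer(pcm)
with wave.open(buf, "rb") as wav:
    info = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate(), wav.getnframes())
print(info)  # (1, 2, 16000, 16000)
```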
@@ -1,6 +1,6 @@
  Metadata-Version: 2.2
  Name: scribe-cli
- Version: 0.9.0
+ Version: 0.11.0
  Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
  Author-email: Mahé Perrette <mahe.perrette@gmail.com>
  License: MIT License
@@ -55,9 +55,14 @@ Requires-Dist: vosk; extra == "vosk"
  Provides-Extra: app
  Requires-Dist: pystray; extra == "app"
  Requires-Dist: PyGObject; extra == "app"
+ Provides-Extra: openai
+ Requires-Dist: openai; extra == "openai"
+ Requires-Dist: soundfile; extra == "openai"
  Provides-Extra: all
  Requires-Dist: pynput; extra == "all"
  Requires-Dist: openai-whisper; extra == "all"
+ Requires-Dist: openai; extra == "all"
+ Requires-Dist: soundfile; extra == "all"
  Requires-Dist: vosk; extra == "all"
  Requires-Dist: pystray; extra == "all"

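The metadata above adds an `openai` extra. Running `pip install "scribe-cli[openai]"` should therefore pull in `openai` and `soundfile`, which the new backend imports lazily inside its methods. The sketch below checks which backends can be imported before starting. The backend-to-module mapping is a hypothetical illustration, not something `scribe` defines. Note that the `openai-whisper` distribution is imported as `whisper`.

```python
import importlib.util

# Hypothetical mapping from backend name to the modules it needs at runtime.
BACKEND_MODULES = {
    "vosk": ["vosk"],
    "whisper": ["whisper"],               # provided by the openai-whisper package
    "openaiapi": ["openai", "soundfile"],  # provided by the new [openai] extra
}


def missing_modules(backend):
    """Names from BACKEND_MODULES[backend] that cannot be found on sys.path."""
    return [name for name in BACKEND_MODULES[backend]
            if importlib.util.find_spec(name) is None]


for backend in BACKEND_MODULES:
    missing = missing_modules(backend)
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"{backend}: {status}")
```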
@@ -66,7 +71,9 @@ Requires-Dist: pystray; extra == "all"

  # Scribe <img src="scribe_data/share/icon.png" width=48px>

- `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+ `scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+
+ It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client to OpenAI via the `openaiapi` backend (API key required).

  ## Compatibility

@@ -101,12 +108,10 @@ cd scribe
  pip install -e .[all]
  ```

- You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-
- The `vosk` language models will download on-the-fly.
- The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
- the default is left to the `openai-whisper` package and might change in the future).
+ You can leave out the optional dependencies (omit `[all]`), but you must install at least one of the `vosk`, `openai-whisper` or `openai` packages (see Usage below).

+ The language models for the local backends `vosk` and `whisper` will download on-the-fly.
+ The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.

  ## Usage

@@ -115,7 +120,7 @@ Just type in the terminal:
  ```bash
  scribe
  ```
- and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+ and the script will guide you through the choice of backend (`whisper` or `vosk` or `openaiapi`) and the specific language model.
  After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
  or until after recording is complete (`whisper`).
  You can interrupt the recording via Ctrl + C and start again or change model.
@@ -129,9 +134,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
  It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
  There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).

- To skip the initial selection menu you can do:
+ The `openaiapi` backend uses the `whisper-1` model at the time of writing. It requires an API key:
  ```bash
- scribe --backend whisper --model small --no-prompt
+ scribe --backend openaiapi --api YOURAPIKEY
  ```
  where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).

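In `OpenaiAPITranscriber.__init__`, the key from `--api` is passed as `api_key or openai.api_key` to `openai.OpenAI`. When both values are empty, the v1 `openai` client falls back to the `OPENAI_API_KEY` environment variable. The sketch below spells out that order of precedence as a hypothetical helper, `resolve_api_key`, which is not part of `scribe` or `openai`. The environment is passed explicitly so the behavior is easy to check.

```python
import os


def resolve_api_key(cli_key=None, module_key=None, environ=None):
    """Hypothetical helper: pick an OpenAI key by precedence.

    1. the value given on the command line (--api),
    2. a key already set on the openai module (openai.api_key),
    3. the OPENAI_API_KEY environment variable.
    """
    environ = os.environ if environ is None else environ
    key = cli_key or module_key or environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("No OpenAI API key: pass --api or set OPENAI_API_KEY")
    return key


print(resolve_api_key(environ={"OPENAI_API_KEY": "sk-env"}))            # sk-env
print(resolve_api_key("sk-cli", environ={"OPENAI_API_KEY": "sk-env"}))  # sk-cli
```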
@@ -190,7 +195,9 @@ To activate start with:
  ```bash
  scribe --app
  ```
- or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+ or toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon will change based on what the app is doing. It is possible to choose from a set
+ of predefined models, or to Quit and choose from the terminal before pressing Enter again.
+ For the vosk model, there are only two states: recording + transcribing, or idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
  That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:

  ```bash
@@ -204,17 +211,19 @@ If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will
  to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
  `--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.

- e.g.
+ In a relatively basic form:
+
+ ```bash
+ scribe-install --clipboard --api YOUROPENAIAPIKEY
+ ```
+ (`--api` is optional and only useful if you plan to use the `openaiapi` backend later on.)
+
+ And to make an app running outside the terminal:

  ```bash
- scribe-install
- scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
- scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
+ scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --api YOUROPENAIAPIKEY
  ```
- This will install three separate apps:
- - `Super + scribe` : will launch the default version with terminal prompt
- - `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard
- - `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal.
+ This will install two separate apps (named "Scribe" and "Scribe App").


  ## Fine tuning
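The `scribe-install` context above states that `--no-terminal` implies `--app --no-prompt`. This makes sense because with no terminal, nobody can answer the interactive menu, and the tray icon is the only interface left. The sketch below shows one way a CLI can enforce such an implication after parsing, using `argparse`. The parser is a hypothetical subset of options and is not `scribe-install`'s actual code.

```python
import argparse

# Hypothetical subset of the options, only to illustrate the implication rule.
parser = argparse.ArgumentParser(prog="scribe-install")
parser.add_argument("--name", default="Scribe")
parser.add_argument("--app", action="store_true")
parser.add_argument("--no-prompt", action="store_true")
parser.add_argument("--no-terminal", action="store_true")


def parse(argv):
    args = parser.parse_args(argv)
    if args.no_terminal:
        # No terminal: the prompt cannot be answered and the tray app is required.
        args.app = True
        args.no_prompt = True
    return args


args = parse(["--name", "Scribe App", "--no-terminal"])
print(args.app, args.no_prompt)  # True True
```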
@@ -9,6 +9,8 @@ termcolor
  [all]
  pynput
  openai-whisper
+ openai
+ soundfile
  vosk
  pystray

@@ -19,6 +21,10 @@ PyGObject
  [keyboard]
  pynput

+ [openai]
+ openai
+ soundfile
+
  [vosk]
  vosk

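In `requires.txt`, the base requirements are listed first, followed by one `[extra]` section per optional feature. This hunk adds an `[openai]` section and two lines under `[all]`, mirroring the `Provides-Extra` entries in `PKG-INFO`. The small parser below groups such a file into a dictionary. It is a sketch that only handles plain `[name]` headers and does not interpret the `[extra:marker]` headers that can also appear in this file. The sample is a trimmed, illustrative excerpt based on the lines above.

```python
def parse_requires_txt(text):
    """Group requires.txt lines into {section: [requirements]}.

    Requirements before the first "[section]" header are stored under "".
    Blank lines are ignored.
    """
    sections = {"": []}
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections


sample = """termcolor

[openai]
openai
soundfile

[vosk]
vosk
"""
print(parse_requires_txt(sample)["openai"])  # ['openai', 'soundfile']
```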