scribe-cli 0.9.0__tar.gz → 0.10.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (30)
  1. {scribe_cli-0.9.0/scribe_cli.egg-info → scribe_cli-0.10.0}/PKG-INFO +19 -13
  2. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/README.md +13 -12
  3. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/pyproject.toml +2 -1
  4. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/_version.py +2 -2
  5. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/app.py +27 -18
  6. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/models.py +51 -7
  7. {scribe_cli-0.9.0 → scribe_cli-0.10.0/scribe_cli.egg-info}/PKG-INFO +19 -13
  8. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/requires.txt +6 -0
  9. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/.github/workflows/pypi.yml +0 -0
  10. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/.gitignore +0 -0
  11. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/LICENSE +0 -0
  12. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/icon.xcf +0 -0
  13. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/__init__.py +0 -0
  14. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/audio.py +0 -0
  15. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/install_desktop.py +0 -0
  16. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/keyboard.py +0 -0
  17. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/models.toml +0 -0
  18. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/saverecording.py +0 -0
  19. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/testpynput.py +0 -0
  20. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/util.py +0 -0
  21. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/SOURCES.txt +0 -0
  22. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/dependency_links.txt +0 -0
  23. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/entry_points.txt +0 -0
  24. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/top_level.txt +0 -0
  25. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_data/__init__.py +0 -0
  26. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_data/share/icon.png +0 -0
  27. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_data/share/icon_recording.png +0 -0
  28. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_data/share/icon_writing.png +0 -0
  29. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_data/templates/scribe.desktop +0 -0
  30. {scribe_cli-0.9.0 → scribe_cli-0.10.0}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.2
  Name: scribe-cli
- Version: 0.9.0
+ Version: 0.10.0
  Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
  Author-email: Mahé Perrette <mahe.perrette@gmail.com>
  License: MIT License
@@ -55,9 +55,14 @@ Requires-Dist: vosk; extra == "vosk"
  Provides-Extra: app
  Requires-Dist: pystray; extra == "app"
  Requires-Dist: PyGObject; extra == "app"
+ Provides-Extra: openai
+ Requires-Dist: openai; extra == "openai"
+ Requires-Dist: soundfile; extra == "openai"
  Provides-Extra: all
  Requires-Dist: pynput; extra == "all"
  Requires-Dist: openai-whisper; extra == "all"
+ Requires-Dist: openai; extra == "all"
+ Requires-Dist: soundfile; extra == "all"
  Requires-Dist: vosk; extra == "all"
  Requires-Dist: pystray; extra == "all"

@@ -66,7 +71,9 @@ Requires-Dist: pystray; extra == "all"

  # Scribe <img src="scribe_data/share/icon.png" width=48px>

- `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+ `scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+
+ It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client to open AI via `openaiapi` backend (API key required).

  ## Compatibility

@@ -101,12 +108,10 @@ cd scribe
  pip install -e .[all]
  ```

- You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-
- The `vosk` language models will download on-the-fly.
- The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
- the default is left to the `openai-whisper` package and might change in the future).
+ You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).

+ The language models for local backends `vosk` and `whisper` will download on-the-fly.
+ The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.

  ## Usage

@@ -115,7 +120,7 @@ Just type in the terminal:
  ```bash
  scribe
  ```
- and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+ and the script will guide you through the choice of backend (`whisper` or `vosk` or `openaiapi`) and the specific language model.
  After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
  or until after recording is complete (`whisper`).
  You can interrupt the recording via Ctrl + C and start again or change model.
@@ -129,9 +134,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
  It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
  There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).

- To skip the initial selection menu you can do:
+ The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
  ```bash
- scribe --backend whisper --model small --no-prompt
+ scribe --backend openaiapi --api YOURAPIKEY
  ```
  where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).

@@ -190,7 +195,8 @@ To activate start with:
  ```bash
  scribe --app
  ```
- or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+ or toggle the app option in the interactive menu. The scribe icon will show, with Record, Stop or Quit options. The icon will change based on what the app is doing.
+ For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
  That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:

  ```bash
@@ -213,8 +219,8 @@ scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard -
  ```
  This will install three separate apps:
  - `Super + scribe` : will launch the default version with terminal prompt
- - `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard
- - `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal.
+ - `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard.
+ - `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal (you need to press Record in the tray icon menu to start the recording).


  ## Fine tuning
@@ -3,7 +3,9 @@

  # Scribe <img src="scribe_data/share/icon.png" width=48px>

- `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+ `scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+
+ It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client to open AI via `openaiapi` backend (API key required).

  ## Compatibility

@@ -38,12 +40,10 @@ cd scribe
  pip install -e .[all]
  ```

- You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-
- The `vosk` language models will download on-the-fly.
- The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
- the default is left to the `openai-whisper` package and might change in the future).
+ You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).

+ The language models for local backends `vosk` and `whisper` will download on-the-fly.
+ The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.

  ## Usage

@@ -52,7 +52,7 @@ Just type in the terminal:
  ```bash
  scribe
  ```
- and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+ and the script will guide you through the choice of backend (`whisper` or `vosk` or `openaiapi`) and the specific language model.
  After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
  or until after recording is complete (`whisper`).
  You can interrupt the recording via Ctrl + C and start again or change model.
@@ -66,9 +66,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
  It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
  There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).

- To skip the initial selection menu you can do:
+ The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
  ```bash
- scribe --backend whisper --model small --no-prompt
+ scribe --backend openaiapi --api YOURAPIKEY
  ```
  where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).

@@ -127,7 +127,8 @@ To activate start with:
  ```bash
  scribe --app
  ```
- or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+ or toggle the app option in the interactive menu. The scribe icon will show, with Record, Stop or Quit options. The icon will change based on what the app is doing.
+ For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
  That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:

  ```bash
@@ -150,8 +151,8 @@ scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard -
  ```
  This will install three separate apps:
  - `Super + scribe` : will launch the default version with terminal prompt
- - `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard
- - `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal.
+ - `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard.
+ - `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal (you need to press Record in the tray icon menu to start the recording).


  ## Fine tuning
@@ -44,7 +44,8 @@ keyboard = ["pynput"]
  whisper = ["openai-whisper"]
  vosk = ["vosk"]
  app = ["pystray", "PyGObject"]
- all = ["pynput", "openai-whisper", "vosk", "pystray"]
+ openai = ["openai", "soundfile"]
+ all = ["pynput", "openai-whisper", "openai", "soundfile", "vosk", "pystray"]


  [tool.setuptools]
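
The new `openai` extra pulls in the `openai` and `soundfile` packages, and the same release drops the `check_dependencies` call from `scribe/app.py`. As a rough illustration of how a caller could still probe which optional backends are importable, here is a minimal sketch. The `BACKEND_MODULES` mapping and both helper functions are hypothetical and not part of scribe; the module names are assumed from the extras listed above (the `openai-whisper` distribution is imported as `whisper`).

```python
import importlib.util

# Hypothetical mapping from backend name to the top-level modules it needs.
# Assumed from the extras in pyproject.toml, not taken from scribe's code.
BACKEND_MODULES = {
    "vosk": ["vosk"],
    "whisper": ["whisper"],
    "openaiapi": ["openai", "soundfile"],
}


def is_importable(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def available_backends(backend_modules=BACKEND_MODULES):
    """List the backends whose required modules are all importable."""
    return [
        name
        for name, modules in backend_modules.items()
        if all(is_importable(m) for m in modules)
    ]


if __name__ == "__main__":
    print(available_backends())
```

The result depends on which extras are installed in the current environment, so no fixed output is shown.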
@@ -12,5 +12,5 @@ __version__: str
  __version_tuple__: VERSION_TUPLE
  version_tuple: VERSION_TUPLE

- __version__ = version = '0.9.0'
- __version_tuple__ = version_tuple = (0, 9, 0)
+ __version__ = version = '0.10.0'
+ __version_tuple__ = version_tuple = (0, 10, 0)
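
The jump from `0.9.0` to `0.10.0` is a case where comparing version strings directly gives the wrong order, which is one reason `_version.py` also exposes a tuple. A minimal sketch of the difference follows; `parse_version` is a hypothetical helper that only handles plain dotted integers, not pre-release or local version suffixes.

```python
def parse_version(text):
    """Split a dotted release string such as '0.10.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


old, new = "0.9.0", "0.10.0"

# Lexicographic comparison looks at '1' versus '9' and gets the order wrong.
string_order = new > old

# Tuple comparison works element by element on integers: 10 > 9.
tuple_order = parse_version(new) > parse_version(old)

if __name__ == "__main__":
    print(string_order, tuple_order)
```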
@@ -4,8 +4,8 @@ import re
  import time
  import argparse
  from scribe.audio import Microphone
- from scribe.util import print_partial, clear_line, prompt_choices, check_dependencies, ansi_link, colored
- from scribe.models import VoskTranscriber, WhisperTranscriber
+ from scribe.util import print_partial, clear_line, prompt_choices, ansi_link, colored
+ from scribe.models import VoskTranscriber, WhisperTranscriber, OpenaiAPITranscriber

  with open(Path(__file__).parent / "models.toml", "rb") as f:
      language_config_default = tomllib.load(f)
@@ -24,7 +24,7 @@ def get_default_backend():
      except ImportError:
          raise ImportError("Please install either vosk or whisper to use this script.")

- BACKENDS = ["whisper", "vosk"]
+ BACKENDS = ["whisper", "vosk", "openaiapi"]
  UNAVAILABLE_BACKENDS = []


@@ -59,6 +59,7 @@ def get_transcriber(o, prompt=True):

      whisper_models = ["tiny", "base", "small", "medium", "large", "turbo"]
      whisper_english_models = ["tiny.en", "base.en", "small.en", "medium.en"]
+     whisperapi_models = ["whisper-1"]

      if o.dummy:
          return DummyTranscriber("whisper", "dummy")
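
The `whisperapi_models` list added in this hunk lets `get_transcriber` infer the backend from the model name alone. The lookup can be sketched as a standalone function, assuming the model lists shown in the diff. `infer_backend` is a hypothetical name, and the vosk model names are passed in as a parameter because the diff does not show them.

```python
# Model lists as they appear in the diff of scribe/app.py.
WHISPER_MODELS = ["tiny", "base", "small", "medium", "large", "turbo"]
WHISPER_ENGLISH_MODELS = ["tiny.en", "base.en", "small.en", "medium.en"]
WHISPERAPI_MODELS = ["whisper-1"]


def infer_backend(model, vosk_models=()):
    """Guess the backend from a model name; return None when it is unknown.

    `vosk_models` is a placeholder argument: the real list lives in
    scribe/models.toml and is not reproduced here.
    """
    if model in vosk_models:
        return "vosk"
    if model in WHISPER_MODELS + WHISPER_ENGLISH_MODELS:
        return "whisper"
    if model in WHISPERAPI_MODELS:
        return "openaiapi"
    return None


if __name__ == "__main__":
    print(infer_backend("whisper-1"))
```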
@@ -68,26 +69,17 @@ def get_transcriber(o, prompt=True):
          o.backend = "vosk"
      elif o.model in whisper_models + whisper_english_models:
          o.backend = "whisper"
+     elif o.model in whisperapi_models:
+         o.backend = "openaiapi"

      if o.backend:
-         checked_backend = check_dependencies(o.backend)
-         if not checked_backend:
-             print(f"Backend {o.backend} is not available.")
-             exit(1)
          backend = o.backend

      elif not prompt:
          backend = BACKENDS[0]

      else:
-         checked_backend = False
-         while not checked_backend:
-             backend = prompt_choices(BACKENDS, o.backend, "backend", UNAVAILABLE_BACKENDS)
-             # raise an error if the user has explicitly selected a backend that is not available
-             checked_backend = check_dependencies(backend, raise_error=backend==o.backend)
-             if not checked_backend:
-                 print(f"Backend {o.backend} is not available.")
-                 UNAVAILABLE_BACKENDS.append(backend)
+         backend = prompt_choices(BACKENDS, o.backend, "backend", UNAVAILABLE_BACKENDS)

      print(f"Selected backend: {backend}")

@@ -131,6 +123,13 @@ def get_transcriber(o, prompt=True):

          model = pick_specialist_model(model, o.language, backend)

+     elif backend == "openaiapi":
+         model = o.model or "whisper-1"
+
+     else:
+         raise ValueError(f"Unknown backend: {backend}")
+
+
      print(f"Selected model: {model}")

      if backend == "vosk":
@@ -152,6 +151,12 @@ def get_transcriber(o, prompt=True):
              restart_after_silence=o.restart_after_silence,
              model_kwargs={"download_root": o.download_folder_whisper})

+     elif backend == "openaiapi":
+         transcriber = OpenaiAPITranscriber(model_name=model, samplerate=o.samplerate,
+             timeout=o.duration, silence_duration=o.silence, silence_thresh=o.silence_db,
+             restart_after_silence=o.restart_after_silence, api_key=o.api_key)
+
+
      else:
          raise ValueError(f"Unknown backend: {backend}")

@@ -195,6 +200,10 @@ def get_parser():
      group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in decibel (default %(default)s db)")
      group.add_argument("-a", "--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")

+     group = parser.add_argument_group("whisper api")
+     group.add_argument("--api-key",
+         help="API key for the Whisper API backend.")
+
      parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
      parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")

@@ -206,11 +215,11 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=

      if keyboard:
          from scribe.keyboard import type_text
-         print("\nChange focus to target app during transcription.")
+         transcriber.log("Change focus to target app during transcription.")

      if clipboard:
          import pyperclip
-         print("\nThe full transcription will be copied to clipboard as it becomes available.")
+         transcriber.log("The full transcription will be copied to clipboard as it becomes available.")

      fulltext = ""

@@ -310,7 +319,7 @@ def create_app(micro, transcriber, **kwargs):
      def callback_record(icon, item):
          # kwargs["callback"] = icon.update_menu # NOTE: the thread will finish AFTER the callback is complete
          if transcriber.busy:
-             print("Still busy recording or transcribing.")
+             transcriber.log("Still busy recording or transcribing.")
              return

          if hasattr(icon, "_recording_thread") and icon._recording_thread.is_alive():
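
The README example in this release passes `--api YOURAPIKEY`, while `get_parser` registers the option as `--api-key`. With `argparse` defaults, an unambiguous prefix of a long option is accepted, so both spellings should end up in `o.api_key`. The sketch below reproduces only the option added in this diff plus an `--app` flag (mentioned in the README) to show that the two do not clash; `build_parser` is a hypothetical stand-in for scribe's much larger `get_parser`.

```python
import argparse


def build_parser():
    """Build a tiny parser mirroring the `--api-key` option from the diff."""
    parser = argparse.ArgumentParser(prog="scribe-sketch")
    # `--app` exists in scribe per the README; its real definition is assumed.
    parser.add_argument("--app", action="store_true", help="Start the tray app.")
    group = parser.add_argument_group("whisper api")
    group.add_argument("--api-key", help="API key for the Whisper API backend.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--api", "YOURAPIKEY"])
    print(args.api_key)
```

The abbreviation only works while no other option starts with `--api`; spelling out `--api-key` avoids depending on that.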
@@ -22,7 +22,7 @@ class StopRecording(Exception):
  class AbstractTranscriber:
      backend = None
      def __init__(self, model, model_name=None, language=None, samplerate=16000, timeout=None, model_kwargs={},
-                  silence_thresh=-40, silence_duration=2, restart_after_silence=False):
+                  silence_thresh=-40, silence_duration=2, restart_after_silence=False, logger=None):
          self.model_name = model_name
          self.language = language
          self.model = model
@@ -36,6 +36,11 @@ class AbstractTranscriber:
          self.busy = False
          self.waiting = False
          self.interrupt = False
+         if logger is None:
+             import logging
+             logging.basicConfig(level=logging.INFO)
+             logger = logging.getLogger("scribe")
+         self.logger = logger
          self.reset()

      def get_elapsed(self):
@@ -55,9 +60,18 @@ class AbstractTranscriber:
          self.audio_buffer = b''
          self.start_time = time.time()

+     def log(self, text):
+         if text.startswith("\n"):
+             print("")
+             text = text[1:]
+         if self.logger:
+             self.logger.info(text)
+         else:
+             print(f"[{text}]")
+
      def start_recording(self, microphone,
                          start_message="Recording... Press Ctrl+C to stop.",
-                         stop_message="Done transcribing."):
+                         stop_message="Exit."):

          self.reset()
          self.interrupt = False
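
The `log` method added in this hunk sends messages through `logging` when a logger is set and otherwise falls back to a bracketed `print`; a leading newline in the message becomes an empty printed line first. The behaviour can be tried in isolation with the sketch below, where `log_message` is a hypothetical free function that takes the logger as an argument instead of reading `self.logger`.

```python
import logging


def log_message(text, logger=None):
    """Standalone version of the `log` method from the diff."""
    # A leading newline becomes a blank line on stdout and is then dropped.
    if text.startswith("\n"):
        print("")
        text = text[1:]
    if logger:
        logger.info(text)
    else:
        # Fallback used when no logger is available.
        print(f"[{text}]")


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_message("\nTranscribing", logging.getLogger("scribe"))
    log_message("Exit.")
```

In the diff itself, `__init__` creates a default `"scribe"` logger whenever `logger` is `None`, so the `print` fallback would only be reached if `self.logger` were later replaced by a falsy value.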
@@ -73,7 +87,7 @@ class AbstractTranscriber:
          try:

              with microphone.open_stream():
-                 print(start_message)
+                 self.log(start_message)

                  while not self.interrupt:
                      while not microphone.q.empty():
@@ -107,7 +121,7 @@ class AbstractTranscriber:

                          else:
                              if not previous_waiting:
-                                 print("Silence detected...waiting for more audio")
+                                 self.log("Silence detected...waiting for more audio")

                          if self.is_overtime():
                              raise StopRecording("Overtime: {:.2f} seconds".format(self.get_elapsed()))
@@ -125,7 +139,7 @@ class AbstractTranscriber:
              self.busy = False
              yield result

-         print(stop_message)
+         self.log(stop_message)


  def get_vosk_model(model, download_root=None, url=None):
@@ -200,7 +214,7 @@ class WhisperTranscriber(AbstractTranscriber):
          super().__init__(model, model_name, language, model_kwargs=model_kwargs, **kwargs)

      def transcribe_audio(self, audio_bytes):
-         print("\nTranscribing...")
+         self.log("\nTranscribing")
          audio_array = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
          return self.model.transcribe(audio_array, fp16=False, language=self.language)

@@ -209,4 +223,34 @@ class WhisperTranscriber(AbstractTranscriber):
              return {"text": ""}
          result = self.transcribe_audio(self.audio_buffer)
          self.audio_buffer = b''
-         return result
+         return result
+
+
+ class OpenaiAPITranscriber(WhisperTranscriber):
+     backend = "openaiapi"
+
+     def __init__(self, model_name="whisper-1", language=None, model_kwargs={}, model=None, api_key=None, **kwargs):
+         if model is None:
+             import openai
+             model = openai.OpenAI(
+                 api_key=api_key or openai.api_key,
+                 # 20 seconds (default is 10 minutes)
+                 timeout=20.0,
+             )
+         AbstractTranscriber.__init__(self, model, model_name, language, model_kwargs=model_kwargs, **kwargs)
+
+     def transcribe_audio(self, audio_bytes):
+         self.log("\nTranscribing")
+         import io
+         import soundfile as sf
+         audio_data = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
+         # Write the audio data to an in-memory file in WAV format
+         buffer = io.BytesIO()
+         sf.write(buffer, audio_data, self.samplerate, format='WAV')
+         buffer.seek(0)
+         buffer.name = "audio.wav" # Set a filename with a valid extension
+         transcription = self.model.audio.transcriptions.create(
+             model=self.model_name,
+             file=buffer,
+         )
+         return {"text": transcription.text}
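
`OpenaiAPITranscriber.transcribe_audio` wraps the raw 16-bit PCM bytes from the microphone in an in-memory WAV file before uploading it. The diff does this with `soundfile` after converting the samples to float32. The sketch below swaps in the standard-library `wave` module and writes the int16 samples directly, so it runs without extra dependencies; it is not the code scribe ships. `pcm16_to_wav_buffer` is a hypothetical helper, and mono audio is assumed.

```python
import io
import wave


def pcm16_to_wav_buffer(audio_bytes, samplerate=16000, channels=1):
    """Wrap raw little-endian int16 PCM bytes in an in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)   # mono is an assumption of this sketch
        wav.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
        wav.setframerate(samplerate)
        wav.writeframes(audio_bytes)
    # Rewind so the next reader starts at the RIFF header.
    buffer.seek(0)
    # As in the diff, give the buffer a filename with a valid extension.
    buffer.name = "audio.wav"
    return buffer


if __name__ == "__main__":
    silence = b"\x00\x00" * 16000  # one second of silence at 16 kHz
    wav_buffer = pcm16_to_wav_buffer(silence)
    print(wav_buffer.name, len(wav_buffer.getvalue()))
```

A buffer built this way could be passed as the `file` argument in the same manner as the `soundfile`-based buffer in the diff, although that has not been checked against the API here.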
@@ -9,6 +9,8 @@ termcolor
  [all]
  pynput
  openai-whisper
+ openai
+ soundfile
  vosk
  pystray

@@ -19,6 +21,10 @@ PyGObject
  [keyboard]
  pynput

+ [openai]
+ openai
+ soundfile
+
  [vosk]
  vosk

File without changes