scribe-cli 0.6.0__tar.gz → 0.6.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (27)
  1. {scribe_cli-0.6.0/scribe_cli.egg-info → scribe_cli-0.6.2}/PKG-INFO +14 -5
  2. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/README.md +14 -5
  3. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe/_version.py +2 -2
  4. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe/keyboard.py +0 -1
  5. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe/models.py +7 -7
  6. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe/streamer.py +11 -5
  7. {scribe_cli-0.6.0 → scribe_cli-0.6.2/scribe_cli.egg-info}/PKG-INFO +14 -5
  8. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/.github/workflows/pypi.yml +0 -0
  9. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/.gitignore +0 -0
  10. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/LICENSE +0 -0
  11. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/pyproject.toml +0 -0
  12. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe/__init__.py +0 -0
  13. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe/audio.py +0 -0
  14. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe/install_desktop.py +0 -0
  15. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe/models.toml +0 -0
  16. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe/saverecording.py +0 -0
  17. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe/testpynput.py +0 -0
  18. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe/util.py +0 -0
  19. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe_cli.egg-info/SOURCES.txt +0 -0
  20. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe_cli.egg-info/dependency_links.txt +0 -0
  21. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe_cli.egg-info/entry_points.txt +0 -0
  22. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe_cli.egg-info/requires.txt +0 -0
  23. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe_cli.egg-info/top_level.txt +0 -0
  24. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe_data/__init__.py +0 -0
  25. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe_data/share/icon.jpg +0 -0
  26. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe_data/templates/scribe.desktop +0 -0
  27. {scribe_cli-0.6.0 → scribe_cli-0.6.2}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.2
  Name: scribe-cli
- Version: 0.6.0
+ Version: 0.6.2
  Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
  Author-email: Mahé Perrette <mahe.perrette@gmail.com>
  License: MIT License
@@ -109,6 +109,7 @@ there is a maximum duration after which it will stop by itself, which is setup t
 
  The `vosk` backend is good at
  doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
+ Use mainly for longer typing session with the [keyboard](#virtual-keyboard-advanced) option, e.g. to make notes.
  There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
 
  To skip the initial selection menu you can do:
@@ -117,9 +118,9 @@ scribe --backend whisper --model small --no-prompt
  ```
  where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
 
- ### Advanced usage as keyboard replacement
+ ### Virtual keyboard (experimental)
 
- By default the content of the transcription is paster to the clipboard, but is not propagated further.
+ By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
  With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
 
  ```bash
@@ -128,10 +129,18 @@ scribe --keyboard
 
  It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 
- `pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html) (I *think* got it to work with `xhost +SI:localuser:$(whoami)` as far as the display is concerned). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !).
- Workarounds include using the Xorg version of GNOME... Suggestions welcome.
+ `pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). Workarounds include using the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart.
+
 
  ### Start as an application in Ubuntu
 
  If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
  to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+
+ e.g.
+
+ ```bash
+ scribe-install --backend whisper --model small
+ ```
+
+ After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
@@ -52,6 +52,7 @@ there is a maximum duration after which it will stop by itself, which is setup t
 
  The `vosk` backend is good at
  doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
+ Use mainly for longer typing session with the [keyboard](#virtual-keyboard-advanced) option, e.g. to make notes.
  There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
 
  To skip the initial selection menu you can do:
@@ -60,9 +61,9 @@ scribe --backend whisper --model small --no-prompt
  ```
  where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
 
- ### Advanced usage as keyboard replacement
+ ### Virtual keyboard (experimental)
 
- By default the content of the transcription is paster to the clipboard, but is not propagated further.
+ By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
  With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
 
  ```bash
@@ -71,10 +72,18 @@ scribe --keyboard
 
  It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 
- `pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html) (I *think* got it to work with `xhost +SI:localuser:$(whoami)` as far as the display is concerned). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !).
- Workarounds include using the Xorg version of GNOME... Suggestions welcome.
+ `pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). Workarounds include using the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart.
+
 
  ### Start as an application in Ubuntu
 
  If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
- to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+ to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+
+ e.g.
+
+ ```bash
+ scribe-install --backend whisper --model small
+ ```
+
+ After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
@@ -12,5 +12,5 @@ __version__: str
  __version_tuple__: VERSION_TUPLE
  version_tuple: VERSION_TUPLE
 
- __version__ = version = '0.6.0'
- __version_tuple__ = version_tuple = (0, 6, 0)
+ __version__ = version = '0.6.2'
+ __version_tuple__ = version_tuple = (0, 6, 2)
@@ -6,7 +6,6 @@ try:
 
  except ImportError:
      print("Please install pynput to use the keyboard feature.")
-     print("Alternatively specify [keyboard] optional dependency to voskrealtime, e.g. `pip install -e .[keyboard]`")
      raise
 
  # Create a keyboard controller
@@ -95,16 +95,16 @@ class AbstractTranscriber:
          print(stop_message)
 
 
- def get_vosk_model(model, data_folder=None, url=None):
+ def get_vosk_model(model, download_root=None, url=None):
      """Load the Vosk recognizer"""
      import vosk
-     if data_folder is None:
-         data_folder = VOSK_MODELS_FOLDER
-     model_path = os.path.join(data_folder, model)
+     if download_root is None:
+         download_root = VOSK_MODELS_FOLDER
+     model_path = os.path.join(download_root, model)
      if not os.path.exists(model_path):
          if url is None:
              url = f"https://alphacephei.com/vosk/models/{model}.zip"
-         download_model(url, data_folder)
+         download_model(url, download_root)
          assert os.path.exists(model_path)
 
      return vosk.Model(model_path)
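The download-if-missing logic in `get_vosk_model` can be exercised without `vosk` or network access. The sketch below is only an illustration of that pattern: `ensure_model`, `fake_fetch` and the model name `example-model` are hypothetical, and `fake_fetch` stands in for `download_model` by creating an empty folder instead of downloading an archive.

```python
import os
import tempfile

VOSK_URL_TEMPLATE = "https://alphacephei.com/vosk/models/{model}.zip"


def ensure_model(model, download_root, fetch):
    """Return the model folder, calling fetch(url, download_root) only when it is missing."""
    model_path = os.path.join(download_root, model)
    if not os.path.exists(model_path):
        fetch(VOSK_URL_TEMPLATE.format(model=model), download_root)
    if not os.path.exists(model_path):
        # An explicit error instead of `assert`, which is skipped under `python -O`.
        raise FileNotFoundError(model_path)
    return model_path


fetched = []


def fake_fetch(url, download_root):
    # Hypothetical stand-in for download_model: records the URL and
    # creates an empty model folder rather than downloading anything.
    fetched.append(url)
    name = url.rsplit("/", 1)[-1][: -len(".zip")]
    os.makedirs(os.path.join(download_root, name))


with tempfile.TemporaryDirectory() as root:
    first = ensure_model("example-model", root, fake_fetch)
    second = ensure_model("example-model", root, fake_fetch)

print(len(fetched))  # 1: the second call finds the folder and skips the fetch
```

Passing the fetch function in as an argument is a choice made for this sketch so the behaviour can be checked offline; the package itself calls `download_model` directly.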
@@ -162,11 +162,11 @@ class WhisperTranscriber(AbstractTranscriber):
      def __init__(self, model_name, language=None, model=None, model_kwargs={}, **kwargs):
          import whisper
          if model is None:
-             model = whisper.load_model(model_name)
+             model = whisper.load_model(model_name, **model_kwargs)
          super().__init__(model, model_name, language, model_kwargs=model_kwargs, **kwargs)
 
      def transcribe_audio(self, audio_bytes):
-         print("Transcribing...")
+         print("\nTranscribing...")
          audio_array = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
          return self.model.transcribe(audio_array, fp16=False, language=self.language)
 
@@ -47,7 +47,7 @@ def get_transcriber(o, prompt=True):
          backend = o.backend
 
      elif not prompt:
-         backend = choices[0]
+         backend = BACKENDS[0]
 
      else:
          checked_backend = False
@@ -113,14 +113,16 @@ def get_transcriber(o, prompt=True):
                  samplerate=o.samplerate,
                  timeout=None, # vosk keeps going (no timeout)
                  silence_duration=None, # vosk handles silences internally
-                 model_kwargs={"data_folder": o.data_folder})
+                 model_kwargs={"download_root": o.download_folder_vosk})
          except Exception as error:
              print(error)
              print(f"Failed to (down)load model {model}.")
              exit(1)
 
      elif backend == "whisper":
-         transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate, timeout=o.duration, silence_duration=o.silence)
+         transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate,
+             timeout=o.duration, silence_duration=o.silence, restart_after_silence=o.restart_after_silence,
+             model_kwargs={"download_root": o.download_folder_whisper})
 
      else:
          raise ValueError(f"Unknown backend: {backend}")
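The attributes `o.download_folder_vosk` and `o.download_folder_whisper` used in this hunk come from the `--download-folder-vosk` and `--download-folder-whisper` options added in 0.6.2: `argparse` derives the attribute name from the long option by replacing dashes with underscores. A minimal sketch of that flow follows, assuming a stub loader (`load_model_stub`, hypothetical) in place of `whisper.load_model`, which as far as I know accepts a `download_root` keyword; the folder path is made up.

```python
import argparse


def load_model_stub(name, download_root=None):
    """Hypothetical stand-in for whisper.load_model so the sketch runs without whisper."""
    return {"name": name, "download_root": download_root}


# Only the two options from the 0.6.2 parser are reproduced here.
parser = argparse.ArgumentParser(prog="scribe")
parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")

o = parser.parse_args(["--download-folder-whisper", "/tmp/whisper-models"])

# "--download-folder-whisper" becomes the attribute download_folder_whisper.
model_kwargs = {"download_root": o.download_folder_whisper}
model = load_model_stub("small", **model_kwargs)

print(model)
```

An option that is not given on the command line defaults to `None`, so `download_root=None` reaches the loader and the backend's own default folder applies.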
@@ -153,7 +155,8 @@ def get_parser():
      group.add_argument("--silence", default=2, type=float, help="silence duration that prompt transcription (whisper) (default %(default)ss)")
      group.add_argument("--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
 
-     parser.add_argument("--data-folder", help="Folder to store Vosk models.")
+     parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
+     parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")
 
      return parser
 
@@ -235,7 +238,7 @@ def main(args=None):
      if transcriber.backend == "whisper":
          print(f"[t] change duration (currently {transcriber.timeout}s)")
          print(f"[b] change silence duration (currently {transcriber.silence_duration}s)")
-         print(f"[a] toggle auto-restart after silence [{toggle[o.restart_after_silence]}] -> [{toggle[not o.restart_after_silence]}]")
+         print(f"[a] toggle auto-restart after silence [{toggle[transcriber.restart_after_silence]}] -> [{toggle[not transcriber.restart_after_silence]}]")
      print(colored(f"Press [Enter] or any other key to start recording.", "BOLD"))
 
      key = input()
@@ -250,6 +253,9 @@ def main(args=None):
      if key == "c":
          o.clipboard = not o.clipboard
          continue
+     if key == "a":
+         transcriber.restart_after_silence = not transcriber.restart_after_silence
+         continue
      if key == "t":
          ans = input(f"Enter new duration in seconds (current: {transcriber.timeout}): ")
          try:
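Taken together, the two `main` hunks move the auto-restart state from the parsed options (`o`) to the transcriber and add the missing handler for the `a` key. The sketch below illustrates that idea in isolation; `TranscriberStub`, `handle_key`, `menu_line` and the contents of `toggle` are hypothetical, since the diff does not show how `toggle` is defined.

```python
class TranscriberStub:
    """Hypothetical stand-in holding only the flag touched by the hunk."""

    def __init__(self, restart_after_silence=False):
        self.restart_after_silence = restart_after_silence


# Assumed label mapping; the real `toggle` is not visible in the diff.
toggle = {True: "on", False: "off"}


def menu_line(transcriber):
    """Build the menu entry from the transcriber, the object that owns the state."""
    state = transcriber.restart_after_silence
    return f"[a] toggle auto-restart after silence [{toggle[state]}] -> [{toggle[not state]}]"


def handle_key(key, transcriber):
    """Flip the flag on 'a'; return True when the key was consumed."""
    if key == "a":
        transcriber.restart_after_silence = not transcriber.restart_after_silence
        return True
    return False


t = TranscriberStub()
before = menu_line(t)
handled = handle_key("a", t)
after = menu_line(t)
print(before)
print(after)
```

Reading and writing the same attribute keeps the menu text and the recording behaviour in step, which is presumably why the `print` line was switched from `o.restart_after_silence` to `transcriber.restart_after_silence`.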
@@ -1,6 +1,6 @@
  Metadata-Version: 2.2
  Name: scribe-cli
- Version: 0.6.0
+ Version: 0.6.2
  Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
  Author-email: Mahé Perrette <mahe.perrette@gmail.com>
  License: MIT License
@@ -109,6 +109,7 @@ there is a maximum duration after which it will stop by itself, which is setup t
 
  The `vosk` backend is good at
  doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
+ Use mainly for longer typing session with the [keyboard](#virtual-keyboard-advanced) option, e.g. to make notes.
  There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
 
  To skip the initial selection menu you can do:
@@ -117,9 +118,9 @@ scribe --backend whisper --model small --no-prompt
  ```
  where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
 
- ### Advanced usage as keyboard replacement
+ ### Virtual keyboard (experimental)
 
- By default the content of the transcription is paster to the clipboard, but is not propagated further.
+ By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
  With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
 
  ```bash
@@ -128,10 +129,18 @@ scribe --keyboard
 
  It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 
- `pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html) (I *think* got it to work with `xhost +SI:localuser:$(whoami)` as far as the display is concerned). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !).
- Workarounds include using the Xorg version of GNOME... Suggestions welcome.
+ `pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). Workarounds include using the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart.
+
 
  ### Start as an application in Ubuntu
 
  If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
  to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+
+ e.g.
+
+ ```bash
+ scribe-install --backend whisper --model small
+ ```
+
+ After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
File without changes