audium-md 0.1.0__tar.gz

@@ -0,0 +1,17 @@
+ .venv/
+ __pycache__/
+ *.pyc
+ *.pyo
+ .pytest_cache/
+ dist/
+ *.egg-info/
+ *.egg
+ transcripts/
+ .DS_Store
+
+ # Agent and IDE artifacts
+ .deepseek/
+ .codewhale/
+ .agents/
+ .superpowers/
+ STATE_DIR/
@@ -0,0 +1,226 @@
+ Metadata-Version: 2.4
+ Name: audium-md
+ Version: 0.1.0
+ Summary: Audio-to-Markdown transcription optimized for AI consumption
+ Author: tamukj
+ License: MIT
+ Requires-Python: >=3.10
+ Requires-Dist: click>=8.1.0
+ Requires-Dist: faster-whisper>=1.2.0
+ Requires-Dist: pyyaml>=6.0
+ Requires-Dist: rich>=13.0.0
+ Provides-Extra: dev
+ Requires-Dist: pytest-cov>=5.0; extra == 'dev'
+ Requires-Dist: pytest>=8.0; extra == 'dev'
+ Description-Content-Type: text/markdown
+
+ <p align="center">
+ <a href="https://github.com/Tamukj/Audium">
+ <img src="assets/logo.svg" width="180" alt="Audium logo">
+ </a>
+ </p>
+
+ <h1 align="center">Audium</h1>
+
+ <p align="center">
+ <strong>🎧 Audio → AI‑optimized Markdown</strong>
+ <br>
+ <sub>Transcribe MP3/WAV/FLAC into clean, token‑efficient Markdown, ready for any LLM.</sub>
+ </p>
+
+ <p align="center">
+ <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10%2B-blue?style=flat&logo=python&logoColor=white" alt="Python 3.10+"></a>
+ <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green?style=flat" alt="MIT License"></a>
+ <a href="https://pypi.org/project/audium-md/"><img src="https://img.shields.io/badge/pypi-v0.1.0-blue?style=flat&logo=pypi&logoColor=white" alt="PyPI version"></a>
+ <a href="https://github.com/SYSTRAN/faster-whisper"><img src="https://img.shields.io/badge/backend-faster--whisper-8A2BE2?style=flat" alt="faster-whisper"></a>
+ <a href="https://github.com/Tamukj/Audium"><img src="https://img.shields.io/badge/platform-linux%20%7C%20macOS%20%7C%20windows-lightgrey?style=flat" alt="Platform"></a>
+ </p>
+
+ <p align="center">
+ <a href="README.md">English</a> ·
+ <a href="README.ru.md">Русский</a> ·
+ <a href="README.zh-CN.md">中文</a>
+ </p>
+
+ ---
+
+ <h2 align="center">✨ Why Audium?</h2>
+
+ Feed audio to an LLM. Get answers. Simple.
+
+ But raw transcripts burn tokens on noise: long timestamps, filler words,
+ silent segments, markup that adds nothing.
+
+ Audium turns speech into **the minimum viable Markdown**: every character
+ counts, nothing wasted.
+
+ <div align="center">
+
+ | 🎯 | ⚡ | 🪙 | 👁️ | 🌍 |
+ |---|---|---|---|---|
+ | **3 formats** | **GPU‑accelerated** | **Token‑aware** | **Watch mode** | **~97 languages** |
+ | compact, minimal, structured | 2–10× real‑time on CUDA | `[MM:SS]` + VAD + filler‑strip | drop files → auto‑transcribe | `tiny` to `large-v3` |
+
+ </div>
+
+ ---
+
+ <h2 align="center">📦 Install</h2>
+
+ ```bash
+ pip install audium-md
+ ```
+
+ > Requires `ffmpeg` on your system: `sudo apt install ffmpeg` (Debian/Ubuntu) or `brew install ffmpeg` (macOS with Homebrew).
+
+ ---
+
+ <h2 align="center">🚀 Quick Start</h2>
+
+ ```bash
+ # Process a folder
+ audium run ./my-recordings/
+
+ # Single file
+ audium run lecture.mp3
+
+ # Watch folder: auto‑transcribe new files
+ audium watch ./incoming/
+
+ # See what you've transcribed
+ audium list
+
+ # Change model
+ audium config set model large-v3
+ ```
+
+ ---
+
+ <h2 align="center">📝 Formats</h2>
+
+ ### compact *(default)*
+
+ ```
+ # lecture.mp3 (01:23:45)
+
+ [00:00] Neural networks learn hierarchical representations
+ [00:04] Each layer detects increasingly abstract features
+ [00:08] Early layers find edges and textures
+ [00:12] Later layers detect objects and scenes
+ ```
+
+ ### minimal
+
+ ```
+ Neural networks learn hierarchical representations
+ Each layer detects increasingly abstract features
+ Early layers find edges and textures
+ Later layers detect objects and scenes
+ ```
+
+ ### structured *(requires speaker diarization)*
+
+ ```
+ # interview.mp3 (00:45:12)
+
+ ## Alice [00:00-00:30]
+ Neural networks are a powerful tool. It's important to understand their limitations.
+
+ ## Bob [00:30-01:15]
+ I completely agree. Let me walk through an example to make this concrete.
+ ```
+
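The compact and minimal layouts shown above could be produced along the following lines. This is only a sketch, not Audium's actual renderer: the function names `format_timestamp` and `render` are hypothetical, segments are assumed to arrive as `(start_seconds, text)` pairs, and letting the minutes field run past 59 for recordings longer than an hour is an assumption the README does not confirm.

```python
def format_timestamp(seconds: float) -> str:
    """Render a start time as [MM:SS], dropping sub-second precision.

    Assumption: minutes are not wrapped at 60, so 1h23m45s becomes [83:45].
    """
    total = int(seconds)
    return f"[{total // 60:02d}:{total % 60:02d}]"


def render(segments, fmt="compact"):
    """Render (start_seconds, text) pairs, one line per segment."""
    lines = []
    for start, text in segments:
        text = text.strip()
        if fmt == "compact":
            # Timestamp prefix followed by the segment text
            lines.append(f"{format_timestamp(start)} {text}")
        elif fmt == "minimal":
            # Text only, no timestamps
            lines.append(text)
        else:
            raise ValueError(f"unsupported format: {fmt}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        (0.0, "Neural networks learn hierarchical representations"),
        (4.2, "Each layer detects increasingly abstract features"),
    ]
    print(render(demo, fmt="compact"))
```

The `structured` format is left out of the sketch because it depends on speaker labels from diarization, which plain `(start, text)` pairs do not carry.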
+ ---
+
+ <h2 align="center">⚙️ Commands</h2>
+
+ | Command | Description |
+ |---------|-------------|
+ | `audium run <path>` | Transcribe audio files or folders |
+ | `audium watch <path>` | Watch folder and auto‑process new files |
+ | `audium list [dir]` | Show processed transcripts with file sizes |
+ | `audium config` | Show current configuration |
+ | `audium config set <key> <value>` | Change a setting |
+ | `audium config reset` | Reset to factory defaults |
+ | `audium config path` | Show config file location |
+
+ ### Common flags for `run` and `watch`
+
+ | Flag | Default | Description |
+ |------|---------|-------------|
+ | `-o, --output-dir` | `./transcripts` | Where to save `.md` files |
+ | `-f, --format` | `compact` | `compact` / `minimal` / `structured` |
+ | `-r, --recursive` | off | Search subdirectories |
+ | `--model` | `small` | `tiny` / `base` / `small` / `medium` / `large-v3` |
+ | `--language` | `auto` | Force language code: `ru`, `en`, `zh`, ... |
+ | `--strip-fillers` | off | Remove "um", "uh", "like", "мм", "ээ", etc. |
+ | `--no-vad` | off | Disable voice activity detection |
+ | `--no-progress` | off | Hide the progress bar |
+
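To illustrate what a flag like `--strip-fillers` might do, here is a minimal regex-based sketch. It is not taken from Audium's source: `FILLERS` and `strip_fillers` are hypothetical names, and the word list is an illustrative subset of the fillers the table mentions. "like" is deliberately left out of the sketch, since removing it without context would also delete the ordinary verb.

```python
import re

# Illustrative subset of filler words (assumption, not Audium's real list)
FILLERS = ("um", "uh", "мм", "ээ")

# Match a whole-word filler plus an optional trailing comma/period and spaces
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FILLERS)) + r")\b[,.]?\s*",
    flags=re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Remove standalone filler words and collapse leftover whitespace."""
    cleaned = _FILLER_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


if __name__ == "__main__":
    print(strip_fillers("Um, so the model, uh, converges"))
```

The word boundaries (`\b`) keep words such as "umbrella" intact, and because Python's `re` treats `str` patterns as Unicode, the same boundaries work for the Cyrillic fillers.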
+ ---
+
+ <h2 align="center">🔧 Configuration</h2>
+
+ Settings are merged in order of precedence, highest first: **CLI flags > `.audium.yaml` (project) > `~/.config/audium/config.yaml` (user) > defaults**
+
+ ```bash
+ # Set default model
+ audium config set model large-v3
+
+ # Always strip filler words
+ audium config set strip_fillers true
+
+ # Custom output folder
+ audium config set output_dir ~/Documents/transcripts
+
+ # See what you changed
+ audium config
+ ```
+
+ ```yaml
+ # Example .audium.yaml (place in project root)
+ model: medium
+ language: ru
+ format: minimal
+ output_dir: ./transcripts
+ ```
+
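The precedence chain described above (CLI flags, then the project file, then the user file, then defaults) can be sketched with `collections.ChainMap`. This is an illustration of the layering idea only, not Audium's loader: `resolve_settings` is a hypothetical name, the YAML files are assumed to be already parsed into dicts, and treating a `None` flag as "not passed" is an assumption. The default values are the ones listed in the flags table.

```python
from collections import ChainMap

# Defaults as listed in the flags table
DEFAULTS = {
    "model": "small",
    "format": "compact",
    "language": "auto",
    "output_dir": "./transcripts",
}


def resolve_settings(cli_flags, project_cfg, user_cfg, defaults=DEFAULTS):
    """Merge the four layers; earlier layers win over later ones."""
    # Assumption: a flag that was not passed on the command line is None,
    # so it must be dropped to avoid shadowing the lower layers.
    cli = {k: v for k, v in cli_flags.items() if v is not None}
    return dict(ChainMap(cli, project_cfg, user_cfg, defaults))


if __name__ == "__main__":
    settings = resolve_settings(
        cli_flags={"model": None, "format": "structured"},
        project_cfg={"model": "medium", "language": "ru"},
        user_cfg={"model": "large-v3", "output_dir": "~/Documents/transcripts"},
    )
    print(settings)
```

In this example the project file's `medium` beats the user file's `large-v3` because no `--model` flag was given, while `format` comes from the command line.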
+ ---
+
+ <h2 align="center">🪙 Token Optimization</h2>
+
+ Audium is built to minimize LLM token cost:
+
+ | Technique | Savings |
+ |-----------|---------|
+ | `[MM:SS]` instead of `[HH:MM:SS.mmm]` | ~30% on timestamps |
+ | VAD filtering (skip silence) | 15–40% on meeting recordings |
+ | Filler‑word stripping | 5–10% on conversational speech |
+ | `min_segment_duration` threshold | skip noise fragments |
+ | One line per segment, no blank lines | ~8% vs paragraph output |
+
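As an illustration of the `min_segment_duration` row in the table above, the following sketch drops segments that are shorter than a threshold. Only the setting name comes from the README; the `Segment` type, the `drop_short_segments` helper, and the `0.5` second default are assumptions made for the example.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


def drop_short_segments(segments, min_segment_duration=0.5):
    """Keep segments that last at least the threshold and contain text.

    The 0.5 s default is illustrative, not Audium's documented value.
    """
    return [
        seg
        for seg in segments
        if (seg.end - seg.start) >= min_segment_duration and seg.text.strip()
    ]


if __name__ == "__main__":
    segments = [
        Segment(0.0, 3.8, "Neural networks learn hierarchical representations"),
        Segment(3.8, 3.9, "uh"),  # 0.1 s fragment, likely noise
        Segment(4.0, 8.0, "Each layer detects increasingly abstract features"),
    ]
    for seg in drop_short_segments(segments):
        print(seg.text)
```

A duration filter like this is cheap because it needs no audio analysis, only the segment boundaries the transcriber already returns.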
+ ---
+
+ <h2 align="center">📊 Model Sizes</h2>
+
+ | Model | Parameters | Speed (GPU) | Best for |
+ |-------|-----------|-------------|----------|
+ | tiny | 39M | ~32× real‑time | Quick drafts, low‑resource |
+ | base | 74M | ~16× real‑time | Dictation, clean audio |
+ | small | 244M | ~6× real‑time | **General purpose** |
+ | medium | 769M | ~2× real‑time | Accents, noisy audio |
+ | large-v3 | 1.5B | ~1× real‑time | Maximum accuracy |
+
+ > All multilingual models support the same ~97 languages. Larger models are more accurate but slower.
+
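The speed column above translates into a rough processing-time estimate: audio duration divided by the real-time factor. The sketch below uses the approximate GPU factors from the table; `estimate_seconds` is a hypothetical helper, and actual throughput will depend on hardware and audio content.

```python
# Approximate GPU real-time factors, copied from the table above
SPEED_FACTORS = {"tiny": 32, "base": 16, "small": 6, "medium": 2, "large-v3": 1}


def estimate_seconds(duration: str, model: str) -> float:
    """Estimate processing time in seconds for an HH:MM:SS duration."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    total = hours * 3600 + minutes * 60 + seconds
    return total / SPEED_FACTORS[model]


if __name__ == "__main__":
    # The 01:23:45 lecture from the Formats example, on the default model
    print(estimate_seconds("01:23:45", "small") / 60, "minutes")
```

For the 01:23:45 lecture (5025 seconds), `small` at ~6× works out to 837.5 seconds, or about 14 minutes, while `large-v3` at ~1× would take roughly as long as the recording itself.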
+ ---
+
+ <h2 align="center">📄 License</h2>
+
+ <p align="center">
+ <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue?style=for-the-badge" alt="MIT License"></a>
+ </p>
+
+ <p align="center">
+ MIT: do whatever you want. Attribution appreciated.
+ </p>
@@ -0,0 +1,210 @@
+ <p align="center">
+ <a href="https://github.com/Tamukj/Audium">
+ <img src="assets/logo.svg" width="180" alt="Audium logo">
+ </a>
+ </p>
+
+ <h1 align="center">Audium</h1>
+
+ <p align="center">
+ <strong>🎧 Audio → Markdown optimized for AI</strong>
+ <br>
+ <sub>Transcribe MP3/WAV/FLAC into clean, token‑efficient Markdown, ready for any LLM.</sub>
+ </p>
+
+ <p align="center">
+ <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10%2B-blue?style=flat&logo=python&logoColor=white" alt="Python 3.10+"></a>
+ <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green?style=flat" alt="MIT License"></a>
+ <a href="https://pypi.org/project/audium-md/"><img src="https://img.shields.io/badge/pypi-v0.1.0-blue?style=flat&logo=pypi&logoColor=white" alt="PyPI version"></a>
+ <a href="https://github.com/SYSTRAN/faster-whisper"><img src="https://img.shields.io/badge/backend-faster--whisper-8A2BE2?style=flat" alt="faster-whisper"></a>
+ <a href="https://github.com/Tamukj/Audium"><img src="https://img.shields.io/badge/platform-linux%20%7C%20macOS%20%7C%20windows-lightgrey?style=flat" alt="Platform"></a>
+ </p>
+
+ <p align="center">
+ <a href="README.md">English</a> ·
+ <a href="README.ru.md">Русский</a> ·
+ <a href="README.zh-CN.md">中文</a>
+ </p>
+
+ ---
+
+ <h2 align="center">✨ Why Audium?</h2>
+
+ Feed audio to an LLM and get an answer. Simple.
+
+ But raw transcripts burn tokens on noise: long timecodes,
+ filler words, empty segments, markup that adds nothing.
+
+ Audium turns speech into **minimally sufficient Markdown**: every character
+ counts, nothing superfluous.
+
+ <div align="center">
+
+ | 🎯 | ⚡ | 🪙 | 👁️ | 🌍 |
+ |---|---|---|---|---|
+ | **3 formats** | **GPU acceleration** | **Token optimization** | **Watch mode** | **~97 languages** |
+ | compact, minimal, structured | 2–10× real time on CUDA | `[MM:SS]` + VAD + filler removal | drop in files → auto‑transcription | `tiny` to `large-v3` |
+
+ </div>
+
+ ---
+
+ <h2 align="center">📦 Installation</h2>
+
+ ```bash
+ pip install audium-md
+ ```
+
+ > Requires `ffmpeg` on the system: `sudo apt install ffmpeg` / `brew install ffmpeg`
+
+ ---
+
+ <h2 align="center">🚀 Quick Start</h2>
+
+ ```bash
+ # Process a folder
+ audium run ./my-recordings/
+
+ # A single file
+ audium run lecture.mp3
+
+ # Watch a folder: auto‑transcribe new files
+ audium watch ./incoming/
+
+ # See what has already been processed
+ audium list
+
+ # Change the model
+ audium config set model large-v3
+ ```
+
+ ---
+
+ <h2 align="center">📝 Formats</h2>
+
+ ### compact *(default)*
+
+ ```
+ # lecture.mp3 (01:23:45)
+
+ [00:00] Neural networks learn hierarchical representations
+ [00:04] Each layer detects increasingly abstract features
+ [00:08] In the early layers: edges and textures
+ [00:12] In the later ones: objects and scenes
+ ```
+
+ ### minimal
+
+ ```
+ Neural networks learn hierarchical representations
+ Each layer detects increasingly abstract features
+ In the early layers: edges and textures
+ In the later ones: objects and scenes
+ ```
+
+ ### structured *(requires speaker diarization)*
+
+ ```
+ # interview.mp3 (00:45:12)
+
+ ## Alice [00:00-00:30]
+ Neural networks are a powerful tool. It is important to understand their limitations.
+
+ ## Bob [00:30-01:15]
+ I completely agree. Let's work through a concrete example.
+ ```
+
+ ---
+
+ <h2 align="center">⚙️ Commands</h2>
+
+ | Command | Description |
+ |---------|-------------|
+ | `audium run <path>` | Transcribe audio files or folders |
+ | `audium watch <path>` | Watch a folder and auto‑process new files |
+ | `audium list [dir]` | Show processed transcripts with their sizes |
+ | `audium config` | Show the current configuration |
+ | `audium config set <key> <value>` | Change a setting |
+ | `audium config reset` | Reset to factory defaults |
+ | `audium config path` | Show the path to the config file |
+
+ ### Main flags for `run` and `watch`
+
+ | Flag | Default | Description |
+ |------|---------|-------------|
+ | `-o, --output-dir` | `./transcripts` | Where to save `.md` files |
+ | `-f, --format` | `compact` | `compact` / `minimal` / `structured` |
+ | `-r, --recursive` | off | Search subfolders |
+ | `--model` | `small` | `tiny` / `base` / `small` / `medium` / `large-v3` |
+ | `--language` | `auto` | Language code: `ru`, `en`, `zh`, ... |
+ | `--strip-fillers` | off | Remove "мм", "ээ", "типа", "um", "uh", etc. |
+ | `--no-vad` | off | Disable VAD (voice activity detection) |
+ | `--no-progress` | off | Hide the progress bar |
+
+ ---
+
+ <h2 align="center">🔧 Configuration</h2>
+
+ Settings are merged: **CLI flags > `.audium.yaml` (project) > `~/.config/audium/config.yaml` > defaults**
+
+ ```bash
+ # Default model
+ audium config set model large-v3
+
+ # Always remove filler words
+ audium config set strip_fillers true
+
+ # Custom output folder
+ audium config set output_dir ~/Documents/transcripts
+
+ # View the changes
+ audium config
+ ```
+
+ ```yaml
+ # Example .audium.yaml (in the project root)
+ model: medium
+ language: ru
+ format: minimal
+ output_dir: ./transcripts
+ ```
+
+ ---
+
+ <h2 align="center">🪙 Token Optimization</h2>
+
+ Audium is built to minimize LLM token usage:
+
+ | Technique | Savings |
+ |-----------|---------|
+ | `[MM:SS]` instead of `[HH:MM:SS.mmm]` | ~30% on timecodes |
+ | VAD filter (skipping silence) | 15–40% on meeting recordings |
+ | Filler‑word removal | 5–10% on conversational speech |
+ | `min_segment_duration` threshold | skips noise fragments |
+ | One line per segment, no blank lines | ~8% compared with paragraphs |
+
+ ---
+
+ <h2 align="center">📊 Model Sizes</h2>
+
+ | Model | Parameters | Speed (GPU) | Best for |
+ |-------|-----------|-------------|----------|
+ | tiny | 39M | ~32× real time | Drafts, low‑end machines |
+ | base | 74M | ~16× real time | Dictation, clean audio |
+ | small | 244M | ~6× real time | **General purpose** |
+ | medium | 769M | ~2× real time | Accents, noisy audio |
+ | large-v3 | 1.5B | ~1× real time | Maximum accuracy |
+
+ > All multilingual models support the same ~97 languages. Model size affects accuracy and speed.
+
+ ---
+
+ <h2 align="center">📄 License</h2>
+
+ <p align="center">
+ <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue?style=for-the-badge" alt="MIT License"></a>
+ </p>
+
+ <p align="center">
+ MIT: do whatever you want. A link to the author is appreciated.
+ </p>