audium-md 0.1.0__tar.gz
- audium_md-0.1.0/.gitignore +17 -0
- audium_md-0.1.0/PKG-INFO +226 -0
- audium_md-0.1.0/README.md +210 -0
- audium_md-0.1.0/README.ru.md +210 -0
- audium_md-0.1.0/README.zh-CN.md +208 -0
- audium_md-0.1.0/assets/logo.svg +37 -0
- audium_md-0.1.0/audium/__init__.py +3 -0
- audium_md-0.1.0/audium/cli.py +305 -0
- audium_md-0.1.0/audium/config.py +146 -0
- audium_md-0.1.0/audium/formatter.py +162 -0
- audium_md-0.1.0/audium/scanner.py +32 -0
- audium_md-0.1.0/audium/transcriber.py +82 -0
- audium_md-0.1.0/pyproject.toml +35 -0
- audium_md-0.1.0/tests/__init__.py +0 -0
- audium_md-0.1.0/tests/test_config.py +140 -0
- audium_md-0.1.0/tests/test_formatter.py +136 -0
- audium_md-0.1.0/tests/test_scanner.py +82 -0
audium_md-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,226 @@
+Metadata-Version: 2.4
+Name: audium-md
+Version: 0.1.0
+Summary: Audio-to-Markdown transcription optimized for AI consumption
+Author: tamukj
+License: MIT
+Requires-Python: >=3.10
+Requires-Dist: click>=8.1.0
+Requires-Dist: faster-whisper>=1.2.0
+Requires-Dist: pyyaml>=6.0
+Requires-Dist: rich>=13.0.0
+Provides-Extra: dev
+Requires-Dist: pytest-cov>=5.0; extra == 'dev'
+Requires-Dist: pytest>=8.0; extra == 'dev'
+Description-Content-Type: text/markdown
+
+<p align="center">
+<a href="https://github.com/Tamukj/Audium">
+<img src="assets/logo.svg" width="180" alt="Audium logo">
+</a>
+</p>
+
+<h1 align="center">Audium</h1>
+
+<p align="center">
+<strong>🎧 Audio → AI‑optimized Markdown</strong>
+<br>
+<sub>Transcribe MP3/WAV/FLAC into clean, token‑efficient Markdown — ready for any LLM.</sub>
+</p>
+
+<p align="center">
+<a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10%2B-blue?style=flat&logo=python&logoColor=white" alt="Python 3.10+"></a>
+<a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green?style=flat" alt="MIT License"></a>
+<a href="https://pypi.org/project/audium-md/"><img src="https://img.shields.io/badge/pypi-v0.1.0-blue?style=flat&logo=pypi&logoColor=white" alt="PyPI version"></a>
+<a href="https://github.com/SYSTRAN/faster-whisper"><img src="https://img.shields.io/badge/backend-faster--whisper-8A2BE2?style=flat" alt="faster-whisper"></a>
+<a href="https://github.com/Tamukj/Audium"><img src="https://img.shields.io/badge/platform-linux%20%7C%20macOS%20%7C%20windows-lightgrey?style=flat" alt="Platform"></a>
+</p>
+
+<p align="center">
+<a href="README.md">English</a> ·
+<a href="README.ru.md">Русский</a> ·
+<a href="README.zh-CN.md">中文</a>
+</p>
+
+---
+
+<h2 align="center">✨ Why Audium?</h2>
+
+Feed audio to an LLM. Get answers. Simple.
+
+But raw transcripts burn tokens on noise: long timestamps, filler words,
+silent segments, markup that adds nothing.
+
+Audium turns speech into **the minimum viable Markdown**: every character
+counts, nothing wasted.
+
+<div align="center">
+
+| 🎯 | ⚡ | 🪙 | 👁️ | 🌍 |
+|---|---|---|---|---|
+| **3 formats** | **GPU‑accelerated** | **Token‑aware** | **Watch mode** | **~97 languages** |
+| compact, minimal, structured | 2–10× real‑time on CUDA | `[MM:SS]` + VAD + filler‑strip | drop files → auto‑transcribe | tiny to large‑v3 |
+
+</div>
+
+---
+
+<h2 align="center">📦 Install</h2>
+
+```bash
+pip install audium-md
+```
+
+> Requires `ffmpeg` on your system: `sudo apt install ffmpeg` / `brew install ffmpeg`
+
+---
+
+<h2 align="center">🚀 Quick Start</h2>
+
+```bash
+# Process a folder
+audium run ./my-recordings/
+
+# Single file
+audium run lecture.mp3
+
+# Watch folder — auto‑transcribe new files
+audium watch ./incoming/
+
+# See what you've transcribed
+audium list
+
+# Change model
+audium config set model large-v3
+```
+
+---
+
+<h2 align="center">📝 Formats</h2>
+
+### compact *(default)*
+
+```
+# lecture.mp3 (01:23:45)
+
+[00:00] Neural networks learn hierarchical representations
+[00:04] Each layer detects increasingly abstract features
+[00:08] Early layers find edges and textures
+[00:12] Later layers detect objects and scenes
+```
+
+### minimal
+
+```
+Neural networks learn hierarchical representations
+Each layer detects increasingly abstract features
+Early layers find edges and textures
+Later layers detect objects and scenes
+```
+
+### structured *(requires speaker diarization)*
+
+```
+# interview.mp3 (00:45:12)
+
+## Alice [00:00-00:30]
+Neural networks are a powerful tool. It's important to understand their limitations.
+
+## Bob [00:30-01:15]
+I completely agree. Let me walk through an example to make this concrete.
+```
+
+---
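The compact and minimal layouts in the Formats section can be sketched in a few lines of Python. This is an illustration only and not the code in `audium/formatter.py`: the `Segment` dataclass and the functions `mmss`, `hhmmss`, `format_compact`, and `format_minimal` are hypothetical names, and the choice to let minutes keep counting past 59 inside `[MM:SS]` is an assumption the README does not confirm.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment: start time in seconds plus its text."""
    start: float
    text: str


def mmss(seconds: float) -> str:
    """Render seconds as MM:SS (assumption: minutes keep counting past 59)."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def hhmmss(seconds: float) -> str:
    """Render seconds as HH:MM:SS, as used in the file header line."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def format_compact(name: str, duration: float, segments: list[Segment]) -> str:
    """Header line, blank line, then one '[MM:SS] text' line per segment."""
    header = f"# {name} ({hhmmss(duration)})"
    lines = [f"[{mmss(s.start)}] {s.text.strip()}" for s in segments]
    return "\n".join([header, "", *lines])


def format_minimal(segments: list[Segment]) -> str:
    """Text only: one line per segment, no header and no timestamps."""
    return "\n".join(s.text.strip() for s in segments)


if __name__ == "__main__":
    demo = [
        Segment(0.0, "Neural networks learn hierarchical representations"),
        Segment(4.2, "Each layer detects increasingly abstract features"),
    ]
    print(format_compact("lecture.mp3", 5025, demo))
```

Keeping the two formatters as pure functions over a list of segments makes it easy to compare the token cost of each layout on the same input.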
+
+<h2 align="center">⚙️ Commands</h2>
+
+| Command | Description |
+|---------|-------------|
+| `audium run <path>` | Transcribe audio files or folders |
+| `audium watch <path>` | Watch folder and auto‑process new files |
+| `audium list [dir]` | Show processed transcripts with file sizes |
+| `audium config` | Show current configuration |
+| `audium config set <key> <value>` | Change a setting |
+| `audium config reset` | Reset to factory defaults |
+| `audium config path` | Show config file location |
+
+### Common flags for `run` and `watch`
+
+| Flag | Default | Description |
+|------|---------|-------------|
+| `-o, --output-dir` | `./transcripts` | Where to save .md files |
+| `-f, --format` | `compact` | `compact` / `minimal` / `structured` |
+| `-r, --recursive` | off | Search subdirectories |
+| `--model` | `small` | `tiny` / `base` / `small` / `medium` / `large-v3` |
+| `--language` | `auto` | Force language code: `ru`, `en`, `zh`, ... |
+| `--strip-fillers` | off | Remove "um", "uh", "like", "мм", "ээ", etc. |
+| `--no-vad` | off | Disable voice activity detection |
+| `--no-progress` | off | Hide the progress bar |
+
+---
+
+<h2 align="center">🔧 Configuration</h2>
+
+Settings are merged: **CLI flags > `.audium.yaml` (project) > `~/.config/audium/config.yaml` > defaults**
+
+```bash
+# Set default model
+audium config set model large-v3
+
+# Always strip filler words
+audium config set strip_fillers true
+
+# Custom output folder
+audium config set output_dir ~/Documents/transcripts
+
+# See what you changed
+audium config
+```
+
+```yaml
+# Example .audium.yaml (place in project root)
+model: medium
+language: ru
+format: minimal
+output_dir: ./transcripts
+```
+
+---
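The precedence chain in the Configuration section (CLI flags over project file over user file over defaults) amounts to a layered dictionary merge. The sketch below assumes each layer has already been loaded into a plain dict, so YAML parsing is left out. `DEFAULTS` copies the default values from the flags table, while `merge_settings` and the convention that `None` means "flag not given" are assumptions for illustration, not the behaviour of `audium/config.py`.

```python
# Defaults taken from the flags table in the README.
DEFAULTS = {
    "output_dir": "./transcripts",
    "format": "compact",
    "model": "small",
    "language": "auto",
    "strip_fillers": False,
}


def merge_settings(user_cfg: dict, project_cfg: dict, cli_flags: dict) -> dict:
    """Apply layers from lowest to highest precedence; later layers win.

    Assumption: a value of None means the option was not set in that layer.
    """
    merged = dict(DEFAULTS)
    for layer in (user_cfg, project_cfg, cli_flags):
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged


if __name__ == "__main__":
    settings = merge_settings(
        user_cfg={"model": "large-v3", "strip_fillers": True},
        project_cfg={"model": "medium", "language": "ru", "format": "minimal"},
        cli_flags={"model": None, "format": "compact"},
    )
    print(settings)
```

In this example the project file overrides the user-level model, the CLI overrides the project-level format, and keys that no layer sets fall back to the defaults.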
+
+<h2 align="center">🪙 Token Optimization</h2>
+
+Audium is built to minimize LLM token cost:
+
+| Technique | Savings |
+|-----------|---------|
+| `[MM:SS]` instead of `[HH:MM:SS.mmm]` | ~30% on timestamps |
+| VAD filtering (skip silence) | 15–40% on meeting recordings |
+| Filler‑word stripping | 5–10% on conversational speech |
+| `min_segment_duration` threshold | skip noise fragments |
+| One line per segment, no blank lines | ~8% vs paragraph output |
+
+---
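Two rows of the Token Optimization table, filler-word stripping and the `min_segment_duration` threshold, are simple enough to sketch with the standard library. Everything named here is hypothetical: the `FILLERS` tuple reuses some of the examples from the `--strip-fillers` description, the 0.5 second default threshold is a placeholder, and the regex is one possible approach rather than the package's own. The word "like" is left out on purpose, because removing it blindly would also delete legitimate uses.

```python
import re

# Hypothetical filler list, drawn from the examples in the flag description.
FILLERS = ("um", "uh", "мм", "ээ")

# Match a whole-word filler, an optional trailing comma or period, and spaces.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FILLERS)) + r")\b[,.]?\s*",
    flags=re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Remove standalone filler words and collapse leftover whitespace."""
    cleaned = _FILLER_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def keep_segment(start: float, end: float, min_segment_duration: float = 0.5) -> bool:
    """Keep a segment only if it lasts at least the threshold (assumed seconds)."""
    return (end - start) >= min_segment_duration


if __name__ == "__main__":
    print(strip_fillers("Um, so the model, uh, learns edges"))
    print(keep_segment(0.0, 0.2))
```

The word boundaries keep words such as "umbrella" or "album" intact, and they apply to Cyrillic text as well because Python string patterns are Unicode-aware by default.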
+
+<h2 align="center">📊 Model Sizes</h2>
+
+| Model | Parameters | Speed (GPU) | Best for |
+|-------|-----------|-------------|----------|
+| tiny | 39M | ~32× real‑time | Quick drafts, low‑resource |
+| base | 74M | ~16× real‑time | Dictation, clean audio |
+| small | 244M | ~6× real‑time | **General purpose** |
+| medium | 769M | ~2× real‑time | Accents, noisy audio |
+| large‑v3 | 1.5B | ~1× real‑time | Maximum accuracy |
+
+> All multilingual models support the same ~97 languages. The size trades accuracy for speed.
+
+---
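If the Speed (GPU) column is read as a real-time factor, meaning seconds of audio processed per second of wall-clock time, a rough processing-time estimate is the audio duration divided by that factor. The sketch below applies the README's approximate figures to the 01:23:45 lecture from the format examples. `REALTIME_FACTOR` and `estimate_seconds` are hypothetical names, and actual throughput depends on the GPU, audio content, and settings.

```python
# Approximate GPU real-time factors copied from the Model Sizes table.
REALTIME_FACTOR = {"tiny": 32, "base": 16, "small": 6, "medium": 2, "large-v3": 1}


def estimate_seconds(audio_seconds: float, model: str) -> float:
    """Estimated wall-clock seconds to transcribe audio of the given length."""
    return audio_seconds / REALTIME_FACTOR[model]


if __name__ == "__main__":
    lecture = 1 * 3600 + 23 * 60 + 45  # 01:23:45 expressed in seconds
    for name in REALTIME_FACTOR:
        minutes = estimate_seconds(lecture, name) / 60
        print(f"{name:>8}: ~{minutes:.0f} min")
```

Under these figures the `small` model would need roughly 14 minutes for that lecture, while `large-v3` would take about as long as the recording itself.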
+
+<h2 align="center">📄 License</h2>
+
+<p align="center">
+<a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue?style=for-the-badge" alt="MIT License"></a>
+</p>
+
+<p align="center">
+MIT — do whatever you want. Attribution appreciated.
+</p>
audium_md-0.1.0/README.md
ADDED
@@ -0,0 +1,210 @@
+<p align="center">
+<a href="https://github.com/Tamukj/Audium">
+<img src="assets/logo.svg" width="180" alt="Audium logo">
+</a>
+</p>
+
+<h1 align="center">Audium</h1>
+
+<p align="center">
+<strong>🎧 Audio → AI‑optimized Markdown</strong>
+<br>
+<sub>Transcribe MP3/WAV/FLAC into clean, token‑efficient Markdown — ready for any LLM.</sub>
+</p>
+
+<p align="center">
+<a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10%2B-blue?style=flat&logo=python&logoColor=white" alt="Python 3.10+"></a>
+<a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green?style=flat" alt="MIT License"></a>
+<a href="https://pypi.org/project/audium-md/"><img src="https://img.shields.io/badge/pypi-v0.1.0-blue?style=flat&logo=pypi&logoColor=white" alt="PyPI version"></a>
+<a href="https://github.com/SYSTRAN/faster-whisper"><img src="https://img.shields.io/badge/backend-faster--whisper-8A2BE2?style=flat" alt="faster-whisper"></a>
+<a href="https://github.com/Tamukj/Audium"><img src="https://img.shields.io/badge/platform-linux%20%7C%20macOS%20%7C%20windows-lightgrey?style=flat" alt="Platform"></a>
+</p>
+
+<p align="center">
+<a href="README.md">English</a> ·
+<a href="README.ru.md">Русский</a> ·
+<a href="README.zh-CN.md">中文</a>
+</p>
+
+---
+
+<h2 align="center">✨ Why Audium?</h2>
+
+Feed audio to an LLM. Get answers. Simple.
+
+But raw transcripts burn tokens on noise: long timestamps, filler words,
+silent segments, markup that adds nothing.
+
+Audium turns speech into **the minimum viable Markdown**: every character
+counts, nothing wasted.
+
+<div align="center">
+
+| 🎯 | ⚡ | 🪙 | 👁️ | 🌍 |
+|---|---|---|---|---|
+| **3 formats** | **GPU‑accelerated** | **Token‑aware** | **Watch mode** | **~97 languages** |
+| compact, minimal, structured | 2–10× real‑time on CUDA | `[MM:SS]` + VAD + filler‑strip | drop files → auto‑transcribe | tiny to large‑v3 |
+
+</div>
+
+---
+
+<h2 align="center">📦 Install</h2>
+
+```bash
+pip install audium-md
+```
+
+> Requires `ffmpeg` on your system: `sudo apt install ffmpeg` / `brew install ffmpeg`
+
+---
+
+<h2 align="center">🚀 Quick Start</h2>
+
+```bash
+# Process a folder
+audium run ./my-recordings/
+
+# Single file
+audium run lecture.mp3
+
+# Watch folder — auto‑transcribe new files
+audium watch ./incoming/
+
+# See what you've transcribed
+audium list
+
+# Change model
+audium config set model large-v3
+```
+
+---
+
+<h2 align="center">📝 Formats</h2>
+
+### compact *(default)*
+
+```
+# lecture.mp3 (01:23:45)
+
+[00:00] Neural networks learn hierarchical representations
+[00:04] Each layer detects increasingly abstract features
+[00:08] Early layers find edges and textures
+[00:12] Later layers detect objects and scenes
+```
+
+### minimal
+
+```
+Neural networks learn hierarchical representations
+Each layer detects increasingly abstract features
+Early layers find edges and textures
+Later layers detect objects and scenes
+```
+
+### structured *(requires speaker diarization)*
+
+```
+# interview.mp3 (00:45:12)
+
+## Alice [00:00-00:30]
+Neural networks are a powerful tool. It's important to understand their limitations.
+
+## Bob [00:30-01:15]
+I completely agree. Let me walk through an example to make this concrete.
+```
+
+---
+
+<h2 align="center">⚙️ Commands</h2>
+
+| Command | Description |
+|---------|-------------|
+| `audium run <path>` | Transcribe audio files or folders |
+| `audium watch <path>` | Watch folder and auto‑process new files |
+| `audium list [dir]` | Show processed transcripts with file sizes |
+| `audium config` | Show current configuration |
+| `audium config set <key> <value>` | Change a setting |
+| `audium config reset` | Reset to factory defaults |
+| `audium config path` | Show config file location |
+
+### Common flags for `run` and `watch`
+
+| Flag | Default | Description |
+|------|---------|-------------|
+| `-o, --output-dir` | `./transcripts` | Where to save .md files |
+| `-f, --format` | `compact` | `compact` / `minimal` / `structured` |
+| `-r, --recursive` | off | Search subdirectories |
+| `--model` | `small` | `tiny` / `base` / `small` / `medium` / `large-v3` |
+| `--language` | `auto` | Force language code: `ru`, `en`, `zh`, ... |
+| `--strip-fillers` | off | Remove "um", "uh", "like", "мм", "ээ", etc. |
+| `--no-vad` | off | Disable voice activity detection |
+| `--no-progress` | off | Hide the progress bar |
+
+---
+
+<h2 align="center">🔧 Configuration</h2>
+
+Settings are merged: **CLI flags > `.audium.yaml` (project) > `~/.config/audium/config.yaml` > defaults**
+
+```bash
+# Set default model
+audium config set model large-v3
+
+# Always strip filler words
+audium config set strip_fillers true
+
+# Custom output folder
+audium config set output_dir ~/Documents/transcripts
+
+# See what you changed
+audium config
+```
+
+```yaml
+# Example .audium.yaml (place in project root)
+model: medium
+language: ru
+format: minimal
+output_dir: ./transcripts
+```
+
+---
+
+<h2 align="center">🪙 Token Optimization</h2>
+
+Audium is built to minimize LLM token cost:
+
+| Technique | Savings |
+|-----------|---------|
+| `[MM:SS]` instead of `[HH:MM:SS.mmm]` | ~30% on timestamps |
+| VAD filtering (skip silence) | 15–40% on meeting recordings |
+| Filler‑word stripping | 5–10% on conversational speech |
+| `min_segment_duration` threshold | skip noise fragments |
+| One line per segment, no blank lines | ~8% vs paragraph output |
+
+---
+
+<h2 align="center">📊 Model Sizes</h2>
+
+| Model | Parameters | Speed (GPU) | Best for |
+|-------|-----------|-------------|----------|
+| tiny | 39M | ~32× real‑time | Quick drafts, low‑resource |
+| base | 74M | ~16× real‑time | Dictation, clean audio |
+| small | 244M | ~6× real‑time | **General purpose** |
+| medium | 769M | ~2× real‑time | Accents, noisy audio |
+| large‑v3 | 1.5B | ~1× real‑time | Maximum accuracy |
+
+> All multilingual models support the same ~97 languages. The size trades accuracy for speed.
+
+---
+
+<h2 align="center">📄 License</h2>
+
+<p align="center">
+<a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue?style=for-the-badge" alt="MIT License"></a>
+</p>
+
+<p align="center">
+MIT — do whatever you want. Attribution appreciated.
+</p>
audium_md-0.1.0/README.ru.md
ADDED
@@ -0,0 +1,210 @@
+<p align="center">
+<a href="https://github.com/Tamukj/Audium">
+<img src="assets/logo.svg" width="180" alt="Audium logo">
+</a>
+</p>
+
+<h1 align="center">Audium</h1>
+
+<p align="center">
+<strong>🎧 Audio → AI-optimized Markdown</strong>
+<br>
+<sub>Transcribe MP3/WAV/FLAC into clean, token-efficient Markdown, ready for any LLM.</sub>
+</p>
+
+<p align="center">
+<a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10%2B-blue?style=flat&logo=python&logoColor=white" alt="Python 3.10+"></a>
+<a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green?style=flat" alt="MIT License"></a>
+<a href="https://pypi.org/project/audium-md/"><img src="https://img.shields.io/badge/pypi-v0.1.0-blue?style=flat&logo=pypi&logoColor=white" alt="PyPI version"></a>
+<a href="https://github.com/SYSTRAN/faster-whisper"><img src="https://img.shields.io/badge/backend-faster--whisper-8A2BE2?style=flat" alt="faster-whisper"></a>
+<a href="https://github.com/Tamukj/Audium"><img src="https://img.shields.io/badge/platform-linux%20%7C%20macOS%20%7C%20windows-lightgrey?style=flat" alt="Platform"></a>
+</p>
+
+<p align="center">
+<a href="README.md">English</a> ·
+<a href="README.ru.md">Русский</a> ·
+<a href="README.zh-CN.md">中文</a>
+</p>
+
+---
+
+<h2 align="center">✨ Why Audium?</h2>
+
+Feed audio to an LLM and get an answer. Simple.
+
+But raw transcripts burn tokens on noise: long timecodes,
+filler words, empty segments, markup that adds nothing.
+
+Audium turns speech into **minimally sufficient Markdown**: every character
+counts, nothing superfluous.
+
+<div align="center">
+
+| 🎯 | ⚡ | 🪙 | 👁️ | 🌍 |
+|---|---|---|---|---|
+| **3 formats** | **GPU acceleration** | **Token optimization** | **Watch mode** | **~97 languages** |
+| compact, minimal, structured | 2–10× real time on CUDA | `[MM:SS]` + VAD + filler removal | drop in files → auto-transcription | tiny to large‑v3 |
+
+</div>
+
+---
+
+<h2 align="center">📦 Installation</h2>
+
+```bash
+pip install audium-md
+```
+
+> Requires `ffmpeg` on the system: `sudo apt install ffmpeg` / `brew install ffmpeg`
+
+---
+
+<h2 align="center">🚀 Quick Start</h2>
+
+```bash
+# Process a folder
+audium run ./my-recordings/
+
+# A single file
+audium run lecture.mp3
+
+# Watch a folder: auto-transcribe new files
+audium watch ./incoming/
+
+# See what has already been processed
+audium list
+
+# Change the model
+audium config set model large-v3
+```
+
+---
+
+<h2 align="center">📝 Formats</h2>
+
+### compact *(default)*
+
+```
+# lecture.mp3 (01:23:45)
+
+[00:00] Neural networks learn hierarchical representations
+[00:04] Each layer detects increasingly abstract features
+[00:08] In the early layers: edges and textures
+[00:12] In the later ones: objects and scenes
+```
+
+### minimal
+
+```
+Neural networks learn hierarchical representations
+Each layer detects increasingly abstract features
+In the early layers: edges and textures
+In the later ones: objects and scenes
+```
+
+### structured *(requires speaker diarization)*
+
+```
+# interview.mp3 (00:45:12)
+
+## Alice [00:00-00:30]
+Neural networks are a powerful tool. It is important to understand their limitations.
+
+## Bob [00:30-01:15]
+I completely agree. Let's work through a concrete example.
+```
+
+---
+
+<h2 align="center">⚙️ Commands</h2>
+
+| Command | Description |
+|---------|----------|
+| `audium run <path>` | Transcribe audio files or folders |
+| `audium watch <path>` | Watch a folder and auto-process new files |
+| `audium list [folder]` | Show processed transcripts with sizes |
+| `audium config` | Show the current configuration |
+| `audium config set <key> <value>` | Change a setting |
+| `audium config reset` | Reset to factory values |
+| `audium config path` | Show the path to the config file |
+
+### Main flags for `run` and `watch`
+
+| Flag | Default | Description |
+|------|-------------|----------|
+| `-o, --output-dir` | `./transcripts` | Where to save .md files |
+| `-f, --format` | `compact` | `compact` / `minimal` / `structured` |
+| `-r, --recursive` | off | Search subfolders |
+| `--model` | `small` | `tiny` / `base` / `small` / `medium` / `large-v3` |
+| `--language` | `auto` | Language code: `ru`, `en`, `zh`, ... |
+| `--strip-fillers` | off | Remove "мм", "ээ", "типа", "um", "uh", etc. |
+| `--no-vad` | off | Disable VAD (voice activity detection) |
+| `--no-progress` | off | Hide the progress bar |
+
+---
+
+<h2 align="center">🔧 Configuration</h2>
+
+Settings are merged: **CLI flags > `.audium.yaml` (project) > `~/.config/audium/config.yaml` > defaults**
+
+```bash
+# Default model
+audium config set model large-v3
+
+# Always remove filler words
+audium config set strip_fillers true
+
+# Custom output folder
+audium config set output_dir ~/Documents/transcripts
+
+# View the changes
+audium config
+```
+
+```yaml
+# Example .audium.yaml (in the project root)
+model: medium
+language: ru
+format: minimal
+output_dir: ./transcripts
+```
+
+---
+
+<h2 align="center">🪙 Token Optimization</h2>
+
+Audium is built to minimize LLM token usage:
+
+| Technique | Savings |
+|---------|----------|
+| `[MM:SS]` instead of `[HH:MM:SS.mmm]` | ~30% on timecodes |
+| VAD filter (skipping silence) | 15–40% on meeting recordings |
+| Filler-word removal | 5–10% on conversational speech |
+| `min_segment_duration` threshold | skips noise fragments |
+| One line per segment, no blank lines | ~8% compared with paragraphs |
+
+---
+
+<h2 align="center">📊 Model Sizes</h2>
+
+| Model | Parameters | Speed (GPU) | Used for |
+|--------|-----------|----------------|----------|
+| tiny | 39M | ~32× real time | Drafts, low-powered machines |
+| base | 74M | ~16× real time | Dictation, clean audio |
+| small | 244M | ~6× real time | **General purpose** |
+| medium | 769M | ~2× real time | Accents, noisy audio |
+| large‑v3 | 1.5B | ~1× real time | Maximum accuracy |
+
+> All multilingual models support the same ~97 languages. Size affects accuracy and speed.
+
+---
+
+<h2 align="center">📄 License</h2>
+
+<p align="center">
+<a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue?style=for-the-badge" alt="MIT License"></a>
+</p>
+
+<p align="center">
+MIT: do whatever you want. A link to the author is appreciated.
+</p>