whispergram 0.1.0__tar.gz

MIT License

Copyright (c) 2026 David Malko

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Metadata-Version: 2.4
Name: whispergram
Version: 0.1.0
Summary: Transcribe Telegram voice & video messages locally with Whisper, merged into one chat transcript
Author-email: David Malko <davidmalko87@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/davidmalko87/whispergram
Project-URL: Repository, https://github.com/davidmalko87/whispergram
Project-URL: Changelog, https://github.com/davidmalko87/whispergram/blob/master/CHANGELOG.md
Project-URL: Bug Tracker, https://github.com/davidmalko87/whispergram/issues
Keywords: telegram,telegram-export,whisper,faster-whisper,transcription,speech-to-text,voice-messages,voice-to-text,offline
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: GPU :: NVIDIA CUDA
Classifier: Intended Audience :: End Users/Desktop
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Communications :: Chat
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Utilities
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: faster-whisper>=1.0
Provides-Extra: test
Requires-Dist: pytest>=7.0; extra == "test"
Requires-Dist: ruff; extra == "test"
Dynamic: license-file

# whispergram

[![CI](https://github.com/davidmalko87/whispergram/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/davidmalko87/whispergram/actions/workflows/ci.yml)
[![PyPI version](https://img.shields.io/pypi/v/whispergram.svg)](https://pypi.org/project/whispergram/)
[![PyPI downloads](https://img.shields.io/pypi/dm/whispergram.svg)](https://pypi.org/project/whispergram/)
[![Python](https://img.shields.io/pypi/pyversions/whispergram.svg?logo=python&logoColor=white)](https://pypi.org/project/whispergram/)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
[![Platform](https://img.shields.io/badge/platform-Windows%20%7C%20macOS%20%7C%20Linux-lightgrey.svg)](#)
[![Offline](https://img.shields.io/badge/100%25-local%20%26%20offline-success.svg)](#%EF%B8%8F-privacy)
[![Round-trip](https://img.shields.io/badge/round--trip-validated-success.svg)](#-round-trip-validated)
[![Last commit](https://img.shields.io/github/last-commit/davidmalko87/whispergram.svg)](https://github.com/davidmalko87/whispergram/commits/master)
[![GitHub issues](https://img.shields.io/github/issues/davidmalko87/whispergram.svg)](https://github.com/davidmalko87/whispergram/issues)

> **Telegram voice-to-text, locally.** Transcribe Telegram **voice and round video messages** with
> Whisper ([faster-whisper](https://github.com/SYSTRAN/faster-whisper)) and merge them into one
> searchable, LLM-ready chat transcript — **100% offline, no API key, no cloud.**

Every line is tagged by sender and timestamp, with voice notes transcribed inline next to the text:

```
[2026-06-20 12:33] Alex (voice 14s): hey, just finished the thing we talked about
[2026-06-20 12:35] You: nice, send it over
[2026-06-20 12:46] Alex (video-note 8s): here it is
[2026-06-20 12:47] You (photo): looks great
[2026-06-20 12:47] Alex (sticker 👍)
```

---

## Why?

An audio-heavy Telegram chat is unreadable and unsearchable — you cannot grep a voice note, and
you cannot hand a folder of `.ogg` files to an LLM. The alternatives are worse: Telegram Premium
transcribes one message at a time by hand, and cloud speech APIs upload your private audio to a
third party. **whispergram** transcribes **every** voice and video note in one pass, entirely on
your own machine, and weaves them back into the text timeline as a single file you can read,
search, or feed to a model.

---

## Features

| Feature | Description |
|---|---|
| **Voice + video notes** | Both `voice_message` and round `video_message` notes are transcribed inline with the text |
| **One merged file** | A single `merged_chat.md`, chronological, every line tagged `[time] sender` |
| **100% local & offline** | faster-whisper runs on your machine — no upload, no API key, no account |
| **Lossless mapping** | Stickers, photos, animations, documents, music, locations, polls and contacts appear as markers — nothing content-bearing is dropped |
| **Handles missing media** | Notes excluded from the export are clearly marked `[not exported]`, never fed to the model |
| **All text shapes** | Reconstructs plain, rich, and entity-based message text (links, mentions, custom emoji) |
| **Dry-run** | Preview the full merge with `--dry-run` — no model download, no GPU, instant |
| **GPU or CPU** | CUDA with automatic CPU fallback; a one-command Windows CUDA fix is built in |
| **Auto-detect** | Finds the export JSON (any filename) and detects the spoken language of each note |
| **Tested** | 44 offline tests on the Python 3.9–3.13 CI matrix |
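
The **All text shapes** row refers to how Telegram's JSON export stores message text: `text` is either a plain string or a list that mixes strings with entity objects such as `{"type": "link", "text": "..."}`. A minimal sketch of flattening both shapes, assuming that layout; `flatten_text` is a hypothetical helper, not whispergram's actual code:

```python
from typing import List, Union

# Telegram's JSON export stores "text" as a plain string, or as a list mixing
# plain strings with entity objects such as {"type": "link", "text": "..."}.
TextField = Union[str, List[Union[str, dict]]]


def flatten_text(text: TextField) -> str:
    """Join all parts of a message's "text" field into one plain string (sketch)."""
    if isinstance(text, str):
        return text
    parts: List[str] = []
    for part in text:
        if isinstance(part, str):
            parts.append(part)
        elif isinstance(part, dict):
            # Links, mentions, bold, custom emoji, ... keep their visible text in "text".
            parts.append(str(part.get("text", "")))
    return "".join(parts)


message = {"text": ["yep, check ", {"type": "link", "text": "https://example.com"}, " thanks"]}
print(flatten_text(message["text"]))  # yep, check https://example.com thanks
```

The example message reproduces the second line of the sample output further down.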

---

## Quick Start

### 1. Install

**Via PyPI (recommended):**

```bash
pip install whispergram
```

**Or clone for development:**

```bash
git clone https://github.com/davidmalko87/whispergram.git
cd whispergram
pip install -r requirements.txt
```

You also need **ffmpeg** on your PATH:

```bash
# Linux:   sudo apt install ffmpeg
# macOS:   brew install ffmpeg
# Windows: choco install ffmpeg   (or: winget install Gyan.FFmpeg)
```

### 2. Export your chat from Telegram

Telegram **Desktop** → open the chat → ⋮ menu → **Export chat history**:

- Format: **JSON**
- Tick **Voice messages** (and **Video messages** for round notes)

You get a folder with a `.json` file plus `voice_messages/` and `video_files/` subfolders.
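
Inside that `.json`, messages sit in a top-level `messages` list. A minimal sketch of picking out the notes to transcribe, assuming the export marks them with `media_type` values `voice_message` and `video_message`, stores a relative path in `file` and the length in `duration_seconds`, and writes a `(File not included. ...)` placeholder when a binary was skipped; `iter_notes` is a hypothetical name:

```python
import json
from pathlib import Path
from typing import Iterator, Optional

# Assumed media_type values for the two note kinds, mapped to the merged-file labels.
NOTE_TYPES = {"voice_message": "voice", "video_message": "video-note"}


def iter_notes(export_json: Path) -> Iterator[dict]:
    """Yield one record per voice or round-video note in a Telegram JSON export (sketch)."""
    data = json.loads(export_json.read_text(encoding="utf-8"))
    for msg in data.get("messages", []):
        kind = NOTE_TYPES.get(msg.get("media_type", ""))
        if kind is None:
            continue  # text, stickers, photos, ... become markers elsewhere
        file_ref = msg.get("file", "")
        path: Optional[Path] = export_json.parent / file_ref if file_ref else None
        # Assumption: a skipped binary appears as a "(File not included. ...)" placeholder.
        if path is not None and (file_ref.startswith("(") or not path.is_file()):
            path = None
        yield {
            "id": msg.get("id"),
            "kind": kind,
            "seconds": msg.get("duration_seconds", 0),
            "path": path,  # None -> shown as [not exported], never sent to the model
        }
```

A `None` path is the case the merged file renders as `[not exported]`.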
125
+
126
+ ### 3. Run
127
+
128
+ From **inside** the export folder:
129
+
130
+ ```bash
131
+ whispergram
132
+ # or, without installing:
133
+ python whispergram.py
134
+ ```
135
+
136
+ …or point it at the folder:
137
+
138
+ ```bash
139
+ whispergram "path/to/ChatExport_2026-06-20"
140
+ ```
141
+
142
+ The result is `merged_chat.md` in the export folder.
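
Because the export JSON may have any filename, the folder has to be searched. A sketch of one way to do it, assuming the chat export is the JSON object with a top-level `messages` list; `find_export_json` is hypothetical and may differ from how whispergram detects the file:

```python
import json
from pathlib import Path
from typing import Optional


def find_export_json(folder: Path) -> Optional[Path]:
    """Return the first *.json in folder that looks like a Telegram chat export (sketch)."""
    for candidate in sorted(folder.glob("*.json")):
        try:
            data = json.loads(candidate.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable, not UTF-8, or not JSON: skip it
        # Assumption: the chat export is a JSON object with a top-level "messages" list.
        if isinstance(data, dict) and isinstance(data.get("messages"), list):
            return candidate
    return None
```

Calling `find_export_json(Path.cwd())` corresponds to running `whispergram` from inside the export folder; passing the folder path corresponds to the second form.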
143
+
144
+ ---
145
+
146
+ ## Example output
147
+
148
+ ```
149
+ [2026-06-20 12:33] Alex: did you get the files?
150
+ [2026-06-20 12:33] You: yep, check https://example.com thanks
151
+ [2026-06-20 12:34] Alex (voice 6s): one sec, recording the summary now ...
152
+ [2026-06-20 12:35] Alex (video-note 8s): [not exported]
153
+ [2026-06-20 12:35] You (sticker 😅)
154
+ [2026-06-20 12:36] Alex (photo): the whiteboard from today
155
+ ```
156
+
157
+ ---
158
+
159
+ ## How each message appears
160
+
161
+ | Message type | In the merged file |
162
+ |---|---|
163
+ | Text | `[time] sender: message text` |
164
+ | Voice note | `[time] sender (voice 12s): <transcript>` |
165
+ | Round video note | `[time] sender (video-note 8s): <transcript>` |
166
+ | Voice/video note **with caption** | `[time] sender (voice 12s): <transcript> \| caption: <text>` |
167
+ | Voice/video not downloaded | `[time] sender (voice 12s): [not exported]` |
168
+ | Sticker | `[time] sender (sticker 😅)` |
169
+ | Photo (with caption) | `[time] sender (photo): caption` |
170
+ | Animation / GIF | `[time] sender (animation)` |
171
+ | Document | `[time] sender (file: report.pdf): caption` |
172
+ | Location / poll / contact | `[time] sender (location)` · `(poll)` · `(contact)` |
173
+ | Music / audio file | `[time] sender (audio: Artist - Title)` — transcribe with `--audio-files` |
174
+
175
+ Markers can be turned off with `--no-media-markers` (voice/video notes are always transcribed).
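
The line shapes in the table can all come from one small formatter. A sketch, assuming the export's ISO `date` field (for example `2026-06-20T12:34:10`) and a hypothetical `format_line` helper; the timestamp is cut to the minute, and a caption is appended after ` | caption: ` when there is already a transcript:

```python
from datetime import datetime
from typing import Optional


def format_line(date: str, sender: str, text: str = "", tag: Optional[str] = None,
                caption: str = "") -> str:
    """Render one merged-chat line, e.g. "[2026-06-20 12:34] Alex (voice 6s): ..." (sketch)."""
    stamp = datetime.fromisoformat(date).strftime("%Y-%m-%d %H:%M")  # cut to the minute
    head = f"[{stamp}] {sender}" + (f" ({tag})" if tag else "")
    body = text
    if caption:
        body = f"{body} | caption: {caption}" if body else caption
    return f"{head}: {body}" if body else head


print(format_line("2026-06-20T12:34:10", "Alex", "one sec", tag="voice 6s"))
# [2026-06-20 12:34] Alex (voice 6s): one sec
print(format_line("2026-06-20T12:35:02", "You", tag="sticker 😅"))
# [2026-06-20 12:35] You (sticker 😅)
```

A note without a file would pass `text="[not exported]"`; a marker with no text or caption, like the sticker, ends right after its tag.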
176
+
177
+ ---
178
+
179
+ ## ✅ Round-trip Validated
180
+
181
+ The merge has been **validated against a real 770-message Telegram export** (a live, audio-heavy
182
+ chat — not a synthetic fixture). Every dimension was diffed against the source JSON:
183
+
184
+ | Dimension | In export | In merged file | Result |
185
+ |---|---|---|---|
186
+ | Voice notes (downloaded) | 4 | 4 transcribed | ✅ |
187
+ | Round video notes (not downloaded) | 5 | 5 `[not exported]` | ✅ |
188
+ | Other media (stickers, photos, animations, videos, audio, …) | 107 | 107 markers | ✅ |
189
+ | Text messages | 654 | 654 | ✅ |
190
+ | **Messages dropped** | — | **0** | ✅ |
191
+
192
+ **All 770 messages map to 770 lines** — the per-type counts match the source exactly, and
193
+ not-exported notes are never sent to the model. (An earlier version silently dropped 88 of those
194
+ messages — every sticker, photo, and caption-less media item — leaving misleading gaps. The
195
+ round-trip is what surfaced it.)
196
+
197
+ > That export is private, so these counts were measured locally and are not reproducible from this
198
+ > repo. The synthetic export under [`tests/fixtures/`](tests/fixtures/) reproduces the same
199
+ > lossless mapping across every media type and guards it automatically in CI. A faithful merge is
200
+ > only proven once it has been run end-to-end and the output diffed back against every message type
201
+ > — structural validity alone is not enough.
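
The round-trip check itself is bookkeeping: count source messages per type and confirm that the merged file has one line per message. A minimal sketch, assuming one output line per message; the `classify` rule is a hypothetical simplification of the real per-type bucketing:

```python
from collections import Counter
from typing import Dict, List


def classify(msg: dict) -> str:
    """Bucket one exported message (a hypothetical, simplified rule)."""
    if msg.get("media_type") in ("voice_message", "video_message"):
        return "note"
    if "media_type" in msg or "photo" in msg or "file" in msg:
        return "other_media"  # stickers, photos, animations, documents, audio, ...
    return "text"


def round_trip_report(messages: List[dict], merged_lines: List[str]) -> Dict[str, int]:
    """Per-type source counts plus how many messages have no output line."""
    report: Dict[str, int] = dict(Counter(classify(m) for m in messages))
    # One line per message is expected; a negative value would mean extra lines.
    report["dropped"] = len(messages) - len(merged_lines)
    return report
```

A faithful merge reports `dropped` as 0; the 88-message regression described above would have shown up as a positive value.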
202
+
203
+ ---
204
+
205
+ ## Known Limitations
206
+
207
+ These follow from the Telegram export format and from speech recognition itself — not from a lack
208
+ of effort in the tool:
209
+
210
+ | Area | Status | Notes |
211
+ |---|---|---|
212
+ | Round video notes | Audio only, if downloaded | Telegram often excludes the binary; those show `[not exported]` |
213
+ | Music / `audio_file` | Off by default | Opt in with `--audio-files`; songs are otherwise not run through ASR |
214
+ | Speaker labels | Sender only | Each note is attributed to its Telegram sender; no in-audio diarization |
215
+ | Timestamps | Minute resolution | Telegram exports `YYYY-MM-DDThh:mm`; seconds are not shown |
216
+ | Reactions / edits / replies | Not represented | The merged file is a clean reading transcript, not a full forensic dump |
217
+ | Transcription accuracy | Model-dependent | `large-v3` is best for uk/ru; `--lang` forces a language if auto-detect slips |
218
+
219
+ ---
220
+
221
+ ## Options
222
+
223
+ ```bash
224
+ whispergram --device cpu --model large-v3-turbo # no GPU, fast
225
+ whispergram --lang uk # force a language
226
+ whispergram --dry-run # preview the merge, no transcription
227
+ whispergram --audio-files # also transcribe music/long audio files
228
+ whispergram --out result.md # custom output path
229
+ ```
230
+
231
+ | Flag | Default | Notes |
232
+ |---|---|---|
233
+ | `--device` | `cuda` | `cuda` or `cpu`; auto-falls back to CPU if the GPU fails |
234
+ | `--model` | `large-v3` | try `large-v3-turbo` or `medium` if CPU is slow |
235
+ | `--lang` | auto | force a code like `uk`, `ru`, `en` if auto-detect mislabels |
236
+ | `--out` | `merged_chat.md` | output file |
237
+ | `--audio-files` | off | also transcribe `audio_file` messages (music, long memos) |
238
+ | `--no-media-markers` | off | omit `(sticker)` / `(photo)` / `(file)` markers |
239
+ | `--dry-run` | off | map the chat without loading a model or transcribing |
240
+ | `--setup-cuda-windows` | — | copy CUDA DLLs next to ctranslate2, then exit (Windows GPU fix) |
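
`--model`, `--device` and `--lang` correspond to faster-whisper's `WhisperModel(model, device=..., compute_type=...)` and `transcribe(path, language=...)`. A sketch of the CPU fallback, assuming CUDA problems surface as `RuntimeError` (as with the cuBLAS error in the GPU section) or, at load time, `ValueError`, and picking `float16` on GPU and `int8` on CPU as compute types; `FallbackTranscriber` is a hypothetical class, and the loader is injectable so the logic can be exercised without a model download:

```python
from typing import Any, Callable, Optional

# (model name, device, compute_type) -> model object with a .transcribe() method
Loader = Callable[[str, str, str], Any]


def faster_whisper_loader(model: str, device: str, compute_type: str) -> Any:
    from faster_whisper import WhisperModel  # lazy import: only real runs need it

    return WhisperModel(model, device=device, compute_type=compute_type)


class FallbackTranscriber:
    """Transcribe on CUDA, switching to CPU the first time CUDA raises (sketch)."""

    def __init__(self, model: str = "large-v3", device: str = "cuda",
                 loader: Loader = faster_whisper_loader) -> None:
        self.model_name = model
        self.loader = loader
        self.device = device
        self.model: Any = None
        if device == "cuda":
            try:
                self.model = loader(model, "cuda", "float16")
            except (RuntimeError, ValueError):  # assumed exception types
                pass  # no usable GPU at load time: fall through to CPU
        if self.model is None:
            self._use_cpu()

    def _use_cpu(self) -> None:
        self.device = "cpu"
        self.model = self.loader(self.model_name, "cpu", "int8")

    def _run(self, path: str, lang: Optional[str]) -> str:
        # language=None lets Whisper auto-detect the language of this file.
        segments, _info = self.model.transcribe(path, language=lang)
        # segments is lazy; decoding (and any CUDA error) happens while joining.
        return " ".join(seg.text.strip() for seg in segments)

    def transcribe(self, path: str, lang: Optional[str] = None) -> str:
        try:
            return self._run(path, lang)
        except RuntimeError:
            if self.device != "cuda":
                raise
            # cuBLAS/cuDNN load lazily, so a broken CUDA setup often fails here.
            self._use_cpu()
            return self._run(path, lang)
```

Retrying at transcription time, not only at load time, matters because a missing cuBLAS DLL typically surfaces on the first note rather than when the model is created.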
241
+
242
+ ---
243
+
244
+ ## GPU (CUDA) setup
245
+
246
+ **Linux / macOS:** with a working CUDA install it runs as-is on `--device cuda`.
247
+
248
+ **Windows** — the common pitfall is `RuntimeError: Library cublas64_12.dll is not found`:
249
+
250
+ 1. Install the CUDA runtime wheels (no full CUDA Toolkit needed):
251
+ ```bash
252
+ pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
253
+ pip install -U "ctranslate2>=4.5"
254
+ ```
255
+ 2. If it *still* can't find the DLL, copy them next to CTranslate2 (the reliable fix):
256
+ ```bash
257
+ python whispergram.py --setup-cuda-windows
258
+ ```
259
+ 3. Or skip the GPU entirely: `--device cpu --model large-v3-turbo`.
260
+
261
+ > CTranslate2 loads cuBLAS/cuDNN lazily in native code that ignores `os.add_dll_directory`,
262
+ > which is why placing the DLLs inside the package dir is the dependable solution.
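
The copy step behind `--setup-cuda-windows` can be sketched as follows, assuming the pip wheels put their DLLs under `site-packages/nvidia/cublas/bin/` and `site-packages/nvidia/cudnn/bin/` (check your own install); `copy_cuda_dlls` and `default_dirs` are hypothetical helpers, and the copy function takes explicit folders so it can be tried on a scratch directory first:

```python
import shutil
from pathlib import Path
from typing import Iterable, List, Tuple


def copy_cuda_dlls(source_dirs: Iterable[Path], target_dir: Path) -> List[Path]:
    """Copy every *.dll found in source_dirs into target_dir; return the new paths."""
    copied: List[Path] = []
    for src in source_dirs:
        if not src.is_dir():
            continue  # that wheel is not installed
        for dll in sorted(src.glob("*.dll")):
            dest = target_dir / dll.name
            shutil.copy2(dll, dest)  # overwrites any stale copy
            copied.append(dest)
    return copied


def default_dirs() -> Tuple[List[Path], Path]:
    """Guess the nvidia wheel DLL folders and the ctranslate2 package folder (assumed layout)."""
    import site

    import ctranslate2  # installed as a faster-whisper dependency

    roots = [Path(p) for p in site.getsitepackages()]
    sources = [root / "nvidia" / lib / "bin" for root in roots for lib in ("cublas", "cudnn")]
    return sources, Path(ctranslate2.__file__).parent
```

On a Windows machine with the wheels installed, the whole step would be `copy_cuda_dlls(*default_dirs())`.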
263
+
264
+ ---
265
+
266
+ ## FAQ
267
+
268
+ **How do I transcribe Telegram voice messages?**
269
+ Export the chat from Telegram Desktop as JSON (with voice messages), then run `whispergram` in the
270
+ export folder. Every voice note is transcribed with Whisper and merged into the text chat.
271
+
272
+ **Is it private / offline? Does my audio leave my machine?**
273
+ Yes, it is offline. Transcription runs locally with faster-whisper and needs no account or API key.
274
+ The tool makes no network calls with your data; faster-whisper downloads the speech model **once**
275
+ on first run, then works fully offline. Your chat audio and transcripts never leave your machine.
276
+
277
+ **Do I need a GPU?**
278
+ No. It runs on CPU (`--device cpu`); use `--model large-v3-turbo` for speed. A CUDA GPU is faster.
279
+
280
+ **Does it handle round video messages / video notes?**
281
+ Yes — round `video_message` notes are transcribed from their audio, just like voice notes.
282
+
283
+ **Which languages work?**
284
+ Any language Whisper supports. `large-v3` handles Ukrainian and Russian well; use `--lang uk` (or
285
+ `ru`, `en`, …) to force one if auto-detection slips.
286
+
287
+ **How is this different from Telegram Premium's transcription?**
288
+ Premium transcribes one message at a time, by hand, in the app. whispergram transcribes the
289
+ **entire** chat in one pass, offline, and produces a single searchable file.
290
+
291
+ ---

## Project Structure

```
whispergram/
├── whispergram.py            # The tool: text reconstruction, mapping, transcription, CLI
├── requirements.txt          # Runtime dependency (faster-whisper)
├── pyproject.toml            # Packaging + ruff + pytest configuration
├── CHANGELOG.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
│
├── .github/
│   ├── workflows/
│   │   ├── ci.yml            # ruff + pytest on Python 3.9–3.13 (no transcription deps)
│   │   └── publish.yml       # tag v* → verify version → build → PyPI (trusted publishing)
│   ├── ISSUE_TEMPLATE/
│   └── dependabot.yml
│
└── tests/
    ├── test_whispergram.py   # 44 offline tests — no model download or GPU required
    └── fixtures/
        └── sample_export/
            └── result.json   # synthetic export (safe to commit; used by tests + CI)
```

---

## ⚠️ Privacy

This tool processes **private conversations**, and the transcripts it produces are just as
sensitive as the audio. Two rules:

- **Nothing leaves your machine.** Transcription is fully local; the tool makes no network calls
  with your data and needs no credentials.
- **Never commit your exports or transcripts.** The included `.gitignore` blocks chat data
  (`*.json`, audio files, `merged_chat.md`, `ChatExport_*/`) by default, so keep it. Keep your clone
  of this repo in a folder **separate** from any export, keep any `--out` path **inside** the export
  folder, and run `git status` before pushing to confirm only code is staged. The only data file in
  this repo is the synthetic fixture under `tests/fixtures/`.
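
The `git status` check can also be automated as a pre-commit hook. A sketch, assuming the `.gitignore` patterns above define chat data (the audio extensions listed are an assumption) and treating `tests/fixtures/` as the only allowed data folder; `looks_like_chat_data`, `staged_files` and `main` are hypothetical:

```python
import fnmatch
import subprocess
import sys
from typing import List

# Assumed chat-data patterns, based on the .gitignore description (audio list is a guess).
BLOCKED_NAMES = ["*.json", "*.ogg", "*.oga", "*.opus", "*.mp3", "*.m4a", "*.mp4", "merged_chat.md"]
ALLOWED_PREFIX = "tests/fixtures/"  # the synthetic export is the only permitted data


def looks_like_chat_data(path: str) -> bool:
    """True if a repo-relative, '/'-separated path matches the chat-data patterns."""
    if path.startswith(ALLOWED_PREFIX):
        return False
    parts = path.split("/")
    if any(part.startswith("ChatExport_") for part in parts[:-1]):
        return True  # anything inside a raw export folder
    return any(fnmatch.fnmatch(parts[-1], pattern) for pattern in BLOCKED_NAMES)


def staged_files() -> List[str]:
    """List staged paths via `git diff --cached --name-only`."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def main() -> int:
    blocked = [p for p in staged_files() if looks_like_chat_data(p)]
    for path in blocked:
        print(f"refusing to commit chat data: {path}", file=sys.stderr)
    return 1 if blocked else 0
```

Called from `.git/hooks/pre-commit` as `sys.exit(main())`, a non-zero exit status aborts the commit.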

---

## Requirements

- Python **3.9+**
- [ffmpeg](https://ffmpeg.org/) on your PATH
- [`faster-whisper`](https://pypi.org/project/faster-whisper/) >= 1.0 (`pip install -r requirements.txt`)
- For NVIDIA GPU on Windows: `nvidia-cublas-cu12`, `nvidia-cudnn-cu12`, `ctranslate2>=4.5`

> The test suite needs none of the above — only `ruff` and `pytest`.

---

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for the full version history.

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for the development setup, the privacy rule, and the
versioning / release policy.

## License

[MIT](LICENSE)