summscriber 0.1.0__tar.gz

@@ -0,0 +1,130 @@
1
+ Metadata-Version: 2.4
2
+ Name: summscriber
3
+ Version: 0.1.0
4
+ Summary: Transcribe audio with Whisper (faster-whisper), with summarization (pysummarization, sumy, OpenAI) and short reply with OpenAI.
5
+ License: MIT
6
+ Project-URL: Repository, https://github.com/pablogventura/summscriber
7
+ Project-URL: Homepage, https://github.com/pablogventura/summscriber
8
+ Keywords: whisper,transcription,speech-to-text,summarization,openai
9
+ Classifier: Development Status :: 4 - Beta
10
+ Classifier: Intended Audience :: Developers
11
+ Classifier: License :: OSI Approved :: MIT License
12
+ Classifier: Programming Language :: Python :: 3
13
+ Classifier: Programming Language :: Python :: 3.10
14
+ Classifier: Programming Language :: Python :: 3.11
15
+ Classifier: Programming Language :: Python :: 3.12
16
+ Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
17
+ Requires-Python: >=3.10
18
+ Description-Content-Type: text/markdown
19
+ Requires-Dist: faster-whisper>=1.0.0
20
+ Requires-Dist: pysummarization>=1.1.0
21
+ Requires-Dist: sumy>=0.11.0
22
+ Requires-Dist: openai>=1.0.0
23
+ Provides-Extra: dev
24
+ Requires-Dist: pytest>=7; extra == "dev"
25
+ Provides-Extra: publish
26
+ Requires-Dist: build>=1.0; extra == "publish"
27
+ Requires-Dist: twine>=5.0; extra == "publish"
28
+
29
+ # Summscriber
30
+
31
+ Transcribe audio with [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (Whisper), with summarization options (pysummarization, sumy, OpenAI) and short reply generation via OpenAI API.
32
+
33
+ **Repository:** [github.com/pablogventura/summscriber](https://github.com/pablogventura/summscriber)
34
+
35
+ ## Installation
36
+
37
+ ### With pipx (recommended: isolated env, global command)
38
+
39
+ ```bash
40
+ pipx install git+https://github.com/pablogventura/summscriber.git
41
+ ```
42
+
43
+ To upgrade:
44
+
45
+ ```bash
46
+ pipx upgrade summscriber
47
+ ```
48
+
49
+ ### With pip (from project directory)
50
+
51
+ ```bash
52
+ pip install .
53
+ ```
54
+
55
+ Or in editable mode (development):
56
+
57
+ ```bash
58
+ pip install -e .
59
+ ```
60
+
61
+ Or from the repository:
62
+
63
+ ```bash
64
+ pip install git+https://github.com/pablogventura/summscriber.git
65
+ ```
66
+
67
+ Or from PyPI (once published):
68
+
69
+ ```bash
70
+ pip install summscriber
71
+ ```
72
+
73
+ ## Usage
74
+
75
+ After installing, the `summscriber` command is available:
76
+
77
+ ```bash
78
+ summscriber FILE [options]
79
+ ```
80
+
81
+ Examples:
82
+
83
+ ```bash
84
+ summscriber recording.mp3
85
+ summscriber interview.ogg --summary
86
+ summscriber audio.wav --summary --reply --json
87
+ ```
88
+
89
+ ### Main options
90
+
91
+ - **FILE**: audio file to transcribe (required unless `--save-config` is used).
92
+ - `--summary`: summarize with OpenAI when an API key is configured and the request succeeds; otherwise use the shorter of the pysummarization and sumy summaries.
93
+ - `--summary-pysummarization` / `--summary-sumy` / `--summary-openai`: use a specific summarization backend.
94
+ - `--summary-sentences N`: number of sentences in the summary (default 3).
95
+ - `--reply`: generate a short reply to the message with OpenAI.
96
+ - `--json`: output as JSON.
97
+
98
+ For summarization and reply with OpenAI, use `config.ini` (section `[openai]`) or environment variables `OPENAI_API_KEY` and optionally `OPENAI_BASE_URL`. See `config.ini.example`. You can save your token and URL with:
99
+
100
+ ```bash
101
+ summscriber --save-config --api-key YOUR_TOKEN --base-url https://...
102
+ ```
103
+
104
+ ## Development
105
+
106
+ From the repo root without installing:
107
+
108
+ ```bash
109
+ python -m summscriber FILE [options]
110
+ ```
111
+
112
+ ## Publishing to PyPI
113
+
114
+ 1. Install build tools: `pip install build twine` (or `pip install ".[publish]"`).
115
+ 2. Create a PyPI account and an API token at [pypi.org/manage/account/token/](https://pypi.org/manage/account/token/).
116
+ 3. From the project root run:
117
+
118
+ ```bash
119
+ ./publish.sh
120
+ ```
121
+
122
+ Or manually:
123
+
124
+ ```bash
125
+ rm -rf build dist *.egg-info
126
+ python -m build
127
+ twine upload dist/*
128
+ ```
129
+
130
+ When prompted, enter `__token__` as the username and your PyPI token as the password. Alternatively, set `TWINE_USERNAME=__token__` and `TWINE_PASSWORD=pypi-your-token`.
@@ -0,0 +1,102 @@
1
+ # Summscriber
2
+
3
+ Transcribe audio with [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (Whisper), with summarization options (pysummarization, sumy, OpenAI) and short reply generation via OpenAI API.
4
+
5
+ **Repository:** [github.com/pablogventura/summscriber](https://github.com/pablogventura/summscriber)
6
+
7
+ ## Installation
8
+
9
+ ### With pipx (recommended: isolated env, global command)
10
+
11
+ ```bash
12
+ pipx install git+https://github.com/pablogventura/summscriber.git
13
+ ```
14
+
15
+ To upgrade:
16
+
17
+ ```bash
18
+ pipx upgrade summscriber
19
+ ```
20
+
21
+ ### With pip (from project directory)
22
+
23
+ ```bash
24
+ pip install .
25
+ ```
26
+
27
+ Or in editable mode (development):
28
+
29
+ ```bash
30
+ pip install -e .
31
+ ```
32
+
33
+ Or from the repository:
34
+
35
+ ```bash
36
+ pip install git+https://github.com/pablogventura/summscriber.git
37
+ ```
38
+
39
+ Or from PyPI (once published):
40
+
41
+ ```bash
42
+ pip install summscriber
43
+ ```
44
+
45
+ ## Usage
46
+
47
+ After installing, the `summscriber` command is available:
48
+
49
+ ```bash
50
+ summscriber FILE [options]
51
+ ```
52
+
53
+ Examples:
54
+
55
+ ```bash
56
+ summscriber recording.mp3
57
+ summscriber interview.ogg --summary
58
+ summscriber audio.wav --summary --reply --json
59
+ ```
60
+
61
+ ### Main options
62
+
63
+ - **FILE**: audio file to transcribe (required unless `--save-config` is used).
64
+ - `--summary`: summarize with OpenAI when an API key is configured and the request succeeds; otherwise use the shorter of the pysummarization and sumy summaries.
65
+ - `--summary-pysummarization` / `--summary-sumy` / `--summary-openai`: use a specific summarization backend.
66
+ - `--summary-sentences N`: number of sentences in the summary (default 3).
67
+ - `--reply`: generate a short reply to the message with OpenAI.
68
+ - `--json`: output as JSON.
69
+
70
+ For summarization and reply with OpenAI, use `config.ini` (section `[openai]`) or environment variables `OPENAI_API_KEY` and optionally `OPENAI_BASE_URL`. See `config.ini.example`. You can save your token and URL with:
71
+
72
+ ```bash
73
+ summscriber --save-config --api-key YOUR_TOKEN --base-url https://...
74
+ ```
75
+
76
+ ## Development
77
+
78
+ From the repo root without installing:
79
+
80
+ ```bash
81
+ python -m summscriber FILE [options]
82
+ ```
83
+
84
+ ## Publishing to PyPI
85
+
86
+ 1. Install build tools: `pip install build twine` (or `pip install ".[publish]"`).
87
+ 2. Create a PyPI account and an API token at [pypi.org/manage/account/token/](https://pypi.org/manage/account/token/).
88
+ 3. From the project root run:
89
+
90
+ ```bash
91
+ ./publish.sh
92
+ ```
93
+
94
+ Or manually:
95
+
96
+ ```bash
97
+ rm -rf build dist *.egg-info
98
+ python -m build
99
+ twine upload dist/*
100
+ ```
101
+
102
+ When prompted, enter `__token__` as the username and your PyPI token as the password. Alternatively, set `TWINE_USERNAME=__token__` and `TWINE_PASSWORD=pypi-your-token`.
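The README's configuration note, read together with `_load_openai_config` in `cli.py`, suggests a precedence of environment variables over `config.ini` over built-in defaults. The following is a minimal sketch of that merge, not the package's code: the helper name `resolve_openai_settings` and the injectable `env` parameter are assumptions introduced here so the behaviour can be exercised without touching the real environment.

```python
from __future__ import annotations

import configparser
import os
from collections.abc import Mapping
from pathlib import Path

DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_openai_settings(
    config_path: Path, env: Mapping[str, str] | None = None
) -> dict[str, str]:
    """Hypothetical helper: merge [openai] INI values with environment overrides."""
    env = os.environ if env is None else env
    settings = {"api_key": "", "base_url": "", "model": "gpt-4o-mini"}

    parser = configparser.ConfigParser()
    # ConfigParser.read() silently skips paths that do not exist.
    parser.read(config_path, encoding="utf-8")
    if parser.has_section("openai"):
        for key in settings:
            settings[key] = parser.get("openai", key, fallback=settings[key]).strip()

    # Environment variables win over the file; the model has no env override.
    settings["api_key"] = env.get("OPENAI_API_KEY") or settings["api_key"]
    settings["base_url"] = (
        env.get("OPENAI_BASE_URL") or settings["base_url"] or DEFAULT_BASE_URL
    )
    return settings
```

Passing `env={}` in a test keeps a real `OPENAI_API_KEY` in the shell from leaking into the result.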
@@ -0,0 +1,49 @@
1
+ [build-system]
2
+ requires = ["setuptools>=61", "wheel"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "summscriber"
7
+ version = "0.1.0"
8
+ description = "Transcribe audio with Whisper (faster-whisper), with summarization (pysummarization, sumy, OpenAI) and short reply with OpenAI."
9
+ readme = "README.md"
10
+ license = { text = "MIT" }
11
+ requires-python = ">=3.10"
12
+ authors = []
13
+ keywords = ["whisper", "transcription", "speech-to-text", "summarization", "openai"]
14
+ classifiers = [
15
+ "Development Status :: 4 - Beta",
16
+ "Intended Audience :: Developers",
17
+ "License :: OSI Approved :: MIT License",
18
+ "Programming Language :: Python :: 3",
19
+ "Programming Language :: Python :: 3.10",
20
+ "Programming Language :: Python :: 3.11",
21
+ "Programming Language :: Python :: 3.12",
22
+ "Topic :: Multimedia :: Sound/Audio :: Speech",
23
+ ]
24
+ dependencies = [
25
+ "faster-whisper>=1.0.0",
26
+ "pysummarization>=1.1.0",
27
+ "sumy>=0.11.0",
28
+ "openai>=1.0.0",
29
+ ]
30
+
31
+ [project.urls]
32
+ Repository = "https://github.com/pablogventura/summscriber"
33
+ Homepage = "https://github.com/pablogventura/summscriber"
34
+
35
+ [project.scripts]
36
+ summscriber = "summscriber.cli:main"
37
+
38
+ [project.optional-dependencies]
39
+ dev = [
40
+ "pytest>=7",
41
+ ]
42
+ publish = [
43
+ "build>=1.0",
44
+ "twine>=5.0",
45
+ ]
46
+
47
+ [tool.setuptools.packages.find]
48
+ where = ["."]
49
+ include = ["summscriber*"]
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,3 @@
1
+ """Summscriber: transcribe audio with Whisper, summarization and reply with OpenAI."""
2
+
3
+ __version__ = "0.1.0"
@@ -0,0 +1,6 @@
1
+ """Run the package with: python -m summscriber FILE ..."""
2
+
3
+ from summscriber.cli import main
4
+
5
+ if __name__ == "__main__":
6
+ main()
@@ -0,0 +1,424 @@
1
+ """Summscriber CLI: transcription, summarization, and reply generation."""
2
+
3
+ import argparse
4
+ import configparser
5
+ import json
6
+ import os
7
+ import sys
8
+ from pathlib import Path
9
+
10
+ import ctranslate2
11
+ from faster_whisper import WhisperModel
12
+ from openai import OpenAI
13
+
14
+ from pysummarization.nlpbase.auto_abstractor import AutoAbstractor
15
+ from pysummarization.tokenizabledoc.simple_tokenizer import SimpleTokenizer
16
+ from pysummarization.abstractabledoc.top_n_rank_abstractor import TopNRankAbstractor
17
+
18
+ from sumy.parsers.plaintext import PlaintextParser
19
+ from sumy.nlp.tokenizers import Tokenizer
20
+ from sumy.summarizers.lex_rank import LexRankSummarizer
21
+ from sumy.nlp.stemmers import Stemmer
22
+ from sumy.utils import get_stop_words
23
+
24
+ # Config file lookup order: cwd (where the command is run), the package directory, then its parent
25
+ CONFIG_FILENAME = "config.ini"
26
+
27
+ # Language code (Whisper, ISO 639-1) -> sumy name (stopwords/stemmer)
28
+ _LANGUAGE_TO_SUMY = {
29
+ "ar": "arabic", "zh": "chinese", "cs": "czech", "en": "english",
30
+ "fr": "french", "de": "german", "el": "greek", "he": "hebrew",
31
+ "it": "italian", "ja": "japanese", "ko": "korean", "pt": "portuguese",
32
+ "sk": "slovak", "es": "spanish", "uk": "ukrainian",
33
+ }
34
+ # For OpenAI prompt: code -> language name
35
+ _LANGUAGE_CODE_TO_NAME = {
36
+ "es": "Spanish", "en": "English", "fr": "French", "de": "German",
37
+ "it": "Italian", "pt": "Portuguese", "ar": "Arabic", "zh": "Chinese",
38
+ "ja": "Japanese", "ko": "Korean", "ru": "Russian", "uk": "Ukrainian",
39
+ }
40
+
41
+
42
+ def _language_to_sumy(whisper_code: str) -> str:
43
+ code = (whisper_code or "en")[:2].lower()
44
+ return _LANGUAGE_TO_SUMY.get(code, "english")
45
+
46
+
47
+ def _language_name_for_prompt(whisper_code: str) -> str | None:
48
+ code = (whisper_code or "")[:2].lower()
49
+ return _LANGUAGE_CODE_TO_NAME.get(code)
50
+
51
+
52
+ def _load_openai_config() -> dict:
53
+     """Read api_key, base_url and model from config.ini; OPENAI_API_KEY and OPENAI_BASE_URL override the file."""
54
+ config_path = Path.cwd() / CONFIG_FILENAME
55
+ if not config_path.exists():
56
+ config_path = Path(__file__).resolve().parent / CONFIG_FILENAME
57
+ if not config_path.exists():
58
+ config_path = Path(__file__).resolve().parent.parent / CONFIG_FILENAME
59
+ out = {"api_key": "", "base_url": "", "model": "gpt-4o-mini"}
60
+ if config_path.exists():
61
+ parser = configparser.ConfigParser()
62
+ parser.read(config_path, encoding="utf-8")
63
+ if parser.has_section("openai"):
64
+ out["api_key"] = parser.get("openai", "api_key", fallback="").strip()
65
+ out["base_url"] = parser.get(
66
+ "openai", "base_url", fallback=""
67
+ ).strip() or "https://api.openai.com/v1"
68
+ out["model"] = parser.get("openai", "model", fallback="gpt-4o-mini").strip()
69
+ out["api_key"] = os.environ.get("OPENAI_API_KEY") or out["api_key"]
70
+ out["base_url"] = os.environ.get("OPENAI_BASE_URL") or out["base_url"] or "https://api.openai.com/v1"
71
+ return out
72
+
73
+
74
+ def _save_openai_config(
75
+ api_key: str,
76
+ base_url: str,
77
+ model: str,
78
+ config_path: Path,
79
+ ) -> None:
80
+ """Write config.ini with [openai] section."""
81
+ parser = configparser.ConfigParser()
82
+ if config_path.exists():
83
+ parser.read(config_path, encoding="utf-8")
84
+ if not parser.has_section("openai"):
85
+ parser.add_section("openai")
86
+ parser.set("openai", "api_key", api_key)
87
+ parser.set("openai", "base_url", base_url)
88
+ parser.set("openai", "model", model)
89
+ with open(config_path, "w", encoding="utf-8") as f:
90
+ parser.write(f)
91
+
92
+
93
+ def _ensure_nltk_sumy():
94
+ import nltk
95
+ for resource in ("punkt", "punkt_tab"):
96
+ try:
97
+ nltk.data.find(f"tokenizers/{resource}")
98
+ except LookupError:
99
+ nltk.download(resource, quiet=True)
100
+
101
+
102
+ def summarize_text(text: str, num_sentences: int = 3) -> str:
103
+ if not text or not text.strip():
104
+ return ""
105
+ auto_abstractor = AutoAbstractor()
106
+ auto_abstractor.tokenizable_doc = SimpleTokenizer()
107
+ auto_abstractor.delimiter_list = [".", "\n", "?", "!"]
108
+ abstractable_doc = TopNRankAbstractor()
109
+ result_dict = auto_abstractor.summarize(text.strip(), abstractable_doc)
110
+ sentences = result_dict.get("summarize_result", [])[:num_sentences]
111
+ return " ".join(sentences) if sentences else ""
112
+
113
+
114
+ def summarize_text_sumy(text: str, num_sentences: int = 3, language: str = "spanish") -> str:
115
+ if not text or not text.strip():
116
+ return ""
117
+ _ensure_nltk_sumy()
118
+ try:
119
+ parser = PlaintextParser.from_string(text.strip(), Tokenizer(language))
120
+ stemmer = Stemmer(language)
121
+ summarizer = LexRankSummarizer(stemmer)
122
+ summarizer.stop_words = get_stop_words(language)
123
+ sentences = summarizer(parser.document, num_sentences)
124
+ return " ".join(str(s) for s in sentences) if sentences else ""
125
+ except Exception:
126
+ return ""
127
+
128
+
129
+ def summarize_text_openai(
130
+ text: str,
131
+ num_sentences: int = 3,
132
+ api_key: str | None = None,
133
+ base_url: str | None = None,
134
+ model: str | None = None,
135
+ detected_language: str | None = None,
136
+ ) -> str:
137
+ if not text or not text.strip():
138
+ return ""
139
+ cfg = _load_openai_config()
140
+ key = api_key or cfg["api_key"]
141
+ if not key:
142
+ return ""
143
+ base = base_url or cfg["base_url"]
144
+ model_name = model or cfg["model"]
145
+ client = OpenAI(api_key=key, base_url=base)
146
+ lang_inst = f" The text is in {detected_language}." if detected_language else ""
147
+ system = f"Summarize the text in a technical and concise way in approximately {num_sentences} sentences.{lang_inst} Use structure and hierarchy (e.g. lists) when possible. Summarize in the same language as the text."
148
+ try:
149
+ resp = client.chat.completions.create(
150
+ model=model_name,
151
+ messages=[
152
+ {"role": "system", "content": system},
153
+ {"role": "user", "content": text.strip()},
154
+ ],
155
+ temperature=0.2,
156
+ )
157
+ content = resp.choices[0].message.content
158
+ return (content or "").strip()
159
+ except Exception:
160
+ return ""
161
+
162
+
163
+ def reply_text_openai(
164
+ text: str,
165
+ api_key: str | None = None,
166
+ base_url: str | None = None,
167
+ model: str | None = None,
168
+ detected_language: str | None = None,
169
+ ) -> str:
170
+ if not text or not text.strip():
171
+ return ""
172
+ cfg = _load_openai_config()
173
+ key = api_key or cfg["api_key"]
174
+ if not key:
175
+ return ""
176
+ base = base_url or cfg["base_url"]
177
+ model_name = model or cfg["model"]
178
+ client = OpenAI(api_key=key, base_url=base)
179
+ lang_inst = f" Reply in {detected_language}." if detected_language else ""
180
+ system = f"Reply briefly to the following message.{lang_inst} Be direct and concise."
181
+ try:
182
+ resp = client.chat.completions.create(
183
+ model=model_name,
184
+ messages=[
185
+ {"role": "system", "content": system},
186
+ {"role": "user", "content": text.strip()},
187
+ ],
188
+ temperature=0.3,
189
+ )
190
+ content = resp.choices[0].message.content
191
+ return (content or "").strip()
192
+ except Exception:
193
+ return ""
194
+
195
+
196
+ def main():
197
+ parser = argparse.ArgumentParser(description="Transcribe audio with Whisper.")
198
+ parser.add_argument(
199
+ "audio",
200
+ nargs="?",
201
+ default=None,
202
+ metavar="FILE",
203
+ help="Audio file to transcribe (required unless using --save-config)",
204
+ )
205
+ parser.add_argument(
206
+ "--save-config",
207
+ action="store_true",
208
+ help="Save OpenAI token and URL to config.ini (use --api-key and/or --base-url, or env vars).",
209
+ )
210
+ parser.add_argument(
211
+ "--api-key",
212
+ metavar="TOKEN",
213
+ help="OpenAI API token (for --save-config). Falls back to OPENAI_API_KEY if not set.",
214
+ )
215
+ parser.add_argument(
216
+ "--base-url",
217
+ metavar="URL",
218
+ help="API base URL (for --save-config). Falls back to OPENAI_BASE_URL or default.",
219
+ )
220
+ parser.add_argument(
221
+ "--model",
222
+ default="gpt-4o-mini",
223
+ metavar="MODEL",
224
+ help="OpenAI model (default: gpt-4o-mini). Saved with --save-config.",
225
+ )
226
+ parser.add_argument(
227
+ "--summary",
228
+ action="store_true",
229
+ help="Summarize: use OpenAI if token works; otherwise shortest of pysummarization and sumy.",
230
+ )
231
+ parser.add_argument(
232
+ "--summary-pysummarization",
233
+ action="store_true",
234
+ help="Print a summary using pysummarization.",
235
+ )
236
+ parser.add_argument(
237
+ "--summary-sentences",
238
+ type=int,
239
+ default=3,
240
+ metavar="N",
241
+ help="Number of sentences in the summary (default: 3).",
242
+ )
243
+ parser.add_argument(
244
+ "--summary-sumy",
245
+ action="store_true",
246
+ help="Print a summary using sumy (LexRank).",
247
+ )
248
+ parser.add_argument(
249
+ "--summary-openai",
250
+ action="store_true",
251
+         help="Print a summary using the OpenAI API or a compatible endpoint (e.g. Ollama/CCAD). Requires config.ini or OPENAI_API_KEY.",
252
+ )
253
+ parser.add_argument(
254
+ "--reply",
255
+ action="store_true",
256
+ help="Generate a short reply to the transcribed message using OpenAI.",
257
+ )
258
+ parser.add_argument(
259
+ "--json",
260
+ action="store_true",
261
+ help="Output as JSON (text, language, summary and/or reply depending on options).",
262
+ )
263
+ args = parser.parse_args()
264
+
265
+ if args.save_config:
266
+ api_key = (args.api_key or os.environ.get("OPENAI_API_KEY") or "").strip()
267
+ base_url = (
268
+ (args.base_url or os.environ.get("OPENAI_BASE_URL") or "").strip()
269
+ or "https://api.openai.com/v1"
270
+ )
271
+ if not api_key:
272
+             print("Error: provide --api-key or set OPENAI_API_KEY to save configuration.", file=sys.stderr)
273
+ sys.exit(1)
274
+ config_path = Path.cwd() / CONFIG_FILENAME
275
+ _save_openai_config(api_key, base_url, args.model or "gpt-4o-mini", config_path)
276
+ print(f"Configuration saved to {config_path.resolve()}")
277
+ return
278
+
279
+ if args.audio is None:
280
+ parser.error("the following arguments are required: FILE (unless --save-config)")
281
+
282
+ try:
283
+ gpu_count = ctranslate2.get_cuda_device_count()
284
+ use_cuda = gpu_count > 0
285
+ except Exception:
286
+ use_cuda = False
287
+
288
+ if use_cuda:
289
+ device, compute_type = "cuda", "float16"
290
+ else:
291
+ device, compute_type = "cpu", "int8"
292
+
293
+ if not args.json:
294
+ print("Using GPU (CUDA)" if use_cuda else "Using CPU")
295
+
296
+ model = WhisperModel("large-v3", device=device, compute_type=compute_type)
297
+ segments, info = model.transcribe(args.audio)
298
+
299
+ language_code = getattr(info, "language", None) or ""
300
+ sumy_language = _language_to_sumy(language_code)
301
+ language_for_prompt = _language_name_for_prompt(language_code)
302
+
303
+ full_text = " ".join(s.text for s in segments).strip()
304
+
305
+ output = {}
306
+ if args.json:
307
+ output["text"] = full_text
308
+ output["language"] = language_code
309
+ output["device"] = "cuda" if use_cuda else "cpu"
310
+ else:
311
+ print(full_text)
312
+ if full_text:
313
+ print()
314
+
315
+ n = args.summary_sentences
316
+
317
+ if args.summary and full_text:
318
+ cfg = _load_openai_config()
319
+ summary_openai = ""
320
+ if cfg["api_key"]:
321
+ summary_openai = summarize_text_openai(
322
+ full_text, num_sentences=n, detected_language=language_for_prompt
323
+ )
324
+ if summary_openai:
325
+ if args.json:
326
+ output["summary"] = summary_openai
327
+ output["summary_source"] = "openai"
328
+ else:
329
+ print("--- Summary (openai) ---")
330
+ print(summary_openai)
331
+ else:
332
+ summary_py = summarize_text(full_text, num_sentences=n)
333
+ summary_sumy = summarize_text_sumy(
334
+ full_text, num_sentences=n, language=sumy_language
335
+ )
336
+ candidates = [
337
+ (summary_py, "pysummarization"),
338
+ (summary_sumy, "sumy"),
339
+ ]
340
+ candidates = [(t, name) for t, name in candidates if t]
341
+ if candidates:
342
+ shortest = min(candidates, key=lambda x: len(x[0]))
343
+ summary_text, name = shortest
344
+ if args.json:
345
+ output["summary"] = summary_text
346
+ output["summary_source"] = name
347
+ else:
348
+ print(f"--- Summary ({name}, shortest) ---")
349
+ print(summary_text)
350
+ elif not args.json:
351
+ print("(Text too short to generate summary)")
352
+
353
+ if args.summary_pysummarization and full_text:
354
+ summary = summarize_text(full_text, num_sentences=n)
355
+ if summary:
356
+ if args.json:
357
+ output["summary_pysummarization"] = summary
358
+ else:
359
+ print("--- Summary (pysummarization) ---")
360
+ print(summary)
361
+ elif not args.json:
362
+ print("(Text too short to generate summary)")
363
+
364
+ if args.summary_sumy and full_text:
365
+ summary = summarize_text_sumy(
366
+ full_text, num_sentences=n, language=sumy_language
367
+ )
368
+ if summary:
369
+ if args.json:
370
+ output["summary_sumy"] = summary
371
+ else:
372
+ print("--- Summary (sumy) ---")
373
+ print(summary)
374
+ elif not args.json:
375
+ print("(Text too short to generate summary with sumy)")
376
+
377
+ if args.summary_openai and full_text:
378
+ cfg = _load_openai_config()
379
+ if not cfg["api_key"]:
380
+ if not args.json:
381
+ print(
382
+ "Error: --summary-openai requires api_key in config.ini ([openai] section) "
383
+ "or OPENAI_API_KEY environment variable."
384
+ )
385
+ else:
386
+ output["error_summary_openai"] = "Missing api_key in config.ini or OPENAI_API_KEY"
387
+ else:
388
+ summary = summarize_text_openai(
389
+ full_text, num_sentences=n, detected_language=language_for_prompt
390
+ )
391
+ if summary:
392
+ if args.json:
393
+ output["summary_openai"] = summary
394
+ else:
395
+ print("--- Summary (openai) ---")
396
+ print(summary)
397
+ elif not args.json:
398
+ print("(Could not generate summary with OpenAI)")
399
+
400
+ if args.reply and full_text:
401
+ cfg = _load_openai_config()
402
+ if not cfg["api_key"]:
403
+ if not args.json:
404
+ print(
405
+ "Error: --reply requires api_key in config.ini ([openai] section) "
406
+ "or OPENAI_API_KEY environment variable."
407
+ )
408
+ else:
409
+ output["error_reply"] = "Missing api_key in config.ini or OPENAI_API_KEY"
410
+ else:
411
+ reply = reply_text_openai(
412
+ full_text, detected_language=language_for_prompt
413
+ )
414
+ if reply:
415
+ if args.json:
416
+ output["reply"] = reply
417
+ else:
418
+ print("--- Reply ---")
419
+ print(reply)
420
+ elif not args.json:
421
+ print("(Could not generate reply with OpenAI)")
422
+
423
+ if args.json:
424
+ print(json.dumps(output, ensure_ascii=False, indent=2))
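When the OpenAI summary is unavailable, the `--summary` branch of `main()` above keeps whichever of the pysummarization and sumy results is shorter and ignores empty ones. A minimal sketch of that selection step in isolation, assuming a hypothetical function name `pick_shortest_summary` that does not appear in the package:

```python
from __future__ import annotations


def pick_shortest_summary(
    candidates: list[tuple[str, str]],
) -> tuple[str, str] | None:
    """Return the (text, backend_name) pair with the shortest non-empty text.

    Hypothetical helper mirroring the fallback in main(); returns None when
    every backend produced an empty summary.
    """
    non_empty = [(text, name) for text, name in candidates if text]
    if not non_empty:
        return None
    # min() keeps the first candidate on ties, so list order acts as priority.
    return min(non_empty, key=lambda item: len(item[0]))


if __name__ == "__main__":
    print(pick_shortest_summary([("A longer summary.", "pysummarization"), ("Short.", "sumy")]))
```

Choosing by length is a heuristic: the shorter summary is not necessarily the more faithful one, which may be why the CLI also exposes the backend-specific flags.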
@@ -0,0 +1,130 @@
1
+ Metadata-Version: 2.4
2
+ Name: summscriber
3
+ Version: 0.1.0
4
+ Summary: Transcribe audio with Whisper (faster-whisper), with summarization (pysummarization, sumy, OpenAI) and short reply with OpenAI.
5
+ License: MIT
6
+ Project-URL: Repository, https://github.com/pablogventura/summscriber
7
+ Project-URL: Homepage, https://github.com/pablogventura/summscriber
8
+ Keywords: whisper,transcription,speech-to-text,summarization,openai
9
+ Classifier: Development Status :: 4 - Beta
10
+ Classifier: Intended Audience :: Developers
11
+ Classifier: License :: OSI Approved :: MIT License
12
+ Classifier: Programming Language :: Python :: 3
13
+ Classifier: Programming Language :: Python :: 3.10
14
+ Classifier: Programming Language :: Python :: 3.11
15
+ Classifier: Programming Language :: Python :: 3.12
16
+ Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
17
+ Requires-Python: >=3.10
18
+ Description-Content-Type: text/markdown
19
+ Requires-Dist: faster-whisper>=1.0.0
20
+ Requires-Dist: pysummarization>=1.1.0
21
+ Requires-Dist: sumy>=0.11.0
22
+ Requires-Dist: openai>=1.0.0
23
+ Provides-Extra: dev
24
+ Requires-Dist: pytest>=7; extra == "dev"
25
+ Provides-Extra: publish
26
+ Requires-Dist: build>=1.0; extra == "publish"
27
+ Requires-Dist: twine>=5.0; extra == "publish"
28
+
29
+ # Summscriber
30
+
31
+ Transcribe audio with [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (Whisper), with summarization options (pysummarization, sumy, OpenAI) and short reply generation via OpenAI API.
32
+
33
+ **Repository:** [github.com/pablogventura/summscriber](https://github.com/pablogventura/summscriber)
34
+
35
+ ## Installation
36
+
37
+ ### With pipx (recommended: isolated env, global command)
38
+
39
+ ```bash
40
+ pipx install git+https://github.com/pablogventura/summscriber.git
41
+ ```
42
+
43
+ To upgrade:
44
+
45
+ ```bash
46
+ pipx upgrade summscriber
47
+ ```
48
+
49
+ ### With pip (from project directory)
50
+
51
+ ```bash
52
+ pip install .
53
+ ```
54
+
55
+ Or in editable mode (development):
56
+
57
+ ```bash
58
+ pip install -e .
59
+ ```
60
+
61
+ Or from the repository:
62
+
63
+ ```bash
64
+ pip install git+https://github.com/pablogventura/summscriber.git
65
+ ```
66
+
67
+ Or from PyPI (once published):
68
+
69
+ ```bash
70
+ pip install summscriber
71
+ ```
72
+
73
+ ## Usage
74
+
75
+ After installing, the `summscriber` command is available:
76
+
77
+ ```bash
78
+ summscriber FILE [options]
79
+ ```
80
+
81
+ Examples:
82
+
83
+ ```bash
84
+ summscriber recording.mp3
85
+ summscriber interview.ogg --summary
86
+ summscriber audio.wav --summary --reply --json
87
+ ```
88
+
89
+ ### Main options
90
+
91
+ - **FILE**: audio file to transcribe (required unless `--save-config` is used).
92
+ - `--summary`: summarize with OpenAI when an API key is configured and the request succeeds; otherwise use the shorter of the pysummarization and sumy summaries.
93
+ - `--summary-pysummarization` / `--summary-sumy` / `--summary-openai`: use a specific summarization backend.
94
+ - `--summary-sentences N`: number of sentences in the summary (default 3).
95
+ - `--reply`: generate a short reply to the message with OpenAI.
96
+ - `--json`: output as JSON.
97
+
98
+ For summarization and reply with OpenAI, use `config.ini` (section `[openai]`) or environment variables `OPENAI_API_KEY` and optionally `OPENAI_BASE_URL`. See `config.ini.example`. You can save your token and URL with:
99
+
100
+ ```bash
101
+ summscriber --save-config --api-key YOUR_TOKEN --base-url https://...
102
+ ```
103
+
104
+ ## Development
105
+
106
+ From the repo root without installing:
107
+
108
+ ```bash
109
+ python -m summscriber FILE [options]
110
+ ```
111
+
112
+ ## Publishing to PyPI
113
+
114
+ 1. Install build tools: `pip install build twine` (or `pip install ".[publish]"`).
115
+ 2. Create a PyPI account and an API token at [pypi.org/manage/account/token/](https://pypi.org/manage/account/token/).
116
+ 3. From the project root run:
117
+
118
+ ```bash
119
+ ./publish.sh
120
+ ```
121
+
122
+ Or manually:
123
+
124
+ ```bash
125
+ rm -rf build dist *.egg-info
126
+ python -m build
127
+ twine upload dist/*
128
+ ```
129
+
130
+ When prompted, enter `__token__` as the username and your PyPI token as the password. Alternatively, set `TWINE_USERNAME=__token__` and `TWINE_PASSWORD=pypi-your-token`.
@@ -0,0 +1,11 @@
1
+ README.md
2
+ pyproject.toml
3
+ summscriber/__init__.py
4
+ summscriber/__main__.py
5
+ summscriber/cli.py
6
+ summscriber.egg-info/PKG-INFO
7
+ summscriber.egg-info/SOURCES.txt
8
+ summscriber.egg-info/dependency_links.txt
9
+ summscriber.egg-info/entry_points.txt
10
+ summscriber.egg-info/requires.txt
11
+ summscriber.egg-info/top_level.txt
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ summscriber = summscriber.cli:main
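The `console_scripts` entry above maps the `summscriber` command to the `main` callable in `summscriber.cli`. As a rough illustration of how such a `module:attr` value is interpreted, the sketch below uses the standard library's `importlib.metadata.EntryPoint`. The `demo` entry pointing at `json:dumps` is purely illustrative, so that `load()` can run without the package installed.

```python
import json
from importlib.metadata import EntryPoint

# The entry as declared in entry_points.txt; constructing it imports nothing.
ep = EntryPoint(
    name="summscriber",
    value="summscriber.cli:main",
    group="console_scripts",
)
print(ep.module)  # module part, left of the colon
print(ep.attr)    # attribute part, right of the colon

# Illustrative stand-in: load() imports the module and returns the attribute.
demo = EntryPoint(name="demo", value="json:dumps", group="console_scripts")
loaded = demo.load()
print(loaded is json.dumps)
```

An installer generates a small wrapper script that performs the equivalent of `load()` and then calls the result, which is why `main()` takes no arguments and reads `sys.argv` through `argparse`.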
@@ -0,0 +1,11 @@
1
+ faster-whisper>=1.0.0
2
+ pysummarization>=1.1.0
3
+ sumy>=0.11.0
4
+ openai>=1.0.0
5
+
6
+ [dev]
7
+ pytest>=7
8
+
9
+ [publish]
10
+ build>=1.0
11
+ twine>=5.0
@@ -0,0 +1 @@
1
+ summscriber