PyPI - subtitle-engine - Versions diffs - 0.1.0__tar.gz - Mend

subtitle-engine 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

subtitle_engine-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Leevi Puntanen
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

subtitle_engine-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,97 @@
+Metadata-Version: 2.4
+Name: subtitle-engine
+Version: 0.1.0
+Summary: Generate SRT subtitles from audio/video files using WhisperX
+Author: Leevi Puntanen
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/leevipuntanen/subtitle-engine
+Project-URL: Issues, https://github.com/leevipuntanen/subtitle-engine/issues
+Keywords: subtitles,srt,whisperx,transcription,asr
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: End Users/Desktop
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
+Classifier: Topic :: Utilities
+Requires-Python: >=3.12
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: typer>=0.12.0
+Requires-Dist: whisperx>=3.8.0
+Requires-Dist: requests>=2.32.0
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0.0; extra == "dev"
+Dynamic: license-file
+# subtitle-engine
+Generate `.srt` subtitle files from audio or video files using [WhisperX](https://github.com/m-bain/whisperX). Optionally generate a caption from the transcript with a local [Ollama](https://ollama.com/) LLM.
+## Installation
+Requires Python 3.12 or newer.
+```bash
+pip install subtitle-engine
+```
+Or install from source:
+```bash
+git clone https://github.com/leevipuntanen/subtitle-engine.git
+cd subtitle-engine
+python -m venv .venv
+source .venv/bin/activate
+pip install -e ".[dev]"
+```
+## Usage
+```bash
+# Basic usage: writes video.srt next to the source file
+subeng video.mp4
+# Specify output file
+subeng video.mp4 --output subtitles.srt
+# Use a different model or language
+subeng video.mp4 --model medium --language fi
+# Force CPU / CUDA / MPS
+subeng video.mp4 --device cpu
+# Speaker diarization (requires a Hugging Face token)
+subeng video.mp4 --diarize --hf-token $HF_TOKEN
+# Generate a caption from the transcript using Ollama
+subeng video.mp4 --caption --ollama-model qwen3.5:0.8b
+```
+## Options
+| Option | Description |
+|--------|-------------|
+| `--output`, `-o` | Output SRT file path |
+| `--model`, `-m` | WhisperX model: `tiny`, `base`, `small` (default), `medium`, `large-v2`, `large-v3` |
+| `--language`, `-l` | ISO language code, e.g. `en`, `fi`. Auto-detected if omitted. |
+| `--device`, `-d` | `cpu`, `cuda` or `mps`. Auto-detected if omitted. |
+| `--batch-size`, `-b` | Inference batch size (default: 16) |
+| `--compute-type`, `-c` | `int8` or `float16`. Auto-selected if omitted. |
+| `--diarize` | Enable speaker diarization |
+| `--hf-token` | Hugging Face token for diarization (or set `HF_TOKEN` env var) |
+| `--caption` | Generate a caption from the transcript via Ollama |
+| `--ollama-model` | Ollama model name (required with `--caption`) |
+| `--ollama-host` | Ollama API host (default: `http://localhost:11434`) |
+## Development
+Run the test suite:
+```bash
+pytest
+```
+## License
+MIT

subtitle_engine-0.1.0/README.md ADDED

@@ -0,0 +1,71 @@
+# subtitle-engine
+Generate `.srt` subtitle files from audio or video files using [WhisperX](https://github.com/m-bain/whisperX). Optionally generate a caption from the transcript with a local [Ollama](https://ollama.com/) LLM.
+## Installation
+Requires Python 3.12 or newer.
+```bash
+pip install subtitle-engine
+```
+Or install from source:
+```bash
+git clone https://github.com/leevipuntanen/subtitle-engine.git
+cd subtitle-engine
+python -m venv .venv
+source .venv/bin/activate
+pip install -e ".[dev]"
+```
+## Usage
+```bash
+# Basic usage: writes video.srt next to the source file
+subeng video.mp4
+# Specify output file
+subeng video.mp4 --output subtitles.srt
+# Use a different model or language
+subeng video.mp4 --model medium --language fi
+# Force CPU / CUDA / MPS
+subeng video.mp4 --device cpu
+# Speaker diarization (requires a Hugging Face token)
+subeng video.mp4 --diarize --hf-token $HF_TOKEN
+# Generate a caption from the transcript using Ollama
+subeng video.mp4 --caption --ollama-model qwen3.5:0.8b
+```
+## Options
+| Option | Description |
+|--------|-------------|
+| `--output`, `-o` | Output SRT file path |
+| `--model`, `-m` | WhisperX model: `tiny`, `base`, `small` (default), `medium`, `large-v2`, `large-v3` |
+| `--language`, `-l` | ISO language code, e.g. `en`, `fi`. Auto-detected if omitted. |
+| `--device`, `-d` | `cpu`, `cuda` or `mps`. Auto-detected if omitted. |
+| `--batch-size`, `-b` | Inference batch size (default: 16) |
+| `--compute-type`, `-c` | `int8` or `float16`. Auto-selected if omitted. |
+| `--diarize` | Enable speaker diarization |
+| `--hf-token` | Hugging Face token for diarization (or set `HF_TOKEN` env var) |
+| `--caption` | Generate a caption from the transcript via Ollama |
+| `--ollama-model` | Ollama model name (required with `--caption`) |
+| `--ollama-host` | Ollama API host (default: `http://localhost:11434`) |
+## Development
+Run the test suite:
+```bash
+pytest
+```
+## License
+MIT

subtitle_engine-0.1.0/pyproject.toml ADDED

@@ -0,0 +1,48 @@
+[build-system]
+requires = ["setuptools>=77.0.0"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "subtitle-engine"
+version = "0.1.0"
+description = "Generate SRT subtitles from audio/video files using WhisperX"
+readme = "README.md"
+license = "MIT"
+license-files = ["LICENSE"]
+authors = [
+    {name = "Leevi Puntanen"},
+]
+requires-python = ">=3.12"
+classifiers = [
+    "Development Status :: 4 - Beta",
+    "Intended Audience :: End Users/Desktop",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+    "Topic :: Multimedia :: Sound/Audio :: Speech",
+    "Topic :: Utilities",
+]
+keywords = ["subtitles", "srt", "whisperx", "transcription", "asr"]
+dependencies = [
+    "typer>=0.12.0",
+    "whisperx>=3.8.0",
+    "requests>=2.32.0",
+]
+[project.optional-dependencies]
+dev = [
+    "pytest>=8.0.0",
+]
+[project.scripts]
+subeng = "subtitle_engine.cli:app"
+[project.urls]
+Homepage = "https://github.com/leevipuntanen/subtitle-engine"
+Issues = "https://github.com/leevipuntanen/subtitle-engine/issues"
+[tool.setuptools.packages.find]
+where = ["src"]
+[tool.pytest.ini_options]
+testpaths = ["tests"]

subtitle_engine-0.1.0/setup.cfg ADDED

@@ -0,0 +1,4 @@
+[egg_info]
+tag_build =
+tag_date = 0

subtitle_engine-0.1.0/src/subtitle_engine/__init__.py ADDED

@@ -0,0 +1,3 @@
+"""Subtitle Engine — generate SRT files with WhisperX."""
+__version__ = "0.1.0"

subtitle_engine-0.1.0/src/subtitle_engine/captioner.py ADDED

@@ -0,0 +1,78 @@
+"""Generate captions from transcripts using a local Ollama instance."""
+import json
+from typing import Optional
+import requests
+def _default_prompt(transcript: str) -> str:
+    """Build the prompt sent to the LLM."""
+    return (
+        "Create a short, engaging caption (1-2 sentences) for a video based on the following transcript. "
+        "Write the caption in the same language as the transcript. "
+        "Answer directly with the caption only, without any thinking or explanation.\n\n"
+        f"Transcript:\n{transcript}"
+    )
+def generate_caption(
+    transcript: str,
+    *,
+    model: str,
+    host: str = "http://localhost:11434",
+    prompt: Optional[str] = None,
+) -> str:
+    """Generate a caption from a transcript via Ollama.
+    Parameters
+    ----------
+    transcript:
+        The transcript text to summarize.
+    model:
+        Name of the Ollama model to use.
+    host:
+        Base URL of the Ollama API.
+    prompt:
+        Custom prompt. A default prompt is used if omitted.
+    Returns
+    -------
+    The generated caption string.
+    """
+    if not transcript.strip():
+        raise ValueError("Cannot generate a caption from an empty transcript")
+    url = f"{host.rstrip('/')}/api/generate"
+    payload = {
+        "model": model,
+        "prompt": prompt or _default_prompt(transcript),
+        "stream": False,
+    }
+    try:
+        response = requests.post(url, json=payload, timeout=300)
+    except requests.ConnectionError as exc:
+        raise ConnectionError(
+            f"Could not connect to Ollama at {host}. Is Ollama running?"
+        ) from exc
+    response.raise_for_status()
+    data = response.json()
+    caption = data.get("response", "").strip()
+    if not caption:
+        raise ValueError(
+            "Ollama returned an empty caption. "
+            "This can happen with some models or languages — try a different --ollama-model."
+        )
+    return caption
+def list_models(host: str = "http://localhost:11434") -> list[str]:
+    """Return the names of models available in the local Ollama instance."""
+    url = f"{host.rstrip('/')}/api/tags"
+    response = requests.get(url, timeout=30)
+    response.raise_for_status()
+    return [model["name"] for model in response.json().get("models", [])]
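
Note on the request shape: the URL and JSON body that `generate_caption` sends can be looked at in isolation. The snippet below is a minimal sketch and not part of the package. `build_generate_request` is a hypothetical helper name introduced only for illustration; it mirrors the URL and payload construction from the diff above and does not contact Ollama.

```python
def build_generate_request(host: str, model: str, prompt: str) -> tuple[str, dict]:
    """Hypothetical helper: mirror the URL and JSON body built in captioner.py."""
    # Strip a trailing slash so "http://host/" and "http://host" give the same URL.
    url = f"{host.rstrip('/')}/api/generate"
    # "stream": False requests one JSON reply instead of a stream of chunks.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return url, payload


url, payload = build_generate_request("http://localhost:11434/", "qwen3.5:0.8b", "Say hi")
print(url)      # http://localhost:11434/api/generate
print(payload)  # the dict that would be passed as json=... to requests.post
```

Keeping the request construction separate from the network call makes it possible to check the payload without mocking `requests.post`, which is the approach the package's own tests take.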

subtitle_engine-0.1.0/src/subtitle_engine/cli.py ADDED

@@ -0,0 +1,170 @@
+"""Command-line interface for subtitle-engine."""
+from pathlib import Path
+from typing import Annotated, Optional
+import typer
+from rich.console import Console
+from subtitle_engine.captioner import generate_caption
+from subtitle_engine.srt_writer import write_srt
+from subtitle_engine.transcriber import transcribe
+from subtitle_engine.utils import resolve_output_path, validate_media_file
+app = typer.Typer(
+    help="Generate SRT subtitles from audio/video files using WhisperX",
+    no_args_is_help=True,
+)
+console = Console()
+@app.command()
+def main(
+    input_file: Annotated[
+        Path,
+        typer.Argument(
+            help="Audio or video file to transcribe",
+            exists=True,
+            file_okay=True,
+            dir_okay=False,
+            readable=True,
+        ),
+    ],
+    output: Annotated[
+        Optional[Path],
+        typer.Option(
+            "--output",
+            "-o",
+            help="Output SRT file (default: <input>.srt)",
+            file_okay=True,
+            dir_okay=False,
+        ),
+    ] = None,
+    model: Annotated[
+        str,
+        typer.Option(
+            "--model",
+            "-m",
+            help="WhisperX model: tiny, base, small, medium, large-v2, large-v3",
+        ),
+    ] = "small",
+    language: Annotated[
+        Optional[str],
+        typer.Option(
+            "--language",
+            "-l",
+            help="Language code, e.g. en, fi. Auto-detected if omitted.",
+        ),
+    ] = None,
+    device: Annotated[
+        Optional[str],
+        typer.Option(
+            "--device",
+            "-d",
+            help="Device: cpu, cuda or mps. Auto-detected if omitted.",
+        ),
+    ] = None,
+    batch_size: Annotated[
+        int,
+        typer.Option(
+            "--batch-size",
+            "-b",
+            help="WhisperX inference batch size",
+            min=1,
+        ),
+    ] = 16,
+    compute_type: Annotated[
+        Optional[str],
+        typer.Option(
+            "--compute-type",
+            "-c",
+            help="Compute type: int8 or float16. Auto-selected if omitted.",
+        ),
+    ] = None,
+    diarize: Annotated[
+        bool,
+        typer.Option(
+            "--diarize",
+            help="Run speaker diarization (requires --hf-token)",
+        ),
+    ] = False,
+    hf_token: Annotated[
+        Optional[str],
+        typer.Option(
+            "--hf-token",
+            help="Hugging Face token for diarization",
+            envvar="HF_TOKEN",
+        ),
+    ] = None,
+    caption: Annotated[
+        bool,
+        typer.Option(
+            "--caption",
+            help="Generate a caption from the transcript using Ollama",
+        ),
+    ] = False,
+    ollama_model: Annotated[
+        Optional[str],
+        typer.Option(
+            "--ollama-model",
+            help="Ollama model for caption generation. Required if --caption is set.",
+        ),
+    ] = None,
+    ollama_host: Annotated[
+        str,
+        typer.Option(
+            "--ollama-host",
+            help="Ollama API host",
+            envvar="OLLAMA_HOST",
+        ),
+    ] = "http://localhost:11434",
+) -> None:
+    """Generate SRT subtitles from a media file."""
+    try:
+        validate_media_file(input_file)
+        output_path = resolve_output_path(input_file, output)
+        if caption and not ollama_model:
+            raise ValueError("--ollama-model is required when using --caption")
+        console.print(f"[bold]Transcribing:[/bold] {input_file}")
+        console.print(f"[bold]Model:[/bold] {model}")
+        if language:
+            console.print(f"[bold]Language:[/bold] {language}")
+        if device:
+            console.print(f"[bold]Device:[/bold] {device}")
+        segments = transcribe(
+            input_file,
+            model_name=model,
+            language=language,
+            device=device,
+            batch_size=batch_size,
+            compute_type=compute_type,
+            diarize=diarize,
+            hf_token=hf_token,
+        )
+        write_srt(segments, output_path)
+        console.print(f"[green]Wrote subtitles to:[/green] {output_path}")
+        if caption:
+            transcript = " ".join(str(segment.get("text", "")).strip() for segment in segments)
+            caption_text = generate_caption(
+                transcript,
+                model=ollama_model,
+                host=ollama_host,
+            )
+            caption_path = output_path.with_suffix(".caption.txt")
+            caption_path.write_text(caption_text, encoding="utf-8")
+            console.print(f"[green]Wrote caption to:[/green] {caption_path}")
+    except (ValueError, FileNotFoundError, ConnectionError) as exc:
+        console.print(f"[red]Error:[/red] {exc}")
+        raise typer.Exit(code=1) from exc
+    except Exception as exc:  # noqa: BLE001
+        console.print(f"[red]Transcription failed:[/red] {exc}")
+        raise typer.Exit(code=1) from exc
+if __name__ == "__main__":
+    app()
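
Note on the transcript passed to the captioner: the CLI joins the stripped `text` of every segment with single spaces. Because segments without text contribute an empty string, the joined transcript can contain doubled or trailing spaces. The sketch below is a hypothetical variant (the name `join_transcript` is not in the package) that skips empty segments; it is an illustration under that assumption, not the code in the diff.

```python
from typing import Iterable


def join_transcript(segments: Iterable[dict]) -> str:
    """Hypothetical variant: join segment texts, dropping empty ones."""
    # Same per-segment cleanup as cli.py: missing "text" becomes "", then strip.
    parts = (str(segment.get("text", "")).strip() for segment in segments)
    # Unlike cli.py, filter out empty strings so no extra spaces appear.
    return " ".join(part for part in parts if part)


segments = [{"text": " Hello "}, {"start": 1.0, "end": 2.0}, {"text": "world"}]
print(join_transcript(segments))  # Hello world
```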

subtitle_engine-0.1.0/src/subtitle_engine/srt_writer.py ADDED

@@ -0,0 +1,45 @@
+"""Convert transcription segments to SRT format."""
+from pathlib import Path
+from typing import Iterable
+def _format_time(seconds: float) -> str:
+    """Convert seconds to SRT time format HH:MM:SS,mmm."""
+    total_millis = int(round(seconds * 1000))
+    hours = total_millis // 3_600_000
+    minutes = (total_millis % 3_600_000) // 60_000
+    secs = (total_millis % 60_000) // 1_000
+    millis = total_millis % 1_000
+    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"
+def _format_segment(index: int, start: float, end: float, text: str) -> str:
+    """Format a single segment as an SRT block."""
+    cleaned_text = text.strip()
+    if not cleaned_text:
+        cleaned_text = "..."
+    return f"{index}\n{_format_time(start)} --> {_format_time(end)}\n{cleaned_text}\n"
+def segments_to_srt(segments: Iterable[dict]) -> str:
+    """Build an SRT string from WhisperX-style segments.
+    Each segment is expected to be a dict with keys:
+    ``start`` (float), ``end`` (float), and ``text`` (str).
+    """
+    blocks = []
+    for index, segment in enumerate(segments, start=1):
+        start = float(segment["start"])
+        end = float(segment["end"])
+        text = str(segment["text"])
+        blocks.append(_format_segment(index, start, end, text))
+    return "\n".join(blocks)
+def write_srt(segments: Iterable[dict], output_path: Path) -> None:
+    """Write segments to an SRT file."""
+    output_path = Path(output_path)
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    output_path.write_text(segments_to_srt(segments), encoding="utf-8")
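
Note on the timestamp arithmetic: `_format_time` rounds to whole milliseconds first and only then splits the total into fields, so a value such as 1.9999 s carries into the seconds field instead of producing a four-digit millisecond field. The sketch below restates the same arithmetic with `divmod`; the names `format_srt_time` and `format_block` are illustrative stand-ins for the private helpers in the diff, not the package's API.

```python
def format_srt_time(seconds: float) -> str:
    """Convert seconds to HH:MM:SS,mmm, rounding to whole milliseconds first."""
    total_millis = int(round(seconds * 1000))
    hours, rem = divmod(total_millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def format_block(index: int, start: float, end: float, text: str) -> str:
    """Format one SRT block: index line, time range line, text line."""
    return f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text.strip()}\n"


blocks = [
    format_block(1, 0.0, 2.0, "First line"),
    format_block(2, 3.5, 5.5, "Second line"),
]
# Each block already ends with a newline, so joining with "\n" leaves
# exactly one blank line between consecutive blocks.
print("\n".join(blocks))
```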

subtitle_engine-0.1.0/src/subtitle_engine/transcriber.py ADDED

@@ -0,0 +1,129 @@
+"""WhisperX transcription wrapper."""
+from pathlib import Path
+from typing import Optional
+import torch
+import whisperx
+VALID_MODELS = {"tiny", "base", "small", "medium", "large-v2", "large-v3"}
+VALID_DEVICES = {"cpu", "cuda", "mps"}
+def _detect_device(device: Optional[str]) -> str:
+    """Pick a device if none was specified."""
+    if device:
+        return device
+    if torch.cuda.is_available():
+        return "cuda"
+    if torch.backends.mps.is_available():
+        return "mps"
+    return "cpu"
+def _default_compute_type(device: str) -> str:
+    """Pick a safe compute type for the device."""
+    if device == "cpu":
+        return "int8"
+    return "float16"
+def _validate_model(model_name: str) -> None:
+    """Raise a ValueError if the model name is unknown."""
+    if model_name not in VALID_MODELS:
+        joined = ", ".join(sorted(VALID_MODELS))
+        raise ValueError(f"Unknown model '{model_name}'. Choose from: {joined}")
+def _validate_device(device: str) -> None:
+    """Raise a ValueError if the device name is unknown."""
+    if device not in VALID_DEVICES:
+        joined = ", ".join(sorted(VALID_DEVICES))
+        raise ValueError(f"Unknown device '{device}'. Choose from: {joined}")
+def transcribe(
+    audio_path: Path,
+    *,
+    model_name: str = "small",
+    language: Optional[str] = None,
+    device: Optional[str] = None,
+    batch_size: int = 16,
+    compute_type: Optional[str] = None,
+    diarize: bool = False,
+    hf_token: Optional[str] = None,
+) -> list[dict]:
+    """Transcribe an audio/video file and return SRT-ready segments.
+    Parameters
+    ----------
+    audio_path:
+        Path to the media file to transcribe.
+    model_name:
+        WhisperX model size. One of: tiny, base, small, medium, large-v2, large-v3.
+    language:
+        ISO language code, e.g. ``en`` or ``fi``. If ``None``, WhisperX auto-detects.
+    device:
+        ``cpu``, ``cuda`` or ``mps``. Auto-detected if omitted.
+    batch_size:
+        WhisperX batch size for transcription.
+    compute_type:
+        ``int8`` or ``float16``. Auto-selected per device if omitted.
+    diarize:
+        Whether to run speaker diarization.
+    hf_token:
+        Hugging Face token required for diarization.
+    Returns
+    -------
+    A list of segment dicts with ``start``, ``end`` and ``text`` keys.
+    """
+    _validate_model(model_name)
+    audio_path = Path(audio_path)
+    if not audio_path.exists():
+        raise FileNotFoundError(f"Audio file not found: {audio_path}")
+    device = _detect_device(device)
+    _validate_device(device)
+    if compute_type is None:
+        compute_type = _default_compute_type(device)
+    if diarize and not hf_token:
+        raise ValueError("--hf-token is required when using --diarize")
+    audio = whisperx.load_audio(str(audio_path))
+    model = whisperx.load_model(model_name, device, compute_type=compute_type)
+    result = model.transcribe(audio, batch_size=batch_size, language=language)
+    # Free transcription model memory before alignment
+    del model
+    detected_language = result.get("language")
+    if detected_language:
+        align_model, align_metadata = whisperx.load_align_model(
+            language_code=detected_language, device=device
+        )
+        result = whisperx.align(
+            result["segments"],
+            align_model,
+            align_metadata,
+            audio,
+            device,
+            return_char_alignments=False,
+        )
+        del align_model
+    if diarize:
+        diarize_model = whisperx.DiarizationPipeline(
+            model_name="pyannote/speaker-diarization-3.1",
+            use_auth_token=hf_token,
+            device=device,
+        )
+        diarize_segments = diarize_model(audio)
+        result = whisperx.assign_word_speakers(diarize_segments, result)
+    return result["segments"]
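
Note on device and compute-type selection: the fallback order in `_detect_device` (explicit choice, then CUDA, then MPS, then CPU) and the pairing in `_default_compute_type` can be exercised without installing torch. In the sketch below the two boolean parameters are assumptions that stand in for `torch.cuda.is_available()` and `torch.backends.mps.is_available()`; `pick_device` and `default_compute_type` are illustrative names, not the package's functions.

```python
from typing import Optional


def pick_device(requested: Optional[str], cuda_available: bool, mps_available: bool) -> str:
    """Return the requested device, else the first available accelerator, else CPU."""
    if requested:
        return requested
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def default_compute_type(device: str) -> str:
    """int8 on CPU, float16 on every other device, as in the diff above."""
    return "int8" if device == "cpu" else "float16"


for cuda, mps in [(True, True), (False, True), (False, False)]:
    device = pick_device(None, cuda, mps)
    print(device, default_compute_type(device))
```

An explicit `--device` always wins over detection, which is why the package validates the device name only after this step.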

subtitle_engine-0.1.0/src/subtitle_engine/utils.py ADDED

@@ -0,0 +1,39 @@
+"""CLI helpers and path utilities."""
+from pathlib import Path
+from typing import Optional
+SUPPORTED_EXTENSIONS = {
+    ".mp3",
+    ".wav",
+    ".flac",
+    ".aac",
+    ".ogg",
+    ".m4a",
+    ".mp4",
+    ".mov",
+    ".mkv",
+    ".avi",
+    ".webm",
+}
+def resolve_output_path(input_path: Path, output: Optional[Path] = None) -> Path:
+    """Resolve the SRT output path.
+    If ``output`` is provided, use it. Otherwise create ``<input>.srt``
+    next to the input file.
+    """
+    if output:
+        return output
+    return input_path.with_suffix(".srt")
+def validate_media_file(path: Path) -> None:
+    """Raise a ValueError if the path does not look like a media file."""
+    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
+        joined = ", ".join(sorted(SUPPORTED_EXTENSIONS))
+        raise ValueError(
+            f"Unsupported file type '{path.suffix}'. Supported: {joined}"
+        )
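
Note on the output paths: `resolve_output_path` relies on `Path.with_suffix`, which replaces only the last suffix of the file name, so the default output for `video.mp4` is `video.srt` rather than `video.mp4.srt`. The caption file in `cli.py` is derived from the SRT path the same way. The sketch below shows both derivations; `default_srt_path` and `caption_path_for` are hypothetical names used only for this illustration.

```python
from pathlib import Path


def default_srt_path(input_path: Path) -> Path:
    """Swap the last suffix of the input for .srt (appends if there is none)."""
    return input_path.with_suffix(".srt")


def caption_path_for(srt_path: Path) -> Path:
    """Derive the caption file name the way cli.py does from the SRT path."""
    return srt_path.with_suffix(".caption.txt")


print(default_srt_path(Path("movie.mp4")))       # movie.srt
print(default_srt_path(Path("talk.final.mp4")))  # talk.final.srt
print(caption_path_for(Path("movie.srt")))       # movie.caption.txt
```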

subtitle_engine-0.1.0/src/subtitle_engine.egg-info/PKG-INFO ADDED

@@ -0,0 +1,97 @@
+Metadata-Version: 2.4
+Name: subtitle-engine
+Version: 0.1.0
+Summary: Generate SRT subtitles from audio/video files using WhisperX
+Author: Leevi Puntanen
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/leevipuntanen/subtitle-engine
+Project-URL: Issues, https://github.com/leevipuntanen/subtitle-engine/issues
+Keywords: subtitles,srt,whisperx,transcription,asr
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: End Users/Desktop
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
+Classifier: Topic :: Utilities
+Requires-Python: >=3.12
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: typer>=0.12.0
+Requires-Dist: whisperx>=3.8.0
+Requires-Dist: requests>=2.32.0
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0.0; extra == "dev"
+Dynamic: license-file
+# subtitle-engine
+Generate `.srt` subtitle files from audio or video files using [WhisperX](https://github.com/m-bain/whisperX). Optionally generate a caption from the transcript with a local [Ollama](https://ollama.com/) LLM.
+## Installation
+Requires Python 3.12 or newer.
+```bash
+pip install subtitle-engine
+```
+Or install from source:
+```bash
+git clone https://github.com/leevipuntanen/subtitle-engine.git
+cd subtitle-engine
+python -m venv .venv
+source .venv/bin/activate
+pip install -e ".[dev]"
+```
+## Usage
+```bash
+# Basic usage: writes video.srt next to the source file
+subeng video.mp4
+# Specify output file
+subeng video.mp4 --output subtitles.srt
+# Use a different model or language
+subeng video.mp4 --model medium --language fi
+# Force CPU / CUDA / MPS
+subeng video.mp4 --device cpu
+# Speaker diarization (requires a Hugging Face token)
+subeng video.mp4 --diarize --hf-token $HF_TOKEN
+# Generate a caption from the transcript using Ollama
+subeng video.mp4 --caption --ollama-model qwen3.5:0.8b
+```
+## Options
+| Option | Description |
+|--------|-------------|
+| `--output`, `-o` | Output SRT file path |
+| `--model`, `-m` | WhisperX model: `tiny`, `base`, `small` (default), `medium`, `large-v2`, `large-v3` |
+| `--language`, `-l` | ISO language code, e.g. `en`, `fi`. Auto-detected if omitted. |
+| `--device`, `-d` | `cpu`, `cuda` or `mps`. Auto-detected if omitted. |
+| `--batch-size`, `-b` | Inference batch size (default: 16) |
+| `--compute-type`, `-c` | `int8` or `float16`. Auto-selected if omitted. |
+| `--diarize` | Enable speaker diarization |
+| `--hf-token` | Hugging Face token for diarization (or set `HF_TOKEN` env var) |
+| `--caption` | Generate a caption from the transcript via Ollama |
+| `--ollama-model` | Ollama model name (required with `--caption`) |
+| `--ollama-host` | Ollama API host (default: `http://localhost:11434`) |
+## Development
+Run the test suite:
+```bash
+pytest
+```
+## License
+MIT

subtitle_engine-0.1.0/src/subtitle_engine.egg-info/SOURCES.txt ADDED

@@ -0,0 +1,18 @@
+LICENSE
+README.md
+pyproject.toml
+src/subtitle_engine/__init__.py
+src/subtitle_engine/captioner.py
+src/subtitle_engine/cli.py
+src/subtitle_engine/srt_writer.py
+src/subtitle_engine/transcriber.py
+src/subtitle_engine/utils.py
+src/subtitle_engine.egg-info/PKG-INFO
+src/subtitle_engine.egg-info/SOURCES.txt
+src/subtitle_engine.egg-info/dependency_links.txt
+src/subtitle_engine.egg-info/entry_points.txt
+src/subtitle_engine.egg-info/requires.txt
+src/subtitle_engine.egg-info/top_level.txt
+tests/test_captioner.py
+tests/test_cli.py
+tests/test_srt_writer.py

subtitle_engine-0.1.0/src/subtitle_engine.egg-info/dependency_links.txt ADDED

@@ -0,0 +1 @@
+

subtitle_engine-0.1.0/src/subtitle_engine.egg-info/entry_points.txt ADDED

@@ -0,0 +1,2 @@
+[console_scripts]
+subeng = subtitle_engine.cli:app

subtitle_engine-0.1.0/src/subtitle_engine.egg-info/requires.txt ADDED

@@ -0,0 +1,6 @@
+typer>=0.12.0
+whisperx>=3.8.0
+requests>=2.32.0
+[dev]
+pytest>=8.0.0

subtitle_engine-0.1.0/src/subtitle_engine.egg-info/top_level.txt ADDED

@@ -0,0 +1 @@
+subtitle_engine

subtitle_engine-0.1.0/tests/test_captioner.py ADDED

@@ -0,0 +1,67 @@
+"""Tests for the Ollama captioner module."""
+from unittest.mock import Mock, patch
+import pytest
+import requests
+from subtitle_engine.captioner import generate_caption, list_models
+@patch("subtitle_engine.captioner.requests.post")
+def test_generate_caption_success(mock_post):
+    mock_post.return_value = Mock(
+        status_code=200,
+        json=lambda: {"response": "  A short caption.  "},
+        raise_for_status=lambda: None,
+    )
+    caption = generate_caption(
+        "hello world",
+        model="qwen3.5:0.8b",
+        host="http://localhost:11434",
+    )
+    assert caption == "A short caption."
+    mock_post.assert_called_once()
+    _, kwargs = mock_post.call_args
+    assert kwargs["json"]["model"] == "qwen3.5:0.8b"
+    assert "hello world" in kwargs["json"]["prompt"]
+@patch("subtitle_engine.captioner.requests.post")
+def test_generate_caption_empty_response(mock_post):
+    mock_post.return_value = Mock(
+        status_code=200,
+        json=lambda: {"response": "   "},
+        raise_for_status=lambda: None,
+    )
+    with pytest.raises(ValueError, match="empty caption"):
+        generate_caption("hello", model="qwen3.5:0.8b")
+def test_generate_caption_empty_transcript():
+    with pytest.raises(ValueError, match="empty transcript"):
+        generate_caption("   ", model="qwen3.5:0.8b")
+@patch(
+    "subtitle_engine.captioner.requests.post",
+    side_effect=requests.ConnectionError("connection refused"),
+)
+def test_generate_caption_connection_error(mock_post):
+    with pytest.raises(ConnectionError):
+        generate_caption("hello", model="qwen3.5:0.8b")
+@patch("subtitle_engine.captioner.requests.get")
+def test_list_models(mock_get):
+    mock_get.return_value = Mock(
+        status_code=200,
+        json=lambda: {"models": [{"name": "qwen3.5:0.8b"}, {"name": "llama3.2"}]},
+        raise_for_status=lambda: None,
+    )
+    models = list_models()
+    assert models == ["qwen3.5:0.8b", "llama3.2"]

subtitle_engine-0.1.0/tests/test_cli.py ADDED

@@ -0,0 +1,51 @@
+"""Tests for CLI helpers and argument parsing."""
+from pathlib import Path
+import pytest
+from typer.testing import CliRunner
+from subtitle_engine.cli import app
+from subtitle_engine.utils import resolve_output_path, validate_media_file
+runner = CliRunner()
+def test_resolve_output_path_default():
+    input_path = Path("movie.mp4")
+    assert resolve_output_path(input_path) == Path("movie.srt")
+def test_resolve_output_path_explicit():
+    input_path = Path("movie.mp4")
+    output = Path("custom.srt")
+    assert resolve_output_path(input_path, output) == output
+def test_validate_media_file_supported():
+    validate_media_file(Path("video.mp4"))
+def test_validate_media_file_unsupported():
+    with pytest.raises(ValueError, match="Unsupported file type"):
+        validate_media_file(Path("file.txt"))
+def test_cli_help():
+    result = runner.invoke(app, ["--help"])
+    assert result.exit_code == 0
+    assert "Generate SRT subtitles" in result.output
+def test_cli_no_args():
+    result = runner.invoke(app)
+    assert result.exit_code != 0
+    assert "Usage:" in result.output
+def test_caption_requires_ollama_model(tmp_path: Path):
+    media = tmp_path / "video.mp4"
+    media.write_bytes(b"fake")
+    result = runner.invoke(app, [str(media), "--caption"])
+    assert result.exit_code != 0
+    assert "--ollama-model is required" in result.output

subtitle_engine-0.1.0/tests/test_srt_writer.py ADDED

@@ -0,0 +1,66 @@
+"""Tests for the SRT writer module."""
+from pathlib import Path
+from subtitle_engine.srt_writer import (
+    _format_segment,
+    _format_time,
+    segments_to_srt,
+    write_srt,
+)
+def test_format_time_zero():
+    assert _format_time(0.0) == "00:00:00,000"
+def test_format_time_with_hours():
+    assert _format_time(3661.123) == "01:01:01,123"
+def test_format_time_milliseconds_rounding():
+    assert _format_time(0.9996) == "00:00:01,000"
+def test_format_time_millis_ceiling_guard():
+    # 1.9999 rounds to 2.000 -> should not produce 1000 ms
+    assert _format_time(1.9999) == "00:00:02,000"
+def test_format_segment():
+    block = _format_segment(1, 1.5, 4.25, "Hello world")
+    assert block == "1\n00:00:01,500 --> 00:00:04,250\nHello world\n"
+def test_segments_to_srt():
+    segments = [
+        {"start": 0.0, "end": 2.0, "text": "First line"},
+        {"start": 3.5, "end": 5.5, "text": "Second line"},
+    ]
+    srt = segments_to_srt(segments)
+    assert "1\n00:00:00,000 --> 00:00:02,000\nFirst line" in srt
+    assert "2\n00:00:03,500 --> 00:00:05,500\nSecond line" in srt
+def test_segments_to_srt_empty_text_falls_back():
+    srt = segments_to_srt([{"start": 0.0, "end": 1.0, "text": "   "}])
+    assert "..." in srt
+def test_segments_to_srt_empty():
+    assert segments_to_srt([]) == ""
+def test_write_srt(tmp_path: Path):
+    segments = [{"start": 0.0, "end": 1.0, "text": "Hello"}]
+    output = tmp_path / "subs.srt"
+    write_srt(segments, output)
+    assert output.exists()
+    assert "00:00:00,000 --> 00:00:01,000" in output.read_text(encoding="utf-8")
+def test_write_srt_creates_parent_dirs(tmp_path: Path):
+    segments = [{"start": 0.0, "end": 1.0, "text": "Hello"}]
+    output = tmp_path / "nested" / "dir" / "subs.srt"
+    write_srt(segments, output)
+    assert output.exists()