PyPI - ai-sub - Versions diffs - 0.0.1__tar.gz - Mend

ai-sub 0.0.1__tar.gz

Files changed (17)

ai_sub-0.0.1/LICENSE +21 -0
ai_sub-0.0.1/PKG-INFO +103 -0
ai_sub-0.0.1/README.md +84 -0
ai_sub-0.0.1/pyproject.toml +29 -0
ai_sub-0.0.1/setup.cfg +4 -0
ai_sub-0.0.1/src/ai_sub/__init__.py +0 -0
ai_sub-0.0.1/src/ai_sub/config.py +223 -0
ai_sub-0.0.1/src/ai_sub/gemini.py +512 -0
ai_sub-0.0.1/src/ai_sub/main.py +183 -0
ai_sub-0.0.1/src/ai_sub/models.py +153 -0
ai_sub-0.0.1/src/ai_sub/video.py +95 -0
ai_sub-0.0.1/src/ai_sub.egg-info/PKG-INFO +103 -0
ai_sub-0.0.1/src/ai_sub.egg-info/SOURCES.txt +15 -0
ai_sub-0.0.1/src/ai_sub.egg-info/dependency_links.txt +1 -0
ai_sub-0.0.1/src/ai_sub.egg-info/entry_points.txt +2 -0
ai_sub-0.0.1/src/ai_sub.egg-info/requires.txt +7 -0
ai_sub-0.0.1/src/ai_sub.egg-info/top_level.txt +1 -0

ai_sub-0.0.1/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2025 FlippFuzz
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

ai_sub-0.0.1/PKG-INFO ADDED

@@ -0,0 +1,103 @@
+Metadata-Version: 2.4
+Name: ai-sub
+Version: 0.0.1
+Summary: Generate and translate English and Japanese subtitles using AI.
+Author: FlippFuzz
+Project-URL: Homepage, https://github.com/FlippFuzz/ai-sub
+Project-URL: Bug Tracker, https://github.com/FlippFuzz/ai-sub/issues
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: pysubs2
+Requires-Dist: google-genai
+Requires-Dist: static-ffmpeg
+Requires-Dist: pymediainfo
+Requires-Dist: json-repair
+Requires-Dist: pydantic
+Requires-Dist: retrying
+Dynamic: license-file
+# AI Sub: AI-Powered Subtitle Generation with Translation
+[![PyPI version](https://img.shields.io/pypi/v/ai-sub)](https://pypi.org/project/ai-sub)
+[![Downloads](https://img.shields.io/pypi/dw/ai-sub)](https://pypistats.org/packages/ai-sub)
+---
+## Project Overview
+AI Sub is a powerful tool that leverages AI (currently Google Gemini) to produce English and Japanese subtitles for videos, translating between languages as necessary.
+It is primarily tested and designed for Hololive concert/cover videos, but might work on other content.
+---
+## Showcase
+Here's an example of subtitles generated by AI Sub:
+[![Video Screenshot](https://github.com/FlippFuzz/ai-sub/raw/main/showcase/42h4ydJS3zk.png)](https://github.com/FlippFuzz/ai-sub/raw/main/showcase/42h4ydJS3zk.srt)
+For more examples, please visit the [showcase directory](https://github.com/FlippFuzz/ai-sub/blob/main/showcase/README.md).
+---
+## Pros and cons of using Gemini as the AI model
+### Pros:
+*   **Multimodal Context:** Gemini's advanced multimodal capabilities enable it to analyze video content comprehensively, including on-screen text, for superior contextual understanding and more accurate subtitle generation.
+*   **Cloud-Based Processing:** All processing is efficiently handled on Google Gemini's infrastructure, eliminating the need for local GPUs or extensive computational resources on your machine.
+### Cons:
+*   **Timestamp Precision:** Subtitle timestamps may exhibit a minor offset of a few seconds.
+*   **Network Usage:** Uploading entire video files to Google's services will consume network bandwidth.
+---
+## How AI Sub Works
+*   **Video Segmentation:** The input video is first split into 180-second segments. This duration is configurable via the `--split_seconds` flag.
+*   **Concurrent Processing:** Each video segment is then sent to the AI model (Google Gemini) for subtitle generation. You can adjust the number of concurrent processing threads using the `--num_processing_threads` flag to optimize performance.
+*   **Subtitle Compilation:** All generated subtitle parts are then combined into a single, final subtitle file.
+---
+## Getting Started: A Quick Guide
+### 1. Obtain Your Google Gemini API Key
+Follow these simple steps to acquire your API key:
+1.  Sign in to [Google AI Studio](https://aistudio.google.com/app/apikey).
+2.  Click "Create API Key."
+3.  Copy and securely store your API key. **Never disclose your API key publicly.**
+### 2. Set Up Your Python Environment (Python 3.10+ Required)
+Prepare your Python virtual environment:
+```bash
+python -m venv venv
+source venv/bin/activate  # On Windows, use `venv\Scripts\activate.bat`
+pip install --upgrade ai-sub
+```
+### 3. Execute the Script
+Run the application with your video file:
+```bash
+ai-sub --api_key=YOUR_API_KEY "path/to/your/video.mp4"
+```
+**Note**: Replace `YOUR_API_KEY` with your actual Google Gemini API key and `"path/to/your/video.mp4"` with the full path to your video file.
+---
+## Known Limitations
+1.  **Timestamp Accuracy:** Subtitle timestamps may exhibit inaccuracies. This is an inherent characteristic of the Gemini AI model.
+    *   Observations indicate that shorter video segments generally lead to improved timestamp accuracy.
+    *   Requesting second-level precision for timestamps generally yields more accurate results compared to millisecond-level precision from the model. Consequently, the current implementation is designed to request second-level timestamps.
+2.  **AI Hallucinations:** Like all AI models, Gemini may occasionally produce "hallucinations" or inaccurate information. This is a known characteristic of current AI technology.
+If you encounter issues related to these limitations, consider re-processing specific video segments as detailed in the "Re-processing Specific Video Segments" section below.
+---
+## Re-processing Specific Video Segments
+Intermediate files generated during processing are stored in the temporary directory, which defaults to `tmp_<input_file_name>` but can be specified using the `--temp_dir` CLI flag.
+Users can examine these `part_XXX.json` files within this directory to review the AI's results for individual segments.
+To re-process a specific video segment, simply delete its corresponding `part_XXX.json` file.
+Upon subsequent execution, the script will automatically re-process only those segments for which the `part_XXX.json` file is absent.
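
Note on the segmentation step described under "How AI Sub Works": the splitting code itself (`video.py`) is not shown in this part of the diff, so the following is only a minimal sketch of the arithmetic, assuming fixed-length cuts with a shorter final segment. The function name `segment_bounds` is hypothetical and not part of ai-sub; only the 180-second default is taken from the `--split_seconds` flag.

```python
import math


def segment_bounds(duration_seconds: float, split_seconds: int = 180) -> list[tuple[float, float]]:
    """Return (start, end) pairs that cover the video in fixed-length segments.

    Hypothetical helper for illustration; the real splitter may cut differently
    (for example, on keyframes).
    """
    if split_seconds <= 0:
        raise ValueError("split_seconds must be positive")
    count = math.ceil(duration_seconds / split_seconds)
    return [
        (i * split_seconds, min((i + 1) * split_seconds, duration_seconds))
        for i in range(count)
    ]


# A 400-second video yields two full segments and one 40-second remainder.
print(segment_bounds(400))  # [(0, 180), (180, 360), (360, 400)]
```

Smaller `split_seconds` values produce more segments, which, per the README's own observation, tends to improve timestamp accuracy at the cost of more API requests.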

ai_sub-0.0.1/README.md ADDED

@@ -0,0 +1,84 @@
+# AI Sub: AI-Powered Subtitle Generation with Translation
+[![PyPI version](https://img.shields.io/pypi/v/ai-sub)](https://pypi.org/project/ai-sub)
+[![Downloads](https://img.shields.io/pypi/dw/ai-sub)](https://pypistats.org/packages/ai-sub)
+---
+## Project Overview
+AI Sub is a powerful tool that leverages AI (currently Google Gemini) to produce English and Japanese subtitles for videos, translating between languages as necessary.
+It is primarily tested and designed for Hololive concert/cover videos, but might work on other content.
+---
+## Showcase
+Here's an example of subtitles generated by AI Sub:
+[![Video Screenshot](https://github.com/FlippFuzz/ai-sub/raw/main/showcase/42h4ydJS3zk.png)](https://github.com/FlippFuzz/ai-sub/raw/main/showcase/42h4ydJS3zk.srt)
+For more examples, please visit the [showcase directory](https://github.com/FlippFuzz/ai-sub/blob/main/showcase/README.md).
+---
+## Pros and cons of using Gemini as the AI model
+### Pros:
+*   **Multimodal Context:** Gemini's advanced multimodal capabilities enable it to analyze video content comprehensively, including on-screen text, for superior contextual understanding and more accurate subtitle generation.
+*   **Cloud-Based Processing:** All processing is efficiently handled on Google Gemini's infrastructure, eliminating the need for local GPUs or extensive computational resources on your machine.
+### Cons:
+*   **Timestamp Precision:** Subtitle timestamps may exhibit a minor offset of a few seconds.
+*   **Network Usage:** Uploading entire video files to Google's services will consume network bandwidth.
+---
+## How AI Sub Works
+*   **Video Segmentation:** The input video is first split into 180-second segments. This duration is configurable via the `--split_seconds` flag.
+*   **Concurrent Processing:** Each video segment is then sent to the AI model (Google Gemini) for subtitle generation. You can adjust the number of concurrent processing threads using the `--num_processing_threads` flag to optimize performance.
+*   **Subtitle Compilation:** All generated subtitle parts are then combined into a single, final subtitle file.
+---
+## Getting Started: A Quick Guide
+### 1. Obtain Your Google Gemini API Key
+Follow these simple steps to acquire your API key:
+1.  Sign in to [Google AI Studio](https://aistudio.google.com/app/apikey).
+2.  Click "Create API Key."
+3.  Copy and securely store your API key. **Never disclose your API key publicly.**
+### 2. Set Up Your Python Environment (Python 3.10+ Required)
+Prepare your Python virtual environment:
+```bash
+python -m venv venv
+source venv/bin/activate  # On Windows, use `venv\Scripts\activate.bat`
+pip install --upgrade ai-sub
+```
+### 3. Execute the Script
+Run the application with your video file:
+```bash
+ai-sub --api_key=YOUR_API_KEY "path/to/your/video.mp4"
+```
+**Note**: Replace `YOUR_API_KEY` with your actual Google Gemini API key and `"path/to/your/video.mp4"` with the full path to your video file.
+---
+## Known Limitations
+1.  **Timestamp Accuracy:** Subtitle timestamps may exhibit inaccuracies. This is an inherent characteristic of the Gemini AI model.
+    *   Observations indicate that shorter video segments generally lead to improved timestamp accuracy.
+    *   Requesting second-level precision for timestamps generally yields more accurate results compared to millisecond-level precision from the model. Consequently, the current implementation is designed to request second-level timestamps.
+2.  **AI Hallucinations:** Like all AI models, Gemini may occasionally produce "hallucinations" or inaccurate information. This is a known characteristic of current AI technology.
+If you encounter issues related to these limitations, consider re-processing specific video segments as detailed in the "Re-processing Specific Video Segments" section below.
+---
+## Re-processing Specific Video Segments
+Intermediate files generated during processing are stored in the temporary directory, which defaults to `tmp_<input_file_name>` but can be specified using the `--temp_dir` CLI flag.
+Users can examine these `part_XXX.json` files within this directory to review the AI's results for individual segments.
+To re-process a specific video segment, simply delete its corresponding `part_XXX.json` file.
+Upon subsequent execution, the script will automatically re-process only those segments for which the `part_XXX.json` file is absent.
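
Note on the re-processing behaviour described at the end of the README: a segment is redone when its `part_XXX.json` file is absent. The check could look roughly like the sketch below. This is an assumption-laden illustration, not the package's code: the helper name `missing_parts` is hypothetical, and the zero-padded three-digit index starting at 0 is a guess at what `XXX` stands for.

```python
import tempfile
from pathlib import Path


def missing_parts(temp_dir: Path, total_segments: int) -> list[int]:
    """Return the indices of segments that have no saved result file.

    Assumes files are named part_000.json, part_001.json, ... (hypothetical).
    """
    return [
        index
        for index in range(total_segments)
        if not (temp_dir / f"part_{index:03d}.json").is_file()
    ]


# Simulate a temp directory where the result for segment 1 was deleted.
with tempfile.TemporaryDirectory() as tmp:
    temp_dir = Path(tmp)
    (temp_dir / "part_000.json").write_text("{}")
    (temp_dir / "part_002.json").write_text("{}")
    print(missing_parts(temp_dir, 3))  # [1]
```

Keying the resume logic on file presence keeps the workflow simple for users: deleting a file is the only action needed to request a retry.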

ai_sub-0.0.1/pyproject.toml ADDED

@@ -0,0 +1,29 @@
+[project]
+name = "ai-sub"
+version = "0.0.1"
+authors = [
+  { name="FlippFuzz" },
+]
+description = "Generate and translate English and Japanese subtitles using AI."
+readme = "README.md"
+requires-python = ">=3.10"
+dependencies = [
+    "pysubs2",
+    "google-genai",
+    "static-ffmpeg",
+    "pymediainfo",
+    "json-repair",
+    "pydantic",
+    "retrying",
+]
+[project.urls]
+"Homepage" = "https://github.com/FlippFuzz/ai-sub"
+"Bug Tracker" = "https://github.com/FlippFuzz/ai-sub/issues"
+[project.scripts]
+ai-sub = "ai_sub.main:main"
+[tool.setuptools.packages.find]
+where = ["src"]
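
Note on the `[project.scripts]` entry above: the value `ai_sub.main:main` is an object reference of the form `module:attribute`, which the installer turns into the `ai-sub` command. A simplified sketch of how such a reference resolves is shown below. The helper `load_object_reference` is hypothetical, it ignores dotted attributes and extras, and it uses a standard-library target so that it runs without ai-sub installed.

```python
from importlib import import_module


def load_object_reference(spec: str):
    """Resolve a 'module:attribute' string to the object it names.

    Simplified for illustration; real entry points also allow dotted
    attribute paths after the colon.
    """
    module_name, _, attribute = spec.partition(":")
    return getattr(import_module(module_name), attribute)


# Same shape as "ai_sub.main:main", but pointing at the standard library.
dumps = load_object_reference("json:dumps")
print(dumps({"ok": True}))  # {"ok": true}
```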

ai_sub-0.0.1/setup.cfg ADDED

@@ -0,0 +1,4 @@
+[egg_info]
+tag_build =
+tag_date = 0

ai_sub-0.0.1/src/ai_sub/__init__.py ADDED

File without changes

ai_sub-0.0.1/src/ai_sub/config.py ADDED

@@ -0,0 +1,223 @@
+import logging
+import os
+from argparse import ArgumentParser, ArgumentTypeError, Namespace
+from pathlib import Path
+from typing import Tuple
+from google.genai.types import File
+def check_file_exists(filepath_str: str) -> Path:
+    """Checks if a given file path string corresponds to an existing file.
+    Args:
+        filepath_str (str): The string representation of the file path.
+    Returns:
+        Path: A resolved Path object if the file exists.
+    Raises:
+        ArgumentTypeError: If the file does not exist or is not a file.
+    """
+    # Resolve the path to get an absolute, normalized path, resolving symlinks
+    file_path = Path(filepath_str).resolve()
+    # Check if the path points to an actual file
+    if not file_path.is_file():
+        raise ArgumentTypeError(
+            f"Input file '{filepath_str}' does not exist or is not a file."
+        )
+    return file_path
+def parse_arguments() -> Namespace:
+    """Parses command-line arguments for the ai-sub application.
+    This function sets up an ArgumentParser with various options for API
+    configuration, file and directory handling, processing parameters, and
+    logging. It also performs validation for the API key and sets default
+    values for temporary and output directories if not provided.
+    Returns:
+        Namespace: An argparse Namespace object containing the parsed arguments.
+    Raises:
+        ArgumentTypeError: If the input file does not exist.
+        SystemExit: If no Gemini API key is provided.
+    """
+    parser = ArgumentParser(
+        description="AI-Powered Subtitle Generation with Translation.",
+        prog="ai-sub",
+    )
+    parser.add_argument(
+        "input_file", type=check_file_exists, help="Path to the input video file."
+    )
+    api_group = parser.add_argument_group("API Options")
+    api_group.add_argument(
+        "--api_key",
+        type=str,
+        default=os.environ.get("GOOGLE_API_KEY"),
+        help="Your Gemini API key (or set GOOGLE_API_KEY environment variable).",
+    )
+    api_group.add_argument(
+        "--rpm",
+        type=int,
+        default=5,
+        help="Requests per minute for Gemini API (default: 5).",
+    )
+    api_group.add_argument(
+        "--tpm",
+        type=int,
+        default=250000,
+        help="Tokens per minute for Gemini API (default: 250000).",
+    )
+    api_group.add_argument(
+        "--model",
+        type=str,
+        default="gemini-2.5-flash",
+        help="Gemini model to use (default: gemini-2.5-flash).",
+    )
+    api_group.add_argument(
+        "--thinking_budget",
+        type=int,
+        default=24576,
+        help="Thinking budget for Gemini API (default: 24576).",
+    )
+    file_group = parser.add_argument_group("File and Directory Options")
+    file_group.add_argument(
+        "--output_dir",
+        type=Path,
+        help="Directory to save output files (default: input_file's parent directory).",
+    )
+    file_group.add_argument(
+        "--temp_dir",
+        type=Path,
+        help="Directory to store temporary files (default: tmp_<input_file_name>).",
+    )
+    processing_group = parser.add_argument_group("Processing Options")
+    processing_group.add_argument(
+        "--max_subtitle_chars",
+        type=int,
+        default=60,
+        help="Maximum character length for each subtitle entry (default: 60).",
+    )
+    processing_group.add_argument(
+        "--num_processing_threads",
+        type=int,
+        default=4,
+        help="Number of threads to use for parallel subtitle processing (default: 4).",
+    )
+    processing_group.add_argument(
+        "--num_upload_threads",
+        type=int,
+        default=4,
+        help="Number of threads to use for parallel file uploads (default: 4).",
+    )
+    processing_group.add_argument(
+        "--split_seconds",
+        type=int,
+        default=180,
+        help="Duration in seconds to split the video into segments (default: 180s).",
+    )
+    logging_group = parser.add_argument_group("Logging Options")
+    logging_group.add_argument(
+        "--log_level",
+        type=str,
+        default="INFO",
+        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
+        help="Set the logging level (default: INFO).",
+    )
+    args = parser.parse_args()
+    if args.api_key is None:
+        parser.error(
+            "No Gemini API key provided. Use --api_key or set the GOOGLE_API_KEY "
+            "environment variable."
+        )
+    # Set default temp_dir if not provided
+    if args.temp_dir is None:
+        args.temp_dir = args.input_file.parent / f"tmp_{args.input_file.stem}"
+    args.temp_dir.mkdir(parents=True, exist_ok=True)
+    # Set default output_dir if not provided
+    if args.output_dir is None:
+        args.output_dir = args.input_file.parent
+    return args
+def configure_logging(log_level: str):
+    """Configures the logging for the application.
+    This function sets up a stream handler for logging, defines the log format,
+    and sets the overall logging level. It also suppresses noisy INFO level
+    logs from specific external libraries like 'httpx' and 'google_genai.models'.
+    Args:
+        log_level (str): The desired logging level (e.g., "INFO", "DEBUG").
+    """
+    # Remove all existing handlers from the root logger to ensure a clean slate
+    for handler in logging.root.handlers[:]:
+        logging.root.removeHandler(handler)
+        handler.close()
+    # Create a formatter with the desired format (no date)
+    formatter = logging.Formatter("%(threadName)s %(levelname)s %(message)s")
+    # Create a stream handler and set the formatter
+    stream_handler = logging.StreamHandler()
+    stream_handler.setFormatter(formatter)
+    # Get the root logger and add the new handler
+    root_logger = logging.getLogger()
+    root_logger.addHandler(stream_handler)
+    root_logger.setLevel(log_level)
+    # Suppress INFO level logs from 'httpx' to reduce noise from HTTP request/response logging.
+    # Example noisy log: "INFO HTTP Request: GET https://generativelanguage.googleapis.com/v1beta/files?pageSize=100 'HTTP/1.1 200 OK'"
+    logging.getLogger("httpx").setLevel(logging.WARNING)
+    # Suppress INFO level logs from the 'google_genai.models' logger to reduce noise from internal model operations.
+    # Example noisy log: "INFO AFC is enabled with max remote calls: 10."
+    # https://github.com/googleapis/python-genai/issues/278
+    logging.getLogger("google_genai.models").setLevel(logging.WARNING)
+def generate_output_paths(
+    video_file: Path | File, args: Namespace
+) -> Tuple[Path, Path]:
+    """Generates the output paths for subtitle and state files.
+    Based on the input video file (either a local Path or a Gemini File object)
+    and the provided arguments, this function constructs the full paths for
+    where the generated subtitle file (.srt) and the processing state file (.json)
+    should be saved.
+    Args:
+        video_file (Path | File): The input video file, which can be a pathlib.Path
+                                  object for local files or a google.genai.types.File
+                                  object for uploaded files.
+        args (Namespace): An argparse Namespace object containing command-line arguments,
+                          specifically `temp_dir` for the temporary directory.
+    Returns:
+        Tuple[Path, Path]: A tuple containing two Path objects:
+                           - The full path for the output subtitle file (.srt).
+                           - The full path for the output state file (.json).
+    """
+    stem = ""
+    if isinstance(video_file, Path):
+        stem = video_file.stem
+    elif isinstance(video_file, File):
+        stem = Path(str(video_file.display_name)).stem
+    output_subtitle_path = args.temp_dir / f"{stem}.srt"
+    output_state_path = args.temp_dir / f"{stem}.json"
+    return output_subtitle_path, output_state_path
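
Note on the path defaults in `config.py`: the temporary directory is derived from the input file's stem (the name without its extension), and the per-segment `.srt` and `.json` outputs are placed inside it. The sketch below restates that derivation in isolation so it can be tried without argparse or the Gemini `File` type. The function names `default_temp_dir` and `segment_output_paths` are hypothetical stand-ins, not functions from the package.

```python
from pathlib import Path


def default_temp_dir(input_file: Path) -> Path:
    """Mirror the default used when --temp_dir is not given."""
    return input_file.parent / f"tmp_{input_file.stem}"


def segment_output_paths(temp_dir: Path, segment_name: str) -> tuple[Path, Path]:
    """Build the .srt and .json paths for one segment from its file name."""
    stem = Path(segment_name).stem
    return temp_dir / f"{stem}.srt", temp_dir / f"{stem}.json"


temp_dir = default_temp_dir(Path("videos/concert.mp4"))
subtitle_path, state_path = segment_output_paths(temp_dir, "part_000.mp4")
print(temp_dir.name, subtitle_path.name, state_path.name)
# tmp_concert part_000.srt part_000.json
```

One consequence worth noting: because only the stem is used, two inputs in the same folder that differ only by extension (for example `concert.mp4` and `concert.mkv`) would share a temporary directory unless `--temp_dir` is set explicitly.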