ai-sub 0.0.1__tar.gz

ai_sub-0.0.1/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 FlippFuzz
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
ai_sub-0.0.1/PKG-INFO ADDED
@@ -0,0 +1,103 @@
1
+ Metadata-Version: 2.4
2
+ Name: ai-sub
3
+ Version: 0.0.1
4
+ Summary: Generate and translate English and Japanese subtitles using AI.
5
+ Author: FlippFuzz
6
+ Project-URL: Homepage, https://github.com/FlippFuzz/ai-sub
7
+ Project-URL: Bug Tracker, https://github.com/FlippFuzz/ai-sub/issues
8
+ Requires-Python: >=3.10
9
+ Description-Content-Type: text/markdown
10
+ License-File: LICENSE
11
+ Requires-Dist: pysubs2
12
+ Requires-Dist: google-genai
13
+ Requires-Dist: static-ffmpeg
14
+ Requires-Dist: pymediainfo
15
+ Requires-Dist: json-repair
16
+ Requires-Dist: pydantic
17
+ Requires-Dist: retrying
18
+ Dynamic: license-file
19
+
20
+ # AI Sub: AI-Powered Subtitle Generation with Translation
21
+
22
+ [![PyPI version](https://img.shields.io/pypi/v/ai-sub)](https://pypi.org/project/ai-sub)
23
+ [![Downloads](https://img.shields.io/pypi/dw/ai-sub)](https://pypistats.org/packages/ai-sub)
24
+
25
+ ---
26
+ ## Project Overview
27
+ AI Sub is a powerful tool that leverages AI (currently Google Gemini) to produce English and Japanese subtitles for videos, translating between languages as necessary.
28
+ It is primarily tested and designed for Hololive concert/cover videos, but might work on other content.
29
+
30
+ ---
31
+
32
+ ## Showcase
33
+
34
+ Here's an example of subtitles generated by AI Sub:
35
+
36
+ [![Video Screenshot](https://github.com/FlippFuzz/ai-sub/raw/main/showcase/42h4ydJS3zk.png)](https://github.com/FlippFuzz/ai-sub/raw/main/showcase/42h4ydJS3zk.srt)
37
+
38
+ For more examples, please visit the [showcase directory](https://github.com/FlippFuzz/ai-sub/blob/main/showcase/README.md).
39
+
40
+ ---
41
+
42
+ ## Pros and cons of using Gemini as the AI model
43
+
44
+ ### Pros:
45
+ * **Multimodal Context:** Gemini's advanced multimodal capabilities enable it to analyze video content comprehensively, including on-screen text, for superior contextual understanding and more accurate subtitle generation.
46
+ * **Cloud-Based Processing:** All processing is efficiently handled on Google Gemini's infrastructure, eliminating the need for local GPUs or extensive computational resources on your machine.
47
+
48
+ ### Cons:
49
+ * **Timestamp Precision:** Subtitle timestamps may exhibit a minor offset of a few seconds.
50
+ * **Network Usage:** Uploading entire video files to Google's services will consume network bandwidth.
51
+
52
+ ---
53
+
54
+ ## How AI Sub Works
55
+ * **Video Segmentation:** The input video is first segmented into 180-second segments. This duration is configurable via the `--split_seconds` flag.
56
+ * **Concurrent Processing:** Each video segment is then sent to the AI model (Google Gemini) for subtitle generation. You can adjust the number of concurrent processing threads using the `--num_processing_threads` flag to optimize performance.
57
+ * **Subtitle Compilation:** All generated subtitle parts are then combined into a single, final subtitle file.
58
+
59
+ ---
60
+
61
+ ## Getting Started: A Quick Guide
62
+
63
+ ### 1. Obtain Your Google Gemini API Key
64
+ Follow these simple steps to acquire your API key:
65
+ 1. Sign in to [Google AI Studio](https://aistudio.google.com/app/apikey).
66
+ 2. Click "Create API Key."
67
+ 3. Copy and securely store your API key. **Never disclose your API key publicly.**
68
+
69
+ ### 2. Set Up Your Python Environment (Python 3.10+ Required)
70
+ Prepare your Python virtual environment:
71
+ ```bash
72
+ python -m venv venv
73
+ source venv/bin/activate # On Windows, use `venv\Scripts\activate.bat`
74
+ pip install --upgrade ai-sub
75
+ ```
76
+
77
+ ### 3. Execute the Script
78
+ Run the application with your video file:
79
+ ```bash
80
+ ai-sub --api_key=YOUR_API_KEY "path/to/your/video.mp4"
81
+ ```
82
+ **Note**: Replace `YOUR_API_KEY` with your actual Google Gemini API key and `"path/to/your/video.mp4"` with the full path to your video file.
83
+
84
+ ---
85
+
86
+ ## Known Limitations
87
+
88
+ 1. **Timestamp Accuracy:** Subtitle timestamps may exhibit inaccuracies. This is an inherent characteristic of the Gemini AI model.
89
+ * Observations indicate that shorter video segments generally lead to improved timestamp accuracy.
90
+ * Requesting second-level precision for timestamps generally yields more accurate results compared to millisecond-level precision from the model. Consequently, the current implementation is designed to request second-level timestamps.
91
+
92
+ 2. **AI Hallucinations:** Like all AI models, Gemini may occasionally produce "hallucinations" or inaccurate information. This is a known characteristic of current AI technology.
93
+
94
+ If you encounter issues related to these limitations, consider re-processing specific video segments as detailed in the "Re-processing Specific Video Segments" section below.
95
+
96
+ ---
97
+
98
+ ## Re-processing Specific Video Segments
99
+ Intermediate files generated during processing are stored in a temporary directory. By default this is `tmp_<input_file_name>` (the input file's name without its extension), created next to the input file; use the `--temp_dir` CLI flag to choose a different location.
100
+ Users can examine these `part_XXX.json` files within this directory to review the AI's results for individual segments.
101
+ To re-process a specific video segment, simply delete its corresponding `part_XXX.json` file.
102
+ Upon subsequent execution, the script will automatically re-process only those segments for which the `part_XXX.json` file is absent.
103
+
ai_sub-0.0.1/README.md ADDED
@@ -0,0 +1,84 @@
1
+ # AI Sub: AI-Powered Subtitle Generation with Translation
2
+
3
+ [![PyPI version](https://img.shields.io/pypi/v/ai-sub)](https://pypi.org/project/ai-sub)
4
+ [![Downloads](https://img.shields.io/pypi/dw/ai-sub)](https://pypistats.org/packages/ai-sub)
5
+
6
+ ---
7
+ ## Project Overview
8
+ AI Sub is a powerful tool that leverages AI (currently Google Gemini) to produce English and Japanese subtitles for videos, translating between languages as necessary.
9
+ It is primarily tested and designed for Hololive concert/cover videos, but might work on other content.
10
+
11
+ ---
12
+
13
+ ## Showcase
14
+
15
+ Here's an example of subtitles generated by AI Sub:
16
+
17
+ [![Video Screenshot](https://github.com/FlippFuzz/ai-sub/raw/main/showcase/42h4ydJS3zk.png)](https://github.com/FlippFuzz/ai-sub/raw/main/showcase/42h4ydJS3zk.srt)
18
+
19
+ For more examples, please visit the [showcase directory](https://github.com/FlippFuzz/ai-sub/blob/main/showcase/README.md).
20
+
21
+ ---
22
+
23
+ ## Pros and cons of using Gemini as the AI model
24
+
25
+ ### Pros:
26
+ * **Multimodal Context:** Gemini's advanced multimodal capabilities enable it to analyze video content comprehensively, including on-screen text, for superior contextual understanding and more accurate subtitle generation.
27
+ * **Cloud-Based Processing:** All processing is efficiently handled on Google Gemini's infrastructure, eliminating the need for local GPUs or extensive computational resources on your machine.
28
+
29
+ ### Cons:
30
+ * **Timestamp Precision:** Subtitle timestamps may exhibit a minor offset of a few seconds.
31
+ * **Network Usage:** Uploading entire video files to Google's services will consume network bandwidth.
32
+
33
+ ---
34
+
35
+ ## How AI Sub Works
36
+ * **Video Segmentation:** The input video is first segmented into 180-second segments. This duration is configurable via the `--split_seconds` flag.
37
+ * **Concurrent Processing:** Each video segment is then sent to the AI model (Google Gemini) for subtitle generation. You can adjust the number of concurrent processing threads using the `--num_processing_threads` flag to optimize performance.
38
+ * **Subtitle Compilation:** All generated subtitle parts are then combined into a single, final subtitle file.
39
+
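The three steps above can be sketched in a few lines. This is a minimal illustration, not the package's actual implementation: `segment_ranges`, `fake_subtitle_model`, and `generate_subtitles` are hypothetical names, the model call is replaced by a stub, and the sketch assumes each part's timestamps are relative to the start of its segment.

```python
from concurrent.futures import ThreadPoolExecutor


def segment_ranges(duration_s: int, split_seconds: int = 180) -> list[tuple[int, int]]:
    """Return (start, end) second ranges that cover the whole video."""
    return [
        (start, min(start + split_seconds, duration_s))
        for start in range(0, duration_s, split_seconds)
    ]


def fake_subtitle_model(segment: tuple[int, int]) -> list[tuple[int, int, str]]:
    """Hypothetical stand-in for the AI call; times are relative to the segment."""
    start, end = segment
    return [(0, min(2, end - start), f"line for segment starting at {start}s")]


def generate_subtitles(
    duration_s: int, split_seconds: int = 180, num_threads: int = 4
) -> list[tuple[int, int, str]]:
    segments = segment_ranges(duration_s, split_seconds)
    # Executor.map preserves input order, so parts come back in video order.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        parts = list(pool.map(fake_subtitle_model, segments))
    # Shift each part's relative timestamps by its segment start before merging.
    combined = []
    for (seg_start, _), part in zip(segments, parts):
        combined.extend((seg_start + s, seg_start + e, text) for s, e, text in part)
    return combined
```

A 400-second video with the default 180-second split yields three segments, the last one shorter than the others.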
40
+ ---
41
+
42
+ ## Getting Started: A Quick Guide
43
+
44
+ ### 1. Obtain Your Google Gemini API Key
45
+ Follow these simple steps to acquire your API key:
46
+ 1. Sign in to [Google AI Studio](https://aistudio.google.com/app/apikey).
47
+ 2. Click "Create API Key."
48
+ 3. Copy and securely store your API key. **Never disclose your API key publicly.**
49
+
50
+ ### 2. Set Up Your Python Environment (Python 3.10+ Required)
51
+ Prepare your Python virtual environment:
52
+ ```bash
53
+ python -m venv venv
54
+ source venv/bin/activate # On Windows, use `venv\Scripts\activate.bat`
55
+ pip install --upgrade ai-sub
56
+ ```
57
+
58
+ ### 3. Execute the Script
59
+ Run the application with your video file:
60
+ ```bash
61
+ ai-sub --api_key=YOUR_API_KEY "path/to/your/video.mp4"
62
+ ```
63
+ **Note**: Replace `YOUR_API_KEY` with your actual Google Gemini API key and `"path/to/your/video.mp4"` with the full path to your video file.
64
+
65
+ ---
66
+
67
+ ## Known Limitations
68
+
69
+ 1. **Timestamp Accuracy:** Subtitle timestamps may exhibit inaccuracies. This is an inherent characteristic of the Gemini AI model.
70
+ * Observations indicate that shorter video segments generally lead to improved timestamp accuracy.
71
+ * Requesting second-level precision for timestamps generally yields more accurate results compared to millisecond-level precision from the model. Consequently, the current implementation is designed to request second-level timestamps.
72
+
73
+ 2. **AI Hallucinations:** Like all AI models, Gemini may occasionally produce "hallucinations" or inaccurate information. This is a known characteristic of current AI technology.
74
+
75
+ If you encounter issues related to these limitations, consider re-processing specific video segments as detailed in the "Re-processing Specific Video Segments" section below.
76
+
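To illustrate what second-level timestamps look like once written to an `.srt` file, here is a small sketch. It is not taken from the package: `to_srt_timestamp` and `srt_entry` are hypothetical helpers, and the sketch assumes the model returns whole seconds relative to the start of each segment.

```python
def to_srt_timestamp(total_seconds: int) -> str:
    """Format whole seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    # Second-level precision means the millisecond field is always 000.
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},000"


def srt_entry(index: int, segment_start: int, start: int, end: int, text: str) -> str:
    """Build one SRT cue, shifting segment-relative times to video time."""
    begin = to_srt_timestamp(segment_start + start)
    finish = to_srt_timestamp(segment_start + end)
    return f"{index}\n{begin} --> {finish}\n{text}\n"
```

With whole-second values, the best achievable alignment is within one second of the spoken line, which is consistent with the offsets described above.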
77
+ ---
78
+
79
+ ## Re-processing Specific Video Segments
80
+ Intermediate files generated during processing are stored in a temporary directory. By default this is `tmp_<input_file_name>` (the input file's name without its extension), created next to the input file; use the `--temp_dir` CLI flag to choose a different location.
81
+ Users can examine these `part_XXX.json` files within this directory to review the AI's results for individual segments.
82
+ To re-process a specific video segment, simply delete its corresponding `part_XXX.json` file.
83
+ Upon subsequent execution, the script will automatically re-process only those segments for which the `part_XXX.json` file is absent.
84
+
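The resume behaviour described above amounts to checking which part files are missing. The sketch below shows one way to do that; it is an assumption-laden illustration rather than the package's code. The `part_XXX.json` pattern comes from the README, but the zero-based, three-digit numbering and the names `segments_to_process` and `demo` are hypothetical.

```python
import tempfile
from pathlib import Path


def segments_to_process(temp_dir: Path, num_segments: int) -> list[int]:
    """Return indices of segments whose part_XXX.json file is absent."""
    return [
        index
        for index in range(num_segments)
        if not (temp_dir / f"part_{index:03d}.json").is_file()
    ]


def demo() -> list[int]:
    """Simulate a run where the result for segment 1 was deleted by the user."""
    with tempfile.TemporaryDirectory() as tmp:
        temp_dir = Path(tmp)
        for index in (0, 2):
            (temp_dir / f"part_{index:03d}.json").write_text("{}")
        return segments_to_process(temp_dir, num_segments=3)
```

Only the segment without a saved result is returned, so completed segments are not sent to the model again.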
@@ -0,0 +1,29 @@
1
+ [project]
2
+ name = "ai-sub"
3
+ version = "0.0.1"
4
+ authors = [
5
+ { name="FlippFuzz" },
6
+ ]
7
+ description = "Generate and translate English and Japanese subtitles using AI."
8
+ readme = "README.md"
9
+ requires-python = ">=3.10"
10
+ dependencies = [
11
+ "pysubs2",
12
+ "google-genai",
13
+ "static-ffmpeg",
14
+ "pymediainfo",
15
+ "json-repair",
16
+ "pydantic",
17
+ "retrying",
18
+ ]
19
+
20
+ [project.urls]
21
+ "Homepage" = "https://github.com/FlippFuzz/ai-sub"
22
+ "Bug Tracker" = "https://github.com/FlippFuzz/ai-sub/issues"
23
+
24
+ [project.scripts]
25
+ ai-sub = "ai_sub.main:main"
26
+
27
+ [tool.setuptools.packages.find]
28
+ where = ["src"]
29
+
ai_sub-0.0.1/setup.cfg ADDED
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
File without changes
@@ -0,0 +1,223 @@
1
+ import logging
2
+ import os
3
+ from argparse import ArgumentParser, ArgumentTypeError, Namespace
4
+ from pathlib import Path
5
+ from typing import Tuple
6
+
7
+ from google.genai.types import File
8
+
9
+
10
+ def check_file_exists(filepath_str: str) -> Path:
11
+ """Checks if a given file path string corresponds to an existing file.
12
+
13
+ Args:
14
+ filepath_str (str): The string representation of the file path.
15
+
16
+ Returns:
17
+ Path: A resolved Path object if the file exists.
18
+
19
+ Raises:
20
+ ArgumentTypeError: If the file does not exist or is not a file.
21
+ """
22
+ # Resolve the path to get an absolute, normalized path, resolving symlinks
23
+ file_path = Path(filepath_str).resolve()
24
+
25
+ # Check if the path points to an actual file
26
+ if not file_path.is_file():
27
+ raise ArgumentTypeError(
28
+ f"Input file '{filepath_str}' does not exist or is not a file."
29
+ )
30
+
31
+ return file_path
32
+
33
+
34
+ def parse_arguments() -> Namespace:
35
+ """Parses command-line arguments for the ai-sub application.
36
+
37
+ This function sets up an ArgumentParser with various options for API
38
+ configuration, file and directory handling, processing parameters, and
39
+ logging. It also performs validation for the API key and sets default
40
+ values for temporary and output directories if not provided.
41
+
42
+ Returns:
43
+ Namespace: An argparse Namespace object containing the parsed arguments.
44
+
45
+ Raises:
46
+ SystemExit: If the input file does not exist (argparse turns the ArgumentTypeError into an error exit).
47
+ SystemExit: If no Gemini API key is provided.
48
+ """
49
+ parser = ArgumentParser(
50
+ description="AI-Powered Subtitle Generation with Translation.",
51
+ prog="ai-sub",
52
+ )
53
+ parser.add_argument(
54
+ "input_file", type=check_file_exists, help="Path to the input video file."
55
+ )
56
+
57
+ api_group = parser.add_argument_group("API Options")
58
+ api_group.add_argument(
59
+ "--api_key",
60
+ type=str,
61
+ default=os.environ.get("GOOGLE_API_KEY"),
62
+ help="Your Gemini API key (or set GOOGLE_API_KEY environment variable).",
63
+ )
64
+ api_group.add_argument(
65
+ "--rpm",
66
+ type=int,
67
+ default=5,
68
+ help="Requests per minute for Gemini API (default: 5).",
69
+ )
70
+ api_group.add_argument(
71
+ "--tpm",
72
+ type=int,
73
+ default=250000,
74
+ help="Tokens per minute for Gemini API (default: 250000).",
75
+ )
76
+ api_group.add_argument(
77
+ "--model",
78
+ type=str,
79
+ default="gemini-2.5-flash",
80
+ help="Gemini model to use (default: gemini-2.5-flash).",
81
+ )
82
+ api_group.add_argument(
83
+ "--thinking_budget",
84
+ type=int,
85
+ default=24576,
86
+ help="Thinking budget for Gemini API (default: 24576).",
87
+ )
88
+
89
+ file_group = parser.add_argument_group("File and Directory Options")
90
+ file_group.add_argument(
91
+ "--output_dir",
92
+ type=Path,
93
+ help="Directory to save output files (default: input_file's parent directory).",
94
+ )
95
+ file_group.add_argument(
96
+ "--temp_dir",
97
+ type=Path,
98
+ help="Directory to store temporary files (default: tmp_<input_file_name>).",
99
+ )
100
+
101
+ processing_group = parser.add_argument_group("Processing Options")
102
+ processing_group.add_argument(
103
+ "--max_subtitle_chars",
104
+ type=int,
105
+ default=60,
106
+ help="Maximum character length for each subtitle entry (default: 60).",
107
+ )
108
+ processing_group.add_argument(
109
+ "--num_processing_threads",
110
+ type=int,
111
+ default=4,
112
+ help="Number of threads to use for parallel subtitle processing (default: 4).",
113
+ )
114
+ processing_group.add_argument(
115
+ "--num_upload_threads",
116
+ type=int,
117
+ default=4,
118
+ help="Number of threads to use for parallel file uploads (default: 4).",
119
+ )
120
+ processing_group.add_argument(
121
+ "--split_seconds",
122
+ type=int,
123
+ default=180,
124
+ help="Duration in seconds to split the video into segments (default: 180s).",
125
+ )
126
+
127
+ logging_group = parser.add_argument_group("Logging Options")
128
+ logging_group.add_argument(
129
+ "--log_level",
130
+ type=str,
131
+ default="INFO",
132
+ choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
133
+ help="Set the logging level (default: INFO).",
134
+ )
135
+
136
+ args = parser.parse_args()
137
+
138
+ if args.api_key is None:
139
+ parser.error(
140
+ "No Gemini API key provided. Use --api_key or set the GOOGLE_API_KEY "
141
+ "environment variable."
142
+ )
143
+
144
+ # Set default temp_dir if not provided
145
+ if args.temp_dir is None:
146
+ args.temp_dir = args.input_file.parent / f"tmp_{args.input_file.stem}"
147
+ args.temp_dir.mkdir(parents=True, exist_ok=True)
148
+
149
+ # Set default output_dir if not provided
150
+ if args.output_dir is None:
151
+ args.output_dir = args.input_file.parent
152
+
153
+ return args
154
+
155
+
156
+ def configure_logging(log_level: str):
157
+ """Configures the logging for the application.
158
+
159
+ This function sets up a stream handler for logging, defines the log format,
160
+ and sets the overall logging level. It also suppresses noisy INFO level
161
+ logs from specific external libraries like 'httpx' and 'google_genai.models'.
162
+
163
+ Args:
164
+ log_level (str): The desired logging level (e.g., "INFO", "DEBUG").
165
+ """
166
+ # Remove all existing handlers from the root logger to ensure a clean slate
167
+ for handler in logging.root.handlers[:]:
168
+ logging.root.removeHandler(handler)
169
+ handler.close()
170
+
171
+ # Create a formatter with the desired format (no date)
172
+ formatter = logging.Formatter("%(threadName)s %(levelname)s %(message)s")
173
+
174
+ # Create a stream handler and set the formatter
175
+ stream_handler = logging.StreamHandler()
176
+ stream_handler.setFormatter(formatter)
177
+
178
+ # Get the root logger and add the new handler
179
+ root_logger = logging.getLogger()
180
+ root_logger.addHandler(stream_handler)
181
+ root_logger.setLevel(log_level)
182
+
183
+ # Suppress INFO level logs from 'httpx' to reduce noise from HTTP request/response logging.
184
+ # Example noisy log: "INFO HTTP Request: GET https://generativelanguage.googleapis.com/v1beta/files?pageSize=100 'HTTP/1.1 200 OK'"
185
+ logging.getLogger("httpx").setLevel(logging.WARNING)
186
+ # Suppress INFO level logs from the 'google_genai.models' logger to reduce noise from internal model operations.
187
+ # Example noisy log: "INFO AFC is enabled with max remote calls: 10."
188
+ # https://github.com/googleapis/python-genai/issues/278
189
+ logging.getLogger("google_genai.models").setLevel(logging.WARNING)
190
+
191
+
192
+ def generate_output_paths(
193
+ video_file: Path | File, args: Namespace
194
+ ) -> Tuple[Path, Path]:
195
+ """Generates the output paths for subtitle and state files.
196
+
197
+ Based on the input video file (either a local Path or a Gemini File object)
198
+ and the provided arguments, this function constructs the full paths for
199
+ where the generated subtitle file (.srt) and the processing state file (.json)
200
+ should be saved.
201
+
202
+ Args:
203
+ video_file (Path | File): The input video file, which can be a pathlib.Path
204
+ object for local files or a google.genai.types.File
205
+ object for uploaded files.
206
+ args (Namespace): An argparse Namespace object containing command-line arguments,
207
+ specifically `temp_dir` for the temporary directory.
208
+
209
+ Returns:
210
+ Tuple[Path, Path]: A tuple containing two Path objects:
211
+ - The full path for the output subtitle file (.srt).
212
+ - The full path for the output state file (.json).
213
+ """
214
+ stem = ""
215
+ if isinstance(video_file, Path):
216
+ stem = video_file.stem
217
+ elif isinstance(video_file, File):
218
+ stem = Path(str(video_file.display_name)).stem
219
+
220
+ output_subtitle_path = args.temp_dir / f"{stem}.srt"
221
+ output_state_path = args.temp_dir / f"{stem}.json"
222
+
223
+ return output_subtitle_path, output_state_path
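The argument handling above relies on two argparse behaviours that are easy to miss: a `type=` callable that raises `ArgumentTypeError` causes the parser to print a usage error and exit with status 2, and defaults that depend on another argument have to be filled in after parsing. The following sketch reproduces both in isolation. It mirrors the diff's logic but is a simplified stand-in: `existing_file`, `build_parser`, `exit_code_for`, and `default_temp_dir` are illustrative names, not part of the package.

```python
from argparse import ArgumentParser, ArgumentTypeError
from pathlib import Path


def existing_file(value: str) -> Path:
    """argparse type= callable: accept only paths that point at a file."""
    path = Path(value).resolve()
    if not path.is_file():
        raise ArgumentTypeError(f"Input file '{value}' does not exist or is not a file.")
    return path


def build_parser() -> ArgumentParser:
    parser = ArgumentParser(prog="demo")
    parser.add_argument("input_file", type=existing_file)
    parser.add_argument("--temp_dir", type=Path)
    return parser


def exit_code_for(argv: list[str]) -> int:
    """Return the exit status argparse would use for the given arguments."""
    try:
        build_parser().parse_args(argv)
    except SystemExit as exc:
        # argparse converts the ArgumentTypeError into a usage error and exits.
        return int(exc.code)
    return 0


def default_temp_dir(input_file: Path) -> Path:
    """Derive the default temporary directory: tmp_<stem> next to the input."""
    return input_file.parent / f"tmp_{input_file.stem}"
```

Because the validator runs during parsing, callers see a `SystemExit` rather than the `ArgumentTypeError` itself when the input file is missing.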