PyPI - karaoke-gen - Versions diffs - 0.76.20__py3-none-any.whl → 0.82.0__py3-none-any.whl - Mend

karaoke-gen 0.76.20py3-none-any.whl → 0.82.0py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35)

karaoke_gen/instrumental_review/static/index.html CHANGED

@@ -60,8 +60,16 @@
         .logo {
             font-size: 1.25rem;
             font-weight: 600;
+            display: inline-flex;
+            align-items: center;
+            gap: 8px;
         }
+        .logo-img {
+            height: 40px;
+            width: auto;
+        }
         .track-info {
             font-size: 0.9rem;
             color: var(--text-muted);
@@ -568,6 +576,143 @@
             font-size: 1.5rem;
             color: var(--success);
         }
+        /* Mobile responsiveness */
+        @media (max-width: 768px) {
+            .app {
+                padding: 12px;
+                gap: 8px;
+                height: auto;
+                min-height: 100vh;
+                overflow-y: auto;
+            }
+            body {
+                overflow: auto;
+            }
+            .header {
+                flex-direction: column;
+                align-items: flex-start;
+                gap: 8px;
+            }
+            .header-left {
+                width: 100%;
+            }
+            .header-right {
+                width: 100%;
+                justify-content: flex-start;
+                flex-wrap: wrap;
+            }
+            .logo {
+                font-size: 1rem;
+            }
+            .logo-img {
+                height: 32px;
+            }
+            .waveform-player {
+                flex: none;
+                min-height: 200px;
+            }
+            .waveform-toolbar {
+                flex-wrap: wrap;
+                padding: 8px 12px;
+                gap: 8px;
+            }
+            .toolbar-left,
+            .toolbar-center,
+            .toolbar-right {
+                flex-wrap: wrap;
+            }
+            .audio-toggle-group {
+                order: 10;
+                width: 100%;
+                justify-content: center;
+            }
+            .bottom-section {
+                flex-direction: column;
+            }
+            .mute-panel {
+                max-height: none;
+            }
+            .selection-panel {
+                width: 100%;
+            }
+            .selection-option {
+                padding: 12px;
+            }
+            .btn {
+                min-height: 44px;
+                padding: 8px 12px;
+            }
+            .btn-icon {
+                width: 44px;
+                height: 44px;
+            }
+            .audio-toggle {
+                padding: 8px 12px;
+                min-height: 40px;
+            }
+            .zoom-btn {
+                width: 40px;
+                height: 40px;
+            }
+            .time-display {
+                font-size: 0.9rem;
+            }
+        }
+        @media (max-width: 480px) {
+            .app {
+                padding: 8px;
+            }
+            .header-left {
+                flex-wrap: wrap;
+            }
+            .track-info {
+                width: 100%;
+                margin-top: 4px;
+            }
+            .waveform-toolbar {
+                padding: 6px 8px;
+            }
+            .toolbar-center {
+                width: 100%;
+                justify-content: center;
+                order: -1;
+            }
+            .toolbar-left {
+                order: 1;
+            }
+            .toolbar-right {
+                order: 2;
+                width: 100%;
+                justify-content: space-between;
+            }
+        }
     </style>
 </head>
 <body>
@@ -641,7 +786,8 @@
                 if (waveformRes.ok) {
                     waveformData = await waveformRes.json();
-                    duration = waveformData.duration;
+                    // API returns duration_seconds, not duration
+                    duration = waveformData.duration_seconds || 0;
                 }
                 // Set initial selection based on recommendation
@@ -679,7 +825,7 @@
             app.innerHTML = `
                 <div class="header">
                     <div class="header-left">
-                        <span class="logo">🎤 Instrumental Review</span>
+                        <span class="logo"><img src="https://gen.nomadkaraoke.com/nomad-karaoke-logo.svg" alt="Nomad Karaoke" class="logo-img" onerror="this.style.display='none'"> Instrumental Review</span>
                         <span class="track-info">${escapeHtml(analysisData.artist) || ''} ${analysisData.artist && analysisData.title ? '–' : ''} ${escapeHtml(analysisData.title) || ''}</span>
                     </div>
                     <div class="header-right">
@@ -969,8 +1115,14 @@
             canvas.onmousedown = (e) => {
                 const rect = canvas.getBoundingClientRect();
                 const x = e.clientX - rect.left;
+                // Guard against invalid duration
+                if (!Number.isFinite(duration) || duration <= 0 || !Number.isFinite(rect.width) || rect.width <= 0) {
+                    return;
+                }
                 const time = (x / rect.width) * duration;
                 // Shift+drag to select mute region
                 if (e.shiftKey) {
                     isDragging = true;
@@ -993,18 +1145,26 @@
             const endDrag = (e) => {
                 if (!isDragging) return;
                 const rect = canvas.getBoundingClientRect();
                 const x = e.clientX - rect.left;
+                // Guard against invalid duration
+                if (!Number.isFinite(duration) || duration <= 0 || !Number.isFinite(rect.width) || rect.width <= 0) {
+                    isDragging = false;
+                    showSelectionOverlay(false);
+                    return;
+                }
                 const time = (x / rect.width) * duration;
                 const start = Math.min(dragStartTime, time);
                 const end = Math.max(dragStartTime, time);
                 if (end - start > 0.5) {
                     addRegion(start, end);
                 }
                 isDragging = false;
                 showSelectionOverlay(false);
             };
@@ -1090,14 +1250,15 @@
         function seekTo(time, autoPlay = true) {
             const audio = document.getElementById('audio-player');
-            if (audio) {
-                audio.currentTime = time;
-                currentTime = time;
-                updatePlayhead();
-                // Auto-play when seeking via click (if not already playing)
-                if (autoPlay && !isPlaying) {
-                    audio.play();
-                }
+            // Guard against non-finite time values (NaN, Infinity)
+            if (!audio || !Number.isFinite(time)) return;
+            audio.currentTime = time;
+            currentTime = time;
+            updatePlayhead();
+            // Auto-play when seeking via click (if not already playing)
+            if (autoPlay && !isPlaying) {
+                audio.play();
             }
         }
@@ -1155,6 +1316,8 @@
         }
         function formatTime(seconds) {
+            // Guard against NaN/Infinity
+            if (!Number.isFinite(seconds)) return '0:00';
             const mins = Math.floor(seconds / 60);
             const secs = Math.floor(seconds % 60);
             return `${mins}:${secs.toString().padStart(2, '0')}`;
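
The hunks above add two kinds of guard to the review page: time formatting falls back to `0:00` for non-finite input, and click-to-time mapping bails out when the duration or canvas width is unusable. A minimal Python port of that logic, for illustration only; the function names `format_time` and `pixel_to_time` are assumptions and do not appear in the package, and non-negative inputs are assumed.

```python
import math
from typing import Optional


def format_time(seconds: float) -> str:
    """Format seconds as M:SS, falling back to '0:00' for NaN/Infinity."""
    if not math.isfinite(seconds):
        return "0:00"
    mins = int(seconds // 60)
    secs = int(seconds % 60)
    return f"{mins}:{secs:02d}"


def pixel_to_time(x: float, width: float, duration: float) -> Optional[float]:
    """Map a horizontal click offset to a track time.

    Returns None when duration or width is non-finite or not positive,
    mirroring the early return added to the mouse handlers.
    """
    if not math.isfinite(duration) or duration <= 0:
        return None
    if not math.isfinite(width) or width <= 0:
        return None
    return (x / width) * duration
```

Without the guard, a missing `duration` field would propagate `NaN` into the seek position, which is the failure the `duration_seconds || 0` change and the `seekTo` check appear intended to prevent.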

karaoke_gen/karaoke_gen.py CHANGED

@@ -796,21 +796,22 @@ class KaraokePrep:
                     outputs = output_generator.generate_outputs(
                         transcription_corrected=correction_result,
+                        lyrics_results={},  # Lyrics already written during transcription phase
                         audio_filepath=audio_path,
                         output_prefix=output_prefix,
                     )
                     # Copy video to expected location in parent directory
-                    if outputs and outputs.get("video_filepath"):
-                        source_video = outputs["video_filepath"]
+                    if outputs and outputs.video:
+                        source_video = outputs.video
                         dest_video = os.path.join(track_output_dir, f"{artist_title} (With Vocals).mkv")
                         shutil.copy2(source_video, dest_video)
                         self.logger.info(f"Video rendered successfully: {dest_video}")
                         processed_track["with_vocals_video"] = dest_video
                         # Update ASS filepath for video background processing
-                        if outputs.get("ass_filepath"):
-                            processed_track["ass_filepath"] = outputs["ass_filepath"]
+                        if outputs.ass:
+                            processed_track["ass_filepath"] = outputs.ass
                     else:
                         self.logger.warning("Video rendering did not produce expected output")
                 else:
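
The hunk above switches from dictionary lookups (`outputs.get("video_filepath")`) to attribute access (`outputs.video`, `outputs.ass`). The diff does not show the type that `generate_outputs()` returns, so the sketch below uses a hypothetical dataclass, `SketchOutputPaths`, as a stand-in; only the two attribute names come from the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchOutputPaths:
    """Hypothetical stand-in for the object returned by generate_outputs()."""
    video: Optional[str] = None
    ass: Optional[str] = None


def collect_render_results(outputs: Optional[SketchOutputPaths]) -> dict:
    """Copy the paths that exist into a processed-track style dict."""
    processed = {}
    if outputs and outputs.video:
        processed["with_vocals_video"] = outputs.video
        # The ASS path is only recorded when a video was produced.
        if outputs.ass:
            processed["ass_filepath"] = outputs.ass
    return processed
```

An object of this shape has no `.get()` method, so the old dictionary-style calls would raise `AttributeError` against it, which is consistent with the change made here.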

karaoke_gen/lyrics_processor.py CHANGED

@@ -170,15 +170,15 @@ class LyricsProcessor:
     def _check_transcription_providers(self) -> dict:
         """
         Check which transcription providers are configured and return their status.
         Returns:
             dict with 'configured' (list of provider names) and 'missing' (list of missing configs)
         """
         load_dotenv()
         configured = []
         missing = []
         # Check AudioShake
         audioshake_token = os.getenv("AUDIOSHAKE_API_TOKEN")
         if audioshake_token:
@@ -187,7 +187,7 @@ class LyricsProcessor:
         else:
             missing.append("AudioShake (AUDIOSHAKE_API_TOKEN)")
             self.logger.debug("AudioShake transcription provider: not configured (missing AUDIOSHAKE_API_TOKEN)")
         # Check Whisper via RunPod
         runpod_key = os.getenv("RUNPOD_API_KEY")
         whisper_id = os.getenv("WHISPER_RUNPOD_ID")
@@ -203,7 +203,16 @@ class LyricsProcessor:
         else:
             missing.append("Whisper (RUNPOD_API_KEY + WHISPER_RUNPOD_ID)")
             self.logger.debug("Whisper transcription provider: not configured")
+        # Check Local Whisper (whisper-timestamped)
+        try:
+            import whisper_timestamped
+            configured.append("Local Whisper")
+            self.logger.debug("Local Whisper transcription provider: configured (whisper-timestamped installed)")
+        except ImportError:
+            missing.append("Local Whisper (pip install karaoke-gen[local-whisper])")
+            self.logger.debug("Local Whisper transcription provider: not configured (whisper-timestamped not installed)")
         return {"configured": configured, "missing": missing}
     def _build_transcription_provider_error_message(self, missing_providers: list) -> str:
@@ -221,12 +230,18 @@ class LyricsProcessor:
             "   - Set environment variable: AUDIOSHAKE_API_TOKEN=your_token\n"
             "   - Get an API key at: https://www.audioshake.ai/\n"
             "\n"
-            "2. Whisper via RunPod (Open-source alternative)\n"
+            "2. Whisper via RunPod (Cloud-based open-source)\n"
             "   - Set environment variables:\n"
             "     RUNPOD_API_KEY=your_key\n"
             "     WHISPER_RUNPOD_ID=your_endpoint_id\n"
             "   - Set up a Whisper endpoint at: https://www.runpod.io/\n"
             "\n"
+            "3. Local Whisper (No cloud required - runs on your machine)\n"
+            "   - Install with: pip install karaoke-gen[local-whisper]\n"
+            "   - For CPU-only: pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu\n"
+            "                   pip install karaoke-gen[local-whisper]\n"
+            "   - Requires 2-10GB RAM depending on model size\n"
+            "\n"
             "ALTERNATIVES:\n"
             "\n"
             "- Use --skip-lyrics flag to generate instrumental-only karaoke (no synchronized lyrics)\n"
@@ -348,6 +363,10 @@ class LyricsProcessor:
         # Create config objects for LyricsTranscriber
         transcriber_config = TranscriberConfig(
             audioshake_api_token=env_config.get("audioshake_api_token"),
+            runpod_api_key=env_config.get("runpod_api_key"),
+            whisper_runpod_id=env_config.get("whisper_runpod_id"),
+            # Local Whisper is enabled by default as a fallback when no cloud providers are configured
+            enable_local_whisper=True,
         )
         lyrics_config = LyricsConfig(
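
The Local Whisper check added above treats a provider as configured when its optional dependency can be imported. A minimal standalone sketch of that pattern follows; the helper name `check_optional_provider` and its parameters are assumptions for illustration and are not part of `LyricsProcessor`.

```python
import importlib


def check_optional_provider(module_name: str, label: str, install_hint: str) -> dict:
    """Report a provider as configured if its module imports, else as missing."""
    configured, missing = [], []
    try:
        importlib.import_module(module_name)
        configured.append(label)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so it is caught here too.
        missing.append(f"{label} ({install_hint})")
    return {"configured": configured, "missing": missing}
```

For example, `check_optional_provider("whisper_timestamped", "Local Whisper", "pip install karaoke-gen[local-whisper]")` would list the provider under `configured` only when the extra is installed. Importing the module, rather than only probing for it, also surfaces broken installations, at the cost of a slower check.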

{karaoke_gen-0.76.20.dist-info → karaoke_gen-0.82.0.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: karaoke-gen
-Version: 0.76.20
+Version: 0.82.0
 Summary: Generate karaoke videos with synchronized lyrics. Handles the entire process from downloading audio and lyrics to creating the final video with title screens.
 License: MIT
 License-File: LICENSE
@@ -13,12 +13,14 @@ Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
+Provides-Extra: local-whisper
 Requires-Dist: argparse (>=1.4.0)
 Requires-Dist: attrs (>=24.2.0)
 Requires-Dist: audio-separator[cpu] (>=0.34.0)
 Requires-Dist: beautifulsoup4 (>=4)
 Requires-Dist: cattrs (>=24.1.2)
 Requires-Dist: dropbox (>=12)
+Requires-Dist: email-validator (>=2.0.0)
 Requires-Dist: fastapi (>=0.104.0)
 Requires-Dist: fetch-lyrics-from-genius (>=0.1)
 Requires-Dist: ffmpeg-python (>=0.2.0,<0.3.0)
@@ -40,6 +42,7 @@ Requires-Dist: kbputils (>=0.0.16,<0.0.17)
 Requires-Dist: langchain (>=0.3.0)
 Requires-Dist: langchain-anthropic (>=0.2.0)
 Requires-Dist: langchain-core (>=0.3.0)
+Requires-Dist: langchain-google-vertexai (>=3.1.1)
 Requires-Dist: langchain-ollama (>=0.2.0)
 Requires-Dist: langchain-openai (>=0.2.0)
 Requires-Dist: langfuse (>=3.0.0)
@@ -74,10 +77,12 @@ Requires-Dist: python-levenshtein (>=0.26)
 Requires-Dist: python-multipart (>=0.0.20,<0.0.21)
 Requires-Dist: python-slugify (>=8)
 Requires-Dist: requests (>=2)
+Requires-Dist: sendgrid (>=6.10.0)
 Requires-Dist: shortuuid (>=1.0.13)
 Requires-Dist: spacy (>=3.8.7)
 Requires-Dist: spacy-syllables (>=3)
 Requires-Dist: srsly (>=2.5.1)
+Requires-Dist: stripe (>=7.0.0)
 Requires-Dist: syllables (>=1)
 Requires-Dist: syrics (>=0)
 Requires-Dist: thefuzz (>=0.22)
@@ -86,6 +91,7 @@ Requires-Dist: torch (>=2.7)
 Requires-Dist: tqdm (>=4.67)
 Requires-Dist: transformers (>=4.47)
 Requires-Dist: uvicorn[standard] (>=0.24.0)
+Requires-Dist: whisper-timestamped (>=1.15.0) ; extra == "local-whisper"
 Requires-Dist: yt-dlp (>=2024.0.0)
 Project-URL: Documentation, https://github.com/nomadkaraoke/karaoke-gen/blob/main/README.md
 Project-URL: Homepage, https://github.com/nomadkaraoke/karaoke-gen
@@ -165,8 +171,40 @@ export AUDIOSHAKE_API_TOKEN="your_audioshake_token"
 Get an API key at [https://www.audioshake.ai/](https://www.audioshake.ai/) - business only, at time of writing this.
-#### Option 2: Whisper via RunPod
-Open-source alternative using OpenAI's Whisper model on RunPod infrastructure.
+#### Option 2: Local Whisper (No Cloud Required)
+Run Whisper directly on your local machine using whisper-timestamped. Works on CPU, NVIDIA GPU (CUDA), or Apple Silicon.
+```bash
+# Install with local Whisper support
+pip install "karaoke-gen[local-whisper]"
+# Optional: Configure model size (tiny, base, small, medium, large)
+export WHISPER_MODEL_SIZE="medium"
+# Optional: Force specific device (cpu, cuda, mps)
+export WHISPER_DEVICE="cpu"
+```
+**Model Size Guide:**
+| Model | VRAM | Speed | Quality |
+|-------|------|-------|---------|
+| tiny | ~1GB | Fast | Lower |
+| base | ~1GB | Fast | Basic |
+| small | ~2GB | Medium | Good |
+| medium | ~5GB | Slower | Better |
+| large | ~10GB | Slowest | Best |
+**CPU-Only Installation** (no GPU required):
+```bash
+# Pre-install CPU-only PyTorch first
+pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
+pip install "karaoke-gen[local-whisper]"
+```
+Local Whisper runs automatically as a fallback when no cloud transcription services are configured.
+#### Option 3: Whisper via RunPod
+Cloud-based alternative using OpenAI's Whisper model on RunPod infrastructure.
 ```bash
 export RUNPOD_API_KEY="your_runpod_key"
@@ -668,6 +706,44 @@ If the output video has quality problems:
 - Check available codecs: `ffmpeg -codecs`
 - For 4K output, ensure sufficient disk space (10GB+ per track)
+### Local Whisper Issues
+#### GPU Out of Memory
+If you get CUDA out of memory errors:
+```bash
+# Use a smaller model
+export WHISPER_MODEL_SIZE="small"  # or "tiny"
+# Or force CPU mode
+export WHISPER_DEVICE="cpu"
+```
+#### Slow Transcription on CPU
+CPU transcription is significantly slower than GPU. For faster processing:
+- Use a smaller model (`tiny` or `base`)
+- Consider using cloud transcription (AudioShake or RunPod)
+- On Apple Silicon, the `small` model offers good speed/quality balance
+#### Model Download Issues
+Whisper models are downloaded on first use (~1-3GB depending on size). If downloads fail:
+- Check your internet connection
+- Set a custom cache directory: `export WHISPER_CACHE_DIR="/path/with/space"`
+- Models are cached in `~/.cache/whisper/` by default
+#### whisper-timestamped Not Found
+If you get "whisper-timestamped is not installed":
+```bash
+pip install "karaoke-gen[local-whisper]"
+# Or install directly:
+pip install whisper-timestamped
+```
+#### Disabling Local Whisper
+If you want to disable local Whisper (e.g., to force cloud transcription):
+```bash
+export ENABLE_LOCAL_WHISPER="false"
+```
 ---
 ## 🧪 Development
