supervoxtral 0.1.1.tar.gz → 0.1.2.tar.gz (PyPI)

Files changed (28)

{supervoxtral-0.1.1 → supervoxtral-0.1.2}/AGENTS.md RENAMED

@@ -34,9 +34,9 @@ supervoxtral/
 - **Entry**: `svx/cli.py` Typer `record` command parses args (e.g., --prompt, --save-all, --gui, --transcribe).
 - **Config & Prompt**: Load `Config` via `Config.load()` (`core/config.py`); if transcribe_mode, skip prompt resolution; else resolve prompt with `cfg.resolve_prompt()` (`core/prompt.py`).
-- **Pipeline**: Run `RecordingPipeline` (`core/pipeline.py`): record WAV/stop (`core/audio.py`), optional conversion (ffmpeg), get provider/init (`providers/__init__.py`, e.g., `mistral.py` from `cfg`); if transcribe_mode: no prompt, model override to voxtral-mini-latest (with warning if changed), pass transcribe_mode to provider.transcribe; transcribe, conditional save (`core/storage.py` based on `keep_*`/`save_all`), clipboard copy, logging setup.
+- **Pipeline**: Run `RecordingPipeline` (`core/pipeline.py`): record WAV/stop (`core/audio.py`), optional conversion (ffmpeg), get provider/init (`providers/__init__.py`, e.g., `mistral.py` from `cfg`); if transcribe_mode (CLI only): no prompt, model override to voxtral-mini-latest (with warning if changed), pass transcribe_mode to provider.transcribe; for GUI: --transcribe ignored (warning), recording starts immediately, uses modular record()/process()/clean() with dynamic mode (Transcribe: no prompt, model override; Prompt: resolved prompt); transcribe, conditional save (`core/storage.py` based on `keep_*`/`save_all`), clipboard copy, logging setup.
 - **Cleanup**: Temp files auto-deleted (tempfile) if `keep_*=false`; dirs created only if persistence enabled.
-- **End**: Return `{"text": str, "raw": dict, "duration": float, "paths": dict}`; CLI prints result, GUI emits progress/updates via callback.
+- **End**: Return `{"text": str, "raw": dict, "duration": float, "paths": dict}`; CLI prints result, GUI emits progress/updates via callback (buttons: 'Transcribe' for stop/transcribe without prompt; 'Prompt' for stop/use resolved prompt; default 'Prompt' on Esc/close).
 ## Build & test
 ```bash

{supervoxtral-0.1.1 → supervoxtral-0.1.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: supervoxtral
-Version: 0.1.1
+Version: 0.1.2
 Summary: CLI/GUI audio recorder and transcription client using Mistral Voxtral (chat with audio and transcription).
 License: MIT
 License-File: LICENSE

{supervoxtral-0.1.1 → supervoxtral-0.1.2}/README.md RENAMED

@@ -1,6 +1,6 @@
 # supervoxtral
-![Supervoxtral](supervoxtral.png)
+![Supervoxtral](supervoxtral.gif)
 SuperVoxtral is a lightweight Python CLI/GUI utility for recording microphone audio and integrate with Mistral's Voxtral APIs for transcription or audio-enabled chat.
@@ -85,8 +85,8 @@ To get started quickly with SuperVoxtral:
    api_key = "your_mistral_api_key_here"
    ```
-3. Launch the GUI for transcription: `svx record --gui --transcribe`
-   This opens the minimal GUI, starts recording on launch, and transcribes the audio upon stopping (results copied to clipboard).
+3. Launch the GUI: `svx record --gui`
+   This opens the minimal GUI, starts recording immediately; click 'Transcribe' for pure transcription (no prompt) or 'Prompt' for prompted transcription (resolved prompt); --transcribe ignored with warning (results copied to clipboard).
 ### macOS Shortcuts Integration
@@ -206,7 +206,7 @@ svx record [OPTIONS]
 - `--user-prompt-file PATH` (or `--prompt-file PATH`): Path to a markdown file with the user prompt.
 - `--transcribe`: Enable pure transcription mode (ignores prompts; uses dedicated endpoint).
 - `--outfile-prefix PREFIX`: Custom prefix for output files (default: timestamp).
-- `--gui`: Launch the GUI frontend (respects config and other CLI options).
+- `--gui`: Launch the GUI frontend (interactive: recording starts immediately; buttons 'Transcribe' (pure, no prompt) or 'Prompt' (with resolved prompt); respects config and other CLI options; --transcribe ignored with warning).
 - `--save-all`: Override config to keep audio, transcripts, and logs for this run.
 - `--log-level LEVEL`: Set logging level (DEBUG, INFO, WARNING, ERROR; default: INFO).
@@ -218,7 +218,7 @@ svx record [OPTIONS]
 - Transcribe only: `svx record --transcribe`
   - No prompt; direct transcription. Add `--save-all` to persist.
 - Launch GUI: `svx record --gui`
-  - GUI respects config.toml and CLI flags (e.g., `--gui --save-all`).
+  - Interactive mode: recording starts immediately; click 'Transcribe' (pure transcription, no prompt) or 'Prompt' (with resolved prompt); --transcribe ignored with warning. GUI respects config.toml and CLI flags (e.g., `--gui --save-all`).
 **Prompt Resolution Priority** (for non-transcribe mode):
 1. CLI `--user-prompt` or `--user-prompt-file`
@@ -228,6 +228,7 @@ svx record [OPTIONS]
 ## Changelog
+- 0.1.2: Interactive mode in GUI (choose transcribe / prompt / cancel while recording)
 - 0.1.1: Minor updates to default config and default prompt
 ## License

{supervoxtral-0.1.1 → supervoxtral-0.1.2}/notes.md RENAMED

@@ -1,6 +1,5 @@
 todo
-- Bug config device
 - localisation reccording dans config
 - paste directement ?
 - nettoyer xml réponse (option)

{supervoxtral-0.1.1 → supervoxtral-0.1.2}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "supervoxtral"
-version = "0.1.1"
+version = "0.1.2"
 description = "CLI/GUI audio recorder and transcription client using Mistral Voxtral (chat with audio and transcription)."
 requires-python = ">=3.11"
 license = { text = "MIT" }

supervoxtral-0.1.2/supervoxtral.gif ADDED

Binary file

{supervoxtral-0.1.1 → supervoxtral-0.1.2}/svx/cli.py RENAMED

@@ -191,6 +191,11 @@ def record(
         user_prompt = None
         user_prompt_file = None
+    if gui and transcribe:
+        console.print("[yellow]Warning: --transcribe has no effect in GUI mode.[/yellow]")
+        console.print("[yellow]Use the 'Transcribe' or 'Prompt' buttons in the interface.[/yellow]")
+        transcribe = False
     # If GUI requested, launch GUI with the resolved parameters and exit.
     if gui:
         from svx.ui.qt_app import run_gui
@@ -202,7 +207,6 @@ def record(
             user_prompt_file=user_prompt_file,
             save_all=save_all,
             outfile_prefix=outfile_prefix,
-            transcribe_mode=transcribe,
         )
         return
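
The guard added to `record()` in this hunk can be read in isolation as a small pure function. The following is a minimal sketch, not the package's code: `resolve_gui_flags` is a hypothetical helper name, and it returns the warning strings instead of printing them through `console.print` as the diff does.

```python
def resolve_gui_flags(gui: bool, transcribe: bool) -> tuple[bool, list[str]]:
    """Return the effective transcribe flag plus any warnings to display.

    Hypothetical helper mirroring the 0.1.2 guard: in GUI mode the mode is
    chosen with the buttons, so --transcribe is dropped with a warning.
    """
    warnings: list[str] = []
    if gui and transcribe:
        warnings.append("Warning: --transcribe has no effect in GUI mode.")
        warnings.append("Use the 'Transcribe' or 'Prompt' buttons in the interface.")
        transcribe = False
    # Without --gui the flag passes through unchanged.
    return transcribe, warnings
```

Keeping the decision separate from the printing would make the flag interaction testable without a terminal, but that is a design option rather than something the diff itself does.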

supervoxtral-0.1.2/svx/core/pipeline.py ADDED

@@ -0,0 +1,286 @@
+from __future__ import annotations
+import logging
+import tempfile
+import threading
+from collections.abc import Callable
+from logging import FileHandler
+from pathlib import Path
+from typing import Any
+import svx.core.config as config
+from svx.core.audio import convert_audio, record_wav, timestamp
+from svx.core.clipboard import copy_to_clipboard
+from svx.core.config import Config
+from svx.core.storage import save_transcript
+from svx.providers import get_provider
+class RecordingPipeline:
+    """
+    Centralized pipeline for recording audio, transcribing via provider, saving outputs,
+    and copying to clipboard. Handles temporary files when not keeping audio.
+    Supports runtime overrides like save_all for keeping all files and adding log handlers.
+    Optional progress_callback for status updates (e.g., for GUI).
+    Supports transcribe_mode for pure transcription without prompt using dedicated endpoint.
+    """
+    def __init__(
+        self,
+        cfg: Config,
+        user_prompt: str | None = None,
+        user_prompt_file: Path | None = None,
+        save_all: bool = False,
+        outfile_prefix: str | None = None,
+        progress_callback: Callable[[str], None] | None = None,
+        transcribe_mode: bool = False,
+    ) -> None:
+        self.cfg = cfg
+        self.user_prompt = user_prompt
+        self.user_prompt_file = user_prompt_file
+        self.save_all = save_all
+        self.outfile_prefix = outfile_prefix
+        self.progress_callback = progress_callback
+        self.transcribe_mode = transcribe_mode
+    def _status(self, msg: str) -> None:
+        """Emit status update via callback if provided."""
+        if self.progress_callback:
+            self.progress_callback(msg)
+        logging.info(msg)
+    def record(self, stop_event: threading.Event | None = None) -> tuple[Path, float]:
+        """
+        Record audio and return wav_path, duration.
+        Returns:
+            tuple[Path, float]: wav_path, duration.
+        """
+        # Resolve parameters
+        _provider = self.cfg.defaults.provider
+        audio_format = self.cfg.defaults.format
+        model = self.cfg.defaults.model
+        _original_model = model
+        _language = self.cfg.defaults.language
+        rate = self.cfg.defaults.rate
+        channels = self.cfg.defaults.channels
+        device = self.cfg.defaults.device
+        base = self.outfile_prefix or f"rec_{timestamp()}"
+        keep_audio = self.save_all or self.cfg.defaults.keep_audio_files
+        # Validation (fail fast)
+        if channels not in (1, 2):
+            raise ValueError("channels must be 1 or 2")
+        if rate <= 0:
+            raise ValueError("rate must be > 0")
+        if audio_format not in {"wav", "mp3", "opus"}:
+            raise ValueError("format must be one of wav|mp3|opus")
+        stop_for_recording = stop_event or threading.Event()
+        self._status("Recording...")
+        if keep_audio:
+            self.cfg.recordings_dir.mkdir(parents=True, exist_ok=True)
+            wav_path = self.cfg.recordings_dir / f"{base}.wav"
+            duration = record_wav(
+                wav_path,
+                samplerate=rate,
+                channels=channels,
+                device=device,
+                stop_event=stop_for_recording,
+            )
+        else:
+            # Use mktemp for temp wav_path
+            wav_path = Path(tempfile.mktemp(suffix=".wav"))
+            duration = record_wav(
+                wav_path,
+                samplerate=rate,
+                channels=channels,
+                device=device,
+                stop_event=stop_for_recording,
+            )
+        self._status("Recording completed.")
+        return wav_path, duration
+    def _setup_save_all(self) -> None:
+        """Apply save_all overrides: set keeps to True, create dirs, add file logging."""
+        if not self.save_all:
+            return
+        # Override config defaults
+        self.cfg.defaults.keep_audio_files = True
+        self.cfg.defaults.keep_transcript_files = True
+        self.cfg.defaults.keep_log_files = True
+        # Ensure directories
+        config.RECORDINGS_DIR.mkdir(parents=True, exist_ok=True)
+        config.TRANSCRIPTS_DIR.mkdir(parents=True, exist_ok=True)
+        config.LOGS_DIR.mkdir(parents=True, exist_ok=True)
+        # Add file handler if not present
+        root_logger = logging.getLogger()
+        if not any(isinstance(h, FileHandler) for h in root_logger.handlers):  # type: ignore[reportUnknownMemberType]
+            from svx.core.config import _get_log_level
+            log_level_int = _get_log_level(self.cfg.defaults.log_level)
+            formatter = logging.Formatter("%(asctime)s | %(levelname)s | %(name)s | %(message)s")
+            file_handler = logging.FileHandler(config.LOGS_DIR / "app.log", encoding="utf-8")
+            file_handler.setLevel(log_level_int)
+            file_handler.setFormatter(formatter)
+            root_logger.addHandler(file_handler)
+            logging.info("File logging enabled for this run")
+    def process(
+        self, wav_path: Path, duration: float, transcribe_mode: bool, user_prompt: str | None = None
+    ) -> dict[str, Any]:
+        """
+        Process recorded audio: convert if needed, transcribe, save, copy.
+        Args:
+            wav_path: Path to the recorded WAV file.
+            duration: Recording duration in seconds.
+            transcribe_mode: Whether to use pure transcription mode.
+            user_prompt: User prompt to use (None for transcribe_mode).
+        Returns:
+            Dict with 'text' (str), 'raw' (dict), 'duration' (float),
+            'paths' (dict of Path or None).
+        """
+        # Resolve parameters
+        provider = self.cfg.defaults.provider
+        audio_format = self.cfg.defaults.format
+        model = self.cfg.defaults.model
+        original_model = model
+        if transcribe_mode:
+            model = "voxtral-mini-latest"
+            if original_model != "voxtral-mini-latest":
+                logging.warning(
+                    "Transcribe mode: model override from '%s' to 'voxtral-mini-latest'\n"
+                    "(optimized for transcription).",
+                    original_model,
+                )
+        language = self.cfg.defaults.language
+        if wav_path.stem.endswith(".wav"):
+            base = wav_path.stem.replace(".wav", "")
+        else:
+            base = wav_path.stem
+        keep_transcript = self.save_all or self.cfg.defaults.keep_transcript_files
+        copy_to_clip = self.cfg.defaults.copy
+        # Resolve user prompt if not provided
+        final_user_prompt = None
+        if not transcribe_mode:
+            if user_prompt is None:
+                final_user_prompt = self.cfg.resolve_prompt(self.user_prompt, self.user_prompt_file)
+            else:
+                final_user_prompt = user_prompt
+            self._status("Transcribe mode not activated: using prompt.")
+        else:
+            self._status("Transcribe mode activated: no prompt used.")
+        paths: dict[str, Path | None] = {"wav": wav_path}
+        # Convert if needed
+        to_send_path = wav_path
+        _converted = False
+        if audio_format in {"mp3", "opus"}:
+            self._status("Converting...")
+            to_send_path = convert_audio(wav_path, audio_format)
+            logging.info("Converted %s -> %s", wav_path, to_send_path)
+            paths["converted"] = to_send_path
+            _converted = True
+        # Transcribe
+        self._status("Transcribing...")
+        prov = get_provider(provider, cfg=self.cfg)
+        result = prov.transcribe(
+            to_send_path,
+            user_prompt=final_user_prompt,
+            model=model,
+            language=language,
+            transcribe_mode=transcribe_mode,
+        )
+        text = result["text"]
+        raw = result["raw"]
+        # Save if keeping transcripts
+        if keep_transcript:
+            self.cfg.transcripts_dir.mkdir(parents=True, exist_ok=True)
+            txt_path, json_path = save_transcript(
+                self.cfg.transcripts_dir, base, provider, text, raw
+            )
+            paths["txt"] = txt_path
+            paths["json"] = json_path
+        else:
+            paths["txt"] = None
+            paths["json"] = None
+        # Copy to clipboard
+        if copy_to_clip:
+            try:
+                copy_to_clipboard(text)
+                logging.info("Copied transcription to clipboard")
+            except Exception as e:
+                logging.warning("Failed to copy to clipboard: %s", e)
+        logging.info("Processing finished (%.2fs)", duration)
+        return {
+            "text": text,
+            "raw": raw,
+            "duration": duration,
+            "paths": paths,
+        }
+    def clean(self, wav_path: Path, paths: dict[str, Path | None], keep_audio: bool) -> None:
+        """
+        Clean up temporary files.
+        Args:
+            wav_path: The original WAV path.
+            paths: The paths dict from process().
+            keep_audio: Whether to keep audio files (if True, no deletion).
+        """
+        if not keep_audio and wav_path.exists():
+            wav_path.unlink()
+            logging.info("Deleted temp WAV: %s", wav_path)
+        if "converted" in paths and paths["converted"] and paths["converted"] != wav_path:
+            if paths["converted"].exists():
+                paths["converted"].unlink()
+                logging.info("Deleted temp converted: %s", paths["converted"])
+        self._status("Cleanup completed.")
+    def run(self, stop_event: threading.Event | None = None) -> dict[str, Any]:
+        """
+        Execute the full pipeline.
+        Args:
+            stop_event: Optional event to signal recording stop (e.g., for GUI).
+        Returns:
+            Dict with 'text' (str), 'raw' (dict), 'duration' (float),
+            'paths' (dict of Path or None).
+        Raises:
+            Exception: On recording, conversion, or transcription errors.
+        """
+        self._setup_save_all()
+        wav_path, duration = self.record(stop_event)
+        keep_audio = self.save_all or self.cfg.defaults.keep_audio_files
+        if self.transcribe_mode:
+            final_user_prompt = None
+            self._status("Mode Transcribe activated: no prompt used.")
+        else:
+            final_user_prompt = self.cfg.resolve_prompt(self.user_prompt, self.user_prompt_file)
+        result = self.process(wav_path, duration, self.transcribe_mode, final_user_prompt)
+        self.clean(wav_path, result["paths"], keep_audio=keep_audio)
+        logging.info("Pipeline finished (%.2fs)", duration)
+        return result
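
In the added `record()` method, the temporary WAV path comes from `tempfile.mktemp`, which the Python documentation marks as deprecated because another process can create the file between name generation and first use. A minimal sketch of one alternative follows. It assumes the recorder can write to a path that already exists as an empty file, which the diff does not confirm for `record_wav`; `make_temp_wav_path` and `remove_if_present` are hypothetical helper names.

```python
import os
import tempfile
from pathlib import Path


def make_temp_wav_path(prefix: str = "rec_") -> Path:
    """Create an empty temporary .wav file and return its path.

    mkstemp creates the file itself, so the name cannot be claimed by
    another process between choosing it and opening it.
    """
    fd, name = tempfile.mkstemp(prefix=prefix, suffix=".wav")
    os.close(fd)  # the recorder reopens the path itself; do not leak the descriptor
    return Path(name)


def remove_if_present(path: Path) -> None:
    """Delete the temp file, tolerating the case where it is already gone."""
    path.unlink(missing_ok=True)
```

The caller remains responsible for deletion, which matches the explicit `clean()` step in the pipeline.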

{supervoxtral-0.1.1 → supervoxtral-0.1.2}/svx/ui/qt_app.py RENAMED

@@ -19,12 +19,14 @@ from __future__ import annotations
 import logging
 import threading
+import time
 from pathlib import Path
 from PySide6.QtCore import QObject, QPoint, Qt, QTimer, Signal
 from PySide6.QtGui import QAction, QFont, QFontDatabase, QKeySequence
 from PySide6.QtWidgets import (
     QApplication,
+    QHBoxLayout,
     QLabel,
     QMessageBox,
     QPushButton,
@@ -79,6 +81,18 @@ QPushButton:hover {
     background-color: #2a78ff;
 }
+/* Cancel button */
+QPushButton#cancel_btn {
+    background-color: #da3633;
+}
+QPushButton#cancel_btn:hover {
+    background-color: #f85149;
+}
+QPushButton#cancel_btn:disabled {
+    background-color: #8b0000;
+    color: #9fb8e6;
+}
 /* Small window border effect (subtle) */
 QWidget#recorder_window {
     border: 1px solid #203040;
@@ -189,12 +203,12 @@ class RecorderWorker(QObject):
         status (str): human-readable status updates for the UI.
         done (str): emitted with the final transcription text on success.
         error (str): emitted with an error message on failure.
-    Supports transcribe_mode for pure transcription without prompt.
     """
     status = Signal(str)
     done = Signal(str)
     error = Signal(str)
+    canceled = Signal()
     def __init__(
         self,
@@ -203,7 +217,6 @@ class RecorderWorker(QObject):
         user_prompt_file: Path | None = None,
         save_all: bool = False,
         outfile_prefix: str | None = None,
-        transcribe_mode: bool = False,
     ) -> None:
         super().__init__()
         self.cfg = cfg
@@ -211,13 +224,21 @@ class RecorderWorker(QObject):
         self.user_prompt_file = user_prompt_file
         self.save_all = save_all
         self.outfile_prefix = outfile_prefix
-        self.transcribe_mode = transcribe_mode
+        self.mode: str | None = None
+        self.cancel_requested: bool = False
         self._stop_event = threading.Event()
+    def set_mode(self, mode: str) -> None:
+        self.mode = mode
     def stop(self) -> None:
         """Request the recording to stop."""
         self._stop_event.set()
+    def cancel(self) -> None:
+        self.cancel_requested = True
+        self._stop_event.set()
     def _resolve_user_prompt(self) -> str:
         """
         Determine the final user prompt using the shared resolver.
@@ -227,14 +248,12 @@ class RecorderWorker(QObject):
     def run(self) -> None:
         """
         Execute the pipeline:
-        - record_wav (until stop)
-        - optional convert (mp3/opus)
-        - provider.transcribe
-        - save_transcript
-        - copy_to_clipboard
-        - optionally delete audio files
-        Supports transcribe_mode for pure transcription without prompt.
+        - record (until stop)
+        - wait for mode
+        - process
+        - clean
         """
         try:
             pipeline = RecordingPipeline(
                 cfg=self.cfg,
@@ -242,10 +261,24 @@ class RecorderWorker(QObject):
                 user_prompt_file=self.user_prompt_file,
                 save_all=self.save_all,
                 outfile_prefix=self.outfile_prefix,
-                transcribe_mode=self.transcribe_mode,
                 progress_callback=self.status.emit,
             )
-            result = pipeline.run(stop_event=self._stop_event)
+            self.status.emit("Recording in progress...")
+            wav_path, duration = pipeline.record(self._stop_event)
+            self.status.emit("Recording finished.")
+            if self.cancel_requested:
+                keep_audio = self.save_all or self.cfg.defaults.keep_audio_files
+                pipeline.clean(wav_path, {"wav": wav_path}, keep_audio)
+                self.canceled.emit()
+                return
+            self.status.emit("Processing in progress...")
+            while self.mode is None:
+                time.sleep(0.05)
+            transcribe_mode = self.mode == "transcribe"
+            user_prompt = None if transcribe_mode else self._resolve_user_prompt()
+            result = pipeline.process(wav_path, duration, transcribe_mode, user_prompt)
+            keep_audio = self.save_all or self.cfg.defaults.keep_audio_files
+            pipeline.clean(wav_path, result["paths"], keep_audio)
             self.done.emit(result["text"])
         except Exception as e:
             logging.exception("Pipeline failed")
@@ -254,13 +287,12 @@ class RecorderWorker(QObject):
 class RecorderWindow(QWidget):
     """
-    Frameless always-on-top window with a single Stop button.
+    Frameless always-on-top window with Transcribe and Prompt buttons.
     Launching this window will immediately start the recording in a background thread.
     Window can be dragged by clicking anywhere on the widget background.
-    Pressing Esc triggers Stop.
-    Supports transcribe_mode for pure transcription without prompt.
+    Pressing Esc triggers Prompt mode.
     """
     def __init__(
@@ -270,7 +302,6 @@ class RecorderWindow(QWidget):
         user_prompt_file: Path | None = None,
         save_all: bool = False,
         outfile_prefix: str | None = None,
-        transcribe_mode: bool = False,
     ) -> None:
         super().__init__()
@@ -279,7 +310,16 @@ class RecorderWindow(QWidget):
         self.user_prompt_file = user_prompt_file
         self.save_all = save_all
         self.outfile_prefix = outfile_prefix
-        self.transcribe_mode = transcribe_mode
+        # Background worker (create early for signal connections)
+        self._worker = RecorderWorker(
+            cfg=self.cfg,
+            user_prompt=user_prompt,
+            user_prompt_file=user_prompt_file,
+            save_all=save_all,
+            outfile_prefix=outfile_prefix,
+        )
+        self._thread = threading.Thread(target=self._worker.run, daemon=True)
         # Environment and prompt files
@@ -313,14 +353,9 @@ class RecorderWindow(QWidget):
             "</span>"
         )
         format_html = f"<span style='color:#ffa657'>{self.cfg.defaults.format}</span>"
-        if self.transcribe_mode:
-            mode_html = "<span style='color:#ff7b72'>Transcribe</span>"
-        else:
-            mode_html = "<span style='color:#7ee787'>Completion</span>"
         parts = [
             prov_model_html,
             format_html,
-            mode_html,
         ]
         if self.cfg.defaults.language:
             lang_html = f"<span style='color:#c9b4ff'>{self.cfg.defaults.language}</span>"
@@ -337,34 +372,42 @@ class RecorderWindow(QWidget):
         self._info_label.setAlignment(Qt.AlignmentFlag.AlignCenter)
         layout.addWidget(self._info_label)
-        self._status_label = QLabel("Recording... Press Stop to finish")
+        self._status_label = QLabel("Recording in progress...")
         self._status_label.setAlignment(Qt.AlignmentFlag.AlignCenter)
         layout.addWidget(self._status_label)
-        self._stop_btn = QPushButton("Stop")
-        self._stop_btn.clicked.connect(self._on_stop_clicked)
-        layout.addWidget(self._stop_btn, 0, Qt.AlignmentFlag.AlignCenter)
+        # Buttons layout
+        button_layout = QHBoxLayout()
+        button_layout.addStretch()
+        self._transcribe_btn = QPushButton("Transcribe")
+        self._transcribe_btn.setToolTip("Stop and transcribe without prompt")
+        self._transcribe_btn.clicked.connect(lambda: self._on_button_clicked("transcribe"))
+        button_layout.addWidget(self._transcribe_btn)
+        self._prompt_btn = QPushButton("Prompt")
+        self._prompt_btn.setToolTip("Stop and transcribe with prompt")
+        self._prompt_btn.clicked.connect(lambda: self._on_button_clicked("prompt"))
+        button_layout.addWidget(self._prompt_btn)
+        self._cancel_btn = QPushButton("Cancel")
+        self._cancel_btn.setObjectName("cancel_btn")
+        self._cancel_btn.setToolTip("Stop recording and quit without processing")
+        self._cancel_btn.clicked.connect(self._on_cancel_clicked)
+        button_layout.addWidget(self._cancel_btn)
+        button_layout.addStretch()
+        button_widget = QWidget()
+        button_widget.setLayout(button_layout)
+        layout.addWidget(button_widget, 0, Qt.AlignmentFlag.AlignCenter)
         # Keyboard shortcut: Esc to stop
         stop_action = QAction(self)
         stop_action.setShortcut(QKeySequence.StandardKey.Cancel)  # Esc
-        stop_action.triggered.connect(self._on_stop_clicked)
+        stop_action.triggered.connect(lambda: self._worker.cancel())
         self.addAction(stop_action)
-        # Background worker
-        self._worker = RecorderWorker(
-            cfg=self.cfg,
-            user_prompt=user_prompt,
-            user_prompt_file=user_prompt_file,
-            save_all=save_all,
-            outfile_prefix=outfile_prefix,
-        )
-        self._thread = threading.Thread(target=self._worker.run, daemon=True)
         # Signals wiring
         self._worker.status.connect(self._on_status)
         self._worker.done.connect(self._on_done)
         self._worker.error.connect(self._on_error)
+        self._worker.canceled.connect(self._close_soon)
         # Apply stylesheet to the application for consistent appearance
         app = QApplication.instance()
@@ -410,14 +453,24 @@ class RecorderWindow(QWidget):
     def closeEvent(self, event) -> None:  # type: ignore[override]
         # Attempt to stop recording if the user closes the window via window controls.
-        self._worker.stop()
+        self._worker.cancel()
         super().closeEvent(event)
-    def _on_stop_clicked(self) -> None:
-        self._stop_btn.setEnabled(False)
-        self._status_label.setText("Stopping...")
+    def _on_button_clicked(self, mode: str) -> None:
+        self._transcribe_btn.setEnabled(False)
+        self._prompt_btn.setEnabled(False)
+        self._cancel_btn.setEnabled(False)
+        self._status_label.setText("Stopping and processing...")
+        self._worker.set_mode(mode)
         self._worker.stop()
+    def _on_cancel_clicked(self) -> None:
+        self._transcribe_btn.setEnabled(False)
+        self._prompt_btn.setEnabled(False)
+        self._cancel_btn.setEnabled(False)
+        self._status_label.setText("Canceling...")
+        self._worker.cancel()
     # --- Drag handling for frameless window ---
     def mousePressEvent(self, event) -> None:  # type: ignore[override]
         if event.button() == Qt.MouseButton.LeftButton:
@@ -447,7 +500,7 @@ class RecorderWindow(QWidget):
     def keyPressEvent(self, event) -> None:  # type: ignore[override]
         # Qt.Key_Escape is a safety stop
         if event.key() == Qt.Key.Key_Escape:
-            self._on_stop_clicked()
+            self._worker.cancel()
         else:
             super().keyPressEvent(event)
@@ -458,14 +511,12 @@ def run_gui(
     user_prompt_file: Path | None = None,
     save_all: bool = False,
     outfile_prefix: str | None = None,
-    transcribe_mode: bool = False,
     log_level: str = "INFO",
 ) -> None:
     if cfg is None:
         cfg = Config.load(log_level=log_level)
     """
     Launch the PySide6 app with the minimal recorder window.
-    Supports transcribe_mode for pure transcription without prompt.
     """
     config.setup_environment(log_level=log_level)
@@ -485,7 +536,6 @@ def run_gui(
         user_prompt_file=user_prompt_file,
         save_all=save_all,
         outfile_prefix=outfile_prefix,
-        transcribe_mode=transcribe_mode,
     )
     window.show()
     app.exec()
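
In `RecorderWorker.run()` as changed by this diff, the worker polls `self.mode` every 50 ms until a button sets it. One possible alternative, sketched here with the standard library only, is to block on a `threading.Event`. `ModeGate`, its method names, and `worker` are hypothetical and do not appear in supervoxtral; the recording and Qt signal handling are left out entirely.

```python
import threading
from typing import Optional


class ModeGate:
    """Hand a mode choice ("transcribe" or "prompt") or a cancel to a worker thread."""

    def __init__(self) -> None:
        self._decided = threading.Event()
        self._mode: Optional[str] = None
        self._canceled = False

    def choose(self, mode: str) -> None:
        """Called from the UI thread when a mode button is clicked."""
        self._mode = mode
        self._decided.set()

    def cancel(self) -> None:
        """Called from the UI thread on Cancel, Esc, or window close."""
        self._canceled = True
        self._decided.set()

    def wait(self, timeout: Optional[float] = None) -> Optional[str]:
        """Block until a decision arrives; return the mode, or None on cancel or timeout."""
        if not self._decided.wait(timeout):
            return None
        return None if self._canceled else self._mode


def worker(gate: ModeGate, results: list) -> None:
    """Stand-in for the post-recording part of a worker's run()."""
    mode = gate.wait(timeout=5.0)
    results.append("canceled" if mode is None else f"process:{mode}")
```

With this shape the worker wakes as soon as a decision is made, and a cancel that arrives after recording has stopped still ends the wait instead of leaving the thread in the polling loop.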

supervoxtral-0.1.1/supervoxtral.png DELETED

Binary file

supervoxtral-0.1.1/svx/core/pipeline.py DELETED

@@ -1,260 +0,0 @@
-from __future__ import annotations
-import logging
-import tempfile
-import threading
-from collections.abc import Callable
-from logging import FileHandler
-from pathlib import Path
-from typing import Any
-import svx.core.config as config
-from svx.core.audio import convert_audio, record_wav, timestamp
-from svx.core.clipboard import copy_to_clipboard
-from svx.core.config import Config
-from svx.core.storage import save_transcript
-from svx.providers import get_provider
-class RecordingPipeline:
-    """
-    Centralized pipeline for recording audio, transcribing via provider, saving outputs,
-    and copying to clipboard. Handles temporary files when not keeping audio.
-    Supports runtime overrides like save_all for keeping all files and adding log handlers.
-    Optional progress_callback for status updates (e.g., for GUI).
-    Supports transcribe_mode for pure transcription without prompt using dedicated endpoint.
-    """
-    def __init__(
-        self,
-        cfg: Config,
-        user_prompt: str | None = None,
-        user_prompt_file: Path | None = None,
-        save_all: bool = False,
-        outfile_prefix: str | None = None,
-        progress_callback: Callable[[str], None] | None = None,
-        transcribe_mode: bool = False,
-    ) -> None:
-        self.cfg = cfg
-        self.user_prompt = user_prompt
-        self.user_prompt_file = user_prompt_file
-        self.save_all = save_all
-        self.outfile_prefix = outfile_prefix
-        self.progress_callback = progress_callback
-        self.transcribe_mode = transcribe_mode
-    def _status(self, msg: str) -> None:
-        """Emit status update via callback if provided."""
-        if self.progress_callback:
-            self.progress_callback(msg)
-        logging.info(msg)
-    def _setup_save_all(self) -> None:
-        """Apply save_all overrides: set keeps to True, create dirs, add file logging."""
-        if not self.save_all:
-            return
-        # Override config defaults
-        self.cfg.defaults.keep_audio_files = True
-        self.cfg.defaults.keep_transcript_files = True
-        self.cfg.defaults.keep_log_files = True
-        # Ensure directories
-        config.RECORDINGS_DIR.mkdir(parents=True, exist_ok=True)
-        config.TRANSCRIPTS_DIR.mkdir(parents=True, exist_ok=True)
-        config.LOGS_DIR.mkdir(parents=True, exist_ok=True)
-        # Add file handler if not present
-        root_logger = logging.getLogger()
-        if not any(isinstance(h, FileHandler) for h in root_logger.handlers):  # type: ignore[reportUnknownMemberType]
-            from svx.core.config import _get_log_level
-            log_level_int = _get_log_level(self.cfg.defaults.log_level)
-            formatter = logging.Formatter("%(asctime)s | %(levelname)s | %(name)s | %(message)s")
-            file_handler = logging.FileHandler(config.LOGS_DIR / "app.log", encoding="utf-8")
-            file_handler.setLevel(log_level_int)
-            file_handler.setFormatter(formatter)
-            root_logger.addHandler(file_handler)
-            logging.info("File logging enabled for this run")
-    def run(self, stop_event: threading.Event | None = None) -> dict[str, Any]:
-        """
-        Execute the full pipeline.
-        Args:
-            stop_event: Optional event to signal recording stop (e.g., for GUI).
-        Returns:
-            Dict with 'text' (str), 'raw' (dict), 'duration' (float),
-            'paths' (dict of Path or None).
-        Raises:
-            Exception: On recording, conversion, or transcription errors.
-        """
-        self._setup_save_all()
-        # Resolve parameters
-        provider = self.cfg.defaults.provider
-        audio_format = self.cfg.defaults.format
-        model = self.cfg.defaults.model
-        original_model = model
-        if self.transcribe_mode:
-            model = "voxtral-mini-latest"
-            if original_model != "voxtral-mini-latest":
-                logging.warning(
-                    "Transcribe mode: model overridden from '%s' to 'voxtral-mini-latest' "
-                    "(optimized for transcription).",
-                    original_model,
-                )
-        language = self.cfg.defaults.language
-        rate = self.cfg.defaults.rate
-        channels = self.cfg.defaults.channels
-        device = self.cfg.defaults.device
-        base = self.outfile_prefix or f"rec_{timestamp()}"
-        if self.transcribe_mode:
-            final_user_prompt = None
-            self._status("Mode Transcribe activated: no prompt used.")
-        else:
-            final_user_prompt = self.cfg.resolve_prompt(self.user_prompt, self.user_prompt_file)
-        keep_audio = self.cfg.defaults.keep_audio_files
-        keep_transcript = self.cfg.defaults.keep_transcript_files
-        copy_to_clip = self.cfg.defaults.copy
-        # Validation (fail fast)
-        if channels not in (1, 2):
-            raise ValueError("channels must be 1 or 2")
-        if rate <= 0:
-            raise ValueError("rate must be > 0")
-        if audio_format not in {"wav", "mp3", "opus"}:  # noqa: E501
-            raise ValueError("format must be one of wav|mp3|opus")
-        paths: dict[str, Path | None] = {}
-        stop_for_recording = stop_event or threading.Event()
-        try:
-            self._status("Recording...")
-            if keep_audio:
-                self.cfg.recordings_dir.mkdir(parents=True, exist_ok=True)
-                wav_path = self.cfg.recordings_dir / f"{base}.wav"
-                duration = record_wav(
-                    wav_path,
-                    samplerate=rate,
-                    channels=channels,
-                    device=device,
-                    stop_event=stop_for_recording,
-                )
-                to_send_path = wav_path
-                paths["wav"] = wav_path
-            else:
-                with tempfile.TemporaryDirectory() as tmpdir:
-                    tmp_path = Path(tmpdir)
-                    wav_path = tmp_path / f"{base}.wav"
-                    duration = record_wav(
-                        wav_path,
-                        samplerate=rate,
-                        channels=channels,
-                        device=device,
-                        stop_event=stop_for_recording,
-                    )
-                    to_send_path = wav_path
-                    # Convert if needed
-                    if audio_format in {"mp3", "opus"}:
-                        self._status("Converting...")
-                        to_send_path = convert_audio(wav_path, audio_format)
-                        logging.info("Converted %s -> %s", wav_path, to_send_path)
-                    # Transcribe
-                    self._status("Transcribing...")
-                    prov = get_provider(provider, cfg=self.cfg)
-                    result = prov.transcribe(
-                        to_send_path,
-                        user_prompt=final_user_prompt,
-                        model=model,
-                        language=language,
-                        transcribe_mode=self.transcribe_mode,
-                    )
-                    text = result["text"]
-                    raw = result["raw"]
-                    # Save if keeping transcripts
-                    if keep_transcript:
-                        self.cfg.transcripts_dir.mkdir(parents=True, exist_ok=True)
-                        txt_path, json_path = save_transcript(
-                            self.cfg.transcripts_dir, base, provider, text, raw
-                        )
-                        paths["txt"] = txt_path
-                        paths["json"] = json_path
-                    else:
-                        paths["txt"] = None
-                        paths["json"] = None
-                    # Copy to clipboard
-                    if copy_to_clip:
-                        try:
-                            copy_to_clipboard(text)
-                            logging.info("Copied transcription to clipboard")
-                        except Exception as e:
-                            logging.warning("Failed to copy to clipboard: %s", e)
-                    logging.info("Pipeline finished (%.2fs)", duration)
-                    return {
-                        "text": text,
-                        "raw": raw,
-                        "duration": duration,
-                        "paths": paths,
-                    }
-            # For keep_audio=True: continue outside tempdir
-            # Convert if needed
-            if audio_format in {"mp3", "opus"}:
-                self._status("Converting...")
-                to_send_path = convert_audio(wav_path, audio_format)
-                logging.info("Converted %s -> %s", wav_path, to_send_path)
-                paths["converted"] = to_send_path
-            # Transcribe
-            self._status("Transcribing...")
-            prov = get_provider(provider, cfg=self.cfg)
-            result = prov.transcribe(
-                to_send_path,
-                user_prompt=final_user_prompt,
-                model=model,
-                language=language,
-                transcribe_mode=self.transcribe_mode,
-            )
-            text = result["text"]
-            raw = result["raw"]
-            # Save if keeping transcripts
-            if keep_transcript:
-                self.cfg.transcripts_dir.mkdir(parents=True, exist_ok=True)
-                txt_path, json_path = save_transcript(
-                    self.cfg.transcripts_dir, base, provider, text, raw
-                )
-                paths["txt"] = txt_path
-                paths["json"] = json_path
-            else:
-                paths["txt"] = None
-                paths["json"] = None
-            # Copy to clipboard
-            if copy_to_clip:
-                try:
-                    copy_to_clipboard(text)
-                    logging.info("Copied transcription to clipboard")
-                except Exception as e:
-                    logging.warning("Failed to copy to clipboard: %s", e)
-            logging.info("Pipeline finished (%.2fs)", duration)
-            return {
-                "text": text,
-                "raw": raw,
-                "duration": duration,
-                "paths": paths,
-            }
-        except Exception:
-            logging.exception("Pipeline failed")
-            raise
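
The removed `run()` above repeats its convert/transcribe/save steps once inside the `tempfile.TemporaryDirectory()` branch and once outside it. As a minimal sketch only (not the package's actual refactoring), the conditional temporary directory could be handled with `contextlib.ExitStack` so the downstream steps are written once. The names `run_with_workdir`, `process`, and `fake_process` are hypothetical and do not appear in supervoxtral.

```python
import tempfile
from contextlib import ExitStack
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_workdir(
    keep_audio: bool,
    recordings_dir: Path,
    base: str,
    process: Callable[[Path], T],
) -> T:
    """Call process(wav_path) with a persistent or a temporary WAV location.

    Hypothetical helper: `process` stands in for the record, convert,
    transcribe, and save steps that the original method duplicates.
    """
    with ExitStack() as stack:
        if keep_audio:
            # Persistent location: create the directory and leave it in place.
            recordings_dir.mkdir(parents=True, exist_ok=True)
            workdir = recordings_dir
        else:
            # Temporary location: removed automatically when the stack exits.
            workdir = Path(stack.enter_context(tempfile.TemporaryDirectory()))
        return process(workdir / f"{base}.wav")


def fake_process(wav_path: Path) -> Path:
    """Stand-in for the real pipeline steps: write a stub file, return its path."""
    wav_path.write_bytes(b"RIFF")
    return wav_path
```

With `keep_audio=True` the file written by `process` survives the call; with `keep_audio=False` it is deleted together with the temporary directory, so `process` must finish everything that reads the audio before it returns.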