PyPI: supervoxtral 0.1.0 (tar.gz) → 0.1.1 (tar.gz)

Files changed (27)

{supervoxtral-0.1.0 → supervoxtral-0.1.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: supervoxtral
-Version: 0.1.0
+Version: 0.1.1
 Summary: CLI/GUI audio recorder and transcription client using Mistral Voxtral (chat with audio and transcription).
 License: MIT
 License-File: LICENSE

supervoxtral-0.1.1/README.md ADDED

@@ -0,0 +1,235 @@
+# supervoxtral
+![Supervoxtral](supervoxtral.png)
+SuperVoxtral is a lightweight Python CLI/GUI utility for recording microphone audio and sending it to Mistral's Voxtral APIs for transcription or audio-enabled chat.
+Voxtral models, such as `voxtral-mini-latest` and `voxtral-small-latest`, deliver fast inference times, high transcription accuracy across languages and accents, and minimal API costs. In contrast to OpenAI's Whisper, which performs only standalone transcription, Voxtral supports two modes: pure transcription via a dedicated endpoint (no prompts needed) or chat mode, where audio input combines with text prompts for refined outputs—like error correction or contextual summarization—without invoking a separate LLM.
+For instance, use a prompt like: "_Transcribe this audio precisely and remove all minor speech hesitations: "um", "uh", "er", "euh", "ben", etc._"
+The GUI is minimal, launches fast, and can be bound to a system hotkey. Upon stopping recording, it transcribes via the pipeline and copies the result directly to the system clipboard, enabling efficient voice-driven workflows: e.g., dictating code snippets into an IDE or prompting LLMs via audio without typing.
+## Requirements
+- Python 3.11+
+- ffmpeg (for MP3/Opus conversions)
+  - macOS: `brew install ffmpeg`
+  - Ubuntu/Debian: `sudo apt-get install ffmpeg`
+  - Windows: https://ffmpeg.org/download.html
+## Installation
+The package is available on PyPI. We recommend using `uv` (a fast Python package installer) for a simple, global tool installation—no virtual environment setup required.
+- For core CLI functionality:
+  ```
+  uv tool install supervoxtral
+  ```
+- For GUI support (includes PySide6):
+  ```
+  uv tool install "supervoxtral[gui]"
+  ```
+This installs the `svx` command globally. If you don't have `uv`, install it first via `curl -LsSf https://astral.sh/uv/install.sh | sh` (or from https://docs.astral.sh/uv/getting-started/installation/).
+**Alternative: Using pip with a virtual environment**
+If you prefer not to use uv, you can install via pip in a virtual environment:
+1. Create and activate a virtual environment:
+   - macOS/Linux:
+     ```
+     python -m venv .venv
+     source .venv/bin/activate
+     ```
+   - Windows (PowerShell):
+     ```
+     python -m venv .venv
+     .\.venv\Scripts\Activate.ps1
+     ```
+2. Install the package:
+   ```
+   pip install supervoxtral
+   ```
+   For GUI support (includes PySide6):
+   ```
+   pip install supervoxtral[gui]
+   ```
+This installs the `svx` command within the virtual environment. Make sure to activate the environment before running `svx`.
+**For development** (local editing):
+1. Clone the repo and navigate to the project root.
+2. Create/activate a virtual environment:
+   - macOS/Linux: `python -m venv .venv && source .venv/bin/activate`
+   - Windows: `python -m venv .venv && .\.venv\Scripts\Activate.ps1`
+3. Install in editable mode: `pip install -e .` (or `pip install -e ".[dev]"` for dev tools).
+## Quick Start
+To get started quickly with SuperVoxtral:
+1. Initialize the configuration: `svx config init`
+   This creates the default `config.toml` file with zero-footprint settings.
+2. Open the configuration directory: `svx config open`
+   Edit `config.toml` and add your [Mistral API key](https://console.mistral.ai/api-keys) under the `[providers.mistral]` section:
+   ```
+   [providers.mistral]
+   api_key = "your_mistral_api_key_here"
+   ```
+3. Launch the GUI for transcription: `svx record --gui --transcribe`
+   This opens the minimal GUI, starts recording on launch, and transcribes the audio upon stopping (results copied to clipboard).
+### macOS Shortcuts Integration
+To enable fast, hotkey-driven access on macOS, integrate SuperVoxtral with the Shortcuts app. Create a new Shortcut that runs `svx record --gui` via a "Run Shell Script" action (ensure `svx` is in your PATH). Assign a global hotkey in Shortcuts settings for instant GUI launch—ideal for quick voice-to-text workflows, with results copied directly to the clipboard.
+#### Quick Setup Steps
+1. Open the Shortcuts app and create a new shortcut.
+2. Add the "Run Shell Script" action with input: `svx record --gui`.
+3. In shortcut details, set a keyboard shortcut (e.g., Cmd+Shift+V).
+![macOS Shortcut Setup](macos-shortcut.png)
+## Configuration (API keys and prompts)
+API keys and default behavior are configured only in your user configuration file (config.toml), not via environment variables.
+- Location of the user config:
+  - macOS: ~/Library/Application Support/SuperVoxtral/config.toml
+  - Linux: ${XDG_CONFIG_HOME:-~/.config}/supervoxtral/config.toml
+  - Windows: %APPDATA%/SuperVoxtral/config.toml
+- Initialize your user config and user prompt file:
+  - `svx config init`: Creates config.toml (with sensible defaults, including zero-footprint mode) and a user prompt file at: `~/Library/Application Support/SuperVoxtral/prompt/user.md` (macOS), `${XDG_CONFIG_HOME:-~/.config}/supervoxtral/prompt/user.md` (Linux), or `%APPDATA%/SuperVoxtral/prompt/user.md` (Windows).
+  - `svx config open`: Opens the directory.
+  - `svx config show`: Displays the current configuration.
+Here's an example of the default `config.toml` generated by `svx config init`:
+```toml
+# SuperVoxtral - User configuration
+#
+# Basics:
+# - This configuration controls the default behavior of `svx record`.
+# - The parameters below override the binary's built-in defaults.
+# - You can override a few options at runtime via the CLI:
+#   --prompt / --prompt-file (set a one-off prompt for this run)
+#   --log-level (debugging)
+#   --outfile-prefix (one-off output naming)
+#
+# Output persistence:
+# - Set keep_* = true to create and save files to project
+#   directories (recordings/, transcripts/, logs/).
+# - false (default): use temp files/console only (no disk
+#   footprint in project dir).
+#
+# Authentication:
+# - API keys are defined in provider-specific sections in this file.
+[providers.mistral]
+# api_key = ""
+[defaults]
+# Provider to use (currently supported: "mistral")
+provider = "mistral"
+# File format sent to the provider: "wav" | "mp3" | "opus"
+# Recording is always WAV; conversion is applied if "mp3" or "opus"
+format = "opus"
+# Model to use on the provider side (example for Mistral Voxtral)
+model = "voxtral-mini-latest"
+# Language hint (may help the provider)
+language = "fr"
+# Audio recording parameters
+rate = 16000
+channels = 1
+device = ""
+# Output persistence:
+# - keep_audio_files: false uses temp files (no recordings/ dir),
+#   true saves to recordings/
+keep_audio_files = false
+# - keep_transcript_files: false prints/copies only (no
+#   transcripts/ dir), true saves to transcripts/
+keep_transcript_files = false
+# - keep_log_files: false console only (no logs/ dir), true
+#   saves to logs/app.log
+keep_log_files = false
+# Automatically copy the transcribed text to the system clipboard
+copy = true
+# Log level: "DEBUG" | "INFO" | "WARNING" | "ERROR"
+log_level = "INFO"
+[prompt]
+# Default user prompt source:
+# - Option 1: Use a file (recommended)
+file = "~/.config/supervoxtral/prompt/user.md"
+#
+# - Option 2: Inline prompt (less recommended for long text)
+# text = "Please transcribe the audio and provide a concise summary in French."
+```
+**Configuration is centralized via a structured `Config` object loaded from your user configuration file (`config.toml`). CLI arguments override select values (e.g., prompt, log level), but most defaults (provider, model, keep flags) come from `config.toml`. No environment variables are used for API keys or settings.**
+No `.env` or shell environment variables are used for API keys.
+## Usage (CLI)
+The CLI provides config utilities and a unified `record` entrypoint for both CLI and GUI modes, using a centralized pipeline for consistent behavior (recording, conversion, transcription, saving, clipboard copy, logging).
+**Zero-footprint defaults**: No directories created; outputs to console/clipboard. Use `--save-all` or set `keep_* = true` in config.toml for persistence.
+Most defaults (provider, format, model, language, rate, channels, device, keep flags, copy) come from config.toml. CLI overrides are limited to specific options.
+### Record Command
+```
+svx record [OPTIONS]
+```
+**Options**:
+- `--user-prompt TEXT` (or `--prompt TEXT`): Inline user prompt for this run.
+- `--user-prompt-file PATH` (or `--prompt-file PATH`): Path to a markdown file with the user prompt.
+- `--transcribe`: Enable pure transcription mode (ignores prompts; uses dedicated endpoint).
+- `--outfile-prefix PREFIX`: Custom prefix for output files (default: timestamp).
+- `--gui`: Launch the GUI frontend (respects config and other CLI options).
+- `--save-all`: Override config to keep audio, transcripts, and logs for this run.
+- `--log-level LEVEL`: Set logging level (DEBUG, INFO, WARNING, ERROR; default: INFO).
+**Examples**:
+- Record with prompt: `svx record --prompt "What's in this audio?"`
+  - Records WAV, converts if needed, sends to provider with prompt, outputs to console/clipboard.
+- Persist outputs: `svx record --save-all --prompt "Summarize this"`
+  - Saves to recordings/, transcripts/, logs/.
+- Transcribe only: `svx record --transcribe`
+  - No prompt; direct transcription. Add `--save-all` to persist.
+- Launch GUI: `svx record --gui`
+  - GUI respects config.toml and CLI flags (e.g., `--gui --save-all`).
+**Prompt Resolution Priority** (for non-transcribe mode):
+1. CLI `--user-prompt` or `--user-prompt-file`
+2. config.toml [prompt] section (text or file)
+3. User prompt file (user.md in config dir)
+4. Fallback: "What's in this audio?"
+## Changelog
+- 0.1.1: Minor updates to default config and default prompt
+## License
+MIT

supervoxtral-0.1.1/macos-shortcut.png ADDED

Binary file

{supervoxtral-0.1.0 → supervoxtral-0.1.1}/notes.md RENAMED

@@ -1,5 +1,6 @@
 todo
+- Bug config device
 - recording location in config
 - paste directly?
 - clean up XML response (option)

{supervoxtral-0.1.0 → supervoxtral-0.1.1}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "supervoxtral"
-version = "0.1.0"
+version = "0.1.1"
 description = "CLI/GUI audio recorder and transcription client using Mistral Voxtral (chat with audio and transcription)."
 requires-python = ">=3.11"
 license = { text = "MIT" }

supervoxtral-0.1.1/supervoxtral.png ADDED

Binary file

{supervoxtral-0.1.0 → supervoxtral-0.1.1}/svx/core/config.py RENAMED

@@ -227,7 +227,7 @@ def init_user_config(force: bool = False, prompt_file: Path | None = None) -> Pa
         "# Audio recording parameters\n"
         "rate = 16000\n"
         "channels = 1\n"
-        'device = ""\n\n'
+        '#device = ""\n\n'
         "# Output persistence:\n"
         "# - keep_audio_files: false uses temp files (no recordings/ dir),\n"
         "#   true saves to recordings/\n"

{supervoxtral-0.1.0 → supervoxtral-0.1.1}/svx/core/prompt.py RENAMED

@@ -152,7 +152,7 @@ def init_user_prompt_file(force: bool = False) -> Path:
         example_prompt = """
 - Transcribe the input audio file.
 - Do not respond to any question in the audio. Just transcribe.
-- DO NOT TRANSLATE. Your transcription will be in the speaker's language.
+- DO NOT TRANSLATE.
 - Respond only with the transcription. Do not provide explanations or notes.
 - Remove all minor speech hesitations: "um", "uh", "er", "euh", "ben", etc.
 - Remove false starts (e.g., "je veux dire... je pense" → "je pense").

supervoxtral-0.1.0/README.md DELETED

@@ -1,237 +0,0 @@
-# supervoxtral
-A simple Python CLI/GUI tool to record audio from your microphone, optionally convert it (WAV/MP3/Opus), and send it to Mistral Voxtral transcription/chat APIs.
----
-## Requirements
-- Python 3.11+
-- ffmpeg (for MP3/Opus conversions)
-  - macOS: `brew install ffmpeg`
-  - Ubuntu/Debian: `sudo apt-get install ffmpeg`
-  - Windows: https://ffmpeg.org/download.html
----
-## Installation
-1) Create and activate a virtual environment (example with venv):
-- macOS/Linux:
-  ```
-  python -m venv .venv
-  source .venv/bin/activate
-  ```
-- Windows (PowerShell):
-  ```
-  python -m venv .venv
-  .\.venv\Scripts\Activate.ps1
-  ```
-2) Install the package (editable mode during development is convenient):
-```
-pip install -e .
-```
-Optional extras:
-- Dev tools:
-  ```
-  pip install -e ".[dev]"
-  ```
----
-## Configuration (API keys and prompts)
-API keys and default behavior are configured only in your user configuration file (config.toml), not via environment variables.
-- Location of the user config:
-  - macOS: ~/Library/Application Support/SuperVoxtral/config.toml
-  - Linux: ${XDG_CONFIG_HOME:-~/.config}/supervoxtral/config.toml
-  - Windows: %APPDATA%/SuperVoxtral/config.toml
-- Initialize your user config and user prompt file:
-  ```
-  svx config init
-  ```
-  This creates:
-  - config.toml (with sensible defaults, including zero-footprint mode)
-  - a user prompt file at: ~/Library/Application Support/SuperVoxtral/prompt/user.md (macOS)
-    - Linux: ${XDG_CONFIG_HOME:-~/.config}/supervoxtral/prompt/user.md
-    - Windows: %APPDATA%/SuperVoxtral/prompt/user.md
-**Key config sections (edit `config.toml`):**
-- **[defaults]**: provider (e.g., "mistral"), model, format (e.g., "opus"), language, rate, channels, device, copy (clipboard), keep_audio_files = false, keep_transcript_files = false, keep_log_files = false.
-  - Zero-footprint mode (defaults): When `keep_* = false`, files are handled in OS temporary directories (auto-cleaned, no project dirs created). Set to `true` for persistence (creates `recordings/`, etc.).
-- **[providers.mistral]**: api_key = "your_mistral_key_here", model (e.g., "voxtral-small-latest").
-- **[prompt]**: text (inline prompt), file (path to prompt.md).
-  - Resolution priority: CLI `--prompt`/`--prompt-file` > config.toml [prompt] > user.md fallback > "What's in this audio?".
-**Configuration is centralized via a structured `Config` object loaded from your user configuration file (`config.toml`). CLI arguments override select values (e.g., prompt, log level), but most defaults (provider, model, keep flags) come from `config.toml`. No environment variables are used for API keys or settings.**
-No `.env` or shell environment variables are used for API keys.
----
-## Usage (CLI)
-Make sure your virtual environment is activated and the project is installed (`pip install -e .`).
-General command form:
-```
-svx record [OPTIONS]
-```
-**Unified entrypoint**: `svx record` handles both CLI and GUI modes via a centralized pipeline (`svx.core.pipeline.RecordingPipeline`). This ensures consistent behavior for recording, conversion, transcription, saving, clipboard copy, and logging across CLI and GUI.
-**Zero-footprint defaults**: No directories created; outputs to console/clipboard. Use `--save-all` or config `keep_* = true` for persistence.
-Note: the CLI now exposes a single recording entrypoint. Use `svx record --gui` to launch the GUI frontend. Most defaults (provider, format, model, language, rate, channels, device, keep_audio_files, copy) are configured via your user config (config.toml). The CLI only supports one-off overrides for: --prompt/--prompt-file, --log-level, --outfile-prefix, --gui, --save-all, --transcribe.
-Planned MVP commands:
-- Record with Mistral Voxtral (chat with audio) and a prompt (provider/format from config):
-  ```
-  svx record --prompt "What's in this file?"
-  ```
-  Tip: Outputs to console and clipboard (if copy=true in config). No files saved unless overridden.
-  Persist all outputs (one-off override):
-  ```
-  svx record --save-all --prompt "What's in this file?"
-  ```
-  Creates `recordings/`, `transcripts/`, `logs/` and saves files/logs.
-- Pure transcription mode with Mistral Voxtral (no prompt, dedicated endpoint):
-  ```
-  svx record --transcribe
-  ```
-  Note: Prompts are ignored in this mode. Combine with --save-all for persistence:
-  ```
-  svx record --transcribe --save-all
-  ```
-  To start the GUI frontend:
-  ```
-  svx record --gui
-  ```
-  The GUI uses the same pipeline and respects config + CLI overrides (e.g., `--gui --save-all` propagates persistence).
-  The CLI defaults have been unified to favour the previous GUI defaults (e.g. `--format opus`, `--copy` enabled, and `--no-keep-audio-files` by default). The final effective values still respect the precedence: CLI explicit > user config defaults (config.toml) > built-in defaults.
-### Advanced prompt management
-You can provide a user prompt, either inline or via a file:
-#### User prompt (inline)
-```
-svx record --user-prompt "Transcris puis résume ce qui est dit dans l'audio."
-```
-#### User prompt from file
-```
-svx record --user-prompt-file ~/Library/Application\ Support/SuperVoxtral/prompt/user.md
-```
-(Adjust the path for your OS; see “Configuration” for locations.)
-#### Resolution priority (no concatenation)
-Order of precedence for determining the final prompt:
-1) `--user-prompt` (inline)
-2) `--user-prompt-file` (explicit file)
-3) `config.toml` → `[prompt].text`
-4) `config.toml` → `[prompt].file`
-5) User prompt file in your user config dir (`.../SuperVoxtral/prompt/user.md`)
-6) Default fallback: "What's in this audio?"
-Note: the file and inline prompts are not concatenated; the first non-empty source wins. Uses `Config.resolve_prompt()` for unified resolution across CLI/GUI.
-If no user prompt is provided (by any of the above), it defaults to "What's in this audio?".
-A single user message is sent containing the audio and (optionally) text.
-  Flow:
-  - Starts recording WAV immediately.
-  - Press Enter to stop recording.
-  - Converts WAV to MP3 (if `--format mp3`) or Opus (if `--format opus`).
-  - Sends the audio to Mistral Voxtral as base64 input_audio plus your text prompt.
-  - Prints and saves the response to `transcripts/` (if keep_transcript_files=true or --save-all).
-  Flow:
-  - Starts recording WAV.
-  - Press Enter to stop.
-  - Sends the audio to Voxtral (transcription).
-  - Prints and saves the transcript.
-Config-driven options (set these in config.toml under [defaults]):
-- rate, channels, device
-- provider, model, format, language
-- keep_audio_files, copy
-One-off CLI overrides:
-- `--outfile-prefix mynote_2025-09-09` (custom file prefix)
-- `--log-level debug` (verbose logs)
-- `--user-prompt` (alias: `--prompt`; user prompt text, inline)
-- `--user-prompt-file` (alias: `--prompt-file`; path to user prompt markdown file in your user config dir)
-- `--transcribe` (pure transcription mode, ignores prompts)
-Alternative invocation (without console script):
-```
-python -m svx.cli record --prompt "..."
-```
----
-## Provider details
-### Mistral Voxtral (chat with audio)
-- Model: `voxtral-small-latest` by default (configurable)
-- API: `mistralai` Python client
-- Request structure:
-  - Messages with `content` array containing:
-    - `{ "type": "input_audio", "input_audio": "<base64>" }`
-    - `{ "type": "text", "text": "<prompt>" }`
-- Output: text content from the chat response; saved to `transcripts/`.
-Recommended formats:
-- Opus reduces file size and upload time.
-Authentication:
-- Mistral: key read from `Config` (user config at `providers.mistral.api_key`).
----
-## Recording formats and conversion
-- Recording happens in WAV (PCM 16-bit, mono, 16k/32k Hz).
-- Optional conversion via ffmpeg:
-  - WAV -> MP3:
-    ```
-    ffmpeg -y -i input.wav -codec:a libmp3lame -q:a 3 output.mp3
-    ```
-  - WAV -> Opus:
-    ```
-    ffmpeg -y -i input.wav -c:a libopus -b:a 24k output.opus
-    ```
-The tool will send the converted file if you set `--format mp3` or `--format opus`; otherwise it sends the raw WAV.
----
-## macOS notes
-- Microphone permission: on first run, macOS will ask for microphone access. Approve it in System Settings > Privacy & Security > Microphone if needed.
-- If you face issues with device selection, we will add a `--device` flag to choose a specific input device.
----
-## License
-MIT
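
The "Recording formats and conversion" section of the removed 0.1.0 README lists two ffmpeg commands for converting the recorded WAV. The sketch below assembles the same argument lists in Python. It is an illustration only: the helper name `build_ffmpeg_command` is hypothetical, and the diff does not show how the package itself invokes ffmpeg.

```python
from pathlib import Path

# Codec arguments copied from the two commands in the 0.1.0 README.
CODEC_ARGS = {
    "mp3": ["-codec:a", "libmp3lame", "-q:a", "3"],
    "opus": ["-c:a", "libopus", "-b:a", "24k"],
}


def build_ffmpeg_command(wav_path: Path, fmt: str) -> list[str]:
    """Build the ffmpeg argv for converting a WAV file to MP3 or Opus.

    Hypothetical helper: the output file sits next to the input and only
    its suffix changes.
    """
    if fmt not in CODEC_ARGS:
        raise ValueError(f"unsupported format: {fmt!r} (expected 'mp3' or 'opus')")
    out_path = wav_path.with_suffix(f".{fmt}")
    return ["ffmpeg", "-y", "-i", str(wav_path), *CODEC_ARGS[fmt], str(out_path)]


print(build_ffmpeg_command(Path("input.wav"), "opus"))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed. Using an argument list rather than a shell string avoids quoting problems with paths that contain spaces.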