minidic 1.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17)

minidic-1.0.0/PKG-INFO +125 -0
minidic-1.0.0/README.md +100 -0
minidic-1.0.0/pyproject.toml +38 -0
minidic-1.0.0/src/minidic/__init__.py +1 -0
minidic-1.0.0/src/minidic/__main__.py +5 -0
minidic-1.0.0/src/minidic/audio.py +234 -0
minidic-1.0.0/src/minidic/daemon.py +238 -0
minidic-1.0.0/src/minidic/handlers.py +249 -0
minidic-1.0.0/src/minidic/hotkey.py +238 -0
minidic-1.0.0/src/minidic/inject.py +50 -0
minidic-1.0.0/src/minidic/main.py +81 -0
minidic-1.0.0/src/minidic/menubar.py +400 -0
minidic-1.0.0/src/minidic/runtime/__init__.py +1 -0
minidic-1.0.0/src/minidic/runtime/config.py +83 -0
minidic-1.0.0/src/minidic/runtime/process.py +125 -0
minidic-1.0.0/src/minidic/text_processing.py +109 -0
minidic-1.0.0/src/minidic/transcribe.py +211 -0

minidic-1.0.0/PKG-INFO ADDED

@@ -0,0 +1,125 @@
+Metadata-Version: 2.4
+Name: minidic
+Version: 1.0.0
+Summary: Voice dictation for macOS
+Keywords: dictation,speech-to-text,transcription,macOS,menubar
+Author: Yejun Su
+Author-email: Yejun Su <goofan.su@gmail.com>
+License-Expression: MIT
+Classifier: Environment :: Console
+Classifier: Environment :: MacOS X
+Classifier: Operating System :: MacOS :: MacOS X
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
+Requires-Dist: google-genai>=1.0.0
+Requires-Dist: parakeet-mlx>=0.5.1
+Requires-Dist: pyobjc-framework-quartz>=12.1
+Requires-Dist: sounddevice>=0.5.5
+Requires-Dist: soxr>=1.0.0
+Requires-Python: >=3.12
+Project-URL: Homepage, https://github.com/goofansu/minidic
+Project-URL: Issues, https://github.com/goofansu/minidic/issues
+Project-URL: Repository, https://github.com/goofansu/minidic
+Description-Content-Type: text/markdown
+# minidic
+A tiny **vibe coding** project for voice dictation on macOS — built as a personal, fast-iteration tool for local use on one machine (not a polished/distributed app).
+## Install
+`minidic` is published on PyPI for macOS users.
+```bash
+uv tool install minidic
+```
+To upgrade an existing install:
+```bash
+uv tool install --reinstall minidic
+```
+The first run will download `mlx-community/parakeet-tdt-0.6b-v3`.
+`uv tool` installs `minidic` to `~/.local/bin/minidic`.
+Make sure `~/.local/bin` is on your `PATH`.
+## Usage
+On first use, macOS will prompt for the permissions required by `minidic`. In general, you need to grant these permissions to the terminal app you use to run the commands:
+- **Microphone** — needed to capture live audio for dictation
+- **Accessibility** — needed to inject the transcribed text into the active app and handle global hotkeys in menu bar mode
+To use `--gemini`, set `GEMINI_API_KEY` in your environment before running `minidic`.
+### Console
+Run an interactive dictation session in the terminal. This records from your microphone, transcribes locally, and inserts the final text into the active app.
+```bash
+minidic console
+minidic console --gemini
+```
+### Transcribe
+Transcribe an existing audio file from disk instead of recording live microphone input.
+```bash
+minidic transcribe path/to/file.wav
+minidic transcribe --gemini path/to/file.wav
+```
+### Menubar
+Run `minidic` as a menu bar app with a background daemon and global `F5` hotkey for push-to-toggle dictation.
+```bash
+minidic menubar
+```
+![Menu bar icon (stopped)](screenshots/menubar-daemon-stopped.png)
+![Menu bar icon (running)](screenshots/menubar-daemon-started.png)
+1. Start the menu bar app.
+2. Optionally choose a max recording length from **Duration** in the menu.
+3. Click **Start daemon** (or **Stop daemon** to stop it).
+4. Press `F5` to start or stop dictation (the key is captured globally, so other apps will not receive `F5` while the daemon is running).
+## Technical overview
+`minidic` captures microphone audio, resamples it to 16 kHz when needed, and runs local speech-to-text with streaming-style decoding.
+### Models used
+- **ASR model:** `parakeet-mlx` for on-device audio transcription on Apple Silicon / MLX
+- **LLM model:** `gemini-3.1-flash-lite-preview` for optional transcript cleanup (thinking disabled)
+### High-level pipeline
+1. Capture mic audio with `sounddevice`
+2. Resample to 16 kHz with `soxr` (when needed)
+3. Transcribe with `parakeet-mlx` on-device
+4. Smooth transcription by default with local regex cleanup (remove filler words like `um`, `uh`, etc.)
+5. Further smooth with Gemini when `GEMINI_API_KEY` is set and Gemini mode is enabled (via `--gemini` for `console`/`transcribe`, or via the menu bar toggle)
+6. Inject text into the active app on macOS
+The daemon mode is hotkey-driven and lazily loads/unloads the model to reduce idle resource usage.
+### Directory structure
+```text
+~/.minidic/
+└── recordings/             # saved WAV recordings captured during dictation/transcription
+~/.local/state/minidic/
+├── config.json            # persisted runtime config such as Gemini and duration settings
+├── daemon.log             # daemon logs
+├── daemon.pid             # daemon process ID
+├── daemon.state           # current daemon state: idle, recording, transcribing
+├── menubar.log            # menu bar app logs
+└── menubar.pid            # menu bar process ID
+```
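
The directory layout above suggests that other tools could inspect the daemon by reading `daemon.state`. The following is a minimal sketch of that idea, not code from the package: the helper name `read_daemon_state` is hypothetical, and it assumes the file holds a single plain-text word (`idle`, `recording`, or `transcribing`), which the listing implies but does not confirm.

```python
from __future__ import annotations

from pathlib import Path

# States listed in the README's directory structure section.
KNOWN_STATES = {"idle", "recording", "transcribing"}


def default_state_dir() -> Path:
    """Location given in the README: ~/.local/state/minidic/."""
    return Path.home() / ".local" / "state" / "minidic"


def read_daemon_state(state_dir: Path | None = None) -> str | None:
    """Return the daemon state, or None if the file is missing or unrecognized.

    Assumption: daemon.state contains one plain-text state word.
    """
    directory = state_dir if state_dir is not None else default_state_dir()
    try:
        value = (directory / "daemon.state").read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None
    return value if value in KNOWN_STATES else None
```

Returning `None` for both a missing file and unexpected content keeps callers simple: anything other than a known state is treated as "daemon not available".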

minidic-1.0.0/README.md ADDED

@@ -0,0 +1,100 @@
+# minidic
+A tiny **vibe coding** project for voice dictation on macOS — built as a personal, fast-iteration tool for local use on one machine (not a polished/distributed app).
+## Install
+`minidic` is published on PyPI for macOS users.
+```bash
+uv tool install minidic
+```
+To upgrade an existing install:
+```bash
+uv tool install --reinstall minidic
+```
+The first run will download `mlx-community/parakeet-tdt-0.6b-v3`.
+`uv tool` installs `minidic` to `~/.local/bin/minidic`.
+Make sure `~/.local/bin` is on your `PATH`.
+## Usage
+On first use, macOS will prompt for the permissions required by `minidic`. In general, you need to grant these permissions to the terminal app you use to run the commands:
+- **Microphone** — needed to capture live audio for dictation
+- **Accessibility** — needed to inject the transcribed text into the active app and handle global hotkeys in menu bar mode
+To use `--gemini`, set `GEMINI_API_KEY` in your environment before running `minidic`.
+### Console
+Run an interactive dictation session in the terminal. This records from your microphone, transcribes locally, and inserts the final text into the active app.
+```bash
+minidic console
+minidic console --gemini
+```
+### Transcribe
+Transcribe an existing audio file from disk instead of recording live microphone input.
+```bash
+minidic transcribe path/to/file.wav
+minidic transcribe --gemini path/to/file.wav
+```
+### Menubar
+Run `minidic` as a menu bar app with a background daemon and global `F5` hotkey for push-to-toggle dictation.
+```bash
+minidic menubar
+```
+![Menu bar icon (stopped)](screenshots/menubar-daemon-stopped.png)
+![Menu bar icon (running)](screenshots/menubar-daemon-started.png)
+1. Start the menu bar app.
+2. Optionally choose a max recording length from **Duration** in the menu.
+3. Click **Start daemon** (or **Stop daemon** to stop it).
+4. Press `F5` to start or stop dictation (the key is captured globally, so other apps will not receive `F5` while the daemon is running).
+## Technical overview
+`minidic` captures microphone audio, resamples it to 16 kHz when needed, and runs local speech-to-text with streaming-style decoding.
+### Models used
+- **ASR model:** `parakeet-mlx` for on-device audio transcription on Apple Silicon / MLX
+- **LLM model:** `gemini-3.1-flash-lite-preview` for optional transcript cleanup (thinking disabled)
+### High-level pipeline
+1. Capture mic audio with `sounddevice`
+2. Resample to 16 kHz with `soxr` (when needed)
+3. Transcribe with `parakeet-mlx` on-device
+4. Smooth transcription by default with local regex cleanup (remove filler words like `um`, `uh`, etc.)
+5. Further smooth with Gemini when `GEMINI_API_KEY` is set and Gemini mode is enabled (via `--gemini` for `console`/`transcribe`, or via the menu bar toggle)
+6. Inject text into the active app on macOS
+The daemon mode is hotkey-driven and lazily loads/unloads the model to reduce idle resource usage.
+### Directory structure
+```text
+~/.minidic/
+└── recordings/             # saved WAV recordings captured during dictation/transcription
+~/.local/state/minidic/
+├── config.json            # persisted runtime config such as Gemini and duration settings
+├── daemon.log             # daemon logs
+├── daemon.pid             # daemon process ID
+├── daemon.state           # current daemon state: idle, recording, transcribing
+├── menubar.log            # menu bar app logs
+└── menubar.pid            # menu bar process ID
+```
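
Step 4 of the high-level pipeline in the README describes a local regex pass that removes filler words. The package's actual patterns live in `text_processing.py`, which is not shown in this part of the diff, so the sketch below is only an illustration of the technique: the filler list, the pattern, and the function name `strip_fillers` are all assumptions.

```python
import re

# Hypothetical filler list; the real patterns in text_processing.py may differ.
# \b on both sides keeps words such as "umbrella" intact.
_FILLERS = re.compile(r"\b(?:um+|uh+|erm+)\b[,.]?\s*", re.IGNORECASE)
_EXTRA_SPACES = re.compile(r"\s{2,}")


def strip_fillers(text: str) -> str:
    """Remove standalone filler words plus any comma or period attached to them."""
    cleaned = _FILLERS.sub("", text)
    # Collapse whitespace left behind by the removals.
    return _EXTRA_SPACES.sub(" ", cleaned).strip()


print(strip_fillers("Um, I think uh we should go"))
```

A regex pass like this is cheap and works offline, which would fit the README's description of Gemini cleanup as an optional second stage.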

minidic-1.0.0/pyproject.toml ADDED

@@ -0,0 +1,38 @@
+[project]
+name = "minidic"
+version = "1.0.0"
+description = "Voice dictation for macOS"
+readme = "README.md"
+license = "MIT"
+authors = [
+    { name = "Yejun Su", email = "goofan.su@gmail.com" }
+]
+requires-python = ">=3.12"
+keywords = ["dictation", "speech-to-text", "transcription", "macOS", "menubar"]
+classifiers = [
+    "Environment :: Console",
+    "Environment :: MacOS X",
+    "Operating System :: MacOS :: MacOS X",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.12",
+    "Topic :: Multimedia :: Sound/Audio :: Speech",
+]
+dependencies = [
+    "google-genai>=1.0.0",
+    "parakeet-mlx>=0.5.1",
+    "pyobjc-framework-quartz>=12.1",
+    "sounddevice>=0.5.5",
+    "soxr>=1.0.0",
+]
+[project.urls]
+Homepage = "https://github.com/goofansu/minidic"
+Repository = "https://github.com/goofansu/minidic"
+Issues = "https://github.com/goofansu/minidic/issues"
+[project.scripts]
+minidic = "minidic.main:main"
+[build-system]
+requires = ["uv_build>=0.9.15,<0.10.0"]
+build-backend = "uv_build"
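
The `[project.scripts]` entry above maps the `minidic` command to the string `minidic.main:main`, meaning "import the module `minidic.main` and call its `main` attribute". As a rough illustration of how a `module:attribute` reference of this kind is resolved, here is a simplified sketch; `resolve_entry_point` is a hypothetical helper name, and real installer-generated launchers differ in detail.

```python
from importlib import import_module
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Resolve a 'package.module:attr.path' reference to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj: Any = import_module(module_name.strip())
    # Walk dotted attributes after the colon, e.g. "Class.method".
    for name in filter(None, attr_path.strip().split(".")):
        obj = getattr(obj, name)
    return obj


# Demonstrated with a standard-library target so the sketch runs anywhere.
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```

With the package installed, `resolve_entry_point("minidic.main:main")` would return the same callable that `__main__.py` imports and invokes.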

minidic-1.0.0/src/minidic/__init__.py ADDED

@@ -0,0 +1 @@
+"""minidic — macOS voice dictation using parakeet-mlx."""

minidic-1.0.0/src/minidic/__main__.py ADDED

@@ -0,0 +1,5 @@
+"""Allow running as ``python -m minidic``."""
+from minidic.main import main
+main()

minidic-1.0.0/src/minidic/audio.py ADDED

@@ -0,0 +1,234 @@
+"""Audio capture from microphone via sounddevice."""
+from __future__ import annotations
+import logging
+import queue
+from types import TracebackType
+import numpy as np
+import sounddevice as sd
+import soxr
+logger = logging.getLogger(__name__)
+TARGET_RATE = 16_000  # What VAD/ASR expect
+CHANNELS = 1
+DTYPE = "int16"
+BLOCKSIZE = 512  # 32ms chunks at 16kHz
+def int16_to_float32(audio: np.ndarray) -> np.ndarray:
+    """Convert int16 audio samples to float32 in [-1, 1]."""
+    return audio.astype(np.float32) / 32768.0
+def _refresh_portaudio() -> None:
+    """Terminate and reinitialize PortAudio to pick up newly connected devices.
+    PortAudio captures the device list at initialization time.  Calling
+    Pa_Terminate / Pa_Initialize forces a fresh enumeration so that devices
+    connected after the process started (e.g. Bluetooth headsets, USB mics)
+    are visible to subsequent ``sd.query_devices`` / ``sd.InputStream`` calls.
+    Uses the private ``sd._terminate`` / ``sd._initialize`` API — there is no
+    public equivalent.  Both symbols have been stable since sounddevice 0.5.1
+    and the project pins ``sounddevice>=0.5.5``.  If either call raises, the
+    exception is re-raised so the caller receives a clear error rather than a
+    deferred cryptic PortAudio failure (a partial reinit — e.g. terminate
+    succeeded but initialize failed — leaves PortAudio uninitialized and must
+    not be silently swallowed).
+    """
+    try:
+        sd._terminate()
+        sd._initialize()
+    except Exception:
+        logger.warning("PortAudio reinit failed", exc_info=True)
+        raise
+def _get_device_samplerate(device: int | str | None) -> float:
+    """Query the native sample rate for the given input device."""
+    info = sd.query_devices(device, kind="input")
+    return float(info["default_samplerate"])
+class AudioStream:
+    """Captures audio from the microphone and pushes 16 kHz chunks to a queue.
+    If the device's native sample rate differs from 16 kHz, audio is
+    captured at the native rate and resampled with libsoxr.
+    Usage::
+        with AudioStream() as stream:
+            while True:
+                chunk = stream.read()  # np.ndarray int16, shape (blocksize,)
+                ...
+    Parameters
+    ----------
+    blocksize:
+        Number of *output* samples per chunk at 16 kHz (default 512 = 32 ms).
+    device:
+        Input device index or name.  ``None`` uses the system default.
+    """
+    def __init__(
+        self,
+        blocksize: int = BLOCKSIZE,
+        device: int | str | None = None,
+    ) -> None:
+        self.blocksize = blocksize
+        self.device = device
+        self._queue: queue.Queue[np.ndarray] = queue.Queue()
+        self._stream: sd.InputStream | None = None
+        # Determined at start() time
+        self._native_rate: float = 0
+        self._resampler: soxr.ResampleStream | None = None
+        self._resample_buf: np.ndarray = np.array([], dtype=np.float32)
+    # -- callback ----------------------------------------------------------
+    def _callback(
+        self,
+        indata: np.ndarray,
+        frames: int,
+        time_info: object,
+        status: sd.CallbackFlags,
+    ) -> None:
+        if status:
+            logger.warning("sounddevice status: %s", status)
+        # indata shape is (blocksize, 1) int16; flatten to (blocksize,).
+        raw = indata[:, 0].copy()
+        if self._resampler is not None:
+            # Resample to 16 kHz.  soxr expects float32/float64.
+            f32 = raw.astype(np.float32)
+            resampled = self._resampler.resample_chunk(f32)
+            # Buffer resampled samples and emit fixed-size chunks.
+            self._resample_buf = np.concatenate([self._resample_buf, resampled])
+            while len(self._resample_buf) >= self.blocksize:
+                chunk = self._resample_buf[: self.blocksize]
+                self._resample_buf = self._resample_buf[self.blocksize :]
+                # Convert back to int16 for consistency
+                self._queue.put_nowait(
+                    np.clip(chunk, -32768, 32767).astype(np.int16)
+                )
+        else:
+            self._queue.put_nowait(raw)
+    # -- internal helpers --------------------------------------------------
+    def _do_open(self) -> None:
+        """Query device info, configure resampling, and open the PortAudio stream.
+        Separated from ``start()`` so the try-on-failure retry in ``start()``
+        can call it without duplicating the setup logic.  Caller must ensure
+        ``self._stream is None`` before calling.
+        """
+        self._native_rate = _get_device_samplerate(self.device)
+        needs_resample = abs(self._native_rate - TARGET_RATE) > 1
+        if needs_resample:
+            self._resampler = soxr.ResampleStream(
+                self._native_rate,
+                TARGET_RATE,
+                num_channels=1,
+                dtype=np.float32,
+            )
+            self._resample_buf = np.array([], dtype=np.float32)
+            # Capture blocksize scaled to native rate
+            native_blocksize = int(self.blocksize * self._native_rate / TARGET_RATE)
+        else:
+            self._resampler = None
+            native_blocksize = self.blocksize
+        stream = sd.InputStream(
+            samplerate=self._native_rate,
+            blocksize=native_blocksize,
+            device=self.device,
+            channels=CHANNELS,
+            dtype=DTYPE,
+            callback=self._callback,
+        )
+        try:
+            stream.start()
+        except Exception:
+            stream.close()
+            raise
+        self._stream = stream
+        logger.info(
+            "Audio stream started  native_rate=%d  target_rate=%d  "
+            "blocksize=%d  resample=%s  device=%s",
+            int(self._native_rate),
+            TARGET_RATE,
+            self.blocksize,
+            needs_resample,
+            # Explicit None check: device index 0 is valid but falsy.
+            "default" if self.device is None else self.device,
+        )
+    # -- public API --------------------------------------------------------
+    def start(self) -> None:
+        """Open and start the audio stream.
+        On the first attempt the stream is opened against the current
+        PortAudio device list.  If that fails (e.g. the user connected a new
+        device since the process started and PortAudio's list is stale),
+        PortAudio is reinitialized — which forces CoreAudio to re-enumerate
+        devices — and the open is retried once.  The retry failure propagates
+        to the caller.
+        """
+        if self._stream is not None:
+            return
+        try:
+            self._do_open()
+        except Exception as exc:
+            logger.warning(
+                "Audio stream open failed (%s); reinitializing PortAudio and retrying",
+                exc,
+            )
+            _refresh_portaudio()
+            self._do_open()  # propagates on second failure
+    def stop(self) -> None:
+        """Stop and close the audio stream."""
+        if self._stream is None:
+            return
+        try:
+            self._stream.stop()
+        finally:
+            # Always release the PortAudio stream, even if stop() raises.
+            self._stream.close()
+            self._stream = None
+            self._resampler = None
+        logger.info("Audio stream stopped")
+    def read(self, timeout: float | None = None) -> np.ndarray:
+        """Block until the next audio chunk is available.
+        Returns an int16 numpy array of shape ``(blocksize,)``.
+        Raises ``queue.Empty`` if *timeout* expires.
+        """
+        return self._queue.get(timeout=timeout)
+    @property
+    def queue(self) -> queue.Queue[np.ndarray]:
+        """Direct access to the underlying chunk queue."""
+        return self._queue
+    # -- context manager ---------------------------------------------------
+    def __enter__(self) -> AudioStream:
+        self.start()
+        return self
+    def __exit__(
+        self,
+        exc_type: type[BaseException] | None,
+        exc_val: BaseException | None,
+        exc_tb: TracebackType | None,
+    ) -> None:
+        self.stop()
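
Two pieces of arithmetic in `audio.py` are easy to miss: `_do_open` scales the capture block size to the device's native rate, and `_callback` buffers resampled audio so that consumers always receive chunks of exactly `blocksize` samples. The sketch below reproduces both ideas with plain Python lists so it runs without `numpy`, `soxr`, or an audio device. The function names `native_blocksize` and `rechunk` are hypothetical and do not appear in the package.

```python
from __future__ import annotations

TARGET_RATE = 16_000  # same value as the constant in audio.py


def native_blocksize(blocksize: int, native_rate: float) -> int:
    """Frames to capture per callback so the resampled output is about `blocksize` samples."""
    return int(blocksize * native_rate / TARGET_RATE)


def rechunk(
    pending: list[int], incoming: list[int], blocksize: int
) -> tuple[list[list[int]], list[int]]:
    """Append `incoming` to `pending` and split off every complete chunk.

    Returns (complete_chunks, leftover); the leftover is carried into the next call,
    mirroring how the callback keeps a remainder in its resample buffer.
    """
    buf = pending + incoming
    n_full = len(buf) // blocksize
    chunks = [buf[i * blocksize : (i + 1) * blocksize] for i in range(n_full)]
    return chunks, buf[n_full * blocksize :]


# A 44.1 kHz device needs 1411 input frames for one 512-sample block at 16 kHz.
print(native_blocksize(512, 44_100))

# 700 resampled samples yield one full 512-sample chunk; 188 samples wait for the next call.
chunks, leftover = rechunk([], list(range(700)), 512)
print(len(chunks), len(leftover))
```

Because the resampler's output length per callback is not guaranteed to equal `blocksize`, carrying the remainder forward is what keeps the documented `(blocksize,)` shape of `read()` stable.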