susurro 0.4.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. susurro-0.4.0/.github/ISSUE_TEMPLATE/bug_report.md +39 -0
  2. susurro-0.4.0/.github/ISSUE_TEMPLATE/config.yml +5 -0
  3. susurro-0.4.0/.github/ISSUE_TEMPLATE/feature_request.md +22 -0
  4. susurro-0.4.0/.github/PULL_REQUEST_TEMPLATE.md +18 -0
  5. susurro-0.4.0/.github/workflows/ci.yml +39 -0
  6. susurro-0.4.0/.gitignore +41 -0
  7. susurro-0.4.0/CHANGELOG.md +84 -0
  8. susurro-0.4.0/CODE_OF_CONDUCT.md +7 -0
  9. susurro-0.4.0/CONTRIBUTING.md +58 -0
  10. susurro-0.4.0/LICENSE +21 -0
  11. susurro-0.4.0/PKG-INFO +233 -0
  12. susurro-0.4.0/README.md +194 -0
  13. susurro-0.4.0/SECURITY.md +36 -0
  14. susurro-0.4.0/install.sh +120 -0
  15. susurro-0.4.0/pyproject.toml +90 -0
  16. susurro-0.4.0/scripts/generate_icons.py +77 -0
  17. susurro-0.4.0/scripts/run.sh +9 -0
  18. susurro-0.4.0/scripts/test_mic.py +50 -0
  19. susurro-0.4.0/susurro/__init__.py +3 -0
  20. susurro-0.4.0/susurro/__main__.py +6 -0
  21. susurro-0.4.0/susurro/app.py +387 -0
  22. susurro-0.4.0/susurro/audio.py +107 -0
  23. susurro-0.4.0/susurro/backends/__init__.py +77 -0
  24. susurro-0.4.0/susurro/backends/base.py +50 -0
  25. susurro-0.4.0/susurro/backends/local_mlx.py +61 -0
  26. susurro-0.4.0/susurro/backends/local_mlx_lm.py +76 -0
  27. susurro-0.4.0/susurro/config.py +67 -0
  28. susurro-0.4.0/susurro/hotkey.py +62 -0
  29. susurro-0.4.0/susurro/icons/idle.png +0 -0
  30. susurro-0.4.0/susurro/icons/processing.png +0 -0
  31. susurro-0.4.0/susurro/icons/recording.png +0 -0
  32. susurro-0.4.0/susurro/indicator.py +249 -0
  33. susurro-0.4.0/susurro/logging_config.py +20 -0
  34. susurro-0.4.0/susurro/permissions.py +29 -0
  35. susurro-0.4.0/susurro/polish/__init__.py +97 -0
  36. susurro-0.4.0/susurro/polish/prompt.py +61 -0
  37. susurro-0.4.0/susurro/polish/rules.py +49 -0
  38. susurro-0.4.0/susurro/polish/triggers.py +42 -0
  39. susurro-0.4.0/susurro/typer.py +64 -0
  40. susurro-0.4.0/tests/test_polish.py +94 -0
  41. susurro-0.4.0/tests/test_smoke.py +87 -0
@@ -0,0 +1,39 @@
1
+ ---
2
+ name: Bug report
3
+ about: Something is broken or behaving unexpectedly.
4
+ title: "[bug] "
5
+ labels: ["bug"]
6
+ ---
7
+
8
+ **What happened**
9
+
10
+ A clear and concise description of the bug.
11
+
12
+ **Expected behavior**
13
+
14
+ What you thought would happen.
15
+
16
+ **Steps to reproduce**
17
+
18
+ 1.
19
+ 2.
20
+ 3.
21
+
22
+ **Environment**
23
+
24
+ - macOS version + chip (e.g. `macOS 26.0 / M3 Pro`):
25
+ - Python version (`python --version`):
26
+ - Susurro version (`pip show susurro | grep Version`):
27
+ - Model in use (from `susurro/config.py`):
28
+
29
+ **Logs**
30
+
31
+ The last ~50 lines of `~/.susurro/susurro.log`:
32
+
33
+ ```
34
+ paste log output here
35
+ ```
36
+
37
+ **Additional context**
38
+
39
+ Anything else worth knowing.
@@ -0,0 +1,5 @@
1
+ blank_issues_enabled: false
2
+ contact_links:
3
+ - name: Question or discussion
4
+ url: https://github.com/danilobrando/susurro/discussions
5
+ about: For usage questions, ideas, and conversations. Use Issues only for bugs and concrete feature requests.
@@ -0,0 +1,22 @@
1
+ ---
2
+ name: Feature request
3
+ about: Suggest a new capability or improvement.
4
+ title: "[feature] "
5
+ labels: ["enhancement"]
6
+ ---
7
+
8
+ **What you'd like**
9
+
10
+ A clear description of the feature.
11
+
12
+ **Why it matters**
13
+
14
+ The use case this unlocks. Be concrete: "I want to dictate code comments in VS Code while my hands stay on the keyboard" is more useful than "more flexibility."
15
+
16
+ **Alternatives considered**
17
+
18
+ What you've tried instead, or other tools that solve this.
19
+
20
+ **Scope check**
21
+
22
+ Susurro aims to stay under ~800 lines of Python and avoid feature creep. Does this fit? If it adds significant surface area, what could be left out?
@@ -0,0 +1,18 @@
1
+ ## Summary
2
+
3
+ What this PR does and why.
4
+
5
+ ## Changes
6
+
7
+ -
8
+
9
+ ## Test plan
10
+
11
+ - [ ] `ruff check .` passes
12
+ - [ ] `ruff format --check .` passes
13
+ - [ ] Manually exercised the affected code paths on macOS
14
+ - [ ] If touching the daemon: held the hotkey, spoke, confirmed text was pasted at the cursor
15
+
16
+ ## Notes for reviewer
17
+
18
+ Anything non-obvious: design tradeoffs, edge cases considered, things deliberately left out.
@@ -0,0 +1,39 @@
1
+ name: CI
2
+
3
+ on:
4
+ push:
5
+ branches: [main]
6
+ pull_request:
7
+ branches: [main]
8
+
9
+ jobs:
10
+ lint:
11
+ name: Lint + format
12
+ runs-on: macos-14
13
+ steps:
14
+ - uses: actions/checkout@v4
15
+ - uses: actions/setup-python@v5
16
+ with:
17
+ python-version: "3.12"
18
+ - name: Install ruff
19
+ run: pip install ruff
20
+ - name: ruff check
21
+ run: ruff check .
22
+ - name: ruff format --check
23
+ run: ruff format --check .
24
+
25
+ smoke:
26
+ name: Smoke import test
27
+ runs-on: macos-14
28
+ steps:
29
+ - uses: actions/checkout@v4
30
+ - uses: actions/setup-python@v5
31
+ with:
32
+ python-version: "3.12"
33
+ - name: Install package
34
+ # We can't actually install mlx on CI runners reliably (Apple Silicon
35
+ # availability varies), so we just verify the package metadata is sound
36
+ # and imports that don't need mlx work.
37
+ run: pip install --no-deps -e .
38
+ - name: Verify package metadata
39
+ run: python -c "import susurro; print(susurro.__version__)"
@@ -0,0 +1,41 @@
1
+ # Python
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+ *.egg-info/
6
+ .eggs/
7
+ *.so
8
+ dist/
9
+ build/
10
+
11
+ # Virtual environments
12
+ .venv/
13
+ venv/
14
+ env/
15
+
16
+ # Environment + secrets
17
+ .env
18
+ .env.local
19
+
20
+ # OS
21
+ .DS_Store
22
+ Thumbs.db
23
+
24
+ # Logs + local state
25
+ *.log
26
+ .cache/
27
+
28
+ # Tooling
29
+ .coverage
30
+ htmlcov/
31
+ .pytest_cache/
32
+ .ruff_cache/
33
+ .mypy_cache/
34
+
35
+ # Packaged bundles (generated by py2app / briefcase)
36
+ *.app/
37
+ dist-app/
38
+
39
+ # IDEs
40
+ .idea/
41
+ .vscode/
@@ -0,0 +1,84 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [Unreleased]
9
+
10
+ ## [0.4.0] - 2026-05-23
11
+
12
+ ### Changed
13
+
14
+ - **Split into two products.** This package (`susurro` on PyPI) is now the OSS local-only Mac client. The cloud-extended product (`susurro-pro`) lives in a separate private repo, depends on this package, and adds hosted transcription via api.susurro.live. Same hotkey, same UX, different backend.
15
+ - Default `STT_BACKEND` = `local` (was `groq`).
16
+ - Default `POLISH_BACKEND` = `local` (was `groq`).
17
+ - Removed `openai` and `httpx` dependencies (no more cloud calls from OSS).
18
+ - Default `LOCAL_STT_MODEL` switched from `whisper-large-v3-mlx` to `whisper-large-v3-turbo` for ~6x faster decode.
19
+
20
+ ### Removed
21
+
22
+ - `susurro/backends/susurro_pro.py`: moved to the susurro-pro package.
23
+ - `susurro/backends/groq.py`: moved to the susurro-pro package.
24
+ - `susurro/backends/audio_io.py` and `credentials.py`: used only by cloud backends, moved with them.
25
+ - `api/` directory: FastAPI backend moved to the susurro-pro repo.
26
+ - `docs/` directory: landing page moved to susurro-pro/landing/.
27
+ - Root-level `Dockerfile`, `Caddyfile`, `railway.json`: Pro-only.
28
+ - "Sign in to Susurro Pro" menu items: Pro adds these via `_extra_menu_items()` override.
29
+
30
+ ### Added
31
+
32
+ - `susurro.backends.register_transcriber()` and `register_polish_llm()`: public extension points so external packages can add backends without forking.
33
+ - `SusurroApp._build_menu()` and `_extra_menu_items()`: hook points so subclasses can extend the menu cleanly.
34
+ - `available_transcribers()` / `available_polish_llms()` helpers.
35
+ - Published to PyPI as `susurro`.
36
+
37
+ ## [0.2.0] - 2026-05-22
38
+
39
+ ### Added
40
+
41
+ - **Hot-swappable backends.** STT and polish providers are now pluggable. Ships with `local` (MLX Whisper) and `groq` (hosted Whisper + Llama 3.3 70B). Future drops: OpenAI, Anthropic, Gemini, Deepgram.
42
+ - **Cloud-first defaults.** `STT_BACKEND="groq"` and `POLISH_BACKEND="groq"` drop local RAM usage from ~3 GB to 0 GB. Latency improves from ~1.8 s to ~0.7 s end-to-end on a 5 s clip.
43
+ - **Smart formatting (LLM polish).** Three modes (`off`/`rules`/`smart`). The LLM runs **only** when triggers fire (ordinal markers, backtrack phrases, long-form input) to keep latency low for the common case.
44
+ - **Polish rules layer.** Regex-based filler removal (`eh`, `mmm`, `o sea sí`, `um`, `uh`, `este pues`) + whitespace normalization. Runs in <5 ms, no network.
45
+ - **Polish trigger detection.** Detects Spanish/English ordinals (`primero/segundo/tercero`, `first/second/third`, `en primer lugar`) and self-correction phrases (`en realidad`, `actually`, `digo`).
46
+ - **Polish system prompt.** Idempotent prompt with 5 few-shot examples covering numbered lists, filler removal, and backtrack. Refuses to paraphrase or translate.
47
+ - **Polish event log.** Every (raw, polished, metadata) tuple appended to `~/.susurro/polish.jsonl` for local audit and future tuning. Never sent anywhere.
48
+ - **Menu submenu: Smart formatting ▸ Off / Rules only / Smart (LLM).** Hot-toggle without restart.
49
+ - **Fallback chain.** If a cloud backend fails (missing key, network error), automatically falls back to local MLX.
50
+ - **Floating waveform indicator** (carried over from v0.1.x development). 16-bar pill, click-through, follows the active screen.
51
+ - API key resolution accepts both `SUSURRO_<PROVIDER>_API_KEY` and the provider's standard env var.
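The rules layer and trigger detection described in the list above can be sketched roughly as follows. This is a minimal illustration, not the package's actual `polish/rules.py` or `polish/triggers.py`: the function names `apply_rules` and `needs_llm`, the exact regexes, and the 40-word long-form threshold are all assumptions. Only the filler, ordinal, and backtrack phrases come from the changelog.

```python
import re

# Filler phrases quoted in the changelog; longer phrases first so they match whole.
FILLERS = ["o sea sí", "este pues", "mmm", "eh", "um", "uh"]
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?",
    flags=re.IGNORECASE,
)

# Ordinal and self-correction markers quoted in the changelog.
_ORDINAL_RE = re.compile(
    r"\b(?:en primer lugar|primero|segundo|tercero|first|second|third)\b",
    flags=re.IGNORECASE,
)
_BACKTRACK_RE = re.compile(r"\b(?:en realidad|actually|digo)\b", flags=re.IGNORECASE)

# Hypothetical cutoff for "long-form input"; the real value is not documented here.
LONG_FORM_WORDS = 40


def apply_rules(text: str) -> str:
    """Strip filler words, then normalize whitespace."""
    text = _FILLER_RE.sub("", text)
    text = re.sub(r"\s+", " ", text).strip()
    # Remove any space left dangling before punctuation.
    return re.sub(r"\s+([,.;:!?])", r"\1", text)


def needs_llm(text: str) -> bool:
    """Return True when an ordinal, a backtrack phrase, or long input is detected."""
    return bool(
        _ORDINAL_RE.search(text)
        or _BACKTRACK_RE.search(text)
        or len(text.split()) >= LONG_FORM_WORDS
    )
```

A plain keyword match like this will also fire on phrases such as "wait a second", so a real trigger layer may need more context than this sketch uses.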
52
+
53
+ ### Changed
54
+
55
+ - **Tagline: dropped "local-first" as the default identity.** Susurro now positions as "your choice of where inference runs": local for full privacy, cloud for low memory + low latency. Per-stage configuration.
56
+ - Default `LOCAL_STT_MODEL` switched from `whisper-large-v3-mlx` to `whisper-large-v3-turbo` for ~6× faster local decode.
57
+ - Package version bumped to 0.2.0.
58
+ - `susurro/stt.py` removed; replaced by `susurro/backends/local_mlx.py` (`MLXTranscriber`).
59
+
60
+ ### Dependencies
61
+
62
+ - Added `openai>=1.40` (used for both OpenAI and Groq via OpenAI-compatible API).
63
+ - Added optional extras: `anthropic`, `gemini`, `deepgram`.
64
+
65
+ ## [0.1.0] - 2026-05-21
68
+
69
+ ### Added
70
+
71
+ - Push-to-talk dictation triggered by the right Option key.
72
+ - Local transcription via [`mlx-whisper`](https://github.com/ml-explore/mlx-examples/tree/main/whisper) (Apple Silicon only). Default model is `whisper-large-v3-mlx`; swap to `whisper-large-v3-turbo` in `susurro/config.py` for ~6× faster decode.
73
+ - Menu bar app (rumps) with template PNG icons that adapt to light/dark mode.
74
+ - Clipboard paste mode (Cmd+V) with prior-clipboard restoration, and a direct-typing fallback.
75
+ - Status updates and last-transcript shortcut in the menu dropdown.
76
+ - Quick-access menu items that open the correct System Settings pane for Microphone, Accessibility, and Input Monitoring.
77
+ - File logging at `~/.susurro/susurro.log`.
78
+ - Smoke test script: `scripts/test_mic.py`.
79
+
80
+ ### Known limitations
81
+
82
+ - Apple Silicon only: Intel Macs aren't supported by MLX.
83
+ - First launch requires three System Settings permission grants (Microphone, Accessibility, Input Monitoring) and a terminal restart.
84
+ - No `.app` bundle yet; install via `pipx` or `pip` and launch from terminal.
@@ -0,0 +1,7 @@
1
+ # Code of Conduct
2
+
3
+ This project follows the [Contributor Covenant 2.1](https://www.contributor-covenant.org/version/2/1/code_of_conduct/).
4
+
5
+ In short: be kind, assume good faith, and remember the person on the other side of the screen is a human. Harassment, personal attacks, and demeaning behavior are not welcome and will result in being asked to leave the project.
6
+
7
+ Issues with conduct can be reported privately to `dannybravo@gmail.com`. Reports are confidential and will be reviewed within 72 hours.
@@ -0,0 +1,58 @@
1
+ # Contributing to Susurro
2
+
3
+ Thanks for considering a contribution. Susurro is intentionally small (under ~800 lines of Python total) so it stays readable and maintainable by one person in a weekend.
4
+
5
+ ## Dev setup
6
+
7
+ ```bash
8
+ git clone https://github.com/danilobrando/susurro
9
+ cd susurro
10
+ pip install -e ".[dev]"
11
+ ```
12
+
13
+ Run from source:
14
+
15
+ ```bash
16
+ python -m susurro
17
+ ```
18
+
19
+ Generate the menu bar icons (only needed if you change `scripts/generate_icons.py`):
20
+
21
+ ```bash
22
+ python scripts/generate_icons.py
23
+ ```
24
+
25
+ Run lint and format checks:
26
+
27
+ ```bash
28
+ ruff check .
29
+ ruff format --check .
30
+ ```
31
+
32
+ ## Pull request guidelines
33
+
34
+ - Keep the surface area small. If a change adds more than ~100 lines, open an issue first to align on scope.
35
+ - One concern per PR. Refactors and behavior changes don't mix.
36
+ - Don't add hidden network calls; privacy is the product. Anything that touches the network needs to be opt-in and clearly labeled.
37
+ - Match the existing style: type hints throughout, no docstrings longer than a sentence, no comments that just restate the code.
38
+
39
+ ## What we welcome
40
+
41
+ - Bug fixes, especially around macOS permissions and menu bar edge cases.
42
+ - New keyboard layouts and language defaults.
43
+ - Performance benchmarks across M-series chips and Whisper variants.
44
+ - Documentation improvements.
45
+
46
+ ## What we're more cautious about
47
+
48
+ - New features that grow the dependency tree.
49
+ - Anything that turns this into a full Speech-Privacy SaaS. There are other projects for that.
50
+
51
+ ## Reporting bugs
52
+
53
+ Open an issue with:
54
+
55
+ - macOS version + chip (e.g. `macOS 26.0 / M3 Pro`)
56
+ - Output of `pip show mlx-whisper`
57
+ - Last 50 lines of `~/.susurro/susurro.log`
58
+ - Steps to reproduce
susurro-0.4.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Danny Bravo
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
susurro-0.4.0/PKG-INFO ADDED
@@ -0,0 +1,233 @@
1
+ Metadata-Version: 2.4
2
+ Name: susurro
3
+ Version: 0.4.0
4
+ Summary: Local voice dictation for macOS. Whisper + Llama 3.2 3B run on-device via Apple's MLX framework. No accounts, no API keys, no network.
5
+ Project-URL: Homepage, https://github.com/danilobrando/susurro
6
+ Project-URL: Repository, https://github.com/danilobrando/susurro
7
+ Project-URL: Issues, https://github.com/danilobrando/susurro/issues
8
+ Project-URL: Changelog, https://github.com/danilobrando/susurro/blob/main/CHANGELOG.md
9
+ Author: Danny Bravo
10
+ License-Expression: MIT
11
+ License-File: LICENSE
12
+ Keywords: apple-silicon,dictation,local-first,macos,menu-bar,mlx,offline,privacy,speech-to-text,stt,whisper
13
+ Classifier: Development Status :: 4 - Beta
14
+ Classifier: Environment :: MacOS X
15
+ Classifier: Intended Audience :: End Users/Desktop
16
+ Classifier: License :: OSI Approved :: MIT License
17
+ Classifier: Operating System :: MacOS :: MacOS X
18
+ Classifier: Programming Language :: Python :: 3
19
+ Classifier: Programming Language :: Python :: 3.10
20
+ Classifier: Programming Language :: Python :: 3.11
21
+ Classifier: Programming Language :: Python :: 3.12
22
+ Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
23
+ Classifier: Topic :: Utilities
24
+ Requires-Python: >=3.10
25
+ Requires-Dist: mlx-lm>=0.21
26
+ Requires-Dist: mlx-whisper>=0.4
27
+ Requires-Dist: mlx>=0.21
28
+ Requires-Dist: numpy>=1.26
29
+ Requires-Dist: pynput>=1.7
30
+ Requires-Dist: rumps>=0.4
31
+ Requires-Dist: sounddevice>=0.4
32
+ Provides-Extra: dev
33
+ Requires-Dist: build>=1.2; extra == 'dev'
34
+ Requires-Dist: pillow>=10.0; extra == 'dev'
35
+ Requires-Dist: pytest>=8.0; extra == 'dev'
36
+ Requires-Dist: ruff>=0.6; extra == 'dev'
37
+ Requires-Dist: twine>=5.0; extra == 'dev'
38
+ Description-Content-Type: text/markdown
39
+
40
+ # Susurro
41
+
42
+ > **Local voice dictation for macOS. Fully offline. MIT licensed.**
43
+
44
+ Hold a hotkey, talk, release. The transcript is polished into structured text (ordinals become numbered lists, fillers get stripped, self-corrections get applied) and pasted at the cursor in any app. Everything runs on your Mac through Apple's MLX framework. No accounts, no API keys, no network.
45
+
46
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
47
+ [![macOS](https://img.shields.io/badge/macOS-13%2B-blue)]()
48
+ [![Apple Silicon](https://img.shields.io/badge/Apple%20Silicon-required-success)]()
49
+ [![Python](https://img.shields.io/badge/Python-3.10%2B-blue)]()
50
+ [![Version](https://img.shields.io/badge/version-0.4.0-success)](https://github.com/danilobrando/susurro/releases/latest)
51
+ [![PyPI](https://img.shields.io/pypi/v/susurro.svg)](https://pypi.org/project/susurro/)
52
+ [![Landing](https://img.shields.io/badge/site-susurro.live-white)](https://susurro.live)
53
+
54
+ ## Install (one line)
55
+
56
+ ```bash
57
+ curl -fsSL https://raw.githubusercontent.com/danilobrando/susurro/main/install.sh | bash
58
+ ```
59
+
60
+ Or with `pipx`:
61
+
62
+ ```bash
63
+ pipx install susurro
64
+ susurro
65
+ ```
66
+
67
+ After install, hold the **right Option key (⌥)** to dictate. The first launch downloads Whisper + Llama weights (~5 GB total, one-time) and triggers three macOS permission prompts (Microphone, Accessibility, Input Monitoring).
68
+
69
+ ## Why Susurro
70
+
71
+ - **Fully offline.** Audio never leaves your machine. No telemetry, no analytics, no cloud calls during normal use.
72
+ - **WisprFlow-grade smart formatting.** An LLM polishes the raw transcript: ordinals become numbered lists, fillers like "um/eh/o sea" get removed, self-corrections ("Pedro, eh, Pablo digo") collapse to the final intent.
73
+ - **Auditable.** Every (raw → polished) edit is logged locally to `~/.susurro/polish.jsonl`. Nothing is hidden.
74
+ - **MIT licensed.** Read the code, fork it, redistribute it.
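The audit log mentioned in the list above is an append-only JSON Lines file. A minimal sketch of that pattern follows. The record field names (`ts`, `raw`, `polished`, `meta`) and the helper names are assumptions for illustration; the actual schema of `~/.susurro/polish.jsonl` is not shown in this diff.

```python
import json
import time
from pathlib import Path


def log_polish_event(raw: str, polished: str, metadata: dict, path: Path) -> None:
    """Append one (raw, polished, metadata) record as a single JSON line."""
    record = {"ts": time.time(), "raw": raw, "polished": polished, "meta": metadata}
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        # ensure_ascii=False keeps accented characters readable in the log.
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_polish_events(path: Path) -> list[dict]:
    """Load every record back, one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each event is one line, the file can be inspected with ordinary text tools and appended to without rewriting earlier entries.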
75
+
76
+ If you want zero local RAM + lower latency at the cost of cloud transcription, [Susurro Pro](https://susurro.live) is a paid hosted variant that extends this package.
77
+
78
+ ## Requirements
79
+
80
+ - Apple Silicon Mac (M1 or later). MLX doesn't support Intel.
81
+ - macOS 13+ recommended (tested on 26).
82
+ - Python 3.10+.
83
+ - ~5 GB free disk (one-time model download).
84
+ - ~3 GB free RAM while running.
85
+
86
+ ## Usage
87
+
88
+ 1. Click into any text field.
89
+ 2. **Hold the right Option key (⌥)** and speak.
90
+ 3. **Release.** After ~1.5 s, the polished transcript pastes at the cursor via Cmd+V.
91
+
92
+ While recording, a small dark **waveform pill** appears near the bottom of the active screen, with 16 white bars rippling to your voice. Toggle off via the *Show waveform indicator* menu item.
93
+
94
+ Menu bar icon reflects state:
95
+
96
+ | Icon | Meaning |
97
+ |---|---|
98
+ | 🎙 idle | Ready. Hold the hotkey to record. |
99
+ | 🔴 recording | Listening. Release to transcribe. |
100
+ | ⏳ processing | Transcribing + polishing on-device. |
101
+
102
+ ## Smart formatting
103
+
104
+ The polish step turns raw dictation into structured text. Three modes (switchable from the menu):
105
+
106
+ - **Off**: paste raw STT output unchanged.
107
+ - **Rules only**: regex cleanup (filler removal for `eh`, `mmm`, `o sea sí`, `um`, `uh`, plus whitespace normalization). <5 ms.
108
+ - **Smart (LLM)**: rules + local Llama 3.2 3B polish, but only when triggers fire (ordinal markers, backtrack phrases, long-form input). Otherwise stays rules-only to keep latency low.
109
+
110
+ Example (`smart` mode):
111
+
112
+ ```
113
+ Raw: "Vamos a seguir tres pasos. Primero, reinicia. Segundo, vuelve a registrarte. Tercero, envía un correo."
114
+
115
+ Polished:
116
+ Vamos a seguir tres pasos.
117
+
118
+ 1. Reinicia
119
+ 2. Vuelve a registrarte
120
+ 3. Envía un correo
121
+ ```
122
+
123
+ ## Configuration
124
+
125
+ Edit `susurro/config.py`:
126
+
127
+ - **`STT_BACKEND`**: `local` (default). Extension packages can register more.
128
+ - **`POLISH_MODE`**: `smart` (default), `rules`, or `off`.
129
+ - **`LOCAL_STT_MODEL`**: `whisper-large-v3-turbo` (default), or `whisper-large-v3-mlx` for max accuracy.
130
+ - **`LOCAL_POLISH_MODEL`**: `Llama-3.2-3B-Instruct-4bit` (default), or any mlx-community 3B-class model.
131
+ - **`HOTKEY`**: `alt_r` (default). Any pynput `Key` name: `alt_l`, `ctrl_r`, `f19`, etc.
132
+ - **`LANGUAGE`**: `None` for auto-detect, or pin to `"es"` / `"en"` to save ~100 ms per request.
133
+ - **`INPUT_DEVICE`**: pick a specific mic. Run `python -m sounddevice` to list devices.
134
+
135
+ ## Permissions
136
+
137
+ macOS will prompt for three permissions the first time you run Susurro:
138
+
139
+ 1. **Microphone**: to capture your voice.
140
+ 2. **Accessibility**: to paste the transcript into the focused app.
141
+ 3. **Input Monitoring**: to listen for the global hotkey.
142
+
143
+ After granting any of these, **fully quit and relaunch your terminal** for the new permission to take effect. The menu bar has direct links to each pane.
144
+
145
+ ## Architecture
146
+
147
+ ```
148
+ audio (sounddevice → 16kHz mono float32)
149
+ → MLX Whisper (whisper-large-v3-turbo, on-device)
150
+ → raw text
151
+ → Polisher
152
+ ├ Tier 1: regex rules (filler removal, whitespace)
153
+ ├ Tier 2: trigger check (ordinals / backtrack / long-form)
154
+ └ Tier 3: MLX-LM polish (Llama 3.2 3B Instruct, on-device)
155
+ → polished text
156
+ → clipboard write + Cmd+V into focused app
157
+ ```
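The three Polisher tiers in the diagram above could be wired together roughly like this. It is a sketch under assumptions: the function name `polish`, the injected callables, running the trigger check on the rules output rather than the raw text, and falling back to the rules output when the LLM raises are all illustrative choices, not confirmed behavior of `susurro/polish/__init__.py`.

```python
from typing import Callable


def polish(
    raw: str,
    mode: str,
    rules: Callable[[str], str],
    should_use_llm: Callable[[str], bool],
    llm: Callable[[str], str],
) -> str:
    """Run the tiers in order, stopping as early as the mode allows."""
    if mode == "off":
        return raw  # paste raw STT output unchanged
    cleaned = rules(raw)  # Tier 1: cheap regex cleanup
    if mode == "rules" or not should_use_llm(cleaned):
        return cleaned  # Tier 2 said the LLM is not needed
    try:
        return llm(cleaned)  # Tier 3: on-device LLM polish
    except Exception:
        # Assumed fallback: keep the rules output rather than lose the dictation.
        return cleaned
```

Passing the tiers in as callables keeps the orchestration testable without loading any model.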
158
+
159
+ Source layout, under ~1500 lines of Python total:
160
+
161
+ ```
162
+ susurro/
163
+ config.py # all tunables
164
+ audio.py # mic capture + peak_level for indicator
165
+ hotkey.py # pynput global hotkey
166
+ typer.py # clipboard / keystroke insertion
167
+ indicator.py # floating waveform pill (PyObjC)
168
+ permissions.py # System Settings deep links
169
+ app.py # rumps menu bar + main loop (subclassable)
170
+ backends/
171
+ base.py # protocols (Transcriber, PolishLLM)
172
+ local_mlx.py # local Whisper via MLX
173
+ local_mlx_lm.py # local polish LLM via mlx-lm
174
+ __init__.py # factories + extension registration
175
+ polish/
176
+ __init__.py # Polisher orchestrator
177
+ rules.py # regex cleanup
178
+ triggers.py # decides if LLM should fire
179
+ prompt.py # system prompt + few-shot examples
180
+ icons/ # template PNGs for menu bar
181
+ ```
182
+
183
+ ## Extending Susurro
184
+
185
+ External packages can register additional backends without modifying this code:
186
+
187
+ ```python
188
+ # in your_extension/__init__.py
189
+ from susurro.backends import register_transcriber, register_polish_llm
190
+
191
+ class MyCloudSTT:
192
+ name = "mycloud"
193
+ def warmup(self): ...
194
+ def transcribe(self, audio): ...
195
+
196
+ register_transcriber("mycloud", lambda: MyCloudSTT())
197
+ ```
198
+
199
+ Then set `STT_BACKEND = "mycloud"` in `susurro/config.py` or via environment.
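To show how name-based registration of this kind typically works, here is a self-contained stand-in for the registry. It is not the code in `susurro/backends/__init__.py`: the `_TRANSCRIBERS` dict, the `make_transcriber` factory, and the error handling are assumptions. Only `register_transcriber`, `available_transcribers`, and the `name` / `warmup` / `transcribe` shape come from the README and changelog.

```python
from typing import Any, Callable, Protocol


class Transcriber(Protocol):
    """Shape of a backend as shown in the README example."""

    name: str

    def warmup(self) -> None: ...

    def transcribe(self, audio: Any) -> str: ...


# Hypothetical module-level registry mapping backend names to factories.
_TRANSCRIBERS: dict[str, Callable[[], Transcriber]] = {}


def register_transcriber(name: str, factory: Callable[[], Transcriber]) -> None:
    """Make a backend selectable by name without touching the core package."""
    _TRANSCRIBERS[name] = factory


def available_transcribers() -> list[str]:
    return sorted(_TRANSCRIBERS)


def make_transcriber(name: str) -> Transcriber:
    """Build the backend named by a setting such as STT_BACKEND."""
    try:
        factory = _TRANSCRIBERS[name]
    except KeyError:
        known = ", ".join(available_transcribers()) or "none"
        raise ValueError(f"unknown STT backend {name!r}; registered: {known}") from None
    return factory()
```

Registering a factory instead of an instance means a backend is only constructed when it is actually selected.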
200
+
201
+ ## Troubleshooting
202
+
203
+ - **Menu bar icon invisible**: emoji-only menu bar items can be hidden on MacBooks with a notch. This release ships a real template PNG, which fixes it for most users.
204
+ - **"This process is not trusted"**: Accessibility permission isn't granted. Use the *Open Accessibility Settings…* menu item, then fully restart the terminal.
205
+ - **Hotkey doesn't trigger**: Input Monitoring permission is missing.
206
+ - **Silent recordings / empty transcript**: Microphone permission is missing, or `INPUT_DEVICE` is pointing at the wrong device.
207
+ - **First transcription is slow**: the model is still warming up. Wait until the menu shows *Status: idle* before the first real dictation.
208
+
209
+ Logs land in `~/.susurro/susurro.log`; polish events in `~/.susurro/polish.jsonl`.
210
+
211
+ ## Contributing
212
+
213
+ See [CONTRIBUTING.md](CONTRIBUTING.md). PRs welcome; please keep the package under ~1500 lines.
214
+
215
+ ## Security
216
+
217
+ See [SECURITY.md](SECURITY.md). Report vulnerabilities privately to the maintainer.
218
+
219
+ ## Maintainer
220
+
221
+ Built and maintained by [Danny Bravo](https://github.com/danilobrando) (`dannybravo@gmail.com`). Product strategist, AI ecosystem builder, educator, based in Bogotá.
222
+
223
+ ## License
224
+
225
+ [MIT](LICENSE) © 2026 Danny Bravo.
226
+
227
+ ## Credits
228
+
229
+ - [ml-explore/mlx](https://github.com/ml-explore/mlx) and [mlx-examples/whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper): Apple's MLX framework and the MLX Whisper port.
230
+ - [OpenAI Whisper](https://github.com/openai/whisper): the model.
231
+ - [Meta Llama 3.2](https://www.llama.com/): the polish LLM.
232
+ - [rumps](https://github.com/jaredks/rumps), [pynput](https://github.com/moses-palmer/pynput), [sounddevice](https://github.com/spatialaudio/python-sounddevice): Python ↔ macOS glue.
233
+ - WisprFlow and SuperWhisper: the product UX this clones.