susurro 0.4.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- susurro-0.4.0/.github/ISSUE_TEMPLATE/bug_report.md +39 -0
- susurro-0.4.0/.github/ISSUE_TEMPLATE/config.yml +5 -0
- susurro-0.4.0/.github/ISSUE_TEMPLATE/feature_request.md +22 -0
- susurro-0.4.0/.github/PULL_REQUEST_TEMPLATE.md +18 -0
- susurro-0.4.0/.github/workflows/ci.yml +39 -0
- susurro-0.4.0/.gitignore +41 -0
- susurro-0.4.0/CHANGELOG.md +84 -0
- susurro-0.4.0/CODE_OF_CONDUCT.md +7 -0
- susurro-0.4.0/CONTRIBUTING.md +58 -0
- susurro-0.4.0/LICENSE +21 -0
- susurro-0.4.0/PKG-INFO +233 -0
- susurro-0.4.0/README.md +194 -0
- susurro-0.4.0/SECURITY.md +36 -0
- susurro-0.4.0/install.sh +120 -0
- susurro-0.4.0/pyproject.toml +90 -0
- susurro-0.4.0/scripts/generate_icons.py +77 -0
- susurro-0.4.0/scripts/run.sh +9 -0
- susurro-0.4.0/scripts/test_mic.py +50 -0
- susurro-0.4.0/susurro/__init__.py +3 -0
- susurro-0.4.0/susurro/__main__.py +6 -0
- susurro-0.4.0/susurro/app.py +387 -0
- susurro-0.4.0/susurro/audio.py +107 -0
- susurro-0.4.0/susurro/backends/__init__.py +77 -0
- susurro-0.4.0/susurro/backends/base.py +50 -0
- susurro-0.4.0/susurro/backends/local_mlx.py +61 -0
- susurro-0.4.0/susurro/backends/local_mlx_lm.py +76 -0
- susurro-0.4.0/susurro/config.py +67 -0
- susurro-0.4.0/susurro/hotkey.py +62 -0
- susurro-0.4.0/susurro/icons/idle.png +0 -0
- susurro-0.4.0/susurro/icons/processing.png +0 -0
- susurro-0.4.0/susurro/icons/recording.png +0 -0
- susurro-0.4.0/susurro/indicator.py +249 -0
- susurro-0.4.0/susurro/logging_config.py +20 -0
- susurro-0.4.0/susurro/permissions.py +29 -0
- susurro-0.4.0/susurro/polish/__init__.py +97 -0
- susurro-0.4.0/susurro/polish/prompt.py +61 -0
- susurro-0.4.0/susurro/polish/rules.py +49 -0
- susurro-0.4.0/susurro/polish/triggers.py +42 -0
- susurro-0.4.0/susurro/typer.py +64 -0
- susurro-0.4.0/tests/test_polish.py +94 -0
- susurro-0.4.0/tests/test_smoke.py +87 -0
|
@@ -0,0 +1,39 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: Bug report
|
|
3
|
+
about: Something is broken or behaving unexpectedly.
|
|
4
|
+
title: "[bug] "
|
|
5
|
+
labels: ["bug"]
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
**What happened**
|
|
9
|
+
|
|
10
|
+
A clear and concise description of the bug.
|
|
11
|
+
|
|
12
|
+
**Expected behavior**
|
|
13
|
+
|
|
14
|
+
What you thought would happen.
|
|
15
|
+
|
|
16
|
+
**Steps to reproduce**
|
|
17
|
+
|
|
18
|
+
1.
|
|
19
|
+
2.
|
|
20
|
+
3.
|
|
21
|
+
|
|
22
|
+
**Environment**
|
|
23
|
+
|
|
24
|
+
- macOS version + chip (e.g. `macOS 26.0 / M3 Pro`):
|
|
25
|
+
- Python version (`python --version`):
|
|
26
|
+
- Susurro version (`pip show susurro | grep Version`):
|
|
27
|
+
- Model in use (from `susurro/config.py`):
|
|
28
|
+
|
|
29
|
+
**Logs**
|
|
30
|
+
|
|
31
|
+
The last ~50 lines of `~/.susurro/susurro.log`:
|
|
32
|
+
|
|
33
|
+
```
|
|
34
|
+
paste log output here
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
**Additional context**
|
|
38
|
+
|
|
39
|
+
Anything else worth knowing.
|
|
@@ -0,0 +1,22 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: Feature request
|
|
3
|
+
about: Suggest a new capability or improvement.
|
|
4
|
+
title: "[feature] "
|
|
5
|
+
labels: ["enhancement"]
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
**What you'd like**
|
|
9
|
+
|
|
10
|
+
A clear description of the feature.
|
|
11
|
+
|
|
12
|
+
**Why it matters**
|
|
13
|
+
|
|
14
|
+
The use case this unlocks. Be concrete: "I want to dictate code comments in VS Code while my hands stay on the keyboard" is more useful than "more flexibility."
|
|
15
|
+
|
|
16
|
+
**Alternatives considered**
|
|
17
|
+
|
|
18
|
+
What you've tried instead, or other tools that solve this.
|
|
19
|
+
|
|
20
|
+
**Scope check**
|
|
21
|
+
|
|
22
|
+
Susurro aims to stay under ~800 lines of Python and avoid feature creep. Does this fit? If it adds significant surface area, what could be left out?
|
|
@@ -0,0 +1,18 @@
|
|
|
1
|
+
## Summary
|
|
2
|
+
|
|
3
|
+
What this PR does and why.
|
|
4
|
+
|
|
5
|
+
## Changes
|
|
6
|
+
|
|
7
|
+
-
|
|
8
|
+
|
|
9
|
+
## Test plan
|
|
10
|
+
|
|
11
|
+
- [ ] `ruff check .` passes
|
|
12
|
+
- [ ] `ruff format --check .` passes
|
|
13
|
+
- [ ] Manually exercised the affected code paths on macOS
|
|
14
|
+
- [ ] If touching the daemon: held the hotkey, spoke, confirmed text was pasted at the cursor
|
|
15
|
+
|
|
16
|
+
## Notes for reviewer
|
|
17
|
+
|
|
18
|
+
Anything non-obvious: design tradeoffs, edge cases considered, things deliberately left out.
|
|
@@ -0,0 +1,39 @@
|
|
|
1
|
+
name: CI
|
|
2
|
+
|
|
3
|
+
on:
|
|
4
|
+
  push:
|
|
5
|
+
    branches: [main]
|
|
6
|
+
  pull_request:
|
|
7
|
+
    branches: [main]
|
|
8
|
+
|
|
9
|
+
jobs:
|
|
10
|
+
  lint:
|
|
11
|
+
    name: Lint + format
|
|
12
|
+
    runs-on: macos-14
|
|
13
|
+
    steps:
|
|
14
|
+
      - uses: actions/checkout@v4
|
|
15
|
+
      - uses: actions/setup-python@v5
|
|
16
|
+
        with:
|
|
17
|
+
          python-version: "3.12"
|
|
18
|
+
      - name: Install ruff
|
|
19
|
+
        run: pip install ruff
|
|
20
|
+
      - name: ruff check
|
|
21
|
+
        run: ruff check .
|
|
22
|
+
      - name: ruff format --check
|
|
23
|
+
        run: ruff format --check .
|
|
24
|
+
|
|
25
|
+
  smoke:
|
|
26
|
+
    name: Smoke import test
|
|
27
|
+
    runs-on: macos-14
|
|
28
|
+
    steps:
|
|
29
|
+
      - uses: actions/checkout@v4
|
|
30
|
+
      - uses: actions/setup-python@v5
|
|
31
|
+
        with:
|
|
32
|
+
          python-version: "3.12"
|
|
33
|
+
      - name: Install package
|
|
34
|
+
        # We can't actually install mlx on CI runners reliably (Apple Silicon
|
|
35
|
+
        # availability varies), so we just verify the package metadata is sound
|
|
36
|
+
        # and imports that don't need mlx work.
|
|
37
|
+
        run: pip install --no-deps -e .
|
|
38
|
+
      - name: Verify package metadata
|
|
39
|
+
        run: python -c "import susurro; print(susurro.__version__)"
|
susurro-0.4.0/.gitignore
ADDED
|
@@ -0,0 +1,41 @@
|
|
|
1
|
+
# Python
|
|
2
|
+
__pycache__/
|
|
3
|
+
*.py[cod]
|
|
4
|
+
*$py.class
|
|
5
|
+
*.egg-info/
|
|
6
|
+
.eggs/
|
|
7
|
+
*.so
|
|
8
|
+
dist/
|
|
9
|
+
build/
|
|
10
|
+
|
|
11
|
+
# Virtual environments
|
|
12
|
+
.venv/
|
|
13
|
+
venv/
|
|
14
|
+
env/
|
|
15
|
+
|
|
16
|
+
# Environment + secrets
|
|
17
|
+
.env
|
|
18
|
+
.env.local
|
|
19
|
+
|
|
20
|
+
# OS
|
|
21
|
+
.DS_Store
|
|
22
|
+
Thumbs.db
|
|
23
|
+
|
|
24
|
+
# Logs + local state
|
|
25
|
+
*.log
|
|
26
|
+
.cache/
|
|
27
|
+
|
|
28
|
+
# Tooling
|
|
29
|
+
.coverage
|
|
30
|
+
htmlcov/
|
|
31
|
+
.pytest_cache/
|
|
32
|
+
.ruff_cache/
|
|
33
|
+
.mypy_cache/
|
|
34
|
+
|
|
35
|
+
# Packaged bundles (generated by py2app / briefcase)
|
|
36
|
+
*.app/
|
|
37
|
+
dist-app/
|
|
38
|
+
|
|
39
|
+
# IDEs
|
|
40
|
+
.idea/
|
|
41
|
+
.vscode/
|
|
@@ -0,0 +1,84 @@
|
|
|
1
|
+
# Changelog
|
|
2
|
+
|
|
3
|
+
All notable changes to this project will be documented in this file.
|
|
4
|
+
|
|
5
|
+
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
|
|
6
|
+
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
7
|
+
|
|
8
|
+
## [Unreleased]
|
|
9
|
+
|
|
10
|
+
## [0.4.0] - 2026-05-23
|
|
11
|
+
|
|
12
|
+
### Changed
|
|
13
|
+
|
|
14
|
+
- **Split into two products.** This package (`susurro` on PyPI) is now the OSS local-only Mac client. The cloud-extended product (`susurro-pro`) lives in a separate private repo, depends on this package, and adds hosted transcription via api.susurro.live. Same hotkey, same UX, different backend.
|
|
15
|
+
- Default `STT_BACKEND` = `local` (was `groq`).
|
|
16
|
+
- Default `POLISH_BACKEND` = `local` (was `groq`).
|
|
17
|
+
- Removed `openai` and `httpx` dependencies (no more cloud calls from OSS).
|
|
18
|
+
- Default `LOCAL_STT_MODEL` switched from `whisper-large-v3-mlx` to `whisper-large-v3-turbo` for ~6x faster decode.
|
|
19
|
+
|
|
20
|
+
### Removed
|
|
21
|
+
|
|
22
|
+
- `susurro/backends/susurro_pro.py`: moved to the susurro-pro package.
|
|
23
|
+
- `susurro/backends/groq.py`: moved to the susurro-pro package.
|
|
24
|
+
- `susurro/backends/audio_io.py` and `credentials.py`: used only by cloud backends, moved with them.
|
|
25
|
+
- `api/` directory: FastAPI backend moved to the susurro-pro repo.
|
|
26
|
+
- `docs/` directory: landing page moved to susurro-pro/landing/.
|
|
27
|
+
- Root-level `Dockerfile`, `Caddyfile`, `railway.json`: Pro-only.
|
|
28
|
+
- "Sign in to Susurro Pro" menu items: Pro adds these via `_extra_menu_items()` override.
|
|
29
|
+
|
|
30
|
+
### Added
|
|
31
|
+
|
|
32
|
+
- `susurro.backends.register_transcriber()` and `register_polish_llm()`: public extension points so external packages can add backends without forking.
|
|
33
|
+
- `SusurroApp._build_menu()` and `_extra_menu_items()`: hook points so subclasses can extend the menu cleanly.
|
|
34
|
+
- `available_transcribers()` / `available_polish_llms()` helpers.
|
|
35
|
+
- Published to PyPI as `susurro`.
|
|
36
|
+
|
|
37
|
+
## [0.2.0] - 2026-05-22
|
|
38
|
+
|
|
39
|
+
### Added
|
|
40
|
+
|
|
41
|
+
- **Hot-swappable backends.** STT and polish providers are now pluggable. Ships with `local` (MLX Whisper) and `groq` (hosted Whisper + Llama 3.3 70B). Future drops: OpenAI, Anthropic, Gemini, Deepgram.
|
|
42
|
+
- **Cloud-first defaults.** `STT_BACKEND="groq"` and `POLISH_BACKEND="groq"` drop local RAM usage from ~3 GB to 0 GB. Latency improves from ~1.8 s to ~0.7 s end-to-end on a 5 s clip.
|
|
43
|
+
- **Smart formatting (LLM polish).** Three modes (`off`/`rules`/`smart`). The LLM runs **only** when triggers fire (ordinal markers, backtrack phrases, long-form input) to keep latency low for the common case.
|
|
44
|
+
- **Polish rules layer.** Regex-based filler removal (`eh`, `mmm`, `o sea sí`, `um`, `uh`, `este pues`) + whitespace normalization. Runs in <5 ms, no network.
|
|
45
|
+
- **Polish trigger detection.** Detects Spanish/English ordinals (`primero/segundo/tercero`, `first/second/third`, `en primer lugar`) and self-correction phrases (`en realidad`, `actually`, `digo`).
|
|
46
|
+
- **Polish system prompt.** Idempotent prompt with 5 few-shot examples covering numbered lists, filler removal, and backtrack. Refuses to paraphrase or translate.
|
|
47
|
+
- **Polish event log.** Every (raw, polished, metadata) tuple appended to `~/.susurro/polish.jsonl` for local audit and future tuning. Never sent anywhere.
|
|
48
|
+
- **Menu submenu: Smart formatting ▸ Off / Rules only / Smart (LLM).** Hot-toggle without restart.
|
|
49
|
+
- **Fallback chain.** If a cloud backend fails (missing key, network error), automatically falls back to local MLX.
|
|
50
|
+
- **Floating waveform indicator** (carried over from v0.1.x development). 16-bar pill, click-through, follows the active screen.
|
|
51
|
+
- API key resolution accepts both `SUSURRO_<PROVIDER>_API_KEY` and the provider's standard env var.
|
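The polish event log described in the list above can be pictured as an append-only JSON Lines file. The following is a minimal sketch, not the package's actual code: the function names `append_polish_event` and `read_polish_events` and the record keys (`ts`, `raw`, `polished`, `meta`) are assumptions made for illustration; only the idea of appending (raw, polished, metadata) tuples to `~/.susurro/polish.jsonl` comes from the changelog.

```python
import json
import time
from pathlib import Path


def append_polish_event(log_path: Path, raw: str, polished: str, **metadata) -> None:
    """Append one (raw, polished, metadata) record as a single JSON line.

    Hypothetical helper: the record layout is an assumption, not Susurro's schema.
    """
    record = {"ts": time.time(), "raw": raw, "polished": polished, "meta": metadata}
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps accented Spanish text readable in the log file.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_polish_events(log_path: Path) -> list[dict]:
    """Load every record back for a local audit, skipping blank lines."""
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

One record per line keeps writes cheap (a single append) and lets the file be inspected with ordinary line-oriented tools.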
|
52
|
+
|
|
53
|
+
### Changed
|
|
54
|
+
|
|
55
|
+
- **Tagline: dropped "local-first" as the default identity.** Susurro now positions as "your choice of where inference runs": local for full privacy, cloud for low memory + low latency. Per-stage configuration.
|
|
56
|
+
- Default `LOCAL_STT_MODEL` switched from `whisper-large-v3-mlx` to `whisper-large-v3-turbo` for ~6× faster local decode.
|
|
57
|
+
- Package version bumped to 0.2.0.
|
|
58
|
+
- `susurro/stt.py` removed; replaced by `susurro/backends/local_mlx.py` (`MLXTranscriber`).
|
|
59
|
+
|
|
60
|
+
### Dependencies
|
|
61
|
+
|
|
62
|
+
- Added `openai>=1.40` (used for both OpenAI and Groq via OpenAI-compatible API).
|
|
63
|
+
- Added optional extras: `anthropic`, `gemini`, `deepgram`.
|
|
64
|
+
|
|
65
|
+
## [0.1.0] - 2026-05-21
|
|
66
|
+
|
|
67
|
+
## [0.1.0] - 2026-05-21
|
|
68
|
+
|
|
69
|
+
### Added
|
|
70
|
+
|
|
71
|
+
- Push-to-talk dictation triggered by the right Option key.
|
|
72
|
+
- Local transcription via [`mlx-whisper`](https://github.com/ml-explore/mlx-examples/tree/main/whisper) (Apple Silicon only). Default model is `whisper-large-v3-mlx`; swap to `whisper-large-v3-turbo` in `susurro/config.py` for ~6× faster decode.
|
|
73
|
+
- Menu bar app (rumps) with template PNG icons that adapt to light/dark mode.
|
|
74
|
+
- Clipboard paste mode (Cmd+V) with prior-clipboard restoration, and a direct-typing fallback.
|
|
75
|
+
- Status updates and last-transcript shortcut in the menu dropdown.
|
|
76
|
+
- Quick-access menu items that open the correct System Settings pane for Microphone, Accessibility, and Input Monitoring.
|
|
77
|
+
- File logging at `~/.susurro/susurro.log`.
|
|
78
|
+
- Smoke test script: `scripts/test_mic.py`.
|
|
79
|
+
|
|
80
|
+
### Known limitations
|
|
81
|
+
|
|
82
|
+
- Apple Silicon only: Intel Macs aren't supported by MLX.
|
|
83
|
+
- First launch requires three System Settings permission grants (Microphone, Accessibility, Input Monitoring) and a terminal restart.
|
|
84
|
+
- No `.app` bundle yet; install via `pipx` or `pip` and launch from terminal.
|
|
@@ -0,0 +1,7 @@
|
|
|
1
|
+
# Code of Conduct
|
|
2
|
+
|
|
3
|
+
This project follows the [Contributor Covenant 2.1](https://www.contributor-covenant.org/version/2/1/code_of_conduct/).
|
|
4
|
+
|
|
5
|
+
In short: be kind, assume good faith, and remember the person on the other side of the screen is a human. Harassment, personal attacks, and demeaning behavior are not welcome and will result in being asked to leave the project.
|
|
6
|
+
|
|
7
|
+
Issues with conduct can be reported privately to `dannybravo@gmail.com`. Reports are confidential and will be reviewed within 72 hours.
|
|
@@ -0,0 +1,58 @@
|
|
|
1
|
+
# Contributing to Susurro
|
|
2
|
+
|
|
3
|
+
Thanks for considering a contribution. Susurro is intentionally small (under ~800 lines of Python total) so it stays readable and maintainable by one person in a weekend.
|
|
4
|
+
|
|
5
|
+
## Dev setup
|
|
6
|
+
|
|
7
|
+
```bash
|
|
8
|
+
git clone https://github.com/danilobrando/susurro
|
|
9
|
+
cd susurro
|
|
10
|
+
pip install -e ".[dev]"
|
|
11
|
+
```
|
|
12
|
+
|
|
13
|
+
Run from source:
|
|
14
|
+
|
|
15
|
+
```bash
|
|
16
|
+
python -m susurro
|
|
17
|
+
```
|
|
18
|
+
|
|
19
|
+
Generate the menu bar icons (only needed if you change `scripts/generate_icons.py`):
|
|
20
|
+
|
|
21
|
+
```bash
|
|
22
|
+
python scripts/generate_icons.py
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
Run lint and format checks:
|
|
26
|
+
|
|
27
|
+
```bash
|
|
28
|
+
ruff check .
|
|
29
|
+
ruff format --check .
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
## Pull request guidelines
|
|
33
|
+
|
|
34
|
+
- Keep the surface area small. If a change adds more than ~100 lines, open an issue first to align on scope.
|
|
35
|
+
- One concern per PR. Refactors and behavior changes don't mix.
|
|
36
|
+
- Don't add hidden network calls; privacy is the product. Anything that touches the network needs to be opt-in and clearly labeled.
|
|
37
|
+
- Match the existing style: type hints throughout, no docstrings longer than a sentence, no comments that just restate the code.
|
|
38
|
+
|
|
39
|
+
## What we welcome
|
|
40
|
+
|
|
41
|
+
- Bug fixes, especially around macOS permissions and menu bar edge cases.
|
|
42
|
+
- New keyboard layouts and language defaults.
|
|
43
|
+
- Performance benchmarks across M-series chips and Whisper variants.
|
|
44
|
+
- Documentation improvements.
|
|
45
|
+
|
|
46
|
+
## What we're more cautious about
|
|
47
|
+
|
|
48
|
+
- New features that grow the dependency tree.
|
|
49
|
+
- Anything that turns this into a full Speech-Privacy SaaS. There are other projects for that.
|
|
50
|
+
|
|
51
|
+
## Reporting bugs
|
|
52
|
+
|
|
53
|
+
Open an issue with:
|
|
54
|
+
|
|
55
|
+
- macOS version + chip (e.g. `macOS 26.0 / M3 Pro`)
|
|
56
|
+
- Output of `pip show mlx-whisper`
|
|
57
|
+
- Last 50 lines of `~/.susurro/susurro.log`
|
|
58
|
+
- Steps to reproduce
|
susurro-0.4.0/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Danny Bravo
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
susurro-0.4.0/PKG-INFO
ADDED
|
@@ -0,0 +1,233 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: susurro
|
|
3
|
+
Version: 0.4.0
|
|
4
|
+
Summary: Local voice dictation for macOS. Whisper + Llama 3.2 3B run on-device via Apple's MLX framework. No accounts, no API keys, no network.
|
|
5
|
+
Project-URL: Homepage, https://github.com/danilobrando/susurro
|
|
6
|
+
Project-URL: Repository, https://github.com/danilobrando/susurro
|
|
7
|
+
Project-URL: Issues, https://github.com/danilobrando/susurro/issues
|
|
8
|
+
Project-URL: Changelog, https://github.com/danilobrando/susurro/blob/main/CHANGELOG.md
|
|
9
|
+
Author: Danny Bravo
|
|
10
|
+
License-Expression: MIT
|
|
11
|
+
License-File: LICENSE
|
|
12
|
+
Keywords: apple-silicon,dictation,local-first,macos,menu-bar,mlx,offline,privacy,speech-to-text,stt,whisper
|
|
13
|
+
Classifier: Development Status :: 4 - Beta
|
|
14
|
+
Classifier: Environment :: MacOS X
|
|
15
|
+
Classifier: Intended Audience :: End Users/Desktop
|
|
16
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
17
|
+
Classifier: Operating System :: MacOS :: MacOS X
|
|
18
|
+
Classifier: Programming Language :: Python :: 3
|
|
19
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
20
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
21
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
22
|
+
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
|
|
23
|
+
Classifier: Topic :: Utilities
|
|
24
|
+
Requires-Python: >=3.10
|
|
25
|
+
Requires-Dist: mlx-lm>=0.21
|
|
26
|
+
Requires-Dist: mlx-whisper>=0.4
|
|
27
|
+
Requires-Dist: mlx>=0.21
|
|
28
|
+
Requires-Dist: numpy>=1.26
|
|
29
|
+
Requires-Dist: pynput>=1.7
|
|
30
|
+
Requires-Dist: rumps>=0.4
|
|
31
|
+
Requires-Dist: sounddevice>=0.4
|
|
32
|
+
Provides-Extra: dev
|
|
33
|
+
Requires-Dist: build>=1.2; extra == 'dev'
|
|
34
|
+
Requires-Dist: pillow>=10.0; extra == 'dev'
|
|
35
|
+
Requires-Dist: pytest>=8.0; extra == 'dev'
|
|
36
|
+
Requires-Dist: ruff>=0.6; extra == 'dev'
|
|
37
|
+
Requires-Dist: twine>=5.0; extra == 'dev'
|
|
38
|
+
Description-Content-Type: text/markdown
|
|
39
|
+
|
|
40
|
+
# Susurro
|
|
41
|
+
|
|
42
|
+
> **Local voice dictation for macOS. Fully offline. MIT licensed.**
|
|
43
|
+
|
|
44
|
+
Hold a hotkey, talk, release. The transcript is polished into structured text (ordinals become numbered lists, fillers get stripped, self-corrections get applied) and pasted at the cursor in any app. Everything runs on your Mac through Apple's MLX framework. No accounts, no API keys, no network.
|
|
45
|
+
|
|
46
|
+
[](LICENSE)
|
|
47
|
+
[]()
|
|
48
|
+
[]()
|
|
49
|
+
[]()
|
|
50
|
+
[](https://github.com/danilobrando/susurro/releases/latest)
|
|
51
|
+
[](https://pypi.org/project/susurro/)
|
|
52
|
+
[](https://susurro.live)
|
|
53
|
+
|
|
54
|
+
## Install (one line)
|
|
55
|
+
|
|
56
|
+
```bash
|
|
57
|
+
curl -fsSL https://raw.githubusercontent.com/danilobrando/susurro/main/install.sh | bash
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
Or with `pipx`:
|
|
61
|
+
|
|
62
|
+
```bash
|
|
63
|
+
pipx install susurro
|
|
64
|
+
susurro
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
After install, hold the **right Option key (⌥)** to dictate. The first launch downloads Whisper + Llama weights (~5 GB total, one-time) and triggers three macOS permission prompts (Microphone, Accessibility, Input Monitoring).
|
|
68
|
+
|
|
69
|
+
## Why Susurro
|
|
70
|
+
|
|
71
|
+
- **Fully offline.** Audio never leaves your machine. No telemetry, no analytics, no cloud calls during normal use.
|
|
72
|
+
- **WisprFlow-grade smart formatting.** An LLM polishes the raw transcript: ordinals become numbered lists, fillers like "um/eh/o sea" get removed, self-corrections ("Pedro, eh, Pablo digo") collapse to the final intent.
|
|
73
|
+
- **Auditable.** Every (raw → polished) edit is logged locally to `~/.susurro/polish.jsonl`. Nothing is hidden.
|
|
74
|
+
- **MIT licensed.** Read the code, fork it, redistribute it.
|
|
75
|
+
|
|
76
|
+
If you want zero local RAM + lower latency at the cost of cloud transcription, [Susurro Pro](https://susurro.live) is a paid hosted variant that extends this package.
|
|
77
|
+
|
|
78
|
+
## Requirements
|
|
79
|
+
|
|
80
|
+
- Apple Silicon Mac (M1 or later). MLX doesn't support Intel.
|
|
81
|
+
- macOS 13+ recommended (tested on 26).
|
|
82
|
+
- Python 3.10+.
|
|
83
|
+
- ~5 GB free disk (one-time model download).
|
|
84
|
+
- ~3 GB free RAM while running.
|
|
85
|
+
|
|
86
|
+
## Usage
|
|
87
|
+
|
|
88
|
+
1. Click into any text field.
|
|
89
|
+
2. **Hold the right Option key (⌥)** and speak.
|
|
90
|
+
3. **Release.** After ~1.5 s, the polished transcript pastes at the cursor via Cmd+V.
|
|
91
|
+
|
|
92
|
+
While recording, a small dark **waveform pill** appears near the bottom of the active screen, with 16 white bars rippling to your voice. Toggle off via the *Show waveform indicator* menu item.
|
|
93
|
+
|
|
94
|
+
Menu bar icon reflects state:
|
|
95
|
+
|
|
96
|
+
| Icon | Meaning |
|
|
97
|
+
|---|---|
|
|
98
|
+
| 🎙 idle | Ready. Hold the hotkey to record. |
|
|
99
|
+
| 🔴 recording | Listening. Release to transcribe. |
|
|
100
|
+
| ⏳ processing | Transcribing + polishing on-device. |
|
|
101
|
+
|
|
102
|
+
## Smart formatting
|
|
103
|
+
|
|
104
|
+
The polish step turns raw dictation into structured text. Three modes (switchable from the menu):
|
|
105
|
+
|
|
106
|
+
- **Off**: paste raw STT output unchanged.
|
|
107
|
+
- **Rules only**: regex cleanup, i.e. filler removal (`eh`, `mmm`, `o sea sí`, `um`, `uh`) and whitespace normalization. <5 ms.
|
|
108
|
+
- **Smart (LLM)**: rules + local Llama 3.2 3B polish, but only when triggers fire (ordinal markers, backtrack phrases, long-form input). Otherwise stays rules-only to keep latency low.
|
|
109
|
+
|
|
110
|
+
Example (`smart` mode):
|
|
111
|
+
|
|
112
|
+
```
|
|
113
|
+
Raw: "Vamos a seguir tres pasos. Primero, reinicia. Segundo, vuelve a registrarte. Tercero, envía un correo."
|
|
114
|
+
|
|
115
|
+
Polished:
|
|
116
|
+
Vamos a seguir tres pasos.
|
|
117
|
+
|
|
118
|
+
1. Reinicia
|
|
119
|
+
2. Vuelve a registrarte
|
|
120
|
+
3. Envía un correo
|
|
121
|
+
```
|
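To make the "rules" and "trigger" ideas above concrete, here is a small sketch of regex-based filler removal and trigger detection. It is an illustration under stated assumptions, not the contents of `susurro/polish/rules.py` or `triggers.py`: the word lists are copied from the examples in this document, while the function names `apply_rules` and `needs_llm` and the 40-word long-form threshold are hypothetical.

```python
import re

# Filler and trigger phrases taken from the examples in the text; the real
# package may use different or longer lists.
FILLERS = ("o sea sí", "este pues", "mmm", "eh", "um", "uh")
ORDINALS = ("en primer lugar", "primero", "segundo", "tercero", "first", "second", "third")
BACKTRACKS = ("en realidad", "actually", "digo")
LONG_FORM_WORDS = 40  # assumed threshold for "long-form input"

# Longer phrases come first so they win over their shorter prefixes.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b[,.]?\s*",
    re.IGNORECASE,
)
_TRIGGER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in ORDINALS + BACKTRACKS) + r")\b",
    re.IGNORECASE,
)


def apply_rules(text: str) -> str:
    """Strip whole-word fillers, then collapse runs of whitespace."""
    without_fillers = _FILLER_RE.sub("", text)
    return re.sub(r"\s+", " ", without_fillers).strip()


def needs_llm(text: str) -> bool:
    """Return True when an ordinal/backtrack phrase appears or the text is long."""
    return bool(_TRIGGER_RE.search(text)) or len(text.split()) >= LONG_FORM_WORDS
```

Word-boundary matching keeps `um` from firing inside words such as "umbrella". A plain keyword check like this will also fire on innocent uses ("wait a second"), which only costs an extra LLM pass rather than a wrong result.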
|
122
|
+
|
|
123
|
+
## Configuration
|
|
124
|
+
|
|
125
|
+
Edit `susurro/config.py`:
|
|
126
|
+
|
|
127
|
+
- **`STT_BACKEND`**: `local` (default). Extension packages can register more.
|
|
128
|
+
- **`POLISH_MODE`**: `smart` (default), `rules`, or `off`.
|
|
129
|
+
- **`LOCAL_STT_MODEL`**: `whisper-large-v3-turbo` (default), or `whisper-large-v3-mlx` for max accuracy.
|
|
130
|
+
- **`LOCAL_POLISH_MODEL`**: `Llama-3.2-3B-Instruct-4bit` (default), or any mlx-community 3B-class model.
|
|
131
|
+
- **`HOTKEY`**: `alt_r` (default). Any pynput `Key` name: `alt_l`, `ctrl_r`, `f19`, etc.
|
|
132
|
+
- **`LANGUAGE`**: `None` for auto-detect, or pin to `"es"` / `"en"` to save ~100 ms per request.
|
|
133
|
+
- **`INPUT_DEVICE`**: pick a specific mic. Run `python -m sounddevice` to list devices.
|
|
134
|
+
|
|
135
|
+
## Permissions
|
|
136
|
+
|
|
137
|
+
macOS will prompt for three permissions the first time you run Susurro:
|
|
138
|
+
|
|
139
|
+
1. **Microphone**: to capture your voice.
|
|
140
|
+
2. **Accessibility**: to paste the transcript into the focused app.
|
|
141
|
+
3. **Input Monitoring**: to listen for the global hotkey.
|
|
142
|
+
|
|
143
|
+
After granting any of these, **fully quit and relaunch your terminal** for the new permission to take effect. The menu bar has direct links to each pane.
|
|
144
|
+
|
|
145
|
+
## Architecture
|
|
146
|
+
|
|
147
|
+
```
|
|
148
|
+
audio (sounddevice → 16kHz mono float32)
|
|
149
|
+
  → MLX Whisper (whisper-large-v3-turbo, on-device)
|
|
150
|
+
  → raw text
|
|
151
|
+
  → Polisher
|
|
152
|
+
      → Tier 1: regex rules (filler removal, whitespace)
|
|
153
|
+
      → Tier 2: trigger check (ordinals / backtrack / long-form)
|
|
154
|
+
      → Tier 3: MLX-LM polish (Llama 3.2 3B Instruct, on-device)
|
|
155
|
+
  → polished text
|
|
156
|
+
  → clipboard write + Cmd+V into focused app
|
|
157
|
+
```
|
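The three polisher tiers in the diagram above could be orchestrated roughly as follows. This is a sketch under assumptions rather than the code in `susurro/polish/__init__.py`: the `Polisher` dataclass shape, its field names, running the trigger check on the rules output, and falling back to the rules output when the LLM raises are all choices made here for illustration. The three stages are injected as plain callables so the example runs without MLX.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Polisher:
    """Tiered polish: rules always run, the LLM only when a trigger fires."""

    apply_rules: Callable[[str], str]      # Tier 1: cheap regex cleanup
    should_use_llm: Callable[[str], bool]  # Tier 2: trigger check
    llm_polish: Callable[[str], str]       # Tier 3: on-device LLM rewrite
    mode: str = "smart"                    # "off", "rules", or "smart"

    def polish(self, raw: str) -> str:
        if self.mode == "off":
            return raw  # paste raw STT output unchanged
        cleaned = self.apply_rules(raw)
        if self.mode == "rules" or not self.should_use_llm(cleaned):
            return cleaned  # common case: no LLM latency
        try:
            return self.llm_polish(cleaned)
        except Exception:
            # Assumption: a failed LLM call degrades to the rules-only text.
            return cleaned
```

Keeping the LLM behind the trigger check is what lets short, plain utterances skip the slowest stage entirely.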
|
158
|
+
|
|
159
|
+
Source layout (under ~1500 lines of Python total):
|
|
160
|
+
|
|
161
|
+
```
|
|
162
|
+
susurro/
|
|
163
|
+
  config.py           # all tunables
|
|
164
|
+
  audio.py            # mic capture + peak_level for indicator
|
|
165
|
+
  hotkey.py           # pynput global hotkey
|
|
166
|
+
  typer.py            # clipboard / keystroke insertion
|
|
167
|
+
  indicator.py        # floating waveform pill (PyObjC)
|
|
168
|
+
  permissions.py      # System Settings deep links
|
|
169
|
+
  app.py              # rumps menu bar + main loop (subclassable)
|
|
170
|
+
  backends/
|
|
171
|
+
    base.py           # protocols (Transcriber, PolishLLM)
|
|
172
|
+
    local_mlx.py      # local Whisper via MLX
|
|
173
|
+
    local_mlx_lm.py   # local polish LLM via mlx-lm
|
|
174
|
+
    __init__.py       # factories + extension registration
|
|
175
|
+
  polish/
|
|
176
|
+
    __init__.py       # Polisher orchestrator
|
|
177
|
+
    rules.py          # regex cleanup
|
|
178
|
+
    triggers.py       # decides if LLM should fire
|
|
179
|
+
    prompt.py         # system prompt + few-shot examples
|
|
180
|
+
  icons/              # template PNGs for menu bar
|
|
181
|
+
```
|
|
182
|
+
|
|
183
|
+
## Extending Susurro
|
|
184
|
+
|
|
185
|
+
External packages can register additional backends without modifying this code:
|
|
186
|
+
|
|
187
|
+
```python
|
|
188
|
+
# in your_extension/__init__.py
|
|
189
|
+
from susurro.backends import register_transcriber, register_polish_llm
|
|
190
|
+
|
|
191
|
+
class MyCloudSTT:
|
|
192
|
+
    name = "mycloud"
|
|
193
|
+
    def warmup(self): ...
|
|
194
|
+
    def transcribe(self, audio): ...
|
|
195
|
+
|
|
196
|
+
register_transcriber("mycloud", lambda: MyCloudSTT())
|
|
197
|
+
```
|
|
198
|
+
|
|
199
|
+
Then set `STT_BACKEND = "mycloud"` in `susurro/config.py` or via environment.
|
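For readers wondering what sits behind an extension point like this, a name-to-factory registry is one common way to build it. The sketch below is a guess at the pattern, not the code in `susurro/backends/__init__.py`: the `Transcriber` protocol fields mirror the `MyCloudSTT` example above, `transcribe` returning `str` is an assumption, and `make_transcriber` and the `_TRANSCRIBERS` dict are hypothetical names. `register_transcriber` and `available_transcribers` reuse the public names from the changelog purely for readability.

```python
from typing import Callable, Protocol


class Transcriber(Protocol):
    """Minimal backend shape, inferred from the MyCloudSTT example."""

    name: str

    def warmup(self) -> None: ...

    def transcribe(self, audio) -> str: ...  # return type is an assumption


# Hypothetical module-level registry: backend name -> zero-argument factory.
_TRANSCRIBERS: dict[str, Callable[[], Transcriber]] = {}


def register_transcriber(name: str, factory: Callable[[], Transcriber]) -> None:
    """Register (or replace) a backend factory under a config-facing name."""
    _TRANSCRIBERS[name] = factory


def available_transcribers() -> list[str]:
    """List registered backend names in a stable order."""
    return sorted(_TRANSCRIBERS)


def make_transcriber(name: str) -> Transcriber:
    """Instantiate the backend selected by a setting such as STT_BACKEND."""
    try:
        factory = _TRANSCRIBERS[name]
    except KeyError:
        raise ValueError(
            f"Unknown STT backend {name!r}; available: {available_transcribers()}"
        ) from None
    return factory()
```

Storing factories instead of instances defers heavy work such as loading model weights until a backend is actually selected.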
|
200
|
+
|
|
201
|
+
## Troubleshooting
|
|
202
|
+
|
|
203
|
+
- **Menu bar icon invisible**: emoji-only menu bar items can be hidden on MacBooks with a notch. This release ships a real template PNG, which fixes it for most users.
|
|
204
|
+
- **"This process is not trusted"**: Accessibility permission isn't granted. Use the *Open Accessibility Settings…* menu item, then fully restart the terminal.
|
|
205
|
+
- **Hotkey doesn't trigger**: Input Monitoring permission is missing.
|
|
206
|
+
- **Silent recordings / empty transcript**: Microphone permission is missing, or `INPUT_DEVICE` is pointing at the wrong device.
|
|
207
|
+
- **First transcription is slow**: the model is still warming up. Wait until the menu shows *Status: idle* before the first real dictation.
|
|
208
|
+
|
|
209
|
+
Logs land in `~/.susurro/susurro.log`; polish events in `~/.susurro/polish.jsonl`.
|
|
210
|
+
|
|
211
|
+
## Contributing
|
|
212
|
+
|
|
213
|
+
See [CONTRIBUTING.md](CONTRIBUTING.md). PRs welcome; please keep the package under ~1500 lines.
|
|
214
|
+
|
|
215
|
+
## Security
|
|
216
|
+
|
|
217
|
+
See [SECURITY.md](SECURITY.md). Report vulnerabilities privately to the maintainer.
|
|
218
|
+
|
|
219
|
+
## Maintainer
|
|
220
|
+
|
|
221
|
+
Built and maintained by [Danny Bravo](https://github.com/danilobrando) (`dannybravo@gmail.com`). Product strategist, AI ecosystem builder, educator, based in Bogotá.
|
|
222
|
+
|
|
223
|
+
## License
|
|
224
|
+
|
|
225
|
+
[MIT](LICENSE) © 2026 Danny Bravo.
|
|
226
|
+
|
|
227
|
+
## Credits
|
|
228
|
+
|
|
229
|
+
- [ml-explore/mlx](https://github.com/ml-explore/mlx) and [mlx-examples/whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper): Apple's MLX framework and the MLX Whisper port.
|
|
230
|
+
- [OpenAI Whisper](https://github.com/openai/whisper): the model.
|
|
231
|
+
- [Meta Llama 3.2](https://www.llama.com/): the polish LLM.
|
|
232
|
+
- [rumps](https://github.com/jaredks/rumps), [pynput](https://github.com/moses-palmer/pynput), [sounddevice](https://github.com/spatialaudio/python-sounddevice): Python-to-macOS glue.
|
|
233
|
+
- WisprFlow and SuperWhisper: the product UX this clones.
|