pushtotype 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (47)

pushtotype-0.1.0/.claude/settings.local.json +11 -0
pushtotype-0.1.0/.github/ISSUE_TEMPLATE/bug_report.md +34 -0
pushtotype-0.1.0/.github/ISSUE_TEMPLATE/feature_request.md +17 -0
pushtotype-0.1.0/.github/workflows/ci.yml +28 -0
pushtotype-0.1.0/.gitignore +46 -0
pushtotype-0.1.0/.python-version +1 -0
pushtotype-0.1.0/CONTRIBUTING.md +107 -0
pushtotype-0.1.0/LICENSE +21 -0
pushtotype-0.1.0/PKG-INFO +260 -0
pushtotype-0.1.0/README.md +225 -0
pushtotype-0.1.0/assets/.gitkeep +0 -0
pushtotype-0.1.0/claude.md +19 -0
pushtotype-0.1.0/dogfood_test.md +68 -0
pushtotype-0.1.0/learnings/xclip_injection_fix.md +54 -0
pushtotype-0.1.0/main.py +6 -0
pushtotype-0.1.0/memory/MEMORY.md +4 -0
pushtotype-0.1.0/memory/project_efficiency_benchmark.md +15 -0
pushtotype-0.1.0/project-plan.md +340 -0
pushtotype-0.1.0/pyproject.toml +56 -0
pushtotype-0.1.0/src/pushtotype/__init__.py +1 -0
pushtotype-0.1.0/src/pushtotype/__main__.py +1 -0
pushtotype-0.1.0/src/pushtotype/audio.py +56 -0
pushtotype-0.1.0/src/pushtotype/cli.py +269 -0
pushtotype-0.1.0/src/pushtotype/config.py +144 -0
pushtotype-0.1.0/src/pushtotype/daemon.py +274 -0
pushtotype-0.1.0/src/pushtotype/feedback.py +175 -0
pushtotype-0.1.0/src/pushtotype/hotkey.py +183 -0
pushtotype-0.1.0/src/pushtotype/injector.py +72 -0
pushtotype-0.1.0/src/pushtotype/session.py +57 -0
pushtotype-0.1.0/src/pushtotype/sounds/start.wav +0 -0
pushtotype-0.1.0/src/pushtotype/sounds/stop.wav +0 -0
pushtotype-0.1.0/src/pushtotype/transcriber.py +62 -0
pushtotype-0.1.0/tasks/M0_PROJECT_SETUP.md +181 -0
pushtotype-0.1.0/tasks/M1_AUDIO_TRANSCRIPTION.md +207 -0
pushtotype-0.1.0/tasks/M2_HOTKEY_PUSH_TO_TALK.md +246 -0
pushtotype-0.1.0/tasks/M3_TEXT_INJECTION.md +249 -0
pushtotype-0.1.0/tasks/M4_CONFIG_POLISH.md +306 -0
pushtotype-0.1.0/tasks/M5_DISTRIBUTION_DOCS.md +329 -0
pushtotype-0.1.0/tests/test_audio.py +95 -0
pushtotype-0.1.0/tests/test_config.py +169 -0
pushtotype-0.1.0/tests/test_feedback.py +63 -0
pushtotype-0.1.0/tests/test_hotkey.py +119 -0
pushtotype-0.1.0/tests/test_injector.py +104 -0
pushtotype-0.1.0/tests/test_session.py +27 -0
pushtotype-0.1.0/tests/test_smoke.py +9 -0
pushtotype-0.1.0/tests/test_transcriber.py +75 -0
pushtotype-0.1.0/uv.lock +1049 -0

pushtotype-0.1.0/.claude/settings.local.json ADDED

@@ -0,0 +1,11 @@
+{
+  "permissions": {
+    "allow": [
+      "Bash(gh auth:*)",
+      "Bash(pip install:*)",
+      "Bash(uv run:*)",
+      "Bash(uv add:*)",
+      "Bash(uv pip:*)"
+    ]
+  }
+}

pushtotype-0.1.0/.github/ISSUE_TEMPLATE/bug_report.md ADDED

@@ -0,0 +1,34 @@
+---
+name: Bug report
+about: Something isn't working
+labels: bug
+---
+## What happened
+<!-- A clear description of the bug -->
+## Steps to reproduce
+1.
+2.
+3.
+## Expected behavior
+<!-- What you expected to happen -->
+## Startup output (`pushtotype -v`)
+```
+paste output here
+```
+## Environment
+- OS:
+- Display server (X11 / Wayland):
+- Python version (`python --version`):
+- PushToType version (`pushtotype --version`):
+- GPU / CPU:
+- Install method (pip / pipx / uv / source):

pushtotype-0.1.0/.github/ISSUE_TEMPLATE/feature_request.md ADDED

@@ -0,0 +1,17 @@
+---
+name: Feature request
+about: Suggest an idea or improvement
+labels: enhancement
+---
+## Problem
+<!-- What problem does this solve? Who is it for? -->
+## Proposed solution
+<!-- What would you like to see? -->
+## Alternatives considered
+<!-- Any other approaches you thought about? -->

pushtotype-0.1.0/.github/workflows/ci.yml ADDED

@@ -0,0 +1,28 @@
+name: CI
+on:
+  push:
+  pull_request:
+jobs:
+  lint-and-test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"
+      - name: Install dependencies
+        run: pip install -e ".[dev]"
+      - name: Lint
+        run: ruff check .
+      - name: Format check
+        run: ruff format --check .
+      - name: Test
+        run: pytest

pushtotype-0.1.0/.gitignore ADDED

@@ -0,0 +1,46 @@
+# Python
+__pycache__/
+*.py[cod]
+*.pyo
+*.pyd
+.Python
+*.egg
+*.egg-info/
+dist/
+build/
+eggs/
+parts/
+var/
+sdist/
+wheels/
+.installed.cfg
+lib/
+lib64/
+# Virtual environments
+.venv/
+venv/
+env/
+ENV/
+# Testing
+.pytest_cache/
+.coverage
+htmlcov/
+# Ruff
+.ruff_cache/
+# IDE
+.idea/
+.vscode/
+*.swp
+*.swo
+# OS
+.DS_Store
+Thumbs.db
+# Distribution
+*.tar.gz
+*.whl

pushtotype-0.1.0/.python-version ADDED

@@ -0,0 +1 @@
+3.13

pushtotype-0.1.0/CONTRIBUTING.md ADDED

@@ -0,0 +1,107 @@
+# Contributing to PushToType
+Thanks for your interest in contributing. PushToType is a focused tool — the goal is to keep it simple, fast, and reliable.
+---
+## Development Setup
+```bash
+# Clone the repo
+git clone https://github.com/danielgraviet/pushtotype.git
+cd pushtotype
+# Install with dev dependencies
+uv pip install -e ".[dev]"
+# Verify setup
+uv run pytest tests/
+uv run pushtotype test --duration 3
+```
+**System dependencies required for manual testing:**
+```bash
+sudo apt install libportaudio2 xdotool
+sudo usermod -aG input $USER  # log out/in after
+```
+---
+## Running Tests
+```bash
+uv run pytest tests/          # all tests
+uv run pytest tests/ -v       # verbose
+uv run pytest tests/ -q       # quiet
+```
+Tests use mocks for hardware (audio, evdev, GPU) so they run in CI without any devices attached.
+---
+## Code Style
+PushToType uses [ruff](https://docs.astral.sh/ruff/) for linting and formatting.
+```bash
+uv run ruff check src/ tests/          # lint
+uv run ruff format src/ tests/         # format
+uv run ruff format --check src/ tests/ # check without changing
+```
+CI will fail if either check fails. Run both before submitting a PR.
+---
+## Architecture
+```
+src/pushtotype/
+├── cli.py          Entry point — click commands, config wiring, wizard
+├── daemon.py       Main loop — hotkey → record → transcribe → inject
+├── config.py       TOML config loading, saving, validation, defaults
+├── hotkey.py       evdev-based global hotkey listener (async)
+├── transcriber.py  faster-whisper wrapper
+├── injector.py     xdotool type (X11) / wtype (Wayland)
+├── audio.py        sounddevice audio capture
+├── feedback.py     Start/stop/error beep sounds
+└── session.py      X11 / Wayland detection
+```
+**Data flow:**
+1. `HotkeyListener` (evdev, async) fires `_on_press` / `_on_release` callbacks
+2. `_on_release` concatenates recorded audio frames and schedules `_transcribe`
+3. `_transcribe` runs `Transcriber.transcribe()` in a thread pool executor
+4. Result is passed to `TextInjector.inject()` which calls `xdotool type` (X11) or `wtype` (Wayland)
+---
+## Where Help Is Wanted
+Check the [issues](https://github.com/danielgraviet/pushtotype/issues) page for `good first issue` labels. Some areas:
+- **Wayland improvements** — better session detection, testing on more compositors
+- **AMD GPU support** — ROCm / DirectML via ctranslate2
+- **Hotkey UX** — better evdev capture fallback for users not in the `input` group
+- **Tests** — more coverage for daemon and CLI integration paths
+---
+## Submitting a PR
+1. Fork the repo and create a branch from `master`
+2. Make your changes
+3. Run `uv run pytest tests/` and `uv run ruff check src/ tests/` — both must pass
+4. Open a PR with a clear description of what changed and why
+For larger changes, open an issue first to discuss the approach.
+---
+## Reporting Bugs
+Use the [bug report template](.github/ISSUE_TEMPLATE/bug_report.md). Include:
+- OS and display server (X11/Wayland)
+- Python version
+- Output of `pushtotype -v` startup block
+- Steps to reproduce
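The data flow listed in CONTRIBUTING.md (press, record, release, transcribe in a thread pool executor, inject) can be sketched roughly as follows. This is an illustration only, not the code in `daemon.py`: `FakeTranscriber`, `FakeInjector`, `Daemon`, and `demo` are hypothetical stand-ins, and no real `evdev`, `sounddevice`, `faster-whisper`, or `xdotool` calls are made.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


class FakeTranscriber:
    """Hypothetical stand-in for the faster-whisper wrapper (a blocking call)."""

    def transcribe(self, frames: list[bytes]) -> str:
        return f"{len(frames)} frames transcribed"


class FakeInjector:
    """Hypothetical stand-in for the injector; records text instead of typing it."""

    def __init__(self) -> None:
        self.typed: list[str] = []

    def inject(self, text: str) -> None:
        self.typed.append(text)


class Daemon:
    """Assumed shape of the main loop: hotkey -> record -> transcribe -> inject."""

    def __init__(self, transcriber, injector) -> None:
        self.transcriber = transcriber
        self.injector = injector
        self.frames: list[bytes] = []
        self.recording = False
        self.executor = ThreadPoolExecutor(max_workers=1)

    def on_press(self) -> None:
        # Start a fresh recording when the hotkey goes down.
        self.frames = []
        self.recording = True

    def on_audio(self, chunk: bytes) -> None:
        # Audio callback: keep chunks only while the hotkey is held.
        if self.recording:
            self.frames.append(chunk)

    def on_release(self) -> asyncio.Task:
        # Stop recording, hand the collected frames to a transcription task.
        self.recording = False
        frames, self.frames = self.frames, []
        return asyncio.get_running_loop().create_task(self._transcribe(frames))

    async def _transcribe(self, frames: list[bytes]) -> None:
        # Run the blocking transcription off the event loop, then inject.
        loop = asyncio.get_running_loop()
        text = await loop.run_in_executor(
            self.executor, self.transcriber.transcribe, frames
        )
        if text:
            self.injector.inject(text)


async def demo() -> list[str]:
    injector = FakeInjector()
    daemon = Daemon(FakeTranscriber(), injector)
    daemon.on_press()
    daemon.on_audio(b"\x00\x01")
    daemon.on_audio(b"\x02\x03")
    await daemon.on_release()
    daemon.executor.shutdown()
    return injector.typed
```

Running the blocking transcription through `run_in_executor` keeps the event loop free to notice the next hotkey press while a previous utterance is still being processed, which is presumably why the project describes that step as running in a thread pool executor.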

pushtotype-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 danielgraviet
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

pushtotype-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,260 @@
+Metadata-Version: 2.4
+Name: pushtotype
+Version: 0.1.0
+Summary: Real-time speech-to-text for Linux. Hold a hotkey, speak, release — your words appear wherever your cursor is.
+Project-URL: Homepage, https://github.com/danielgraviet/pushtotype
+Project-URL: Repository, https://github.com/danielgraviet/pushtotype
+Project-URL: Issues, https://github.com/danielgraviet/pushtotype/issues
+Author-email: Daniel Graviet <dtgraviet@gmail.com>
+License: MIT
+License-File: LICENSE
+Keywords: linux,push-to-talk,speech-to-text,transcription,voice-typing,whisper
+Classifier: Development Status :: 3 - Alpha
+Classifier: Environment :: X11 Applications
+Classifier: Intended Audience :: End Users/Desktop
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: POSIX :: Linux
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
+Requires-Python: >=3.10
+Requires-Dist: click>=8.0
+Requires-Dist: evdev>=1.6.0
+Requires-Dist: faster-whisper>=1.0.0
+Requires-Dist: numpy>=1.24
+Requires-Dist: sounddevice>=0.4.6
+Requires-Dist: tomli-w>=1.0
+Requires-Dist: tomli>=2.0; python_version < '3.11'
+Provides-Extra: dev
+Requires-Dist: pytest; extra == 'dev'
+Requires-Dist: pytest-asyncio; extra == 'dev'
+Requires-Dist: ruff; extra == 'dev'
+Description-Content-Type: text/markdown
+# PushToType
+> Hold a hotkey, speak, release — your words appear wherever your cursor is.
+[![PyPI version](https://img.shields.io/pypi/v/pushtotype)](https://pypi.org/project/pushtotype/)
+[![Python](https://img.shields.io/pypi/pyversions/pushtotype)](https://pypi.org/project/pushtotype/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![CI](https://github.com/danielgraviet/pushtotype/actions/workflows/ci.yml/badge.svg)](https://github.com/danielgraviet/pushtotype/actions)
+PushToType is a local, real-time speech-to-text tool for Linux. It transcribes your voice using a local Whisper model and types the result directly into whatever application has focus — no clipboard, no cloud, no API keys.
+An open-source alternative to Wispr Flow, which has no Linux support.
+---
+## Features
+- **Works everywhere** — types into any focused app: browsers, editors, terminals, search bars
+- **Local-only** — `faster-whisper` runs on your GPU (CUDA) with automatic CPU fallback
+- **No cloud** — no API keys, no network required after the one-time model download
+- **Fast** — ~250ms from hotkey release to text appearing
+- **Configurable** — TOML config file, interactive setup wizard, CLI flags
+- **Wayland + X11** — works on both display servers via `evdev`
+---
+## Quick Start
+```bash
+# Install
+uv add pushtotype        # or: pip install pushtotype
+# System dependencies (X11)
+sudo apt install libportaudio2 xdotool
+# Add yourself to the input group (required for hotkey detection)
+sudo usermod -aG input $USER
+# Log out and back in for this to take effect
+# Run the setup wizard
+pushtotype config
+# Start
+pushtotype
+```
+Hold your configured hotkey (default: right Ctrl), speak, release. Text appears at the cursor.
+---
+## How It Works
+```
+[Hold hotkey] → [Record audio] → [Whisper transcription] → [Type into focused app]
+     evdev            sounddevice       faster-whisper           xdotool type
+```
+PushToType runs as a background daemon. A global hotkey listener (via `evdev`, reading directly from `/dev/input/`) fires a recording callback. When you release the hotkey, the audio is sent to `faster-whisper` for transcription, then `xdotool type` injects the text into whatever window is focused.
+---
+## Installation
+### Recommended: uv
+```bash
+uv tool install pushtotype
+```
+### pip / pipx
+```bash
+pip install pushtotype
+# or
+pipx install pushtotype
+```
+### From source
+```bash
+git clone https://github.com/danielgraviet/pushtotype.git
+cd pushtotype
+uv pip install -e ".[dev]"
+```
+---
+## System Requirements
+| Requirement | Notes |
+|---|---|
+| Linux | X11 or Wayland |
+| Python 3.10+ | |
+| `libportaudio2` | `sudo apt install libportaudio2` |
+| `xdotool` | X11 only — `sudo apt install xdotool` |
+| `wtype` + `wl-clipboard` | Wayland only — `sudo apt install wtype wl-clipboard` |
+| `input` group | `sudo usermod -aG input $USER` |
+| NVIDIA GPU | Recommended for speed — CPU works but is slower |
+---
+## Configuration
+Config file lives at `~/.config/pushtotype/config.toml`. Run `pushtotype config` to create it interactively.
+```toml
+[hotkey]
+keys = ["KEY_RIGHTCTRL"]
+[audio]
+device = "default"
+sample_rate = 16000
+[model]
+name = "base.en"
+device = "auto"
+compute_type = "float16"
+[feedback]
+enabled = true
+volume = 0.5
+[output]
+method = "auto"   # "auto", "x11", or "wayland"
+```
+### Config priority (highest to lowest)
+1. CLI flags (e.g. `--model small.en`)
+2. Environment variables (e.g. `PUSHTOTYPE_MODEL=small.en`)
+3. Config file (`~/.config/pushtotype/config.toml`)
+4. Built-in defaults
+### Environment variables
+| Variable | Config key |
+|---|---|
+| `PUSHTOTYPE_MODEL` | `model.name` |
+| `PUSHTOTYPE_DEVICE` | `model.device` |
+| `PUSHTOTYPE_AUDIO_DEV` | `audio.device` |
+| `PUSHTOTYPE_FEEDBACK` | `feedback.enabled` |
+| `PUSHTOTYPE_HOTKEY` | `hotkey.keys` (comma-separated) |
+---
+## CLI Reference
+```
+pushtotype                  Start the push-to-talk daemon
+pushtotype config           Run the interactive setup wizard
+pushtotype config --show    Print the current effective config
+pushtotype devices          List available audio input devices
+pushtotype test             Record 5 seconds and transcribe (verify setup)
+pushtotype download [MODEL] Pre-download a Whisper model
+```
+**Global flags:**
+```
+-v, --verbose     Enable debug logging (shows per-step timings)
+-q, --quiet       Suppress all output except errors
+--log-file PATH   Write logs to a file
+--model NAME      Override model (e.g. small.en)
+--hotkey COMBO    Override hotkey (e.g. ctrl+shift+s)
+--device INDEX    Override audio device index
+--no-feedback     Disable start/stop beeps
+```
+---
+## Troubleshooting
+**`Permission denied` on `/dev/input/`**
+You need to be in the `input` group:
+```bash
+sudo usermod -aG input $USER
+# Log out and back in
+```
+**`xdotool not found`**
+```bash
+sudo apt install xdotool
+```
+**Text doesn't appear in my terminal**
+Terminals use `Ctrl+Shift+V` to paste, but PushToType uses `xdotool type` which bypasses the clipboard entirely — it should work in all terminals without any special config.
+**CUDA not available**
+PushToType automatically falls back to CPU. Transcription will be slower (~1-3s per 5s of audio vs ~0.2s on GPU). Check `pushtotype -v` startup output to see which device is being used.
+**Model download fails / slow**
+Models are cached in `~/.cache/huggingface/hub/` after the first download. Pre-download manually:
+```bash
+pushtotype download base.en
+```
+**`wtype` or `wl-copy` not found (Wayland)**
+```bash
+sudo apt install wtype wl-clipboard
+```
+---
+## Known Limitations
+- English only (`base.en` model)
+- No AMD GPU (ROCm) support
+- Wayland session detection relies on `XDG_SESSION_TYPE` or `WAYLAND_DISPLAY`
+- No GUI — terminal only
+---
+## Contributing
+See [CONTRIBUTING.md](CONTRIBUTING.md). Issues and PRs welcome.
+---
+## License
+[MIT](LICENSE)
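The config priority described in the PKG-INFO README (CLI flags over environment variables over the config file over built-in defaults) could be resolved along these lines. This is a minimal sketch, not the contents of `config.py`: `DEFAULTS`, `ENV_MAP`, `_coerce`, and `resolve_config` are hypothetical names, the accepted boolean spellings for `PUSHTOTYPE_FEEDBACK` are an assumption, and only the variable names and config keys come from the tables in the README.

```python
import os

# Built-in defaults, taken from the example config in the README.
DEFAULTS = {
    "model": {"name": "base.en", "device": "auto"},
    "audio": {"device": "default"},
    "feedback": {"enabled": True},
    "hotkey": {"keys": ["KEY_RIGHTCTRL"]},
}

# Environment variable -> (section, key), as listed in the README table.
ENV_MAP = {
    "PUSHTOTYPE_MODEL": ("model", "name"),
    "PUSHTOTYPE_DEVICE": ("model", "device"),
    "PUSHTOTYPE_AUDIO_DEV": ("audio", "device"),
    "PUSHTOTYPE_FEEDBACK": ("feedback", "enabled"),
    "PUSHTOTYPE_HOTKEY": ("hotkey", "keys"),
}


def _coerce(section, key, raw):
    """Convert a raw environment string to the type the config key expects."""
    if (section, key) == ("feedback", "enabled"):
        # Assumption: these spellings count as "true"; the real parser may differ.
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    if (section, key) == ("hotkey", "keys"):
        # The README documents this variable as comma-separated.
        return [part.strip() for part in raw.split(",") if part.strip()]
    return raw


def resolve_config(file_cfg, env=None, cli=None):
    """Merge the four layers, lowest priority first so later layers win."""
    env = os.environ if env is None else env
    cli = {} if cli is None else cli

    merged = {section: dict(values) for section, values in DEFAULTS.items()}
    for section, values in file_cfg.items():
        merged.setdefault(section, {}).update(values)
    for var, (section, key) in ENV_MAP.items():
        if var in env:
            merged[section][key] = _coerce(section, key, env[var])
    for (section, key), value in cli.items():
        if value is not None:  # None means the flag was not given
            merged.setdefault(section, {})[key] = value
    return merged
```

With a config file that sets `model.name = "small.en"` and `PUSHTOTYPE_MODEL=medium.en` in the environment, `resolve_config` returns `medium.en`; passing `{("model", "name"): "tiny.en"}` as the CLI layer overrides both.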