PyPI - eye2byte - Versions diffs - 0.3.0__tar.gz - Mend

eye2byte 0.3.0__tar.gz


Files changed (13)

eye2byte-0.3.0/PKG-INFO +290 -0
eye2byte-0.3.0/README.md +258 -0
eye2byte-0.3.0/eye2byte.egg-info/PKG-INFO +290 -0
eye2byte-0.3.0/eye2byte.egg-info/SOURCES.txt +11 -0
eye2byte-0.3.0/eye2byte.egg-info/dependency_links.txt +1 -0
eye2byte-0.3.0/eye2byte.egg-info/entry_points.txt +4 -0
eye2byte-0.3.0/eye2byte.egg-info/requires.txt +12 -0
eye2byte-0.3.0/eye2byte.egg-info/top_level.txt +3 -0
eye2byte-0.3.0/eye2byte.py +2700 -0
eye2byte-0.3.0/eye2byte_mcp.py +375 -0
eye2byte-0.3.0/eye2byte_ui.py +2424 -0
eye2byte-0.3.0/pyproject.toml +47 -0
eye2byte-0.3.0/setup.cfg +4 -0

eye2byte-0.3.0/PKG-INFO ADDED

@@ -0,0 +1,290 @@
+Metadata-Version: 2.4
+Name: eye2byte
+Version: 0.3.0
+Summary: Screen-context sidecar for coding agents
+Author: wolverin0
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/wolverin0/Eye2byte
+Project-URL: Changelog, https://github.com/wolverin0/Eye2byte/blob/claude/screen-context-sidecar-KDVSF/CHANGELOG.md
+Project-URL: Issues, https://github.com/wolverin0/Eye2byte/issues
+Keywords: screen-capture,mcp,coding-agent,vision,context
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Software Development :: Quality Assurance
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+Requires-Dist: Pillow
+Requires-Dist: fastmcp>=2.10
+Provides-Extra: voice
+Requires-Dist: openai-whisper; extra == "voice"
+Provides-Extra: ui
+Requires-Dist: customtkinter>=5.0; extra == "ui"
+Provides-Extra: all
+Requires-Dist: openai-whisper; extra == "all"
+Requires-Dist: customtkinter>=5.0; extra == "all"
+<p align="center">
+  <h1 align="center">Eye2byte</h1>
+  <p align="center">Screen-context sidecar for coding agents</p>
+</p>
+<p align="center">
+  <a href="#setup"><img src="https://img.shields.io/badge/python-3.10+-blue?logo=python&logoColor=white" alt="Python 3.10+"></a>
+  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green" alt="MIT License"></a>
+  <a href="#platforms"><img src="https://img.shields.io/badge/platform-Windows%20%7C%20macOS%20%7C%20Linux%20%7C%20Android-lightgrey" alt="Cross-platform"></a>
+  <a href="CHANGELOG.md"><img src="https://img.shields.io/badge/changelog-CHANGELOG.md-orange" alt="Changelog"></a>
+</p>
+---
+Captures your screen, voice, and annotations, feeds them to any vision model, and produces structured **Context Packs** your coding agent can act on.
+```
+Screen / Voice / Annotations  -->  Vision Model + Whisper  -->  Context Pack  -->  Coding Agent
+```
+## Features
+- **Multi-monitor capture** — active, specific (1/2/3), or all monitors at once
+- **Voice narration** — record, clean (noise removal + normalization), transcribe locally
+- **Annotations** — arrows, circles, rectangles, freehand, multi-line text on a frozen screenshot
+- **Screen clips** — record short videos, extract keyframes, analyze the sequence
+- **Image optimization** — automatic resize + JPEG compression (~5x smaller, with little visible quality loss)
+- **MCP server** — coding agents query your screen directly via Model Context Protocol
+- **Context Packs** — structured output: goal, environment, errors, signals, next steps
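The resize half of image optimization can be sketched as a pure function that shrinks an image so its longest side fits a limit while keeping the aspect ratio. This is a minimal sketch, not Eye2byte's code: the name `fit_within` is hypothetical, and the 1920-pixel default only mirrors the documented `image_max_size` setting.

```python
def fit_within(width: int, height: int, max_size: int = 1920) -> tuple[int, int]:
    """Return (width, height) scaled so the longest side is at most max_size.

    Hypothetical helper: images already within the limit are returned
    unchanged (no upscaling), and the aspect ratio is preserved, rounded
    to whole pixels with a minimum of 1.
    """
    if width <= 0 or height <= 0 or max_size <= 0:
        raise ValueError("dimensions and max_size must be positive")
    longest = max(width, height)
    if longest <= max_size:
        return width, height  # never upscale
    scale = max_size / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


# A 3840x2160 (4K) capture shrinks to fit a 1920-pixel limit.
print(fit_within(3840, 2160))  # (1920, 1080)
```

With Pillow, `Image.thumbnail((max_size, max_size))` performs a comparable aspect-preserving shrink in place, and `img.save(path, "JPEG", quality=90)` applies the compression half; the README does not say whether Eye2byte uses exactly these calls.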
+## Platforms
+| Platform | Screenshot | Voice | Annotation | Hotkeys |
+|----------|-----------|-------|------------|---------|
+| Windows | PowerShell .NET | ffmpeg | Pillow | Ctrl+Shift+1-5 |
+| macOS | screencapture | ffmpeg | Pillow | - |
+| Linux | scrot/maim/flameshot | ffmpeg | Pillow | - |
+| Android | ADB (Termux) | Termux:API | - | - |
+## Setup
+### 1. Install dependencies
+```bash
+pip install Pillow "fastmcp>=2.10"   # Core + MCP server
+pip install openai-whisper           # Local voice transcription (optional)
+pip install "customtkinter>=5.0"     # Control panel UI (optional)
+# ffmpeg is required for voice and clips; install it with your package manager
+```
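Before the first run, a quick preflight can confirm which dependencies are present. This sketch is hypothetical (Eye2byte does not ship a `preflight` function that the README documents); it maps each PyPI name to its import name (`Pillow` imports as `PIL`, `openai-whisper` as `whisper`) and looks for `ffmpeg` on `PATH`. `customtkinter` comes from the package's `ui` extra.

```python
import importlib.util
import shutil

# PyPI name -> import name; they differ for Pillow and openai-whisper.
MODULES = {
    "Pillow": "PIL",                   # core: image handling
    "fastmcp": "fastmcp",              # core: MCP server
    "openai-whisper": "whisper",       # optional: local voice transcription
    "customtkinter": "customtkinter",  # optional: control panel UI
}


def preflight() -> dict[str, bool]:
    """Report which documented dependencies are importable or on PATH."""
    report = {pkg: importlib.util.find_spec(mod) is not None
              for pkg, mod in MODULES.items()}
    report["ffmpeg"] = shutil.which("ffmpeg") is not None  # needed for voice/clips
    return report


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name:<15} {'ok' if ok else 'missing'}")
```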
+### 2. Configure a vision provider
+Eye2byte works with local or cloud vision models through four providers. Set your provider in `~/.eye2byte/config.json` or in the Settings UI:
+| Provider | Setup | Cost |
+|----------|-------|------|
+| **Ollama** (local) | [Install Ollama](https://ollama.com), `ollama pull qwen3-vl:8b` | Free |
+| **Gemini** | Set `GEMINI_API_KEY` in `.env` | Free tier (1000 req/day) |
+| **OpenRouter** | Set `OPENROUTER_API_KEY` in `.env` | Free models available |
+| **Hyperbolic** | Set `HYPERBOLIC_API_KEY` in `.env` | Pay per use |
+```bash
+# .env file (project dir, cwd, or ~/.eye2byte/.env)
+GEMINI_API_KEY=your-key-here
+# or OPENROUTER_API_KEY=...
+# or HYPERBOLIC_API_KEY=...
+```
+### 3. Run
+```bash
+python eye2byte.py capture              # Screenshot + analysis
+python eye2byte.py capture --voice      # + voice narration
+python eye2byte.py capture --mode window # Active window only
+python eye2byte_ui.py                    # Launch control panel
+```
+## Control Panel
+```bash
+python eye2byte_ui.py
+```
+The control panel is a small, always-on-top floating window that you can drag anywhere. Global hotkeys work even when the panel isn't focused.
+### Global Hotkeys (Windows)
+These work system-wide — no need to focus the Eye2byte window:
+| Hotkey | Action | Notes |
+|--------|--------|-------|
+| `Ctrl+Shift+1` | Capture screenshot | Uses current mode (Full/Window/Region) |
+| `Ctrl+Shift+2` | Annotate | Freezes screen, opens drawing overlay |
+| `Ctrl+Shift+3` | Toggle voice recording | Press once to start, again to stop |
+| `Ctrl+Shift+5` | Grab clipboard image | Analyzes whatever image is on your clipboard |
+All keyboard shortcuts are customizable from Settings > Keyboard Shortcuts.
+### Panel Controls
+| Control | Action |
+|---------|--------|
+| `Space` (hold) | Push-to-talk — hold to record, release to stop |
+| Mode selector | Cycle between Full Screen / Window / Region |
+| Settings | Configure provider, model, image quality, cleanup |
+| Copy @path | Copy the session path to the clipboard for `@`-mentioning in your coding agent |
+### Annotation Overlay
+When you press `Ctrl+Shift+2` or click Annotate, the screen freezes and you can draw on it:
+| Key | Tool | How to use |
+|-----|------|-----------|
+| `X` | Arrow | Click and drag to draw an arrow |
+| `C` | Circle | Click and drag to draw an ellipse |
+| `V` | Rectangle | Click and drag to draw a box |
+| `B` | Freehand | Click and drag to draw freely |
+| `T` | Text | Click to place, type your text |
+
+| Action | How |
+|--------|-----|
+| **Save** | `Enter` (commits annotations and sends to vision model) |
+| **Cancel** | `Escape` (discards all annotations) |
+| **Undo** | Right-click near an annotation to remove it |
+| **Newline in text** | `Shift+Enter` (Enter alone commits the text) |
+| **Multi-line text** | Text box auto-grows up to 6 lines |
+### Voice Recording
+Three ways to record voice:
+1. **Toggle** — `Ctrl+Shift+3` starts recording, press again to stop
+2. **Push-to-talk** — Hold `Space` while the panel is focused
+3. **Mouse PTT** — Press and hold the Record button
+While recording, any captures you take are automatically bundled with the voice note into a single session.
+## MCP Server
+Eye2byte exposes 6 tools via the [Model Context Protocol](https://modelcontextprotocol.io), letting coding agents capture and analyze your screen directly.
+| Tool | Description |
+|------|-------------|
+| `capture_and_summarize` | Screenshot + vision analysis. Supports monitor selection, delay, window targeting |
+| `capture_with_voice` | Screenshot + voice recording + transcription + analysis |
+| `record_clip_and_summarize` | Screen clip with keyframe extraction and sequence analysis |
+| `summarize_screenshot` | Analyze an existing image file |
+| `transcribe_audio` | Local Whisper transcription of any audio file |
+| `get_recent_context` | Retrieve recent Context Pack summaries |
+### Local Setup (stdio)
+Eye2byte runs on the machine whose screen you want to capture. For local agents like Claude Code on the same machine, use stdio transport:
+**Claude Code** — add to your project's `.mcp.json`:
+```json
+{
+  "mcpServers": {
+    "eye2byte": {
+      "command": "python",
+      "args": ["C:/path/to/eye2byte_mcp.py"]
+    }
+  }
+}
+```
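+Claude Code starts the server automatically. Use an absolute path to `eye2byte_mcp.py`, and make sure `command` runs a Python interpreter that has Eye2byte's dependencies installed.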
+### Remote Setup (SSE)
+When your coding agent runs on a **different machine** (cloud VM, SSH dev box, CI runner) but needs to see your local screen, use SSE transport:
+**Step 1 — On your local machine** (the one with the screen):
+```bash
+# Install Eye2byte + dependencies
+pip install Pillow fastmcp
+pip install openai-whisper  # optional, for voice
+# Start the SSE server
+python eye2byte_mcp.py --sse                           # No auth (LAN only)
+python eye2byte_mcp.py --sse --token mysecret123       # Bearer token auth
+python eye2byte_mcp.py --sse --port 9000 --token abc   # Custom port + auth
+```
+The server keeps running and accepts connections from any machine on your network. Without `--token`, anyone who can reach the port can trigger captures of your screen, so use `--token` whenever the server is reachable beyond your trusted LAN.
+**Step 2 — On the remote machine** (where the coding agent runs):
+Nothing to install. Just configure the MCP client to point at your local IP:
+```json
+{
+  "mcpServers": {
+    "eye2byte": {
+      "url": "http://YOUR_LOCAL_IP:8808/sse",
+      "headers": {"Authorization": "Bearer mysecret123"}
+    }
+  }
+}
+```
+Omit the `headers` field if the server was started without `--token`.
+Find your local IP with `ipconfig` (Windows), `ifconfig` (macOS or Linux), or `ip addr` (Linux).
+**Firewall:** You may need to allow inbound TCP on port 8808 (or the port you passed to `--port`). On Windows, run this from an elevated (Administrator) prompt:
+```powershell
+netsh advfirewall firewall add rule name="Eye2byte MCP" dir=in action=allow protocol=TCP localport=8808
+```
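To tell a firewall problem apart from a misconfigured client, run a plain TCP reachability check from the remote machine while the SSE server is running locally. This stdlib sketch tests only whether the port accepts connections. It does not speak MCP. `port_reachable` is a hypothetical name.

```python
import socket


def port_reachable(host: str, port: int = 8808, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure, ...
        return False
```

If `port_reachable("YOUR_LOCAL_IP")` returns `False` from the remote machine but `True` when run on the screen machine itself, a firewall rule is the likely cause.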
+### Multi-monitor Examples
+```
+capture_and_summarize(monitor=0)    # active monitor (default)
+capture_and_summarize(monitor=1)    # first monitor
+capture_and_summarize(monitor=2)    # second monitor
+capture_and_summarize(monitor=-1)   # ALL monitors at once
+```
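These calls are what an agent issues through its MCP client. Under the Model Context Protocol they travel as JSON-RPC 2.0 `tools/call` requests. The following sketch serializes such requests to show the wire shape. It omits the `initialize` handshake and transport framing that a real MCP client library handles. The helper name is hypothetical, and only the documented `monitor` argument is used.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def tools_call(name: str, **arguments) -> str:
    """Serialize an MCP 'tools/call' JSON-RPC 2.0 request."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# monitor: 0 = active (default), 1/2 = a specific monitor, -1 = all monitors
for monitor in (0, 1, 2, -1):
    print(tools_call("capture_and_summarize", monitor=monitor))
```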
+## Context Pack Format
+Every analysis produces a structured Context Pack:
+```markdown
+## Goal         — what the user appears to be doing
+## Environment  — OS, editor, repo, branch, language
+## Screen State — visible panels, files, terminal output
+## Signals      — verbatim errors, stack traces, warnings
+## Likely Situation — what's probably happening
+## Suggested Next Info — what a coding agent needs next
+```
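An agent or script consuming a Context Pack can split it on its `## ` headings. This sketch assumes the real output uses the section titles listed above as plain headings, with the dash descriptions being explanatory only. The function name and the sample pack are invented for illustration.

```python
def parse_context_pack(markdown: str) -> dict[str, str]:
    """Split a Context Pack into {section title: body text}."""
    sections: dict[str, list[str]] = {}
    current: str | None = None
    for line in markdown.splitlines():
        if line.startswith("## "):  # level-2 heading starts a new section
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {title: "\n".join(body).strip() for title, body in sections.items()}


# Invented sample input, not real Eye2byte output.
pack = """## Goal
Fix the failing import in app.py
## Signals
ModuleNotFoundError: No module named 'requests'
"""
print(parse_context_pack(pack)["Signals"])  # ModuleNotFoundError: No module named 'requests'
```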
+## Configuration
+Config: `~/.eye2byte/config.json` (created on first run or via `python eye2byte.py init`)
+| Setting | Default | Description |
+|---------|---------|-------------|
+| `provider` | `"ollama"` | Vision provider: ollama, gemini, openrouter, hyperbolic |
+| `model` | `"auto"` | Model name or "auto" for auto-detection |
+| `voice_clean` | `true` | Noise removal + pause trimming + volume normalization |
+| `auto_cleanup_days` | `7` | Delete old captures/summaries after N days (0=disabled) |
+| `image_max_size` | `1920` | Maximum image dimension in pixels before the image is sent to the vision model |
+| `image_quality` | `90` | JPEG quality (1-100) |
+## Files
+| File | Purpose |
+|------|---------|
+| `eye2byte.py` | Core engine — capture, voice, clip, summarize, watch |
+| `eye2byte_ui.py` | Control panel with hotkeys and annotation overlay |
+| `eye2byte_mcp.py` | MCP server for coding agent integration |
+## License
+MIT
