shuttersort 0.1.0__tar.gz

@@ -0,0 +1,245 @@
+ Metadata-Version: 2.4
+ Name: shuttersort
+ Version: 0.1.0
+ Summary: AI-powered media folder analyzer and pruner using local Vision models
+ Author-email: camiloavilacm <camiloavilacm@users.noreply.github.com>
+ License: MIT
+ Project-URL: Homepage, https://github.com/camiloavilacm/ShutterSort
+ Project-URL: Repository, https://github.com/camiloavilacm/ShutterSort
+ Project-URL: Issues, https://github.com/camiloavilacm/ShutterSort/issues
+ Keywords: media,cleanup,ollama,vision,ai,photos,cli
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Environment :: Console
+ Classifier: Intended Audience :: End Users/Desktop
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: MacOS :: MacOS X
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Topic :: Multimedia :: Graphics
+ Classifier: Topic :: Utilities
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ Requires-Dist: ollama>=0.1.0
+ Requires-Dist: rich>=13.0.0
+ Requires-Dist: Pillow>=10.0.0
+ Requires-Dist: rawpy>=0.18.0
+ Requires-Dist: opencv-python-headless>=4.8.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
+ Requires-Dist: ruff>=0.1.0; extra == "dev"
+ Requires-Dist: mypy>=1.0.0; extra == "dev"
+
+ # ShutterSort
+
+ AI-powered media folder analyzer and pruner. Scan your photo libraries, get intelligent scene analysis from a local Vision model, detect duplicates, and clean up with confidence.
+
+ ```bash
+ pip install shuttersort
+ shuttersort
+ ```
+
+ ## What It Does
+
+ | Feature | Description |
+ |---------|-------------|
+ | **AI Scene Analysis** | Classifies folders as landscape, portrait, event, junk, etc. using `llama3.2-vision` |
+ | **Quality Scoring** | Rates each folder 1-10 based on composition, lighting, and content value |
+ | **People Detection** | Counts people and describes appearances, emotions, and context |
+ | **Duplicate Detection** | Finds duplicate files across folders using content hashing (first 1MB + file size) |
+ | **Interactive Cleanup** | Review each folder in a Rich table, then Keep, Delete (to Trash), Open, or Skip |
+ | **RAW Support** | Extracts previews from Sony ARW files using `rawpy` |
+ | **Video Support** | Extracts 3 representative frames from MP4s using OpenCV |
+ | **100% Local** | All AI runs locally via Ollama: no cloud, no uploads, no API keys |
+
+ ## Quick Install
+
+ ### Prerequisites
+
+ 1. **Python 3.10+**
+    ```bash
+    python3 --version  # Must be 3.10 or higher
+    ```
+
+ 2. **Ollama** installed and running
+    ```bash
+    # Install Ollama (macOS)
+    brew install ollama
+
+    # Start the Ollama service (runs in the foreground; keep this terminal open)
+    ollama serve
+
+    # In a second terminal, pull the vision model (required)
+    ollama pull llama3.2-vision
+    ```
+
+ 3. **macOS Full Disk Access** (required for scanning Desktop, Downloads, Documents)
+    - Open **System Settings** → **Privacy & Security** → **Full Disk Access**
+    - Click the **+** button and add your terminal app:
+      - **Terminal.app**: `/System/Applications/Utilities/Terminal.app`
+      - **iTerm2**: `/Applications/iTerm.app`
+      - **VS Code Terminal**: `/Applications/Visual Studio Code.app`
+    - Restart your terminal after granting access
+
+ Without Full Disk Access, macOS will silently return empty results when scanning protected folders.
+
+ ### Install ShutterSort
+
+ ```bash
+ pip install shuttersort
+ ```
+
+ Or from source:
+
+ ```bash
+ git clone https://github.com/camiloavilacm/ShutterSort.git
+ cd ShutterSort
+ pip install -e .
+ ```
+
+ ## Usage
+
+ ### Basic Scan (Default Paths)
+
+ Scans `~/Desktop`, `~/Downloads`, and `~/Documents`:
+
+ ```bash
+ shuttersort
+ ```
+
+ ### Custom Paths
+
+ ```bash
+ # Single path
+ shuttersort --path ~/Photos
+
+ # Multiple paths
+ shuttersort --path ~/Photos ~/Pictures ~/ExternalDrive
+
+ # Shorthand
+ shuttersort -p ~/Photos
+ ```
+
+ ### Different Model
+
+ ```bash
+ shuttersort --model llava
+ shuttersort -m llava
+ ```
+
+ ### Dry Run (Preview Only)
+
+ See what would be deleted without actually deleting anything:
+
+ ```bash
+ shuttersort --dry-run
+ ```
+
+ ### Verbose Output
+
+ Show detailed debug logging:
+
+ ```bash
+ shuttersort --verbose
+ shuttersort -v
+ ```
+
+ ### Non-Interactive Mode
+
+ Just show the summary table without the interactive review prompts:
+
+ ```bash
+ shuttersort --no-interactive
+ ```
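
The flags shown in this section could be wired up roughly as follows. This is a minimal sketch that assumes `argparse`; the package's actual CLI module is not part of this diff, and `build_parser`, `parse_cli`, and `DEFAULT_PATHS` are hypothetical names chosen for illustration.

```python
import argparse
from pathlib import Path

# Defaults taken from the "Basic Scan" section; the constant name is hypothetical.
DEFAULT_PATHS = ["~/Desktop", "~/Downloads", "~/Documents"]


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags documented in the Usage section."""
    parser = argparse.ArgumentParser(prog="shuttersort")
    # nargs="+" lets one --path flag take several folders.
    parser.add_argument("-p", "--path", nargs="+", default=DEFAULT_PATHS,
                        help="one or more folders to scan")
    parser.add_argument("-m", "--model", default="llama3.2-vision",
                        help="Ollama vision model to use")
    parser.add_argument("--dry-run", action="store_true",
                        help="show what would be deleted, delete nothing")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable debug logging")
    parser.add_argument("--no-interactive", action="store_true",
                        help="print the summary table and exit")
    return parser


def parse_cli(argv: list[str]) -> argparse.Namespace:
    """Parse argv and expand '~' in every path."""
    args = build_parser().parse_args(argv)
    args.path = [Path(p).expanduser() for p in args.path]
    return args
```

With this layout, `parse_cli(["-p", "~/Photos", "--dry-run"])` yields a namespace whose `dry_run` attribute is `True` and whose `path` list holds one expanded `Path`.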
+
+ ## How It Works
+
+ ShutterSort uses a **three-agent architecture**:
+
+ ```
+ ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
+ │ LibrarianAgent  │────>│  CuratorAgent   │────>│  DecisionAgent  │
+ │                 │     │                 │     │                 │
+ │ • Walks folders │     │ • Calls Ollama  │     │ • Rich table    │
+ │ • Finds media   │     │ • Analyzes imgs │     │ • [K/D/O/S] loop│
+ │ • Extracts ARW  │     │ • Scores 1-10   │     │ • AppleScript   │
+ │ • Finds dupes   │     │ • Detects people│     │ • Moves to Trash│
+ └─────────────────┘     └─────────────────┘     └─────────────────┘
+ ```
+
+ 1. **LibrarianAgent** walks your folders, finds all media files (JPG, PNG, ARW, MP4), extracts previews from RAW files, and detects duplicates.
+ 2. **CuratorAgent** sends up to 5 representative images per folder to `llama3.2-vision` and returns a structured analysis (scene type, score, people count, emotions).
+ 3. **DecisionAgent** presents everything in a color-coded Rich table and walks you through each folder with an interactive `[K]eep / [D]elete / [O]pen / [S]kip` loop.
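
The README does not say how the "representative" images per folder (up to 5) or frames per MP4 (3) are chosen. One plausible approach, shown here purely as an assumption, is to sample evenly spaced positions. The helper `pick_evenly` is hypothetical and is not taken from the package.

```python
def pick_evenly(total: int, k: int) -> list[int]:
    """Return up to k distinct indices spread evenly across range(total)."""
    if total <= 0 or k <= 0:
        return []
    if total <= k:
        # Fewer items than requested: use all of them.
        return list(range(total))
    # Split the range into k equal slices and take the midpoint of each one.
    return [(2 * i + 1) * total // (2 * k) for i in range(k)]


# Three frame positions for a 300-frame clip.
frame_indices = pick_evenly(300, 3)
# Five image positions for a folder holding 12 images.
image_indices = pick_evenly(12, 5)
```

Taking slice midpoints rather than endpoints avoids the first and last frames of a clip, which are often black or blurred.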
+
+ ### Duplicate Detection
+
+ Files are matched by a composite key: **MD5 hash of the first 1MB + file size**. When duplicates are found across folders, ShutterSort suggests keeping the copy in the folder with the higher AI score.
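
The composite key described above can be sketched in a few lines. This is an illustration under stated assumptions, not the package's own code: `duplicate_key`, `group_duplicates`, and `CHUNK_BYTES` are hypothetical names, and "1MB" is taken to mean 1024 × 1024 bytes.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

CHUNK_BYTES = 1024 * 1024  # assumed meaning of "first 1MB"


def duplicate_key(path: Path) -> tuple[str, int]:
    """Return (MD5 of the first chunk, file size in bytes) for one file."""
    with path.open("rb") as fh:
        head = fh.read(CHUNK_BYTES)
    # usedforsecurity=False marks this as a checksum, not a security primitive.
    digest = hashlib.md5(head, usedforsecurity=False).hexdigest()
    return digest, path.stat().st_size


def group_duplicates(paths: list[Path]) -> dict[tuple[str, int], list[Path]]:
    """Group files by key and keep only groups with more than one member."""
    groups: dict[tuple[str, int], list[Path]] = defaultdict(list)
    for path in paths:
        groups[duplicate_key(path)].append(path)
    return {key: files for key, files in groups.items() if len(files) > 1}
```

Because only the first chunk is hashed, this is a fast heuristic rather than a byte-for-byte comparison: two files of equal size that differ only after the first megabyte would share a key.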
+
+ ### Delete Behavior (Trash vs Permanent)
+
+ When you choose **Delete**, ShutterSort uses **AppleScript** to move files to the macOS Trash:
+
+ ```applescript
+ tell application "Finder" to delete POSIX file "/path/to/file"
+ ```
+
+ This is equivalent to right-clicking a file and selecting "Move to Trash." Files can be recovered from the Trash until you empty it. ShutterSort **never** permanently deletes files.
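
From Python, the AppleScript one-liner above would typically be run through the macOS `osascript` command. The sketch below assumes that approach; `trash_command` and `move_to_trash` are hypothetical helpers, and the DecisionAgent's real implementation is not shown in this diff.

```python
import subprocess


def trash_command(path: str) -> list[str]:
    """Build the osascript argv that asks Finder to trash one file."""
    # Escape backslashes and double quotes so the path stays a valid
    # AppleScript string literal.
    escaped = path.replace("\\", "\\\\").replace('"', '\\"')
    script = f'tell application "Finder" to delete POSIX file "{escaped}"'
    return ["osascript", "-e", script]


def move_to_trash(path: str, dry_run: bool = False) -> bool:
    """Move a file to the Trash; with dry_run=True nothing is executed."""
    cmd = trash_command(path)
    if dry_run:
        return True
    # Passing argv as a list avoids shell quoting issues. macOS only.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0
```

Keeping command construction separate from execution makes the escaping testable on any platform and gives `--dry-run` a natural place to stop.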
+
+ ### Temporary File Handling
+
+ All extracted previews (from ARW files) and video frames (from MP4s) are saved to temporary files using Python's `tempfile` module. These are cleaned up automatically after analysis. A `gc.collect()` call after ARW processing prompts Python to reclaim unreachable objects so that native memory held by `rawpy` can be released sooner, which helps keep RAM usage low on 16GB machines.
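
One way to get the "cleaned up automatically" behavior is to scope every preview to a `tempfile.TemporaryDirectory`. The sketch below assumes that pattern. `analyze_with_previews` and its `extract` and `analyze` callables are hypothetical stand-ins for the rawpy/OpenCV extraction and the Ollama call.

```python
import gc
import tempfile
from pathlib import Path
from typing import Any, Callable, Iterable


def analyze_with_previews(
    sources: Iterable[str],
    extract: Callable[[str], bytes],
    analyze: Callable[[list[Path]], Any],
) -> Any:
    """Write previews to a temp dir, analyze them, then remove the dir."""
    with tempfile.TemporaryDirectory(prefix="shuttersort-") as tmp:
        previews: list[Path] = []
        for index, source in enumerate(sources):
            preview = Path(tmp) / f"preview_{index}.jpg"
            preview.write_bytes(extract(source))
            previews.append(preview)
            # Ask the collector to drop unreachable decoder objects now
            # instead of waiting for the next automatic collection.
            gc.collect()
        # Analysis must finish inside the block: the files vanish on exit.
        result = analyze(previews)
    return result
```

The context manager removes the directory even if `extract` or `analyze` raises, so a failed run leaves no previews behind.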
+
+ ## Output Example
+
+ ```
+ ShutterSort — Folder Analysis Summary
+ ┏━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
+ ┃ # ┃ Score ┃ Scene     ┃ People ┃ Folder           ┃ Summary                           ┃ Size     ┃ Pic%   ┃ Vid%   ┃ Dupes ┃
+ ┡━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
+ │ 1 │ 8/10  │ landscape │ 2      │ .../vacation     │ Beautiful beach photos from vac…  │ 245.30 MB│ 100%   │ 0%     │ No    │
+ │ 2 │ 2/10  │ junk      │ 0      │ .../screenshots  │ Screenshots of documents and re…  │ 12.50 MB │ 100%   │ 0%     │ Yes   │
+ │ 3 │ 9/10  │ event     │ 6      │ .../family       │ Birthday party with family memb…  │ 512.00 MB│ 80%    │ 20%    │ No    │
+ └───┴───────┴───────────┴────────┴──────────────────┴───────────────────────────────────┴──────────┴────────┴────────┴───────┘
+ ```
+
+ ## Troubleshooting
+
+ | Problem | Solution |
+ |---------|----------|
+ | **"No media folders found"** | Check Full Disk Access for your terminal app (see Prerequisites above) |
+ | **Ollama connection refused** | Run `ollama serve` in another terminal tab |
+ | **Model not found** | Run `ollama pull llama3.2-vision` |
+ | **ARW files fail to process** | Ensure `rawpy` is installed: `pip install rawpy` |
+ | **Slow analysis** | Large folders take longer; use `--verbose` to see progress |
+ | **JSON parse errors** | The retry loop handles this automatically (up to 3 attempts by default) |
+
+ ## Contributing
+
+ We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for the full guide.
+
+ Quick start for contributors:
+
+ ```bash
+ git clone https://github.com/camiloavilacm/ShutterSort.git
+ cd ShutterSort
+ python3 -m venv .venv && source .venv/bin/activate
+ pip install -e ".[dev]"
+ pytest -m "not integration"
+ ```
+
+ ### Branching Model
+
+ - `main`: Production (auto-publishes to PyPI)
+ - `develop`: Staging (auto-publishes to TestPyPI)
+ - `feature/*`: Feature branches (PR → develop)
+
+ ### CI/CD
+
+ Every PR runs lint (ruff), type checks (mypy), and tests (pytest) on Python 3.10–3.13.
+
+ ## License
+
+ MIT. See [LICENSE](LICENSE) for details.
@@ -0,0 +1,18 @@
+ """ShutterSort - AI-powered media folder analyzer and pruner.
+
+ An agent-based CLI tool that scans local media folders, analyzes them
+ with a local Vision Language Model (Ollama/llama3.2-vision), detects
+ duplicates, and provides an interactive cleanup interface.
+
+ Architecture:
+     - LibrarianAgent: Manages file system, extracts previews/frames, finds duplicates
+     - CuratorAgent: Vision analysis via Ollama, returns typed AnalysisResult
+     - DecisionAgent: Interactive review, Rich tables, AppleScript trash
+
+ Usage:
+     shuttersort --path ~/Desktop
+     shuttersort --path ~/Photos --model llama3.2-vision
+ """
+
+ __version__ = "0.1.0"
+ __author__ = "camiloavilacm"
@@ -0,0 +1,206 @@
+ """Base agent class with Ollama interface and automatic retry logic.
+
+ This module defines the MediaAgent abstract base class (ABC) that provides:
+ - Ollama API connection management
+ - Automatic retry with "reflection" on JSON parse failures
+ - Shared logging and configuration
+
+ The retry loop is a key "agentic" feature: if the LLM returns malformed JSON,
+ instead of crashing, the agent sends the error back to the model and asks it
+ to self-correct. This mimics how a human would say "that didn't make sense,
+ try again."
+ """
+
+ from __future__ import annotations
+
+ import logging
+ from abc import ABC, abstractmethod
+ from typing import Any
+
+ from .models import AnalysisResult
+ from .utils import parse_json_with_retry
+
+ # ---------------------------------------------------------------------------
+ # Logger setup
+ # ---------------------------------------------------------------------------
+ # We use Python's built-in logging module instead of print() because:
+ # - Logs can be directed to files, stdout, or both
+ # - Log levels (DEBUG, INFO, WARNING, ERROR) provide filtering control
+ # - Each log entry can include timestamp, level, and source module
+ #   (when the handler's formatter is configured to emit them)
+ # ---------------------------------------------------------------------------
+ logger = logging.getLogger(__name__)
+
+
+ class MediaAgent(ABC):
+     """Abstract base class for all agents in the ShutterSort system.
+
+     This class provides the shared infrastructure that all agents need:
+     - Connection to Ollama
+     - Retry logic for LLM calls
+     - Logging
+
+     Subclasses must implement their specific behavior via abstract methods.
+     This is the Template Method pattern: the base class defines the skeleton
+     of an algorithm (the retry loop), and subclasses fill in the details.
+
+     Attributes:
+         model: The Ollama model name (e.g., 'llama3.2-vision').
+         max_retries: Maximum number of attempts (including the first call)
+             before giving up on a JSON parse failure.
+         ollama_client: The Ollama client instance (set in __init__).
+     """
+
+     def __init__(
+         self,
+         model: str = "llama3.2-vision",
+         max_retries: int = 3,
+         ollama_client: Any = None,
+     ) -> None:
+         """Initialize the MediaAgent.
+
+         Args:
+             model: The Ollama model to use for vision analysis.
+             max_retries: Total number of attempts allowed per call.
+             ollama_client: Optional pre-configured Ollama client (for testing).
+                 If None, a new ollama.Client() is created.
+         """
+         self.model = model
+         self.max_retries = max_retries
+
+         # Lazy import of ollama to avoid import errors when the package
+         # isn't installed yet (e.g., during `pip install` phase).
+         if ollama_client is not None:
+             self.ollama_client = ollama_client
+         else:
+             import ollama
+
+             self.ollama_client = ollama.Client()
+
+     # -----------------------------------------------------------------------
+     # Retry loop with reflection
+     # -----------------------------------------------------------------------
+     def call_ollama_with_retry(
+         self,
+         prompt: str,
+         images: list[bytes] | None = None,
+     ) -> AnalysisResult:
+         """Call Ollama with automatic retry on JSON parse failure.
+
+         This is the core "agentic" behavior. The flow is:
+         1. Send prompt + images to Ollama
+         2. Try to parse the response as JSON
+         3. If parsing fails, send the error back to Ollama and retry
+         4. Repeat until max_retries attempts have been made
+         5. If every attempt fails, raise ValueError chained to the last error
+
+         The "reflection" happens in step 3: we tell the model exactly what
+         went wrong ("Invalid JSON: ...") so it can self-correct. This tends
+         to work better than a blind retry.
+
+         Args:
+             prompt: The text prompt to send to the model.
+             images: Optional list of image bytes (JPEG-encoded).
+
+         Returns:
+             A typed AnalysisResult with the model's analysis.
+
+         Raises:
+             ValueError: If all attempts fail to produce valid JSON.
+             Exception: If Ollama itself fails (network error, etc.).
+         """
+         last_error: Exception | None = None
+         current_prompt = prompt
+
+         for attempt in range(1, self.max_retries + 1):
+             try:
+                 logger.info(
+                     "Calling Ollama (attempt %d/%d, model=%s)",
+                     attempt,
+                     self.max_retries,
+                     self.model,
+                 )
+
+                 # Build the message for Ollama.
+                 # In the chat API, images are attached to the message itself
+                 # (as bytes or base64 strings), not passed to chat() as a
+                 # top-level keyword argument.
+                 message: dict[str, Any] = {
+                     "role": "user",
+                     "content": current_prompt,
+                 }
+                 if images:
+                     message["images"] = images
+
+                 # Call the Ollama API
+                 response = self.ollama_client.chat(
+                     model=self.model,
+                     messages=[message],
+                 )
+                 response_text: str = response["message"]["content"]
+
+                 # Try to parse the response as JSON
+                 parsed = parse_json_with_retry(response_text)
+
+                 # Convert the parsed dict to an AnalysisResult
+                 # Using .get() with defaults provides safety against missing fields
+                 result = AnalysisResult(
+                     scene_type=parsed.get("scene_type", "other"),
+                     score=int(parsed.get("score", 1)),
+                     summary=parsed.get("summary", ""),
+                     people_count=int(parsed.get("people_count", 0)),
+                     people_description=parsed.get("people_description", ""),
+                     emotions_detected=parsed.get("emotions_detected", ""),
+                     raw_json=response_text,
+                 )
+
+                 logger.info(
+                     "Ollama response parsed successfully: scene=%s, score=%d",
+                     result.scene_type,
+                     result.score,
+                 )
+                 return result
+
+             except (ValueError, KeyError, TypeError) as exc:
+                 # JSON parse error, missing field, or a score/count that
+                 # int() cannot convert (e.g. null): retry with reflection
+                 last_error = exc
+                 logger.warning(
+                     "Attempt %d failed: %s. Retrying with error feedback...",
+                     attempt,
+                     exc,
+                 )
+
+                 # "Reflect" the error back to the model
+                 # This tells the model what went wrong and asks it to fix it
+                 current_prompt = (
+                     f"{prompt}\n\n"
+                     "ERROR: Your previous response could not be parsed. "
+                     f"Details: {exc}\n\n"
+                     "Please respond with ONLY a valid JSON object matching "
+                     "the required schema. Do NOT include any text before or "
+                     "after the JSON. Do NOT use markdown code blocks."
+                 )
+
+         # All attempts exhausted
+         raise ValueError(
+             f"Failed to get valid JSON from Ollama after {self.max_retries} "
+             f"attempts. Last error: {last_error}"
+         ) from last_error
+
+     # -----------------------------------------------------------------------
+     # Abstract method: each agent defines its own execution logic
+     # -----------------------------------------------------------------------
+     @abstractmethod
+     def execute(self, *args: Any, **kwargs: Any) -> Any:
+         """Execute the agent's primary task.
+
+         Each concrete agent implements this method with its specific logic.
+         This is the entry point for the agent's work.
+
+         Args:
+             *args: Positional arguments specific to the agent.
+             **kwargs: Keyword arguments specific to the agent.
+
+         Returns:
+             The result of the agent's execution (type varies by agent).
+         """
+         ...
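
The retry-with-reflection idea in `call_ollama_with_retry` can be tried in isolation, without Ollama installed. The sketch below is a simplified stand-in, not the package's code: `ask_for_json` is a hypothetical helper, the model is any callable that maps a prompt to a reply string, and plain `json.loads` takes the place of `parse_json_with_retry`, whose implementation is not shown in this diff.

```python
import json
from typing import Callable


def ask_for_json(ask: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Call ask(prompt) until the reply parses as a JSON object."""
    current = prompt
    last_error: Exception | None = None
    for _ in range(max_attempts):
        reply = ask(current)
        try:
            parsed = json.loads(reply)
            if not isinstance(parsed, dict):
                raise ValueError("expected a JSON object")
            return parsed
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            last_error = exc
            # Reflection: restate the original prompt plus what went wrong.
            current = (
                f"{prompt}\n\n"
                f"ERROR: your previous reply was not valid JSON ({exc}). "
                "Reply with a JSON object only."
            )
    raise ValueError(f"no valid JSON after {max_attempts} attempts") from last_error


# A scripted fake model: the first reply is chatty, the second is valid JSON.
_replies = iter(["Sure! Here is the analysis.", '{"scene_type": "event", "score": 9}'])
result = ask_for_json(lambda _prompt: next(_replies), "Describe the folder as JSON.")
```

Injecting the model as a callable mirrors the `ollama_client` parameter of `MediaAgent.__init__`, which exists so that tests can supply a fake client.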