pdf-file-renamer 0.6.0.tar.gz → 0.6.2.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (47)
  1. pdf_file_renamer-0.6.2/CHANGELOG.md +137 -0
  2. pdf_file_renamer-0.6.2/PKG-INFO +443 -0
  3. pdf_file_renamer-0.6.2/README.md +395 -0
  4. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/coverage.xml +84 -58
  5. pdf_file_renamer-0.6.2/demo.gif +0 -0
  6. pdf_file_renamer-0.6.2/demo.tape +25 -0
  7. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/pyproject.toml +40 -2
  8. pdf_file_renamer-0.6.2/scripts/create_demo_gif.py +114 -0
  9. pdf_file_renamer-0.6.2/scripts/record_demo.sh +62 -0
  10. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/__init__.py +1 -1
  11. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/application/pdf_rename_workflow.py +8 -2
  12. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/infrastructure/doi/pdf2doi_extractor.py +45 -14
  13. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/presentation/formatters.py +13 -3
  14. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/tests/data/2025-dennis-managing-complexity.pdf +0 -0
  15. pdf_file_renamer-0.6.0/PKG-INFO +0 -272
  16. pdf_file_renamer-0.6.0/README.md +0 -246
  17. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/.env.example +0 -0
  18. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/.github/workflows/ci.yml +0 -0
  19. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/.github/workflows/release.yml +0 -0
  20. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/.gitignore +0 -0
  21. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/.python-version +0 -0
  22. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/LICENSE +0 -0
  23. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/REFACTORING_SUMMARY.md +0 -0
  24. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/application/__init__.py +0 -0
  25. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/application/filename_service.py +0 -0
  26. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/application/rename_service.py +0 -0
  27. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/domain/__init__.py +0 -0
  28. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/domain/models.py +0 -0
  29. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/domain/ports.py +0 -0
  30. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/infrastructure/__init__.py +0 -0
  31. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/infrastructure/config.py +0 -0
  32. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/infrastructure/doi/__init__.py +0 -0
  33. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/infrastructure/llm/__init__.py +0 -0
  34. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/infrastructure/llm/pydantic_ai_provider.py +0 -0
  35. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/infrastructure/pdf/__init__.py +0 -0
  36. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/infrastructure/pdf/composite.py +0 -0
  37. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/infrastructure/pdf/docling_extractor.py +0 -0
  38. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/infrastructure/pdf/pymupdf_extractor.py +0 -0
  39. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/main.py +0 -0
  40. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/presentation/__init__.py +0 -0
  41. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/src/pdf_file_renamer/presentation/cli.py +0 -0
  42. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/tests/__init__.py +0 -0
  43. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/tests/data/Camp_of_the_Saints.pdf +0 -0
  44. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/tests/data/s43588-025-00854-1.pdf +0 -0
  45. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/tests/test_domain_models.py +0 -0
  46. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/tests/test_filename_service.py +0 -0
  47. {pdf_file_renamer-0.6.0 → pdf_file_renamer-0.6.2}/tests/test_rename_service.py +0 -0
--- /dev/null
+++ pdf_file_renamer-0.6.2/CHANGELOG.md
@@ -0,0 +1,137 @@
+ # Changelog
+
+ All notable changes to this project will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [0.6.2] - 2025-10-14
+
+ ### Added
+ - Demo GIF showing pdf-renamer in action with live TUI
+ - VHS recording infrastructure (demo.tape)
+ - Automated demo creation scripts (create_demo_gif.py, record_demo.sh)
+ - Comprehensive PyPI metadata and classifiers
+ - Table of contents in README for better navigation
+ - Architecture, Development, and Contributing sections in README
+ - Project URLs for homepage, repository, issues, and changelog
+
+ ### Changed
+ - Enhanced README with animated demo, better badges, and emoji icons
+ - Improved PyPI discoverability with keywords and proper categorization
+ - Updated description to highlight DOI-first approach and interactive mode
+
+ ## [0.6.1] - 2025-10-14
+
+ ### Fixed
+ - Fixed DOI extractor incorrectly expecting list instead of dict from pdf2doi
+ - Fixed JSON parsing for CrossRef API metadata instead of incorrect bibtex parsing
+ - Fixed confidence enum handling causing AttributeError in workflow and formatters
+ - Fixed linting errors (SIM105 - use contextlib.suppress)
+ - Fixed mypy type checking errors in author extraction
+ - Fixed code formatting issues
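The 0.6.1 fixes above concern treating the DOI lookup result as a dict and reading CrossRef metadata as JSON rather than BibTeX. As an illustrative sketch only (not the package's code), the snippet below pulls the title, author surnames, and year out of a dict shaped like a CrossRef `works` response (`message.title`, `message.author[].family`, `message.issued["date-parts"]`), and uses `contextlib.suppress` in the style ruff's SIM105 rule recommends. The name `parse_crossref_message` is hypothetical, and the exact structure pdf2doi hands back is not shown in this diff.

```python
import contextlib
from typing import Any


def parse_crossref_message(payload: dict[str, Any]) -> dict[str, Any]:
    """Pull title, author surnames, and year out of a CrossRef-style JSON dict.

    Hypothetical helper; assumes the layout of a CrossRef /works/{doi}
    response, where the metadata sits under the "message" key.
    """
    message = payload.get("message", payload)

    # CrossRef returns "title" as a list of strings.
    titles = message.get("title") or []
    title = titles[0] if titles else None

    # Authors are dicts; organizations may carry "name" instead of "family".
    authors = [a["family"] for a in message.get("author", []) if "family" in a]

    # "issued" looks like {"date-parts": [[2025, 3]]}. A missing or malformed
    # date leaves year as None (contextlib.suppress instead of try/except/pass).
    year = None
    with contextlib.suppress(KeyError, IndexError, TypeError, ValueError):
        year = int(message["issued"]["date-parts"][0][0])

    return {"title": title, "authors": authors, "year": year}
```

Suppressing only the lookup and conversion errors keeps a document with no usable date from aborting the whole rename; the year is simply omitted.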
+
+ ### Changed
+ - Improved DOI metadata extraction from CrossRef JSON structure
+ - Enhanced type safety with explicit type annotations
+
+ ## [0.6.0] - 2025-10-14
+
+ ### Added
+ - DOI-based naming feature using pdf2doi library
+ - Automatic DOI extraction from academic papers
+ - CrossRef API integration for rich metadata (title, authors, year, journal, publisher)
+ - Hybrid naming strategy: DOI-first with LLM fallback
+ - DOI metadata display in interactive prompts
+ - Enhanced status display showing "DOI found" during processing
+
+ ### Changed
+ - Improved filename generation with VERY_HIGH confidence for DOI-based names
+ - Updated workflow to prioritize DOI extraction before LLM analysis
+ - Enhanced reasoning messages to indicate DOI-based naming
+
+ ## [0.5.0] - 2025-10-12
+
+ ### Changed
+ - Reorganized project to src layout structure (src/pdf_file_renamer)
+ - Improved package organization following Python best practices
+ - Updated all imports and references to new structure
+
+ ## [0.4.2] - 2025-10-12
+
+ ### Changed
+ - Renamed package from `pdf_renamer` to `pdf-file-renamer` for PyPI
+ - Updated package name across all configurations
+ - Improved PyPI package metadata
+
+ ## [0.4.1] - 2025-10-12
+
+ ### Added
+ - Initial PyPI publishing workflow
+ - Automated releases via GitHub Actions
+
+ ## [0.4.0] - 2025-10-12
+
+ ### Added
+ - Complete refactoring to Clean Architecture
+ - Comprehensive unit tests with pytest
+ - Type checking with mypy (strict mode)
+ - Code quality with ruff linting and formatting
+ - GitHub Actions CI/CD pipeline
+ - Code coverage reporting with pytest-cov
+ - Domain-driven design with clear separation of concerns
+ - Port and adapter pattern for external dependencies
+
+ ### Changed
+ - Reorganized codebase into domain, application, infrastructure, and presentation layers
+ - Improved testability and maintainability
+ - Enhanced documentation with architecture notes
+
+ ## [0.3.0] - 2025-10-11
+
+ ### Added
+ - Enhanced interactive mode with retry, edit, and skip options
+ - Multi-pass analysis for better accuracy
+ - Focused metadata extraction for improved LLM context
+ - Better error handling and recovery
+
+ ### Changed
+ - Improved LLM prompting strategy
+ - Enhanced user experience with clearer prompts
+ - Better handling of edge cases
+
+ ## [0.2.0] - 2025-10-10
+
+ ### Added
+ - Interactive mode for rename confirmation
+ - Rich terminal UI with tables and colored output
+ - Batch processing with progress tracking
+ - Live status updates during processing
+
+ ### Changed
+ - Simplified CLI by removing subcommand requirement
+ - Improved PDF processing pipeline
+ - Enhanced error messages
+
+ ## [0.1.0] - 2025-10-09
+
+ ### Added
+ - Initial release
+ - PDF text extraction using PyMuPDF
+ - LLM-based filename generation (OpenAI and Ollama support)
+ - Dry-run mode for safe testing
+ - Basic CLI interface
+ - Configuration via environment variables
+ - Confidence scoring for suggestions
+ - Support for custom output directories
+
+ [0.6.2]: https://github.com/nostoslabs/pdf-renamer/compare/v0.6.1...v0.6.2
+ [0.6.1]: https://github.com/nostoslabs/pdf-renamer/compare/v0.6.0...v0.6.1
+ [0.6.0]: https://github.com/nostoslabs/pdf-renamer/compare/v0.5.0...v0.6.0
+ [0.5.0]: https://github.com/nostoslabs/pdf-renamer/compare/v0.4.2...v0.5.0
+ [0.4.2]: https://github.com/nostoslabs/pdf-renamer/compare/v0.4.1...v0.4.2
+ [0.4.1]: https://github.com/nostoslabs/pdf-renamer/compare/v0.4.0...v0.4.1
+ [0.4.0]: https://github.com/nostoslabs/pdf-renamer/compare/v0.3.0...v0.4.0
+ [0.3.0]: https://github.com/nostoslabs/pdf-renamer/compare/v0.2.0...v0.3.0
+ [0.2.0]: https://github.com/nostoslabs/pdf-renamer/compare/v0.1.0...v0.2.0
+ [0.1.0]: https://github.com/nostoslabs/pdf-renamer/releases/tag/v0.1.0
--- /dev/null
+++ pdf_file_renamer-0.6.2/PKG-INFO
@@ -0,0 +1,443 @@
+ Metadata-Version: 2.4
+ Name: pdf-file-renamer
+ Version: 0.6.2
+ Summary: Intelligent PDF renaming using LLMs with DOI-based naming and interactive workflow
+ Project-URL: Homepage, https://github.com/nostoslabs/pdf-renamer
+ Project-URL: Repository, https://github.com/nostoslabs/pdf-renamer
+ Project-URL: Issues, https://github.com/nostoslabs/pdf-renamer/issues
+ Project-URL: Changelog, https://github.com/nostoslabs/pdf-renamer/blob/main/CHANGELOG.md
+ Author-email: Nostos Labs <info@nostoslabs.com>
+ License: MIT
+ License-File: LICENSE
+ Keywords: academic-papers,ai,automation,document-management,doi,file-organization,llm,pdf,rename
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Environment :: Console
+ Classifier: Intended Audience :: Education
+ Classifier: Intended Audience :: End Users/Desktop
+ Classifier: Intended Audience :: Science/Research
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Office/Business :: Office Suites
+ Classifier: Topic :: Scientific/Engineering
+ Classifier: Topic :: Text Processing :: General
+ Classifier: Topic :: Utilities
+ Classifier: Typing :: Typed
+ Requires-Python: >=3.11
+ Requires-Dist: docling-core>=2.0.0
+ Requires-Dist: docling-parse>=2.0.0
+ Requires-Dist: pdf2doi>=1.7
+ Requires-Dist: pydantic-ai>=1.0.17
+ Requires-Dist: pydantic-settings>=2.7.1
+ Requires-Dist: pydantic>=2.10.6
+ Requires-Dist: pymupdf>=1.26.5
+ Requires-Dist: python-dotenv>=1.1.1
+ Requires-Dist: rich>=14.2.0
+ Requires-Dist: tenacity>=9.0.0
+ Requires-Dist: typer>=0.19.2
+ Provides-Extra: dev
+ Requires-Dist: mypy>=1.14.1; extra == 'dev'
+ Requires-Dist: pytest-asyncio>=0.25.2; extra == 'dev'
+ Requires-Dist: pytest-cov>=6.0.0; extra == 'dev'
+ Requires-Dist: pytest-mock>=3.14.0; extra == 'dev'
+ Requires-Dist: pytest>=8.3.4; extra == 'dev'
+ Requires-Dist: ruff>=0.9.1; extra == 'dev'
+ Description-Content-Type: text/markdown
+
+ # PDF Renamer
+
+ [![PyPI version](https://img.shields.io/pypi/v/pdf-file-renamer.svg)](https://pypi.org/project/pdf-file-renamer/)
+ [![PyPI downloads](https://img.shields.io/pypi/dm/pdf-file-renamer.svg)](https://pypi.org/project/pdf-file-renamer/)
+ [![Python](https://img.shields.io/pypi/pyversions/pdf-file-renamer.svg)](https://pypi.org/project/pdf-file-renamer/)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+ [![CI](https://github.com/nostoslabs/pdf-renamer/workflows/CI/badge.svg)](https://github.com/nostoslabs/pdf-renamer/actions)
+ [![Code style: ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
+ [![Type checked: mypy](https://img.shields.io/badge/type%20checked-mypy-blue.svg)](http://mypy-lang.org/)
+
+ **Intelligent PDF file renaming using LLMs and DOI metadata.** Automatically generate clean, descriptive filenames for your PDF library.
+
+ > 🚀 Works with **OpenAI**, **Ollama**, **LM Studio**, and any OpenAI-compatible API
+ > 📚 **DOI-first** approach for academic papers - no API costs!
+ > 🎯 **Interactive mode** with retry, edit, and skip options
+
+ ## Table of Contents
+
+ - [Quick Example](#quick-example)
+ - [Features](#features)
+ - [Installation](#installation)
+ - [Configuration](#configuration)
+ - [Usage](#usage)
+ - [Interactive Mode](#interactive-mode)
+ - [How It Works](#how-it-works)
+ - [Cost Considerations](#cost-considerations)
+ - [Architecture](#architecture)
+ - [Development](#development)
+ - [Contributing](#contributing)
+ - [License](#license)
+
+ ## Quick Example
+
+ ![Demo](demo.gif)
+
+ Transform messy filenames into clean, organized ones:
+
+ ```
+ Before:                    After:
+ 📄 paper_final_v3.pdf    → Leroux-Analog-In-memory-Computing-2025.pdf
+ 📄 download (2).pdf      → Ruiz-Why-Don-Trace-Requirements-2023.pdf
+ 📄 document.pdf          → Raspail-Camp_of_the_Saints.pdf
+ ```
+
+ **Live Progress Display:**
+ ```
+ Processing 3 PDFs with max 3 concurrent API calls and 10 concurrent extractions
+
+ ╭─────────────────────────── 📊 Progress ───────────────────────────╮
+ │ Total: 3 | Pending: 0 | Extracting: 0 | Analyzing: 0 | Complete: 3 │
+ ╰───────────────────────────────────────────────────────────────────╯
+ ╭───────────────────────────────────────────────────────────────────╮
+ │ [██████████████████████████████████████████████] 100.0% │
+ ╰───────────────────────────────────────────────────────────────────╯
+ Processing Status
+ ┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
+ ┃ File               ┃ Stage ┃ Status   ┃ Details             ┃
+ ┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
+ │ paper_final_v3.pdf │ ✓     │ Complete │ very_high           │
+ │ download (2).pdf   │ ✓     │ Complete │ very_high (DOI)     │
+ │ document.pdf       │ ✓     │ Complete │ high                │
+ └────────────────────┴───────┴──────────┴─────────────────────┘
+ ```
+
+ ## Features
+
+ - **🎓 DOI-based naming** - Automatically extracts DOI and fetches authoritative metadata for academic papers
+ - **🧠 Advanced PDF parsing** using docling-parse for better structure-aware extraction
+ - **👁️ OCR fallback** for scanned PDFs with low text content
+ - **🎯 Smart LLM prompting** with multi-pass analysis for improved accuracy
+ - **⚡ Hybrid approach** - Uses DOI metadata when available, falls back to LLM analysis for other documents
+ - **📝 Standardized format** - Generates filenames like `Author-Topic-Year.pdf`
+ - **🔍 Dry-run mode** to preview changes before applying
+ - **💬 Enhanced interactive mode** with options to accept, manually edit, retry, or skip each file
+ - **📊 Live progress display** with concurrent processing for speed
+ - **⚙️ Configurable concurrency** limits for API calls and PDF extraction
+ - **📦 Batch processing** of multiple PDFs with optional output directory
+
+ ## Installation
+
+ ### Quick Start (No Installation Required)
+
+ ```bash
+ # Run directly with uvx
+ uvx pdf-renamer --dry-run /path/to/pdfs
+ ```
+
+ ### Install from PyPI
+
+ ```bash
+ # Using pip
+ pip install pdf-file-renamer
+
+ # Using uv
+ uv pip install pdf-file-renamer
+ ```
+
+ ### Install from Source
+
+ ```bash
+ # Clone and install
+ git clone https://github.com/nostoslabs/pdf-renamer.git
+ cd pdf-renamer
+ uv sync
+ ```
+
+ ## Configuration
+
+ Configure your LLM provider:
+
+ **Option A: OpenAI (Cloud)**
+ ```bash
+ cp .env.example .env
+ # Edit .env and add your OPENAI_API_KEY
+ ```
+
+ **Option B: Ollama or other local models**
+ ```bash
+ # No API key needed for local models
+ # Either set LLM_BASE_URL in .env or use --url flag
+ echo "LLM_BASE_URL=http://patmos:11434/v1" > .env
+ ```
+
+ ## Usage
+
+ ### Quick Start
+
+ ```bash
+ # Preview renames (dry-run mode)
+ pdf-renamer --dry-run /path/to/pdf/directory
+
+ # Actually rename files
+ pdf-renamer --no-dry-run /path/to/pdf/directory
+
+ # Interactive mode - review each file
+ pdf-renamer --interactive --no-dry-run /path/to/pdf/directory
+ ```
+
+ ### Using uvx (No Installation)
+
+ ```bash
+ # Run directly without installing
+ uvx pdf-renamer --dry-run /path/to/pdfs
+
+ # Run from GitHub
+ uvx https://github.com/nostoslabs/pdf-renamer --dry-run /path/to/pdfs
+ ```
+
+ ### Options
+
+ - `--dry-run/--no-dry-run`: Show suggestions without renaming (default: True)
+ - `--interactive, -i`: Interactive mode with rich options:
+   - **Accept** - Use the suggested filename
+   - **Edit** - Manually modify the filename
+   - **Retry** - Ask the LLM to generate a new suggestion
+   - **Skip** - Skip this file and move to the next
+ - `--model`: Model to use (default: llama3.2, works with any OpenAI-compatible API)
+ - `--url`: Custom base URL for OpenAI-compatible APIs (default: http://localhost:11434/v1)
+ - `--pattern`: Glob pattern for files (default: *.pdf)
+ - `--output-dir, -o`: Move renamed files to a different directory
+ - `--max-concurrent-api`: Maximum concurrent API calls (default: 3)
+ - `--max-concurrent-pdf`: Maximum concurrent PDF extractions (default: 10)
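The two concurrency options cap different stages: LLM API calls (default 3) and PDF extractions (default 10). A common way to enforce caps like these is one `asyncio.Semaphore` per stage; the sketch below assumes that design (the package's actual mechanism is not shown here), and `extract_text` and `ask_llm` are hypothetical stand-ins that only sleep.

```python
import asyncio


async def process_all(
    paths: list[str], max_concurrent_api: int = 3, max_concurrent_pdf: int = 10
) -> tuple[list[str], int]:
    """Process every path under two independent concurrency caps.

    Returns the per-file results (in input order) and the peak number of
    simultaneous "API calls" observed, to show the cap holding.
    """
    pdf_slots = asyncio.Semaphore(max_concurrent_pdf)
    api_slots = asyncio.Semaphore(max_concurrent_api)
    active_api = 0
    peak_api = 0

    async def extract_text(path: str) -> str:  # hypothetical PDF extraction stand-in
        await asyncio.sleep(0.01)
        return f"text of {path}"

    async def ask_llm(text: str) -> str:  # hypothetical LLM request stand-in
        nonlocal active_api, peak_api
        active_api += 1
        peak_api = max(peak_api, active_api)
        await asyncio.sleep(0.01)
        active_api -= 1
        return text.upper()

    async def handle(path: str) -> str:
        async with pdf_slots:  # at most max_concurrent_pdf extractions at once
            text = await extract_text(path)
        async with api_slots:  # at most max_concurrent_api LLM calls at once
            return await ask_llm(text)

    # gather() returns results in the same order as the input paths.
    results = await asyncio.gather(*(handle(p) for p in paths))
    return list(results), peak_api


if __name__ == "__main__":
    results, peak = asyncio.run(process_all([f"doc{i}.pdf" for i in range(8)]))
    print(results[0], "peak API concurrency:", peak)
```

Releasing the extraction slot before acquiring the API slot means slow LLM calls never block cheap local extraction work.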
+
+ ### Examples
+
+ **Using OpenAI:**
+ ```bash
+ # Preview all PDFs in current directory
+ uvx pdf-renamer --dry-run .
+
+ # Rename PDFs in specific directory
+ uvx pdf-renamer --no-dry-run ~/Documents/Papers
+
+ # Use a different OpenAI model
+ uvx pdf-renamer --model gpt-4o --dry-run .
+ ```
+
+ **Using Ollama (or other local models):**
+ ```bash
+ # Using Ollama on patmos server with gemma model
+ uvx pdf-renamer --url http://patmos:11434/v1 --model gemma3:latest --dry-run .
+
+ # Using local Ollama with qwen model
+ uvx pdf-renamer --url http://localhost:11434/v1 --model qwen2.5 --dry-run .
+
+ # Set URL in environment and just use model flag
+ export LLM_BASE_URL=http://patmos:11434/v1
+ uvx pdf-renamer --model gemma3:latest --dry-run .
+ ```
+
+ **Other examples:**
+ ```bash
+ # Process only specific files
+ uvx pdf-renamer --pattern "*2020*.pdf" --dry-run .
+
+ # Interactive mode with local model
+ uvx pdf-renamer --url http://patmos:11434/v1 --model gemma3:latest --interactive --no-dry-run .
+
+ # Run directly from GitHub
+ uvx https://github.com/nostoslabs/pdf-renamer --no-dry-run ~/Documents/Papers
+ ```
+
+ ## Interactive Mode
+
+ When using `--interactive` mode, you'll be presented with each file one at a time with detailed options:
+
+ ```
+ ================================================================================
+ Original: 2024-research-paper.pdf
+ Suggested: Smith-Machine-Learning-Applications-2024.pdf
+ Confidence: high
+ Reasoning: Clear author and topic identified from abstract
+ ================================================================================
+
+ Options:
+   y / yes / Enter  - Accept suggested name
+   e / edit         - Manually edit the filename
+   r / retry        - Ask LLM to generate a new suggestion
+   n / no / skip    - Skip this file
+
+ What would you like to do? [y]:
+ ```
+
+ This mode is perfect for:
+ - **Reviewing suggestions** before applying them
+ - **Fine-tuning filenames** that are close but not quite right
+ - **Retrying** when the LLM suggestion isn't good enough
+ - **Building confidence** in the tool before batch processing
+
+ You can use interactive mode with `--dry-run` to preview without actually renaming files, or with `--no-dry-run` to apply changes immediately after confirmation.
+
+ ## How It Works
+
+ ### Intelligent Hybrid Approach
+
+ The tool uses a multi-strategy approach to generate accurate filenames:
+
+ 1. **DOI Detection** (for academic papers)
+    - Searches PDF for DOI identifiers using [pdf2doi](https://github.com/MicheleCotrufo/pdf2doi)
+    - If found, queries authoritative metadata (title, authors, year, journal)
+    - Generates filename with **very high confidence** from validated metadata
+    - **Saves API costs** - no LLM call needed for papers with DOIs
+
+ 2. **LLM Analysis** (fallback for non-academic PDFs)
+    - **Extract**: Uses docling-parse to read first 5 pages with structure-aware parsing, falls back to PyMuPDF if needed
+    - **OCR**: Automatically applies OCR for scanned PDFs with minimal text
+    - **Metadata Enhancement**: Extracts focused hints (years, emails, author sections) to supplement unreliable PDF metadata
+    - **Analyze**: Sends full content excerpt to LLM with enhanced metadata and detailed extraction instructions
+    - **Multi-pass Review**: Low-confidence results trigger a second analysis pass with focused prompts
+    - **Suggest**: LLM returns filename in `Author-Topic-Year` format with confidence level and reasoning
+
+ 3. **Interactive Review** (optional): User can accept, edit, retry, or skip each suggestion
+ 4. **Rename**: Applies suggestions (if not in dry-run mode)
+
+ ### Benefits of DOI Integration
+
+ - **Accuracy**: DOI metadata is canonical and verified
+ - **Speed**: Instant lookup vs. LLM processing time
+ - **Cost**: Free DOI lookups save on API costs for academic papers
+ - **Reliability**: Works even when PDF text extraction is poor
+
+ ## Cost Considerations
+
+ **DOI-based Naming (Academic Papers):**
+ - **Completely free** - No API costs
+ - **No LLM needed** - Direct metadata lookup
+ - Works for most academic papers with embedded DOIs
+
+ **OpenAI (Fallback):**
+ - Uses `gpt-4o-mini` by default (very cost-effective)
+ - Only called when DOI not found
+ - Processes first ~4500 characters per PDF
+ - Typical cost: ~$0.001-0.003 per PDF
+
+ **Ollama/Local Models:**
+ - Completely free (runs on your hardware)
+ - Works with any Ollama model (llama3, qwen2.5, mistral, etc.)
+ - Also compatible with LM Studio, vLLM, and other OpenAI-compatible endpoints
+
+ ## Filename Format
+
+ The tool generates filenames in this format:
+ - `Smith-Kalman-Filtering-Applications-2020.pdf`
+ - `Adamy-Electronic-Warfare-Modeling-Techniques.pdf`
+ - `Blair-Monopulse-Processing-Unresolved-Targets.pdf`
+
+ Guidelines:
+ - First author's last name
+ - 3-6 word topic description (prioritizes clarity over brevity)
+ - Year (if identifiable)
+ - Hyphens between words
+ - Target ~80 characters (can be longer if needed for clarity)
+
+ ## Architecture
+
+ This project follows **Clean Architecture** principles with clear separation of concerns:
+
+ ```
+ src/pdf_file_renamer/
+ ├── domain/           # Core business logic (models, ports)
+ ├── application/      # Use cases and workflows
+ ├── infrastructure/   # External integrations (PDF, LLM, DOI)
+ └── presentation/     # CLI and UI components
+ ```
+
+ **Key Design Patterns:**
+ - **Ports and Adapters** - Clean interfaces for external dependencies
+ - **Dependency Injection** - Flexible component composition
+ - **Single Responsibility** - Each module has one clear purpose
+ - **Type Safety** - Full mypy strict mode compliance
+
+ See [REFACTORING_SUMMARY.md](REFACTORING_SUMMARY.md) for detailed architecture notes.
+
+ ## Development
+
+ ### Setup
+
+ ```bash
+ # Clone repository
+ git clone https://github.com/nostoslabs/pdf-renamer.git
+ cd pdf-renamer
+
+ # Install dependencies with uv
+ uv sync
+
+ # Run tests
+ uv run pytest
+
+ # Run linting
+ uv run ruff check src/ tests/
+
+ # Run type checking
+ uv run mypy src/
+ ```
+
+ ### Code Quality
+
+ - **Tests**: pytest with async support and coverage reporting
+ - **Linting**: ruff for fast, comprehensive linting
+ - **Formatting**: ruff format for consistent code style
+ - **Type Checking**: mypy in strict mode
+ - **CI/CD**: GitHub Actions for automated testing and releases
+
+ ### Running Locally
+
+ ```bash
+ # Run with local changes
+ uv run pdf-file-renamer --dry-run /path/to/pdfs
+
+ # Run specific module
+ uv run python -m pdf_file_renamer.main --help
+ ```
+
+ ## Contributing
+
+ Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
+
+ ### Development Workflow
+
+ 1. Fork the repository
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
+ 3. Make your changes
+ 4. Run tests and linting (`uv run pytest && uv run ruff check src/`)
+ 5. Commit your changes (`git commit -m 'Add amazing feature'`)
+ 6. Push to the branch (`git push origin feature/amazing-feature`)
+ 7. Open a Pull Request
+
+ ### Code Style
+
+ - Follow PEP 8 (enforced by ruff)
+ - Use type hints for all functions
+ - Write tests for new features
+ - Update documentation as needed
+
+ ## License
+
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
+ ## Acknowledgments
+
+ - [pdf2doi](https://github.com/MicheleCotrufo/pdf2doi) for DOI extraction
+ - [pydantic-ai](https://ai.pydantic.dev/) for LLM integration
+ - [docling-parse](https://github.com/DS4SD/docling-parse) for advanced PDF parsing
+ - [PyMuPDF](https://pymupdf.readthedocs.io/) for PDF text extraction
+ - [rich](https://rich.readthedocs.io/) for beautiful terminal UI
+
+ ## Support
+
+ - **Issues**: [GitHub Issues](https://github.com/nostoslabs/pdf-renamer/issues)
+ - **Discussions**: [GitHub Discussions](https://github.com/nostoslabs/pdf-renamer/discussions)
+ - **Changelog**: [CHANGELOG.md](CHANGELOG.md)
+
+ ---
+
+ **Made with ❤️ by [Nostos Labs](https://github.com/nostoslabs)**