pdf-file-renamer 0.6.1__tar.gz → 0.6.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (48)
  1. pdf_file_renamer-0.6.3/CHANGELOG.md +156 -0
  2. pdf_file_renamer-0.6.3/PKG-INFO +444 -0
  3. pdf_file_renamer-0.6.3/README.md +396 -0
  4. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/coverage.xml +81 -41
  5. pdf_file_renamer-0.6.3/demo.gif +0 -0
  6. pdf_file_renamer-0.6.3/demo.tape +25 -0
  7. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/pyproject.toml +40 -2
  8. pdf_file_renamer-0.6.3/scripts/create_demo_gif.py +114 -0
  9. pdf_file_renamer-0.6.3/scripts/record_demo.sh +62 -0
  10. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/__init__.py +1 -1
  11. pdf_file_renamer-0.6.3/src/pdf_file_renamer/infrastructure/doi/pdf2doi_extractor.py +296 -0
  12. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/presentation/formatters.py +1 -3
  13. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/tests/data/2025-dennis-managing-complexity.pdf +0 -0
  14. pdf_file_renamer-0.6.1/PKG-INFO +0 -272
  15. pdf_file_renamer-0.6.1/README.md +0 -246
  16. pdf_file_renamer-0.6.1/src/pdf_file_renamer/infrastructure/doi/pdf2doi_extractor.py +0 -163
  17. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/.env.example +0 -0
  18. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/.github/workflows/ci.yml +0 -0
  19. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/.github/workflows/release.yml +0 -0
  20. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/.gitignore +0 -0
  21. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/.python-version +0 -0
  22. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/LICENSE +0 -0
  23. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/REFACTORING_SUMMARY.md +0 -0
  24. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/application/__init__.py +0 -0
  25. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/application/filename_service.py +0 -0
  26. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/application/pdf_rename_workflow.py +0 -0
  27. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/application/rename_service.py +0 -0
  28. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/domain/__init__.py +0 -0
  29. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/domain/models.py +0 -0
  30. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/domain/ports.py +0 -0
  31. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/infrastructure/__init__.py +0 -0
  32. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/infrastructure/config.py +0 -0
  33. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/infrastructure/doi/__init__.py +0 -0
  34. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/infrastructure/llm/__init__.py +0 -0
  35. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/infrastructure/llm/pydantic_ai_provider.py +0 -0
  36. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/infrastructure/pdf/__init__.py +0 -0
  37. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/infrastructure/pdf/composite.py +0 -0
  38. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/infrastructure/pdf/docling_extractor.py +0 -0
  39. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/infrastructure/pdf/pymupdf_extractor.py +0 -0
  40. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/main.py +0 -0
  41. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/presentation/__init__.py +0 -0
  42. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/src/pdf_file_renamer/presentation/cli.py +0 -0
  43. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/tests/__init__.py +0 -0
  44. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/tests/data/Camp_of_the_Saints.pdf +0 -0
  45. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/tests/data/s43588-025-00854-1.pdf +0 -0
  46. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/tests/test_domain_models.py +0 -0
  47. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/tests/test_filename_service.py +0 -0
  48. {pdf_file_renamer-0.6.1 → pdf_file_renamer-0.6.3}/tests/test_rename_service.py +0 -0
@@ -0,0 +1,156 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [0.6.3] - 2025-10-14
9
+
10
+ ### Fixed
11
+ - Fixed critical bug where pdf2doi extracted DOIs from citations instead of the paper's own DOI
12
+ - Added DOI validation to verify metadata matches PDF content before accepting DOI
13
+ - Prevents incorrect naming when papers don't have their own DOI but cite other papers
14
+
15
+ ### Added
16
+ - DOI metadata validation against PDF first page content
17
+ - Title similarity checking using SequenceMatcher
18
+ - Configurable validation thresholds for DOI matching
19
+ - Fallback to LLM-based naming when DOI validation fails
20
+
21
+ ### Changed
22
+ - DOI extraction now validates that extracted metadata matches the actual PDF content
23
+ - Improved accuracy by rejecting citation DOIs that don't match the paper's title
24
+ - DOI validation checks title area (first ~300 characters) instead of full document
25
+
26
+ ## [0.6.2] - 2025-10-14
27
+
28
+ ### Added
29
+ - Demo GIF showing pdf-renamer in action with live TUI
30
+ - VHS recording infrastructure (demo.tape)
31
+ - Automated demo creation scripts (create_demo_gif.py, record_demo.sh)
32
+ - Comprehensive PyPI metadata and classifiers
33
+ - Table of contents in README for better navigation
34
+ - Architecture, Development, and Contributing sections in README
35
+ - Project URLs for homepage, repository, issues, and changelog
36
+
37
+ ### Changed
38
+ - Enhanced README with animated demo, better badges, and emoji icons
39
+ - Improved PyPI discoverability with keywords and proper categorization
40
+ - Updated description to highlight DOI-first approach and interactive mode
41
+
42
+ ## [0.6.1] - 2025-10-14
43
+
44
+ ### Fixed
45
+ - Fixed DOI extractor incorrectly expecting list instead of dict from pdf2doi
46
+ - Fixed JSON parsing for CrossRef API metadata instead of incorrect bibtex parsing
47
+ - Fixed confidence enum handling causing AttributeError in workflow and formatters
48
+ - Fixed linting errors (SIM105 - use contextlib.suppress)
49
+ - Fixed mypy type checking errors in author extraction
50
+ - Fixed code formatting issues
51
+
52
+ ### Changed
53
+ - Improved DOI metadata extraction from CrossRef JSON structure
54
+ - Enhanced type safety with explicit type annotations
55
+
56
+ ## [0.6.0] - 2025-10-14
57
+
58
+ ### Added
59
+ - DOI-based naming feature using pdf2doi library
60
+ - Automatic DOI extraction from academic papers
61
+ - CrossRef API integration for rich metadata (title, authors, year, journal, publisher)
62
+ - Hybrid naming strategy: DOI-first with LLM fallback
63
+ - DOI metadata display in interactive prompts
64
+ - Enhanced status display showing "DOI found" during processing
65
+
66
+ ### Changed
67
+ - Improved filename generation with VERY_HIGH confidence for DOI-based names
68
+ - Updated workflow to prioritize DOI extraction before LLM analysis
69
+ - Enhanced reasoning messages to indicate DOI-based naming
70
+
71
+ ## [0.5.0] - 2025-10-12
72
+
73
+ ### Changed
74
+ - Reorganized project to src layout structure (src/pdf_file_renamer)
75
+ - Improved package organization following Python best practices
76
+ - Updated all imports and references to new structure
77
+
78
+ ## [0.4.2] - 2025-10-12
79
+
80
+ ### Changed
81
+ - Renamed package from `pdf_renamer` to `pdf-file-renamer` for PyPI
82
+ - Updated package name across all configurations
83
+ - Improved PyPI package metadata
84
+
85
+ ## [0.4.1] - 2025-10-12
86
+
87
+ ### Added
88
+ - Initial PyPI publishing workflow
89
+ - Automated releases via GitHub Actions
90
+
91
+ ## [0.4.0] - 2025-10-12
92
+
93
+ ### Added
94
+ - Complete refactoring to Clean Architecture
95
+ - Comprehensive unit tests with pytest
96
+ - Type checking with mypy (strict mode)
97
+ - Code quality with ruff linting and formatting
98
+ - GitHub Actions CI/CD pipeline
99
+ - Code coverage reporting with pytest-cov
100
+ - Domain-driven design with clear separation of concerns
101
+ - Port and adapter pattern for external dependencies
102
+
103
+ ### Changed
104
+ - Reorganized codebase into domain, application, infrastructure, and presentation layers
105
+ - Improved testability and maintainability
106
+ - Enhanced documentation with architecture notes
107
+
108
+ ## [0.3.0] - 2025-10-11
109
+
110
+ ### Added
111
+ - Enhanced interactive mode with retry, edit, and skip options
112
+ - Multi-pass analysis for better accuracy
113
+ - Focused metadata extraction for improved LLM context
114
+ - Better error handling and recovery
115
+
116
+ ### Changed
117
+ - Improved LLM prompting strategy
118
+ - Enhanced user experience with clearer prompts
119
+ - Better handling of edge cases
120
+
121
+ ## [0.2.0] - 2025-10-10
122
+
123
+ ### Added
124
+ - Interactive mode for rename confirmation
125
+ - Rich terminal UI with tables and colored output
126
+ - Batch processing with progress tracking
127
+ - Live status updates during processing
128
+
129
+ ### Changed
130
+ - Simplified CLI by removing subcommand requirement
131
+ - Improved PDF processing pipeline
132
+ - Enhanced error messages
133
+
134
+ ## [0.1.0] - 2025-10-09
135
+
136
+ ### Added
137
+ - Initial release
138
+ - PDF text extraction using PyMuPDF
139
+ - LLM-based filename generation (OpenAI and Ollama support)
140
+ - Dry-run mode for safe testing
141
+ - Basic CLI interface
142
+ - Configuration via environment variables
143
+ - Confidence scoring for suggestions
144
+ - Support for custom output directories
145
+
146
+ [0.6.3]: https://github.com/nostoslabs/pdf-renamer/compare/v0.6.2...v0.6.3
147
+ [0.6.2]: https://github.com/nostoslabs/pdf-renamer/compare/v0.6.1...v0.6.2
148
+ [0.6.1]: https://github.com/nostoslabs/pdf-renamer/compare/v0.6.0...v0.6.1
149
+ [0.6.0]: https://github.com/nostoslabs/pdf-renamer/compare/v0.5.0...v0.6.0
150
+ [0.5.0]: https://github.com/nostoslabs/pdf-renamer/compare/v0.4.2...v0.5.0
151
+ [0.4.2]: https://github.com/nostoslabs/pdf-renamer/compare/v0.4.1...v0.4.2
152
+ [0.4.1]: https://github.com/nostoslabs/pdf-renamer/compare/v0.4.0...v0.4.1
153
+ [0.4.0]: https://github.com/nostoslabs/pdf-renamer/compare/v0.3.0...v0.4.0
154
+ [0.3.0]: https://github.com/nostoslabs/pdf-renamer/compare/v0.2.0...v0.3.0
155
+ [0.2.0]: https://github.com/nostoslabs/pdf-renamer/compare/v0.1.0...v0.2.0
156
+ [0.1.0]: https://github.com/nostoslabs/pdf-renamer/releases/tag/v0.1.0
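Note on the 0.6.3 entries above: the changelog describes validating a DOI by comparing the title from its metadata against the title area (first ~300 characters) of the PDF's first page, using SequenceMatcher with a configurable threshold. The changed extractor source is not shown in this part of the diff, so the following is only a minimal sketch of that idea. The function name `title_matches_first_page`, the `threshold` default of 0.8, and the sliding-window comparison are assumptions for illustration, not the package's actual code.

```python
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not affect matching."""
    return " ".join(text.lower().split())


def title_matches_first_page(
    doi_title: str,
    first_page_text: str,
    threshold: float = 0.8,       # assumed default, not taken from the package
    title_area_chars: int = 300,  # "first ~300 characters" per the changelog
) -> bool:
    """Return True if the DOI's title plausibly appears in the PDF's title area."""
    title = _normalize(doi_title)
    area = _normalize(first_page_text[:title_area_chars])
    if not title or not area:
        return False
    if title in area:
        return True
    # Slide a title-sized window over the title area and keep the best ratio.
    window = len(title)
    best = 0.0
    for start in range(max(1, len(area) - window + 1)):
        candidate = area[start:start + window]
        best = max(best, SequenceMatcher(None, title, candidate).ratio())
    return best >= threshold
```

Under these assumptions, a DOI picked up from the reference list would be rejected because its title does not resemble the text in the title area, and the workflow would then fall back to LLM-based naming as the changelog describes.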
@@ -0,0 +1,444 @@
1
+ Metadata-Version: 2.4
2
+ Name: pdf-file-renamer
3
+ Version: 0.6.3
4
+ Summary: Intelligent PDF renaming using LLMs with DOI-based naming and interactive workflow
5
+ Project-URL: Homepage, https://github.com/nostoslabs/pdf-renamer
6
+ Project-URL: Repository, https://github.com/nostoslabs/pdf-renamer
7
+ Project-URL: Issues, https://github.com/nostoslabs/pdf-renamer/issues
8
+ Project-URL: Changelog, https://github.com/nostoslabs/pdf-renamer/blob/main/CHANGELOG.md
9
+ Author-email: Nostos Labs <info@nostoslabs.com>
10
+ License: MIT
11
+ License-File: LICENSE
12
+ Keywords: academic-papers,ai,automation,document-management,doi,file-organization,llm,pdf,rename
13
+ Classifier: Development Status :: 4 - Beta
14
+ Classifier: Environment :: Console
15
+ Classifier: Intended Audience :: Education
16
+ Classifier: Intended Audience :: End Users/Desktop
17
+ Classifier: Intended Audience :: Science/Research
18
+ Classifier: License :: OSI Approved :: MIT License
19
+ Classifier: Operating System :: OS Independent
20
+ Classifier: Programming Language :: Python :: 3
21
+ Classifier: Programming Language :: Python :: 3.11
22
+ Classifier: Programming Language :: Python :: 3.12
23
+ Classifier: Topic :: Office/Business :: Office Suites
24
+ Classifier: Topic :: Scientific/Engineering
25
+ Classifier: Topic :: Text Processing :: General
26
+ Classifier: Topic :: Utilities
27
+ Classifier: Typing :: Typed
28
+ Requires-Python: >=3.11
29
+ Requires-Dist: docling-core>=2.0.0
30
+ Requires-Dist: docling-parse>=2.0.0
31
+ Requires-Dist: pdf2doi>=1.7
32
+ Requires-Dist: pydantic-ai>=1.0.17
33
+ Requires-Dist: pydantic-settings>=2.7.1
34
+ Requires-Dist: pydantic>=2.10.6
35
+ Requires-Dist: pymupdf>=1.26.5
36
+ Requires-Dist: python-dotenv>=1.1.1
37
+ Requires-Dist: rich>=14.2.0
38
+ Requires-Dist: tenacity>=9.0.0
39
+ Requires-Dist: typer>=0.19.2
40
+ Provides-Extra: dev
41
+ Requires-Dist: mypy>=1.14.1; extra == 'dev'
42
+ Requires-Dist: pytest-asyncio>=0.25.2; extra == 'dev'
43
+ Requires-Dist: pytest-cov>=6.0.0; extra == 'dev'
44
+ Requires-Dist: pytest-mock>=3.14.0; extra == 'dev'
45
+ Requires-Dist: pytest>=8.3.4; extra == 'dev'
46
+ Requires-Dist: ruff>=0.9.1; extra == 'dev'
47
+ Description-Content-Type: text/markdown
48
+
49
+ # PDF Renamer
50
+
51
+ [![PyPI version](https://img.shields.io/pypi/v/pdf-file-renamer.svg)](https://pypi.org/project/pdf-file-renamer/)
52
+ [![PyPI downloads](https://img.shields.io/pypi/dm/pdf-file-renamer.svg)](https://pypi.org/project/pdf-file-renamer/)
53
+ [![Python](https://img.shields.io/pypi/pyversions/pdf-file-renamer.svg)](https://pypi.org/project/pdf-file-renamer/)
54
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
55
+ [![CI](https://github.com/nostoslabs/pdf-renamer/workflows/CI/badge.svg)](https://github.com/nostoslabs/pdf-renamer/actions)
56
+ [![Code style: ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
57
+ [![Type checked: mypy](https://img.shields.io/badge/type%20checked-mypy-blue.svg)](http://mypy-lang.org/)
58
+
59
+ **Intelligent PDF file renaming using LLMs and DOI metadata.** Automatically generate clean, descriptive filenames for your PDF library.
60
+
61
+ > 🚀 Works with **OpenAI**, **Ollama**, **LM Studio**, and any OpenAI-compatible API
62
+ > 📚 **DOI-first** approach for academic papers - no API costs!
63
+ > 🎯 **Interactive mode** with retry, edit, and skip options
64
+
65
+ ## Table of Contents
66
+
67
+ - [Quick Example](#quick-example)
68
+ - [Features](#features)
69
+ - [Installation](#installation)
70
+ - [Configuration](#configuration)
71
+ - [Usage](#usage)
72
+ - [Interactive Mode](#interactive-mode)
73
+ - [How It Works](#how-it-works)
74
+ - [Cost Considerations](#cost-considerations)
75
+ - [Architecture](#architecture)
76
+ - [Development](#development)
77
+ - [Contributing](#contributing)
78
+ - [License](#license)
79
+
80
+ ## Quick Example
81
+
82
+ ![Demo](demo.gif)
83
+
84
+ Transform messy filenames into clean, organized ones:
85
+
86
+ ```
87
+ Before: After:
88
+ 📄 paper_final_v3.pdf → Leroux-Analog-In-memory-Computing-2025.pdf
89
+ 📄 download (2).pdf → Ruiz-Why-Don-Trace-Requirements-2023.pdf
90
+ 📄 document.pdf → Raspail-Camp_of_the_Saints.pdf
91
+ ```
92
+
93
+ **Live Progress Display:**
94
+ ```
95
+ Processing 3 PDFs with max 3 concurrent API calls and 10 concurrent extractions
96
+
97
+ ╭─────────────────────────── 📊 Progress ───────────────────────────╮
98
+ │ Total: 3 | Pending: 0 | Extracting: 0 | Analyzing: 0 | Complete: 3 │
99
+ ╰───────────────────────────────────────────────────────────────────╯
100
+ ╭───────────────────────────────────────────────────────────────────╮
101
+ │ [██████████████████████████████████████████████] 100.0% │
102
+ ╰───────────────────────────────────────────────────────────────────╯
103
+ Processing Status
104
+ ┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
105
+ ┃ File ┃ Stage ┃ Status ┃ Details ┃
106
+ ┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
107
+ │ paper_final_v3.pdf │ ✓ │ Complete │ very_high │
108
+ │ download (2).pdf │ ✓ │ Complete │ very_high (DOI) │
109
+ │ document.pdf │ ✓ │ Complete │ high │
110
+ └────────────────────┴───────┴──────────┴─────────────────────┘
111
+ ```
112
+
113
+ ## Features
114
+
115
+ - **🎓 DOI-based naming** - Automatically extracts DOI and fetches authoritative metadata for academic papers
116
+ - **🧠 Advanced PDF parsing** using docling-parse for better structure-aware extraction
117
+ - **👁️ OCR fallback** for scanned PDFs with low text content
118
+ - **🎯 Smart LLM prompting** with multi-pass analysis for improved accuracy
119
+ - **⚡ Hybrid approach** - Uses DOI metadata when available, falls back to LLM analysis for other documents
120
+ - **📝 Standardized format** - Generates filenames like `Author-Topic-Year.pdf`
121
+ - **🔍 Dry-run mode** to preview changes before applying
122
+ - **💬 Enhanced interactive mode** with options to accept, manually edit, retry, or skip each file
123
+ - **📊 Live progress display** with concurrent processing for speed
124
+ - **⚙️ Configurable concurrency** limits for API calls and PDF extraction
125
+ - **📦 Batch processing** of multiple PDFs with optional output directory
126
+
127
+ ## Installation
128
+
129
+ ### Quick Start (No Installation Required)
130
+
131
+ ```bash
132
+ # Run directly with uvx
133
+ uvx pdf-renamer --dry-run /path/to/pdfs
134
+ ```
135
+
136
+ ### Install from PyPI
137
+
138
+ ```bash
139
+ # Using pip
140
+ pip install pdf-file-renamer
141
+
142
+ # Using uv
143
+ uv pip install pdf-file-renamer
144
+ ```
145
+
146
+ ### Install from Source
147
+
148
+ ```bash
149
+ # Clone and install
150
+ git clone https://github.com/nostoslabs/pdf-renamer.git
151
+ cd pdf-renamer
152
+ uv sync
153
+ ```
154
+
155
+ ## Configuration
156
+
157
+ Configure your LLM provider:
158
+
159
+ **Option A: OpenAI (Cloud)**
160
+ ```bash
161
+ cp .env.example .env
162
+ # Edit .env and add your OPENAI_API_KEY
163
+ ```
164
+
165
+ **Option B: Ollama or other local models**
166
+ ```bash
167
+ # No API key needed for local models
168
+ # Either set LLM_BASE_URL in .env or use --url flag
169
+ echo "LLM_BASE_URL=http://patmos:11434/v1" > .env
170
+ ```
171
+
172
+ ## Usage
173
+
174
+ ### Quick Start
175
+
176
+ ```bash
177
+ # Preview renames (dry-run mode)
178
+ pdf-renamer --dry-run /path/to/pdf/directory
179
+
180
+ # Actually rename files
181
+ pdf-renamer --no-dry-run /path/to/pdf/directory
182
+
183
+ # Interactive mode - review each file
184
+ pdf-renamer --interactive --no-dry-run /path/to/pdf/directory
185
+ ```
186
+
187
+ ### Using uvx (No Installation)
188
+
189
+ ```bash
190
+ # Run directly without installing
191
+ uvx pdf-renamer --dry-run /path/to/pdfs
192
+
193
+ # Run from GitHub
194
+ uvx https://github.com/nostoslabs/pdf-renamer --dry-run /path/to/pdfs
195
+ ```
196
+
197
+ ### Options
198
+
199
+ - `--dry-run/--no-dry-run`: Show suggestions without renaming (default: True)
200
+ - `--interactive, -i`: Interactive mode with rich options:
201
+ - **Accept** - Use the suggested filename
202
+ - **Edit** - Manually modify the filename
203
+ - **Retry** - Ask the LLM to generate a new suggestion
204
+ - **Skip** - Skip this file and move to the next
205
+ - `--model`: Model to use (default: llama3.2, works with any OpenAI-compatible API)
206
+ - `--url`: Custom base URL for OpenAI-compatible APIs (default: http://localhost:11434/v1)
207
+ - `--pattern`: Glob pattern for files (default: *.pdf)
208
+ - `--output-dir, -o`: Move renamed files to a different directory
209
+ - `--max-concurrent-api`: Maximum concurrent API calls (default: 3)
210
+ - `--max-concurrent-pdf`: Maximum concurrent PDF extractions (default: 10)
211
+
212
+ ### Examples
213
+
214
+ **Using OpenAI:**
215
+ ```bash
216
+ # Preview all PDFs in current directory
217
+ uvx pdf-renamer --dry-run .
218
+
219
+ # Rename PDFs in specific directory
220
+ uvx pdf-renamer --no-dry-run ~/Documents/Papers
221
+
222
+ # Use a different OpenAI model
223
+ uvx pdf-renamer --model gpt-4o --dry-run .
224
+ ```
225
+
226
+ **Using Ollama (or other local models):**
227
+ ```bash
228
+ # Using Ollama on patmos server with gemma model
229
+ uvx pdf-renamer --url http://patmos:11434/v1 --model gemma3:latest --dry-run .
230
+
231
+ # Using local Ollama with qwen model
232
+ uvx pdf-renamer --url http://localhost:11434/v1 --model qwen2.5 --dry-run .
233
+
234
+ # Set URL in environment and just use model flag
235
+ export LLM_BASE_URL=http://patmos:11434/v1
236
+ uvx pdf-renamer --model gemma3:latest --dry-run .
237
+ ```
238
+
239
+ **Other examples:**
240
+ ```bash
241
+ # Process only specific files
242
+ uvx pdf-renamer --pattern "*2020*.pdf" --dry-run .
243
+
244
+ # Interactive mode with local model
245
+ uvx pdf-renamer --url http://patmos:11434/v1 --model gemma3:latest --interactive --no-dry-run .
246
+
247
+ # Run directly from GitHub
248
+ uvx https://github.com/nostoslabs/pdf-renamer --no-dry-run ~/Documents/Papers
249
+ ```
250
+
251
+ ## Interactive Mode
252
+
253
+ When using `--interactive` mode, you'll be presented with each file one at a time with detailed options:
254
+
255
+ ```
256
+ ================================================================================
257
+ Original: 2024-research-paper.pdf
258
+ Suggested: Smith-Machine-Learning-Applications-2024.pdf
259
+ Confidence: high
260
+ Reasoning: Clear author and topic identified from abstract
261
+ ================================================================================
262
+
263
+ Options:
264
+ y / yes / Enter - Accept suggested name
265
+ e / edit - Manually edit the filename
266
+ r / retry - Ask LLM to generate a new suggestion
267
+ n / no / skip - Skip this file
268
+
269
+ What would you like to do? [y]:
270
+ ```
271
+
272
+ This mode is perfect for:
273
+ - **Reviewing suggestions** before applying them
274
+ - **Fine-tuning filenames** that are close but not quite right
275
+ - **Retrying** when the LLM suggestion isn't good enough
276
+ - **Building confidence** in the tool before batch processing
277
+
278
+ You can use interactive mode with `--dry-run` to preview without actually renaming files, or with `--no-dry-run` to apply changes immediately after confirmation.
279
+
280
+ ## How It Works
281
+
282
+ ### Intelligent Hybrid Approach
283
+
284
+ The tool uses a multi-strategy approach to generate accurate filenames:
285
+
286
+ 1. **DOI Detection** (for academic papers)
287
+ - Searches PDF for DOI identifiers using [pdf2doi](https://github.com/MicheleCotrufo/pdf2doi)
288
+ - **Validates DOI metadata** against PDF content to prevent citation DOI mismatches
289
+ - If found and validated, queries authoritative metadata (title, authors, year, journal)
290
+ - Generates filename with **very high confidence** from validated metadata
291
+ - **Saves API costs** - no LLM call needed for papers with DOIs
292
+
293
+ 2. **LLM Analysis** (fallback for non-academic PDFs)
294
+ - **Extract**: Uses docling-parse to read first 5 pages with structure-aware parsing, falls back to PyMuPDF if needed
295
+ - **OCR**: Automatically applies OCR for scanned PDFs with minimal text
296
+ - **Metadata Enhancement**: Extracts focused hints (years, emails, author sections) to supplement unreliable PDF metadata
297
+ - **Analyze**: Sends full content excerpt to LLM with enhanced metadata and detailed extraction instructions
298
+ - **Multi-pass Review**: Low-confidence results trigger a second analysis pass with focused prompts
299
+ - **Suggest**: LLM returns filename in `Author-Topic-Year` format with confidence level and reasoning
300
+
301
+ 3. **Interactive Review** (optional): User can accept, edit, retry, or skip each suggestion
302
+ 4. **Rename**: Applies suggestions (if not in dry-run mode)
303
+
304
+ ### Benefits of DOI Integration
305
+
306
+ - **Accuracy**: DOI metadata is canonical and verified
307
+ - **Speed**: Instant lookup vs. LLM processing time
308
+ - **Cost**: Free DOI lookups save on API costs for academic papers
309
+ - **Reliability**: Works even when PDF text extraction is poor
310
+
311
+ ## Cost Considerations
312
+
313
+ **DOI-based Naming (Academic Papers):**
314
+ - **Completely free** - No API costs
315
+ - **No LLM needed** - Direct metadata lookup
316
+ - Works for most academic papers with embedded DOIs
317
+
318
+ **OpenAI (Fallback):**
319
+ - Uses `gpt-4o-mini` by default (very cost-effective)
320
+ - Only called when DOI not found
321
+ - Processes first ~4500 characters per PDF
322
+ - Typical cost: ~$0.001-0.003 per PDF
323
+
324
+ **Ollama/Local Models:**
325
+ - Completely free (runs on your hardware)
326
+ - Works with any Ollama model (llama3, qwen2.5, mistral, etc.)
327
+ - Also compatible with LM Studio, vLLM, and other OpenAI-compatible endpoints
328
+
329
+ ## Filename Format
330
+
331
+ The tool generates filenames in this format:
332
+ - `Smith-Kalman-Filtering-Applications-2020.pdf`
333
+ - `Adamy-Electronic-Warfare-Modeling-Techniques.pdf`
334
+ - `Blair-Monopulse-Processing-Unresolved-Targets.pdf`
335
+
336
+ Guidelines:
337
+ - First author's last name
338
+ - 3-6 word topic description (prioritizes clarity over brevity)
339
+ - Year (if identifiable)
340
+ - Hyphens between words
341
+ - Target ~80 characters (can be longer if needed for clarity)
342
+
343
+ ## Architecture
344
+
345
+ This project follows **Clean Architecture** principles with clear separation of concerns:
346
+
347
+ ```
348
+ src/pdf_file_renamer/
349
+ ├── domain/ # Core business logic (models, ports)
350
+ ├── application/ # Use cases and workflows
351
+ ├── infrastructure/ # External integrations (PDF, LLM, DOI)
352
+ └── presentation/ # CLI and UI components
353
+ ```
354
+
355
+ **Key Design Patterns:**
356
+ - **Ports and Adapters** - Clean interfaces for external dependencies
357
+ - **Dependency Injection** - Flexible component composition
358
+ - **Single Responsibility** - Each module has one clear purpose
359
+ - **Type Safety** - Full mypy strict mode compliance
360
+
361
+ See [REFACTORING_SUMMARY.md](REFACTORING_SUMMARY.md) for detailed architecture notes.
362
+
363
+ ## Development
364
+
365
+ ### Setup
366
+
367
+ ```bash
368
+ # Clone repository
369
+ git clone https://github.com/nostoslabs/pdf-renamer.git
370
+ cd pdf-renamer
371
+
372
+ # Install dependencies with uv
373
+ uv sync
374
+
375
+ # Run tests
376
+ uv run pytest
377
+
378
+ # Run linting
379
+ uv run ruff check src/ tests/
380
+
381
+ # Run type checking
382
+ uv run mypy src/
383
+ ```
384
+
385
+ ### Code Quality
386
+
387
+ - **Tests**: pytest with async support and coverage reporting
388
+ - **Linting**: ruff for fast, comprehensive linting
389
+ - **Formatting**: ruff format for consistent code style
390
+ - **Type Checking**: mypy in strict mode
391
+ - **CI/CD**: GitHub Actions for automated testing and releases
392
+
393
+ ### Running Locally
394
+
395
+ ```bash
396
+ # Run with local changes
397
+ uv run pdf-file-renamer --dry-run /path/to/pdfs
398
+
399
+ # Run specific module
400
+ uv run python -m pdf_file_renamer.main --help
401
+ ```
402
+
403
+ ## Contributing
404
+
405
+ Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
406
+
407
+ ### Development Workflow
408
+
409
+ 1. Fork the repository
410
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
411
+ 3. Make your changes
412
+ 4. Run tests and linting (`uv run pytest && uv run ruff check src/`)
413
+ 5. Commit your changes (`git commit -m 'Add amazing feature'`)
414
+ 6. Push to the branch (`git push origin feature/amazing-feature`)
415
+ 7. Open a Pull Request
416
+
417
+ ### Code Style
418
+
419
+ - Follow PEP 8 (enforced by ruff)
420
+ - Use type hints for all functions
421
+ - Write tests for new features
422
+ - Update documentation as needed
423
+
424
+ ## License
425
+
426
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
427
+
428
+ ## Acknowledgments
429
+
430
+ - [pdf2doi](https://github.com/MicheleCotrufo/pdf2doi) for DOI extraction
431
+ - [pydantic-ai](https://ai.pydantic.dev/) for LLM integration
432
+ - [docling-parse](https://github.com/DS4SD/docling-parse) for advanced PDF parsing
433
+ - [PyMuPDF](https://pymupdf.readthedocs.io/) for PDF text extraction
434
+ - [rich](https://rich.readthedocs.io/) for beautiful terminal UI
435
+
436
+ ## Support
437
+
438
+ - **Issues**: [GitHub Issues](https://github.com/nostoslabs/pdf-renamer/issues)
439
+ - **Discussions**: [GitHub Discussions](https://github.com/nostoslabs/pdf-renamer/discussions)
440
+ - **Changelog**: [CHANGELOG.md](CHANGELOG.md)
441
+
442
+ ---
443
+
444
+ **Made with ❤️ by [Nostos Labs](https://github.com/nostoslabs)**
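Note on the "Filename Format" section of the README above: the guidelines (first author's last name, a 3-6 word topic, an optional year, hyphens between words) can be illustrated with a short sketch. The helper `build_filename` and its parameters are hypothetical and are not part of the package's API; the real tool derives these fields from DOI metadata or from the LLM.

```python
import re
from typing import Optional


def build_filename(
    author_last_name: str,
    topic: str,
    year: Optional[int] = None,
    max_topic_words: int = 6,  # upper bound from the "3-6 word topic" guideline
) -> str:
    """Join author, topic words, and optional year with hyphens (hypothetical helper)."""
    # Keep only alphanumeric runs so punctuation and spaces never reach the name.
    author = "".join(re.findall(r"[A-Za-z0-9]+", author_last_name))
    topic_words = re.findall(r"[A-Za-z0-9]+", topic)[:max_topic_words]
    parts = [author, *topic_words]
    if year is not None:
        parts.append(str(year))
    return "-".join(p for p in parts if p) + ".pdf"


print(build_filename("Smith", "Kalman Filtering Applications", 2020))
print(build_filename("Adamy", "Electronic Warfare Modeling Techniques"))
```

The two calls reproduce the first two example names listed in the README section. Splitting the topic on non-alphanumeric characters is a design choice of this sketch; it treats a hyphenated word such as "In-memory" as two topic words.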