pdf-file-renamer 0.4.2__tar.gz

Files changed (34)
  1. pdf_file_renamer-0.4.2/LICENSE +21 -0
  2. pdf_file_renamer-0.4.2/PKG-INFO +245 -0
  3. pdf_file_renamer-0.4.2/README.md +219 -0
  4. pdf_file_renamer-0.4.2/pdf_file_renamer.egg-info/PKG-INFO +245 -0
  5. pdf_file_renamer-0.4.2/pdf_file_renamer.egg-info/SOURCES.txt +32 -0
  6. pdf_file_renamer-0.4.2/pdf_file_renamer.egg-info/dependency_links.txt +1 -0
  7. pdf_file_renamer-0.4.2/pdf_file_renamer.egg-info/entry_points.txt +2 -0
  8. pdf_file_renamer-0.4.2/pdf_file_renamer.egg-info/requires.txt +18 -0
  9. pdf_file_renamer-0.4.2/pdf_file_renamer.egg-info/top_level.txt +1 -0
  10. pdf_file_renamer-0.4.2/pdf_renamer/__init__.py +3 -0
  11. pdf_file_renamer-0.4.2/pdf_renamer/application/__init__.py +7 -0
  12. pdf_file_renamer-0.4.2/pdf_renamer/application/filename_service.py +70 -0
  13. pdf_file_renamer-0.4.2/pdf_renamer/application/pdf_rename_workflow.py +144 -0
  14. pdf_file_renamer-0.4.2/pdf_renamer/application/rename_service.py +79 -0
  15. pdf_file_renamer-0.4.2/pdf_renamer/domain/__init__.py +25 -0
  16. pdf_file_renamer-0.4.2/pdf_renamer/domain/models.py +80 -0
  17. pdf_file_renamer-0.4.2/pdf_renamer/domain/ports.py +106 -0
  18. pdf_file_renamer-0.4.2/pdf_renamer/infrastructure/__init__.py +5 -0
  19. pdf_file_renamer-0.4.2/pdf_renamer/infrastructure/config.py +94 -0
  20. pdf_file_renamer-0.4.2/pdf_renamer/infrastructure/llm/__init__.py +5 -0
  21. pdf_file_renamer-0.4.2/pdf_renamer/infrastructure/llm/pydantic_ai_provider.py +234 -0
  22. pdf_file_renamer-0.4.2/pdf_renamer/infrastructure/pdf/__init__.py +7 -0
  23. pdf_file_renamer-0.4.2/pdf_renamer/infrastructure/pdf/composite.py +57 -0
  24. pdf_file_renamer-0.4.2/pdf_renamer/infrastructure/pdf/docling_extractor.py +116 -0
  25. pdf_file_renamer-0.4.2/pdf_renamer/infrastructure/pdf/pymupdf_extractor.py +165 -0
  26. pdf_file_renamer-0.4.2/pdf_renamer/main.py +6 -0
  27. pdf_file_renamer-0.4.2/pdf_renamer/presentation/__init__.py +6 -0
  28. pdf_file_renamer-0.4.2/pdf_renamer/presentation/cli.py +233 -0
  29. pdf_file_renamer-0.4.2/pdf_renamer/presentation/formatters.py +216 -0
  30. pdf_file_renamer-0.4.2/pyproject.toml +127 -0
  31. pdf_file_renamer-0.4.2/setup.cfg +4 -0
  32. pdf_file_renamer-0.4.2/tests/test_domain_models.py +111 -0
  33. pdf_file_renamer-0.4.2/tests/test_filename_service.py +108 -0
  34. pdf_file_renamer-0.4.2/tests/test_rename_service.py +97 -0
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Nostos Labs
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,245 @@
1
+ Metadata-Version: 2.4
2
+ Name: pdf-file-renamer
3
+ Version: 0.4.2
4
+ Summary: Intelligent PDF renaming using LLMs
5
+ Requires-Python: >=3.11
6
+ Description-Content-Type: text/markdown
7
+ License-File: LICENSE
8
+ Requires-Dist: pydantic>=2.10.6
9
+ Requires-Dist: pydantic-ai>=1.0.17
10
+ Requires-Dist: pydantic-settings>=2.7.1
11
+ Requires-Dist: pymupdf>=1.26.5
12
+ Requires-Dist: docling-parse>=2.0.0
13
+ Requires-Dist: docling-core>=2.0.0
14
+ Requires-Dist: python-dotenv>=1.1.1
15
+ Requires-Dist: rich>=14.2.0
16
+ Requires-Dist: typer>=0.19.2
17
+ Requires-Dist: tenacity>=9.0.0
18
+ Provides-Extra: dev
19
+ Requires-Dist: pytest>=8.3.4; extra == "dev"
20
+ Requires-Dist: pytest-cov>=6.0.0; extra == "dev"
21
+ Requires-Dist: pytest-asyncio>=0.25.2; extra == "dev"
22
+ Requires-Dist: pytest-mock>=3.14.0; extra == "dev"
23
+ Requires-Dist: ruff>=0.9.1; extra == "dev"
24
+ Requires-Dist: mypy>=1.14.1; extra == "dev"
25
+ Dynamic: license-file
26
+
27
+ # PDF Renamer
28
+
29
+ [![PyPI version](https://img.shields.io/pypi/v/pdf-file-renamer.svg)](https://pypi.org/project/pdf-file-renamer/)
30
+ [![Python](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
31
+ [![uv](https://img.shields.io/badge/uv-0.5+-orange.svg)](https://docs.astral.sh/uv/)
32
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
33
+ [![pydantic-ai](https://img.shields.io/badge/pydantic--ai-1.0+-green.svg)](https://ai.pydantic.dev/)
34
+ [![GitHub](https://img.shields.io/badge/github-nostoslabs%2Fpdf--renamer-blue?logo=github)](https://github.com/nostoslabs/pdf-renamer)
35
+
36
+ [![Tests](https://img.shields.io/badge/tests-passing-brightgreen.svg)](https://github.com/nostoslabs/pdf-renamer)
37
+ [![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)
38
+ [![Type checked: mypy](https://img.shields.io/badge/type%20checked-mypy-blue.svg)](http://mypy-lang.org/)
39
+ [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
40
+
41
+ Intelligent PDF file renaming using LLMs. This tool analyzes PDF content and metadata to suggest descriptive, standardized filenames.
42
+
43
+ > 🚀 Works with **OpenAI**, **Ollama**, **LM Studio**, and any OpenAI-compatible API
44
+
45
+ ## Features
46
+
47
+ - **Advanced PDF parsing** using docling-parse for better structure-aware extraction
48
+ - **OCR fallback** for scanned PDFs with low text content
49
+ - **Smart LLM prompting** with multi-pass analysis for improved accuracy
50
+ - Suggests filenames in format: `Author-Topic-Year.pdf`
51
+ - Dry-run mode to preview changes before applying
52
+ - **Enhanced interactive mode** with options to accept, manually edit, retry, or skip each file
53
+ - **Live progress display** with concurrent processing for speed
54
+ - **Configurable concurrency** limits for API calls and PDF extraction
55
+ - Batch processing of multiple PDFs with optional output directory
56
+
57
+ ## Installation
58
+
59
+ ### Quick Start (No Installation Required)
60
+
61
+ ```bash
62
+ # Run directly with uvx
63
+ uvx pdf-renamer --dry-run /path/to/pdfs
64
+ ```
65
+
66
+ ### Install from PyPI
67
+
68
+ ```bash
69
+ # Using pip
70
+ pip install pdf-file-renamer
71
+
72
+ # Using uv
73
+ uv pip install pdf-file-renamer
74
+ ```
75
+
76
+ ### Install from Source
77
+
78
+ ```bash
79
+ # Clone and install
80
+ git clone https://github.com/nostoslabs/pdf-renamer.git
81
+ cd pdf-renamer
82
+ uv sync
83
+ ```
84
+
85
+ ## Configuration
86
+
87
+ Configure your LLM provider:
88
+
89
+ **Option A: OpenAI (Cloud)**
90
+ ```bash
91
+ cp .env.example .env
92
+ # Edit .env and add your OPENAI_API_KEY
93
+ ```
94
+
95
+ **Option B: Ollama or other local models**
96
+ ```bash
97
+ # No API key needed for local models
98
+ # Either set LLM_BASE_URL in .env or use --url flag
99
+ echo "LLM_BASE_URL=http://patmos:11434/v1" > .env
100
+ ```
101
+
102
+ ## Usage
103
+
104
+ ### Quick Start
105
+
106
+ ```bash
107
+ # Preview renames (dry-run mode)
108
+ pdf-renamer --dry-run /path/to/pdf/directory
109
+
110
+ # Actually rename files
111
+ pdf-renamer --no-dry-run /path/to/pdf/directory
112
+
113
+ # Interactive mode - review each file
114
+ pdf-renamer --interactive --no-dry-run /path/to/pdf/directory
115
+ ```
116
+
117
+ ### Using uvx (No Installation)
118
+
119
+ ```bash
120
+ # Run directly without installing
121
+ uvx pdf-renamer --dry-run /path/to/pdfs
122
+
123
+ # Run from GitHub
124
+ uvx https://github.com/nostoslabs/pdf-renamer --dry-run /path/to/pdfs
125
+ ```
126
+
127
+ ### Options
128
+
129
+ - `--dry-run/--no-dry-run`: Show suggestions without renaming (default: True)
130
+ - `--interactive, -i`: Interactive mode with rich options:
131
+ - **Accept** - Use the suggested filename
132
+ - **Edit** - Manually modify the filename
133
+ - **Retry** - Ask the LLM to generate a new suggestion
134
+ - **Skip** - Skip this file and move to the next
135
+ - `--model`: Model to use (default: llama3.2, works with any OpenAI-compatible API)
136
+ - `--url`: Custom base URL for OpenAI-compatible APIs (default: http://localhost:11434/v1)
137
+ - `--pattern`: Glob pattern for files (default: *.pdf)
138
+ - `--output-dir, -o`: Move renamed files to a different directory
139
+ - `--max-concurrent-api`: Maximum concurrent API calls (default: 3)
140
+ - `--max-concurrent-pdf`: Maximum concurrent PDF extractions (default: 10)
141
+
142
+ ### Examples
143
+
144
+ **Using OpenAI:**
145
+ ```bash
146
+ # Preview all PDFs in current directory
147
+ uvx pdf-renamer --dry-run .
148
+
149
+ # Rename PDFs in specific directory
150
+ uvx pdf-renamer --no-dry-run ~/Documents/Papers
151
+
152
+ # Use a different OpenAI model
153
+ uvx pdf-renamer --model gpt-4o --dry-run .
154
+ ```
155
+
156
+ **Using Ollama (or other local models):**
157
+ ```bash
158
+ # Using Ollama on patmos server with gemma model
159
+ uvx pdf-renamer --url http://patmos:11434/v1 --model gemma3:latest --dry-run .
160
+
161
+ # Using local Ollama with qwen model
162
+ uvx pdf-renamer --url http://localhost:11434/v1 --model qwen2.5 --dry-run .
163
+
164
+ # Set URL in environment and just use model flag
165
+ export LLM_BASE_URL=http://patmos:11434/v1
166
+ uvx pdf-renamer --model gemma3:latest --dry-run .
167
+ ```
168
+
169
+ **Other examples:**
170
+ ```bash
171
+ # Process only specific files
172
+ uvx pdf-renamer --pattern "*2020*.pdf" --dry-run .
173
+
174
+ # Interactive mode with local model
175
+ uvx pdf-renamer --url http://patmos:11434/v1 --model gemma3:latest --interactive --no-dry-run .
176
+
177
+ # Run directly from GitHub
178
+ uvx https://github.com/nostoslabs/pdf-renamer --no-dry-run ~/Documents/Papers
179
+ ```
180
+
181
+ ## Interactive Mode
182
+
183
+ When using `--interactive` mode, you'll be presented with each file one at a time with detailed options:
184
+
185
+ ```
186
+ ================================================================================
187
+ Original: 2024-research-paper.pdf
188
+ Suggested: Smith-Machine-Learning-Applications-2024.pdf
189
+ Confidence: high
190
+ Reasoning: Clear author and topic identified from abstract
191
+ ================================================================================
192
+
193
+ Options:
194
+ y / yes / Enter - Accept suggested name
195
+ e / edit - Manually edit the filename
196
+ r / retry - Ask LLM to generate a new suggestion
197
+ n / no / skip - Skip this file
198
+
199
+ What would you like to do? [y]:
200
+ ```
201
+
202
+ This mode is perfect for:
203
+ - **Reviewing suggestions** before applying them
204
+ - **Fine-tuning filenames** that are close but not quite right
205
+ - **Retrying** when the LLM suggestion isn't good enough
206
+ - **Building confidence** in the tool before batch processing
207
+
208
+ You can use interactive mode with `--dry-run` to preview without actually renaming files, or with `--no-dry-run` to apply changes immediately after confirmation.
209
+
210
+ ## How It Works
211
+
212
+ 1. **Extract**: Uses docling-parse to read first 5 pages with structure-aware parsing, falls back to PyMuPDF if needed
213
+ 2. **OCR**: Automatically applies OCR for scanned PDFs with minimal text
214
+ 3. **Metadata Enhancement**: Extracts focused hints (years, emails, author sections) to supplement unreliable PDF metadata
215
+ 4. **Analyze**: Sends full content excerpt to LLM with enhanced metadata and detailed extraction instructions
216
+ 5. **Multi-pass Review**: Low-confidence results trigger a second analysis pass with focused prompts
217
+ 6. **Suggest**: LLM returns filename in `Author-Topic-Year` format with confidence level and reasoning
218
+ 7. **Interactive Review** (optional): User can accept, edit, retry, or skip each suggestion
219
+ 8. **Rename**: Applies suggestions (if not in dry-run mode)
220
+
221
+ ## Cost Considerations
222
+
223
+ **OpenAI:**
224
+ - Uses `gpt-4o-mini` by default (very cost-effective)
225
+ - Processes first ~4500 characters per PDF
226
+ - Typical cost: ~$0.001-0.003 per PDF
227
+
228
+ **Ollama/Local Models:**
229
+ - Completely free (runs on your hardware)
230
+ - Works with any Ollama model (llama3, qwen2.5, mistral, etc.)
231
+ - Also compatible with LM Studio, vLLM, and other OpenAI-compatible endpoints
232
+
233
+ ## Filename Format
234
+
235
+ The tool generates filenames in this format:
236
+ - `Smith-Kalman-Filtering-Applications-2020.pdf`
237
+ - `Adamy-Electronic-Warfare-Modeling-Techniques.pdf`
238
+ - `Blair-Monopulse-Processing-Unresolved-Targets.pdf`
239
+
240
+ Guidelines:
241
+ - First author's last name
242
+ - 3-6 word topic description (prioritizes clarity over brevity)
243
+ - Year (if identifiable)
244
+ - Hyphens between words
245
+ - Target ~80 characters (can be longer if needed for clarity)
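The header block at the top of this PKG-INFO file uses the RFC 822-style "Key: value" layout of Python core metadata, so it can be read with the standard library's `email.parser`. The sketch below is illustrative only: `split_requirements` and the abbreviated `PKG_INFO` string are hypothetical names introduced here, not part of the package. It separates the runtime `Requires-Dist` entries from those gated behind the `dev` extra.

```python
from email.parser import HeaderParser

# Abbreviated copy of the PKG-INFO header shown in the diff above.
PKG_INFO = """\
Metadata-Version: 2.4
Name: pdf-file-renamer
Version: 0.4.2
Requires-Python: >=3.11
Requires-Dist: pydantic>=2.10.6
Requires-Dist: pymupdf>=1.26.5
Provides-Extra: dev
Requires-Dist: pytest>=8.3.4; extra == "dev"
Requires-Dist: ruff>=0.9.1; extra == "dev"
"""


def split_requirements(pkg_info_text):
    """Return (name, version, runtime_requirements, requirements_by_extra).

    Hypothetical helper: it only understands the simple `extra == "name"`
    marker form that appears in this file, not general environment markers.
    """
    headers = HeaderParser().parsestr(pkg_info_text)
    runtime, extras = [], {}
    for requirement in headers.get_all("Requires-Dist", []):
        spec, _, marker = requirement.partition(";")
        marker = marker.strip()
        if marker.startswith("extra =="):
            # 'extra == "dev"' -> 'dev'
            extra_name = marker.split("==", 1)[1].strip().strip("\"'")
            extras.setdefault(extra_name, []).append(spec.strip())
        else:
            runtime.append(spec.strip())
    return headers["Name"], headers["Version"], runtime, extras


if __name__ == "__main__":
    name, version, runtime, extras = split_requirements(PKG_INFO)
    print(name, version)
    print("runtime:", runtime)
    print("extras:", extras)
```

For anything beyond a quick inspection, a dedicated requirement parser is the safer choice, since real markers can combine several conditions.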
@@ -0,0 +1,219 @@
1
+ # PDF Renamer
2
+
3
+ [![PyPI version](https://img.shields.io/pypi/v/pdf-file-renamer.svg)](https://pypi.org/project/pdf-file-renamer/)
4
+ [![Python](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
5
+ [![uv](https://img.shields.io/badge/uv-0.5+-orange.svg)](https://docs.astral.sh/uv/)
6
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
7
+ [![pydantic-ai](https://img.shields.io/badge/pydantic--ai-1.0+-green.svg)](https://ai.pydantic.dev/)
8
+ [![GitHub](https://img.shields.io/badge/github-nostoslabs%2Fpdf--renamer-blue?logo=github)](https://github.com/nostoslabs/pdf-renamer)
9
+
10
+ [![Tests](https://img.shields.io/badge/tests-passing-brightgreen.svg)](https://github.com/nostoslabs/pdf-renamer)
11
+ [![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)
12
+ [![Type checked: mypy](https://img.shields.io/badge/type%20checked-mypy-blue.svg)](http://mypy-lang.org/)
13
+ [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
14
+
15
+ Intelligent PDF file renaming using LLMs. This tool analyzes PDF content and metadata to suggest descriptive, standardized filenames.
16
+
17
+ > 🚀 Works with **OpenAI**, **Ollama**, **LM Studio**, and any OpenAI-compatible API
18
+
19
+ ## Features
20
+
21
+ - **Advanced PDF parsing** using docling-parse for better structure-aware extraction
22
+ - **OCR fallback** for scanned PDFs with low text content
23
+ - **Smart LLM prompting** with multi-pass analysis for improved accuracy
24
+ - Suggests filenames in format: `Author-Topic-Year.pdf`
25
+ - Dry-run mode to preview changes before applying
26
+ - **Enhanced interactive mode** with options to accept, manually edit, retry, or skip each file
27
+ - **Live progress display** with concurrent processing for speed
28
+ - **Configurable concurrency** limits for API calls and PDF extraction
29
+ - Batch processing of multiple PDFs with optional output directory
30
+
31
+ ## Installation
32
+
33
+ ### Quick Start (No Installation Required)
34
+
35
+ ```bash
36
+ # Run directly with uvx
37
+ uvx pdf-renamer --dry-run /path/to/pdfs
38
+ ```
39
+
40
+ ### Install from PyPI
41
+
42
+ ```bash
43
+ # Using pip
44
+ pip install pdf-file-renamer
45
+
46
+ # Using uv
47
+ uv pip install pdf-file-renamer
48
+ ```
49
+
50
+ ### Install from Source
51
+
52
+ ```bash
53
+ # Clone and install
54
+ git clone https://github.com/nostoslabs/pdf-renamer.git
55
+ cd pdf-renamer
56
+ uv sync
57
+ ```
58
+
59
+ ## Configuration
60
+
61
+ Configure your LLM provider:
62
+
63
+ **Option A: OpenAI (Cloud)**
64
+ ```bash
65
+ cp .env.example .env
66
+ # Edit .env and add your OPENAI_API_KEY
67
+ ```
68
+
69
+ **Option B: Ollama or other local models**
70
+ ```bash
71
+ # No API key needed for local models
72
+ # Either set LLM_BASE_URL in .env or use --url flag
73
+ echo "LLM_BASE_URL=http://patmos:11434/v1" > .env
74
+ ```
75
+
76
+ ## Usage
77
+
78
+ ### Quick Start
79
+
80
+ ```bash
81
+ # Preview renames (dry-run mode)
82
+ pdf-renamer --dry-run /path/to/pdf/directory
83
+
84
+ # Actually rename files
85
+ pdf-renamer --no-dry-run /path/to/pdf/directory
86
+
87
+ # Interactive mode - review each file
88
+ pdf-renamer --interactive --no-dry-run /path/to/pdf/directory
89
+ ```
90
+
91
+ ### Using uvx (No Installation)
92
+
93
+ ```bash
94
+ # Run directly without installing
95
+ uvx pdf-renamer --dry-run /path/to/pdfs
96
+
97
+ # Run from GitHub
98
+ uvx https://github.com/nostoslabs/pdf-renamer --dry-run /path/to/pdfs
99
+ ```
100
+
101
+ ### Options
102
+
103
+ - `--dry-run/--no-dry-run`: Show suggestions without renaming (default: True)
104
+ - `--interactive, -i`: Interactive mode with rich options:
105
+ - **Accept** - Use the suggested filename
106
+ - **Edit** - Manually modify the filename
107
+ - **Retry** - Ask the LLM to generate a new suggestion
108
+ - **Skip** - Skip this file and move to the next
109
+ - `--model`: Model to use (default: llama3.2, works with any OpenAI-compatible API)
110
+ - `--url`: Custom base URL for OpenAI-compatible APIs (default: http://localhost:11434/v1)
111
+ - `--pattern`: Glob pattern for files (default: *.pdf)
112
+ - `--output-dir, -o`: Move renamed files to a different directory
113
+ - `--max-concurrent-api`: Maximum concurrent API calls (default: 3)
114
+ - `--max-concurrent-pdf`: Maximum concurrent PDF extractions (default: 10)
115
+
116
+ ### Examples
117
+
118
+ **Using OpenAI:**
119
+ ```bash
120
+ # Preview all PDFs in current directory
121
+ uvx pdf-renamer --dry-run .
122
+
123
+ # Rename PDFs in specific directory
124
+ uvx pdf-renamer --no-dry-run ~/Documents/Papers
125
+
126
+ # Use a different OpenAI model
127
+ uvx pdf-renamer --model gpt-4o --dry-run .
128
+ ```
129
+
130
+ **Using Ollama (or other local models):**
131
+ ```bash
132
+ # Using Ollama on patmos server with gemma model
133
+ uvx pdf-renamer --url http://patmos:11434/v1 --model gemma3:latest --dry-run .
134
+
135
+ # Using local Ollama with qwen model
136
+ uvx pdf-renamer --url http://localhost:11434/v1 --model qwen2.5 --dry-run .
137
+
138
+ # Set URL in environment and just use model flag
139
+ export LLM_BASE_URL=http://patmos:11434/v1
140
+ uvx pdf-renamer --model gemma3:latest --dry-run .
141
+ ```
142
+
143
+ **Other examples:**
144
+ ```bash
145
+ # Process only specific files
146
+ uvx pdf-renamer --pattern "*2020*.pdf" --dry-run .
147
+
148
+ # Interactive mode with local model
149
+ uvx pdf-renamer --url http://patmos:11434/v1 --model gemma3:latest --interactive --no-dry-run .
150
+
151
+ # Run directly from GitHub
152
+ uvx https://github.com/nostoslabs/pdf-renamer --no-dry-run ~/Documents/Papers
153
+ ```
154
+
155
+ ## Interactive Mode
156
+
157
+ When using `--interactive` mode, you'll be presented with each file one at a time with detailed options:
158
+
159
+ ```
160
+ ================================================================================
161
+ Original: 2024-research-paper.pdf
162
+ Suggested: Smith-Machine-Learning-Applications-2024.pdf
163
+ Confidence: high
164
+ Reasoning: Clear author and topic identified from abstract
165
+ ================================================================================
166
+
167
+ Options:
168
+ y / yes / Enter - Accept suggested name
169
+ e / edit - Manually edit the filename
170
+ r / retry - Ask LLM to generate a new suggestion
171
+ n / no / skip - Skip this file
172
+
173
+ What would you like to do? [y]:
174
+ ```
175
+
176
+ This mode is perfect for:
177
+ - **Reviewing suggestions** before applying them
178
+ - **Fine-tuning filenames** that are close but not quite right
179
+ - **Retrying** when the LLM suggestion isn't good enough
180
+ - **Building confidence** in the tool before batch processing
181
+
182
+ You can use interactive mode with `--dry-run` to preview without actually renaming files, or with `--no-dry-run` to apply changes immediately after confirmation.
183
+
184
+ ## How It Works
185
+
186
+ 1. **Extract**: Uses docling-parse to read first 5 pages with structure-aware parsing, falls back to PyMuPDF if needed
187
+ 2. **OCR**: Automatically applies OCR for scanned PDFs with minimal text
188
+ 3. **Metadata Enhancement**: Extracts focused hints (years, emails, author sections) to supplement unreliable PDF metadata
189
+ 4. **Analyze**: Sends full content excerpt to LLM with enhanced metadata and detailed extraction instructions
190
+ 5. **Multi-pass Review**: Low-confidence results trigger a second analysis pass with focused prompts
191
+ 6. **Suggest**: LLM returns filename in `Author-Topic-Year` format with confidence level and reasoning
192
+ 7. **Interactive Review** (optional): User can accept, edit, retry, or skip each suggestion
193
+ 8. **Rename**: Applies suggestions (if not in dry-run mode)
194
+
195
+ ## Cost Considerations
196
+
197
+ **OpenAI:**
198
+ - Uses `gpt-4o-mini` by default (very cost-effective)
199
+ - Processes first ~4500 characters per PDF
200
+ - Typical cost: ~$0.001-0.003 per PDF
201
+
202
+ **Ollama/Local Models:**
203
+ - Completely free (runs on your hardware)
204
+ - Works with any Ollama model (llama3, qwen2.5, mistral, etc.)
205
+ - Also compatible with LM Studio, vLLM, and other OpenAI-compatible endpoints
206
+
207
+ ## Filename Format
208
+
209
+ The tool generates filenames in this format:
210
+ - `Smith-Kalman-Filtering-Applications-2020.pdf`
211
+ - `Adamy-Electronic-Warfare-Modeling-Techniques.pdf`
212
+ - `Blair-Monopulse-Processing-Unresolved-Targets.pdf`
213
+
214
+ Guidelines:
215
+ - First author's last name
216
+ - 3-6 word topic description (prioritizes clarity over brevity)
217
+ - Year (if identifiable)
218
+ - Hyphens between words
219
+ - Target ~80 characters (can be longer if needed for clarity)
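The filename guidelines that close the README can be sketched as a small helper. This is a minimal illustration under stated assumptions, not the package's actual `filename_service` code: `build_filename` is a hypothetical name, the last name is assumed to be the final word of the author string, and the topic is simply cut at six words. The README's soft ~80-character target is not enforced here.

```python
import re


def build_filename(author, topic, year=None):
    """Join the parts as Author-Topic-Year.pdf with hyphens between words.

    Hypothetical sketch of the README guidelines; the real tool relies on
    an LLM to choose the author and topic words.
    """

    def words(text):
        # Letters and digits form words; anything else acts as a separator.
        return re.findall(r"[^\W_]+", text)

    parts = words(author)[-1:]   # assumption: last word is the last name
    parts += words(topic)[:6]    # README suggests a 3-6 word topic
    if year is not None:         # the year is optional ("if identifiable")
        parts.append(str(year))
    return "-".join(parts) + ".pdf"


if __name__ == "__main__":
    print(build_filename("John Smith", "Kalman Filtering Applications", 2020))
    print(build_filename("Adamy", "Electronic Warfare Modeling Techniques"))
```

The two calls reproduce the shape of the first two example filenames listed in the README, with and without a year.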