PyPI - pdf-file-renamer - Versions diffs - 0.4.2__tar.gz → 0.6.0__tar.gz - Mend

pdf-file-renamer 0.4.2tar.gz → 0.6.0tar.gz


Files changed (52)

pdf_file_renamer-0.6.0/.env.example ADDED

@@ -0,0 +1,9 @@
+# OpenAI API Key (required for OpenAI, optional for custom endpoints)
+OPENAI_API_KEY=your_api_key_here
+# Optional: Custom base URL for OpenAI-compatible APIs
+# Examples:
+# - Ollama: http://patmos:11434/v1
+# - LM Studio: http://localhost:1234/v1
+# - vLLM: http://your-server:8000/v1
+# LLM_BASE_URL=http://patmos:11434/v1
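
As a rough illustration of how the two variables in `.env.example` might be consumed, the sketch below reads them from an environment mapping using only the standard library. The helper name `load_llm_settings` and the returned dictionary shape are assumptions made for this example; they are not taken from the package, which may load its configuration differently.

```python
import os
from typing import Mapping, Optional


def load_llm_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read OPENAI_API_KEY and LLM_BASE_URL (hypothetical helper, not the package's code)."""
    env = os.environ if env is None else env
    # Treat unset and empty values the same way: fall back to the provider default.
    base_url = env.get("LLM_BASE_URL") or None
    api_key = env.get("OPENAI_API_KEY") or None
    # Per the comments in .env.example, the key is required for OpenAI itself
    # but optional when a custom OpenAI-compatible endpoint is configured.
    if base_url is None and api_key is None:
        raise ValueError("OPENAI_API_KEY is required when LLM_BASE_URL is not set")
    return {"api_key": api_key, "base_url": base_url}
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise without touching the real environment.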

pdf_file_renamer-0.6.0/.github/workflows/ci.yml ADDED

@@ -0,0 +1,78 @@
+name: CI
+on:
+  push:
+    branches: [main, develop]
+  pull_request:
+    branches: [main, develop]
+jobs:
+  test:
+    name: Test Python ${{ matrix.python-version }}
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.11", "3.12"]
+    steps:
+      - uses: actions/checkout@v4
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
+        with:
+          version: "latest"
+      - name: Set up Python ${{ matrix.python-version }}
+        run: uv python install ${{ matrix.python-version }}
+      - name: Install dependencies
+        run: uv sync --all-extras
+      - name: Run ruff linting
+        run: uv run ruff check src/pdf_file_renamer tests
+      - name: Run ruff formatting check
+        run: uv run ruff format --check src/pdf_file_renamer tests
+      - name: Run mypy type checking
+        run: uv run mypy src/pdf_file_renamer
+      - name: Run tests with coverage
+        run: uv run pytest tests/ --cov=pdf_file_renamer --cov-report=xml --cov-report=term
+      - name: Upload coverage to Codecov
+        uses: codecov/codecov-action@v4
+        if: matrix.python-version == '3.11'
+        with:
+          file: ./coverage.xml
+          fail_ci_if_error: false
+  build:
+    name: Build distribution
+    runs-on: ubuntu-latest
+    needs: test
+    steps:
+      - uses: actions/checkout@v4
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
+        with:
+          version: "latest"
+      - name: Set up Python
+        run: uv python install 3.11
+      - name: Build package
+        run: uv build
+      - name: Check build
+        run: |
+          ls -lh dist/
+          uv run twine check dist/*
+      - name: Upload artifacts
+        uses: actions/upload-artifact@v4
+        with:
+          name: dist
+          path: dist/

pdf_file_renamer-0.6.0/.github/workflows/release.yml ADDED

@@ -0,0 +1,69 @@
+name: Release
+on:
+  push:
+    tags:
+      - "v*"
+permissions:
+  contents: write
+jobs:
+  build-and-release:
+    name: Build and Release
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
+        with:
+          version: "latest"
+      - name: Set up Python
+        run: uv python install 3.11
+      - name: Install dependencies
+        run: uv sync --all-extras
+      - name: Run tests
+        run: uv run pytest tests/
+      - name: Build package
+        run: uv build
+      - name: Extract version from tag
+        id: get_version
+        run: echo "VERSION=${GITHUB_REF#refs/tags/v}" >> $GITHUB_OUTPUT
+      - name: Publish to PyPI
+        env:
+          TWINE_USERNAME: __token__
+          TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
+        run: |
+          uv run twine upload dist/*
+      - name: Create Release
+        uses: softprops/action-gh-release@v1
+        with:
+          files: dist/*
+          generate_release_notes: true
+          body: |
+            ## What's Changed
+            Release version ${{ steps.get_version.outputs.VERSION }}
+            See the [REFACTORING_SUMMARY.md](https://github.com/${{ github.repository }}/blob/${{ github.ref_name }}/REFACTORING_SUMMARY.md) for architecture details.
+            ### Installation
+            **From PyPI:**
+            ```bash
+            pip install pdf-renamer==${{ steps.get_version.outputs.VERSION }}
+            ```
+            **Using uvx (no installation required):**
+            ```bash
+            uvx pdf-renamer@${{ steps.get_version.outputs.VERSION }}
+            ```

pdf_file_renamer-0.6.0/.gitignore ADDED

@@ -0,0 +1,55 @@
+.claude
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+# Virtual environments
+venv/
+ENV/
+env/
+.venv/
+# uv
+uv.lock
+# IDEs
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+.DS_Store
+# Environment variables
+.env
+.env.local
+# Testing
+.pytest_cache/
+.coverage
+htmlcov/
+# Logs
+*.log
+# Temporary files
+*.tmp
+.cache/

pdf_file_renamer-0.6.0/.python-version ADDED

@@ -0,0 +1 @@
+3.11

{pdf_file_renamer-0.4.2/pdf_file_renamer.egg-info → pdf_file_renamer-0.6.0}/PKG-INFO RENAMED

@@ -1,28 +1,28 @@
 Metadata-Version: 2.4
 Name: pdf-file-renamer
-Version: 0.4.2
+Version: 0.6.0
 Summary: Intelligent PDF renaming using LLMs
-Requires-Python: >=3.11
-Description-Content-Type: text/markdown
 License-File: LICENSE
-Requires-Dist: pydantic>=2.10.6
+Requires-Python: >=3.11
+Requires-Dist: docling-core>=2.0.0
+Requires-Dist: docling-parse>=2.0.0
+Requires-Dist: pdf2doi>=1.7
 Requires-Dist: pydantic-ai>=1.0.17
 Requires-Dist: pydantic-settings>=2.7.1
+Requires-Dist: pydantic>=2.10.6
 Requires-Dist: pymupdf>=1.26.5
-Requires-Dist: docling-parse>=2.0.0
-Requires-Dist: docling-core>=2.0.0
 Requires-Dist: python-dotenv>=1.1.1
 Requires-Dist: rich>=14.2.0
-Requires-Dist: typer>=0.19.2
 Requires-Dist: tenacity>=9.0.0
+Requires-Dist: typer>=0.19.2
 Provides-Extra: dev
-Requires-Dist: pytest>=8.3.4; extra == "dev"
-Requires-Dist: pytest-cov>=6.0.0; extra == "dev"
-Requires-Dist: pytest-asyncio>=0.25.2; extra == "dev"
-Requires-Dist: pytest-mock>=3.14.0; extra == "dev"
-Requires-Dist: ruff>=0.9.1; extra == "dev"
-Requires-Dist: mypy>=1.14.1; extra == "dev"
-Dynamic: license-file
+Requires-Dist: mypy>=1.14.1; extra == 'dev'
+Requires-Dist: pytest-asyncio>=0.25.2; extra == 'dev'
+Requires-Dist: pytest-cov>=6.0.0; extra == 'dev'
+Requires-Dist: pytest-mock>=3.14.0; extra == 'dev'
+Requires-Dist: pytest>=8.3.4; extra == 'dev'
+Requires-Dist: ruff>=0.9.1; extra == 'dev'
+Description-Content-Type: text/markdown
 # PDF Renamer
@@ -44,9 +44,11 @@ Intelligent PDF file renaming using LLMs. This tool analyzes PDF content and met
 ## Features
+- **DOI-based naming** - Automatically extracts DOI and fetches authoritative metadata for academic papers
 - **Advanced PDF parsing** using docling-parse for better structure-aware extraction
 - **OCR fallback** for scanned PDFs with low text content
 - **Smart LLM prompting** with multi-pass analysis for improved accuracy
+- **Hybrid approach** - Uses DOI metadata when available, falls back to LLM analysis for other documents
 - Suggests filenames in format: `Author-Topic-Year.pdf`
 - Dry-run mode to preview changes before applying
 - **Enhanced interactive mode** with options to accept, manually edit, retry, or skip each file
@@ -209,19 +211,44 @@ You can use interactive mode with `--dry-run` to preview without actually renami
 ## How It Works
-1. **Extract**: Uses docling-parse to read first 5 pages with structure-aware parsing, falls back to PyMuPDF if needed
-2. **OCR**: Automatically applies OCR for scanned PDFs with minimal text
-3. **Metadata Enhancement**: Extracts focused hints (years, emails, author sections) to supplement unreliable PDF metadata
-4. **Analyze**: Sends full content excerpt to LLM with enhanced metadata and detailed extraction instructions
-5. **Multi-pass Review**: Low-confidence results trigger a second analysis pass with focused prompts
-6. **Suggest**: LLM returns filename in `Author-Topic-Year` format with confidence level and reasoning
-7. **Interactive Review** (optional): User can accept, edit, retry, or skip each suggestion
-8. **Rename**: Applies suggestions (if not in dry-run mode)
+### Intelligent Hybrid Approach
+The tool uses a multi-strategy approach to generate accurate filenames:
+1. **DOI Detection** (for academic papers)
+   - Searches PDF for DOI identifiers using [pdf2doi](https://github.com/MicheleCotrufo/pdf2doi)
+   - If found, queries authoritative metadata (title, authors, year, journal)
+   - Generates filename with **very high confidence** from validated metadata
+   - **Saves API costs** - no LLM call needed for papers with DOIs
+2. **LLM Analysis** (fallback for non-academic PDFs)
+   - **Extract**: Uses docling-parse to read first 5 pages with structure-aware parsing, falls back to PyMuPDF if needed
+   - **OCR**: Automatically applies OCR for scanned PDFs with minimal text
+   - **Metadata Enhancement**: Extracts focused hints (years, emails, author sections) to supplement unreliable PDF metadata
+   - **Analyze**: Sends full content excerpt to LLM with enhanced metadata and detailed extraction instructions
+   - **Multi-pass Review**: Low-confidence results trigger a second analysis pass with focused prompts
+   - **Suggest**: LLM returns filename in `Author-Topic-Year` format with confidence level and reasoning
+3. **Interactive Review** (optional): User can accept, edit, retry, or skip each suggestion
+4. **Rename**: Applies suggestions (if not in dry-run mode)
+### Benefits of DOI Integration
+- **Accuracy**: DOI metadata is canonical and verified
+- **Speed**: Instant lookup vs. LLM processing time
+- **Cost**: Free DOI lookups save on API costs for academic papers
+- **Reliability**: Works even when PDF text extraction is poor
 ## Cost Considerations
-**OpenAI:**
+**DOI-based Naming (Academic Papers):**
+- **Completely free** - No API costs
+- **No LLM needed** - Direct metadata lookup
+- Works for most academic papers with embedded DOIs
+**OpenAI (Fallback):**
 - Uses `gpt-4o-mini` by default (very cost-effective)
+- Only called when DOI not found
 - Processes first ~4500 characters per PDF
 - Typical cost: ~$0.001-0.003 per PDF
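
The DOI-first flow that the new README text describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `Suggestion`, `suggest_filename`, the metadata keys (`author`, `topic`, `year`), and the confidence labels are all hypothetical, and the DOI lookup and LLM call are injected as plain callables so that no real pdf2doi or LLM API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Suggestion:
    filename: str
    confidence: str  # hypothetical labels: "very_high", "high", "low"
    source: str      # "doi" or "llm"


def suggest_filename(
    pdf_path: str,
    doi_lookup: Callable[[str], Optional[dict]],
    llm_analyze: Callable[[str, bool], Suggestion],
) -> Suggestion:
    """Try DOI metadata first; call the LLM only when no DOI metadata is found."""
    meta = doi_lookup(pdf_path)
    if meta:
        # Author-Topic-Year format, built from the looked-up metadata.
        name = f"{meta['author']}-{meta['topic']}-{meta['year']}.pdf"
        return Suggestion(name, "very_high", "doi")

    # Fallback: first LLM pass with the general prompt.
    first = llm_analyze(pdf_path, False)
    if first.confidence == "low":
        # Low confidence triggers one more pass with a focused prompt.
        return llm_analyze(pdf_path, True)
    return first
```

Injecting the two callables keeps the control flow testable with stubs and makes it visible that the LLM is never invoked when the DOI lookup succeeds.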

pdf_file_renamer-0.4.2/PKG-INFO → pdf_file_renamer-0.6.0/README.md RENAMED

@@ -1,29 +1,3 @@
-Metadata-Version: 2.4
-Name: pdf-file-renamer
-Version: 0.4.2
-Summary: Intelligent PDF renaming using LLMs
-Requires-Python: >=3.11
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: pydantic>=2.10.6
-Requires-Dist: pydantic-ai>=1.0.17
-Requires-Dist: pydantic-settings>=2.7.1
-Requires-Dist: pymupdf>=1.26.5
-Requires-Dist: docling-parse>=2.0.0
-Requires-Dist: docling-core>=2.0.0
-Requires-Dist: python-dotenv>=1.1.1
-Requires-Dist: rich>=14.2.0
-Requires-Dist: typer>=0.19.2
-Requires-Dist: tenacity>=9.0.0
-Provides-Extra: dev
-Requires-Dist: pytest>=8.3.4; extra == "dev"
-Requires-Dist: pytest-cov>=6.0.0; extra == "dev"
-Requires-Dist: pytest-asyncio>=0.25.2; extra == "dev"
-Requires-Dist: pytest-mock>=3.14.0; extra == "dev"
-Requires-Dist: ruff>=0.9.1; extra == "dev"
-Requires-Dist: mypy>=1.14.1; extra == "dev"
-Dynamic: license-file
 # PDF Renamer
 [![PyPI version](https://img.shields.io/pypi/v/pdf-file-renamer.svg)](https://pypi.org/project/pdf-file-renamer/)
@@ -44,9 +18,11 @@ Intelligent PDF file renaming using LLMs. This tool analyzes PDF content and met
 ## Features
+- **DOI-based naming** - Automatically extracts DOI and fetches authoritative metadata for academic papers
 - **Advanced PDF parsing** using docling-parse for better structure-aware extraction
 - **OCR fallback** for scanned PDFs with low text content
 - **Smart LLM prompting** with multi-pass analysis for improved accuracy
+- **Hybrid approach** - Uses DOI metadata when available, falls back to LLM analysis for other documents
 - Suggests filenames in format: `Author-Topic-Year.pdf`
 - Dry-run mode to preview changes before applying
 - **Enhanced interactive mode** with options to accept, manually edit, retry, or skip each file
@@ -209,19 +185,44 @@ You can use interactive mode with `--dry-run` to preview without actually renami
 ## How It Works
-1. **Extract**: Uses docling-parse to read first 5 pages with structure-aware parsing, falls back to PyMuPDF if needed
-2. **OCR**: Automatically applies OCR for scanned PDFs with minimal text
-3. **Metadata Enhancement**: Extracts focused hints (years, emails, author sections) to supplement unreliable PDF metadata
-4. **Analyze**: Sends full content excerpt to LLM with enhanced metadata and detailed extraction instructions
-5. **Multi-pass Review**: Low-confidence results trigger a second analysis pass with focused prompts
-6. **Suggest**: LLM returns filename in `Author-Topic-Year` format with confidence level and reasoning
-7. **Interactive Review** (optional): User can accept, edit, retry, or skip each suggestion
-8. **Rename**: Applies suggestions (if not in dry-run mode)
+### Intelligent Hybrid Approach
+The tool uses a multi-strategy approach to generate accurate filenames:
+1. **DOI Detection** (for academic papers)
+   - Searches PDF for DOI identifiers using [pdf2doi](https://github.com/MicheleCotrufo/pdf2doi)
+   - If found, queries authoritative metadata (title, authors, year, journal)
+   - Generates filename with **very high confidence** from validated metadata
+   - **Saves API costs** - no LLM call needed for papers with DOIs
+2. **LLM Analysis** (fallback for non-academic PDFs)
+   - **Extract**: Uses docling-parse to read first 5 pages with structure-aware parsing, falls back to PyMuPDF if needed
+   - **OCR**: Automatically applies OCR for scanned PDFs with minimal text
+   - **Metadata Enhancement**: Extracts focused hints (years, emails, author sections) to supplement unreliable PDF metadata
+   - **Analyze**: Sends full content excerpt to LLM with enhanced metadata and detailed extraction instructions
+   - **Multi-pass Review**: Low-confidence results trigger a second analysis pass with focused prompts
+   - **Suggest**: LLM returns filename in `Author-Topic-Year` format with confidence level and reasoning
+3. **Interactive Review** (optional): User can accept, edit, retry, or skip each suggestion
+4. **Rename**: Applies suggestions (if not in dry-run mode)
+### Benefits of DOI Integration
+- **Accuracy**: DOI metadata is canonical and verified
+- **Speed**: Instant lookup vs. LLM processing time
+- **Cost**: Free DOI lookups save on API costs for academic papers
+- **Reliability**: Works even when PDF text extraction is poor
 ## Cost Considerations
-**OpenAI:**
+**DOI-based Naming (Academic Papers):**
+- **Completely free** - No API costs
+- **No LLM needed** - Direct metadata lookup
+- Works for most academic papers with embedded DOIs
+**OpenAI (Fallback):**
 - Uses `gpt-4o-mini` by default (very cost-effective)
+- Only called when DOI not found
 - Processes first ~4500 characters per PDF
 - Typical cost: ~$0.001-0.003 per PDF
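
To make the cost reasoning in this hunk concrete, here is a small worked estimate. It assumes the per-PDF range quoted above (~$0.001-0.003) applies to each LLM call and that PDFs resolved through a DOI cost nothing. The function name `estimate_llm_cost` and the batch numbers in the usage note are illustrative assumptions, and the estimate ignores any extra calls from second-pass analysis.

```python
def estimate_llm_cost(
    total_pdfs: int,
    pdfs_with_doi: int,
    cost_low: float = 0.001,   # lower bound per LLM call, from the README text
    cost_high: float = 0.003,  # upper bound per LLM call, from the README text
) -> tuple[float, float]:
    """Return a (low, high) cost estimate in USD for a batch (hypothetical helper)."""
    if not 0 <= pdfs_with_doi <= total_pdfs:
        raise ValueError("pdfs_with_doi must be between 0 and total_pdfs")
    # Only PDFs without a usable DOI fall through to the LLM.
    llm_calls = total_pdfs - pdfs_with_doi
    return llm_calls * cost_low, llm_calls * cost_high
```

For example, a batch of 100 PDFs in which 60 resolve through a DOI leaves 40 LLM calls, or roughly $0.04 to $0.12 under these assumptions.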
