PyPI - meaningful-pdf-names - Versions diffs - 0.1.0__py3-none-any.whl → 0.1.2__py3-none-any.whl - Mend

meaningful-pdf-names 0.1.0__py3-none-any.whl → 0.1.2__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of meaningful-pdf-names might be problematic.

Files changed (10)

meaningful_pdf_names/__init__.py CHANGED

@@ -1,2 +1,2 @@
 __all__ = ["__version__"]
-__version__ = "0.1.0"
+__version__ = "0.1.1"
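
Review note: this hunk sets `__version__` to "0.1.1", while the METADATA file added in this wheel declares `Version: 0.1.2`. The sketch below shows one way such a mismatch could be detected. `versions_agree` and `installed_versions` are hypothetical helpers written for this note, not part of the package, and `installed_versions` assumes the distribution is installed in the current environment.

```python
import importlib
from importlib import metadata


def versions_agree(module_version: str, dist_version: str) -> bool:
    """Return True when the module-level version equals the metadata version."""
    return module_version.strip() == dist_version.strip()


def installed_versions(dist_name: str, module_name: str):
    """Read both version strings from an installed distribution.

    Assumption: the module exposes a `__version__` attribute, as the
    hunk above shows for meaningful_pdf_names.
    """
    module = importlib.import_module(module_name)
    return module.__version__, metadata.version(dist_name)


# The two values visible in this diff do not agree:
print(versions_agree("0.1.1", "0.1.2"))
```

With the package installed, `versions_agree(*installed_versions("meaningful-pdf-names", "meaningful_pdf_names"))` would run the same comparison against the real files.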

meaningful_pdf_names/cli.py CHANGED

@@ -75,18 +75,27 @@ def summarize_text(text: str, max_chars: int = 4000) -> str:
         return text
-def extract_text_keywords(pdf_path: Path, max_keywords: int = 5):
+def extract_text_keywords(pdf_path: Path, max_keywords: int = 5, pages_to_read: int = 2):
     """
-    Extract up to `max_keywords` from ONLY the first page of the PDF.
+    Extract up to `max_keywords` from the first `pages_to_read` pages of the PDF.
+    If `pages_to_read` exceeds total pages, reads all available pages.
     """
     text = ""
     if PdfReader is not None:
         try:
             reader = PdfReader(str(pdf_path))
-            if len(reader.pages) > 0:
-                first_page = reader.pages[0]
-                text = (first_page.extract_text() or "")
+            total_pages = len(reader.pages)
+            if total_pages > 0:
+                # Determine how many pages to actually read
+                pages_to_extract = min(pages_to_read, total_pages)
+                # Extract text from the first N pages
+                for i in range(pages_to_extract):
+                    page = reader.pages[i]
+                    page_text = (page.extract_text() or "").strip()
+                    if page_text:
+                        text += page_text + " "
         except Exception:
             text = ""
@@ -205,9 +214,9 @@ def unique_target_path(folder: Path, base_slug: str, suffix_len: int = 3) -> Pat
             return candidate
-def rename_pdfs(folder: Path, dry_run: bool = False, verbose: bool = True):
+def rename_pdfs(folder: Path, dry_run: bool = False, verbose: bool = True, pages_to_read: int = 2):
     """
-    Rename all PDFs in the folder using first-page-based keywords.
+    Rename all PDFs in the folder using text from the first `pages_to_read` pages.
     """
     if not folder.is_dir():
         raise ValueError(f"{folder} is not a directory")
@@ -221,7 +230,7 @@ def rename_pdfs(folder: Path, dry_run: bool = False, verbose: bool = True):
         print(f"Found {len(pdf_files)} PDF(s) in {folder}")
     for pdf in pdf_files:
-        keywords = extract_text_keywords(pdf)
+        keywords = extract_text_keywords(pdf, pages_to_read=pages_to_read)
         base_slug = build_new_name(keywords)
         target = unique_target_path(folder, base_slug)
@@ -240,7 +249,7 @@ def rename_pdfs(folder: Path, dry_run: bool = False, verbose: bool = True):
 def main():
     parser = argparse.ArgumentParser(
         description=(
-            "Rename PDFs using first-page-derived keywords plus a short suffix "
+            "Rename PDFs using text-derived keywords plus a short suffix "
             "for clean, meaningful filenames."
         )
     )
@@ -249,6 +258,13 @@ def main():
         type=str,
         help="Path to folder containing PDFs."
     )
+    parser.add_argument(
+        "-p", "--pages",
+        type=int,
+        default=2,
+        help="Number of pages to read from each PDF (default: 2). "
+             "If larger than total pages, reads all available pages."
+    )
     parser.add_argument(
         "--dry-run",
         action="store_true",
@@ -263,7 +279,7 @@ def main():
     args = parser.parse_args()
     folder = Path(args.folder).expanduser().resolve()
-    rename_pdfs(folder, dry_run=args.dry_run, verbose=not args.quiet)
+    rename_pdfs(folder, dry_run=args.dry_run, verbose=not args.quiet, pages_to_read=args.pages)
 if __name__ == "__main__":
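
Review note: the new loop in `extract_text_keywords` can be exercised without a real PDF. The sketch below copies the page-limiting logic from the hunk above into a standalone function. `FakePage` and `collect_text` are hypothetical names introduced here for illustration; only the `extract_text()` call mirrors what the diff uses on pypdf page objects.

```python
from typing import Optional, Sequence


class FakePage:
    """Stand-in for a PDF page object; only extract_text() is modelled."""

    def __init__(self, text: Optional[str]):
        self._text = text

    def extract_text(self) -> Optional[str]:
        return self._text


def collect_text(pages: Sequence[FakePage], pages_to_read: int = 2) -> str:
    """Concatenate stripped text from the first `pages_to_read` pages."""
    text = ""
    # Never read past the end of the document.
    pages_to_extract = min(pages_to_read, len(pages))
    for i in range(pages_to_extract):
        page_text = (pages[i].extract_text() or "").strip()
        if page_text:  # skip pages with no extractable text
            text += page_text + " "
    return text


pages = [FakePage("  Title  "), FakePage(None), FakePage("Body")]
print(repr(collect_text(pages, 2)))   # second page yields nothing
print(repr(collect_text(pages, 10)))  # request exceeds the page count
print(repr(collect_text(pages, 0)))   # zero pages produces empty text
```

One consequence of this logic is that a zero or negative page count produces an empty string rather than an error, because `range()` over a non-positive number is empty.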

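Review note: the `-p/--pages` option above is declared with `type=int`, so `-p 0` or `-p -3` would be accepted and would result in no text being read. A possible hardening is sketched below. `positive_int` and `build_parser` are hypothetical and not present in the package, and the positional `folder` argument is an assumption based on the `args.folder` usage shown in the diff.

```python
import argparse


def positive_int(value: str) -> int:
    """argparse type callable that rejects values below 1."""
    number = int(value)  # a ValueError here is reported by argparse as a usage error
    if number < 1:
        raise argparse.ArgumentTypeError(
            f"expected a positive integer, got {value!r}"
        )
    return number


def build_parser() -> argparse.ArgumentParser:
    """Minimal parser with only the arguments relevant to this note."""
    parser = argparse.ArgumentParser(prog="mpn")
    parser.add_argument("folder", type=str, help="Path to folder containing PDFs.")
    parser.add_argument(
        "-p", "--pages",
        type=positive_int,
        default=2,
        help="Number of pages to read from each PDF (default: 2).",
    )
    return parser


args = build_parser().parse_args([".", "-p", "4"])
print(args.pages)
```
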
meaningful_pdf_names-0.1.2.dist-info/METADATA ADDED

@@ -0,0 +1,151 @@
+Metadata-Version: 2.4
+Name: meaningful-pdf-names
+Version: 0.1.2
+Summary: Offline-friendly PDF renamer that generates meaningful, keyword-rich filenames from PDF content.
+Author-email: Nishant Kumar <abcnishant007@gmail.com>
+License: MIT
+Project-URL: Homepage, https://github.com/abcnishant007/meaningful-pdf-names
+Project-URL: Source, https://github.com/abcnishant007/meaningful-pdf-names
+Project-URL: Issues, https://github.com/abcnishant007/meaningful-pdf-names/issues
+Keywords: pdf,rename,keywords,offline,cli
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Topic :: Utilities
+Classifier: Topic :: Text Processing :: Linguistic
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: pypdf>=5.0.0
+Provides-Extra: summarizer
+Requires-Dist: transformers>=4.45.0; extra == "summarizer"
+Requires-Dist: torch>=2.0.0; extra == "summarizer"
+Dynamic: license-file
+# meaningful-pdf-names
+[![Python application](https://img.shields.io/badge/Python-3.9+-blue.svg)](https://www.python.org)
+[![PyPI version](https://img.shields.io/pypi/v/meaningful-pdf-names.svg)](https://pypi.org/project/meaningful-pdf-names/)
+[![codecov](https://codecov.io/gh/abcnishant007/meaningful-pdf-names/branch/main/graph/badge.svg)](https://codecov.io/gh/abcnishant007/meaningful-pdf-names)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+[![Downloads](https://static.pepy.tech/badge/meaningful-pdf-names)](https://pepy.tech/projects/meaningful-pdf-names)
+Offline-friendly CLI to turn your messy paper filenames into **compact, keyword-rich names** based on the PDF's first page.
+Example:
+`final_v3_really_final.pdf` → `urban-resilience-transport-inequality-policy-a9f.pdf`
+## Features
+- Uses the **first 2 pages** by default (title, authors, abstract, introduction) for better context
+- Configurable page count with `-p` flag (e.g., `-p 4` for 4 pages)
+- Up to **5 meaningful keywords** per file
+- Adds a **3-character [a-z0-9] suffix** to avoid collisions
+- Works fully **offline** with `pypdf`
+- Optional: use a small local Hugging Face summarizer
+  (`sshleifer/distilbart-cnn-12-6`) via `transformers` + `torch`
+## Prerequisites
+- **Python 3.9+** installed on your system
+- **pip** (Python package manager) - usually comes with Python
+## Quick Install
+### From PyPI (Recommended)
+```bash
+pip install meaningful-pdf-names
+```
+## Quick Start Guide
+### For Mac Users
+1. **Install the package** (see above)
+2. **Navigate to your PDF folder**:
+   - Open Finder and go to the folder containing your PDFs
+   - Right-click on the folder and select "New Terminal at Folder"
+   - This opens Terminal directly in that folder
+3. **Run the command**:
+   ```bash
+   mpn .
+   ```
+### For Linux Users
+1. **Install the package** (see above)
+2. **Navigate to your PDF folder**:
+   ```bash
+   cd /path/to/your/pdf/folder
+   ```
+3. **Run the command**:
+   ```bash
+   mpn .
+   ```
+### For Any Folder Location
+If you want to rename PDFs in a different folder without navigating there:
+```bash
+mpn /full/path/to/your/pdf/folder
+```
+## Usage Examples
+**Basic usage (current folder):**
+```bash
+mpn .
+```
+**Specific folder:**
+```bash
+mpn ~/Downloads/research_papers
+mpn /Users/username/Documents/PDFs
+```
+**Dry run (preview changes without renaming):**
+```bash
+mpn . --dry-run
+```
+**Quiet mode (minimal output):**
+```bash
+mpn . --quiet
+```
+**Custom page count (read more pages for better context):**
+```bash
+mpn . -p 4          # Read first 4 pages
+mpn . -p 10         # Read up to 10 pages (or all if PDF has fewer)
+```
+## What It Does
+- Scans all PDF files in the specified folder
+- Extracts text from just the first page (fast!)
+- Identifies meaningful keywords from titles, authors, abstracts
+- Generates clean, readable filenames like:
+  - `climate-change-urban-planning-sustainability-a9f.pdf`
+  - `machine-learning-neural-networks-research-4x2.pdf`
+  - `healthcare-policy-digital-transformation-b7c.pdf`
+## Why Not Existing Tools?
+Other tools often:
+* Depend on **OpenAI / web APIs** (requires internet, API keys)
+* Require DOIs or external metadata (not always available)
+* Use long `Author - Title - Year` patterns (hard to read)
+`meaningful-pdf-names` is:
+* **Local-only** (no API keys, no network required)
+* **Fast** (first-page only extraction)
+* **Slug-based**: short, grep- and git-friendly names
+## License
+MIT

meaningful_pdf_names-0.1.2.dist-info/RECORD ADDED

@@ -0,0 +1,9 @@
+meaningful_pdf_names/__init__.py,sha256=mxfnxTtjjT0RlBl5L1-W0AT-IdYIc_KQVhB5cOlylEw,48
+meaningful_pdf_names/__main__.py,sha256=MSmt_5Xg84uHqzTN38JwgseJK8rsJn_11A8WD99VtEo,61
+meaningful_pdf_names/cli.py,sha256=EtW6J53ywyQvKeOla3iwzhRtnEYB1j-PCPWc7FfDFpI,8040
+meaningful_pdf_names-0.1.2.dist-info/licenses/LICENSE,sha256=OphKV48tcMv6ep-7j-8T6nycykPT0g8ZlMJ9zbGvdPs,1066
+meaningful_pdf_names-0.1.2.dist-info/METADATA,sha256=OUJ-XTJnd7t7ERHC2JYryI3-I44iMW1nfvllFldDjZw,4530
+meaningful_pdf_names-0.1.2.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+meaningful_pdf_names-0.1.2.dist-info/entry_points.txt,sha256=EtPEkZe_yMNP99BJDtBPI2DL20GO3E5ELmOm2F4aPO4,107
+meaningful_pdf_names-0.1.2.dist-info/top_level.txt,sha256=TD_BuniRNpBdNggGi-6B8WQ4CxkYxzEgTSm2DfY4khw,21
+meaningful_pdf_names-0.1.2.dist-info/RECORD,,
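
Review note: each RECORD line above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the SHA-256 hash with trailing `=` padding removed and the size is in bytes. The sketch below shows how such a line can be recomputed to verify a file from the wheel. `record_entry` is a hypothetical helper written for this note.

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build a RECORD-style line for `data` stored at `path`."""
    digest = hashlib.sha256(data).digest()
    # URL-safe base64 without the trailing "=" padding, as used in wheel RECORD files.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


# An empty file is used here because its SHA-256 digest is a well-known constant.
print(record_entry("empty.txt", b""))
```

To check a real entry, read the file's bytes from the unpacked wheel and compare the result of `record_entry` with the corresponding line in RECORD.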

meaningful_pdf_names-0.1.0.dist-info/METADATA DELETED

@@ -1,86 +0,0 @@
-Metadata-Version: 2.4
-Name: meaningful-pdf-names
-Version: 0.1.0
-Summary: Offline-friendly PDF renamer that generates meaningful, keyword-rich filenames from first-page content.
-Author-email: Nishant Kumar <abcnishant007@gmail.com>
-License: MIT
-Project-URL: Homepage, https://github.com/abcnishant007/meaningful-pdf-names
-Project-URL: Source, https://github.com/abcnishant007/meaningful-pdf-names
-Project-URL: Issues, https://github.com/abcnishant007/meaningful-pdf-names/issues
-Keywords: pdf,rename,keywords,offline,cli
-Classifier: Programming Language :: Python :: 3
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Operating System :: OS Independent
-Classifier: Topic :: Utilities
-Classifier: Topic :: Text Processing :: Linguistic
-Requires-Python: >=3.9
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: pypdf>=5.0.0
-Provides-Extra: summarizer
-Requires-Dist: transformers>=4.45.0; extra == "summarizer"
-Requires-Dist: torch>=2.0.0; extra == "summarizer"
-Dynamic: license-file
-# meaningful-pdf-names
-Offline-friendly CLI to turn your messy paper filenames into **compact, keyword-rich names** based on the PDF's first page.
-Example:
-`final_v3_really_final.pdf` → `urban-resilience-transport-inequality-policy-a9f.pdf`
-## Features
-- Uses only the **first page** (title, authors, abstract region) for speed.
-- Up to **5 meaningful keywords** per file.
-- Adds a **3-character [a-z0-9] suffix** to avoid collisions.
-- Works fully **offline** with `pypdf`.
-- Optional: use a small local Hugging Face summarizer
-  (`sshleifer/distilbart-cnn-12-6`) via `transformers` + `torch`.
-## Install
-From source / Git:
-```bash
-pip install git+https://github.com/yourname/meaningful-pdf-names.git
-```
-(When published to PyPI:)
-```bash
-pip install meaningful-pdf-names
-```
-With optional local summarizer:
-```bash
-pip install "meaningful-pdf-names[summarizer]"
-```
-## Usage
-```bash
-meaningful-pdf-names /path/to/pdfs
-meaningful-pdf-names /path/to/pdfs --dry-run
-mpn /path/to/pdfs
-```
-## Why not existing tools?
-Other tools often:
-* Depend on **OpenAI / web APIs**.
-* Require DOIs or external metadata.
-* Use long `Author - Title - Year` patterns.
-`meaningful-pdf-names` is:
-* **Local-only** (no API keys, no network).
-* **Fast** (first-page only).
-* **Slug-based**: short, grep- and git-friendly names.
-## License
-MIT

meaningful_pdf_names-0.1.0.dist-info/RECORD DELETED

@@ -1,9 +0,0 @@
-meaningful_pdf_names/__init__.py,sha256=tXbRXsO0NE_UV1kIHiZTTQQH0fj0U2KoxxNusu_gzrM,48
-meaningful_pdf_names/__main__.py,sha256=MSmt_5Xg84uHqzTN38JwgseJK8rsJn_11A8WD99VtEo,61
-meaningful_pdf_names/cli.py,sha256=C5eYS9ZTBfkf9urzKrN8G85b9-Kt0JN8qabs0CizWAs,7236
-meaningful_pdf_names-0.1.0.dist-info/licenses/LICENSE,sha256=OphKV48tcMv6ep-7j-8T6nycykPT0g8ZlMJ9zbGvdPs,1066
-meaningful_pdf_names-0.1.0.dist-info/METADATA,sha256=cIZjWGIsHtS-bkAf6tOL_J7YnwOXFit7_pytGZqt0q4,2365
-meaningful_pdf_names-0.1.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
-meaningful_pdf_names-0.1.0.dist-info/entry_points.txt,sha256=EtPEkZe_yMNP99BJDtBPI2DL20GO3E5ELmOm2F4aPO4,107
-meaningful_pdf_names-0.1.0.dist-info/top_level.txt,sha256=TD_BuniRNpBdNggGi-6B8WQ4CxkYxzEgTSm2DfY4khw,21
-meaningful_pdf_names-0.1.0.dist-info/RECORD,,

{meaningful_pdf_names-0.1.0.dist-info → meaningful_pdf_names-0.1.2.dist-info}/WHEEL RENAMED

File without changes

{meaningful_pdf_names-0.1.0.dist-info → meaningful_pdf_names-0.1.2.dist-info}/entry_points.txt RENAMED

File without changes

{meaningful_pdf_names-0.1.0.dist-info → meaningful_pdf_names-0.1.2.dist-info}/licenses/LICENSE RENAMED

File without changes

{meaningful_pdf_names-0.1.0.dist-info → meaningful_pdf_names-0.1.2.dist-info}/top_level.txt RENAMED

File without changes
