pdflinkcheck 1.1.7__py3-none-any.whl → 1.1.47__py3-none-any.whl

@@ -1,109 +0,0 @@
- Metadata-Version: 2.4
- Name: pdflinkcheck
- Version: 1.1.7
- Summary: A purpose-built PDF link analysis reporting tool.
- Requires-Python: >=3.12
- Description-Content-Type: text/markdown
- Requires-Dist: pymupdf>=1.26.6
- Requires-Dist: rich>=14.2.0
- Requires-Dist: typer>=0.20.0
- Requires-Dist: pyhabitat>=1.0.52
- Provides-Extra: dev
- Requires-Dist: ruff>=0.1.13; extra == "dev"
- Requires-Dist: pytest>=8.0.0; extra == "dev"
- Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
-
- # pdflinkcheck
- A purpose-built tool for comprehensive analysis of hyperlinks and link remnants within PDF documents, primarily using the PyMuPDF library.
- Use the CLI or the GUI.
-
- ---
-
- ### Graphical User Interface (GUI)
-
- The tool can also be run through a simple cross-platform graphical interface built with Tkinter:
-
- ![Screenshot of the pdflinkcheck GUI](https://raw.githubusercontent.com/City-of-Memphis-Wastewater/pdflinkcheck/main/assets/pdflinkcheck_gui.png)
-
- To launch the GUI, run `pdflinkcheck-gui`.
-
- ---
-
- ### ✨ Features
-
- * **Active Link Extraction:** Identifies and categorizes every link annotation (external URIs, internal GoTo destinations, and remote jumps to other files).
- * **Anchor Text Retrieval:** Extracts the visible text inside each link's bounding box.
- * **Remnant Detection:** Scans the document's text layer for URIs and email addresses that appear as plain text without an active link and may need to be converted into links.
- * **Structural TOC:** Extracts the PDF's internal Table of Contents (bookmarks/outline).
-
- ---
-
- ### 📥 Installation (Recommended via `pipx`)
-
- The recommended way to install `pdflinkcheck` is using `pipx`, which installs Python applications in isolated environments, preventing dependency conflicts.
-
- ```bash
- # Ensure you have pipx installed first (if not, run: pip install pipx)
- pipx install pdflinkcheck
- ```
-
-
- **Note for Developers:** If you prefer a traditional virtual environment or are developing locally, use `pip`:
- ```bash
- # From the root of the project
- pip install .
- ```
-
- ---
-
- ### 🚀 Usage
-
- The main command is `pdflinkcheck analyze`.
-
-
- ```bash
- # Basic usage: Analyze a PDF and check for remnants (default behavior)
- pdflinkcheck analyze "path/to/my/document.pdf"
- ```
-
- #### Command Options
-
- |**Option**|**Description**|**Default**|
- |---|---|---|
- |`<PDF_PATH>`|**Required.** The path to the PDF file to analyze.|N/A|
- |`--check-remnants / --no-check-remnants`|Enable or disable scanning the text layer for unlinked URLs and email addresses.|`--check-remnants`|
- |`--max-links INTEGER`|Maximum number of links/remnants to display in the detailed report sections.|`50`|
- |`--help`|Show command help and exit.|N/A|
-
- #### Example Run
-
- ```bash
- pdflinkcheck analyze "TE Maxson WWTF O&M Manual.pdf" --max-links 10
- ```
-
- ### Run from Source
- ```bash
- git clone https://github.com/city-of-memphis-wastewater/pdflinkcheck.git
- cd pdflinkcheck
- uv sync
- uv run python src/pdflinkcheck/analyze.py
- ```
-
- ---
-
- ### ⚠️ Platform Compatibility Note
-
- This tool relies on the `PyMuPDF` library, which wraps the native MuPDF C library. On platforms without a prebuilt PyMuPDF wheel, MuPDF must be compiled from source, which may fail.
-
- **Known Incompatibility:** This tool is **not officially supported** on environments like **Termux (Android)**, where it may fail to run because PyMuPDF's underlying C/C++ libraries do not compile cleanly. It is intended for standard Linux, macOS, or Windows systems.
-
- ---
-
- ### Document Compatibility
-
- While `pdflinkcheck` uses the robust PyMuPDF library, not all PDF files can be processed successfully. This tool is designed primarily for digitally generated (vector-based) PDFs.
-
- Processing may fail or yield incomplete results for:
- * **Scanned PDFs** (images of text) that lack an accessible text layer.
- * **Encrypted or Password-Protected** documents.
- * **Malformed or non-standard** PDF files.
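
The Active Link Extraction and Remnant Detection features in the README above amount to comparing the URIs and email addresses found in a page's text layer against that page's active links. The following is a minimal sketch of that idea under stated assumptions, not pdflinkcheck's own implementation: the function names and regular expressions are hypothetical, the link dictionaries are assumed to have the shape returned by PyMuPDF's `Page.get_links()` (a `kind` integer plus `uri` or `page`), and the `kind` labels follow PyMuPDF's documented `LINK_*` constants.

```python
import re
from collections import Counter

# Link kinds as integers, mirroring PyMuPDF's documented constants
# (pymupdf.LINK_GOTO == 1, LINK_URI == 2, LINK_LAUNCH == 3,
#  LINK_NAMED == 4, LINK_GOTOR == 5).
LINK_KINDS = {
    1: "Internal (GoTo)",
    2: "External (URI)",
    3: "Launch",
    4: "Named",
    5: "Remote (GoToR)",
}

# Hypothetical, deliberately simple patterns; production matching needs more care.
URI_RE = re.compile(r"\bhttps?://[^\s<>()\"']+", re.IGNORECASE)
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def categorize_links(links: list[dict]) -> Counter:
    """Count active links by kind (input shaped like Page.get_links() output)."""
    return Counter(LINK_KINDS.get(link.get("kind"), "Other") for link in links)


def _normalize(uri: str) -> str:
    # Case-fold and ignore a trailing slash so near-identical URIs compare equal.
    return uri.lower().rstrip("/")


def find_remnants(text: str, links: list[dict]) -> list[str]:
    """Return URIs and emails in `text` that no active link points to."""
    linked = {_normalize(link["uri"]) for link in links if link.get("uri")}
    remnants = []
    for match in URI_RE.finditer(text):
        uri = match.group().rstrip(".,;:")  # drop trailing sentence punctuation
        if _normalize(uri) not in linked:
            remnants.append(uri)
    for match in EMAIL_RE.finditer(text):
        email = match.group()
        # An email counts as linked if some link targets "mailto:<email>".
        if _normalize(f"mailto:{email}") not in linked:
            remnants.append(email)
    return remnants


def scan_pdf(path: str) -> list[tuple[int, Counter, list[str]]]:
    """Per-page link counts and remnants for a PDF (requires PyMuPDF)."""
    import pymupdf  # imported lazily so the helpers above work without it

    results = []
    with pymupdf.open(path) as doc:
        for page in doc:
            links = page.get_links()
            text = page.get_text()
            results.append((page.number + 1, categorize_links(links), find_remnants(text, links)))
    return results
```

Calling `scan_pdf("path/to/my/document.pdf")` would return one `(page_number, counts, remnants)` tuple per page. Because extracted text can wrap a long URL across lines, a regex pass like this one can truncate or miss some remnants.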
@@ -1,10 +0,0 @@
- pdflinkcheck/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
- pdflinkcheck/analyze.py,sha256=wtj1fNvMl5553FYdHmd3K82ve2lHaDW68qBVITig2cQ,12982
- pdflinkcheck/cli.py,sha256=vo_2BF7A4jaR_Qvd4AZ8RIqlwV10Die2RbWc9Er6wQo,1872
- pdflinkcheck/gui.py,sha256=8uzaKqE0aVLzAGIwD52rbEJKfEHdi4R6S8fO8bPs8rI,6432
- pdflinkcheck/remnants.py,sha256=xgunD4hDDT0SqD9SywvPc5DLSLNLA6O0BL0KOuLQwV8,6151
- pdflinkcheck-1.1.7.dist-info/METADATA,sha256=SjgFk5-n8SlurKY0pjwZtTWtXt42vgz0KuYTFY729a4,3725
- pdflinkcheck-1.1.7.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
- pdflinkcheck-1.1.7.dist-info/entry_points.txt,sha256=Ql8fOpnnAGZ23DWcq0J97bPBafrP0rl8x9aVpSLh5Cs,100
- pdflinkcheck-1.1.7.dist-info/top_level.txt,sha256=WdBg8l6l3TF1HQDpR_PwSmBCSu5atKWFnPfNbRNwrME,13
- pdflinkcheck-1.1.7.dist-info/RECORD,,
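
Each RECORD row has the form `path,hash,size`. The hash is `sha256=` followed by the URL-safe base64 encoding of the file's SHA-256 digest with the trailing `=` padding removed, and RECORD lists itself with empty hash and size fields. The zero-byte `__init__.py` entry makes a handy sanity check, since `47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU` is the digest of empty input. Here is a minimal sketch that recomputes these fields for a wheel. `record_hash` and `verify_wheel` are hypothetical helper names, and the sketch assumes every hashed entry uses sha256, as all rows here do.

```python
import base64
import csv
import hashlib
import io
import zipfile


def record_hash(data: bytes) -> str:
    """RECORD-style digest: 'sha256=' + urlsafe base64 without '=' padding."""
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_wheel(source) -> list[str]:
    """Return names whose hash or size disagrees with RECORD.

    `source` is a wheel path or a binary file object; anything zipfile accepts.
    Assumes sha256 entries only (other algorithms would be reported as mismatches).
    """
    mismatches = []
    with zipfile.ZipFile(source) as whl:
        # Raises StopIteration if the archive has no *.dist-info/RECORD.
        record_name = next(n for n in whl.namelist() if n.endswith(".dist-info/RECORD"))
        with io.TextIOWrapper(whl.open(record_name), encoding="utf-8", newline="") as fh:
            for row in csv.reader(fh):
                # Skip blank rows and the RECORD self-entry (empty hash and size).
                if len(row) != 3 or not row[1]:
                    continue
                name, hash_field, size = row
                data = whl.read(name)
                if hash_field != record_hash(data) or int(size) != len(data):
                    mismatches.append(name)
    return mismatches
```

For example, `verify_wheel("pdflinkcheck-1.1.7-py3-none-any.whl")` on an unmodified download should return an empty list.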