pdflinkcheck 1.1.7__py3-none-any.whl → 1.1.47__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- pdflinkcheck/__init__.py +31 -0
- pdflinkcheck/analyze.py +306 -128
- pdflinkcheck/cli.py +97 -20
- pdflinkcheck/data/LICENSE +680 -0
- pdflinkcheck/gui.py +157 -29
- pdflinkcheck/io.py +106 -0
- pdflinkcheck-1.1.47.dist-info/METADATA +266 -0
- pdflinkcheck-1.1.47.dist-info/RECORD +13 -0
- {pdflinkcheck-1.1.7.dist-info → pdflinkcheck-1.1.47.dist-info}/entry_points.txt +0 -1
- pdflinkcheck-1.1.47.dist-info/licenses/LICENSE +680 -0
- pdflinkcheck-1.1.7.dist-info/METADATA +0 -109
- pdflinkcheck-1.1.7.dist-info/RECORD +0 -10
- {pdflinkcheck-1.1.7.dist-info → pdflinkcheck-1.1.47.dist-info}/WHEEL +0 -0
- {pdflinkcheck-1.1.7.dist-info → pdflinkcheck-1.1.47.dist-info}/top_level.txt +0 -0
--- pdflinkcheck-1.1.7.dist-info/METADATA
+++ /dev/null
@@ -1,109 +0,0 @@
-Metadata-Version: 2.4
-Name: pdflinkcheck
-Version: 1.1.7
-Summary: A purpose-built PDF link analysis reporting tool.
-Requires-Python: >=3.12
-Description-Content-Type: text/markdown
-Requires-Dist: pymupdf>=1.26.6
-Requires-Dist: rich>=14.2.0
-Requires-Dist: typer>=0.20.0
-Requires-Dist: pyhabitat>=1.0.52
-Provides-Extra: dev
-Requires-Dist: ruff>=0.1.13; extra == "dev"
-Requires-Dist: pytest>=8.0.0; extra == "dev"
-Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
-
-# pdflinkcheck
-A purpose-built tool for comprehensive analysis of hyperlinks and link remnants within PDF documents, primarily using the PyMuPDF library.
-Use the CLI or the GUI.
-
----
-
-### Graphical User Interface (GUI)
-
-The tool can be run using a simple cross-platform graphical interface (Tkinter):
-
-
-
-To launch the GUI, use the command: `pdflinkcheck-gui`
-
----
-
-### ✨ Features
-
-* **Active Link Extraction:** Identifies and categorizes all programmed links (External URIs, Internal GoTo/Destinations, Remote Jumps).
-* **Anchor Text Retrieval:** Extracts the visible text corresponding to each link's bounding box.
-* **Remnant Detection:** Scans the document's text layer for unlinked URIs and email addresses that should potentially be converted into active links.
-* **Structural TOC:** Extracts the PDF's internal Table of Contents (bookmarks/outline).
-
----
-
-### 📥 Installation (Recommended via `pipx`)
-
-The recommended way to install `pdflinkcheck` is using `pipx`, which installs Python applications in isolated environments, preventing dependency conflicts.
-
-```bash
-# Ensure you have pipx installed first (if not, run: pip install pipx)
-pipx install pdflinkcheck
-```
-
-
-**Note for Developers:** If you prefer a traditional virtual environment or are developing locally, use `pip`:
-```bash
-# From the root of the project
-pip install .
-```
-
----
-
-### 🚀 Usage
-
-The main command is `pdflinkcheck analyze`.
-
-
-```bash
-# Basic usage: Analyze a PDF and check for remnants (default behavior)
-pdflinkcheck analyze "path/to/my/document.pdf"
-```
-
-#### Command Options
-
-|**Option**|**Description**|**Default**|
-|---|---|---|
-|`<PDF_PATH>`|**Required.** The path to the PDF file to analyze.|N/A|
-|`--check-remnants / --no-check-remnants`|Toggle scanning the text layer for unlinked URLs/Emails.|`--check-remnants`|
-|`--max-links INTEGER`|Maximum number of links/remnants to display in the detailed report sections.|`50`|
-|`--help`|Show command help and exit.|N/A|
-
-#### Example Run
-
-```bash
-pdflinkcheck analyze "TE Maxson WWTF O&M Manual.pdf" --max-links 10
-```
-
-# Run from source
-```
-git clone http://github.com/city-of-memphis-wastewater/pdflinkcheck.git
-cd pdflinkcheck
-uv sync
-python src/pdflinkcheck/analyze.py
-```
-
----
-
-### ⚠️ Platform Compatibility Note
-
-This tool relies on the `PyMuPDF` library, which requires specific native dependencies (like MuPDF) that may not be available on all platforms.
-
-**Known Incompatibility:** This tool is **not officially supported** and may fail to run on environments like **Termux (Android)** due to underlying C/C++ library compilation issues with PyMuPDF. It is recommended for use on standard Linux, macOS, or Windows operating systems.
-
----
-
-### Document Compatibility
-
-While `pdflinkcheck` uses the robust PyMuPDF library, not all PDF files can be processed successfully. This tool is designed primarily for digitally generated (vector-based) PDFs.
-
-Processing may fail or yield incomplete results for:
-* **Scanned PDFs** (images of text) that lack an accessible text layer.
-* **Encrypted or Password-Protected** documents.
-* **Malformed or non-standard** PDF files.
--- pdflinkcheck-1.1.7.dist-info/RECORD
+++ /dev/null
@@ -1,10 +0,0 @@
-pdflinkcheck/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
-pdflinkcheck/analyze.py,sha256=wtj1fNvMl5553FYdHmd3K82ve2lHaDW68qBVITig2cQ,12982
-pdflinkcheck/cli.py,sha256=vo_2BF7A4jaR_Qvd4AZ8RIqlwV10Die2RbWc9Er6wQo,1872
-pdflinkcheck/gui.py,sha256=8uzaKqE0aVLzAGIwD52rbEJKfEHdi4R6S8fO8bPs8rI,6432
-pdflinkcheck/remnants.py,sha256=xgunD4hDDT0SqD9SywvPc5DLSLNLA6O0BL0KOuLQwV8,6151
-pdflinkcheck-1.1.7.dist-info/METADATA,sha256=SjgFk5-n8SlurKY0pjwZtTWtXt42vgz0KuYTFY729a4,3725
-pdflinkcheck-1.1.7.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
-pdflinkcheck-1.1.7.dist-info/entry_points.txt,sha256=Ql8fOpnnAGZ23DWcq0J97bPBafrP0rl8x9aVpSLh5Cs,100
-pdflinkcheck-1.1.7.dist-info/top_level.txt,sha256=WdBg8l6l3TF1HQDpR_PwSmBCSu5atKWFnPfNbRNwrME,13
-pdflinkcheck-1.1.7.dist-info/RECORD,,