pdflinkcheck 1.1.73__py3-none-any.whl → 1.1.94__py3-none-any.whl
- pdflinkcheck/__init__.py +2 -5
- pdflinkcheck/analyze_pymupdf.py +12 -6
- pdflinkcheck/analyze_pypdf.py +25 -7
- pdflinkcheck/analyze_pypdf_v2.py +5 -6
- pdflinkcheck/cli.py +82 -91
- pdflinkcheck/data/I Have Questions.md +51 -0
- pdflinkcheck/data/LICENSE +17 -654
- pdflinkcheck/data/README.md +49 -49
- pdflinkcheck/data/icons/BoxArt-1080x1080.png +0 -0
- pdflinkcheck/data/icons/Logo-150x150.png +0 -0
- pdflinkcheck/data/icons/Logo-300x300.png +0 -0
- pdflinkcheck/data/icons/Logo-71x71.png +0 -0
- pdflinkcheck/data/icons/PosterArt-720x1080.png +0 -0
- pdflinkcheck/data/icons/SmallLogo-44x44.png +0 -0
- pdflinkcheck/data/icons/SplashScreen-620x300.png +0 -0
- pdflinkcheck/data/icons/StoreLogo-50x50.png +0 -0
- pdflinkcheck/data/icons/WideLogo-310x150.png +0 -0
- pdflinkcheck/data/icons/red_pdf_512px.ico +0 -0
- pdflinkcheck/data/pyproject.toml +20 -23
- pdflinkcheck/data/themes/forest/forest-dark/border-accent-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/border-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/border-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/border-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/border-invalid.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/card.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-tri-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-tri-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-tri-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-unsel-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-unsel-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-unsel-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-unsel-pressed.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/combo-button-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/combo-button-focus.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/combo-button-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/down.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/empty.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/hor-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/hor-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/hor-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/notebook.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/off-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/off-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/off-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/on-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/on-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/on-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-tri-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-tri-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-tri-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-unsel-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-unsel-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-unsel-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-unsel-pressed.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/rect-accent-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/rect-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/rect-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/rect-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/right.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/scale-hor.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/scale-vert.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/separator.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/sizegrip.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/spin-button-down-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/spin-button-down-focus.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/spin-button-up.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/tab-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/tab-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/tab-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/thumb-hor-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/thumb-hor-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/thumb-hor-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/thumb-vert-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/thumb-vert-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/thumb-vert-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/tree-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/tree-pressed.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/up.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/vert-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/vert-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/vert-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark.tcl +536 -0
- pdflinkcheck/data/themes/forest/forest-light/border-accent-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/border-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/border-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/border-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/border-invalid.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/card.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-tri-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-tri-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-tri-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-unsel-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-unsel-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-unsel-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-unsel-pressed.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/combo-button-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/combo-button-focus.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/combo-button-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/down-focus.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/down.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/empty.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/hor-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/hor-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/hor-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/notebook.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/off-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/off-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/off-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/on-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/on-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/on-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-tri-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-tri-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-tri-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-unsel-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-unsel-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-unsel-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-unsel-pressed.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/rect-accent-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/rect-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/rect-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/rect-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/right-focus.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/right.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/scale-hor.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/scale-vert.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/separator.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/sizegrip.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/spin-button-down-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/spin-button-down-focus.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/spin-button-up.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/tab-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/tab-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/tab-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/thumb-hor-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/thumb-hor-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/thumb-hor-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/thumb-vert-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/thumb-vert-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/thumb-vert-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/tree-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/tree-pressed.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/up.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/vert-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/vert-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/vert-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light.tcl +544 -0
- pdflinkcheck/datacopy.py +2 -0
- pdflinkcheck/dev.py +10 -23
- pdflinkcheck/environment.py +64 -0
- pdflinkcheck/gui.py +229 -103
- pdflinkcheck/io.py +4 -18
- pdflinkcheck/report.py +148 -78
- pdflinkcheck/stdlib_server.py +14 -6
- pdflinkcheck/update_msix_version.py +47 -0
- pdflinkcheck/validate.py +50 -73
- pdflinkcheck/version_info.py +5 -2
- {pdflinkcheck-1.1.73.dist-info → pdflinkcheck-1.1.94.dist-info}/METADATA +54 -52
- pdflinkcheck-1.1.94.dist-info/RECORD +176 -0
- pdflinkcheck-1.1.94.dist-info/licenses/LICENSE +24 -0
- pdflinkcheck-1.1.94.dist-info/licenses/LICENSE-MIT +9 -0
- pdflinkcheck-1.1.73.dist-info/RECORD +0 -21
- {pdflinkcheck-1.1.73.dist-info → pdflinkcheck-1.1.94.dist-info}/WHEEL +0 -0
- {pdflinkcheck-1.1.73.dist-info → pdflinkcheck-1.1.94.dist-info}/entry_points.txt +0 -0
- /pdflinkcheck-1.1.73.dist-info/licenses/LICENSE → /pdflinkcheck-1.1.94.dist-info/licenses/LICENSE-AGPL3 +0 -0
pdflinkcheck/validate.py
CHANGED
|
@@ -1,11 +1,13 @@
|
|
|
1
|
+
#!/usr/bin/env python3
|
|
2
|
+
# SPDX-License-Identifier: MIT
|
|
1
3
|
# src/pdflinkcheck/validate.py
|
|
2
4
|
|
|
3
5
|
import sys
|
|
4
6
|
from pathlib import Path
|
|
5
7
|
from typing import Dict, Any
|
|
6
8
|
|
|
7
|
-
from pdflinkcheck.
|
|
8
|
-
from pdflinkcheck.
|
|
9
|
+
from pdflinkcheck.io import get_friendly_path
|
|
10
|
+
from pdflinkcheck.environment import pymupdf_is_available
|
|
9
11
|
|
|
10
12
|
SEP_COUNT=28
|
|
11
13
|
|
|
@@ -13,19 +15,16 @@ def run_validation(
|
|
|
13
15
|
report_results: Dict[str, Any],
|
|
14
16
|
pdf_path: str,
|
|
15
17
|
pdf_library: str = "pypdf",
|
|
16
|
-
check_external: bool = False
|
|
17
|
-
export_json: bool = True,
|
|
18
|
-
print_bool: bool = True
|
|
18
|
+
check_external: bool = False
|
|
19
19
|
) -> Dict[str, Any]:
|
|
20
20
|
"""
|
|
21
|
-
Validates links using the
|
|
21
|
+
Validates links during run_report() using a partial completion of the data dict.
|
|
22
22
|
|
|
23
23
|
Args:
|
|
24
|
-
report_results: The dict returned by
|
|
24
|
+
report_results: The dict returned by run_report_and_call_exports()
|
|
25
25
|
pdf_path: Path to the original PDF (needed for relative file checks and page count)
|
|
26
26
|
pdf_library: Engine used ("pypdf" or "pymupdf")
|
|
27
27
|
check_external: Whether to validate HTTP URLs (requires network + requests)
|
|
28
|
-
print_bool: Whether to print results to console
|
|
29
28
|
|
|
30
29
|
Returns:
|
|
31
30
|
Validation summary stats with valid/broken counts and detailed issues
|
|
@@ -37,13 +36,12 @@ def run_validation(
|
|
|
37
36
|
toc = data.get("toc", [])
|
|
38
37
|
|
|
39
38
|
if not all_links and not toc:
|
|
40
|
-
|
|
41
|
-
print("No links or TOC to validate.")
|
|
39
|
+
print("No links or TOC to validate.")
|
|
42
40
|
return {"summary-stats": {"valid": 0, "broken": 0}, "issues": []}
|
|
43
41
|
|
|
44
42
|
# Get total page count (critical for internal validation)
|
|
45
43
|
try:
|
|
46
|
-
if pdf_library == "pymupdf":
|
|
44
|
+
if pymupdf_is_available() and pdf_library == "pymupdf":
|
|
47
45
|
import fitz
|
|
48
46
|
doc = fitz.open(pdf_path)
|
|
49
47
|
total_pages = doc.page_count
|
|
@@ -53,44 +51,54 @@ def run_validation(
|
|
|
53
51
|
reader = PdfReader(pdf_path)
|
|
54
52
|
total_pages = len(reader.pages)
|
|
55
53
|
except Exception as e:
|
|
56
|
-
|
|
57
|
-
print(f"Could not determine page count: {e}")
|
|
54
|
+
print(f"Could not determine page count: {e}")
|
|
58
55
|
total_pages = None
|
|
59
56
|
|
|
60
57
|
pdf_dir = Path(pdf_path).parent
|
|
61
58
|
|
|
62
59
|
issues = []
|
|
63
60
|
valid_count = 0
|
|
61
|
+
file_found_count = 0
|
|
64
62
|
broken_file_count = 0
|
|
65
63
|
broken_page_count = 0
|
|
66
|
-
|
|
64
|
+
no_destination_page_count = 0
|
|
67
65
|
unknown_web_count = 0
|
|
68
66
|
unknown_reasonableness_count = 0
|
|
69
67
|
unknown_link_count = 0
|
|
70
68
|
|
|
71
69
|
# Validate active links
|
|
70
|
+
#print("DEBUG validate: entering loop with", len(all_links), "links")
|
|
72
71
|
for i, link in enumerate(all_links):
|
|
73
72
|
link_type = link.get("type")
|
|
74
73
|
status = "valid"
|
|
75
74
|
reason = None
|
|
76
75
|
if link_type in ("Internal (GoTo/Dest)", "Internal (Resolved Action)"):
|
|
77
|
-
|
|
78
|
-
if
|
|
79
|
-
status = "
|
|
80
|
-
reason =
|
|
81
|
-
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
|
|
76
|
+
dest_page_raw = link.get("destination_page")
|
|
77
|
+
if dest_page_raw is None:
|
|
78
|
+
status = "no-destinstion-page"
|
|
79
|
+
reason = "No destination page resolved"
|
|
80
|
+
else:
|
|
81
|
+
try:
|
|
82
|
+
target_page = int(dest_page_raw)
|
|
83
|
+
#target_page = int(link.get("destination_page"))
|
|
84
|
+
if not isinstance(target_page, int):
|
|
85
|
+
status = "broken-page"
|
|
86
|
+
reason = f"Target page not a number: {target_page}"
|
|
87
|
+
elif (1 <= target_page) and total_pages is None:
|
|
88
|
+
status = "unknown-reasonableness"
|
|
89
|
+
reason = "Total page count unavailable, but the page number is reasonable"
|
|
90
|
+
elif (1 <= target_page <= total_pages):
|
|
91
|
+
status = "valid"
|
|
92
|
+
reason = f"Page {target_page} within range (1–{total_pages})"
|
|
93
|
+
elif target_page < 1:
|
|
94
|
+
status = "broken-page"
|
|
95
|
+
reason = f"TOC targets page negative {target_page}."
|
|
96
|
+
elif not (1 <= target_page <= total_pages):
|
|
97
|
+
status = "broken-page"
|
|
98
|
+
reason = f"Page {target_page} out of range (1–{total_pages})"
|
|
99
|
+
except (ValueError, TypeError):
|
|
100
|
+
status = "broken-page"
|
|
101
|
+
reason = f"Invalid page value: {dest_page_raw}"
|
|
94
102
|
elif link_type == "Remote (GoToR)":
|
|
95
103
|
remote_file = link.get("remote_file")
|
|
96
104
|
if not remote_file:
|
|
@@ -132,13 +140,15 @@ def run_validation(
|
|
|
132
140
|
unknown_reasonableness_count += 1
|
|
133
141
|
elif status == "unknown-link":
|
|
134
142
|
unknown_link_count += 1
|
|
135
|
-
elif status == "broken-
|
|
143
|
+
elif status == "broken-page":
|
|
136
144
|
broken_page_count += 1
|
|
137
145
|
issues.append(link_with_val)
|
|
138
146
|
elif status == "broken-file":
|
|
139
|
-
|
|
147
|
+
broken_file_count += 1
|
|
148
|
+
issues.append(link_with_val)
|
|
149
|
+
elif status == "no-destinstion-page":
|
|
150
|
+
no_destination_page_count += 1
|
|
140
151
|
issues.append(link_with_val)
|
|
141
|
-
|
|
142
152
|
# Validate TOC entries
|
|
143
153
|
for entry in toc:
|
|
144
154
|
target_page = int(entry.get("target_page"))
|
|
@@ -156,7 +166,7 @@ def run_validation(
|
|
|
156
166
|
continue
|
|
157
167
|
else:
|
|
158
168
|
status = "broken-page"
|
|
159
|
-
reason = f"TOC targets page {
|
|
169
|
+
reason = f"TOC targets page {target_page} (out of 1–{total_pages})"
|
|
160
170
|
broken_count += 1
|
|
161
171
|
else:
|
|
162
172
|
status = "broken-page"
|
|
@@ -177,6 +187,7 @@ def run_validation(
|
|
|
177
187
|
"file-found": file_found_count,
|
|
178
188
|
"broken-page": broken_page_count,
|
|
179
189
|
"broken-file": broken_file_count,
|
|
190
|
+
"no_destination_page_count": no_destination_page_count,
|
|
180
191
|
"unknown-web": unknown_web_count,
|
|
181
192
|
"unknown-reasonableness": unknown_reasonableness_count,
|
|
182
193
|
"unknown-link": unknown_link_count,
|
|
@@ -203,8 +214,8 @@ def run_validation(
|
|
|
203
214
|
log(f"🌐 Web Addresses (Not Checked): {summary_stats['unknown-web']}")
|
|
204
215
|
log(f"⚠️ Unknown Page Reasonableness (Due to Missing Total Page Count): {summary_stats['unknown-reasonableness']}")
|
|
205
216
|
log(f"⚠️ Unsupported PDF Links: {summary_stats['unknown-link']}")
|
|
206
|
-
log(f"❌ Broken Page Reference: {summary_stats['broken-page']}")
|
|
207
|
-
log(f"❌ Broken File Reference: {summary_stats['broken-file']}")
|
|
217
|
+
log(f"❌ Broken Page Reference (Page number beyond scope of availability): {summary_stats['broken-page']}")
|
|
218
|
+
log(f"❌ Broken File Reference (File not available): {summary_stats['broken-file']}")
|
|
208
219
|
log("=" * SEP_COUNT)
|
|
209
220
|
|
|
210
221
|
if issues:
|
|
@@ -220,7 +231,7 @@ def run_validation(
|
|
|
220
231
|
if len(issues) > 25:
|
|
221
232
|
log(f"... and {len(issues) - 25} more issues")
|
|
222
233
|
else:
|
|
223
|
-
log("No
|
|
234
|
+
log("Success: No broken links or TOC issues!")
|
|
224
235
|
|
|
225
236
|
# Final aggregation of the buffer into one string
|
|
226
237
|
validation_buffer_str = "\n".join(validation_buffer)
|
|
@@ -228,8 +239,6 @@ def run_validation(
|
|
|
228
239
|
return validation_buffer_str
|
|
229
240
|
|
|
230
241
|
summary_txt = generate_validation_summary_txt_buffer(summary_stats, issues, pdf_path)
|
|
231
|
-
if print_bool:
|
|
232
|
-
print(summary_txt)
|
|
233
242
|
|
|
234
243
|
validation_results = {
|
|
235
244
|
"pdf_path" : pdf_path,
|
|
@@ -239,9 +248,6 @@ def run_validation(
|
|
|
239
248
|
"total_pages": total_pages
|
|
240
249
|
}
|
|
241
250
|
|
|
242
|
-
# Have export run interally so that the logic need not happen in an interface
|
|
243
|
-
|
|
244
|
-
export_validation_json(validation_results, pdf_path, pdf_library)
|
|
245
251
|
return validation_results
|
|
246
252
|
|
|
247
253
|
|
|
@@ -256,7 +262,7 @@ def run_validation_more_readable_slop(pdf_path: str = None, pdf_library: str = "
|
|
|
256
262
|
if check_external_links:
|
|
257
263
|
import requests
|
|
258
264
|
|
|
259
|
-
# 1. Setup Library Engine (Reuse
|
|
265
|
+
# 1. Setup Library Engine (Reuse logic)
|
|
260
266
|
pdf_library = pdf_library.lower()
|
|
261
267
|
if pdf_library == "pypdf":
|
|
262
268
|
from pdflinkcheck.analyze_pypdf import extract_links_pypdf as extract_links
|
|
@@ -351,32 +357,3 @@ def run_validation_more_readable_slop(pdf_path: str = None, pdf_library: str = "
|
|
|
351
357
|
))
|
|
352
358
|
|
|
353
359
|
return results
|
|
354
|
-
|
|
355
|
-
|
|
356
|
-
if __name__ == "__main__":
|
|
357
|
-
|
|
358
|
-
from pdflinkcheck.io import get_first_pdf_in_cwd
|
|
359
|
-
pdf_path = get_first_pdf_in_cwd()
|
|
360
|
-
# Run analysis first
|
|
361
|
-
report = run_report(
|
|
362
|
-
pdf_path=pdf_path,
|
|
363
|
-
max_links=0,
|
|
364
|
-
export_format="",
|
|
365
|
-
pdf_library="pypdf",
|
|
366
|
-
print_bool=False # We handle printing in validation
|
|
367
|
-
)
|
|
368
|
-
|
|
369
|
-
if not report or not report.get("data"):
|
|
370
|
-
print("No data extracted — nothing to validate.")
|
|
371
|
-
sys.exit(1)
|
|
372
|
-
|
|
373
|
-
# Then validate
|
|
374
|
-
validation_results = run_validation(
|
|
375
|
-
report_results=report,
|
|
376
|
-
pdf_path=pdf_path,
|
|
377
|
-
pdf_library="pypdf",
|
|
378
|
-
export_json=True,
|
|
379
|
-
print_bool=True
|
|
380
|
-
)
|
|
381
|
-
|
|
382
|
-
export_validation_results()
|
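The reworked internal-link branch in `run_validation()` above can be read as a small classification function. The following is a minimal sketch of that decision order, not the package's actual code: the helper name `classify_destination_page` and the exact wording of the reason strings are assumptions made for illustration, while the status strings are copied from the diff as released (including the spelling of `no-destinstion-page`).

```python
from typing import Any, Optional, Tuple


def classify_destination_page(
    dest_page_raw: Any, total_pages: Optional[int]
) -> Tuple[str, str]:
    """Return (status, reason) for an internal link target.

    Hypothetical helper; it mirrors the branch order shown in the diff.
    """
    if dest_page_raw is None:
        # Status string spelled exactly as in the released code.
        return "no-destinstion-page", "No destination page resolved"

    try:
        target_page = int(dest_page_raw)
    except (ValueError, TypeError):
        return "broken-page", f"Invalid page value: {dest_page_raw}"

    if target_page < 1:
        # Zero and negative pages can never be valid, with or without a page count.
        return "broken-page", f"Page {target_page} is below the first page"

    if total_pages is None:
        # The page count could not be read, so only a sanity check is possible.
        return (
            "unknown-reasonableness",
            "Total page count unavailable, but the page number is reasonable",
        )

    if target_page <= total_pages:
        return "valid", f"Page {target_page} within range (1-{total_pages})"

    return "broken-page", f"Page {target_page} out of range (1-{total_pages})"
```

Pulling the branch into a pure function like this would make each status testable without opening a PDF; whether the maintainers want that refactor is a separate question.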
pdflinkcheck/version_info.py
CHANGED
|
@@ -1,4 +1,7 @@
|
|
|
1
|
+
#!/usr/bin/env python3
|
|
2
|
+
# SPDX-License-Identifier: MIT
|
|
1
3
|
# src/pdflinkcheck/version_info.py
|
|
4
|
+
|
|
2
5
|
import re
|
|
3
6
|
from pathlib import Path
|
|
4
7
|
import sys
|
|
@@ -11,7 +14,7 @@ This portion of the codebase is MIT licensed. It does not rely on any AGPL-licen
|
|
|
11
14
|
|
|
12
15
|
MIT License
|
|
13
16
|
|
|
14
|
-
Copyright
|
|
17
|
+
Copyright © 2025 George Clayton Bennett <george.bennett@memphistn.gov>
|
|
15
18
|
|
|
16
19
|
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
17
20
|
of this software and associated documentation files (the "Software"), to deal
|
|
@@ -52,7 +55,7 @@ def find_pyproject(start: Path) -> Path | None:
|
|
|
52
55
|
if candidate.exists():
|
|
53
56
|
return candidate
|
|
54
57
|
|
|
55
|
-
# 3. Handle Installed / Wheel / Shiv state (using
|
|
58
|
+
# 3. Handle Installed / Wheel / Shiv state (using force-include path)
|
|
56
59
|
internal_path = Path(__file__).parent / "data" / "pyproject.toml"
|
|
57
60
|
if internal_path.exists():
|
|
58
61
|
return internal_path
|
|
@@ -1,10 +1,13 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: pdflinkcheck
|
|
3
|
-
Version: 1.1.
|
|
3
|
+
Version: 1.1.94
|
|
4
4
|
Summary: A purpose-built PDF link analysis and reporting tool with GUI and CLI.
|
|
5
5
|
Author: George Clayton Bennett
|
|
6
6
|
Author-email: George Clayton Bennett <george.bennett@memphistn.gov>
|
|
7
|
+
License-Expression: MIT AND AGPL-3.0-or-later
|
|
7
8
|
License-File: LICENSE
|
|
9
|
+
License-File: LICENSE-AGPL3
|
|
10
|
+
License-File: LICENSE-MIT
|
|
8
11
|
Classifier: Programming Language :: Python :: 3
|
|
9
12
|
Classifier: Programming Language :: Python :: 3 :: Only
|
|
10
13
|
Classifier: Programming Language :: Python :: 3.10
|
|
@@ -13,6 +16,7 @@ Classifier: Programming Language :: Python :: 3.12
|
|
|
13
16
|
Classifier: Programming Language :: Python :: 3.13
|
|
14
17
|
Classifier: Programming Language :: Python :: 3.14
|
|
15
18
|
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
|
|
19
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
16
20
|
Classifier: Operating System :: OS Independent
|
|
17
21
|
Classifier: Intended Audience :: End Users/Desktop
|
|
18
22
|
Classifier: Intended Audience :: Developers
|
|
@@ -32,14 +36,12 @@ Requires-Dist: pypdf>=6.4.2
|
|
|
32
36
|
Requires-Dist: rich>=14.2.0
|
|
33
37
|
Requires-Dist: typer>=0.20.0
|
|
34
38
|
Requires-Dist: pymupdf>=1.26.7 ; extra == 'full'
|
|
35
|
-
Requires-Dist: sv-ttk>=2.6.1 ; extra == 'gui'
|
|
36
39
|
Maintainer: George Clayton Bennett
|
|
37
40
|
Maintainer-email: George Clayton Bennett <george.bennett@memphistn.gov>
|
|
38
41
|
Requires-Python: >=3.10
|
|
39
42
|
Project-URL: Homepage, https://github.com/city-of-memphis-wastewater/pdflinkcheck
|
|
40
43
|
Project-URL: Repository, https://github.com/city-of-memphis-wastewater/pdflinkcheck
|
|
41
44
|
Provides-Extra: full
|
|
42
|
-
Provides-Extra: gui
|
|
43
45
|
Description-Content-Type: text/markdown
|
|
44
46
|
|
|
45
47
|
# pdflinkcheck
|
|
@@ -48,7 +50,7 @@ A purpose-built tool for comprehensive analysis of hyperlinks and GoTo links wit
|
|
|
48
50
|
|
|
49
51
|
-----
|
|
50
52
|
|
|
51
|
-

|
|
52
54
|
|
|
53
55
|
-----
|
|
54
56
|
|
|
@@ -86,20 +88,9 @@ The tool can be run as simple cross-platform graphical interface (Tkinter).
|
|
|
86
88
|
|
|
87
89
|
### Launching the GUI
|
|
88
90
|
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
1. **Implicit Launch:** Run the main command with no arguments, subcommands, or flags (`pdflinkcheck`).
|
|
91
|
+
Ways to launch the GUI interface:
|
|
92
|
+
1. **Implicit Launch:** Run the tool or file with no arguments, subcommands, or flags. (Note: PyInstaller builds use the `--windowed` (or `--noconsole`) flag, except on Termux.)
|
|
92
93
|
2. **Explicit Command:** Use the dedicated GUI subcommand (`pdflinkcheck gui`).
|
|
93
|
-
3. **Binary Double-Click:**
|
|
94
|
-
* **Windows:** Double-click the `pdflinkcheck-VERSION-gui.bat` file.
|
|
95
|
-
* **macOS/Linux:** Double-click the downloaded `.pyz` or `.elf` file.
|
|
96
|
-
|
|
97
|
-
### Planned GUI Updates
|
|
98
|
-
|
|
99
|
-
We are actively working on the following enhancements:
|
|
100
|
-
|
|
101
|
-
* **Report Export:** Functionality to export the full analysis report to a plain text file.
|
|
102
|
-
* **License Visibility:** A dedicated "License Info" button within the GUI to display the terms of the AGPLv3+ license.
|
|
103
94
|
|
|
104
95
|
-----
|
|
105
96
|
|
|
@@ -107,20 +98,30 @@ We are actively working on the following enhancements:
|
|
|
107
98
|
|
|
108
99
|
The core functionality is accessed via the `analyze` command.
|
|
109
100
|
|
|
110
|
-
`DEV_TYPER_HELP_TREE=1 pdflinkcheck help-tree`:
|
|
111
|
-

|
|
112
|
-
|
|
113
101
|
`pdflinkcheck --help`:
|
|
114
|
-

|
|
103
|
+
|
|
104
|
+
|
|
105
|
+
To see the help tree, unlock the `help-tree` CLI command by setting the `DEV_TYPER_HELP_TREE` environment variable.
|
|
106
|
+
|
|
107
|
+
```
|
|
108
|
+
DEV_TYPER_HELP_TREE=1 pdflinkcheck help-tree # bash
|
|
109
|
+
$env:DEV_TYPER_HELP_TREE = "1"; pdflinkcheck help-tree # PowerShell
|
|
110
|
+
```
|
|
111
|
+
|
|
112
|
+

|
|
113
|
+
|
|
115
114
|
|
|
116
115
|
|
|
117
116
|
### Available Commands
|
|
118
117
|
|
|
119
118
|
|**Command**|**Description**|
|
|
120
119
|
|---|---|
|
|
121
|
-
|`pdflinkcheck analyze`|Analyzes a PDF file for links
|
|
120
|
+
|`pdflinkcheck analyze`|Analyzes a PDF file for links and validates their reasonableness.|
|
|
122
121
|
|`pdflinkcheck gui`|Explicitly launch the Graphical User Interface.|
|
|
123
122
|
|`pdflinkcheck docs`|Access documentation, including the README and AGPLv3+ license.|
|
|
123
|
+
|`pdflinkcheck serve`|Serve a basic local web app which uses only the Python standard library.|
|
|
124
|
+
|`pdflinkcheck tools`|Access additional tools, like `--clear-cache`.|
|
|
124
125
|
|
|
125
126
|
### `analyze` Command Options
|
|
126
127
|
|
|
@@ -232,37 +233,23 @@ This `help-tree` feature has not yet been submitted for inclusion into Typer.
|
|
|
232
233
|
|
|
233
234
|
## ⚠️ Compatibility Notes
|
|
234
235
|
|
|
235
|
-
|
|
236
|
+
### Termux Compatibility as a Key Goal
|
|
236
237
|
A key goal of City-of-Memphis-Wastewater is to release all software as Termux-compatible.
|
|
237
238
|
|
|
238
|
-
Termux compatibility is important in the modern age
|
|
239
|
+
Termux compatibility is important in the modern age, because Android devices are common among technicians, field engineers, and maintenance staff.
|
|
239
240
|
Android is the most common operating system in the Global South.
|
|
240
241
|
We aim to produce stable software that can do the most possible good.
|
|
241
242
|
|
|
242
|
-
|
|
243
|
-
|
|
244
|
-
`pypdf
|
|
245
|
-
|
|
246
|
-
Now that `pdflinkcheck` can run on Termux, we may find a work-around and be able to drop the PyMuPDF dependency.
|
|
247
|
-
- Build `pypdf`-only artifacts, to reduce size.
|
|
248
|
-
- Build a web-stack GUI as an alternative to the Tkinter GUI, to be compatible with Termux.
|
|
249
|
-
|
|
250
|
-
Because it works, we plan to keep the `PyMuPDF` portion of the codebase.
|
|
251
|
-
|
|
252
|
-
### Document Compatibility:
|
|
253
|
-
Not all PDF files can be processed successfully. This tool is designed primarily for digitally generated (vector-based) PDFs.
|
|
254
|
-
|
|
255
|
-
Processing may fail or yield incomplete results for:
|
|
256
|
-
* **Scanned PDFs** (images of text) that lack an accessible text layer.
|
|
257
|
-
* **Encrypted or Password-Protected** documents.
|
|
258
|
-
* **Malformed or non-standard** PDF files.
|
|
243
|
+
Now `pdflinkcheck` can run on Termux by using the `pypdf` engine.
|
|
244
|
+
Benefits:
|
|
245
|
+
- `pypdf`-only artifacts, which are about 6% of the size of artifacts that include `PyMuPDF`.
|
|
246
|
+
- Web-stack GUI as an alternative to the Tkinter GUI, which can be run locally on Termux or as a web app.
|
|
259
247
|
|
|
260
|
-
-----
|
|
261
248
|
|
|
262
|
-
|
|
263
|
-
At long last, `PyMuPDF` is an optional dependency.
|
|
249
|
+
### PDF Library Selection
|
|
250
|
+
At long last, `PyMuPDF` is an optional dependency. All testing comparing `pypdf` and `PyMuPDF` has shown identical validation results. However, `PyMuPDF` is much faster. The benefit of `pypdf` is smaller package size and cross-platform compatibility.
|
|
264
251
|
|
|
265
|
-
|
|
252
|
+
Expect that all binaries and artifacts contain PyMuPDF, unless they are built on Android. The GUI and CLI interfaces both allow selection of the library; if PyMuPDF is selected but is not available, the user will be warned.
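The library-selection behavior described here, together with the `pymupdf_is_available()` guard added in `validate.py`, can be illustrated with a short sketch. This diff does not show the contents of `environment.py`, so everything below is an assumption: the helper names `module_is_available` and `choose_pdf_library` are hypothetical, and falling back to `pypdf` after the warning is one plausible policy rather than the documented one.

```python
import importlib.util
from typing import Callable


def module_is_available(name: str) -> bool:
    """Report whether a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def choose_pdf_library(
    requested: str,
    is_available: Callable[[str], bool] = module_is_available,
) -> str:
    """Return the engine to use, warning if PyMuPDF was requested but is missing.

    Hypothetical helper: the fallback to pypdf is an assumption, not
    confirmed behavior of pdflinkcheck.
    """
    requested = requested.lower()
    # PyMuPDF is imported as "fitz" in the validate.py hunk above.
    if requested == "pymupdf" and not is_available("fitz"):
        print("Warning: PyMuPDF is not installed; using pypdf instead.")
        return "pypdf"
    return requested
```

Passing the availability check in as a parameter keeps the sketch testable on machines with or without PyMuPDF installed.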
|
|
266
253
|
|
|
267
254
|
To install the complete version use one of these options:
|
|
268
255
|
|
|
@@ -273,6 +260,16 @@ uv tool install "pdflinkcheck[full]"
|
|
|
273
260
|
uv add "pdflinkcheck[full]"
|
|
274
261
|
```
|
|
275
262
|
|
|
263
|
+
---
|
|
264
|
+
|
|
265
|
+
### Document Compatibility:
|
|
266
|
+
Not all PDF files can be processed successfully. This tool is designed primarily for digitally generated (vector-based) PDFs.
|
|
267
|
+
|
|
268
|
+
Processing may fail or yield incomplete results for:
|
|
269
|
+
* **Scanned PDFs** (images of text) that lack an accessible text layer.
|
|
270
|
+
* **Encrypted or Password-Protected** documents.
|
|
271
|
+
* **Malformed or non-standard** PDF files.
|
|
272
|
+
|
|
276
273
|
-----
|
|
277
274
|
|
|
278
275
|
## Run from Source (Developers)
|
|
@@ -301,22 +298,27 @@ uv run python -m pdflinkcheck.stdlib_server
|
|
|
301
298
|
|
|
302
299
|
## 📜 License Implications (AGPLv3+)
|
|
303
300
|
|
|
304
|
-
**`pdflinkcheck` is licensed under the `GNU Affero General Public License` version 3 or later (`AGPLv3+`).**
|
|
305
301
|
|
|
306
|
-
The `AGPL3
|
|
302
|
+
The `AGPL3-or-later` is required for binaries of `pdflinkcheck` which include `PyMuPDF`, which is licensed under the `AGPL3`.
|
|
303
|
+
The source code itself for `pdflinkcheck` is licensed under the `MIT` license.
|
|
307
304
|
|
|
308
|
-
|
|
309
|
-
The `AGPL3` appears as the primary license file in the source code. While this infers that the entire project is AGPL3-licensed, this is not true - portions of the codebase are MIT-licensed.
|
|
310
|
-
|
|
311
|
-
This license has significant implications for **distribution and network use**, particularly for organizations:
|
|
305
|
+
The AGPL3-or-later license has significant implications for **distribution and network use**, particularly for organizations:
|
|
312
306
|
|
|
313
307
|
* **Source Code Provision:** If you distribute this tool (modified or unmodified) to anyone, you **must** provide the full source code under the same license.
|
|
314
308
|
* **Network Interaction (Affero Clause):** If you modify this tool and make the modified version available to users over a computer network (e.g., as a web service or backend), you **must** also offer the source code to those network users.
|
|
315
309
|
|
|
316
310
|
> **Before deploying or modifying this tool for organizational use, especially for internal web services or distribution, please ensure compliance with the AGPLv3+ terms.**
|
|
317
311
|
|
|
312
|
+
Because the AGPLv3 is a strong copyleft license, any version of `pdflinkcheck` that includes AGPL‑licensed components (such as `PyMuPDF`) must be distributed as a whole under AGPLv3+. This means that for those versions, anyone who distributes the application — or makes a modified version available over a network — must also provide the complete corresponding source code under the same terms.
|
|
313
|
+
|
|
314
|
+
The source code of pdflinkcheck itself remains licensed under the **MIT License**; only the distributed binary becomes AGPL‑licensed when PyMuPDF is included.
|
|
315
|
+
|
|
316
|
+
|
|
318
317
|
Links:
|
|
319
318
|
- Source code: https://github.com/City-of-Memphis-Wastewater/pdflinkcheck/
|
|
320
|
-
-
|
|
319
|
+
- PyMuPDF source code: https://github.com/pymupdf/PyMuPDF/
|
|
320
|
+
- pypdf source code: https://github.com/py-pdf/pypdf/
|
|
321
|
+
- AGPLv3 text (FSF): https://www.gnu.org/licenses/agpl-3.0.html
|
|
322
|
+
- MIT License text: https://opensource.org/license/mit
|
|
321
323
|
|
|
322
324
|
Copyright © 2025 George Clayton Bennett
|