pdflinkcheck 1.1.72__py3-none-any.whl → 1.1.94__py3-none-any.whl
- pdflinkcheck/__init__.py +2 -5
- pdflinkcheck/analyze_pymupdf.py +12 -6
- pdflinkcheck/analyze_pypdf.py +25 -7
- pdflinkcheck/analyze_pypdf_v2.py +5 -6
- pdflinkcheck/cli.py +82 -91
- pdflinkcheck/data/I Have Questions.md +51 -0
- pdflinkcheck/data/LICENSE +17 -654
- pdflinkcheck/data/README.md +49 -49
- pdflinkcheck/data/icons/BoxArt-1080x1080.png +0 -0
- pdflinkcheck/data/icons/Logo-150x150.png +0 -0
- pdflinkcheck/data/icons/Logo-300x300.png +0 -0
- pdflinkcheck/data/icons/Logo-71x71.png +0 -0
- pdflinkcheck/data/icons/PosterArt-720x1080.png +0 -0
- pdflinkcheck/data/icons/SmallLogo-44x44.png +0 -0
- pdflinkcheck/data/icons/SplashScreen-620x300.png +0 -0
- pdflinkcheck/data/icons/StoreLogo-50x50.png +0 -0
- pdflinkcheck/data/icons/WideLogo-310x150.png +0 -0
- pdflinkcheck/data/icons/red_pdf_512px.ico +0 -0
- pdflinkcheck/data/pyproject.toml +20 -23
- pdflinkcheck/data/themes/forest/forest-dark/border-accent-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/border-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/border-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/border-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/border-invalid.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/card.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-tri-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-tri-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-tri-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-unsel-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-unsel-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-unsel-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/check-unsel-pressed.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/combo-button-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/combo-button-focus.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/combo-button-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/down.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/empty.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/hor-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/hor-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/hor-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/notebook.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/off-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/off-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/off-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/on-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/on-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/on-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-tri-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-tri-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-tri-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-unsel-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-unsel-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-unsel-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/radio-unsel-pressed.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/rect-accent-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/rect-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/rect-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/rect-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/right.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/scale-hor.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/scale-vert.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/separator.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/sizegrip.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/spin-button-down-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/spin-button-down-focus.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/spin-button-up.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/tab-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/tab-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/tab-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/thumb-hor-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/thumb-hor-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/thumb-hor-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/thumb-vert-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/thumb-vert-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/thumb-vert-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/tree-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/tree-pressed.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/up.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/vert-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/vert-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark/vert-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-dark.tcl +536 -0
- pdflinkcheck/data/themes/forest/forest-light/border-accent-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/border-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/border-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/border-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/border-invalid.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/card.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-tri-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-tri-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-tri-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-unsel-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-unsel-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-unsel-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/check-unsel-pressed.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/combo-button-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/combo-button-focus.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/combo-button-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/down-focus.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/down.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/empty.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/hor-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/hor-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/hor-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/notebook.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/off-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/off-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/off-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/on-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/on-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/on-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-tri-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-tri-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-tri-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-unsel-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-unsel-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-unsel-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/radio-unsel-pressed.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/rect-accent-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/rect-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/rect-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/rect-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/right-focus.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/right.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/scale-hor.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/scale-vert.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/separator.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/sizegrip.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/spin-button-down-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/spin-button-down-focus.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/spin-button-up.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/tab-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/tab-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/tab-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/thumb-hor-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/thumb-hor-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/thumb-hor-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/thumb-vert-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/thumb-vert-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/thumb-vert-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/tree-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/tree-pressed.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/up.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/vert-accent.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/vert-basic.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light/vert-hover.png +0 -0
- pdflinkcheck/data/themes/forest/forest-light.tcl +544 -0
- pdflinkcheck/datacopy.py +2 -0
- pdflinkcheck/dev.py +10 -23
- pdflinkcheck/environment.py +64 -0
- pdflinkcheck/gui.py +229 -103
- pdflinkcheck/io.py +4 -18
- pdflinkcheck/report.py +161 -89
- pdflinkcheck/stdlib_server.py +14 -6
- pdflinkcheck/update_msix_version.py +47 -0
- pdflinkcheck/validate.py +59 -80
- pdflinkcheck/version_info.py +5 -2
- {pdflinkcheck-1.1.72.dist-info → pdflinkcheck-1.1.94.dist-info}/METADATA +54 -52
- pdflinkcheck-1.1.94.dist-info/RECORD +176 -0
- pdflinkcheck-1.1.94.dist-info/licenses/LICENSE +24 -0
- pdflinkcheck-1.1.94.dist-info/licenses/LICENSE-MIT +9 -0
- pdflinkcheck-1.1.72.dist-info/RECORD +0 -21
- {pdflinkcheck-1.1.72.dist-info → pdflinkcheck-1.1.94.dist-info}/WHEEL +0 -0
- {pdflinkcheck-1.1.72.dist-info → pdflinkcheck-1.1.94.dist-info}/entry_points.txt +0 -0
- /pdflinkcheck-1.1.72.dist-info/licenses/LICENSE → /pdflinkcheck-1.1.94.dist-info/licenses/LICENSE-AGPL3 +0 -0
pdflinkcheck/validate.py
CHANGED
|
@@ -1,29 +1,30 @@
|
|
|
1
|
+
#!/usr/bin/env python3
|
|
2
|
+
# SPDX-License-Identifier: MIT
|
|
1
3
|
# src/pdflinkcheck/validate.py
|
|
2
4
|
|
|
3
5
|
import sys
|
|
4
6
|
from pathlib import Path
|
|
5
7
|
from typing import Dict, Any
|
|
6
8
|
|
|
7
|
-
from pdflinkcheck.
|
|
8
|
-
from pdflinkcheck.
|
|
9
|
+
from pdflinkcheck.io import get_friendly_path
|
|
10
|
+
from pdflinkcheck.environment import pymupdf_is_available
|
|
11
|
+
|
|
12
|
+
SEP_COUNT=28
|
|
9
13
|
|
|
10
14
|
def run_validation(
|
|
11
15
|
report_results: Dict[str, Any],
|
|
12
16
|
pdf_path: str,
|
|
13
17
|
pdf_library: str = "pypdf",
|
|
14
|
-
check_external: bool = False
|
|
15
|
-
export_json: bool = True,
|
|
16
|
-
print_bool: bool = True
|
|
18
|
+
check_external: bool = False
|
|
17
19
|
) -> Dict[str, Any]:
|
|
18
20
|
"""
|
|
19
|
-
Validates links using the
|
|
21
|
+
Validates links during run_report() using a partial completion of the data dict.
|
|
20
22
|
|
|
21
23
|
Args:
|
|
22
|
-
report_results: The dict returned by
|
|
24
|
+
report_results: The dict returned by run_report_and_call_exports()
|
|
23
25
|
pdf_path: Path to the original PDF (needed for relative file checks and page count)
|
|
24
26
|
pdf_library: Engine used ("pypdf" or "pymupdf")
|
|
25
27
|
check_external: Whether to validate HTTP URLs (requires network + requests)
|
|
26
|
-
print_bool: Whether to print results to console
|
|
27
28
|
|
|
28
29
|
Returns:
|
|
29
30
|
Validation summary stats with valid/broken counts and detailed issues
|
|
@@ -35,13 +36,12 @@ def run_validation(
|
|
|
35
36
|
toc = data.get("toc", [])
|
|
36
37
|
|
|
37
38
|
if not all_links and not toc:
|
|
38
|
-
|
|
39
|
-
print("No links or TOC to validate.")
|
|
39
|
+
print("No links or TOC to validate.")
|
|
40
40
|
return {"summary-stats": {"valid": 0, "broken": 0}, "issues": []}
|
|
41
41
|
|
|
42
42
|
# Get total page count (critical for internal validation)
|
|
43
43
|
try:
|
|
44
|
-
if pdf_library == "pymupdf":
|
|
44
|
+
if pymupdf_is_available() and pdf_library == "pymupdf":
|
|
45
45
|
import fitz
|
|
46
46
|
doc = fitz.open(pdf_path)
|
|
47
47
|
total_pages = doc.page_count
|
|
@@ -51,44 +51,54 @@ def run_validation(
|
|
|
51
51
|
reader = PdfReader(pdf_path)
|
|
52
52
|
total_pages = len(reader.pages)
|
|
53
53
|
except Exception as e:
|
|
54
|
-
|
|
55
|
-
print(f"Could not determine page count: {e}")
|
|
54
|
+
print(f"Could not determine page count: {e}")
|
|
56
55
|
total_pages = None
|
|
57
56
|
|
|
58
57
|
pdf_dir = Path(pdf_path).parent
|
|
59
58
|
|
|
60
59
|
issues = []
|
|
61
60
|
valid_count = 0
|
|
61
|
+
file_found_count = 0
|
|
62
62
|
broken_file_count = 0
|
|
63
63
|
broken_page_count = 0
|
|
64
|
-
|
|
64
|
+
no_destination_page_count = 0
|
|
65
65
|
unknown_web_count = 0
|
|
66
66
|
unknown_reasonableness_count = 0
|
|
67
67
|
unknown_link_count = 0
|
|
68
68
|
|
|
69
69
|
# Validate active links
|
|
70
|
+
#print("DEBUG validate: entering loop with", len(all_links), "links")
|
|
70
71
|
for i, link in enumerate(all_links):
|
|
71
72
|
link_type = link.get("type")
|
|
72
73
|
status = "valid"
|
|
73
74
|
reason = None
|
|
74
75
|
if link_type in ("Internal (GoTo/Dest)", "Internal (Resolved Action)"):
|
|
75
|
-
|
|
76
|
-
if
|
|
77
|
-
status = "
|
|
78
|
-
reason =
|
|
79
|
-
|
|
80
|
-
|
|
81
|
-
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
|
|
76
|
+
dest_page_raw = link.get("destination_page")
|
|
77
|
+
if dest_page_raw is None:
|
|
78
|
+
status = "no-destinstion-page"
|
|
79
|
+
reason = "No destination page resolved"
|
|
80
|
+
else:
|
|
81
|
+
try:
|
|
82
|
+
target_page = int(dest_page_raw)
|
|
83
|
+
#target_page = int(link.get("destination_page"))
|
|
84
|
+
if not isinstance(target_page, int):
|
|
85
|
+
status = "broken-page"
|
|
86
|
+
reason = f"Target page not a number: {target_page}"
|
|
87
|
+
elif (1 <= target_page) and total_pages is None:
|
|
88
|
+
status = "unknown-reasonableness"
|
|
89
|
+
reason = "Total page count unavailable, but the page number is reasonable"
|
|
90
|
+
elif (1 <= target_page <= total_pages):
|
|
91
|
+
status = "valid"
|
|
92
|
+
reason = f"Page {target_page} within range (1–{total_pages})"
|
|
93
|
+
elif target_page < 1:
|
|
94
|
+
status = "broken-page"
|
|
95
|
+
reason = f"TOC targets page negative {target_page}."
|
|
96
|
+
elif not (1 <= target_page <= total_pages):
|
|
97
|
+
status = "broken-page"
|
|
98
|
+
reason = f"Page {target_page} out of range (1–{total_pages})"
|
|
99
|
+
except (ValueError, TypeError):
|
|
100
|
+
status = "broken-page"
|
|
101
|
+
reason = f"Invalid page value: {dest_page_raw}"
|
|
92
102
|
elif link_type == "Remote (GoToR)":
|
|
93
103
|
remote_file = link.get("remote_file")
|
|
94
104
|
if not remote_file:
|
|
@@ -130,13 +140,15 @@ def run_validation(
|
|
|
130
140
|
unknown_reasonableness_count += 1
|
|
131
141
|
elif status == "unknown-link":
|
|
132
142
|
unknown_link_count += 1
|
|
133
|
-
elif status == "broken-
|
|
143
|
+
elif status == "broken-page":
|
|
134
144
|
broken_page_count += 1
|
|
135
145
|
issues.append(link_with_val)
|
|
136
146
|
elif status == "broken-file":
|
|
137
|
-
|
|
147
|
+
broken_file_count += 1
|
|
148
|
+
issues.append(link_with_val)
|
|
149
|
+
elif status == "no-destinstion-page":
|
|
150
|
+
no_destination_page_count += 1
|
|
138
151
|
issues.append(link_with_val)
|
|
139
|
-
|
|
140
152
|
# Validate TOC entries
|
|
141
153
|
for entry in toc:
|
|
142
154
|
target_page = int(entry.get("target_page"))
|
|
@@ -154,7 +166,7 @@ def run_validation(
|
|
|
154
166
|
continue
|
|
155
167
|
else:
|
|
156
168
|
status = "broken-page"
|
|
157
|
-
reason = f"TOC targets page {
|
|
169
|
+
reason = f"TOC targets page {target_page} (out of 1–{total_pages})"
|
|
158
170
|
broken_count += 1
|
|
159
171
|
else:
|
|
160
172
|
status = "broken-page"
|
|
@@ -175,6 +187,7 @@ def run_validation(
|
|
|
175
187
|
"file-found": file_found_count,
|
|
176
188
|
"broken-page": broken_page_count,
|
|
177
189
|
"broken-file": broken_file_count,
|
|
190
|
+
"no_destination_page_count": no_destination_page_count,
|
|
178
191
|
"unknown-web": unknown_web_count,
|
|
179
192
|
"unknown-reasonableness": unknown_reasonableness_count,
|
|
180
193
|
"unknown-link": unknown_link_count,
|
|
@@ -192,23 +205,23 @@ def run_validation(
|
|
|
192
205
|
def log(msg: str):
|
|
193
206
|
validation_buffer.append(msg)
|
|
194
207
|
|
|
195
|
-
log("\n" + "=" *
|
|
208
|
+
log("\n" + "=" * SEP_COUNT)
|
|
196
209
|
log("## Validation Results")
|
|
197
|
-
log("=" *
|
|
210
|
+
log("=" * SEP_COUNT)
|
|
198
211
|
log(f"PDF Path = {get_friendly_path(pdf_path)}")
|
|
199
212
|
log(f"Total items checked: {summary_stats['total_checked']}")
|
|
200
213
|
log(f"✅ Valid: {summary_stats['valid']}")
|
|
201
214
|
log(f"🌐 Web Addresses (Not Checked): {summary_stats['unknown-web']}")
|
|
202
215
|
log(f"⚠️ Unknown Page Reasonableness (Due to Missing Total Page Count): {summary_stats['unknown-reasonableness']}")
|
|
203
216
|
log(f"⚠️ Unsupported PDF Links: {summary_stats['unknown-link']}")
|
|
204
|
-
log(f"❌ Broken Page Reference: {summary_stats['broken-page']}")
|
|
205
|
-
log(f"❌ Broken File Reference: {summary_stats['broken-file']}")
|
|
206
|
-
log("=" *
|
|
217
|
+
log(f"❌ Broken Page Reference (Page number beyond scope of availability): {summary_stats['broken-page']}")
|
|
218
|
+
log(f"❌ Broken File Reference (File not available): {summary_stats['broken-file']}")
|
|
219
|
+
log("=" * SEP_COUNT)
|
|
207
220
|
|
|
208
221
|
if issues:
|
|
209
222
|
log("\n## Issues Found")
|
|
210
223
|
log("{:<5} | {:<12} | {:<30} | {}".format("Idx", "Type", "Text", "Problem"))
|
|
211
|
-
log("-" *
|
|
224
|
+
log("-" * SEP_COUNT)
|
|
212
225
|
for i, issue in enumerate(issues[:25], 1):
|
|
213
226
|
link_type = issue.get("type", "Link")
|
|
214
227
|
text = issue.get("link_text", "") or issue.get("title", "") or "N/A"
|
|
@@ -218,7 +231,7 @@ def run_validation(
|
|
|
218
231
|
if len(issues) > 25:
|
|
219
232
|
log(f"... and {len(issues) - 25} more issues")
|
|
220
233
|
else:
|
|
221
|
-
log("No
|
|
234
|
+
log("Success: No broken links or TOC issues!")
|
|
222
235
|
|
|
223
236
|
# Final aggregation of the buffer into one string
|
|
224
237
|
validation_buffer_str = "\n".join(validation_buffer)
|
|
@@ -226,8 +239,6 @@ def run_validation(
|
|
|
226
239
|
return validation_buffer_str
|
|
227
240
|
|
|
228
241
|
summary_txt = generate_validation_summary_txt_buffer(summary_stats, issues, pdf_path)
|
|
229
|
-
if print_bool:
|
|
230
|
-
print(summary_txt)
|
|
231
242
|
|
|
232
243
|
validation_results = {
|
|
233
244
|
"pdf_path" : pdf_path,
|
|
@@ -237,9 +248,6 @@ def run_validation(
|
|
|
237
248
|
"total_pages": total_pages
|
|
238
249
|
}
|
|
239
250
|
|
|
240
|
-
# Have export run interally so that the logic need not happen in an interface
|
|
241
|
-
|
|
242
|
-
export_validation_json(validation_results, pdf_path, pdf_library)
|
|
243
251
|
return validation_results
|
|
244
252
|
|
|
245
253
|
|
|
@@ -254,7 +262,7 @@ def run_validation_more_readable_slop(pdf_path: str = None, pdf_library: str = "
|
|
|
254
262
|
if check_external_links:
|
|
255
263
|
import requests
|
|
256
264
|
|
|
257
|
-
# 1. Setup Library Engine (Reuse
|
|
265
|
+
# 1. Setup Library Engine (Reuse logic)
|
|
258
266
|
pdf_library = pdf_library.lower()
|
|
259
267
|
if pdf_library == "pypdf":
|
|
260
268
|
from pdflinkcheck.analyze_pypdf import extract_links_pypdf as extract_links
|
|
@@ -330,18 +338,18 @@ def run_validation_more_readable_slop(pdf_path: str = None, pdf_library: str = "
|
|
|
330
338
|
else:
|
|
331
339
|
results['broken'].append(link)
|
|
332
340
|
|
|
333
|
-
print("\n" + "=" *
|
|
341
|
+
print("\n" + "=" * SEP_COUNT)
|
|
334
342
|
print(f"--- Validation Summary Stats for {Path(pdf_path).name} ---")
|
|
335
343
|
print(f"Total Checked: {total_links}")
|
|
336
344
|
print(f"✅ Valid: {len(results['valid'])}")
|
|
337
345
|
print(f"❌ Broken: {len(results['broken'])}")
|
|
338
|
-
print("=" *
|
|
346
|
+
print("=" * SEP_COUNT)
|
|
339
347
|
|
|
340
348
|
# 4. Print Detail Report for Broken Links
|
|
341
349
|
if results['broken']:
|
|
342
350
|
print("\n## ❌ Broken Links Found:")
|
|
343
351
|
print("{:<5} | {:<5} | {:<30} | {}".format("Idx", "Page", "Reason", "Target"))
|
|
344
|
-
print("-" *
|
|
352
|
+
print("-" * SEP_COUNT)
|
|
345
353
|
for i, link in enumerate(results['broken'], 1):
|
|
346
354
|
target = link.get('url') or link.get('destination_page') or link.get('remote_file')
|
|
347
355
|
print("{:<5} | {:<5} | {:<30} | {}".format(
|
|
@@ -349,32 +357,3 @@ def run_validation_more_readable_slop(pdf_path: str = None, pdf_library: str = "
|
|
|
349
357
|
))
|
|
350
358
|
|
|
351
359
|
return results
|
|
352
|
-
|
|
353
|
-
|
|
354
|
-
if __name__ == "__main__":
|
|
355
|
-
|
|
356
|
-
from pdflinkcheck.io import get_first_pdf_in_cwd
|
|
357
|
-
pdf_path = get_first_pdf_in_cwd()
|
|
358
|
-
# Run analysis first
|
|
359
|
-
report = run_report(
|
|
360
|
-
pdf_path=pdf_path,
|
|
361
|
-
max_links=0,
|
|
362
|
-
export_format="",
|
|
363
|
-
pdf_library="pypdf",
|
|
364
|
-
print_bool=False # We handle printing in validation
|
|
365
|
-
)
|
|
366
|
-
|
|
367
|
-
if not report or not report.get("data"):
|
|
368
|
-
print("No data extracted — nothing to validate.")
|
|
369
|
-
sys.exit(1)
|
|
370
|
-
|
|
371
|
-
# Then validate
|
|
372
|
-
validation_results = run_validation(
|
|
373
|
-
report_results=report,
|
|
374
|
-
pdf_path=pdf_path,
|
|
375
|
-
pdf_library="pypdf",
|
|
376
|
-
export_json=True,
|
|
377
|
-
print_bool=True
|
|
378
|
-
)
|
|
379
|
-
|
|
380
|
-
export_validation_results()
|
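The new internal-link branch of `run_validation()` in the hunk above sorts each destination page into one of several statuses. The following is a minimal standalone sketch of that decision order, not the package's code: the helper name `classify_internal_target` is hypothetical, the wording of the "below 1" reason is an assumption, and the status strings are copied as they appear in the diff (including the spelling `no-destinstion-page`, which the counting branch also matches on).

```python
from typing import Any, Optional, Tuple


def classify_internal_target(
    dest_page_raw: Any, total_pages: Optional[int]
) -> Tuple[str, str]:
    """Return (status, reason) for an internal link's destination page.

    Hypothetical helper mirroring the branch order shown in the diff.
    """
    if dest_page_raw is None:
        # Status string spelled exactly as in the diff.
        return "no-destinstion-page", "No destination page resolved"

    try:
        target_page = int(dest_page_raw)
    except (ValueError, TypeError):
        return "broken-page", f"Invalid page value: {dest_page_raw}"

    if target_page < 1:
        # Assumed wording; the diff's own message for this case differs.
        return "broken-page", f"Page {target_page} is below 1"

    if total_pages is None:
        # Page count could not be read, so only a plausibility check is possible.
        return (
            "unknown-reasonableness",
            "Total page count unavailable, but the page number is reasonable",
        )

    if target_page <= total_pages:
        return "valid", f"Page {target_page} within range (1–{total_pages})"

    return "broken-page", f"Page {target_page} out of range (1–{total_pages})"


if __name__ == "__main__":
    for raw, total in [(None, 10), ("3", 10), (11, 10), (0, None), (5, None), ("abc", 10)]:
        print(raw, total, classify_internal_target(raw, total))
```

Checking `target_page < 1` before the `total_pages is None` case gives the same outcomes as the chained `elif` conditions in the diff, while avoiding any comparison against `None`.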
pdflinkcheck/version_info.py
CHANGED
|
@@ -1,4 +1,7 @@
|
|
|
1
|
+
#!/usr/bin/env python3
|
|
2
|
+
# SPDX-License-Identifier: MIT
|
|
1
3
|
# src/pdflinkcheck/version_info.py
|
|
4
|
+
|
|
2
5
|
import re
|
|
3
6
|
from pathlib import Path
|
|
4
7
|
import sys
|
|
@@ -11,7 +14,7 @@ This portion of the codebase is MIT licensed. It does not rely on any AGPL-licen
|
|
|
11
14
|
|
|
12
15
|
MIT License
|
|
13
16
|
|
|
14
|
-
Copyright
|
|
17
|
+
Copyright © 2025 George Clayton Bennett <george.bennett@memphistn.gov>
|
|
15
18
|
|
|
16
19
|
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
17
20
|
of this software and associated documentation files (the "Software"), to deal
|
|
@@ -52,7 +55,7 @@ def find_pyproject(start: Path) -> Path | None:
|
|
|
52
55
|
if candidate.exists():
|
|
53
56
|
return candidate
|
|
54
57
|
|
|
55
|
-
# 3. Handle Installed / Wheel / Shiv state (using
|
|
58
|
+
# 3. Handle Installed / Wheel / Shiv state (using force-include path)
|
|
56
59
|
internal_path = Path(__file__).parent / "data" / "pyproject.toml"
|
|
57
60
|
if internal_path.exists():
|
|
58
61
|
return internal_path
|
|
@@ -1,10 +1,13 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: pdflinkcheck
|
|
3
|
-
Version: 1.1.
|
|
3
|
+
Version: 1.1.94
|
|
4
4
|
Summary: A purpose-built PDF link analysis and reporting tool with GUI and CLI.
|
|
5
5
|
Author: George Clayton Bennett
|
|
6
6
|
Author-email: George Clayton Bennett <george.bennett@memphistn.gov>
|
|
7
|
+
License-Expression: MIT AND AGPL-3.0-or-later
|
|
7
8
|
License-File: LICENSE
|
|
9
|
+
License-File: LICENSE-AGPL3
|
|
10
|
+
License-File: LICENSE-MIT
|
|
8
11
|
Classifier: Programming Language :: Python :: 3
|
|
9
12
|
Classifier: Programming Language :: Python :: 3 :: Only
|
|
10
13
|
Classifier: Programming Language :: Python :: 3.10
|
|
@@ -13,6 +16,7 @@ Classifier: Programming Language :: Python :: 3.12
|
|
|
13
16
|
Classifier: Programming Language :: Python :: 3.13
|
|
14
17
|
Classifier: Programming Language :: Python :: 3.14
|
|
15
18
|
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
|
|
19
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
16
20
|
Classifier: Operating System :: OS Independent
|
|
17
21
|
Classifier: Intended Audience :: End Users/Desktop
|
|
18
22
|
Classifier: Intended Audience :: Developers
|
|
@@ -32,14 +36,12 @@ Requires-Dist: pypdf>=6.4.2
|
|
|
32
36
|
Requires-Dist: rich>=14.2.0
|
|
33
37
|
Requires-Dist: typer>=0.20.0
|
|
34
38
|
Requires-Dist: pymupdf>=1.26.7 ; extra == 'full'
|
|
35
|
-
Requires-Dist: sv-ttk>=2.6.1 ; extra == 'gui'
|
|
36
39
|
Maintainer: George Clayton Bennett
|
|
37
40
|
Maintainer-email: George Clayton Bennett <george.bennett@memphistn.gov>
|
|
38
41
|
Requires-Python: >=3.10
|
|
39
42
|
Project-URL: Homepage, https://github.com/city-of-memphis-wastewater/pdflinkcheck
|
|
40
43
|
Project-URL: Repository, https://github.com/city-of-memphis-wastewater/pdflinkcheck
|
|
41
44
|
Provides-Extra: full
|
|
42
|
-
Provides-Extra: gui
|
|
43
45
|
Description-Content-Type: text/markdown
|
|
44
46
|
|
|
45
47
|
# pdflinkcheck
|
|
@@ -48,7 +50,7 @@ A purpose-built tool for comprehensive analysis of hyperlinks and GoTo links wit
|
|
|
48
50
|
|
|
49
51
|
-----
|
|
50
52
|
|
|
51
|
-

|
|
52
54
|
|
|
53
55
|
-----
|
|
54
56
|
|
|
@@ -86,20 +88,9 @@ The tool can be run as simple cross-platform graphical interface (Tkinter).
|
|
|
86
88
|
|
|
87
89
|
### Launching the GUI
|
|
88
90
|
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
1. **Implicit Launch:** Run the main command with no arguments, subcommands, or flags (`pdflinkcheck`).
|
|
91
|
+
Ways to launch the GUI interface:
|
|
92
|
+
1. **Implicit Launch:** Run the tool or file with no arguments, subcommands, or flags. (Note: PyInstaller builds use the `--windowed` (or `--noconsole`) flag, except on Termux.)
|
|
92
93
|
2. **Explicit Command:** Use the dedicated GUI subcommand (`pdflinkcheck gui`).
|
|
93
|
-
3. **Binary Double-Click:**
|
|
94
|
-
* **Windows:** Double-click the `pdflinkcheck-VERSION-gui.bat` file.
|
|
95
|
-
* **macOS/Linux:** Double-click the downloaded `.pyz` or `.elf` file.
|
|
96
|
-
|
|
97
|
-
### Planned GUI Updates
|
|
98
|
-
|
|
99
|
-
We are actively working on the following enhancements:
|
|
100
|
-
|
|
101
|
-
* **Report Export:** Functionality to export the full analysis report to a plain text file.
|
|
102
|
-
* **License Visibility:** A dedicated "License Info" button within the GUI to display the terms of the AGPLv3+ license.
|
|
103
94
|
|
|
104
95
|
-----
|
|
105
96
|
|
|
@@ -107,20 +98,30 @@ We are actively working on the following enhancements:
|
|
|
107
98
|
|
|
108
99
|
The core functionality is accessed via the `analyze` command.
|
|
109
100
|
|
|
110
|
-
`DEV_TYPER_HELP_TREE=1 pdflinkcheck help-tree`:
|
|
111
|
-

|
|
112
|
-
|
|
113
101
|
`pdflinkcheck --help`:
|
|
114
|
-

|
|
103
|
+
|
|
104
|
+
|
|
105
|
+
See the Help Tree by unlocking the help-tree CLI command, using the DEV_TYPER_HELP_TREE env var.
|
|
106
|
+
|
|
107
|
+
```
|
|
108
|
+
DEV_TYPER_HELP_TREE=1 pdflinkcheck help-tree  # bash
|
|
109
|
+
$env:DEV_TYPER_HELP_TREE = "1"; pdflinkcheck help-tree  # PowerShell
|
|
110
|
+
```
|
|
111
|
+
|
|
112
|
+

|
|
113
|
+
|
|
115
114
|
|
|
116
115
|
|
|
117
116
|
### Available Commands
|
|
118
117
|
|
|
119
118
|
|**Command**|**Description**|
|
|
120
119
|
|---|---|
|
|
121
|
-
|`pdflinkcheck analyze`|Analyzes a PDF file for links
|
|
120
|
+
|`pdflinkcheck analyze`|Analyzes a PDF file for links and validates their reasonableness.|
|
|
122
121
|
|`pdflinkcheck gui`|Explicitly launch the Graphical User Interface.|
|
|
123
122
|
|`pdflinkcheck docs`|Access documentation, including the README and AGPLv3+ license.|
|
|
123
|
+
|`pdflinkcheck serve`|Serve a basic local web app which uses only the Python standard library.|
|
|
124
|
+
|`pdflinkcheck tools`|Access additional tools, like `--clear-cache`.|
|
|
124
125
|
|
|
125
126
|
### `analyze` Command Options
|
|
126
127
|
|
|
@@ -232,37 +233,23 @@ This `help-tree` feature has not yet been submitted for inclusion into Typer.
|
|
|
232
233
|
|
|
233
234
|
## ⚠️ Compatibility Notes
|
|
234
235
|
|
|
235
|
-
|
|
236
|
+
### Termux Compatibility as a Key Goal
|
|
236
237
|
A key goal of City-of-Memphis-Wastewater is to release all software as Termux-compatible.
|
|
237
238
|
|
|
238
|
-
Termux compatibility is important in the modern age
|
|
239
|
+
Termux compatibility is important in the modern age, because Android devices are common among technicians, field engineers, and maintenance staff.
|
|
239
240
|
Android is the most common operating system in the Global South.
|
|
240
241
|
We aim to produce stable software that can do the most possible good.
|
|
241
242
|
|
|
242
|
-
|
|
243
|
-
|
|
244
|
-
`pypdf
|
|
245
|
-
|
|
246
|
-
Now that `pdflinkcheck` can run on Termux, we may find a work-around and be able to drop the PyMuPDF dependency.
|
|
247
|
-
- Build `pypdf`-only artifacts, to reduce size.
|
|
248
|
-
- Build a web-stack GUI as an alternative to the Tkinter GUI, to be compatible with Termux.
|
|
249
|
-
|
|
250
|
-
Because it works, we plan to keep the `PyMuPDF` portion of the codebase.
|
|
251
|
-
|
|
252
|
-
### Document Compatibility:
|
|
253
|
-
Not all PDF files can be processed successfully. This tool is designed primarily for digitally generated (vector-based) PDFs.
|
|
254
|
-
|
|
255
|
-
Processing may fail or yield incomplete results for:
|
|
256
|
-
* **Scanned PDFs** (images of text) that lack an accessible text layer.
|
|
257
|
-
* **Encrypted or Password-Protected** documents.
|
|
258
|
-
* **Malformed or non-standard** PDF files.
|
|
243
|
+
Now `pdflinkcheck` can run on Termux by using the `pypdf` engine.
|
|
244
|
+
Benefits:
|
|
245
|
+
- `pypdf`-only artifacts, to reduce size to about 6% compared to artifacts that include `PyMuPDF`.
|
|
246
|
+
- Web-stack GUI as an alternative to the Tkinter GUI, which can be run locally on Termux or as a web app.
|
|
259
247
|
|
|
260
|
-
-----
|
|
261
248
|
|
|
262
|
-
|
|
263
|
-
At long last, `PyMuPDF` is an optional dependency.
|
|
249
|
+
### PDF Library Selection
|
|
250
|
+
At long last, `PyMuPDF` is an optional dependency. All testing comparing `pypdf` and `PyMuPDF` has shown identical validation performance. However, `PyMuPDF` is much faster. The benefit of `pypdf` is small package size and cross-platform compatibility.
|
|
264
251
|
|
|
265
|
-
|
|
252
|
+
Expect that all binaries and artifacts contain PyMuPDF, unless they are built on Android. The GUI and CLI interfaces both allow selection of the library; if PyMuPDF is selected but is not available, the user will be warned.
|
|
266
253
|
|
|
267
254
|
To install the complete version use one of these options:
|
|
268
255
|
|
|
@@ -273,6 +260,16 @@ uv tool install "pdflinkcheck[full]"
|
|
|
273
260
|
uv add "pdflinkcheck[full]"
|
|
274
261
|
```
|
|
275
262
|
|
|
263
|
+
---
|
|
264
|
+
|
|
265
|
+
### Document Compatibility:
|
|
266
|
+
Not all PDF files can be processed successfully. This tool is designed primarily for digitally generated (vector-based) PDFs.
|
|
267
|
+
|
|
268
|
+
Processing may fail or yield incomplete results for:
|
|
269
|
+
* **Scanned PDFs** (images of text) that lack an accessible text layer.
|
|
270
|
+
* **Encrypted or Password-Protected** documents.
|
|
271
|
+
* **Malformed or non-standard** PDF files.
|
|
272
|
+
|
|
276
273
|
-----
|
|
277
274
|
|
|
278
275
|
## Run from Source (Developers)
|
|
@@ -301,22 +298,27 @@ uv run python -m pdflinkcheck.stdlib_server
|
|
|
301
298
|
|
|
302
299
|
## 📜 License Implications (AGPLv3+)
|
|
303
300
|
|
|
304
|
-
**`pdflinkcheck` is licensed under the `GNU Affero General Public License` version 3 or later (`AGPLv3+`).**
|
|
305
301
|
|
|
306
|
-
The `AGPL3
|
|
302
|
+
The `AGPL3-or-later` is required for binaries of `pdflinkcheck` which include `PyMuPDF`, which is licensed under the `AGPL3`.
|
|
303
|
+
The source code itself for `pdflinkcheck` is licensed under the `MIT`.
|
|
307
304
|
|
|
308
|
-
|
|
309
|
-
The `AGPL3` appears as the primary license file in the source code. While this infers that the entire project is AGPL3-licensed, this is not true - portions of the codebase are MIT-licensed.
|
|
310
|
-
|
|
311
|
-
This license has significant implications for **distribution and network use**, particularly for organizations:
|
|
305
|
+
The AGPL3-or-later license has significant implications for **distribution and network use**, particularly for organizations:
|
|
312
306
|
|
|
313
307
|
* **Source Code Provision:** If you distribute this tool (modified or unmodified) to anyone, you **must** provide the full source code under the same license.
|
|
314
308
|
* **Network Interaction (Affero Clause):** If you modify this tool and make the modified version available to users over a computer network (e.g., as a web service or backend), you **must** also offer the source code to those network users.
|
|
315
309
|
|
|
316
310
|
> **Before deploying or modifying this tool for organizational use, especially for internal web services or distribution, please ensure compliance with the AGPLv3+ terms.**
|
|
317
311
|
|
|
312
|
+
Because the AGPLv3 is a strong copyleft license, any version of `pdflinkcheck` that includes AGPL‑licensed components (such as `PyMuPDF`) must be distributed as a whole under AGPLv3+. This means that for those versions, anyone who distributes the application — or makes a modified version available over a network — must also provide the complete corresponding source code under the same terms.
|
|
313
|
+
|
|
314
|
+
The source code of pdflinkcheck itself remains licensed under the **MIT License**; only the distributed binary becomes AGPL‑licensed when PyMuPDF is included.
|
|
315
|
+
|
|
316
|
+
|
|
318
317
|
Links:
|
|
319
318
|
- Source code: https://github.com/City-of-Memphis-Wastewater/pdflinkcheck/
|
|
320
|
-
-
|
|
319
|
+
- PyMuPDF source code: https://github.com/pymupdf/PyMuPDF/
|
|
320
|
+
- pypdf source code: https://github.com/py-pdf/pypdf/
|
|
321
|
+
- AGPLv3 text (FSF): https://www.gnu.org/licenses/agpl-3.0.html
|
|
322
|
+
- MIT License text: https://opensource.org/license/mit
|
|
321
323
|
|
|
322
324
|
Copyright © 2025 George Clayton Bennett
|
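The README text in this METADATA diff says the user is warned when PyMuPDF is selected but not installed, and `validate.py` now guards its PyMuPDF path with `pymupdf_is_available()`. The body of that function (in `environment.py`) is not shown in this diff, so the following is only a sketch of how such a check could work using the standard library. The names `pymupdf_importable` and `choose_engine` are hypothetical, and falling back to `pypdf` is an assumption; the README only states that a warning is given.

```python
import importlib.util
from typing import Optional, Tuple


def pymupdf_importable() -> bool:
    """Report whether PyMuPDF can be imported, without importing it.

    Hypothetical stand-in; checks both the current module name and the
    legacy `fitz` name that validate.py imports.
    """
    return any(
        importlib.util.find_spec(name) is not None
        for name in ("pymupdf", "fitz")
    )


def choose_engine(requested: str) -> Tuple[str, Optional[str]]:
    """Return (engine, warning) for a requested PDF library name.

    Assumption: when PyMuPDF is requested but missing, fall back to pypdf
    and hand back a warning message for the interface to display.
    """
    requested = requested.lower()
    if requested == "pymupdf" and not pymupdf_importable():
        return "pypdf", "PyMuPDF was requested but is not installed; using pypdf."
    return requested, None


if __name__ == "__main__":
    engine, warning = choose_engine("pymupdf")
    if warning:
        print(f"Warning: {warning}")
    print(f"Engine in use: {engine}")
```

Using `importlib.util.find_spec` keeps the check cheap, since the library is located but not loaded until an engine actually needs it.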