pdflinkcheck 1.1.73__py3-none-any.whl → 1.2.29__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (186)
  1. pdflinkcheck/__init__.py +88 -21
  2. pdflinkcheck/__main__.py +6 -0
  3. pdflinkcheck/analysis_pdfium.py +131 -0
  4. pdflinkcheck/{analyze_pymupdf.py → analysis_pymupdf.py} +109 -145
  5. pdflinkcheck/{analyze_pypdf.py → analysis_pypdf.py} +67 -37
  6. pdflinkcheck/cli.py +111 -116
  7. pdflinkcheck/data/I Have Questions.md +51 -0
  8. pdflinkcheck/data/LICENSE +20 -654
  9. pdflinkcheck/data/README.md +65 -67
  10. pdflinkcheck/data/icons/BoxArt-1080x1080.png +0 -0
  11. pdflinkcheck/data/icons/Logo-150x150.png +0 -0
  12. pdflinkcheck/data/icons/Logo-300x300.png +0 -0
  13. pdflinkcheck/data/icons/Logo-71x71.png +0 -0
  14. pdflinkcheck/data/icons/PosterArt-720x1080.png +0 -0
  15. pdflinkcheck/data/icons/SmallLogo-44x44.png +0 -0
  16. pdflinkcheck/data/icons/SplashScreen-620x300.png +0 -0
  17. pdflinkcheck/data/icons/StoreLogo-50x50.png +0 -0
  18. pdflinkcheck/data/icons/WideLogo-310x150.png +0 -0
  19. pdflinkcheck/data/icons/red_pdf_512px.ico +0 -0
  20. pdflinkcheck/data/pyproject.toml +25 -37
  21. pdflinkcheck/data/themes/forest/forest-dark/border-accent-hover.png +0 -0
  22. pdflinkcheck/data/themes/forest/forest-dark/border-accent.png +0 -0
  23. pdflinkcheck/data/themes/forest/forest-dark/border-basic.png +0 -0
  24. pdflinkcheck/data/themes/forest/forest-dark/border-hover.png +0 -0
  25. pdflinkcheck/data/themes/forest/forest-dark/border-invalid.png +0 -0
  26. pdflinkcheck/data/themes/forest/forest-dark/card.png +0 -0
  27. pdflinkcheck/data/themes/forest/forest-dark/check-accent.png +0 -0
  28. pdflinkcheck/data/themes/forest/forest-dark/check-basic.png +0 -0
  29. pdflinkcheck/data/themes/forest/forest-dark/check-hover.png +0 -0
  30. pdflinkcheck/data/themes/forest/forest-dark/check-tri-accent.png +0 -0
  31. pdflinkcheck/data/themes/forest/forest-dark/check-tri-basic.png +0 -0
  32. pdflinkcheck/data/themes/forest/forest-dark/check-tri-hover.png +0 -0
  33. pdflinkcheck/data/themes/forest/forest-dark/check-unsel-accent.png +0 -0
  34. pdflinkcheck/data/themes/forest/forest-dark/check-unsel-basic.png +0 -0
  35. pdflinkcheck/data/themes/forest/forest-dark/check-unsel-hover.png +0 -0
  36. pdflinkcheck/data/themes/forest/forest-dark/check-unsel-pressed.png +0 -0
  37. pdflinkcheck/data/themes/forest/forest-dark/combo-button-basic.png +0 -0
  38. pdflinkcheck/data/themes/forest/forest-dark/combo-button-focus.png +0 -0
  39. pdflinkcheck/data/themes/forest/forest-dark/combo-button-hover.png +0 -0
  40. pdflinkcheck/data/themes/forest/forest-dark/down.png +0 -0
  41. pdflinkcheck/data/themes/forest/forest-dark/empty.png +0 -0
  42. pdflinkcheck/data/themes/forest/forest-dark/hor-accent.png +0 -0
  43. pdflinkcheck/data/themes/forest/forest-dark/hor-basic.png +0 -0
  44. pdflinkcheck/data/themes/forest/forest-dark/hor-hover.png +0 -0
  45. pdflinkcheck/data/themes/forest/forest-dark/notebook.png +0 -0
  46. pdflinkcheck/data/themes/forest/forest-dark/off-accent.png +0 -0
  47. pdflinkcheck/data/themes/forest/forest-dark/off-basic.png +0 -0
  48. pdflinkcheck/data/themes/forest/forest-dark/off-hover.png +0 -0
  49. pdflinkcheck/data/themes/forest/forest-dark/on-accent.png +0 -0
  50. pdflinkcheck/data/themes/forest/forest-dark/on-basic.png +0 -0
  51. pdflinkcheck/data/themes/forest/forest-dark/on-hover.png +0 -0
  52. pdflinkcheck/data/themes/forest/forest-dark/radio-accent.png +0 -0
  53. pdflinkcheck/data/themes/forest/forest-dark/radio-basic.png +0 -0
  54. pdflinkcheck/data/themes/forest/forest-dark/radio-hover.png +0 -0
  55. pdflinkcheck/data/themes/forest/forest-dark/radio-tri-accent.png +0 -0
  56. pdflinkcheck/data/themes/forest/forest-dark/radio-tri-basic.png +0 -0
  57. pdflinkcheck/data/themes/forest/forest-dark/radio-tri-hover.png +0 -0
  58. pdflinkcheck/data/themes/forest/forest-dark/radio-unsel-accent.png +0 -0
  59. pdflinkcheck/data/themes/forest/forest-dark/radio-unsel-basic.png +0 -0
  60. pdflinkcheck/data/themes/forest/forest-dark/radio-unsel-hover.png +0 -0
  61. pdflinkcheck/data/themes/forest/forest-dark/radio-unsel-pressed.png +0 -0
  62. pdflinkcheck/data/themes/forest/forest-dark/rect-accent-hover.png +0 -0
  63. pdflinkcheck/data/themes/forest/forest-dark/rect-accent.png +0 -0
  64. pdflinkcheck/data/themes/forest/forest-dark/rect-basic.png +0 -0
  65. pdflinkcheck/data/themes/forest/forest-dark/rect-hover.png +0 -0
  66. pdflinkcheck/data/themes/forest/forest-dark/right.png +0 -0
  67. pdflinkcheck/data/themes/forest/forest-dark/scale-hor.png +0 -0
  68. pdflinkcheck/data/themes/forest/forest-dark/scale-vert.png +0 -0
  69. pdflinkcheck/data/themes/forest/forest-dark/separator.png +0 -0
  70. pdflinkcheck/data/themes/forest/forest-dark/sizegrip.png +0 -0
  71. pdflinkcheck/data/themes/forest/forest-dark/spin-button-down-basic.png +0 -0
  72. pdflinkcheck/data/themes/forest/forest-dark/spin-button-down-focus.png +0 -0
  73. pdflinkcheck/data/themes/forest/forest-dark/spin-button-up.png +0 -0
  74. pdflinkcheck/data/themes/forest/forest-dark/tab-accent.png +0 -0
  75. pdflinkcheck/data/themes/forest/forest-dark/tab-basic.png +0 -0
  76. pdflinkcheck/data/themes/forest/forest-dark/tab-hover.png +0 -0
  77. pdflinkcheck/data/themes/forest/forest-dark/thumb-hor-accent.png +0 -0
  78. pdflinkcheck/data/themes/forest/forest-dark/thumb-hor-basic.png +0 -0
  79. pdflinkcheck/data/themes/forest/forest-dark/thumb-hor-hover.png +0 -0
  80. pdflinkcheck/data/themes/forest/forest-dark/thumb-vert-accent.png +0 -0
  81. pdflinkcheck/data/themes/forest/forest-dark/thumb-vert-basic.png +0 -0
  82. pdflinkcheck/data/themes/forest/forest-dark/thumb-vert-hover.png +0 -0
  83. pdflinkcheck/data/themes/forest/forest-dark/tree-basic.png +0 -0
  84. pdflinkcheck/data/themes/forest/forest-dark/tree-pressed.png +0 -0
  85. pdflinkcheck/data/themes/forest/forest-dark/up.png +0 -0
  86. pdflinkcheck/data/themes/forest/forest-dark/vert-accent.png +0 -0
  87. pdflinkcheck/data/themes/forest/forest-dark/vert-basic.png +0 -0
  88. pdflinkcheck/data/themes/forest/forest-dark/vert-hover.png +0 -0
  89. pdflinkcheck/data/themes/forest/forest-dark.tcl +536 -0
  90. pdflinkcheck/data/themes/forest/forest-light/border-accent-hover.png +0 -0
  91. pdflinkcheck/data/themes/forest/forest-light/border-accent.png +0 -0
  92. pdflinkcheck/data/themes/forest/forest-light/border-basic.png +0 -0
  93. pdflinkcheck/data/themes/forest/forest-light/border-hover.png +0 -0
  94. pdflinkcheck/data/themes/forest/forest-light/border-invalid.png +0 -0
  95. pdflinkcheck/data/themes/forest/forest-light/card.png +0 -0
  96. pdflinkcheck/data/themes/forest/forest-light/check-accent.png +0 -0
  97. pdflinkcheck/data/themes/forest/forest-light/check-basic.png +0 -0
  98. pdflinkcheck/data/themes/forest/forest-light/check-hover.png +0 -0
  99. pdflinkcheck/data/themes/forest/forest-light/check-tri-accent.png +0 -0
  100. pdflinkcheck/data/themes/forest/forest-light/check-tri-basic.png +0 -0
  101. pdflinkcheck/data/themes/forest/forest-light/check-tri-hover.png +0 -0
  102. pdflinkcheck/data/themes/forest/forest-light/check-unsel-accent.png +0 -0
  103. pdflinkcheck/data/themes/forest/forest-light/check-unsel-basic.png +0 -0
  104. pdflinkcheck/data/themes/forest/forest-light/check-unsel-hover.png +0 -0
  105. pdflinkcheck/data/themes/forest/forest-light/check-unsel-pressed.png +0 -0
  106. pdflinkcheck/data/themes/forest/forest-light/combo-button-basic.png +0 -0
  107. pdflinkcheck/data/themes/forest/forest-light/combo-button-focus.png +0 -0
  108. pdflinkcheck/data/themes/forest/forest-light/combo-button-hover.png +0 -0
  109. pdflinkcheck/data/themes/forest/forest-light/down-focus.png +0 -0
  110. pdflinkcheck/data/themes/forest/forest-light/down.png +0 -0
  111. pdflinkcheck/data/themes/forest/forest-light/empty.png +0 -0
  112. pdflinkcheck/data/themes/forest/forest-light/hor-accent.png +0 -0
  113. pdflinkcheck/data/themes/forest/forest-light/hor-basic.png +0 -0
  114. pdflinkcheck/data/themes/forest/forest-light/hor-hover.png +0 -0
  115. pdflinkcheck/data/themes/forest/forest-light/notebook.png +0 -0
  116. pdflinkcheck/data/themes/forest/forest-light/off-accent.png +0 -0
  117. pdflinkcheck/data/themes/forest/forest-light/off-basic.png +0 -0
  118. pdflinkcheck/data/themes/forest/forest-light/off-hover.png +0 -0
  119. pdflinkcheck/data/themes/forest/forest-light/on-accent.png +0 -0
  120. pdflinkcheck/data/themes/forest/forest-light/on-basic.png +0 -0
  121. pdflinkcheck/data/themes/forest/forest-light/on-hover.png +0 -0
  122. pdflinkcheck/data/themes/forest/forest-light/radio-accent.png +0 -0
  123. pdflinkcheck/data/themes/forest/forest-light/radio-basic.png +0 -0
  124. pdflinkcheck/data/themes/forest/forest-light/radio-hover.png +0 -0
  125. pdflinkcheck/data/themes/forest/forest-light/radio-tri-accent.png +0 -0
  126. pdflinkcheck/data/themes/forest/forest-light/radio-tri-basic.png +0 -0
  127. pdflinkcheck/data/themes/forest/forest-light/radio-tri-hover.png +0 -0
  128. pdflinkcheck/data/themes/forest/forest-light/radio-unsel-accent.png +0 -0
  129. pdflinkcheck/data/themes/forest/forest-light/radio-unsel-basic.png +0 -0
  130. pdflinkcheck/data/themes/forest/forest-light/radio-unsel-hover.png +0 -0
  131. pdflinkcheck/data/themes/forest/forest-light/radio-unsel-pressed.png +0 -0
  132. pdflinkcheck/data/themes/forest/forest-light/rect-accent-hover.png +0 -0
  133. pdflinkcheck/data/themes/forest/forest-light/rect-accent.png +0 -0
  134. pdflinkcheck/data/themes/forest/forest-light/rect-basic.png +0 -0
  135. pdflinkcheck/data/themes/forest/forest-light/rect-hover.png +0 -0
  136. pdflinkcheck/data/themes/forest/forest-light/right-focus.png +0 -0
  137. pdflinkcheck/data/themes/forest/forest-light/right.png +0 -0
  138. pdflinkcheck/data/themes/forest/forest-light/scale-hor.png +0 -0
  139. pdflinkcheck/data/themes/forest/forest-light/scale-vert.png +0 -0
  140. pdflinkcheck/data/themes/forest/forest-light/separator.png +0 -0
  141. pdflinkcheck/data/themes/forest/forest-light/sizegrip.png +0 -0
  142. pdflinkcheck/data/themes/forest/forest-light/spin-button-down-basic.png +0 -0
  143. pdflinkcheck/data/themes/forest/forest-light/spin-button-down-focus.png +0 -0
  144. pdflinkcheck/data/themes/forest/forest-light/spin-button-up.png +0 -0
  145. pdflinkcheck/data/themes/forest/forest-light/tab-accent.png +0 -0
  146. pdflinkcheck/data/themes/forest/forest-light/tab-basic.png +0 -0
  147. pdflinkcheck/data/themes/forest/forest-light/tab-hover.png +0 -0
  148. pdflinkcheck/data/themes/forest/forest-light/thumb-hor-accent.png +0 -0
  149. pdflinkcheck/data/themes/forest/forest-light/thumb-hor-basic.png +0 -0
  150. pdflinkcheck/data/themes/forest/forest-light/thumb-hor-hover.png +0 -0
  151. pdflinkcheck/data/themes/forest/forest-light/thumb-vert-accent.png +0 -0
  152. pdflinkcheck/data/themes/forest/forest-light/thumb-vert-basic.png +0 -0
  153. pdflinkcheck/data/themes/forest/forest-light/thumb-vert-hover.png +0 -0
  154. pdflinkcheck/data/themes/forest/forest-light/tree-basic.png +0 -0
  155. pdflinkcheck/data/themes/forest/forest-light/tree-pressed.png +0 -0
  156. pdflinkcheck/data/themes/forest/forest-light/up.png +0 -0
  157. pdflinkcheck/data/themes/forest/forest-light/vert-accent.png +0 -0
  158. pdflinkcheck/data/themes/forest/forest-light/vert-basic.png +0 -0
  159. pdflinkcheck/data/themes/forest/forest-light/vert-hover.png +0 -0
  160. pdflinkcheck/data/themes/forest/forest-light.tcl +544 -0
  161. pdflinkcheck/datacopy.py +18 -1
  162. pdflinkcheck/dev.py +12 -25
  163. pdflinkcheck/environment.py +76 -0
  164. pdflinkcheck/gui.py +366 -457
  165. pdflinkcheck/helpers.py +88 -0
  166. pdflinkcheck/io.py +27 -23
  167. pdflinkcheck/report.py +692 -121
  168. pdflinkcheck/security.py +189 -0
  169. pdflinkcheck/splash.py +38 -0
  170. pdflinkcheck/stdlib_server.py +14 -20
  171. pdflinkcheck/stdlib_server_alt.py +571 -0
  172. pdflinkcheck/tk_utils.py +188 -0
  173. pdflinkcheck/update_msix_version.py +49 -0
  174. pdflinkcheck/validate.py +129 -218
  175. pdflinkcheck/version_info.py +6 -3
  176. {pdflinkcheck-1.1.73.dist-info → pdflinkcheck-1.2.29.dist-info}/METADATA +84 -81
  177. pdflinkcheck-1.2.29.dist-info/RECORD +183 -0
  178. pdflinkcheck-1.2.29.dist-info/WHEEL +5 -0
  179. {pdflinkcheck-1.1.73.dist-info → pdflinkcheck-1.2.29.dist-info}/entry_points.txt +0 -1
  180. pdflinkcheck-1.2.29.dist-info/licenses/LICENSE +27 -0
  181. pdflinkcheck-1.2.29.dist-info/licenses/LICENSE-MIT +9 -0
  182. pdflinkcheck-1.2.29.dist-info/top_level.txt +1 -0
  183. pdflinkcheck/analyze_pypdf_v2.py +0 -218
  184. pdflinkcheck-1.1.73.dist-info/RECORD +0 -21
  185. pdflinkcheck-1.1.73.dist-info/WHEEL +0 -4
  186. /pdflinkcheck-1.1.73.dist-info/licenses/LICENSE → /pdflinkcheck-1.2.29.dist-info/licenses/LICENSE-AGPL3 +0 -0
pdflinkcheck/{analyze_pypdf.py → analysis_pypdf.py} CHANGED
@@ -1,4 +1,7 @@
- # src/pdflinkcheck/analyze_pypdf.py
+ #!/usr/bin/env python3
+ # SPDX-License-Identifier: MIT
+ # src/pdflinkcheck/analysis_pypdf.py
+ from __future__ import annotations
  import sys
  from pathlib import Path
  import logging
@@ -6,17 +9,37 @@ from typing import Dict, Any, Optional, List
 
  from pypdf import PdfReader
  from pypdf.generic import Destination, NameObject, ArrayObject, IndirectObject
+ from pdflinkcheck.helpers import PageRef
 
 
  from pdflinkcheck.io import error_logger, export_report_data, get_first_pdf_in_cwd, LOG_FILE_PATH
- from pdflinkcheck.report import run_report
- #from pdflinkcheck.validate import run_validation
 
  """
  Inspect target PDF for both URI links and for GoTo links, using only pypdf, not Fitz
  """
 
- def get_anchor_text_pypdf(page, rect) -> str:
+ def analyze_pdf(pdf_path: str):
+ data = {}
+ data["links"] = []
+ data["toc"] = []
+ data["file_ov"] = {}
+
+ try:
+ reader = PdfReader(pdf_path)
+ except Exception as e:
+ print(f"pypdf.PdfReader() failed: {e}")
+ return data
+
+ extracted_links = _extract_links_pypdf(reader)
+ structural_toc = _extract_toc_pypdf(reader)
+ page_count = len(reader.pages)
+ data["links"] = extracted_links
+ data["toc"] = structural_toc
+ data["file_ov"]["total_pages"] = page_count
+ return data
+
+
+ def _get_anchor_text_pypdf(page, rect) -> str:
  """
  Extracts text within the link's bounding box using a visitor function.
  Reliable for finding text associated with a link without PyMuPDF.
@@ -33,7 +56,7 @@ def get_anchor_text_pypdf(page, rect) -> str:
 
  parts: List[str] = []
 
- def visitor_body(text, cm, tm, font_dict, font_size):
+ def _visitor_body(text, cm, tm, font_dict, font_size):
  # tm[4], tm[5] are the current text insertion point coordinates (x, y)
  x, y = tm[4], tm[5]
 
@@ -44,49 +67,49 @@ def get_anchor_text_pypdf(page, rect) -> str:
  if text.strip():
  parts.append(text)
 
- page.extract_text(visitor_text=visitor_body)
+ page.extract_text(visitor_text=_visitor_body)
 
  raw_extracted = "".join(parts)
  cleaned = " ".join(raw_extracted.split()).strip()
 
  return cleaned if cleaned else "Graphic/Empty Link"
 
- def resolve_pypdf_destination(reader: PdfReader, dest, obj_id_to_page: dict) -> str:
- """
- Resolves a Destination object or IndirectObject to a 1-based page number string.
- """
+ def _resolve_pypdf_destination(reader: PdfReader, dest, obj_id_to_page: dict) -> Optional[int]:
  try:
  if isinstance(dest, Destination):
- return str(dest.page_number + 1)
-
+ # .page_number in pypdf is already 0-indexed
+ return dest.page_number
+
  if isinstance(dest, IndirectObject):
- return str(obj_id_to_page.get(dest.idnum, "Unknown"))
-
+ return obj_id_to_page.get(dest.idnum)
+
  if isinstance(dest, ArrayObject) and len(dest) > 0:
  if isinstance(dest[0], IndirectObject):
- return str(obj_id_to_page.get(dest[0].idnum, "Unknown"))
-
- return "Unknown"
+ return obj_id_to_page.get(dest[0].idnum)
+
+ return None # Unresolved → None
  except Exception:
- return "Error Resolving"
+ return None
+
 
- def extract_links_pypdf(pdf_path):
+ def _extract_links_pypdf(reader: PdfReader) -> List[Dict[str, Any]]:
  """
  Termux-compatible link extraction using pure-Python pypdf.
  Matches the reporting schema of the PyMuPDF version.
  """
- reader = PdfReader(pdf_path)
 
  # Pre-map Object IDs to Page Numbers for fast internal link resolution
  obj_id_to_page = {
- page.indirect_reference.idnum: i + 1
+ page.indirect_reference.idnum: i
  for i, page in enumerate(reader.pages)
  }
 
  all_links = []
 
  for i, page in enumerate(reader.pages):
- page_num = i + 1
+ #page_num = i
+ # Use PageRef to stay consistent
+ page_source = PageRef.from_index(i)
  if "/Annots" not in page:
  continue
 
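The visitor in `_get_anchor_text_pypdf` reads the text insertion point from `tm[4]` and `tm[5]`. The lines that compare that point with the link's `/Rect` fall outside the hunks shown. Below is a minimal sketch of such a filter under three assumptions. First, `/Rect` is `[x0, y0, x1, y1]` with its corners in either order. Second, as a simplification, the sketch ignores the `cm` transform that pypdf also passes to the visitor. Third, `point_in_rect`, `fragments_in_rect`, and the `(text, tm)` input tuples are hypothetical stand-ins, not pdflinkcheck or pypdf API.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical input: (text, tm) pairs like those pypdf hands to visitor_text,
# where tm is a 6-element text matrix and tm[4], tm[5] are the insertion point.
Fragment = Tuple[str, Sequence[float]]


def point_in_rect(x: float, y: float, rect: Sequence[float]) -> bool:
    """True if (x, y) lies inside rect, whatever order its corners are in."""
    x0, y0, x1, y1 = (float(v) for v in rect)
    return min(x0, x1) <= x <= max(x0, x1) and min(y0, y1) <= y <= max(y0, y1)


def fragments_in_rect(fragments: Iterable[Fragment], rect: Sequence[float]) -> str:
    """Join the non-blank fragments whose insertion point falls inside rect."""
    parts: List[str] = []
    for text, tm in fragments:
        x, y = tm[4], tm[5]  # simplification: the cm transform is ignored
        if point_in_rect(x, y, rect) and text.strip():
            parts.append(text)
    # Collapse runs of whitespace, mirroring the cleanup step in the diff.
    cleaned = " ".join("".join(parts).split()).strip()
    return cleaned if cleaned else "Graphic/Empty Link"


frags = [
    ("Click ", [1, 0, 0, 1, 100, 700]),
    ("here", [1, 0, 0, 1, 130, 700]),
    ("Footer", [1, 0, 0, 1, 100, 50]),
]
print(fragments_in_rect(frags, [90, 710, 200, 690]))  # Click here
```

With pypdf, the same predicate would run inside the `visitor_text` callback passed to `page.extract_text()`, which is where the diff places it.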
@@ -96,16 +119,16 @@ def extract_links_pypdf(pdf_path):
  continue
 
  rect = obj.get("/Rect")
- anchor_text = get_anchor_text_pypdf(page, rect)
+ anchor_text = _get_anchor_text_pypdf(page, rect)
 
  link_dict = {
- 'page': page_num,
+ 'page': page_source.machine,
  'rect': list(rect) if rect else None,
  'link_text': anchor_text,
  'type': 'Other Action',
  'target': 'Unknown'
  }
-
+
  # Handle URI (External)
  if "/A" in obj and "/URI" in obj["/A"]:
  uri = obj["/A"]["/URI"]
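Page numbers are now stored through `PageRef.from_index(i)` and its `.machine` attribute from `pdflinkcheck.helpers`. The body of that class is not part of this diff. Going by the comments ("page_num_raw is 0-indexed. Use PageRef to store it."), it wraps a 0-based index. The class below is a hypothetical equivalent, named `PageRefSketch` so it is not mistaken for the real one. Its validation and its `human` property are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRefSketch:
    """Hypothetical stand-in for pdflinkcheck.helpers.PageRef (not its real code)."""
    machine: int  # 0-based index, as pypdf reports it

    @classmethod
    def from_index(cls, index: int) -> "PageRefSketch":
        if index < 0:
            raise ValueError(f"page index must be >= 0, got {index}")
        return cls(machine=index)

    @property
    def human(self) -> int:
        """1-based page number for display (assumed, for illustration)."""
        return self.machine + 1


first = PageRefSketch.from_index(0)
print(first.machine, first.human)  # 0 1
```

Putting the 0-based/1-based conversion in one type is what lets the new code drop the scattered `+ 1` adjustments that this diff removes.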
@@ -114,16 +137,20 @@ def extract_links_pypdf(pdf_path):
  'url': uri,
  'target': uri
  })
-
+
  # Handle GoTo (Internal)
  elif "/Dest" in obj or ("/A" in obj and "/D" in obj["/A"]):
  dest = obj.get("/Dest") or obj["/A"].get("/D")
- target_page = resolve_pypdf_destination(reader, dest, obj_id_to_page)
- link_dict.update({
- 'type': 'Internal (GoTo/Dest)',
- 'destination_page': target_page,
- 'target': f"Page {target_page}"
- })
+ target_page = _resolve_pypdf_destination(reader, dest, obj_id_to_page)
+ # print(f"DEBUG: resolved target_page = {target_page} (type: {type(target_page)})")
+ if target_page is not None:
+ dest_page = PageRef.from_index(target_page)
+ link_dict.update({
+ 'type': 'Internal (GoTo/Dest)',
+ 'destination_page': dest_page.machine,
+ #'target': f"Page {target_page}"
+ 'target': dest_page.machine
+ })
 
  # Handle Remote GoTo (GoToR)
  elif "/A" in obj and obj["/A"].get("/S") == "/GoToR":
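The rewritten `_resolve_pypdf_destination` now returns a 0-based `int` or `None` instead of strings such as `"Unknown"`. `obj_id_to_page` now maps object IDs to 0-based indices. The lookup logic can be sketched without pypdf. In this sketch, `FakeRef` is a hypothetical stand-in for `IndirectObject` (only its `.idnum`, which the diff reads, is modelled), `resolve_destination` is a hypothetical simplification of the real function, and a plain list plays the role of `ArrayObject`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class FakeRef:
    """Hypothetical stand-in for pypdf's IndirectObject; only .idnum is used."""
    idnum: int


def resolve_destination(dest: Any, obj_id_to_page: Dict[int, int]) -> Optional[int]:
    """Return a 0-based page index, or None when the target cannot be resolved."""
    if isinstance(dest, FakeRef):
        return obj_id_to_page.get(dest.idnum)
    # Explicit destination arrays start with a page reference, e.g. [ref, "/Fit"].
    if isinstance(dest, list) and dest and isinstance(dest[0], FakeRef):
        return obj_id_to_page.get(dest[0].idnum)
    return None


# Object IDs 11, 14 and 20 belong to pages 0, 1 and 2 (0-based, as in the new map).
obj_id_to_page = {11: 0, 14: 1, 20: 2}
print(resolve_destination([FakeRef(14), "/Fit"], obj_id_to_page))  # 1
print(resolve_destination(FakeRef(99), obj_id_to_page))            # None
```

The explicit `if target_page is not None` test in the new GoTo branch matters here: page index 0 is falsy, so a plain truthiness check would silently skip links to the first page.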
@@ -139,9 +166,8 @@ def extract_links_pypdf(pdf_path):
  return all_links
 
 
- def extract_toc_pypdf(pdf_path: str) -> List[Dict[str, Any]]:
+ def _extract_toc_pypdf(reader: PdfReader) -> List[Dict[str, Any]]:
  try:
- reader = PdfReader(pdf_path)
  # Note: outline is a property, not a method.
  toc_tree = reader.outline
  toc_data = []
@@ -152,7 +178,10 @@ def extract_toc_pypdf(pdf_path: str) -> List[Dict[str, Any]]:
  # Using the reader directly is the only way to avoid
  # the 'Destination' object has no attribute error
  try:
- page_num = reader.get_destination_page_number(item) + 1
+ page_num_raw = reader.get_destination_page_number(item)
+ # page_num_raw is 0-indexed. Use PageRef to store it.
+ ref = PageRef.from_index(page_num_raw)
+ page_num = ref.machine
  except:
  page_num = "N/A"
 
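`_extract_toc_pypdf` walks `reader.outline`. In pypdf, the outline is a list of entries in which a nested list holds the children of the entry before it. For each entry, the function asks `reader.get_destination_page_number()` for the 0-based page. The recursion itself is elided from the hunks. Here is a sketch under those assumptions, with the page and title lookups passed in as callables so it runs without a PDF. `flatten_outline` and the tuple entries are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional


def flatten_outline(
    items: List[Any],
    page_of: Callable[[Any], Optional[int]],
    title_of: Callable[[Any], str],
    level: int = 1,
) -> List[Dict[str, Any]]:
    """Walk a pypdf-style outline: a nested list holds the children of the entry before it."""
    rows: List[Dict[str, Any]] = []
    for item in items:
        if isinstance(item, list):
            rows.extend(flatten_outline(item, page_of, title_of, level + 1))
            continue
        try:
            page = page_of(item)  # e.g. reader.get_destination_page_number, 0-based
        except Exception:
            page = None  # the diff falls back to "N/A" at this point
        rows.append({"level": level, "title": title_of(item), "page": page})
    return rows


# Stand-in outline entries as (title, 0-based page) tuples instead of Destination objects.
outline = [("Intro", 0), [("Background", 1), ("Scope", 2)], ("Results", 5)]
for row in flatten_outline(outline, page_of=lambda t: t[1], title_of=lambda t: t[0]):
    print(row)
```

Against a real file this would presumably be called as `flatten_outline(reader.outline, reader.get_destination_page_number, lambda d: d.title)`; that call has not been tested here.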
@@ -177,8 +206,9 @@ def call_stable():
  Note: This requires defining PROJECT_NAME, CLI_MAIN_FILE, etc., or
  passing them as arguments to run_report.
  """
- run_report(pdf_library = "pypdf")
- #run_validation(pdf_library = "pypdf")
+ from pdflinkcheck.report import run_report_and_call_exports
+
+ run_report_and_call_exports(pdf_library = "pypdf")
 
  if __name__ == "__main__":
  call_stable()
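The new `analyze_pdf` returns a dict with `links`, `toc` and `file_ov["total_pages"]`, and internal links now carry a 0-based `destination_page`. Given those two values, the "within the length of the document" check described in the `analyze` command's docstring reduces to `0 <= page < total_pages`. The sketch below counts broken pages and files under the `summary-stats` keys the command reads (`broken-page`, `broken-file`). It then derives the exit code the same way the command does: 0 when both counts are zero, otherwise 1. This diff does not show how pdflinkcheck stores GoToR targets, so three things are assumptions: the `remote_file` key, resolving that path against the PDF's folder, and both function names.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def summarize_broken(links: List[Dict[str, Any]], total_pages: int, pdf_dir: Path) -> Dict[str, int]:
    """Count GoTo links outside [0, total_pages) and GoToR links to missing files."""
    stats = {"broken-page": 0, "broken-file": 0}
    for link in links:
        if link.get("type") == "Internal (GoTo/Dest)":
            page = link.get("destination_page")
            if not isinstance(page, int) or not 0 <= page < total_pages:
                stats["broken-page"] += 1
        elif "remote_file" in link:  # hypothetical key for GoToR targets
            if not (pdf_dir / link["remote_file"]).exists():
                stats["broken-file"] += 1
    return stats


def exit_code(stats: Dict[str, int]) -> int:
    """0 when nothing is broken, else 1."""
    return 0 if stats["broken-page"] + stats["broken-file"] == 0 else 1


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "appendix.pdf").write_bytes(b"%PDF-1.4\n")
    links = [
        {"type": "Internal (GoTo/Dest)", "destination_page": 0},
        {"type": "Internal (GoTo/Dest)", "destination_page": 12},  # past the end
        {"remote_file": "appendix.pdf"},  # hypothetical GoToR entry, file exists
        {"remote_file": "missing.pdf"},   # hypothetical GoToR entry, file missing
    ]
    stats = summarize_broken(links, total_pages=10, pdf_dir=folder)

print(stats, exit_code(stats))  # {'broken-page': 1, 'broken-file': 1} 1
```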
pdflinkcheck/cli.py CHANGED
@@ -1,10 +1,13 @@
+ #!/usr/bin/env python3
+ # SPDX-License-Identifier: MIT
  # src/pdflinkcheck/cli.py
+ from __future__ import annotations
  import typer
  from typing import Literal
  from typer.models import OptionInfo
  from rich.console import Console
  from pathlib import Path
- from pdflinkcheck.report import run_report # Assuming core logic moves here
+ from pdflinkcheck.report import run_report_and_call_exports # Assuming core logic moves here
  from typing import Dict, Optional, Union, List
  import pyhabitat
  import sys
@@ -13,35 +16,65 @@ from importlib.resources import files
 
  from pdflinkcheck.version_info import get_version_from_pyproject
  from pdflinkcheck.validate import run_validation
-
+ from pdflinkcheck.environment import is_in_git_repo, assess_default_pdf_library
+ from pdflinkcheck.io import get_first_pdf_in_cwd
 
  console = Console() # to be above the tkinter check, in case of console.print
 
+ # Force Rich to always enable colors, even when running from a .pyz bundle
+ os.environ["FORCE_COLOR"] = "1"
+ # Optional but helpful for full terminal feature detection
+ os.environ["TERM"] = "xterm-256color"
+
  app = typer.Typer(
  name="pdflinkcheck",
  help=f"A command-line tool for comprehensive PDF link analysis and reporting. (v{get_version_from_pyproject()})",
  add_completion=False,
  invoke_without_command = True,
  no_args_is_help = False,
+ context_settings={"ignore_unknown_options": True,
+ "allow_extra_args": True,
+ "help_option_names": ["-h", "--help"]},
  )
 
 
+ def debug_callback(value: bool):
+ #def debug_callback(ctx: typer.Context, value: bool):
+ if value:
+ # This runs IMMEDIATELY when --debug is parsed, even before --help
+ # 1. Access the list of all command-line arguments
+ full_command_list = sys.argv
+ # 2. Join the list into a single string to recreate the command
+ command_string = " ".join(full_command_list)
+ # 3. Print the command
+ typer.echo(f"command:\n{command_string}\n")
+ return value
+
+ if "--show-command" in sys.argv or "--debug" in sys.argv:
+ debug_callback(True)
+
  @app.callback()
- def main(ctx: typer.Context):
+ def main(ctx: typer.Context,
+ version: Optional[bool] = typer.Option(
+ None, "--version", is_flag=True, help="Show the version."
+ ),
+ debug: bool = typer.Option(
+ False, "--debug", is_flag=True, help="Enable verbose debug logging and echo the full command string."
+ ),
+ show_command: bool = typer.Option(
+ False, "--show-command", is_flag=True, help="Echo the full command string to the console before execution."
+ )
+ ):
  """
  If no subcommand is provided, launch the GUI.
  """
-
+ if version:
+ typer.echo(get_version_from_pyproject())
+ raise typer.Exit(code=0)
+
  if ctx.invoked_subcommand is None:
  gui_command()
  raise typer.Exit(code=0)
-
- # 1. Access the list of all command-line arguments
- full_command_list = sys.argv
- # 2. Join the list into a single string to recreate the command
- command_string = " ".join(full_command_list)
- # 3. Print the command
- typer.echo(f"command:\n{command_string}\n")
 
 
  # help-tree() command: fragile, experimental, defaults to not being included.
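`debug_callback` rebuilds the invocation with `" ".join(sys.argv)`. That join drops the quoting around arguments that contain spaces, so the echoed line cannot always be pasted back into a shell. If copy-pasteable output were the goal, the standard library's `shlex.join` (Python 3.8+) quotes such arguments. This is a swapped-in alternative, not what the diff does. `echo_command` is a hypothetical helper name.

```python
import shlex
import sys
from typing import List, Optional


def echo_command(argv: Optional[List[str]] = None) -> str:
    """Rebuild the command line so it can be pasted back into a POSIX shell."""
    args = sys.argv if argv is None else argv
    return shlex.join(args)


print(echo_command(["pdflinkcheck", "analyze", "My Report.pdf", "--debug"]))
# pdflinkcheck analyze 'My Report.pdf' --debug
```

`shlex` applies POSIX quoting rules. On Windows `cmd.exe` the single quotes would not be interpreted the same way.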
@@ -54,10 +87,10 @@ if os.environ.get('DEV_TYPER_HELP_TREE',0) in ('true','1'):
  @app.command(name="docs", help="Show the docs for this software.")
  def docs_command(
  license: Optional[bool] = typer.Option(
- None, "--license", "-l", help="Show the full AGPLv3 license text."
+ None, "--license", "-l", help="Show the LICENSE text."
  ),
  readme: Optional[bool] = typer.Option(
- None, "--readme", "-r", help="Show the full README.md content."
+ None, "--readme", "-r", help="Show the README.md content."
  ),
  ):
  """
@@ -70,12 +103,16 @@ def docs_command(
  console.print("[yellow]Please use either the --license or --readme flag.[/yellow]")
  return # Typer will automatically show the help message.
 
+ if is_in_git_repo():
+ """This is too aggressive. But we don't expect it often. Probably worth it."""
+ from pdflinkcheck.datacopy import ensure_data_files_for_build
+ ensure_data_files_for_build()
+
  # --- Handle --license flag ---
  if license:
  try:
  license_path = files("pdflinkcheck.data") / "LICENSE"
  license_text = license_path.read_text(encoding="utf-8")
-
  console.print(f"\n[bold green]=== GNU AFFERO GENERAL PUBLIC LICENSE V3+ ===[/bold green]")
  console.print(license_text, highlight=False)
 
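`docs` now calls `is_in_git_repo()`. The `tools --clear-cache` help text in this diff says that both it and `pymupdf_is_available()` are cached, so a cached `False` survives a later PyMuPDF install until the cache is cleared. The help text presents clearing as a separate command run after installing, which implies the answers persist between runs. The body of `pdflinkcheck/environment.py` is not shown, so the sketch below simply assumes a small JSON cache file. The file location, its format, and the names `cached_module_available` and `clear_cache_file` are all hypothetical. It uses `importlib.util.find_spec`, which locates a module without importing it.

```python
import importlib.util
import json
import tempfile
from pathlib import Path
from typing import Dict


def _load_cache(cache_file: Path) -> Dict[str, bool]:
    """Read the cached answers; a missing or corrupt file counts as empty."""
    try:
        data = json.loads(cache_file.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def cached_module_available(name: str, cache_file: Path) -> bool:
    """Check importability once, then reuse the stored answer on later runs."""
    cache = _load_cache(cache_file)
    key = f"module:{name}"
    if key not in cache:
        cache[key] = importlib.util.find_spec(name) is not None
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(json.dumps(cache), encoding="utf-8")
    return bool(cache[key])


def clear_cache_file(cache_file: Path) -> None:
    """Forget every stored answer, e.g. after installing an optional PDF engine."""
    cache_file.unlink(missing_ok=True)


cache_path = Path(tempfile.mkdtemp()) / "env_cache.json"
print(cached_module_available("json", cache_path))  # True
print(cache_path.exists())                          # True
clear_cache_file(cache_path)
print(cache_path.exists())                          # False
```

The same pattern would cover a repository check such as `is_in_git_repo()`: a stale stored answer wins until the file is removed, which is exactly the situation the `--clear-cache` help text describes.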
@@ -100,35 +137,47 @@ def docs_command(
  # Exit successfully if any flag was processed
  raise typer.Exit(code=0)
 
+ @app.command(name="tools", help= "Additional features, hamburger menu.")
+ def tools_command(
+ clear_cache: bool = typer.Option(
+ False,
+ "--clear-cache",
+ is_flag=True,
+ help="Clear the environment caches. \n - pymupdf_is_available() \n - is_in_git_repo() \nMain purpose: Run after adding PyMuPDF to an existing installation where it was previously missing, because pymupdf_is_available() would have been cached as False."
+ )
+ ):
+ from pdflinkcheck.environment import clear_all_caches
+ if clear_cache:
+ clear_all_caches()
+
  @app.command(name="analyze") # Added a command name 'analyze' for clarity
  def analyze_pdf( # Renamed function for clarity
- pdf_path: Path = typer.Argument(
- ...,
+ pdf_path: Optional[Path] = typer.Argument(
+ None,
  exists=True,
  file_okay=True,
  dir_okay=False,
  readable=True,
  resolve_path=True,
- help="The path to the PDF file to analyze."
+ help="Path to the PDF file to analyze. If omitted, searches current directory."
  ),
  export_format: Optional[Literal["JSON", "TXT", "JSON,TXT", "NONE"]] = typer.Option(
- "JSON",
- "--export-format","-e",
+ "JSON,TXT",
+ "--format","-f",
  case_sensitive=False,
  help="Export format. Use 'None' to suppress file export.",
  ),
- max_links: int = typer.Option(
- 0,
- "--max-links", "-m",
- min=0,
- help="Report brevity control. Use 0 to show all."
- ),
 
- pdf_library: Literal["pypdf", "pymupdf"] = typer.Option(
- "pypdf",#"pymupdf",
- "--pdf-library","-p",
+ pdf_library: Literal["auto","pdfium","pypdf", "pymupdf"] = typer.Option(
+ assess_default_pdf_library(),
+ "--engine","-e",
  envvar="PDF_ENGINE",
- help="Select PDF parsing library, pymupdf or pypdf.",
+ help="PDF parsing library. pypdf (pure Python), pymupdf (fast, AGPL3+ licensed), pdfium (fast, BSD-3 licensed).",
+ ),
+ print_bool: bool = typer.Option(
+ True,
+ "--print/--quiet",
+ help="Print or do not print the analysis and validation report to console."
  )
  ):
  """
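The `--format` option was renamed from `--export-format`, and its default is now `"JSON,TXT"`. It takes a comma-separated list. Further down in this diff, `analyze_pdf` upper-cases the entries and treats `NONE`, an empty value, or `"0"` as "no export". It then concatenates the valid names, so `"JSON,TXT"` becomes `"JSONTXT"`. Below is a standalone sketch of that normalization. `normalize_export_formats` is a hypothetical name. Unlike the CLI code, it also accepts `None`, which the `Optional[...]` annotation permits, and it returns the warn condition as a flag instead of printing.

```python
from typing import Optional, Tuple

SUPPORTED = ("JSON", "TXT")


def normalize_export_formats(export_format: Optional[str]) -> Tuple[str, bool]:
    """Return (combined format string, warn flag) following the CLI's rules."""
    if export_format is None:
        return "", False
    requested = [fmt.strip().upper() for fmt in export_format.split(",")]
    if "NONE" in requested or not export_format.strip() or export_format == "0":
        return "", False  # export explicitly suppressed
    valid = [f for f in requested if f in SUPPORTED]
    # Warn when something was requested but none of it is supported.
    return "".join(valid), not valid


print(normalize_export_formats("json, txt"))  # ('JSONTXT', False)
print(normalize_export_formats("None"))       # ('', False)
print(normalize_export_formats("xml"))        # ('', True)
```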
@@ -138,6 +187,11 @@ def analyze_pdf( # Renamed function for clarity
  • Internal GoTo links point to valid pages
  • Remote GoToR links point to existing files
  • TOC bookmarks target valid pages
+
+ Validates:
+ • Are referenced files available?
+ • Are the page numbers referenced by GoTo links within the length of the document?
+
  """
 
  """
@@ -149,118 +203,60 @@ def analyze_pdf( # Renamed function for clarity
149
203
 
150
204
  Env Var: If no flag is present, it checks PDF_ENGINE.
151
205
 
152
- Code Default: (Lowest priority) It falls back to "pypdf" as defined in your typer.Option.
206
+ Code Default: (Lowest priority) It falls back to "pypdf" as defined in typer.Option.
153
207
  """
154
208
 
209
+ if pdf_path is None:
210
+ pdf_path = get_first_pdf_in_cwd()
211
+ if pdf_path is None:
212
+ console.print("[red]Error: No PDF file provided and none found in current directory.[/red]")
213
+ raise typer.Exit(code=1)
214
+ console.print(f"[dim]No file specified — using: {Path(pdf_path).name}[/dim]")
215
+
216
+ pdf_path_str = str(pdf_path)
217
+
155
218
  VALID_FORMATS = ("JSON") # extend later
156
219
  requested_formats = [fmt.strip().upper() for fmt in export_format.split(",")]
157
220
  if "NONE" in requested_formats or not export_format.strip() or export_format == "0":
158
221
  export_formats = ""
159
222
  else:
160
223
  # Filter for valid ones: ("JSON", "TXT")
161
- # This allows "JSON,TXT" to become "JSONTXT" which your run_report logic can handle
224
+ # This allows "JSON,TXT" to become "JSONTXT" which run_report logic can handle
162
225
  valid = [f for f in requested_formats if f in ("JSON", "TXT")]
163
226
  export_formats = "".join(valid)
164
227
 
165
228
  if not valid and "NONE" not in requested_formats:
166
229
  typer.echo(f"Warning: No valid formats found in '{export_format}'. Supported: JSON, TXT.")
230
+
167
231
 
168
- run_report(
232
+ # The meat and potatoes
233
+ report_results = run_report_and_call_exports(
169
234
  pdf_path=str(pdf_path),
170
- max_links=max_links,
171
235
  export_format = export_formats,
172
236
  pdf_library = pdf_library,
237
+ print_bool = print_bool,
173
238
  )
174
239
 
- @app.command(name="validate")
- def validate_pdf(
- pdf_path: Optional[Path] = typer.Argument(
- None,
- exists=True,
- file_okay=True,
- dir_okay=False,
- readable=True,
- resolve_path=True,
- help="Path to the PDF file to validate. If omitted, searches current directory."
- ),
- export: bool = typer.Option(
- True,
- "--export",#"--no-export",
- help = "JSON export for validation check."
- ),
- pdf_library: Literal["pypdf", "pymupdf"] = typer.Option(
- "pypdf",
- "--library", "-l",
- envvar="PDF_ENGINE",
- help="PDF parsing engine: pypdf (pure Python) or pymupdf (faster, if available)"
- ),
- fail_on_broken: bool = typer.Option(
- False,
- "--fail",
- help="Exit with code 1 if any broken links are found (useful for CI)"
- )
- ):
- """
- Validate internal, remote, and TOC links in a PDF.
-
- 1. Call the run_report() function, like calling the 'analyze' CLI command.
- 2. Inspects the results from 'run_report():
- - Are referenced files available?
- - Are the page numbers referenced by GoTo links within the length of the document?
- """
- from pdflinkcheck.io import get_first_pdf_in_cwd
-
- if pdf_path is None:
- pdf_path = get_first_pdf_in_cwd()
- if pdf_path is None:
- console.print("[red]Error: No PDF file provided and none found in current directory.[/red]")
- raise typer.Exit(code=1)
- console.print(f"[dim]No file specified — using: {pdf_path.name}[/dim]")
-
- pdf_path_str = str(pdf_path)
-
- console.print(f"[bold]Validating links in:[/bold] {pdf_path.name}")
- console.print(f"[bold]Using engine:[/bold] {pdf_library}\n")
-
- # Step 1: Run analysis (quietly)
- report = run_report(
- pdf_path=pdf_path_str,
- max_links=0,
- export_format="",
- pdf_library=pdf_library,
- print_bool=False
- )
-
- if not report or not report.get("data"):
+ if not report_results or not report_results.get("data"):
  console.print("[yellow]No links or TOC found — nothing to validate.[/yellow]")
  raise typer.Exit(code=0)
 
- # Step 2: Run validation
- validation_results = run_validation(
- report_results=report,
- pdf_path=pdf_path_str,
- pdf_library=pdf_library,
- export_json=export,
- print_bool=True
- )
-
+ validation_results = report_results["data"]["validation"]
  # Optional: fail on broken links
- broken_count = validation_results["summary-stats"]["broken-page"] + validation_results["summary-stats"]["broken-file"]
- if fail_on_broken and broken_count > 0:
- console.print(f"\n[bold red]Validation failed:[/bold red] {broken_count} broken link(s) found.")
- raise typer.Exit(code=1)
- elif broken_count > 0:
- console.print(f"\n[bold yellow]Warning:[/bold yellow] {broken_count} broken link(s) found.")
- else:
- console.print(f"\n[bold green]Success:[/bold green] No broken links or TOC issues!")
+ broken_page_count = validation_results["summary-stats"]["broken-page"] + validation_results["summary-stats"]["broken-file"]
+
+ if broken_page_count > 0:
+ console.print(f"\n[bold yellow]Warning:[/bold yellow] {broken_page_count} broken link(s) found.")
+ #else:
+ # console.print(f"\n[bold green]Success:[/bold green] No broken links or TOC issues!\n")
 
- raise typer.Exit(code=0 if broken_count == 0 else 1)
+ raise typer.Exit(code=0 if broken_page_count == 0 else 1)
 
  @app.command(name="serve")
  def serve(
  host: str = typer.Option("0.0.0.0", "--host", "-h", help="Host to bind (use 0.0.0.0 for network access)"),
  port: int = typer.Option(8000, "--port", "-p", help="Port to listen on"),
- reload: bool = typer.Option(False, "--reload", help="Auto-reload on code changes (dev only)"),
+ reload: bool = typer.Option(False, "--reload", is_flag=True, help="Auto-reload on code changes (dev only)"),
  ):
  """
  Start the built-in web server for uploading and analyzing PDFs in the browser.
@@ -274,7 +270,7 @@ def serve(
  console.print(" → [yellow]Reload mode enabled[/yellow]")
 
  # Import here to avoid slow imports on other commands
- from pdflinkcheck.stdlib_server import ThreadedTCPServer, PDFLinkCheckHandler
+ from pdflinkcheck.stdlib_server_alt import ThreadedTCPServer, PDFLinkCheckHandler
  import socketserver
 
  try:
@@ -303,8 +299,6 @@ def gui_command(
  """
  Launch tkinter-based GUI.
  """
-
- # --- START FIX ---
  assured_auto_close_value = 0
 
  if isinstance(auto_close, OptionInfo):
@@ -316,11 +310,12 @@ def gui_command(
  # Case 2: Called explicitly by Typer (pdflinkcheck gui -c 3000)
  # Typer has successfully converted the command line argument, and auto_close is an int.
  assured_auto_close_value = int(auto_close)
- # --- END FIX ---
 
  if not pyhabitat.tkinter_is_available():
  _gui_failure_msg()
  return
+ #from pdflinkcheck.gui import start_gui
+ #from pdflinkcheck.gui_alt import start_gui
  from pdflinkcheck.gui import start_gui
  start_gui(time_auto_close = assured_auto_close_value)
 
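In the updated `validate` code, the results come from `report_results["data"]["validation"]` instead of a separate `run_validation()` call, and the exit code depends only on the `broken-page` and `broken-file` counters under `summary-stats` (despite its name, `broken_page_count` sums both). Below is a minimal sketch of that decision. It assumes only the dictionary shape visible in the diff; `broken_link_count` and `exit_code_for` are hypothetical helpers, not part of pdflinkcheck:

```python
from typing import Any, Mapping


def broken_link_count(validation: Mapping[str, Any]) -> int:
    """Sum the two counters the diff reads from validation["summary-stats"].

    Assumes the key names shown in the diff; the real results dict may hold more keys.
    """
    stats = validation["summary-stats"]
    return stats["broken-page"] + stats["broken-file"]


def exit_code_for(validation: Mapping[str, Any]) -> int:
    """Mirror `typer.Exit(code=0 if count == 0 else 1)`: 0 when clean, 1 otherwise."""
    return 0 if broken_link_count(validation) == 0 else 1


if __name__ == "__main__":
    # Hypothetical report, reduced to the keys the CLI code touches.
    report_results = {
        "data": {"validation": {"summary-stats": {"broken-page": 2, "broken-file": 1}}}
    }
    validation = report_results["data"]["validation"]
    print(broken_link_count(validation), exit_code_for(validation))  # prints: 3 1
```

Keeping the sum and the exit-code rule in small pure functions like these would let the CI-facing behaviour be tested without invoking the CLI.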
@@ -0,0 +1,51 @@
+ # I Have Questions.md
+
+ ## Subject matter:
+ How to create a graphical user interface.
+
+ ## Body:
+ When I was about 10 years old, I dug through 'C:/Program Files/' repeatedly in the hope of discovering how to make a pop-up window.
+ What defined the edges of an interface?
+ Why do some windows look different than others?
+ How can I add buttons?
+
+ I was excited. I wanted to make something.
+
+ Could I mimic code from software that was installed on my computer?
+ I checked each folder in 'C:/Program Files/' looking for clues.
+
+ As I searched, the questions changed.
+ What is a DLL?
+ Mostly, the only files I could open and inspect were little icons and fuzzy images. Why?
+
+ I gave up.
+ Wait. No, I didn't. I am here now.
+
+ Honestly, I still don't understand where the edges of the window come from.
+ The easy answer is: **libraries**.
+ Many people have done a lot of work to build various GUI libraries, to help people like me (and you) build software.
+
+ For this package, the application window is built with Tkinter, which is included in Python's standard library.
+ You can see how the graphical user interface (GUI) is defined at: https://raw.githubusercontent.com/City-of-Memphis-Wastewater/pdflinkcheck/main/src/pdflinkheck/gui.py
+
+ This gui.py file isn't perfect, but exploring it will be far more illuminating than trying to open a DLL file in Notepad.
+
+ This is not a recommendation to use Tkinter. I would recommend learning how to build a basic web-stack GUI that can be served locally.
+
+ You might not want to make classic interfaces.
+ It is what I grew up with, so I get a tickle when I participate in the tradition of local programs, but web and mobile are super valid.
+ If you want to make classic interfaces, you should learn about Tauri.
+ If you write core logic and then expose it in a way that’s friendly to the web, you can then use Tauri to wrap that web interface into something that feels native on your machine.
+ It sounds wild to go from a native core to web tech and back to native distribution, but it makes sense when you consider that:
+ - Web-stack interfaces (HTML, CSS, TS/JS) offer the most control and the best portability of graphics, and many people have built tools that you can leverage.
+ - Making your code accessible via web requests and/or an API will give it the widest possible reach.
+
+ Personally, I get really excited when my Python code runs smoothly on Windows, iOS, Linux, and, most importantly, as Linux on Android via Termux. Sure, if Android is a target, the same core can be packaged as an Android app and be more accessible. Why do I want Termux? Because it's about leveraging the machine. With code that runs on Termux, I can take any old Android phone out of a drawer and use it like I might use a Raspberry Pi. Tkinter will not run on Termux, at least not without proot. It is better to start a server on Termux and then view the app on localhost through your browser.
+
+ Links:
+ - https://docs.python.org/3/library/tkinter.html
+ - https://v2.tauri.app/start/
+ - https://pyo3.rs/main/doc/pyo3_ffi/index.html
+ - https://bheisler.github.io/post/calling-rust-in-python/
+
+ Copyright © 2025 George Clayton Bennett
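The closing advice is to start a server on Termux and then open the app on localhost in a browser. That pattern needs nothing beyond Python's standard library. Below is a minimal sketch that assumes a toy `HelloHandler` returning a fixed HTML page. It is a generic illustration, not the package's own `stdlib_server_alt` module, whose contents are not shown here, and `start_server` is a hypothetical helper name:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET with a tiny HTML page (a stand-in for a real app UI)."""

    def do_GET(self) -> None:
        body = b"<h1>Hello from this machine</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request logging to stderr


def start_server(host: str = "127.0.0.1", port: int = 8000) -> ThreadingHTTPServer:
    """Serve HelloHandler on a background daemon thread and return the server."""
    server = ThreadingHTTPServer((host, port), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    srv = start_server(port=0)  # port 0 lets the OS pick a free port
    url = f"http://127.0.0.1:{srv.server_address[1]}/"
    with urllib.request.urlopen(url) as resp:
        print(resp.status, resp.read().decode())
    srv.shutdown()
    srv.server_close()
```

On a phone, calling `start_server(port=8000)` and keeping the process alive would let a browser on the same device load `http://localhost:8000/`. Binding to `0.0.0.0` instead of `127.0.0.1` exposes the page to the local network, matching the `serve` command's `--host` help text.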