pdflinkcheck 1.1.94__py3-none-any.whl → 1.2.29__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (35)
  1. pdflinkcheck/__init__.py +88 -18
  2. pdflinkcheck/__main__.py +6 -0
  3. pdflinkcheck/analysis_pdfium.py +131 -0
  4. pdflinkcheck/{analyze_pymupdf.py → analysis_pymupdf.py} +99 -141
  5. pdflinkcheck/{analyze_pypdf.py → analysis_pypdf.py} +51 -39
  6. pdflinkcheck/cli.py +52 -48
  7. pdflinkcheck/data/LICENSE +18 -15
  8. pdflinkcheck/data/README.md +23 -25
  9. pdflinkcheck/data/pyproject.toml +17 -26
  10. pdflinkcheck/datacopy.py +16 -1
  11. pdflinkcheck/dev.py +2 -2
  12. pdflinkcheck/environment.py +14 -2
  13. pdflinkcheck/gui.py +346 -563
  14. pdflinkcheck/helpers.py +88 -0
  15. pdflinkcheck/io.py +24 -6
  16. pdflinkcheck/report.py +598 -97
  17. pdflinkcheck/security.py +189 -0
  18. pdflinkcheck/splash.py +38 -0
  19. pdflinkcheck/stdlib_server.py +7 -21
  20. pdflinkcheck/stdlib_server_alt.py +571 -0
  21. pdflinkcheck/tk_utils.py +188 -0
  22. pdflinkcheck/update_msix_version.py +2 -0
  23. pdflinkcheck/validate.py +104 -170
  24. pdflinkcheck/version_info.py +2 -2
  25. {pdflinkcheck-1.1.94.dist-info → pdflinkcheck-1.2.29.dist-info}/METADATA +41 -40
  26. {pdflinkcheck-1.1.94.dist-info → pdflinkcheck-1.2.29.dist-info}/RECORD +34 -27
  27. pdflinkcheck-1.2.29.dist-info/WHEEL +5 -0
  28. {pdflinkcheck-1.1.94.dist-info → pdflinkcheck-1.2.29.dist-info}/entry_points.txt +0 -1
  29. pdflinkcheck-1.2.29.dist-info/licenses/LICENSE +27 -0
  30. pdflinkcheck-1.2.29.dist-info/top_level.txt +1 -0
  31. pdflinkcheck/analyze_pypdf_v2.py +0 -217
  32. pdflinkcheck-1.1.94.dist-info/WHEEL +0 -4
  33. pdflinkcheck-1.1.94.dist-info/licenses/LICENSE +0 -24
  34. {pdflinkcheck-1.1.94.dist-info → pdflinkcheck-1.2.29.dist-info}/licenses/LICENSE-AGPL3 +0 -0
  35. {pdflinkcheck-1.1.94.dist-info → pdflinkcheck-1.2.29.dist-info}/licenses/LICENSE-MIT +0 -0
@@ -1,22 +1,20 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: pdflinkcheck
3
- Version: 1.1.94
3
+ Version: 1.2.29
4
4
  Summary: A purpose-built PDF link analysis and reporting tool with GUI and CLI.
5
- Author: George Clayton Bennett
6
5
  Author-email: George Clayton Bennett <george.bennett@memphistn.gov>
6
+ Maintainer-email: George Clayton Bennett <george.bennett@memphistn.gov>
7
7
  License-Expression: MIT AND AGPL-3.0-or-later
8
- License-File: LICENSE
9
- License-File: LICENSE-AGPL3
10
- License-File: LICENSE-MIT
8
+ Project-URL: Homepage, https://github.com/city-of-memphis-wastewater/pdflinkcheck
9
+ Project-URL: Repository, https://github.com/city-of-memphis-wastewater/pdflinkcheck
11
10
  Classifier: Programming Language :: Python :: 3
12
11
  Classifier: Programming Language :: Python :: 3 :: Only
12
+ Classifier: Programming Language :: Python :: 3.9
13
13
  Classifier: Programming Language :: Python :: 3.10
14
14
  Classifier: Programming Language :: Python :: 3.11
15
15
  Classifier: Programming Language :: Python :: 3.12
16
16
  Classifier: Programming Language :: Python :: 3.13
17
17
  Classifier: Programming Language :: Python :: 3.14
18
- Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
19
- Classifier: License :: OSI Approved :: MIT License
20
18
  Classifier: Operating System :: OS Independent
21
19
  Classifier: Intended Audience :: End Users/Desktop
22
20
  Classifier: Intended Audience :: Developers
@@ -31,18 +29,23 @@ Classifier: Environment :: MacOS X
31
29
  Classifier: Environment :: Win32 (MS Windows)
32
30
  Classifier: Typing :: Typed
33
31
  Classifier: Development Status :: 4 - Beta
34
- Requires-Dist: pyhabitat>=1.0.53
32
+ Requires-Python: >=3.9
33
+ Description-Content-Type: text/markdown
34
+ License-File: LICENSE
35
+ License-File: LICENSE-MIT
36
+ License-File: LICENSE-AGPL3
37
+ Requires-Dist: pyhabitat>=1.1.5
35
38
  Requires-Dist: pypdf>=6.4.2
36
39
  Requires-Dist: rich>=14.2.0
37
40
  Requires-Dist: typer>=0.20.0
38
- Requires-Dist: pymupdf>=1.26.7 ; extra == 'full'
39
- Maintainer: George Clayton Bennett
40
- Maintainer-email: George Clayton Bennett <george.bennett@memphistn.gov>
41
- Requires-Python: >=3.10
42
- Project-URL: Homepage, https://github.com/city-of-memphis-wastewater/pdflinkcheck
43
- Project-URL: Repository, https://github.com/city-of-memphis-wastewater/pdflinkcheck
41
+ Provides-Extra: mupdf
42
+ Requires-Dist: pymupdf<2.0.0,>=1.24.0; extra == "mupdf"
43
+ Provides-Extra: pdfium
44
+ Requires-Dist: pypdfium2<6.0.0,>=5.2.0; extra == "pdfium"
44
45
  Provides-Extra: full
45
- Description-Content-Type: text/markdown
46
+ Requires-Dist: pymupdf<2.0.0,>=1.24.0; extra == "full"
47
+ Requires-Dist: pypdfium2<6.0.0,>=5.2.0; extra == "full"
48
+ Dynamic: license-file
46
49
 
47
50
  # pdflinkcheck
48
51
 
@@ -50,7 +53,7 @@ A purpose-built tool for comprehensive analysis of hyperlinks and GoTo links wit
50
53
 
51
54
  -----
52
55
 
53
- ![Screenshot of the pdflinkcheck GUI](https://raw.githubusercontent.com/City-of-Memphis-Wastewater/pdflinkcheck/main/assets/pdflinkcheck_gui_v1.1.92.png)
56
+ ![Screenshot of the pdflinkcheck GUI](https://raw.githubusercontent.com/City-of-Memphis-Wastewater/pdflinkcheck/main/assets/pdflinkcheck_gui_v1.1.97.png)
54
57
 
55
58
  -----
56
59
 
@@ -65,7 +68,7 @@ For the most user-typical experience, download the single-file binary matching y
65
68
  | **File Type** | **Primary Use Case** | **Recommended Launch Method** |
66
69
  | :--- | :--- | :--- |
67
70
  | **Executable (.exe, .elf)** | **GUI** | Double-click the file. |
68
- | **PYZ (Python Zip App)** | **CLI** or **GUI** | Run using your system's `python` command: `python pdflinkcheck-VERSION.pyz --help` |
71
+ | **PYZ (Python Zip App)** | **CLI** or **GUI** | Run using your system's `python` command: `python pdflinkcheck-VERSION.pyz --help` |
69
72
 
70
73
  ### Installation via pipx
71
74
 
@@ -99,7 +102,7 @@ Ways to launch the GUI interface:
99
102
  The core functionality is accessed via the `analyze` command.
100
103
 
101
104
  `pdflinkcheck --help`:
102
- ![Screenshot of the pdflinkcheck CLI Tree Help](https://raw.githubusercontent.com/City-of-Memphis-Wastewater/pdflinkcheck/main/assets/pdflinkcheck_cli_v1.1.92.png)
105
+ ![Screenshot of the pdflinkcheck CLI Tree Help](https://raw.githubusercontent.com/City-of-Memphis-Wastewater/pdflinkcheck/main/assets/pdflinkcheck_cli_v1.1.97.png)
103
106
 
104
107
 
105
108
  See the Help Tree by unlocking the help-tree CLI command, using the DEV_TYPER_HELP_TREE env var.
@@ -109,7 +112,7 @@ DEV_TYPER_HELP_TREE=1 pdflinkcheck help-tree` # bash
109
112
  $env:DEV_TYPER_HELP_TREE = "1"; pdflinkcheck help-tree` # PowerShell
110
113
  ```
111
114
 
112
- ![Screenshot of the pdflinkcheck CLI Tree Help](https://raw.githubusercontent.com/City-of-Memphis-Wastewater/pdflinkcheck/main/assets/pdflinkcheck_cli_v1.1.92_tree_help.png)
115
+ ![Screenshot of the pdflinkcheck CLI Tree Help](https://raw.githubusercontent.com/City-of-Memphis-Wastewater/pdflinkcheck/main/assets/pdflinkcheck_cli_v1.1.97_tree_help.png)
113
116
 
114
117
 
115
118
 
@@ -130,7 +133,6 @@ $env:DEV_TYPER_HELP_TREE = "1"; pdflinkcheck help-tree` # PowerShell
130
133
  |`<PDF_PATH>`|**Required.** The path to the PDF file to analyze.|N/A|
131
134
  |`--pdf-library / -p`|Select engine: `pymupdf` or `pypdf`.|`pypdf`|
132
135
  |`--export-format / -e`|Export to `JSON`, `TXT`, or `None` to suppress file output.|`JSON`|
133
- |`--max-links / -m`|Maximum links to display per section. Use `0` for all.|`0`|
134
136
 
135
137
  ### `gui` Command Options
136
138
 
@@ -144,9 +146,6 @@ $env:DEV_TYPER_HELP_TREE = "1"; pdflinkcheck help-tree` # PowerShell
144
146
  # Analyze a document, show all links, and save the report as JSON and TXT
145
147
  pdflinkcheck analyze "TE Maxson WWTF O&M Manual.pdf" --export-format JSON,TXT
146
148
 
147
- # Analyze a document but keep the print block short, showing only the first 10 links for each type
148
- pdflinkcheck analyze "TE Maxson WWTF O&M Manual.pdf" --max-links 10
149
-
150
149
  # Show the GUI for only a moment, like in a build check
151
150
  pdflinkcheck gui --auto-close 3000
152
151
 
@@ -158,22 +157,23 @@ pdflinkcheck docs --license --readme
158
157
 
159
158
  ## 📦 Library Access (Advanced)
160
159
 
161
- For developers importing `pdflinkcheck` into other Python projects, the core analysis functions are exposed directly in the root namespace:
160
+ For developers importing `pdflinkcheck` into other Python projects, the core analysis functions are exposed directly in the root namespace. Each `analyze_pdf_*` function uses a different library to extract the target PDF's internal TOC, external links, and metadata.
162
161
 
163
- |**Function**|**Description**|
164
- |---|---|
165
- |`run_report()`|**(Primary function)** Performs the full analysis, prints to console, and handles file export.|
166
- |`extract_links_pynupdf()`|Function to retrieve all explicit links (URIs, GoTo, etc.) from a PDF path.|
167
- |`extract_toc_pymupdf()`|Function to extract the PDF's internal Table of Contents (bookmarks/outline).|
168
- |`extract_links_pynupdf()`|Function to retrieve all explicit links (URIs, GoTo, etc.) from a PDF path, using the pypdf library.|
169
- |`extract_toc_pymupdf()`|Function to extract the PDF's internal Table of Contents (bookmarks/outline), using the pypdf library.|
162
+ |**Function**|**Library**|**Description**|
163
+ |---|---|---|
164
+ |`run_report()`|pdflinkcheck | **(Primary function)** Performs the full analysis, prints to console, and handles file export.|
165
+ |`analyze_pdf_pdfium()`| pypdfium2 | Fast, ~10 MB, permissively licensed |
166
+ |`analyze_pdf_pymupdf()`| PyMuPDF | Fast, ~30 MB, AGPL3+ licensed |
167
+ |`analyze_pdf_pypdf()`| pypdf | Slow, ~2 MB, permissively licensed |
170
168
 
171
169
  Example:
172
170
 
173
171
  ```python
174
- from pdflinkcheck.report import run_report
175
- from pdflinkcheck.analysis_pymupdf import extract_links_pymupdf, extract_toc_pymupdf 130 from pdflinkcheck.analysis_pymupdf import extract_links_pynupdf, extract_toc_pymupdf
176
- from pdflinkcheck.analysis_pypdf import extract_links_pypdf, extract_toc_pypdf
172
+ from pdflinkcheck import ( run_report,
173
+ analyze_pdf_pymupdf,
174
+ analyze_pdf_pypdf,
175
+ analyze_pdf_pdfium,
176
+ )
177
177
 
178
178
  file = "document1.pdf"
179
179
  report_data = run_report(file)
@@ -240,24 +240,24 @@ Termux compatibility is important in the modern age, because Android devices are
240
240
  Android is the most common operating system in the Global South.
241
241
  We aim to produce stable software that can do the most possible good.
242
242
 
243
- Now `pdflinkcheck` can run on Termux by using the `pypdf` engine.
243
+ Now `pdflinkcheck` can run on Termux by using the `pypdf` engine and the `pdfium` engine.
244
244
  Benefits:
245
245
  - `pypdf`-only artifacts, to reduce size to about 6% compared to artifacts that include `PyMuPDF`.
246
246
  - Web-stack GUI as an alternative to the Tkinter GUI, which can be run locally on Termux or as a web app.
247
247
 
248
248
 
249
249
  ### PDF Library Selection
250
- At long last, `PyMuPDF` is an optional dependency. All testing comparing `pyp df` and `PyMuPDF` has shown identical validation performance. However `PyMuPDF` is much faster. The benfit of `pypdf` is small size of packages and cross-platform compatibility.
250
+ At long last, `PyMuPDF` is an optional dependency. All testing comparing `pypdf` and `PyMuPDF` has shown identical validation performance. However, `PyMuPDF` is much faster. The benefit of `pypdf` is its small package size and cross-platform compatibility. We have recently added a PDFium option, which avoids the AGPL3+ licensing obligations.
251
251
 
252
252
  Expect that all binaries and artifacts contain PyMuPDF, unless they are built on Android. The GUI and CLI interfaces both allow selection of the library; if PyMuPDF is selected but is not available, the user will be warned.
253
253
 
254
254
  To install the complete version use one of these options:
255
255
 
256
256
  ```bash
257
- pip install "pdflinkcheck[full]"
258
- pipx install "pdflinkcheck[full]"
259
- uv tool install "pdflinkcheck[full]"
260
- uv add "pdflinkcheck[full]"
257
+ pip install "pdflinkcheck[mupdf]"
258
+ pipx install "pdflinkcheck[pdfium]"
259
+ uv tool install "pdflinkcheck[pdfium]"
260
+ uv add "pdflinkcheck[pdfium]"
261
261
  ```
262
262
 
263
263
  ---
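The README text in this hunk says the user is warned when PyMuPDF is selected but not installed. A minimal sketch of that kind of check follows. The helper names `available_engines` and `pick_engine` are hypothetical and are not taken from pdflinkcheck; the import names (`pymupdf` or `fitz`, `pypdfium2`, `pypdf`) are assumptions based on how those libraries are normally imported.

```python
from importlib.util import find_spec

# Engine name -> candidate importable module names (assumed, not read from
# pdflinkcheck). PyMuPDF has been importable as both "pymupdf" and "fitz".
ENGINE_MODULES = {
    "pymupdf": ("pymupdf", "fitz"),
    "pdfium": ("pypdfium2",),
    "pypdf": ("pypdf",),
}


def available_engines(probe=find_spec):
    """Return the engine names whose backing module can be located."""
    found = []
    for engine, modules in ENGINE_MODULES.items():
        if any(probe(name) is not None for name in modules):
            found.append(engine)
    return found


def pick_engine(requested, available, fallback="pypdf"):
    """Return (engine, warning); warning is None when the request is usable."""
    if requested in available:
        return requested, None
    return fallback, f"{requested} is not installed; falling back to {fallback}"
```

Passing `probe` as a parameter keeps the lookup testable without installing or uninstalling any of the optional extras.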
@@ -317,6 +317,7 @@ The source code of pdflinkcheck itself remains licensed under the **MIT License*
317
317
  Links:
318
318
  - Source code: https://github.com/City-of-Memphis-Wastewater/pdflinkcheck/
319
319
  - PyMuPDF source code: https://github.com/pymupdf/PyMuPDF/
320
+ - pypdfium2 source code: https://github.com/pypdfium2-team/pypdfium2
320
321
  - pypdf source code: https://github.com/py-pdf/pypdf/
321
322
  - AGPLv3 text (FSF): https://www.gnu.org/licenses/agpl-3.0.html
322
323
  - MIT License text: https://opensource.org/license/mit
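The METADATA changes above introduce two new extras (`mupdf`, `pdfium`) next to the existing `full` extra. As a reading aid, the sketch below groups the `Requires-Dist` values from the 1.2.29 side of this diff by extra. The function name `group_by_extra` is illustrative, and the marker parsing is a deliberately simple regular expression that assumes markers of the form `extra == "name"`; it is not a full environment-marker parser.

```python
import re
from collections import defaultdict

# Requires-Dist values copied from the 1.2.29 METADATA shown in this diff.
REQUIRES_DIST = [
    "pyhabitat>=1.1.5",
    "pypdf>=6.4.2",
    "rich>=14.2.0",
    "typer>=0.20.0",
    'pymupdf<2.0.0,>=1.24.0; extra == "mupdf"',
    'pypdfium2<6.0.0,>=5.2.0; extra == "pdfium"',
    'pymupdf<2.0.0,>=1.24.0; extra == "full"',
    'pypdfium2<6.0.0,>=5.2.0; extra == "full"',
]

# Matches only the simple marker form: extra == "name" (or single quotes).
_EXTRA = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def group_by_extra(requirements):
    """Map each extra name (None for core dependencies) to its requirements."""
    groups = defaultdict(list)
    for line in requirements:
        spec, _, marker = line.partition(";")
        match = _EXTRA.search(marker)
        groups[match.group(1) if match else None].append(spec.strip())
    return dict(groups)
```

Read this way, `full` is the union of `mupdf` and `pdfium`, and the core install depends only on `pyhabitat`, `pypdf`, `rich`, and `typer`.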
@@ -1,11 +1,28 @@
1
- pdflinkcheck/__init__.py,sha256=FYU-nvPd05mmzb2SVDEQ3VZjXo2R9ywivayH8TIwSiY,2369
2
- pdflinkcheck/analyze_pymupdf.py,sha256=Q-FHm0yhy_zRRKQiN0cXGYo5XbI5kA8wH_pRunKFF2I,12612
3
- pdflinkcheck/analyze_pypdf.py,sha256=RgAppxxdPtyjZZMsqHTmMOsZViC39kabMgaMQWBW_p0,7093
4
- pdflinkcheck/analyze_pypdf_v2.py,sha256=-NqQA0jkVodA6_b1GseaTyzHuLq4F6asFm5RWzYgzjM,7524
5
- pdflinkcheck/cli.py,sha256=BO8dmS4ZinsW1Ah406qRnGFHWUG0H_Zc_cfyYysSH_A,12776
1
+ pdflinkcheck/__init__.py,sha256=HOHHgzLwn3K_879klj4KiOdhyaFskcZFwKtX5l5XDX4,4441
2
+ pdflinkcheck/__main__.py,sha256=o2FpnQjrUbKe3oCt2LiSYsvEUfCjLSTPkxeNw9Oe2Hc,137
3
+ pdflinkcheck/analysis_pdfium.py,sha256=yGidq3xP9GZEZ9K0RBStNCMhq9k48PVbNQ0moMWbsCY,5077
4
+ pdflinkcheck/analysis_pymupdf.py,sha256=wY9V8NHm3o0puJ1pbZb4rfmknthshOXMqfyv2ex9S7E,10620
5
+ pdflinkcheck/analysis_pypdf.py,sha256=Av8FY05HSdLu7OhsilTTAYZT5BCBxI_dlAzRBlZnDXs,7481
6
+ pdflinkcheck/cli.py,sha256=n-uXUdUvG4KCJFRMGvmY95FeszOuwDZOsqvIgClKd7w,13365
7
+ pdflinkcheck/datacopy.py,sha256=jgVNzHwHsVp0L3fXJP3rR02CUQR96E3fGVvGWz9ZF44,3052
8
+ pdflinkcheck/dev.py,sha256=T6KlZYeC8yg61SshSRVD0Ja2VvrhQCIXaXjBIWYd4h4,3928
9
+ pdflinkcheck/environment.py,sha256=cz6J7ymfs3rnsgPucr7Hd5U9My5lkKQ2VPgPFPrLtIY,2617
10
+ pdflinkcheck/gui.py,sha256=aueTrjKe9mZPyJmpRTdBUeNB5kHfWIVjD7puZ6Q6Y9s,19918
11
+ pdflinkcheck/helpers.py,sha256=oDYm1p9cswee6DrJFhWfs3rpQXIFcVjfqmV8ZB4QcrA,3780
12
+ pdflinkcheck/io.py,sha256=pZPmzEMHiuqA0bPp-oPjlBNkjA996gfuZ4NZBH-8Eio,7980
13
+ pdflinkcheck/report.py,sha256=flVscJ-MMnaw4U1-woFy9dUKMjA3KpSs8gRegS7eVZY,33591
14
+ pdflinkcheck/security.py,sha256=U6U3_3utDg4lzM668c6OzDVujoo6EauHq1-JElU9_Nc,6157
15
+ pdflinkcheck/splash.py,sha256=u_2l6LpFpjTJianNiyY8VBAuW7bWPfq4qsKhPWRid48,1203
16
+ pdflinkcheck/stdlib_server.py,sha256=sKKgeHfFYlW5yAoXy4iDgMzb4Ho_rBV1OPHgS_DWZZ0,5940
17
+ pdflinkcheck/stdlib_server_alt.py,sha256=ixpBSd95e2a9dKE38FOCM8EtulVkpUqLWWa4oAAN6YY,16322
18
+ pdflinkcheck/tk_utils.py,sha256=8Y8oH-I9WNWUPUyrHlWG-GZacrQ_hn3qbZWzfN08pko,6795
19
+ pdflinkcheck/update_msix_version.py,sha256=RMxhpQOiieQFUF_SyCb7a5UpYF792nDgsHAP7bUAwo8,1629
20
+ pdflinkcheck/validate.py,sha256=H2bZopzcb5KPqdkahPT8Cg58WzSIQwdunT92JnXvdJA,11914
21
+ pdflinkcheck/version_info.py,sha256=h47z3TRAGXiahiPyBH2MqOXUO7x3c-iwvHV-wB_b2P0,3296
6
22
  pdflinkcheck/data/I Have Questions.md,sha256=lurBwKxUijxysq5qsUrLgzFpaSrZwe86KjbPMwCofvY,3152
7
- pdflinkcheck/data/LICENSE,sha256=gWkQJzBjRDg-GK4tbaw4_V-W1VJRbzx3Qx4i1_fWIrE,1718
8
- pdflinkcheck/data/README.md,sha256=bP4RqOxKrxUmiZY4-GOvYQZjQY75wkPFBD0bZFCHsCM,11549
23
+ pdflinkcheck/data/LICENSE,sha256=npi0VqlkL0AlE0FsGEIOXxMqXlmH4toZfBHesJpP0lA,1866
24
+ pdflinkcheck/data/README.md,sha256=ikWkKpHkkE2cC13St6fyKSxGs9JQ05QmLzOUTIPzUOg,11235
25
+ pdflinkcheck/data/pyproject.toml,sha256=8IGKmhfhdqxh-_mmVEg9_rTaxw_oQ8H9JL-zfJe2pOE,2732
9
26
  pdflinkcheck/data/icons/BoxArt-1080x1080.png,sha256=ifiadgqnq-23heD9_jEciuR1fOm-iVW9gU1Uj_2TRUA,140431
10
27
  pdflinkcheck/data/icons/Logo-150x150.png,sha256=qEjDoeD-fSV5AmcWoYRhs1cD7o95A7ckvqjooxIqrQY,6777
11
28
  pdflinkcheck/data/icons/Logo-300x300.png,sha256=o9zzwjFwKh7NVOilcQULMDwOYvy3EL8Afz_JpRtYhes,19479
@@ -16,7 +33,8 @@ pdflinkcheck/data/icons/SplashScreen-620x300.png,sha256=tc02X1Ykp7LrPrdjASPT6_Jj
16
33
  pdflinkcheck/data/icons/StoreLogo-50x50.png,sha256=9XpM96NWn0p0X5nXlsOdYW2Djbd_n48iK0aWVLcKaM8,1424
17
34
  pdflinkcheck/data/icons/WideLogo-310x150.png,sha256=qEjDoeD-fSV5AmcWoYRhs1cD7o95A7ckvqjooxIqrQY,6777
18
35
  pdflinkcheck/data/icons/red_pdf_512px.ico,sha256=GxrliQQkb7DbGG3DmAzVpYKjP75C8E6Dixtaw36vH94,67646
19
- pdflinkcheck/data/pyproject.toml,sha256=JtWnzstpx8vhNTrKrftdzJ8Ie9XelLGywkotu8RT5T4,2916
36
+ pdflinkcheck/data/themes/forest/forest-dark.tcl,sha256=sMUVZOLAkU9O67IQ8pqfLIXsM_zxYJIVTe6xhqYv0hQ,18926
37
+ pdflinkcheck/data/themes/forest/forest-light.tcl,sha256=_HKSSTECkNYg79m7KLvGugv8dcun2_cqpdr9k88uK9U,19218
20
38
  pdflinkcheck/data/themes/forest/forest-dark/border-accent-hover.png,sha256=GUqCh-r5m8S07gf8n9glMVRG4Wjz7Ih_pqz5rc2hA_g,385
21
39
  pdflinkcheck/data/themes/forest/forest-dark/border-accent.png,sha256=q4y0SyxR6fuAxgKiFZubGodg72yZ30ZXlS0XZQ1m0Yw,389
22
40
  pdflinkcheck/data/themes/forest/forest-dark/border-basic.png,sha256=xZc7CJx8cI4aKQ6HvRW3PJuH5X6WdUwQdOisaHd4nwE,333
@@ -85,7 +103,6 @@ pdflinkcheck/data/themes/forest/forest-dark/up.png,sha256=rYXTZXXxHpLqrnKV_PLukf
85
103
  pdflinkcheck/data/themes/forest/forest-dark/vert-accent.png,sha256=ByAmCAafDeylS7_FHpqX9DZ4ZFnPO-TqUl9Mh0fCPmc,158
86
104
  pdflinkcheck/data/themes/forest/forest-dark/vert-basic.png,sha256=-oOfurWXJNN7TSy_VEu6t-o9IMojg-1gmyGYjOk-Rwg,158
87
105
  pdflinkcheck/data/themes/forest/forest-dark/vert-hover.png,sha256=ran_87NoN4U9dSLXft-cO6XyKnP2whAgqUj7E6N4DKk,158
88
- pdflinkcheck/data/themes/forest/forest-dark.tcl,sha256=sMUVZOLAkU9O67IQ8pqfLIXsM_zxYJIVTe6xhqYv0hQ,18926
89
106
  pdflinkcheck/data/themes/forest/forest-light/border-accent-hover.png,sha256=G1F4TVltT88-xdqJgvek_v0QZ_RCNFJsI0hJgx5DvcA,445
90
107
  pdflinkcheck/data/themes/forest/forest-light/border-accent.png,sha256=b-r7W6ipGGzWCG_sZHyuoxZeLA7iLY5NAqNR2zEdF7M,463
91
108
  pdflinkcheck/data/themes/forest/forest-light/border-basic.png,sha256=aKludTEo35p98yt77J5vVhh4qy8QXPEipgz3a75Pwzw,311
@@ -156,21 +173,11 @@ pdflinkcheck/data/themes/forest/forest-light/up.png,sha256=qNsplPHruJhHV0mJDaomk
156
173
  pdflinkcheck/data/themes/forest/forest-light/vert-accent.png,sha256=Eyj8nUQHnEzxNVFhMv7FH_Ezn9pOt2Xe0-Aquh8_9YY,158
157
174
  pdflinkcheck/data/themes/forest/forest-light/vert-basic.png,sha256=tx9KswFqH_-ImUrm2M8cIjsBQKGupa2SJqjVjHCi9LM,158
158
175
  pdflinkcheck/data/themes/forest/forest-light/vert-hover.png,sha256=BjEilGWGjr0ulvCfmFiBmS-8Dr1jSQjt4CUkxS1e2h4,158
159
- pdflinkcheck/data/themes/forest/forest-light.tcl,sha256=_HKSSTECkNYg79m7KLvGugv8dcun2_cqpdr9k88uK9U,19218
160
- pdflinkcheck/datacopy.py,sha256=BobiRj4VhYUBCl57heY3g8fFPdhKSWzSPyk0YAZ24d4,2558
161
- pdflinkcheck/dev.py,sha256=dTHND_18kZx53SfYIhdOXKNaVdzRnZ-ed7uP9V8_NZo,3894
162
- pdflinkcheck/environment.py,sha256=gMGNqQpgM3tS4JaIZ7EclTVD7Xa5MNBJePtJ-fGi7m4,2146
163
- pdflinkcheck/gui.py,sha256=8tZggSRKw9w4mXL3hZKhCcoEnMwlMeCqlz2HzB4Hhsk,30074
164
- pdflinkcheck/io.py,sha256=XjB4yxLDRlB6JtroErzUOvWPgkZ4E7XmTxXnUC4VcXc,7352
165
- pdflinkcheck/report.py,sha256=T5EgSd9IC44KL4-NGjeRinLIy9vdcsbbOU-JS57LV4M,14239
166
- pdflinkcheck/stdlib_server.py,sha256=KJ4fP0d98qXJn2Ym3mjHhPbXLVsZM0yX2m-_coIjBoA,6455
167
- pdflinkcheck/update_msix_version.py,sha256=gH2vbX_nPMTz0IHo4spwwsrvam3ZgkF7_HFPi1Mo8z4,1552
168
- pdflinkcheck/validate.py,sha256=wytRPNQdSVOjWGywnR6r62qu2vSmyzx7HzLxOg-pGFg,14415
169
- pdflinkcheck/version_info.py,sha256=9c6b1kRyu0ksIpslhdNA0GDmQd5TsbcNEe0e5SXwZtY,3261
170
- pdflinkcheck-1.1.94.dist-info/licenses/LICENSE,sha256=gWkQJzBjRDg-GK4tbaw4_V-W1VJRbzx3Qx4i1_fWIrE,1718
171
- pdflinkcheck-1.1.94.dist-info/licenses/LICENSE-AGPL3,sha256=hIahDEOTzuHCU5J2nd07LWwkLW7Hko4UFO__ffsvB-8,34523
172
- pdflinkcheck-1.1.94.dist-info/licenses/LICENSE-MIT,sha256=hNjSMpr8jpZQzBj9tHKafry8bW1kfLnl-DxiSfw7Lyw,1127
173
- pdflinkcheck-1.1.94.dist-info/WHEEL,sha256=ZyFSCYkV2BrxH6-HRVRg3R9Fo7MALzer9KiPYqNxSbo,79
174
- pdflinkcheck-1.1.94.dist-info/entry_points.txt,sha256=OJs4WkAziNGSoZ2KP0FgYOj2JdL6EW8UphJebWJnz3c,55
175
- pdflinkcheck-1.1.94.dist-info/METADATA,sha256=nsnSp0FW4UziHfBezniQJ_2gjmuidyEUJxm0h-1t10E,13552
176
- pdflinkcheck-1.1.94.dist-info/RECORD,,
176
+ pdflinkcheck-1.2.29.dist-info/licenses/LICENSE,sha256=npi0VqlkL0AlE0FsGEIOXxMqXlmH4toZfBHesJpP0lA,1866
177
+ pdflinkcheck-1.2.29.dist-info/licenses/LICENSE-AGPL3,sha256=hIahDEOTzuHCU5J2nd07LWwkLW7Hko4UFO__ffsvB-8,34523
178
+ pdflinkcheck-1.2.29.dist-info/licenses/LICENSE-MIT,sha256=hNjSMpr8jpZQzBj9tHKafry8bW1kfLnl-DxiSfw7Lyw,1127
179
+ pdflinkcheck-1.2.29.dist-info/METADATA,sha256=37Kicn3cP9WUYoBnudSyIVytLqfsXAD2UZph9Qhol7M,13317
180
+ pdflinkcheck-1.2.29.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
181
+ pdflinkcheck-1.2.29.dist-info/entry_points.txt,sha256=cZaB_inIfr2X9lxMo1RhZr4602F3nTjTm3cXquzfw3Q,54
182
+ pdflinkcheck-1.2.29.dist-info/top_level.txt,sha256=WdBg8l6l3TF1HQDpR_PwSmBCSu5atKWFnPfNbRNwrME,13
183
+ pdflinkcheck-1.2.29.dist-info/RECORD,,
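Each RECORD row above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the SHA-256 hash with trailing `=` padding removed. The following sketch shows how one row could be checked against file contents. The function names are illustrative, and the code assumes only `sha256` rows, which is all this RECORD contains.

```python
import base64
import csv
import hashlib


def record_digest(data: bytes) -> str:
    """Format a SHA-256 digest the way wheel RECORD files write it."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=")
    return "sha256=" + encoded.decode("ascii")


def check_record_line(line: str, data: bytes) -> bool:
    """Compare one 'path,hash,size' RECORD row against the file's bytes."""
    path, expected_hash, size = next(csv.reader([line]))
    if not expected_hash:
        # RECORD lists itself with empty hash and size fields.
        return True
    return expected_hash == record_digest(data) and int(size) == len(data)
```

Using `csv.reader` rather than `str.split(",")` matters for paths that contain commas, which RECORD quotes in CSV style.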
@@ -0,0 +1,5 @@
1
+ Wheel-Version: 1.0
2
+ Generator: setuptools (80.9.0)
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
5
+
@@ -1,3 +1,2 @@
1
1
  [console_scripts]
2
2
  pdflinkcheck = pdflinkcheck.cli:app
3
-
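The only change to `entry_points.txt` is the removal of a trailing blank line; the console script still maps the `pdflinkcheck` command to the `app` object in `pdflinkcheck.cli`. As an illustration of how that mapping reads, the sketch below parses the file content with the standard library. The helper name `console_scripts` is hypothetical, and nothing is imported from pdflinkcheck itself.

```python
import configparser

# Content of entry_points.txt as shown in this diff.
ENTRY_POINTS_TXT = """\
[console_scripts]
pdflinkcheck = pdflinkcheck.cli:app
"""


def console_scripts(text):
    """Return {command: (module, attribute)} from entry_points.txt content."""
    parser = configparser.ConfigParser(delimiters=("=",))
    parser.optionxform = str  # keep command names case-sensitive
    parser.read_string(text)
    scripts = {}
    for command, target in parser["console_scripts"].items():
        module, _, attr = target.partition(":")
        scripts[command] = (module.strip(), attr.strip())
    return scripts
```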
@@ -0,0 +1,27 @@
1
+ Some distributed binaries of this project include the PyMuPDF library, which is licensed under **AGPL-3.0-or-later**.
2
+ Any binary that incorporates PyMuPDF is therefore distributed under **AGPL-3.0-or-later**.
3
+ Other binaries use only the `pypdf` library and do not include PyMuPDF; these binaries are distributed under the **MIT License**.
4
+
5
+ For AGPL-licensed binaries, the complete corresponding source code must be made available to anyone who possesses a copy, upon request.
6
+ This obligation applies only to recipients of those binaries, and hosting the source code in GitHub Releases satisfies this requirement.
7
+
8
+ A binary becomes AGPL-licensed only when built with the optional `"full"` dependency group (as defined in `pyproject.toml` under `[project.optional-dependencies]`) or when PyMuPDF is otherwise included in the build environment.
9
+ The **source code of pdflinkcheck itself** remains licensed under the **MIT License**; only the distributed binary becomes AGPL-licensed when PyMuPDF is included.
10
+
11
+ Source code for each released version is available in the `pdflinkcheck-VERSION.tar.gz` files on the project's GitHub Releases page.
12
+
13
+ Full-text copies of **LICENSE-MIT** and **LICENSE-AGPL3** are included in the root of the repository.
14
+
15
+ **Links:**
16
+ - Project source code: https://github.com/City-of-Memphis-Wastewater/pdflinkcheck
17
+ - PyMuPDF source code: https://github.com/pymupdf/PyMuPDF
18
+ - pypdfium2 source code: https://github.com/pypdfium2-team/pypdfium2
19
+ - PDFium source code: https://pdfium.googlesource.com/pdfium/
20
+ - pypdf source code: https://github.com/py-pdf/pypdf
21
+ - AGPLv3 text (FSF): https://www.gnu.org/licenses/agpl-3.0.html
22
+ - MIT License text: https://opensource.org/license/mit
23
+ - BSD-3 License text: https://opensource.org/license/bsd-3-clause
24
+ - Apache-v2 License text: https://opensource.org/license/apache-2-0
25
+
26
+
27
+ Copyright 2025 George Clayton Bennett
@@ -0,0 +1 @@
1
+ pdflinkcheck
@@ -1,217 +0,0 @@
1
- #!/usr/bin/env python3
2
- # SPDX-License-Identifier: MIT
3
- # src/pdflinkcheck/analyze_pypdf_v2.py
4
- import sys
5
- from pathlib import Path
6
- import logging
7
- from typing import Dict, Any, List
8
-
9
- from pypdf import PdfReader
10
- from pypdf.generic import Destination, NameObject, IndirectObject
11
-
12
- """
13
- Inspect target PDF for both URI links and GoTo links, using only pypdf (no PyMuPDF/Fitz).
14
- Fully fixed and improved version as of December 2025 (compatible with pypdf >= 4.0).
15
- """
16
-
17
- def get_anchor_text_pypdf(page, rect) -> str:
18
- """
19
- Extracts text that falls within or near the link's bounding box using a visitor function.
20
- This is a reliable pure-pypdf method for associating visible text with a link annotation.
21
- """
22
- if not rect:
23
- return "N/A: Missing Rect"
24
-
25
- # PDF coordinates: bottom-left origin. Rect is [x0, y0, x1, y1]
26
- # Standardize Rect: [x_min, y_min, x_max, y_max]
27
- # Some PDF generators write Rect as [x_max, y_max, x_min, y_min]
28
- x_min, y_min, x_max, y_max = rect[0], rect[1], rect[2], rect[3]
29
- if x_min > x_max: x_min, x_max = x_max, x_min
30
- if y_min > y_max: y_min, y_max = y_max, y_min
31
-
32
- parts: List[str] = []
33
-
34
- def visitor_body(text: str, cm, tm, font_dict, font_size):
35
- # tm[4] and tm[5] are the (x, y) coordinates of the text insertion point
36
- x, y = tm[4], tm[5]
37
-
38
- # Guard against missing font_size
39
- actual_font_size = font_size if font_size else 10
40
-
41
-
42
- # Approximate Center-Alignment Check
43
- # Since tm[4/5] is usually the bottom-left of the character,
44
- # we shift our 'check point' slightly up and to the right based
45
- # on font size to approximate the center of the character.
46
- char_center_x = x + (actual_font_size / 4)
47
- char_center_y = y + (actual_font_size / 3)
48
-
49
- # Asymmetric Tolerance
50
- # We use a tighter vertical tolerance (3pt) to avoid catching lines above/below.
51
- # We use a wider horizontal tolerance (10pt) to catch kerning/spacing issues.
52
- v_tol = 3
53
- h_tol = 10
54
- if (x_min - h_tol) <= char_center_x <= (x_max + h_tol) and \
55
- (y_min - v_tol) <= char_center_y <= (y_max + v_tol):
56
- if text.strip():
57
- parts.append(text)
58
-
59
- # Extract text using the visitor – this preserves drawing order
60
- page.extract_text(visitor_text=visitor_body)
61
-
62
- raw = "".join(parts)
63
- cleaned = " ".join(raw.split()).strip()
64
-
65
- return cleaned if cleaned else "Graphic/Empty Link"
66
-
67
-
68
- def resolve_pypdf_destination(reader: PdfReader, dest) -> str:
69
- """
70
- Resolves any form of destination (/Dest or /A /D) to a human-readable page number.
71
- Uses the official pypdf helper when possible for maximum reliability.
72
- """
73
- try:
74
- if dest is None:
75
- return "N/A"
76
-
77
- # If it's an IndirectObject, resolve it first
78
- if isinstance(dest, (IndirectObject, NameObject)):
79
- dest = dest.get_object()
80
-
81
- # Named destinations or explicit destinations are handled correctly by this method
82
- if isinstance(dest, Destination):
83
- return str(reader.get_destination_page_number(dest) + 1)
84
-
85
- # Direct array or indirect reference
86
- page_num = reader.get_destination_page_number(dest)
87
- return str(page_num + 1)
88
-
89
- except Exception:
90
- return "Unknown/Error"
91
-
92
-
93
- def extract_links_pypdf(pdf_path: Path | str) -> List[Dict[str, Any]]:
94
- """
95
- Extract all link annotations (URI, internal GoTo, remote GoToR) using pure pypdf.
96
- Output schema matches typical reporting needs.
97
- """
98
- reader = PdfReader(pdf_path)
99
-
100
- all_links: List[Dict[str, Any]] = []
101
-
102
- for i, page in enumerate(reader.pages):
103
- page_num = i + 1
104
-
105
- if "/Annots" not in page:
106
- continue
107
-
108
- annots = page["/Annots"]
109
- for annot_ref in annots:
110
- try:
111
- annot = annot_ref.get_object()
112
- except Exception:
113
- continue # Corrupted annotation – skip
114
-
115
- if annot.get("/Subtype") != "/Link":
116
- continue
117
-
118
- rect = annot.get("/Rect")
119
- anchor_text = get_anchor_text_pypdf(page, rect)
120
-
121
- link_dict: Dict[str, Any] = {
122
- "page": page_num,
123
- "rect": list(rect) if rect else None,
124
- "link_text": anchor_text,
125
- "type": "Other Action",
126
- "target": "Unknown",
127
- }
128
-
129
- action = annot.get("/A")
130
-
131
- # External URI link
132
- if action and action.get("/URI"):
133
- uri = action["/URI"]
134
- link_dict.update({
135
- "type": "External (URI)",
136
- "url": str(uri),
137
- "target": str(uri),
138
- })
139
-
140
- # Internal GoTo – can be /Dest directly or inside /A /D
141
- elif annot.get("/Dest") or (action and action.get("/D")):
142
- dest = annot.get("/Dest") or (action and action["/D"])
143
- target_page = resolve_pypdf_destination(reader, dest)
144
- link_dict.update({
145
- "type": "Internal (GoTo/Dest)",
146
- "destination_page": target_page,
147
- "target": f"Page {target_page}",
148
- })
149
-
150
- # Remote GoToR (links to another PDF file)
151
- elif action and action.get("/S") == "/GoToR":
152
- file_spec = action.get("/F")
153
- remote_file = str(file_spec) if file_spec else "Unknown File"
154
- remote_dest = action.get("/D")
155
- remote_target = f"File: {remote_file}"
156
- if remote_dest:
157
- remote_target += f" → Dest: {remote_dest}"
158
- link_dict.update({
159
- "type": "Remote (GoToR)",
160
- "remote_file": remote_file,
161
- "target": remote_target,
162
- })
163
-
164
- all_links.append(link_dict)
165
-
166
- return all_links
167
-
168
-
169
- def extract_toc_pypdf(pdf_path: Path | str) -> List[Dict[str, Any]]:
170
- """
171
- Extract the PDF outline (bookmarks / table of contents) using pypdf.
172
- Correctly handles nested structure and uses the official page resolution method.
173
- """
174
- try:
175
- reader = PdfReader(pdf_path)
176
- outline = reader.outline
177
- if not outline:
178
- return []
179
-
180
- toc_data: List[Dict[str, Any]] = []
181
-
182
- def flatten_outline(items: List, level: int = 1):
183
- for item in items:
184
- if isinstance(item, Destination):
185
- try:
186
- page_num = reader.get_destination_page_number(item) + 1
187
- except Exception:
188
- page_num = "N/A"
189
-
190
- toc_data.append({
191
- "level": level,
192
- "title": item.title or "(Untitled)",
193
- "target_page": page_num,
194
- })
195
- elif isinstance(item, list):
196
- # Recurse into child entries
197
- flatten_outline(item, level + 1)
198
-
199
- flatten_outline(outline)
200
- return toc_data
201
-
202
- except Exception as e:
203
- print(f"TOC extraction error: {e}", file=sys.stderr)
204
- return []
205
-
206
-
207
- def call_stable():
208
- """
209
- Entry point for command-line execution or integration with reporting module.
210
- """
211
- from pdflinkcheck.report import run_report_and_call_exports
212
-
213
- run_report_and_call_exports(pdf_library="pypdf")
214
-
215
- if __name__ == "__main__":
216
- call_stable()
217
- # pypdf version updates
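The anchor-text heuristic in the removed `analyze_pypdf_v2.py` combines two steps: normalizing a link rectangle whose corners may be stored in either order, and testing whether the approximate centre of a text run falls inside that rectangle with asymmetric tolerances. The sketch below isolates that logic as pure functions so it can be read without pypdf. The function names are illustrative, and the constants (10 pt horizontal, 3 pt vertical, default font size 10) are copied from the removed code.

```python
def normalize_rect(rect):
    """Return (x_min, y_min, x_max, y_max) regardless of corner order."""
    x0, y0, x1, y1 = (float(v) for v in rect[:4])
    return min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)


def text_hits_link(x, y, font_size, rect, h_tol=10, v_tol=3):
    """Check whether a text run's approximate centre lies in a link rect.

    (x, y) is the text insertion point, usually the bottom-left of the
    first glyph, so it is shifted by a fraction of the font size.
    """
    x_min, y_min, x_max, y_max = normalize_rect(rect)
    size = font_size if font_size else 10  # same default as the removed code
    center_x = x + size / 4
    center_y = y + size / 3
    inside_x = (x_min - h_tol) <= center_x <= (x_max + h_tol)
    inside_y = (y_min - v_tol) <= center_y <= (y_max + v_tol)
    return inside_x and inside_y
```

The tighter vertical tolerance is what keeps text on the lines above and below a link from being attributed to it, at the cost of occasionally missing text with unusual baselines.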
@@ -1,4 +0,0 @@
1
- Wheel-Version: 1.0
2
- Generator: uv 0.9.18
3
- Root-Is-Purelib: true
4
- Tag: py3-none-any
@@ -1,24 +0,0 @@
1
- **Copyright © 2025 George Clayton Bennett**
2
- <https://github.com/City-of-Memphis-Wastewater/pdflinkcheck>
3
-
4
- Some distributed binaries of this project include the PyMuPDF library, which is licensed under **AGPL‑3.0‑or‑later**.
5
- Any binary that incorporates PyMuPDF is therefore distributed under **AGPL‑3.0‑or‑later**.
6
- Other binaries use only the `pypdf` library and do not include PyMuPDF; these binaries are distributed under the **MIT License**.
7
-
8
- For AGPL‑licensed binaries, the complete corresponding source code must be made available to anyone who possesses a copy, upon request.
9
- This obligation applies only to recipients of those binaries, and hosting the source code in GitHub Releases satisfies this requirement.
10
-
11
- A binary becomes AGPL‑licensed only when built with the optional `"full"` dependency group (as defined in `pyproject.toml` under `[project.optional-dependencies]`) or when PyMuPDF is otherwise included in the build environment.
12
- The **source code of pdflinkcheck itself** remains licensed under the **MIT License**; only the distributed binary becomes AGPL‑licensed when PyMuPDF is included.
13
-
14
- Source code for each released version is available in the `pdflinkcheck‑VERSION.tar.gz` files on the project’s GitHub Releases page.
15
-
16
- Full‑text copies of **LICENSE‑MIT** and **LICENSE‑AGPL3** are included in the root of the repository.
17
-
18
- **Links:**
19
- - Project source code: https://github.com/City-of-Memphis-Wastewater/pdflinkcheck
20
- - PyMuPDF source code: https://github.com/pymupdf/PyMuPDF
21
- - pypdf source code: https://github.com/py-pdf/pypdf
22
- - AGPLv3 text (FSF): https://www.gnu.org/licenses/agpl-3.0.html
23
- - MIT License text: https://opensource.org/license/mit
24
-