pdflinkcheck 1.1.94__py3-none-any.whl → 1.2.29__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- pdflinkcheck/__init__.py +88 -18
- pdflinkcheck/__main__.py +6 -0
- pdflinkcheck/analysis_pdfium.py +131 -0
- pdflinkcheck/{analyze_pymupdf.py → analysis_pymupdf.py} +99 -141
- pdflinkcheck/{analyze_pypdf.py → analysis_pypdf.py} +51 -39
- pdflinkcheck/cli.py +52 -48
- pdflinkcheck/data/LICENSE +18 -15
- pdflinkcheck/data/README.md +23 -25
- pdflinkcheck/data/pyproject.toml +17 -26
- pdflinkcheck/datacopy.py +16 -1
- pdflinkcheck/dev.py +2 -2
- pdflinkcheck/environment.py +14 -2
- pdflinkcheck/gui.py +346 -563
- pdflinkcheck/helpers.py +88 -0
- pdflinkcheck/io.py +24 -6
- pdflinkcheck/report.py +598 -97
- pdflinkcheck/security.py +189 -0
- pdflinkcheck/splash.py +38 -0
- pdflinkcheck/stdlib_server.py +7 -21
- pdflinkcheck/stdlib_server_alt.py +571 -0
- pdflinkcheck/tk_utils.py +188 -0
- pdflinkcheck/update_msix_version.py +2 -0
- pdflinkcheck/validate.py +104 -170
- pdflinkcheck/version_info.py +2 -2
- {pdflinkcheck-1.1.94.dist-info → pdflinkcheck-1.2.29.dist-info}/METADATA +41 -40
- {pdflinkcheck-1.1.94.dist-info → pdflinkcheck-1.2.29.dist-info}/RECORD +34 -27
- pdflinkcheck-1.2.29.dist-info/WHEEL +5 -0
- {pdflinkcheck-1.1.94.dist-info → pdflinkcheck-1.2.29.dist-info}/entry_points.txt +0 -1
- pdflinkcheck-1.2.29.dist-info/licenses/LICENSE +27 -0
- pdflinkcheck-1.2.29.dist-info/top_level.txt +1 -0
- pdflinkcheck/analyze_pypdf_v2.py +0 -217
- pdflinkcheck-1.1.94.dist-info/WHEEL +0 -4
- pdflinkcheck-1.1.94.dist-info/licenses/LICENSE +0 -24
- {pdflinkcheck-1.1.94.dist-info → pdflinkcheck-1.2.29.dist-info}/licenses/LICENSE-AGPL3 +0 -0
- {pdflinkcheck-1.1.94.dist-info → pdflinkcheck-1.2.29.dist-info}/licenses/LICENSE-MIT +0 -0
@@ -1,22 +1,20 @@
 Metadata-Version: 2.4
 Name: pdflinkcheck
-Version: 1.
+Version: 1.2.29
 Summary: A purpose-built PDF link analysis and reporting tool with GUI and CLI.
-Author: George Clayton Bennett
 Author-email: George Clayton Bennett <george.bennett@memphistn.gov>
+Maintainer-email: George Clayton Bennett <george.bennett@memphistn.gov>
 License-Expression: MIT AND AGPL-3.0-or-later
-
-
-License-File: LICENSE-MIT
+Project-URL: Homepage, https://github.com/city-of-memphis-wastewater/pdflinkcheck
+Project-URL: Repository, https://github.com/city-of-memphis-wastewater/pdflinkcheck
 Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3 :: Only
+Classifier: Programming Language :: Python :: 3.9
 Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
 Classifier: Programming Language :: Python :: 3.14
-Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
-Classifier: License :: OSI Approved :: MIT License
 Classifier: Operating System :: OS Independent
 Classifier: Intended Audience :: End Users/Desktop
 Classifier: Intended Audience :: Developers
@@ -31,18 +29,23 @@ Classifier: Environment :: MacOS X
 Classifier: Environment :: Win32 (MS Windows)
 Classifier: Typing :: Typed
 Classifier: Development Status :: 4 - Beta
-Requires-
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+License-File: LICENSE-MIT
+License-File: LICENSE-AGPL3
+Requires-Dist: pyhabitat>=1.1.5
 Requires-Dist: pypdf>=6.4.2
 Requires-Dist: rich>=14.2.0
 Requires-Dist: typer>=0.20.0
-
-
-
-Requires-
-Project-URL: Homepage, https://github.com/city-of-memphis-wastewater/pdflinkcheck
-Project-URL: Repository, https://github.com/city-of-memphis-wastewater/pdflinkcheck
+Provides-Extra: mupdf
+Requires-Dist: pymupdf<2.0.0,>=1.24.0; extra == "mupdf"
+Provides-Extra: pdfium
+Requires-Dist: pypdfium2<6.0.0,>=5.2.0; extra == "pdfium"
 Provides-Extra: full
-
+Requires-Dist: pymupdf<2.0.0,>=1.24.0; extra == "full"
+Requires-Dist: pypdfium2<6.0.0,>=5.2.0; extra == "full"
+Dynamic: license-file

 # pdflinkcheck

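The new METADATA splits the optional engines into a `mupdf` extra and a `pdfium` extra, while `full` now pulls in both. As a hedged illustration (a sketch, not code from pdflinkcheck), the `Requires-Dist` headers can be grouped by extra with only the standard library. The regex is a simplification that recognizes only the single `extra == "name"` markers seen in this METADATA, not the full PEP 508 marker grammar, and `SAMPLE` is an excerpt of the headers above.

```python
import re
from collections import defaultdict
from email.parser import HeaderParser
from typing import Dict, List

# Simplified marker match: only handles markers of the exact form `extra == "name"`.
EXTRA_MARKER = re.compile(r'extra\s*==\s*"([^"]+)"')


def group_requirements(metadata_text: str) -> Dict[str, List[str]]:
    """Group Requires-Dist entries by extra; the '' key collects base requirements."""
    msg = HeaderParser().parsestr(metadata_text)
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry in msg.get_all("Requires-Dist") or []:
        requirement, _, marker = entry.partition(";")
        match = EXTRA_MARKER.search(marker)
        groups[match.group(1) if match else ""].append(requirement.strip())
    return dict(groups)


# Excerpt of the 1.2.29 METADATA headers shown above.
SAMPLE = """\
Requires-Dist: pypdf>=6.4.2
Provides-Extra: mupdf
Requires-Dist: pymupdf<2.0.0,>=1.24.0; extra == "mupdf"
Provides-Extra: pdfium
Requires-Dist: pypdfium2<6.0.0,>=5.2.0; extra == "pdfium"
Provides-Extra: full
Requires-Dist: pymupdf<2.0.0,>=1.24.0; extra == "full"
Requires-Dist: pypdfium2<6.0.0,>=5.2.0; extra == "full"
"""

if __name__ == "__main__":
    for extra, reqs in group_requirements(SAMPLE).items():
        print(extra or "(base)", reqs)
```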
@@ -50,7 +53,7 @@ A purpose-built tool for comprehensive analysis of hyperlinks and GoTo links wit

 -----

-

 -----

@@ -65,7 +68,7 @@ For the most user-typical experience, download the single-file binary matching y
 | **File Type** | **Primary Use Case** | **Recommended Launch Method** |
 | :--- | :--- | :--- |
 | **Executable (.exe, .elf)** | **GUI** | Double-click the file. |
-| **PYZ (Python Zip App)** | **CLI** or **GUI** | Run using your system's `python` command: `python pdflinkcheck-VERSION.pyz --help` |
+| **PYZ (Python Zip App)** | **CLI** or **GUI** | Run using your system's `python` command: `python pdflinkcheck-VERSION.pyz --help` |

 ### Installation via pipx

@@ -99,7 +102,7 @@ Ways to launch the GUI interface:
 The core functionality is accessed via the `analyze` command.

 `pdflinkcheck --help`:
-


 See the Help Tree by unlocking the help-tree CLI command, using the DEV_TYPER_HELP_TREE env var.
@@ -109,7 +112,7 @@ DEV_TYPER_HELP_TREE=1 pdflinkcheck help-tree` # bash
 $env:DEV_TYPER_HELP_TREE = "1"; pdflinkcheck help-tree` # PowerShell
 ```

-



@@ -130,7 +133,6 @@ $env:DEV_TYPER_HELP_TREE = "1"; pdflinkcheck help-tree` # PowerShell
 |`<PDF_PATH>`|**Required.** The path to the PDF file to analyze.|N/A|
 |`--pdf-library / -p`|Select engine: `pymupdf` or `pypdf`.|`pypdf`|
 |`--export-format / -e`|Export to `JSON`, `TXT`, or `None` to suppress file output.|`JSON`|
-|`--max-links / -m`|Maximum links to display per section. Use `0` for all.|`0`|

 ### `gui` Command Options

@@ -144,9 +146,6 @@ $env:DEV_TYPER_HELP_TREE = "1"; pdflinkcheck help-tree` # PowerShell
 # Analyze a document, show all links, and save the report as JSON and TXT
 pdflinkcheck analyze "TE Maxson WWTF O&M Manual.pdf" --export-format JSON,TXT

-# Analyze a document but keep the print block short, showing only the first 10 links for each type
-pdflinkcheck analyze "TE Maxson WWTF O&M Manual.pdf" --max-links 10
-
 # Show the GUI for only a moment, like in a build check
 pdflinkcheck gui --auto-close 3000

@@ -158,22 +157,23 @@ pdflinkcheck docs --license --readme

 ## 📦 Library Access (Advanced)

-For developers importing `pdflinkcheck` into other Python projects, the core analysis functions are exposed directly in the root namespace
+For developers importing `pdflinkcheck` into other Python projects, the core analysis functions are exposed directly in the root namespace. The various `analysis_pdf_*` functions each use a different library to extract the target PDF's internal TOC, external links, and metadata.

-|**Function**|**Description**|
-
-|`run_report()
-|`
-|`
-|`
-|`extract_toc_pymupdf()`|Function to extract the PDF's internal Table of Contents (bookmarks/outline), using the pypdf library.|
+|**Function**|**Library**|**Description**|
+|---|---|---|
+|`run_report()`|pdflinkcheck | **(Primary function)** Performs the full analysis, prints to console, and handles file export.|
+|`analyze_pdf_pdfium()`| pypdfium2 | Fast, ~10 mb, Permissively licensed |
+|`analyze_pdf_pymupdf()`| PyMuPDF | Fast, ~30 mb, AGPL3+ licensed |
+|`analyze_pdf_pypdf()`| pypdf library | Slow, ~2 mb, Permissively licensed |

 Exanple:

 ```python
-from pdflinkcheck
-
-
+from pdflinkcheck import ( run_report,
+analyze_pdf_pymupdf,
+analyze_pdf_pypdf,
+analyze_pdf_pdfium,
+)

 file = "document1.pdf"
 report_data = run_report(file)
@@ -240,24 +240,24 @@ Termux compatibility is important in the modern age, because Android devices are
 Android is the most common operating system in the Global South.
 We aim to produce stable software that can do the most possible good.

-Now `pdflinkcheck` can run on Termux by using the `pypdf` engine.
+Now `pdflinkcheck` can run on Termux by using the `pypdf` engine and the `pdfium` engine.
 Benefits:
 - `pypdf`-only artifacts, to reduce size to about 6% compared to artifacts that include `PyMuPDF`.
 - Web-stack GUI as an alternative to the Tkinter GUI, which can be run locally on Termux or as a web app.


 ### PDF Library Selection
-At long last, `PyMuPDF` is an optional dependency. All testing comparing `
+At long last, `PyMuPDF` is an optional dependency. All testing comparing `pypdf` and `PyMuPDF` has shown identical validation performance. However `PyMuPDF` is much faster. The benfit of `pypdf` is small size of packages and cross-platform compatibility. We have recently added a PDFium option, which circumvents the AGPL3+.

 Expecte that all binaries and artifacts contain PyMuPDF, unlss they are built on Android. The GUI and CLI interfaces both allow selection of the library; if PyMuPDF is selected but is not available, the user will be warned.

 To install the complete version use one of these options:

 ```bash
-pip install "pdflinkcheck[
-pipx install "pdflinkcheck[
-uv tool install "pdflinkcheck[
-uv add "pdflinkcheck[
+pip install "pdflinkcheck[mupdf]"
+pipx install "pdflinkcheck[pdfium]"
+uv tool install "pdflinkcheck[pdfium]"
+uv add "pdflinkcheck[pdfium]"
 ```

 ---
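The updated README makes each PDF engine an optional extra and says the user is warned when PyMuPDF is selected but not installed. A minimal sketch of such an availability check, assuming a caller wants to probe engines before importing them: `importlib.util.find_spec` only locates a module, so nothing heavy is loaded. The helpers `engine_available` and `pick_engine`, the fallback order, and the warning text are hypothetical and are not pdflinkcheck's own implementation; probing PyMuPDF under both `pymupdf` and the older `fitz` import name is also an assumption.

```python
import importlib.util
import warnings
from typing import Dict, Sequence, Tuple

# Import names probed for each engine (assumption: PyMuPDF may be importable
# as `pymupdf` in recent releases or under its older name `fitz`).
ENGINE_MODULES: Dict[str, Tuple[str, ...]] = {
    "pymupdf": ("pymupdf", "fitz"),
    "pdfium": ("pypdfium2",),
    "pypdf": ("pypdf",),
}


def engine_available(engine: str) -> bool:
    """True if any import name for `engine` can be located; nothing is imported."""
    return any(importlib.util.find_spec(name) is not None for name in ENGINE_MODULES[engine])


def pick_engine(requested: str, fallbacks: Sequence[str] = ("pdfium", "pypdf")) -> str:
    """Hypothetical helper: keep the requested engine if installed, else warn and fall back."""
    if engine_available(requested):
        return requested
    for candidate in fallbacks:
        if candidate != requested and engine_available(candidate):
            warnings.warn(f"PDF engine {requested!r} is not installed; using {candidate!r} instead.")
            return candidate
    raise RuntimeError(f"No PDF engine available (requested {requested!r}).")
```

For example, `pick_engine("pymupdf")` returns `"pymupdf"` when PyMuPDF is installed; otherwise it emits one warning and returns the first installed fallback.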
@@ -317,6 +317,7 @@ The source code of pdflinkcheck itself remains licensed under the **MIT License*
 Links:
 - Source code: https://github.com/City-of-Memphis-Wastewater/pdflinkcheck/
 - PyMuPDF source code: https://github.com/pymupdf/PyMuPDF/
+- pypdfium2 source code: https://github.com/pypdfium2-team/pypdfium2
 - pypdf source code: https://github.com/py-pdf/pypdf/
 - AGPLv3 text (FSF): https://www.gnu.org/licenses/agpl-3.0.html
 - MIT License text: https://opensource.org/license/mit
@@ -1,11 +1,28 @@
-pdflinkcheck/__init__.py,sha256=
-pdflinkcheck/
-pdflinkcheck/
-pdflinkcheck/
-pdflinkcheck/
+pdflinkcheck/__init__.py,sha256=HOHHgzLwn3K_879klj4KiOdhyaFskcZFwKtX5l5XDX4,4441
+pdflinkcheck/__main__.py,sha256=o2FpnQjrUbKe3oCt2LiSYsvEUfCjLSTPkxeNw9Oe2Hc,137
+pdflinkcheck/analysis_pdfium.py,sha256=yGidq3xP9GZEZ9K0RBStNCMhq9k48PVbNQ0moMWbsCY,5077
+pdflinkcheck/analysis_pymupdf.py,sha256=wY9V8NHm3o0puJ1pbZb4rfmknthshOXMqfyv2ex9S7E,10620
+pdflinkcheck/analysis_pypdf.py,sha256=Av8FY05HSdLu7OhsilTTAYZT5BCBxI_dlAzRBlZnDXs,7481
+pdflinkcheck/cli.py,sha256=n-uXUdUvG4KCJFRMGvmY95FeszOuwDZOsqvIgClKd7w,13365
+pdflinkcheck/datacopy.py,sha256=jgVNzHwHsVp0L3fXJP3rR02CUQR96E3fGVvGWz9ZF44,3052
+pdflinkcheck/dev.py,sha256=T6KlZYeC8yg61SshSRVD0Ja2VvrhQCIXaXjBIWYd4h4,3928
+pdflinkcheck/environment.py,sha256=cz6J7ymfs3rnsgPucr7Hd5U9My5lkKQ2VPgPFPrLtIY,2617
+pdflinkcheck/gui.py,sha256=aueTrjKe9mZPyJmpRTdBUeNB5kHfWIVjD7puZ6Q6Y9s,19918
+pdflinkcheck/helpers.py,sha256=oDYm1p9cswee6DrJFhWfs3rpQXIFcVjfqmV8ZB4QcrA,3780
+pdflinkcheck/io.py,sha256=pZPmzEMHiuqA0bPp-oPjlBNkjA996gfuZ4NZBH-8Eio,7980
+pdflinkcheck/report.py,sha256=flVscJ-MMnaw4U1-woFy9dUKMjA3KpSs8gRegS7eVZY,33591
+pdflinkcheck/security.py,sha256=U6U3_3utDg4lzM668c6OzDVujoo6EauHq1-JElU9_Nc,6157
+pdflinkcheck/splash.py,sha256=u_2l6LpFpjTJianNiyY8VBAuW7bWPfq4qsKhPWRid48,1203
+pdflinkcheck/stdlib_server.py,sha256=sKKgeHfFYlW5yAoXy4iDgMzb4Ho_rBV1OPHgS_DWZZ0,5940
+pdflinkcheck/stdlib_server_alt.py,sha256=ixpBSd95e2a9dKE38FOCM8EtulVkpUqLWWa4oAAN6YY,16322
+pdflinkcheck/tk_utils.py,sha256=8Y8oH-I9WNWUPUyrHlWG-GZacrQ_hn3qbZWzfN08pko,6795
+pdflinkcheck/update_msix_version.py,sha256=RMxhpQOiieQFUF_SyCb7a5UpYF792nDgsHAP7bUAwo8,1629
+pdflinkcheck/validate.py,sha256=H2bZopzcb5KPqdkahPT8Cg58WzSIQwdunT92JnXvdJA,11914
+pdflinkcheck/version_info.py,sha256=h47z3TRAGXiahiPyBH2MqOXUO7x3c-iwvHV-wB_b2P0,3296
 pdflinkcheck/data/I Have Questions.md,sha256=lurBwKxUijxysq5qsUrLgzFpaSrZwe86KjbPMwCofvY,3152
-pdflinkcheck/data/LICENSE,sha256=
-pdflinkcheck/data/README.md,sha256=
+pdflinkcheck/data/LICENSE,sha256=npi0VqlkL0AlE0FsGEIOXxMqXlmH4toZfBHesJpP0lA,1866
+pdflinkcheck/data/README.md,sha256=ikWkKpHkkE2cC13St6fyKSxGs9JQ05QmLzOUTIPzUOg,11235
+pdflinkcheck/data/pyproject.toml,sha256=8IGKmhfhdqxh-_mmVEg9_rTaxw_oQ8H9JL-zfJe2pOE,2732
 pdflinkcheck/data/icons/BoxArt-1080x1080.png,sha256=ifiadgqnq-23heD9_jEciuR1fOm-iVW9gU1Uj_2TRUA,140431
 pdflinkcheck/data/icons/Logo-150x150.png,sha256=qEjDoeD-fSV5AmcWoYRhs1cD7o95A7ckvqjooxIqrQY,6777
 pdflinkcheck/data/icons/Logo-300x300.png,sha256=o9zzwjFwKh7NVOilcQULMDwOYvy3EL8Afz_JpRtYhes,19479
@@ -16,7 +33,8 @@ pdflinkcheck/data/icons/SplashScreen-620x300.png,sha256=tc02X1Ykp7LrPrdjASPT6_Jj
 pdflinkcheck/data/icons/StoreLogo-50x50.png,sha256=9XpM96NWn0p0X5nXlsOdYW2Djbd_n48iK0aWVLcKaM8,1424
 pdflinkcheck/data/icons/WideLogo-310x150.png,sha256=qEjDoeD-fSV5AmcWoYRhs1cD7o95A7ckvqjooxIqrQY,6777
 pdflinkcheck/data/icons/red_pdf_512px.ico,sha256=GxrliQQkb7DbGG3DmAzVpYKjP75C8E6Dixtaw36vH94,67646
-pdflinkcheck/data/
+pdflinkcheck/data/themes/forest/forest-dark.tcl,sha256=sMUVZOLAkU9O67IQ8pqfLIXsM_zxYJIVTe6xhqYv0hQ,18926
+pdflinkcheck/data/themes/forest/forest-light.tcl,sha256=_HKSSTECkNYg79m7KLvGugv8dcun2_cqpdr9k88uK9U,19218
 pdflinkcheck/data/themes/forest/forest-dark/border-accent-hover.png,sha256=GUqCh-r5m8S07gf8n9glMVRG4Wjz7Ih_pqz5rc2hA_g,385
 pdflinkcheck/data/themes/forest/forest-dark/border-accent.png,sha256=q4y0SyxR6fuAxgKiFZubGodg72yZ30ZXlS0XZQ1m0Yw,389
 pdflinkcheck/data/themes/forest/forest-dark/border-basic.png,sha256=xZc7CJx8cI4aKQ6HvRW3PJuH5X6WdUwQdOisaHd4nwE,333
@@ -85,7 +103,6 @@ pdflinkcheck/data/themes/forest/forest-dark/up.png,sha256=rYXTZXXxHpLqrnKV_PLukf
 pdflinkcheck/data/themes/forest/forest-dark/vert-accent.png,sha256=ByAmCAafDeylS7_FHpqX9DZ4ZFnPO-TqUl9Mh0fCPmc,158
 pdflinkcheck/data/themes/forest/forest-dark/vert-basic.png,sha256=-oOfurWXJNN7TSy_VEu6t-o9IMojg-1gmyGYjOk-Rwg,158
 pdflinkcheck/data/themes/forest/forest-dark/vert-hover.png,sha256=ran_87NoN4U9dSLXft-cO6XyKnP2whAgqUj7E6N4DKk,158
-pdflinkcheck/data/themes/forest/forest-dark.tcl,sha256=sMUVZOLAkU9O67IQ8pqfLIXsM_zxYJIVTe6xhqYv0hQ,18926
 pdflinkcheck/data/themes/forest/forest-light/border-accent-hover.png,sha256=G1F4TVltT88-xdqJgvek_v0QZ_RCNFJsI0hJgx5DvcA,445
 pdflinkcheck/data/themes/forest/forest-light/border-accent.png,sha256=b-r7W6ipGGzWCG_sZHyuoxZeLA7iLY5NAqNR2zEdF7M,463
 pdflinkcheck/data/themes/forest/forest-light/border-basic.png,sha256=aKludTEo35p98yt77J5vVhh4qy8QXPEipgz3a75Pwzw,311
@@ -156,21 +173,11 @@ pdflinkcheck/data/themes/forest/forest-light/up.png,sha256=qNsplPHruJhHV0mJDaomk
 pdflinkcheck/data/themes/forest/forest-light/vert-accent.png,sha256=Eyj8nUQHnEzxNVFhMv7FH_Ezn9pOt2Xe0-Aquh8_9YY,158
 pdflinkcheck/data/themes/forest/forest-light/vert-basic.png,sha256=tx9KswFqH_-ImUrm2M8cIjsBQKGupa2SJqjVjHCi9LM,158
 pdflinkcheck/data/themes/forest/forest-light/vert-hover.png,sha256=BjEilGWGjr0ulvCfmFiBmS-8Dr1jSQjt4CUkxS1e2h4,158
-pdflinkcheck/
-pdflinkcheck/
-pdflinkcheck/
-pdflinkcheck/
-pdflinkcheck/
-pdflinkcheck/
-pdflinkcheck/
-pdflinkcheck
-pdflinkcheck/update_msix_version.py,sha256=gH2vbX_nPMTz0IHo4spwwsrvam3ZgkF7_HFPi1Mo8z4,1552
-pdflinkcheck/validate.py,sha256=wytRPNQdSVOjWGywnR6r62qu2vSmyzx7HzLxOg-pGFg,14415
-pdflinkcheck/version_info.py,sha256=9c6b1kRyu0ksIpslhdNA0GDmQd5TsbcNEe0e5SXwZtY,3261
-pdflinkcheck-1.1.94.dist-info/licenses/LICENSE,sha256=gWkQJzBjRDg-GK4tbaw4_V-W1VJRbzx3Qx4i1_fWIrE,1718
-pdflinkcheck-1.1.94.dist-info/licenses/LICENSE-AGPL3,sha256=hIahDEOTzuHCU5J2nd07LWwkLW7Hko4UFO__ffsvB-8,34523
-pdflinkcheck-1.1.94.dist-info/licenses/LICENSE-MIT,sha256=hNjSMpr8jpZQzBj9tHKafry8bW1kfLnl-DxiSfw7Lyw,1127
-pdflinkcheck-1.1.94.dist-info/WHEEL,sha256=ZyFSCYkV2BrxH6-HRVRg3R9Fo7MALzer9KiPYqNxSbo,79
-pdflinkcheck-1.1.94.dist-info/entry_points.txt,sha256=OJs4WkAziNGSoZ2KP0FgYOj2JdL6EW8UphJebWJnz3c,55
-pdflinkcheck-1.1.94.dist-info/METADATA,sha256=nsnSp0FW4UziHfBezniQJ_2gjmuidyEUJxm0h-1t10E,13552
-pdflinkcheck-1.1.94.dist-info/RECORD,,
+pdflinkcheck-1.2.29.dist-info/licenses/LICENSE,sha256=npi0VqlkL0AlE0FsGEIOXxMqXlmH4toZfBHesJpP0lA,1866
+pdflinkcheck-1.2.29.dist-info/licenses/LICENSE-AGPL3,sha256=hIahDEOTzuHCU5J2nd07LWwkLW7Hko4UFO__ffsvB-8,34523
+pdflinkcheck-1.2.29.dist-info/licenses/LICENSE-MIT,sha256=hNjSMpr8jpZQzBj9tHKafry8bW1kfLnl-DxiSfw7Lyw,1127
+pdflinkcheck-1.2.29.dist-info/METADATA,sha256=37Kicn3cP9WUYoBnudSyIVytLqfsXAD2UZph9Qhol7M,13317
+pdflinkcheck-1.2.29.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+pdflinkcheck-1.2.29.dist-info/entry_points.txt,sha256=cZaB_inIfr2X9lxMo1RhZr4602F3nTjTm3cXquzfw3Q,54
+pdflinkcheck-1.2.29.dist-info/top_level.txt,sha256=WdBg8l6l3TF1HQDpR_PwSmBCSu5atKWFnPfNbRNwrME,13
+pdflinkcheck-1.2.29.dist-info/RECORD,,
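Each RECORD row above has the form `path,sha256=<digest>,<size>`: the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed, and the RECORD file lists itself with empty hash and size fields. A standard-library sketch for building and checking such rows follows; `record_line` and `verify_record_line` are hypothetical helper names, not part of pdflinkcheck or of any packaging tool.

```python
import base64
import csv
import hashlib


def record_line(path: str, data: bytes) -> str:
    """Build a RECORD row: path, unpadded urlsafe-base64 SHA-256 digest, size in bytes."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


def verify_record_line(line: str, data: bytes) -> bool:
    """Check one RECORD row against file contents.

    Rows with an empty hash field (such as the RECORD file's own entry) are skipped.
    """
    path, hash_field, size_field = next(csv.reader([line]))
    if not hash_field:
        return True
    _, expected_hash, expected_size = record_line(path, data).rsplit(",", 2)
    return (hash_field, size_field) == (expected_hash, expected_size)
```

Assuming `top_level.txt` holds `pdflinkcheck` followed by a single newline (12 + 1 bytes), its row should report a size of 13, which matches the entry listed above.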
@@ -0,0 +1,27 @@
+Some distributed binaries of this project include the PyMuPDF library, which is licensed under **AGPL3.0orlater**.
+Any binary that incorporates PyMuPDF is therefore distributed under **AGPL3.0orlater**.
+Other binaries use only the `pypdf` library and do not include PyMuPDF; these binaries are distributed under the **MIT License**.
+
+For AGPLlicensed binaries, the complete corresponding source code must be made available to anyone who possesses a copy, upon request.
+This obligation applies only to recipients of those binaries, and hosting the source code in GitHub Releases satisfies this requirement.
+
+A binary becomes AGPLlicensed only when built with the optional `"full"` dependency group (as defined in `pyproject.toml` under `[project.optional-dependencies]`) or when PyMuPDF is otherwise included in the build environment.
+The **source code of pdflinkcheck itself** remains licensed under the **MIT License**; only the distributed binary becomes AGPLlicensed when PyMuPDF is included.
+
+Source code for each released version is available in the `pdflinkcheckVERSION.tar.gz` files on the projects GitHub Releases page.
+
+Fulltext copies of **LICENSEMIT** and **LICENSEAGPL3** are included in the root of the repository.
+
+**Links:**
+- Project source code: https://github.com/City-of-Memphis-Wastewater/pdflinkcheck
+- PyMuPDF source code: https://github.com/pymupdf/PyMuPDF
+- pypdfium2 source code: https://github.com/pypdfium2-team/pypdfium2
+- PDFium source code: https://pdfium.googlesource.com/pdfium/
+- pypdf source code: https://github.com/py-pdf/pypdf
+- AGPLv3 text (FSF): https://www.gnu.org/licenses/agpl-3.0.html
+- MIT License text: https://opensource.org/license/mit
+- BSD-3 License text: https://opensource.org/license/bsd-3-clause
+- Apache-v2 License text: https://opensource.org/license/apache-2-0
+
+
+Copyright 2025 George Clayton Bennett
@@ -0,0 +1 @@
+pdflinkcheck
pdflinkcheck/analyze_pypdf_v2.py
DELETED
@@ -1,217 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: MIT
-# src/pdflinkcheck/analyze_pypdf_v2.py
-import sys
-from pathlib import Path
-import logging
-from typing import Dict, Any, List
-
-from pypdf import PdfReader
-from pypdf.generic import Destination, NameObject, IndirectObject
-
-"""
-Inspect target PDF for both URI links and GoTo links, using only pypdf (no PyMuPDF/Fitz).
-Fully fixed and improved version as of December 2025 (compatible with pypdf >= 4.0).
-"""
-
-def get_anchor_text_pypdf(page, rect) -> str:
-    """
-    Extracts text that falls within or near the link's bounding box using a visitor function.
-    This is a reliable pure-pypdf method for associating visible text with a link annotation.
-    """
-    if not rect:
-        return "N/A: Missing Rect"
-
-    # PDF coordinates: bottom-left origin. Rect is [x0, y0, x1, y1]
-    # Standardize Rect: [x_min, y_min, x_max, y_max]
-    # Some PDF generators write Rect as [x_max, y_max, x_min, y_min]
-    x_min, y_min, x_max, y_max = rect[0], rect[1], rect[2], rect[3]
-    if x_min > x_max: x_min, x_max = x_max, x_min
-    if y_min > y_max: y_min, y_max = y_max, y_min
-
-    parts: List[str] = []
-
-    def visitor_body(text: str, cm, tm, font_dict, font_size):
-        # tm[4] and tm[5] are the (x, y) coordinates of the text insertion point
-        x, y = tm[4], tm[5]
-
-        # Guard against missing font_size
-        actual_font_size = font_size if font_size else 10
-
-
-        # Approximate Center-Alignment Check
-        # Since tm[4/5] is usually the bottom-left of the character,
-        # we shift our 'check point' slightly up and to the right based
-        # on font size to approximate the center of the character.
-        char_center_x = x + (actual_font_size / 4)
-        char_center_y = y + (actual_font_size / 3)
-
-        # Asymmetric Tolerance
-        # We use a tighter vertical tolerance (3pt) to avoid catching lines above/below.
-        # We use a wider horizontal tolerance (10pt) to catch kerning/spacing issues.
-        v_tol = 3
-        h_tol = 10
-        if (x_min - h_tol) <= char_center_x <= (x_max + h_tol) and \
-           (y_min - v_tol) <= char_center_y <= (y_max + v_tol):
-            if text.strip():
-                parts.append(text)
-
-    # Extract text using the visitor – this preserves drawing order
-    page.extract_text(visitor_text=visitor_body)
-
-    raw = "".join(parts)
-    cleaned = " ".join(raw.split()).strip()
-
-    return cleaned if cleaned else "Graphic/Empty Link"
-
-
-def resolve_pypdf_destination(reader: PdfReader, dest) -> str:
-    """
-    Resolves any form of destination (/Dest or /A /D) to a human-readable page number.
-    Uses the official pypdf helper when possible for maximum reliability.
-    """
-    try:
-        if dest is None:
-            return "N/A"
-
-        # If it's an IndirectObject, resolve it first
-        if isinstance(dest, (IndirectObject, NameObject)):
-            dest = dest.get_object()
-
-        # Named destinations or explicit destinations are handled correctly by this method
-        if isinstance(dest, Destination):
-            return str(reader.get_destination_page_number(dest) + 1)
-
-        # Direct array or indirect reference
-        page_num = reader.get_destination_page_number(dest)
-        return str(page_num + 1)
-
-    except Exception:
-        return "Unknown/Error"
-
-
-def extract_links_pypdf(pdf_path: Path | str) -> List[Dict[str, Any]]:
-    """
-    Extract all link annotations (URI, internal GoTo, remote GoToR) using pure pypdf.
-    Output schema matches typical reporting needs.
-    """
-    reader = PdfReader(pdf_path)
-
-    all_links: List[Dict[str, Any]] = []
-
-    for i, page in enumerate(reader.pages):
-        page_num = i + 1
-
-        if "/Annots" not in page:
-            continue
-
-        annots = page["/Annots"]
-        for annot_ref in annots:
-            try:
-                annot = annot_ref.get_object()
-            except Exception:
-                continue # Corrupted annotation – skip
-
-            if annot.get("/Subtype") != "/Link":
-                continue
-
-            rect = annot.get("/Rect")
-            anchor_text = get_anchor_text_pypdf(page, rect)
-
-            link_dict: Dict[str, Any] = {
-                "page": page_num,
-                "rect": list(rect) if rect else None,
-                "link_text": anchor_text,
-                "type": "Other Action",
-                "target": "Unknown",
-            }
-
-            action = annot.get("/A")
-
-            # External URI link
-            if action and action.get("/URI"):
-                uri = action["/URI"]
-                link_dict.update({
-                    "type": "External (URI)",
-                    "url": str(uri),
-                    "target": str(uri),
-                })
-
-            # Internal GoTo – can be /Dest directly or inside /A /D
-            elif annot.get("/Dest") or (action and action.get("/D")):
-                dest = annot.get("/Dest") or (action and action["/D"])
-                target_page = resolve_pypdf_destination(reader, dest)
-                link_dict.update({
-                    "type": "Internal (GoTo/Dest)",
-                    "destination_page": target_page,
-                    "target": f"Page {target_page}",
-                })
-
-            # Remote GoToR (links to another PDF file)
-            elif action and action.get("/S") == "/GoToR":
-                file_spec = action.get("/F")
-                remote_file = str(file_spec) if file_spec else "Unknown File"
-                remote_dest = action.get("/D")
-                remote_target = f"File: {remote_file}"
-                if remote_dest:
-                    remote_target += f" → Dest: {remote_dest}"
-                link_dict.update({
-                    "type": "Remote (GoToR)",
-                    "remote_file": remote_file,
-                    "target": remote_target,
-                })
-
-            all_links.append(link_dict)
-
-    return all_links
-
-
-def extract_toc_pypdf(pdf_path: Path | str) -> List[Dict[str, Any]]:
-    """
-    Extract the PDF outline (bookmarks / table of contents) using pypdf.
-    Correctly handles nested structure and uses the official page resolution method.
-    """
-    try:
-        reader = PdfReader(pdf_path)
-        outline = reader.outline
-        if not outline:
-            return []
-
-        toc_data: List[Dict[str, Any]] = []
-
-        def flatten_outline(items: List, level: int = 1):
-            for item in items:
-                if isinstance(item, Destination):
-                    try:
-                        page_num = reader.get_destination_page_number(item) + 1
-                    except Exception:
-                        page_num = "N/A"
-
-                    toc_data.append({
-                        "level": level,
-                        "title": item.title or "(Untitled)",
-                        "target_page": page_num,
-                    })
-                elif isinstance(item, list):
-                    # Recurse into child entries
-                    flatten_outline(item, level + 1)
-
-        flatten_outline(outline)
-        return toc_data
-
-    except Exception as e:
-        print(f"TOC extraction error: {e}", file=sys.stderr)
-        return []
-
-
-def call_stable():
-    """
-    Entry point for command-line execution or integration with reporting module.
-    """
-    from pdflinkcheck.report import run_report_and_call_exports
-
-    run_report_and_call_exports(pdf_library="pypdf")
-
-if __name__ == "__main__":
-    call_stable()
-# pypdf version updates
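The removed `get_anchor_text_pypdf` matched visible text to a link by shifting each glyph's insertion point toward its centre and testing it against the normalized `/Rect` with asymmetric tolerances (10 pt horizontal, 3 pt vertical), and `extract_toc_pypdf` flattened pypdf's nested outline list into level-tagged rows. Below is a pypdf-free sketch of both steps so they can be tested in isolation. `normalize_rect`, `glyph_in_link`, and the `Entry` dataclass are hypothetical stand-ins: `Entry` replaces pypdf's `Destination`, and pages are given directly instead of being resolved with `get_destination_page_number`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple


def normalize_rect(rect: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return a PDF /Rect as (x_min, y_min, x_max, y_max); some writers swap the corners."""
    x0, y0, x1, y1 = (float(v) for v in rect[:4])
    return min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)


def glyph_in_link(x: float, y: float, font_size: float, rect: Sequence[float],
                  h_tol: float = 10, v_tol: float = 3) -> bool:
    """Nudge a glyph's insertion point toward its centre, then test it against the
    rect with a wide horizontal tolerance and a tight vertical one."""
    x_min, y_min, x_max, y_max = normalize_rect(rect)
    size = font_size or 10  # same fallback the removed code used for a missing font size
    cx, cy = x + size / 4, y + size / 3
    return (x_min - h_tol) <= cx <= (x_max + h_tol) and (y_min - v_tol) <= cy <= (y_max + v_tol)


@dataclass
class Entry:
    """Hypothetical stand-in for a pypdf outline Destination with an already-resolved page."""
    title: str
    page: int


def flatten_outline(items: List[Any], level: int = 1) -> List[Dict[str, Any]]:
    """Flatten a pypdf-style outline, where a nested list holds the children of the entry before it."""
    rows: List[Dict[str, Any]] = []
    for item in items:
        if isinstance(item, Entry):
            rows.append({"level": level, "title": item.title or "(Untitled)", "target_page": item.page})
        elif isinstance(item, list):
            rows.extend(flatten_outline(item, level + 1))
    return rows
```

In the removed file the same test ran inside a pypdf `extract_text(visitor_text=...)` callback, reading the insertion point from the text matrix (`tm[4]`, `tm[5]`) rather than from explicit coordinates as here.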
@@ -1,24 +0,0 @@
-**Copyright © 2025 George Clayton Bennett**
-<https://github.com/City-of-Memphis-Wastewater/pdflinkcheck>
-
-Some distributed binaries of this project include the PyMuPDF library, which is licensed under **AGPL‑3.0‑or‑later**.
-Any binary that incorporates PyMuPDF is therefore distributed under **AGPL‑3.0‑or‑later**.
-Other binaries use only the `pypdf` library and do not include PyMuPDF; these binaries are distributed under the **MIT License**.
-
-For AGPL‑licensed binaries, the complete corresponding source code must be made available to anyone who possesses a copy, upon request.
-This obligation applies only to recipients of those binaries, and hosting the source code in GitHub Releases satisfies this requirement.
-
-A binary becomes AGPL‑licensed only when built with the optional `"full"` dependency group (as defined in `pyproject.toml` under `[project.optional-dependencies]`) or when PyMuPDF is otherwise included in the build environment.
-The **source code of pdflinkcheck itself** remains licensed under the **MIT License**; only the distributed binary becomes AGPL‑licensed when PyMuPDF is included.
-
-Source code for each released version is available in the `pdflinkcheck‑VERSION.tar.gz` files on the project’s GitHub Releases page.
-
-Full‑text copies of **LICENSE‑MIT** and **LICENSE‑AGPL3** are included in the root of the repository.
-
-**Links:**
-- Project source code: https://github.com/City-of-Memphis-Wastewater/pdflinkcheck
-- PyMuPDF source code: https://github.com/pymupdf/PyMuPDF
-- pypdf source code: https://github.com/py-pdf/pypdf
-- AGPLv3 text (FSF): https://www.gnu.org/licenses/agpl-3.0.html
-- MIT License text: https://opensource.org/license/mit
-
File without changes
File without changes