pdflinkcheck 1.1.7__py3-none-any.whl → 1.1.72__py3-none-any.whl

@@ -0,0 +1,322 @@
1
+ Metadata-Version: 2.4
2
+ Name: pdflinkcheck
3
+ Version: 1.1.72
4
+ Summary: A purpose-built PDF link analysis and reporting tool with GUI and CLI.
5
+ Author: George Clayton Bennett
6
+ Author-email: George Clayton Bennett <george.bennett@memphistn.gov>
7
+ License-File: LICENSE
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: Programming Language :: Python :: 3 :: Only
10
+ Classifier: Programming Language :: Python :: 3.10
11
+ Classifier: Programming Language :: Python :: 3.11
12
+ Classifier: Programming Language :: Python :: 3.12
13
+ Classifier: Programming Language :: Python :: 3.13
14
+ Classifier: Programming Language :: Python :: 3.14
15
+ Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
16
+ Classifier: Operating System :: OS Independent
17
+ Classifier: Intended Audience :: End Users/Desktop
18
+ Classifier: Intended Audience :: Developers
19
+ Classifier: Intended Audience :: Science/Research
20
+ Classifier: Intended Audience :: Other Audience
21
+ Classifier: Topic :: File Formats
22
+ Classifier: Topic :: Office/Business
23
+ Classifier: Topic :: Text Processing :: General
24
+ Classifier: Topic :: Scientific/Engineering :: Information Analysis
25
+ Classifier: Environment :: Console
26
+ Classifier: Environment :: MacOS X
27
+ Classifier: Environment :: Win32 (MS Windows)
28
+ Classifier: Typing :: Typed
29
+ Classifier: Development Status :: 4 - Beta
30
+ Requires-Dist: pyhabitat>=1.0.53
31
+ Requires-Dist: pypdf>=6.4.2
32
+ Requires-Dist: rich>=14.2.0
33
+ Requires-Dist: typer>=0.20.0
34
+ Requires-Dist: pymupdf>=1.26.7 ; extra == 'full'
35
+ Requires-Dist: sv-ttk>=2.6.1 ; extra == 'gui'
36
+ Maintainer: George Clayton Bennett
37
+ Maintainer-email: George Clayton Bennett <george.bennett@memphistn.gov>
38
+ Requires-Python: >=3.10
39
+ Project-URL: Homepage, https://github.com/city-of-memphis-wastewater/pdflinkcheck
40
+ Project-URL: Repository, https://github.com/city-of-memphis-wastewater/pdflinkcheck
41
+ Provides-Extra: full
42
+ Provides-Extra: gui
43
+ Description-Content-Type: text/markdown
44
+
45
+ # pdflinkcheck
46
+
47
+ A purpose-built tool for comprehensive analysis of hyperlinks and GoTo links within PDF documents. Users may leverage either the PyMuPDF or the pypdf library. Use the CLI or the GUI.
48
+
49
+ -----
50
+
51
+ ![Screenshot of the pdflinkcheck GUI](https://raw.githubusercontent.com/City-of-Memphis-Wastewater/pdflinkcheck/main/assets/pdflinkcheck_gui_v1.1.58.png)
52
+
53
+ -----
54
+
55
+ ## 📥 Access and Installation
56
+
57
+ The recommended way to use `pdflinkcheck` is to either install the CLI with `pipx` or to download the appropriate latest binary for your system from [Releases](https://github.com/City-of-Memphis-Wastewater/pdflinkcheck/releases/).
58
+
59
+ ### 🚀 Release Artifact Files (EXE, PYZ, ELF)
60
+
61
+ For the most user-typical experience, download the single-file binary matching your OS.
62
+
63
+ | **File Type** | **Primary Use Case** | **Recommended Launch Method** |
64
+ | :--- | :--- | :--- |
65
+ | **Executable (.exe, .elf)** | **GUI** | Double-click the file. |
66
+ | **PYZ (Python Zip App)** | **CLI** or **GUI** | Run using your system's `python` command: `python pdflinkcheck-VERSION.pyz --help` |
67
+
68
+ ### Installation via pipx
69
+
70
+ For an isolated environment where you can access `pdflinkcheck` from any terminal:
71
+
72
+ ```bash
73
+ # Ensure you have pipx installed first (if not, run: pip install pipx)
74
+ pipx install "pdflinkcheck[full]"
75
+
76
+ # On Termux
77
+ pipx install pdflinkcheck
78
+
79
+ ```
80
+
81
+ -----
82
+
83
+ ## 💻 Graphical User Interface (GUI)
84
+
85
+ The tool can be run as a simple cross-platform graphical interface (Tkinter).
86
+
87
+ ### Launching the GUI
88
+
89
+ There are three ways to launch the GUI:
90
+
91
+ 1. **Implicit Launch:** Run the main command with no arguments, subcommands, or flags (`pdflinkcheck`).
92
+ 2. **Explicit Command:** Use the dedicated GUI subcommand (`pdflinkcheck gui`).
93
+ 3. **Binary Double-Click:**
94
+ * **Windows:** Double-click the `pdflinkcheck-VERSION-gui.bat` file.
95
+ * **macOS/Linux:** Double-click the downloaded `.pyz` or `.elf` file.
96
+
97
+ ### Planned GUI Updates
98
+
99
+ We are actively working on the following enhancements:
100
+
101
+ * **Report Export:** Functionality to export the full analysis report to a plain text file.
102
+ * **License Visibility:** A dedicated "License Info" button within the GUI to display the terms of the AGPLv3+ license.
103
+
104
+ -----
105
+
106
+ ## 🚀 CLI Usage
107
+
108
+ The core functionality is accessed via the `analyze` command.
109
+
110
+ `DEV_TYPER_HELP_TREE=1 pdflinkcheck help-tree`:
111
+ ![Screenshot of the pdflinkcheck CLI Tree Help](https://raw.githubusercontent.com/City-of-Memphis-Wastewater/pdflinkcheck/main/assets/pdflinkcheck_cli_v1.1.58_tree_help.png)
112
+
113
+ `pdflinkcheck --help`:
114
+ ![Screenshot of the pdflinkcheck CLI Help](https://raw.githubusercontent.com/City-of-Memphis-Wastewater/pdflinkcheck/main/assets/pdflinkcheck_cli_v1.1.58.png)
115
+
116
+
117
+ ### Available Commands
118
+
119
+ |**Command**|**Description**|
120
+ |---|---|
121
+ |`pdflinkcheck analyze`|Analyzes a PDF file for links.|
122
+ |`pdflinkcheck gui`|Explicitly launch the Graphical User Interface.|
123
+ |`pdflinkcheck docs`|Access documentation, including the README and AGPLv3+ license.|
124
+
125
+ ### `analyze` Command Options
126
+
127
+ |**Option**|**Description**|**Default**|
128
+ |---|---|---|
129
+ |`<PDF_PATH>`|**Required.** The path to the PDF file to analyze.|N/A|
130
+ |`--pdf-library / -p`|Select engine: `pymupdf` or `pypdf`.|`pypdf`|
131
+ |`--export-format / -e`|Export to `JSON`, `TXT`, or `None` to suppress file output.|`JSON`|
132
+ |`--max-links / -m`|Maximum links to display per section. Use `0` for all.|`0`|
133
+
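The `--max-links` semantics described above (`0` means "show everything") can be illustrated with a minimal sketch. The helper name `limit_links` is hypothetical and is not part of the `pdflinkcheck` API; it only shows one way such a cap could behave.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def limit_links(links: Iterable[T], max_links: int = 0) -> List[T]:
    """Return at most `max_links` items; 0 means no limit (hypothetical helper)."""
    if max_links < 0:
        raise ValueError("max_links must be 0 (all) or a positive integer")
    items = list(links)
    # A value of 0 mirrors the documented default: display every link.
    return items if max_links == 0 else items[:max_links]
```

For example, `limit_links(found_links, 10)` would keep only the first ten entries of a section, while `limit_links(found_links)` leaves the list untouched.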
134
+ ### `gui` Command Options
135
+
136
+ | **Option** | **Description** | **Default** |
137
+ | ---------------------- | ------------------------------------------------------------------------------------------------------------- | -------------- |
138
+ | `--auto-close INTEGER` | **(For testing/automation only).** Delay in milliseconds after which the GUI window will automatically close. | `0` (Disabled) |
139
+
140
+ #### Example Runs
141
+
142
+ ```bash
143
+ # Analyze a document, show all links, and save the report as JSON and TXT
144
+ pdflinkcheck analyze "TE Maxson WWTF O&M Manual.pdf" --export-format JSON,TXT
145
+
146
+ # Analyze a document but keep the print block short, showing only the first 10 links for each type
147
+ pdflinkcheck analyze "TE Maxson WWTF O&M Manual.pdf" --max-links 10
148
+
149
+ # Show the GUI for only a moment, like in a build check
150
+ pdflinkcheck gui --auto-close 3000
151
+
152
+ # Show both the LICENSE and README.md docs
153
+ pdflinkcheck docs --license --readme
154
+ ```
155
+
156
+ -----
157
+
158
+ ## 📦 Library Access (Advanced)
159
+
160
+ For developers importing `pdflinkcheck` into other Python projects, the core analysis functions are exposed directly in the root namespace:
161
+
162
+ |**Function**|**Description**|
163
+ |---|---|
164
+ |`run_report()`|**(Primary function)** Performs the full analysis, prints to console, and handles file export.|
165
+ |`extract_links_pymupdf()`|Function to retrieve all explicit links (URIs, GoTo, etc.) from a PDF path, using the PyMuPDF library.|
166
+ |`extract_toc_pymupdf()`|Function to extract the PDF's internal Table of Contents (bookmarks/outline).|
167
+ |`extract_links_pypdf()`|Function to retrieve all explicit links (URIs, GoTo, etc.) from a PDF path, using the pypdf library.|
168
+ |`extract_toc_pypdf()`|Function to extract the PDF's internal Table of Contents (bookmarks/outline), using the pypdf library.|
169
+
170
+ Example:
171
+
172
+ ```python
173
+ from pdflinkcheck.report import run_report
174
+ from pdflinkcheck.analyze_pymupdf import extract_links_pymupdf, extract_toc_pymupdf
175
+ from pdflinkcheck.analyze_pypdf import extract_links_pypdf, extract_toc_pypdf
176
+
177
+ file = "document1.pdf"
178
+ report_data = run_report(file)
179
+ links_pymupdf = extract_links_pymupdf(file)
180
+ links_pypdf = extract_links_pypdf(file)
181
+ ```
182
+
183
+ -----
184
+
185
+ ## ✨ Features
186
+
187
+ * **Active Link Extraction:** Identifies and categorizes all programmed links (External URIs, Internal GoTo/Destinations, Remote Jumps).
188
+ * **Anchor Text Retrieval:** Extracts the visible text corresponding to each link's bounding box.
189
+ * **Structural TOC:** Extracts the PDF's internal Table of Contents (bookmarks/outline).
190
+
191
+ -----
192
+
193
+ ## 🥚 Optional REPL‑Friendly GUI Access (Easter Egg)
194
+
195
+ For users who prefer exploring tools interactively—especially those coming from MATLAB or other REPL‑first environments—`pdflinkcheck` includes an optional Easter egg that exposes the GUI launcher directly in the library namespace.
196
+
197
+ This feature is **disabled by default** and has **no effect on normal imports**.
198
+
199
+ ### Enabling the Easter Egg
200
+
201
+ Set the environment variable before importing the library:
202
+
203
+ ```python
204
+ import os
205
+ os.environ["PDFLINKCHECK_GUI_EASTEREGG"] = "true"
206
+
207
+ import pdflinkcheck
208
+ pdflinkcheck.start_gui()
209
+ ```
210
+
211
+ Accepted values include: `true`, `1`, `yes`, `on` (case‑insensitive).
212
+
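As an illustration of the accepted values listed above, here is a minimal sketch of how a case-insensitive opt-in flag could be read from the environment. The helper `env_flag_enabled` and the `TRUTHY` set are hypothetical names for illustration; the actual check inside `pdflinkcheck` may differ.

```python
import os
from typing import Mapping, Optional

# Values the README lists as accepted (compared case-insensitively).
TRUTHY = {"true", "1", "yes", "on"}


def env_flag_enabled(name: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the variable holds one of the accepted values (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # An unset variable falls back to "", which is not in TRUTHY.
    return env.get(name, "").strip().lower() in TRUTHY
```

With this sketch, `env_flag_enabled("PDFLINKCHECK_GUI_EASTEREGG")` would be true only after the variable has been set to one of the listed values.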
213
+ ### Purpose
214
+
215
+ This opt‑in behavior is designed to make the library feel welcoming to beginners who are experimenting in a Python REPL for the first time. When enabled, the `start_gui()` function becomes available at the top level:
216
+
217
+ ```python
218
+ pdflinkcheck.start_gui()
219
+ ```
220
+
221
+ If the `PDFLINKCHECK_GUI_EASTEREGG` environment variable is not set—or if GUI support is unavailable—`pdflinkcheck` behaves as a normal library with no GUI functions exposed.
222
+
223
+ ### Another Easter Egg
224
+
225
+ ```bash
226
+ DEV_TYPER_HELP_TREE=1 pdflinkcheck help-tree
227
+ ```
228
+
229
+ This `help-tree` feature has not yet been submitted for inclusion into Typer.
230
+
231
+ -----
232
+
233
+ ## ⚠️ Compatibility Notes
234
+
235
+ ### Termux Compatibility as a Key Goal
236
+ A key goal of City-of-Memphis-Wastewater is to release all software as Termux-compatible.
237
+
238
+ Termux compatibility is important in the modern age as Android devices are common among technicians, field engineers, and maintenance staff.
239
+ Android is the most common operating system in the Global South.
240
+ We aim to produce stable software that can do the most possible good.
241
+
242
+ While using `PyMuPDF` in Python dependency resolution on Termux simply isn't possible, we are proud to have achieved a work-around by implementing a parallel solution in `pypdf`!
243
+ Now, there is PDF Engine selection in both the CLI and the GUI.
244
+ `pypdf` is the default in `pdflinkcheck.report.run_report()`; PyMuPDF can be explicitly requested in the CLI and is the default in the Tkinter GUI.
245
+
246
+ Now that `pdflinkcheck` can run on Termux, we may find a work-around and be able to drop the PyMuPDF dependency.
247
+ - Build `pypdf`-only artifacts, to reduce size.
248
+ - Build a web-stack GUI as an alternative to the Tkinter GUI, to be compatible with Termux.
249
+
250
+ Because it works, we plan to keep the `PyMuPDF` portion of the codebase.
251
+
252
+ ### Document Compatibility
253
+ Not all PDF files can be processed successfully. This tool is designed primarily for digitally generated (vector-based) PDFs.
254
+
255
+ Processing may fail or yield incomplete results for:
256
+ * **Scanned PDFs** (images of text) that lack an accessible text layer.
257
+ * **Encrypted or Password-Protected** documents.
258
+ * **Malformed or non-standard** PDF files.
259
+
260
+ -----
261
+
262
+ ## PDF Library Selection
263
+ At long last, `PyMuPDF` is an optional dependency. The default is `pypdf`. All testing has shown identical results from both engines, though `analyze_pymupdf.py` is faster, more direct, and more robust than `analyze_pypdf.py`, which requires a lot of intentional parsing.
264
+
265
+ Binaries and artifacts are expected to contain PyMuPDF, unless they are built on Android. The GUI and CLI interfaces both allow selection of the library; if PyMuPDF is selected but is not available, the user will be warned.
266
+
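The "warn if PyMuPDF is selected but unavailable" behavior can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names `pymupdf_available` and `choose_engine` are hypothetical, and falling back to `pypdf` after the warning is an assumption (the text above only says the user is warned).

```python
import importlib.util
import warnings
from typing import Optional


def pymupdf_available() -> bool:
    """Detect PyMuPDF without importing it (hypothetical helper)."""
    # PyMuPDF is importable as "pymupdf" in recent releases and as "fitz" in older ones.
    return any(importlib.util.find_spec(name) is not None for name in ("pymupdf", "fitz"))


def choose_engine(requested: str = "pypdf", has_pymupdf: Optional[bool] = None) -> str:
    """Return the engine to use, warning when PyMuPDF was requested but is missing."""
    if has_pymupdf is None:
        has_pymupdf = pymupdf_available()
    engine = requested.lower()
    if engine not in ("pypdf", "pymupdf"):
        raise ValueError(f"Unknown PDF library: {requested!r}")
    if engine == "pymupdf" and not has_pymupdf:
        # Assumption: fall back to the pypdf default after warning the user.
        warnings.warn("PyMuPDF is not installed; falling back to pypdf.")
        return "pypdf"
    return engine
```

The `has_pymupdf` parameter exists only so the decision logic can be exercised without depending on what is installed locally.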
267
+ To install the complete version use one of these options:
268
+
269
+ ```bash
270
+ pip install "pdflinkcheck[full]"
271
+ pipx install "pdflinkcheck[full]"
272
+ uv tool install "pdflinkcheck[full]"
273
+ uv add "pdflinkcheck[full]"
274
+ ```
275
+
276
+ -----
277
+
278
+ ## Run from Source (Developers)
279
+
280
+ ```bash
281
+ git clone https://github.com/city-of-memphis-wastewater/pdflinkcheck.git
282
+ cd pdflinkcheck
283
+
284
+ # To include the PyMuPDF dependency in the installation:
285
+ uv sync --extra full
286
+
287
+ # On Termux, to not include PyMuPDF:
288
+ uv sync
289
+
290
+ # To include developer dependencies:
291
+ uv sync --all-extras --group dev
292
+
293
+ # Run the CLI
294
+ uv run python src/pdflinkcheck/cli.py --help
295
+
296
+ # Run a basic webapp and Termux-facing browser-based interface
297
+ uv run python -m pdflinkcheck.stdlib_server
298
+ ```
299
+
300
+ -----
301
+
302
+ ## 📜 License Implications (AGPLv3+)
303
+
304
+ **`pdflinkcheck` is licensed under the `GNU Affero General Public License` version 3 or later (`AGPLv3+`).**
305
+
306
+ The `AGPL3+` is required for portions of this codebase because `pdflinkcheck` uses `PyMuPDF`, which is licensed under the `AGPL3`.
307
+
308
+ To stay in compliance, the AGPL3 license text is readily available in the CLI and the GUI, and it is included in the build artifacts.
309
+ The `AGPL3` appears as the primary license file in the source code. While this implies that the entire project is AGPL3-licensed, that is not the case: portions of the codebase are MIT-licensed.
310
+
311
+ This license has significant implications for **distribution and network use**, particularly for organizations:
312
+
313
+ * **Source Code Provision:** If you distribute this tool (modified or unmodified) to anyone, you **must** provide the full source code under the same license.
314
+ * **Network Interaction (Affero Clause):** If you modify this tool and make the modified version available to users over a computer network (e.g., as a web service or backend), you **must** also offer the source code to those network users.
315
+
316
+ > **Before deploying or modifying this tool for organizational use, especially for internal web services or distribution, please ensure compliance with the AGPLv3+ terms.**
317
+
318
+ Links:
319
+ - Source code: https://github.com/City-of-Memphis-Wastewater/pdflinkcheck/
320
+ - Official AGPLv3 Text (FSF): https://www.gnu.org/licenses/agpl-3.0.html
321
+
322
+ Copyright © 2025 George Clayton Bennett
@@ -0,0 +1,21 @@
1
+ pdflinkcheck/__init__.py,sha256=KyoFlScM3kPrp1HjcxHDFEf4YflsoYclVF99-rerl3E,2510
2
+ pdflinkcheck/analyze_pymupdf.py,sha256=Be17KJQnTX9OoAluoE2GzPXC3mDCo7VGCNuwc9ilosc,12452
3
+ pdflinkcheck/analyze_pypdf.py,sha256=gHF9o6EY4sie727vS6YjTCQSzw_XWZape4xEk-l4lRI,6397
4
+ pdflinkcheck/analyze_pypdf_v2.py,sha256=dAvq2OoiN1MjptWSgOrAlArg0A98Hvpr105BKXJBrjE,7563
5
+ pdflinkcheck/cli.py,sha256=8PTkbK4msbhYB2NUCkUv8DWU7lO2qYg8qQKT_cB2U6w,12634
6
+ pdflinkcheck/data/LICENSE,sha256=hIahDEOTzuHCU5J2nd07LWwkLW7Hko4UFO__ffsvB-8,34523
7
+ pdflinkcheck/data/README.md,sha256=9tM77vu5jTpFQplL2A-ysyVyOQg8QZISsmtcmEfQXZM,11650
8
+ pdflinkcheck/data/pyproject.toml,sha256=nsh5tK1V_MD7iPTXjxcPjWPi5xXEbmUW7iWn-MfxxJo,2955
9
+ pdflinkcheck/datacopy.py,sha256=pZysPvfsvRe3qvA-du8XJvwZFxEOB_1ygEvhEj_Zj2Y,2503
10
+ pdflinkcheck/dev.py,sha256=e-0353spmVPPQGB2aJ_QbEDtJQGQFBSLrrfSccJGwII,4783
11
+ pdflinkcheck/gui.py,sha256=TYjP0vCDtuyRYMi6-c2JdCgif4FWNKyrwdye13FTv_8,24434
12
+ pdflinkcheck/io.py,sha256=ZdvKUumFIR8Ql89WToaVDqnosAo43H6sCRnbqwspE80,7943
13
+ pdflinkcheck/report.py,sha256=MmUs2Cftm6sbT__uCzgU-v6lsSQ1IjzsvoM385Xxl8g,11777
14
+ pdflinkcheck/stdlib_server.py,sha256=NKDPi-cfrBnYtG7mIxSI1eR1XSt8bxyan9YpdDAwhEU,6138
15
+ pdflinkcheck/validate.py,sha256=AtROBUZ6EmXxsx0xmqcSTYSlaippnkymp8s5eN4qN3o,14391
16
+ pdflinkcheck/version_info.py,sha256=dRVbs9U97YKisB1cLqVC2IoNrHCYw3z9TG8aldqTVOk,3211
17
+ pdflinkcheck-1.1.72.dist-info/licenses/LICENSE,sha256=hIahDEOTzuHCU5J2nd07LWwkLW7Hko4UFO__ffsvB-8,34523
18
+ pdflinkcheck-1.1.72.dist-info/WHEEL,sha256=ZyFSCYkV2BrxH6-HRVRg3R9Fo7MALzer9KiPYqNxSbo,79
19
+ pdflinkcheck-1.1.72.dist-info/entry_points.txt,sha256=OJs4WkAziNGSoZ2KP0FgYOj2JdL6EW8UphJebWJnz3c,55
20
+ pdflinkcheck-1.1.72.dist-info/METADATA,sha256=HORgjln1UF9Zdx3BwHfrKBR1OZSV7WwfDi2s6z8JNnM,13568
21
+ pdflinkcheck-1.1.72.dist-info/RECORD,,
@@ -0,0 +1,4 @@
1
+ Wheel-Version: 1.0
2
+ Generator: uv 0.9.18
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
@@ -1,3 +1,3 @@
1
1
  [console_scripts]
2
2
  pdflinkcheck = pdflinkcheck.cli:app
3
- pdflinkcheck-gui = pdflinkcheck.gui:start_gui
3
+