zipinspect 0.1.0__py3-none-any.whl

@@ -0,0 +1,107 @@
+ Metadata-Version: 2.4
+ Name: zipinspect
+ Version: 0.1.0
+ Summary: Small utility to inspect/extract Zip files over HTTP
+ Author-email: Cynthia <cynthia2048@proton.me>
+ Requires-Python: >=3.14
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: aioconsole>=0.8.2
+ Requires-Dist: tabulate>=0.9.0
+ Requires-Dist: progress>=1.6.1
+ Requires-Dist: httpx[http2]>=0.28.1
+
+ # zipinspect
+
+ PKWARE's [Zip](https://en.wikipedia.org/wiki/ZIP_(file_format)) is the ubiquitous format for file archival, so much so that it's used as both a noun and a verb. Invented in 1989, it has been extensively used to compress or seamlessly transfer multiple files. Zip has one major advantage over [tarballs](https://en.wikipedia.org/wiki/Tar_(computing)): random access. Some (especially UNIX purists) may criticise Zip for its worse compression ratio when data is redundant across files in the archive, because it compresses each file individually. However, that is also its strongest point: it lets us extract a single file without decompressing the whole archive, unlike compressed tarballs. It also enables fast appends, updates and deletes, which aren't possible with tarballs without decompressing the archive and creating a new one.
+
+ This tool covers a rather niche use case: Zip files on the network, accessed over HTTP. HTTP has a neat feature called [range requests](https://http.dev/range-request), which is used extensively here; in your browser it's typically used for resumable downloads. In a nutshell, it's a variant of the normal GET request in which the client signals the range of data it's interested in, and the server responds with that range and a 206 (Partial Content) status code. Here, this is what allows random access to files within the archive.
+
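As a rough illustration of the range-request mechanics described above, the sketch below builds `Range` request headers and parses the `Content-Range` header that accompanies a 206 response, using only the standard library. The helper names (`range_header`, `suffix_range_header`, `parse_content_range`) are hypothetical and are not taken from zipinspect's source.

```python
import re
from typing import Optional

# Matches e.g. "bytes 0-99/1234" or "bytes 0-99/*" (total size unknown).
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def range_header(start: int, length: int) -> dict[str, str]:
    """Header requesting `length` bytes starting at byte offset `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    # Byte ranges are inclusive at both ends, hence the "- 1".
    return {"Range": f"bytes={start}-{start + length - 1}"}


def suffix_range_header(length: int) -> dict[str, str]:
    """Header requesting the last `length` bytes of the resource."""
    if length <= 0:
        raise ValueError("length must be > 0")
    return {"Range": f"bytes=-{length}"}


def parse_content_range(value: str) -> tuple[int, int, Optional[int]]:
    """Return (first, last, total) from a Content-Range header value."""
    match = _CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)
```

With httpx, such a header would typically be passed as `client.get(url, headers=range_header(0, 100))`. A server that honours the request answers with status 206; one that ignores it answers 200 with the full body, so the status code is worth checking before trusting the payload.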
+ ## Demo
+
+ ```sh
+ $ zipinspect 'https://example.com/ArthurRimbaud-OnlyFans.zip'
+ > list
+   #  entry                    size    modified date
+ ---  -----------------------  ------  -------------------
+   0  ArthurRimbaudOF_001.jpg  2.2M    2024-11-07T18:41:46
+   1  ArthurRimbaudOF_002.jpg  2.4M    2024-11-07T18:41:48
+   2  ArthurRimbaudOF_003.jpg  2.4M    2024-11-07T18:41:50
+   3  ArthurRimbaudOF_004.jpg  2.5M    2024-11-07T18:41:50
+   4  ArthurRimbaudOF_005.jpg  2.3M    2024-11-07T18:41:52
+   5  ArthurRimbaudOF_006.jpg  2.4M    2024-11-07T18:41:52
+   6  ArthurRimbaudOF_007.jpg  2.2M    2024-11-07T18:41:54
+   7  ArthurRimbaudOF_008.jpg  2.4M    2024-11-07T18:41:56
+   8  ArthurRimbaudOF_009.jpg  2.4M    2024-11-07T18:41:56
+   9  ArthurRimbaudOF_010.jpg  2.3M    2024-11-07T18:41:58
+  10  ArthurRimbaudOF_011.jpg  2.5M    2024-11-07T18:41:58
+  11  ArthurRimbaudOF_012.jpg  1.5M    2024-11-07T18:42:00
+  12  ArthurRimbaudOF_013.jpg  2.4M    2024-11-07T18:42:00
+  13  ArthurRimbaudOF_014.jpg  2.6M    2024-11-07T18:42:02
+  14  ArthurRimbaudOF_015.jpg  2.8M    2024-11-07T18:42:02
+  15  ArthurRimbaudOF_016.jpg  2.8M    2024-11-07T18:42:04
+  16  ArthurRimbaudOF_017.jpg  2.3M    2024-11-07T18:42:04
+  17  ArthurRimbaudOF_018.jpg  2.9M    2024-11-07T18:42:06
+  18  ArthurRimbaudOF_019.jpg  3.1M    2024-11-07T18:42:08
+  19  ArthurRimbaudOF_020.jpg  2.9M    2024-11-07T18:42:08
+  20  ArthurRimbaudOF_021.jpg  3.1M    2024-11-07T18:42:10
+  21  ArthurRimbaudOF_022.jpg  3.1M    2024-11-07T18:42:10
+  22  ArthurRimbaudOF_023.jpg  3.1M    2024-11-07T18:42:12
+  23  ArthurRimbaudOF_024.jpg  3.0M    2024-11-07T18:42:14
+  24  ArthurRimbaudOF_025.jpg  2.9M    2024-11-07T18:42:14
+ (Page 1/14)
+ > extract 8
+
+ |#######################################################################| 100%
+
+ > extract 8,9,16
+
+ |#######################################################################| 100%
+
+ > extract 20,...,24
+
+ |#######################################################################| 100%
+
+ >
+ ```
+
+ First, the entries in the archive (files and directories) are loaded, and the user is presented with a REPL (command prompt) where the files can be browsed and extracted. Multiple entries can be downloaded concurrently thanks to the underlying asynchronous implementation.
+
+ ## Features & Limitations
+
+ - Multiple parallel extractions.
+ - HTTP/2 for better download performance.
+ - Zip files over 4 GiB (Zip64) are supported.
+ - DEFLATE, BZip2, LZMA and [Zstd](https://en.wikipedia.org/wiki/Zstd) compression are supported.
+ - ZipCrypto and WinZip AES encryption aren't supported.
+ - Multi-part (spanned) files aren't supported.
+
+ ## Help
+
+ In the REPL, the `help` command lists all available commands and their arguments.
+ > help
+ This is the REPL, and the following commands are available.
+
+ list                             List entries in the current page
+ prev                             Go backward one page and show entries
+ next                             Go forward one page and show entries
+ extract <index> [dir]            Extract entry with index <index>
+ extract <start>,...,<end> [dir]  Extract entries from <start> to <end>
+ extract <i0>,<i1>,...<in> [dir]  Extract entries with specified indices
+
+ NOTE: The extract command accepts an optional path to the directory to extract into.
+ If not provided, it extracts into the current working directory.
+
+ If an argument contains a space, wrap it in double quotes; if it contains a double quote, wrap it in double quotes and backslash-escape the inner quote.
+
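The argument forms listed in the help text can be sketched as a small parser. This is only an illustration: `parse_indices` and `parse_extract` are hypothetical names, `shlex.split` stands in for whatever tokenizer zipinspect actually uses, and the `<start>,...,<end>` form is assumed to be inclusive at both ends, as the demo output suggests.

```python
import shlex


def parse_indices(spec: str) -> list[int]:
    """Turn "8", "8,9,16" or "20,...,24" into a list of entry indices."""
    parts = spec.split(",")
    # Range form: <start>,...,<end> (assumed inclusive at both ends).
    if len(parts) == 3 and parts[1] == "...":
        start, end = int(parts[0]), int(parts[2])
        if start > end:
            raise ValueError("range start must not exceed range end")
        return list(range(start, end + 1))
    # Single index or explicit comma-separated list.
    return [int(part) for part in parts]


def parse_extract(line: str) -> tuple[list[int], str]:
    """Split an `extract` command line into (indices, target directory)."""
    # POSIX-mode shlex handles double quotes and backslash-escaped quotes.
    argv = shlex.split(line)
    if len(argv) not in (2, 3) or argv[0] != "extract":
        raise ValueError("usage: extract <indices> [dir]")
    # Default to the current working directory when no directory is given.
    target = argv[2] if len(argv) == 3 else "."
    return parse_indices(argv[1]), target
```

For example, `parse_extract('extract 3 "my dir"')` yields index 3 and the directory `my dir`, matching the quoting rule described above.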
+ ## Remarks
+
+ Initially, [zipfile](https://docs.python.org/3/library/zipfile.html#zipfile-objects) was considered, combined with a seekable file-like interface to the remote file over HTTP. Although the prototype worked, it was nowhere near as performant as the current implementation. The major issue was that, through the abstract file interface, sequential accesses couldn't be distinguished from random accesses.
+
+ Technically, only sequential access is possible with HTTP, because HTTP is a stateless protocol; but to support our needs, random access is implemented using HTTP range requests. This isn't without a performance penalty, as the server has to set up a handler for each request, so we have to minimise the number of requests whenever the amount of data to be read is known in advance. We do know the compressed size, but unfortunately the `zipfile` API isn't aware of any of this, so it performs a lot of unnecessary seeks that prevent any such optimisation.
+
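To make the point about the known compressed size concrete: once an entry's local header offset and compressed size have been read from the central directory, the compressed data can be requested as one contiguous byte range instead of many small reads. The following is a minimal sketch assuming the 30-byte fixed local file header layout from APPNOTE; `data_range` is a hypothetical helper, not zipinspect's API.

```python
import struct

# Fixed part of a local file header: signature, version needed, flags,
# method, mod time, mod date, CRC-32, compressed size, uncompressed size,
# file name length, extra field length (all little-endian).
LOCAL_HEADER_FORMAT = "<4sHHHHHIIIHH"
LOCAL_HEADER_SIZE = struct.calcsize(LOCAL_HEADER_FORMAT)  # 30 bytes
LOCAL_HEADER_SIGNATURE = b"PK\x03\x04"


def data_range(header: bytes, header_offset: int, compressed_size: int) -> tuple[int, int]:
    """Return the inclusive (first, last) byte offsets of an entry's data.

    `header` holds at least the fixed 30 bytes of the local file header,
    `header_offset` is where that header starts in the archive, and
    `compressed_size` comes from the central directory.
    """
    if compressed_size <= 0:
        raise ValueError("empty entries have no data to fetch")
    fields = struct.unpack_from(LOCAL_HEADER_FORMAT, header)
    if fields[0] != LOCAL_HEADER_SIGNATURE:
        raise ValueError("not a local file header")
    name_length, extra_length = fields[9], fields[10]
    # The data follows the fixed header, the file name and the extra field.
    first = header_offset + LOCAL_HEADER_SIZE + name_length + extra_length
    return first, first + compressed_size - 1
```

The returned pair maps directly onto a single `Range: bytes=first-last` request, so an entry costs two requests at most (header, then data) regardless of its size.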
+ ### The solution?
+
+ Implement the Zip specification from scratch, preferably with an asynchronous API to allow concurrent extractions. That's what was done. Much of the information on the implementation was derived from the Wikipedia page and PKWARE's [APPNOTE.txt](https://pkwaredownloads.blob.core.windows.net/pkware-general/Documentation/APPNOTE-6.3.9.TXT). It's not entirely specification-compliant, but it should work in the majority of cases.
+
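As an example of what implementing the specification involves, a reader usually starts from the End of Central Directory (EOCD) record at the tail of the archive, which says where the central directory lives. The sketch below parses that record from the last bytes of a file. It is an assumption-laden illustration (hypothetical `parse_eocd`, no Zip64 handling, and a naive signature search that an archive comment containing the signature could fool), not zipinspect's actual code.

```python
import struct

# EOCD layout: signature, disk number, disk with central directory,
# entries on this disk, total entries, central directory size,
# central directory offset, comment length (all little-endian).
EOCD_FORMAT = "<4sHHHHIIH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes
EOCD_SIGNATURE = b"PK\x05\x06"


def parse_eocd(tail: bytes) -> dict[str, int]:
    """Locate and decode the EOCD record within the archive's last bytes."""
    # The record sits at the very end, followed only by an optional comment,
    # so search backwards for its signature.
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + EOCD_SIZE > len(tail):
        raise ValueError("end of central directory record not found")
    fields = struct.unpack_from(EOCD_FORMAT, tail, pos)
    # Values of 0xFFFF / 0xFFFFFFFF would signal Zip64; not handled here.
    return {
        "entries": fields[4],
        "cd_size": fields[5],
        "cd_offset": fields[6],
        "comment_len": fields[7],
    }
```

Over HTTP, the `tail` bytes could come from a suffix range request such as `Range: bytes=-65557` (22 bytes plus the maximum comment length of 65535), after which `cd_offset` and `cd_size` describe the one further range needed to list every entry.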
+
@@ -0,0 +1,12 @@
+ zipinspect/__init__.py,sha256=T4K6Vtx0_pTDvGIJytv2Uud06KUCtB7F45LpuAqg3WE,8564
+ zipinspect/__main__.py,sha256=5BjNuyet8AY-POwoF5rGt722rHQ7tJ0Vf0UFUfzzi-I,58
+ zipinspect/utils/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+ zipinspect/utils/asyncio.py,sha256=l-RyicibpbgbwArexewLU0a6b2poCb71zciOmGnXeSc,505
+ zipinspect/utils/misc.py,sha256=JG8NwZ93K1zEK_QkrJqUUDu_2Ghfyy2rvmaf1ivL1sE,987
+ zipinspect/zipread/__init__.py,sha256=8OCG4o0zOiaqkUXNeCV9ksrGCQTgGcF8P28FqJXLRDI,9918
+ zipinspect/zipread/stubs.py,sha256=1eqrmpsBZaYtkmXTZuUeEBwd5OTYuOJ27QfAdMJkW7w,1071
+ zipinspect-0.1.0.dist-info/entry_points.txt,sha256=BKEs4a92Zk75VfmObWBQDhv9T7GmLS20cI_AmYgV2mI,46
+ zipinspect-0.1.0.dist-info/licenses/LICENSE,sha256=OXLcl0T2SZ8Pmy2_dmlvKuetivmyPd5m1q-Gyd-zaYY,35149
+ zipinspect-0.1.0.dist-info/WHEEL,sha256=G2gURzTEtmeR8nrdXUJfNiB3VYVxigPQ-bEQujpNiNs,82
+ zipinspect-0.1.0.dist-info/METADATA,sha256=qftXDMWK1CT0U3Tjez9WAufglxcC7mBLRDBrGyy7bNQ,6678
+ zipinspect-0.1.0.dist-info/RECORD,,
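For readers checking the RECORD entries above: each line is `path,sha256=<digest>,<size>`, where the digest is the SHA-256 hash encoded as URL-safe base64 with the trailing `=` padding removed. A small sketch of how such a line can be recomputed follows; `record_entry` is a hypothetical helper written for this illustration.

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build a RECORD line for a file with the given path and contents."""
    digest = hashlib.sha256(data).digest()
    # RECORD uses URL-safe base64 without the trailing "=" padding.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"
```

Applied to the empty file `zipinspect/utils/__init__.py`, this reproduces the line listed in the hunk above, since its hash is simply the SHA-256 of zero bytes.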
@@ -0,0 +1,4 @@
+ Wheel-Version: 1.0
+ Generator: flit 3.12.0
+ Root-Is-Purelib: true
+ Tag: py3-none-any
@@ -0,0 +1,3 @@
+ [console_scripts]
+ zipinspect=zipinspect:main
+