zipremove 0.4.1__tar.gz → 0.6.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: zipremove
- Version: 0.4.1
+ Version: 0.6.0
  Summary: Extend `zipfile` with `remove`-related functionalities
  Home-page: https://github.com/danny0838/zipremove
  Author: Danny Lin
@@ -45,14 +45,10 @@ This package extends `zipfile` with `remove`-related functionalities.
 
  * `ZipFile.remove(zinfo_or_arcname)`
 
- Removes a member from the archive. *zinfo_or_arcname* may be the full path
- of the member or a `ZipInfo` instance.
-
- If multiple members share the same full path, only one is removed when
- a path is provided.
-
- This does not physically remove the local file entry from the archive.
- Call `ZipFile.repack` afterwards to reclaim space.
+ Removes a member entry from the archive's central directory.
+ *zinfo_or_arcname* may be the full path of the member or a `ZipInfo`
+ instance. If multiple members share the same full path and the path is
+ provided, only one of them is removed.
 
  The archive must be opened with mode ``'w'``, ``'x'`` or ``'a'``.
 
@@ -60,42 +56,49 @@ This package extends `zipfile` with `remove`-related functionalities.
 
  Calling `remove` on a closed ZipFile will raise a `ValueError`.
 
+ > **Note:**
+ > This method only removes the member's entry from the central directory,
+ > making it inaccessible to most tools. The member's local file entry,
+ > including content and metadata, remains in the archive and is still
+ > recoverable using forensic tools. Call `repack` afterwards to completely
+ > remove the member and reclaim space.
+
  * `ZipFile.repack(removed=None, *, strict_descriptor=False[, chunk_size])`
 
- Rewrites the archive to remove stale local file entries, shrinking its file
- size.
+ Rewrites the archive to remove unreferenced local file entries, shrinking
+ its file size. The archive must be opened with mode ``'a'``.
 
  If *removed* is provided, it must be a sequence of `ZipInfo` objects
- representing removed entries; only their corresponding local file entries
- will be removed.
-
- If *removed* is not provided, the archive is scanned to identify and remove
- local file entries that are no longer referenced in the central directory.
- The algorithm assumes that local file entries (and the central directory,
- which is mostly treated as the "last entry") are stored consecutively:
-
- 1. Data before the first referenced entry is removed only when it appears to
- be a sequence of consecutive entries with no extra following bytes; extra
- preceding bytes are preserved.
- 2. Data between referenced entries is removed only when it appears to
- be a sequence of consecutive entries with no extra preceding bytes; extra
- following bytes are preserved.
- 3. Entries must not overlap. If any entry's data overlaps with another, a
- `BadZipFile` error is raised and no changes are made.
-
- When scanning, setting `strict_descriptor=True` disables detection of any
- entry using an unsigned data descriptor (deprecated in the ZIP specification
- since version 6.3.0, released on 2006-09-29, and used only by some legacy
- tools). This improves performance, but may cause some stale entries to be
- preserved.
+ representing the recently removed members, and only their corresponding
+ local file entries will be removed. Otherwise, the archive is scanned to
+ locate and remove local file entries that are no longer referenced in the
+ central directory.
+
+ When scanning, setting ``strict_descriptor=True`` disables detection of any
+ entry using an unsigned data descriptor (a format deprecated by the ZIP
+ specification since version 6.3.0, released on 2006-09-29, and used only by
+ some legacy tools), which is significantly slower to scan—around 100 to
+ 1000 times in the worst case. This does not affect performance on entries
+ without such feature.
 
  *chunk_size* may be specified to control the buffer size when moving
  entry data (default is 1 MiB).
 
- The archive must be opened with mode ``'a'``.
-
  Calling `repack` on a closed ZipFile will raise a `ValueError`.
 
+ > **Note:**
+ > The scanning algorithm is heuristic-based and assumes that the ZIP file
+ > is normally structured—for example, with local file entries stored
+ > consecutively, without overlap or interleaved binary data. Prepended
+ > binary data, such as a self-extractor stub, is recognized and preserved
+ > unless it happens to contain bytes that coincidentally resemble a valid
+ > local file entry in multiple respects—an extremely rare case. Embedded
+ > ZIP payloads are also handled correctly, as long as they follow normal
+ > structure. However, the algorithm does not guarantee correctness or
+ > safety on untrusted or intentionally crafted input. It is generally
+ > recommended to provide the *removed* argument for better reliability and
+ > performance.
+
  * `ZipFile.copy(zinfo_or_arcname, new_arcname[, chunk_size])`
 
  Copies a member *zinfo_or_arcname* to *new_arcname* in the archive.
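
The note added to `remove` in the hunk above says that a member dropped from the central directory is hidden but not erased. A minimal sketch of that situation follows, using only the standard library `zipfile` rather than `zipremove` itself. It simulates the state by prepending one archive's local file entry to a second archive; the helper `make_zip`, the member names, and the payloads are illustrative assumptions, not part of the package.

```python
import io
import zipfile

LOCAL_SIG = b"PK\x03\x04"    # local file header signature
CENTRAL_SIG = b"PK\x01\x02"  # central directory header signature


def make_zip(name, data):
    """Hypothetical helper: build a one-member, uncompressed archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        # A bare ZipInfo uses a fixed default timestamp and ZIP_STORED.
        zf.writestr(zipfile.ZipInfo(name), data)
    return buf.getvalue()


secret_zip = make_zip("secret.txt", b"top secret")
keep_zip = make_zip("keep.txt", b"public data")

# Keep only the local file entry of the first archive (everything before its
# central directory) and prepend it to the second archive. The result holds a
# local entry that no central directory record points to.
orphan_entry = secret_zip[:secret_zip.index(CENTRAL_SIG)]
combined = orphan_entry + keep_zip

with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    listed = zf.namelist()        # only the referenced member is listed
    kept = zf.read("keep.txt")    # and it still reads back correctly

local_entries = combined.count(LOCAL_SIG)     # both local entries are present
payload_survives = b"top secret" in combined  # the hidden payload is intact
```

Here the archive lists a single member while the raw bytes still carry the unreferenced payload, which is the condition `repack` is described as cleaning up.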
@@ -11,14 +11,10 @@ This package extends `zipfile` with `remove`-related functionalities.
 
  * `ZipFile.remove(zinfo_or_arcname)`
 
- Removes a member from the archive. *zinfo_or_arcname* may be the full path
- of the member or a `ZipInfo` instance.
-
- If multiple members share the same full path, only one is removed when
- a path is provided.
-
- This does not physically remove the local file entry from the archive.
- Call `ZipFile.repack` afterwards to reclaim space.
+ Removes a member entry from the archive's central directory.
+ *zinfo_or_arcname* may be the full path of the member or a `ZipInfo`
+ instance. If multiple members share the same full path and the path is
+ provided, only one of them is removed.
 
  The archive must be opened with mode ``'w'``, ``'x'`` or ``'a'``.
 
@@ -26,42 +22,49 @@ This package extends `zipfile` with `remove`-related functionalities.
 
  Calling `remove` on a closed ZipFile will raise a `ValueError`.
 
+ > **Note:**
+ > This method only removes the member's entry from the central directory,
+ > making it inaccessible to most tools. The member's local file entry,
+ > including content and metadata, remains in the archive and is still
+ > recoverable using forensic tools. Call `repack` afterwards to completely
+ > remove the member and reclaim space.
+
  * `ZipFile.repack(removed=None, *, strict_descriptor=False[, chunk_size])`
 
- Rewrites the archive to remove stale local file entries, shrinking its file
- size.
+ Rewrites the archive to remove unreferenced local file entries, shrinking
+ its file size. The archive must be opened with mode ``'a'``.
 
  If *removed* is provided, it must be a sequence of `ZipInfo` objects
- representing removed entries; only their corresponding local file entries
- will be removed.
-
- If *removed* is not provided, the archive is scanned to identify and remove
- local file entries that are no longer referenced in the central directory.
- The algorithm assumes that local file entries (and the central directory,
- which is mostly treated as the "last entry") are stored consecutively:
-
- 1. Data before the first referenced entry is removed only when it appears to
- be a sequence of consecutive entries with no extra following bytes; extra
- preceding bytes are preserved.
- 2. Data between referenced entries is removed only when it appears to
- be a sequence of consecutive entries with no extra preceding bytes; extra
- following bytes are preserved.
- 3. Entries must not overlap. If any entry's data overlaps with another, a
- `BadZipFile` error is raised and no changes are made.
-
- When scanning, setting `strict_descriptor=True` disables detection of any
- entry using an unsigned data descriptor (deprecated in the ZIP specification
- since version 6.3.0, released on 2006-09-29, and used only by some legacy
- tools). This improves performance, but may cause some stale entries to be
- preserved.
+ representing the recently removed members, and only their corresponding
+ local file entries will be removed. Otherwise, the archive is scanned to
+ locate and remove local file entries that are no longer referenced in the
+ central directory.
+
+ When scanning, setting ``strict_descriptor=True`` disables detection of any
+ entry using an unsigned data descriptor (a format deprecated by the ZIP
+ specification since version 6.3.0, released on 2006-09-29, and used only by
+ some legacy tools), which is significantly slower to scan—around 100 to
+ 1000 times in the worst case. This does not affect performance on entries
+ without such feature.
 
  *chunk_size* may be specified to control the buffer size when moving
  entry data (default is 1 MiB).
 
- The archive must be opened with mode ``'a'``.
-
  Calling `repack` on a closed ZipFile will raise a `ValueError`.
 
+ > **Note:**
+ > The scanning algorithm is heuristic-based and assumes that the ZIP file
+ > is normally structured—for example, with local file entries stored
+ > consecutively, without overlap or interleaved binary data. Prepended
+ > binary data, such as a self-extractor stub, is recognized and preserved
+ > unless it happens to contain bytes that coincidentally resemble a valid
+ > local file entry in multiple respects—an extremely rare case. Embedded
+ > ZIP payloads are also handled correctly, as long as they follow normal
+ > structure. However, the algorithm does not guarantee correctness or
+ > safety on untrusted or intentionally crafted input. It is generally
+ > recommended to provide the *removed* argument for better reliability and
+ > performance.
+
  * `ZipFile.copy(zinfo_or_arcname, new_arcname[, chunk_size])`
 
  Copies a member *zinfo_or_arcname* to *new_arcname* in the archive.
@@ -1,6 +1,6 @@
  [metadata]
  name = zipremove
- version = 0.4.1
+ version = 0.6.0
  author = Danny Lin
  author_email = danny0838@gmail.com
  url = https://github.com/danny0838/zipremove
@@ -82,7 +82,7 @@ class _ZipRepacker:
 
      def copy(self, zfile, zinfo, filename):
          # make a copy of zinfo
-         zinfo2 = copy.deepcopy(zinfo)
+         zinfo2 = copy.copy(zinfo)
 
          # apply sanitized new filename as in `ZipInfo.__init__`
          zinfo2.orig_filename = filename
@@ -90,7 +90,7 @@ class _ZipRepacker:
 
          zinfo2.header_offset = zfile.start_dir
 
-         # polyfill: update zinfo2._end_offset if exists
+         # polyfill: clear zinfo2._end_offset if exists
          # (Python >= 3.8 with fix #109858)
          if hasattr(zinfo2, '_end_offset'):
              zinfo2._end_offset = None
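
The hunks above replace `copy.deepcopy` with `copy.copy` when duplicating a `ZipInfo`. A small sketch of why a shallow copy can be enough, using only the standard library: the attributes set here hold immutable values (strings, tuples, integers), so reassigning them on the copy leaves the original untouched. Whether this was the author's motivation is an assumption; the names `original` and `duplicate` and the offset `1234` are illustrative.

```python
import copy
import zipfile

# An entry as it might appear in an archive's file list.
original = zipfile.ZipInfo("docs/readme.txt", date_time=(2024, 1, 2, 3, 4, 6))
original.header_offset = 0

# Shallow-copy the entry, then rename and relocate the copy only.
duplicate = copy.copy(original)
duplicate.orig_filename = "docs/copy.txt"
duplicate.filename = "docs/copy.txt"
duplicate.header_offset = 1234  # stand-in for zfile.start_dir
```

After this, `original` keeps its name and offset, while `duplicate` shares the same timestamp tuple.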
@@ -113,10 +113,9 @@ class _ZipRepacker:
          """
          Repack the ZIP file, stripping unreferenced local file entries.
 
-         Assumes that local file entries are stored consecutively, with no gaps
-         or overlaps.
-
-         Behavior:
+         Assumes that local file entries (and the central directory, which is
+         mostly treated as the "last entry") are stored consecutively, with no
+         gaps or overlaps:
 
          1. If any referenced entry overlaps with another, a `BadZipFile` error
             is raised since safe repacking cannot be guaranteed.
@@ -129,8 +128,8 @@ class _ZipRepacker:
             be a sequence of consecutive entries with no extra preceding bytes;
             extra following bytes are preserved.
 
-         4. This is to prevent an unexpected data removal (false positive),
-            though a false negative may happen in certain rare cases.
+         This is to prevent an unexpected data removal (false positive), though
+         a false negative may happen in certain rare cases.
 
          Examples:
 
@@ -178,10 +177,11 @@ class _ZipRepacker:
 
          Side effects:
              - Modifies the ZIP file in place.
-             - Updates zfile.start_dir to account for removed data.
+             - Updates zfile.start_dir and zfile.data_offset to account for
+               removed data.
              - Sets zfile._didModify to True.
-             - Updates header_offset and _end_offset of referenced ZipInfo
-               instances.
+             - Updates header_offset and clears _end_offset of referenced
+               ZipInfo instances.
 
          Parameters:
              zfile: A ZipFile object representing the archive to repack.
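
The docstring hunks above assume that local file entries are stored consecutively and end where the central directory begins. The following is a rough sketch of what that layout means, using only the standard library; it is not the package's scanning algorithm. The function `walk_local_entries`, the helper `make_archive`, and the sample members are hypothetical, and the walk deliberately stops at entries that use a data descriptor, since their sizes are not stored in the header.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a local file header: signature, version, flags,
# method, time, date, CRC-32, compressed size, uncompressed size,
# filename length, extra field length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def walk_local_entries(raw, end):
    """Return (offset, name, size) for consecutive local entries in raw[:end]."""
    entries = []
    pos = 0
    while pos < end:
        fields = LOCAL_HEADER.unpack_from(raw, pos)
        sig, flags = fields[0], fields[2]
        comp_size, name_len, extra_len = fields[7], fields[9], fields[10]
        if sig != b"PK\x03\x04" or flags & 0x08:
            break  # not a local header, or sizes live in a data descriptor
        name = raw[pos + 30:pos + 30 + name_len].decode("utf-8")
        size = 30 + name_len + extra_len + comp_size
        entries.append((pos, name, size))
        pos += size  # the next entry is expected to start right here
    return entries


def make_archive():
    """Hypothetical helper: two stored members written to memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(zipfile.ZipInfo("a.txt"), b"hello")
        zf.writestr(zipfile.ZipInfo("b.txt"), b"world!!")
        offsets = [zi.header_offset for zi in zf.infolist()]
    return buf.getvalue(), offsets


raw, header_offsets = make_archive()
cd_start = raw.index(b"PK\x01\x02")  # first central directory record
entries = walk_local_entries(raw, cd_start)
```

For an archive written this way, the walked offsets match the `header_offset` values recorded by `zipfile`, and the last entry ends exactly at the central directory.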
@@ -283,17 +283,20 @@ class _ZipRepacker:
          zfile.start_dir -= entry_offset
          zfile._didModify = True
 
-         # polyfill: update ZipInfo._end_offset if exists
+         # polyfill: update _data_offset if exists
+         if getattr(zfile, '_data_offset', None):
+             try:
+                 offset = filelist[0].header_offset
+             except IndexError:
+                 offset = zfile.start_dir
+             if offset < zfile._data_offset:
+                 zfile._data_offset = offset
+
+         # polyfill: clear ZipInfo._end_offset if exists
          # (Python >= 3.8 with fix #109858)
          if hasattr(ZipInfo, '_end_offset'):
-             end_offset = zfile.start_dir
-             for zinfo in reversed(filelist):
-                 if zinfo in removed_zinfos:
-                     zinfo._end_offset = None
-                 else:
-                     if zinfo._end_offset is not None:
-                         zinfo._end_offset = end_offset
-                     end_offset = zinfo.header_offset
+             for zinfo in filelist:
+                 zinfo._end_offset = None
 
      def _calc_initial_entry_offset(self, fp, data_offset):
          checked_offsets = {}
@@ -307,7 +310,8 @@ class _ZipRepacker:
                      return entry_size
          return 0
 
-     def _iter_scan_signature(self, fp, signature, start_offset, end_offset, chunk_size=4096):
+     def _iter_scan_signature(self, fp, signature, start_offset, end_offset,
+                              chunk_size=io.DEFAULT_BUFFER_SIZE):
          sig_len = len(signature)
          remainder = b''
          pos = start_offset
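
The hunk above shows only the opening lines of `_iter_scan_signature`. As a hedged illustration of the general technique those lines suggest (chunked scanning that carries a short remainder across reads), here is a standalone sketch. It is not the package's implementation; the function name `iter_signature_offsets` and the sample bytes are assumptions.

```python
import io


def iter_signature_offsets(fp, signature, start_offset, end_offset,
                           chunk_size=io.DEFAULT_BUFFER_SIZE):
    """Yield absolute offsets of `signature` within [start_offset, end_offset)."""
    sig_len = len(signature)
    remainder = b""
    pos = start_offset
    while pos < end_offset:
        fp.seek(pos)  # re-seek so callers may move fp between yields
        chunk = fp.read(min(chunk_size, end_offset - pos))
        if not chunk:
            break
        data = remainder + chunk
        base = pos - len(remainder)  # absolute offset of data[0]
        idx = data.find(signature)
        while idx != -1:
            yield base + idx
            idx = data.find(signature, idx + 1)
        # Keep the last sig_len - 1 bytes: too short to hold a full match,
        # but enough to complete one that straddles the chunk boundary.
        remainder = data[-(sig_len - 1):] if sig_len > 1 else b""
        pos += len(chunk)


blob = b"xxPK\x03\x04yyyPK\x03\x04zz"
found = list(iter_signature_offsets(io.BytesIO(blob), b"PK\x03\x04",
                                    0, len(blob), chunk_size=3))
```

With a deliberately tiny `chunk_size` of 3, both signatures straddle chunk boundaries and are still reported once each, at offsets 2 and 9.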
@@ -513,7 +517,8 @@ class _ZipRepacker:
 
          return crc, compress_size, file_size, dd_size
 
-     def _trace_compressed_block_end(self, fp, offset, end_offset, decompressor, chunk_size=4096):
+     def _trace_compressed_block_end(self, fp, offset, end_offset, decompressor,
+                                     chunk_size=io.DEFAULT_BUFFER_SIZE):
          fp.seek(offset)
          read_size = 0
          while True:
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: zipremove
- Version: 0.4.1
+ Version: 0.6.0
  Summary: Extend `zipfile` with `remove`-related functionalities
  Home-page: https://github.com/danny0838/zipremove
  Author: Danny Lin
@@ -45,14 +45,10 @@ This package extends `zipfile` with `remove`-related functionalities.
 
  * `ZipFile.remove(zinfo_or_arcname)`
 
- Removes a member from the archive. *zinfo_or_arcname* may be the full path
- of the member or a `ZipInfo` instance.
-
- If multiple members share the same full path, only one is removed when
- a path is provided.
-
- This does not physically remove the local file entry from the archive.
- Call `ZipFile.repack` afterwards to reclaim space.
+ Removes a member entry from the archive's central directory.
+ *zinfo_or_arcname* may be the full path of the member or a `ZipInfo`
+ instance. If multiple members share the same full path and the path is
+ provided, only one of them is removed.
 
  The archive must be opened with mode ``'w'``, ``'x'`` or ``'a'``.
 
@@ -60,42 +56,49 @@ This package extends `zipfile` with `remove`-related functionalities.
 
  Calling `remove` on a closed ZipFile will raise a `ValueError`.
 
+ > **Note:**
+ > This method only removes the member's entry from the central directory,
+ > making it inaccessible to most tools. The member's local file entry,
+ > including content and metadata, remains in the archive and is still
+ > recoverable using forensic tools. Call `repack` afterwards to completely
+ > remove the member and reclaim space.
+
  * `ZipFile.repack(removed=None, *, strict_descriptor=False[, chunk_size])`
 
- Rewrites the archive to remove stale local file entries, shrinking its file
- size.
+ Rewrites the archive to remove unreferenced local file entries, shrinking
+ its file size. The archive must be opened with mode ``'a'``.
 
  If *removed* is provided, it must be a sequence of `ZipInfo` objects
- representing removed entries; only their corresponding local file entries
- will be removed.
-
- If *removed* is not provided, the archive is scanned to identify and remove
- local file entries that are no longer referenced in the central directory.
- The algorithm assumes that local file entries (and the central directory,
- which is mostly treated as the "last entry") are stored consecutively:
-
- 1. Data before the first referenced entry is removed only when it appears to
- be a sequence of consecutive entries with no extra following bytes; extra
- preceding bytes are preserved.
- 2. Data between referenced entries is removed only when it appears to
- be a sequence of consecutive entries with no extra preceding bytes; extra
- following bytes are preserved.
- 3. Entries must not overlap. If any entry's data overlaps with another, a
- `BadZipFile` error is raised and no changes are made.
-
- When scanning, setting `strict_descriptor=True` disables detection of any
- entry using an unsigned data descriptor (deprecated in the ZIP specification
- since version 6.3.0, released on 2006-09-29, and used only by some legacy
- tools). This improves performance, but may cause some stale entries to be
- preserved.
+ representing the recently removed members, and only their corresponding
+ local file entries will be removed. Otherwise, the archive is scanned to
+ locate and remove local file entries that are no longer referenced in the
+ central directory.
+
+ When scanning, setting ``strict_descriptor=True`` disables detection of any
+ entry using an unsigned data descriptor (a format deprecated by the ZIP
+ specification since version 6.3.0, released on 2006-09-29, and used only by
+ some legacy tools), which is significantly slower to scan—around 100 to
+ 1000 times in the worst case. This does not affect performance on entries
+ without such feature.
 
  *chunk_size* may be specified to control the buffer size when moving
  entry data (default is 1 MiB).
 
- The archive must be opened with mode ``'a'``.
-
  Calling `repack` on a closed ZipFile will raise a `ValueError`.
 
+ > **Note:**
+ > The scanning algorithm is heuristic-based and assumes that the ZIP file
+ > is normally structured—for example, with local file entries stored
+ > consecutively, without overlap or interleaved binary data. Prepended
+ > binary data, such as a self-extractor stub, is recognized and preserved
+ > unless it happens to contain bytes that coincidentally resemble a valid
+ > local file entry in multiple respects—an extremely rare case. Embedded
+ > ZIP payloads are also handled correctly, as long as they follow normal
+ > structure. However, the algorithm does not guarantee correctness or
+ > safety on untrusted or intentionally crafted input. It is generally
+ > recommended to provide the *removed* argument for better reliability and
+ > performance.
+
  * `ZipFile.copy(zinfo_or_arcname, new_arcname[, chunk_size])`
 
  Copies a member *zinfo_or_arcname* to *new_arcname* in the archive.