zipremove 0.5.0__tar.gz → 0.6.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: zipremove
3
- Version: 0.5.0
3
+ Version: 0.6.0
4
4
  Summary: Extend `zipfile` with `remove`-related functionalities
5
5
  Home-page: https://github.com/danny0838/zipremove
6
6
  Author: Danny Lin
@@ -45,14 +45,10 @@ This package extends `zipfile` with `remove`-related functionalities.
45
45
 
46
46
  * `ZipFile.remove(zinfo_or_arcname)`
47
47
 
48
- Removes a member from the archive. *zinfo_or_arcname* may be the full path
49
- of the member or a `ZipInfo` instance.
50
-
51
- If multiple members share the same full path, only one is removed when
52
- a path is provided.
53
-
54
- This does not physically remove the local file entry from the archive.
55
- Call `repack` afterwards to reclaim space.
48
+ Removes a member entry from the archive's central directory.
49
+ *zinfo_or_arcname* may be the full path of the member or a `ZipInfo`
50
+ instance. If multiple members share the same full path and the path is
51
+ provided, only one of them is removed.
56
52
 
57
53
  The archive must be opened with mode ``'w'``, ``'x'`` or ``'a'``.
58
54
 
@@ -60,42 +56,49 @@ This package extends `zipfile` with `remove`-related functionalities.
60
56
 
61
57
  Calling `remove` on a closed ZipFile will raise a `ValueError`.
62
58
 
59
+ > **Note:**
60
+ > This method only removes the member's entry from the central directory,
61
+ > making it inaccessible to most tools. The member's local file entry,
62
+ > including content and metadata, remains in the archive and is still
63
+ > recoverable using forensic tools. Call `repack` afterwards to completely
64
+ > remove the member and reclaim space.
65
+
63
66
  * `ZipFile.repack(removed=None, *, strict_descriptor=False[, chunk_size])`
64
67
 
65
- Rewrites the archive to remove stale local file entries, shrinking its file
66
- size.
68
+ Rewrites the archive to remove unreferenced local file entries, shrinking
69
+ its file size. The archive must be opened with mode ``'a'``.
67
70
 
68
71
  If *removed* is provided, it must be a sequence of `ZipInfo` objects
69
- representing removed entries; only their corresponding local file entries
70
- will be removed.
71
-
72
- If *removed* is not provided, the archive is scanned to identify and remove
73
- local file entries that are no longer referenced in the central directory.
74
- The algorithm assumes that local file entries (and the central directory,
75
- which is mostly treated as the "last entry") are stored consecutively:
76
-
77
- 1. Data before the first referenced entry is removed only when it appears to
78
- be a sequence of consecutive entries with no extra following bytes; extra
79
- preceding bytes are preserved.
80
- 2. Data between referenced entries is removed only when it appears to
81
- be a sequence of consecutive entries with no extra preceding bytes; extra
82
- following bytes are preserved.
83
- 3. Entries must not overlap. If any entry's data overlaps with another, a
84
- `BadZipFile` error is raised and no changes are made.
85
-
86
- When scanning, setting `strict_descriptor=True` disables detection of any
87
- entry using an unsigned data descriptor (deprecated in the ZIP specification
88
- since version 6.3.0, released on 2006-09-29, and used only by some legacy
89
- tools). This improves performance, but may cause some stale entries to be
90
- preserved.
72
+ representing the recently removed members, and only their corresponding
73
+ local file entries will be removed. Otherwise, the archive is scanned to
74
+ locate and remove local file entries that are no longer referenced in the
75
+ central directory.
76
+
77
+ When scanning, setting ``strict_descriptor=True`` disables detection of any
78
+ entry using an unsigned data descriptor (a format deprecated by the ZIP
79
+ specification since version 6.3.0, released on 2006-09-29, and used only by
80
+ some legacy tools), which is significantly slower to scan—around 100 to
81
+ 1000 times in the worst case. This does not affect performance on entries
82
+ without such a feature.
91
83
 
92
84
  *chunk_size* may be specified to control the buffer size when moving
93
85
  entry data (default is 1 MiB).
94
86
 
95
- The archive must be opened with mode ``'a'``.
96
-
97
87
  Calling `repack` on a closed ZipFile will raise a `ValueError`.
98
88
 
89
+ > **Note:**
90
+ > The scanning algorithm is heuristic-based and assumes that the ZIP file
91
+ > is normally structured—for example, with local file entries stored
92
+ > consecutively, without overlap or interleaved binary data. Prepended
93
+ > binary data, such as a self-extractor stub, is recognized and preserved
94
+ > unless it happens to contain bytes that coincidentally resemble a valid
95
+ > local file entry in multiple respects—an extremely rare case. Embedded
96
+ > ZIP payloads are also handled correctly, as long as they follow normal
97
+ > structure. However, the algorithm does not guarantee correctness or
98
+ > safety on untrusted or intentionally crafted input. It is generally
99
+ > recommended to provide the *removed* argument for better reliability and
100
+ > performance.
101
+
99
102
  * `ZipFile.copy(zinfo_or_arcname, new_arcname[, chunk_size])`
100
103
 
101
104
  Copies a member *zinfo_or_arcname* to *new_arcname* in the archive.
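The note on `remove` in the hunk above says that a member dropped from the central directory stays recoverable until `repack` runs. The following is a minimal sketch of that situation using only the standard-library `zipfile`; it does not call zipremove. It imitates the state after `remove` by prepending a bare local file entry to a second archive. The helper `build_zip` and the member names are illustrative assumptions, not part of the package.

```python
import io
import zipfile


def build_zip(members):
    """Return the bytes of an in-memory archive holding (name, data) pairs."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zh:
        for name, data in members:
            zh.writestr(name, data)
    return buf.getvalue()


# A one-member archive cut off before its central directory: what is left
# is a local file entry that no central directory refers to.
orphan_zip = build_zip([('secret.txt', b'top secret')])
orphan_entry = orphan_zip[:orphan_zip.index(b'PK\x01\x02')]

# Prepend the orphaned entry to a normal archive.
combined = orphan_entry + build_zip([('keep.txt', b'hello')])

with zipfile.ZipFile(io.BytesIO(combined)) as zh:
    visible = zh.namelist()      # only members listed in the central directory
    kept = zh.read('keep.txt')

# The unreferenced content is still present in the raw bytes.
leftover_found = b'top secret' in combined
local_headers = combined.count(b'PK\x03\x04')
```

Here `visible` lists only `keep.txt`, while the raw bytes still hold two local file headers and the unlisted payload. Unreferenced entries of this kind are what `repack` is documented to remove.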
@@ -11,14 +11,10 @@ This package extends `zipfile` with `remove`-related functionalities.
11
11
 
12
12
  * `ZipFile.remove(zinfo_or_arcname)`
13
13
 
14
- Removes a member from the archive. *zinfo_or_arcname* may be the full path
15
- of the member or a `ZipInfo` instance.
16
-
17
- If multiple members share the same full path, only one is removed when
18
- a path is provided.
19
-
20
- This does not physically remove the local file entry from the archive.
21
- Call `repack` afterwards to reclaim space.
14
+ Removes a member entry from the archive's central directory.
15
+ *zinfo_or_arcname* may be the full path of the member or a `ZipInfo`
16
+ instance. If multiple members share the same full path and the path is
17
+ provided, only one of them is removed.
22
18
 
23
19
  The archive must be opened with mode ``'w'``, ``'x'`` or ``'a'``.
24
20
 
@@ -26,42 +22,49 @@ This package extends `zipfile` with `remove`-related functionalities.
26
22
 
27
23
  Calling `remove` on a closed ZipFile will raise a `ValueError`.
28
24
 
25
+ > **Note:**
26
+ > This method only removes the member's entry from the central directory,
27
+ > making it inaccessible to most tools. The member's local file entry,
28
+ > including content and metadata, remains in the archive and is still
29
+ > recoverable using forensic tools. Call `repack` afterwards to completely
30
+ > remove the member and reclaim space.
31
+
29
32
  * `ZipFile.repack(removed=None, *, strict_descriptor=False[, chunk_size])`
30
33
 
31
- Rewrites the archive to remove stale local file entries, shrinking its file
32
- size.
34
+ Rewrites the archive to remove unreferenced local file entries, shrinking
35
+ its file size. The archive must be opened with mode ``'a'``.
33
36
 
34
37
  If *removed* is provided, it must be a sequence of `ZipInfo` objects
35
- representing removed entries; only their corresponding local file entries
36
- will be removed.
37
-
38
- If *removed* is not provided, the archive is scanned to identify and remove
39
- local file entries that are no longer referenced in the central directory.
40
- The algorithm assumes that local file entries (and the central directory,
41
- which is mostly treated as the "last entry") are stored consecutively:
42
-
43
- 1. Data before the first referenced entry is removed only when it appears to
44
- be a sequence of consecutive entries with no extra following bytes; extra
45
- preceding bytes are preserved.
46
- 2. Data between referenced entries is removed only when it appears to
47
- be a sequence of consecutive entries with no extra preceding bytes; extra
48
- following bytes are preserved.
49
- 3. Entries must not overlap. If any entry's data overlaps with another, a
50
- `BadZipFile` error is raised and no changes are made.
51
-
52
- When scanning, setting `strict_descriptor=True` disables detection of any
53
- entry using an unsigned data descriptor (deprecated in the ZIP specification
54
- since version 6.3.0, released on 2006-09-29, and used only by some legacy
55
- tools). This improves performance, but may cause some stale entries to be
56
- preserved.
38
+ representing the recently removed members, and only their corresponding
39
+ local file entries will be removed. Otherwise, the archive is scanned to
40
+ locate and remove local file entries that are no longer referenced in the
41
+ central directory.
42
+
43
+ When scanning, setting ``strict_descriptor=True`` disables detection of any
44
+ entry using an unsigned data descriptor (a format deprecated by the ZIP
45
+ specification since version 6.3.0, released on 2006-09-29, and used only by
46
+ some legacy tools), which is significantly slower to scan—around 100 to
47
+ 1000 times in the worst case. This does not affect performance on entries
48
+ without such a feature.
57
49
 
58
50
  *chunk_size* may be specified to control the buffer size when moving
59
51
  entry data (default is 1 MiB).
60
52
 
61
- The archive must be opened with mode ``'a'``.
62
-
63
53
  Calling `repack` on a closed ZipFile will raise a `ValueError`.
64
54
 
55
+ > **Note:**
56
+ > The scanning algorithm is heuristic-based and assumes that the ZIP file
57
+ > is normally structured—for example, with local file entries stored
58
+ > consecutively, without overlap or interleaved binary data. Prepended
59
+ > binary data, such as a self-extractor stub, is recognized and preserved
60
+ > unless it happens to contain bytes that coincidentally resemble a valid
61
+ > local file entry in multiple respects—an extremely rare case. Embedded
62
+ > ZIP payloads are also handled correctly, as long as they follow normal
63
+ > structure. However, the algorithm does not guarantee correctness or
64
+ > safety on untrusted or intentionally crafted input. It is generally
65
+ > recommended to provide the *removed* argument for better reliability and
66
+ > performance.
67
+
65
68
  * `ZipFile.copy(zinfo_or_arcname, new_arcname[, chunk_size])`
66
69
 
67
70
  Copies a member *zinfo_or_arcname* to *new_arcname* in the archive.
@@ -1,6 +1,6 @@
1
1
  [metadata]
2
2
  name = zipremove
3
- version = 0.5.0
3
+ version = 0.6.0
4
4
  author = Danny Lin
5
5
  author_email = danny0838@gmail.com
6
6
  url = https://github.com/danny0838/zipremove
@@ -177,7 +177,8 @@ class _ZipRepacker:
177
177
 
178
178
  Side effects:
179
179
  - Modifies the ZIP file in place.
180
- - Updates zfile.start_dir to account for removed data.
180
+ - Updates zfile.start_dir and zfile.data_offset to account for
181
+ removed data.
181
182
  - Sets zfile._didModify to True.
182
183
  - Updates header_offset and clears _end_offset of referenced
183
184
  ZipInfo instances.
@@ -282,6 +283,15 @@ class _ZipRepacker:
282
283
  zfile.start_dir -= entry_offset
283
284
  zfile._didModify = True
284
285
 
286
+ # polyfill: update _data_offset if exists
287
+ if getattr(zfile, '_data_offset', None):
288
+     try:
289
+         offset = filelist[0].header_offset
290
+     except IndexError:
291
+         offset = zfile.start_dir
292
+     if offset < zfile._data_offset:
293
+         zfile._data_offset = offset
294
+
285
295
  # polyfill: clear ZipInfo._end_offset if exists
286
296
  # (Python >= 3.8 with fix #109858)
287
297
  if hasattr(ZipInfo, '_end_offset'):
@@ -300,7 +310,8 @@ class _ZipRepacker:
300
310
  return entry_size
301
311
  return 0
302
312
 
303
- def _iter_scan_signature(self, fp, signature, start_offset, end_offset, chunk_size=4096):
313
+ def _iter_scan_signature(self, fp, signature, start_offset, end_offset,
314
+ chunk_size=io.DEFAULT_BUFFER_SIZE):
304
315
  sig_len = len(signature)
305
316
  remainder = b''
306
317
  pos = start_offset
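The hunk above shows only the signature and first lines of `_iter_scan_signature`. As a rough illustration of what a chunked signature scan can look like, here is a self-contained sketch. It is not the package's implementation: the function name `iter_signature_offsets` and the loop body are assumptions; only the parameter list and the `sig_len` / `remainder` / `pos` variables mirror the diff.

```python
import io


def iter_signature_offsets(fp, signature, start_offset, end_offset,
                           chunk_size=io.DEFAULT_BUFFER_SIZE):
    """Yield absolute offsets of `signature` fully inside [start_offset, end_offset)."""
    sig_len = len(signature)
    remainder = b''
    pos = start_offset
    while pos < end_offset:
        # Re-seek on every pass so the caller may move fp between yields.
        fp.seek(pos)
        chunk = fp.read(min(chunk_size, end_offset - pos))
        if not chunk:
            break
        buf = remainder + chunk
        base = pos - len(remainder)  # absolute offset of buf[0]
        idx = buf.find(signature)
        while idx != -1:
            yield base + idx
            idx = buf.find(signature, idx + 1)
        # Keep the last sig_len - 1 bytes: a signature split across two
        # chunks is then found on the next pass, and never reported twice.
        remainder = buf[-(sig_len - 1):] if sig_len > 1 else b''
        pos += len(chunk)


data = b'xxPK\x03\x04' + b'y' * 10 + b'PK\x03\x04zz'
offsets = list(iter_signature_offsets(
    io.BytesIO(data), b'PK\x03\x04', 0, len(data), chunk_size=3))
```

The deliberately tiny `chunk_size=3` forces both signatures to straddle chunk boundaries, which is the case the carried-over `remainder` exists for.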
@@ -506,7 +517,8 @@ class _ZipRepacker:
506
517
 
507
518
  return crc, compress_size, file_size, dd_size
508
519
 
509
- def _trace_compressed_block_end(self, fp, offset, end_offset, decompressor, chunk_size=4096):
520
+ def _trace_compressed_block_end(self, fp, offset, end_offset, decompressor,
521
+ chunk_size=io.DEFAULT_BUFFER_SIZE):
510
522
  fp.seek(offset)
511
523
  read_size = 0
512
524
  while True:
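`_trace_compressed_block_end` takes a decompressor and a read window, which suggests it finds where a compressed stream ends by feeding it to the decompressor. The sketch below shows that general technique for raw deflate using the standard-library `zlib`. It is an assumption about the approach, not the package's code, and `trace_deflate_end` is a hypothetical name.

```python
import io
import zlib


def trace_deflate_end(fp, offset, end_offset, chunk_size=io.DEFAULT_BUFFER_SIZE):
    """Return the byte length of the raw deflate stream starting at `offset`.

    Returns None if the stream is invalid or does not end before `end_offset`.
    """
    decompressor = zlib.decompressobj(-15)  # -15 selects raw deflate (no zlib header)
    fp.seek(offset)
    read_size = 0
    while offset + read_size < end_offset:
        chunk = fp.read(min(chunk_size, end_offset - offset - read_size))
        if not chunk:
            break
        try:
            decompressor.decompress(chunk)  # output is discarded; only the end matters
        except zlib.error:
            return None
        read_size += len(chunk)
        if decompressor.eof:
            # Bytes read past the end of the stream land in unused_data.
            return read_size - len(decompressor.unused_data)
    return None


payload = b'hello world' * 100
compressor = zlib.compressobj(wbits=-15)
compressed = compressor.compress(payload) + compressor.flush()
blob = b'HEAD' + compressed + b'trailing junk'
stream_size = trace_deflate_end(io.BytesIO(blob), 4, len(blob), chunk_size=7)
```

Other codecs signal a truncated or invalid stream differently; the tests in this diff expect `OSError` for bzip2 and `EOFError` for LZMA, so a generic version would need to handle those per decompressor.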
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: zipremove
3
- Version: 0.5.0
3
+ Version: 0.6.0
4
4
  Summary: Extend `zipfile` with `remove`-related functionalities
5
5
  Home-page: https://github.com/danny0838/zipremove
6
6
  Author: Danny Lin
@@ -45,14 +45,10 @@ This package extends `zipfile` with `remove`-related functionalities.
45
45
 
46
46
  * `ZipFile.remove(zinfo_or_arcname)`
47
47
 
48
- Removes a member from the archive. *zinfo_or_arcname* may be the full path
49
- of the member or a `ZipInfo` instance.
50
-
51
- If multiple members share the same full path, only one is removed when
52
- a path is provided.
53
-
54
- This does not physically remove the local file entry from the archive.
55
- Call `repack` afterwards to reclaim space.
48
+ Removes a member entry from the archive's central directory.
49
+ *zinfo_or_arcname* may be the full path of the member or a `ZipInfo`
50
+ instance. If multiple members share the same full path and the path is
51
+ provided, only one of them is removed.
56
52
 
57
53
  The archive must be opened with mode ``'w'``, ``'x'`` or ``'a'``.
58
54
 
@@ -60,42 +56,49 @@ This package extends `zipfile` with `remove`-related functionalities.
60
56
 
61
57
  Calling `remove` on a closed ZipFile will raise a `ValueError`.
62
58
 
59
+ > **Note:**
60
+ > This method only removes the member's entry from the central directory,
61
+ > making it inaccessible to most tools. The member's local file entry,
62
+ > including content and metadata, remains in the archive and is still
63
+ > recoverable using forensic tools. Call `repack` afterwards to completely
64
+ > remove the member and reclaim space.
65
+
63
66
  * `ZipFile.repack(removed=None, *, strict_descriptor=False[, chunk_size])`
64
67
 
65
- Rewrites the archive to remove stale local file entries, shrinking its file
66
- size.
68
+ Rewrites the archive to remove unreferenced local file entries, shrinking
69
+ its file size. The archive must be opened with mode ``'a'``.
67
70
 
68
71
  If *removed* is provided, it must be a sequence of `ZipInfo` objects
69
- representing removed entries; only their corresponding local file entries
70
- will be removed.
71
-
72
- If *removed* is not provided, the archive is scanned to identify and remove
73
- local file entries that are no longer referenced in the central directory.
74
- The algorithm assumes that local file entries (and the central directory,
75
- which is mostly treated as the "last entry") are stored consecutively:
76
-
77
- 1. Data before the first referenced entry is removed only when it appears to
78
- be a sequence of consecutive entries with no extra following bytes; extra
79
- preceding bytes are preserved.
80
- 2. Data between referenced entries is removed only when it appears to
81
- be a sequence of consecutive entries with no extra preceding bytes; extra
82
- following bytes are preserved.
83
- 3. Entries must not overlap. If any entry's data overlaps with another, a
84
- `BadZipFile` error is raised and no changes are made.
85
-
86
- When scanning, setting `strict_descriptor=True` disables detection of any
87
- entry using an unsigned data descriptor (deprecated in the ZIP specification
88
- since version 6.3.0, released on 2006-09-29, and used only by some legacy
89
- tools). This improves performance, but may cause some stale entries to be
90
- preserved.
72
+ representing the recently removed members, and only their corresponding
73
+ local file entries will be removed. Otherwise, the archive is scanned to
74
+ locate and remove local file entries that are no longer referenced in the
75
+ central directory.
76
+
77
+ When scanning, setting ``strict_descriptor=True`` disables detection of any
78
+ entry using an unsigned data descriptor (a format deprecated by the ZIP
79
+ specification since version 6.3.0, released on 2006-09-29, and used only by
80
+ some legacy tools), which is significantly slower to scan—around 100 to
81
+ 1000 times in the worst case. This does not affect performance on entries
82
+ without such a feature.
91
83
 
92
84
  *chunk_size* may be specified to control the buffer size when moving
93
85
  entry data (default is 1 MiB).
94
86
 
95
- The archive must be opened with mode ``'a'``.
96
-
97
87
  Calling `repack` on a closed ZipFile will raise a `ValueError`.
98
88
 
89
+ > **Note:**
90
+ > The scanning algorithm is heuristic-based and assumes that the ZIP file
91
+ > is normally structured—for example, with local file entries stored
92
+ > consecutively, without overlap or interleaved binary data. Prepended
93
+ > binary data, such as a self-extractor stub, is recognized and preserved
94
+ > unless it happens to contain bytes that coincidentally resemble a valid
95
+ > local file entry in multiple respects—an extremely rare case. Embedded
96
+ > ZIP payloads are also handled correctly, as long as they follow normal
97
+ > structure. However, the algorithm does not guarantee correctness or
98
+ > safety on untrusted or intentionally crafted input. It is generally
99
+ > recommended to provide the *removed* argument for better reliability and
100
+ > performance.
101
+
99
102
  * `ZipFile.copy(zinfo_or_arcname, new_arcname[, chunk_size])`
100
103
 
101
104
  Copies a member *zinfo_or_arcname* to *new_arcname* in the archive.
@@ -436,45 +436,49 @@ class AbstractRemoveTests(RepackHelperMixin):
436
436
  # suppress duplicated name warning
437
437
  with warnings.catch_warnings():
438
438
  warnings.simplefilter("ignore")
439
-
440
439
  zinfos = self._prepare_zip_from_test_files(TESTFN, test_files)
441
- with zipfile.ZipFile(TESTFN, 'a', self.compression) as zh:
442
- zh.remove('file.txt')
443
440
 
444
- # check infolist
445
- self.assertEqual(
446
- [ComparableZipInfo(zi) for zi in zh.infolist()],
447
- [ComparableZipInfo(zi) for zi in [zinfos[0], zinfos[2]]],
448
- )
441
+ with zipfile.ZipFile(TESTFN, 'a', self.compression) as zh:
442
+ zh.remove('file.txt')
443
+
444
+ # check infolist
445
+ self.assertEqual(
446
+ [ComparableZipInfo(zi) for zi in zh.infolist()],
447
+ [ComparableZipInfo(zi) for zi in [zinfos[0], zinfos[2]]],
448
+ )
449
449
 
450
- # check NameToInfo cache
451
- self.assertEqual(
452
- ComparableZipInfo(zh.getinfo('file.txt')),
453
- ComparableZipInfo(zinfos[0]),
454
- )
450
+ # check NameToInfo cache
451
+ self.assertEqual(
452
+ ComparableZipInfo(zh.getinfo('file.txt')),
453
+ ComparableZipInfo(zinfos[0]),
454
+ )
455
455
 
456
- # make sure the zip file is still valid
457
- with zipfile.ZipFile(TESTFN) as zh:
458
- self.assertIsNone(zh.testzip())
456
+ # make sure the zip file is still valid
457
+ with zipfile.ZipFile(TESTFN) as zh:
458
+ self.assertIsNone(zh.testzip())
459
459
 
460
+ # suppress duplicated name warning
461
+ with warnings.catch_warnings():
462
+ warnings.simplefilter("ignore")
460
463
  zinfos = self._prepare_zip_from_test_files(TESTFN, test_files)
461
- with zipfile.ZipFile(TESTFN, 'a', self.compression) as zh:
462
- zh.remove('file.txt')
463
- zh.remove('file.txt')
464
464
 
465
- # check infolist
466
- self.assertEqual(
467
- [ComparableZipInfo(zi) for zi in zh.infolist()],
468
- [ComparableZipInfo(zi) for zi in [zinfos[2]]],
469
- )
465
+ with zipfile.ZipFile(TESTFN, 'a', self.compression) as zh:
466
+ zh.remove('file.txt')
467
+ zh.remove('file.txt')
468
+
469
+ # check infolist
470
+ self.assertEqual(
471
+ [ComparableZipInfo(zi) for zi in zh.infolist()],
472
+ [ComparableZipInfo(zi) for zi in [zinfos[2]]],
473
+ )
470
474
 
471
- # check NameToInfo cache
472
- with self.assertRaises(KeyError):
473
- zh.getinfo('file.txt')
475
+ # check NameToInfo cache
476
+ with self.assertRaises(KeyError):
477
+ zh.getinfo('file.txt')
474
478
 
475
- # make sure the zip file is still valid
476
- with zipfile.ZipFile(TESTFN) as zh:
477
- self.assertIsNone(zh.testzip())
479
+ # make sure the zip file is still valid
480
+ with zipfile.ZipFile(TESTFN) as zh:
481
+ self.assertIsNone(zh.testzip())
478
482
 
479
483
  def test_remove_by_zinfo_duplicated(self):
480
484
  test_files = [
@@ -486,66 +490,74 @@ class AbstractRemoveTests(RepackHelperMixin):
486
490
  # suppress duplicated name warning
487
491
  with warnings.catch_warnings():
488
492
  warnings.simplefilter("ignore")
489
-
490
493
  zinfos = self._prepare_zip_from_test_files(TESTFN, test_files)
491
- with zipfile.ZipFile(TESTFN, 'a', self.compression) as zh:
492
- zh.remove(zh.infolist()[0])
493
494
 
494
- # check infolist
495
- self.assertEqual(
496
- [ComparableZipInfo(zi) for zi in zh.infolist()],
497
- [ComparableZipInfo(zi) for zi in [zinfos[1], zinfos[2]]],
498
- )
495
+ with zipfile.ZipFile(TESTFN, 'a', self.compression) as zh:
496
+ zh.remove(zh.infolist()[0])
499
497
 
500
- # check NameToInfo cache
501
- self.assertEqual(
502
- ComparableZipInfo(zh.getinfo('file.txt')),
503
- ComparableZipInfo(zinfos[1]),
504
- )
498
+ # check infolist
499
+ self.assertEqual(
500
+ [ComparableZipInfo(zi) for zi in zh.infolist()],
501
+ [ComparableZipInfo(zi) for zi in [zinfos[1], zinfos[2]]],
502
+ )
505
503
 
506
- # make sure the zip file is still valid
507
- with zipfile.ZipFile(TESTFN) as zh:
508
- self.assertIsNone(zh.testzip())
504
+ # check NameToInfo cache
505
+ self.assertEqual(
506
+ ComparableZipInfo(zh.getinfo('file.txt')),
507
+ ComparableZipInfo(zinfos[1]),
508
+ )
509
509
 
510
+ # make sure the zip file is still valid
511
+ with zipfile.ZipFile(TESTFN) as zh:
512
+ self.assertIsNone(zh.testzip())
513
+
514
+ # suppress duplicated name warning
515
+ with warnings.catch_warnings():
516
+ warnings.simplefilter("ignore")
510
517
  zinfos = self._prepare_zip_from_test_files(TESTFN, test_files)
511
- with zipfile.ZipFile(TESTFN, 'a', self.compression) as zh:
512
- zh.remove(zh.infolist()[1])
513
518
 
514
- # check infolist
515
- self.assertEqual(
516
- [ComparableZipInfo(zi) for zi in zh.infolist()],
517
- [ComparableZipInfo(zi) for zi in [zinfos[0], zinfos[2]]],
518
- )
519
+ with zipfile.ZipFile(TESTFN, 'a', self.compression) as zh:
520
+ zh.remove(zh.infolist()[1])
521
+
522
+ # check infolist
523
+ self.assertEqual(
524
+ [ComparableZipInfo(zi) for zi in zh.infolist()],
525
+ [ComparableZipInfo(zi) for zi in [zinfos[0], zinfos[2]]],
526
+ )
519
527
 
520
- # check NameToInfo cache
521
- self.assertEqual(
522
- ComparableZipInfo(zh.getinfo('file.txt')),
523
- ComparableZipInfo(zinfos[0]),
524
- )
528
+ # check NameToInfo cache
529
+ self.assertEqual(
530
+ ComparableZipInfo(zh.getinfo('file.txt')),
531
+ ComparableZipInfo(zinfos[0]),
532
+ )
525
533
 
526
- # make sure the zip file is still valid
527
- with zipfile.ZipFile(TESTFN) as zh:
528
- self.assertIsNone(zh.testzip())
534
+ # make sure the zip file is still valid
535
+ with zipfile.ZipFile(TESTFN) as zh:
536
+ self.assertIsNone(zh.testzip())
529
537
 
538
+ # suppress duplicated name warning
539
+ with warnings.catch_warnings():
540
+ warnings.simplefilter("ignore")
530
541
  zinfos = self._prepare_zip_from_test_files(TESTFN, test_files)
531
- with zipfile.ZipFile(TESTFN, 'a', self.compression) as zh:
532
- infolist = zh.infolist().copy()
533
- zh.remove(infolist[0])
534
- zh.remove(infolist[1])
535
542
 
536
- # check infolist
537
- self.assertEqual(
538
- [ComparableZipInfo(zi) for zi in zh.infolist()],
539
- [ComparableZipInfo(zi) for zi in [zinfos[2]]],
540
- )
543
+ with zipfile.ZipFile(TESTFN, 'a', self.compression) as zh:
544
+ infolist = zh.infolist().copy()
545
+ zh.remove(infolist[0])
546
+ zh.remove(infolist[1])
541
547
 
542
- # check NameToInfo cache
543
- with self.assertRaises(KeyError):
544
- zh.getinfo('file.txt')
548
+ # check infolist
549
+ self.assertEqual(
550
+ [ComparableZipInfo(zi) for zi in zh.infolist()],
551
+ [ComparableZipInfo(zi) for zi in [zinfos[2]]],
552
+ )
553
+
554
+ # check NameToInfo cache
555
+ with self.assertRaises(KeyError):
556
+ zh.getinfo('file.txt')
545
557
 
546
- # make sure the zip file is still valid
547
- with zipfile.ZipFile(TESTFN) as zh:
548
- self.assertIsNone(zh.testzip())
558
+ # make sure the zip file is still valid
559
+ with zipfile.ZipFile(TESTFN) as zh:
560
+ self.assertIsNone(zh.testzip())
549
561
 
550
562
  @requires_zip64fix()
551
563
  def test_remove_zip64(self):
@@ -823,7 +835,7 @@ class AbstractRepackTests(RepackHelperMixin):
823
835
  with zipfile.ZipFile(TESTFN) as zh:
824
836
  self.assertIsNone(zh.testzip())
825
837
 
826
- @mock.patch.object(time, 'time', new=lambda: 315504000) # fix time for ZipFile.writestr()
838
+ @mock.patch.object(time, 'time', new=lambda: 315590400) # fix time for ZipFile.writestr()
827
839
  def test_repack_bytes_before_removed_files(self):
828
840
  """Should preserve if there are bytes before stale local file entries."""
829
841
  for ii in ([1], [1, 2], [2]):
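The diff does not say why the mocked timestamp changed from 315504000 to 315590400. One plausible reading, offered here as an assumption, is time-zone robustness: ZIP timestamps cannot represent dates before 1980, and the old value falls in 1979 for any local time zone west of UTC+8. The sketch below only checks the calendar arithmetic; `local_years` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

ZIP_EPOCH_YEAR = 1980  # earliest year a ZIP entry's DOS date field can hold


def local_years(timestamp, utc_offsets_hours=(-12, 0, 8, 14)):
    """Map each UTC offset (in hours) to the local calendar year of `timestamp`."""
    return {
        hours: datetime.fromtimestamp(
            timestamp, timezone(timedelta(hours=hours))).year
        for hours in utc_offsets_hours
    }


old_years = local_years(315504000)  # 1979-12-31 16:00 UTC
new_years = local_years(315590400)  # 1980-01-01 16:00 UTC
```

With the old value only the UTC+8 and UTC+14 zones reach 1980, whereas the new value lands in 1980 for every offset from UTC-12 to UTC+14.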
@@ -867,7 +879,7 @@ class AbstractRepackTests(RepackHelperMixin):
867
879
  with zipfile.ZipFile(TESTFN) as zh:
868
880
  self.assertIsNone(zh.testzip())
869
881
 
870
- @mock.patch.object(time, 'time', new=lambda: 315504000) # fix time for ZipFile.writestr()
882
+ @mock.patch.object(time, 'time', new=lambda: 315590400) # fix time for ZipFile.writestr()
871
883
  def test_repack_bytes_after_removed_files(self):
872
884
  """Should keep extra bytes if there are bytes after stale local file entries."""
873
885
  for ii in ([1], [1, 2], [2]):
@@ -910,7 +922,7 @@ class AbstractRepackTests(RepackHelperMixin):
910
922
  with zipfile.ZipFile(TESTFN) as zh:
911
923
  self.assertIsNone(zh.testzip())
912
924
 
913
- @mock.patch.object(time, 'time', new=lambda: 315504000) # fix time for ZipFile.writestr()
925
+ @mock.patch.object(time, 'time', new=lambda: 315590400) # fix time for ZipFile.writestr()
914
926
  def test_repack_bytes_between_removed_files(self):
915
927
  """Should strip only local file entries before random bytes."""
916
928
  # calculate the expected results
@@ -954,8 +966,8 @@ class AbstractRepackTests(RepackHelperMixin):
954
966
  for ii in ([], [0], [0, 1], [1], [2]):
955
967
  with self.subTest(remove=ii):
956
968
  # calculate the expected results
957
- test_files = [data for j, data in enumerate(self.test_files) if j not in ii]
958
969
  fz = io.BytesIO()
970
+ test_files = [data for j, data in enumerate(self.test_files) if j not in ii]
959
971
  self._prepare_zip_from_test_files(fz, test_files)
960
972
  fz.seek(0)
961
973
  with open(TESTFN, 'wb') as fh:
@@ -973,9 +985,67 @@ class AbstractRepackTests(RepackHelperMixin):
973
985
  fh.write(b'dummy ')
974
986
  fh.write(fz.read())
975
987
  with zipfile.ZipFile(TESTFN, 'a', self.compression) as zh:
988
+ if hasattr(zh, 'data_offset'):
989
+ self.assertEqual(zh.data_offset, 6)
976
990
  for i in ii:
977
991
  zh.remove(self.test_files[i][0])
978
992
  zh.repack()
993
+ if hasattr(zh, 'data_offset'):
994
+ self.assertEqual(zh.data_offset, 6)
995
+
996
+ # check infolist
997
+ self.assertEqual(
998
+ [ComparableZipInfo(zi) for zi in zh.infolist()],
999
+ [ComparableZipInfo(zi) for zi in expected_zinfos],
1000
+ )
1001
+
1002
+ # check file size
1003
+ self.assertEqual(os.path.getsize(TESTFN), expected_size)
1004
+
1005
+ # make sure the zip file is still valid
1006
+ with zipfile.ZipFile(TESTFN) as zh:
1007
+ self.assertIsNone(zh.testzip())
1008
+
1009
+ def test_repack_prepended_file_entry(self):
1010
+ for ii in ([0], [0, 1], [0, 1, 2]):
1011
+ with self.subTest(remove=ii):
1012
+ # calculate the expected results
1013
+ fz = io.BytesIO()
1014
+ test_files = [data for j, data in enumerate(self.test_files) if j not in ii]
1015
+ self._prepare_zip_from_test_files(fz, test_files)
1016
+ fz.seek(0)
1017
+ with open(TESTFN, 'wb') as fh:
1018
+ fh.write(b'dummy ')
1019
+ fh.write(fz.read())
1020
+ with zipfile.ZipFile(TESTFN) as zh:
1021
+ expected_zinfos = list(zh.infolist())
1022
+ expected_size = os.path.getsize(TESTFN)
1023
+
1024
+ # do the removal and check the result
1025
+ fz = io.BytesIO()
1026
+ with zipfile.ZipFile(fz, 'w') as zh:
1027
+ for j, (file, data) in enumerate(self.test_files):
1028
+ if j in ii:
1029
+ zh.writestr(file, data)
1030
+ fz.seek(0)
1031
+ prefix = fz.read()
1032
+
1033
+ fz = io.BytesIO()
1034
+ test_files = [data for j, data in enumerate(self.test_files) if j not in ii]
1035
+ self._prepare_zip_from_test_files(fz, test_files)
1036
+ fz.seek(0)
1037
+
1038
+ with open(TESTFN, 'wb') as fh:
1039
+ fh.write(b'dummy ')
1040
+ fh.write(prefix)
1041
+ fh.write(fz.read())
1042
+
1043
+ with zipfile.ZipFile(TESTFN, 'a', self.compression) as zh:
1044
+ if hasattr(zh, 'data_offset'):
1045
+ self.assertEqual(zh.data_offset, 6 + len(prefix))
1046
+ zh.repack()
1047
+ if hasattr(zh, 'data_offset'):
1048
+ self.assertEqual(zh.data_offset, 6)
979
1049
 
980
1050
  # check infolist
981
1051
  self.assertEqual(
@@ -1068,7 +1138,7 @@ class AbstractRepackTests(RepackHelperMixin):
1068
1138
  with zipfile.ZipFile(TESTFN) as zh:
1069
1139
  self.assertIsNone(zh.testzip())
1070
1140
 
1071
- @mock.patch.object(time, 'time', new=lambda: 315504000) # fix time for ZipFile.writestr()
1141
+ @mock.patch.object(time, 'time', new=lambda: 315590400) # fix time for ZipFile.writestr()
1072
1142
  def test_repack_removed_bytes_between_files(self):
1073
1143
  """Should not remove bytes between local file entries."""
1074
1144
  for ii in ([0], [1], [2]):
@@ -1165,8 +1235,12 @@ class AbstractRepackTests(RepackHelperMixin):
1165
1235
  fh.write(b'dummy ')
1166
1236
  fh.write(fz.read())
1167
1237
  with zipfile.ZipFile(TESTFN, 'a', self.compression) as zh:
1238
+ if hasattr(zh, 'data_offset'):
1239
+ self.assertEqual(zh.data_offset, 6)
1168
1240
  zinfos = [zh.remove(self.test_files[i][0]) for i in ii]
1169
1241
  zh.repack(zinfos)
1242
+ if hasattr(zh, 'data_offset'):
1243
+ self.assertEqual(zh.data_offset, 6)
1170
1244
 
1171
1245
  # check infolist
1172
1246
  self.assertEqual(
@@ -1468,11 +1542,14 @@ class ZipRepackerTests(unittest.TestCase):
1468
1542
  fz = io.BytesIO()
1469
1543
  f = Unseekable(fz) if dd else fz
1470
1544
  cm = (mock.patch.object(struct, 'pack', side_effect=struct_pack_no_dd_sig)
1471
- if not dd_sig else contextlib.nullcontext())
1545
+ if dd and not dd_sig else contextlib.nullcontext())
1472
1546
  with zipfile.ZipFile(f, 'w', compression=compression) as zh:
1473
- with cm:
1474
- with zh.open(arcname, 'w', force_zip64=force_zip64) as fh:
1475
- fh.write(raw_bytes)
1547
+ with cm, zh.open(arcname, 'w', force_zip64=force_zip64) as fh:
1548
+     fh.write(raw_bytes)
1549
+ if dd:
1550
+     zi = zh.infolist()[0]
1551
+     self.assertTrue(zi.flag_bits & zipfile._MASK_USE_DATA_DESCRIPTOR,
1552
+                     f'data descriptor flag not set: {zi.filename}')
1476
1553
  fz.seek(0)
1477
1554
  return fz.read()
1478
1555
 
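The new assertion in this hunk checks that entries written through `Unseekable` carry the data-descriptor flag. The sketch below reproduces that effect with the standard-library `zipfile` alone. The `Unseekable` class is referenced but not defined in the diff, so the version here is an assumed minimal stand-in, and `write_entry` is a hypothetical helper.

```python
import io
import zipfile

DATA_DESCRIPTOR_FLAG = 0x08  # general purpose bit 3 in the ZIP format


class Unseekable:
    """Write-only wrapper that hides seek()/tell(), forcing streamed output."""

    def __init__(self, fp):
        self.fp = fp

    def write(self, data):
        return self.fp.write(data)

    def flush(self):
        self.fp.flush()


def write_entry(streamed):
    """Write one stored member and return (flag_bits, local entry size)."""
    raw = io.BytesIO()
    target = Unseekable(raw) if streamed else raw
    with zipfile.ZipFile(target, 'w') as zh:
        with zh.open('file.txt', 'w') as fh:
            fh.write(b'dummy')
        zinfo = zh.infolist()[0]
    data = raw.getvalue()
    # The local file entry ends where the first central directory header starts.
    entry_size = data.index(b'PK\x01\x02')
    return zinfo.flag_bits, entry_size


plain_flags, plain_size = write_entry(streamed=False)
streamed_flags, streamed_size = write_entry(streamed=True)
```

The seekable entry is 30 header bytes, 8 name bytes and 5 data bytes. The streamed entry adds a 16-byte signed data descriptor, the layout that the `(30, 8, 0, 5, 16)` expectation later in this diff appears to describe.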
@@ -1578,10 +1655,10 @@ class ZipRepackerTests(unittest.TestCase):
1578
1655
  m_sddnsbd.assert_not_called()
1579
1656
  m_sddns.assert_not_called()
1580
1657
 
1581
- # return None if no sufficient header length
1658
+ # return None if truncated local file header
1582
1659
  bytes_ = self._generate_local_file_entry(
1583
1660
  'file.txt', b'dummy', compression=method)
1584
- bytes_ = bytes_[:29]
1661
+ bytes_ = bytes_[:zipfile.sizeFileHeader - 1]
1585
1662
  fz = io.BytesIO(bytes_)
1586
1663
  with mock.patch.object(repacker, '_scan_data_descriptor',
1587
1664
  wraps=repacker._scan_data_descriptor) as m_sdd, \
@@ -2125,6 +2202,10 @@ class ZipRepackerTests(unittest.TestCase):
2125
2202
  def test_trace_compressed_block_end_bz2(self):
2126
2203
  self._test_trace_compressed_block_end(zipfile.ZIP_BZIP2, OSError)
2127
2204
 
2205
+ @requires_lzma()
2206
+ def test_trace_compressed_block_end_lzma(self):
2207
+ self._test_trace_compressed_block_end(zipfile.ZIP_LZMA, EOFError)
2208
+
2128
2209
  @requires_zstd()
2129
2210
  def test_trace_compressed_block_end_zstd(self):
2130
2211
  import compression.zstd
@@ -2193,6 +2274,87 @@ class ZipRepackerTests(unittest.TestCase):
2193
2274
  comp_len,
2194
2275
  )
2195
2276
 
2277
+ def test_calc_local_file_entry_size(self):
2278
+ repacker = zipfile._ZipRepacker()
2279
+
2280
+ # basic
2281
+ fz = io.BytesIO()
2282
+ with zipfile.ZipFile(fz, 'w') as zh:
2283
+ with zh.open('file.txt', 'w') as fh:
2284
+ fh.write(b'dummy')
2285
+ zi = zh.infolist()[-1]
2286
+
2287
+ self.assertEqual(
2288
+ repacker._calc_local_file_entry_size(fz, zi),
2289
+ (30, 8, 0, 5, 0),
2290
+ )
2291
+
2292
+ # data descriptor
2293
+ fz = io.BytesIO()
2294
+ with zipfile.ZipFile(Unseekable(fz), 'w') as zh:
2295
+ with zh.open('file.txt', 'w') as fh:
2296
+ fh.write(b'dummy')
2297
+ zi = zh.infolist()[-1]
2298
+
2299
+ self.assertEqual(
2300
+ repacker._calc_local_file_entry_size(fz, zi),
2301
+ (30, 8, 0, 5, 16),
2302
+ )
2303
+
2304
+ # data descriptor (unsigned)
2305
+ fz = io.BytesIO()
2306
+ with zipfile.ZipFile(Unseekable(fz), 'w') as zh:
2307
+ with mock.patch.object(struct, 'pack', side_effect=struct_pack_no_dd_sig), \
2308
+ zh.open('file.txt', 'w') as fh:
2309
+ fh.write(b'dummy')
2310
+ zi = zh.infolist()[-1]
2311
+
2312
+ self.assertEqual(
2313
+ repacker._calc_local_file_entry_size(fz, zi),
2314
+ (30, 8, 0, 5, 12),
2315
+ )
2316
+
2317
+ @requires_zip64fix()
2318
+ def test_calc_local_file_entry_size_zip64(self):
2319
+ repacker = zipfile._ZipRepacker()
2320
+
2321
+ # zip64
2322
+ fz = io.BytesIO()
2323
+ with zipfile.ZipFile(fz, 'w') as zh:
2324
+ with zh.open('file.txt', 'w', force_zip64=True) as fh:
2325
+ fh.write(b'dummy')
2326
+ zi = zh.infolist()[-1]
2327
+
2328
+ self.assertEqual(
2329
+ repacker._calc_local_file_entry_size(fz, zi),
2330
+ (30, 8, 20, 5, 0),
2331
+ )
2332
+
2333
+ # data descriptor + zip64
2334
+ fz = io.BytesIO()
2335
+ with zipfile.ZipFile(Unseekable(fz), 'w') as zh:
2336
+ with zh.open('file.txt', 'w', force_zip64=True) as fh:
2337
+ fh.write(b'dummy')
2338
+ zi = zh.infolist()[-1]
2339
+
2340
+ self.assertEqual(
2341
+ repacker._calc_local_file_entry_size(fz, zi),
2342
+ (30, 8, 20, 5, 24),
2343
+ )
2344
+
2345
+ # data descriptor (unsigned) + zip64
2346
+ fz = io.BytesIO()
2347
+ with zipfile.ZipFile(Unseekable(fz), 'w') as zh:
2348
+ with mock.patch.object(struct, 'pack', side_effect=struct_pack_no_dd_sig), \
2349
+ zh.open('file.txt', 'w', force_zip64=True) as fh:
2350
+ fh.write(b'dummy')
2351
+ zi = zh.infolist()[-1]
2352
+
2353
+ self.assertEqual(
2354
+ repacker._calc_local_file_entry_size(fz, zi),
2355
+ (30, 8, 20, 5, 20),
2356
+ )
2357
+
2196
2358
  def test_copy_bytes(self):
2197
2359
  repacker = zipfile._ZipRepacker()
2198
2360
 
@@ -132,10 +132,6 @@ class TestRepack(unittest.TestCase):
132
132
  zh.writestr(file, data)
133
133
 
134
134
  with zipfile.ZipFile(f, 'a') as zh:
135
- # make sure data descriptor bit is really set (by making zip file unseekable)
136
- for zi in zh.infolist():
137
- self.assertTrue(zi.flag_bits & 8, f'data descriptor flag not set: {zi.filename}')
138
-
139
135
  zh.remove(file1)
140
136
  zh.repack()
141
137
  self.assertIsNone(zh.testzip())
@@ -143,6 +139,10 @@ class TestRepack(unittest.TestCase):
143
139
  def test_strip_removed_large_file_with_dd_no_sig(self):
144
140
  """Should scan for the data descriptor (without signature) of a removed
145
141
  large file without causing a memory issue."""
142
+ # Reduce data scale for this test, as it's especially slow...
143
+ self.datacount = 30*1024**2 // len(self.data)
144
+ self.allowed_memory = 200*1024
145
+
146
146
  # Try the temp file. If we do TESTFN2, then it hogs
147
147
  # gigabytes of disk space for the duration of the test.
148
148
  with TemporaryFile() as f:
@@ -154,9 +154,6 @@ class TestRepack(unittest.TestCase):
154
154
  self.assertLess(peak, self.allowed_memory)
155
155
 
156
156
  def _test_strip_removed_large_file_with_dd_no_sig(self, f):
157
- # Reduce data to 400 MiB for this test, as it's especially slow...
158
- self.datacount = 400*1024**2 // len(self.data)
159
-
160
157
  file = 'file.txt'
161
158
  file1 = 'largefile.txt'
162
159
  data = b'Sed ut perspiciatis unde omnis iste natus error sit voluptatem'
@@ -167,10 +164,6 @@ class TestRepack(unittest.TestCase):
167
164
  zh.writestr(file, data)
168
165
 
169
166
  with zipfile.ZipFile(f, 'a') as zh:
170
- # make sure data descriptor bit is really set (by making zip file unseekable)
171
- for zi in zh.infolist():
172
- self.assertTrue(zi.flag_bits & 8, f'data descriptor flag not set: {zi.filename}')
173
-
174
167
  zh.remove(file1)
175
168
  zh.repack()
176
169
  self.assertIsNone(zh.testzip())
@@ -201,10 +194,6 @@ class TestRepack(unittest.TestCase):
201
194
  zh.writestr(file, data)
202
195
 
203
196
  with zipfile.ZipFile(f, 'a') as zh:
204
- # make sure data descriptor bit is really set (by making zip file unseekable)
205
- for zi in zh.infolist():
206
- self.assertTrue(zi.flag_bits & 8, f'data descriptor flag not set: {zi.filename}')
207
-
208
197
  zh.remove(file1)
209
198
  zh.repack()
210
199
  self.assertIsNone(zh.testzip())
File without changes
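The tuples expected by `test_calc_local_file_entry_size` above, such as `(30, 8, 0, 5, 16)`, appear to follow directly from the ZIP layout: fixed local header, file name, extra field, data, data descriptor. The sketch below reproduces those numbers with the standard library `zipfile` only, not `zipremove`. The `Unseekable` class here is a hypothetical stand-in for the test helper of the same name, and `data_descriptor_size` is an illustrative helper, not part of either library.

```python
import io
import struct
import zipfile


class Unseekable:
    """Hypothetical stand-in for the test helper: exposes only write/flush,
    so ZipFile cannot seek back and has to emit a data descriptor."""

    def __init__(self, fp):
        self.fp = fp

    def write(self, data):
        return self.fp.write(data)

    def flush(self):
        self.fp.flush()


def data_descriptor_size(zip64, signed):
    """Optional 4-byte signature + 4-byte CRC-32 + two size fields."""
    size_field = 8 if zip64 else 4
    return (4 if signed else 0) + 4 + 2 * size_field


# Fixed part of a local file header; matches zipfile.sizeFileHeader.
LOCAL_HEADER_SIZE = struct.calcsize('<4s2B4HL2L2H')
# Zip64 extra field in a local header: ID, length, two 8-byte sizes.
ZIP64_EXTRA_SIZE = struct.calcsize('<HHQQ')

fz = io.BytesIO()
with zipfile.ZipFile(Unseekable(fz), 'w') as zh:
    with zh.open('file.txt', 'w') as fh:
        fh.write(b'dummy')
    zi = zh.infolist()[-1]

# Stored entry layout: header, file name, data, then the data descriptor.
dd_offset = LOCAL_HEADER_SIZE + len(zi.filename) + zi.compress_size
dd_signature = fz.getvalue()[dd_offset:dd_offset + 4]
```

Under these assumptions, `data_descriptor_size` gives 16 and 12 bytes for the signed and unsigned 32-bit cases and 24 and 20 bytes for the zip64 cases, which lines up with the last element of each expected tuple. The same header size also explains why `bytes_[:zipfile.sizeFileHeader - 1]` is equivalent to the earlier `bytes_[:29]`.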