archive-r-python 0.1.0__tar.gz

Files changed (45)
  1. archive_r_python-0.1.0/LICENSE.txt +56 -0
  2. archive_r_python-0.1.0/MANIFEST.in +11 -0
  3. archive_r_python-0.1.0/PKG-INFO +612 -0
  4. archive_r_python-0.1.0/README.md +583 -0
  5. archive_r_python-0.1.0/VERSION +1 -0
  6. archive_r_python-0.1.0/_vendor/archive_r/include/archive_r/data_stream.h +41 -0
  7. archive_r_python-0.1.0/_vendor/archive_r/include/archive_r/entry.h +161 -0
  8. archive_r_python-0.1.0/_vendor/archive_r/include/archive_r/entry_fault.h +34 -0
  9. archive_r_python-0.1.0/_vendor/archive_r/include/archive_r/entry_metadata.h +56 -0
  10. archive_r_python-0.1.0/_vendor/archive_r/include/archive_r/multi_volume_stream_base.h +46 -0
  11. archive_r_python-0.1.0/_vendor/archive_r/include/archive_r/path_hierarchy.h +109 -0
  12. archive_r_python-0.1.0/_vendor/archive_r/include/archive_r/path_hierarchy_utils.h +37 -0
  13. archive_r_python-0.1.0/_vendor/archive_r/include/archive_r/traverser.h +122 -0
  14. archive_r_python-0.1.0/_vendor/archive_r/src/archive_stack_cursor.cc +330 -0
  15. archive_r_python-0.1.0/_vendor/archive_r/src/archive_stack_cursor.h +98 -0
  16. archive_r_python-0.1.0/_vendor/archive_r/src/archive_stack_orchestrator.cc +162 -0
  17. archive_r_python-0.1.0/_vendor/archive_r/src/archive_stack_orchestrator.h +54 -0
  18. archive_r_python-0.1.0/_vendor/archive_r/src/archive_type.cc +552 -0
  19. archive_r_python-0.1.0/_vendor/archive_r/src/archive_type.h +76 -0
  20. archive_r_python-0.1.0/_vendor/archive_r/src/data_stream.cc +35 -0
  21. archive_r_python-0.1.0/_vendor/archive_r/src/entry.cc +253 -0
  22. archive_r_python-0.1.0/_vendor/archive_r/src/entry_fault.cc +26 -0
  23. archive_r_python-0.1.0/_vendor/archive_r/src/entry_fault_error.cc +54 -0
  24. archive_r_python-0.1.0/_vendor/archive_r/src/entry_fault_error.h +32 -0
  25. archive_r_python-0.1.0/_vendor/archive_r/src/entry_impl.h +58 -0
  26. archive_r_python-0.1.0/_vendor/archive_r/src/multi_volume_manager.cc +81 -0
  27. archive_r_python-0.1.0/_vendor/archive_r/src/multi_volume_manager.h +41 -0
  28. archive_r_python-0.1.0/_vendor/archive_r/src/multi_volume_stream_base.cc +199 -0
  29. archive_r_python-0.1.0/_vendor/archive_r/src/path_hierarchy.cc +151 -0
  30. archive_r_python-0.1.0/_vendor/archive_r/src/path_hierarchy_utils.cc +304 -0
  31. archive_r_python-0.1.0/_vendor/archive_r/src/simple_profiler.h +120 -0
  32. archive_r_python-0.1.0/_vendor/archive_r/src/system_file_stream.cc +263 -0
  33. archive_r_python-0.1.0/_vendor/archive_r/src/system_file_stream.h +46 -0
  34. archive_r_python-0.1.0/_vendor/archive_r/src/traverser.cc +314 -0
  35. archive_r_python-0.1.0/archive_r_python.egg-info/PKG-INFO +612 -0
  36. archive_r_python-0.1.0/archive_r_python.egg-info/SOURCES.txt +44 -0
  37. archive_r_python-0.1.0/archive_r_python.egg-info/dependency_links.txt +1 -0
  38. archive_r_python-0.1.0/archive_r_python.egg-info/not-zip-safe +1 -0
  39. archive_r_python-0.1.0/archive_r_python.egg-info/top_level.txt +1 -0
  40. archive_r_python-0.1.0/examples/traverse_archive.py +72 -0
  41. archive_r_python-0.1.0/pyproject.toml +7 -0
  42. archive_r_python-0.1.0/setup.cfg +33 -0
  43. archive_r_python-0.1.0/setup.py +173 -0
  44. archive_r_python-0.1.0/src/archive_r_py.cc +723 -0
  45. archive_r_python-0.1.0/test/test_traverser.py +488 -0
@@ -0,0 +1,56 @@
1
+ archive_r License
2
+ Version: 0.1.0 (2025-10-25)
3
+
4
+ ----------------------------------------
5
+ Primary License
6
+ ----------------------------------------
7
+
8
+ MIT License
9
+
10
+ Copyright (c) 2025 archive_r Team
11
+
12
+ Permission is hereby granted, free of charge, to any person obtaining a copy
13
+ of this software and associated documentation files (the "Software"), to deal
14
+ in the Software without restriction, including without limitation the rights
15
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
16
+ copies of the Software, and to permit persons to whom the Software is
17
+ furnished to do so, subject to the following conditions:
18
+
19
+ The above copyright notice and this permission notice shall be included in all
20
+ copies or substantial portions of the Software.
21
+
22
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
23
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
24
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
25
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
26
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
27
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
28
+ SOFTWARE.
29
+
30
+ ----------------------------------------
31
+ Third-Party Notices
32
+ ----------------------------------------
33
+
34
+ This distribution bundles or links against the following third-party
35
+ components. Their respective license terms apply in addition to the MIT
36
+ License shown above.
37
+
38
+ 1. libarchive
39
+ - Purpose: core archive reading and writing functionality for the C++
40
+ library and language bindings.
41
+ - License: New BSD License (https://www.libarchive.org/)
42
+
43
+ 2. pybind11
44
+ - Purpose: header-only binding generator for the Python extension module.
45
+ - License: BSD-style License (https://github.com/pybind/pybind11)
46
+
47
+ 3. rake (development dependency for Ruby bindings)
48
+ - Purpose: build and release tasks for the Ruby gem.
49
+ - License: MIT License (https://github.com/ruby/rake)
50
+
51
+ 4. minitest (development dependency for Ruby bindings)
52
+ - Purpose: unit testing framework for the Ruby gem.
53
+ - License: MIT License (https://github.com/minitest/minitest)
54
+
55
+ Users of archive_r should review the linked third-party licenses to ensure
56
+ compliance with their terms when redistributing this software.
@@ -0,0 +1,11 @@
1
+ include README.md
2
+ include LICENSE.txt
3
+ include VERSION
4
+ include pyproject.toml
5
+ include setup.cfg
6
+ include setup.py
7
+ recursive-include src *.cc *.h
8
+ recursive-include examples *.py
9
+ recursive-include test *.py
10
+ recursive-include _vendor/archive_r/include *
11
+ recursive-include _vendor/archive_r/src *
@@ -0,0 +1,612 @@
1
+ Metadata-Version: 2.4
2
+ Name: archive_r_python
3
+ Version: 0.1.0
4
+ Summary: Python bindings for the archive_r traverser library
5
+ Home-page: https://github.com/Raizo-TCS/archive_r
6
+ Author: archive_r Team
7
+ Author-email: raizo.tcs@users.noreply.github.com
8
+ License: MIT
9
+ Project-URL: Source, https://github.com/Raizo-TCS/archive_r
10
+ Project-URL: Bug Tracker, https://github.com/Raizo-TCS/archive_r/issues
11
+ Project-URL: Documentation, https://github.com/Raizo-TCS/archive_r#readme
12
+ Keywords: archive libarchive traversal nested multi-volume
13
+ Classifier: Programming Language :: Python :: 3
14
+ Classifier: Programming Language :: Python :: 3.8
15
+ Classifier: Programming Language :: Python :: 3.9
16
+ Classifier: Programming Language :: Python :: 3.10
17
+ Classifier: Programming Language :: Python :: 3.11
18
+ Classifier: Programming Language :: Python :: 3.12
19
+ Classifier: Operating System :: POSIX :: Linux
20
+ Classifier: Development Status :: 4 - Beta
21
+ Classifier: Topic :: System :: Archiving
22
+ Classifier: Topic :: Software Development :: Libraries
23
+ Requires-Python: >=3.8
24
+ Description-Content-Type: text/markdown
25
+ License-File: LICENSE.txt
26
+ Dynamic: description
27
+ Dynamic: description-content-type
28
+ Dynamic: license-file
29
+
30
+ # archive_r Python Bindings
31
+
32
+ > ⚠️ **Development Status**: This library is currently under development. The API may change without notice.
33
+
34
+ ## Overview
35
+
36
+ Python bindings for archive_r, providing recursive archive reading capabilities using libarchive. The bindings expose a Pythonic iterator API with context manager support for traversing nested archives without extracting them to temporary files.
37
+
38
+ ---
39
+
40
+ ## Installation
41
+
42
+ ### From PyPI
43
+
44
+ ```bash
45
+ pip install archive_r_python
46
+ ```
47
+
48
+ ### From Source
49
+
50
+ ```bash
51
+ cd archive_r/bindings/python
52
+ pip install .
53
+ ```
54
+
55
+ ### Development Installation (Editable Mode)
56
+
57
+ ```bash
58
+ cd archive_r/bindings/python
59
+ pip install -e .
60
+ ```
61
+
62
+ ### Building with Parent Build Script
63
+
64
+ ```bash
65
+ cd archive_r
66
+ ./build.sh --with-python
67
+ ```
68
+
69
+ This builds the core library and Python bindings, placing artifacts in `build/bindings/python/`.
70
+
71
+ ---
72
+
73
+ ## Basic Usage
74
+
75
+ ### Simple Traversal
76
+
77
+ ```python
78
+ import archive_r
79
+
80
+ # Context manager ensures proper resource cleanup
81
+ with archive_r.Traverser("test.zip") as traverser:
82
+     for entry in traverser:
83
+         print(f"Path: {entry.path} (depth={entry.depth})")
84
+         if entry.is_file:
85
+             print(f" Size: {entry.size} bytes")
86
+ ```
87
+
88
+ ### Reading Entry Content
89
+
90
+ ```python
91
+ import archive_r
92
+
93
+ with archive_r.Traverser("archive.tar.gz") as traverser:
94
+     for entry in traverser:
95
+         if entry.is_file and entry.path.endswith('.txt'):
96
+             # Read full content
97
+             content = entry.read()
98
+             print(f"Content of {entry.path}:")
99
+             print(content.decode('utf-8', errors='replace'))
100
+ ```
101
+
102
+ ### Chunked Reading (Large Files)
103
+
104
+ ```python
105
+ import archive_r
106
+
107
+ with archive_r.Traverser("large_archive.zip") as traverser:
108
+     for entry in traverser:
109
+         if entry.is_file:
110
+             # Read in 8KB chunks
111
+             chunk_size = 8192
112
+             total_bytes = 0
113
+             while True:
114
+                 chunk = entry.read(chunk_size)
115
+                 if not chunk:
116
+                     break
117
+                 total_bytes += len(chunk)
118
+                 # Process chunk...
119
+
120
+             print(f"{entry.path}: {total_bytes} bytes read")
121
+ ```
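The chunked-read loop above applies to any object that exposes `read(size)`. The following is a minimal sketch, assuming only that `read(size)` returns `bytes` and an empty value at end of data, as the loop above implies. `hash_stream` is a hypothetical helper and not part of archive_r; `io.BytesIO` stands in for an entry so the sketch runs without the bindings.

```python
import hashlib
import io


def hash_stream(stream, chunk_size=8192, algorithm="sha256"):
    """Return (hex digest, byte count) for a stream read in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes signals end of data
            break
        digest.update(chunk)
        total += len(chunk)
    return digest.hexdigest(), total


# io.BytesIO stands in for an archive_r entry in this sketch (assumption).
payload = b"x" * 20000
hex_digest, size = hash_stream(io.BytesIO(payload))
print(size, hex_digest == hashlib.sha256(payload).hexdigest())
```

Because memory use is bounded by `chunk_size`, the same pattern should suit entries that are too large to load with a single `read()` call.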
122
+
123
+ ### Searching in Entry Content
124
+
125
+ ```python
126
+ import archive_r
127
+
128
+ def search_in_entry(entry, keyword):
129
+     """Stream search within entry content (buffer boundary aware)"""
130
+     overlap = b''
131
+     buffer_size = 8192
132
+     keyword_bytes = keyword.encode('utf-8')
133
+
134
+     while True:
135
+         chunk = entry.read(buffer_size)
136
+         if not chunk:
137
+             break
138
+
139
+         search_text = overlap + chunk
140
+         if keyword_bytes in search_text:
141
+             return True
142
+
143
+         # Preserve tail for next iteration
144
+         # Slice overlap + chunk so short reads keep earlier bytes
145
+         tail_len = len(keyword_bytes) - 1
146
+         # A 1-byte keyword needs no overlap ([-0:] would keep everything)
147
+         overlap = search_text[-tail_len:] if tail_len > 0 else b''
148
+
149
+     return False
150
+
151
+ with archive_r.Traverser("documents.zip") as traverser:
152
+     for entry in traverser:
153
+         if entry.is_file and entry.path.endswith('.txt'):
154
+             if search_in_entry(entry, "important"):
155
+                 print(f"Found keyword in: {entry.path}")
156
+ ```
157
+
158
+ ### Controlling Archive Descent
159
+
160
+ ```python
161
+ import archive_r
162
+
163
+ with archive_r.Traverser("test.zip") as traverser:
164
+     for entry in traverser:
165
+         # Don't expand Office files (they are ZIP internally)
166
+         if entry.path.endswith(('.docx', '.xlsx', '.pptx')):
167
+             entry.set_descent(False)
168
+
169
+         print(f"Path: {entry.path}, Will descend: {entry.descent_enabled}")
170
+ ```
171
+
172
+ You can also disable automatic descent globally:
173
+
174
+ ```python
175
+ # Disable automatic descent for all entries
176
+ with archive_r.Traverser("test.zip", descend_archives=False) as traverser:
177
+     for entry in traverser:
178
+         # Manually enable descent for specific entries
179
+         if entry.path.endswith('.tar.gz'):
180
+             entry.set_descent(True)
181
+ ```
182
+
183
+ > ⚠️ **Note**: Reading entry content automatically disables descent. Call `entry.set_descent(True)` if you need to descend after reading.
184
+
185
+ ---
186
+
187
+ ## Path Representation
188
+
189
+ The Python bindings provide three ways to access entry paths:
190
+
191
+ ```python
192
+ with archive_r.Traverser("outer.zip") as traverser:
193
+     for entry in traverser:
194
+         # Full path including top-level archive
195
+         # Example: "outer.zip/inner.tar/file.txt"
196
+         print(f"path: {entry.path}")
197
+
198
+         # Last element of path_hierarchy
199
+         # Example: "inner.tar/file.txt"
200
+         print(f"name: {entry.name}")
201
+
202
+         # Path hierarchy as list
203
+         # Example: ["outer.zip", "inner.tar/file.txt"]
204
+         print(f"path_hierarchy: {entry.path_hierarchy}")
205
+ ```
206
+
207
+ `path_hierarchy` is particularly useful when you need custom path separators or want to represent the nesting structure explicitly.
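As an illustration of custom separators, here is a minimal sketch. It assumes `path_hierarchy` is a flat list of strings, as in the example above; `format_hierarchy` is a hypothetical helper, not part of archive_r, and the sample list is copied from the comments above rather than produced by a real traversal.

```python
def format_hierarchy(path_hierarchy, separator=" > "):
    """Join path hierarchy components with a custom separator."""
    return separator.join(str(component) for component in path_hierarchy)


# Sample value taken from the comments in the example above.
hierarchy = ["outer.zip", "inner.tar/file.txt"]
print(format_hierarchy(hierarchy))
print(format_hierarchy(hierarchy, separator="!/"))
```

Keeping the archive boundary as a distinct separator makes it possible to tell a nested archive apart from an ordinary directory, which a plain `/`-joined path cannot express.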
208
+
209
+ ---
210
+
211
+ ## Metadata Access
212
+
213
+ ### Basic Metadata
214
+
215
+ Entry objects provide common metadata through properties:
216
+
217
+ ```python
218
+ with archive_r.Traverser("archive.tar") as traverser:
219
+     for entry in traverser:
220
+         print(f"Path: {entry.path}")
221
+         print(f" Type: {'file' if entry.is_file else 'directory'}")
222
+         print(f" Size: {entry.size} bytes")
223
+         print(f" Depth: {entry.depth}")
224
+ ```
225
+
226
+ ### Extended Metadata
227
+
228
+ For additional metadata (permissions, ownership, timestamps), specify `metadata_keys`:
229
+
230
+ ```python
231
+ with archive_r.Traverser("archive.tar", metadata_keys=["uid", "gid", "mtime", "mode"]) as traverser:
232
+     for entry in traverser:
233
+         # Retrieve all specified metadata as dictionary
234
+         metadata = entry.metadata()
235
+         print(f"{entry.path}:")
236
+         print(f" UID: {metadata.get('uid')}")
237
+         print(f" GID: {metadata.get('gid')}")
238
+         print(f" Mode: {oct(metadata.get('mode', 0))}")
239
+
240
+         # Or retrieve specific metadata
241
+         mtime = entry.find_metadata("mtime")
242
+         if mtime is not None:
243
+             print(f" Modified: {mtime}")
244
+ ```
245
+
246
+ Available metadata keys depend on the archive format. Common keys include:
247
+ - `uid`, `gid`: User/group ID
248
+ - `mtime`, `atime`, `ctime`: Timestamps (Unix time)
249
+ - `mode`: File permissions
250
+ - `uname`, `gname`: User/group names
251
+ - `hardlink`, `symlink`: Link targets
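The raw values can be turned into readable text with the standard library. This is a minimal sketch under two assumptions that the README does not spell out: that `mode` is an integer including the file-type bits, and that `mtime` is a number of seconds since the Unix epoch. `describe_metadata` is a hypothetical helper, and the sample dictionary is made up for illustration.

```python
import stat
from datetime import datetime, timezone


def describe_metadata(metadata):
    """Render selected metadata values in a human-readable form."""
    lines = []
    mode = metadata.get("mode")
    if mode is not None:
        # stat.filemode expects the full st_mode value (type + permissions).
        lines.append(f"mode: {stat.filemode(mode)}")
    mtime = metadata.get("mtime")
    if mtime is not None:
        stamp = datetime.fromtimestamp(mtime, tz=timezone.utc)
        lines.append(f"mtime: {stamp.isoformat()}")
    return lines


# Hypothetical sample; a real dict would come from entry.metadata().
sample = {"mode": 0o100644, "mtime": 0}
for line in describe_metadata(sample):
    print(line)
```

If `mode` carries only permission bits, `stat.filemode` prints `?` in the file-type position, so the type bits may need to be added by the caller.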
252
+
253
+ ---
254
+
255
+ ## Processing Split Archives
256
+
257
+ For split archive files (e.g., `.zip.001`, `.zip.002`), use `set_multi_volume_group()`:
258
+
259
+ ```python
260
+ import archive_r
261
+
262
+ with archive_r.Traverser("container.tar") as traverser:
263
+     for entry in traverser:
264
+         # Detect split archive parts
265
+         if '.part' in entry.path:
266
+             # Extract base name (e.g., "archive.zip.part001" → "archive.zip")
267
+             pos = entry.path.rfind('.part')
268
+             base_name = entry.path[:pos]
269
+             entry.set_multi_volume_group(base_name)
270
+
271
+ # After parent traversal, grouped parts are merged and expanded
272
+ ```
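Deriving the group name can be factored into a small function that handles both naming schemes mentioned in this section. The sketch below is an assumption-laden illustration: the regular expression and the name `volume_group_name` are hypothetical, and the accepted suffixes (`.partNNN` and `.NNN`) are a guess at common conventions rather than anything archive_r prescribes.

```python
import re

# Assumed naming schemes: "name.part001" and "name.001" (numeric suffix).
_PART_SUFFIX = re.compile(r"\.(?:part)?\d+$")


def volume_group_name(path):
    """Return the base name for a split-archive part, or None if not a part."""
    match = _PART_SUFFIX.search(path)
    if match is None:
        return None
    return path[:match.start()]


for name in ["archive.zip.part001", "archive.zip.002", "notes.txt"]:
    print(name, "->", volume_group_name(name))
```

A purely numeric suffix also matches names such as `release-1.2`, so a stricter pattern may be needed when the container holds versioned files.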
273
+
274
+ ---
275
+
276
+ ## Format Specification
277
+
278
+ By default, all formats supported by libarchive are enabled. To restrict to specific formats:
279
+
280
+ ```python
281
+ # Enable only ZIP and TAR
282
+ with archive_r.Traverser("test.zip", formats=["zip", "tar"]) as traverser:
283
+     for entry in traverser:
284
+         print(entry.path)
285
+ ```
286
+
287
+ Common format names: `"7zip"`, `"ar"`, `"cab"`, `"cpio"`, `"iso9660"`, `"lha"`, `"rar"`, `"tar"`, `"warc"`, `"xar"`, `"zip"`
288
+
289
+ > 💡 **Tip**: Exclude pseudo-formats like `"mtree"` and `"raw"` if you encounter false positives on non-archive files.
290
+
291
+ ---
292
+
293
+ ## Custom Stream Factories
294
+
295
+ You can provide custom stream objects (file-like objects with a `read()` method) to override the default file opening behavior:
296
+
297
+ ```python
298
+ import archive_r
299
+ import io
300
+
301
+ # Register a custom stream factory
302
+ def custom_stream_factory(path):
303
+     """Return a file-like object for the given path"""
304
+     if path == "special_file.bin":
305
+         # Return custom data source
306
+         return io.BytesIO(b"custom content")
307
+     # Return None to use default file opening
308
+     return None
309
+
310
+ archive_r.register_stream_factory(custom_stream_factory)
311
+
312
+ with archive_r.Traverser("test.zip") as traverser:
313
+     for entry in traverser:
314
+         # When traverser needs to open "special_file.bin",
315
+         # your factory will provide the BytesIO stream
316
+         pass
317
+ ```
318
+
319
+ Stream objects must provide:
320
+ - `read(size)`: Read up to `size` bytes
321
+ - Optional: `seek(offset, whence)`, `tell()` for seekable streams
322
+ - Optional: `rewind()` (defaults to `seek(0, 0)` if not provided)
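The method list above can be satisfied by a small hand-written class. The following is a minimal sketch of such a stream over in-memory bytes; `BytesStream` is a hypothetical name, and whether the bindings call these methods with exactly these argument forms is an assumption based on the list above.

```python
import io


class BytesStream:
    """Minimal seekable stream over an in-memory bytes object."""

    def __init__(self, data):
        self._data = bytes(data)
        self._pos = 0

    def read(self, size=-1):
        # A negative or missing size means "read everything that remains".
        if size is None or size < 0:
            size = len(self._data) - self._pos
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            target = offset
        elif whence == io.SEEK_CUR:
            target = self._pos + offset
        elif whence == io.SEEK_END:
            target = len(self._data) + offset
        else:
            raise ValueError(f"unsupported whence: {whence}")
        if target < 0:
            raise ValueError("negative seek position")
        self._pos = target
        return self._pos

    def tell(self):
        return self._pos

    def rewind(self):
        # Equivalent to the documented default of seek(0, 0).
        self.seek(0, io.SEEK_SET)


stream = BytesStream(b"custom content")
first = stream.read(6)
stream.rewind()
print(first, stream.read())
```

A factory could return `BytesStream(...)` wherever the earlier example returns `io.BytesIO(...)`; the class only spells out what the optional methods are expected to do.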
323
+
324
+ ---
325
+
326
+ ## Error Handling
327
+
328
+ ### Fault Callbacks
329
+
330
+ Data errors (corrupted archives, I/O failures) are reported via callbacks without stopping traversal:
331
+
332
+ ```python
333
+ import archive_r
334
+
335
+ def fault_handler(fault_info):
336
+     """Called when data errors occur during traversal"""
337
+     print(f"Warning at {fault_info['hierarchy']}: {fault_info['message']}")
338
+     if fault_info.get('errno'):
339
+         print(f" Error code: {fault_info['errno']}")
340
+
341
+ archive_r.on_fault(fault_handler)
342
+
343
+ with archive_r.Traverser("potentially_corrupted.zip") as traverser:
344
+     for entry in traverser:
345
+         # Valid entries are processed normally
346
+         # Corrupted entries trigger fault_handler
347
+         print(entry.path)
348
+ ```
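Instead of printing, a handler can collect faults for a report after traversal. This is a minimal sketch; `FaultCollector` is a hypothetical class, the sample fault dict is made up, and it is assumed (not confirmed by the README) that `on_fault` accepts any callable rather than only plain functions.

```python
class FaultCollector:
    """Callable that records fault dicts for inspection after traversal."""

    def __init__(self):
        self.faults = []

    def __call__(self, fault_info):
        # Copy so later mutation by the caller does not affect the record.
        self.faults.append(dict(fault_info))

    def summary(self):
        return [
            f"{'/'.join(map(str, fault.get('hierarchy', [])))}: {fault.get('message', '')}"
            for fault in self.faults
        ]


collector = FaultCollector()
# With the bindings installed, this would be registered via
# archive_r.on_fault(collector). Here it is called directly with a sample dict.
collector({"hierarchy": ["outer.zip", "broken.tar"], "message": "truncated data", "errno": 0})
print(collector.summary())
```

Collecting faults this way keeps the traversal loop free of error-reporting code and lets the caller decide afterwards whether any fault should be treated as fatal.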
349
+
350
+ ### Read Errors
351
+
352
+ Errors during `read()` raise exceptions:
353
+
354
+ ```python
355
+ try:
356
+     with archive_r.Traverser("test.zip") as traverser:
357
+         for entry in traverser:
358
+             if entry.is_file:
359
+                 content = entry.read()
360
+ except RuntimeError as e:
361
+     print(f"Read error: {e}")
362
+ ```
363
+
364
+ ---
365
+
366
+ ## Thread Safety
367
+
368
+ The Python bindings follow the same thread safety constraints as the C++ core:
369
+
370
+ - ✓ **Thread-safe**: Each thread can create and use its own `Traverser` instance independently
371
+ - ✗ **Not thread-safe**: A single `Traverser` or `Entry` instance must not be shared across threads
372
+
373
+ ### Example
374
+
375
+ ```python
376
+ import threading
377
+ import archive_r
378
+
379
+ # ✓ SAFE: Each thread has its own Traverser
380
+ def worker():
381
+     with archive_r.Traverser("archive.tar.gz") as traverser:
382
+         for entry in traverser:
383
+             # Process entry...
384
+             pass
385
+
386
+ t1 = threading.Thread(target=worker)
387
+ t2 = threading.Thread(target=worker)
388
+ t1.start()
389
+ t2.start()
390
+ t1.join()
391
+ t2.join()
392
+
393
+ # ✗ UNSAFE: Sharing a single Traverser instance across threads
394
+ shared_traverser = archive_r.Traverser("archive.tar.gz")
395
+ def unsafe_worker():
396
+     for entry in shared_traverser: # Race condition!
397
+         pass
398
+
399
+ # Don't do this!
400
+ # t1 = threading.Thread(target=unsafe_worker)
401
+ # t2 = threading.Thread(target=unsafe_worker)
402
+ ```
403
+
404
+ Additionally:
405
+ - **Global registration functions** (`register_stream_factory`, `on_fault`) should be called during single-threaded initialization
406
+ - **Entry objects** should not be shared between threads (they are tied to the Traverser's internal state)
407
+
408
+ ---
409
+
410
+ ## Advanced Examples
411
+
412
+ ### Full Example: Recursive Archive Analyzer
413
+
414
+ ```python
415
+ import archive_r
416
+ import sys
417
+ from collections import defaultdict
418
+
419
+ def analyze_archive(archive_path):
420
+     """Analyze archive contents and print statistics"""
421
+     stats = defaultdict(int)
422
+     file_types = defaultdict(int)
423
+
424
+     with archive_r.Traverser(archive_path, metadata_keys=["mtime"]) as traverser:
425
+         for entry in traverser:
426
+             stats['total_entries'] += 1
427
+
428
+             if entry.is_file:
429
+                 stats['files'] += 1
430
+                 stats['total_size'] += entry.size
431
+
432
+                 # Count by extension
433
+                 if '.' in entry.name:
434
+                     ext = entry.name.rsplit('.', 1)[1]
435
+                     file_types[ext] += 1
436
+
437
+                 # Find largest file
438
+                 if entry.size > stats.get('max_file_size', 0):
439
+                     stats['max_file_size'] = entry.size
440
+                     stats['max_file_path'] = entry.path
441
+             else:
442
+                 stats['directories'] += 1
443
+
444
+             # Track maximum depth
445
+             if entry.depth > stats.get('max_depth', 0):
446
+                 stats['max_depth'] = entry.depth
447
+
448
+     # Print results
449
+     print(f"\nArchive Analysis: {archive_path}")
450
+     print(f" Total entries: {stats['total_entries']}")
451
+     print(f" Files: {stats['files']}")
452
+     print(f" Directories: {stats['directories']}")
453
+     print(f" Total size: {stats['total_size']:,} bytes")
454
+     print(f" Maximum depth: {stats['max_depth']}")
455
+
456
+     if 'max_file_path' in stats:
457
+         print(f" Largest file: {stats['max_file_path']} ({stats['max_file_size']:,} bytes)")
458
+
459
+     if file_types:
460
+         print("\n File types:")
461
+         for ext, count in sorted(file_types.items(), key=lambda x: x[1], reverse=True)[:10]:
462
+             print(f" .{ext}: {count}")
463
+
464
+ if __name__ == '__main__':
465
+     if len(sys.argv) < 2:
466
+         print("Usage: python analyze.py <archive_path>")
467
+         sys.exit(1)
468
+
469
+     analyze_archive(sys.argv[1])
470
+ ```
471
+
472
+ ---
473
+
474
+ ## Testing
475
+
476
+ Run the Python binding tests:
477
+
478
+ ```bash
479
+ cd archive_r/bindings/python
480
+ python -m unittest discover test
481
+ ```
482
+
483
+ Or use the project-wide test runner:
484
+
485
+ ```bash
486
+ cd archive_r
487
+ ./run_tests.sh
488
+ ```
489
+
490
+ ---
491
+
492
+ ## API Reference
493
+
494
+ ### Module: `archive_r`
495
+
496
+ #### Class: `Traverser`
497
+
498
+ Constructor:
499
+ ```python
500
+ Traverser(
501
+ roots, # str or list of str/list (path hierarchy)
502
+ formats=None, # list of format names (default: all)
503
+ descend_archives=True, # automatically expand archives
504
+ metadata_keys=None, # list of metadata keys to capture
505
+ passphrases=None # list of passphrases for encrypted archives
506
+ )
507
+ ```
508
+
509
+ Methods:
510
+ - `__iter__()`: Returns self (iterator protocol)
511
+ - `__next__()`: Returns next `Entry` or raises `StopIteration`
512
+ - `__enter__()`: Context manager entry (returns self)
513
+ - `__exit__(exc_type, exc_val, exc_tb)`: Context manager exit
514
+
515
+ #### Class: `Entry`
516
+
517
+ Properties:
518
+ - `path`: Full path string (read-only)
519
+ - `name`: Last element of path hierarchy (read-only)
520
+ - `path_hierarchy`: List representation of path (read-only)
521
+ - `depth`: Nesting depth (read-only)
522
+ - `is_file`: True if entry is a file (read-only)
523
+ - `size`: File size in bytes, 0 for directories (read-only)
524
+ - `descent_enabled`: Whether this entry will be expanded as an archive (read-only)
525
+
526
+ Methods:
527
+ - `read(size=None)`: Read entry content (bytes). If `size` is omitted, reads all remaining data
528
+ - `set_descent(enabled)`: Enable/disable archive expansion for this entry
529
+ - `set_multi_volume_group(group_name)`: Register this entry as part of a split archive group
530
+ - `metadata()`: Return dictionary of all captured metadata
531
+ - `find_metadata(key)`: Return value for specific metadata key, or None if not found
532
+
533
+ #### Function: `register_stream_factory`
534
+
535
+ ```python
536
+ archive_r.register_stream_factory(factory_func)
537
+ ```
538
+
539
+ Register a callback to provide custom stream objects for file access.
540
+
541
+ **Parameters**:
542
+ - `factory_func`: Callable that takes a file path (str) and returns a file-like object or None
543
+
544
+ **Stream object requirements**:
545
+ - Must provide a `read(size)` method
546
+ - Optional: `seek(offset, whence)`, `tell()`, `rewind()`
547
+
548
+ #### Function: `on_fault`
549
+
550
+ ```python
551
+ archive_r.on_fault(callback)
552
+ ```
553
+
554
+ Register a callback to receive fault notifications during traversal.
555
+
556
+ **Parameters**:
557
+ - `callback`: Callable that takes a dict with keys:
558
+   - `hierarchy`: List of path components where fault occurred
559
+   - `message`: Human-readable error description
560
+   - `errno`: Optional error number from system calls
561
+
562
+ ---
563
+
564
+ ## Packaging
565
+
566
+ ### Building Wheels
567
+
568
+ ```bash
569
+ cd archive_r
570
+ ./build.sh --package-python
571
+ ```
572
+
573
+ This creates a wheel (`.whl`) and a source distribution (`.tar.gz`) in `build/bindings/python/dist/`.
574
+
575
+ ### Manual Packaging
576
+
577
+ ```bash
578
+ cd bindings/python
579
+ python setup.py sdist bdist_wheel
580
+ ```
581
+
582
+ ---
583
+
584
+ ## Requirements
585
+
586
+ - Python 3.8 or later
587
+ - libarchive 3.x (runtime dependency)
588
+ - setuptools, wheel (build dependencies)
589
+ - pybind11 >= 2.6.0 (build dependency, automatically vendored during packaging)
590
+
591
+ ---
592
+
593
+ ## License
594
+
595
+ The Python bindings are distributed under the MIT License, consistent with the archive_r core library.
596
+
597
+ ### Third-Party Licenses
598
+
599
+ - **pybind11**: BSD-style License (used for C++/Python interfacing)
600
+ - **libarchive**: New BSD License (runtime dependency)
601
+
602
+ ---
603
+
604
+ ## See Also
605
+
606
+ - [archive_r Core Documentation](../../README.md)
607
+ - [Ruby Bindings](../ruby/README.md)
608
+ - [Example Scripts](examples/)
609
+
610
+ ---
611
+
612
+ **Note**: This document describes archive_r Python bindings version 0.1.0.