PyHardLinkBackup 1.4.1__tar.gz → 1.6.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (61)
  1. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PKG-INFO +12 -5
  2. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/__init__.py +1 -1
  3. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/backup.py +1 -1
  4. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/cli_app/phlb.py +37 -1
  5. pyhardlinkbackup-1.6.0/PyHardLinkBackup/compare_backup.py +213 -0
  6. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/tests/test_backup.py +81 -3
  7. pyhardlinkbackup-1.6.0/PyHardLinkBackup/tests/test_compare_backup.py +145 -0
  8. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/README.md +11 -4
  9. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/.editorconfig +0 -0
  10. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/.github/workflows/tests.yml +0 -0
  11. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/.gitignore +0 -0
  12. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/.idea/.gitignore +0 -0
  13. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/.pre-commit-config.yaml +0 -0
  14. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/.pre-commit-hooks.yaml +0 -0
  15. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/.run/Template Python tests.run.xml +0 -0
  16. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/.run/Unittests - __all__.run.xml +0 -0
  17. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/.run/cli.py --help.run.xml +0 -0
  18. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/.run/dev-cli update.run.xml +0 -0
  19. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/.run/only DocTests.run.xml +0 -0
  20. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/.run/only DocWrite.run.xml +0 -0
  21. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/.venv-app/lib/python3.12/site-packages/cli_base/tests/shell_complete_snapshots/.gitignore +0 -0
  22. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/__main__.py +0 -0
  23. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/cli_app/__init__.py +0 -0
  24. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/cli_dev/__init__.py +0 -0
  25. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/cli_dev/benchmark.py +0 -0
  26. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/cli_dev/code_style.py +0 -0
  27. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/cli_dev/packaging.py +0 -0
  28. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/cli_dev/shell_completion.py +0 -0
  29. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/cli_dev/testing.py +0 -0
  30. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/cli_dev/update_readme_history.py +0 -0
  31. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/constants.py +0 -0
  32. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/logging_setup.py +0 -0
  33. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/rebuild_databases.py +0 -0
  34. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/tests/__init__.py +0 -0
  35. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/tests/test_doc_write.py +0 -0
  36. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/tests/test_doctests.py +0 -0
  37. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/tests/test_project_setup.py +0 -0
  38. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/tests/test_readme.py +0 -0
  39. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/tests/test_readme_history.py +0 -0
  40. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/tests/test_rebuild_database.py +0 -0
  41. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/utilities/__init__.py +0 -0
  42. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/utilities/file_hash_database.py +0 -0
  43. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/utilities/file_size_database.py +0 -0
  44. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/utilities/filesystem.py +0 -0
  45. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/utilities/humanize.py +0 -0
  46. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/utilities/rich_utils.py +0 -0
  47. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/utilities/sha256sums.py +0 -0
  48. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/utilities/tee.py +0 -0
  49. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/utilities/tests/__init__.py +0 -0
  50. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/utilities/tests/test_file_hash_database.py +0 -0
  51. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/utilities/tests/test_file_size_database.py +0 -0
  52. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/utilities/tests/test_filesystem.py +0 -0
  53. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/PyHardLinkBackup/utilities/tyro_cli_shared_args.py +0 -0
  54. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/cli.py +0 -0
  55. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/dev-cli.py +0 -0
  56. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/dist/.gitignore +0 -0
  57. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/docs/README.md +0 -0
  58. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/docs/about-docs.md +0 -0
  59. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/noxfile.py +0 -0
  60. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/pyproject.toml +0 -0
  61. {pyhardlinkbackup-1.4.1 → pyhardlinkbackup-1.6.0}/uv.lock +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: PyHardLinkBackup
-Version: 1.4.1
+Version: 1.6.0
 Summary: HardLink/Deduplication Backups with Python
 Project-URL: Documentation, https://github.com/jedie/PyHardLinkBackup
 Project-URL: Source, https://github.com/jedie/PyHardLinkBackup
@@ -80,7 +80,7 @@ complete help for main CLI app:
 
 [comment]: <> (✂✂✂ auto generated main help start ✂✂✂)
 ```
-usage: phlb [-h] {backup,rebuild,version}
+usage: phlb [-h] {backup,compare,rebuild,version}
 
 
 
@@ -90,6 +90,7 @@ usage: phlb [-h] {backup,rebuild,version}
 ╭─ subcommands ────────────────────────────────────────────────────────────────────────────────────────────────────────╮
 │ (required) │
 │ • backup Backup the source directory to the destination directory using hard links for deduplication. │
+│ • compare Compares a source tree with the last backup and validates all known file hashes. │
 │ • rebuild Rebuild the file hash and size database by scanning all backup files. And also verify SHA256SUMS and/or │
 │ store missing hashes in SHA256SUMS files. │
 │ • version Print version and exit │
@@ -231,12 +232,21 @@ Overview of main changes:
 
 [comment]: <> (✂✂✂ auto generated history start ✂✂✂)
 
+* [v1.6.0](https://github.com/jedie/PyHardLinkBackup/compare/v1.5.0...v1.6.0)
+  * 2026-01-17 - Fix flaky test, because of terminal size
+  * 2026-01-17 - Bugfix: Don't hash new large files twice
+  * 2026-01-17 - Use compare also in backup tests
+* [v1.5.0](https://github.com/jedie/PyHardLinkBackup/compare/v1.4.1...v1.5.0)
+  * 2026-01-17 - NEW: Compare command to verify source tree with last backup
 * [v1.4.1](https://github.com/jedie/PyHardLinkBackup/compare/v1.4.0...v1.4.1)
   * 2026-01-16 - Bugfix large file handling
 * [v1.4.0](https://github.com/jedie/PyHardLinkBackup/compare/v1.3.0...v1.4.0)
   * 2026-01-16 - Create log file in backup and a summary.txt
   * 2026-01-16 - Run CI tests on macos, too.
   * 2026-01-16 - add dev cli command "scan-benchmark"
+
+<details><summary>Expand older history entries ...</summary>
+
 * [v1.3.0](https://github.com/jedie/PyHardLinkBackup/compare/v1.2.0...v1.3.0)
   * 2026-01-15 - Verify SHA256SUMS files in "rebuild" command, too.
   * 2026-01-15 - Code cleanup: use more generic names for and in BackupProgress
@@ -249,9 +259,6 @@ Overview of main changes:
   * 2026-01-14 - Enhance progress bars
   * 2026-01-14 - A a note to rsync --link-dest
   * 2026-01-14 - Use cli_base.cli_tools.test_utils.base_testcases
-
-<details><summary>Expand older history entries ...</summary>
-
 * [v1.1.0](https://github.com/jedie/PyHardLinkBackup/compare/v1.0.1...v1.1.0)
   * 2026-01-14 - Change backup timestamp directory to old schema: '%Y-%m-%d-%H%M%S'
   * 2026-01-14 - Add "Overview of main changes" to README
@@ -3,5 +3,5 @@
 """
 
 # See https://packaging.python.org/en/latest/specifications/version-specifiers/
-__version__ = '1.4.1'
+__version__ = '1.6.0'
 __author__ = 'Jens Diemer <PyHardLinkBackup@jensdiemer.de>'
@@ -138,7 +138,7 @@ def backup_one_file(
             backup_result.hardlinked_size += size
         else:
             logger.info('Copy unique file: %s to %s', src_path, dst_path)
-            file_hash = copy_and_hash(src_path, dst_path)
+            shutil.copyfile(src_path, dst_path)
             hash_db[file_hash] = dst_path
             backup_result.copied_files += 1
             backup_result.copied_size += size
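The hunk above drops `copy_and_hash` in favour of `shutil.copyfile` while still using `file_hash` on the next line, which suggests that the hash is already computed earlier in `backup_one_file` (that part is not visible in this diff). The following is only a sketch of that "hash once, then hard link or copy" idea under this assumption. `sha256_of`, `backup_large_file` and the plain `dict` standing in for the hash database are hypothetical names, not the package's API.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Hash a file in chunks (hypothetical stand-in for the project's hash helper)."""
    digest = hashlib.sha256()
    with path.open('rb') as f:
        for chunk in iter(lambda: f.read(64 * 1024), b''):
            digest.update(chunk)
    return digest.hexdigest()


def backup_large_file(src_path: Path, dst_path: Path, hash_db: Dict[str, Path]) -> str:
    """Hash the source once, then hard link to a known copy or copy the new file."""
    file_hash = sha256_of(src_path)  # the only hashing pass over this file
    existing = hash_db.get(file_hash)
    if existing is not None:
        os.link(existing, dst_path)  # deduplicate: same content, same inode
        return 'hardlinked'
    shutil.copyfile(src_path, dst_path)  # plain copy, no second hash needed
    hash_db[file_hash] = dst_path
    return 'copied'


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'a.bin').write_bytes(b'X' * 2000)
    (root / 'b.bin').write_bytes(b'X' * 2000)  # same content as a.bin
    db: Dict[str, Path] = {}
    first = backup_large_file(root / 'a.bin', root / 'backup_a.bin', db)
    second = backup_large_file(root / 'b.bin', root / 'backup_b.bin', db)
    same_inode = os.path.samefile(root / 'backup_a.bin', root / 'backup_b.bin')
    print(first, second, same_inode)
```

The second file with identical content ends up as a hard link to the first backup copy, so each source file is read for hashing exactly once.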
@@ -5,7 +5,7 @@ from typing import Annotated
 import tyro
 from rich import print  # noqa
 
-from PyHardLinkBackup import rebuild_databases
+from PyHardLinkBackup import compare_backup, rebuild_databases
 from PyHardLinkBackup.backup import backup_tree
 from PyHardLinkBackup.cli_app import app
 from PyHardLinkBackup.logging_setup import (
@@ -60,6 +60,42 @@ def backup(
     )
 
 
+@app.command
+def compare(
+    src: Annotated[
+        Path,
+        tyro.conf.arg(
+            metavar='source',
+            help='Source directory that should be compared with the last backup.',
+        ),
+    ],
+    dst: Annotated[
+        Path,
+        tyro.conf.arg(
+            metavar='destination',
+            help='Destination directory with the backups. Will pick the last backup for comparison.',
+        ),
+    ],
+    /,
+    excludes: TyroExcludeDirectoriesArgType = DEFAULT_EXCLUDE_DIRECTORIES,
+    verbosity: TyroConsoleLogLevelArgType = DEFAULT_CONSOLE_LOG_LEVEL,
+    log_file_level: TyroLogFileLevelArgType = DEFAULT_LOG_FILE_LEVEL,
+) -> None:
+    """
+    Compares a source tree with the last backup and validates all known file hashes.
+    """
+    log_manager = LoggingManager(
+        console_level=verbosity,
+        file_level=log_file_level,
+    )
+    compare_backup.compare_tree(
+        src_root=src,
+        backup_root=dst,
+        excludes=excludes,
+        log_manager=log_manager,
+    )
+
+
 @app.command
 def rebuild(
     backup_root: Annotated[
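The `dst` help text says the command "will pick the last backup for comparison". Because backup directories are named with the `'%Y-%m-%d-%H%M%S'` schema, sorting the names as plain strings also sorts them chronologically, so the last element is the most recent backup. The sketch below illustrates that selection. `pick_last_backup` is a hypothetical helper, and returning `None` for an empty directory is an assumption made here for illustration, not behaviour taken from the package.

```python
import tempfile
from pathlib import Path
from typing import Optional


def pick_last_backup(compare_main_dir: Path) -> Optional[Path]:
    """Return the newest timestamp directory, or None if there is none (assumed behaviour)."""
    names = sorted(
        path.name
        for path in compare_main_dir.iterdir()
        if path.is_dir() and path.name.startswith('20')
    )
    if not names:
        return None
    # Zero-padded '%Y-%m-%d-%H%M%S' names sort lexicographically in time order.
    return compare_main_dir / names[-1]


with tempfile.TemporaryDirectory() as tmp:
    main_dir = Path(tmp)
    empty_result = pick_last_backup(main_dir)
    for name in ('2026-01-10-235959', '2025-12-31-235959', '2026-01-17-120000'):
        (main_dir / name).mkdir()
    # Log and summary files live next to the backups and must be ignored:
    (main_dir / '2026-01-17-130000-compare.log').write_text('not a directory')
    last_name = pick_last_backup(main_dir).name
    print(last_name)  # 2026-01-17-120000
```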
@@ -0,0 +1,213 @@
+import dataclasses
+import datetime
+import logging
+import os
+import sys
+import time
+from pathlib import Path
+
+from rich import print  # noqa
+
+from PyHardLinkBackup.logging_setup import LoggingManager
+from PyHardLinkBackup.utilities.file_hash_database import FileHashDatabase
+from PyHardLinkBackup.utilities.file_size_database import FileSizeDatabase
+from PyHardLinkBackup.utilities.filesystem import (
+    hash_file,
+    humanized_fs_scan,
+    iter_scandir_files,
+)
+from PyHardLinkBackup.utilities.humanize import PrintTimingContextManager, human_filesize
+from PyHardLinkBackup.utilities.rich_utils import DisplayFileTreeProgress
+from PyHardLinkBackup.utilities.tee import TeeStdoutContext
+
+
+logger = logging.getLogger(__name__)
+
+
+@dataclasses.dataclass
+class CompareResult:
+    last_timestamp: str
+    compare_dir: Path
+    log_file: Path
+    #
+    total_file_count: int = 0
+    total_size: int = 0
+    #
+    src_file_new_count: int = 0
+    file_size_missmatch: int = 0
+    file_hash_missmatch: int = 0
+    #
+    small_file_count: int = 0
+    size_db_missing_count: int = 0
+    hash_db_missing_count: int = 0
+    #
+    successful_file_count: int = 0
+    error_count: int = 0
+
+
+def compare_one_file(
+    *,
+    src_root: Path,
+    entry: os.DirEntry,
+    size_db: FileSizeDatabase,
+    hash_db: FileHashDatabase,
+    compare_dir: Path,
+    compare_result: CompareResult,
+) -> None:
+    src_size = entry.stat().st_size
+
+    # For the progress bars:
+    compare_result.total_file_count += 1
+    compare_result.total_size += src_size
+
+    src_path = Path(entry.path)
+    dst_path = compare_dir / src_path.relative_to(src_root)
+
+    if not dst_path.exists():
+        logger.warning('Source file %s not found in compare %s', src_path, dst_path)
+        compare_result.src_file_new_count += 1
+        return
+
+    dst_size = dst_path.stat().st_size
+    if src_size != dst_size:
+        logger.warning(
+            'Source file %s size (%i Bytes) differs from compare file %s size (%iBytes)',
+            src_path,
+            src_size,
+            dst_path,
+            dst_size,
+        )
+        compare_result.file_size_missmatch += 1
+        return
+
+    src_hash = hash_file(src_path)
+    dst_hash = hash_file(dst_path)
+
+    if src_hash != dst_hash:
+        logger.warning(
+            'Source file %s hash %r differs from compare file %s hash (%s)',
+            src_path,
+            src_hash,
+            dst_path,
+            dst_hash,
+        )
+        compare_result.file_hash_missmatch += 1
+        return
+
+    if src_size < size_db.MIN_SIZE:
+        # Small file -> Not in deduplication database
+        compare_result.small_file_count += 1
+    else:
+        if src_size not in size_db:
+            logger.warning(
+                'Source file %s size (%i Bytes) not found in deduplication database',
+                src_path,
+                src_size,
+            )
+            compare_result.size_db_missing_count += 1
+
+        if src_hash not in hash_db:
+            logger.warning(
+                'Source file %s hash %r not found in deduplication database',
+                src_path,
+                src_hash,
+            )
+            compare_result.hash_db_missing_count += 1
+
+    # Everything is ok
+    compare_result.successful_file_count += 1
+
+
+def compare_tree(
+    *,
+    src_root: Path,
+    backup_root: Path,
+    excludes: tuple[str, ...],
+    log_manager: LoggingManager,
+) -> CompareResult:
+    src_root = src_root.resolve()
+    if not src_root.is_dir():
+        print('Error: Source directory does not exist!')
+        print(f'Please check source directory: "{src_root}"\n')
+        sys.exit(1)
+
+    backup_root = backup_root.resolve()
+    phlb_conf_dir = backup_root / '.phlb'
+    if not phlb_conf_dir.is_dir():
+        print('Error: Compare directory seems to be wrong! (No .phlb configuration directory found)')
+        print(f'Please check backup directory: "{backup_root}"\n')
+        sys.exit(1)
+
+    compare_main_dir = backup_root / src_root.name
+    timestamps = sorted(
+        path.name for path in compare_main_dir.iterdir() if path.is_dir() and path.name.startswith('20')
+    )
+    print(f'Found {len(timestamps)} compare(s) in {compare_main_dir}:')
+    for timestamp in timestamps:
+        print(f' * {timestamp}')
+    last_timestamp = timestamps[-1]
+    compare_dir = compare_main_dir / last_timestamp
+    print(f'\nComparing source tree {src_root} with {last_timestamp} compare:')
+    print(f' {compare_dir}\n')
+
+    now_timestamp = datetime.datetime.now().strftime('%Y-%m-%d-%H%M%S')
+    log_file = compare_main_dir / f'{now_timestamp}-compare.log'
+    log_manager.start_file_logging(log_file)
+
+    excludes: set = set(excludes)
+    with PrintTimingContextManager('Filesystem scan completed in'):
+        src_file_count, src_total_size = humanized_fs_scan(src_root, excludes=excludes)
+
+    with DisplayFileTreeProgress(src_file_count, src_total_size) as progress:
+        # init "databases":
+        size_db = FileSizeDatabase(phlb_conf_dir)
+        hash_db = FileHashDatabase(backup_root, phlb_conf_dir)
+
+        compare_result = CompareResult(last_timestamp=last_timestamp, compare_dir=compare_dir, log_file=log_file)
+
+        next_update = 0
+        for entry in iter_scandir_files(src_root, excludes=excludes):
+            try:
+                compare_one_file(
+                    src_root=src_root,
+                    entry=entry,
+                    size_db=size_db,
+                    hash_db=hash_db,
+                    compare_dir=compare_dir,
+                    compare_result=compare_result,
+                )
+            except Exception as err:
+                logger.exception(f'Compare {entry.path} {err.__class__.__name__}: {err}')
+                compare_result.error_count += 1
+            else:
+                now = time.monotonic()
+                if now >= next_update:
+                    progress.update(
+                        completed_file_count=compare_result.total_file_count,
+                        completed_size=compare_result.total_size,
+                    )
+                    next_update = now + 0.5
+
+        # Finalize progress indicator values:
+        progress.update(completed_file_count=compare_result.total_file_count, completed_size=compare_result.total_size)
+
+    summary_file = compare_main_dir / f'{now_timestamp}-summary.txt'
+    with TeeStdoutContext(summary_file):
+        print(f'\nCompare complete: {compare_dir} (total size {human_filesize(compare_result.total_size)})\n')
+        print(f' Total files processed: {compare_result.total_file_count}')
+        print(f' * Successful compared files: {compare_result.successful_file_count}')
+        print(f' * New source files: {compare_result.src_file_new_count}')
+        print(f' * File size missmatch: {compare_result.file_size_missmatch}')
+        print(f' * File hash missmatch: {compare_result.file_hash_missmatch}')
+
+        print(f' * Small (<{size_db.MIN_SIZE} Bytes) files: {compare_result.small_file_count}')
+        print(f' * Missing in size DB: {compare_result.size_db_missing_count}')
+        print(f' * Missing in hash DB: {compare_result.hash_db_missing_count}')
+
+        if compare_result.error_count > 0:
+            print(f' Errors during compare: {compare_result.error_count} (see log for details)')
+        print()
+
+    logger.info('Compare completed. Summary created: %s', summary_file)
+
+    return compare_result
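The per-file check in `compare_one_file` above runs from cheap to expensive: existence first, then file size, and only then a full content hash of both files. The standalone sketch below shows the same ordering using only the standard library. `sha256_of` and `classify` are hypothetical names, and SHA-256 is an assumption based on the SHA256SUMS files mentioned in the README, since the project's `hash_file` is not shown in this diff.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks (hypothetical stand-in for the project's hash_file)."""
    digest = hashlib.sha256()
    with path.open('rb') as f:
        for chunk in iter(lambda: f.read(64 * 1024), b''):
            digest.update(chunk)
    return digest.hexdigest()


def classify(src_path: Path, dst_path: Path) -> str:
    """Compare one source file with its backup copy, cheapest check first."""
    if not dst_path.exists():
        return 'new'  # source file is not in the backup
    if src_path.stat().st_size != dst_path.stat().st_size:
        return 'size_mismatch'  # no need to read file contents
    if sha256_of(src_path) != sha256_of(dst_path):
        return 'hash_mismatch'  # same size, different content
    return 'ok'


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, 'src')
    dst = Path(tmp, 'dst')
    src.mkdir()
    dst.mkdir()
    (src / 'same.txt').write_text('hello')
    (dst / 'same.txt').write_text('hello')
    (src / 'new.txt').write_text('only in source')
    (src / 'size.txt').write_text('abc')
    (dst / 'size.txt').write_text('abcd')
    (src / 'hash.txt').write_text('abc')
    (dst / 'hash.txt').write_text('abd')
    results = {
        path.name: classify(path, dst / path.name)
        for path in sorted(src.iterdir())
    }
    print(results)
```

Returning early on a size difference avoids reading both files in full, which matters most for large files.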
@@ -14,13 +14,13 @@ from bx_py_utils.test_utils.assertion import assert_text_equal
 from bx_py_utils.test_utils.datetime import parse_dt
 from bx_py_utils.test_utils.log_utils import NoLogs
 from bx_py_utils.test_utils.redirect import RedirectOut
-from cli_base.cli_tools.test_utils.base_testcases import OutputMustCapturedTestCaseMixin
 from freezegun import freeze_time
 from tabulate import tabulate
 
 from PyHardLinkBackup.backup import BackupResult, backup_tree
 from PyHardLinkBackup.constants import CHUNK_SIZE
 from PyHardLinkBackup.logging_setup import DEFAULT_CONSOLE_LOG_LEVEL, DEFAULT_LOG_FILE_LEVEL, LoggingManager
+from PyHardLinkBackup.tests.test_compare_backup import assert_compare_backup
 from PyHardLinkBackup.utilities.file_size_database import FileSizeDatabase
 from PyHardLinkBackup.utilities.filesystem import copy_and_hash, iter_scandir_files
 from PyHardLinkBackup.utilities.tests.test_file_hash_database import assert_hash_db_info
@@ -120,7 +120,7 @@ def assert_fs_tree_overview(root: Path, expected_overview: str):
 
 
 class BackupTreeTestCase(
-    OutputMustCapturedTestCaseMixin,
+    # TODO: OutputMustCapturedTestCaseMixin,
     unittest.TestCase,
 ):
     def test_happy_path(self):
@@ -254,6 +254,20 @@ class BackupTreeTestCase(
                 """,
             )
 
+            #######################################################################################
+            # Compare the backup
+
+            assert_compare_backup(
+                test_case=self,
+                src_root=src_root,
+                backup_root=backup_root,
+                excludes=('.cache',),
+                excpected_last_timestamp='2026-01-01-123456',  # Freezed time, see above
+                excpected_total_file_count=7,
+                excpected_successful_file_count=7,
+                excpected_error_count=0,
+            )
+
             #######################################################################################
             # Backup again with new added files:
 
@@ -352,6 +366,20 @@ class BackupTreeTestCase(
                 """,
             )
 
+            #######################################################################################
+            # Compare the backup
+
+            assert_compare_backup(
+                test_case=self,
+                src_root=src_root,
+                backup_root=backup_root,
+                excludes=('.cache',),
+                excpected_last_timestamp='2026-01-02-123456',  # Freezed time, see above
+                excpected_total_file_count=12,
+                excpected_successful_file_count=12,
+                excpected_error_count=0,
+            )
+
             #######################################################################################
             # Don't create broken hardlinks!
 
@@ -422,7 +450,7 @@ class BackupTreeTestCase(
                     copied_size=1074,
                     copied_small_files=5,
                     copied_small_size=74,
-                    error_count=0
+                    error_count=0,
                 ),
             )
 
@@ -439,6 +467,20 @@ class BackupTreeTestCase(
                 """,
             )
 
+            #######################################################################################
+            # Compare the backup
+
+            assert_compare_backup(
+                test_case=self,
+                src_root=src_root,
+                backup_root=backup_root,
+                excludes=('.cache',),
+                excpected_last_timestamp='2026-01-03-123456',  # Freezed time, see above
+                excpected_total_file_count=12,
+                excpected_successful_file_count=12,
+                excpected_error_count=0,
+            )
+
     def test_symlink(self):
         with tempfile.TemporaryDirectory() as temp_dir:
             temp_path = Path(temp_dir).resolve()
@@ -558,6 +600,24 @@ class BackupTreeTestCase(
             with self.assertLogs('PyHardLinkBackup', level=logging.DEBUG):
                 assert_hash_db_info(backup_root=backup_root, expected='')
 
+            #######################################################################################
+            # Compare the backup
+
+            assert_compare_backup(
+                test_case=self,
+                src_root=src_root,
+                backup_root=backup_root,
+                std_out_parts=(
+                    'Compare completed.',
+                    'broken_symlink',  # <<< the error we expect
+                ),
+                excludes=('.cache',),
+                excpected_last_timestamp='2026-01-01-123456',  # Freezed time, see above
+                excpected_total_file_count=3,
+                excpected_successful_file_count=3,
+                excpected_error_count=1,  # One broken symlink
+            )
+
     def test_error_handling(self):
         with tempfile.TemporaryDirectory() as temp_dir:
             temp_path = Path(temp_dir).resolve()
@@ -626,3 +686,21 @@ class BackupTreeTestCase(
                     error_count=1,
                 ),
             )
+
+            #######################################################################################
+            # Compare the backup
+
+            assert_compare_backup(
+                test_case=self,
+                src_root=src_root,
+                backup_root=backup_root,
+                std_out_parts=(
+                    'Compare completed.',
+                    'file2.txt not found',  # <<< the error we expect
+                ),
+                excludes=('.cache',),
+                excpected_last_timestamp='2026-01-01-123456',  # Freezed time, see above
+                excpected_total_file_count=3,
+                excpected_successful_file_count=2,
+                excpected_error_count=0,
+            )
@@ -0,0 +1,145 @@
+import shutil
+import tempfile
+from pathlib import Path
+from unittest import TestCase
+
+from bx_py_utils.test_utils.redirect import RedirectOut
+from cli_base.cli_tools.test_utils.assertion import assert_in
+from cli_base.cli_tools.test_utils.rich_test_utils import NoColorEnvRich
+
+from PyHardLinkBackup.compare_backup import CompareResult, LoggingManager, compare_tree
+from PyHardLinkBackup.logging_setup import DEFAULT_LOG_FILE_LEVEL
+from PyHardLinkBackup.utilities.file_hash_database import FileHashDatabase
+from PyHardLinkBackup.utilities.file_size_database import FileSizeDatabase
+from PyHardLinkBackup.utilities.filesystem import hash_file
+
+
+def assert_compare_backup(
+    test_case: TestCase,
+    src_root: Path,
+    backup_root: Path,
+    excpected_last_timestamp: str,
+    excpected_total_file_count: int,
+    excpected_successful_file_count: int,
+    std_out_parts: tuple[str, ...] = ('Compare completed.',),
+    excludes: tuple[str, ...] = (),
+    excpected_error_count: int = 0,
+) -> None:
+    with (
+        NoColorEnvRich(
+            width=200,  # Wide width to avoid line breaks in test output that failed assert_in()
+        ),
+        RedirectOut() as redirected_out,
+    ):
+        result = compare_tree(
+            src_root=src_root,
+            backup_root=backup_root,
+            excludes=excludes,
+            log_manager=LoggingManager(
+                console_level='info',
+                file_level=DEFAULT_LOG_FILE_LEVEL,
+            ),
+        )
+    stdout = redirected_out.stdout
+    test_case.assertEqual(redirected_out.stderr, '', stdout)
+
+    assert_in(content=stdout, parts=std_out_parts)
+
+    test_case.assertEqual(result.last_timestamp, excpected_last_timestamp, stdout)
+    test_case.assertEqual(result.total_file_count, excpected_total_file_count, stdout)
+    test_case.assertEqual(result.successful_file_count, excpected_successful_file_count, stdout)
+    test_case.assertEqual(result.error_count, excpected_error_count, stdout)
+
+
+class CompareBackupTestCase(TestCase):
+    def test_happy_path(self):
+        with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as backup_dir:
+            src_root = Path(src_dir).resolve()
+            backup_root = Path(backup_dir).resolve()
+
+            # Setup backup structure
+            phlb_conf_dir = backup_root / '.phlb'
+            phlb_conf_dir.mkdir()
+
+            compare_main_dir = backup_root / src_root.name
+            compare_main_dir.mkdir()
+
+            # Create some "older" compare dirs
+            (compare_main_dir / '2025-12-31-235959').mkdir()
+            (compare_main_dir / '2026-01-10-235959').mkdir()
+
+            # Create "last" backup dir:
+            timestamp = '2026-01-17-120000'
+            last_backup_dir = compare_main_dir / timestamp
+            last_backup_dir.mkdir()
+
+            # Create source files
+            (src_root / 'small_file.txt').write_text('hello world')
+            (src_root / 'large_file_missing.txt').write_bytes(b'X' * FileSizeDatabase.MIN_SIZE)
+            large_file_in_dbs = src_root / 'large_file_in_dbs.txt'
+            large_file_in_dbs.write_bytes(b'Y' * (FileSizeDatabase.MIN_SIZE + 1))
+
+            # Copy files to backup
+            total_size = 0
+            total_file_count = 0
+            for file_path in src_root.iterdir():
+                shutil.copy2(file_path, last_backup_dir / file_path.name)
+                total_size += file_path.stat().st_size
+                total_file_count += 1
+            self.assertEqual(total_file_count, 3)
+            self.assertEqual(total_size, 2012)
+
+            # Create databases and add values from 'large_file_in_dbs.txt'
+            size_db = FileSizeDatabase(phlb_conf_dir)
+            size_db.add(FileSizeDatabase.MIN_SIZE + 1)
+            hash_db = FileHashDatabase(backup_root, phlb_conf_dir)
+            src_hash = hash_file(large_file_in_dbs)
+            hash_db[src_hash] = last_backup_dir / 'large_file_in_dbs.txt'
+
+            #######################################################################################
+            # Run compare_tree
+
+            with RedirectOut() as redirected_out:
+                result = compare_tree(
+                    src_root=src_root,
+                    backup_root=backup_root,
+                    excludes=(),
+                    log_manager=LoggingManager(
+                        console_level='info',
+                        file_level=DEFAULT_LOG_FILE_LEVEL,
+                    ),
+                )
+            self.assertEqual(redirected_out.stderr, '')
+            self.assertIn('Compare completed.', redirected_out.stdout)
+            self.assertEqual(
+                result,
+                CompareResult(
+                    last_timestamp='2026-01-17-120000',
+                    compare_dir=last_backup_dir,
+                    log_file=result.log_file,
+                    total_file_count=total_file_count,
+                    total_size=total_size,
+                    src_file_new_count=0,
+                    file_size_missmatch=0,
+                    file_hash_missmatch=0,
+                    small_file_count=1,
+                    size_db_missing_count=1,
+                    hash_db_missing_count=1,
+                    successful_file_count=total_file_count,
+                    error_count=0,
+                ),
+                redirected_out.stdout,
+            )
+
+            #######################################################################################
+            # Check again with our test helper:
+
+            assert_compare_backup(
+                test_case=self,
+                src_root=src_root,
+                backup_root=backup_root,
+                excpected_last_timestamp='2026-01-17-120000',
+                excpected_total_file_count=total_file_count,
+                excpected_successful_file_count=total_file_count,
+                excpected_error_count=0,
+            )
@@ -65,7 +65,7 @@ complete help for main CLI app:
 
 [comment]: <> (✂✂✂ auto generated main help start ✂✂✂)
 ```
-usage: phlb [-h] {backup,rebuild,version}
+usage: phlb [-h] {backup,compare,rebuild,version}
 
 
 
@@ -75,6 +75,7 @@ usage: phlb [-h] {backup,rebuild,version}
 ╭─ subcommands ────────────────────────────────────────────────────────────────────────────────────────────────────────╮
 │ (required) │
 │ • backup Backup the source directory to the destination directory using hard links for deduplication. │
+│ • compare Compares a source tree with the last backup and validates all known file hashes. │
 │ • rebuild Rebuild the file hash and size database by scanning all backup files. And also verify SHA256SUMS and/or │
 │ store missing hashes in SHA256SUMS files. │
 │ • version Print version and exit │
@@ -216,12 +217,21 @@ Overview of main changes:
 
 [comment]: <> (✂✂✂ auto generated history start ✂✂✂)
 
+* [v1.6.0](https://github.com/jedie/PyHardLinkBackup/compare/v1.5.0...v1.6.0)
+  * 2026-01-17 - Fix flaky test, because of terminal size
+  * 2026-01-17 - Bugfix: Don't hash new large files twice
+  * 2026-01-17 - Use compare also in backup tests
+* [v1.5.0](https://github.com/jedie/PyHardLinkBackup/compare/v1.4.1...v1.5.0)
+  * 2026-01-17 - NEW: Compare command to verify source tree with last backup
 * [v1.4.1](https://github.com/jedie/PyHardLinkBackup/compare/v1.4.0...v1.4.1)
   * 2026-01-16 - Bugfix large file handling
 * [v1.4.0](https://github.com/jedie/PyHardLinkBackup/compare/v1.3.0...v1.4.0)
   * 2026-01-16 - Create log file in backup and a summary.txt
   * 2026-01-16 - Run CI tests on macos, too.
   * 2026-01-16 - add dev cli command "scan-benchmark"
+
+<details><summary>Expand older history entries ...</summary>
+
 * [v1.3.0](https://github.com/jedie/PyHardLinkBackup/compare/v1.2.0...v1.3.0)
   * 2026-01-15 - Verify SHA256SUMS files in "rebuild" command, too.
   * 2026-01-15 - Code cleanup: use more generic names for and in BackupProgress
@@ -234,9 +244,6 @@ Overview of main changes:
   * 2026-01-14 - Enhance progress bars
   * 2026-01-14 - A a note to rsync --link-dest
   * 2026-01-14 - Use cli_base.cli_tools.test_utils.base_testcases
-
-<details><summary>Expand older history entries ...</summary>
-
 * [v1.1.0](https://github.com/jedie/PyHardLinkBackup/compare/v1.0.1...v1.1.0)
   * 2026-01-14 - Change backup timestamp directory to old schema: '%Y-%m-%d-%H%M%S'
   * 2026-01-14 - Add "Overview of main changes" to README