ruborg 0.7.0 → 0.7.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +39 -0
- data/README.md +6 -3
- data/SECURITY.md +15 -0
- data/lib/ruborg/backup.rb +159 -5
- data/lib/ruborg/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 5b1659a54e64ed15742467c6e95ea96fd311f0332e2fafdee7e560cec5c2385c
+  data.tar.gz: ec49de1e1231ad2bd189e08aedb70d22ec10f4f322f1b4690c9c3ec41be590bb
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: '0593d8521ab13110b3fefe00b9a1bc3cd6a8c28ffa70e93dda6e3bfa6b763f0562abd564af25a14c5619b2fe7dabb7ef76ffbe1341c6dcdb4dcac7efbb7be847'
+  data.tar.gz: 9486465c9803962569b5f376556efafa8147d72a294e9a6e3137d29e67c9664b212bed69addc0bab0ad434ddc5f42416e3aa7426087bbf4dd7c9a4ce2c71192f
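The `checksums.yaml` entries above record SHA256 and SHA512 digests of the two members of the `.gem` tar archive, `metadata.gz` and `data.tar.gz`. As a minimal sketch, assuming both members have already been extracted into one directory, the hypothetical helper `verify_gem_checksums` below recomputes each digest with Ruby's `Digest` library and reports whether it matches the recorded value.

```ruby
require "digest"
require "yaml"

# Hypothetical helper: compares the digests listed in a checksums.yaml string
# with digests computed from the extracted gem members found in +dir+.
# Returns a hash such as { "SHA256:data.tar.gz" => true, ... }.
def verify_gem_checksums(checksums_yaml, dir)
  expected = YAML.safe_load(checksums_yaml) || {}
  algorithms = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }

  algorithms.each_with_object({}) do |(algo, digest_class), results|
    (expected[algo] || {}).each do |member, recorded|
      actual = digest_class.file(File.join(dir, member)).hexdigest
      results["#{algo}:#{member}"] = (actual == recorded.to_s)
    end
  end
end
```

A `false` entry means the extracted member differs from the bytes the digest was recorded for.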
data/CHANGELOG.md
CHANGED
@@ -7,6 +7,45 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.7.1] - 2025-10-08
+
+### Added
+- **Paranoid Mode Duplicate Detection**: Per-file backup mode now uses SHA256 content hashing to detect duplicate files
+  - Skips unchanged files automatically (same path, size, and content hash)
+  - Creates versioned archives (-v2, -v3) when content changes but modification time stays the same
+  - Protects against edge cases where files are modified with manual `touch -t` operations
+  - Archive metadata stores: `path|||size|||hash` for comprehensive verification
+  - Backward compatible with old archive formats (plain path, path|||hash)
+- **Smart Skip Statistics**: Backup completion messages show both backed-up and skipped file counts
+  - Example: "✓ Per-file backup completed: 50000 file(s) backed up, 26456 skipped (unchanged)"
+  - Provides visibility into deduplication efficiency
+
+### Fixed
+- **Per-File Backup Archive Collision**: Fixed "Archive already exists" error in per-file backup mode
+  - Archives are now verified by path, size, and content hash before skipping
+  - Different files with same archive name get automatic version suffixes
+  - File size changes detected even when modification time is manually reset
+  - Logs warning messages for collision scenarios with detailed context
+
+### Changed
+- **Archive Comment Format**: Per-file archives now store comprehensive metadata
+  - New format: `path|||size|||hash` (three-part delimiter-based format)
+  - Enables instant duplicate detection without re-hashing files
+  - Backward compatible parsing handles old formats gracefully
+- **Enhanced Collision Handling**: Intelligent version suffix generation
+  - Appends `-v2`, `-v3`, etc. for archive name collisions
+  - Prevents data loss from conflicting archive names
+  - Logs warnings for all collision scenarios
+
+### Security
+- **No Security Impact**: Security review found no exploitable vulnerabilities in new features
+  - Content hashing uses SHA256 (cryptographically secure)
+  - Archive comment parsing uses safe string splitting (no injection risks)
+  - File paths from archives only used for comparison, not file operations
+  - Array-based command execution prevents shell injection
+  - JSON parsing uses Ruby's safe `JSON.parse()` with error handling
+  - All existing security controls maintained
+
 ## [0.7.0] - 2025-10-08
 
 ### Added
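The duplicate-detection rules in this entry come down to one decision per file: skip when the stored path, size, and SHA256 all match, otherwise back up under the next free `-vN` name. The sketch below restates that branch structure from `create_per_file_archives` as a pure function so it can be unit-tested without Borg. `decide_archive_action` and `next_version_name` are hypothetical names; `existing` is assumed to map archive names to `{ path:, size:, hash: }` hashes as built by `get_existing_archive_names`, and the block stands in for `calculate_file_hash`. Because `&&` short-circuits, the block runs only when path and size already match, which mirrors the shipped code: the file on disk is still hashed in that case, so what the stored comment saves is reading back the archived copy, not hashing the current file.

```ruby
# Hypothetical helper mirroring find_next_version_name: first free "-vN", N >= 2.
def next_version_name(base_name, existing)
  version = 2
  version += 1 while existing.key?("#{base_name}-v#{version}")
  "#{base_name}-v#{version}"
end

# Returns [:skip, name] when an identical copy is already archived under
# archive_name, otherwise [:create, name] with the name to back up under.
# The block computes the current file's hash; it is only called when the
# stored path and size both match.
def decide_archive_action(archive_name, path, size, existing)
  stored = existing[archive_name]
  return [:create, archive_name] unless stored # no collision at all

  if stored[:path] == path && stored[:size] == size && stored[:hash] == yield
    [:skip, archive_name]
  else
    # Same path with new size or content, or a different file: new version.
    [:create, next_version_name(archive_name, existing)]
  end
end
```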
data/README.md
CHANGED
@@ -25,7 +25,7 @@ A friendly Ruby frontend for [Borg Backup](https://www.borgbackup.org/). Ruborg
 - 📈 **Summary View** - Quick overview of all repositories and their configurations
 - 🔧 **Custom Borg Path** - Support for custom Borg executable paths per repository
 - 🏠 **Hostname Validation** - NEW! Restrict backups to specific hosts (global or per-repository)
-- ✅ **Well-tested** - Comprehensive test suite with RSpec (
+- ✅ **Well-tested** - Comprehensive test suite with RSpec (286+ examples)
 - 🔒 **Security-focused** - Path validation, safe YAML loading, command injection protection
 
 ## Prerequisites
@@ -775,9 +775,12 @@ repositories:
 
 **How it works:**
 - **Per-File Archives**: Each file is backed up as a separate Borg archive
-- **Hash-Based Naming**: Archives are named `repo-{hash}-{timestamp}` (hash uniquely identifies the file path)
-- **
+- **Hash-Based Naming**: Archives are named `repo-filename-{hash}-{timestamp}` (hash uniquely identifies the file path)
+- **Metadata Storage**: Archive comments store `path|||size|||hash` for comprehensive duplicate detection
 - **Metadata Preservation**: Borg preserves all file metadata (mtime, size, permissions) in the archive
+- **Paranoid Mode Duplicate Detection** (v0.7.1+): SHA256 content hashing detects file changes even when size and mtime are identical
+- **Smart Skip**: Automatically skips unchanged files during backup (compares path, size, and content hash)
+- **Version Suffixes**: Creates versioned archives (`-v2`, `-v3`) for archive name collisions, preventing data loss
 - **Smart Pruning**: Retention reads file mtime directly from archives - works even after files are deleted
 
 **File Metadata Retention Options:**
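The README describes per-file archive names of the form `repo-filename-{hash}-{timestamp}`. The diff does not show `generate_path_hash` or `build_archive_name`, so the following is only a hypothetical stand-in (`sketch_path_hash` and `sketch_archive_name` are invented names). It assumes a 16-character prefix of the SHA256 of the full path, a conservative filename sanitization rule, and a UTC mtime stamp; Ruborg's actual hash length, sanitization, and timestamp format may differ.

```ruby
require "digest"

# Hypothetical: 16 hex chars of SHA256(path); the real length is unknown.
def sketch_path_hash(path)
  Digest::SHA256.hexdigest(path)[0, 16]
end

# Hypothetical name builder: repo-filename-{hash}-{timestamp}.
def sketch_archive_name(repo_name, path, mtime)
  # Replace characters outside a conservative set (assumed sanitization rule).
  filename = File.basename(path).gsub(/[^A-Za-z0-9._-]/, "_")
  # getutc returns a UTC copy and leaves the caller's Time object untouched.
  stamp = mtime.getutc.strftime("%Y-%m-%d_%H-%M-%S")
  "#{repo_name}-#{filename}-#{sketch_path_hash(path)}-#{stamp}"
end
```

Since a name of this shape depends only on the path and the mtime, a file whose content changes while its mtime is restored (for example with `touch -t`) maps to the same name as its earlier backup. That is the collision case the size and SHA256 comparison in 0.7.1 is meant to catch.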
data/SECURITY.md
CHANGED
@@ -229,6 +229,21 @@ We will respond within 48 hours and work with you to address the issue.
 
 ## Security Audit History
 
+- **v0.7.1** (2025-10-08): Paranoid mode duplicate detection - security review passed
+  - **NEW FEATURE**: SHA256 content hashing for detecting file changes even when mtime/size are identical
+  - **NEW FEATURE**: Smart skip statistics showing backed-up and skipped file counts
+  - **BUG FIX**: Fixed "Archive already exists" error in per-file backup mode
+  - **ENHANCED**: Archive comment format now stores comprehensive metadata (`path|||size|||hash`)
+  - **ENHANCED**: Version suffix generation for archive name collisions (`-v2`, `-v3`)
+  - **SECURITY REVIEW**: Comprehensive security analysis found no exploitable vulnerabilities
+    - SHA256 hashing is cryptographically secure (using Ruby's Digest::SHA256)
+    - Archive comment parsing uses safe string splitting with `|||` delimiter (no injection risks)
+    - File paths from archives only used for comparison, never for file operations
+    - Array-based command execution prevents shell injection (maintained from previous versions)
+    - JSON parsing uses Ruby's safe `JSON.parse()` with error handling
+    - All existing security controls maintained - no security regressions
+    - Backward compatibility with three metadata formats (plain path, path|||hash, path|||size|||hash)
+
 - **v0.7.0** (2025-10-08): Archive naming and metadata features - security review passed
   - **NEW FEATURE**: List files within archives (--archive option)
   - **NEW FEATURE**: File metadata retrieval from archives
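The audit notes credit array-based command execution for the absence of shell injection. When `Open3.capture3` receives the program and its arguments as separate strings, Ruby starts the program directly without a shell, so characters such as `;` or `$(...)` inside a file path reach the program as literal text. The demonstration below uses the running Ruby interpreter (`RbConfig.ruby`) as a stand-in for the `borg` executable and passes an environment hash first, the same calling pattern `get_existing_archive_names` uses; `argv_seen_by_child` is a hypothetical helper.

```ruby
require "open3"
require "rbconfig"

# Runs a child Ruby process that prints the argv it received, using the
# array form of Open3.capture3 (env hash first, then program and arguments).
def argv_seen_by_child(*args)
  env = { "BORG_RELOCATED_REPO_ACCESS_IS_OK" => "yes" } # passed as in the diff
  # "--" ends the child interpreter's own option parsing.
  cmd = [RbConfig.ruby, "-e", "puts ARGV.inspect", "--", *args]
  stdout, stderr, status = Open3.capture3(env, *cmd)
  raise "child failed: #{stderr}" unless status.success?

  stdout.strip
end

hostile = ["/tmp/report; rm -rf ~", "$(id)", "a|||b"]
puts argv_seen_by_child(*hostile)
```

If the same command were passed as one interpolated string containing shell metacharacters such as `;`, Ruby would run it through `/bin/sh`, and the text after `;` would execute as a separate command.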
data/lib/ruborg/backup.rb
CHANGED
@@ -43,17 +43,25 @@ module Ruborg
       remove_source_files if remove_source
     end
 
+    # rubocop:disable Metrics/AbcSize, Metrics/MethodLength, Metrics/PerceivedComplexity, Metrics/BlockNesting
     def create_per_file_archives(name_prefix, remove_source)
       # Collect all files from backup paths
       files_to_backup = collect_files_from_paths(@config.backup_paths, @config.exclude_patterns)
 
       raise BorgError, "No files found to backup" if files_to_backup.empty?
 
+      # Get list of existing archives for duplicate detection
+      existing_archives = get_existing_archive_names
+
       # Show repository header in console only
       print_repository_header
 
       puts "Found #{files_to_backup.size} file(s) to backup"
 
+      backed_up_count = 0
+      skipped_count = 0
+
+      # rubocop:disable Metrics/BlockLength
       files_to_backup.each_with_index do |file_path, index|
         # Generate hash-based archive name with filename
         path_hash = generate_path_hash(file_path)
@@ -67,22 +75,75 @@ module Ruborg
         archive_name = name_prefix || build_archive_name(@repo_name, sanitized_filename, path_hash, file_mtime)
 
         # Show progress in console
-
+        print " [#{index + 1}/#{files_to_backup.size}] Backing up: #{file_path}"
+
+        # Check if archive already exists AND contains this exact file
+        if existing_archives.key?(archive_name)
+          stored_info = existing_archives[archive_name]
+          if stored_info[:path] == file_path
+            # Same file, same mtime -> check if size changed (rare: manual content edit + touch -t)
+            current_size = File.size(file_path)
+            stored_size = stored_info[:size]
+
+            if current_size == stored_size
+              # Size same -> verify content hasn't changed (paranoid mode)
+              current_hash = calculate_file_hash(file_path)
+              stored_hash = stored_info[:hash]
+
+              if current_hash == stored_hash
+                # Content truly unchanged
+                puts " - Archive already exists (file unchanged)"
+                @logger&.info(
+                  "[#{@repo_name}] Skipped #{file_path} - archive #{archive_name} already exists (file unchanged)"
+                )
+                skipped_count += 1
+                next
+              else
+                # Size same but content changed (rare: edited + truncated/padded to same size)
+                archive_name = find_next_version_name(archive_name, existing_archives)
+                @logger&.warn(
+                  "[#{@repo_name}] File content changed but size/mtime unchanged for #{file_path}, " \
+                  "using #{archive_name}"
+                )
+              end
+            else
+              # Size changed but mtime same -> content changed, add version suffix
+              archive_name = find_next_version_name(archive_name, existing_archives)
+              @logger&.warn(
+                "[#{@repo_name}] File size changed but mtime unchanged for #{file_path}, using #{archive_name}"
+              )
+            end
+          else
+            # Different file, same archive name -> add version suffix
+            archive_name = find_next_version_name(archive_name, existing_archives)
+            @logger&.warn(
+              "[#{@repo_name}] Archive name collision: #{archive_name} exists for different file, using version suffix"
+            )
+          end
+        end
 
         # Create archive for single file with original path as comment
         cmd = build_per_file_create_command(archive_name, file_path)
 
         execute_borg_command(cmd)
+        puts ""
 
         # Log successful action with details
         @logger&.info("[#{@repo_name}] Archived #{file_path} in archive #{archive_name}")
+        backed_up_count += 1
       end
+      # rubocop:enable Metrics/BlockLength
 
-
+      if skipped_count.positive?
+        puts "✓ Per-file backup completed: #{backed_up_count} file(s) backed up, #{skipped_count} skipped (unchanged)"
+      else
+        puts "✓ Per-file backup completed: #{backed_up_count} file(s) backed up"
+      end
 
       # NOTE: remove_source handled per file after successful backup
       remove_source_files if remove_source
     end
+    # rubocop:enable Metrics/AbcSize, Metrics/MethodLength, Metrics/PerceivedComplexity, Metrics/BlockNesting
 
     def collect_files_from_paths(paths, exclude_patterns)
       require "find"
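As written, the loop compares the file only with the archive stored under the base name. Reading the branches above, if a file's current content already lives in `name-v2`, the base archive still holds the older content, so the size or hash check fails and `find_next_version_name` allocates `-v3`; the same happens on every later run while the mtime stays unchanged. A hedged alternative, not part of Ruborg, is to scan the base name and its existing `-vN` variants for a matching entry before allocating a new suffix. `resolve_archive_name` and `candidate_names` are hypothetical helpers, and `existing` is assumed to have the same shape as the hash returned by `get_existing_archive_names`.

```ruby
# Hypothetical helper: the base name plus every consecutive "-vN" variant that
# already exists, and the first free "-vN" name after them.
def candidate_names(base_name, existing)
  names = [base_name]
  version = 2
  while existing.key?("#{base_name}-v#{version}")
    names << "#{base_name}-v#{version}"
    version += 1
  end
  [names, "#{base_name}-v#{version}"]
end

# Hypothetical alternative to the shipped branch logic: skip if ANY existing
# version already holds this path, size, and content hash. The block computes
# the current file's hash and is called at most once.
def resolve_archive_name(base_name, path, size, existing)
  return [:create, base_name] unless existing.key?(base_name)

  names, next_free = candidate_names(base_name, existing)
  current_hash = nil
  names.each do |name|
    stored = existing[name]
    next unless stored[:path] == path && stored[:size] == size

    current_hash ||= yield
    return [:skip, name] if stored[:hash] == current_hash
  end
  [:create, next_free]
end
```

Separately, `existing_archives` is read once before the loop and never updated. Recording each newly created name in it after a successful `borg create` (for example `existing[name] = { path: file_path, size: size, hash: digest }`) would keep names unique within a single run, which matters when `name_prefix` gives every file the same archive name.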
@@ -178,12 +239,21 @@ module Ruborg
       end
     end
 
+    def calculate_file_hash(file_path)
+      require "digest"
+      Digest::SHA256.file(file_path).hexdigest
+    end
+
     def build_per_file_create_command(archive_name, file_path)
       cmd = [@repository.borg_path, "create"]
       cmd += ["--compression", @config.compression]
 
-      # Store
-
+      # Store file metadata (path + size + hash) in archive comment for duplicate detection
+      # Format: path|||size|||hash (using ||| as delimiter to avoid conflicts with paths)
+      file_size = File.size(file_path)
+      file_hash = calculate_file_hash(file_path)
+      metadata = "#{file_path}|||#{file_size}|||#{file_hash}"
+      cmd += ["--comment", metadata]
 
       cmd << "#{@repository.path}::#{archive_name}"
       cmd << file_path
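`build_per_file_create_command` now reads the whole file to hash it before Borg reads it again to archive it. `Digest::SHA256.file` reads in fixed-size chunks, so memory use stays flat even for large files; the cost is an extra pass of I/O, and when the size check in `create_per_file_archives` passes but the hash differs, the file is hashed twice before Borg runs. The sketch below, with hypothetical helper names, builds the same `path|||size|||hash` string using an explicit chunked digest that is equivalent to `Digest::SHA256.file`.

```ruby
require "digest"

CHUNK_SIZE = 1024 * 1024 # read 1 MiB at a time (arbitrary choice)

# Incremental SHA256; produces the same hex digest as Digest::SHA256.file.
def chunked_sha256(path)
  digest = Digest::SHA256.new
  File.open(path, "rb") do |io|
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (chunk = io.read(CHUNK_SIZE))
      digest.update(chunk)
    end
  end
  digest.hexdigest
end

# Builds the "path|||size|||hash" comment string that 0.7.1 stores.
def build_comment(path)
  "#{path}|||#{File.size(path)}|||#{chunked_sha256(path)}"
end
```

Because hashing happens in a separate pass, a file modified between the hash and `borg create` ends up with a comment that does not describe the bytes actually archived.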
@@ -333,9 +403,93 @@ module Ruborg
     end
 
     def print_repository_header
-      puts "\n"
+      puts "\n#{"=" * 60}"
       puts " Repository: #{@repo_name}"
       puts "=" * 60
     end
+
+    # rubocop:disable Metrics/AbcSize, Metrics/MethodLength, Metrics/PerceivedComplexity
+    def get_existing_archive_names
+      require "json"
+      require "open3"
+
+      # First get list of archives
+      cmd = [@repository.borg_path, "list", @repository.path, "--json"]
+      env = {}
+      passphrase = @repository.instance_variable_get(:@passphrase)
+      env["BORG_PASSPHRASE"] = passphrase if passphrase
+      env["BORG_RELOCATED_REPO_ACCESS_IS_OK"] = "yes"
+      env["BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK"] = "yes"
+
+      stdout, stderr, status = Open3.capture3(env, *cmd)
+      raise BorgError, "Failed to list archives: #{stderr}" unless status.success?
+
+      json_data = JSON.parse(stdout)
+      archives = json_data["archives"] || []
+
+      # Build hash by querying each archive individually for comment
+      # This is necessary because 'borg list' doesn't include comments
+      archives.each_with_object({}) do |archive, hash|
+        archive_name = archive["name"]
+
+        # Query this specific archive to get the comment
+        info_cmd = [@repository.borg_path, "info", "#{@repository.path}::#{archive_name}", "--json"]
+        info_stdout, _, info_status = Open3.capture3(env, *info_cmd)
+
+        unless info_status.success?
+          # If we can't get info for this archive, skip it with defaults
+          hash[archive_name] = { path: "", size: 0, hash: "" }
+          next
+        end
+
+        info_data = JSON.parse(info_stdout)
+        archive_info = info_data["archives"]&.first || {}
+        comment = archive_info["comment"] || ""
+
+        # Parse comment based on format
+        # The comment field stores metadata as: path|||size|||hash (using ||| as delimiter)
+        # For backward compatibility, handle old formats:
+        # - Old format 1: plain path (no |||)
+        # - Old format 2: path|||hash (2 parts)
+        # - New format: path|||size|||hash (3 parts)
+        if comment.include?("|||")
+          parts = comment.split("|||")
+          file_path = parts[0]
+          if parts.length >= 3
+            # New format: path|||size|||hash
+            file_size = parts[1].to_i
+            file_hash = parts[2] || ""
+          else
+            # Old format: path|||hash (size not available)
+            file_size = 0
+            file_hash = parts[1] || ""
+          end
+        else
+          # Oldest format: comment is just the path string
+          file_path = comment
+          file_size = 0
+          file_hash = ""
+        end
+
+        hash[archive_name] = {
+          path: file_path,
+          size: file_size,
+          hash: file_hash
+        }
+      end
+    rescue JSON::ParserError => e
+      raise BorgError, "Failed to parse archive info: #{e.message}"
+    end
+    # rubocop:enable Metrics/AbcSize, Metrics/MethodLength, Metrics/PerceivedComplexity
+
+    def find_next_version_name(base_name, existing_archives)
+      version = 2
+      loop do
+        versioned_name = "#{base_name}-v#{version}"
+        return versioned_name unless existing_archives.key?(versioned_name)
+
+        version += 1
+      end
+    end
   end
 end
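`get_existing_archive_names` splits the comment on every `|||`. File names on POSIX systems may contain `|`, so for a comment such as `/a|||b|||12|||ff` the call `split("|||")` returns `["/a", "b", "12", "ff"]`, and the stored path becomes `/a` with size `0` (`"b".to_i`). Since the size and the hex digest never contain `|||`, peeling fields off the right with `String#rpartition` keeps such paths intact. The hypothetical `parse_archive_comment` below returns the same `{ path:, size:, hash: }` shape and still accepts the two legacy formats, but uses `nil` for missing fields. The shipped parser reports size `0` for legacy comments, which cannot be told apart from an empty file; given the size comparison in `create_per_file_archives`, unchanged non-empty files first archived by an earlier release fail that check and are backed up again under a `-vN` name.

```ruby
DELIM = "|||"

# Hypothetical parser for per-file archive comments. Accepts:
#   "path"                  (oldest format)
#   "path|||hash"           (legacy format)
#   "path|||size|||hash"    (0.7.1 format)
# Fields are taken from the right, so "|||" inside a path survives in the
# 0.7.1 format. Missing fields are nil rather than 0 or "".
# Legacy comments whose path itself contains "|||" remain ambiguous.
def parse_archive_comment(comment)
  text = comment.to_s
  rest, sep, last = text.rpartition(DELIM)
  return { path: text, size: nil, hash: nil } if sep.empty? # plain path

  head, sep2, middle = rest.rpartition(DELIM)
  if !sep2.empty? && middle.match?(/\A\d+\z/)
    { path: head, size: Integer(middle, 10), hash: last }
  else
    { path: rest, size: nil, hash: last } # legacy path|||hash
  end
end
```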
data/lib/ruborg/version.rb
CHANGED