ruborg 0.7.0 → 0.7.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: c0db1cc300f1c33ec3ab6e1ad027b27fe379672e6344140fec4c5009f638de02
-   data.tar.gz: 29ffeb1331a718a2babd8febb11d12eca46db3f581074db8c66971d8737b21e4
+   metadata.gz: 5b1659a54e64ed15742467c6e95ea96fd311f0332e2fafdee7e560cec5c2385c
+   data.tar.gz: ec49de1e1231ad2bd189e08aedb70d22ec10f4f322f1b4690c9c3ec41be590bb
  SHA512:
-   metadata.gz: 19ba4e200dd88d1f5251d0bf257b1591284c8a196571eb1f039cd6cf9bbca186774509f0adf72dadc2fc2da297589230bc0910b1a1466d6c016d5a0f2d39a6fe
-   data.tar.gz: 7af17006b577b9b48854b40214862d70c146197602ed94c83f1e5a4ec52b7b8f9273e09c827eebabd2c15a6c49974932d288c273bf4602c96240d2d7eba0ae8b
+   metadata.gz: '0593d8521ab13110b3fefe00b9a1bc3cd6a8c28ffa70e93dda6e3bfa6b763f0562abd564af25a14c5619b2fe7dabb7ef76ffbe1341c6dcdb4dcac7efbb7be847'
+   data.tar.gz: 9486465c9803962569b5f376556efafa8147d72a294e9a6e3137d29e67c9664b212bed69addc0bab0ad434ddc5f42416e3aa7426087bbf4dd7c9a4ce2c71192f
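`checksums.yaml` is the digest manifest RubyGems packs into a `.gem` (stored there as `checksums.yaml.gz`), recording SHA256 and SHA512 digests of the two other members, `metadata.gz` and `data.tar.gz`; all four values change here because both members changed. A minimal sketch of checking extracted members against such a manifest, assuming the `.gem` (a plain tar archive) has been unpacked and `checksums.yaml.gz` decompressed. `verify_gem_checksums` is a hypothetical helper, not part of RubyGems or ruborg:

```ruby
require "digest"
require "yaml"

# Digest classes for the algorithm names used as top-level keys in checksums.yaml.
GEM_DIGESTS = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }.freeze

# Hypothetical helper: compares every digest listed in checksums_yaml against
# the matching file in dir. Returns a list of "ALGO member" strings that did
# not match; an empty list means every recorded digest matched.
def verify_gem_checksums(checksums_yaml, dir)
  checksums = YAML.safe_load(checksums_yaml) || {}
  mismatches = []
  checksums.each do |algorithm, entries|
    digest_class = GEM_DIGESTS[algorithm]
    next unless digest_class # skip algorithms this sketch does not handle

    entries.each do |member, expected|
      actual = digest_class.file(File.join(dir, member)).hexdigest
      mismatches << "#{algorithm} #{member}" unless actual == expected
    end
  end
  mismatches
end
```

For example, after `tar -xf ruborg-0.7.1.gem && gunzip checksums.yaml.gz`, calling `verify_gem_checksums(File.read("checksums.yaml"), ".")` should return an empty array for an untampered package.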
data/CHANGELOG.md CHANGED
@@ -7,6 +7,45 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ## [0.7.1] - 2025-10-08
+
+ ### Added
+ - **Paranoid Mode Duplicate Detection**: Per-file backup mode now uses SHA256 content hashing to detect duplicate files
+   - Skips unchanged files automatically (same path, size, and content hash)
+   - Creates versioned archives (-v2, -v3) when content changes but modification time stays the same
+   - Protects against edge cases where files are modified with manual `touch -t` operations
+   - Archive metadata stores: `path|||size|||hash` for comprehensive verification
+   - Backward compatible with old archive formats (plain path, path|||hash)
+ - **Smart Skip Statistics**: Backup completion messages show both backed-up and skipped file counts
+   - Example: "✓ Per-file backup completed: 50000 file(s) backed up, 26456 skipped (unchanged)"
+   - Provides visibility into deduplication efficiency
+
+ ### Fixed
+ - **Per-File Backup Archive Collision**: Fixed "Archive already exists" error in per-file backup mode
+   - Archives are now verified by path, size, and content hash before skipping
+   - Different files with same archive name get automatic version suffixes
+   - File size changes detected even when modification time is manually reset
+   - Logs warning messages for collision scenarios with detailed context
+
+ ### Changed
+ - **Archive Comment Format**: Per-file archives now store comprehensive metadata
+   - New format: `path|||size|||hash` (three-part delimiter-based format)
+   - Enables instant duplicate detection without re-hashing files
+   - Backward compatible parsing handles old formats gracefully
+ - **Enhanced Collision Handling**: Intelligent version suffix generation
+   - Appends `-v2`, `-v3`, etc. for archive name collisions
+   - Prevents data loss from conflicting archive names
+   - Logs warnings for all collision scenarios
+
+ ### Security
+ - **No Security Impact**: Security review found no exploitable vulnerabilities in new features
+   - Content hashing uses SHA256 (cryptographically secure)
+   - Archive comment parsing uses safe string splitting (no injection risks)
+   - File paths from archives only used for comparison, not file operations
+   - Array-based command execution prevents shell injection
+   - JSON parsing uses Ruby's safe `JSON.parse()` with error handling
+   - All existing security controls maintained
+
  ## [0.7.0] - 2025-10-08
 
  ### Added
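The new comment format and its backward-compatible parsing fit in a small standalone sketch. `build_comment` and `parse_comment` are illustrative names, not ruborg's API; the parsing branches follow the ones added to `get_existing_archive_names` in `backup.rb`, where a size of `0` and an empty hash mean "not recorded":

```ruby
DELIMITER = "|||"

# Builds the 0.7.1-style archive comment: path|||size|||hash
def build_comment(path, size, sha256)
  [path, size, sha256].join(DELIMITER)
end

# Parses any of the three comment formats listed in the changelog.
def parse_comment(comment)
  text = comment.to_s
  # Oldest format: the comment is just the path.
  return { path: text, size: 0, hash: "" } unless text.include?(DELIMITER)

  parts = text.split(DELIMITER)
  if parts.length >= 3
    # New format: path|||size|||hash
    { path: parts[0], size: parts[1].to_i, hash: parts[2] || "" }
  else
    # Older format: path|||hash (no size recorded)
    { path: parts[0], size: 0, hash: parts[1] || "" }
  end
end
```

Splitting on `|||` assumes the delimiter never occurs inside a file path; a path that contains it would be split into the wrong fields.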
data/README.md CHANGED
@@ -25,7 +25,7 @@ A friendly Ruby frontend for [Borg Backup](https://www.borgbackup.org/). Ruborg
  - 📈 **Summary View** - Quick overview of all repositories and their configurations
  - 🔧 **Custom Borg Path** - Support for custom Borg executable paths per repository
  - 🏠 **Hostname Validation** - NEW! Restrict backups to specific hosts (global or per-repository)
- - ✅ **Well-tested** - Comprehensive test suite with RSpec (220+ examples)
+ - ✅ **Well-tested** - Comprehensive test suite with RSpec (286+ examples)
  - 🔒 **Security-focused** - Path validation, safe YAML loading, command injection protection
 
  ## Prerequisites
@@ -775,9 +775,12 @@ repositories:
 
  **How it works:**
  - **Per-File Archives**: Each file is backed up as a separate Borg archive
- - **Hash-Based Naming**: Archives are named `repo-{hash}-{timestamp}` (hash uniquely identifies the file path)
- - **Original Path Stored**: The complete original file path is stored in the archive comment
+ - **Hash-Based Naming**: Archives are named `repo-filename-{hash}-{timestamp}` (hash uniquely identifies the file path)
+ - **Metadata Storage**: Archive comments store `path|||size|||hash` for comprehensive duplicate detection
  - **Metadata Preservation**: Borg preserves all file metadata (mtime, size, permissions) in the archive
+ - **Paranoid Mode Duplicate Detection** (v0.7.1+): SHA256 content hashing detects file changes even when size and mtime are identical
+ - **Smart Skip**: Automatically skips unchanged files during backup (compares path, size, and content hash)
+ - **Version Suffixes**: Creates versioned archives (`-v2`, `-v3`) for archive name collisions, preventing data loss
  - **Smart Pruning**: Retention reads file mtime directly from archives - works even after files are deleted
 
  **File Metadata Retention Options:**
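The skip rule described above (path, then size, then content hash) can be sketched as a single predicate. `per_file_action` is a hypothetical name; it takes metadata already parsed from an existing archive's comment and, like the 0.7.1 code, hashes the file only once path and size have matched, since hashing is the expensive step:

```ruby
require "digest"

# stored: metadata parsed from the existing archive's comment,
#         e.g. { path: "/etc/hosts", size: 42, hash: "<sha256 hex>" }
# Returns :skip when the file is unchanged, :new_version when a versioned
# archive (-v2, -v3, ...) should be created instead.
def per_file_action(stored, path)
  return :new_version unless stored[:path] == path            # different file, same archive name
  return :new_version unless stored[:size] == File.size(path) # size changed, mtime did not

  # Path and size match: compare SHA256 content hashes ("paranoid mode").
  Digest::SHA256.file(path).hexdigest == stored[:hash] ? :skip : :new_version
end
```

Archives written in the older comment formats record no size (parsed as `0`), so for a non-empty file they fall through to the size-changed branch and a `-vN` archive is created instead of a skip.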
data/SECURITY.md CHANGED
@@ -229,6 +229,21 @@ We will respond within 48 hours and work with you to address the issue.
 
  ## Security Audit History
 
+ - **v0.7.1** (2025-10-08): Paranoid mode duplicate detection - security review passed
+   - **NEW FEATURE**: SHA256 content hashing for detecting file changes even when mtime/size are identical
+   - **NEW FEATURE**: Smart skip statistics showing backed-up and skipped file counts
+   - **BUG FIX**: Fixed "Archive already exists" error in per-file backup mode
+   - **ENHANCED**: Archive comment format now stores comprehensive metadata (`path|||size|||hash`)
+   - **ENHANCED**: Version suffix generation for archive name collisions (`-v2`, `-v3`)
+   - **SECURITY REVIEW**: Comprehensive security analysis found no exploitable vulnerabilities
+     - SHA256 hashing is cryptographically secure (using Ruby's Digest::SHA256)
+     - Archive comment parsing uses safe string splitting with `|||` delimiter (no injection risks)
+     - File paths from archives only used for comparison, never for file operations
+     - Array-based command execution prevents shell injection (maintained from previous versions)
+     - JSON parsing uses Ruby's safe `JSON.parse()` with error handling
+     - All existing security controls maintained - no security regressions
+     - Backward compatibility with three metadata formats (plain path, path|||hash, path|||size|||hash)
+
  - **v0.7.0** (2025-10-08): Archive naming and metadata features - security review passed
    - **NEW FEATURE**: List files within archives (--archive option)
    - **NEW FEATURE**: File metadata retrieval from archives
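The audit's claim that array-based command execution prevents shell injection can be checked directly. When `Open3.capture3` receives the command as separate arguments (optionally preceded by an environment hash, as `get_existing_archive_names` does), Ruby starts the program without a shell, so metacharacters in a file name arrive as literal text. In this sketch a child Ruby process (`RbConfig.ruby`) stands in for the `borg` executable, and `echo_argument` is a hypothetical helper:

```ruby
require "open3"
require "rbconfig"

# Runs a child Ruby that prints its first argument back verbatim.
# The command is passed as separate arguments, so no shell parses it.
def echo_argument(arg, env = {})
  cmd = [RbConfig.ruby, "-e", "print ARGV[0]", arg]
  stdout, stderr, status = Open3.capture3(env, *cmd)
  raise "child process failed: #{stderr}" unless status.success?

  stdout
end

# A file name full of shell metacharacters; none of them are interpreted.
payload = "report.txt; echo pwned $(whoami) `id` | cat"
puts echo_argument(payload)
```

Passing the same text inside a single command string containing metacharacters would instead hand it to `/bin/sh`, which is exactly what the array form avoids.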
data/lib/ruborg/backup.rb CHANGED
@@ -43,17 +43,25 @@ module Ruborg
        remove_source_files if remove_source
      end
 
+     # rubocop:disable Metrics/AbcSize, Metrics/MethodLength, Metrics/PerceivedComplexity, Metrics/BlockNesting
      def create_per_file_archives(name_prefix, remove_source)
        # Collect all files from backup paths
        files_to_backup = collect_files_from_paths(@config.backup_paths, @config.exclude_patterns)
 
        raise BorgError, "No files found to backup" if files_to_backup.empty?
 
+       # Get list of existing archives for duplicate detection
+       existing_archives = get_existing_archive_names
+
        # Show repository header in console only
        print_repository_header
 
        puts "Found #{files_to_backup.size} file(s) to backup"
 
+       backed_up_count = 0
+       skipped_count = 0
+
+       # rubocop:disable Metrics/BlockLength
        files_to_backup.each_with_index do |file_path, index|
          # Generate hash-based archive name with filename
          path_hash = generate_path_hash(file_path)
@@ -67,22 +75,75 @@ module Ruborg
          archive_name = name_prefix || build_archive_name(@repo_name, sanitized_filename, path_hash, file_mtime)
 
          # Show progress in console
-         puts " [#{index + 1}/#{files_to_backup.size}] Backing up: #{file_path}"
+         print " [#{index + 1}/#{files_to_backup.size}] Backing up: #{file_path}"
+
+         # Check if archive already exists AND contains this exact file
+         if existing_archives.key?(archive_name)
+           stored_info = existing_archives[archive_name]
+           if stored_info[:path] == file_path
+             # Same file, same mtime -> check if size changed (rare: manual content edit + touch -t)
+             current_size = File.size(file_path)
+             stored_size = stored_info[:size]
+
+             if current_size == stored_size
+               # Size same -> verify content hasn't changed (paranoid mode)
+               current_hash = calculate_file_hash(file_path)
+               stored_hash = stored_info[:hash]
+
+               if current_hash == stored_hash
+                 # Content truly unchanged
+                 puts " - Archive already exists (file unchanged)"
+                 @logger&.info(
+                   "[#{@repo_name}] Skipped #{file_path} - archive #{archive_name} already exists (file unchanged)"
+                 )
+                 skipped_count += 1
+                 next
+               else
+                 # Size same but content changed (rare: edited + truncated/padded to same size)
+                 archive_name = find_next_version_name(archive_name, existing_archives)
+                 @logger&.warn(
+                   "[#{@repo_name}] File content changed but size/mtime unchanged for #{file_path}, " \
+                   "using #{archive_name}"
+                 )
+               end
+             else
+               # Size changed but mtime same -> content changed, add version suffix
+               archive_name = find_next_version_name(archive_name, existing_archives)
+               @logger&.warn(
+                 "[#{@repo_name}] File size changed but mtime unchanged for #{file_path}, using #{archive_name}"
+               )
+             end
+           else
+             # Different file, same archive name -> add version suffix
+             archive_name = find_next_version_name(archive_name, existing_archives)
+             @logger&.warn(
+               "[#{@repo_name}] Archive name collision: #{archive_name} exists for different file, using version suffix"
+             )
+           end
+         end
 
          # Create archive for single file with original path as comment
          cmd = build_per_file_create_command(archive_name, file_path)
 
          execute_borg_command(cmd)
+         puts ""
 
          # Log successful action with details
          @logger&.info("[#{@repo_name}] Archived #{file_path} in archive #{archive_name}")
+         backed_up_count += 1
        end
+       # rubocop:enable Metrics/BlockLength
 
-       puts "✓ Per-file backup completed: #{files_to_backup.size} file(s) backed up"
+       if skipped_count.positive?
+         puts "✓ Per-file backup completed: #{backed_up_count} file(s) backed up, #{skipped_count} skipped (unchanged)"
+       else
+         puts "✓ Per-file backup completed: #{backed_up_count} file(s) backed up"
+       end
 
        # NOTE: remove_source handled per file after successful backup
        remove_source_files if remove_source
      end
+     # rubocop:enable Metrics/AbcSize, Metrics/MethodLength, Metrics/PerceivedComplexity, Metrics/BlockNesting
 
      def collect_files_from_paths(paths, exclude_patterns)
        require "find"
@@ -178,12 +239,21 @@ module Ruborg
        end
      end
 
+     def calculate_file_hash(file_path)
+       require "digest"
+       Digest::SHA256.file(file_path).hexdigest
+     end
+
      def build_per_file_create_command(archive_name, file_path)
        cmd = [@repository.borg_path, "create"]
        cmd += ["--compression", @config.compression]
 
-       # Store original path in archive comment for retrieval
-       cmd += ["--comment", file_path]
+       # Store file metadata (path + size + hash) in archive comment for duplicate detection
+       # Format: path|||size|||hash (using ||| as delimiter to avoid conflicts with paths)
+       file_size = File.size(file_path)
+       file_hash = calculate_file_hash(file_path)
+       metadata = "#{file_path}|||#{file_size}|||#{file_hash}"
+       cmd += ["--comment", metadata]
 
        cmd << "#{@repository.path}::#{archive_name}"
        cmd << file_path
@@ -333,9 +403,93 @@ module Ruborg
      end
 
      def print_repository_header
-       puts "\n" + ("=" * 60)
+       puts "\n#{"=" * 60}"
        puts " Repository: #{@repo_name}"
        puts "=" * 60
      end
+
+     # rubocop:disable Metrics/AbcSize, Metrics/MethodLength, Metrics/PerceivedComplexity
+     def get_existing_archive_names
+       require "json"
+       require "open3"
+
+       # First get list of archives
+       cmd = [@repository.borg_path, "list", @repository.path, "--json"]
+       env = {}
+       passphrase = @repository.instance_variable_get(:@passphrase)
+       env["BORG_PASSPHRASE"] = passphrase if passphrase
+       env["BORG_RELOCATED_REPO_ACCESS_IS_OK"] = "yes"
+       env["BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK"] = "yes"
+
+       stdout, stderr, status = Open3.capture3(env, *cmd)
+       raise BorgError, "Failed to list archives: #{stderr}" unless status.success?
+
+       json_data = JSON.parse(stdout)
+       archives = json_data["archives"] || []
+
+       # Build hash by querying each archive individually for comment
+       # This is necessary because 'borg list' doesn't include comments
+       archives.each_with_object({}) do |archive, hash|
+         archive_name = archive["name"]
+
+         # Query this specific archive to get the comment
+         info_cmd = [@repository.borg_path, "info", "#{@repository.path}::#{archive_name}", "--json"]
+         info_stdout, _, info_status = Open3.capture3(env, *info_cmd)
+
+         unless info_status.success?
+           # If we can't get info for this archive, skip it with defaults
+           hash[archive_name] = { path: "", size: 0, hash: "" }
+           next
+         end
+
+         info_data = JSON.parse(info_stdout)
+         archive_info = info_data["archives"]&.first || {}
+         comment = archive_info["comment"] || ""
+
+         # Parse comment based on format
+         # The comment field stores metadata as: path|||size|||hash (using ||| as delimiter)
+         # For backward compatibility, handle old formats:
+         # - Old format 1: plain path (no |||)
+         # - Old format 2: path|||hash (2 parts)
+         # - New format: path|||size|||hash (3 parts)
+         if comment.include?("|||")
+           parts = comment.split("|||")
+           file_path = parts[0]
+           if parts.length >= 3
+             # New format: path|||size|||hash
+             file_size = parts[1].to_i
+             file_hash = parts[2] || ""
+           else
+             # Old format: path|||hash (size not available)
+             file_size = 0
+             file_hash = parts[1] || ""
+           end
+         else
+           # Oldest format: comment is just the path string
+           file_path = comment
+           file_size = 0
+           file_hash = ""
+         end
+
+         hash[archive_name] = {
+           path: file_path,
+           size: file_size,
+           hash: file_hash
+         }
+       end
+     rescue JSON::ParserError => e
+       raise BorgError, "Failed to parse archive info: #{e.message}"
+     end
+     # rubocop:enable Metrics/AbcSize, Metrics/MethodLength, Metrics/PerceivedComplexity
+
+     def find_next_version_name(base_name, existing_archives)
+       version = 2
+       loop do
+         versioned_name = "#{base_name}-v#{version}"
+         return versioned_name unless existing_archives.key?(versioned_name)
+
+         version += 1
+       end
+     end
    end
  end
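`find_next_version_name` picks the first unused `-vN` suffix, starting at `-v2`. The comparison that precedes it in `create_per_file_archives` reads only the base archive's stored metadata (`existing_archives[archive_name]`), so `-vN` archives created on earlier runs are not consulted. A sketch of both pieces, under that reading of the diff: a compact equivalent of the suffix search, and a hypothetical `archive_versions` helper (not in ruborg 0.7.1) that collects the base archive and its versioned siblings so a caller could compare a file against every stored version before creating another one. The archive name and metadata below are made up for illustration:

```ruby
# Same behavior as find_next_version_name in the diff, written compactly:
# returns the first "<base>-vN" (N >= 2) that is not already a key of
# existing_archives.
def find_next_version_name(base_name, existing_archives)
  version = 2
  version += 1 while existing_archives.key?("#{base_name}-v#{version}")
  "#{base_name}-v#{version}"
end

# Hypothetical helper (not in ruborg 0.7.1): the base archive plus every
# "<base>-vN" sibling, as a Hash of archive name => parsed comment metadata.
def archive_versions(base_name, existing_archives)
  pattern = /\A#{Regexp.escape(base_name)}(?:-v\d+)?\z/
  existing_archives.select { |name, _info| pattern.match?(name) }
end

# Hypothetical archive name and metadata, for illustration only.
base = "repo-notes.txt-ab12cd-20251008"
existing = {
  base => { path: "/home/user/notes.txt", size: 10, hash: "aaa" },
  "#{base}-v2" => { path: "/home/user/notes.txt", size: 10, hash: "bbb" }
}

find_next_version_name(base, existing) # => "repo-notes.txt-ab12cd-20251008-v3"
archive_versions(base, existing).keys  # => ["repo-notes.txt-ab12cd-20251008", "repo-notes.txt-ab12cd-20251008-v2"]

# Content "bbb" is already stored as -v2, so a caller checking every version
# could skip the file instead of creating -v3.
archive_versions(base, existing).values.any? { |info| info[:hash] == "bbb" } # => true
```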
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module Ruborg
-   VERSION = "0.7.0"
+   VERSION = "0.7.1"
  end
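The README marks paranoid-mode duplicate detection as v0.7.1+, so callers that gate on this bump need a reliable version comparison. Comparing version strings directly is unreliable (`"0.10.0" < "0.7.1"` lexically), whereas RubyGems' `Gem::Version` compares segment by segment. A minimal sketch; `paranoid_mode_supported?` is a hypothetical helper:

```ruby
require "rubygems"

# True when the given ruborg version includes paranoid-mode duplicate detection.
def paranoid_mode_supported?(version)
  Gem::Version.new(version) >= Gem::Version.new("0.7.1")
end

paranoid_mode_supported?("0.7.0")  # => false
paranoid_mode_supported?("0.7.1")  # => true
paranoid_mode_supported?("0.10.0") # => true, although "0.10.0" < "0.7.1" as strings
```

With ruborg loaded, `paranoid_mode_supported?(Ruborg::VERSION)` would return `true` for this release.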
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: ruborg
  version: !ruby/object:Gem::Version
-   version: 0.7.0
+   version: 0.7.1
  platform: ruby
  authors:
  - Michail Pantelelis