ruborg 0.7.0 → 0.7.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: c0db1cc300f1c33ec3ab6e1ad027b27fe379672e6344140fec4c5009f638de02
4
- data.tar.gz: 29ffeb1331a718a2babd8febb11d12eca46db3f581074db8c66971d8737b21e4
3
+ metadata.gz: 314fb4f24b1b5544d95257f5d3f315099f221ecc92b7920e935066319db72458
4
+ data.tar.gz: 0ef7e5183f7596e1417b4b12915557d9c177b0b2504f6a143c7ff8aa59f726aa
5
5
  SHA512:
6
- metadata.gz: 19ba4e200dd88d1f5251d0bf257b1591284c8a196571eb1f039cd6cf9bbca186774509f0adf72dadc2fc2da297589230bc0910b1a1466d6c016d5a0f2d39a6fe
7
- data.tar.gz: 7af17006b577b9b48854b40214862d70c146197602ed94c83f1e5a4ec52b7b8f9273e09c827eebabd2c15a6c49974932d288c273bf4602c96240d2d7eba0ae8b
6
+ metadata.gz: fb9bcdb3517dc9fd2c1c05d868e8015fd5fe3a55c00e5c277cf7b0257f238535821f8dbcda594dc9bc4b413a8b96d4739d327e1b420f6bad08ac19e724193199
7
+ data.tar.gz: 6544e4d87969bd59ca7912653ecaebedeca75795f934edb93ad9435a0e48cff5b0c95734aa5bdc70572a38cbed014ea8ac93cff4abee97867a910acba47d5240
data/CHANGELOG.md CHANGED
@@ -7,6 +7,73 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
7
7
 
8
8
  ## [Unreleased]
9
9
 
10
+ ## [0.7.3] - 2025-10-09
11
+
12
+ ### Changed
13
+ - **Smart Remove-Source for Skipped Files**: Skipped files (unchanged, already backed up) are now deleted when `--remove-source` is used
14
+ - Previously: Only newly backed-up files were deleted, skipped files remained
15
+ - Now: Both backed-up AND skipped files are deleted (they're all safely backed up)
16
+ - Rationale: If a file is skipped because it's already in an archive (verified by hash), it's safe to delete
17
+ - Makes `--remove-source` behavior consistent: "delete everything that's safely backed up"
18
+
19
+ ### Technical Details
20
+ - Per-file mode verifies files are safely backed up before skipping (path + size + SHA256 hash match)
21
+ - Skipped files are deleted immediately after verification (lib/ruborg/backup.rb:102)
22
+ - Test updated to verify skipped files are deleted (spec/ruborg/per_file_backup_spec.rb:518)
23
+
24
+ ## [0.7.2] - 2025-10-09
25
+
26
+ ### Fixed
27
+ - **Per-File Remove Source Behavior**: Files are now deleted immediately after each successful backup in per-file mode
28
+ - Previously deleted entire source paths at the end (dangerous - could delete unchanged files)
29
+ - Now deletes only successfully backed-up files, one at a time
30
+ - Skipped files (unchanged) are never deleted
31
+ - Matches the per-file philosophy: individual file handling throughout the backup process
32
+
33
+ ### Added
34
+ - **Test Coverage**: Added 2 new RSpec tests verifying per-file remove-source behavior
35
+ - Tests immediate file deletion after backup
36
+ - Tests that skipped files are not deleted
37
+
38
+ ## [0.7.1] - 2025-10-08
39
+
40
+ ### Added
41
+ - **Paranoid Mode Duplicate Detection**: Per-file backup mode now uses SHA256 content hashing to detect duplicate files
42
+ - Skips unchanged files automatically (same path, size, and content hash)
43
+ - Creates versioned archives (-v2, -v3) when content changes but modification time stays the same
44
+ - Protects against edge cases where files are modified with manual `touch -t` operations
45
+ - Archive metadata stores: `path|||size|||hash` for comprehensive verification
46
+ - Backward compatible with old archive formats (plain path, path|||hash)
47
+ - **Smart Skip Statistics**: Backup completion messages show both backed-up and skipped file counts
48
+ - Example: "✓ Per-file backup completed: 50000 file(s) backed up, 26456 skipped (unchanged)"
49
+ - Provides visibility into deduplication efficiency
50
+
51
+ ### Fixed
52
+ - **Per-File Backup Archive Collision**: Fixed "Archive already exists" error in per-file backup mode
53
+ - Archives are now verified by path, size, and content hash before skipping
54
+ - Different files with same archive name get automatic version suffixes
55
+ - File size changes detected even when modification time is manually reset
56
+ - Logs warning messages for collision scenarios with detailed context
57
+
58
+ ### Changed
59
+ - **Archive Comment Format**: Per-file archives now store comprehensive metadata
60
+ - New format: `path|||size|||hash` (three-part delimiter-based format)
61
+ - Enables instant duplicate detection without re-hashing files
62
+ - Backward compatible parsing handles old formats gracefully
63
+ - **Enhanced Collision Handling**: Intelligent version suffix generation
64
+ - Appends `-v2`, `-v3`, etc. for archive name collisions
65
+ - Prevents data loss from conflicting archive names
66
+ - Logs warnings for all collision scenarios
67
+
68
+ ### Security
69
+ - **No Security Impact**: Security review found no exploitable vulnerabilities in new features
70
+ - Content hashing uses SHA256 (cryptographically secure)
71
+ - Archive comment parsing uses safe string splitting (no injection risks)
72
+ - File paths from archives only used for comparison, not file operations
73
+ - Array-based command execution prevents shell injection
74
+ - JSON parsing uses Ruby's safe `JSON.parse()` with error handling
75
+ - All existing security controls maintained
76
+
10
77
  ## [0.7.0] - 2025-10-08
11
78
 
12
79
  ### Added
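The changelog entries above describe three archive comment formats: plain path, `path|||hash`, and `path|||size|||hash`. A minimal sketch of parsing all three follows. The method name `parse_archive_comment` and the sample values are hypothetical, chosen only for illustration; they are not ruborg's API.

```ruby
# Hypothetical helper, not part of ruborg: splits a per-file archive comment
# into path, size, and hash, accepting the two older formats as well.
DELIMITER = "|||"

def parse_archive_comment(comment)
  comment = comment.to_s
  # Oldest format: the comment is just the path.
  return { path: comment, size: 0, hash: "" } unless comment.include?(DELIMITER)

  parts = comment.split(DELIMITER)
  if parts.length >= 3
    # Newest format: path|||size|||hash
    { path: parts[0].to_s, size: parts[1].to_i, hash: parts[2].to_s }
  else
    # Intermediate format: path|||hash (size was not recorded)
    { path: parts[0].to_s, size: 0, hash: parts[1].to_s }
  end
end

# Sample values are made up for the example.
p parse_archive_comment("/home/user/notes.txt|||1024|||abc123")
p parse_archive_comment("/home/user/notes.txt|||abc123")
p parse_archive_comment("/home/user/notes.txt")
```

In this sketch a size of `0` and an empty hash stand for "unknown", so an archive written in an older format can never match on size and hash together.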
data/README.md CHANGED
@@ -25,7 +25,7 @@ A friendly Ruby frontend for [Borg Backup](https://www.borgbackup.org/). Ruborg
25
25
  - 📈 **Summary View** - Quick overview of all repositories and their configurations
26
26
  - 🔧 **Custom Borg Path** - Support for custom Borg executable paths per repository
27
27
  - 🏠 **Hostname Validation** - NEW! Restrict backups to specific hosts (global or per-repository)
28
- - ✅ **Well-tested** - Comprehensive test suite with RSpec (220+ examples)
28
+ - ✅ **Well-tested** - Comprehensive test suite with RSpec (288+ examples)
29
29
  - 🔒 **Security-focused** - Path validation, safe YAML loading, command injection protection
30
30
 
31
31
  ## Prerequisites
@@ -775,9 +775,12 @@ repositories:
775
775
 
776
776
  **How it works:**
777
777
  - **Per-File Archives**: Each file is backed up as a separate Borg archive
778
- - **Hash-Based Naming**: Archives are named `repo-{hash}-{timestamp}` (hash uniquely identifies the file path)
779
- - **Original Path Stored**: The complete original file path is stored in the archive comment
778
+ - **Hash-Based Naming**: Archives are named `repo-filename-{hash}-{timestamp}` (hash uniquely identifies the file path)
779
+ - **Metadata Storage**: Archive comments store `path|||size|||hash` for comprehensive duplicate detection
780
780
  - **Metadata Preservation**: Borg preserves all file metadata (mtime, size, permissions) in the archive
781
+ - **Paranoid Mode Duplicate Detection** (v0.7.1+): SHA256 content hashing detects file changes even when size and mtime are identical
782
+ - **Smart Skip**: Automatically skips unchanged files during backup (compares path, size, and content hash)
783
+ - **Version Suffixes**: Creates versioned archives (`-v2`, `-v3`) for archive name collisions, preventing data loss
781
784
  - **Smart Pruning**: Retention reads file mtime directly from archives - works even after files are deleted
782
785
 
783
786
  **File Metadata Retention Options:**
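The Smart Skip bullet above compares path, size, and content hash. The following sketch shows one way that decision could look, assuming the stored metadata is available as a hash with `:path`, `:size`, and `:hash` keys. The method name `backup_decision` and its return symbols are hypothetical and are not taken from ruborg.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: returns :skip only when path, size, and SHA256 all
# match the stored metadata. Cheap checks run first so the file is hashed
# only when path and size already agree.
def backup_decision(file_path, stored)
  return :new_version unless stored[:path] == file_path
  return :new_version unless stored[:size] == File.size(file_path)
  return :new_version unless stored[:hash] == Digest::SHA256.file(file_path).hexdigest

  :skip
end

Tempfile.create("ruborg-sketch") do |file|
  file.write("hello")
  file.flush
  stored = { path: file.path, size: 5, hash: Digest::SHA256.hexdigest("hello") }
  puts backup_decision(file.path, stored) # file unchanged since "backup"

  file.rewind
  file.write("jello") # same size, different content
  file.flush
  puts backup_decision(file.path, stored)
end
```

The second call illustrates the case the README describes: the size is identical, so only the content hash reveals the change.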
data/SECURITY.md CHANGED
@@ -229,6 +229,21 @@ We will respond within 48 hours and work with you to address the issue.
229
229
 
230
230
  ## Security Audit History
231
231
 
232
+ - **v0.7.1** (2025-10-08): Paranoid mode duplicate detection - security review passed
233
+ - **NEW FEATURE**: SHA256 content hashing for detecting file changes even when mtime/size are identical
234
+ - **NEW FEATURE**: Smart skip statistics showing backed-up and skipped file counts
235
+ - **BUG FIX**: Fixed "Archive already exists" error in per-file backup mode
236
+ - **ENHANCED**: Archive comment format now stores comprehensive metadata (`path|||size|||hash`)
237
+ - **ENHANCED**: Version suffix generation for archive name collisions (`-v2`, `-v3`)
238
+ - **SECURITY REVIEW**: Comprehensive security analysis found no exploitable vulnerabilities
239
+ - SHA256 hashing is cryptographically secure (using Ruby's Digest::SHA256)
240
+ - Archive comment parsing uses safe string splitting with `|||` delimiter (no injection risks)
241
+ - File paths from archives only used for comparison, never for file operations
242
+ - Array-based command execution prevents shell injection (maintained from previous versions)
243
+ - JSON parsing uses Ruby's safe `JSON.parse()` with error handling
244
+ - All existing security controls maintained - no security regressions
245
+ - Backward compatibility with three metadata formats (plain path, path|||hash, path|||size|||hash)
246
+
232
247
  - **v0.7.0** (2025-10-08): Archive naming and metadata features - security review passed
233
248
  - **NEW FEATURE**: List files within archives (--archive option)
234
249
  - **NEW FEATURE**: File metadata retrieval from archives
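The audit notes above state that array-based command execution prevents shell injection. A small sketch of why: when `Open3.capture3` receives the program and each argument as separate strings, no shell parses them, so metacharacters such as `;` arrive as literal text. The example runs the current Ruby interpreter as a stand-in child process instead of Borg; the method name `echo_first_argument` is hypothetical.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: passes one argument to a child Ruby process, which
# prints it back. The multi-argument form of capture3 bypasses the shell.
def echo_first_argument(argument)
  stdout, _stderr, status = Open3.capture3(RbConfig.ruby, "-e", "puts ARGV[0]", argument)
  raise "child process failed" unless status.success?

  stdout.chomp
end

hostile = "archive; echo injected"
puts echo_first_argument(hostile)
```

The child receives the whole string as one argument, so the text after the semicolon is never run as a separate command.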
data/lib/ruborg/backup.rb CHANGED
@@ -43,17 +43,25 @@ module Ruborg
43
43
  remove_source_files if remove_source
44
44
  end
45
45
 
46
+ # rubocop:disable Metrics/AbcSize, Metrics/MethodLength, Metrics/PerceivedComplexity, Metrics/BlockNesting
46
47
  def create_per_file_archives(name_prefix, remove_source)
47
48
  # Collect all files from backup paths
48
49
  files_to_backup = collect_files_from_paths(@config.backup_paths, @config.exclude_patterns)
49
50
 
50
51
  raise BorgError, "No files found to backup" if files_to_backup.empty?
51
52
 
53
+ # Get list of existing archives for duplicate detection
54
+ existing_archives = get_existing_archive_names
55
+
52
56
  # Show repository header in console only
53
57
  print_repository_header
54
58
 
55
59
  puts "Found #{files_to_backup.size} file(s) to backup"
56
60
 
61
+ backed_up_count = 0
62
+ skipped_count = 0
63
+
64
+ # rubocop:disable Metrics/BlockLength
57
65
  files_to_backup.each_with_index do |file_path, index|
58
66
  # Generate hash-based archive name with filename
59
67
  path_hash = generate_path_hash(file_path)
@@ -67,22 +75,79 @@ module Ruborg
67
75
  archive_name = name_prefix || build_archive_name(@repo_name, sanitized_filename, path_hash, file_mtime)
68
76
 
69
77
  # Show progress in console
70
- puts " [#{index + 1}/#{files_to_backup.size}] Backing up: #{file_path}"
78
+ print " [#{index + 1}/#{files_to_backup.size}] Backing up: #{file_path}"
79
+
80
+ # Check if archive already exists AND contains this exact file
81
+ if existing_archives.key?(archive_name)
82
+ stored_info = existing_archives[archive_name]
83
+ if stored_info[:path] == file_path
84
+ # Same file, same mtime -> check if size changed (rare: manual content edit + touch -t)
85
+ current_size = File.size(file_path)
86
+ stored_size = stored_info[:size]
87
+
88
+ if current_size == stored_size
89
+ # Size same -> verify content hasn't changed (paranoid mode)
90
+ current_hash = calculate_file_hash(file_path)
91
+ stored_hash = stored_info[:hash]
92
+
93
+ if current_hash == stored_hash
94
+ # Content truly unchanged - file is already safely backed up
95
+ puts " - Archive already exists (file unchanged)"
96
+ @logger&.info(
97
+ "[#{@repo_name}] Skipped #{file_path} - archive #{archive_name} already exists (file unchanged)"
98
+ )
99
+ skipped_count += 1
100
+
101
+ # If remove_source is enabled, delete the file (it's already safely backed up)
102
+ remove_single_file(file_path) if remove_source
103
+
104
+ next
105
+ else
106
+ # Size same but content changed (rare: edited + truncated/padded to same size)
107
+ archive_name = find_next_version_name(archive_name, existing_archives)
108
+ @logger&.warn(
109
+ "[#{@repo_name}] File content changed but size/mtime unchanged for #{file_path}, " \
110
+ "using #{archive_name}"
111
+ )
112
+ end
113
+ else
114
+ # Size changed but mtime same -> content changed, add version suffix
115
+ archive_name = find_next_version_name(archive_name, existing_archives)
116
+ @logger&.warn(
117
+ "[#{@repo_name}] File size changed but mtime unchanged for #{file_path}, using #{archive_name}"
118
+ )
119
+ end
120
+ else
121
+ # Different file, same archive name -> add version suffix
122
+ archive_name = find_next_version_name(archive_name, existing_archives)
123
+ @logger&.warn(
124
+ "[#{@repo_name}] Archive name collision: #{archive_name} exists for different file, using version suffix"
125
+ )
126
+ end
127
+ end
71
128
 
72
129
  # Create archive for single file with original path as comment
73
130
  cmd = build_per_file_create_command(archive_name, file_path)
74
131
 
75
132
  execute_borg_command(cmd)
133
+ puts ""
76
134
 
77
135
  # Log successful action with details
78
136
  @logger&.info("[#{@repo_name}] Archived #{file_path} in archive #{archive_name}")
79
- end
137
+ backed_up_count += 1
80
138
 
81
- puts "✓ Per-file backup completed: #{files_to_backup.size} file(s) backed up"
139
+ # Remove source file immediately after successful backup in per-file mode
140
+ remove_single_file(file_path) if remove_source
141
+ end
142
+ # rubocop:enable Metrics/BlockLength
82
143
 
83
- # NOTE: remove_source handled per file after successful backup
84
- remove_source_files if remove_source
144
+ if skipped_count.positive?
145
+ puts "✓ Per-file backup completed: #{backed_up_count} file(s) backed up, #{skipped_count} skipped (unchanged)"
146
+ else
147
+ puts "✓ Per-file backup completed: #{backed_up_count} file(s) backed up"
148
+ end
85
149
  end
150
+ # rubocop:enable Metrics/AbcSize, Metrics/MethodLength, Metrics/PerceivedComplexity, Metrics/BlockNesting
86
151
 
87
152
  def collect_files_from_paths(paths, exclude_patterns)
88
153
  require "find"
@@ -178,12 +243,21 @@ module Ruborg
178
243
  end
179
244
  end
180
245
 
246
+ def calculate_file_hash(file_path)
247
+ require "digest"
248
+ Digest::SHA256.file(file_path).hexdigest
249
+ end
250
+
181
251
  def build_per_file_create_command(archive_name, file_path)
182
252
  cmd = [@repository.borg_path, "create"]
183
253
  cmd += ["--compression", @config.compression]
184
254
 
185
- # Store original path in archive comment for retrieval
186
- cmd += ["--comment", file_path]
255
+ # Store file metadata (path + size + hash) in archive comment for duplicate detection
256
+ # Format: path|||size|||hash (using ||| as delimiter to avoid conflicts with paths)
257
+ file_size = File.size(file_path)
258
+ file_hash = calculate_file_hash(file_path)
259
+ metadata = "#{file_path}|||#{file_size}|||#{file_hash}"
260
+ cmd += ["--comment", metadata]
187
261
 
188
262
  cmd << "#{@repository.path}::#{archive_name}"
189
263
  cmd << file_path
@@ -263,6 +337,34 @@ module Ruborg
263
337
  result
264
338
  end
265
339
 
340
+ def remove_single_file(file_path)
341
+ require "fileutils"
342
+
343
+ # Resolve symlinks and validate path
344
+ begin
345
+ real_path = File.realpath(file_path)
346
+ rescue Errno::ENOENT
347
+ # File doesn't exist (already deleted?), skip
348
+ @logger&.warn("Source file does not exist, skipping: #{file_path}")
349
+ return
350
+ end
351
+
352
+ # Security check: ensure file still exists
353
+ unless File.exist?(real_path)
354
+ @logger&.warn("Source file no longer exists, skipping: #{real_path}")
355
+ return
356
+ end
357
+
358
+ # Additional safety: don't delete system files
359
+ if real_path == "/" || real_path.start_with?("/bin", "/sbin", "/usr", "/etc", "/sys", "/proc")
360
+ @logger&.error("Refusing to delete system path: #{real_path}")
361
+ raise BorgError, "Refusing to delete system path: #{real_path}"
362
+ end
363
+
364
+ @logger&.info("Removing file: #{real_path}")
365
+ FileUtils.rm(real_path)
366
+ end
367
+
266
368
  def remove_source_files
267
369
  require "fileutils"
268
370
 
@@ -333,9 +435,93 @@ module Ruborg
333
435
  end
334
436
 
335
437
  def print_repository_header
336
- puts "\n" + ("=" * 60)
438
+ puts "\n#{"=" * 60}"
337
439
  puts " Repository: #{@repo_name}"
338
440
  puts "=" * 60
339
441
  end
442
+
443
+ # rubocop:disable Metrics/AbcSize, Metrics/MethodLength, Metrics/PerceivedComplexity
444
+ def get_existing_archive_names
445
+ require "json"
446
+ require "open3"
447
+
448
+ # First get list of archives
449
+ cmd = [@repository.borg_path, "list", @repository.path, "--json"]
450
+ env = {}
451
+ passphrase = @repository.instance_variable_get(:@passphrase)
452
+ env["BORG_PASSPHRASE"] = passphrase if passphrase
453
+ env["BORG_RELOCATED_REPO_ACCESS_IS_OK"] = "yes"
454
+ env["BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK"] = "yes"
455
+
456
+ stdout, stderr, status = Open3.capture3(env, *cmd)
457
+ raise BorgError, "Failed to list archives: #{stderr}" unless status.success?
458
+
459
+ json_data = JSON.parse(stdout)
460
+ archives = json_data["archives"] || []
461
+
462
+ # Build hash by querying each archive individually for comment
463
+ # This is necessary because 'borg list' doesn't include comments
464
+ archives.each_with_object({}) do |archive, hash|
465
+ archive_name = archive["name"]
466
+
467
+ # Query this specific archive to get the comment
468
+ info_cmd = [@repository.borg_path, "info", "#{@repository.path}::#{archive_name}", "--json"]
469
+ info_stdout, _, info_status = Open3.capture3(env, *info_cmd)
470
+
471
+ unless info_status.success?
472
+ # If we can't get info for this archive, skip it with defaults
473
+ hash[archive_name] = { path: "", size: 0, hash: "" }
474
+ next
475
+ end
476
+
477
+ info_data = JSON.parse(info_stdout)
478
+ archive_info = info_data["archives"]&.first || {}
479
+ comment = archive_info["comment"] || ""
480
+
481
+ # Parse comment based on format
482
+ # The comment field stores metadata as: path|||size|||hash (using ||| as delimiter)
483
+ # For backward compatibility, handle old formats:
484
+ # - Old format 1: plain path (no |||)
485
+ # - Old format 2: path|||hash (2 parts)
486
+ # - New format: path|||size|||hash (3 parts)
487
+ if comment.include?("|||")
488
+ parts = comment.split("|||")
489
+ file_path = parts[0]
490
+ if parts.length >= 3
491
+ # New format: path|||size|||hash
492
+ file_size = parts[1].to_i
493
+ file_hash = parts[2] || ""
494
+ else
495
+ # Old format: path|||hash (size not available)
496
+ file_size = 0
497
+ file_hash = parts[1] || ""
498
+ end
499
+ else
500
+ # Oldest format: comment is just the path string
501
+ file_path = comment
502
+ file_size = 0
503
+ file_hash = ""
504
+ end
505
+
506
+ hash[archive_name] = {
507
+ path: file_path,
508
+ size: file_size,
509
+ hash: file_hash
510
+ }
511
+ end
512
+ rescue JSON::ParserError => e
513
+ raise BorgError, "Failed to parse archive info: #{e.message}"
514
+ end
515
+ # rubocop:enable Metrics/AbcSize, Metrics/MethodLength, Metrics/PerceivedComplexity
516
+
517
+ def find_next_version_name(base_name, existing_archives)
518
+ version = 2
519
+ loop do
520
+ versioned_name = "#{base_name}-v#{version}"
521
+ return versioned_name unless existing_archives.key?(versioned_name)
522
+
523
+ version += 1
524
+ end
525
+ end
340
526
  end
341
527
  end
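The version suffix logic at the end of the diff above can be tried in isolation. This is a simplified sketch that takes any collection of existing names instead of the metadata hash used in the gem; the method name `next_version_name` and the archive names are made up for the example.

```ruby
# Hypothetical standalone variant: returns the first "<base>-vN" name
# (starting at N = 2) that is not already taken.
def next_version_name(base_name, existing_names)
  version = 2
  version += 1 while existing_names.include?("#{base_name}-v#{version}")
  "#{base_name}-v#{version}"
end

# Illustrative archive names only.
existing = [
  "repo-notes.txt-abc123-2025-10-08",
  "repo-notes.txt-abc123-2025-10-08-v2"
]
puts next_version_name("repo-notes.txt-abc123-2025-10-08", existing)
```

Since `-v2` is already taken in this sample list, the sketch moves on to `-v3`.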
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Ruborg
4
- VERSION = "0.7.0"
4
+ VERSION = "0.7.3"
5
5
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: ruborg
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.7.0
4
+ version: 0.7.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Michail Pantelelis