RubyGems - ruborg - Versions diffs - 0.7.0 → 0.7.3 - Mend

ruborg 0.7.0 → 0.7.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: c0db1cc300f1c33ec3ab6e1ad027b27fe379672e6344140fec4c5009f638de02
-  data.tar.gz: 29ffeb1331a718a2babd8febb11d12eca46db3f581074db8c66971d8737b21e4
+  metadata.gz: 314fb4f24b1b5544d95257f5d3f315099f221ecc92b7920e935066319db72458
+  data.tar.gz: 0ef7e5183f7596e1417b4b12915557d9c177b0b2504f6a143c7ff8aa59f726aa
 SHA512:
-  metadata.gz: 19ba4e200dd88d1f5251d0bf257b1591284c8a196571eb1f039cd6cf9bbca186774509f0adf72dadc2fc2da297589230bc0910b1a1466d6c016d5a0f2d39a6fe
-  data.tar.gz: 7af17006b577b9b48854b40214862d70c146197602ed94c83f1e5a4ec52b7b8f9273e09c827eebabd2c15a6c49974932d288c273bf4602c96240d2d7eba0ae8b
+  metadata.gz: fb9bcdb3517dc9fd2c1c05d868e8015fd5fe3a55c00e5c277cf7b0257f238535821f8dbcda594dc9bc4b413a8b96d4739d327e1b420f6bad08ac19e724193199
+  data.tar.gz: 6544e4d87969bd59ca7912653ecaebedeca75795f934edb93ad9435a0e48cff5b0c95734aa5bdc70572a38cbed014ea8ac93cff4abee97867a910acba47d5240

data/CHANGELOG.md CHANGED

@@ -7,6 +7,73 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
+## [0.7.3] - 2025-10-09
+### Changed
+- **Smart Remove-Source for Skipped Files**: Skipped files (unchanged, already backed up) are now deleted when `--remove-source` is used
+  - Previously: Only newly backed-up files were deleted, skipped files remained
+  - Now: Both backed-up AND skipped files are deleted (they're all safely backed up)
+  - Rationale: If a file is skipped because it's already in an archive (verified by hash), it's safe to delete
+  - Makes `--remove-source` behavior consistent: "delete everything that's safely backed up"
+### Technical Details
+- Per-file mode verifies files are safely backed up before skipping (path + size + SHA256 hash match)
+- Skipped files are deleted immediately after verification (lib/ruborg/backup.rb:102)
+- Test updated to verify skipped files are deleted (spec/ruborg/per_file_backup_spec.rb:518)
+## [0.7.2] - 2025-10-09
+### Fixed
+- **Per-File Remove Source Behavior**: Files are now deleted immediately after each successful backup in per-file mode
+  - Previously deleted entire source paths at the end (dangerous - could delete unchanged files)
+  - Now deletes only successfully backed-up files, one at a time
+  - Skipped files (unchanged) are never deleted
+  - Matches the per-file philosophy: individual file handling throughout the backup process
+### Added
+- **Test Coverage**: Added 2 new RSpec tests verifying per-file remove-source behavior
+  - Tests immediate file deletion after backup
+  - Tests that skipped files are not deleted
+## [0.7.1] - 2025-10-08
+### Added
+- **Paranoid Mode Duplicate Detection**: Per-file backup mode now uses SHA256 content hashing to detect duplicate files
+  - Skips unchanged files automatically (same path, size, and content hash)
+  - Creates versioned archives (-v2, -v3) when content changes but modification time stays the same
+  - Protects against edge cases where files are modified with manual `touch -t` operations
+  - Archive metadata stores: `path|||size|||hash` for comprehensive verification
+  - Backward compatible with old archive formats (plain path, path|||hash)
+- **Smart Skip Statistics**: Backup completion messages show both backed-up and skipped file counts
+  - Example: "✓ Per-file backup completed: 50000 file(s) backed up, 26456 skipped (unchanged)"
+  - Provides visibility into deduplication efficiency
+### Fixed
+- **Per-File Backup Archive Collision**: Fixed "Archive already exists" error in per-file backup mode
+  - Archives are now verified by path, size, and content hash before skipping
+  - Different files with same archive name get automatic version suffixes
+  - File size changes detected even when modification time is manually reset
+  - Logs warning messages for collision scenarios with detailed context
+### Changed
+- **Archive Comment Format**: Per-file archives now store comprehensive metadata
+  - New format: `path|||size|||hash` (three-part delimiter-based format)
+  - Enables instant duplicate detection without re-hashing files
+  - Backward compatible parsing handles old formats gracefully
+- **Enhanced Collision Handling**: Intelligent version suffix generation
+  - Appends `-v2`, `-v3`, etc. for archive name collisions
+  - Prevents data loss from conflicting archive names
+  - Logs warnings for all collision scenarios
+### Security
+- **No Security Impact**: Security review found no exploitable vulnerabilities in new features
+  - Content hashing uses SHA256 (cryptographically secure)
+  - Archive comment parsing uses safe string splitting (no injection risks)
+  - File paths from archives only used for comparison, not file operations
+  - Array-based command execution prevents shell injection
+  - JSON parsing uses Ruby's safe `JSON.parse()` with error handling
+  - All existing security controls maintained
 ## [0.7.0] - 2025-10-08
 ### Added
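
The skip rule described in the 0.7.1 to 0.7.3 entries above (same path, same size, same SHA256 hash) can be sketched as a small decision function. This is an illustrative sketch only: `classify_existing` and its return symbols are hypothetical names, not part of ruborg, and `stored` is assumed to be a Hash with `:path`, `:size` and `:hash` keys taken from the archive comment.

```ruby
require "digest"

# Hypothetical helper (not ruborg's API): given a file whose archive name
# already exists and the metadata stored for that archive, decide whether
# the file can be skipped or needs a versioned archive name.
def classify_existing(file_path, stored)
  # Different file behind the same archive name: name collision.
  return :version_suffix unless stored[:path] == file_path
  # Same file and mtime, but the size differs: content changed.
  return :version_suffix unless stored[:size] == File.size(file_path)

  # Same size: compare content hashes as the final check.
  current_hash = Digest::SHA256.file(file_path).hexdigest
  current_hash == stored[:hash] ? :skip : :version_suffix
end
```

Under the 0.7.3 behavior, a `:skip` result is also the case in which the source file is deleted when `--remove-source` is set, because the file has been verified against an existing archive.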

data/README.md CHANGED

@@ -25,7 +25,7 @@ A friendly Ruby frontend for [Borg Backup](https://www.borgbackup.org/). Ruborg
 - 📈 **Summary View** - Quick overview of all repositories and their configurations
 - 🔧 **Custom Borg Path** - Support for custom Borg executable paths per repository
 - 🏠 **Hostname Validation** - NEW! Restrict backups to specific hosts (global or per-repository)
-- ✅ **Well-tested** - Comprehensive test suite with RSpec (220+ examples)
+- ✅ **Well-tested** - Comprehensive test suite with RSpec (288+ examples)
 - 🔒 **Security-focused** - Path validation, safe YAML loading, command injection protection
 ## Prerequisites
@@ -775,9 +775,12 @@ repositories:
 **How it works:**
 - **Per-File Archives**: Each file is backed up as a separate Borg archive
-- **Hash-Based Naming**: Archives are named `repo-{hash}-{timestamp}` (hash uniquely identifies the file path)
-- **Original Path Stored**: The complete original file path is stored in the archive comment
+- **Hash-Based Naming**: Archives are named `repo-filename-{hash}-{timestamp}` (hash uniquely identifies the file path)
+- **Metadata Storage**: Archive comments store `path|||size|||hash` for comprehensive duplicate detection
 - **Metadata Preservation**: Borg preserves all file metadata (mtime, size, permissions) in the archive
+- **Paranoid Mode Duplicate Detection** (v0.7.1+): SHA256 content hashing detects file changes even when size and mtime are identical
+- **Smart Skip**: Automatically skips unchanged files during backup (compares path, size, and content hash)
+- **Version Suffixes**: Creates versioned archives (`-v2`, `-v3`) for archive name collisions, preventing data loss
 - **Smart Pruning**: Retention reads file mtime directly from archives - works even after files are deleted
 **File Metadata Retention Options:**
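
The `path|||size|||hash` comment format mentioned above, together with the two older formats, can be parsed along these lines. This is a minimal standalone sketch: `parse_archive_comment` is a hypothetical name, and the defaults (`0` for a missing size, an empty string for a missing hash) are chosen to match what the diff of `lib/ruborg/backup.rb` shows.

```ruby
# Hypothetical standalone parser (not ruborg's API) for the three comment formats:
#   "path"                 oldest format, path only
#   "path|||hash"          older format, no size
#   "path|||size|||hash"   current format
def parse_archive_comment(comment)
  # No delimiter at all: the whole comment is the path.
  return { path: comment, size: 0, hash: "" } unless comment.include?("|||")

  parts = comment.split("|||")
  if parts.length >= 3
    { path: parts[0], size: parts[1].to_i, hash: parts[2] }
  else
    # Two-part format: size was not recorded.
    { path: parts[0], size: 0, hash: parts[1] || "" }
  end
end

parse_archive_comment("/home/user/notes.txt|||42|||abc123")
# => { path: "/home/user/notes.txt", size: 42, hash: "abc123" }
```

One consequence of a delimiter-based format is that a file path which itself contains `|||` would be split in the wrong place; the sketch does not attempt to handle that case.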

data/SECURITY.md CHANGED

@@ -229,6 +229,21 @@ We will respond within 48 hours and work with you to address the issue.
 ## Security Audit History
+- **v0.7.1** (2025-10-08): Paranoid mode duplicate detection - security review passed
+  - **NEW FEATURE**: SHA256 content hashing for detecting file changes even when mtime/size are identical
+  - **NEW FEATURE**: Smart skip statistics showing backed-up and skipped file counts
+  - **BUG FIX**: Fixed "Archive already exists" error in per-file backup mode
+  - **ENHANCED**: Archive comment format now stores comprehensive metadata (`path|||size|||hash`)
+  - **ENHANCED**: Version suffix generation for archive name collisions (`-v2`, `-v3`)
+  - **SECURITY REVIEW**: Comprehensive security analysis found no exploitable vulnerabilities
+  - SHA256 hashing is cryptographically secure (using Ruby's Digest::SHA256)
+  - Archive comment parsing uses safe string splitting with `|||` delimiter (no injection risks)
+  - File paths from archives only used for comparison, never for file operations
+  - Array-based command execution prevents shell injection (maintained from previous versions)
+  - JSON parsing uses Ruby's safe `JSON.parse()` with error handling
+  - All existing security controls maintained - no security regressions
+  - Backward compatibility with three metadata formats (plain path, path|||hash, path|||size|||hash)
 - **v0.7.0** (2025-10-08): Archive naming and metadata features - security review passed
   - **NEW FEATURE**: List files within archives (--archive option)
   - **NEW FEATURE**: File metadata retrieval from archives
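
The note above that array-based command execution prevents shell injection can be illustrated with `Open3.capture3`. In this sketch the child process is a second Ruby interpreter standing in for `borg`, and `echo_argument` is a hypothetical helper; the point is only that each array element reaches the child as one literal argument, with no shell parsing in between.

```ruby
require "open3"
require "rbconfig"

# Hypothetical demo helper: spawns a child Ruby process that prints its
# first argument. Because the command is passed as separate array elements,
# no shell is involved and metacharacters in arg are not interpreted.
def echo_argument(arg)
  cmd = [RbConfig.ruby, "-e", "print ARGV[0]", arg]
  stdout, _stderr, status = Open3.capture3({}, *cmd)
  raise "child process failed" unless status.success?

  stdout
end

echo_argument("notes.txt; echo injected")
# The child receives the whole string as a single argument and prints it unchanged.
```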

data/lib/ruborg/backup.rb CHANGED

@@ -43,17 +43,25 @@ module Ruborg
       remove_source_files if remove_source
     end
+    # rubocop:disable Metrics/AbcSize, Metrics/MethodLength, Metrics/PerceivedComplexity, Metrics/BlockNesting
     def create_per_file_archives(name_prefix, remove_source)
       # Collect all files from backup paths
       files_to_backup = collect_files_from_paths(@config.backup_paths, @config.exclude_patterns)
       raise BorgError, "No files found to backup" if files_to_backup.empty?
+      # Get list of existing archives for duplicate detection
+      existing_archives = get_existing_archive_names
       # Show repository header in console only
       print_repository_header
       puts "Found #{files_to_backup.size} file(s) to backup"
+      backed_up_count = 0
+      skipped_count = 0
+      # rubocop:disable Metrics/BlockLength
       files_to_backup.each_with_index do |file_path, index|
         # Generate hash-based archive name with filename
         path_hash = generate_path_hash(file_path)
@@ -67,22 +75,79 @@ module Ruborg
         archive_name = name_prefix || build_archive_name(@repo_name, sanitized_filename, path_hash, file_mtime)
         # Show progress in console
-        puts "  [#{index + 1}/#{files_to_backup.size}] Backing up: #{file_path}"
+        print "  [#{index + 1}/#{files_to_backup.size}] Backing up: #{file_path}"
+        # Check if archive already exists AND contains this exact file
+        if existing_archives.key?(archive_name)
+          stored_info = existing_archives[archive_name]
+          if stored_info[:path] == file_path
+            # Same file, same mtime -> check if size changed (rare: manual content edit + touch -t)
+            current_size = File.size(file_path)
+            stored_size = stored_info[:size]
+            if current_size == stored_size
+              # Size same -> verify content hasn't changed (paranoid mode)
+              current_hash = calculate_file_hash(file_path)
+              stored_hash = stored_info[:hash]
+              if current_hash == stored_hash
+                # Content truly unchanged - file is already safely backed up
+                puts " - Archive already exists (file unchanged)"
+                @logger&.info(
+                  "[#{@repo_name}] Skipped #{file_path} - archive #{archive_name} already exists (file unchanged)"
+                )
+                skipped_count += 1
+                # If remove_source is enabled, delete the file (it's already safely backed up)
+                remove_single_file(file_path) if remove_source
+                next
+              else
+                # Size same but content changed (rare: edited + truncated/padded to same size)
+                archive_name = find_next_version_name(archive_name, existing_archives)
+                @logger&.warn(
+                  "[#{@repo_name}] File content changed but size/mtime unchanged for #{file_path}, " \
+                  "using #{archive_name}"
+                )
+              end
+            else
+              # Size changed but mtime same -> content changed, add version suffix
+              archive_name = find_next_version_name(archive_name, existing_archives)
+              @logger&.warn(
+                "[#{@repo_name}] File size changed but mtime unchanged for #{file_path}, using #{archive_name}"
+              )
+            end
+          else
+            # Different file, same archive name -> add version suffix
+            archive_name = find_next_version_name(archive_name, existing_archives)
+            @logger&.warn(
+              "[#{@repo_name}] Archive name collision: #{archive_name} exists for different file, using version suffix"
+            )
+          end
+        end
         # Create archive for single file with original path as comment
         cmd = build_per_file_create_command(archive_name, file_path)
         execute_borg_command(cmd)
+        puts ""
         # Log successful action with details
         @logger&.info("[#{@repo_name}] Archived #{file_path} in archive #{archive_name}")
-      end
+        backed_up_count += 1
-      puts "✓ Per-file backup completed: #{files_to_backup.size} file(s) backed up"
+        # Remove source file immediately after successful backup in per-file mode
+        remove_single_file(file_path) if remove_source
+      end
+      # rubocop:enable Metrics/BlockLength
-      # NOTE: remove_source handled per file after successful backup
-      remove_source_files if remove_source
+      if skipped_count.positive?
+        puts "✓ Per-file backup completed: #{backed_up_count} file(s) backed up, #{skipped_count} skipped (unchanged)"
+      else
+        puts "✓ Per-file backup completed: #{backed_up_count} file(s) backed up"
+      end
     end
+    # rubocop:enable Metrics/AbcSize, Metrics/MethodLength, Metrics/PerceivedComplexity, Metrics/BlockNesting
     def collect_files_from_paths(paths, exclude_patterns)
       require "find"
@@ -178,12 +243,21 @@ module Ruborg
       end
     end
+    def calculate_file_hash(file_path)
+      require "digest"
+      Digest::SHA256.file(file_path).hexdigest
+    end
     def build_per_file_create_command(archive_name, file_path)
       cmd = [@repository.borg_path, "create"]
       cmd += ["--compression", @config.compression]
-      # Store original path in archive comment for retrieval
-      cmd += ["--comment", file_path]
+      # Store file metadata (path + size + hash) in archive comment for duplicate detection
+      # Format: path|||size|||hash (using ||| as delimiter to avoid conflicts with paths)
+      file_size = File.size(file_path)
+      file_hash = calculate_file_hash(file_path)
+      metadata = "#{file_path}|||#{file_size}|||#{file_hash}"
+      cmd += ["--comment", metadata]
       cmd << "#{@repository.path}::#{archive_name}"
       cmd << file_path
@@ -263,6 +337,34 @@ module Ruborg
       result
     end
+    def remove_single_file(file_path)
+      require "fileutils"
+      # Resolve symlinks and validate path
+      begin
+        real_path = File.realpath(file_path)
+      rescue Errno::ENOENT
+        # File doesn't exist (already deleted?), skip
+        @logger&.warn("Source file does not exist, skipping: #{file_path}")
+        return
+      end
+      # Security check: ensure file still exists
+      unless File.exist?(real_path)
+        @logger&.warn("Source file no longer exists, skipping: #{real_path}")
+        return
+      end
+      # Additional safety: don't delete system files
+      if real_path == "/" || real_path.start_with?("/bin", "/sbin", "/usr", "/etc", "/sys", "/proc")
+        @logger&.error("Refusing to delete system path: #{real_path}")
+        raise BorgError, "Refusing to delete system path: #{real_path}"
+      end
+      @logger&.info("Removing file: #{real_path}")
+      FileUtils.rm(real_path)
+    end
     def remove_source_files
       require "fileutils"
@@ -333,9 +435,93 @@ module Ruborg
     end
     def print_repository_header
-      puts "\n" + ("=" * 60)
+      puts "\n#{"=" * 60}"
       puts "  Repository: #{@repo_name}"
       puts "=" * 60
     end
+    # rubocop:disable Metrics/AbcSize, Metrics/MethodLength, Metrics/PerceivedComplexity
+    def get_existing_archive_names
+      require "json"
+      require "open3"
+      # First get list of archives
+      cmd = [@repository.borg_path, "list", @repository.path, "--json"]
+      env = {}
+      passphrase = @repository.instance_variable_get(:@passphrase)
+      env["BORG_PASSPHRASE"] = passphrase if passphrase
+      env["BORG_RELOCATED_REPO_ACCESS_IS_OK"] = "yes"
+      env["BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK"] = "yes"
+      stdout, stderr, status = Open3.capture3(env, *cmd)
+      raise BorgError, "Failed to list archives: #{stderr}" unless status.success?
+      json_data = JSON.parse(stdout)
+      archives = json_data["archives"] || []
+      # Build hash by querying each archive individually for comment
+      # This is necessary because 'borg list' doesn't include comments
+      archives.each_with_object({}) do |archive, hash|
+        archive_name = archive["name"]
+        # Query this specific archive to get the comment
+        info_cmd = [@repository.borg_path, "info", "#{@repository.path}::#{archive_name}", "--json"]
+        info_stdout, _, info_status = Open3.capture3(env, *info_cmd)
+        unless info_status.success?
+          # If we can't get info for this archive, skip it with defaults
+          hash[archive_name] = { path: "", size: 0, hash: "" }
+          next
+        end
+        info_data = JSON.parse(info_stdout)
+        archive_info = info_data["archives"]&.first || {}
+        comment = archive_info["comment"] || ""
+        # Parse comment based on format
+        # The comment field stores metadata as: path|||size|||hash (using ||| as delimiter)
+        # For backward compatibility, handle old formats:
+        #   - Old format 1: plain path (no |||)
+        #   - Old format 2: path|||hash (2 parts)
+        #   - New format: path|||size|||hash (3 parts)
+        if comment.include?("|||")
+          parts = comment.split("|||")
+          file_path = parts[0]
+          if parts.length >= 3
+            # New format: path|||size|||hash
+            file_size = parts[1].to_i
+            file_hash = parts[2] || ""
+          else
+            # Old format: path|||hash (size not available)
+            file_size = 0
+            file_hash = parts[1] || ""
+          end
+        else
+          # Oldest format: comment is just the path string
+          file_path = comment
+          file_size = 0
+          file_hash = ""
+        end
+        hash[archive_name] = {
+          path: file_path,
+          size: file_size,
+          hash: file_hash
+        }
+      end
+    rescue JSON::ParserError => e
+      raise BorgError, "Failed to parse archive info: #{e.message}"
+    end
+    # rubocop:enable Metrics/AbcSize, Metrics/MethodLength, Metrics/PerceivedComplexity
+    def find_next_version_name(base_name, existing_archives)
+      version = 2
+      loop do
+        versioned_name = "#{base_name}-v#{version}"
+        return versioned_name unless existing_archives.key?(versioned_name)
+        version += 1
+      end
+    end
   end
 end
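
The version-suffix search at the end of the hunk above can be tried in isolation. The sketch below restates the same idea as a free function; `next_version_name` is a hypothetical name, and `existing` is assumed to be a Hash keyed by archive name, as `get_existing_archive_names` returns in the diff.

```ruby
# Hypothetical standalone version of the suffix search: returns the first
# "<base>-vN" (N starting at 2) that is not already a key in existing.
def next_version_name(base_name, existing)
  version = 2
  version += 1 while existing.key?("#{base_name}-v#{version}")
  "#{base_name}-v#{version}"
end

existing = { "repo-notes-ab12" => {}, "repo-notes-ab12-v2" => {} }
next_version_name("repo-notes-ab12", existing)
# => "repo-notes-ab12-v3"
```

The search is linear in the number of existing versions for a given base name, which should be small in practice since a new version is only created when content changes without an mtime change.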

data/lib/ruborg/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module Ruborg
-  VERSION = "0.7.0"
+  VERSION = "0.7.3"
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: ruborg
 version: !ruby/object:Gem::Version
-  version: 0.7.0
+  version: 0.7.3
 platform: ruby
 authors:
 - Michail Pantelelis