type_balancer 0.1.0 → 0.1.2 (RubyGems)

This diff shows the contents of publicly released package versions as they appear in their public registries. It is provided for informational purposes only.

Files changed (27)

checksums.yaml +4 -4
data/.DS_Store +0 -0
data/CHANGELOG.md +37 -3
data/Dockerfile +3 -1
data/Gemfile.lock +1 -1
data/README.md +75 -9
data/Rakefile +7 -29
data/benchmark/end_to_end_benchmark.rb +6 -3
data/benchmark_results/ruby3.2.8.txt +8 -8
data/benchmark_results/ruby3.2.8_yjit.txt +13 -13
data/benchmark_results/ruby3.3.7.txt +8 -8
data/benchmark_results/ruby3.3.7_yjit.txt +13 -13
data/benchmark_results/ruby3.4.2.txt +8 -8
data/benchmark_results/ruby3.4.2_yjit.txt +13 -13
data/docs/benchmarks/README.md +57 -51
data/docs/quality.md +67 -0
data/examples/quality.rb +113 -1
data/lib/type_balancer/balancer.rb +71 -94
data/lib/type_balancer/batch_processing.rb +35 -0
data/lib/type_balancer/distributor.rb +26 -53
data/lib/type_balancer/position_calculator.rb +61 -0
data/lib/type_balancer/ratio_calculator.rb +91 -0
data/lib/type_balancer/type_extractor.rb +29 -0
data/lib/type_balancer/version.rb +1 -1
data/lib/type_balancer.rb +36 -17
metadata +9 -4
data/sig/type_balancer.rbs +0 -85

data/docs/benchmarks/README.md CHANGED

@@ -21,27 +21,27 @@ Each benchmark test evaluates performance across different collection sizes, fro
 1. Tiny Dataset (Content Widget):
    - Total Items: 10
    - Distribution: Video (40%), Image (30%), Article (30%)
-   - Processing Time: ~12 microseconds
+   - Processing Time: ~6-7 microseconds
 2. Small Dataset (Content Feed):
    - Total Items: 100
    - Distribution: Video (34%), Image (33%), Article (33%)
-   - Processing Time: ~464 microseconds
+   - Processing Time: ~30-31 microseconds
 3. Medium Dataset (Category Page):
    - Total Items: 1,000
    - Distribution: Video (33.4%), Image (33.3%), Article (33.3%)
-   - Processing Time: ~19 milliseconds
+   - Processing Time: ~274-280 microseconds
 4. Large Dataset (Site-wide Content):
    - Total Items: 10,000
    - Distribution: Video (33.34%), Image (33.33%), Article (33.33%)
-   - Processing Time: ~191 milliseconds
+   - Processing Time: ~2.4-2.8 milliseconds
 ### Real-world Application
 TypeBalancer is designed for practical use in content management and display systems:
-- Process 10,000 items in under 200ms
+- Process 10,000 items in under 3ms
 - Maintain perfect distribution ratios
 - Suitable for real-time web applications
 - Efficient enough for on-the-fly content organization
@@ -60,62 +60,69 @@ TypeBalancer is designed for practical use in content management and display sys
 | Metric | Tiny Dataset | Small Dataset | Medium Dataset | Large Dataset |
 |--------|--------------|---------------|----------------|---------------|
-| Speed (no YJIT) | 73.3K ops/sec | 2.0K ops/sec | 46.9 ops/sec | 4.8 ops/sec |
-| Speed (YJIT) | 102.0K ops/sec | 2.1K ops/sec | 46.2 ops/sec | 4.8 ops/sec |
-| Time/Op (no YJIT) | 13.63 μs | 494.84 μs | 21.34 ms | 207.62 ms |
-| Time/Op (YJIT) | 9.80 μs | 478.05 μs | 21.64 ms | 208.85 ms |
-| YJIT Impact | +39.1% | +3.5% | -1.4% | -0.6% |
-| Distribution Quality | Perfect | Excellent | Excellent | Excellent |
+| Speed (no YJIT) | 109.0K ops/sec | 21.9K ops/sec | 2.0K ops/sec | 264 ops/sec |
+| Speed (YJIT) | 152.7K ops/sec | 32.4K ops/sec | 3.6K ops/sec | 424 ops/sec |
+| Time/Op (no YJIT) | 9.18 μs | 45.71 μs | 498.96 μs | 3.79 ms |
+| Time/Op (YJIT) | 6.55 μs | 30.88 μs | 274.30 μs | 2.36 ms |
+| YJIT Impact | +40.1% | +48.0% | +80.0% | +60.6% |
+| Distribution Quality | Perfect | Excellent | Excellent | Perfect |
 ### Ruby 3.3.7 Performance
 | Metric | Tiny Dataset | Small Dataset | Medium Dataset | Large Dataset |
 |--------|--------------|---------------|----------------|---------------|
-| Speed (no YJIT) | 74.8K ops/sec | 2.1K ops/sec | 49.2 ops/sec | 5.1 ops/sec |
-| Speed (YJIT) | 108.1K ops/sec | 2.3K ops/sec | 48.8 ops/sec | 5.2 ops/sec |
-| Time/Op (no YJIT) | 13.37 μs | 477.95 μs | 20.34 ms | 196.05 ms |
-| Time/Op (YJIT) | 9.25 μs | 437.36 μs | 20.49 ms | 193.08 ms |
-| YJIT Impact | +44.5% | +9.3% | -0.7% | +1.5% |
-| Distribution Quality | Perfect | Excellent | Excellent | Excellent |
+| Speed (no YJIT) | 102.4K ops/sec | 20.8K ops/sec | 1.9K ops/sec | 245 ops/sec |
+| Speed (YJIT) | 148.2K ops/sec | 31.2K ops/sec | 3.5K ops/sec | 394 ops/sec |
+| Time/Op (no YJIT) | 9.77 μs | 48.08 μs | 526.32 μs | 4.08 ms |
+| Time/Op (YJIT) | 6.75 μs | 32.05 μs | 277.78 μs | 2.54 ms |
+| YJIT Impact | +44.7% | +50.0% | +84.2% | +60.8% |
+| Distribution Quality | Perfect | Excellent | Excellent | Perfect |
 ### Ruby 3.2.8 Performance
 | Metric | Tiny Dataset | Small Dataset | Medium Dataset | Large Dataset |
 |--------|--------------|---------------|----------------|---------------|
-| Speed (no YJIT) | 72.2K ops/sec | 2.2K ops/sec | 46.3 ops/sec | 4.7 ops/sec |
-| Speed (YJIT) | 108.8K ops/sec | 2.2K ops/sec | 47.3 ops/sec | 5.2 ops/sec |
-| Time/Op (no YJIT) | 13.86 μs | 451.35 μs | 21.59 ms | 215.04 ms |
-| Time/Op (YJIT) | 9.19 μs | 449.99 μs | 21.15 ms | 193.67 ms |
-| YJIT Impact | +50.8% | +0.3% | +2.1% | +11.0% |
-| Distribution Quality | Perfect | Excellent | Excellent | Excellent |
+| Speed (no YJIT) | 98.7K ops/sec | 19.2K ops/sec | 1.8K ops/sec | 223 ops/sec |
+| Speed (YJIT) | 142.8K ops/sec | 30.1K ops/sec | 3.4K ops/sec | 356 ops/sec |
+| Time/Op (no YJIT) | 10.13 μs | 52.08 μs | 555.56 μs | 4.48 ms |
+| Time/Op (YJIT) | 7.00 μs | 33.22 μs | 280.70 μs | 2.81 ms |
+| YJIT Impact | +44.7% | +56.8% | +88.9% | +59.6% |
+| Distribution Quality | Perfect | Excellent | Excellent | Perfect |
 ## Analysis
 ### Performance Characteristics
 1. Speed and Efficiency:
-   - Processes 10K items in ~200ms across all Ruby versions
-   - Microsecond-level processing for small collections (9-14μs)
-   - Millisecond-level processing for large collections (193-209ms)
-   - YJIT provides significant speedup for tiny datasets (39-51% faster)
-   - Suitable for real-time web applications
+   - Processes 10K items in ~2.4-4.5ms across all Ruby versions
+   - Microsecond-level processing for small collections (6-10μs)
+   - Sub-millisecond processing for medium collections (~275-555μs)
+   - Millisecond-level processing for large collections (2.4-4.5ms)
+   - YJIT provides substantial speedup across all dataset sizes (40-89% faster)
+   - Suitable for high-performance real-time applications
 2. YJIT Impact:
-   - Most effective on tiny datasets (10 items)
-   - Benefits diminish as dataset size increases
-   - Ruby 3.2.8 shows most consistent YJIT improvements
-   - Some versions show slight regressions on larger datasets
-3. Distribution Quality:
-   - Perfect distribution in small datasets
-   - Highly accurate distribution in larger datasets
+   - Most effective on medium datasets (up to 89% improvement)
+   - Consistent improvements across all dataset sizes
+   - Ruby 3.4.2 shows best absolute performance
+   - All versions benefit significantly from YJIT
+3. Version Comparison:
+   - Ruby 3.4.2 with YJIT shows best overall performance
+   - Ruby 3.3.7 maintains strong second position
+   - Ruby 3.2.8 shows solid baseline performance
+   - Performance variance between versions is consistent
+4. Distribution Quality:
+   - Perfect distribution in small and large datasets
+   - Highly accurate distribution in all dataset sizes
    - Consistent quality across all Ruby versions and YJIT settings
 ### Scaling Characteristics
 1. Dataset Size Impact:
-   - Predictable performance scaling with size
-   - Sub-second processing even for large datasets
+   - Near-linear performance scaling with size
+   - Sub-millisecond processing for datasets up to 1000 items
    - Reliable performance characteristics
 2. Memory Usage:
@@ -124,16 +131,15 @@ TypeBalancer is designed for practical use in content management and display sys
    - Stable across different workloads
 3. Distribution Quality:
-   - Maintains high accuracy at all scales
-   - Improves with larger datasets
+   - Maintains perfect accuracy at all scales
    - Consistent across implementations
 ## Use Cases
 1. Content Management Systems:
-   - Homepage feeds (100s of items): < 1ms processing
-   - Category pages (1000s of items): ~20ms processing
-   - Site-wide content (10,000s of items): ~200ms processing
+   - Homepage feeds (100s of items): ~31μs processing
+   - Category pages (1000s of items): ~275μs processing
+   - Site-wide content (10,000s of items): ~2.4ms processing
 2. Real-time Applications:
    - Widget content balancing: microsecond response
@@ -141,26 +147,26 @@ TypeBalancer is designed for practical use in content management and display sys
    - Content reorganization: real-time capable
 3. Batch Processing:
-   - Large collection processing: efficient and reliable
+   - Large collection processing: highly efficient
    - Consistent performance characteristics
    - Predictable resource usage
 ## Conclusions
 1. Version Selection:
-   - Ruby 3.2.8 shows optimal performance
-   - All versions maintain high distribution quality
+   - Ruby 3.4.2 with YJIT shows optimal performance across all sizes
+   - All versions maintain perfect distribution quality
    - Version choice can be based on other requirements
 2. Production Readiness:
-   - Suitable for production workloads
-   - Handles large datasets efficiently
-   - Real-time processing capable
+   - Exceptional performance for production workloads
+   - Handles large datasets very efficiently
+   - Suitable for high-frequency real-time processing
 3. Future Outlook:
-   - Continued optimization for larger datasets
+   - Current performance exceeds most real-world requirements
    - Focus on maintaining distribution quality
-   - Performance improvements in newer Ruby versions
+   - Room for optimization in specific use cases
 ## Running the Benchmarks
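
Note on the "YJIT Impact" rows: the README hunk does not state how they are derived. A minimal sketch, assuming the impact is the relative throughput gain with YJIT enabled, and using the rounded ops/sec figures from the first updated table in this hunk (the helper name `yjit_impact` is hypothetical):

```ruby
# Assumption: impact = (yjit_speed / baseline_speed - 1) * 100, rounded to 0.1.
def yjit_impact(no_yjit:, yjit:)
  ((yjit / no_yjit - 1) * 100).round(1)
end

# Rounded ops/sec values copied from the first updated table above.
speeds = {
  tiny:   { no_yjit: 109_000.0, yjit: 152_700.0 },
  small:  { no_yjit: 21_900.0,  yjit: 32_400.0 },
  medium: { no_yjit: 2_000.0,   yjit: 3_600.0 },
  large:  { no_yjit: 264.0,     yjit: 424.0 }
}

impact = speeds.transform_values { |s| yjit_impact(**s) }
impact.each { |dataset, pct| puts format('%-6s %+.1f%%', dataset, pct) }
```

With these rounded inputs the tiny, medium and large results agree with the table (+40.1%, +80.0%, +60.6%). The small dataset comes out at +47.9% rather than +48.0%, presumably because the published figure was computed from unrounded measurements.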

data/docs/quality.md ADDED

@@ -0,0 +1,67 @@
+# Quality Script Documentation
+The TypeBalancer gem includes a comprehensive quality check script located at `/examples/quality.rb`. This script serves multiple purposes:
+1. **Documentation through Examples**: Demonstrates various use cases and features of the gem
+2. **Quality Assurance**: Verifies that core functionality works as expected
+3. **Integration Testing**: Tests how different components work together
+## Running the Script
+To run the quality script:
+```bash
+bundle exec ruby examples/quality.rb
+```
+## What it Tests
+The script tests several key aspects of the TypeBalancer gem:
+### 1. Basic Distribution
+- Demonstrates how items are distributed across available slots
+- Shows spacing calculations between positions
+- Verifies edge cases (single item, no items, all items)
+### 2. Content Feed Example
+- Shows a real-world example of content type distribution
+- Verifies position allocation for different content types (video, image, article)
+- Checks distribution statistics and ratios
+### 3. Balancer API
+- Tests the main TypeBalancer.balance method
+- Verifies batch creation and size limits
+- Demonstrates custom type ordering
+### 4. Type Extraction
+- Tests type extraction from both hash and object items
+- Verifies support for different type field access methods
+### 5. Error Handling
+- Validates handling of empty collections
+- Tests response to invalid type fields
+- Verifies batch size validation
+## Output Format
+The script provides detailed output showing:
+- Results of each test case
+- Distribution statistics
+- Any issues found during testing
+- A summary of all examples run and passed
+## Using as a Development Tool
+The quality script is particularly useful when:
+1. Developing new features
+2. Refactoring existing code
+3. Verifying changes haven't broken core functionality
+4. Understanding how different features work together
+## Extending the Script
+When adding new features to TypeBalancer, consider:
+1. Adding relevant examples to the quality script
+2. Including edge cases
+3. Documenting expected behavior
+4. Adding appropriate quality checks

data/examples/quality.rb CHANGED

@@ -13,6 +13,8 @@ class QualityChecker
     check_basic_distribution
     check_available_items
     check_edge_cases
+    check_position_precision
+    check_available_positions_edge_cases
     check_real_world_feed
     print_summary
@@ -97,11 +99,94 @@ class QualityChecker
     end
   end
+  def check_position_precision
+    puts "\nPosition Precision Cases:"
+    # Two positions in three slots
+    @examples_run += 1
+    positions = TypeBalancer.calculate_positions(total_count: 3, ratio: 0.67)
+    puts "Two positions in three slots: #{positions.inspect}"
+    if positions == [0, 1]
+      @examples_passed += 1
+    else
+      record_issue("Two in three case returned #{positions.inspect} instead of [0, 1]")
+    end
+    # Single position in three slots
+    @examples_run += 1
+    positions = TypeBalancer.calculate_positions(total_count: 3, ratio: 0.34)
+    puts "Single position in three slots: #{positions.inspect}"
+    if positions == [0]
+      @examples_passed += 1
+    else
+      record_issue("One in three case returned #{positions.inspect} instead of [0]")
+    end
+  end
+  def check_available_positions_edge_cases
+    puts "\nAvailable Positions Edge Cases:"
+    # Single target with multiple available positions
+    @examples_run += 1
+    positions = TypeBalancer.calculate_positions(
+      total_count: 5,
+      ratio: 0.2,
+      available_items: [1, 2, 3]
+    )
+    puts "Single target with multiple available: #{positions.inspect}"
+    if positions == [1]
+      @examples_passed += 1
+    else
+      record_issue("Single target with multiple available returned #{positions.inspect} instead of [1]")
+    end
+    # Two targets with multiple available positions
+    @examples_run += 1
+    positions = TypeBalancer.calculate_positions(
+      total_count: 10,
+      ratio: 0.2,
+      available_items: [1, 3, 5]
+    )
+    puts "Two targets with multiple available: #{positions.inspect}"
+    if positions == [1, 5]
+      @examples_passed += 1
+    else
+      record_issue("Two targets with multiple available returned #{positions.inspect} instead of [1, 5]")
+    end
+    # Exact match of available positions
+    @examples_run += 1
+    positions = TypeBalancer.calculate_positions(
+      total_count: 10,
+      ratio: 0.3,
+      available_items: [2, 4, 6]
+    )
+    puts "Exact match of available positions: #{positions.inspect}"
+    if positions == [2, 4, 6]
+      @examples_passed += 1
+    else
+      record_issue("Exact match case returned #{positions.inspect} instead of [2, 4, 6]")
+    end
+  end
   def check_real_world_feed
     @examples_run += 1
     puts "\nReal World Example - Content Feed:"
     feed_size = 20
+    # Create test items
+    items = [
+      { type: 'video', id: 1 },
+      { type: 'video', id: 2 },
+      { type: 'video', id: 3 },
+      { type: 'image', id: 4 },
+      { type: 'image', id: 5 },
+      { type: 'image', id: 6 },
+      { type: 'article', id: 7 },
+      { type: 'article', id: 8 },
+      { type: 'article', id: 9 }
+    ]
     # Track allocated positions
     allocated_positions = []
     content_positions = {}
@@ -155,13 +240,40 @@ class QualityChecker
         record_issue("#{type} count #{count} doesn't match expected #{expected_counts[type]}")
       end
     end
+    # Test with custom type order
+    ordered_result = TypeBalancer.balance(
+      items,
+      type_field: :type,
+      type_order: %w[article image video]
+    )
+    # Verify type order is respected
+    if ordered_result.first[:type] == 'article'
+      @examples_passed += 1
+    else
+      record_issue("Custom type order not respected")
+    end
+    # Test position calculation
+    positions = TypeBalancer::Distributor.calculate_target_positions(
+      total_count: 10,
+      ratio: 0.3
+    )
+    if positions.is_a?(Array) && positions.all? { |p| p.is_a?(Integer) }
+      @examples_passed += 1
+    else
+      record_issue("Position calculation failed")
+    end
+    puts "\nBalanced items with custom order:"
   end
   def print_summary
     puts "\n#{'-' * 50}"
     puts 'Quality Check Summary:'
     puts "Examples Run: #{@examples_run}"
-    puts "Examples Passed: #{@examples_passed}"
+    puts "Expectations Passed: #{@examples_passed}"
     if @issues.empty?
       puts "\nAll quality checks passed! ✓"

data/lib/type_balancer/balancer.rb CHANGED

@@ -1,126 +1,103 @@
 # frozen_string_literal: true
+require_relative 'ratio_calculator'
+require_relative 'batch_processing'
+require_relative 'position_calculator'
 module TypeBalancer
-  # Main class responsible for balancing items in a collection based on their types.
-  # It uses a distribution calculator to determine optimal positions for each type
-  # and a gap filler strategy to place items in the final sequence.
+  # Handles balancing of items across batches based on type ratios
   class Balancer
-    BATCH_SIZE = 500 # Process items in batches of 500 for better performance
-    def initialize(collection, type_field: :type, types: nil, distribution_calculator: nil)
-      @collection = collection
-      @type_field = type_field
-      @types = types || extract_types
-      @distribution_calculator = distribution_calculator || Distributor
+    # Initialize a new Balancer instance
+    #
+    # @param types [Array<String>, nil] Optional types
+    # @param type_order [Array<String>, nil] Optional order of types
+    def initialize(types = nil, type_order: nil)
+      @types = Array(types) if types
+      @type_order = type_order
+      validate_types! if @types
     end
-    def call
-      return [] if @collection.empty?
+    # Main entry point for balancing items
+    #
+    # @param collection [Array] Items to balance
+    # @return [Array] Balanced items
+    def call(collection)
+      validate_collection!(collection)
+      items_by_type = group_items_by_type(collection)
+      validate_types_in_collection!(items_by_type)
+      target_counts = calculate_target_counts(items_by_type)
+      available_positions = (0...collection.size).to_a
+      result = Array.new(collection.size)
+      sorted_types = sort_types(items_by_type.keys)
+      sorted_types.each do |type|
+        items = items_by_type[type]
+        target_count = target_counts[type]
+        ratio = target_count.to_f / collection.size
+        positions = PositionCalculator.calculate_positions(
+          total_count: collection.size,
+          ratio: ratio,
+          available_items: available_positions
+        )
+        positions.each_with_index do |pos, idx|
+          result[pos] = items[idx]
+        end
-      if @collection.size <= BATCH_SIZE
-        process_single_batch(@collection)
-      else
-        process_multiple_batches
+        # Remove used positions from available positions
+        available_positions -= positions
       end
+      result.compact
     end
     private
-    def process_single_batch(items)
-      # Group items by type
-      items_by_type = items.group_by { |item| get_type(item) }
-      # Calculate ratios based on type order and counts
-      ratios = calculate_ratios(items_by_type)
-      # Calculate positions for each type
-      positions_by_type = calculate_positions_by_type(items_by_type, ratios, items.size)
-      # Map items to their balanced positions
-      balanced_items = place_items_in_positions(items_by_type, positions_by_type, items.size)
-      # Fill any gaps with remaining items
-      fill_gaps(balanced_items, items)
+    def validate_types!
+      raise ArgumentError, 'Types cannot be empty' if @types.empty?
     end
-    def process_multiple_batches
-      result = []
-      @collection.each_slice(BATCH_SIZE) do |batch|
-        result.concat(process_single_batch(batch))
-      end
-      result
+    def validate_collection!(collection)
+      raise ArgumentError, 'Collection cannot be empty' if collection.empty?
     end
-    def calculate_positions_by_type(items_by_type, ratios, total_count)
-      positions_by_type = {}
-      @types.each_with_index do |type, index|
-        items = items_by_type[type] || []
-        ratio = ratios[index]
-        positions = @distribution_calculator.calculate_target_positions(total_count, items.size, ratio)
-        positions_by_type[type] = positions
-      end
+    def validate_types_in_collection!(items_by_type)
+      return unless @types
-      positions_by_type
+      invalid_types = items_by_type.keys - @types
+      raise TypeBalancer::Error, "Invalid type(s): #{invalid_types.join(', ')}" if invalid_types.any?
     end
-    def place_items_in_positions(items_by_type, positions_by_type, total_count)
-      balanced_items = Array.new(total_count)
-      @types.each do |type|
-        items = items_by_type[type] || []
-        positions = positions_by_type[type] || []
-        items.each_with_index do |item, index|
-          pos = positions[index]
-          next unless pos && pos < total_count && balanced_items[pos].nil?
-          balanced_items[pos] = item
-        end
+    def group_items_by_type(collection)
+      collection.group_by do |item|
+        extract_type(item)
       end
-      balanced_items
     end
-    def fill_gaps(balanced_items, original_items)
-      # Fill any gaps with remaining items
-      remaining_items = original_items.reject { |item| balanced_items.include?(item) }
-      empty_positions = balanced_items.each_index.select { |i| balanced_items[i].nil? }
-      empty_positions.each_with_index do |pos, idx|
-        break unless idx < remaining_items.size
+    def extract_type(item)
+      return item[:type] || item['type'] || raise(TypeBalancer::Error, 'Cannot access type field') if item.is_a?(Hash)
-        balanced_items[pos] = remaining_items[idx]
+      begin
+        item.type
+      rescue NoMethodError
+        raise TypeBalancer::Error, 'Cannot access type field'
       end
-      balanced_items.compact
     end
-    def calculate_ratios(_items_by_type)
-      case @types.size
-      when 1
-        [1.0]
-      when 2
-        [0.6, 0.4]
-      else
-        # First type gets 0.4, rest split remaining 0.6 evenly
-        remaining = (0.6 / (@types.size - 1).to_f).round(6)
-        [0.4] + Array.new(@types.size - 1, remaining)
-      end
+    def calculate_target_counts(items_by_type)
+      items_by_type.values.sum(&:size)
+      items_by_type.transform_values(&:size)
     end
-    def get_type(item)
-      if item.respond_to?(@type_field)
-        item.send(@type_field)
-      elsif item.respond_to?(:[])
-        item[@type_field] || item[@type_field.to_s]
-      else
-        raise Error, "Cannot access type field '#{@type_field}' on item #{item}"
-      end
-    end
+    def sort_types(types)
+      return types.sort unless @type_order
-    def extract_types
-      TypeBalancer.extract_types(@collection, @type_field)
+      types.sort_by do |type|
+        idx = @type_order.index(type)
+        idx || Float::INFINITY
+      end
     end
   end
 end
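
Note on grouping and ordering in the rewritten `Balancer`: the sketch below lifts the logic of `extract_type` and `sort_types` out of the class so it can be run without the gem. It is an illustration under assumptions, not the packaged code: `ArgumentError` stands in for `TypeBalancer::Error`, and the `post_class` struct is a hypothetical object-style item.

```ruby
# Mirrors Balancer#extract_type from the diff: hashes are read by symbol or
# string key; anything else must respond to #type.
# Assumption: ArgumentError replaces TypeBalancer::Error in this sketch.
def extract_type(item)
  if item.is_a?(Hash)
    return item[:type] || item['type'] || raise(ArgumentError, 'Cannot access type field')
  end

  item.type
rescue NoMethodError
  raise ArgumentError, 'Cannot access type field'
end

# Mirrors Balancer#sort_types: alphabetical by default, otherwise by index in
# type_order, with unlisted types pushed to the end via Float::INFINITY.
def sort_types(types, type_order = nil)
  return types.sort unless type_order

  types.sort_by { |type| type_order.index(type) || Float::INFINITY }
end

post_class = Struct.new(:type, :id) # hypothetical object-style item

items = [
  { type: 'video', id: 1 },
  { 'type' => 'image', id: 2 },
  post_class.new('article', 3),
  { type: 'video', id: 4 }
]

items_by_type = items.group_by { |item| extract_type(item) }
default_order = sort_types(items_by_type.keys)
custom_order = sort_types(items_by_type.keys, %w[article video])

puts default_order.inspect # ["article", "image", "video"]
puts custom_order.inspect  # ["article", "video", "image"]
```

In `Balancer#call` this ordering decides which type claims positions first, since each type's positions are removed from `available_positions` before the next type is placed.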

data/lib/type_balancer/batch_processing.rb ADDED

@@ -0,0 +1,35 @@
+# frozen_string_literal: true
+module TypeBalancer
+  class BatchProcessing
+    def initialize(batch_size)
+      @batch_size = batch_size
+    end
+    def create_batches(items_by_type, positions_by_type)
+      total_items = items_by_type.values.sum(&:size)
+      batches = []
+      current_batch = []
+      (0...total_items).each do |position|
+        type = find_type_for_position(position, positions_by_type)
+        current_batch << items_by_type[type].shift if type && !items_by_type[type].empty?
+        if current_batch.size >= @batch_size || position == total_items - 1
+          batches << current_batch unless current_batch.empty?
+          current_batch = []
+        end
+      end
+      batches
+    end
+    private
+    def find_type_for_position(position, positions_by_type)
+      positions_by_type.find do |_, positions|
+        positions.include?(position)
+      end&.first
+    end
+  end
+end
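
Note on `BatchProcessing#create_batches`: the sketch below restates the method as a standalone function so its behavior can be tried without the gem. The logic follows the added file. The flattened signature with `batch_size` as a third argument and the sample data are assumptions for illustration.

```ruby
# Standalone restatement of BatchProcessing#create_batches from the diff.
# Assumption: batch_size is passed as an argument instead of via #initialize.
def create_batches(items_by_type, positions_by_type, batch_size)
  total_items = items_by_type.values.sum(&:size)
  batches = []
  current_batch = []

  (0...total_items).each do |position|
    # Find the type that owns this position, if any.
    type = positions_by_type.find { |_, positions| positions.include?(position) }&.first
    current_batch << items_by_type[type].shift if type && !items_by_type[type].empty?

    # Close the batch when it is full or when the last position is reached.
    if current_batch.size >= batch_size || position == total_items - 1
      batches << current_batch unless current_batch.empty?
      current_batch = []
    end
  end

  batches
end

# Hypothetical sample data: two types interleaved over four positions.
items_by_type = { 'video' => %w[v1 v2], 'image' => %w[i1 i2] }
positions_by_type = { 'video' => [0, 2], 'image' => [1, 3] }

batches = create_batches(items_by_type, positions_by_type, 3)
puts batches.inspect                      # [["v1", "i1", "v2"], ["i2"]]
puts items_by_type.values.all?(&:empty?)  # true
```

Because items are taken with `shift`, the arrays in `items_by_type` are emptied as a side effect. A caller that still needs the grouped items afterwards would have to pass in copies.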