RubyGems - split-test-rb - Versions diffs - 1.0.0 - Mend

split-test-rb 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: e7ce85d70006c8b2a102995f7cd447f650fd7b69e5c944c510b3a3964c48062e
+  data.tar.gz: 4da7116f406f33cc6a7e8289d3c492ed6c8606f914fdd2f0cfdcaca1b99535b1
+SHA512:
+  metadata.gz: fe78061d59dabf16414d848d31a321e4584e751c92412e3c86e3544018e3e6660671b2699ddbdfe2e838352343df03df7e18fa0cc499387501c009063a3aefaf
+  data.tar.gz: 1507fe84941c2a77ce2a4f1af80ef21b23a1a41ca3c8f13a4b1b09a5163bfe48fffb5e62e2158d8545bee2bf156fb8e3cf0698c27b492d4dbef7ec37be154328
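
The digests above can be checked once the members of the `.gem` archive (`metadata.gz`, `data.tar.gz`) have been unpacked. A minimal sketch, assuming the members sit in a single directory; `verify_checksums` is a hypothetical helper, not part of RubyGems or of this gem:

```ruby
require 'digest'
require 'yaml'

# Hypothetical helper: compares the SHA256 digests listed in a checksums.yaml
# document against the files of the same name found in `dir`.
# Returns a hash of {file_name => true/false}; a missing file counts as false.
def verify_checksums(checksums_yaml, dir)
  expected = YAML.safe_load(checksums_yaml).fetch('SHA256')
  expected.to_h do |name, digest|
    path = File.join(dir, name)
    actual = File.exist?(path) ? Digest::SHA256.file(path).hexdigest : nil
    [name, actual == digest]
  end
end
```

Calling `verify_checksums(File.read('checksums.yaml'), '.')` in the unpack directory would report each member separately, so a mismatch in one file does not hide the status of the other.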

data/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 naofumi-fujii
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,165 @@
+# split-test-rb
+[![codecov](https://codecov.io/gh/naofumi-fujii/split-test-rb/branch/main/graph/badge.svg)](https://codecov.io/gh/naofumi-fujii/split-test-rb)
+A simple Ruby CLI tool to balance RSpec tests across parallel CI nodes using RSpec JSON reports.
+## Overview
+split-test-rb reads RSpec JSON test reports containing execution times and distributes test files across multiple nodes for parallel execution. It uses a greedy algorithm that approximates a balanced distribution based on historical test execution times.
+## Installation
+Since this gem is not yet published to RubyGems, you need to install it from GitHub.
+Add to your Gemfile:
+```ruby
+gem 'split-test-rb', github: 'naofumi-fujii/split-test-rb'
+```
+Then run:
+```bash
+bundle install
+```
+## GitHub Actions Example
+First, add split-test-rb to your Gemfile:
+```ruby
+# Gemfile
+gem 'split-test-rb', github: 'naofumi-fujii/split-test-rb'
+```
+For a working example, see this project's own CI configuration:
+- [.github/workflows/ci.yml](https://github.com/naofumi-fujii/split-test-rb/blob/main/.github/workflows/ci.yml)
+## Usage
+### Command Line Options
+```
+split-test-rb [options]
+Options:
+  --node-index INDEX          Current node index (0-based)
+  --node-total TOTAL          Total number of nodes
+  --json-path PATH            Path to directory containing RSpec JSON reports (required)
+  --test-dir DIR              Test directory (default: spec)
+  --test-pattern PATTERN      Test file pattern (default: **/*_spec.rb)
+  --split-by-example-threshold SECONDS
+                              Split files with execution time >= threshold into individual examples
+  --debug                     Show debug information
+  -h, --help                  Show help message
+  -v, --version               Show version
+```
+### Custom Test Directory and Pattern
+By default, split-test-rb looks for test files in the `spec/` directory with the pattern `**/*_spec.rb`. You can customize this for projects with different test directory structures:
+**Using Minitest with `test/` directory:**
+```bash
+split-test-rb --json-path tmp/test-results \
+  --node-index $CI_NODE_INDEX \
+  --node-total $CI_NODE_TOTAL \
+  --test-dir test \
+  --test-pattern '**/*_test.rb'
+```
+**Custom test directory structure:**
+```bash
+split-test-rb --json-path tmp/test-results \
+  --node-index 0 \
+  --node-total 4 \
+  --test-dir tests \
+  --test-pattern 'unit/**/*.rb'
+```
+The test directory and pattern options are useful for:
+- Projects using Minitest (`test/` directory)
+- Custom test directory structures
+- Different naming conventions for test files
+- Monorepos with multiple test suites
+### Example-Level Splitting for Heavy Files
+When you have test files that take significantly longer than others, you can use `--split-by-example-threshold` to automatically split them into individual RSpec examples. This enables finer-grained load balancing across CI nodes.
+```bash
+split-test-rb --json-path tmp/test-results \
+  --node-index $CI_NODE_INDEX \
+  --node-total $CI_NODE_TOTAL \
+  --split-by-example-threshold 10.0
+```
+With this option:
+- Files with execution time **below** the threshold are distributed as whole files (e.g., `spec/fast_spec.rb`)
+- Files with execution time **at or above** the threshold are split into individual examples (e.g., `spec/slow_spec.rb[1:1]`, `spec/slow_spec.rb[1:2]`)
+This is useful when:
+- A single test file contains many slow examples that dominate a CI node's runtime
+- You want to maximize parallelization without manually splitting large test files
+- Some test files are bottlenecks that prevent even distribution
+**Note:** The JSON report must contain the `id` field for each example (RSpec's default JSON formatter includes this). The tool uses these IDs to generate the example-specific paths that RSpec can run.
+## How It Works
+1. **Parse RSpec JSON**: Extracts test file paths and execution times from the JSON report
+2. **Greedy Balancing**: Sorts files by execution time (descending) and assigns each file to the node with the lowest cumulative time
+3. **Output**: Prints the list of test files for the specified node
+## Fallback Behavior
+split-test-rb provides intelligent fallback handling to ensure tests can run even without historical timing data:
+### When the JSON directory doesn't exist
+If the directory given to `--json-path` is not found, the tool will:
+- Display a warning: `Warning: JSON directory not found: <path>, using all test files with equal execution time`
+- Find all test files matching the specified directory and pattern (default: `spec/**/*_spec.rb`)
+- Assign equal execution time (1.0 seconds) to each file
+- Distribute them evenly across nodes
+This is useful for:
+- First-time runs when no test history exists yet
+- Local development environments
+- New CI pipelines
+### When test files are missing from JSON
+If new test files exist that aren't in the JSON report, the tool will:
+- Display a warning: `Warning: Found N test files not in JSON, adding with default execution time`
+- Add the missing files with default execution time (1.0 seconds)
+- Include them in the distribution
+This ensures newly added test files are always included in the test run.
+## RSpec JSON Format
+The tool expects [RSpec JSON output format](https://rspec.info/features/3-13/rspec-core/formatters/json-formatter/) (generated with `--format json`):
+```json
+{
+  "examples": [
+    {
+      "file_path": "./spec/models/user_spec.rb",
+      "run_time": 1.234
+    },
+    {
+      "file_path": "./spec/models/post_spec.rb",
+      "run_time": 0.567
+    }
+  ]
+}
+```
+To generate JSON reports with RSpec, use the built-in JSON formatter:
+```bash
+bundle exec rspec --format json --out tmp/rspec-results/results.json
+```
+## License
+MIT
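
The greedy step described under "How It Works" can be traced by hand on a small input. A minimal sketch that mirrors the `Balancer.balance` logic in `lib/split_test_rb.rb`; the file names and timings are made up for illustration, and `greedy_split` is a hypothetical stand-alone name:

```ruby
# Illustrative timings in seconds (made-up data, no ties between files).
timings = {
  'spec/a_spec.rb' => 8.0,
  'spec/b_spec.rb' => 5.0,
  'spec/c_spec.rb' => 4.0,
  'spec/d_spec.rb' => 3.0,
  'spec/e_spec.rb' => 2.0
}

# Assigns files, slowest first, to whichever node currently has the least work.
def greedy_split(timings, node_total)
  nodes = Array.new(node_total) { { files: [], total_time: 0.0 } }
  timings.sort_by { |_file, time| -time }.each do |file, time|
    lightest = nodes.min_by { |node| node[:total_time] }
    lightest[:files] << file
    lightest[:total_time] += time
  end
  nodes
end

nodes = greedy_split(timings, 2)
nodes.each_with_index do |node, index|
  puts "node #{index}: #{node[:files].join(' ')} (#{node[:total_time]}s)"
end
# node 0: spec/a_spec.rb spec/d_spec.rb (11.0s)
# node 1: spec/b_spec.rb spec/c_spec.rb spec/e_spec.rb (11.0s)
```

Here both nodes end at 11.0s. Greedy assignment is a heuristic, so other inputs can leave a gap between the nodes.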

data/bin/split-test-rb ADDED

@@ -0,0 +1,5 @@
+#!/usr/bin/env ruby
+require_relative '../lib/split_test_rb'
+SplitTestRb::CLI.run(ARGV)

data/lib/split_test_rb/version.rb ADDED

@@ -0,0 +1,3 @@
+module SplitTestRb
+  VERSION = '1.0.0'.freeze
+end

data/lib/split_test_rb.rb ADDED

@@ -0,0 +1,399 @@
+require 'json'
+require 'optparse'
+require_relative 'split_test_rb/version'
+module SplitTestRb
+  # Parses RSpec JSON result files and extracts test timing data
+  class JsonParser
+    # Parses RSpec JSON file and returns hash of {file_path => execution_time}
+    def self.parse(json_path)
+      content = File.read(json_path)
+      data = JSON.parse(content)
+      timings = {}
+      examples = data['examples'] || []
+      examples.each do |example|
+        file_path = extract_file_path(example)
+        run_time = example['run_time'].to_f
+        next unless file_path
+        # Normalize path to ensure consistent format (remove leading ./)
+        file_path = normalize_path(file_path)
+        # Aggregate timing for files (sum if multiple test cases from same file)
+        timings[file_path] ||= 0
+        timings[file_path] += run_time
+      end
+      timings
+    end
+    # Extracts file path from example, preferring id field over file_path
+    # This is important for shared examples where file_path points to the shared example file
+    # but id contains the actual spec file path (e.g., "./spec/features/entry_spec.rb[1:1:1]")
+    def self.extract_file_path(example)
+      if example['id']
+        # Extract file path from id (format: "./path/to/spec.rb[1:2:3]")
+        example['id'].split('[').first
+      else
+        example['file_path']
+      end
+    end
+    # Parses RSpec JSON file and returns hash of {example_id => execution_time}
+    # Example ID format: "spec/file.rb[1:1]"
+    def self.parse_with_examples(json_path)
+      content = File.read(json_path)
+      data = JSON.parse(content)
+      timings = {}
+      examples = data['examples'] || []
+      examples.each do |example|
+        next unless example['id']
+        example_id = normalize_path(example['id'])
+        run_time = example['run_time'].to_f
+        timings[example_id] = run_time
+      end
+      timings
+    end
+    # Parses multiple JSON files and returns hash of {example_id => execution_time}
+    def self.parse_files_with_examples(json_paths)
+      timings = {}
+      json_paths.each do |json_path|
+        next unless File.exist?(json_path)
+        next if File.empty?(json_path)
+        begin
+          example_timings = parse_with_examples(json_path)
+          example_timings.each do |example_id, time|
+            timings[example_id] = time
+          end
+        rescue JSON::ParserError => e
+          warn "Warning: Failed to parse #{json_path}: #{e.message}"
+        end
+      end
+      timings
+    end
+    # Parses all JSON files in a directory and merges results
+    def self.parse_directory(dir_path)
+      json_files = Dir.glob(File.join(dir_path, '**', '*.json'))
+      parse_files(json_files)
+    end
+    # Parses multiple JSON files and merges results
+    def self.parse_files(json_paths)
+      timings = {}
+      json_paths.each do |json_path|
+        next unless File.exist?(json_path)
+        next if File.empty?(json_path)
+        begin
+          file_timings = parse(json_path)
+          file_timings.each do |file, time|
+            timings[file] ||= 0
+            timings[file] += time
+          end
+        rescue JSON::ParserError => e
+          warn "Warning: Failed to parse #{json_path}: #{e.message}"
+        end
+      end
+      timings
+    end
+    # Normalizes file path by removing leading ./
+    def self.normalize_path(path)
+      path.sub(%r{^\./}, '')
+    end
+  end
+  # Balances test files across multiple nodes using greedy algorithm
+  class Balancer
+    # Distributes test files across nodes based on execution times
+    # Uses greedy algorithm: assign each file to the node with lowest cumulative time
+    def self.balance(timings, total_nodes)
+      # Sort files by execution time (descending) for better balance
+      sorted_files = timings.sort_by { |_file, time| -time }
+      # Initialize nodes with empty arrays and zero cumulative time
+      nodes = Array.new(total_nodes) { { files: [], total_time: 0 } }
+      # Assign each file to the node with lowest cumulative time
+      sorted_files.each do |file, time|
+        # Find node with minimum total time
+        min_node = nodes.min_by { |node| node[:total_time] }
+        min_node[:files] << file
+        min_node[:total_time] += time
+      end
+      nodes
+    end
+  end
+  # Command-line interface
+  class CLI
+    def self.run(argv)
+      options = parse_options(argv)
+      validate_options!(options)
+      timings, default_files, json_files = load_timings(options)
+      exit_if_no_tests(timings)
+      nodes = Balancer.balance(timings, options[:total_nodes])
+      DebugPrinter.print(nodes, timings, default_files, json_files) if options[:debug]
+      output_node_files(nodes, options[:node_index])
+    end
+    def self.validate_options!(options)
+      return if options[:json_path]
+      warn 'Error: --json-path is required'
+      exit 1
+    end
+    def self.load_timings(options)
+      json_dir = options[:json_path]
+      if File.directory?(json_dir)
+        load_timings_from_json(json_dir, options)
+      else
+        warn "Warning: JSON directory not found: #{json_dir}, using all test files with equal execution time"
+        timings = find_all_spec_files(options[:test_dir], options[:test_pattern])
+        [timings, Set.new(timings.keys), []]
+      end
+    end
+    def self.load_timings_from_json(json_dir, options)
+      json_files = Dir.glob(File.join(json_dir, '**', '*.json'))
+      file_timings = JsonParser.parse_files(json_files)
+      all_test_files = find_all_spec_files(options[:test_dir], options[:test_pattern])
+      # Filter out files from JSON cache that don't match the test pattern
+      file_timings.select! { |file, _| all_test_files.key?(file) }
+      default_files = add_missing_files_with_default_timing(file_timings, all_test_files)
+      # Apply example-level splitting if threshold is set
+      threshold = options[:split_by_example_threshold]
+      timings = if threshold
+                  apply_example_splitting(file_timings, json_files, threshold)
+                else
+                  file_timings
+                end
+      [timings, default_files, json_files]
+    end
+    # Splits heavy files (>= threshold) into individual examples
+    def self.apply_example_splitting(file_timings, json_files, threshold)
+      heavy_files = file_timings.select { |_file, time| time >= threshold }
+      return file_timings if heavy_files.empty?
+      example_timings = JsonParser.parse_files_with_examples(json_files)
+      # Start with light files (below threshold)
+      timings = file_timings.reject { |file, _| heavy_files.key?(file) }
+      # Add individual examples from heavy files
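+      heavy_files.each do |heavy_file, file_time|
+        # Match "path[" exactly; keep the file whole if its examples carry no IDs
+        examples = example_timings.select { |id, _| id.start_with?("#{heavy_file}[") }
+        examples = { heavy_file => file_time } if examples.empty?
+        timings.merge!(examples)
+      end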
+      timings
+    end
+    # Adds test files missing from JSON results with default timing (1.0s)
+    def self.add_missing_files_with_default_timing(timings, all_test_files)
+      default_files = Set.new
+      missing_files = all_test_files.keys - timings.keys
+      return default_files if missing_files.empty?
+      warn "Warning: Found #{missing_files.size} test files not in JSON, adding with default execution time"
+      missing_files.each do |file|
+        timings[file] = 1.0
+        default_files.add(file)
+      end
+      default_files
+    end
+    def self.exit_if_no_tests(timings)
+      return unless timings.empty?
+      warn 'Warning: No test files found'
+      exit 0
+    end
+    def self.output_node_files(nodes, node_index)
+      node_files = nodes[node_index][:files]
+      puts node_files.join("\n")
+    end
+    # Default option values for CLI
+    DEFAULT_OPTIONS = {
+      node_index: 0,
+      total_nodes: 1,
+      debug: false,
+      test_dir: 'spec',
+      test_pattern: '**/*_spec.rb',
+      split_by_example_threshold: nil
+    }.freeze
+    # Parses command-line arguments and returns options hash
+    def self.parse_options(argv)
+      options = DEFAULT_OPTIONS.dup
+      build_option_parser(options).parse!(argv)
+      options
+    end
+    # Builds and configures the OptionParser instance
+    def self.build_option_parser(options)
+      OptionParser.new do |opts|
+        opts.banner = 'Usage: split-test-rb [options]'
+        define_options(opts, options)
+      end
+    end
+    # Defines all CLI options on the given OptionParser
+    def self.define_options(opts, options)
+      define_node_options(opts, options)
+      define_test_options(opts, options)
+    end
+    # Defines node distribution related CLI options
+    def self.define_node_options(opts, options)
+      opts.on('--node-index INDEX', Integer, 'Current node index (0-based)') { |v| options[:node_index] = v }
+      opts.on('--node-total TOTAL', Integer, 'Total number of nodes') { |v| options[:total_nodes] = v }
+      opts.on('--json-path PATH', 'Path to directory containing RSpec JSON reports') { |v| options[:json_path] = v }
+    end
+    # Defines test configuration and utility CLI options
+    def self.define_test_options(opts, options)
+      opts.on('--test-dir DIR', 'Test directory (default: spec)') { |v| options[:test_dir] = v }
+      opts.on('--test-pattern PATTERN', 'Test file pattern (default: **/*_spec.rb)') { |v| options[:test_pattern] = v }
+      opts.on('--split-by-example-threshold SECONDS', Float,
+              'Split files with execution time >= threshold into individual examples') do |v|
+        options[:split_by_example_threshold] = v
+      end
+      opts.on('--debug', 'Show debug information') { options[:debug] = true }
+      opts.on('-h', '--help', 'Show this help message') do
+        puts opts
+        exit
+      end
+      opts.on('-v', '--version', 'Show version') do
+        puts "split-test-rb #{VERSION}"
+        exit
+      end
+    end
+    def self.find_all_spec_files(test_dir = 'spec', test_pattern = '**/*_spec.rb')
+      # Find all test files in the specified directory with the given pattern
+      glob_pattern = File.join(test_dir, test_pattern)
+      test_files = Dir.glob(glob_pattern)
+      # Normalize paths and assign equal execution time (1.0) to each file
+      test_files.each_with_object({}) do |file, hash|
+        normalized_path = JsonParser.normalize_path(file)
+        hash[normalized_path] = 1.0
+      end
+    end
+  end
+  # Outputs debug information about test distribution
+  module DebugPrinter
+    # Shows distribution statistics, timing data sources, and per-node assignments
+    def self.print(nodes, timings, default_files, json_files)
+      total_files = timings.size
+      total_time = timings.values.sum.round(2)
+      files_from_json = total_files - default_files.size
+      avg_time, variance, max_deviation = calculate_load_balance_stats(nodes, total_time)
+      warn '=== Test Balancing Debug Info ==='
+      warn ''
+      print_loaded_json_files(json_files, timings)
+      print_timing_data_source(files_from_json, default_files.size, total_files, total_time)
+      print_load_balance_stats(avg_time, max_deviation)
+      print_node_distribution(nodes, variance, timings, default_files)
+      warn '===================================='
+    end
+    # Prints information about loaded JSON result files
+    def self.print_loaded_json_files(json_files, timings)
+      warn '## Loaded Test Result Files'
+      if json_files.empty?
+        warn '  (no JSON files loaded)'
+      else
+        json_files.each do |file|
+          warn "  - #{file}"
+        end
+        warn "  Total: #{json_files.size} JSON files, #{timings.size} test files extracted"
+      end
+      warn ''
+    end
+    # Calculates load balance statistics across nodes
+    def self.calculate_load_balance_stats(nodes, total_time)
+      avg_time = total_time / nodes.size
+      variance = nodes.map { |n| ((n[:total_time] - avg_time) / avg_time * 100).round(1) }
+      max_deviation = variance.map(&:abs).max
+      [avg_time, variance, max_deviation]
+    end
+    # Prints timing data source information
+    def self.print_timing_data_source(files_from_json, default_files_count, total_files, total_time)
+      warn '## Timing Data Source (from past test execution results)'
+      warn "  - Files with historical timing: #{files_from_json} files"
+      warn "  - Files with default timing (1.0s): #{default_files_count} files"
+      warn "  - Total files: #{total_files} files"
+      warn "  - Total estimated time: #{total_time}s"
+      warn ''
+    end
+    # Prints load balance statistics
+    def self.print_load_balance_stats(avg_time, max_deviation)
+      warn '## Load Balance'
+      warn "  - Average time per node: #{avg_time.round(2)}s"
+      warn "  - Max deviation from average: #{max_deviation}%"
+      warn ''
+    end
+    # Prints per-node distribution details
+    def self.print_node_distribution(nodes, variance, timings, default_files)
+      warn '## Per-Node Distribution'
+      nodes.each_with_index do |node, index|
+        print_node_info(node, index, variance[index], timings, default_files)
+      end
+    end
+    # Prints information for a single node
+    def self.print_node_info(node, index, deviation, timings, default_files)
+      deviation_str = deviation >= 0 ? "+#{deviation}%" : "#{deviation}%"
+      warn "Node #{index}: #{node[:files].size} files, #{node[:total_time].round(2)}s (#{deviation_str} from avg)"
+      node[:files].each do |file|
+        warn "  - #{file} #{format_file_timing(file, timings, default_files)}"
+      end
+      warn ''
+    end
+    # Formats file timing information with labels
+    def self.format_file_timing(file, timings, default_files)
+      time = timings[file]
+      timing_str = "(#{time.round(2)}s"
+      timing_str += ', default - no historical data' if default_files.include?(file)
+      timing_str += ')'
+      timing_str
+    end
+  end
+end
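
How `JsonParser` attributes run time to files is easiest to see on a small report. A minimal sketch, assuming an in-memory report with made-up paths; `file_timings` is a hypothetical helper that condenses `parse`, `extract_file_path`, and `normalize_path` into one method:

```ruby
require 'json'

# Made-up report: the second example comes from a shared example file, so its
# file_path differs from the spec file named in its id.
report = <<~JSON
  {
    "examples": [
      { "id": "./spec/models/user_spec.rb[1:1]", "file_path": "./spec/models/user_spec.rb", "run_time": 1.25 },
      { "id": "./spec/models/user_spec.rb[1:2]", "file_path": "./spec/support/shared_examples.rb", "run_time": 0.5 },
      { "file_path": "./spec/models/post_spec.rb", "run_time": 0.75 }
    ]
  }
JSON

# Sums run_time per spec file, preferring the path embedded in `id` and
# falling back to `file_path` when no id is present.
def file_timings(json_text)
  examples = JSON.parse(json_text).fetch('examples', [])
  examples.each_with_object(Hash.new(0.0)) do |example, timings|
    raw = example['id'] ? example['id'].split('[').first : example['file_path']
    next unless raw

    # Strip a leading "./" so paths match what Dir.glob returns
    timings[raw.sub(%r{\A\./}, '')] += example['run_time'].to_f
  end
end

timings = file_timings(report)
puts timings
```

The shared example's 0.5s is credited to `spec/models/user_spec.rb`, giving that file 1.75s in total, while `spec/models/post_spec.rb` keeps its 0.75s via the `file_path` fallback.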

metadata ADDED

@@ -0,0 +1,106 @@
+--- !ruby/object:Gem::Specification
+name: split-test-rb
+version: !ruby/object:Gem::Version
+  version: 1.0.0
+platform: ruby
+authors:
+- Naofumi Fujii
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2026-02-16 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.12'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.12'
+- !ruby/object:Gem::Dependency
+  name: rubocop
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.50'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.50'
+- !ruby/object:Gem::Dependency
+  name: rubocop-rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.9'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.9'
+- !ruby/object:Gem::Dependency
+  name: simplecov
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.22'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.22'
+description: A simple CLI tool to balance RSpec tests across parallel CI nodes using
+  RSpec JSON reports
+email:
+executables:
+- split-test-rb
+extensions: []
+extra_rdoc_files: []
+files:
+- LICENSE
+- README.md
+- bin/split-test-rb
+- lib/split_test_rb.rb
+- lib/split_test_rb/version.rb
+homepage: https://github.com/naofumi-fujii/split-test-rb
+licenses:
+- MIT
+metadata:
+  rubygems_mfa_required: 'true'
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: 3.2.0
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 3.5.11
+signing_key:
+specification_version: 4
+summary: Split tests across multiple nodes based on timing data
+test_files: []
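
The `~>` and `>=` constraints in the metadata above can be evaluated with RubyGems' own `Gem::Requirement` class. A minimal sketch; the candidate version numbers are chosen only for illustration:

```ruby
require 'rubygems' # normally loaded already; kept explicit so the snippet stands alone

# Development dependency on rspec and the gem's required Ruby version,
# copied from the metadata.
rspec_requirement = Gem::Requirement.new('~> 3.12')
ruby_requirement  = Gem::Requirement.new('>= 3.2.0')

# "~> 3.12" allows 3.12 and later 3.x releases, but not 4.0.
puts rspec_requirement.satisfied_by?(Gem::Version.new('3.13.0')) # true
puts rspec_requirement.satisfied_by?(Gem::Version.new('4.0.0'))  # false

# ">= 3.2.0" rejects older interpreters such as 3.1.4.
puts ruby_requirement.satisfied_by?(Gem::Version.new('3.1.4'))   # false
puts ruby_requirement.satisfied_by?(Gem::Version.new('3.3.0'))   # true
```

All four listed dependencies are of type `:development`, so none of them is installed for consumers of the gem; only the `required_ruby_version` constraint affects installation.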