RubyGems - rspec-flake-classifier - Versions diffs - 0.1.0 - Mend

rspec-flake-classifier 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (31)

checksums.yaml +7 -0
data/LICENSE.txt +21 -0
data/README.md +301 -0
data/Rakefile +8 -0
data/exe/rspec-flake +6 -0
data/lib/rspec/flake/classifier/classify/classifier.rb +228 -0
data/lib/rspec/flake/classifier/classify/context.rb +41 -0
data/lib/rspec/flake/classifier/classify/result.rb +44 -0
data/lib/rspec/flake/classifier/cli.rb +298 -0
data/lib/rspec/flake/classifier/configuration.rb +40 -0
data/lib/rspec/flake/classifier/coverage_snapshot.rb +89 -0
data/lib/rspec/flake/classifier/deflaker.rb +102 -0
data/lib/rspec/flake/classifier/evaluation.rb +127 -0
data/lib/rspec/flake/classifier/example_history.rb +24 -0
data/lib/rspec/flake/classifier/features.rb +42 -0
data/lib/rspec/flake/classifier/formatter.rb +194 -0
data/lib/rspec/flake/classifier/integrations.rb +247 -0
data/lib/rspec/flake/classifier/predictor.rb +144 -0
data/lib/rspec/flake/classifier/probe_evidence.rb +77 -0
data/lib/rspec/flake/classifier/rerun/bisect_dependency_search.rb +81 -0
data/lib/rspec/flake/classifier/rerun/isolated_runner.rb +69 -0
data/lib/rspec/flake/classifier/rerun/protocol.rb +83 -0
data/lib/rspec/flake/classifier/rerun/result.rb +82 -0
data/lib/rspec/flake/classifier/runtime_controls.rb +63 -0
data/lib/rspec/flake/classifier/sensitivity.rb +82 -0
data/lib/rspec/flake/classifier/signature.rb +59 -0
data/lib/rspec/flake/classifier/store/jsonl_store.rb +131 -0
data/lib/rspec/flake/classifier/version.rb +13 -0
data/lib/rspec/flake/classifier.rb +285 -0
data/sig/rspec/flake/classifier.rbs +176 -0
metadata +135 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 858a4f181049ee08fcd65482a9d162b5c9cb299be9492058d3385625e495a8bc
+  data.tar.gz: c8331443bacc0c9f94f09b3bf7a887321a5fe72eb6669ee89eafc3288cc312e5
+SHA512:
+  metadata.gz: ed88a38ce6d5acaaec24ef6d786cbe437b26f503cf916062f8dd811ff3b4be780f4d4ad7bc29c7bb4b9f5f045f91172f93f03ff9e2a669ffaa3c9c71d2989327
+  data.tar.gz: f83d5bc547db34d0eb1695911f99af512e3eec2b3e69833e578870bbd313b61703e0a721227d782ac160e3b11c684561e5675e8e484facaf2191e835c74b2525
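
The digests above can be checked against downloaded archive members. The following is a minimal sketch, assuming the YAML has already been extracted to a plain hash; `checksum_mismatches` is a hypothetical helper name and is not part of the package.

```ruby
require "digest"
require "yaml"

# Hypothetical helper (not part of the gem): returns the names of the digest
# algorithms whose recorded value does not match the file at `path`.
# `checksums` is a hash shaped like checksums.yaml: algorithm => { name => hex }.
def checksum_mismatches(checksums, name, path)
  actual = {
    "SHA256" => Digest::SHA256.file(path).hexdigest,
    "SHA512" => Digest::SHA512.file(path).hexdigest
  }
  actual.reject { |algorithm, digest| checksums.dig(algorithm, name) == digest }.keys
end
```

An empty array means both digests match. With real files the hash could come from `YAML.safe_load(File.read(path_to_checksums))`, where `path_to_checksums` is wherever the extracted file lives.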

data/LICENSE.txt ADDED

@@ -0,0 +1,21 @@
+The MIT License (MIT)
+Copyright (c) 2026 Yudai Takada
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,301 @@
+# rspec-flake-classifier
+`rspec-flake-classifier` records failed RSpec examples, normalizes their failure
+signatures, and classifies likely flaky-test causes. It can also rerun examples in
+isolated RSpec subprocesses to separate repeatable regressions from flaky failures.
+## Installation
+Add this line to your application's Gemfile:
+```ruby
+gem "rspec-flake-classifier"
+```
+And then execute:
+```bash
+bundle install
+```
+Or install it yourself as:
+```bash
+gem install rspec-flake-classifier
+```
+Requirements:
+- Ruby 3.2 or newer
+- RSpec 3.10 or newer
+Runtime dependencies:
+- `rspec-core`
+- `rspec-covers`
+- `rspec-hermetic`
+## Usage
+Configure it from `spec/spec_helper.rb` or `spec/rails_helper.rb`:
+```ruby
+require "rspec/flake/classifier"
+RSpec.configure do |config|
+  RSpec::FlakeClassifier.configure(config) do |flake|
+    flake.store = ".rspec_flake_store"
+    flake.auto_rerun = false
+    flake.deflaker = true
+    flake.run_sensitivity = false
+  end
+end
+```
+Run your suite normally:
+```bash
+bundle exec rspec
+```
+Failure records are written to `.rspec_flake_store/failures.jsonl`. Each record
+contains a normalized signature digest, occurrence counts, labels, example IDs,
+and evidence metadata.
+### Features
+- Append-only JSONL failure signature store
+- iDFlakies-style rerun protocol for same-order, isolated, OD, and NOD checks
+- Luo taxonomy cause labels such as `network`, `time`, `io`, `resource_leak`,
+  `async_wait`, `randomness`, and `test_order_dependency`
+- DeFlaker-style coverage and git diff suspicion signal
+- Runtime integration with `rspec-covers` and `rspec-hermetic`
+- JSON and JUnit formatters for CI ingestion
+- Buildkite Ruby test collector tags when the collector is active
+- Lightweight feature extraction, prediction, training, and evaluation commands
+### Configuration
+Common options:
+```ruby
+RSpec::FlakeClassifier.configure(config) do |flake|
+  flake.store = ".rspec_flake_store"
+  flake.auto_rerun = { failures: 3 }
+  flake.rspec_command = ["bundle", "exec", "rspec"]
+  flake.deflaker = true
+  flake.changed_lines_provider = -> { { "app/user.rb" => [10, 11] } }
+  flake.coverage_provider = ->(example) { example.metadata[:covered_lines] }
+  flake.probe_provider = ->(_example) { { files: ["tmp/cache"] } }
+  flake.skip_known_flakes = false
+  flake.run_sensitivity = false
+  flake.sensitivity_factors = %i[time randomness network]
+end
+```
+`auto_rerun` starts isolated subprocess reruns for failed examples. Keep it off
+unless you are comfortable with the extra runtime cost in CI.
+### rspec-covers and rspec-hermetic
+When all three gems are enabled, configure `rspec-covers` and `rspec-hermetic`
+before `rspec-flake-classifier`. That lets the classifier observe their
+after-example records from its outer RSpec hook.
+```ruby
+require "rspec/covers"
+require "rspec/hermetic"
+require "rspec/flake/classifier"
+RSpec.configure do |config|
+  RSpec::Covers.configure(config) do |covers|
+    covers.root = Dir.pwd
+    covers.production_paths = %w[app lib]
+    covers.risky = :report
+  end
+  RSpec::Hermetic.configure(config) do |hermetic|
+    hermetic.root_path = Dir.pwd
+    hermetic.probes = %i[filesystem resources]
+    hermetic.on_pollution = :report
+  end
+  RSpec::FlakeClassifier.configure(config) do |flake|
+    flake.store = ".rspec_flake_store"
+    flake.deflaker = true
+  end
+end
+```
+The classifier reads executed locations from `RSpec::Covers.reporter.results`
+and filesystem or resource changes from `RSpec::Hermetic.runner.records`. It also
+accepts compatible third-party adapters and explicit metadata such as
+`:flake_classifier_coverage`, `:flake_classifier_files`,
+`:flake_classifier_resources`, and `:flake_classifier_sockets`.
+### CI output
+Require the formatter explicitly when you want machine-readable output:
+```bash
+bundle exec rspec \
+  --require rspec/flake/classifier/formatter \
+  --format progress \
+  --format RSpec::FlakeClassifier::Formatter \
+  --out tmp/rspec-flakes.json
+```
+For JUnit XML:
+```bash
+bundle exec rspec \
+  --require rspec/flake/classifier/formatter \
+  --format progress \
+  --format RSpec::FlakeClassifier::JUnitFormatter \
+  --out test-results/rspec/rspec.xml
+```
+The JUnit formatter adds `flaky`, `flaky_labels`, `flake_signature`,
+`buildkite.*`, and `circleci.*` metadata.
+### Buildkite
+Use the official Buildkite Ruby test collector as usual:
+```ruby
+require "buildkite/test_collector"
+Buildkite::TestCollector.configure(hook: :rspec)
+```
+When the collector is active, `rspec-flake-classifier` tags failed executions
+with:
+- `rspec_flake_classifier.flaky`
+- `rspec_flake_classifier.labels`
+- `rspec_flake_classifier.signature`
+### CircleCI
+Generate JUnit XML and store the results directory:
+```yaml
+version: 2.1
+jobs:
+  test:
+    docker:
+      - image: cimg/ruby:3.3
+    steps:
+      - checkout
+      - run: bundle install
+      - run:
+          name: Run RSpec
+          command: |
+            bundle exec rspec \
+              --require rspec/flake/classifier/formatter \
+              --format progress \
+              --format RSpec::FlakeClassifier::JUnitFormatter \
+              --out test-results/rspec/rspec.xml
+      - store_test_results:
+          path: test-results
+```
+### CLI
+Investigate a failed example:
+```bash
+rspec-flake investigate 'spec/models/user_spec.rb[1:2]' --seed 1234 --prior 'spec/models/user_spec.rb[1:1]'
+```
+Classify a message or existing store records:
+```bash
+rspec-flake classify "Net::HTTP timed out" --json
+rspec-flake classify --from-store --store .rspec_flake_store --json
+```
+Export features, rank files, train weights, and evaluate results:
+```bash
+rspec-flake features spec/models/user_spec.rb --json
+rspec-flake predict spec/models/user_spec.rb spec/system/search_spec.rb --json
+rspec-flake train tmp/flake_features.jsonl --out tmp/flake_weights.json --json
+rspec-flake evaluate --predictions tmp/predictions.json --ground-truth tmp/ground_truth.json --json
+```
+Run sensitivity checks or summarize the store:
+```bash
+rspec-flake sensitivity 'spec/models/user_spec.rb[1:2]' --factor network --json
+rspec-flake report --store .rspec_flake_store --json
+```
+### Labels
+Labels are multi-value and confidence-scored. A single failure can receive more
+than one label.
+Common labels include:
+- `test_order_dependency`
+- `async_wait`
+- `concurrency`
+- `resource_leak`
+- `network`
+- `time`
+- `io`
+- `randomness`
+- `floating_point`
+- `unordered_collections`
+- `infrastructure`
+- `suspected_flaky_deflaker`
+- `known_flaky`
+## Development
+After checking out the repo, run:
+```bash
+bin/setup
+```
+Run the test suite:
+```bash
+bundle exec rake spec
+```
+Validate RBS signatures:
+```bash
+bundle exec rbs validate
+```
+Build the gem:
+```bash
+bundle exec rake build
+```
+Open an interactive console:
+```bash
+bin/console
+```
+To install this gem onto your local machine:
+```bash
+bundle exec rake install
+```
+## Contributing
+Bug reports and pull requests are welcome on GitHub at
+https://github.com/ydah/rspec-flake-classifier.
+## License
+The gem is available as open source under the terms of the MIT License.
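
The README's `changed_lines_provider` example returns a hash of file paths to line numbers. One way such a hash could be built is by parsing zero-context unified diff output (for example from `git diff -U0`). This is only a sketch: `changed_lines_from_diff` is a hypothetical helper, and it is an assumption that the gem expects new-side line numbers in this shape.

```ruby
# Hypothetical helper (not part of the gem): maps each changed file to the
# new-side line numbers named in unified diff hunk headers.
# Assumes zero-context hunks, so every line in a hunk's new range is a change.
def changed_lines_from_diff(diff_text)
  changed = {}
  current = nil
  diff_text.each_line do |line|
    if (match = line.match(%r{\A\+\+\+ b/(.+)$}))
      current = match[1]
    elsif line.start_with?("+++ ")
      current = nil # e.g. "+++ /dev/null" for a deleted file
    elsif current && (match = line.match(/\A@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@/))
      start = match[1].to_i
      count = match[2] ? match[2].to_i : 1
      next if count.zero? # pure deletion: no new-side lines
      (changed[current] ||= []).concat((start...(start + count)).to_a)
    end
  end
  changed
end
```

A provider could then wrap it as `-> { changed_lines_from_diff(diff_output) }`, where `diff_output` is obtained however the project prefers. An added source line that itself begins with `++ b/` would be misread as a file header; the sketch does not guard against that.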

data/Rakefile ADDED

@@ -0,0 +1,8 @@
+# frozen_string_literal: true
+require "bundler/gem_tasks"
+require "rspec/core/rake_task"
+RSpec::Core::RakeTask.new(:spec)
+task default: :spec

data/exe/rspec-flake ADDED

@@ -0,0 +1,6 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+require "rspec/flake/classifier/cli"
+exit RSpec::FlakeClassifier::CLI.new(ARGV).run

data/lib/rspec/flake/classifier/classify/classifier.rb ADDED

@@ -0,0 +1,228 @@
+# frozen_string_literal: true
+require_relative "context"
+require_relative "result"
+module RSpec
+  module FlakeClassifier
+    module Classify
+      class Classifier
+        RECOMMENDATIONS = {
+          "test_order_dependency" => "Isolate shared state and inspect polluter or cleaner examples.",
+          "async_wait" => "Replace sleeps with observable waits or Capybara matchers.",
+          "concurrency" => "Synchronize shared state and make thread lifecycle explicit.",
+          "resource_leak" => "Close leaked resources in example cleanup hooks.",
+          "network" => "Stub external calls with WebMock/VCR and fail on real sockets.",
+          "time" => "Freeze time and cover timezone or date-boundary cases explicitly.",
+          "io" => "Use isolated temp paths and avoid filesystem order assumptions.",
+          "randomness" => "Fix srand or dependency seeds and avoid random production data in assertions.",
+          "floating_point" => "Assert with tolerance instead of exact float equality.",
+          "unordered_collections" => "Use match_array/contain_exactly or compare sets.",
+          "infrastructure" => "Check CI host limits, disk, memory, and scheduler pressure.",
+          "suspected_flaky_deflaker" => "Review recent changes first; this failure did not cover changed lines.",
+          "known_flaky" => "Inspect the stored classification and linked commits."
+        }.freeze
+        def classify(input = nil, **attributes)
+          context = build_context(input, attributes)
+          labels = detectors.each_with_object([]) do |detector, result|
+            detected = detector.call(context)
+            next unless detected
+            detected.is_a?(Array) ? result.concat(detected.compact) : result << detected
+          end
+          status = labels.empty? ? "unknown" : "flaky"
+          Result.new(status: status, labels: labels, evidence: labels.flat_map(&:evidence))
+        end
+        private
+        def build_context(input, attributes)
+          return input if input.is_a?(Context)
+          if input.respond_to?(:message)
+            attributes[:message] ||= input.message
+            attributes[:backtrace] ||= input.backtrace
+          end
+          Context.new(**attributes)
+        end
+        def detectors
+          [
+            method(:deflaker_suspect),
+            method(:order_dependency),
+            method(:sensitivity),
+            method(:async_wait),
+            method(:concurrency),
+            method(:resource_leak),
+            method(:network),
+            method(:time),
+            method(:io),
+            method(:randomness),
+            method(:floating_point),
+            method(:unordered_collections),
+            method(:infrastructure)
+          ]
+        end
+        def order_dependency(context)
+          return unless context.order_dependent
+          label("test_order_dependency", 0.95, ["rerun protocol marked this example as order-dependent"])
+        end
+        def deflaker_suspect(context)
+          result = metadata_value(context, :deflaker)
+          return unless result && boolean_value(result, "suspected")
+          label("suspected_flaky_deflaker", 0.9, [result.fetch("reason", "covered code did not intersect changed lines")])
+        end
+        def sensitivity(context)
+          result = metadata_value(context, :sensitivity)
+          factors = result && (result["factors"] || result[:factors])
+          return unless factors
+          factors.filter_map do |factor, factor_result|
+            next unless boolean_value(factor_result, "sensitive")
+            category = sensitivity_category(factor)
+            label(category, 0.9, ["#{factor} sensitivity run changed the pass/fail outcome"])
+          end
+        end
+        def async_wait(context)
+          evidence = []
+          evidence << "wait/sleep/capybara call appears in failure context" if context.text.match?(/sleep|wait|eventually|capybara|selenium/i)
+          evidence << "large runtime spread across reruns" if large_duration_spread?(context.durations)
+          return if evidence.empty?
+          label("async_wait", evidence.length == 2 ? 0.85 : 0.65, evidence)
+        end
+        def concurrency(context)
+          return unless context.text.match?(/thread|fiber|mutex|deadlock|race|concurrent|monitor/i)
+          label("concurrency", 0.8, ["threading or race-related signal found"])
+        end
+        def resource_leak(context)
+          evidence = []
+          evidence << "resource probe reported leaked resources" unless context.resource_list.empty?
+          evidence << "failure context mentions leaked or exhausted resources" if context.text.match?(/leak|too many open files|connection pool|fd|descriptor/i)
+          return if evidence.empty?
+          label("resource_leak", evidence.length == 2 ? 0.9 : 0.75, evidence)
+        end
+        def network(context)
+          evidence = []
+          evidence << "socket probe recorded real network use" unless context.socket_list.empty?
+          evidence << "network exception or HTTP stack appears in failure context" if context.text.match?(/webmock|vcr|net::http|socket|dns|econn|timeout|timed out|connection refused/i)
+          return if evidence.empty?
+          label("network", evidence.length == 2 ? 0.9 : 0.8, evidence)
+        end
+        def time(context)
+          return unless context.text.match?(/timecop|active_support::testing::timehelpers|timezone|time zone|tz|date|time\.now|today|midnight/i)
+          label("time", 0.8, ["time or timezone dependency appears in failure context"])
+        end
+        def io(context)
+          evidence = []
+          evidence << "filesystem probe recorded file activity" unless context.file_list.empty?
+          evidence << "filesystem exception appears in failure context" if context.text.match?(/errno::e|no such file|permission denied|file|directory|tmpdir|tmp/i)
+          return if evidence.empty?
+          label("io", evidence.length == 2 ? 0.9 : 0.75, evidence)
+        end
+        def randomness(context)
+          return unless context.text.match?(/srand|securerandom|random|faker|seed|shuffle|sample/i)
+          label("randomness", 0.75, ["randomness or seed signal appears in failure context"])
+        end
+        def floating_point(context)
+          numbers = context.text.scan(/[-+]?\d+\.\d+(?:e[-+]?\d+)?/i).map(&:to_f)
+          return if numbers.length < 2
+          close_pair = numbers.combination(2).any? { |left, right| close_float?(left, right) }
+          return unless close_pair
+          label("floating_point", 0.85, ["near-equal floating point values appear in failure diff"])
+        end
+        def unordered_collections(context)
+          arrays = context.text.scan(/\[([^\[\]]+)\]/m).map { |(body)| tokenize_array(body) }.reject(&:empty?)
+          return if arrays.length < 2
+          matching_set = arrays.combination(2).any? do |left, right|
+            left != right && left.sort == right.sort
+          end
+          return unless matching_set
+          label("unordered_collections", 0.85, ["array-like values contain the same elements in different order"])
+        end
+        def infrastructure(context)
+          return unless context.text.match?(/out of memory|no space left|ci|buildkite|circleci|runner|killed|process terminated/i)
+          label("infrastructure", 0.55, ["failure context points to CI or host resource pressure"])
+        end
+        def label(category, confidence, evidence)
+          Label.new(
+            category: category,
+            confidence: confidence,
+            evidence: evidence,
+            action: RECOMMENDATIONS.fetch(category)
+          )
+        end
+        def large_duration_spread?(durations)
+          return false if durations.length < 2
+          min, max = durations.minmax
+          min.positive? && (max / min) >= 2.0
+        end
+        def close_float?(left, right)
+          delta = (left - right).abs
+          return false if delta.zero?
+          scale = [left.abs, right.abs, 1.0].max
+          delta <= scale * 1e-6
+        end
+        def metadata_value(context, key)
+          metadata = context.metadata || {}
+          metadata[key] || metadata[key.to_s]
+        end
+        def boolean_value(hash, key)
+          return false unless hash.respond_to?(:fetch)
+          hash.fetch(key, hash.fetch(key.to_sym, false)) == true
+        end
+        def sensitivity_category(factor)
+          case factor.to_s
+          when "time" then "time"
+          when "randomness" then "randomness"
+          when "network" then "network"
+          else "infrastructure"
+          end
+        end
+        def tokenize_array(body)
+          body.split(",").map { |value| value.strip.gsub(/\A["']|["']\z/, "") }
+        end
+      end
+    end
+  end
+end
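
The detector pipeline in this file can be sketched in isolation: each detector returns a label, an array of labels, or `nil`, and the results are collected into one flat list. The version below is simplified and works on a plain string rather than the gem's `Context`. `SketchLabel`, `FLOAT_DETECTOR`, `UNORDERED_DETECTOR`, and `sketch_classify` are hypothetical names used only for illustration.

```ruby
# Simplified stand-in for the gem's Label struct (assumption: two fields only).
SketchLabel = Struct.new(:category, :confidence, keyword_init: true)

# True when two distinct floats differ by at most 1e-6 relative to the larger
# magnitude (floored at 1.0), following close_float? in the file above.
def close_float?(left, right)
  delta = (left - right).abs
  return false if delta.zero?

  delta <= [left.abs, right.abs, 1.0].max * 1e-6
end

# Flags text containing two near-equal decimal literals.
FLOAT_DETECTOR = lambda do |text|
  numbers = text.scan(/[-+]?\d+\.\d+(?:e[-+]?\d+)?/i).map(&:to_f)
  next unless numbers.combination(2).any? { |left, right| close_float?(left, right) }

  SketchLabel.new(category: "floating_point", confidence: 0.85)
end

# Flags text containing two bracketed lists with the same elements in a
# different order.
UNORDERED_DETECTOR = lambda do |text|
  arrays = text.scan(/\[([^\[\]]+)\]/).map { |(body)| body.split(",").map(&:strip) }
  next unless arrays.combination(2).any? { |left, right| left != right && left.sort == right.sort }

  SketchLabel.new(category: "unordered_collections", confidence: 0.85)
end

# Runs every detector and flattens label-or-array results, skipping nil.
def sketch_classify(text, detectors = [FLOAT_DETECTOR, UNORDERED_DETECTOR])
  detectors.each_with_object([]) do |detector, labels|
    detected = detector.call(text)
    next unless detected

    detected.is_a?(Array) ? labels.concat(detected.compact) : labels << detected
  end
end
```

For instance, `sketch_classify("expected [1, 2, 3] got [3, 1, 2]")` yields a single `unordered_collections` label, while a message with no matching signal yields an empty array.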

data/lib/rspec/flake/classifier/classify/context.rb ADDED

@@ -0,0 +1,41 @@
+# frozen_string_literal: true
+module RSpec
+  module FlakeClassifier
+    module Classify
+      Context = Struct.new(
+        :message,
+        :backtrace,
+        :source,
+        :duration_samples,
+        :order_dependent,
+        :single_run_passed,
+        :resources,
+        :files,
+        :sockets,
+        :metadata,
+        keyword_init: true
+      ) do
+        def text
+          [message, *Array(backtrace), source].compact.join("\n")
+        end
+        def durations
+          Array(duration_samples).map(&:to_f)
+        end
+        def resource_list
+          Array(resources)
+        end
+        def file_list
+          Array(files)
+        end
+        def socket_list
+          Array(sockets)
+        end
+      end
+    end
+  end
+end
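
The struct above relies on two Ruby behaviors: members omitted from a `keyword_init` struct default to `nil`, and `Array()` turns `nil` into an empty array and a scalar into a one-element array. A reduced sketch, using a hypothetical `SketchContext` with only four of the members:

```ruby
# Reduced, hypothetical version of the Context struct for illustration.
SketchContext = Struct.new(:message, :backtrace, :source, :files, keyword_init: true) do
  # Joins the message, each backtrace line, and the source, skipping nil parts.
  def text
    [message, *Array(backtrace), source].compact.join("\n")
  end

  # Always returns an array, whether files is nil, a single value, or a list.
  def file_list
    Array(files)
  end
end

context = SketchContext.new(message: "Net::HTTP timed out", backtrace: ["app/a.rb:1", "app/b.rb:2"])
context.text      # message followed by one backtrace line per row
context.file_list # empty array, because files was never given
```

One caveat when passing probe data: `Array()` applied to a hash returns an array of key/value pairs, not a one-element array holding the hash.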

data/lib/rspec/flake/classifier/classify/result.rb ADDED

@@ -0,0 +1,44 @@
+# frozen_string_literal: true
+module RSpec
+  module FlakeClassifier
+    module Classify
+      Label = Struct.new(:category, :confidence, :evidence, :action, keyword_init: true) do
+        def to_h
+          {
+            "category" => category,
+            "confidence" => confidence,
+            "evidence" => evidence,
+            "action" => action
+          }
+        end
+      end
+      class Result
+        attr_reader :status, :labels, :evidence
+        def initialize(status:, labels:, evidence: [])
+          @status = status.to_s
+          @labels = labels.sort_by { |label| -label.confidence.to_f }
+          @evidence = evidence
+        end
+        def primary_label
+          labels.first
+        end
+        def flaky?
+          status == "flaky" || !labels.empty?
+        end
+        def to_h
+          {
+            "status" => status,
+            "labels" => labels.map(&:to_h),
+            "evidence" => evidence
+          }
+        end
+      end
+    end
+  end
+end
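
`Result` orders labels by descending confidence, and `primary_label` returns the first one. Ruby's `sort_by` is not guaranteed to be stable, so labels with equal confidence may come back in either order. If a deterministic tie-break were wanted, one option is to sort on the original position as a secondary key. This is a sketch only; `RankedLabel` and `rank_labels` are hypothetical names, not part of the package.

```ruby
# Hypothetical stand-in for the gem's Label struct.
RankedLabel = Struct.new(:category, :confidence, keyword_init: true)

# Sorts by descending confidence and breaks ties by original position,
# so equal-confidence labels keep the order in which they were detected.
def rank_labels(labels)
  labels.each_with_index
        .sort_by { |label, index| [-label.confidence.to_f, index] }
        .map(&:first)
end
```

With this ordering, the first element of the ranked list is always the earliest-detected label among those sharing the highest confidence.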