philiprehberger-encoding_kit 0.1.1 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

checksums.yaml +4 -4
data/CHANGELOG.md +14 -0
data/README.md +70 -7
data/lib/philiprehberger/encoding_kit/converter.rb +2 -1
data/lib/philiprehberger/encoding_kit/detection_result.rb +85 -0
data/lib/philiprehberger/encoding_kit/detector.rb +96 -13
data/lib/philiprehberger/encoding_kit/version.rb +1 -1
data/lib/philiprehberger/encoding_kit.rb +121 -1
metadata +6 -4

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 53d87e443eeb4c48893c09ab98662b29de9df96c6b6af0795a8fc3311a33ea3b
-  data.tar.gz: e1939f99e731ad8627ce581761850a0f6362a55e1a6b164bebe2ed951b0a6fc3
+  metadata.gz: f28c4abaf812bdda01287ea83ef2c48cdef622a082db01adb3f81899ae715809
+  data.tar.gz: 113cd2df4e35e9f49955209662d37aa091bb749fb819d7e1597d7c50c11cdb14
 SHA512:
-  metadata.gz: 21eb582de63768fc7c657b4d4e3f5d1a33b2b46f38dfc168b2e86f7be0816477a0c8fc74e55580fc33922205d30b343260d1ff655888997f26fbc38112b1e1ee
-  data.tar.gz: e2d6cf3dfa4a688e166ab790d00142bb1621a1ec205355a0cbfebc828b8008f8d52f08e783d4c64d5b1410cb24d2c2401abc7e9ccd5d5621bfa88427aaa590e2
+  metadata.gz: c9b0b01dbeb5e0362854c5a66b36e86a7d18c3b05949f70f5496cbe9c5ca784305dc61b9cd330b2f4e7549b0430381ab55ab2c387c24c28b64f9232a843f0c06
+  data.tar.gz: 1754851e434597696433549da529b471e2307da5c6282433c1b8fe5bf624cba3d3210c5e51d6d4f0656b561c5dea27336c9ce8ac8f09c9e53ac56165a848f536

data/CHANGELOG.md CHANGED

@@ -7,6 +7,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
+## [0.2.0] - 2026-03-28
+### Added
+- Confidence scores: `detect` returns a `DetectionResult` with `.encoding` and `.confidence` (1.0 for BOM, 0.5-0.9 for heuristics)
+- `DetectionResult` delegates to `Encoding` for backward compatibility (e.g., `result == Encoding::UTF_8` still works)
+- Streaming detection: `detect_stream(io, sample_size: 4096)` reads a sample from IO objects
+- Encoding analysis: `analyze(string)` returns byte distribution stats and ranked candidates
+- Windows codepage support: CP1252, CP1250, CP1251 detection via 0x80-0x9F byte patterns
+- Transcode alias: `transcode(string, to:, fallback:, replace:)` for simplified auto-detect-and-convert
+- Issue templates for bug reports and feature requests
+- Dependabot configuration for bundler and GitHub Actions
+- Pull request template
 ## [0.1.1] - 2026-03-26
 ### Added

data/README.md CHANGED

@@ -2,7 +2,11 @@
 [![Tests](https://github.com/philiprehberger/rb-encoding-kit/actions/workflows/ci.yml/badge.svg)](https://github.com/philiprehberger/rb-encoding-kit/actions/workflows/ci.yml)
 [![Gem Version](https://badge.fury.io/rb/philiprehberger-encoding_kit.svg)](https://rubygems.org/gems/philiprehberger-encoding_kit)
+[![GitHub release](https://img.shields.io/github/v/release/philiprehberger/rb-encoding-kit)](https://github.com/philiprehberger/rb-encoding-kit/releases)
+[![Last updated](https://img.shields.io/github/last-commit/philiprehberger/rb-encoding-kit)](https://github.com/philiprehberger/rb-encoding-kit/commits/main)
 [![License](https://img.shields.io/github/license/philiprehberger/rb-encoding-kit)](LICENSE)
+[![Bug Reports](https://img.shields.io/github/issues/philiprehberger/rb-encoding-kit/bug)](https://github.com/philiprehberger/rb-encoding-kit/issues?q=is%3Aissue+is%3Aopen+label%3Abug)
+[![Feature Requests](https://img.shields.io/github/issues/philiprehberger/rb-encoding-kit/enhancement)](https://github.com/philiprehberger/rb-encoding-kit/issues?q=is%3Aissue+is%3Aopen+label%3Aenhancement)
 [![Sponsor](https://img.shields.io/badge/sponsor-GitHub%20Sponsors-ec6cb9)](https://github.com/sponsors/philiprehberger)
 Character encoding detection, conversion, and normalization
@@ -30,19 +34,68 @@ gem install philiprehberger-encoding_kit
 ```ruby
 require "philiprehberger/encoding_kit"
-encoding = Philiprehberger::EncodingKit.detect(raw_bytes)
+result = Philiprehberger::EncodingKit.detect(raw_bytes)
+result.encoding   # => Encoding::UTF_8
+result.confidence # => 0.9
 utf8 = Philiprehberger::EncodingKit.to_utf8(raw_bytes)
 ```
-### Encoding Detection
+### Encoding Detection with Confidence
 ```ruby
 require "philiprehberger/encoding_kit"
-# Detects via BOM first, then UTF-8 validity, ASCII, Latin-1 heuristics
-Philiprehberger::EncodingKit.detect("\xEF\xBB\xBFhello".b) # => Encoding::UTF_8
-Philiprehberger::EncodingKit.detect("caf\xC3\xA9".b)       # => Encoding::UTF_8
-Philiprehberger::EncodingKit.detect("caf\xE9".b)            # => Encoding::ISO_8859_1
+# Returns a DetectionResult that delegates to Encoding
+result = Philiprehberger::EncodingKit.detect("\xEF\xBB\xBFhello".b)
+result == Encoding::UTF_8  # => true (backward compatible)
+result.confidence          # => 1.0 (BOM detected)
+result.name                # => "UTF-8"
+result.to_h                # => {encoding: Encoding::UTF_8, confidence: 1.0}
+# Heuristic detection returns lower confidence
+result = Philiprehberger::EncodingKit.detect("caf\xC3\xA9".b)
+result.confidence # => 0.85-0.9
+```
+### Streaming Detection
+```ruby
+require "philiprehberger/encoding_kit"
+File.open("data.csv", "rb") do |file|
+  result = Philiprehberger::EncodingKit.detect_stream(file, sample_size: 8192)
+  result.encoding   # => Encoding::UTF_8
+  result.confidence # => 0.9
+end
+```
+### Encoding Analysis
+```ruby
+require "philiprehberger/encoding_kit"
+analysis = Philiprehberger::EncodingKit.analyze(raw_bytes)
+analysis[:encoding]       # => Encoding::UTF_8
+analysis[:confidence]     # => 0.9
+analysis[:printable_ratio] # => 0.95
+analysis[:ascii_ratio]    # => 0.8
+analysis[:high_bytes]     # => 12
+analysis[:candidates]     # => [{encoding: Encoding::UTF_8, confidence: 0.9}, ...]
+```
+### Transcode
+```ruby
+require "philiprehberger/encoding_kit"
+# Auto-detect source, convert to UTF-8
+utf8 = Philiprehberger::EncodingKit.transcode(raw_bytes)
+# Convert to a specific encoding
+latin1 = Philiprehberger::EncodingKit.transcode(utf8_string, to: Encoding::ISO_8859_1)
+# Custom fallback behavior
+result = Philiprehberger::EncodingKit.transcode(data, to: "UTF-8", fallback: :replace, replace: "?")
 ```
 ### Convert to UTF-8
@@ -97,7 +150,10 @@ Philiprehberger::EncodingKit.valid?("hello", encoding: Encoding::US_ASCII)  # =>
 | Method | Description |
 |--------|-------------|
-| `EncodingKit.detect(string)` | Detect encoding via BOM and heuristics, returns an `Encoding` object |
+| `EncodingKit.detect(string)` | Detect encoding via BOM and heuristics, returns a `DetectionResult` with `.encoding` and `.confidence` |
+| `EncodingKit.detect_stream(io, sample_size: 4096)` | Detect encoding from an IO stream by sampling bytes |
+| `EncodingKit.analyze(string)` | Analyze byte distribution and return encoding candidates with stats |
+| `EncodingKit.transcode(string, to:, fallback:, replace:)` | Auto-detect source and convert to target encoding |
 | `EncodingKit.to_utf8(string, from: nil)` | Convert to UTF-8, auto-detect source if `from` is nil |
 | `EncodingKit.normalize(string)` | Force to valid UTF-8, replacing bad bytes with U+FFFD |
 | `EncodingKit.valid?(string, encoding: nil)` | Check if string is valid in given or current encoding |
@@ -113,6 +169,13 @@ bundle exec rspec
 bundle exec rubocop
 ```
+## Support
+If you find this package useful, consider giving it a star on GitHub — it helps motivate continued maintenance and development.
+[![LinkedIn](https://img.shields.io/badge/Philip%20Rehberger-LinkedIn-0A66C2?logo=linkedin)](https://www.linkedin.com/in/philiprehberger)
+[![More packages](https://img.shields.io/badge/more-open%20source%20packages-blue)](https://philiprehberger.com/open-source-packages)
 ## License
 [MIT](LICENSE)

data/lib/philiprehberger/encoding_kit/converter.rb CHANGED

@@ -35,7 +35,8 @@ module Philiprehberger
         # @param from [String, Encoding, nil] source encoding (auto-detect if nil)
         # @return [String] UTF-8 encoded string
         def to_utf8(string, from: nil)
-          source = from ? Encoding.find(from.to_s) : Detector.call(string)
+          detected = from ? Encoding.find(from.to_s) : Detector.call(string)
+          source = detected.is_a?(DetectionResult) ? detected.encoding : detected
           str = string.dup.force_encoding(source)
           str.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "\uFFFD")
         end
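
The unchanged tail of `to_utf8` first relabels the raw bytes with the source encoding and then transcodes them, which is why the hunk above has to unwrap the `DetectionResult` into a plain `Encoding` first. A minimal standalone sketch of that relabel-then-encode pattern, using only core `String` methods; the helper name `sketch_to_utf8` is hypothetical and not part of the gem:

```ruby
# Hypothetical helper mirroring the two steps at the end of to_utf8.
def sketch_to_utf8(raw, source)
  # force_encoding only changes the label; the bytes stay exactly as they are.
  labelled = raw.dup.force_encoding(source)
  # encode rewrites the bytes; anything unconvertible becomes U+FFFD.
  labelled.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "\uFFFD")
end

latin1_bytes = "caf\xE9".b # "café" as ISO-8859-1 bytes
utf8 = sketch_to_utf8(latin1_bytes, Encoding::ISO_8859_1)
puts utf8.encoding  # UTF-8
puts utf8.bytesize  # 5, because é takes two bytes in UTF-8
```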

data/lib/philiprehberger/encoding_kit/detection_result.rb ADDED

@@ -0,0 +1,85 @@
+# frozen_string_literal: true
+module Philiprehberger
+  module EncodingKit
+    # A detection result that wraps an Encoding with a confidence score.
+    # Delegates to the underlying Encoding so it can be used transparently
+    # wherever an Encoding object is expected (e.g., == Encoding::UTF_8).
+    class DetectionResult
+      attr_reader :encoding, :confidence
+      # @param encoding [Encoding] the detected encoding
+      # @param confidence [Float] confidence score between 0.0 and 1.0
+      def initialize(encoding, confidence)
+        @encoding = encoding
+        @confidence = confidence.to_f
+      end
+      # Equality check delegates to the underlying encoding so that
+      # `result == Encoding::UTF_8` works as expected.
+      #
+      # @param other [Object] the object to compare
+      # @return [Boolean]
+      def ==(other)
+        case other
+        when Encoding
+          @encoding == other
+        when DetectionResult
+          @encoding == other.encoding
+        else
+          super
+        end
+      end
+      # Support `eql?` for hash key usage.
+      #
+      # @param other [Object]
+      # @return [Boolean]
+      def eql?(other)
+        self == other
+      end
+      # Delegate hash to encoding for hash key consistency.
+      #
+      # @return [Integer]
+      def hash
+        @encoding.hash
+      end
+      # String representation shows the encoding name.
+      #
+      # @return [String]
+      def to_s
+        @encoding.to_s
+      end
+      # Inspect shows both encoding and confidence.
+      #
+      # @return [String]
+      def inspect
+        "#<#{self.class} encoding=#{@encoding} confidence=#{@confidence}>"
+      end
+      # Convert to a plain hash representation.
+      #
+      # @return [Hash]
+      def to_h
+        { encoding: @encoding, confidence: @confidence }
+      end
+      # Delegate unknown methods to the underlying Encoding object.
+      def method_missing(method, ...)
+        if @encoding.respond_to?(method)
+          @encoding.send(method, ...)
+        else
+          super
+        end
+      end
+      # Support respond_to? for delegated methods.
+      def respond_to_missing?(method, include_private = false)
+        @encoding.respond_to?(method, include_private) || super
+      end
+    end
+  end
+end
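
The backward-compatibility claim rests on two mechanisms in the new class: an overridden `==` and `method_missing` delegation. The sketch below restates them in a trimmed stand-in (`SketchResult` is a hypothetical name, not the gem's class) to show where the wrapper behaves like an `Encoding` and where it does not:

```ruby
# Hypothetical stand-in for DetectionResult, trimmed to == and delegation.
class SketchResult
  attr_reader :encoding, :confidence

  def initialize(encoding, confidence)
    @encoding = encoding
    @confidence = confidence.to_f
  end

  # Compare by encoding so `result == Encoding::UTF_8` keeps working.
  def ==(other)
    case other
    when Encoding then @encoding == other
    when SketchResult then @encoding == other.encoding
    else super
    end
  end

  # Forward anything else (name, ascii_compatible?, ...) to the Encoding.
  def method_missing(name, *args, &block)
    @encoding.respond_to?(name) ? @encoding.public_send(name, *args, &block) : super
  end

  def respond_to_missing?(name, include_private = false)
    @encoding.respond_to?(name, include_private) || super
  end
end

result = SketchResult.new(Encoding::UTF_8, 0.9)
puts result == Encoding::UTF_8  # true
puts Encoding::UTF_8 == result  # false, Encoding#== compares identity
puts result.name                # UTF-8
puts result.is_a?(Encoding)     # false
```

The comparison is one-directional: callers that put the `Encoding` constant on the left, or that check `is_a?(Encoding)`, would likely need to switch to `result.encoding` after upgrading.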

data/lib/philiprehberger/encoding_kit/detector.rb CHANGED

@@ -13,29 +13,54 @@ module Philiprehberger
         ["\xFF\xFE".b,         Encoding::UTF_16LE]
       ].freeze
+      # Bytes in 0x80-0x9F that are defined in CP1252 but not in ISO-8859-1.
+      # These bytes are unmapped in ISO-8859-1, so their presence strongly
+      # suggests a Windows codepage.
+      CP1252_SPECIFIC = [
+        0x80, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88,
+        0x89, 0x8A, 0x8B, 0x8C, 0x8E, 0x91, 0x92, 0x93,
+        0x94, 0x95, 0x96, 0x97, 0x98, 0x99, 0x9A, 0x9B,
+        0x9C, 0x9E, 0x9F
+      ].freeze
+      # CP1250 (Central European) has specific characters in 0x80-0x9F
+      # that differ from CP1252. Common: 0x8A (S-caron), 0x8E (Z-caron),
+      # 0x9A (s-caron), 0x9E (z-caron).
+      CP1250_MARKERS = [0x8A, 0x8E, 0x9A, 0x9E].freeze
+      # CP1251 (Cyrillic) maps 0x80-0xFF almost entirely to Cyrillic letters.
+      # Bytes 0xC0-0xFF are Cyrillic А-я in CP1251.
+      CP1251_RANGE = (0xC0..0xFF)
       class << self
-        # Detect the encoding of a byte string.
+        # Detect the encoding of a byte string, returning a DetectionResult
+        # with encoding and confidence score.
         #
         # Strategy:
-        #   1. Check for a byte order mark (BOM)
-        #   2. Try UTF-8 validity
-        #   3. Check pure ASCII
-        #   4. Apply Latin-1 heuristic
-        #   5. Fall back to BINARY
+        #   1. Check for a byte order mark (BOM) - confidence 1.0
+        #   2. Try UTF-8 validity - confidence 0.9
+        #   3. Check pure ASCII - confidence 0.9
+        #   4. Check Windows codepages (CP1252, CP1250, CP1251) - confidence 0.6-0.7
+        #   5. Apply Latin-1 heuristic - confidence 0.7
+        #   6. Fall back to BINARY - confidence 0.5
         #
         # @param string [String] the input string (ideally with BINARY/ASCII-8BIT encoding)
-        # @return [Encoding] the detected encoding
+        # @return [DetectionResult] the detected encoding with confidence
         def call(string)
           bytes = string.b
-          bom_encoding = detect_bom(bytes)
-          return bom_encoding if bom_encoding
+          bom_result = detect_bom_with_confidence(bytes)
+          return bom_result if bom_result
-          return Encoding::UTF_8 if valid_utf8?(bytes)
-          return Encoding::US_ASCII if ascii_only?(bytes)
-          return Encoding::ISO_8859_1 if latin1_heuristic?(bytes)
+          return DetectionResult.new(Encoding::UTF_8, utf8_confidence(bytes)) if valid_utf8?(bytes)
+          return DetectionResult.new(Encoding::US_ASCII, 0.9) if ascii_only?(bytes)
-          Encoding::BINARY
+          codepage_result = detect_windows_codepage(bytes)
+          return codepage_result if codepage_result
+          return DetectionResult.new(Encoding::ISO_8859_1, 0.7) if latin1_heuristic?(bytes)
+          DetectionResult.new(Encoding::BINARY, 0.5)
         end
         # Check whether the string starts with a known BOM.
@@ -51,6 +76,32 @@ module Philiprehberger
         private
+        # Detect BOM and return a DetectionResult with confidence 1.0.
+        #
+        # @param bytes [String] binary string
+        # @return [DetectionResult, nil]
+        def detect_bom_with_confidence(bytes)
+          BOMS.each do |bom, encoding|
+            return DetectionResult.new(encoding, 1.0) if bytes.start_with?(bom)
+          end
+          nil
+        end
+        # Calculate UTF-8 confidence based on the ratio of multibyte sequences.
+        #
+        # @param bytes [String] binary string
+        # @return [Float] confidence between 0.8 and 0.9
+        def utf8_confidence(bytes)
+          total = bytes.bytesize.to_f
+          return 0.9 if total.zero?
+          high_bytes = bytes.each_byte.count { |b| b >= 128 }
+          ratio = high_bytes / total
+          # More multibyte chars = higher confidence it's genuinely UTF-8
+          ratio > 0.1 ? 0.9 : 0.85
+        end
         # @param bytes [String] binary string
         # @return [Boolean]
         def valid_utf8?(bytes)
@@ -64,6 +115,38 @@ module Philiprehberger
           bytes.each_byte.all? { |b| b < 128 }
         end
+        # Detect Windows codepages by checking for bytes in the 0x80-0x9F range.
+        #
+        # @param bytes [String] binary string
+        # @return [DetectionResult, nil]
+        def detect_windows_codepage(bytes)
+          high_control = bytes.each_byte.grep(0x80..0x9F)
+          return nil if high_control.empty?
+          # Check for CP1251 (Cyrillic): high ratio of bytes in 0xC0-0xFF
+          cyrillic_count = bytes.each_byte.count { |b| CP1251_RANGE.cover?(b) }
+          total_high = bytes.each_byte.count { |b| b >= 0x80 }
+          if total_high.positive? && (cyrillic_count.to_f / total_high) > 0.6
+            return DetectionResult.new(Encoding::Windows_1251, 0.65)
+          end
+          # Check for CP1250 (Central European): presence of specific marker bytes
+          cp1250_markers = high_control.count { |b| CP1250_MARKERS.include?(b) }
+          if cp1250_markers >= 2
+            return DetectionResult.new(Encoding::Windows_1250, 0.6)
+          end
+          # Default to CP1252 (Western European) if bytes in 0x80-0x9F are present
+          cp1252_count = high_control.count { |b| CP1252_SPECIFIC.include?(b) }
+          if cp1252_count.positive?
+            confidence = cp1252_count > 3 ? 0.7 : 0.6
+            return DetectionResult.new(Encoding::Windows_1252, confidence)
+          end
+          nil
+        end
         # Simple heuristic: if every byte is in the ISO-8859-1 printable range
         # (0x20..0x7E or 0xA0..0xFF) or is a common control character, treat as Latin-1.
         #
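
The decision order in `detect_windows_codepage` is easier to follow when restated as one standalone function. The sketch below uses the thresholds shown in the hunk; the method and constant names are hypothetical, and it returns a plain `[encoding, confidence]` pair instead of a `DetectionResult`:

```ruby
# Simplified restatement of the checks in detect_windows_codepage.
# Names ending in _SKETCH and guess_windows_codepage are hypothetical.
CP1250_MARKERS_SKETCH = [0x8A, 0x8E, 0x9A, 0x9E].freeze
# Same 27 bytes as CP1252_SPECIFIC: 0x80-0x9F minus the five undefined slots.
CP1252_SPECIFIC_SKETCH = ((0x80..0x9F).to_a - [0x81, 0x8D, 0x8F, 0x90, 0x9D]).freeze

def guess_windows_codepage(input)
  all = input.b.bytes
  high_control = all.select { |b| (0x80..0x9F).cover?(b) }
  return nil if high_control.empty? # nothing below runs without a 0x80-0x9F byte

  total_high = all.count { |b| b >= 0x80 }
  cyrillic = all.count { |b| (0xC0..0xFF).cover?(b) }
  return [Encoding::Windows_1251, 0.65] if cyrillic.to_f / total_high > 0.6

  markers = high_control.count { |b| CP1250_MARKERS_SKETCH.include?(b) }
  return [Encoding::Windows_1250, 0.6] if markers >= 2

  cp1252 = high_control.count { |b| CP1252_SPECIFIC_SKETCH.include?(b) }
  return [Encoding::Windows_1252, cp1252 > 3 ? 0.7 : 0.6] if cp1252.positive?

  nil
end

p guess_windows_codepage("\x93quote\x94")                     # Windows-1252 at 0.6
p guess_windows_codepage("\x93\xCF\xF0\xE8\xE2\xE5\xF2\x94")  # Windows-1251 at 0.65
p guess_windows_codepage("\xCF\xF0\xE8\xE2\xE5\xF2")          # nil
```

The last call is worth noting: Cyrillic text without any byte in 0x80-0x9F never reaches the CP1251 check, so under this logic it falls through to the later Latin-1 heuristic. The four CP1250 marker bytes are also members of the CP1252 list, which means two of them are enough to select Windows-1250 ahead of Windows-1252.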

data/lib/philiprehberger/encoding_kit/version.rb CHANGED

@@ -2,6 +2,6 @@
 module Philiprehberger
   module EncodingKit
-    VERSION = '0.1.1'
+    VERSION = '0.2.0'
   end
 end

data/lib/philiprehberger/encoding_kit.rb CHANGED

@@ -1,6 +1,7 @@
 # frozen_string_literal: true
 require_relative 'encoding_kit/version'
+require_relative 'encoding_kit/detection_result'
 require_relative 'encoding_kit/detector'
 require_relative 'encoding_kit/converter'
@@ -12,13 +13,79 @@ module Philiprehberger
     BOMS = Detector::BOMS
     # Detect the encoding of a string via BOM and heuristics.
+    # Returns a DetectionResult that delegates to the underlying Encoding,
+    # so it can be compared directly (e.g., result == Encoding::UTF_8)
+    # while also providing a confidence score via result.confidence.
     #
     # @param string [String] the input string
-    # @return [Encoding] the detected encoding
+    # @return [DetectionResult] the detected encoding with confidence score
     def self.detect(string)
       Detector.call(string)
     end
+    # Detect encoding from an IO stream by reading a sample of bytes.
+    # The IO position is restored after reading (if the IO supports seek).
+    #
+    # @param io [IO, StringIO] the IO object to read from
+    # @param sample_size [Integer] number of bytes to sample (default: 4096)
+    # @return [DetectionResult] the detected encoding with confidence score
+    def self.detect_stream(io, sample_size: 4096)
+      original_pos = io.respond_to?(:pos) ? io.pos : nil
+      sample = io.read(sample_size)
+      if original_pos && io.respond_to?(:seek)
+        io.seek(original_pos)
+      end
+      return DetectionResult.new(Encoding::BINARY, 0.5) if sample.nil? || sample.empty?
+      Detector.call(sample)
+    end
+    # Analyze a string and return detailed byte distribution statistics
+    # along with encoding candidates ranked by confidence.
+    #
+    # @param string [String] the input string
+    # @return [Hash] analysis results with keys :encoding, :confidence,
+    #   :printable_ratio, :ascii_ratio, :high_bytes, :candidates
+    def self.analyze(string)
+      bytes = string.b
+      total = bytes.bytesize.to_f
+      if total.zero?
+        return {
+          encoding: Encoding::BINARY,
+          confidence: 0.5,
+          printable_ratio: 0.0,
+          ascii_ratio: 0.0,
+          high_bytes: 0,
+          candidates: [{ encoding: Encoding::BINARY, confidence: 0.5 }]
+        }
+      end
+      ascii_count = 0
+      printable_count = 0
+      high_byte_count = 0
+      bytes.each_byte do |b|
+        ascii_count += 1 if b < 128
+        printable_count += 1 if (0x20..0x7E).cover?(b) || b == 0x09 || b == 0x0A || b == 0x0D
+        high_byte_count += 1 if b >= 128
+      end
+      primary = Detector.call(bytes)
+      candidates = build_candidates(bytes, primary)
+      {
+        encoding: primary.encoding,
+        confidence: primary.confidence,
+        printable_ratio: (printable_count / total).round(4),
+        ascii_ratio: (ascii_count / total).round(4),
+        high_bytes: high_byte_count,
+        candidates: candidates
+      }
+    end
     # Convert a string to UTF-8, auto-detecting source encoding if not specified.
     #
     # @param string [String] the input string
@@ -61,6 +128,23 @@ module Philiprehberger
       Converter.convert(string, from: from, to: to)
     end
+    # Transcode a string to the target encoding, auto-detecting the source.
+    # Simpler API for the most common conversion pattern.
+    #
+    # @param string [String] the input string
+    # @param to [String, Encoding] target encoding (default: UTF-8)
+    # @param fallback [Symbol] fallback strategy (:replace or :raise)
+    # @param replace [String] replacement character for invalid bytes
+    # @return [String] the transcoded string
+    # @raise [EncodingKit::Error] on conversion failure when fallback is :raise
+    def self.transcode(string, to: Encoding::UTF_8, fallback: :replace, replace: '?')
+      detected = Detector.call(string)
+      source = detected.encoding
+      target = to.is_a?(Encoding) ? to : Encoding.find(to.to_s)
+      Converter.convert(string, from: source, to: target, fallback: fallback, replace: replace)
+    end
     # Remove a byte order mark from the beginning of a string.
     #
     # @param string [String] the input string
@@ -84,5 +168,41 @@ module Philiprehberger
       bytes = string.b
       BOMS.any? { |bom, _encoding| bytes.start_with?(bom) }
     end
+    # Build a list of encoding candidates with confidence scores.
+    #
+    # @param bytes [String] binary string
+    # @param primary [DetectionResult] the primary detection result
+    # @return [Array<Hash>] candidates sorted by confidence (descending)
+    private_class_method def self.build_candidates(bytes, primary)
+      candidates = [{ encoding: primary.encoding, confidence: primary.confidence }]
+      # Check UTF-8 validity as a candidate
+      utf8_dup = bytes.dup.force_encoding(Encoding::UTF_8)
+      if utf8_dup.valid_encoding? && primary.encoding != Encoding::UTF_8
+        candidates << { encoding: Encoding::UTF_8, confidence: 0.6 }
+      end
+      # Check ASCII as a candidate
+      if bytes.each_byte.all? { |b| b < 128 } && primary.encoding != Encoding::US_ASCII
+        candidates << { encoding: Encoding::US_ASCII, confidence: 0.5 }
+      end
+      # Always consider Latin-1 for high-byte content
+      high_bytes = bytes.each_byte.any? { |b| b >= 128 }
+      if high_bytes && primary.encoding != Encoding::ISO_8859_1
+        candidates << { encoding: Encoding::ISO_8859_1, confidence: 0.5 }
+      end
+      # Consider Windows codepages for high-byte content
+      if high_bytes
+        has_control_high = bytes.each_byte.any? { |b| (0x80..0x9F).cover?(b) }
+        if has_control_high && primary.encoding != Encoding::Windows_1252
+          candidates << { encoding: Encoding::Windows_1252, confidence: 0.5 }
+        end
+      end
+      candidates.sort_by { |c| -c[:confidence] }
+    end
   end
 end
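
`detect_stream` samples the stream and then seeks back so the caller can keep reading from the same position. A minimal sketch of that read-and-rewind step on a `StringIO`; the helper name `peek_sample` is hypothetical, and the detection call itself is left out:

```ruby
require "stringio"

# Hypothetical helper: read up to sample_size bytes, then return to the start position.
def peek_sample(io, sample_size: 4096)
  start = io.respond_to?(:pos) ? io.pos : nil
  sample = io.read(sample_size)
  io.seek(start) if start && io.respond_to?(:seek)
  sample
end

io = StringIO.new("header,value\n1,2\n")
io.read(7)                         # the caller has already consumed "header,"
sample = peek_sample(io, sample_size: 5)
pos_after = io.pos
rest = io.read

puts sample     # value
puts pos_after  # 7, the position from before the sample was taken
puts rest       # the remainder is still readable from that position
```

One caveat, stated as an assumption rather than a tested claim about the gem: `respond_to?` only checks that the methods exist, so an unseekable source such as a pipe may still raise `Errno::ESPIPE` from `pos` or `seek`.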

metadata CHANGED

@@ -1,17 +1,18 @@
 --- !ruby/object:Gem::Specification
 name: philiprehberger-encoding_kit
 version: !ruby/object:Gem::Version
-  version: 0.1.1
+  version: 0.2.0
 platform: ruby
 authors:
 - Philip Rehberger
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2026-03-27 00:00:00.000000000 Z
+date: 2026-03-29 00:00:00.000000000 Z
 dependencies: []
-description: Detect encoding from BOM and heuristics, convert between encodings, normalize
-  to UTF-8, and strip byte order marks. Zero dependencies.
+description: Detect encoding from BOM and heuristics with confidence scores, convert
+  between encodings, normalize to UTF-8, analyze byte distributions, and handle Windows
+  codepages. Zero dependencies.
 email:
 - me@philiprehberger.com
 executables: []
@@ -23,6 +24,7 @@ files:
 - README.md
 - lib/philiprehberger/encoding_kit.rb
 - lib/philiprehberger/encoding_kit/converter.rb
+- lib/philiprehberger/encoding_kit/detection_result.rb
 - lib/philiprehberger/encoding_kit/detector.rb
 - lib/philiprehberger/encoding_kit/version.rb
 homepage: https://github.com/philiprehberger/rb-encoding-kit