RubyGems - smarter_csv - Versions diffs - 1.9.2 → 1.10.0 - Mend

smarter_csv 1.9.2 → 1.10.0


Files changed (17)

checksums.yaml +4 -4
data/CHANGELOG.md +23 -0
data/README.md +29 -8
data/lib/smarter_csv/auto_detection.rb +73 -0
data/lib/smarter_csv/file_io.rb +50 -0
data/lib/smarter_csv/hash_transformations.rb +91 -0
data/lib/smarter_csv/header_transformations.rb +63 -0
data/lib/smarter_csv/header_validations.rb +34 -0
data/lib/smarter_csv/headers.rb +68 -0
data/lib/smarter_csv/options_processing.rb +10 -1
data/lib/smarter_csv/parse.rb +90 -0
data/lib/smarter_csv/smarter_csv.rb +79 -416
data/lib/smarter_csv/variables.rb +30 -0
data/lib/smarter_csv/version.rb +1 -1
data/lib/smarter_csv.rb +16 -3
metadata +11 -4
data/lib/core_ext/hash.rb +0 -9

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 3e4032569303bd062a92b3c3f45f5166346808291667dda9ebd91af123f532ef
-  data.tar.gz: 78b73abc411d8ed866feae600b87b72c3c99fd3b00b67c81eac227c17f8d38ea
+  metadata.gz: f1d0b58acf0135b621e3182470674230ef73b48c829810e74fffa975fc318cf5
+  data.tar.gz: ee404c5c485748d35cda36b8d249cb6813a3f80005182fe8c05feac1694aba57
 SHA512:
-  metadata.gz: 1712951a2ce4f6e8ad93a6e76a105a3a8d4890babacfbb9ae3eead11ac638962d9da3d45421a327049e87c9d54b43c0dca1327f11a13bbd54440d3a7fefc6253
-  data.tar.gz: 3d8b81f04c8eb16a7b2ab9ddf27bdaf2b2bfdd2ee3a8b70765a88f809fc9869500debe950d8ec27e3a6af818e6f1e415d96d078e52784d638f1363619088faa3
+  metadata.gz: 4fee097fe2237f863510100155062da6815237260da5b15189f104f54596f7d5ff0479deb80596544e0bb1b9ba7b78126d2251798721e8d2f91e06b430950cd6
+  data.tar.gz: c30562965452ef296b5e5aaf2a9a12887aa42d8e8396780b73b34f99a2386d232bf020578618fcbd65186fc864518c81a3e7555cae9b00a005322f3599e18c5a

data/CHANGELOG.md CHANGED

@@ -1,6 +1,29 @@
 # SmarterCSV 1.x Change Log
+## 1.10.0 (2023-12-31) ⚡ BREAKING ⚡
+  * BREAKING CHANGES:
+    Changed behavior:
+     + when `user_provided_headers` are provided:
+       * if they are not unique, an exception will now be raised
+       * they are taken "as is", no header transformations can be applied
+       * when they are given as strings or as symbols, it is assumed that this is the desired format
+       * the value of the `strings_as_keys` options will be ignored
+     + option `duplicate_header_suffix` now defaults to `''` instead of `nil`.
+       * this allows automatic disambiguation when processing of CSV files with duplicate headers, by appending a number
+       * explicitly set this option to `nil` to get the behavior from previous versions.
+  * performance and memory improvements
+  * code refactor
+## 1.9.3 (2023-12-16)
+  * raise SmarterCSV::IncorrectOption when `user_provided_headers` are empty
+  * code refactor / no functional changes
+  * added test cases
 ## 1.9.2 (2023-11-12)
   * fixed bug with '\\' at end of line (issue #252, thanks to averycrespi-moz)
   * fixed require statements (issue #249, thanks to PikachuEXE, courtsimas)
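
The `duplicate_header_suffix` change in the 1.10.0 changelog entry can be illustrated with a standalone sketch. It does not call the gem: `disambiguate` and `resolve_headers` are hypothetical helper names, and `ArgumentError` stands in for the gem's `DuplicateHeaders` error. The sketch assumes that duplicate validation runs after disambiguation, which matches the README note that `nil` restores the error-raising behavior.

```ruby
# Standalone sketch; does not require the smarter_csv gem.
# `disambiguate` and `resolve_headers` are hypothetical helper names.

# Append a running number to the second and later occurrences of a header.
def disambiguate(headers, suffix)
  counts = Hash.new(0)
  headers.map do |header|
    counts[header] += 1
    counts[header] > 1 ? "#{header}#{suffix}#{counts[header]}" : header
  end
end

# With a non-nil suffix (the 1.10.0 default is ''), duplicates are renamed.
# With nil (the default before 1.10.0), duplicates are kept and then rejected.
def resolve_headers(headers, suffix)
  headers = disambiguate(headers, suffix) unless suffix.nil?
  duplicates = headers.tally.select { |_, n| n > 1 }.keys # tally needs Ruby 2.7+
  raise ArgumentError, "duplicate headers: #{duplicates.inspect}" unless duplicates.empty?
  headers
end

new_default = resolve_headers(%w[name email email], '')  # => ["name", "email", "email2"]
with_suffix = resolve_headers(%w[name email email], '_') # => ["name", "email", "email_2"]
```

Because an empty string is truthy in Ruby, the new default `''` enables renaming, whereas `nil` skips it.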

data/README.md CHANGED

@@ -2,15 +2,33 @@
 # SmarterCSV
  [![codecov](https://codecov.io/gh/tilo/smarter_csv/branch/main/graph/badge.svg?token=1L7OD80182)](https://codecov.io/gh/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
+#### LATEST CHANGES
+* Version 1.10.0 has BREAKING CHANGES:
+    Changed behavior:
+     + when `user_provided_headers` are provided:
+       * if they are not unique, an exception will now be raised
+       * they are taken "as is", no header transformations can be applied
+       * when they are given as strings or as symbols, it is assumed that this is the desired format
+       * the value of the `strings_as_keys` options will be ignored
+     + option `duplicate_header_suffix` now defaults to `''` instead of `nil`.
+       * this allows automatic disambiguation when processing of CSV files with duplicate headers, by appending a number
+       * explicitly set this option to `nil` to get the behavior from previous versions.
 #### Development Branches
 * default branch is `main` for 1.x development
-* 2.x development is on `2.0-development` (check this branch for 2.0 documentation)
+* 2.x development is on `2.0-development` (check this branch for 2.0 documentation)
+  - This is an EXPERIMENTAL branch - DO NOT USE in production
-#### Work towards Future Version 2.0
+#### Work towards Future Version 2.x
-* Work towards SmarterCSV 2.0 is still ongoing, with improved features, and more streamlined options, but consider it as experimental at this time.
+* Work towards SmarterCSV 2.x is still ongoing, with improved features, and more streamlined options, but consider it as experimental at this time.
   Please check the [2.0-develop branch](https://github.com/tilo/smarter_csv/tree/2.0-develop), open any issues and pull requests with mention of tag v2.0.
 ---------------
@@ -84,6 +102,10 @@ $ hexdump -C spec/fixtures/bom_test_feff.csv
 00000040  73 2c 35 36 37 38 0d 0a                           |s,5678..|
 ```
+### Articles
+* [Processing 1.4 Million CSV Records in Ruby, fast ](https://lcx.wien/blog/processing-14-million-csv-records-in-ruby/)
+* [Speeding up CSV parsing with parallel processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing)
 ### Examples
 Here are some examples to demonstrate the versatility of SmarterCSV.
@@ -243,8 +265,6 @@ NOTE: If you use `key_mappings` and `value_converters`, make sure that the value
     data[0][:price].class
       => Float
 ```
-## Parallel Processing
-[Jack](https://github.com/xjlin0) wrote an interesting article about [Speeding up CSV parsing with parallel processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing)
 ## Documentation
@@ -280,7 +300,8 @@ The options and the block are optional.
      | :headers_in_file            |   true   | Whether or not the file contains headers as the first line.                          |
      |                             |          | Important if the file does not contain headers,                                      |
      |                             |          | otherwise you would lose the first line of data.                                     |
-     | :duplicate_header_suffix    |   nil    | If set, adds numbers to duplicated headers and separates them by the given suffix    |
+     | :duplicate_header_suffix    |   ''     | Adds numbers to duplicated headers and separates them by the given suffix.           |
+     |                             |          | Set this to nil to raise `DuplicateHeaders` error instead (previous behavior)        |
      | :user_provided_headers      |   nil    | *careful with that axe!*                                                             |
      |                             |          | user provided Array of header strings or symbols, to define                          |
      |                             |          | what headers should be used, overriding any in-file headers.                         |
@@ -300,7 +321,7 @@ And header and data validations will also be supported in 2.x
      | Option                      | Default  |  Explanation                                                                         |
      ---------------------------------------------------------------------------------------------------------------------------------
      | :key_mapping                |   nil    | a hash which maps headers from the CSV file to keys in the result hash               |
-     | :silence_missing_key        |   false  | ignore missing keys in `key_mapping`                                   |
+     | :silence_missing_keys        |   false  | ignore missing keys in `key_mapping`                                   |
      |                             |          | if set to true: makes all mapped keys optional                         |
      |                             |          | if given an array, makes only the keys listed in it optional                         |
      | :required_keys              |   nil    | An array. Specify the required names AFTER header transformation.                  |

data/lib/smarter_csv/auto_detection.rb ADDED

@@ -0,0 +1,73 @@
+# frozen_string_literal: true
+module SmarterCSV
+  class << self
+    protected
+    # If file has headers, then guesses column separator from headers.
+    # Otherwise guesses column separator from contents.
+    # Raises exception if none is found.
+    def guess_column_separator(filehandle, options)
+      skip_lines(filehandle, options)
+      delimiters = [',', "\t", ';', ':', '|']
+      line = nil
+      has_header = options[:headers_in_file]
+      candidates = Hash.new(0)
+      count = has_header ? 1 : 5
+      count.times do
+        line = readline_with_counts(filehandle, options)
+        delimiters.each do |d|
+          candidates[d] += line.scan(d).count
+        end
+      rescue EOFError # short files
+        break
+      end
+      rewind(filehandle)
+      if candidates.values.max == 0
+        # if the header only contains
+        return ',' if line.chomp(options[:row_sep]) =~ /^\w+$/
+        raise SmarterCSV::NoColSepDetected
+      end
+      candidates.key(candidates.values.max)
+    end
+    # limitation: this currently reads the whole file in before making a decision
+    def guess_line_ending(filehandle, options)
+      counts = {"\n" => 0, "\r" => 0, "\r\n" => 0}
+      quoted_char = false
+      # count how many of the pre-defined line-endings we find
+      # ignoring those contained within quote characters
+      last_char = nil
+      lines = 0
+      filehandle.each_char do |c|
+        quoted_char = !quoted_char if c == options[:quote_char]
+        next if quoted_char
+        if last_char == "\r"
+          if c == "\n"
+            counts["\r\n"] += 1
+          else
+            counts["\r"] += 1 # \r are counted after they appeared
+          end
+        elsif c == "\n"
+          counts["\n"] += 1
+        end
+        last_char = c
+        lines += 1
+        break if options[:auto_row_sep_chars] && options[:auto_row_sep_chars] > 0 && lines >= options[:auto_row_sep_chars]
+      end
+      rewind(filehandle)
+      counts["\r"] += 1 if last_char == "\r"
+      # find the most frequent key/value pair:
+      most_frequent_key, _count = counts.max_by{|_, v| v}
+      most_frequent_key
+    end
+  end
+end
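
The two detection routines in this file can be approximated on in-memory strings. This is a simplified sketch rather than the gem's code: `guess_col_sep` and `guess_row_sep` are hypothetical names, the input is an array of lines or a string instead of a filehandle, and `ArgumentError` stands in for `SmarterCSV::NoColSepDetected`.

```ruby
# Standalone sketch on in-memory strings; the gem itself reads from a filehandle.
DELIMITERS = [',', "\t", ';', ':', '|'].freeze

# Count each candidate delimiter in the sampled lines and pick the most frequent.
def guess_col_sep(lines)
  counts = Hash.new(0)
  lines.each do |line|
    DELIMITERS.each { |d| counts[d] += line.scan(d).count }
  end
  max = counts.values.max
  if max.nil? || max.zero?
    # A single-column header such as "name" contains no delimiter at all.
    return ',' if lines.last.to_s.chomp =~ /\A\w+\z/
    raise ArgumentError, 'no column separator detected'
  end
  counts.key(max)
end

# Count "\n", "\r" and "\r\n" outside quoted fields and return the most frequent.
def guess_row_sep(text, quote_char = '"')
  counts = { "\n" => 0, "\r" => 0, "\r\n" => 0 }
  in_quotes = false
  last_char = nil
  text.each_char do |c|
    in_quotes = !in_quotes if c == quote_char
    next if in_quotes
    if last_char == "\r"
      # a "\r" is only classified once the following character is known
      if c == "\n"
        counts["\r\n"] += 1
      else
        counts["\r"] += 1
      end
    elsif c == "\n"
      counts["\n"] += 1
    end
    last_char = c
  end
  counts["\r"] += 1 if last_char == "\r" # trailing "\r" at end of input
  counts.max_by { |_, v| v }.first
end
```

Tracking whether the scan is inside a quoted field keeps embedded newlines in quoted values from being counted as row separators.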

data/lib/smarter_csv/file_io.rb ADDED

@@ -0,0 +1,50 @@
+# frozen_string_literal: true
+module SmarterCSV
+  class << self
+    protected
+    def readline_with_counts(filehandle, options)
+      line = filehandle.readline(options[:row_sep])
+      @file_line_count += 1
+      @csv_line_count += 1
+      line = remove_bom(line) if @csv_line_count == 1
+      line
+    end
+    def skip_lines(filehandle, options)
+      options[:skip_lines].to_i.times do
+        readline_with_counts(filehandle, options)
+      end
+    end
+    def rewind(filehandle)
+      @file_line_count = 0
+      @csv_line_count = 0
+      filehandle.rewind
+    end
+    private
+    UTF_32_BOM = %w[0 0 fe ff].freeze
+    UTF_32LE_BOM = %w[ff fe 0 0].freeze
+    UTF_8_BOM = %w[ef bb bf].freeze
+    UTF_16_BOM = %w[fe ff].freeze
+    UTF_16LE_BOM = %w[ff fe].freeze
+    def remove_bom(str)
+      str_as_hex = str.bytes.map{|x| x.to_s(16)}
+      # if string does not start with one of the bytes, there is no BOM
+      return str unless %w[ef fe ff 0].include?(str_as_hex[0])
+      return str.byteslice(4..-1) if [UTF_32_BOM, UTF_32LE_BOM].include?(str_as_hex[0..3])
+      return str.byteslice(3..-1) if str_as_hex[0..2] == UTF_8_BOM
+      return str.byteslice(2..-1) if [UTF_16_BOM, UTF_16LE_BOM].include?(str_as_hex[0..1])
+      # :nocov:
+      puts "SmarterCSV found unhandled BOM! #{str.chars[0..7].inspect}"
+      str
+      # :nocov:
+    end
+  end
+end
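
The BOM handling in `remove_bom` compares hex strings. An equivalent idea, sketched here with raw byte values, is shown for illustration only: `BOMS` and `strip_bom` are hypothetical names and are not part of the gem.

```ruby
# Standalone sketch: strip a leading byte-order mark by comparing raw bytes.
# Longer marks come first so that UTF-32 LE is not mistaken for UTF-16 LE.
BOMS = [
  [0x00, 0x00, 0xFE, 0xFF], # UTF-32 BE
  [0xFF, 0xFE, 0x00, 0x00], # UTF-32 LE
  [0xEF, 0xBB, 0xBF],       # UTF-8
  [0xFE, 0xFF],             # UTF-16 BE
  [0xFF, 0xFE]              # UTF-16 LE
].freeze

def strip_bom(str)
  bytes = str.bytes
  bom = BOMS.find { |candidate| bytes.first(candidate.size) == candidate }
  # byteslice works on bytes, so it is safe even if the BOM is not valid UTF-8
  bom ? str.byteslice(bom.size..-1) : str
end
```

As in the diff, the check only needs to run on the first line read from the file.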

data/lib/smarter_csv/hash_transformations.rb ADDED

@@ -0,0 +1,91 @@
+# frozen_string_literal: true
+module SmarterCSV
+  class << self
+    def hash_transformations(hash, options)
+      # there may be unmapped keys, or keys purposedly mapped to nil or an empty key..
+      # make sure we delete any key/value pairs from the hash, which the user wanted to delete:
+      remove_empty_values = options[:remove_empty_values] == true
+      remove_zero_values = options[:remove_zero_values]
+      remove_values_matching = options[:remove_values_matching]
+      convert_to_numeric = options[:convert_values_to_numeric]
+      value_converters = options[:value_converters]
+      hash.each_with_object({}) do |(k, v), new_hash|
+        next if k.nil? || k == '' || k == :""
+        next if remove_empty_values && (has_rails ? v.blank? : blank?(v))
+        next if remove_zero_values && v.is_a?(String) && v =~ /^(0+|0+\.0+)$/ # values are Strings
+        next if remove_values_matching && v =~ remove_values_matching
+        # deal with the :only / :except options to :convert_values_to_numeric
+        if convert_to_numeric && !limit_execution_for_only_or_except(options, :convert_values_to_numeric, k)
+          if v =~ /^[+-]?\d+\.\d+$/
+            v = v.to_f
+          elsif v =~ /^[+-]?\d+$/
+            v = v.to_i
+          end
+        end
+        converter = value_converters[k] if value_converters
+        v = converter.convert(v) if converter
+        new_hash[k] = v
+      end
+    end
+    # def hash_transformations(hash, options)
+    #   # there may be unmapped keys, or keys purposedly mapped to nil or an empty key..
+    #   # make sure we delete any key/value pairs from the hash, which the user wanted to delete:
+    #   hash.delete(nil)
+    #   hash.delete('')
+    #   hash.delete(:"")
+    #   if options[:remove_empty_values] == true
+    #     hash.delete_if{|_k, v| has_rails ? v.blank? : blank?(v)}
+    #   end
+    #   hash.delete_if{|_k, v| !v.nil? && v =~ /^(0+|0+\.0+)$/} if options[:remove_zero_values] # values are Strings
+    #   hash.delete_if{|_k, v| v =~ options[:remove_values_matching]} if options[:remove_values_matching]
+    #   if options[:convert_values_to_numeric]
+    #     hash.each do |k, v|
+    #       # deal with the :only / :except options to :convert_values_to_numeric
+    #       next if limit_execution_for_only_or_except(options, :convert_values_to_numeric, k)
+    #       # convert if it's a numeric value:
+    #       case v
+    #       when /^[+-]?\d+\.\d+$/
+    #         hash[k] = v.to_f
+    #       when /^[+-]?\d+$/
+    #         hash[k] = v.to_i
+    #       end
+    #     end
+    #   end
+    #   if options[:value_converters]
+    #     hash.each do |k, v|
+    #       converter = options[:value_converters][k]
+    #       next unless converter
+    #       hash[k] = converter.convert(v)
+    #     end
+    #   end
+    #   hash
+    # end
+    protected
+    # acts as a road-block to limit processing when iterating over all k/v pairs of a CSV-hash:
+    def limit_execution_for_only_or_except(options, option_name, key)
+      if options[option_name].is_a?(Hash)
+        if options[option_name].has_key?(:except)
+          return true if Array(options[option_name][:except]).include?(key)
+        elsif options[option_name].has_key?(:only)
+          return true unless Array(options[option_name][:only]).include?(key)
+        end
+      end
+      false
+    end
+  end
+end
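
The new `hash_transformations` builds the result in a single pass with `each_with_object` instead of repeatedly mutating the hash, as the commented-out version did. A reduced sketch of that approach follows. `skip_key?` and `transform_row` are hypothetical names, only two of the options are modeled, and the blank check is a plain-Ruby approximation of `blank?`.

```ruby
# Standalone sketch of the single-pass approach; names are hypothetical.

# Returns true when numeric conversion should be skipped for this key,
# based on an optional { only: ... } or { except: ... } hash.
def skip_key?(limits, key)
  return false unless limits.is_a?(Hash)
  if limits.key?(:except)
    Array(limits[:except]).include?(key)
  elsif limits.key?(:only)
    !Array(limits[:only]).include?(key)
  else
    false
  end
end

def transform_row(row, convert_values_to_numeric: true, remove_empty_values: true)
  row.each_with_object({}) do |(key, value), result|
    next if key.nil? || key == '' || key == :"" # keys mapped to nil or '' are dropped
    next if remove_empty_values && (value.nil? || value.to_s.strip.empty?)
    if convert_values_to_numeric && !skip_key?(convert_values_to_numeric, key)
      case value
      when /\A[+-]?\d+\.\d+\z/ then value = value.to_f
      when /\A[+-]?\d+\z/ then value = value.to_i
      end
    end
    result[key] = value
  end
end

row = { name: 'Ann', age: '42', price: '9.50', zip: '02134', note: '', nil => 'x' }
converted = transform_row(row, convert_values_to_numeric: { except: :zip })
```

Excluding `:zip` keeps the leading zero, which a conversion to Integer would otherwise lose.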

data/lib/smarter_csv/header_transformations.rb ADDED

@@ -0,0 +1,63 @@
+# frozen_string_literal: true
+module SmarterCSV
+  class << self
+    # transform the headers that were in the file:
+    def header_transformations(header_array, options)
+      header_array.map!{|x| x.gsub(%r/#{options[:quote_char]}/, '')}
+      header_array.map!{|x| x.strip} if options[:strip_whitespace]
+      unless options[:keep_original_headers]
+        header_array.map!{|x| x.gsub(/\s+|-+/, '_')}
+        header_array.map!{|x| x.downcase} if options[:downcase_header]
+      end
+      # detect duplicate headers and disambiguate
+      header_array = disambiguate_headers(header_array, options) if options[:duplicate_header_suffix]
+      # symbolize headers
+      header_array = header_array.map{|x| x.to_sym } unless options[:strings_as_keys] || options[:keep_original_headers]
+      # doesn't make sense to re-map when we have user_provided_headers
+      header_array = remap_headers(header_array, options) if options[:key_mapping]
+      header_array
+    end
+    def disambiguate_headers(headers, options)
+      counts = Hash.new(0)
+      headers.map do |header|
+        counts[header] += 1
+        counts[header] > 1 ? "#{header}#{options[:duplicate_header_suffix]}#{counts[header]}" : header
+      end
+    end
+    # do some key mapping on the keys in the file header
+    # if you want to completely delete a key, then map it to nil or to ''
+    def remap_headers(headers, options)
+      key_mapping = options[:key_mapping]
+      if key_mapping.empty? || !key_mapping.is_a?(Hash) || key_mapping.keys.empty?
+        raise(SmarterCSV::IncorrectOption, "ERROR: incorrect format for key_mapping! Expecting hash with from -> to mappings")
+      end
+      key_mapping = options[:key_mapping]
+      # if silence_missing_keys are not set, raise error if missing header
+      missing_keys = key_mapping.keys - headers
+      # if the user passes a list of speciffic mapped keys that are optional
+      missing_keys -= options[:silence_missing_keys] if options[:silence_missing_keys].is_a?(Array)
+      unless missing_keys.empty? || options[:silence_missing_keys] == true
+        raise SmarterCSV::KeyMappingError, "ERROR: can not map headers: #{missing_keys.join(', ')}"
+      end
+      headers.map! do |header|
+        if key_mapping.has_key?(header)
+          key_mapping[header].nil? ? nil : key_mapping[header]
+        elsif options[:remove_unmapped_keys]
+          nil
+        else
+          header
+        end
+      end
+      headers
+    end
+  end
+end
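
The key-mapping rules in `remap_headers` can be sketched in isolation. `remap` is a hypothetical name, the options are passed as keyword arguments rather than an options hash, and `KeyError` stands in for `SmarterCSV::KeyMappingError`.

```ruby
# Standalone sketch of the key-mapping rules; `remap` is a hypothetical name.
def remap(headers, key_mapping, silence_missing_keys: false, remove_unmapped_keys: false)
  # keys named in the mapping but absent from the file headers
  missing = key_mapping.keys - headers
  # an Array makes only the listed keys optional; true makes all of them optional
  missing -= silence_missing_keys if silence_missing_keys.is_a?(Array)
  unless missing.empty? || silence_missing_keys == true
    raise KeyError, "can not map headers: #{missing.join(', ')}"
  end

  headers.map do |header|
    if key_mapping.key?(header)
      key_mapping[header] # mapping a header to nil marks the column for removal
    elsif remove_unmapped_keys
      nil
    else
      header
    end
  end
end

mapped = remap(%i[first_name last_name age], { first_name: :given, age: nil })
```

Columns whose header ends up as `nil` are dropped later, when each row hash is transformed.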

data/lib/smarter_csv/header_validations.rb ADDED

@@ -0,0 +1,34 @@
+# frozen_string_literal: true
+module SmarterCSV
+  class << self
+    def header_validations(headers, options)
+      check_duplicate_headers(headers, options)
+      check_required_headers(headers, options)
+    end
+    def check_duplicate_headers(headers, _options)
+      header_counts = Hash.new(0)
+      headers.each { |header| header_counts[header] += 1 unless header.nil? }
+      duplicates = header_counts.select { |_, count| count > 1 }
+      unless duplicates.empty?
+        raise(SmarterCSV::DuplicateHeaders, "Duplicate Headers in CSV: #{duplicates.inspect}")
+      end
+    end
+    require 'set'
+    def check_required_headers(headers, options)
+      if options[:required_keys] && options[:required_keys].is_a?(Array)
+        headers_set = headers.to_set
+        missing_keys = options[:required_keys].select { |k| !headers_set.include?(k) }
+        unless missing_keys.empty?
+          raise SmarterCSV::MissingKeys, "ERROR: missing attributes: #{missing_keys.join(',')}"
+        end
+      end
+    end
+  end
+end
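
The required-key check compares against the headers after transformation, so the key type matters. A minimal sketch, using the hypothetical name `missing_required_keys`:

```ruby
require 'set'

# Standalone sketch: return the required keys that are not among the headers.
def missing_required_keys(headers, required_keys)
  present = headers.to_set # Set lookup avoids rescanning the header array per key
  Array(required_keys).reject { |key| present.include?(key) }
end

missing = missing_required_keys(%i[name email], %i[name phone]) # => [:phone]
```

A String such as `'name'` does not match the Symbol `:name`, so required keys should be given in the same form as the processed headers.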

data/lib/smarter_csv/headers.rb ADDED

@@ -0,0 +1,68 @@
+# frozen_string_literal: true
+module SmarterCSV
+  class << self
+    def process_headers(filehandle, options)
+      @raw_header = nil # header as it appears in the file
+      @headers = nil # the processed headers
+      header_array = []
+      file_header_size = nil
+      # if headers_in_file, get the headers -> We get the number of columns, even when user provided headers
+      if options[:headers_in_file] # extract the header line
+        # process the header line in the CSV file..
+        # the first line of a CSV file contains the header .. it might be commented out, so we need to read it anyhow
+        header_line = @raw_header = readline_with_counts(filehandle, options)
+        header_line = preprocess_header_line(header_line, options)
+        file_header_array, file_header_size = parse(header_line, options)
+        file_header_array = header_transformations(file_header_array, options)
+      else
+        unless options[:user_provided_headers]
+          raise SmarterCSV::IncorrectOption, "ERROR: If :headers_in_file is set to false, you have to provide :user_provided_headers"
+        end
+      end
+      if options[:user_provided_headers]
+        unless options[:user_provided_headers].is_a?(Array) && !options[:user_provided_headers].empty?
+          raise(SmarterCSV::IncorrectOption, "ERROR: incorrect format for user_provided_headers! Expecting array with headers.")
+        end
+        # use user-provided headers
+        user_header_array = options[:user_provided_headers]
+        # user_provided_headers: their count should match the headers_in_file if any
+        if defined?(file_header_size) && !file_header_size.nil?
+          if user_header_array.size != file_header_size
+            raise SmarterCSV::HeaderSizeMismatch, "ERROR: :user_provided_headers defines #{user_header_array.size} headers !=  CSV-file has #{file_header_size} headers"
+          else
+            # we could print out the mapping of file_header_array to header_array here
+          end
+        end
+        header_array = user_header_array
+      else
+        header_array = file_header_array
+      end
+      [header_array, header_array.size]
+    end
+    private
+    def preprocess_header_line(header_line, options)
+      header_line = enforce_utf8_encoding(header_line, options)
+      header_line = remove_comments_from_header(header_line, options)
+      header_line = header_line.chomp(options[:row_sep])
+      header_line.gsub!(options[:strip_chars_from_headers], '') if options[:strip_chars_from_headers]
+      header_line
+    end
+    def remove_comments_from_header(header, options)
+      return header unless options[:comment_regexp]
+      header.sub(options[:comment_regexp], '')
+    end
+  end
+end
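
The precedence between in-file headers and `user_provided_headers` in `process_headers` reduces to a few rules. The sketch below models only those rules. `choose_headers` is a hypothetical name, `file_headers` is `nil` when the file has no header line, and `ArgumentError` stands in for the gem's `IncorrectOption` and `HeaderSizeMismatch` errors.

```ruby
# Standalone sketch of the header precedence rules; names are hypothetical.
def choose_headers(file_headers, user_headers)
  if user_headers.nil?
    # without a header line in the file, user-provided headers are mandatory
    raise ArgumentError, 'no headers in file and no user-provided headers' if file_headers.nil?
    return file_headers
  end

  unless user_headers.is_a?(Array) && !user_headers.empty?
    raise ArgumentError, 'expecting a non-empty array of headers'
  end
  # when the file also has a header line, the column counts must agree
  if file_headers && user_headers.size != file_headers.size
    raise ArgumentError, "#{user_headers.size} user headers != #{file_headers.size} file headers"
  end
  user_headers # taken as is, no transformations applied
end

chosen = choose_headers(%i[a b], %w[First Second]) # => ["First", "Second"]
```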

data/lib/smarter_csv/options_processing.rb CHANGED

@@ -9,7 +9,7 @@ module SmarterCSV
     comment_regexp: nil, # was: /\A#/,
     convert_values_to_numeric: true,
     downcase_header: true,
-    duplicate_header_suffix: nil,
+    duplicate_header_suffix: '', # was: nil,
     file_encoding: 'utf-8',
     force_simple_split: false,
     force_utf8: false,
@@ -62,6 +62,15 @@ module SmarterCSV
     private
     def validate_options!(options)
+      # deprecate required_headers
+      unless options[:required_headers].nil?
+        puts "DEPRECATION WARNING: please use 'required_keys' instead of 'required_headers'"
+        if options[:required_keys].nil?
+          options[:required_keys] = options[:required_headers]
+          options[:required_headers] = nil
+        end
+      end
       keys = options.keys
       errors = []
       errors << "invalid row_sep" if keys.include?(:row_sep) && !option_valid?(options[:row_sep])
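
The deprecation shim added to `validate_options!` can be exercised on a plain options hash. `migrate_required_headers!` is a hypothetical name, and the sketch uses `warn` (stderr) where the diff uses `puts`.

```ruby
# Standalone sketch of the deprecation shim; the method name is hypothetical.
def migrate_required_headers!(options)
  return options if options[:required_headers].nil?

  warn "DEPRECATION WARNING: please use 'required_keys' instead of 'required_headers'"
  # only copy the old option over when the new one was not given
  if options[:required_keys].nil?
    options[:required_keys] = options[:required_headers]
    options[:required_headers] = nil
  end
  options
end

legacy = migrate_required_headers!({ required_headers: [:name] })
```

If both options are supplied, `required_keys` takes precedence and `required_headers` is left untouched.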

data/lib/smarter_csv/parse.rb ADDED

@@ -0,0 +1,90 @@
+# frozen_string_literal: true
+module SmarterCSV
+  class << self
+    protected
+    ###
+    ### Thin wrapper around C-extension
+    ###
+    def parse(line, options, header_size = nil)
+      # puts "SmarterCSV.parse OPTIONS: #{options[:acceleration]}" if options[:verbose]
+      if options[:acceleration] && has_acceleration?
+        # :nocov:
+        has_quotes = line =~ /#{options[:quote_char]}/
+        elements = parse_csv_line_c(line, options[:col_sep], options[:quote_char], header_size)
+        elements.map!{|x| cleanup_quotes(x, options[:quote_char])} if has_quotes
+        [elements, elements.size]
+        # :nocov:
+      else
+        # puts "WARNING: SmarterCSV is using un-accelerated parsing of lines. Check options[:acceleration]"
+        parse_csv_line_ruby(line, options, header_size)
+      end
+    end
+    # ------------------------------------------------------------------
+    # Ruby equivalent of the C-extension for parse_line
+    #
+    # parses a single line: either a CSV header and body line
+    # - quoting rules compared to RFC-4180 are somewhat relaxed
+    # - we are not assuming that quotes inside a fields need to be doubled
+    # - we are not assuming that all fields need to be quoted (0 is even)
+    # - works with multi-char col_sep
+    # - if header_size is given, only up to header_size fields are parsed
+    #
+    # We use header_size for parsing the body lines to make sure we always match the number of headers
+    # in case there are trailing col_sep characters in line
+    #
+    # Our convention is that empty fields are returned as empty strings, not as nil.
+    #
+    #
+    # the purpose of the max_size parameter is to handle a corner case where
+    # CSV lines contain more fields than the header.
+    # In which case the remaining fields in the line are ignored
+    #
+    def parse_csv_line_ruby(line, options, header_size = nil)
+      return [] if line.nil?
+      line_size = line.size
+      col_sep = options[:col_sep]
+      col_sep_size = col_sep.size
+      quote = options[:quote_char]
+      quote_count = 0
+      elements = []
+      start = 0
+      i = 0
+      previous_char = ''
+      while i < line_size
+        if line[i...i+col_sep_size] == col_sep && quote_count.even?
+          break if !header_size.nil? && elements.size >= header_size
+          elements << cleanup_quotes(line[start...i], quote)
+          previous_char = line[i]
+          i += col_sep.size
+          start = i
+        else
+          quote_count += 1 if line[i] == quote && previous_char != '\\'
+          previous_char = line[i]
+          i += 1
+        end
+      end
+      elements << cleanup_quotes(line[start..-1], quote) if header_size.nil? || elements.size < header_size
+      [elements, elements.size]
+    end
+    def cleanup_quotes(field, quote)
+      return field if field.nil?
+      # return if field !~ /#{quote}/ # this check can probably eliminated
+      if field.start_with?(quote) && field.end_with?(quote)
+        field.delete_prefix!(quote)
+        field.delete_suffix!(quote)
+      end
+      field.gsub!("#{quote}#{quote}", quote)
+      field
+    end
+  end
+end
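
The pure-Ruby line parser decides whether a separator is real by checking that the number of quote characters seen so far is even. The following reduced sketch shows that technique. `split_csv_line` and `unquote` are hypothetical names, options are keyword arguments, and only the field array is returned, not the field count.

```ruby
# Standalone sketch of quote-aware splitting; names are hypothetical.
def split_csv_line(line, col_sep: ',', quote: '"', max_fields: nil)
  fields = []
  start = 0
  i = 0
  quote_count = 0
  previous = ''
  while i < line.size
    # a separator only counts when we are outside a quoted field
    if quote_count.even? && line[i, col_sep.size] == col_sep
      break if max_fields && fields.size >= max_fields
      fields << unquote(line[start...i], quote)
      previous = line[i]
      i += col_sep.size
      start = i
    else
      quote_count += 1 if line[i] == quote && previous != '\\' # skip escaped quotes
      previous = line[i]
      i += 1
    end
  end
  # the last field has no trailing separator
  fields << unquote(line[start..-1], quote) if max_fields.nil? || fields.size < max_fields
  fields
end

# Remove enclosing quotes and collapse doubled quotes inside the field.
def unquote(field, quote)
  if field.start_with?(quote) && field.end_with?(quote)
    field = field.delete_prefix(quote).delete_suffix(quote)
  end
  field.gsub("#{quote}#{quote}", quote)
end

fields = split_csv_line('a,"b,c",d') # => ["a", "b,c", "d"]
```

Passing `max_fields` mirrors the `header_size` argument in the diff: extra fields and trailing separators beyond the header count are ignored.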