RubyGems - smarter_csv - Versions diffs - 1.9.2 → 1.10.0 - Mend

smarter_csv 1.9.2 → 1.10.0

Files changed (17)

checksums.yaml +4 -4
data/CHANGELOG.md +23 -0
data/README.md +29 -8
data/lib/smarter_csv/auto_detection.rb +73 -0
data/lib/smarter_csv/file_io.rb +50 -0
data/lib/smarter_csv/hash_transformations.rb +91 -0
data/lib/smarter_csv/header_transformations.rb +63 -0
data/lib/smarter_csv/header_validations.rb +34 -0
data/lib/smarter_csv/headers.rb +68 -0
data/lib/smarter_csv/options_processing.rb +10 -1
data/lib/smarter_csv/parse.rb +90 -0
data/lib/smarter_csv/smarter_csv.rb +79 -416
data/lib/smarter_csv/variables.rb +30 -0
data/lib/smarter_csv/version.rb +1 -1
data/lib/smarter_csv.rb +16 -3
metadata +11 -4
data/lib/core_ext/hash.rb +0 -9

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 3e4032569303bd062a92b3c3f45f5166346808291667dda9ebd91af123f532ef
-  data.tar.gz: 78b73abc411d8ed866feae600b87b72c3c99fd3b00b67c81eac227c17f8d38ea
+  metadata.gz: f1d0b58acf0135b621e3182470674230ef73b48c829810e74fffa975fc318cf5
+  data.tar.gz: ee404c5c485748d35cda36b8d249cb6813a3f80005182fe8c05feac1694aba57
 SHA512:
-  metadata.gz: 1712951a2ce4f6e8ad93a6e76a105a3a8d4890babacfbb9ae3eead11ac638962d9da3d45421a327049e87c9d54b43c0dca1327f11a13bbd54440d3a7fefc6253
-  data.tar.gz: 3d8b81f04c8eb16a7b2ab9ddf27bdaf2b2bfdd2ee3a8b70765a88f809fc9869500debe950d8ec27e3a6af818e6f1e415d96d078e52784d638f1363619088faa3
+  metadata.gz: 4fee097fe2237f863510100155062da6815237260da5b15189f104f54596f7d5ff0479deb80596544e0bb1b9ba7b78126d2251798721e8d2f91e06b430950cd6
+  data.tar.gz: c30562965452ef296b5e5aaf2a9a12887aa42d8e8396780b73b34f99a2386d232bf020578618fcbd65186fc864518c81a3e7555cae9b00a005322f3599e18c5a

data/CHANGELOG.md CHANGED

@@ -1,6 +1,29 @@
 # SmarterCSV 1.x Change Log
+## 1.10.0 (2023-12-31) ⚡ BREAKING ⚡
+  * BREAKING CHANGES:
+    Changed behavior:
+     + when `user_provided_headers` are provided:
+       * if they are not unique, an exception will now be raised
+       * they are taken "as is", no header transformations can be applied
+       * when they are given as strings or as symbols, it is assumed that this is the desired format
+       * the value of the `strings_as_keys` options will be ignored
+     + option `duplicate_header_suffix` now defaults to `''` instead of `nil`.
+       * this allows automatic disambiguation when processing of CSV files with duplicate headers, by appending a number
+       * explicitly set this option to `nil` to get the behavior from previous versions.
+  * performance and memory improvements
+  * code refactor
+## 1.9.3 (2023-12-16)
+  * raise SmarterCSV::IncorrectOption when `user_provided_headers` are empty
+  * code refactor / no functional changes
+  * added test cases
 ## 1.9.2 (2023-11-12)
   * fixed bug with '\\' at end of line (issue #252, thanks to averycrespi-moz)
   * fixed require statements (issue #249, thanks to PikachuEXE, courtsimas)

data/README.md CHANGED

@@ -2,15 +2,33 @@
 # SmarterCSV
  [![codecov](https://codecov.io/gh/tilo/smarter_csv/branch/main/graph/badge.svg?token=1L7OD80182)](https://codecov.io/gh/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
+#### LATEST CHANGES
+* Version 1.10.0 has BREAKING CHANGES:
+    Changed behavior:
+     + when `user_provided_headers` are provided:
+       * if they are not unique, an exception will now be raised
+       * they are taken "as is", no header transformations can be applied
+       * when they are given as strings or as symbols, it is assumed that this is the desired format
+       * the value of the `strings_as_keys` options will be ignored
+     + option `duplicate_header_suffix` now defaults to `''` instead of `nil`.
+       * this allows automatic disambiguation when processing of CSV files with duplicate headers, by appending a number
+       * explicitly set this option to `nil` to get the behavior from previous versions.
 #### Development Branches
 * default branch is `main` for 1.x development
-* 2.x development is on `2.0-development` (check this branch for 2.0 documentation)
+* 2.x development is on `2.0-development` (check this branch for 2.0 documentation)
+  - This is an EXPERIMENTAL branch - DO NOT USE in production
-#### Work towards Future Version 2.0
+#### Work towards Future Version 2.x
-* Work towards SmarterCSV 2.0 is still ongoing, with improved features, and more streamlined options, but consider it as experimental at this time.
+* Work towards SmarterCSV 2.x is still ongoing, with improved features, and more streamlined options, but consider it as experimental at this time.
   Please check the [2.0-develop branch](https://github.com/tilo/smarter_csv/tree/2.0-develop), open any issues and pull requests with mention of tag v2.0.
 ---------------
@@ -84,6 +102,10 @@ $ hexdump -C spec/fixtures/bom_test_feff.csv
 00000040  73 2c 35 36 37 38 0d 0a                           |s,5678..|
 ```
+### Articles
+* [Processing 1.4 Million CSV Records in Ruby, fast ](https://lcx.wien/blog/processing-14-million-csv-records-in-ruby/)
+* [Speeding up CSV parsing with parallel processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing)
 ### Examples
 Here are some examples to demonstrate the versatility of SmarterCSV.
@@ -243,8 +265,6 @@ NOTE: If you use `key_mappings` and `value_converters`, make sure that the value
     data[0][:price].class
       => Float
 ```
-## Parallel Processing
-[Jack](https://github.com/xjlin0) wrote an interesting article about [Speeding up CSV parsing with parallel processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing)
 ## Documentation
@@ -280,7 +300,8 @@ The options and the block are optional.
      | :headers_in_file            |   true   | Whether or not the file contains headers as the first line.                          |
      |                             |          | Important if the file does not contain headers,                                      |
      |                             |          | otherwise you would lose the first line of data.                                     |
-     | :duplicate_header_suffix    |   nil    | If set, adds numbers to duplicated headers and separates them by the given suffix    |
+     | :duplicate_header_suffix    |   ''     | Adds numbers to duplicated headers and separates them by the given suffix.           |
+     |                             |          | Set this to nil to raise `DuplicateHeaders` error instead (previous behavior)        |
      | :user_provided_headers      |   nil    | *careful with that axe!*                                                             |
      |                             |          | user provided Array of header strings or symbols, to define                          |
      |                             |          | what headers should be used, overriding any in-file headers.                         |
@@ -300,7 +321,7 @@ And header and data validations will also be supported in 2.x
      | Option                      | Default  |  Explanation                                                                         |
      ---------------------------------------------------------------------------------------------------------------------------------
      | :key_mapping                |   nil    | a hash which maps headers from the CSV file to keys in the result hash               |
-     | :silence_missing_key        |   false  | ignore missing keys in `key_mapping`                                   |
+     | :silence_missing_keys        |   false  | ignore missing keys in `key_mapping`                                   |
      |                             |          | if set to true: makes all mapped keys optional                         |
      |                             |          | if given an array, makes only the keys listed in it optional                         |
      | :required_keys              |   nil    | An array. Specify the required names AFTER header transformation.                  |

data/lib/smarter_csv/auto_detection.rb ADDED

@@ -0,0 +1,73 @@
+# frozen_string_literal: true
+module SmarterCSV
+  class << self
+    protected
+    # If file has headers, then guesses column separator from headers.
+    # Otherwise guesses column separator from contents.
+    # Raises exception if none is found.
+    def guess_column_separator(filehandle, options)
+      skip_lines(filehandle, options)
+      delimiters = [',', "\t", ';', ':', '|']
+      line = nil
+      has_header = options[:headers_in_file]
+      candidates = Hash.new(0)
+      count = has_header ? 1 : 5
+      count.times do
+        line = readline_with_counts(filehandle, options)
+        delimiters.each do |d|
+          candidates[d] += line.scan(d).count
+        end
+      rescue EOFError # short files
+        break
+      end
+      rewind(filehandle)
+      if candidates.values.max == 0
+        # if the header only contains a single column name (word characters only), default to ','
+        return ',' if line.chomp(options[:row_sep]) =~ /^\w+$/
+        raise SmarterCSV::NoColSepDetected
+      end
+      candidates.key(candidates.values.max)
+    end
+    # limitation: this currently reads the whole file in before making a decision
+    def guess_line_ending(filehandle, options)
+      counts = {"\n" => 0, "\r" => 0, "\r\n" => 0}
+      quoted_char = false
+      # count how many of the pre-defined line-endings we find
+      # ignoring those contained within quote characters
+      last_char = nil
+      lines = 0
+      filehandle.each_char do |c|
+        quoted_char = !quoted_char if c == options[:quote_char]
+        next if quoted_char
+        if last_char == "\r"
+          if c == "\n"
+            counts["\r\n"] += 1
+          else
+            counts["\r"] += 1 # \r are counted after they appeared
+          end
+        elsif c == "\n"
+          counts["\n"] += 1
+        end
+        last_char = c
+        lines += 1
+        break if options[:auto_row_sep_chars] && options[:auto_row_sep_chars] > 0 && lines >= options[:auto_row_sep_chars]
+      end
+      rewind(filehandle)
+      counts["\r"] += 1 if last_char == "\r"
+      # find the most frequent key/value pair:
+      most_frequent_key, _count = counts.max_by{|_, v| v}
+      most_frequent_key
+    end
+  end
+end
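
Reviewer note: the counting strategy in `guess_column_separator` can be tried in isolation. The following is a minimal sketch, assuming the lines have already been read into an array; `guess_separator` and the `ArgumentError` are hypothetical stand-ins for the gem's protected method and its `SmarterCSV::NoColSepDetected` error, and the single-word-header fallback is left out.

```ruby
# frozen_string_literal: true

# Candidate delimiters, in the same order as in the diff above.
DELIMITERS = [',', "\t", ';', ':', '|'].freeze

# Hypothetical standalone helper (not the gem's API): counts how often each
# candidate delimiter occurs in the sample lines and returns the most frequent.
def guess_separator(lines)
  candidates = Hash.new(0)
  lines.each do |line|
    # String#scan with a String pattern matches it literally, so '|' is safe.
    DELIMITERS.each { |d| candidates[d] += line.scan(d).count }
  end
  best = candidates.values.max
  raise ArgumentError, 'no column separator detected' if best.nil? || best.zero?

  candidates.key(best)
end

puts guess_separator(["name;age;city\n"]).inspect
puts guess_separator(["id\tname\n", "1\tAda\n"]).inspect
```

On a tie, `Hash#key` returns the first matching entry, so the order of `DELIMITERS` decides which separator wins.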

data/lib/smarter_csv/file_io.rb ADDED

@@ -0,0 +1,50 @@
+# frozen_string_literal: true
+module SmarterCSV
+  class << self
+    protected
+    def readline_with_counts(filehandle, options)
+      line = filehandle.readline(options[:row_sep])
+      @file_line_count += 1
+      @csv_line_count += 1
+      line = remove_bom(line) if @csv_line_count == 1
+      line
+    end
+    def skip_lines(filehandle, options)
+      options[:skip_lines].to_i.times do
+        readline_with_counts(filehandle, options)
+      end
+    end
+    def rewind(filehandle)
+      @file_line_count = 0
+      @csv_line_count = 0
+      filehandle.rewind
+    end
+    private
+    UTF_32_BOM = %w[0 0 fe ff].freeze
+    UTF_32LE_BOM = %w[ff fe 0 0].freeze
+    UTF_8_BOM = %w[ef bb bf].freeze
+    UTF_16_BOM = %w[fe ff].freeze
+    UTF_16LE_BOM = %w[ff fe].freeze
+    def remove_bom(str)
+      str_as_hex = str.bytes.map{|x| x.to_s(16)}
+      # if string does not start with one of the bytes, there is no BOM
+      return str unless %w[ef fe ff 0].include?(str_as_hex[0])
+      return str.byteslice(4..-1) if [UTF_32_BOM, UTF_32LE_BOM].include?(str_as_hex[0..3])
+      return str.byteslice(3..-1) if str_as_hex[0..2] == UTF_8_BOM
+      return str.byteslice(2..-1) if [UTF_16_BOM, UTF_16LE_BOM].include?(str_as_hex[0..1])
+      # :nocov:
+      puts "SmarterCSV found unhandled BOM! #{str.chars[0..7].inspect}"
+      str
+      # :nocov:
+    end
+  end
+end
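
Reviewer note: `remove_bom` compares hex strings of the leading bytes. As a rough illustration of the same idea for the UTF-8 case only, the sketch below compares the raw bytes directly; `strip_utf8_bom` is a hypothetical helper, not part of the gem, and it does not cover the UTF-16 or UTF-32 marks handled above.

```ruby
# frozen_string_literal: true

# Byte sequence of the UTF-8 byte order mark.
UTF_8_BOM_BYTES = [0xEF, 0xBB, 0xBF].freeze

# Hypothetical helper: returns the string without a leading UTF-8 BOM,
# or the string unchanged when no BOM is present.
def strip_utf8_bom(str)
  return str unless str.bytes.first(3) == UTF_8_BOM_BYTES

  str.byteslice(3..-1)
end

first_line = "\xEF\xBB\xBFname,age\n"
puts strip_utf8_bom(first_line).inspect
puts strip_utf8_bom("name,age\n").inspect
```

As in the gem, the check only makes sense for the first line of a file, which is why `readline_with_counts` applies it when `@csv_line_count == 1`.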

data/lib/smarter_csv/hash_transformations.rb ADDED

@@ -0,0 +1,91 @@
+# frozen_string_literal: true
+module SmarterCSV
+  class << self
+    def hash_transformations(hash, options)
+      # there may be unmapped keys, or keys purposely mapped to nil or an empty key..
+      # make sure we delete any key/value pairs from the hash, which the user wanted to delete:
+      remove_empty_values = options[:remove_empty_values] == true
+      remove_zero_values = options[:remove_zero_values]
+      remove_values_matching = options[:remove_values_matching]
+      convert_to_numeric = options[:convert_values_to_numeric]
+      value_converters = options[:value_converters]
+      hash.each_with_object({}) do |(k, v), new_hash|
+        next if k.nil? || k == '' || k == :""
+        next if remove_empty_values && (has_rails ? v.blank? : blank?(v))
+        next if remove_zero_values && v.is_a?(String) && v =~ /^(0+|0+\.0+)$/ # values are Strings
+        next if remove_values_matching && v =~ remove_values_matching
+        # deal with the :only / :except options to :convert_values_to_numeric
+        if convert_to_numeric && !limit_execution_for_only_or_except(options, :convert_values_to_numeric, k)
+          if v =~ /^[+-]?\d+\.\d+$/
+            v = v.to_f
+          elsif v =~ /^[+-]?\d+$/
+            v = v.to_i
+          end
+        end
+        converter = value_converters[k] if value_converters
+        v = converter.convert(v) if converter
+        new_hash[k] = v
+      end
+    end
+    # def hash_transformations(hash, options)
+    #   # there may be unmapped keys, or keys purposely mapped to nil or an empty key..
+    #   # make sure we delete any key/value pairs from the hash, which the user wanted to delete:
+    #   hash.delete(nil)
+    #   hash.delete('')
+    #   hash.delete(:"")
+    #   if options[:remove_empty_values] == true
+    #     hash.delete_if{|_k, v| has_rails ? v.blank? : blank?(v)}
+    #   end
+    #   hash.delete_if{|_k, v| !v.nil? && v =~ /^(0+|0+\.0+)$/} if options[:remove_zero_values] # values are Strings
+    #   hash.delete_if{|_k, v| v =~ options[:remove_values_matching]} if options[:remove_values_matching]
+    #   if options[:convert_values_to_numeric]
+    #     hash.each do |k, v|
+    #       # deal with the :only / :except options to :convert_values_to_numeric
+    #       next if limit_execution_for_only_or_except(options, :convert_values_to_numeric, k)
+    #       # convert if it's a numeric value:
+    #       case v
+    #       when /^[+-]?\d+\.\d+$/
+    #         hash[k] = v.to_f
+    #       when /^[+-]?\d+$/
+    #         hash[k] = v.to_i
+    #       end
+    #     end
+    #   end
+    #   if options[:value_converters]
+    #     hash.each do |k, v|
+    #       converter = options[:value_converters][k]
+    #       next unless converter
+    #       hash[k] = converter.convert(v)
+    #     end
+    #   end
+    #   hash
+    # end
+    protected
+    # acts as a road-block to limit processing when iterating over all k/v pairs of a CSV-hash:
+    def limit_execution_for_only_or_except(options, option_name, key)
+      if options[option_name].is_a?(Hash)
+        if options[option_name].has_key?(:except)
+          return true if Array(options[option_name][:except]).include?(key)
+        elsif options[option_name].has_key?(:only)
+          return true unless Array(options[option_name][:only]).include?(key)
+        end
+      end
+      false
+    end
+  end
+end
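
Reviewer note: the interplay of `convert_values_to_numeric` with its `:only` / `:except` sub-options is easier to follow outside the single-pass loop. This is a minimal sketch under stated assumptions: `skip_key?`, `to_numeric` and `convert_row` are hypothetical names, and the patterns use `\A`/`\z` anchors where the gem uses `^`/`$`.

```ruby
# frozen_string_literal: true

# Hypothetical equivalent of limit_execution_for_only_or_except:
# true means "leave this key's value untouched".
def skip_key?(option, key)
  return false unless option.is_a?(Hash)

  if option.key?(:except)
    Array(option[:except]).include?(key)
  elsif option.key?(:only)
    !Array(option[:only]).include?(key)
  else
    false
  end
end

# Converts strings that look like floats or integers; anything else passes through.
def to_numeric(value)
  case value
  when /\A[+-]?\d+\.\d+\z/ then value.to_f
  when /\A[+-]?\d+\z/ then value.to_i
  else value
  end
end

# Builds a new hash in one pass, as the refactored method does.
def convert_row(row, option)
  row.each_with_object({}) do |(key, value), out|
    out[key] = skip_key?(option, key) ? value : to_numeric(value)
  end
end

row = { id: '42', price: '9.99', zip: '01234' }
p convert_row(row, { except: :zip }) # zip keeps its leading zero as a String
p convert_row(row, { only: [:id] })  # only id becomes an Integer
```

Excluding a column such as a ZIP code matters because `'01234'.to_i` would drop the leading zero.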

data/lib/smarter_csv/header_transformations.rb ADDED

@@ -0,0 +1,63 @@
+# frozen_string_literal: true
+module SmarterCSV
+  class << self
+    # transform the headers that were in the file:
+    def header_transformations(header_array, options)
+      header_array.map!{|x| x.gsub(%r/#{options[:quote_char]}/, '')}
+      header_array.map!{|x| x.strip} if options[:strip_whitespace]
+      unless options[:keep_original_headers]
+        header_array.map!{|x| x.gsub(/\s+|-+/, '_')}
+        header_array.map!{|x| x.downcase} if options[:downcase_header]
+      end
+      # detect duplicate headers and disambiguate
+      header_array = disambiguate_headers(header_array, options) if options[:duplicate_header_suffix]
+      # symbolize headers
+      header_array = header_array.map{|x| x.to_sym } unless options[:strings_as_keys] || options[:keep_original_headers]
+      # doesn't make sense to re-map when we have user_provided_headers
+      header_array = remap_headers(header_array, options) if options[:key_mapping]
+      header_array
+    end
+    def disambiguate_headers(headers, options)
+      counts = Hash.new(0)
+      headers.map do |header|
+        counts[header] += 1
+        counts[header] > 1 ? "#{header}#{options[:duplicate_header_suffix]}#{counts[header]}" : header
+      end
+    end
+    # do some key mapping on the keys in the file header
+    # if you want to completely delete a key, then map it to nil or to ''
+    def remap_headers(headers, options)
+      key_mapping = options[:key_mapping]
+      if key_mapping.empty? || !key_mapping.is_a?(Hash) || key_mapping.keys.empty?
+        raise(SmarterCSV::IncorrectOption, "ERROR: incorrect format for key_mapping! Expecting hash with from -> to mappings")
+      end
+      key_mapping = options[:key_mapping]
+      # if silence_missing_keys are not set, raise error if missing header
+      missing_keys = key_mapping.keys - headers
+      # if the user passes a list of specific mapped keys that are optional
+      missing_keys -= options[:silence_missing_keys] if options[:silence_missing_keys].is_a?(Array)
+      unless missing_keys.empty? || options[:silence_missing_keys] == true
+        raise SmarterCSV::KeyMappingError, "ERROR: can not map headers: #{missing_keys.join(', ')}"
+      end
+      headers.map! do |header|
+        if key_mapping.has_key?(header)
+          key_mapping[header].nil? ? nil : key_mapping[header]
+        elsif options[:remove_unmapped_keys]
+          nil
+        else
+          header
+        end
+      end
+      headers
+    end
+  end
+end
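
Reviewer note: the new default `duplicate_header_suffix: ''` means repeated headers are numbered rather than rejected. The sketch below reproduces the logic of `disambiguate_headers` as a standalone function so the effect of different suffixes can be seen; `disambiguate` is a hypothetical name and takes the suffix directly instead of an options hash.

```ruby
# frozen_string_literal: true

# Standalone copy of the disambiguation logic: the first occurrence of a
# header is kept, later occurrences get the suffix plus a running count.
def disambiguate(headers, suffix)
  counts = Hash.new(0)
  headers.map do |header|
    counts[header] += 1
    counts[header] > 1 ? "#{header}#{suffix}#{counts[header]}" : header
  end
end

p disambiguate(%w[name email email], '')  # => ["name", "email", "email2"]
p disambiguate(%w[name email email], '_') # => ["name", "email", "email_2"]
```

With the option set to `nil`, `header_transformations` skips this step entirely, and the duplicate check in `header_validations.rb` is then what raises `DuplicateHeaders`.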

data/lib/smarter_csv/header_validations.rb ADDED

@@ -0,0 +1,34 @@
+# frozen_string_literal: true
+module SmarterCSV
+  class << self
+    def header_validations(headers, options)
+      check_duplicate_headers(headers, options)
+      check_required_headers(headers, options)
+    end
+    def check_duplicate_headers(headers, _options)
+      header_counts = Hash.new(0)
+      headers.each { |header| header_counts[header] += 1 unless header.nil? }
+      duplicates = header_counts.select { |_, count| count > 1 }
+      unless duplicates.empty?
+        raise(SmarterCSV::DuplicateHeaders, "Duplicate Headers in CSV: #{duplicates.inspect}")
+      end
+    end
+    require 'set'
+    def check_required_headers(headers, options)
+      if options[:required_keys] && options[:required_keys].is_a?(Array)
+        headers_set = headers.to_set
+        missing_keys = options[:required_keys].select { |k| !headers_set.include?(k) }
+        unless missing_keys.empty?
+          raise SmarterCSV::MissingKeys, "ERROR: missing attributes: #{missing_keys.join(',')}"
+        end
+      end
+    end
+  end
+end
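
Reviewer note: both validations reduce to small collection operations. The following sketch shows one possible formulation, assuming Ruby 2.7 or later for `Enumerable#tally`; `duplicate_headers` and `missing_keys` are hypothetical helpers that return their findings instead of raising as the gem does.

```ruby
# frozen_string_literal: true

require 'set'

# Returns a hash of header => count for headers that occur more than once.
# nil headers (keys mapped away) are ignored, as in check_duplicate_headers.
def duplicate_headers(headers)
  headers.compact.tally.select { |_, count| count > 1 }
end

# Returns the required keys that are absent from the processed headers.
def missing_keys(headers, required_keys)
  headers_set = headers.to_set
  Array(required_keys).reject { |key| headers_set.include?(key) }
end

headers = [:id, :email, :email, nil, nil]
p duplicate_headers(headers)
p missing_keys(headers, [:id, :name]) # => [:name]
```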

data/lib/smarter_csv/headers.rb ADDED

@@ -0,0 +1,68 @@
+# frozen_string_literal: true
+module SmarterCSV
+  class << self
+    def process_headers(filehandle, options)
+      @raw_header = nil # header as it appears in the file
+      @headers = nil # the processed headers
+      header_array = []
+      file_header_size = nil
+      # if headers_in_file, get the headers -> We get the number of columns, even when user provided headers
+      if options[:headers_in_file] # extract the header line
+        # process the header line in the CSV file..
+        # the first line of a CSV file contains the header .. it might be commented out, so we need to read it anyhow
+        header_line = @raw_header = readline_with_counts(filehandle, options)
+        header_line = preprocess_header_line(header_line, options)
+        file_header_array, file_header_size = parse(header_line, options)
+        file_header_array = header_transformations(file_header_array, options)
+      else
+        unless options[:user_provided_headers]
+          raise SmarterCSV::IncorrectOption, "ERROR: If :headers_in_file is set to false, you have to provide :user_provided_headers"
+        end
+      end
+      if options[:user_provided_headers]
+        unless options[:user_provided_headers].is_a?(Array) && !options[:user_provided_headers].empty?
+          raise(SmarterCSV::IncorrectOption, "ERROR: incorrect format for user_provided_headers! Expecting array with headers.")
+        end
+        # use user-provided headers
+        user_header_array = options[:user_provided_headers]
+        # user_provided_headers: their count should match the headers_in_file if any
+        if defined?(file_header_size) && !file_header_size.nil?
+          if user_header_array.size != file_header_size
+            raise SmarterCSV::HeaderSizeMismatch, "ERROR: :user_provided_headers defines #{user_header_array.size} headers !=  CSV-file has #{file_header_size} headers"
+          else
+            # we could print out the mapping of file_header_array to header_array here
+          end
+        end
+        header_array = user_header_array
+      else
+        header_array = file_header_array
+      end
+      [header_array, header_array.size]
+    end
+    private
+    def preprocess_header_line(header_line, options)
+      header_line = enforce_utf8_encoding(header_line, options)
+      header_line = remove_comments_from_header(header_line, options)
+      header_line = header_line.chomp(options[:row_sep])
+      header_line.gsub!(options[:strip_chars_from_headers], '') if options[:strip_chars_from_headers]
+      header_line
+    end
+    def remove_comments_from_header(header, options)
+      return header unless options[:comment_regexp]
+      header.sub(options[:comment_regexp], '')
+    end
+  end
+end

data/lib/smarter_csv/options_processing.rb CHANGED

@@ -9,7 +9,7 @@ module SmarterCSV
     comment_regexp: nil, # was: /\A#/,
     convert_values_to_numeric: true,
     downcase_header: true,
-    duplicate_header_suffix: nil,
+    duplicate_header_suffix: '', # was: nil,
     file_encoding: 'utf-8',
     force_simple_split: false,
     force_utf8: false,
@@ -62,6 +62,15 @@ module SmarterCSV
     private
     def validate_options!(options)
+      # deprecate required_headers
+      unless options[:required_headers].nil?
+        puts "DEPRECATION WARNING: please use 'required_keys' instead of 'required_headers'"
+        if options[:required_keys].nil?
+          options[:required_keys] = options[:required_headers]
+          options[:required_headers] = nil
+        end
+      end
       keys = options.keys
       errors = []
       errors << "invalid row_sep" if keys.include?(:row_sep) && !option_valid?(options[:row_sep])
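
Reviewer note: the deprecation shim added to `validate_options!` copies `required_headers` into `required_keys` only when the latter is unset. A minimal sketch of that rule, assuming a non-mutating variant that writes the warning to stderr via `warn` (the gem mutates the options in place and uses `puts`); `migrate_required_headers` is a hypothetical name.

```ruby
# frozen_string_literal: true

# Hypothetical helper: returns a copy of the options in which the deprecated
# :required_headers has been moved to :required_keys where possible.
def migrate_required_headers(options)
  opts = options.dup
  return opts if opts[:required_headers].nil?

  warn "DEPRECATION WARNING: please use 'required_keys' instead of 'required_headers'"
  if opts[:required_keys].nil?
    opts[:required_keys] = opts[:required_headers]
    opts[:required_headers] = nil
  end
  opts
end

p migrate_required_headers({ required_headers: [:id] })[:required_keys] # => [:id]
```

When both options are supplied, `required_keys` takes precedence and the deprecated value is left as it was.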

data/lib/smarter_csv/parse.rb ADDED

@@ -0,0 +1,90 @@
+# frozen_string_literal: true
+module SmarterCSV
+  class << self
+    protected
+    ###
+    ### Thin wrapper around C-extension
+    ###
+    def parse(line, options, header_size = nil)
+      # puts "SmarterCSV.parse OPTIONS: #{options[:acceleration]}" if options[:verbose]
+      if options[:acceleration] && has_acceleration?
+        # :nocov:
+        has_quotes = line =~ /#{options[:quote_char]}/
+        elements = parse_csv_line_c(line, options[:col_sep], options[:quote_char], header_size)
+        elements.map!{|x| cleanup_quotes(x, options[:quote_char])} if has_quotes
+        [elements, elements.size]
+        # :nocov:
+      else
+        # puts "WARNING: SmarterCSV is using un-accelerated parsing of lines. Check options[:acceleration]"
+        parse_csv_line_ruby(line, options, header_size)
+      end
+    end
+    # ------------------------------------------------------------------
+    # Ruby equivalent of the C-extension for parse_line
+    #
+    # parses a single line: either a CSV header or a body line
+    # - quoting rules compared to RFC-4180 are somewhat relaxed
+    # - we are not assuming that quotes inside a field need to be doubled
+    # - we are not assuming that all fields need to be quoted (0 is even)
+    # - works with multi-char col_sep
+    # - if header_size is given, only up to header_size fields are parsed
+    #
+    # We use header_size for parsing the body lines to make sure we always match the number of headers
+    # in case there are trailing col_sep characters in line
+    #
+    # Our convention is that empty fields are returned as empty strings, not as nil.
+    #
+    #
+    # the purpose of the header_size parameter is to handle a corner case where
+    # CSV lines contain more fields than the header,
+    # in which case the remaining fields in the line are ignored.
+    #
+    def parse_csv_line_ruby(line, options, header_size = nil)
+      return [] if line.nil?
+      line_size = line.size
+      col_sep = options[:col_sep]
+      col_sep_size = col_sep.size
+      quote = options[:quote_char]
+      quote_count = 0
+      elements = []
+      start = 0
+      i = 0
+      previous_char = ''
+      while i < line_size
+        if line[i...i+col_sep_size] == col_sep && quote_count.even?
+          break if !header_size.nil? && elements.size >= header_size
+          elements << cleanup_quotes(line[start...i], quote)
+          previous_char = line[i]
+          i += col_sep.size
+          start = i
+        else
+          quote_count += 1 if line[i] == quote && previous_char != '\\'
+          previous_char = line[i]
+          i += 1
+        end
+      end
+      elements << cleanup_quotes(line[start..-1], quote) if header_size.nil? || elements.size < header_size
+      [elements, elements.size]
+    end
+    def cleanup_quotes(field, quote)
+      return field if field.nil?
+      # return if field !~ /#{quote}/ # this check can probably be eliminated
+      if field.start_with?(quote) && field.end_with?(quote)
+        field.delete_prefix!(quote)
+        field.delete_suffix!(quote)
+      end
+      field.gsub!("#{quote}#{quote}", quote)
+      field
+    end
+  end
+end
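
Reviewer note: the core of `parse_csv_line_ruby` is the rule that a separator only splits the line while the number of quote characters seen so far is even. The sketch below isolates that rule; it is a simplified illustration, not the gem's implementation: `split_csv_line` and `unquote` are hypothetical names, and the `header_size` cut-off and the backslash-escape check are omitted.

```ruby
# frozen_string_literal: true

# Removes one pair of enclosing quotes and collapses doubled quotes.
# Works on a copy, so frozen or shared input strings are not modified.
def unquote(field, quote)
  f = field.dup
  if f.size >= 2 * quote.size && f.start_with?(quote) && f.end_with?(quote)
    f = f[quote.size...-quote.size]
  end
  f.gsub(quote * 2, quote)
end

# Splits a single CSV line on col_sep, ignoring separators inside quotes.
def split_csv_line(line, col_sep: ',', quote: '"')
  fields = []
  start = 0
  i = 0
  quote_count = 0
  while i < line.size
    if quote_count.even? && line[i, col_sep.size] == col_sep
      fields << line[start...i]
      i += col_sep.size
      start = i
    else
      quote_count += 1 if line[i] == quote
      i += 1
    end
  end
  fields << line[start..-1] # last field; an empty string after a trailing separator
  fields.map { |field| unquote(field, quote) }
end

p split_csv_line('a,"b,c",d')          # => ["a", "b,c", "d"]
p split_csv_line('x::y', col_sep: '::') # => ["x", "y"]
```

Counting quotes instead of tracking an explicit in-quote state keeps the loop small and also handles doubled quotes, since each pair leaves the parity unchanged.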