RubyGems - smarter_csv - Versions diffs - 1.5.0 → 1.6.0 - Mend

smarter_csv 1.5.0 → 1.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17)

checksums.yaml +4 -4
data/CHANGELOG.md +13 -0
data/CONTRIBUTORS.md +1 -0
data/README.md +17 -4
data/lib/smarter_csv/smarter_csv.rb +182 -102
data/lib/smarter_csv/version.rb +1 -1
data/smarter_csv.gemspec +1 -1
data/spec/fixtures/duplicate_headers.csv +1 -1
data/spec/smarter_csv/duplicate_headers_spec.rb +76 -0
data/spec/smarter_csv/invalid_headers_spec.rb +8 -22
data/spec/smarter_csv/malformed_spec.rb +15 -7
data/spec/smarter_csv/no_header_spec.rb +16 -11
data/spec/smarter_csv/parse/column_separator_spec.rb +61 -0
data/spec/smarter_csv/parse/old_csv_library_spec.rb +74 -0
data/spec/smarter_csv/parse/rfc4180_and_more_spec.rb +170 -0
data/spec/smarter_csv/quoted_spec.rb +8 -4
metadata +25 -4

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 23032eface2d1d918bcd6daabb4ca79e03096612bda1017d06f1b0542d0c4619
-  data.tar.gz: 12b68eeafc4f83c06b66da45b27da5e716675bff1e77be2362c2c10006821d9c
+  metadata.gz: fd2cf82aafc3b45257fbdfc594ed8e1d3bf2226e59cbee144b3003d8f79ec6cf
+  data.tar.gz: 95df862865e3123cf86194d47107f140f69f2fc91c20aba01d4004e8bffa5d74
 SHA512:
-  metadata.gz: 5b84337de25ed7a8492088b82342e6d3b16d1fdc95120f9699986aee7d9416a51cfec981eb125e0d4b17600bc1c06c52eb3b2251857668210d9402e95bb75860
-  data.tar.gz: b26b40b49bf6d739df9cd5deb477c33fdc22c54ed88c96f87e401fef789aedf3f9c55d25df60e4900a1e2a3c8bc0fc6e78018b128ab6ac14062fd97f694f3568
+  metadata.gz: df32ae9a380fa4fff0932d56e8a0cacadb8d4ebf7d8124e607f2ba389c3b60f875c300a2137fe04aac2b7eda77850b343af5e58c53e310f90460f96223f3228c
+  data.tar.gz: 107e1dbacdc6293a0c044a91cf237f50fbeab59eb5b032167f55d0fe6c2cf07b079c6cfe296368c03d2c84e64ce3c0e6ad744043397cdc217cd3ab51beb3ab09

data/CHANGELOG.md CHANGED

@@ -1,6 +1,19 @@
 # SmarterCSV 1.x Change Log
+## 1.6.0 (2022-05-03)
+  * completely rewrote line parser
+  * added methods `SmarterCSV.raw_headers` and `SmarterCSV.headers` to allow easy examination of how the headers are processed.
+## 1.5.2 (2022-04-29)
+  * added missing keys to the SmarterCSV::KeyMappingError exception message #189 (thanks to John Dell)
+## 1.5.1 (2022-04-27)
+  * added raising of `KeyMappingError` if `key_mapping` refers to a non-existent key
+  * added option `duplicate_header_suffix` (thanks to Skye Shaw)
+    When given a non-nil string, it uses the suffix to append numbering 2..n to duplicate headers.
+    If your code will need to process arbitrary CSV files, please set `duplicate_header_suffix`.
 ## 1.5.0 (2022-04-25)
   * fixed bug with trailing col_sep characters, introduced in 1.4.0
   * Fix deprecation warning in Ruby 3.0.3 / $INPUT_RECORD_SEPARATOR (thanks to Joel Fouse )

data/CONTRIBUTORS.md CHANGED

@@ -44,3 +44,4 @@ A Big Thank you to everyone who filed issues, sent comments, and who contributed
  * [Nicolas Guillemain](https://github.com/Viiruus)
  * [Sp6](https://github.com/sp6)
  * [Joel Fouse](https://github.com/jfouse)
+ * [John Dell](https://github.com/spovich)

data/README.md CHANGED

@@ -16,10 +16,12 @@
 # SmarterCSV
-[![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
+[![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.com/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
 #### SmarterCSV 1.x
+`smarter_csv` is now 10 years old, and still kicking! 🎉🎉🎉
 `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
 and parallel processing with Resque or Sidekiq.
@@ -42,11 +44,13 @@ NOTE; This Gem is only for importing CSV files - writing of CSV files is not sup
 ### Why?
-Ruby's CSV library's API is pretty old, and it's processing of CSV-files returning Arrays of Arrays feels 'very close to the metal'. The output is not easy to use - especially not if you want to create database records from it. Another shortcoming is that Ruby's CSV library does not have good support for huge CSV-files, e.g. there is no support for 'chunking' and/or parallel processing of the CSV-content (e.g. with Resque or Sidekiq),
+Ruby's CSV library's API is pretty old, and it's processing of CSV-files returning Arrays of Arrays feels 'very close to the metal'. The output is not easy to use - especially not if you want to create database records or Sidekiq jobs with it. Another shortcoming is that Ruby's CSV library does not have good support for huge CSV-files, e.g. there is no support for 'chunking' and/or parallel processing of the CSV-content (e.g. with Sidekiq).
+As the existing CSV libraries didn't fit my needs, I was writing my own CSV processing - specifically for use in connection with Rails ORMs like Mongoid, MongoMapper and ActiveRecord. In those ORMs you can easily pass a hash with attribute/value pairs to the create() method. The lower-level Mongo driver and Moped also accept larger arrays of such hashes to create a larger amount of records quickly with just one call. The same patterns are used when you pass data to Sidekiq jobs.
-As the existing CSV libraries didn't fit my needs, I was writing my own CSV processing - specifically for use in connection with Rails ORMs like Mongoid, MongoMapper or ActiveRecord. In those ORMs you can easily pass a hash with attribute/value pairs to the create() method. The lower-level Mongo driver and Moped also accept larger arrays of such hashes to create a larger amount of records quickly with just one call.
+For processing large CSV files it is essential to process them in chunks, so the memory impact is minimized.
-### Examples
+### How?
 The two main choices you have in terms of how to call `SmarterCSV.process` are:
  * calling `process` with or without a block
@@ -228,6 +232,7 @@ The options and the block are optional.
      | :headers_in_file            |   true   | Whether or not the file contains headers as the first line.                          |
      |                             |          | Important if the file does not contain headers,                                      |
      |                             |          | otherwise you would lose the first line of data.                                     |
+     | :duplicate_header_suffix    |   nil    | If set, adds numbers to duplicated headers and separates them by the given suffix    |
      | :user_provided_headers      |   nil    | *careful with that axe!*                                                             |
      |                             |          | user provided Array of header strings or symbols, to define                          |
      |                             |          | what headers should be used, overriding any in-file headers.                         |
@@ -282,6 +287,7 @@ And header and data validations will also be supported in 2.x
          data = SmarterCSV.process(f)
        end
 ```
 #### NOTES about CSV Headers:
  * as this method parses CSV files, it is assumed that the first line of any file will contain a valid header
  * the first line with the header might be commented out, in which case you will need to set `comment_regexp: /\A#/`
@@ -291,6 +297,13 @@ And header and data validations will also be supported in 2.x
  * you can not combine the :user_provided_headers and :key_mapping options
  * if the incorrect number of headers are provided via :user_provided_headers, exception SmarterCSV::HeaderSizeMismatch is raised
+#### NOTES on Duplicate Headers:
+ As a corner case, it is possible that a CSV file contains multiple headers with the same name.
+ * If that happens, by default `smarter_csv` will raise a `DuplicateHeaders` error.
+ * If you set `duplicate_header_suffix` to a non-nil string, it will use it to append numbers 2..n to the duplicate headers. To further disambiguate the headers, you can further use `key_mapping` to assign meaningful names.
+ * If your code will need to process arbitrary CSV files, please set `duplicate_header_suffix`.
+ * Another way to deal with duplicate headers it to use `user_assigned_headers` to ignore any headers in the file.
 #### NOTES on Key Mapping:
  * keys in the header line of the file can be re-mapped to a chosen set of symbols, so the resulting Hashes can be better used internally in your application (e.g. when directly creating MongoDB entries with them)
  * if you want to completely delete a key, then map it to nil or to '', they will be automatically deleted from any result Hash
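
The numbering rule from the notes on duplicate headers can be sketched in isolation. This is a standalone illustration and not the gem's code: `disambiguate_headers` is a hypothetical helper name, and it assumes the headers arrive as plain strings. The first occurrence of a header is kept as-is; each later occurrence gets the suffix followed by a counter starting at 2.

```ruby
# Hypothetical helper (not part of smarter_csv): appends suffix + counter
# to the 2nd..nth occurrence of a repeated header name.
def disambiguate_headers(headers, suffix)
  counts = Hash.new(0) # how often each header has been seen so far
  headers.map do |key|
    counts[key] += 1
    counts[key] == 1 ? key : "#{key}#{suffix}#{counts[key]}"
  end
end

p disambiguate_headers(%w[email firstname lastname email age], '_')
# => ["email", "firstname", "lastname", "email_2", "age"]
p disambiguate_headers(%w[a b c a a], '')
# => ["a", "b", "c", "a2", "a3"]
```

An empty string is a valid suffix here, which is why the README distinguishes a "non-nil string" from simply a non-empty one.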

data/lib/smarter_csv/smarter_csv.rb CHANGED

@@ -5,107 +5,38 @@ module SmarterCSV
   class DuplicateHeaders < SmarterCSVException; end
   class MissingHeaders < SmarterCSVException; end
   class NoColSepDetected < SmarterCSVException; end
+  class KeyMappingError < SmarterCSVException; end
+  class MalformedCSVError < SmarterCSVException; end
-  def SmarterCSV.process(input, options={}, &block)   # first parameter: filename or input object with readline method
+  # first parameter: filename or input object which responds to readline method
+  def SmarterCSV.process(input, options={}, &block)
     options = default_options.merge(options)
     options[:invalid_byte_sequence] = '' if options[:invalid_byte_sequence].nil?
     headerA = []
     result = []
-    file_line_count = 0
-    csv_line_count = 0
+    @file_line_count = 0
+    @csv_line_count = 0
     has_rails = !! defined?(Rails)
     begin
-      f = input.respond_to?(:readline) ? input : File.open(input, "r:#{options[:file_encoding]}")
+      fh = input.respond_to?(:readline) ? input : File.open(input, "r:#{options[:file_encoding]}")
       # auto-detect the row separator
-      options[:row_sep] = SmarterCSV.guess_line_ending(f, options) if options[:row_sep].to_sym == :auto
+      options[:row_sep] = SmarterCSV.guess_line_ending(fh, options) if options[:row_sep].to_sym == :auto
       # attempt to auto-detect column separator
-      options[:col_sep] = guess_column_separator(f, options) if options[:col_sep].to_sym == :auto
-      # preserve options, in case we need to call the CSV class
-      csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
-      csv_options.delete(:row_sep) if [nil, :auto].include?( options[:row_sep].to_sym )
-      csv_options.delete(:col_sep) if [nil, :auto].include?( options[:col_sep].to_sym )
+      options[:col_sep] = guess_column_separator(fh, options) if options[:col_sep].to_sym == :auto
-      if (options[:force_utf8] || options[:file_encoding] =~ /utf-8/i) && ( f.respond_to?(:external_encoding) && f.external_encoding != Encoding.find('UTF-8') || f.respond_to?(:encoding) && f.encoding != Encoding.find('UTF-8') )
+      if (options[:force_utf8] || options[:file_encoding] =~ /utf-8/i) && ( fh.respond_to?(:external_encoding) && fh.external_encoding != Encoding.find('UTF-8') || fh.respond_to?(:encoding) && fh.encoding != Encoding.find('UTF-8') )
         puts 'WARNING: you are trying to process UTF-8 input, but did not open the input with "b:utf-8" option. See README file "NOTES about File Encodings".'
       end
-      options[:skip_lines].to_i.times{f.readline(options[:row_sep])} if options[:skip_lines].to_i > 0
-      if options[:headers_in_file]        # extract the header line
-        # process the header line in the CSV file..
-        # the first line of a CSV file contains the header .. it might be commented out, so we need to read it anyhow
-        header = f.readline(options[:row_sep])
-        header = header.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
-        header = header.sub(options[:comment_regexp],'') if options[:comment_regexp]
-        header = header.chomp(options[:row_sep])
-        file_line_count += 1
-        csv_line_count += 1
-        header = header.gsub(options[:strip_chars_from_headers], '') if options[:strip_chars_from_headers]
-        if (header =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
-          file_headerA = begin
-            CSV.parse( header, **csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
-          rescue CSV::MalformedCSVError => e
-            raise $!, "#{$!} [SmarterCSV: csv line #{csv_line_count}]", $!.backtrace
-          end
-        else
-          file_headerA =  header.split(options[:col_sep])
-        end
-        file_header_size = file_headerA.size # before mapping, which could delete keys
-        file_headerA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') }
-        file_headerA.map!{|x| x.strip}  if options[:strip_whitespace]
-        unless options[:keep_original_headers]
-          file_headerA.map!{|x| x.gsub(/\s+|-+/,'_')}
-          file_headerA.map!{|x| x.downcase }   if options[:downcase_header]
-        end
-      else
-        raise SmarterCSV::IncorrectOption , "ERROR: If :headers_in_file is set to false, you have to provide :user_provided_headers" if options[:user_provided_headers].nil?
-      end
-      if options[:user_provided_headers] && options[:user_provided_headers].class == Array && ! options[:user_provided_headers].empty?
-        # use user-provided headers
-        headerA = options[:user_provided_headers]
-        if defined?(file_header_size) && ! file_header_size.nil?
-          if headerA.size != file_header_size
-            raise SmarterCSV::HeaderSizeMismatch , "ERROR: :user_provided_headers defines #{headerA.size} headers !=  CSV-file #{input} has #{file_header_size} headers"
-          else
-            # we could print out the mapping of file_headerA to headerA here
-          end
+      if options[:skip_lines].to_i > 0
+        options[:skip_lines].to_i.times do
+          readline_with_counts(fh, options)
         end
-      else
-        headerA = file_headerA
       end
-      header_size = headerA.size # used for splitting lines
-      headerA.map!{|x| x.to_sym } unless options[:strings_as_keys] || options[:keep_original_headers]
-      unless options[:user_provided_headers] # wouldn't make sense to re-map user provided headers
-        key_mappingH = options[:key_mapping]
-        # do some key mapping on the keys in the file header
-        #   if you want to completely delete a key, then map it to nil or to ''
-        if ! key_mappingH.nil? && key_mappingH.class == Hash && key_mappingH.keys.size > 0
-          headerA.map!{|x| key_mappingH.has_key?(x) ? (key_mappingH[x].nil? ? nil : key_mappingH[x]) : (options[:remove_unmapped_keys] ? nil : x)}
-        end
-      end
-      # header_validations
-      duplicate_headers = []
-      headerA.compact.each do |k|
-        duplicate_headers << k if headerA.select{|x| x == k}.size > 1
-      end
-      raise SmarterCSV::DuplicateHeaders , "ERROR: duplicate headers: #{duplicate_headers.join(',')}" unless duplicate_headers.empty?
-      if options[:required_headers] && options[:required_headers].is_a?(Array)
-        missing_headers = []
-        options[:required_headers].each do |k|
-          missing_headers << k unless headerA.include?(k)
-        end
-        raise SmarterCSV::MissingHeaders , "ERROR: missing headers: #{missing_headers.join(',')}" unless missing_headers.empty?
-      end
+      headerA, header_size = process_headers(fh, options)
       # in case we use chunking.. we'll need to set it up..
       if ! options[:chunk_size].nil? && options[:chunk_size].to_i > 0
@@ -118,15 +49,13 @@ module SmarterCSV
       end
       # now on to processing all the rest of the lines in the CSV file:
-      while ! f.eof?    # we can't use f.readlines() here, because this would read the whole file into memory at once, and eof => true
-        line = f.readline(options[:row_sep])  # read one line
+      while ! fh.eof?    # we can't use fh.readlines() here, because this would read the whole file into memory at once, and eof => true
+        line = readline_with_counts(fh, options)
         # replace invalid byte sequence in UTF-8 with question mark to avoid errors
         line = line.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
-        file_line_count += 1
-        csv_line_count += 1
-        print "processing file line %10d, csv line %10d\r" % [file_line_count, csv_line_count] if options[:verbose]
+        print "processing file line %10d, csv line %10d\r" % [@file_line_count, @csv_line_count] if options[:verbose]
         next if options[:comment_regexp] && line =~ options[:comment_regexp] # ignore all comment lines if there are any
@@ -135,24 +64,17 @@ module SmarterCSV
         # by detecting the existence of an uneven number of quote characters
         multiline = line.count(options[:quote_char])%2 == 1 # should handle quote_char nil
         while line.count(options[:quote_char])%2 == 1 # should handle quote_char nil
-          next_line = f.readline(options[:row_sep])
+          next_line = fh.readline(options[:row_sep])
           next_line = next_line.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
           line += next_line
-          file_line_count += 1
+          @file_line_count += 1
         end
-        print "\nline contains uneven number of quote chars so including content through file line %d\n" % file_line_count if options[:verbose] && multiline
+        print "\nline contains uneven number of quote chars so including content through file line %d\n" % @file_line_count if options[:verbose] && multiline
         line.chomp!(options[:row_sep])
-        if (line =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
-          dataA = begin
-            CSV.parse( line, **csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
-          rescue CSV::MalformedCSVError => e
-            raise $!, "#{$!} [SmarterCSV: csv line #{csv_line_count}]", $!.backtrace
-          end
-        else
-          dataA = line.split(options[:col_sep], header_size)
-        end
+        dataA, data_size = parse(line, options, header_size)
         dataA.map!{|x| x.sub(/(#{options[:col_sep]})+\z/, '')} # remove any unwanted trailing col_sep characters at the end
         dataA.map!{|x| x.strip} if options[:strip_whitespace]
@@ -208,7 +130,7 @@ module SmarterCSV
         if use_chunks
           chunk << hash  # append temp result to chunk
-          if chunk.size >= chunk_size || f.eof?   # if chunk if full, or EOF reached
+          if chunk.size >= chunk_size || fh.eof?   # if chunk if full, or EOF reached
             # do something with the chunk
             if block_given?
               yield chunk  # do something with the hashes in the chunk in the block
@@ -249,7 +171,7 @@ module SmarterCSV
         chunk = []  # initialize for next chunk of data
       end
     ensure
-      f.close if f.respond_to?(:close)
+      fh.close if fh.respond_to?(:close)
     end
     if block_given?
       return chunk_count  # when we do processing through a block we only care how many chunks we processed
@@ -268,6 +190,7 @@ module SmarterCSV
       comment_regexp: nil, # was: /\A#/,
       convert_values_to_numeric: true,
       downcase_header: true,
+      duplicate_header_suffix: nil,
       file_encoding: 'utf-8',
       force_simple_split: false ,
       force_utf8: false,
@@ -293,6 +216,62 @@ module SmarterCSV
     }
   end
+  def self.readline_with_counts(filehandle, options)
+    line  = filehandle.readline(options[:row_sep])
+    @file_line_count += 1
+    @csv_line_count += 1
+    line
+  end
+  # parses a single line: either a CSV header and body line
+  # - quoting rules compared to RFC-4180 are somewhat relaxed
+  # - we are not assuming that quotes inside a fields need to be doubled
+  # - we are not assuming that all fields need to be quoted (0 is even)
+  # - works with multi-char col_sep
+  # - if header_size is given, only up to header_size fields are parsed
+  #
+  # We use header_size for parsing the body lines to make sure we always match the number of headers
+  # in case there are trailing col_sep characters in line
+  #
+  # Our convention is that empty fields are returned as empty strings, not as nil.
+  #
+  def self.parse(line, options, header_size = nil)
+    return [] if line.nil?
+    col_sep = options[:col_sep]
+    quote = options[:quote_char]
+    quote_count = 0
+    elements = []
+    start = 0
+    i = 0
+    while i < line.size do
+      if line[i...i+col_sep.size] == col_sep && quote_count.even?
+        break if !header_size.nil? && elements.size >= header_size
+        elements << cleanup_quotes(line[start...i], quote)
+        i += col_sep.size
+        start = i
+      else
+        quote_count += 1 if line[i] == quote
+        i += 1
+      end
+    end
+    elements << cleanup_quotes(line[start..-1], quote) if header_size.nil? || elements.size < header_size
+    [elements, elements.size]
+  end
+  def self.cleanup_quotes(field, quote)
+    return field if field.nil? || field !~ /#{quote}/
+    if field.start_with?(quote) && field.end_with?(quote)
+      field.delete_prefix!(quote)
+      field.delete_suffix!(quote)
+    end
+    field.gsub!("#{quote}#{quote}", quote)
+    field
+  end
   def self.blank?(value)
     case value
     when Array
@@ -378,4 +357,105 @@ module SmarterCSV
     k,_ = counts.max_by{|_,v| v}
     return k                    # the most frequent one is it
   end
+  def self.raw_hearder
+    @raw_header
+  end
+  def self.headers
+    @headers
+  end
+  def self.process_headers(filehandle, options)
+    @raw_header = nil
+    @headers = nil
+    if options[:headers_in_file]        # extract the header line
+      # process the header line in the CSV file..
+      # the first line of a CSV file contains the header .. it might be commented out, so we need to read it anyhow
+      header = readline_with_counts(filehandle, options)
+      @raw_header = header
+      header = header.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
+      header = header.sub(options[:comment_regexp],'') if options[:comment_regexp]
+      header = header.chomp(options[:row_sep])
+      header = header.gsub(options[:strip_chars_from_headers], '') if options[:strip_chars_from_headers]
+      file_headerA, file_header_size = parse(header, options)
+      file_headerA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') }
+      file_headerA.map!{|x| x.strip}  if options[:strip_whitespace]
+      unless options[:keep_original_headers]
+        file_headerA.map!{|x| x.gsub(/\s+|-+/,'_')}
+        file_headerA.map!{|x| x.downcase }   if options[:downcase_header]
+      end
+    else
+      raise SmarterCSV::IncorrectOption , "ERROR: If :headers_in_file is set to false, you have to provide :user_provided_headers" unless options[:user_provided_headers]
+    end
+    if options[:user_provided_headers] && options[:user_provided_headers].class == Array && ! options[:user_provided_headers].empty?
+      # use user-provided headers
+      headerA = options[:user_provided_headers]
+      if defined?(file_header_size) && ! file_header_size.nil?
+        if headerA.size != file_header_size
+          raise SmarterCSV::HeaderSizeMismatch , "ERROR: :user_provided_headers defines #{headerA.size} headers !=  CSV-file #{input} has #{file_header_size} headers"
+        else
+          # we could print out the mapping of file_headerA to headerA here
+        end
+      end
+    else
+      headerA = file_headerA
+    end
+    # detect duplicate headers and disambiguate
+    headerA = process_duplicate_headers(headerA, options) if options[:duplicate_header_suffix]
+    header_size = headerA.size # used for splitting lines
+    headerA.map!{|x| x.to_sym } unless options[:strings_as_keys] || options[:keep_original_headers]
+    unless options[:user_provided_headers] # wouldn't make sense to re-map user provided headers
+      key_mappingH = options[:key_mapping]
+      # do some key mapping on the keys in the file header
+      #   if you want to completely delete a key, then map it to nil or to ''
+      if ! key_mappingH.nil? && key_mappingH.class == Hash && key_mappingH.keys.size > 0
+        # we can't map keys that are not there
+        missing_keys = key_mappingH.keys - headerA
+        raise(SmarterCSV::KeyMappingError, "missing header(s): #{missing_keys.join(",")}") unless missing_keys.empty?
+        headerA.map!{|x| key_mappingH.has_key?(x) ? (key_mappingH[x].nil? ? nil : key_mappingH[x]) : (options[:remove_unmapped_keys] ? nil : x)}
+      end
+    end
+    # header_validations
+    duplicate_headers = []
+    headerA.compact.each do |k|
+      duplicate_headers << k if headerA.select{|x| x == k}.size > 1
+    end
+    raise SmarterCSV::DuplicateHeaders , "ERROR: duplicate headers: #{duplicate_headers.join(',')}" unless duplicate_headers.empty?
+    if options[:required_headers] && options[:required_headers].is_a?(Array)
+      missing_headers = []
+      options[:required_headers].each do |k|
+        missing_headers << k unless headerA.include?(k)
+      end
+      raise SmarterCSV::MissingHeaders , "ERROR: missing headers: #{missing_headers.join(',')}" unless missing_headers.empty?
+    end
+    @headers = headerA
+    [headerA, header_size]
+  end
+  def self.process_duplicate_headers(headers, options)
+    counts = Hash.new(0)
+    result = []
+    headers.each do |key|
+      counts[key] += 1
+      if counts[key] == 1
+        result << key
+      else
+        result << [key, options[:duplicate_header_suffix], counts[key]].join
+      end
+    end
+    result
+  end
 end
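
To make the rewritten line parser easier to follow, here is a minimal standalone sketch of the same idea: a field ends at a column separator only when the number of quote characters seen so far is even. The names `split_csv_line`, `unquote`, and the keyword arguments are assumptions made for this illustration; the gem itself takes an options hash and also returns the field count. Non-mutating string methods are used here, which is a stylistic choice for the sketch.

```ruby
# Hypothetical helper: strips one pair of enclosing quotes and
# collapses doubled quotes inside the field.
def unquote(field, quote)
  return field unless field.include?(quote)
  if field.start_with?(quote) && field.end_with?(quote)
    field = field.delete_prefix(quote).delete_suffix(quote)
  end
  field.gsub(quote + quote, quote)
end

# Hypothetical stand-in for the parser in the diff above.
# max_fields plays the role of header_size: parsing stops once that
# many fields were collected, so trailing separators are ignored.
def split_csv_line(line, col_sep: ',', quote: '"', max_fields: nil)
  fields = []
  quote_count = 0
  start = 0
  i = 0
  while i < line.size
    if line[i, col_sep.size] == col_sep && quote_count.even?
      break if max_fields && fields.size >= max_fields
      fields << unquote(line[start...i], quote)
      i += col_sep.size
      start = i
    else
      quote_count += 1 if line[i] == quote
      i += 1
    end
  end
  # the last field has no trailing separator, so add it after the loop
  fields << unquote(line[start..-1], quote) if max_fields.nil? || fields.size < max_fields
  fields
end

p split_csv_line('a,"b,c",d')                # => ["a", "b,c", "d"]
p split_csv_line('a::b::c', col_sep: '::')   # => ["a", "b", "c"]
p split_csv_line('a,b,c,,', max_fields: 3)   # => ["a", "b", "c"]
```

Counting quotes instead of tracking an explicit in-quotes state is what makes this approach tolerant of stray quotes inside a field, at the cost of being looser than RFC 4180.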

data/lib/smarter_csv/version.rb CHANGED

@@ -1,3 +1,3 @@
 module SmarterCSV
-  VERSION = "1.5.0"
+  VERSION = "1.6.0"
 end

data/smarter_csv.gemspec CHANGED

@@ -16,9 +16,9 @@ Gem::Specification.new do |spec|
   spec.executables   = spec.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
   spec.test_files    = spec.files.grep(%r{^(test|spec|features)/})
   spec.require_paths = ["lib"]
-  spec.requirements  = ['csv'] # for CSV.parse() only needed in case we have quoted fields
   spec.add_development_dependency "rspec"
   spec.add_development_dependency "simplecov"
+  spec.add_development_dependency "awesome_print"
   #  spec.add_development_dependency "guard-rspec"
   spec.metadata["homepage_uri"] = spec.homepage

data/spec/fixtures/duplicate_headers.csv CHANGED

@@ -1,3 +1,3 @@
 email,firstname,lastname,email,age
 tom@bla.com,Tom,Sawyer,mike@bla.com,34
-eri@bla.com,Eri Chan,tom@bla.com,21
+eri@bla.com,Eri,Chan,tom@bla.com,21

data/spec/smarter_csv/duplicate_headers_spec.rb ADDED

@@ -0,0 +1,76 @@
+require 'spec_helper'
+fixture_path = 'spec/fixtures'
+describe 'duplicate headers' do
+  describe 'without special handling / default behavior' do
+    it 'raises error on duplicate headers' do
+      expect {
+        SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", {})
+      }.to raise_exception(SmarterCSV::DuplicateHeaders)
+    end
+    it 'raises error on duplicate given headers' do
+      expect {
+        options = {:user_provided_headers => [:a,:b,:c,:d,:a]}
+        SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
+      }.to raise_exception(SmarterCSV::DuplicateHeaders)
+    end
+    it 'raises error on missing mapped headers and includes missing headers in message' do
+      expect {
+        # the mapping is right, but the underlying csv file is bad
+        options = {:key_mapping => {:email => :a, :firstname => :b, :lastname => :c, :manager_email => :d, :age => :e} }
+        SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
+      }.to raise_exception(SmarterCSV::KeyMappingError, "missing header(s): manager_email")
+    end
+  end
+  describe 'with special handling' do
+    context 'with given suffix' do
+      let(:options) { {duplicate_header_suffix: '_'} }
+      it 'reads whole file' do
+        data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
+        expect(data.size).to eq 2
+      end
+      it 'generates the correct keys' do
+        data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
+        expect(data.first.keys).to eq [:email, :firstname, :lastname, :email_2, :age]
+      end
+      it 'enumerates when duplicate headers are given' do
+        options.merge!({:user_provided_headers => [:a,:b,:c,:a,:a]})
+        data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
+        expect(data.first.keys).to eq [:a, :b, :c, :a_2, :a_3]
+      end
+      it 'can remap duplicated headers' do
+        options.merge!({:key_mapping => {:email => :a, :firstname => :b, :lastname => :c, :email_2 => :d, :age => :e}})
+        data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
+        expect(data.first).to eq({a: 'tom@bla.com', b: 'Tom', c: 'Sawyer', d: 'mike@bla.com', e: 34})
+      end
+    end
+    context 'with empty suffix' do
+      let(:options) { {duplicate_header_suffix: ''} }
+      it 'reads whole file' do
+        data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
+        expect(data.size).to eq 2
+      end
+      it 'generates the correct keys' do
+        data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
+        expect(data.first.keys).to eq [:email, :firstname, :lastname, :email2, :age]
+      end
+      it 'enumerates when duplicate headers are given' do
+        options.merge!({:user_provided_headers => [:a,:b,:c,:a,:a]})
+        data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
+        expect(data.first.keys).to eq [:a, :b, :c, :a2, :a3]
+      end
+    end
+  end
+end
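
The `KeyMappingError` expectation in this spec follows from a simple set difference between the mapping keys and the parsed headers. The sketch below reproduces that check outside the gem; the local `KeyMappingError` class and the `check_key_mapping!` helper are hypothetical names used only for this illustration, and the header list is the one from the `duplicate_headers.csv` fixture converted to symbols.

```ruby
# Local stand-in for SmarterCSV::KeyMappingError (assumption for this sketch).
class KeyMappingError < StandardError; end

# Hypothetical helper: raises if the mapping refers to headers that do not exist.
def check_key_mapping!(headers, key_mapping)
  missing = key_mapping.keys - headers
  raise KeyMappingError, "missing header(s): #{missing.join(',')}" unless missing.empty?
  headers
end

headers = [:email, :firstname, :lastname, :email, :age]
mapping = { email: :a, firstname: :b, lastname: :c, manager_email: :d, age: :e }

begin
  check_key_mapping!(headers, mapping)
rescue KeyMappingError => e
  puts e.message # prints "missing header(s): manager_email"
end
```

Because this check runs before the duplicate-header validation, a mapping that names a non-existent column fails with `KeyMappingError` even when the file also contains duplicate headers.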

data/spec/smarter_csv/invalid_headers_spec.rb CHANGED

@@ -3,28 +3,6 @@ require 'spec_helper'
 fixture_path = 'spec/fixtures'
 describe 'test exceptions for invalid headers' do
-  it 'raises error on duplicate headers' do
-    expect {
-      SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", {})
-    }.to raise_exception(SmarterCSV::DuplicateHeaders)
-  end
-  it 'raises error on duplicate given headers' do
-    expect {
-      options = {:user_provided_headers => [:a,:b,:c,:d,:a]}
-      SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
-    }.to raise_exception(SmarterCSV::DuplicateHeaders)
-  end
-  it 'raises error on duplicate mapped headers' do
-    expect {
-      # the mapping is right, but the underlying csv file is bad
-      options = {:key_mapping => {:email => :a, :firstname => :b, :lastname => :c, :manager_email => :d, :age => :e} }
-      SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
-    }.to raise_exception(SmarterCSV::DuplicateHeaders)
-  end
   it 'does not raise an error if no required headers are given' do
     options = {:required_headers => nil} # order does not matter
     data = SmarterCSV.process("#{fixture_path}/user_import.csv", options)
@@ -49,4 +27,12 @@ describe 'test exceptions for invalid headers' do
       SmarterCSV.process("#{fixture_path}/user_import.csv", options)
     }.to raise_exception(SmarterCSV::MissingHeaders)
   end
+  it 'raises error on missing mapped headers and includes missing headers in message' do
+    expect {
+      # :age does not exist in the CSV header
+      options = {:key_mapping => {:email => :a, :firstname => :b, :lastname => :c, :manager_email => :d, :age => :e} }
+      SmarterCSV.process("#{fixture_path}/user_import.csv", options)
+    }.to raise_exception(SmarterCSV::KeyMappingError, "missing header(s): age")
+  end
 end

data/spec/smarter_csv/malformed_spec.rb CHANGED

@@ -2,16 +2,24 @@ require 'spec_helper'
 fixture_path = 'spec/fixtures'
-describe 'malformed_csv' do
-  subject { lambda { SmarterCSV.process(csv_path) } }
-  context "malformed header" do
+# according to RFC-4180 quotes inside of "words" should be doubled, but our parser is robust against that.
+describe 'malformed CSV quotes' do
+  context "malformed quotes in header" do
     let(:csv_path) { "#{fixture_path}/malformed_header.csv" }
-    it { should raise_error(CSV::MalformedCSVError) }
+    it 'should be resilient against single quotes' do
+      data = SmarterCSV.process(csv_path)
+      expect(data[0]).to eq({:name=>"Arnold Schwarzenegger", :dobdob=>"1947-07-30"})
+      expect(data[1]).to eq({:name=>"Jeff Bridges", :dobdob=>"1949-12-04"})
+    end
   end
-  context "malformed content" do
+  context "malformed quotes in content" do
     let(:csv_path) { "#{fixture_path}/malformed.csv" }
-    it { should raise_error(CSV::MalformedCSVError) }
+    it 'should be resilient against single quotes' do
+      data = SmarterCSV.process(csv_path)
+      expect(data[0]).to eq({:name=>"Arnold Schwarzenegger", :dob=>"1947-07-30"})
+      expect(data[1]).to eq({:name=>"Jeff \"the dude\" Bridges", :dob=>"1949-12-04"})
+    end
   end
 end

data/spec/smarter_csv/no_header_spec.rb CHANGED

@@ -2,23 +2,28 @@ require 'spec_helper'
 fixture_path = 'spec/fixtures'
-describe 'be_able_to' do
-  it 'loads_csv_file_without_header' do
-    options = {:headers_in_file => false, :user_provided_headers => [:a,:b,:c,:d,:e,:f]}
-    data = SmarterCSV.process("#{fixture_path}/no_header.csv", options)
+describe 'no header in file' do
+  let(:headers) { [:a,:b,:c,:d,:e,:f] }
+  let(:options) { {:headers_in_file => false, :user_provided_headers => headers} }
+  subject(:data) { SmarterCSV.process("#{fixture_path}/no_header.csv", options) }
+  it 'load the correct number of records' do
     data.size.should == 5
-    # all the keys should be symbols
-    data.each{|item| item.keys.each{|x| x.class.should be == Symbol}}
+  end
-    data.each do |item|
+  it 'uses given symbols for all records' do
+    data.each do |item|
       item.keys.each do |key|
         [:a,:b,:c,:d,:e,:f].should include( key )
       end
     end
-    data.each do |h|
-      h.size.should <= 6
-    end
   end
+  it 'loads the correct data' do
+    data[0].should == {a: "Dan", b: "McAllister", c: 2, d: 0}
+    data[1].should == {a: "Lucy", b: "Laweless", d: 5, e: 0}
+    data[2].should == {a: "Miles", b: "O'Brian", c: 0, d: 0, e: 0, f: 21}
+    data[3].should == {a: "Nancy", b: "Homes", c: 2, d: 0, e: 1}
+    data[4].should == {a: "Hernán", b: "Curaçon", c: 3, d: 0, e: 0}
+  end
 end

data/spec/smarter_csv/parse/column_separator_spec.rb ADDED

@@ -0,0 +1,61 @@
+require 'spec_helper'
+describe 'parse with col_sep' do
+  let(:options) { {quote_char: '"'} }
+  it 'parses with comma' do
+    line = "a,b,,d"
+    options.merge!({col_sep: ","})
+    array, array_size = SmarterCSV.send(:parse, line, options)
+    expect(array).to eq ['a', 'b', '', 'd']
+    expect(array_size).to eq 4
+  end
+  it 'parses trailing commas' do
+    line = "a,b,c,,"
+    options.merge!({col_sep: ","})
+    array, array_size = SmarterCSV.send(:parse, line, options)
+    expect(array).to eq ['a', 'b', 'c', '', '']
+    expect(array_size).to eq 5
+  end
+  it 'parses with space' do
+    line = "a b  d"
+    options.merge!({col_sep: " "})
+    array, array_size = SmarterCSV.send(:parse, line, options)
+    expect(array).to eq ['a', 'b', '', 'd']
+    expect(array_size).to eq 4
+  end
+  it 'parses with tab' do
+    line = "a\tb\t\td"
+    options.merge!({col_sep: "\t"})
+    array, array_size = SmarterCSV.send(:parse, line, options)
+    expect(array).to eq ['a', 'b', '', 'd']
+    expect(array_size).to eq 4
+  end
+  it 'parses with multiple space separator' do
+    line = "a b    d"
+    options.merge!({col_sep: "  "})
+    array, array_size = SmarterCSV.send(:parse, line, options)
+    expect(array).to eq ['a b', '', 'd']
+    expect(array_size).to eq 3
+  end
+  it 'parses with multiple char separator' do
+    line = '<=><=>A<=>B<=>C'
+    options.merge!({col_sep: "<=>"})
+    array, array_size = SmarterCSV.send(:parse, line, options)
+    expect(array).to eq ["", "", "A", "B", "C"]
+    expect(array_size).to eq 5
+  end
+  it 'parses trailing multiple char separator' do
+    line = '<=><=>A<=>B<=>C<=><=>'
+    options.merge!({col_sep: "<=>"})
+    array, array_size = SmarterCSV.send(:parse, line, options)
+    expect(array).to eq ["", "", "A", "B", "C", "", ""]
+    expect(array_size).to eq 7
+  end
+end
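
For unquoted input, the expectations in this spec (empty fields preserved, including trailing ones, and multi-character separators) can be reproduced with Ruby's built-in `String#split` and a negative limit. This is only an illustrative sketch, not the gem's private `parse` method; `split_keeping_empties` is a hypothetical name. The separator is turned into an escaped regexp because a literal `" "` argument puts `split` into awk-style whitespace mode, which would collapse consecutive spaces.

```ruby
# Hypothetical helper (not smarter_csv's parser): split an unquoted line on
# col_sep, keeping empty fields, including trailing ones (limit -1).
def split_keeping_empties(line, col_sep)
  # Escape the separator so it is matched literally, and so that a single
  # space does not trigger String#split's awk-style whitespace handling.
  line.split(Regexp.new(Regexp.escape(col_sep)), -1)
end

p split_keeping_empties("a,b,c,,", ",")
p split_keeping_empties("a b  d", " ")
p split_keeping_empties("<=><=>A<=>B<=>C<=><=>", "<=>")
```

Quoted fields are not handled here; a separator inside quotes would still split the field.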

data/spec/smarter_csv/parse/old_csv_library_spec.rb ADDED

@@ -0,0 +1,74 @@
+require 'spec_helper'
+describe 'old CSV library parsing tests' do
+  let(:options) { {quote_char: '"', col_sep: ","} }
+  [ ["\t", ["\t"]],
+    ["foo,\"\"\"\"\"\",baz", ["foo", "\"\"", "baz"]],
+    ["foo,\"\"\"bar\"\"\",baz", ["foo", "\"bar\"", "baz"]],
+    ["\"\"\"\n\",\"\"\"\n\"", ["\"\n", "\"\n"]],
+    ["foo,\"\r\n\",baz", ["foo", "\r\n", "baz"]],
+    ["\"\"", [""]],
+    ["foo,\"\"\"\",baz", ["foo", "\"", "baz"]],
+    ["foo,\"\r.\n\",baz", ["foo", "\r.\n", "baz"]],
+    ["foo,\"\r\",baz", ["foo", "\r", "baz"]],
+    ["foo,\"\",baz", ["foo", "", "baz"]],
+    ["\",\"", [","]],
+    ["foo", ["foo"]],
+    [",,", ['', '', '']],
+    [",", ['', '']],
+    ["foo,\"\n\",baz", ["foo", "\n", "baz"]],
+    ["foo,,baz", ["foo", '', "baz"]],
+    ["\"\"\"\r\",\"\"\"\r\"", ["\"\r", "\"\r"]],
+    ["\",\",\",\"", [",", ","]],
+    ["foo,bar,", ["foo", "bar", '']],
+    [",foo,bar", ['', "foo", "bar"]],
+    ["foo,bar", ["foo", "bar"]],
+    [";", [";"]],
+    ["\t,\t", ["\t", "\t"]],
+    ["foo,\"\r\n\r\",baz", ["foo", "\r\n\r", "baz"]],
+    ["foo,\"\r\n\n\",baz", ["foo", "\r\n\n", "baz"]],
+    ["foo,\"foo,bar\",baz", ["foo", "foo,bar", "baz"]],
+    [";,;", [";", ";"]]
+  ].each do |line, result|
+    it "parses #{line}" do
+      array, array_size = SmarterCSV.send(:parse, line, options)
+      expect(array).to eq result
+    end
+  end
+  [ ["foo,\"\"\"\"\"\",baz", ["foo", "\"\"", "baz"]],
+    ["foo,\"\"\"bar\"\"\",baz", ["foo", "\"bar\"", "baz"]],
+    ["foo,\"\r\n\",baz", ["foo", "\r\n", "baz"]],
+    ["\"\"", [""]],
+    ["foo,\"\"\"\",baz", ["foo", "\"", "baz"]],
+    ["foo,\"\r.\n\",baz", ["foo", "\r.\n", "baz"]],
+    ["foo,\"\r\",baz", ["foo", "\r", "baz"]],
+    ["foo,\"\",baz", ["foo", "", "baz"]],
+    ["foo", ["foo"]],
+    [",,", ['', '', '']],
+    [",", ['', '']],
+    ["foo,\"\n\",baz", ["foo", "\n", "baz"]],
+    ["foo,,baz", ["foo", '', "baz"]],
+    ["foo,bar", ["foo", "bar"]],
+    ["foo,\"\r\n\n\",baz", ["foo", "\r\n\n", "baz"]],
+    ["foo,\"foo,bar\",baz", ["foo", "foo,bar", "baz"]]
+  ].each do |line, result|
+    it "parses #{line}" do
+      array, array_size = SmarterCSV.send(:parse, line, options)
+      expect(array).to eq result
+    end
+  end
+  it 'mixed quotes' do
+    line = %Q{Ten Thousand,10000, 2710 ,,"10,000","It's ""10 Grand"", baby",10K}
+    array, array_size = SmarterCSV.send(:parse, line, options)
+    expect(array).to eq ["Ten Thousand", "10000", " 2710 ", "", "10,000", "It's \"10 Grand\", baby", "10K"]
+  end
+  it 'single quotes in fields' do
+    line = 'Indoor Chrome,49.2"" L x 49.2"" W x 20.5"" H,Chrome,"Crystal,Metal,Wood",23.12'
+    array, array_size = SmarterCSV.send(:parse, line, options)
+    expect(array).to eq ['Indoor Chrome', '49.2" L x 49.2" W x 20.5" H', 'Chrome', 'Crystal,Metal,Wood', '23.12']
+  end
+end

data/spec/smarter_csv/parse/rfc4180_and_more_spec.rb ADDED

@@ -0,0 +1,170 @@
+require 'spec_helper'
+fixture_path = 'spec/fixtures'
+describe 'fulfills RFC-4180 and more' do
+  let(:options) { {col_sep: ',', row_sep: $INPUT_RECORD_SEPARATOR, quote_char: '"' } }
+  context 'parses simple CSV' do
+    context 'RFC-4180' do
+      it 'separating on col_sep' do
+        line = 'aaa,bbb,ccc'
+        expect( SmarterCSV.send(:parse, line, options)).to eq [%w[aaa bbb ccc], 3]
+      end
+      it 'preserves whitespace' do
+        line = ' aaa , bbb , ccc '
+        expect( SmarterCSV.send(:parse, line, options)).to eq [
+          [' aaa ', ' bbb ', ' ccc '], 3
+        ]
+      end
+    end
+    context 'extending RFC-4180' do
+      it 'with extra col_sep' do
+        line = 'aaa,bbb,ccc,'
+        expect( SmarterCSV.send(:parse, line, options)).to eq [
+          ['aaa', 'bbb', 'ccc', ''], 4
+        ]
+      end
+      it 'with extra col_sep with given header_size' do
+        line = 'aaa,bbb,ccc,'
+        expect( SmarterCSV.send(:parse, line, options, 3)).to eq [
+          ['aaa', 'bbb', 'ccc'], 3
+        ]
+      end
+      it 'with multiple extra col_sep' do
+        line = 'aaa,bbb,ccc,,,'
+        expect( SmarterCSV.send(:parse, line, options)).to eq [
+          ['aaa', 'bbb', 'ccc', '', '', ''], 6
+        ]
+      end
+      it 'with multiple extra col_sep' do
+        line = 'aaa,bbb,ccc,,,'
+        expect( SmarterCSV.send(:parse, line, options, 3)).to eq [
+          ['aaa', 'bbb', 'ccc'], 3
+        ]
+      end
+      it 'with multiple complex col_sep' do
+        line = 'aaa<=>bbb<=>ccc<=><=><=>'
+        expect( SmarterCSV.send(:parse, line, options.merge({col_sep: '<=>'}))).to eq [
+          ['aaa', 'bbb', 'ccc', '', '', ''], 6
+        ]
+      end
+      it 'with multiple complex col_sep with given header_size' do
+        line = 'aaa<=>bbb<=>ccc<=><=><=>'
+        expect( SmarterCSV.send(:parse, line, options.merge({col_sep: '<=>'}), 3)).to eq [
+          ['aaa', 'bbb', 'ccc'], 3
+        ]
+      end
+    end
+  end
+  context 'parses quoted CSV' do
+    context 'RFC-4180' do
+      it 'separating on col_sep' do
+        line = '"aaa","bbb","ccc"'
+        expect( SmarterCSV.send(:parse, line, options)).to eq [%w[aaa bbb ccc], 3]
+      end
+      it 'parses corner case correctly' do
+        line = '"Board 4""","$17.40","10000003427"'
+        expect( SmarterCSV.send(:parse, line, options)).to eq [
+          ['Board 4"', '$17.40', '10000003427'], 3
+        ]
+      end
+      it 'quoted parts can contain spaces' do
+        line = '" aaa1 aaa2 "," bbb1 bbb2 "," ccc1 ccc2 "'
+        expect( SmarterCSV.send(:parse, line, options)).to eq [
+          [' aaa1 aaa2 ', ' bbb1 bbb2 ', ' ccc1 ccc2 '], 3
+        ]
+      end
+      it 'quoted parts can contain row_sep' do
+        line = '"aaa1, aaa2","bbb1, bbb2","ccc1, ccc2"'
+        expect( SmarterCSV.send(:parse, line, options)).to eq [
+          ['aaa1, aaa2', 'bbb1, bbb2', 'ccc1, ccc2'], 3
+        ]
+      end
+      it 'quoted parts can contain row_sep' do
+        line = '"aaa1, ""aaa2"", aaa3","""bbb1"", bbb2","ccc1, ""ccc2"""'
+        expect( SmarterCSV.send(:parse, line, options)).to eq [
+          ['aaa1, "aaa2", aaa3', '"bbb1", bbb2', 'ccc1, "ccc2"'], 3
+        ]
+      end
+      it 'some fields are quoted' do
+        line = '1,"board 4""",12.95'
+        expect( SmarterCSV.send(:parse, line, options)).to eq [
+          ['1', 'board 4"', '12.95'], 3
+        ]
+      end
+      it 'separating on col_sep' do
+        line = '"some","thing","""completely"" different"'
+        expect( SmarterCSV.send(:parse, line, options)).to eq [
+          ['some', 'thing', '"completely" different'], 3
+        ]
+      end
+    end
+    context 'extending RFC-4180' do
+      it 'with extra col_sep, without given header_size' do
+        line = '"aaa","bbb","ccc",'
+        expect( SmarterCSV.send(:parse, line, options)).to eq [
+          ['aaa', 'bbb', 'ccc', ''], 4
+        ]
+      end
+      it 'with extra col_sep, with given header_size' do
+        line = '"aaa","bbb","ccc",'
+        expect( SmarterCSV.send(:parse, line, options, 3)).to eq [%w[aaa bbb ccc], 3]
+      end
+      it 'with multiple extra col_sep, without given header_size' do
+        line = '"aaa","bbb","ccc",,,'
+        expect( SmarterCSV.send(:parse, line, options)).to eq [
+          ['aaa', 'bbb', 'ccc', '', '', ''], 6
+        ]
+      end
+      it 'with multiple extra col_sep, with given header_size' do
+        line = '"aaa","bbb","ccc",,,'
+        expect( SmarterCSV.send(:parse, line, options, 3)).to eq [
+          ['aaa', 'bbb', 'ccc'], 3
+        ]
+      end
+      it 'with multiple complex extra col_sep, without given header_size' do
+        line = '"aaa"<=>"bbb"<=>"ccc"<=><=><=>'
+        expect( SmarterCSV.send(:parse, line, options.merge({col_sep: '<=>'}))).to eq [
+          ['aaa', 'bbb', 'ccc', '', '', ''], 6
+        ]
+      end
+      it 'with multiple complex extra col_sep, with given header_size' do
+        line = '"aaa"<=>"bbb"<=>"ccc"<=><=><=>'
+        expect( SmarterCSV.send(:parse, line, options.merge({col_sep: '<=>'}), 3)).to eq [
+          ['aaa', 'bbb', 'ccc'], 3
+        ]
+      end
+    end
+  end
+  # relaxed parsing compared to RFC-4180
+  context 'liberal_parsing' do
+    it 'parses corner case correctly' do
+      line = 'is,this "three, or four",fields'
+      expect( SmarterCSV.send(:parse, line, options)).to eq [
+        ['is', 'this "three, or four"', 'fields'], 3
+      ]
+    end
+  end
+end
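
The quoting rules exercised above (separators inside quotes do not split, enclosing quotes are removed, doubled quotes collapse to one, and quotes in the middle of a field are kept) can be sketched with a small character scanner. This is an assumption-laden illustration, not the implementation in `lib/smarter_csv/smarter_csv.rb`; `split_line` and `unquote` are hypothetical names, and the optional `header_size` truncation from the spec is left out.

```ruby
# Hypothetical sketch (not smarter_csv's parser): split one line on col_sep,
# ignoring separators that appear while inside a quoted section.
def split_line(line, col_sep: ",", quote_char: '"')
  fields = []
  field = +""
  in_quotes = false
  i = 0
  while i < line.length
    if line[i] == quote_char
      in_quotes = !in_quotes          # every quote toggles the state
      field << quote_char             # keep it for now; cleaned up in unquote
      i += 1
    elsif !in_quotes && line[i, col_sep.length] == col_sep
      fields << field                 # separator outside quotes ends the field
      field = +""
      i += col_sep.length
    else
      field << line[i]
      i += 1
    end
  end
  fields << field
  fields.map { |f| unquote(f, quote_char) }
end

# Strip one pair of enclosing quotes, then collapse doubled quotes.
def unquote(field, quote_char)
  if field.length >= 2 && field.start_with?(quote_char) && field.end_with?(quote_char)
    field = field[1..-2]
  end
  field.gsub(quote_char * 2, quote_char)
end

p split_line('"Board 4""","$17.40","10000003427"')
p split_line('is,this "three, or four",fields')
```

Toggling on every quote character means a doubled quote flips the state twice and leaves it unchanged, which is why no look-ahead is needed in the scanner.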

data/spec/smarter_csv/quoted_spec.rb CHANGED

@@ -3,7 +3,6 @@ require 'spec_helper'
 fixture_path = 'spec/fixtures'
 describe 'loading file with quoted fields' do
   it 'leaving the quotes in the data' do
     options = {}
     data = SmarterCSV.process("#{fixture_path}/quoted.csv", options)
@@ -12,6 +11,7 @@ describe 'loading file with quoted fields' do
     data[1][:description].should be_nil
     data[2][:model].should eq 'Venture "Extended Edition, Very Large"'
     data[2][:description].should be_nil
+    data[3][:description].should eq 'MUST SELL! air, moon roof, loaded'
     data.each do |h|
       h[:year].class.should eq Fixnum
       h[:make].should_not be_nil
@@ -20,17 +20,21 @@ describe 'loading file with quoted fields' do
     end
   end
+  # quotes inside quoted fields need to be escaped by another double-quote
   it 'removes quotes around quoted fields, but not inside data' do
     options = {}
     data = SmarterCSV.process("#{fixture_path}/quote_char.csv", options)
     data.length.should eq 6
+    data[0][:first_name].should eq "\"John"
+    data[0][:last_name].should eq "Cooke\""
     data[1][:first_name].should eq "Jam\ne\nson\""
     data[2][:first_name].should eq "\"Jean"
+    data[4][:first_name].should eq "Bo\"bbie"
+    data[5][:first_name].should eq 'Mica'
+    data[5][:last_name].should eq 'Copeland'
   end
   # NOTE: quotes inside headers need to be escaped by doubling them
   #       e.g. 'correct ""EXAMPLE""'
   #       this escaping is illegal: 'incorrect \"EXAMPLE\"' <-- this caused CSV parsing error
@@ -43,6 +47,6 @@ describe 'loading file with quoted fields' do
     data.length.should eq 3
     data.first.keys[2].should eq :isbn
     data.first.keys[3].should eq :discounted_price
+    data[1][:author].should eq 'Timothy "The Parser" Campbell'
   end
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: smarter_csv
 version: !ruby/object:Gem::Version
-  version: 1.5.0
+  version: 1.6.0
 platform: ruby
 authors:
 - Tilo Sloboda
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2022-04-25 00:00:00.000000000 Z
+date: 2022-05-03 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
@@ -38,6 +38,20 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: awesome_print
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 description: Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with
   optional features for processing large files in parallel, embedded comments, unusual
   field- and record-separators, flexible mapping of CSV-headers to Hash-keys
@@ -112,6 +126,7 @@ files:
 - spec/smarter_csv/close_file_spec.rb
 - spec/smarter_csv/column_separator_spec.rb
 - spec/smarter_csv/convert_values_to_numeric_spec.rb
+- spec/smarter_csv/duplicate_headers_spec.rb
 - spec/smarter_csv/empty_columns_spec.rb
 - spec/smarter_csv/extenstions_spec.rb
 - spec/smarter_csv/hard_sample_spec.rb
@@ -125,6 +140,9 @@ files:
 - spec/smarter_csv/malformed_spec.rb
 - spec/smarter_csv/no_header_spec.rb
 - spec/smarter_csv/not_downcase_header_spec.rb
+- spec/smarter_csv/parse/column_separator_spec.rb
+- spec/smarter_csv/parse/old_csv_library_spec.rb
+- spec/smarter_csv/parse/rfc4180_and_more_spec.rb
 - spec/smarter_csv/problematic.rb
 - spec/smarter_csv/quoted_spec.rb
 - spec/smarter_csv/remove_empty_values_spec.rb
@@ -160,8 +178,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
-requirements:
-- csv
+requirements: []
 rubygems_version: 3.1.6
 signing_key:
 specification_version: 4
@@ -218,6 +235,7 @@ test_files:
 - spec/smarter_csv/close_file_spec.rb
 - spec/smarter_csv/column_separator_spec.rb
 - spec/smarter_csv/convert_values_to_numeric_spec.rb
+- spec/smarter_csv/duplicate_headers_spec.rb
 - spec/smarter_csv/empty_columns_spec.rb
 - spec/smarter_csv/extenstions_spec.rb
 - spec/smarter_csv/hard_sample_spec.rb
@@ -231,6 +249,9 @@ test_files:
 - spec/smarter_csv/malformed_spec.rb
 - spec/smarter_csv/no_header_spec.rb
 - spec/smarter_csv/not_downcase_header_spec.rb
+- spec/smarter_csv/parse/column_separator_spec.rb
+- spec/smarter_csv/parse/old_csv_library_spec.rb
+- spec/smarter_csv/parse/rfc4180_and_more_spec.rb
 - spec/smarter_csv/problematic.rb
 - spec/smarter_csv/quoted_spec.rb
 - spec/smarter_csv/remove_empty_values_spec.rb