RubyGems - smarter_csv - Versions diffs - 1.8.4 → 1.9.0 - Mend

smarter_csv 1.8.4 → 1.9.0

Files changed (11)

checksums.yaml +4 -4
data/.rubocop.yml +13 -1
data/CHANGELOG.md +24 -0
data/CONTRIBUTORS.md +1 -0
data/README.md +19 -3
data/Rakefile +9 -10
data/ext/smarter_csv/smarter_csv.c +5 -1
data/lib/smarter_csv/version.rb +1 -1
data/lib/smarter_csv.rb +45 -22
data/smarter_csv.gemspec +8 -5
metadata +4 -4

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 122fe57cc771c142a77ceb6305212e8884660a21b9c9edd67a198a14d19e103e
-  data.tar.gz: 4355a9bb355d9f2fa7640ed9f712e6ba57b0a8682417c7668745f96ded39a7c1
+  metadata.gz: 20db4a75108d2b7934b90e6d0e3fef3053e61ad077b4ec4966a97a6e5cb3aa42
+  data.tar.gz: 44d1c3b995b3f7d53d46768437517379dfe3b93fa1db1384054ac16baadfb8d6
 SHA512:
-  metadata.gz: 1646311a9207cf6f042f7e9b30b4ebc94cb6389b541548104b1af888ebdec7af0e50c675fd98ae3e60e86f0b6cd81b51a7e01588b82ae79cdb9ac2674bcc8a51
-  data.tar.gz: 24cc3e5d6467349d24bac39c615e802a1a8f8e5100b1d8f1f93962f23d6ababb50d58e8505d28e83e5b3c7de7d65d57880dc4447dc1247ac2a200ba2f034d27e
+  metadata.gz: 74a2edab893bd9e1b798b03321aea55b566accb92a47d939bdddb616af57fc8525b60b50fb94d1f67face95c83d0a00300f587355dd4934130f5bac9879d5dcd
+  data.tar.gz: 1cb78471b4021dafed4fc1bdd42c2acae934eb1a89f05e9700bfb99f40543c770db58b9b42a70272de9797bdf43fe5d53fa8c352964c1574d658ab2f881a06d6

data/.rubocop.yml CHANGED

@@ -22,6 +22,9 @@ Metrics/BlockLength:
 Metrics/BlockNesting:
   Enabled: false
+Metrics/ClassLength:
+  Enabled: false
 Metrics/CyclomaticComplexity: # BS rule
   Enabled: false
@@ -46,6 +49,9 @@ Naming/VariableNumber:
 Style/ClassEqualityComparison:
   Enabled: false
+Style/ClassMethods:
+  Enabled: false
 Style/ConditionalAssignment:
   Enabled: false
@@ -114,6 +120,9 @@ Style/StringLiteralsInInterpolation:
   Enabled: false
   EnforcedStyle: double_quotes
+Style/SymbolArray:
+  Enabled: false
 Style/SymbolProc: # old Ruby versions can't do this
   Enabled: false
@@ -123,6 +132,9 @@ Style/TrailingCommaInHashLiteral:
 Style/TrailingUnderscoreVariable:
   Enabled: false
+Style/TrivialAccessors:
+  Enabled: false
 # Style/UnlessModifier:
 #   Enabled: false
@@ -130,4 +142,4 @@ Style/ZeroLengthPredicate:
   Enabled: false
 Layout/LineLength:
-  Max: 240
+  Max: 256

data/CHANGELOG.md CHANGED

@@ -1,6 +1,30 @@
 # SmarterCSV 1.x Change Log
+## 1.9.0 (2023-09-04)
+  * fixed issue #139
+  * Error `SmarterCSV::MissingHeaders` was renamed to `SmarterCSV::MissingKeys`
+  * CHANGED BEHAVIOR:
+    When `key_mapping` option is used. (issue #139)
+    Previous versions just printed an error message when a CSV header was missing during key mapping.
+    Versions >= 1.9 will throw `SmarterCSV::KeyMappingError` listing all headers that were missing during mapping.
+  * Notable details for `key_mapping` and `required_headers`:
+    * `key_mapping` is applied to the headers early on during `SmarterCSV.process`, and raises an error if a header in the input CSV file is missing, and we can not map that header to its desired name.
+    Mapping errors can be suppressed by using:
+    * `silence_missing_keys` set to `true`, which silences all such errors, making all headers for mapping optional.
+    * `silence_missing_keys` given an Array with the specific header keys that are optional
+    The use case is that some header fields are optional, but we still want them renamed if they are present.
+    * `required_headers` checks which headers are present **after** `key_mapping` was applied.
+## 1.8.5 (2023-06-25)
+  * fix parsing of escaped quote characters (thanks to JP Camara)
 ## 1.8.4 (2023-04-01)
   * fix gem loading issue (issue #232, #234)
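
The `key_mapping` rules described in the 1.9.0 notes can be sketched in plain Ruby without loading the gem. This is only an illustration, assuming headers are already symbols: `map_headers` and `DemoKeyMappingError` are hypothetical names introduced here, not part of SmarterCSV, and the `silence` argument stands in for the `silence_missing_keys` option.

```ruby
# Hypothetical stand-in for SmarterCSV::KeyMappingError, so the sketch runs without the gem.
class DemoKeyMappingError < StandardError; end

# Renames headers according to key_mapping.
# Raises if a mapped header is absent from the file and has not been silenced.
# `silence` mimics :silence_missing_keys: false, true, or an Array of optional keys.
def map_headers(headers, key_mapping, silence = false)
  missing = key_mapping.keys - headers
  missing -= silence if silence.is_a?(Array)
  unless missing.empty? || silence == true
    raise DemoKeyMappingError, "can not map headers: #{missing.join(', ')}"
  end

  headers.map { |header| key_mapping.key?(header) ? key_mapping[header] : header }
end

headers = [:first_name, :email]
mapping = { first_name: :given_name, phone: :telephone } # :phone is not in the file

# Default behavior: the missing :phone header raises.
strict = begin
  map_headers(headers, mapping)
rescue DemoKeyMappingError => e
  e.message
end

all_optional  = map_headers(headers, mapping, true)     # every mapped key is optional
some_optional = map_headers(headers, mapping, [:phone]) # only :phone is optional
```

With `true` or `[:phone]`, the present header is still renamed, which matches the stated use case of optional headers that should be renamed when they do appear.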

data/CONTRIBUTORS.md CHANGED

@@ -50,3 +50,4 @@ A Big Thank you to everyone who filed issues, sent comments, and who contributed
  * [Hirotaka Mizutani ](https://github.com/hirotaka)
  * [Rahul Chaudhary](https://github.com/rahulch95)
  * [Alessandro Fazzi](https://github.com/pioneerskies)
+ * [JP Camara](https://github.com/jpcamara)

data/README.md CHANGED

@@ -161,7 +161,22 @@ and how the `process` method returns the number of chunks when called with a blo
      => returns number of chunks / rows we processed
 ```
-#### Example 4: Reading a CSV-like File, and Processing it with Sidekiq:
+#### Example 4: Processing a CSV File, and inserting batch jobs in Sidekiq:
+```ruby
+    filename = '/tmp/input.csv' # CSV file containing ids or data to process
+    options = { :chunk_size => 100 }
+    n = SmarterCSV.process(filename, options) do |chunk|
+      Sidekiq::Client.push_bulk(
+        'class' => SidekiqIndividualWorkerClass,
+        'args' => chunk,
+      )
+      # OR:
+      # SidekiqBatchWorkerClass.process_async(chunk) # pass an array of hashes to Sidekiq workers for parallel processing
+    end
+    => returns number of chunks
+```
+#### Example 4b: Reading a CSV-like File, and Processing it with Sidekiq:
 ```ruby
     filename = '/tmp/strange_db_dump'   # a file with CTRL-A as col_separator, and with CTRL-B\n as record_separator (hello iTunes!)
     options = {
@@ -173,7 +188,6 @@ and how the `process` method returns the number of chunks when called with a blo
     end
     => returns number of chunks
 ```
 #### Example 5: Populate a MongoDB Database in Chunks of 100 records with SmarterCSV:
 ```ruby
     # using chunks:
@@ -282,7 +296,9 @@ And header and data validations will also be supported in 2.x
      | Option                      | Default  |  Explanation                                                                         |
      ---------------------------------------------------------------------------------------------------------------------------------
      | :key_mapping                |   nil    | a hash which maps headers from the CSV file to keys in the result hash               |
-     | :silence_missing_key        |   false  | ignore missing keys in `key_mapping` if true                                         |
+     | :silence_missing_keys       |   false  | ignore missing keys in `key_mapping`                                                 |
+     |                             |          | if set to true: makes all mapped keys optional                                       |
+     |                             |          | if given an array, makes only the keys listed in it optional                         |
      | :required_keys              |   nil    | An array. Specify the required names AFTER header transformation.                  |
      | :required_headers           |   nil    | (DEPRECATED / renamed) Use `required_keys` instead                          |
      |                             |          | or an exception is raised   No validation if nil is given.                           |

data/Rakefile CHANGED

@@ -3,16 +3,15 @@
 require "bundler/gem_tasks"
 require 'rspec/core/rake_task'
-# temp fix for NoMethodError: undefined method `last_comment'
-# remove when fixed in Rake 11.x and higher
-module TempFixForRakeLastComment
-  def last_comment
-    last_description
-  end
-end
-Rake::Application.send :include, TempFixForRakeLastComment
-### end of tempfix
+# # temp fix for NoMethodError: undefined method `last_comment'
+# # remove when fixed in Rake 11.x and higher
+# module TempFixForRakeLastComment
+#   def last_comment
+#     last_description
+#   end
+# end
+# Rake::Application.send :include, TempFixForRakeLastComment
+# ### end of tempfix
 RSpec::Core::RakeTask.new(:spec)

data/ext/smarter_csv/smarter_csv.c CHANGED

@@ -39,6 +39,8 @@ static VALUE rb_parse_csv_line(VALUE self, VALUE line, VALUE col_sep, VALUE quot
   VALUE field;
   long i;
+  char prev_char = '\0'; // Store the previous character for comparison against an escape character
   while (p < endP) {
     /* does the remaining string start with col_sep ? */
     col_sep_found = true;
@@ -59,11 +61,13 @@ static VALUE rb_parse_csv_line(VALUE self, VALUE line, VALUE col_sep, VALUE quot
         startP = p;
       }
     } else {
-      if (*p == *quoteP) {
+      if (*p == *quoteP && prev_char != '\\') {
         quote_count += 1;
       }
       p++;
     }
+    prev_char = *(p - 1); // Update the previous character
   } /* while */
   /* check if the last part of the line needs to be processed */
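
The effect of skipping backslash-escaped quotes can be shown with a small standalone Ruby sketch of the same counting idea. The method name `count_unescaped_quotes` and the sample rows are made up for illustration. The sketch only checks the single preceding character, as the change above does.

```ruby
# Counts quote characters, ignoring any quote directly preceded by a backslash.
def count_unescaped_quotes(line, quote_char)
  return 0 if line.nil? || quote_char.nil?

  count = 0
  previous_char = ''
  line.each_char do |char|
    count += 1 if char == quote_char && previous_char != '\\'
    previous_char = char
  end
  count
end

# Single-quoted Ruby strings keep the backslash, so this row contains: "5\" nail"
complete_row = 'id,"5\" nail",ok'
open_row     = 'id,"first line'

naive_count     = complete_row.count('"')                    # 3: odd, looks like an unfinished field
unescaped_count = count_unescaped_quotes(complete_row, '"')  # 2: even, the row is complete
open_count      = count_unescaped_quotes(open_row, '"')      # 1: odd, the field continues on the next line
```

An odd count is what marks a row as continuing onto the next line, so counting the escaped quote would make the parser treat `complete_row` as an unfinished field.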

data/lib/smarter_csv/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module SmarterCSV
-  VERSION = "1.8.4"
+  VERSION = "1.9.0"
 end

data/lib/smarter_csv.rb CHANGED

@@ -12,12 +12,12 @@ module SmarterCSV
   class IncorrectOption < SmarterCSVException; end
   class ValidationError < SmarterCSVException; end
   class DuplicateHeaders < SmarterCSVException; end
-  class MissingHeaders < SmarterCSVException; end
+  class MissingKeys < SmarterCSVException; end # previously known as MissingHeaders
   class NoColSepDetected < SmarterCSVException; end
-  class KeyMappingError < SmarterCSVException; end # CURRENTLY UNUSED -> version 1.9.0
+  class KeyMappingError < SmarterCSVException; end
   # first parameter: filename or input object which responds to readline method
-  def SmarterCSV.process(input, options = {}, &block)
+  def SmarterCSV.process(input, options = {}, &block) # rubocop:disable Lint/UnusedMethodArgument
     options = default_options.merge(options)
     options[:invalid_byte_sequence] = '' if options[:invalid_byte_sequence].nil?
     puts "SmarterCSV OPTIONS: #{options.inspect}" if options[:verbose]
@@ -69,8 +69,8 @@ module SmarterCSV
         # in which case the row data will be split across multiple lines (see the sample content in spec/fixtures/carriage_returns_rn.csv)
         # by detecting the existence of an uneven number of quote characters
-        multiline = line.count(options[:quote_char]).odd? # should handle quote_char nil
-        while line.count(options[:quote_char]).odd? # should handle quote_char nil
+        multiline = count_quote_chars(line, options[:quote_char]).odd? # should handle quote_char nil
+        while count_quote_chars(line, options[:quote_char]).odd? # should handle quote_char nil
           next_line = fh.readline(options[:row_sep])
           next_line = next_line.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
           line += next_line
@@ -99,7 +99,7 @@ module SmarterCSV
           hash.delete_if{|_k, v| has_rails ? v.blank? : blank?(v)}
         end
-        hash.delete_if{|_k, v| !v.nil? && v =~ /^(\d+|\d+\.\d+)$/ && v.to_f == 0} if options[:remove_zero_values] # values are typically Strings!
+        hash.delete_if{|_k, v| !v.nil? && v =~ /^(0+|0+\.0+)$/} if options[:remove_zero_values] # values are Strings
         hash.delete_if{|_k, v| v =~ options[:remove_values_matching]} if options[:remove_values_matching]
         if options[:convert_values_to_numeric]
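
To see which values the new `remove_zero_values` pattern removes, here is a small sketch that applies the regular expression from the added line to a made-up row. The constant name `ZERO_VALUE` and the sample data are assumptions for illustration. `reject` is used instead of `delete_if` so the sample row is left untouched.

```ruby
# The pattern from the added line: only zeros, optionally followed by ".zeros".
ZERO_VALUE = /^(0+|0+\.0+)$/

row = { id: '0', price: '0.00', qty: '10', ratio: '0.5', code: '007', note: nil }

# Same condition as in the diff: nil values are kept, zero-like strings are dropped.
cleaned = row.reject { |_key, value| !value.nil? && value =~ ZERO_VALUE }
# cleaned keeps :qty, :ratio, :code and :note
```

Values such as `'007'` and `'0.5'` are kept because they contain a non-zero digit. The pattern makes the decision by itself, without the `to_f` conversion used in the removed line.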
@@ -171,15 +171,15 @@ module SmarterCSV
           result << chunk # not sure yet, why anybody would want to do this without a block
         end
         chunk_count += 1
-        chunk = [] # initialize for next chunk of data
+        # chunk = [] # initialize for next chunk of data
       end
     ensure
       fh.close if fh.respond_to?(:close)
     end
     if block_given?
-      return chunk_count # when we do processing through a block we only care how many chunks we processed
+      chunk_count # when we do processing through a block we only care how many chunks we processed
     else
-      return result # returns either an Array of Hashes, or an Array of Arrays of Hashes (if in chunked mode)
+      result # returns either an Array of Hashes, or an Array of Arrays of Hashes (if in chunked mode)
     end
   end
@@ -196,6 +196,21 @@ module SmarterCSV
       @headers
     end
+    # Counts the number of quote characters in a line, excluding escaped quotes.
+    def count_quote_chars(line, quote_char)
+      return 0 if line.nil? || quote_char.nil?
+      count = 0
+      previous_char = ''
+      line.each_char do |char|
+        count += 1 if char == quote_char && previous_char != '\\'
+        previous_char = char
+      end
+      count
+    end
     protected
     # NOTE: this is not called when "parse" methods are tested by themselves
@@ -270,11 +285,11 @@ module SmarterCSV
         has_quotes = line =~ /#{options[:quote_char]}/
         elements = parse_csv_line_c(line, options[:col_sep], options[:quote_char], header_size)
         elements.map!{|x| cleanup_quotes(x, options[:quote_char])} if has_quotes
-        return [elements, elements.size]
+        [elements, elements.size]
         # :nocov:
       else
         # puts "WARNING: SmarterCSV is using un-accelerated parsing of lines. Check options[:acceleration]"
-        return parse_csv_line_ruby(line, options, header_size)
+        parse_csv_line_ruby(line, options, header_size)
       end
     end
@@ -310,15 +325,18 @@ module SmarterCSV
       start = 0
       i = 0
+      previous_char = ''
       while i < line_size
         if line[i...i+col_sep_size] == col_sep && quote_count.even?
           break if !header_size.nil? && elements.size >= header_size
           elements << cleanup_quotes(line[start...i], quote)
+          previous_char = line[i]
           i += col_sep.size
           start = i
         else
-          quote_count += 1 if line[i] == quote
+          quote_count += 1 if line[i] == quote && previous_char != '\\'
+          previous_char = line[i]
           i += 1
         end
       end
@@ -384,7 +402,7 @@ module SmarterCSV
           return true unless Array(options[option_name][:only]).include?(key)
         end
       end
-      return false
+      false
     end
     # If file has headers, then guesses column separator from headers.
@@ -449,8 +467,8 @@ module SmarterCSV
       counts["\r"] += 1 if last_char == "\r"
       # find the most frequent key/value pair:
-      k, _ = counts.max_by{|_, v| v}
-      return k
+      most_frequent_key, _count = counts.max_by{|_, v| v}
+      most_frequent_key
     end
     def process_headers(filehandle, options)
@@ -472,6 +490,7 @@ module SmarterCSV
         file_headerA.map!{|x| x.gsub(%r/#{options[:quote_char]}/, '')}
         file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
         unless options[:keep_original_headers]
           file_headerA.map!{|x| x.gsub(/\s+|-+/, '_')}
           file_headerA.map!{|x| x.downcase} if options[:downcase_header]
@@ -505,10 +524,13 @@ module SmarterCSV
         # do some key mapping on the keys in the file header
         #   if you want to completely delete a key, then map it to nil or to ''
         if !key_mappingH.nil? && key_mappingH.class == Hash && key_mappingH.keys.size > 0
-          unless options[:silence_missing_keys]
-            # if silence_missing_keys are not set, raise error if missing header
-            missing_keys = key_mappingH.keys - headerA
-            puts "WARNING: missing header(s): #{missing_keys.join(",")}" unless missing_keys.empty?
+          # if silence_missing_keys are not set, raise error if missing header
+          missing_keys = key_mappingH.keys - headerA
+          # if the user passes a list of specific mapped keys that are optional
+          missing_keys -= options[:silence_missing_keys] if options[:silence_missing_keys].is_a?(Array)
+          unless missing_keys.empty? || options[:silence_missing_keys] == true
+            raise  SmarterCSV::KeyMappingError,  "ERROR: can not map headers: #{missing_keys.join(', ')}"
           end
           headerA.map!{|x| key_mappingH.has_key?(x) ? (key_mappingH[x].nil? ? nil : key_mappingH[x]) : (options[:remove_unmapped_keys] ? nil : x)}
@@ -526,8 +548,8 @@ module SmarterCSV
       end
       # deprecate required_headers
-      if !options[:required_headers].nil?
-        puts "DEPRECATION WARNING: please use 'required_keys' instead of 'required headers'"
+      unless options[:required_headers].nil?
+        puts "DEPRECATION WARNING: please use 'required_keys' instead of 'required_headers'"
         if options[:required_keys].nil?
           options[:required_keys] = options[:required_headers]
           options[:required_headers] = nil
@@ -539,7 +561,7 @@ module SmarterCSV
         options[:required_keys].each do |k|
           missing_keys << k unless headerA.include?(k)
         end
-        raise SmarterCSV::MissingHeaders, "ERROR: missing attributes: #{missing_keys.join(',')}" unless missing_keys.empty?
+        raise SmarterCSV::MissingKeys, "ERROR: missing attributes: #{missing_keys.join(',')}" unless missing_keys.empty?
       end
       @headers = headerA
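
Because this check runs on the headers after `key_mapping` has been applied, `required_keys` must use the mapped names. A minimal sketch of that check follows, assuming symbol headers. `check_required_keys` and `DemoMissingKeys` are hypothetical names, with `DemoMissingKeys` standing in for `SmarterCSV::MissingKeys` so the example runs without the gem.

```ruby
# Hypothetical stand-in for SmarterCSV::MissingKeys.
class DemoMissingKeys < StandardError; end

# Verifies that every required key is present among the already mapped headers.
def check_required_keys(mapped_headers, required_keys)
  missing = required_keys.reject { |key| mapped_headers.include?(key) }
  raise DemoMissingKeys, "ERROR: missing attributes: #{missing.join(',')}" unless missing.empty?

  mapped_headers
end

# Headers as they look after a mapping such as { first_name: :given_name }.
mapped_headers = [:given_name, :email]

passes = check_required_keys(mapped_headers, [:given_name])

# Asking for the pre-mapping name fails, because :first_name no longer exists.
failure = begin
  check_required_keys(mapped_headers, [:first_name, :phone])
rescue DemoMissingKeys => e
  e.message
end
```

Callers that previously rescued `SmarterCSV::MissingHeaders` would need to rescue `SmarterCSV::MissingKeys` instead.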
@@ -593,6 +615,7 @@ module SmarterCSV
     def option_valid?(str)
       return true if str.is_a?(Symbol) && str == :auto
       return true if str.is_a?(String) && !str.empty?
       false
     end
   end

data/smarter_csv.gemspec CHANGED

@@ -1,5 +1,7 @@
-# -*- encoding: utf-8 -*-
-require File.expand_path('../lib/smarter_csv/version', __FILE__)
+# coding: utf-8
+# frozen_string_literal: true
+require File.expand_path('lib/smarter_csv/version', __dir__)
 Gem::Specification.new do |spec|
   spec.name          = "smarter_csv"
@@ -7,8 +9,8 @@ Gem::Specification.new do |spec|
   spec.authors       = ["Tilo Sloboda"]
   spec.email         = ["tilo.sloboda@gmail.com"]
-  spec.summary       = %q{Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots of optional features, e.g. chunked processing for huge CSV files}
-  spec.description   = %q{Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with optional features for processing large files in parallel, embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers to Hash-keys}
+  spec.summary       = "Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots of optional features, e.g. chunked processing for huge CSV files"
+  spec.description   = "Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with optional features for processing large files in parallel, embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers to Hash-keys"
   spec.homepage      = "https://github.com/tilo/smarter_csv"
   spec.license       = 'MIT'
@@ -16,6 +18,8 @@ Gem::Specification.new do |spec|
   spec.metadata["source_code_uri"] = spec.homepage
   spec.metadata["changelog_uri"] = "https://github.com/tilo/smarter_csv/blob/main/CHANGELOG.md"
+  spec.required_ruby_version = ">= 2.5.0"
   # Specify which files should be added to the gem when it is released.
   # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
   spec.files = Dir.chdir(__dir__) do
@@ -30,7 +34,6 @@ Gem::Specification.new do |spec|
   spec.require_paths = ["lib"] # add ext here?
   spec.extensions = ["ext/smarter_csv/extconf.rb"]
   spec.add_development_dependency "awesome_print"
   spec.add_development_dependency "codecov"
   spec.add_development_dependency "pry"

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: smarter_csv
 version: !ruby/object:Gem::Version
-  version: 1.8.4
+  version: 1.9.0
 platform: ruby
 authors:
 - Tilo Sloboda
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2023-04-02 00:00:00.000000000 Z
+date: 2023-09-05 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: awesome_print
@@ -134,14 +134,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: '0'
+      version: 2.5.0
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.1.6
+rubygems_version: 3.2.3
 signing_key:
 specification_version: 4
 summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots