RubyGems - smarter_csv - Versions diffs - 1.8.5 → 1.9.0 - Mend

smarter_csv 1.8.5 → 1.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 8a812edc2e7a7778b0722a120acf8432eb844895b09668c3adf37184cfa08408
-  data.tar.gz: e78756cef3558b32cfa2788fdbfdd8723cc7a06872c97916fc2c95e16cf363d7
+  metadata.gz: 20db4a75108d2b7934b90e6d0e3fef3053e61ad077b4ec4966a97a6e5cb3aa42
+  data.tar.gz: 44d1c3b995b3f7d53d46768437517379dfe3b93fa1db1384054ac16baadfb8d6
 SHA512:
-  metadata.gz: e81dfd9e713a301f58c64311a02e5047aa3678652e7c410fd8bfaaa5f49459faa3766f5e152d27b080d8677fb3cb452b45ae76086f9feba5caf5f1cd57220260
-  data.tar.gz: c2a6206128b0860138ca739a70b9546fe6ddbb37b4d09069607abc17a08fd51a2f2070d9ab2a6f55955a5f6a98f9def1dad7ce5ac03d0d6aabe67c9593be792f
+  metadata.gz: 74a2edab893bd9e1b798b03321aea55b566accb92a47d939bdddb616af57fc8525b60b50fb94d1f67face95c83d0a00300f587355dd4934130f5bac9879d5dcd
+  data.tar.gz: 1cb78471b4021dafed4fc1bdd42c2acae934eb1a89f05e9700bfb99f40543c770db58b9b42a70272de9797bdf43fe5d53fa8c352964c1574d658ab2f881a06d6
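The `checksums.yaml` entries are digests of the two archives packed inside the `.gem` file. Below is a minimal sketch of checking them with Ruby's standard `digest` and `yaml` libraries. It assumes you have already unpacked the gem (for example with `tar -xf smarter_csv-1.9.0.gem`) and gunzipped `checksums.yaml.gz`. The helper name `checksum_mismatches` is hypothetical.

```ruby
require 'digest'
require 'yaml'

# Hypothetical helper: compare the digests recorded in checksums.yaml
# against the actual bytes of each archive member.
# `contents` maps a member name (e.g. "data.tar.gz") to its raw bytes.
# Returns an Array of [algorithm, member] pairs whose digests differ.
def checksum_mismatches(checksums_yaml, contents)
  expected = YAML.safe_load(checksums_yaml)
  digesters = { 'SHA256' => Digest::SHA256, 'SHA512' => Digest::SHA512 }

  expected.each_with_object([]) do |(algorithm, entries), mismatches|
    digester = digesters.fetch(algorithm) # KeyError for an unknown algorithm
    entries.each do |member, hex_digest|
      data = contents.fetch(member)       # KeyError if a member is missing
      mismatches << [algorithm, member] unless digester.hexdigest(data) == hex_digest
    end
  end
end
```

Read each member with `File.binread('metadata.gz')` and `File.binread('data.tar.gz')`. An empty result means every recorded digest matches.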

data/.rubocop.yml CHANGED

@@ -22,6 +22,9 @@ Metrics/BlockLength:
 Metrics/BlockNesting:
   Enabled: false
+Metrics/ClassLength:
+  Enabled: false
 Metrics/CyclomaticComplexity: # BS rule
   Enabled: false
@@ -46,6 +49,9 @@ Naming/VariableNumber:
 Style/ClassEqualityComparison:
   Enabled: false
+Style/ClassMethods:
+  Enabled: false
 Style/ConditionalAssignment:
   Enabled: false
@@ -114,6 +120,9 @@ Style/StringLiteralsInInterpolation:
   Enabled: false
   EnforcedStyle: double_quotes
+Style/SymbolArray:
+  Enabled: false
 Style/SymbolProc: # old Ruby versions can't do this
   Enabled: false
@@ -123,6 +132,9 @@ Style/TrailingCommaInHashLiteral:
 Style/TrailingUnderscoreVariable:
   Enabled: false
+Style/TrivialAccessors:
+  Enabled: false
 # Style/UnlessModifier:
 #   Enabled: false
@@ -130,4 +142,4 @@ Style/ZeroLengthPredicate:
   Enabled: false
 Layout/LineLength:
-  Max: 240
+  Max: 256

data/CHANGELOG.md CHANGED

@@ -1,6 +1,27 @@
 # SmarterCSV 1.x Change Log
+## 1.9.0 (2023-09-04)
+  * fixed issue #139
+  * Error `SmarterCSV::MissingHeaders` was renamed to `SmarterCSV::MissingKeys`
+  * CHANGED BEHAVIOR:
+    When `key_mapping` option is used. (issue #139)
+    Previous versions just printed an error message when a CSV header was missing during key mapping.
+    Versions >= 1.9 will throw `SmarterCSV::MissingHeaders` listing all headers that were missing during mapping.
+  * Notable details for `key_mapping` and `required_headers`:
+    * `key_mapping` is applied to the headers early on during `SmarterCSV.process`, and raises an error if a header in the input CSV file is missing, and we can not map that header to its desired name.
+    Mapping errors can be surpressed by using:
+    * `silence_missing_keys` set to `true`, which silence all such errors, making all headers for mapping optional.
+    * `silence_missing_keys` given an Array with the specific header keys that are optional
+    The use case is that some header fields are optional, but we still want them renamed if they are present.
+    * `required_headers` checks which headers are present **after** `key_mapping` was applied.
 ## 1.8.5 (2023-06-25)
   * fix parsing of escaped quote characters (thanks to JP Camara)
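The `key_mapping` and `silence_missing_keys` rules in this entry can be illustrated with a plain-Ruby sketch. It mirrors the 1.9.0 `process_headers` code further down this diff but does not call the gem. The names `map_headers` and `check_required_keys` are hypothetical, and the two error classes are local stand-ins.

Two details in the diff are worth noting:

* The changelog text says `SmarterCSV::MissingHeaders` is raised. The 1.9.0 code instead raises `SmarterCSV::KeyMappingError` for headers it cannot map, and `SmarterCSV::MissingKeys` for missing `required_keys`.
* The code reads `options[:silence_missing_keys]` (plural), although the README table lists `:silence_missing_key`.

```ruby
# Local stand-ins for SmarterCSV::KeyMappingError and SmarterCSV::MissingKeys.
class KeyMappingError < StandardError; end
class MissingKeys < StandardError; end

# Mirrors the 1.9.0 logic: a mapped header that is absent from the file raises,
# unless silence_missing_keys is true (all mapped keys optional) or an Array
# that lists that key (only the listed keys optional).
def map_headers(headers, key_mapping, silence_missing_keys: false, remove_unmapped_keys: false)
  missing = key_mapping.keys - headers
  missing -= silence_missing_keys if silence_missing_keys.is_a?(Array)
  unless missing.empty? || silence_missing_keys == true
    raise KeyMappingError, "ERROR: can not map headers: #{missing.join(', ')}"
  end

  headers.map do |h|
    if key_mapping.key?(h)
      key_mapping[h]                 # a nil mapping marks the column to be dropped
    else
      remove_unmapped_keys ? nil : h # unmapped headers are kept unless told otherwise
    end
  end
end

# required_keys is checked against the headers *after* key_mapping was applied.
def check_required_keys(mapped_headers, required_keys)
  missing = required_keys - mapped_headers
  raise MissingKeys, "ERROR: missing attributes: #{missing.join(',')}" unless missing.empty?

  mapped_headers
end
```

For example, `map_headers([:name], { name: :full_name, phone: :tel }, silence_missing_keys: [:phone])` returns `[:full_name]`: the optional `:phone` header is simply skipped when it is absent.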

data/README.md CHANGED

@@ -161,7 +161,22 @@ and how the `process` method returns the number of chunks when called with a blo
      => returns number of chunks / rows we processed
 ```
-#### Example 4: Reading a CSV-like File, and Processing it with Sidekiq:
+#### Example 4: Processing a CSV File, and inserting batch jobs in Sidekiq:
+```ruby
+    filename = '/tmp/input.csv' # CSV file containing ids or data to process
+    options = { :chunk_size => 100 }
+    n = SmarterCSV.process(filename, options) do |chunk|
+      Sidekiq::Client.push_bulk(
+        'class' => SidekiqIndividualWorkerClass,
+        'args' => chunk,
+      )
+      # OR:
+      # SidekiqBatchWorkerClass.process_async(chunk ) # pass an array of hashes to Sidekiq workers for parallel processing
+    end
+    => returns number of chunks
+```
+#### Example 4b: Reading a CSV-like File, and Processing it with Sidekiq:
 ```ruby
     filename = '/tmp/strange_db_dump'   # a file with CRTL-A as col_separator, and with CTRL-B\n as record_separator (hello iTunes!)
     options = {
@@ -173,7 +188,6 @@ and how the `process` method returns the number of chunks when called with a blo
     end
     => returns number of chunks
 ```
 #### Example 5: Populate a MongoDB Database in Chunks of 100 records with SmarterCSV:
 ```ruby
     # using chunks:
@@ -282,7 +296,9 @@ And header and data validations will also be supported in 2.x
      | Option                      | Default  |  Explanation                                                                         |
      ---------------------------------------------------------------------------------------------------------------------------------
      | :key_mapping                |   nil    | a hash which maps headers from the CSV file to keys in the result hash               |
-     | :silence_missing_key        |   false  | ignore missing keys in `key_mapping` if true                                         |
+     | :silence_missing_key        |   false  | ignore missing keys in `key_mapping`                                   |
+     |                             |          | if set to true: makes all mapped keys optional                         |
+     |                             |          | if given an array, makes only the keys listed in it optional                         |
      | :required_keys              |   nil    | An array. Specify the required names AFTER header transformation.                  |
      | :required_headers           |   nil    | (DEPRECATED / renamed) Use `required_keys` instead                          |
      |                             |          | or an exception is raised   No validation if nil is given.                           |
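The new Example 4 has two problems with its Sidekiq calls:

* It passes `chunk`, an Array of Hashes, as `'args'`. Sidekiq's `Sidekiq::Client.push_bulk` expects `'args'` to be an Array of per-job argument Arrays.
* Sidekiq's enqueue method is `perform_async`, so the commented-out `process_async` call looks like a typo.

Below is a minimal sketch of reshaping a chunk before pushing it, assuming one job per CSV row. `to_bulk_args` is a hypothetical helper. It also stringifies the Symbol keys that SmarterCSV produces, because recent Sidekiq versions may reject job arguments that are not JSON-native.

```ruby
# Hypothetical helper: turn one SmarterCSV chunk (an Array of Hashes with
# Symbol keys) into the Array-of-Arrays shape that push_bulk's 'args' expects,
# one single-argument job per row, with String keys for JSON round-tripping.
def to_bulk_args(chunk)
  chunk.map { |row| [row.transform_keys(&:to_s)] }
end

# Inside the SmarterCSV.process block (requires Sidekiq; not run here):
#   Sidekiq::Client.push_bulk(
#     'class' => SidekiqIndividualWorkerClass,
#     'args'  => to_bulk_args(chunk)
#   )
```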

data/Rakefile CHANGED

@@ -3,16 +3,15 @@
 require "bundler/gem_tasks"
 require 'rspec/core/rake_task'
-# temp fix for NoMethodError: undefined method `last_comment'
-# remove when fixed in Rake 11.x and higher
-module TempFixForRakeLastComment
-  def last_comment
-    last_description
-  end
-end
-Rake::Application.send :include, TempFixForRakeLastComment
-### end of tempfix
+# # temp fix for NoMethodError: undefined method `last_comment'
+# # remove when fixed in Rake 11.x and higher
+# module TempFixForRakeLastComment
+#   def last_comment
+#     last_description
+#   end
+# end
+# Rake::Application.send :include, TempFixForRakeLastComment
+# ### end of tempfix
 RSpec::Core::RakeTask.new(:spec)

data/lib/smarter_csv/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module SmarterCSV
-  VERSION = "1.8.5"
+  VERSION = "1.9.0"
 end

data/lib/smarter_csv.rb CHANGED

@@ -12,12 +12,12 @@ module SmarterCSV
   class IncorrectOption < SmarterCSVException; end
   class ValidationError < SmarterCSVException; end
   class DuplicateHeaders < SmarterCSVException; end
-  class MissingHeaders < SmarterCSVException; end
+  class MissingKeys < SmarterCSVException; end # previously known as MissingHeaders
   class NoColSepDetected < SmarterCSVException; end
-  class KeyMappingError < SmarterCSVException; end # CURRENTLY UNUSED -> version 1.9.0
+  class KeyMappingError < SmarterCSVException; end
   # first parameter: filename or input object which responds to readline method
-  def SmarterCSV.process(input, options = {}, &block)
+  def SmarterCSV.process(input, options = {}, &block) # rubocop:disable Lint/UnusedMethodArgument
     options = default_options.merge(options)
     options[:invalid_byte_sequence] = '' if options[:invalid_byte_sequence].nil?
     puts "SmarterCSV OPTIONS: #{options.inspect}" if options[:verbose]
@@ -99,7 +99,7 @@ module SmarterCSV
           hash.delete_if{|_k, v| has_rails ? v.blank? : blank?(v)}
         end
-        hash.delete_if{|_k, v| !v.nil? && v =~ /^(\d+|\d+\.\d+)$/ && v.to_f == 0} if options[:remove_zero_values] # values are typically Strings!
+        hash.delete_if{|_k, v| !v.nil? && v =~ /^(0+|0+\.0+)$/} if options[:remove_zero_values] # values are Strings
         hash.delete_if{|_k, v| v =~ options[:remove_values_matching]} if options[:remove_values_matching]
         if options[:convert_values_to_numeric]
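The rewritten `remove_zero_values` test drops the `to_f` call and matches zero-only strings directly. For single-line values the old and new predicates agree. However, Ruby's `^` and `$` anchor at line boundaries, so a multi-line quoted value such as `"5\n0"` now counts as zero and would be removed.

The sketch below compares the two expressions from the diff with a stricter variant anchored by `\A` and `\z`. The constant names are hypothetical, and the stricter form is only a suggestion, not the gem's code.

```ruby
# Predicate from 1.8.5: numeric-looking string whose float value is zero.
OLD_ZERO = ->(v) { !v.nil? && v.match?(/^(\d+|\d+\.\d+)$/) && v.to_f == 0 }
# Predicate from 1.9.0: string made only of zeros, optionally with a fraction.
NEW_ZERO = ->(v) { !v.nil? && v.match?(/^(0+|0+\.0+)$/) }
# Stricter variant: \A and \z anchor to the whole string, not to each line.
STRICT_ZERO = ->(v) { v.is_a?(String) && v.match?(/\A0+(\.0+)?\z/) }

['0', '000', '0.00', '10', '0.5', "5\n0"].each do |v|
  puts format('%-8s old=%-5s new=%-5s strict=%s', v.inspect, OLD_ZERO.(v), NEW_ZERO.(v), STRICT_ZERO.(v))
end
```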
@@ -171,15 +171,15 @@ module SmarterCSV
           result << chunk # not sure yet, why anybody would want to do this without a block
         end
         chunk_count += 1
-        chunk = [] # initialize for next chunk of data
+        # chunk = [] # initialize for next chunk of data
       end
     ensure
       fh.close if fh.respond_to?(:close)
     end
     if block_given?
-      return chunk_count # when we do processing through a block we only care how many chunks we processed
+      chunk_count # when we do processing through a block we only care how many chunks we processed
     else
-      return result # returns either an Array of Hashes, or an Array of Arrays of Hashes (if in chunked mode)
+      result # returns either an Array of Hashes, or an Array of Arrays of Hashes (if in chunked mode)
     end
   end
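Dropping the explicit `return`s does not change behavior. The `if block_given?` expression is the last expression in the method, so its value is what the method returns.

The simplified sketch below shows that contract. `process_in_chunks` is hypothetical, and `each_slice` stands in for the gem's own line-by-line chunk assembly. With a block, the method yields each chunk and returns the chunk count. Without a block, it returns the chunks themselves.

```ruby
# Hypothetical simplification of SmarterCSV.process in chunked mode:
# rows are grouped into Arrays of at most chunk_size elements.
def process_in_chunks(rows, chunk_size)
  result = []
  chunk_count = 0
  rows.each_slice(chunk_size) do |chunk|
    if block_given?
      yield chunk     # hand each chunk to the caller's block
    else
      result << chunk # otherwise collect it
    end
    chunk_count += 1
  end
  # The if-expression is the last expression, so its value is returned
  # without an explicit `return`.
  if block_given?
    chunk_count # with a block, callers only need the number of chunks
  else
    result      # without a block, an Array of Arrays of Hashes
  end
end
```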
@@ -285,11 +285,11 @@ module SmarterCSV
         has_quotes = line =~ /#{options[:quote_char]}/
         elements = parse_csv_line_c(line, options[:col_sep], options[:quote_char], header_size)
         elements.map!{|x| cleanup_quotes(x, options[:quote_char])} if has_quotes
-        return [elements, elements.size]
+        [elements, elements.size]
         # :nocov:
       else
         # puts "WARNING: SmarterCSV is using un-accelerated parsing of lines. Check options[:acceleration]"
-        return parse_csv_line_ruby(line, options, header_size)
+        parse_csv_line_ruby(line, options, header_size)
       end
     end
@@ -402,7 +402,7 @@ module SmarterCSV
           return true unless Array(options[option_name][:only]).include?(key)
         end
       end
-      return false
+      false
     end
     # If file has headers, then guesses column separator from headers.
@@ -467,8 +467,8 @@ module SmarterCSV
       counts["\r"] += 1 if last_char == "\r"
       # find the most frequent key/value pair:
-      k, _ = counts.max_by{|_, v| v}
-      return k
+      most_frequent_key, _count = counts.max_by{|_, v| v}
+      most_frequent_key
     end
     def process_headers(filehandle, options)
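The renamed variables make the intent of `max_by` explicit. On a Hash it yields `[key, value]` pairs and returns the pair with the largest value, which is then destructured.

Below is a minimal sketch of the same idea applied to line endings, assuming the whole sample is already in a String. `guess_line_ending` here is a simplified, hypothetical stand-in for the gem's method, and its `"\n"` fallback is an assumption.

```ruby
# Count each kind of line ending; "\r\n" is matched first so it is not
# also counted as a lone "\r" or "\n".
def guess_line_ending(sample)
  counts = Hash.new(0)
  sample.scan(/\r\n|\r|\n/) { |ending| counts[ending] += 1 }
  return "\n" if counts.empty? # hypothetical default when no line ending is found

  # find the most frequent key/value pair and keep only the key
  most_frequent_key, _count = counts.max_by { |_, v| v }
  most_frequent_key
end
```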
@@ -490,6 +490,7 @@ module SmarterCSV
         file_headerA.map!{|x| x.gsub(%r/#{options[:quote_char]}/, '')}
         file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
         unless options[:keep_original_headers]
           file_headerA.map!{|x| x.gsub(/\s+|-+/, '_')}
           file_headerA.map!{|x| x.downcase} if options[:downcase_header]
@@ -523,10 +524,13 @@ module SmarterCSV
         # do some key mapping on the keys in the file header
         #   if you want to completely delete a key, then map it to nil or to ''
         if !key_mappingH.nil? && key_mappingH.class == Hash && key_mappingH.keys.size > 0
-          unless options[:silence_missing_keys]
-            # if silence_missing_keys are not set, raise error if missing header
-            missing_keys = key_mappingH.keys - headerA
-            puts "WARNING: missing header(s): #{missing_keys.join(",")}" unless missing_keys.empty?
+          # if silence_missing_keys are not set, raise error if missing header
+          missing_keys = key_mappingH.keys - headerA
+          # if the user passes a list of speciffic mapped keys that are optional
+          missing_keys -= options[:silence_missing_keys] if options[:silence_missing_keys].is_a?(Array)
+          unless missing_keys.empty? || options[:silence_missing_keys] == true
+            raise  SmarterCSV::KeyMappingError,  "ERROR: can not map headers: #{missing_keys.join(', ')}"
           end
           headerA.map!{|x| key_mappingH.has_key?(x) ? (key_mappingH[x].nil? ? nil : key_mappingH[x]) : (options[:remove_unmapped_keys] ? nil : x)}
@@ -544,8 +548,8 @@ module SmarterCSV
       end
       # deprecate required_headers
-      if !options[:required_headers].nil?
-        puts "DEPRECATION WARNING: please use 'required_keys' instead of 'required headers'"
+      unless options[:required_headers].nil?
+        puts "DEPRECATION WARNING: please use 'required_keys' instead of 'required_headers'"
         if options[:required_keys].nil?
           options[:required_keys] = options[:required_headers]
           options[:required_headers] = nil
@@ -557,7 +561,7 @@ module SmarterCSV
         options[:required_keys].each do |k|
           missing_keys << k unless headerA.include?(k)
         end
-        raise SmarterCSV::MissingHeaders, "ERROR: missing attributes: #{missing_keys.join(',')}" unless missing_keys.empty?
+        raise SmarterCSV::MissingKeys, "ERROR: missing attributes: #{missing_keys.join(',')}" unless missing_keys.empty?
       end
       @headers = headerA
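Version 1.9.0 removes `SmarterCSV::MissingHeaders` outright rather than aliasing it. As a result, application code that still contains `rescue SmarterCSV::MissingHeaders` will raise `NameError` as soon as an exception reaches that `rescue` clause.

Below is a sketch of resolving whichever class the loaded version defines. `missing_keys_error_class` is a hypothetical helper, and the two stub modules only stand in for SmarterCSV 1.8.5 and 1.9.0.

```ruby
# Return the error class used for missing required keys, preferring the
# 1.9.0 name and falling back to the pre-1.9.0 name.
def missing_keys_error_class(mod)
  if mod.const_defined?(:MissingKeys, false)
    mod.const_get(:MissingKeys, false)
  else
    mod.const_get(:MissingHeaders, false)
  end
end

# Stand-ins for the two gem versions (not the real gem).
module Csv185
  class MissingHeaders < StandardError; end
end

module Csv190
  class MissingKeys < StandardError; end
end

# With the real gem, a rescue clause might read:
#   rescue missing_keys_error_class(SmarterCSV) => e
```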
@@ -611,6 +615,7 @@ module SmarterCSV
     def option_valid?(str)
       return true if str.is_a?(Symbol) && str == :auto
       return true if str.is_a?(String) && !str.empty?
       false
     end
   end

data/smarter_csv.gemspec CHANGED

@@ -1,5 +1,7 @@
-# -*- encoding: utf-8 -*-
-require File.expand_path('../lib/smarter_csv/version', __FILE__)
+# coding: utf-8
+# frozen_string_literal: true
+require File.expand_path('lib/smarter_csv/version', __dir__)
 Gem::Specification.new do |spec|
   spec.name          = "smarter_csv"
@@ -7,8 +9,8 @@ Gem::Specification.new do |spec|
   spec.authors       = ["Tilo Sloboda"]
   spec.email         = ["tilo.sloboda@gmail.com"]
-  spec.summary       = %q{Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots of optional features, e.g. chunked processing for huge CSV files}
-  spec.description   = %q{Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with optional features for processing large files in parallel, embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers to Hash-keys}
+  spec.summary       = "Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots of optional features, e.g. chunked processing for huge CSV files"
+  spec.description   = "Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with optional features for processing large files in parallel, embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers to Hash-keys"
   spec.homepage      = "https://github.com/tilo/smarter_csv"
   spec.license       = 'MIT'
@@ -16,6 +18,8 @@ Gem::Specification.new do |spec|
   spec.metadata["source_code_uri"] = spec.homepage
   spec.metadata["changelog_uri"] = "https://github.com/tilo/smarter_csv/blob/main/CHANGELOG.md"
+  spec.required_ruby_version = ">= 2.5.0"
   # Specify which files should be added to the gem when it is released.
   # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
   spec.files = Dir.chdir(__dir__) do
@@ -30,7 +34,6 @@ Gem::Specification.new do |spec|
   spec.require_paths = ["lib"] # add ext here?
   spec.extensions = ["ext/smarter_csv/extconf.rb"]
   spec.add_development_dependency "awesome_print"
   spec.add_development_dependency "codecov"
   spec.add_development_dependency "pry"
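The new `spec.required_ruby_version = ">= 2.5.0"` line (mirrored in `metadata` below) means RubyGems will refuse to install 1.9.0 on older interpreters. The same check can be reproduced with RubyGems' `Gem::Requirement` and `Gem::Version` classes, for example in a CI script. `supported_ruby?` is a hypothetical helper.

```ruby
require 'rubygems'

# True when the given Ruby version satisfies the gemspec requirement.
def supported_ruby?(ruby_version = RUBY_VERSION, requirement = '>= 2.5.0')
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(ruby_version))
end

puts "Ruby #{RUBY_VERSION} supported by smarter_csv 1.9.0: #{supported_ruby?}"
```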

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: smarter_csv
 version: !ruby/object:Gem::Version
-  version: 1.8.5
+  version: 1.9.0
 platform: ruby
 authors:
 - Tilo Sloboda
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2023-06-26 00:00:00.000000000 Z
+date: 2023-09-05 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: awesome_print
@@ -134,7 +134,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: '0'
+      version: 2.5.0
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="