smarter_csv 1.8.5 → 1.9.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 8a812edc2e7a7778b0722a120acf8432eb844895b09668c3adf37184cfa08408
- data.tar.gz: e78756cef3558b32cfa2788fdbfdd8723cc7a06872c97916fc2c95e16cf363d7
+ metadata.gz: 20db4a75108d2b7934b90e6d0e3fef3053e61ad077b4ec4966a97a6e5cb3aa42
+ data.tar.gz: 44d1c3b995b3f7d53d46768437517379dfe3b93fa1db1384054ac16baadfb8d6
  SHA512:
- metadata.gz: e81dfd9e713a301f58c64311a02e5047aa3678652e7c410fd8bfaaa5f49459faa3766f5e152d27b080d8677fb3cb452b45ae76086f9feba5caf5f1cd57220260
- data.tar.gz: c2a6206128b0860138ca739a70b9546fe6ddbb37b4d09069607abc17a08fd51a2f2070d9ab2a6f55955a5f6a98f9def1dad7ce5ac03d0d6aabe67c9593be792f
+ metadata.gz: 74a2edab893bd9e1b798b03321aea55b566accb92a47d939bdddb616af57fc8525b60b50fb94d1f67face95c83d0a00300f587355dd4934130f5bac9879d5dcd
+ data.tar.gz: 1cb78471b4021dafed4fc1bdd42c2acae934eb1a89f05e9700bfb99f40543c770db58b9b42a70272de9797bdf43fe5d53fa8c352964c1574d658ab2f881a06d6
data/.rubocop.yml CHANGED
@@ -22,6 +22,9 @@ Metrics/BlockLength:
  Metrics/BlockNesting:
  Enabled: false
  
+ Metrics/ClassLength:
+ Enabled: false
+
  Metrics/CyclomaticComplexity: # BS rule
  Enabled: false
  
@@ -46,6 +49,9 @@ Naming/VariableNumber:
  Style/ClassEqualityComparison:
  Enabled: false
  
+ Style/ClassMethods:
+ Enabled: false
+
  Style/ConditionalAssignment:
  Enabled: false
  
@@ -114,6 +120,9 @@ Style/StringLiteralsInInterpolation:
  Enabled: false
  EnforcedStyle: double_quotes
  
+ Style/SymbolArray:
+ Enabled: false
+
  Style/SymbolProc: # old Ruby versions can't do this
  Enabled: false
  
@@ -123,6 +132,9 @@ Style/TrailingCommaInHashLiteral:
  Style/TrailingUnderscoreVariable:
  Enabled: false
  
+ Style/TrivialAccessors:
+ Enabled: false
+
  # Style/UnlessModifier:
  # Enabled: false
  
@@ -130,4 +142,4 @@ Style/ZeroLengthPredicate:
  Enabled: false
  
  Layout/LineLength:
- Max: 240
+ Max: 256
data/CHANGELOG.md CHANGED
@@ -1,6 +1,27 @@
  
  # SmarterCSV 1.x Change Log
  
+ ## 1.9.0 (2023-09-04)
+ * fixed issue #139
+
+ * Error `SmarterCSV::MissingHeaders` was renamed to `SmarterCSV::MissingKeys`
+
+ * CHANGED BEHAVIOR:
+ When `key_mapping` option is used. (issue #139)
+ Previous versions just printed an error message when a CSV header was missing during key mapping.
+ Versions >= 1.9 will throw `SmarterCSV::MissingHeaders` listing all headers that were missing during mapping.
+
+ * Notable details for `key_mapping` and `required_headers`:
+
+ * `key_mapping` is applied to the headers early on during `SmarterCSV.process`, and raises an error if a header in the input CSV file is missing, and we can not map that header to its desired name.
+
+ Mapping errors can be surpressed by using:
+ * `silence_missing_keys` set to `true`, which silence all such errors, making all headers for mapping optional.
+ * `silence_missing_keys` given an Array with the specific header keys that are optional
+ The use case is that some header fields are optional, but we still want them renamed if they are present.
+
+ * `required_headers` checks which headers are present **after** `key_mapping` was applied.
+
  ## 1.8.5 (2023-06-25)
  * fix parsing of escaped quote characters (thanks to JP Camara)
  
data/README.md CHANGED
@@ -161,7 +161,22 @@ and how the `process` method returns the number of chunks when called with a blo
  => returns number of chunks / rows we processed
  ```
  
- #### Example 4: Reading a CSV-like File, and Processing it with Sidekiq:
+ #### Example 4: Processing a CSV File, and inserting batch jobs in Sidekiq:
+ ```ruby
+ filename = '/tmp/input.csv' # CSV file containing ids or data to process
+ options = { :chunk_size => 100 }
+ n = SmarterCSV.process(filename, options) do |chunk|
+ Sidekiq::Client.push_bulk(
+ 'class' => SidekiqIndividualWorkerClass,
+ 'args' => chunk,
+ )
+ # OR:
+ # SidekiqBatchWorkerClass.process_async(chunk ) # pass an array of hashes to Sidekiq workers for parallel processing
+ end
+ => returns number of chunks
+ ```
+
+ #### Example 4b: Reading a CSV-like File, and Processing it with Sidekiq:
  ```ruby
  filename = '/tmp/strange_db_dump' # a file with CRTL-A as col_separator, and with CTRL-B\n as record_separator (hello iTunes!)
  options = {
@@ -173,7 +188,6 @@ and how the `process` method returns the number of chunks when called with a blo
  end
  => returns number of chunks
  ```
-
  #### Example 5: Populate a MongoDB Database in Chunks of 100 records with SmarterCSV:
  ```ruby
  # using chunks:
@@ -282,7 +296,9 @@ And header and data validations will also be supported in 2.x
  | Option | Default | Explanation |
  ---------------------------------------------------------------------------------------------------------------------------------
  | :key_mapping | nil | a hash which maps headers from the CSV file to keys in the result hash |
- | :silence_missing_key | false | ignore missing keys in `key_mapping` if true |
+ | :silence_missing_key | false | ignore missing keys in `key_mapping` |
+ | | | if set to true: makes all mapped keys optional |
+ | | | if given an array, makes only the keys listed in it optional |
  | :required_keys | nil | An array. Specify the required names AFTER header transformation. |
  | :required_headers | nil | (DEPRECATED / renamed) Use `required_keys` instead |
  | | | or an exception is raised No validation if nil is given. |
data/Rakefile CHANGED
@@ -3,16 +3,15 @@
  require "bundler/gem_tasks"
  require 'rspec/core/rake_task'
  
-
- # temp fix for NoMethodError: undefined method `last_comment'
- # remove when fixed in Rake 11.x and higher
- module TempFixForRakeLastComment
- def last_comment
- last_description
- end
- end
- Rake::Application.send :include, TempFixForRakeLastComment
- ### end of tempfix
+ # # temp fix for NoMethodError: undefined method `last_comment'
+ # # remove when fixed in Rake 11.x and higher
+ # module TempFixForRakeLastComment
+ # def last_comment
+ # last_description
+ # end
+ # end
+ # Rake::Application.send :include, TempFixForRakeLastComment
+ # ### end of tempfix
  
  RSpec::Core::RakeTask.new(:spec)
  
data/lib/smarter_csv/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
  
  module SmarterCSV
- VERSION = "1.8.5"
+ VERSION = "1.9.0"
  end
data/lib/smarter_csv.rb CHANGED
@@ -12,12 +12,12 @@ module SmarterCSV
  class IncorrectOption < SmarterCSVException; end
  class ValidationError < SmarterCSVException; end
  class DuplicateHeaders < SmarterCSVException; end
- class MissingHeaders < SmarterCSVException; end
+ class MissingKeys < SmarterCSVException; end # previously known as MissingHeaders
  class NoColSepDetected < SmarterCSVException; end
- class KeyMappingError < SmarterCSVException; end # CURRENTLY UNUSED -> version 1.9.0
+ class KeyMappingError < SmarterCSVException; end
  
  # first parameter: filename or input object which responds to readline method
- def SmarterCSV.process(input, options = {}, &block)
+ def SmarterCSV.process(input, options = {}, &block) # rubocop:disable Lint/UnusedMethodArgument
  options = default_options.merge(options)
  options[:invalid_byte_sequence] = '' if options[:invalid_byte_sequence].nil?
  puts "SmarterCSV OPTIONS: #{options.inspect}" if options[:verbose]
@@ -99,7 +99,7 @@ module SmarterCSV
  hash.delete_if{|_k, v| has_rails ? v.blank? : blank?(v)}
  end
  
- hash.delete_if{|_k, v| !v.nil? && v =~ /^(\d+|\d+\.\d+)$/ && v.to_f == 0} if options[:remove_zero_values] # values are typically Strings!
+ hash.delete_if{|_k, v| !v.nil? && v =~ /^(0+|0+\.0+)$/} if options[:remove_zero_values] # values are Strings
  hash.delete_if{|_k, v| v =~ options[:remove_values_matching]} if options[:remove_values_matching]
  
  if options[:convert_values_to_numeric]
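Note on the `remove_zero_values` change in the hunk above: the new pattern drops the `to_f == 0` check and matches only strings made up of zeros. The following is a minimal standalone sketch, assuming ordinary single-line values, that compares the two predicates on a few samples. It does not load smarter_csv; the constants, helper names, and sample list are illustrative and not part of the gem.

```ruby
# Standalone sketch: the two regexes are copied from the diff,
# everything else (names, samples) is hypothetical.
OLD_ZERO = /^(\d+|\d+\.\d+)$/
NEW_ZERO = /^(0+|0+\.0+)$/

# Predicate as used in 1.8.5: numeric-looking string whose float value is zero.
def zero_value_old?(value)
  !value.nil? && OLD_ZERO.match?(value) && value.to_f == 0
end

# Predicate as used in 1.9.0: string consisting only of zeros (optionally "0.0"-style).
def zero_value_new?(value)
  !value.nil? && NEW_ZERO.match?(value)
end

samples = ["0", "000", "0.0", "00.00", "0.50", "10", "1.0", "", "abc", "-0", nil]

samples.each do |value|
  puts format("%-8s old=%-5s new=%-5s", value.inspect, zero_value_old?(value), zero_value_new?(value))
end
```

On these samples the two predicates agree, which suggests the change is a simplification rather than a behavior change for typical values. Multi-line strings are not covered by this sketch.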
@@ -171,15 +171,15 @@ module SmarterCSV
  result << chunk # not sure yet, why anybody would want to do this without a block
  end
  chunk_count += 1
- chunk = [] # initialize for next chunk of data
+ # chunk = [] # initialize for next chunk of data
  end
  ensure
  fh.close if fh.respond_to?(:close)
  end
  if block_given?
- return chunk_count # when we do processing through a block we only care how many chunks we processed
+ chunk_count # when we do processing through a block we only care how many chunks we processed
  else
- return result # returns either an Array of Hashes, or an Array of Arrays of Hashes (if in chunked mode)
+ result # returns either an Array of Hashes, or an Array of Arrays of Hashes (if in chunked mode)
  end
  end
  
@@ -285,11 +285,11 @@ module SmarterCSV
  has_quotes = line =~ /#{options[:quote_char]}/
  elements = parse_csv_line_c(line, options[:col_sep], options[:quote_char], header_size)
  elements.map!{|x| cleanup_quotes(x, options[:quote_char])} if has_quotes
- return [elements, elements.size]
+ [elements, elements.size]
  # :nocov:
  else
  # puts "WARNING: SmarterCSV is using un-accelerated parsing of lines. Check options[:acceleration]"
- return parse_csv_line_ruby(line, options, header_size)
+ parse_csv_line_ruby(line, options, header_size)
  end
  end
  
@@ -402,7 +402,7 @@ module SmarterCSV
  return true unless Array(options[option_name][:only]).include?(key)
  end
  end
- return false
+ false
  end
  
  # If file has headers, then guesses column separator from headers.
@@ -467,8 +467,8 @@ module SmarterCSV
  
  counts["\r"] += 1 if last_char == "\r"
  # find the most frequent key/value pair:
- k, _ = counts.max_by{|_, v| v}
- return k
+ most_frequent_key, _count = counts.max_by{|_, v| v}
+ most_frequent_key
  end
  
  def process_headers(filehandle, options)
@@ -490,6 +490,7 @@ module SmarterCSV
  
  file_headerA.map!{|x| x.gsub(%r/#{options[:quote_char]}/, '')}
  file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
+
  unless options[:keep_original_headers]
  file_headerA.map!{|x| x.gsub(/\s+|-+/, '_')}
  file_headerA.map!{|x| x.downcase} if options[:downcase_header]
@@ -523,10 +524,13 @@ module SmarterCSV
  # do some key mapping on the keys in the file header
  # if you want to completely delete a key, then map it to nil or to ''
  if !key_mappingH.nil? && key_mappingH.class == Hash && key_mappingH.keys.size > 0
- unless options[:silence_missing_keys]
- # if silence_missing_keys are not set, raise error if missing header
- missing_keys = key_mappingH.keys - headerA
- puts "WARNING: missing header(s): #{missing_keys.join(",")}" unless missing_keys.empty?
+ # if silence_missing_keys are not set, raise error if missing header
+ missing_keys = key_mappingH.keys - headerA
+ # if the user passes a list of speciffic mapped keys that are optional
+ missing_keys -= options[:silence_missing_keys] if options[:silence_missing_keys].is_a?(Array)
+
+ unless missing_keys.empty? || options[:silence_missing_keys] == true
+ raise SmarterCSV::KeyMappingError, "ERROR: can not map headers: #{missing_keys.join(', ')}"
  end
  
  headerA.map!{|x| key_mappingH.has_key?(x) ? (key_mappingH[x].nil? ? nil : key_mappingH[x]) : (options[:remove_unmapped_keys] ? nil : x)}
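Note on the key-mapping hunk above: the printed warning is replaced by a raised `SmarterCSV::KeyMappingError`. The CHANGELOG entry in this release names `SmarterCSV::MissingHeaders` for this case, while the code shown here raises `KeyMappingError`. The following standalone sketch mirrors the new check so the three `silence_missing_keys` cases can be tried in isolation. It does not load smarter_csv: the `KeyMappingError` class below is a local stand-in, `check_key_mapping!` is a hypothetical helper, and symbol header keys are an assumption.

```ruby
# Local stand-in for SmarterCSV::KeyMappingError (assumption: the gem is not loaded here).
class KeyMappingError < StandardError; end

# Hypothetical helper mirroring the check in the hunk above.
# Returns the list of mapped keys that were not found in the header;
# raises unless that list is empty or silence_missing_keys is exactly true.
def check_key_mapping!(header, key_mapping, silence_missing_keys = false)
  missing_keys = key_mapping.keys - header
  # an Array lists the specific mapped keys that are optional
  missing_keys -= silence_missing_keys if silence_missing_keys.is_a?(Array)

  unless missing_keys.empty? || silence_missing_keys == true
    raise KeyMappingError, "ERROR: can not map headers: #{missing_keys.join(', ')}"
  end

  missing_keys
end

header  = [:first_name, :email]
mapping = { first_name: :first, email: :contact, phone: :tel }

check_key_mapping!(header, mapping, true)      # all mapped keys optional, no error
check_key_mapping!(header, mapping, [:phone])  # only :phone optional, no error

begin
  check_key_mapping!(header, mapping)          # :phone is missing and not silenced
rescue KeyMappingError => e
  puts e.message
end
```

Code that previously relied on the warning being printed would, under this reading of the diff, need either a `rescue` for the new error or an explicit `silence_missing_keys` setting.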
@@ -544,8 +548,8 @@ module SmarterCSV
  end
  
  # deprecate required_headers
- if !options[:required_headers].nil?
- puts "DEPRECATION WARNING: please use 'required_keys' instead of 'required headers'"
+ unless options[:required_headers].nil?
+ puts "DEPRECATION WARNING: please use 'required_keys' instead of 'required_headers'"
  if options[:required_keys].nil?
  options[:required_keys] = options[:required_headers]
  options[:required_headers] = nil
@@ -557,7 +561,7 @@ module SmarterCSV
  options[:required_keys].each do |k|
  missing_keys << k unless headerA.include?(k)
  end
- raise SmarterCSV::MissingHeaders, "ERROR: missing attributes: #{missing_keys.join(',')}" unless missing_keys.empty?
+ raise SmarterCSV::MissingKeys, "ERROR: missing attributes: #{missing_keys.join(',')}" unless missing_keys.empty?
  end
  
  @headers = headerA
@@ -611,6 +615,7 @@ module SmarterCSV
  def option_valid?(str)
  return true if str.is_a?(Symbol) && str == :auto
  return true if str.is_a?(String) && !str.empty?
+
  false
  end
  end
data/smarter_csv.gemspec CHANGED
@@ -1,5 +1,7 @@
- # -*- encoding: utf-8 -*-
- require File.expand_path('../lib/smarter_csv/version', __FILE__)
+ # coding: utf-8
+ # frozen_string_literal: true
+
+ require File.expand_path('lib/smarter_csv/version', __dir__)
  
  Gem::Specification.new do |spec|
  spec.name = "smarter_csv"
@@ -7,8 +9,8 @@ Gem::Specification.new do |spec|
  spec.authors = ["Tilo Sloboda"]
  spec.email = ["tilo.sloboda@gmail.com"]
  
- spec.summary = %q{Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots of optional features, e.g. chunked processing for huge CSV files}
- spec.description = %q{Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with optional features for processing large files in parallel, embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers to Hash-keys}
+ spec.summary = "Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots of optional features, e.g. chunked processing for huge CSV files"
+ spec.description = "Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with optional features for processing large files in parallel, embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers to Hash-keys"
  spec.homepage = "https://github.com/tilo/smarter_csv"
  spec.license = 'MIT'
  
@@ -16,6 +18,8 @@ Gem::Specification.new do |spec|
  spec.metadata["source_code_uri"] = spec.homepage
  spec.metadata["changelog_uri"] = "https://github.com/tilo/smarter_csv/blob/main/CHANGELOG.md"
  
+ spec.required_ruby_version = ">= 2.5.0"
+
  # Specify which files should be added to the gem when it is released.
  # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
  spec.files = Dir.chdir(__dir__) do
@@ -30,7 +34,6 @@ Gem::Specification.new do |spec|
  spec.require_paths = ["lib"] # add ext here?
  spec.extensions = ["ext/smarter_csv/extconf.rb"]
  
-
  spec.add_development_dependency "awesome_print"
  spec.add_development_dependency "codecov"
  spec.add_development_dependency "pry"
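Note on the `require` change in the gemspec above: `File.expand_path('../lib/...', __FILE__)` and `File.expand_path('lib/...', __dir__)` are two spellings of the same path resolution. The sketch below illustrates this with a fixed, hypothetical gemspec location instead of the real `__FILE__` and `__dir__`, so it can be run from anywhere; the path is an assumption chosen only for the example.

```ruby
# Hypothetical location of the gemspec; stands in for __FILE__.
gemspec_file = '/home/dev/smarter_csv/smarter_csv.gemspec'
# Directory containing the gemspec; stands in for __dir__.
gemspec_dir = File.dirname(gemspec_file)

# 1.8.5 style: start from the file itself, so '..' is needed to step out of the file name.
old_style = File.expand_path('../lib/smarter_csv/version', gemspec_file)
# 1.9.0 style: start from the directory, so no '..' is needed.
new_style = File.expand_path('lib/smarter_csv/version', gemspec_dir)

puts old_style == new_style
```

The `__dir__` method has been available since Ruby 2.0, which is compatible with the `required_ruby_version` of `>= 2.5.0` introduced in this release.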
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: smarter_csv
  version: !ruby/object:Gem::Version
- version: 1.8.5
+ version: 1.9.0
  platform: ruby
  authors:
  - Tilo Sloboda
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2023-06-26 00:00:00.000000000 Z
+ date: 2023-09-05 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: awesome_print
@@ -134,7 +134,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: '0'
+ version: 2.5.0
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="