smarter_csv 1.8.5 → 1.9.2.pre01

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 8a812edc2e7a7778b0722a120acf8432eb844895b09668c3adf37184cfa08408
-   data.tar.gz: e78756cef3558b32cfa2788fdbfdd8723cc7a06872c97916fc2c95e16cf363d7
+   metadata.gz: 75d9d441d771c2fbe0861e0bc5f84dbf05d010b4844a867fc4679df002822d07
+   data.tar.gz: 0d5d37d4f2654fd354a2adac23019b2955540c1356c57f72052c01220598ffa2
  SHA512:
-   metadata.gz: e81dfd9e713a301f58c64311a02e5047aa3678652e7c410fd8bfaaa5f49459faa3766f5e152d27b080d8677fb3cb452b45ae76086f9feba5caf5f1cd57220260
-   data.tar.gz: c2a6206128b0860138ca739a70b9546fe6ddbb37b4d09069607abc17a08fd51a2f2070d9ab2a6f55955a5f6a98f9def1dad7ce5ac03d0d6aabe67c9593be792f
+   metadata.gz: b24e2b09ea919994da347eb52b781868a19e0f28dc367bfeb43b8a254619ab8dd882d3035f0546683c2ebc893fd600ce05a3abb800e4b124c7369d314607ee3f
+   data.tar.gz: 3a1115ac4937c2fedf469d1f45e3aa1cf7ed03f1f55d66f6cb310c767a3b5e8cb4966a19ba968f065c5d1de2d7074f479b5eeb0686dbab8012e5e6b8ed0f2628
data/.rubocop.yml CHANGED
@@ -22,6 +22,9 @@ Metrics/BlockLength:
  Metrics/BlockNesting:
    Enabled: false
 
+ Metrics/ClassLength:
+   Enabled: false
+
  Metrics/CyclomaticComplexity: # BS rule
    Enabled: false
 
@@ -46,6 +49,9 @@ Naming/VariableNumber:
  Style/ClassEqualityComparison:
    Enabled: false
 
+ Style/ClassMethods:
+   Enabled: false
+
  Style/ConditionalAssignment:
    Enabled: false
 
@@ -114,6 +120,9 @@ Style/StringLiteralsInInterpolation:
    Enabled: false
    EnforcedStyle: double_quotes
 
+ Style/SymbolArray:
+   Enabled: false
+
  Style/SymbolProc: # old Ruby versions can't do this
    Enabled: false
 
@@ -123,6 +132,9 @@ Style/TrailingCommaInHashLiteral:
  Style/TrailingUnderscoreVariable:
    Enabled: false
 
+ Style/TrivialAccessors:
+   Enabled: false
+
  # Style/UnlessModifier:
  # Enabled: false
 
@@ -130,4 +142,4 @@ Style/ZeroLengthPredicate:
    Enabled: false
 
  Layout/LineLength:
-   Max: 240
+   Max: 256
data/CHANGELOG.md CHANGED
@@ -1,6 +1,40 @@
 
  # SmarterCSV 1.x Change Log
 
+ ## 1.9.2.pre01 (2023-11-11)
+ * fixed bug with '\\' at end of line (issue #252)
+ * fixed require statements
+
+ ## 1.9.1 (2023-10-30) (YANKED)
+ * yanked
+ * no functional changes
+ * refactored directory structure
+ * re-added JRuby and TruffleRuby to CI tests
+ * no C-accelleration for JRuby
+ * refactored options parsing
+ * code coverage / rubocop
+
+ ## 1.9.0 (2023-09-04)
+ * fixed issue #139
+
+ * Error `SmarterCSV::MissingHeaders` was renamed to `SmarterCSV::MissingKeys`
+
+ * CHANGED BEHAVIOR:
+ When `key_mapping` option is used. (issue #139)
+ Previous versions just printed an error message when a CSV header was missing during key mapping.
+ Versions >= 1.9 will throw `SmarterCSV::MissingHeaders` listing all headers that were missing during mapping.
+
+ * Notable details for `key_mapping` and `required_headers`:
+
+ * `key_mapping` is applied to the headers early on during `SmarterCSV.process`, and raises an error if a header in the input CSV file is missing, and we can not map that header to its desired name.
+
+ Mapping errors can be surpressed by using:
+ * `silence_missing_keys` set to `true`, which silence all such errors, making all headers for mapping optional.
+ * `silence_missing_keys` given an Array with the specific header keys that are optional
+ The use case is that some header fields are optional, but we still want them renamed if they are present.
+
+ * `required_headers` checks which headers are present **after** `key_mapping` was applied.
+
  ## 1.8.5 (2023-06-25)
  * fix parsing of escaped quote characters (thanks to JP Camara)
 
data/Gemfile CHANGED
@@ -5,10 +5,13 @@ source 'https://rubygems.org'
  # Specify your gem's dependencies in smarter_csv.gemspec
  gemspec
 
- gem "rake" # , "< 11"
+ gem "rake"
  gem "rake-compiler"
 
  gem 'pry'
-
- gem "rspec"
  gem "rubocop"
+
+ group :test do
+   gem "rspec"
+   gem "simplecov"
+ end
data/README.md CHANGED
@@ -46,7 +46,11 @@ One `smarter_csv` user wrote:
  * able to ignore "columns" in the input (delete columns)
  * able to eliminate nil or empty fields from the result hashes (default)
 
- NOTE; This Gem is only for importing CSV files - writing of CSV files is not supported at this time.
+ #### Assumptions / Limitations
+ * It is assumed that the escape character is `\`, as on UNIX and Windows systems.
+ * It is assumed that quote charcters around fields are balanced, e.g. valid: `"field"`, invalid: `"field\"`
+ e.g. an escaped `quote_char` does not denote the end of a field.
+ * This Gem is only for importing CSV files - writing of CSV files is not supported at this time.
 
  ### Why?
 
@@ -161,7 +165,22 @@ and how the `process` method returns the number of chunks when called with a blo
  => returns number of chunks / rows we processed
  ```
 
- #### Example 4: Reading a CSV-like File, and Processing it with Sidekiq:
+ #### Example 4: Processing a CSV File, and inserting batch jobs in Sidekiq:
+ ```ruby
+ filename = '/tmp/input.csv' # CSV file containing ids or data to process
+ options = { :chunk_size => 100 }
+ n = SmarterCSV.process(filename, options) do |chunk|
+   Sidekiq::Client.push_bulk(
+     'class' => SidekiqIndividualWorkerClass,
+     'args' => chunk,
+   )
+   # OR:
+   # SidekiqBatchWorkerClass.process_async(chunk ) # pass an array of hashes to Sidekiq workers for parallel processing
+ end
+ => returns number of chunks
+ ```
+
+ #### Example 4b: Reading a CSV-like File, and Processing it with Sidekiq:
  ```ruby
  filename = '/tmp/strange_db_dump' # a file with CRTL-A as col_separator, and with CTRL-B\n as record_separator (hello iTunes!)
  options = {
@@ -173,7 +192,6 @@ and how the `process` method returns the number of chunks when called with a blo
  end
  => returns number of chunks
  ```
-
  #### Example 5: Populate a MongoDB Database in Chunks of 100 records with SmarterCSV:
  ```ruby
  # using chunks:
@@ -282,7 +300,9 @@ And header and data validations will also be supported in 2.x
  | Option | Default | Explanation |
  ---------------------------------------------------------------------------------------------------------------------------------
  | :key_mapping | nil | a hash which maps headers from the CSV file to keys in the result hash |
- | :silence_missing_key | false | ignore missing keys in `key_mapping` if true |
+ | :silence_missing_key | false | ignore missing keys in `key_mapping` |
+ | | | if set to true: makes all mapped keys optional |
+ | | | if given an array, makes only the keys listed in it optional |
  | :required_keys | nil | An array. Specify the required names AFTER header transformation. |
  | :required_headers | nil | (DEPRECATED / renamed) Use `required_keys` instead |
  | | | or an exception is raised No validation if nil is given. |
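The `key_mapping` / `silence_missing_keys` behavior described in the CHANGELOG and in the option table above can be illustrated with a minimal, self-contained Ruby sketch. It does not call the gem: `map_headers_sketch` and `MissingKeysSketchError` are hypothetical names invented for this illustration, and the gem's actual implementation may differ.

```ruby
# Hypothetical stand-in for the gem's error class (illustration only).
class MissingKeysSketchError < StandardError; end

# Rename headers according to key_mapping. Mapped headers that are absent
# from the file raise an error unless they were declared optional via
# silence_missing_keys (true = all optional, Array = only those listed).
def map_headers_sketch(headers, key_mapping, silence_missing_keys = false)
  missing = key_mapping.keys - headers
  optional =
    case silence_missing_keys
    when true  then missing               # every mapped header is optional
    when Array then silence_missing_keys  # only the listed headers are optional
    else []
    end
  still_missing = missing - optional
  unless still_missing.empty?
    raise MissingKeysSketchError, "missing: #{still_missing.join(', ')}"
  end

  # Headers without a mapping keep their original name.
  headers.map { |header| key_mapping.fetch(header, header) }
end

mapping = { first: :first_name, nick: :nickname }
p map_headers_sketch(%i[first last], mapping, [:nick])
```

With `[:nick]` passed as the third argument the missing `nick` header is tolerated and `first` is still renamed; without it, the call raises and names `nick` in the message.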
data/Rakefile CHANGED
@@ -3,16 +3,15 @@
  require "bundler/gem_tasks"
  require 'rspec/core/rake_task'
 
-
- # temp fix for NoMethodError: undefined method `last_comment'
- # remove when fixed in Rake 11.x and higher
- module TempFixForRakeLastComment
-   def last_comment
-     last_description
-   end
- end
- Rake::Application.send :include, TempFixForRakeLastComment
- ### end of tempfix
+ # # temp fix for NoMethodError: undefined method `last_comment'
+ # # remove when fixed in Rake 11.x and higher
+ # module TempFixForRakeLastComment
+ # def last_comment
+ # last_description
+ # end
+ # end
+ # Rake::Application.send :include, TempFixForRakeLastComment
+ # ### end of tempfix
 
  RSpec::Core::RakeTask.new(:spec)
 
@@ -22,11 +21,26 @@ RuboCop::RakeTask.new
 
  require "rake/extensiontask"
 
- task build: :compile
+ if RUBY_ENGINE == 'jruby'
+
+   task default: %i[spec]
 
- Rake::ExtensionTask.new("smarter_csv") do |ext|
-   ext.ext_dir = "ext/smarter_csv"
+ else
+   task build: :compile
+
+   Rake::ExtensionTask.new("smarter_csv") do |ext|
+     ext.lib_dir = "lib/smarter_csv"
+     ext.ext_dir = "ext/smarter_csv"
+     ext.source_pattern = "*.{c,h}"
+   end
+
+   # task default: %i[clobber compile spec rubocop]
+   task default: %i[clobber compile spec]
  end
 
- # task default: %i[clobber compile spec rubocop]
- task default: %i[clobber compile spec]
+ desc 'Run spec with coverage'
+ task :coverage do
+   ENV['COVERAGE'] = 'true'
+   Rake::Task['spec'].execute
+   `open coverage/index.html`
+ end
@@ -1,10 +1,10 @@
  # frozen_string_literal: true
 
  require 'mkmf'
-
  require "rbconfig"
+
  if RbConfig::MAKEFILE_CONFIG["CFLAGS"].include?("-g -O3")
-   fixed_CFLAGS = RbConfig::MAKEFILE_CONFIG["CFLAGS"].sub("-g -O3", "$(cflags)")
+   fixed_CFLAGS = RbConfig::MAKEFILE_CONFIG["CFLAGS"].sub("-g -O3", "-O3 $(cflags)")
    puts("Fix CFLAGS: #{RbConfig::MAKEFILE_CONFIG["CFLAGS"]} -> #{fixed_CFLAGS}")
    RbConfig::MAKEFILE_CONFIG["CFLAGS"] = fixed_CFLAGS
  end
@@ -40,6 +40,7 @@ static VALUE rb_parse_csv_line(VALUE self, VALUE line, VALUE col_sep, VALUE quot
    long i;
 
    char prev_char = '\0'; // Store the previous character for comparison against an escape character
+   long backslash_count = 0; // to count consecutive backslash characters
 
    while (p < endP) {
      /* does the remaining string start with col_sep ? */
@@ -61,8 +62,13 @@ static VALUE rb_parse_csv_line(VALUE self, VALUE line, VALUE col_sep, VALUE quot
          startP = p;
        }
      } else {
-       if (*p == *quoteP && prev_char != '\\') {
-         quote_count += 1;
+       if (*p == '\\') {
+         backslash_count++;
+       } else {
+         if (*p == *quoteP && (backslash_count % 2 == 0)) {
+           quote_count++;
+         }
+         backslash_count = 0; // no more consecutive backslash characters
        }
        p++;
      }
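The hunk above replaces the single `prev_char` check with a running count of consecutive backslashes: a quote character only counts when an even number of backslashes precedes it. A rough Ruby sketch of that parity rule follows. `count_unescaped_quotes` is a hypothetical helper written for this note, and it deliberately ignores the column-separator handling that the C function also performs.

```ruby
# Count quote characters that are not escaped, i.e. that are preceded by an
# even number (0, 2, 4, ...) of consecutive backslashes.
def count_unescaped_quotes(line, quote_char = '"')
  quote_count = 0
  backslash_count = 0
  line.each_char do |ch|
    if ch == '\\'
      backslash_count += 1
    else
      quote_count += 1 if ch == quote_char && backslash_count.even?
      backslash_count = 0 # the run of consecutive backslashes has ended
    end
  end
  quote_count
end

p count_unescaped_quotes('"a\\"')   # field ends in backslash + quote
p count_unescaped_quotes('"a\\\\"') # field ends in two backslashes + quote
```

The second call is the case the old `prev_char != '\\'` test got wrong: the closing quote follows an escaped backslash, so it is a real closing quote and must be counted.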
@@ -0,0 +1,84 @@
+ # frozen_string_literal: true
+
+ module SmarterCSV
+   DEFAULT_OPTIONS = {
+     acceleration: true,
+     auto_row_sep_chars: 500,
+     chunk_size: nil,
+     col_sep: :auto, # was: ',',
+     comment_regexp: nil, # was: /\A#/,
+     convert_values_to_numeric: true,
+     downcase_header: true,
+     duplicate_header_suffix: nil,
+     file_encoding: 'utf-8',
+     force_simple_split: false,
+     force_utf8: false,
+     headers_in_file: true,
+     invalid_byte_sequence: '',
+     keep_original_headers: false,
+     key_mapping: nil,
+     quote_char: '"',
+     remove_empty_hashes: true,
+     remove_empty_values: true,
+     remove_unmapped_keys: false,
+     remove_values_matching: nil,
+     remove_zero_values: false,
+     required_headers: nil,
+     required_keys: nil,
+     row_sep: :auto, # was: $/,
+     silence_missing_keys: false,
+     skip_lines: nil,
+     strings_as_keys: false,
+     strip_chars_from_headers: nil,
+     strip_whitespace: true,
+     user_provided_headers: nil,
+     value_converters: nil,
+     verbose: false,
+     with_line_numbers: false,
+   }.freeze
+
+   class << self
+     # NOTE: this is not called when "parse" methods are tested by themselves
+     def process_options(given_options = {})
+       puts "User provided options:\n#{pp(given_options)}\n" if given_options[:verbose]
+
+       # fix invalid input
+       given_options[:invalid_byte_sequence] = '' if given_options[:invalid_byte_sequence].nil?
+
+       @options = DEFAULT_OPTIONS.dup.merge!(given_options)
+       puts "Computed options:\n#{pp(@options)}\n" if given_options[:verbose]
+
+       validate_options!(@options)
+       @options
+     end
+
+     # NOTE: this is not called when "parse" methods are tested by themselves
+     #
+     # ONLY FOR BACKWARDS-COMPATIBILITY
+     def default_options
+       DEFAULT_OPTIONS
+     end
+
+     private
+
+     def validate_options!(options)
+       keys = options.keys
+       errors = []
+       errors << "invalid row_sep" if keys.include?(:row_sep) && !option_valid?(options[:row_sep])
+       errors << "invalid col_sep" if keys.include?(:col_sep) && !option_valid?(options[:col_sep])
+       errors << "invalid quote_char" if keys.include?(:quote_char) && !option_valid?(options[:quote_char])
+       raise SmarterCSV::ValidationError, errors.inspect if errors.any?
+     end
+
+     def option_valid?(str)
+       return true if str.is_a?(Symbol) && str == :auto
+       return true if str.is_a?(String) && !str.empty?
+
+       false
+     end
+
+     def pp(value)
+       defined?(AwesomePrint) ? value.awesome_inspect(index: nil) : value.inspect
+     end
+   end
+ end
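One consequence of the merge-then-validate flow in the new file is that an explicitly passed `nil` overrides the `:auto` default and is then rejected by validation. The standalone sketch below shows that interaction under simplifying assumptions: `DEFAULTS_SKETCH`, `option_valid_sketch?` and `process_options_sketch` are hypothetical names, only four options are modeled, `ArgumentError` stands in for `SmarterCSV::ValidationError`, and a non-mutating `Hash#merge` is used in place of `dup.merge!`.

```ruby
# Trimmed-down, hypothetical copy of the defaults (illustration only).
DEFAULTS_SKETCH = { col_sep: :auto, row_sep: :auto, quote_char: '"', chunk_size: nil }.freeze

# A separator or quote option is acceptable if it is :auto or a non-empty String.
def option_valid_sketch?(value)
  value == :auto || (value.is_a?(String) && !value.empty?)
end

# Merge user options over the defaults, then validate the merged result.
def process_options_sketch(given = {})
  options = DEFAULTS_SKETCH.merge(given)
  errors = %i[row_sep col_sep quote_char]
           .reject { |key| option_valid_sketch?(options[key]) }
           .map { |key| "invalid #{key}" }
  raise ArgumentError, errors.inspect if errors.any?

  options
end

p process_options_sketch(col_sep: ';')

begin
  process_options_sketch(col_sep: nil) # nil replaces the :auto default
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Omitting `col_sep` entirely keeps the `:auto` default; passing `nil` does not fall back to it.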