smarter_csv 1.4.2 → 1.5.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 3be724101d41326ff480bcb723c1b40a3cabd879eb55e0c2f044372f8e5a57d0
4
- data.tar.gz: 657db1421352f449bf042f8df4d5178167af048ad37836e4f2f2f8a6aea3ece0
3
+ metadata.gz: 88b9932c898320fb05d5697e155dc0bd3ade887d2fcfab7b660933e230007364
4
+ data.tar.gz: f0525d9c917aff44f910d4547b8e918faa3beb50d47adc29182df1fc1ec2be19
5
5
  SHA512:
6
- metadata.gz: 3430649df35ac8139d35b04b85e8691ca5fc3d98b7b15f0d3987855f571987bdb742e0ed6f807ddb7a2e61e61d696d529ac311bc58e30188325f1c4bb78098a4
7
- data.tar.gz: 1b386af7cc7c39bc7ea934875e16f6641a2cc0c2bb5dfaa3b1f298739b1b355b2f41570e42998a2d7790a17f96feb07118b69c23d913acc634aae5901f0c9229
6
+ metadata.gz: 330ad44b9808150f6fdf96dec65d259c2d9cf5eb25e22dc80f63095f4014b065b8aa97a2ba9b814c6cea6f4c0361e04567be403ab78b54d0518b49dc072f36ac
7
+ data.tar.gz: 27531bd508b5b455a32947badfb85d7e95489ad282a837ef046864806ba7fa12539148ab2fe4c84174fba3ef085dd3adda5d7c070615d684cd99ed0f90b903a3
data/CHANGELOG.md CHANGED
@@ -1,7 +1,28 @@
1
1
 
2
2
  # SmarterCSV 1.x Change Log
3
3
 
4
- ## 1.4.1 (2022-02-12)
4
+ ## 1.5.2 (2022-04-29)
5
+ * added missing keys to the SmarterCSV::KeyMappingError exception message #189 (thanks to John Dell)
6
+
7
+ ## 1.5.1 (2022-04-27)
8
+ * added raising of `KeyMappingError` if `key_mapping` refers to a non-existent key
9
+ * added option `duplicate_header_suffix` (thanks to Skye Shaw)
10
+ When given a non-nil string, it uses the suffix to append numbering 2..n to duplicate headers.
11
+ If your code will need to process arbitrary CSV files, please set `duplicate_header_suffix`.
12
+
13
+ ## 1.5.0 (2022-04-25)
14
+ * fixed bug with trailing col_sep characters, introduced in 1.4.0
15
+ * Fix deprecation warning in Ruby 3.0.3 / $INPUT_RECORD_SEPARATOR (thanks to Joel Fouse)
16
+
17
+ * changed default for `comment_regexp` to be `nil` for a safer default behavior (thanks to David Lazar)
18
+ **Note**
19
+ This no longer assumes that lines starting with `#` are comments.
20
+ If you want to treat lines starting with '#' as comments, use `comment_regexp: /\A#/`
21
+
22
+ ## 1.4.2 (2022-02-12)
23
+ * fixed issue with simplecov
24
+
25
+ ## 1.4.1 (2022-02-12) (PULLED)
5
26
  * minor fix: also support `col_sep: :auto`
6
27
  * added simplecov
7
28
 
data/CONTRIBUTORS.md CHANGED
@@ -43,3 +43,5 @@ A Big Thank you to everyone who filed issues, sent comments, and who contributed
43
43
  * [Olle Jonsson](https://github.com/olleolleolle)
44
44
  * [Nicolas Guillemain](https://github.com/Viiruus)
45
45
  * [Sp6](https://github.com/sp6)
46
+ * [Joel Fouse](https://github.com/jfouse)
47
+ * [John Dell](https://github.com/spovich)
data/README.md CHANGED
@@ -215,7 +215,7 @@ The options and the block are optional.
215
215
  | :invalid_byte_sequence | '' | what to replace invalid byte sequences with |
216
216
  | :force_utf8 | false | force UTF-8 encoding of all lines (including headers) in the CSV file |
217
217
  | :skip_lines | nil | how many lines to skip before the first line or header line is processed |
218
- | :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
218
+ | :comment_regexp | nil | regular expression to ignore comment lines (see NOTE on CSV header), e.g./\A#/ |
219
219
  ---------------------------------------------------------------------------------------------------------------------------------
220
220
  | :col_sep | ',' | column separator, can be set to :auto |
221
221
  | :force_simple_split | false | force simple splitting on :col_sep character for non-standard CSV-files. |
@@ -228,6 +228,7 @@ The options and the block are optional.
228
228
  | :headers_in_file | true | Whether or not the file contains headers as the first line. |
229
229
  | | | Important if the file does not contain headers, |
230
230
  | | | otherwise you would lose the first line of data. |
231
+ | :duplicate_header_suffix | nil | If set, adds numbers to duplicated headers and separates them by the given suffix |
231
232
  | :user_provided_headers | nil | *careful with that axe!* |
232
233
  | | | user provided Array of header strings or symbols, to define |
233
234
  | | | what headers should be used, overriding any in-file headers. |
@@ -282,14 +283,23 @@ And header and data validations will also be supported in 2.x
282
283
  data = SmarterCSV.process(f)
283
284
  end
284
285
  ```
286
+
285
287
  #### NOTES about CSV Headers:
286
288
  * as this method parses CSV files, it is assumed that the first line of any file will contain a valid header
287
- * the first line with the CSV header may or may not be commented out according to the :comment_regexp
289
+ * the first line with the header might be commented out, in which case you will need to set `comment_regexp: /\A#/`
290
+ This is no longer handled automatically since 1.5.0.
288
291
  * any occurences of :comment_regexp or :row_sep will be stripped from the first line with the CSV header
289
292
  * any of the keys in the header line will be downcased, spaces replaced by underscore, and converted to Ruby symbols before being used as keys in the returned Hashes
290
293
  * you can not combine the :user_provided_headers and :key_mapping options
291
294
  * if the incorrect number of headers are provided via :user_provided_headers, exception SmarterCSV::HeaderSizeMismatch is raised
292
295
 
296
+ #### NOTES on Duplicate Headers:
297
+ As a corner case, it is possible that a CSV file contains multiple headers with the same name.
298
+ * If that happens, by default `smarter_csv` will raise a `DuplicateHeaders` error.
299
+ * If you set `duplicate_header_suffix` to a non-nil string, it will use it to append numbers 2..n to the duplicate headers. To further disambiguate the headers, you can further use `key_mapping` to assign meaningful names.
300
+ * If your code will need to process arbitrary CSV files, please set `duplicate_header_suffix`.
301
+ * Another way to deal with duplicate headers is to use `user_provided_headers` to ignore any headers in the file.
302
+
293
303
  #### NOTES on Key Mapping:
294
304
  * keys in the header line of the file can be re-mapped to a chosen set of symbols, so the resulting Hashes can be better used internally in your application (e.g. when directly creating MongoDB entries with them)
295
305
  * if you want to completely delete a key, then map it to nil or to '', they will be automatically deleted from any result Hash
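The duplicate-header numbering described in the README notes above can be sketched in isolation. This is a minimal standalone sketch, not the gem's API: `disambiguate_headers` is a hypothetical name, and the body follows the `process_duplicate_headers` method added elsewhere in this diff (first occurrence kept as-is, later occurrences get the suffix plus a running number starting at 2).

```ruby
# Standalone sketch; `disambiguate_headers` is a hypothetical name that
# mirrors the logic of `process_duplicate_headers` in this diff.
def disambiguate_headers(headers, suffix)
  counts = Hash.new(0)
  headers.map do |key|
    counts[key] += 1
    # the first occurrence is kept as-is; later ones get suffix + running number
    counts[key] == 1 ? key : [key, suffix, counts[key]].join
  end
end

with_underscore = disambiguate_headers(%w[email firstname lastname email age], '_')
with_empty      = disambiguate_headers(%w[a b c a a], '')

p with_underscore # => ["email", "firstname", "lastname", "email_2", "age"]
p with_empty      # => ["a", "b", "c", "a2", "a3"]
```

The renamed headers (for example `email_2`) are what a subsequent `key_mapping` would have to refer to, which matches the 'can remap duplicated headers' spec in this diff.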
@@ -5,108 +5,41 @@ module SmarterCSV
5
5
  class DuplicateHeaders < SmarterCSVException; end
6
6
  class MissingHeaders < SmarterCSVException; end
7
7
  class NoColSepDetected < SmarterCSVException; end
8
+ class KeyMappingError < SmarterCSVException; end
8
9
 
9
- def SmarterCSV.process(input, options={}, &block) # first parameter: filename or input object with readline method
10
+ # first parameter: filename or input object which responds to readline method
11
+ def SmarterCSV.process(input, options={}, &block)
10
12
  options = default_options.merge(options)
11
13
  options[:invalid_byte_sequence] = '' if options[:invalid_byte_sequence].nil?
12
14
 
13
15
  headerA = []
14
16
  result = []
15
- old_row_sep = $INPUT_RECORD_SEPARATOR
16
- file_line_count = 0
17
- csv_line_count = 0
17
+ @file_line_count = 0
18
+ @csv_line_count = 0
18
19
  has_rails = !! defined?(Rails)
19
20
  begin
20
- f = input.respond_to?(:readline) ? input : File.open(input, "r:#{options[:file_encoding]}")
21
+ fh = input.respond_to?(:readline) ? input : File.open(input, "r:#{options[:file_encoding]}")
21
22
 
22
23
  # auto-detect the row separator
23
- options[:row_sep] = SmarterCSV.guess_line_ending(f, options) if options[:row_sep].to_sym == :auto
24
- $INPUT_RECORD_SEPARATOR = options[:row_sep]
24
+ options[:row_sep] = SmarterCSV.guess_line_ending(fh, options) if options[:row_sep].to_sym == :auto
25
25
  # attempt to auto-detect column separator
26
- options[:col_sep] = guess_column_separator(f) if options[:col_sep].to_sym == :auto
26
+ options[:col_sep] = guess_column_separator(fh, options) if options[:col_sep].to_sym == :auto
27
27
  # preserve options, in case we need to call the CSV class
28
28
  csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
29
29
  csv_options.delete(:row_sep) if [nil, :auto].include?( options[:row_sep].to_sym )
30
30
  csv_options.delete(:col_sep) if [nil, :auto].include?( options[:col_sep].to_sym )
31
31
 
32
- if (options[:force_utf8] || options[:file_encoding] =~ /utf-8/i) && ( f.respond_to?(:external_encoding) && f.external_encoding != Encoding.find('UTF-8') || f.respond_to?(:encoding) && f.encoding != Encoding.find('UTF-8') )
32
+ if (options[:force_utf8] || options[:file_encoding] =~ /utf-8/i) && ( fh.respond_to?(:external_encoding) && fh.external_encoding != Encoding.find('UTF-8') || fh.respond_to?(:encoding) && fh.encoding != Encoding.find('UTF-8') )
33
33
  puts 'WARNING: you are trying to process UTF-8 input, but did not open the input with "b:utf-8" option. See README file "NOTES about File Encodings".'
34
34
  end
35
35
 
36
- options[:skip_lines].to_i.times{f.readline} if options[:skip_lines].to_i > 0
37
-
38
- if options[:headers_in_file] # extract the header line
39
- # process the header line in the CSV file..
40
- # the first line of a CSV file contains the header .. it might be commented out, so we need to read it anyhow
41
- header = f.readline
42
- header = header.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
43
- header = header.sub(options[:comment_regexp],'').chomp(options[:row_sep])
44
-
45
- file_line_count += 1
46
- csv_line_count += 1
47
- header = header.gsub(options[:strip_chars_from_headers], '') if options[:strip_chars_from_headers]
48
-
49
- if (header =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
50
- file_headerA = begin
51
- CSV.parse( header, **csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
52
- rescue CSV::MalformedCSVError => e
53
- raise $!, "#{$!} [SmarterCSV: csv line #{csv_line_count}]", $!.backtrace
54
- end
55
- else
56
- file_headerA = header.split(options[:col_sep])
57
- end
58
- file_header_size = file_headerA.size # before mapping, which could delete keys
59
-
60
- file_headerA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') }
61
- file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
62
- unless options[:keep_original_headers]
63
- file_headerA.map!{|x| x.gsub(/\s+|-+/,'_')}
64
- file_headerA.map!{|x| x.downcase } if options[:downcase_header]
36
+ if options[:skip_lines].to_i > 0
37
+ options[:skip_lines].to_i.times do
38
+ readline_with_counts(fh, options)
65
39
  end
66
- else
67
- raise SmarterCSV::IncorrectOption , "ERROR: If :headers_in_file is set to false, you have to provide :user_provided_headers" if options[:user_provided_headers].nil?
68
- end
69
- if options[:user_provided_headers] && options[:user_provided_headers].class == Array && ! options[:user_provided_headers].empty?
70
- # use user-provided headers
71
- headerA = options[:user_provided_headers]
72
- if defined?(file_header_size) && ! file_header_size.nil?
73
- if headerA.size != file_header_size
74
- raise SmarterCSV::HeaderSizeMismatch , "ERROR: :user_provided_headers defines #{headerA.size} headers != CSV-file #{input} has #{file_header_size} headers"
75
- else
76
- # we could print out the mapping of file_headerA to headerA here
77
- end
78
- end
79
- else
80
- headerA = file_headerA
81
40
  end
82
- header_size = headerA.size # used for splitting lines
83
41
 
84
- headerA.map!{|x| x.to_sym } unless options[:strings_as_keys] || options[:keep_original_headers]
85
-
86
- unless options[:user_provided_headers] # wouldn't make sense to re-map user provided headers
87
- key_mappingH = options[:key_mapping]
88
-
89
- # do some key mapping on the keys in the file header
90
- # if you want to completely delete a key, then map it to nil or to ''
91
- if ! key_mappingH.nil? && key_mappingH.class == Hash && key_mappingH.keys.size > 0
92
- headerA.map!{|x| key_mappingH.has_key?(x) ? (key_mappingH[x].nil? ? nil : key_mappingH[x]) : (options[:remove_unmapped_keys] ? nil : x)}
93
- end
94
- end
95
-
96
- # header_validations
97
- duplicate_headers = []
98
- headerA.compact.each do |k|
99
- duplicate_headers << k if headerA.select{|x| x == k}.size > 1
100
- end
101
- raise SmarterCSV::DuplicateHeaders , "ERROR: duplicate headers: #{duplicate_headers.join(',')}" unless duplicate_headers.empty?
102
-
103
- if options[:required_headers] && options[:required_headers].is_a?(Array)
104
- missing_headers = []
105
- options[:required_headers].each do |k|
106
- missing_headers << k unless headerA.include?(k)
107
- end
108
- raise SmarterCSV::MissingHeaders , "ERROR: missing headers: #{missing_headers.join(',')}" unless missing_headers.empty?
109
- end
42
+ headerA, header_size = process_headers(fh, options, csv_options)
110
43
 
111
44
  # in case we use chunking.. we'll need to set it up..
112
45
  if ! options[:chunk_size].nil? && options[:chunk_size].to_i > 0
@@ -119,42 +52,41 @@ module SmarterCSV
119
52
  end
120
53
 
121
54
  # now on to processing all the rest of the lines in the CSV file:
122
- while ! f.eof? # we can't use f.readlines() here, because this would read the whole file into memory at once, and eof => true
123
- line = f.readline # read one line.. this uses the input_record_separator $INPUT_RECORD_SEPARATOR which we set previously!
55
+ while ! fh.eof? # we can't use fh.readlines() here, because this would read the whole file into memory at once, and eof => true
56
+ line = readline_with_counts(fh, options)
124
57
 
125
58
  # replace invalid byte sequence in UTF-8 with question mark to avoid errors
126
59
  line = line.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
127
60
 
128
- file_line_count += 1
129
- csv_line_count += 1
130
- print "processing file line %10d, csv line %10d\r" % [file_line_count, csv_line_count] if options[:verbose]
131
- next if line =~ options[:comment_regexp] # ignore all comment lines if there are any
61
+ print "processing file line %10d, csv line %10d\r" % [@file_line_count, @csv_line_count] if options[:verbose]
62
+
63
+ next if options[:comment_regexp] && line =~ options[:comment_regexp] # ignore all comment lines if there are any
132
64
 
133
65
  # cater for the quoted csv data containing the row separator carriage return character
134
66
  # in which case the row data will be split across multiple lines (see the sample content in spec/fixtures/carriage_returns_rn.csv)
135
67
  # by detecting the existence of an uneven number of quote characters
136
68
  multiline = line.count(options[:quote_char])%2 == 1 # should handle quote_char nil
137
69
  while line.count(options[:quote_char])%2 == 1 # should handle quote_char nil
138
- next_line = f.readline
70
+ next_line = fh.readline(options[:row_sep])
139
71
  next_line = next_line.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
140
72
  line += next_line
141
- file_line_count += 1
73
+ @file_line_count += 1
142
74
  end
143
- print "\nline contains uneven number of quote chars so including content through file line %d\n" % file_line_count if options[:verbose] && multiline
75
+ print "\nline contains uneven number of quote chars so including content through file line %d\n" % @file_line_count if options[:verbose] && multiline
144
76
 
145
- line.chomp! # will use $INPUT_RECORD_SEPARATOR which is set to options[:col_sep]
77
+ line.chomp!(options[:row_sep])
146
78
 
147
79
  if (line =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
148
80
  dataA = begin
149
81
  CSV.parse( line, **csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
150
82
  rescue CSV::MalformedCSVError => e
151
- raise $!, "#{$!} [SmarterCSV: csv line #{csv_line_count}]", $!.backtrace
83
+ raise $!, "#{$!} [SmarterCSV: csv line #{@csv_line_count}]", $!.backtrace
152
84
  end
153
85
  else
154
- dataA = line.split(options[:col_sep], header_size)
86
+ dataA = line.split(options[:col_sep], header_size)
155
87
  end
156
- #### dataA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') } # this is actually not a good idea as a default
157
- dataA.map!{|x| x.strip} if options[:strip_whitespace]
88
+ dataA.map!{|x| x.sub(/(#{options[:col_sep]})+\z/, '')} # remove any unwanted trailing col_sep characters at the end
89
+ dataA.map!{|x| x.strip} if options[:strip_whitespace]
158
90
 
159
91
  # if all values are blank, then ignore this line
160
92
  # SEE: https://github.com/rails/rails/blob/32015b6f369adc839c4f0955f2d9dce50c0b6123/activesupport/lib/active_support/core_ext/object/blank.rb#L121
@@ -208,7 +140,7 @@ module SmarterCSV
208
140
  if use_chunks
209
141
  chunk << hash # append temp result to chunk
210
142
 
211
- if chunk.size >= chunk_size || f.eof? # if chunk if full, or EOF reached
143
+ if chunk.size >= chunk_size || fh.eof? # if chunk if full, or EOF reached
212
144
  # do something with the chunk
213
145
  if block_given?
214
146
  yield chunk # do something with the hashes in the chunk in the block
@@ -249,8 +181,7 @@ module SmarterCSV
249
181
  chunk = [] # initialize for next chunk of data
250
182
  end
251
183
  ensure
252
- $INPUT_RECORD_SEPARATOR = old_row_sep # make sure this stupid global variable is always reset to it's previous value after we're done!
253
- f.close if f.respond_to?(:close)
184
+ fh.close if fh.respond_to?(:close)
254
185
  end
255
186
  if block_given?
256
187
  return chunk_count # when we do processing through a block we only care how many chunks we processed
@@ -261,14 +192,22 @@ module SmarterCSV
261
192
 
262
193
  private
263
194
 
195
+ def self.readline_with_counts(filehandle, options)
196
+ line = filehandle.readline(options[:row_sep])
197
+ @file_line_count += 1
198
+ @csv_line_count += 1
199
+ line
200
+ end
201
+
264
202
  def self.default_options
265
203
  {
266
204
  auto_row_sep_chars: 500,
267
205
  chunk_size: nil ,
268
206
  col_sep: ',',
269
- comment_regexp: /\A#/,
207
+ comment_regexp: nil, # was: /\A#/,
270
208
  convert_values_to_numeric: true,
271
209
  downcase_header: true,
210
+ duplicate_header_suffix: nil,
272
211
  file_encoding: 'utf-8',
273
212
  force_simple_split: false ,
274
213
  force_utf8: false,
@@ -329,11 +268,11 @@ module SmarterCSV
329
268
  end
330
269
 
331
270
  # raise exception if none is found
332
- def self.guess_column_separator(filehandle)
271
+ def self.guess_column_separator(filehandle, options)
333
272
  del = [',', "\t", ';', ':', '|']
334
273
  n = Hash.new(0)
335
274
  5.times do
336
- line = filehandle.readline
275
+ line = filehandle.readline(options[:row_sep])
337
276
  del.each do |d|
338
277
  n[d] += line.scan(d).count
339
278
  end
@@ -379,4 +318,102 @@ module SmarterCSV
379
318
  k,_ = counts.max_by{|_,v| v}
380
319
  return k # the most frequent one is it
381
320
  end
321
+
322
+ def self.process_headers(filehandle, options, csv_options)
323
+ if options[:headers_in_file] # extract the header line
324
+ # process the header line in the CSV file..
325
+ # the first line of a CSV file contains the header .. it might be commented out, so we need to read it anyhow
326
+ header = readline_with_counts(filehandle, options)
327
+
328
+ header = header.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
329
+ header = header.sub(options[:comment_regexp],'') if options[:comment_regexp]
330
+ header = header.chomp(options[:row_sep])
331
+
332
+ header = header.gsub(options[:strip_chars_from_headers], '') if options[:strip_chars_from_headers]
333
+
334
+ if (header =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
335
+ file_headerA = begin
336
+ CSV.parse( header, **csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
337
+ rescue CSV::MalformedCSVError => e
338
+ raise $!, "#{$!} [SmarterCSV: csv line #{@csv_line_count}]", $!.backtrace
339
+ end
340
+ else
341
+ file_headerA = header.split(options[:col_sep])
342
+ end
343
+ file_header_size = file_headerA.size # before mapping, which could delete keys
344
+
345
+ file_headerA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') }
346
+ file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
347
+ unless options[:keep_original_headers]
348
+ file_headerA.map!{|x| x.gsub(/\s+|-+/,'_')}
349
+ file_headerA.map!{|x| x.downcase } if options[:downcase_header]
350
+ end
351
+ else
352
+ raise SmarterCSV::IncorrectOption , "ERROR: If :headers_in_file is set to false, you have to provide :user_provided_headers" unless options[:user_provided_headers]
353
+ end
354
+ if options[:user_provided_headers] && options[:user_provided_headers].class == Array && ! options[:user_provided_headers].empty?
355
+ # use user-provided headers
356
+ headerA = options[:user_provided_headers]
357
+ if defined?(file_header_size) && ! file_header_size.nil?
358
+ if headerA.size != file_header_size
359
+ raise SmarterCSV::HeaderSizeMismatch , "ERROR: :user_provided_headers defines #{headerA.size} headers != CSV-file #{input} has #{file_header_size} headers"
360
+ else
361
+ # we could print out the mapping of file_headerA to headerA here
362
+ end
363
+ end
364
+ else
365
+ headerA = file_headerA
366
+ end
367
+
368
+ # detect duplicate headers and disambiguate
369
+ headerA = process_duplicate_headers(headerA, options) if options[:duplicate_header_suffix]
370
+ header_size = headerA.size # used for splitting lines
371
+
372
+ headerA.map!{|x| x.to_sym } unless options[:strings_as_keys] || options[:keep_original_headers]
373
+
374
+ unless options[:user_provided_headers] # wouldn't make sense to re-map user provided headers
375
+ key_mappingH = options[:key_mapping]
376
+
377
+ # do some key mapping on the keys in the file header
378
+ # if you want to completely delete a key, then map it to nil or to ''
379
+ if ! key_mappingH.nil? && key_mappingH.class == Hash && key_mappingH.keys.size > 0
380
+ # we can't map keys that are not there
381
+ missing_keys = key_mappingH.keys - headerA
382
+ raise(SmarterCSV::KeyMappingError, "missing header(s): #{missing_keys.join(",")}") unless missing_keys.empty?
383
+
384
+ headerA.map!{|x| key_mappingH.has_key?(x) ? (key_mappingH[x].nil? ? nil : key_mappingH[x]) : (options[:remove_unmapped_keys] ? nil : x)}
385
+ end
386
+ end
387
+
388
+ # header_validations
389
+ duplicate_headers = []
390
+ headerA.compact.each do |k|
391
+ duplicate_headers << k if headerA.select{|x| x == k}.size > 1
392
+ end
393
+ raise SmarterCSV::DuplicateHeaders , "ERROR: duplicate headers: #{duplicate_headers.join(',')}" unless duplicate_headers.empty?
394
+
395
+ if options[:required_headers] && options[:required_headers].is_a?(Array)
396
+ missing_headers = []
397
+ options[:required_headers].each do |k|
398
+ missing_headers << k unless headerA.include?(k)
399
+ end
400
+ raise SmarterCSV::MissingHeaders , "ERROR: missing headers: #{missing_headers.join(',')}" unless missing_headers.empty?
401
+ end
402
+
403
+ [headerA, header_size]
404
+ end
405
+
406
+ def self.process_duplicate_headers(headers, options)
407
+ counts = Hash.new(0)
408
+ result = []
409
+ headers.each do |key|
410
+ counts[key] += 1
411
+ if counts[key] == 1
412
+ result << key
413
+ else
414
+ result << [key, options[:duplicate_header_suffix], counts[key]].join
415
+ end
416
+ end
417
+ result
418
+ end
382
419
  end
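The new key-mapping check in `process_headers` boils down to a set difference between the mapping's keys and the headers found in the file. The following is a minimal sketch under stated assumptions: `KeyMappingSketchError` and `check_key_mapping` are hypothetical names standing in for `SmarterCSV::KeyMappingError` and the inline check, so the example runs without the gem.

```ruby
# Standalone sketch; `KeyMappingSketchError` and `check_key_mapping` are
# hypothetical stand-ins for SmarterCSV::KeyMappingError and the inline
# check in `process_headers`.
class KeyMappingSketchError < StandardError; end

def check_key_mapping(headers, key_mapping)
  # keys named in the mapping that do not exist among the file headers
  missing_keys = key_mapping.keys - headers
  raise KeyMappingSketchError, "missing header(s): #{missing_keys.join(',')}" unless missing_keys.empty?

  headers
end

headers = [:email, :firstname, :lastname, :email, :age]
mapping = { email: :a, firstname: :b, manager_email: :d }

message = begin
  check_key_mapping(headers, mapping)
  nil
rescue KeyMappingSketchError => e
  e.message
end

puts message # prints "missing header(s): manager_email"
```

Because the check runs before the duplicate-header validation, a mapping that names a missing key is reported first, as the spec 'raises error on missing mapped headers' in this diff expects.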
@@ -1,3 +1,3 @@
1
1
  module SmarterCSV
2
- VERSION = "1.4.2"
2
+ VERSION = "1.5.2"
3
3
  end
@@ -0,0 +1,6 @@
1
+ col1,col2
2
+ eins,zwei
3
+ uno,dos,
4
+ one,two ,,,
5
+ ichi, ,,,,,
6
+ un
@@ -1,3 +1,3 @@
1
1
  email,firstname,lastname,email,age
2
2
  tom@bla.com,Tom,Sawyer,mike@bla.com,34
3
- eri@bla.com,Eri Chan,tom@bla.com,21
3
+ eri@bla.com,Eri,Chan,tom@bla.com,21
@@ -0,0 +1,2 @@
1
+ Name,Email,Financial Status,Paid at,Fulfillment Status,Fulfilled at,Accepts Marketing,Currency,Subtotal,Shipping,Taxes,Total,Discount Code,Discount Amount,Shipping Method,Created at,Lineitem quantity,Lineitem name,Lineitem price,Lineitem compare at price,Lineitem sku,Lineitem requires shipping,Lineitem taxable,Lineitem fulfillment status,Billing Name,Billing Street,Billing Address1,Billing Address2,Billing Company,Billing City,Billing Zip,Billing Province,Billing Country,Billing Phone,Shipping Name,Shipping Street,Shipping Address1,Shipping Address2,Shipping Company,Shipping City,Shipping Zip,Shipping Province,Shipping Country,Shipping Phone,Notes,Note Attributes,Cancelled at,Payment Method,Payment Reference,Refunded Amount,Vendor, rece,Tags,Risk Level,Source,Lineitem discount,Tax 1 Name,Tax 1 Value,Tax 2 Name,Tax 2 Value,Tax 3 Name,Tax 3 Value,Tax 4 Name,Tax 4 Value,Tax 5 Name,Tax 5 Value,Phone,Receipt Number,Duties,Billing Province Name,Shipping Province Name,Payment ID,Payment Terms Name,Next Payment Due At
2
+ #MR1220817,foo@bar.com,paid,2022-02-08 22:31:28 +0100,unfulfilled,,yes,EUR,144,0,24,144,VIP,119.6,"Livraison Standard GRATUITE, 2-5 jours avec suivi",2022-02-08 22:31:26 +0100,2,Cire Épilation Nacrée,37,,WAX-200-NAC,true,true,pending,French Fry,64 Boulevard Budgié,64 Boulevard Budgié,,,dootdoot’,'49100,,FR,06 12 34 56 78,French Fry,64 Boulevard Budgi,64 Boulevard Budgié,,,dootdoot,'49100,,FR,06 12 34 56 78,,,,Stripe,c23800013619353.2,0,Goober Rég,4331065802905,902,Low,web,0,FR TVA 20%,24,,,,,,,,,3366012111111,,,,,,,
@@ -0,0 +1,45 @@
1
+ require 'spec_helper'
2
+
3
+ fixture_path = 'spec/fixtures'
4
+
5
+ describe 'handling of additional trailing column separators' do
6
+ let(:file) { "#{fixture_path}/additional_separator.csv" }
7
+
8
+ describe '' do
9
+ let(:data) { SmarterCSV.process(file) }
10
+
11
+ it 'reads all lines' do
12
+ data.size.should eq 5
13
+ end
14
+
15
+ it 'reads regular lines' do
16
+ item = data[0]
17
+ item[:col1].should == 'eins'
18
+ item[:col2].should == 'zwei'
19
+ end
20
+
21
+ it 'strips single trailing col_sep character' do
22
+ item = data[1]
23
+ item[:col1].should == 'uno'
24
+ item[:col2].should == 'dos'
25
+ end
26
+
27
+ it 'strips multiple trailing col_sep characters' do
28
+ item = data[2]
29
+ item[:col1].should == 'one'
30
+ item[:col2].should == 'two'
31
+ end
32
+
33
+ it 'strips multiple trailing col_sep chars' do
34
+ item = data[3]
35
+ item[:col1].should == 'ichi'
36
+ item[:col2].should == nil
37
+ end
38
+
39
+ it 'strips multiple trailing col_sep chars' do
40
+ item = data[4]
41
+ item[:col1].should == 'un'
42
+ item[:col2].should == nil
43
+ end
44
+ end
45
+ end
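The behaviour these specs exercise can be reproduced without the gem. The sketch below is an approximation of the simple-split path only: `split_row` is a hypothetical helper, whitespace stripping is applied unconditionally (the gem does it only when `strip_whitespace` is set), and `Regexp.escape` is an addition of this sketch, since the diff interpolates `col_sep` into the pattern unescaped.

```ruby
# Standalone sketch; `split_row` is a hypothetical helper, not a smarter_csv method.
def split_row(line, col_sep, header_size)
  # limit the split to the number of headers, so surplus separators
  # end up at the tail of the last field
  fields = line.split(col_sep, header_size)
  # Regexp.escape is an addition in this sketch; it keeps separators
  # such as '|' from being read as regexp syntax
  trailing = /(#{Regexp.escape(col_sep)})+\z/
  fields.map { |field| field.sub(trailing, '').strip }
end

rows = ["eins,zwei", "uno,dos,", "one,two ,,,", "ichi, ,,,,,", "un"].map do |line|
  split_row(line, ',', 2)
end

rows.each { |row| p row }
```

In the gem, blank values are dropped from the resulting hash later in the pipeline, which is why the specs above expect `nil` for `:col2` on the last two rows.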
@@ -12,7 +12,7 @@ describe 'be_able_to' do
12
12
  it 'loads_binary_file_with_strings_as_keys' do
13
13
  options = {:col_sep => "\cA", :row_sep => "\cB", :comment_regexp => /^#/, :strings_as_keys => true}
14
14
  data = SmarterCSV.process("#{fixture_path}/binary.csv", options)
15
- data.flatten.size.should == 8
15
+ data.size.should == 8
16
16
  data.each do |item|
17
17
  # all keys should be strings
18
18
  item.keys.each{|x| x.class.should be == String}
@@ -0,0 +1,76 @@
1
+ require 'spec_helper'
2
+
3
+ fixture_path = 'spec/fixtures'
4
+
5
+ describe 'duplicate headers' do
6
+ describe 'without special handling / default behavior' do
7
+ it 'raises error on duplicate headers' do
8
+ expect {
9
+ SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", {})
10
+ }.to raise_exception(SmarterCSV::DuplicateHeaders)
11
+ end
12
+
13
+ it 'raises error on duplicate given headers' do
14
+ expect {
15
+ options = {:user_provided_headers => [:a,:b,:c,:d,:a]}
16
+ SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
17
+ }.to raise_exception(SmarterCSV::DuplicateHeaders)
18
+ end
19
+
20
+ it 'raises error on missing mapped headers and includes missing headers in message' do
21
+ expect {
22
+ # the mapping is right, but the underlying csv file is bad
23
+ options = {:key_mapping => {:email => :a, :firstname => :b, :lastname => :c, :manager_email => :d, :age => :e} }
24
+ SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
25
+ }.to raise_exception(SmarterCSV::KeyMappingError, "missing header(s): manager_email")
26
+ end
27
+ end
28
+
29
+ describe 'with special handling' do
30
+ context 'with given suffix' do
31
+ let(:options) { {duplicate_header_suffix: '_'} }
32
+
33
+ it 'reads whole file' do
34
+ data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
35
+ expect(data.size).to eq 2
36
+ end
37
+
38
+ it 'generates the correct keys' do
39
+ data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
40
+ expect(data.first.keys).to eq [:email, :firstname, :lastname, :email_2, :age]
41
+ end
42
+
43
+ it 'enumerates when duplicate headers are given' do
44
+ options.merge!({:user_provided_headers => [:a,:b,:c,:a,:a]})
45
+ data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
46
+ expect(data.first.keys).to eq [:a, :b, :c, :a_2, :a_3]
47
+ end
48
+
49
+ it 'can remap duplicated headers' do
50
+ options.merge!({:key_mapping => {:email => :a, :firstname => :b, :lastname => :c, :email_2 => :d, :age => :e}})
51
+ data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
52
+ expect(data.first).to eq({a: 'tom@bla.com', b: 'Tom', c: 'Sawyer', d: 'mike@bla.com', e: 34})
53
+ end
54
+ end
55
+
56
+ context 'with empty suffix' do
57
+ let(:options) { {duplicate_header_suffix: ''} }
58
+
59
+ it 'reads whole file' do
60
+ data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
61
+ expect(data.size).to eq 2
62
+ end
63
+
64
+ it 'generates the correct keys' do
65
+ data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
66
+ expect(data.first.keys).to eq [:email, :firstname, :lastname, :email2, :age]
67
+ end
68
+
69
+ it 'enumerates when duplicate headers are given' do
70
+ options.merge!({:user_provided_headers => [:a,:b,:c,:a,:a]})
71
+ data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
72
+ expect(data.first.keys).to eq [:a, :b, :c, :a2, :a3]
73
+ end
74
+ end
75
+ end
76
+ end
@@ -0,0 +1,24 @@
1
+ require 'spec_helper'
2
+
3
+ fixture_path = 'spec/fixtures'
4
+
5
+ describe 'can handle the difficult CSV file' do
6
+
7
+ it 'loads the data with default values' do
8
+ data = SmarterCSV.process("#{fixture_path}/hard_sample.csv")
9
+ data.size.should eq 1
10
+ item = data.first
11
+ item.keys.count.should == 48
12
+ item[:name].should == '#MR1220817'
13
+ item[:shipping_method].should == 'Livraison Standard GRATUITE, 2-5 jours avec suivi'
14
+ item[:lineitem_name].should == 'Cire Épilation Nacrée'
15
+ item[:phone].should == 3366012111111
16
+ end
17
+
18
+ # the main problem is the data line starting with a # character, but not being a comment
19
+ it 'fails to load the CSV file with incorrectly set comment_regexp' do
20
+ options = {comment_regexp: /\A#/ }
21
+ data = SmarterCSV.process("#{fixture_path}/hard_sample.csv", options)
22
+ data.size.should eq 0
23
+ end
24
+ end
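The effect of the new `comment_regexp: nil` default on a file like `hard_sample.csv` can be illustrated with a small filter. This is a sketch under stated assumptions: `data_lines` is a hypothetical helper that mirrors the guard `next if options[:comment_regexp] && line =~ options[:comment_regexp]` from the processing loop in this diff, and the sample lines are made up for illustration.

```ruby
# Standalone sketch; `data_lines` is a hypothetical helper that mirrors
# the comment guard used in the 1.5.0 processing loop.
def data_lines(lines, comment_regexp = nil)
  # with a nil regexp nothing is treated as a comment
  lines.reject { |line| comment_regexp && line =~ comment_regexp }
end

# illustrative input: the first line is data that happens to start with '#'
lines = ["#MR1220817,foo@bar.com", "# a real comment", "tom,sawyer"]

default_result  = data_lines(lines)          # 1.5.0 default: nothing skipped
filtered_result = data_lines(lines, /\A#/)   # explicit opt-in: '#' lines skipped

p default_result.size  # => 3
p filtered_result      # => ["tom,sawyer"]
```

With the old default, the order line starting with `#MR1220817` would have been silently dropped, which is the failure mode the second spec above documents.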
@@ -1,30 +1,45 @@
- require 'spec_helper'
-
- fixture_path = 'spec/fixtures'
-
- describe 'be_able_to' do
-   it 'ignore comments in CSV files' do
-     options = {}
-     data = SmarterCSV.process("#{fixture_path}/ignore_comments.csv", options)
-
-     data.size.should eq 5
-
-     # all the keys should be symbols
-     data.each{|item| item.keys.each{|x| x.is_a?(Symbol).should be_truthy}}
-     data.each do |h|
-       h.keys.each do |key|
-         [:"not_a_comment#first_name", :last_name, :dogs, :cats, :birds, :fish].should include( key )
-       end
-     end
-   end
-
-   it 'ignore comments in CSV files with CRLF' do
-     options = {row_sep: "\r\n"}
-     data = SmarterCSV.process("#{fixture_path}/ignore_comments2.csv", options)
-
-     # all the keys should be symbols
-     data.size.should eq 1
-     data.first[:h1].should eq 'a'
-     data.first[:h2].should eq "b\r\n#c"
-   end
- end
+ require 'spec_helper'
+
+ fixture_path = 'spec/fixtures'
+
+ describe 'be_able_to' do
+   it 'by default does not ignore comments in CSV files' do
+     options = {}
+     data = SmarterCSV.process("#{fixture_path}/ignore_comments.csv", options)
+
+     data.size.should eq 8
+
+     # all the keys should be symbols
+     data.each{|item| item.keys.each{|x| x.is_a?(Symbol).should be_truthy}}
+     data.each do |h|
+       h.keys.each do |key|
+         [:"not_a_comment#first_name", :last_name, :dogs, :cats, :birds, :fish].should include( key )
+       end
+     end
+   end
+
+   it 'ignore comments in CSV files using comment_regexp' do
+     options = {comment_regexp: /\A#/}
+     data = SmarterCSV.process("#{fixture_path}/ignore_comments.csv", options)
+
+     data.size.should eq 5
+
+     # all the keys should be symbols
+     data.each{|item| item.keys.each{|x| x.is_a?(Symbol).should be_truthy}}
+     data.each do |h|
+       h.keys.each do |key|
+         [:"not_a_comment#first_name", :last_name, :dogs, :cats, :birds, :fish].should include( key )
+       end
+     end
+   end
+
+   it 'ignore comments in CSV files with CRLF' do
+     options = {row_sep: "\r\n"}
+     data = SmarterCSV.process("#{fixture_path}/ignore_comments2.csv", options)
+
+     # all the keys should be symbols
+     data.size.should eq 1
+     data.first[:h1].should eq 'a'
+     data.first[:h2].should eq "b\r\n#c"
+   end
+ end
@@ -3,28 +3,6 @@ require 'spec_helper'
  fixture_path = 'spec/fixtures'
  
  describe 'test exceptions for invalid headers' do
-   it 'raises error on duplicate headers' do
-     expect {
-       SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", {})
-     }.to raise_exception(SmarterCSV::DuplicateHeaders)
-   end
-
-   it 'raises error on duplicate given headers' do
-     expect {
-       options = {:user_provided_headers => [:a,:b,:c,:d,:a]}
-       SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
-     }.to raise_exception(SmarterCSV::DuplicateHeaders)
-   end
-
-   it 'raises error on duplicate mapped headers' do
-     expect {
-       # the mapping is right, but the underlying csv file is bad
-       options = {:key_mapping => {:email => :a, :firstname => :b, :lastname => :c, :manager_email => :d, :age => :e} }
-       SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
-     }.to raise_exception(SmarterCSV::DuplicateHeaders)
-   end
-
-
    it 'does not raise an error if no required headers are given' do
      options = {:required_headers => nil} # order does not matter
      data = SmarterCSV.process("#{fixture_path}/user_import.csv", options)
@@ -49,4 +27,12 @@ describe 'test exceptions for invalid headers' do
        SmarterCSV.process("#{fixture_path}/user_import.csv", options)
      }.to raise_exception(SmarterCSV::MissingHeaders)
    end
+
+   it 'raises error on missing mapped headers and includes missing headers in message' do
+     expect {
+       # :age does not exist in the CSV header
+       options = {:key_mapping => {:email => :a, :firstname => :b, :lastname => :c, :manager_email => :d, :age => :e} }
+       SmarterCSV.process("#{fixture_path}/user_import.csv", options)
+     }.to raise_exception(SmarterCSV::KeyMappingError, "missing header(s): age")
+   end
  end
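The new example expects a `key_mapping` entry with no matching header to raise an error that names the missing key. A rough sketch of such a check follows. The local `KeyMappingError` class, the `check_key_mapping!` helper, and the header list are assumptions made so the example runs without the gem; the actual check inside SmarterCSV may differ.

```ruby
# Stand-in for SmarterCSV::KeyMappingError so the sketch runs without the gem.
class KeyMappingError < StandardError; end

# Hypothetical check: every key named in key_mapping must appear in the headers.
def check_key_mapping!(headers, key_mapping)
  missing = key_mapping.keys - headers
  return if missing.empty?

  raise KeyMappingError, "missing header(s): #{missing.join(',')}"
end

# Assumed header row: the spec only tells us that :age is absent from it.
headers = [:email, :firstname, :lastname, :manager_email]
key_mapping = { email: :a, firstname: :b, lastname: :c, manager_email: :d, age: :e }

message =
  begin
    check_key_mapping!(headers, key_mapping)
    nil
  rescue KeyMappingError => e
    e.message
  end
```

Computing the difference between the mapped keys and the headers up front lets the message list every missing key at once instead of failing on the first one.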
@@ -2,23 +2,28 @@ require 'spec_helper'
  
  fixture_path = 'spec/fixtures'
  
- describe 'be_able_to' do
-   it 'loads_csv_file_without_header' do
-     options = {:headers_in_file => false, :user_provided_headers => [:a,:b,:c,:d,:e,:f]}
-     data = SmarterCSV.process("#{fixture_path}/no_header.csv", options)
+ describe 'no header in file' do
+   let(:headers) { [:a,:b,:c,:d,:e,:f] }
+   let(:options) { {:headers_in_file => false, :user_provided_headers => headers} }
+   subject(:data) { SmarterCSV.process("#{fixture_path}/no_header.csv", options) }
+
+   it 'load the correct number of records' do
      data.size.should == 5
-     # all the keys should be symbols
-     data.each{|item| item.keys.each{|x| x.class.should be == Symbol}}
+   end
  
-     data.each do |item|
+   it 'uses given symbols for all records' do
+     data.each do |item|
        item.keys.each do |key|
          [:a,:b,:c,:d,:e,:f].should include( key )
        end
      end
-
-     data.each do |h|
-       h.size.should <= 6
-     end
    end
  
+   it 'loads the correct data' do
+     data[0].should == {a: "Dan", b: "McAllister", c: 2, d: 0}
+     data[1].should == {a: "Lucy", b: "Laweless", d: 5, e: 0}
+     data[2].should == {a: "Miles", b: "O'Brian", c: 0, d: 0, e: 0, f: 21}
+     data[3].should == {a: "Nancy", b: "Homes", c: 2, d: 0, e: 1}
+     data[4].should == {a: "Hernán", b: "Curaçon", c: 3, d: 0, e: 0}
+   end
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: smarter_csv
  version: !ruby/object:Gem::Version
-   version: 1.4.2
+   version: 1.5.2
  platform: ruby
  authors:
  - Tilo Sloboda
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2022-02-15 00:00:00.000000000 Z
+ date: 2022-04-29 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: rspec
@@ -62,6 +62,7 @@ files:
  - lib/smarter_csv/smarter_csv.rb
  - lib/smarter_csv/version.rb
  - smarter_csv.gemspec
+ - spec/fixtures/additional_separator.csv
  - spec/fixtures/basic.csv
  - spec/fixtures/binary.csv
  - spec/fixtures/carriage_returns_n.csv
@@ -73,6 +74,7 @@ files:
  - spec/fixtures/empty.csv
  - spec/fixtures/empty_columns_1.csv
  - spec/fixtures/empty_columns_2.csv
+ - spec/fixtures/hard_sample.csv
  - spec/fixtures/ignore_comments.csv
  - spec/fixtures/ignore_comments2.csv
  - spec/fixtures/key_mapping.csv
@@ -101,6 +103,7 @@ files:
  - spec/fixtures/valid_unicode.csv
  - spec/fixtures/with_dashes.csv
  - spec/fixtures/with_dates.csv
+ - spec/smarter_csv/additional_separator_spec.rb
  - spec/smarter_csv/binary_file2_spec.rb
  - spec/smarter_csv/binary_file_spec.rb
  - spec/smarter_csv/blank_spec.rb
@@ -109,8 +112,10 @@ files:
  - spec/smarter_csv/close_file_spec.rb
  - spec/smarter_csv/column_separator_spec.rb
  - spec/smarter_csv/convert_values_to_numeric_spec.rb
+ - spec/smarter_csv/duplicate_headers_spec.rb
  - spec/smarter_csv/empty_columns_spec.rb
  - spec/smarter_csv/extenstions_spec.rb
+ - spec/smarter_csv/hard_sample_spec.rb
  - spec/smarter_csv/header_transformation_spec.rb
  - spec/smarter_csv/ignore_comments_spec.rb
  - spec/smarter_csv/invalid_headers_spec.rb
@@ -164,6 +169,7 @@ specification_version: 4
  summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots
    of optional features, e.g. chunked processing for huge CSV files
  test_files:
+ - spec/fixtures/additional_separator.csv
  - spec/fixtures/basic.csv
  - spec/fixtures/binary.csv
  - spec/fixtures/carriage_returns_n.csv
@@ -175,6 +181,7 @@ test_files:
  - spec/fixtures/empty.csv
  - spec/fixtures/empty_columns_1.csv
  - spec/fixtures/empty_columns_2.csv
+ - spec/fixtures/hard_sample.csv
  - spec/fixtures/ignore_comments.csv
  - spec/fixtures/ignore_comments2.csv
  - spec/fixtures/key_mapping.csv
@@ -203,6 +210,7 @@ test_files:
  - spec/fixtures/valid_unicode.csv
  - spec/fixtures/with_dashes.csv
  - spec/fixtures/with_dates.csv
+ - spec/smarter_csv/additional_separator_spec.rb
  - spec/smarter_csv/binary_file2_spec.rb
  - spec/smarter_csv/binary_file_spec.rb
  - spec/smarter_csv/blank_spec.rb
@@ -211,8 +219,10 @@ test_files:
  - spec/smarter_csv/close_file_spec.rb
  - spec/smarter_csv/column_separator_spec.rb
  - spec/smarter_csv/convert_values_to_numeric_spec.rb
+ - spec/smarter_csv/duplicate_headers_spec.rb
  - spec/smarter_csv/empty_columns_spec.rb
  - spec/smarter_csv/extenstions_spec.rb
+ - spec/smarter_csv/hard_sample_spec.rb
  - spec/smarter_csv/header_transformation_spec.rb
  - spec/smarter_csv/ignore_comments_spec.rb
  - spec/smarter_csv/invalid_headers_spec.rb