smarter_csv 1.5.0 → 1.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 23032eface2d1d918bcd6daabb4ca79e03096612bda1017d06f1b0542d0c4619
4
- data.tar.gz: 12b68eeafc4f83c06b66da45b27da5e716675bff1e77be2362c2c10006821d9c
3
+ metadata.gz: fd2cf82aafc3b45257fbdfc594ed8e1d3bf2226e59cbee144b3003d8f79ec6cf
4
+ data.tar.gz: 95df862865e3123cf86194d47107f140f69f2fc91c20aba01d4004e8bffa5d74
5
5
  SHA512:
6
- metadata.gz: 5b84337de25ed7a8492088b82342e6d3b16d1fdc95120f9699986aee7d9416a51cfec981eb125e0d4b17600bc1c06c52eb3b2251857668210d9402e95bb75860
7
- data.tar.gz: b26b40b49bf6d739df9cd5deb477c33fdc22c54ed88c96f87e401fef789aedf3f9c55d25df60e4900a1e2a3c8bc0fc6e78018b128ab6ac14062fd97f694f3568
6
+ metadata.gz: df32ae9a380fa4fff0932d56e8a0cacadb8d4ebf7d8124e607f2ba389c3b60f875c300a2137fe04aac2b7eda77850b343af5e58c53e310f90460f96223f3228c
7
+ data.tar.gz: 107e1dbacdc6293a0c044a91cf237f50fbeab59eb5b032167f55d0fe6c2cf07b079c6cfe296368c03d2c84e64ce3c0e6ad744043397cdc217cd3ab51beb3ab09
data/CHANGELOG.md CHANGED
@@ -1,6 +1,19 @@
1
1
 
2
2
  # SmarterCSV 1.x Change Log
3
3
 
4
+ ## 1.6.0 (2022-05-03)
5
+ * completely rewrote line parser
6
+ * added methods `SmarterCSV.raw_header` and `SmarterCSV.headers` to allow easy examination of how the headers are processed.
7
+
8
+ ## 1.5.2 (2022-04-29)
9
+ * added missing keys to the SmarterCSV::KeyMappingError exception message #189 (thanks to John Dell)
10
+
11
+ ## 1.5.1 (2022-04-27)
12
+ * added raising of `KeyMappingError` if `key_mapping` refers to a non-existent key
13
+ * added option `duplicate_header_suffix` (thanks to Skye Shaw)
14
+ When given a non-nil string, it uses the suffix to append numbering 2..n to duplicate headers.
15
+ If your code will need to process arbitrary CSV files, please set `duplicate_header_suffix`.
16
+
4
17
  ## 1.5.0 (2022-04-25)
5
18
  * fixed bug with trailing col_sep characters, introduced in 1.4.0
6
19
  * Fix deprecation warning in Ruby 3.0.3 / $INPUT_RECORD_SEPARATOR (thanks to Joel Fouse )
data/CONTRIBUTORS.md CHANGED
@@ -44,3 +44,4 @@ A Big Thank you to everyone who filed issues, sent comments, and who contributed
44
44
  * [Nicolas Guillemain](https://github.com/Viiruus)
45
45
  * [Sp6](https://github.com/sp6)
46
46
  * [Joel Fouse](https://github.com/jfouse)
47
+ * [John Dell](https://github.com/spovich)
data/README.md CHANGED
@@ -16,10 +16,12 @@
16
16
 
17
17
  # SmarterCSV
18
18
 
19
- [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
19
+ [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.com/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
20
20
 
21
21
  #### SmarterCSV 1.x
22
22
 
23
+ `smarter_csv` is now 10 years old, and still kicking! 🎉🎉🎉
24
+
23
25
  `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
24
26
  and parallel processing with Resque or Sidekiq.
25
27
 
@@ -42,11 +44,13 @@ NOTE; This Gem is only for importing CSV files - writing of CSV files is not sup
42
44
 
43
45
  ### Why?
44
46
 
45
- Ruby's CSV library's API is pretty old, and it's processing of CSV-files returning Arrays of Arrays feels 'very close to the metal'. The output is not easy to use - especially not if you want to create database records from it. Another shortcoming is that Ruby's CSV library does not have good support for huge CSV-files, e.g. there is no support for 'chunking' and/or parallel processing of the CSV-content (e.g. with Resque or Sidekiq),
47
+ Ruby's CSV library's API is pretty old, and its processing of CSV-files returning Arrays of Arrays feels 'very close to the metal'. The output is not easy to use - especially not if you want to create database records or Sidekiq jobs with it. Another shortcoming is that Ruby's CSV library does not have good support for huge CSV-files, e.g. there is no support for 'chunking' and/or parallel processing of the CSV-content (e.g. with Sidekiq).
48
+
49
+ As the existing CSV libraries didn't fit my needs, I wrote my own CSV processing - specifically for use in connection with Rails ORMs like Mongoid, MongoMapper and ActiveRecord. In those ORMs you can easily pass a hash with attribute/value pairs to the create() method. The lower-level Mongo driver and Moped also accept larger arrays of such hashes to create a larger amount of records quickly with just one call. The same patterns are used when you pass data to Sidekiq jobs.
46
50
 
47
- As the existing CSV libraries didn't fit my needs, I was writing my own CSV processing - specifically for use in connection with Rails ORMs like Mongoid, MongoMapper or ActiveRecord. In those ORMs you can easily pass a hash with attribute/value pairs to the create() method. The lower-level Mongo driver and Moped also accept larger arrays of such hashes to create a larger amount of records quickly with just one call.
51
+ For processing large CSV files it is essential to process them in chunks, so the memory impact is minimized.
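The chunking idea can be sketched in plain Ruby. This is only an illustration of why chunks keep memory use bounded, not the gem's implementation: `each_csv_chunk` is a hypothetical helper, and it assumes a simple comma-separated input without quoted fields.

```ruby
require 'stringio'

# Hypothetical helper (not part of smarter_csv): reads the header line,
# then yields arrays of at most chunk_size row hashes, so only one chunk
# of rows is held in memory at a time.
def each_csv_chunk(io, chunk_size)
  headers = io.readline.chomp.split(',').map(&:to_sym)
  io.each_line.each_slice(chunk_size) do |lines|
    yield lines.map { |line| headers.zip(line.chomp.split(',')).to_h }
  end
end

io = StringIO.new("name,age\nTom,34\nEri,21\nMike,40\n")
chunks = []
each_csv_chunk(io, 2) { |chunk| chunks << chunk }
# chunks now holds two arrays: the first with two row hashes, the second with one
```

Each yielded chunk is an array of hashes, which is the shape that can be handed to a bulk insert or to a background job.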
48
52
 
49
- ### Examples
53
+ ### How?
50
54
 
51
55
  The two main choices you have in terms of how to call `SmarterCSV.process` are:
52
56
  * calling `process` with or without a block
@@ -228,6 +232,7 @@ The options and the block are optional.
228
232
  | :headers_in_file | true | Whether or not the file contains headers as the first line. |
229
233
  | | | Important if the file does not contain headers, |
230
234
  | | | otherwise you would lose the first line of data. |
235
+ | :duplicate_header_suffix | nil | If set, adds numbers to duplicated headers and separates them by the given suffix |
231
236
  | :user_provided_headers | nil | *careful with that axe!* |
232
237
  | | | user provided Array of header strings or symbols, to define |
233
238
  | | | what headers should be used, overriding any in-file headers. |
@@ -282,6 +287,7 @@ And header and data validations will also be supported in 2.x
282
287
  data = SmarterCSV.process(f)
283
288
  end
284
289
  ```
290
+
285
291
  #### NOTES about CSV Headers:
286
292
  * as this method parses CSV files, it is assumed that the first line of any file will contain a valid header
287
293
  * the first line with the header might be commented out, in which case you will need to set `comment_regexp: /\A#/`
@@ -291,6 +297,13 @@ And header and data validations will also be supported in 2.x
291
297
  * you can not combine the :user_provided_headers and :key_mapping options
292
298
  * if the incorrect number of headers are provided via :user_provided_headers, exception SmarterCSV::HeaderSizeMismatch is raised
293
299
 
300
+ #### NOTES on Duplicate Headers:
301
+ As a corner case, it is possible that a CSV file contains multiple headers with the same name.
302
+ * If that happens, by default `smarter_csv` will raise a `DuplicateHeaders` error.
303
+ * If you set `duplicate_header_suffix` to a non-nil string, it will use it to append numbers 2..n to the duplicate headers. To disambiguate the headers further, you can also use `key_mapping` to assign meaningful names.
304
+ * If your code will need to process arbitrary CSV files, please set `duplicate_header_suffix`.
305
+ * Another way to deal with duplicate headers is to use `user_provided_headers` to ignore any headers in the file.
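The numbering rule described in these notes can be sketched as a small stand-alone function. `disambiguate_headers` is a hypothetical name used only for this illustration; it is modeled on the behavior the notes and specs describe (first occurrence unchanged, later ones get suffix plus a counter starting at 2).

```ruby
# Hypothetical helper illustrating the documented numbering rule:
# the first occurrence of a header is kept as-is, each later occurrence
# gets the suffix followed by its occurrence count (2, 3, ...).
def disambiguate_headers(headers, suffix)
  counts = Hash.new(0)
  headers.map do |key|
    counts[key] += 1
    counts[key] == 1 ? key : "#{key}#{suffix}#{counts[key]}"
  end
end

with_underscore = disambiguate_headers(%w[email firstname lastname email age], '_')
# => ["email", "firstname", "lastname", "email_2", "age"]

with_empty_suffix = disambiguate_headers(%w[a b c a a], '')
# => ["a", "b", "c", "a2", "a3"]
```

An empty string is a valid suffix; only `nil` leaves the default behavior of raising `DuplicateHeaders` in place.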
306
+
294
307
  #### NOTES on Key Mapping:
295
308
  * keys in the header line of the file can be re-mapped to a chosen set of symbols, so the resulting Hashes can be better used internally in your application (e.g. when directly creating MongoDB entries with them)
296
309
  * if you want to completely delete a key, then map it to nil or to '', they will be automatically deleted from any result Hash
@@ -5,107 +5,38 @@ module SmarterCSV
5
5
  class DuplicateHeaders < SmarterCSVException; end
6
6
  class MissingHeaders < SmarterCSVException; end
7
7
  class NoColSepDetected < SmarterCSVException; end
8
+ class KeyMappingError < SmarterCSVException; end
9
+ class MalformedCSVError < SmarterCSVException; end
8
10
 
9
- def SmarterCSV.process(input, options={}, &block) # first parameter: filename or input object with readline method
11
+ # first parameter: filename or input object which responds to readline method
12
+ def SmarterCSV.process(input, options={}, &block)
10
13
  options = default_options.merge(options)
11
14
  options[:invalid_byte_sequence] = '' if options[:invalid_byte_sequence].nil?
12
15
 
13
16
  headerA = []
14
17
  result = []
15
- file_line_count = 0
16
- csv_line_count = 0
18
+ @file_line_count = 0
19
+ @csv_line_count = 0
17
20
  has_rails = !! defined?(Rails)
18
21
  begin
19
- f = input.respond_to?(:readline) ? input : File.open(input, "r:#{options[:file_encoding]}")
22
+ fh = input.respond_to?(:readline) ? input : File.open(input, "r:#{options[:file_encoding]}")
20
23
 
21
24
  # auto-detect the row separator
22
- options[:row_sep] = SmarterCSV.guess_line_ending(f, options) if options[:row_sep].to_sym == :auto
25
+ options[:row_sep] = SmarterCSV.guess_line_ending(fh, options) if options[:row_sep].to_sym == :auto
23
26
  # attempt to auto-detect column separator
24
- options[:col_sep] = guess_column_separator(f, options) if options[:col_sep].to_sym == :auto
25
- # preserve options, in case we need to call the CSV class
26
- csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
27
- csv_options.delete(:row_sep) if [nil, :auto].include?( options[:row_sep].to_sym )
28
- csv_options.delete(:col_sep) if [nil, :auto].include?( options[:col_sep].to_sym )
27
+ options[:col_sep] = guess_column_separator(fh, options) if options[:col_sep].to_sym == :auto
29
28
 
30
- if (options[:force_utf8] || options[:file_encoding] =~ /utf-8/i) && ( f.respond_to?(:external_encoding) && f.external_encoding != Encoding.find('UTF-8') || f.respond_to?(:encoding) && f.encoding != Encoding.find('UTF-8') )
29
+ if (options[:force_utf8] || options[:file_encoding] =~ /utf-8/i) && ( fh.respond_to?(:external_encoding) && fh.external_encoding != Encoding.find('UTF-8') || fh.respond_to?(:encoding) && fh.encoding != Encoding.find('UTF-8') )
31
30
  puts 'WARNING: you are trying to process UTF-8 input, but did not open the input with "b:utf-8" option. See README file "NOTES about File Encodings".'
32
31
  end
33
32
 
34
- options[:skip_lines].to_i.times{f.readline(options[:row_sep])} if options[:skip_lines].to_i > 0
35
-
36
- if options[:headers_in_file] # extract the header line
37
- # process the header line in the CSV file..
38
- # the first line of a CSV file contains the header .. it might be commented out, so we need to read it anyhow
39
- header = f.readline(options[:row_sep])
40
- header = header.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
41
- header = header.sub(options[:comment_regexp],'') if options[:comment_regexp]
42
- header = header.chomp(options[:row_sep])
43
-
44
- file_line_count += 1
45
- csv_line_count += 1
46
- header = header.gsub(options[:strip_chars_from_headers], '') if options[:strip_chars_from_headers]
47
-
48
- if (header =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
49
- file_headerA = begin
50
- CSV.parse( header, **csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
51
- rescue CSV::MalformedCSVError => e
52
- raise $!, "#{$!} [SmarterCSV: csv line #{csv_line_count}]", $!.backtrace
53
- end
54
- else
55
- file_headerA = header.split(options[:col_sep])
56
- end
57
- file_header_size = file_headerA.size # before mapping, which could delete keys
58
-
59
- file_headerA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') }
60
- file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
61
- unless options[:keep_original_headers]
62
- file_headerA.map!{|x| x.gsub(/\s+|-+/,'_')}
63
- file_headerA.map!{|x| x.downcase } if options[:downcase_header]
64
- end
65
- else
66
- raise SmarterCSV::IncorrectOption , "ERROR: If :headers_in_file is set to false, you have to provide :user_provided_headers" if options[:user_provided_headers].nil?
67
- end
68
- if options[:user_provided_headers] && options[:user_provided_headers].class == Array && ! options[:user_provided_headers].empty?
69
- # use user-provided headers
70
- headerA = options[:user_provided_headers]
71
- if defined?(file_header_size) && ! file_header_size.nil?
72
- if headerA.size != file_header_size
73
- raise SmarterCSV::HeaderSizeMismatch , "ERROR: :user_provided_headers defines #{headerA.size} headers != CSV-file #{input} has #{file_header_size} headers"
74
- else
75
- # we could print out the mapping of file_headerA to headerA here
76
- end
33
+ if options[:skip_lines].to_i > 0
34
+ options[:skip_lines].to_i.times do
35
+ readline_with_counts(fh, options)
77
36
  end
78
- else
79
- headerA = file_headerA
80
37
  end
81
- header_size = headerA.size # used for splitting lines
82
-
83
- headerA.map!{|x| x.to_sym } unless options[:strings_as_keys] || options[:keep_original_headers]
84
-
85
- unless options[:user_provided_headers] # wouldn't make sense to re-map user provided headers
86
- key_mappingH = options[:key_mapping]
87
38
 
88
- # do some key mapping on the keys in the file header
89
- # if you want to completely delete a key, then map it to nil or to ''
90
- if ! key_mappingH.nil? && key_mappingH.class == Hash && key_mappingH.keys.size > 0
91
- headerA.map!{|x| key_mappingH.has_key?(x) ? (key_mappingH[x].nil? ? nil : key_mappingH[x]) : (options[:remove_unmapped_keys] ? nil : x)}
92
- end
93
- end
94
-
95
- # header_validations
96
- duplicate_headers = []
97
- headerA.compact.each do |k|
98
- duplicate_headers << k if headerA.select{|x| x == k}.size > 1
99
- end
100
- raise SmarterCSV::DuplicateHeaders , "ERROR: duplicate headers: #{duplicate_headers.join(',')}" unless duplicate_headers.empty?
101
-
102
- if options[:required_headers] && options[:required_headers].is_a?(Array)
103
- missing_headers = []
104
- options[:required_headers].each do |k|
105
- missing_headers << k unless headerA.include?(k)
106
- end
107
- raise SmarterCSV::MissingHeaders , "ERROR: missing headers: #{missing_headers.join(',')}" unless missing_headers.empty?
108
- end
39
+ headerA, header_size = process_headers(fh, options)
109
40
 
110
41
  # in case we use chunking.. we'll need to set it up..
111
42
  if ! options[:chunk_size].nil? && options[:chunk_size].to_i > 0
@@ -118,15 +49,13 @@ module SmarterCSV
118
49
  end
119
50
 
120
51
  # now on to processing all the rest of the lines in the CSV file:
121
- while ! f.eof? # we can't use f.readlines() here, because this would read the whole file into memory at once, and eof => true
122
- line = f.readline(options[:row_sep]) # read one line
52
+ while ! fh.eof? # we can't use fh.readlines() here, because this would read the whole file into memory at once, and eof => true
53
+ line = readline_with_counts(fh, options)
123
54
 
124
55
  # replace invalid byte sequence in UTF-8 with question mark to avoid errors
125
56
  line = line.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
126
57
 
127
- file_line_count += 1
128
- csv_line_count += 1
129
- print "processing file line %10d, csv line %10d\r" % [file_line_count, csv_line_count] if options[:verbose]
58
+ print "processing file line %10d, csv line %10d\r" % [@file_line_count, @csv_line_count] if options[:verbose]
130
59
 
131
60
  next if options[:comment_regexp] && line =~ options[:comment_regexp] # ignore all comment lines if there are any
132
61
 
@@ -135,24 +64,17 @@ module SmarterCSV
135
64
  # by detecting the existence of an uneven number of quote characters
136
65
  multiline = line.count(options[:quote_char])%2 == 1 # should handle quote_char nil
137
66
  while line.count(options[:quote_char])%2 == 1 # should handle quote_char nil
138
- next_line = f.readline(options[:row_sep])
67
+ next_line = fh.readline(options[:row_sep])
139
68
  next_line = next_line.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
140
69
  line += next_line
141
- file_line_count += 1
70
+ @file_line_count += 1
142
71
  end
143
- print "\nline contains uneven number of quote chars so including content through file line %d\n" % file_line_count if options[:verbose] && multiline
72
+ print "\nline contains uneven number of quote chars so including content through file line %d\n" % @file_line_count if options[:verbose] && multiline
144
73
 
145
74
  line.chomp!(options[:row_sep])
146
75
 
147
- if (line =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
148
- dataA = begin
149
- CSV.parse( line, **csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
150
- rescue CSV::MalformedCSVError => e
151
- raise $!, "#{$!} [SmarterCSV: csv line #{csv_line_count}]", $!.backtrace
152
- end
153
- else
154
- dataA = line.split(options[:col_sep], header_size)
155
- end
76
+ dataA, data_size = parse(line, options, header_size)
77
+
156
78
  dataA.map!{|x| x.sub(/(#{options[:col_sep]})+\z/, '')} # remove any unwanted trailing col_sep characters at the end
157
79
  dataA.map!{|x| x.strip} if options[:strip_whitespace]
158
80
 
@@ -208,7 +130,7 @@ module SmarterCSV
208
130
  if use_chunks
209
131
  chunk << hash # append temp result to chunk
210
132
 
211
- if chunk.size >= chunk_size || f.eof? # if chunk if full, or EOF reached
133
+ if chunk.size >= chunk_size || fh.eof? # if chunk is full, or EOF reached
212
134
  # do something with the chunk
213
135
  if block_given?
214
136
  yield chunk # do something with the hashes in the chunk in the block
@@ -249,7 +171,7 @@ module SmarterCSV
249
171
  chunk = [] # initialize for next chunk of data
250
172
  end
251
173
  ensure
252
- f.close if f.respond_to?(:close)
174
+ fh.close if fh.respond_to?(:close)
253
175
  end
254
176
  if block_given?
255
177
  return chunk_count # when we do processing through a block we only care how many chunks we processed
@@ -268,6 +190,7 @@ module SmarterCSV
268
190
  comment_regexp: nil, # was: /\A#/,
269
191
  convert_values_to_numeric: true,
270
192
  downcase_header: true,
193
+ duplicate_header_suffix: nil,
271
194
  file_encoding: 'utf-8',
272
195
  force_simple_split: false ,
273
196
  force_utf8: false,
@@ -293,6 +216,62 @@ module SmarterCSV
293
216
  }
294
217
  end
295
218
 
219
+ def self.readline_with_counts(filehandle, options)
220
+ line = filehandle.readline(options[:row_sep])
221
+ @file_line_count += 1
222
+ @csv_line_count += 1
223
+ line
224
+ end
225
+
226
+ # parses a single line: either a CSV header or a body line
227
+ # - quoting rules compared to RFC-4180 are somewhat relaxed
228
+ # - we are not assuming that quotes inside a field need to be doubled
229
+ # - we are not assuming that all fields need to be quoted (0 is even)
230
+ # - works with multi-char col_sep
231
+ # - if header_size is given, only up to header_size fields are parsed
232
+ #
233
+ # We use header_size for parsing the body lines to make sure we always match the number of headers
234
+ # in case there are trailing col_sep characters in line
235
+ #
236
+ # Our convention is that empty fields are returned as empty strings, not as nil.
237
+ #
238
+ def self.parse(line, options, header_size = nil)
239
+ return [] if line.nil?
240
+
241
+ col_sep = options[:col_sep]
242
+ quote = options[:quote_char]
243
+ quote_count = 0
244
+ elements = []
245
+ start = 0
246
+ i = 0
247
+
248
+ while i < line.size do
249
+ if line[i...i+col_sep.size] == col_sep && quote_count.even?
250
+ break if !header_size.nil? && elements.size >= header_size
251
+
252
+ elements << cleanup_quotes(line[start...i], quote)
253
+ i += col_sep.size
254
+ start = i
255
+ else
256
+ quote_count += 1 if line[i] == quote
257
+ i += 1
258
+ end
259
+ end
260
+ elements << cleanup_quotes(line[start..-1], quote) if header_size.nil? || elements.size < header_size
261
+ [elements, elements.size]
262
+ end
263
+
264
+ def self.cleanup_quotes(field, quote)
265
+ return field if field.nil? || field !~ /#{quote}/
266
+
267
+ if field.start_with?(quote) && field.end_with?(quote)
268
+ field.delete_prefix!(quote)
269
+ field.delete_suffix!(quote)
270
+ end
271
+ field.gsub!("#{quote}#{quote}", quote)
272
+ field
273
+ end
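The quote-parity idea behind the new parser can be shown in a reduced form: a separator only splits the line while the number of quote characters seen so far is even. The sketch below is a simplified stand-alone version under that assumption; `split_line` and `unquote` are hypothetical names, and the `header_size` limit from the code above is left out.

```ruby
# Simplified sketch of quote-parity splitting (hypothetical helpers,
# not the gem's API). A col_sep counts as a field boundary only when
# an even number of quote characters has been seen so far.
def split_line(line, col_sep: ',', quote: '"')
  fields = []
  start = 0
  i = 0
  quote_count = 0
  while i < line.size
    if quote_count.even? && line[i, col_sep.size] == col_sep
      fields << line[start...i]
      i += col_sep.size
      start = i
    else
      quote_count += 1 if line[i] == quote
      i += 1
    end
  end
  fields << line[start..-1] # last field; empty string if line ends with col_sep
  fields.map { |f| unquote(f, quote) }
end

# Strips one pair of enclosing quotes and collapses doubled quotes.
def unquote(field, quote)
  if field.size >= 2 && field.start_with?(quote) && field.end_with?(quote)
    field = field[quote.size...-quote.size]
  end
  field.gsub(quote * 2, quote)
end

simple   = split_line('a,"b,c",d')
multi    = split_line('x||"y||z"', col_sep: '||')
doubled  = split_line('a,"say ""hi""",b')
trailing = split_line('a,')
```

Because the check compares a slice of `col_sep.size` characters, the same loop handles multi-character separators without special cases.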
274
+
296
275
  def self.blank?(value)
297
276
  case value
298
277
  when Array
@@ -378,4 +357,105 @@ module SmarterCSV
378
357
  k,_ = counts.max_by{|_,v| v}
379
358
  return k # the most frequent one is it
380
359
  end
360
+
361
+ def self.raw_header
362
+ @raw_header
363
+ end
364
+
365
+ def self.headers
366
+ @headers
367
+ end
368
+
369
+ def self.process_headers(filehandle, options)
370
+ @raw_header = nil
371
+ @headers = nil
372
+ if options[:headers_in_file] # extract the header line
373
+ # process the header line in the CSV file..
374
+ # the first line of a CSV file contains the header .. it might be commented out, so we need to read it anyhow
375
+ header = readline_with_counts(filehandle, options)
376
+ @raw_header = header
377
+
378
+ header = header.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
379
+ header = header.sub(options[:comment_regexp],'') if options[:comment_regexp]
380
+ header = header.chomp(options[:row_sep])
381
+
382
+ header = header.gsub(options[:strip_chars_from_headers], '') if options[:strip_chars_from_headers]
383
+
384
+ file_headerA, file_header_size = parse(header, options)
385
+
386
+ file_headerA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') }
387
+ file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
388
+ unless options[:keep_original_headers]
389
+ file_headerA.map!{|x| x.gsub(/\s+|-+/,'_')}
390
+ file_headerA.map!{|x| x.downcase } if options[:downcase_header]
391
+ end
392
+ else
393
+ raise SmarterCSV::IncorrectOption , "ERROR: If :headers_in_file is set to false, you have to provide :user_provided_headers" unless options[:user_provided_headers]
394
+ end
395
+ if options[:user_provided_headers] && options[:user_provided_headers].class == Array && ! options[:user_provided_headers].empty?
396
+ # use user-provided headers
397
+ headerA = options[:user_provided_headers]
398
+ if defined?(file_header_size) && ! file_header_size.nil?
399
+ if headerA.size != file_header_size
400
+ raise SmarterCSV::HeaderSizeMismatch , "ERROR: :user_provided_headers defines #{headerA.size} headers != CSV-file has #{file_header_size} headers"
401
+ else
402
+ # we could print out the mapping of file_headerA to headerA here
403
+ end
404
+ end
405
+ else
406
+ headerA = file_headerA
407
+ end
408
+
409
+ # detect duplicate headers and disambiguate
410
+ headerA = process_duplicate_headers(headerA, options) if options[:duplicate_header_suffix]
411
+ header_size = headerA.size # used for splitting lines
412
+
413
+ headerA.map!{|x| x.to_sym } unless options[:strings_as_keys] || options[:keep_original_headers]
414
+
415
+ unless options[:user_provided_headers] # wouldn't make sense to re-map user provided headers
416
+ key_mappingH = options[:key_mapping]
417
+
418
+ # do some key mapping on the keys in the file header
419
+ # if you want to completely delete a key, then map it to nil or to ''
420
+ if ! key_mappingH.nil? && key_mappingH.class == Hash && key_mappingH.keys.size > 0
421
+ # we can't map keys that are not there
422
+ missing_keys = key_mappingH.keys - headerA
423
+ raise(SmarterCSV::KeyMappingError, "missing header(s): #{missing_keys.join(",")}") unless missing_keys.empty?
424
+
425
+ headerA.map!{|x| key_mappingH.has_key?(x) ? (key_mappingH[x].nil? ? nil : key_mappingH[x]) : (options[:remove_unmapped_keys] ? nil : x)}
426
+ end
427
+ end
428
+
429
+ # header_validations
430
+ duplicate_headers = []
431
+ headerA.compact.each do |k|
432
+ duplicate_headers << k if headerA.select{|x| x == k}.size > 1
433
+ end
434
+ raise SmarterCSV::DuplicateHeaders , "ERROR: duplicate headers: #{duplicate_headers.join(',')}" unless duplicate_headers.empty?
435
+
436
+ if options[:required_headers] && options[:required_headers].is_a?(Array)
437
+ missing_headers = []
438
+ options[:required_headers].each do |k|
439
+ missing_headers << k unless headerA.include?(k)
440
+ end
441
+ raise SmarterCSV::MissingHeaders , "ERROR: missing headers: #{missing_headers.join(',')}" unless missing_headers.empty?
442
+ end
443
+
444
+ @headers = headerA
445
+ [headerA, header_size]
446
+ end
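The key-mapping check added in this method boils down to a set difference between the mapping's keys and the parsed headers. A minimal stand-alone sketch, assuming symbol headers; `apply_key_mapping` and the local `KeyMappingError` class are hypothetical stand-ins for this illustration, not the gem's API:

```ruby
# Stand-in for SmarterCSV::KeyMappingError, defined locally so the
# sketch runs on its own.
class KeyMappingError < StandardError; end

# Hypothetical helper: refuses mappings that refer to headers which do
# not exist, then renames mapped headers. Unmapped headers are kept, or
# replaced by nil when remove_unmapped_keys is true.
def apply_key_mapping(headers, key_mapping, remove_unmapped_keys: false)
  missing = key_mapping.keys - headers
  raise KeyMappingError, "missing header(s): #{missing.join(',')}" unless missing.empty?

  headers.map do |header|
    if key_mapping.key?(header)
      key_mapping[header]
    else
      remove_unmapped_keys ? nil : header
    end
  end
end

mapped = apply_key_mapping([:email, :age], { email: :a })
# => [:a, :age]

error_message =
  begin
    apply_key_mapping([:email], { manager_email: :d })
    nil
  rescue KeyMappingError => e
    e.message
  end
# => "missing header(s): manager_email"
```

Running the check before the rename means a typo in the mapping fails loudly instead of silently leaving a column unmapped.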
447
+
448
+ def self.process_duplicate_headers(headers, options)
449
+ counts = Hash.new(0)
450
+ result = []
451
+ headers.each do |key|
452
+ counts[key] += 1
453
+ if counts[key] == 1
454
+ result << key
455
+ else
456
+ result << [key, options[:duplicate_header_suffix], counts[key]].join
457
+ end
458
+ end
459
+ result
460
+ end
381
461
  end
@@ -1,3 +1,3 @@
1
1
  module SmarterCSV
2
- VERSION = "1.5.0"
2
+ VERSION = "1.6.0"
3
3
  end
data/smarter_csv.gemspec CHANGED
@@ -16,9 +16,9 @@ Gem::Specification.new do |spec|
16
16
  spec.executables = spec.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
17
17
  spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
18
18
  spec.require_paths = ["lib"]
19
- spec.requirements = ['csv'] # for CSV.parse() only needed in case we have quoted fields
20
19
  spec.add_development_dependency "rspec"
21
20
  spec.add_development_dependency "simplecov"
21
+ spec.add_development_dependency "awesome_print"
22
22
  # spec.add_development_dependency "guard-rspec"
23
23
 
24
24
  spec.metadata["homepage_uri"] = spec.homepage
@@ -1,3 +1,3 @@
1
1
  email,firstname,lastname,email,age
2
2
  tom@bla.com,Tom,Sawyer,mike@bla.com,34
3
- eri@bla.com,Eri Chan,tom@bla.com,21
3
+ eri@bla.com,Eri,Chan,tom@bla.com,21
@@ -0,0 +1,76 @@
1
+ require 'spec_helper'
2
+
3
+ fixture_path = 'spec/fixtures'
4
+
5
+ describe 'duplicate headers' do
6
+ describe 'without special handling / default behavior' do
7
+ it 'raises error on duplicate headers' do
8
+ expect {
9
+ SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", {})
10
+ }.to raise_exception(SmarterCSV::DuplicateHeaders)
11
+ end
12
+
13
+ it 'raises error on duplicate given headers' do
14
+ expect {
15
+ options = {:user_provided_headers => [:a,:b,:c,:d,:a]}
16
+ SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
17
+ }.to raise_exception(SmarterCSV::DuplicateHeaders)
18
+ end
19
+
20
+ it 'raises error on missing mapped headers and includes missing headers in message' do
21
+ expect {
22
+ # the mapping is right, but the underlying csv file is bad
23
+ options = {:key_mapping => {:email => :a, :firstname => :b, :lastname => :c, :manager_email => :d, :age => :e} }
24
+ SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
25
+ }.to raise_exception(SmarterCSV::KeyMappingError, "missing header(s): manager_email")
26
+ end
27
+ end
28
+
29
+ describe 'with special handling' do
30
+ context 'with given suffix' do
31
+ let(:options) { {duplicate_header_suffix: '_'} }
32
+
33
+ it 'reads whole file' do
34
+ data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
35
+ expect(data.size).to eq 2
36
+ end
37
+
38
+ it 'generates the correct keys' do
39
+ data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
40
+ expect(data.first.keys).to eq [:email, :firstname, :lastname, :email_2, :age]
41
+ end
42
+
43
+ it 'enumerates when duplicate headers are given' do
44
+ options.merge!({:user_provided_headers => [:a,:b,:c,:a,:a]})
45
+ data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
46
+ expect(data.first.keys).to eq [:a, :b, :c, :a_2, :a_3]
47
+ end
48
+
49
+ it 'can remap duplicated headers' do
50
+ options.merge!({:key_mapping => {:email => :a, :firstname => :b, :lastname => :c, :email_2 => :d, :age => :e}})
51
+ data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
52
+ expect(data.first).to eq({a: 'tom@bla.com', b: 'Tom', c: 'Sawyer', d: 'mike@bla.com', e: 34})
53
+ end
54
+ end
55
+
56
+ context 'with empty suffix' do
57
+ let(:options) { {duplicate_header_suffix: ''} }
58
+
59
+ it 'reads whole file' do
60
+ data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
61
+ expect(data.size).to eq 2
62
+ end
63
+
64
+ it 'generates the correct keys' do
65
+ data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
66
+ expect(data.first.keys).to eq [:email, :firstname, :lastname, :email2, :age]
67
+ end
68
+
69
+ it 'enumerates when duplicate headers are given' do
70
+ options.merge!({:user_provided_headers => [:a,:b,:c,:a,:a]})
71
+ data = SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
72
+ expect(data.first.keys).to eq [:a, :b, :c, :a2, :a3]
73
+ end
74
+ end
75
+ end
76
+ end
@@ -3,28 +3,6 @@ require 'spec_helper'
3
3
  fixture_path = 'spec/fixtures'
4
4
 
5
5
  describe 'test exceptions for invalid headers' do
6
- it 'raises error on duplicate headers' do
7
- expect {
8
- SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", {})
9
- }.to raise_exception(SmarterCSV::DuplicateHeaders)
10
- end
11
-
12
- it 'raises error on duplicate given headers' do
13
- expect {
14
- options = {:user_provided_headers => [:a,:b,:c,:d,:a]}
15
- SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
16
- }.to raise_exception(SmarterCSV::DuplicateHeaders)
17
- end
18
-
19
- it 'raises error on duplicate mapped headers' do
20
- expect {
21
- # the mapping is right, but the underlying csv file is bad
22
- options = {:key_mapping => {:email => :a, :firstname => :b, :lastname => :c, :manager_email => :d, :age => :e} }
23
- SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
24
- }.to raise_exception(SmarterCSV::DuplicateHeaders)
25
- end
26
-
27
-
28
6
  it 'does not raise an error if no required headers are given' do
29
7
  options = {:required_headers => nil} # order does not matter
30
8
  data = SmarterCSV.process("#{fixture_path}/user_import.csv", options)
@@ -49,4 +27,12 @@ describe 'test exceptions for invalid headers' do
49
27
  SmarterCSV.process("#{fixture_path}/user_import.csv", options)
50
28
  }.to raise_exception(SmarterCSV::MissingHeaders)
51
29
  end
30
+
31
+ it 'raises error on missing mapped headers and includes missing headers in message' do
32
+ expect {
33
+ # :age does not exist in the CSV header
34
+ options = {:key_mapping => {:email => :a, :firstname => :b, :lastname => :c, :manager_email => :d, :age => :e} }
35
+ SmarterCSV.process("#{fixture_path}/user_import.csv", options)
36
+ }.to raise_exception(SmarterCSV::KeyMappingError, "missing header(s): age")
37
+ end
52
38
  end
@@ -2,16 +2,24 @@ require 'spec_helper'
2
2
 
3
3
  fixture_path = 'spec/fixtures'
4
4
 
5
- describe 'malformed_csv' do
6
- subject { lambda { SmarterCSV.process(csv_path) } }
7
-
8
- context "malformed header" do
5
+ # according to RFC-4180 quotes inside of "words" should be doubled, but our parser is robust against that.
6
+ describe 'malformed CSV quotes' do
7
+ context "malformed quotes in header" do
9
8
  let(:csv_path) { "#{fixture_path}/malformed_header.csv" }
10
- it { should raise_error(CSV::MalformedCSVError) }
9
+ it 'should be resilient against single quotes' do
10
+ data = SmarterCSV.process(csv_path)
11
+ expect(data[0]).to eq({:name=>"Arnold Schwarzenegger", :dobdob=>"1947-07-30"})
12
+ expect(data[1]).to eq({:name=>"Jeff Bridges", :dobdob=>"1949-12-04"})
13
+ end
11
14
  end
12
15
 
13
- context "malformed content" do
16
+ context "malformed quotes in content" do
14
17
  let(:csv_path) { "#{fixture_path}/malformed.csv" }
15
- it { should raise_error(CSV::MalformedCSVError) }
18
+
19
+ it 'should be resilient against single quotes' do
20
+ data = SmarterCSV.process(csv_path)
21
+ expect(data[0]).to eq({:name=>"Arnold Schwarzenegger", :dob=>"1947-07-30"})
22
+ expect(data[1]).to eq({:name=>"Jeff \"the dude\" Bridges", :dob=>"1949-12-04"})
23
+ end
16
24
  end
17
25
  end
@@ -2,23 +2,28 @@ require 'spec_helper'
2
2
 
3
3
  fixture_path = 'spec/fixtures'
4
4
 
5
- describe 'be_able_to' do
6
- it 'loads_csv_file_without_header' do
7
- options = {:headers_in_file => false, :user_provided_headers => [:a,:b,:c,:d,:e,:f]}
8
- data = SmarterCSV.process("#{fixture_path}/no_header.csv", options)
5
+ describe 'no header in file' do
6
+ let(:headers) { [:a,:b,:c,:d,:e,:f] }
7
+ let(:options) { {:headers_in_file => false, :user_provided_headers => headers} }
8
+ subject(:data) { SmarterCSV.process("#{fixture_path}/no_header.csv", options) }
9
+
10
+ it 'loads the correct number of records' do
9
11
  data.size.should == 5
10
- # all the keys should be symbols
11
- data.each{|item| item.keys.each{|x| x.class.should be == Symbol}}
12
+ end
12
13
 
13
- data.each do |item|
14
+ it 'uses given symbols for all records' do
15
+ data.each do |item|
14
16
  item.keys.each do |key|
15
17
  [:a,:b,:c,:d,:e,:f].should include( key )
16
18
  end
17
19
  end
18
-
19
- data.each do |h|
20
- h.size.should <= 6
21
- end
22
20
  end
23
21
 
22
+ it 'loads the correct data' do
23
+ data[0].should == {a: "Dan", b: "McAllister", c: 2, d: 0}
24
+ data[1].should == {a: "Lucy", b: "Laweless", d: 5, e: 0}
25
+ data[2].should == {a: "Miles", b: "O'Brian", c: 0, d: 0, e: 0, f: 21}
26
+ data[3].should == {a: "Nancy", b: "Homes", c: 2, d: 0, e: 1}
27
+ data[4].should == {a: "Hernán", b: "Curaçon", c: 3, d: 0, e: 0}
28
+ end
24
29
  end
@@ -0,0 +1,61 @@
1
+ require 'spec_helper'
2
+
3
+ describe 'parse with col_sep' do
4
+ let(:options) { {quote_char: '"'} }
5
+
6
+ it 'parses with comma' do
7
+ line = "a,b,,d"
8
+ options.merge!({col_sep: ","})
9
+ array, array_size = SmarterCSV.send(:parse, line, options)
10
+ expect(array).to eq ['a', 'b', '', 'd']
11
+ expect(array_size).to eq 4
12
+ end
13
+
14
+ it 'parses trailing commas' do
15
+ line = "a,b,c,,"
16
+ options.merge!({col_sep: ","})
17
+ array, array_size = SmarterCSV.send(:parse, line, options)
18
+ expect(array).to eq ['a', 'b', 'c', '', '']
19
+ expect(array_size).to eq 5
20
+ end
21
+
22
+ it 'parses with space' do
23
+ line = "a b d"
24
+ options.merge!({col_sep: " "})
25
+ array, array_size = SmarterCSV.send(:parse, line, options)
26
+ expect(array).to eq ['a', 'b', '', 'd']
27
+ expect(array_size).to eq 4
28
+ end
29
+
30
+ it 'parses with tab' do
31
+ line = "a\tb\t\td"
32
+ options.merge!({col_sep: "\t"})
33
+ array, array_size = SmarterCSV.send(:parse, line, options)
34
+ expect(array).to eq ['a', 'b', '', 'd']
35
+ expect(array_size).to eq 4
36
+ end
37
+
38
+ it 'parses with multiple space separator' do
39
+ line = "a b d"
40
+ options.merge!({col_sep: " "})
41
+ array, array_size = SmarterCSV.send(:parse, line, options)
42
+ expect(array).to eq ['a b', '', 'd']
43
+ expect(array_size).to eq 3
44
+ end
45
+
46
+ it 'parses with multiple char separator' do
47
+ line = '<=><=>A<=>B<=>C'
48
+ options.merge!({col_sep: "<=>"})
49
+ array, array_size = SmarterCSV.send(:parse, line, options)
50
+ expect(array).to eq ["", "", "A", "B", "C"]
51
+ expect(array_size).to eq 5
52
+ end
53
+
54
+ it 'parses trailing multiple char separator' do
55
+ line = '<=><=>A<=>B<=>C<=><=>'
56
+ options.merge!({col_sep: "<=>"})
57
+ array, array_size = SmarterCSV.send(:parse, line, options)
58
+ expect(array).to eq ["", "", "A", "B", "C", "", ""]
59
+ expect(array_size).to eq 7
60
+ end
61
+ end
@@ -0,0 +1,74 @@
1
+ require 'spec_helper'
2
+
3
+ describe 'old CSV library parsing tests' do
4
+ let(:options) { {quote_char: '"', col_sep: ","} }
5
+
6
+ [ ["\t", ["\t"]],
7
+ ["foo,\"\"\"\"\"\",baz", ["foo", "\"\"", "baz"]],
8
+ ["foo,\"\"\"bar\"\"\",baz", ["foo", "\"bar\"", "baz"]],
9
+ ["\"\"\"\n\",\"\"\"\n\"", ["\"\n", "\"\n"]],
10
+ ["foo,\"\r\n\",baz", ["foo", "\r\n", "baz"]],
11
+ ["\"\"", [""]],
12
+ ["foo,\"\"\"\",baz", ["foo", "\"", "baz"]],
13
+ ["foo,\"\r.\n\",baz", ["foo", "\r.\n", "baz"]],
14
+ ["foo,\"\r\",baz", ["foo", "\r", "baz"]],
15
+ ["foo,\"\",baz", ["foo", "", "baz"]],
16
+ ["\",\"", [","]],
17
+ ["foo", ["foo"]],
18
+ [",,", ['', '', '']],
19
+ [",", ['', '']],
20
+ ["foo,\"\n\",baz", ["foo", "\n", "baz"]],
21
+ ["foo,,baz", ["foo", '', "baz"]],
22
+ ["\"\"\"\r\",\"\"\"\r\"", ["\"\r", "\"\r"]],
23
+ ["\",\",\",\"", [",", ","]],
24
+ ["foo,bar,", ["foo", "bar", '']],
25
+ [",foo,bar", ['', "foo", "bar"]],
26
+ ["foo,bar", ["foo", "bar"]],
27
+ [";", [";"]],
28
+ ["\t,\t", ["\t", "\t"]],
29
+ ["foo,\"\r\n\r\",baz", ["foo", "\r\n\r", "baz"]],
30
+ ["foo,\"\r\n\n\",baz", ["foo", "\r\n\n", "baz"]],
31
+ ["foo,\"foo,bar\",baz", ["foo", "foo,bar", "baz"]],
32
+ [";,;", [";", ";"]]
33
+ ].each do |line, result|
34
+ it "parses #{line}" do
35
+ array, array_size = SmarterCSV.send(:parse, line, options)
36
+ expect(array).to eq result
37
+ end
38
+ end
39
+
40
+ [ ["foo,\"\"\"\"\"\",baz", ["foo", "\"\"", "baz"]],
41
+ ["foo,\"\"\"bar\"\"\",baz", ["foo", "\"bar\"", "baz"]],
42
+ ["foo,\"\r\n\",baz", ["foo", "\r\n", "baz"]],
43
+ ["\"\"", [""]],
44
+ ["foo,\"\"\"\",baz", ["foo", "\"", "baz"]],
45
+ ["foo,\"\r.\n\",baz", ["foo", "\r.\n", "baz"]],
46
+ ["foo,\"\r\",baz", ["foo", "\r", "baz"]],
47
+ ["foo,\"\",baz", ["foo", "", "baz"]],
48
+ ["foo", ["foo"]],
49
+ [",,", ['', '', '']],
50
+ [",", ['', '']],
51
+ ["foo,\"\n\",baz", ["foo", "\n", "baz"]],
52
+ ["foo,,baz", ["foo", '', "baz"]],
53
+ ["foo,bar", ["foo", "bar"]],
54
+ ["foo,\"\r\n\n\",baz", ["foo", "\r\n\n", "baz"]],
55
+ ["foo,\"foo,bar\",baz", ["foo", "foo,bar", "baz"]]
56
+ ].each do |line, result|
57
+ it "parses #{line}" do
58
+ array, array_size = SmarterCSV.send(:parse, line, options)
59
+ expect(array).to eq result
60
+ end
61
+ end
62
+
63
+ it 'mixed quotes' do
64
+ line = %Q{Ten Thousand,10000, 2710 ,,"10,000","It's ""10 Grand"", baby",10K}
65
+ array, array_size = SmarterCSV.send(:parse, line, options)
66
+ expect(array).to eq ["Ten Thousand", "10000", " 2710 ", "", "10,000", "It's \"10 Grand\", baby", "10K"]
67
+ end
68
+
69
+ it 'single quotes in fields' do
70
+ line = 'Indoor Chrome,49.2"" L x 49.2"" W x 20.5"" H,Chrome,"Crystal,Metal,Wood",23.12'
71
+ array, array_size = SmarterCSV.send(:parse, line, options)
72
+ expect(array).to eq ['Indoor Chrome', '49.2" L x 49.2" W x 20.5" H', 'Chrome', 'Crystal,Metal,Wood', '23.12']
73
+ end
74
+ end
@@ -0,0 +1,170 @@
1
+ require 'spec_helper'
2
+
3
+ fixture_path = 'spec/fixtures'
4
+
5
+ describe 'fulfills RFC-4180 and more' do
6
+ let(:options) { {col_sep: ',', row_sep: $INPUT_RECORD_SEPARATOR, quote_char: '"' } }
7
+
8
+ context 'parses simple CSV' do
9
+ context 'RFC-4180' do
10
+ it 'separating on col_sep' do
11
+ line = 'aaa,bbb,ccc'
12
+ expect( SmarterCSV.send(:parse, line, options)).to eq [%w[aaa bbb ccc], 3]
13
+ end
14
+
15
+ it 'preserves whitespace' do
16
+ line = ' aaa , bbb , ccc '
17
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
18
+ [' aaa ', ' bbb ', ' ccc '], 3
19
+ ]
20
+ end
21
+ end
22
+
23
+ context 'extending RFC-4180' do
24
+ it 'with extra col_sep' do
25
+ line = 'aaa,bbb,ccc,'
26
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
27
+ ['aaa', 'bbb', 'ccc', ''], 4
28
+ ]
29
+ end
30
+
31
+ it 'with extra col_sep with given header_size' do
32
+ line = 'aaa,bbb,ccc,'
33
+ expect( SmarterCSV.send(:parse, line, options, 3)).to eq [
34
+ ['aaa', 'bbb', 'ccc'], 3
35
+ ]
36
+ end
37
+
38
+ it 'with multiple extra col_sep' do
39
+ line = 'aaa,bbb,ccc,,,'
40
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
41
+ ['aaa', 'bbb', 'ccc', '', '', ''], 6
42
+ ]
43
+ end
44
+
45
+ it 'with multiple extra col_sep with given header_size' do
46
+ line = 'aaa,bbb,ccc,,,'
47
+ expect( SmarterCSV.send(:parse, line, options, 3)).to eq [
48
+ ['aaa', 'bbb', 'ccc'], 3
49
+ ]
50
+ end
51
+
52
+ it 'with multiple complex col_sep' do
53
+ line = 'aaa<=>bbb<=>ccc<=><=><=>'
54
+ expect( SmarterCSV.send(:parse, line, options.merge({col_sep: '<=>'}))).to eq [
55
+ ['aaa', 'bbb', 'ccc', '', '', ''], 6
56
+ ]
57
+ end
58
+
59
+ it 'with multiple complex col_sep with given header_size' do
60
+ line = 'aaa<=>bbb<=>ccc<=><=><=>'
61
+ expect( SmarterCSV.send(:parse, line, options.merge({col_sep: '<=>'}), 3)).to eq [
62
+ ['aaa', 'bbb', 'ccc'], 3
63
+ ]
64
+ end
65
+ end
66
+ end
67
+
68
+ context 'parses quoted CSV' do
69
+ context 'RFC-4180' do
70
+ it 'separating on col_sep' do
71
+ line = '"aaa","bbb","ccc"'
72
+ expect( SmarterCSV.send(:parse, line, options)).to eq [%w[aaa bbb ccc], 3]
73
+ end
74
+
75
+ it 'parses corner case correctly' do
76
+ line = '"Board 4""","$17.40","10000003427"'
77
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
78
+ ['Board 4"', '$17.40', '10000003427'], 3
79
+ ]
80
+ end
81
+
82
+ it 'quoted parts can contain spaces' do
83
+ line = '" aaa1 aaa2 "," bbb1 bbb2 "," ccc1 ccc2 "'
84
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
85
+ [' aaa1 aaa2 ', ' bbb1 bbb2 ', ' ccc1 ccc2 '], 3
86
+ ]
87
+ end
88
+
89
+ it 'quoted parts can contain col_sep' do
90
+ line = '"aaa1, aaa2","bbb1, bbb2","ccc1, ccc2"'
91
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
92
+ ['aaa1, aaa2', 'bbb1, bbb2', 'ccc1, ccc2'], 3
93
+ ]
94
+ end
95
+
96
+ it 'quoted parts can contain col_sep and doubled quotes' do
97
+ line = '"aaa1, ""aaa2"", aaa3","""bbb1"", bbb2","ccc1, ""ccc2"""'
98
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
99
+ ['aaa1, "aaa2", aaa3', '"bbb1", bbb2', 'ccc1, "ccc2"'], 3
100
+ ]
101
+ end
102
+
103
+ it 'some fields are quoted' do
104
+ line = '1,"board 4""",12.95'
105
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
106
+ ['1', 'board 4"', '12.95'], 3
107
+ ]
108
+ end
109
+
110
+ it 'separating on col_sep' do
111
+ line = '"some","thing","""completely"" different"'
112
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
113
+ ['some', 'thing', '"completely" different'], 3
114
+ ]
115
+ end
116
+ end
117
+
118
+ context 'extending RFC-4180' do
119
+ it 'with extra col_sep, without given header_size' do
120
+ line = '"aaa","bbb","ccc",'
121
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
122
+ ['aaa', 'bbb', 'ccc', ''], 4
123
+ ]
124
+ end
125
+
126
+ it 'with extra col_sep, with given header_size' do
127
+ line = '"aaa","bbb","ccc",'
128
+ expect( SmarterCSV.send(:parse, line, options, 3)).to eq [%w[aaa bbb ccc], 3]
129
+ end
130
+
131
+ it 'with multiple extra col_sep, without given header_size' do
132
+ line = '"aaa","bbb","ccc",,,'
133
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
134
+ ['aaa', 'bbb', 'ccc', '', '', ''], 6
135
+ ]
136
+ end
137
+
138
+ it 'with multiple extra col_sep, with given header_size' do
139
+ line = '"aaa","bbb","ccc",,,'
140
+ expect( SmarterCSV.send(:parse, line, options, 3)).to eq [
141
+ ['aaa', 'bbb', 'ccc'], 3
142
+ ]
143
+ end
144
+
145
+ it 'with multiple complex extra col_sep, without given header_size' do
146
+ line = '"aaa"<=>"bbb"<=>"ccc"<=><=><=>'
147
+ expect( SmarterCSV.send(:parse, line, options.merge({col_sep: '<=>'}))).to eq [
148
+ ['aaa', 'bbb', 'ccc', '', '', ''], 6
149
+ ]
150
+ end
151
+
152
+ it 'with multiple complex extra col_sep, with given header_size' do
153
+ line = '"aaa"<=>"bbb"<=>"ccc"<=><=><=>'
154
+ expect( SmarterCSV.send(:parse, line, options.merge({col_sep: '<=>'}), 3)).to eq [
155
+ ['aaa', 'bbb', 'ccc'], 3
156
+ ]
157
+ end
158
+ end
159
+ end
160
+
161
+ # relaxed parsing compared to RFC-4180
162
+ context 'liberal_parsing' do
163
+ it 'parses corner case correctly' do
164
+ line = 'is,this "three, or four",fields'
165
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
166
+ ['is', 'this "three, or four"', 'fields'], 3
167
+ ]
168
+ end
169
+ end
170
+ end
@@ -3,7 +3,6 @@ require 'spec_helper'
3
3
  fixture_path = 'spec/fixtures'
4
4
 
5
5
  describe 'loading file with quoted fields' do
6
-
7
6
  it 'leaving the quotes in the data' do
8
7
  options = {}
9
8
  data = SmarterCSV.process("#{fixture_path}/quoted.csv", options)
@@ -12,6 +11,7 @@ describe 'loading file with quoted fields' do
12
11
  data[1][:description].should be_nil
13
12
  data[2][:model].should eq 'Venture "Extended Edition, Very Large"'
14
13
  data[2][:description].should be_nil
14
+ data[3][:description].should eq 'MUST SELL! air, moon roof, loaded'
15
15
  data.each do |h|
16
16
  h[:year].class.should eq Fixnum
17
17
  h[:make].should_not be_nil
@@ -20,17 +20,21 @@ describe 'loading file with quoted fields' do
20
20
  end
21
21
  end
22
22
 
23
-
23
+ # quotes inside quoted fields need to be escaped by another double-quote
24
24
  it 'removes quotes around quoted fields, but not inside data' do
25
25
  options = {}
26
26
  data = SmarterCSV.process("#{fixture_path}/quote_char.csv", options)
27
27
 
28
28
  data.length.should eq 6
29
+ data[0][:first_name].should eq "\"John"
30
+ data[0][:last_name].should eq "Cooke\""
29
31
  data[1][:first_name].should eq "Jam\ne\nson\""
30
32
  data[2][:first_name].should eq "\"Jean"
33
+ data[4][:first_name].should eq "Bo\"bbie"
34
+ data[5][:first_name].should eq 'Mica'
35
+ data[5][:last_name].should eq 'Copeland'
31
36
  end
32
37
 
33
-
34
38
  # NOTE: quotes inside headers need to be escaped by doubling them
35
39
  # e.g. 'correct ""EXAMPLE""'
36
40
  # this escaping is illegal: 'incorrect \"EXAMPLE\"' <-- this caused CSV parsing error
@@ -43,6 +47,6 @@ describe 'loading file with quoted fields' do
43
47
  data.length.should eq 3
44
48
  data.first.keys[2].should eq :isbn
45
49
  data.first.keys[3].should eq :discounted_price
50
+ data[1][:author].should eq 'Timothy "The Parser" Campbell'
46
51
  end
47
-
48
52
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: smarter_csv
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.5.0
4
+ version: 1.6.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Tilo Sloboda
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2022-04-25 00:00:00.000000000 Z
11
+ date: 2022-05-03 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rspec
@@ -38,6 +38,20 @@ dependencies:
38
38
  - - ">="
39
39
  - !ruby/object:Gem::Version
40
40
  version: '0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: awesome_print
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
41
55
  description: Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with
42
56
  optional features for processing large files in parallel, embedded comments, unusual
43
57
  field- and record-separators, flexible mapping of CSV-headers to Hash-keys
@@ -112,6 +126,7 @@ files:
112
126
  - spec/smarter_csv/close_file_spec.rb
113
127
  - spec/smarter_csv/column_separator_spec.rb
114
128
  - spec/smarter_csv/convert_values_to_numeric_spec.rb
129
+ - spec/smarter_csv/duplicate_headers_spec.rb
115
130
  - spec/smarter_csv/empty_columns_spec.rb
116
131
  - spec/smarter_csv/extenstions_spec.rb
117
132
  - spec/smarter_csv/hard_sample_spec.rb
@@ -125,6 +140,9 @@ files:
125
140
  - spec/smarter_csv/malformed_spec.rb
126
141
  - spec/smarter_csv/no_header_spec.rb
127
142
  - spec/smarter_csv/not_downcase_header_spec.rb
143
+ - spec/smarter_csv/parse/column_separator_spec.rb
144
+ - spec/smarter_csv/parse/old_csv_library_spec.rb
145
+ - spec/smarter_csv/parse/rfc4180_and_more_spec.rb
128
146
  - spec/smarter_csv/problematic.rb
129
147
  - spec/smarter_csv/quoted_spec.rb
130
148
  - spec/smarter_csv/remove_empty_values_spec.rb
@@ -160,8 +178,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
160
178
  - - ">="
161
179
  - !ruby/object:Gem::Version
162
180
  version: '0'
163
- requirements:
164
- - csv
181
+ requirements: []
165
182
  rubygems_version: 3.1.6
166
183
  signing_key:
167
184
  specification_version: 4
@@ -218,6 +235,7 @@ test_files:
218
235
  - spec/smarter_csv/close_file_spec.rb
219
236
  - spec/smarter_csv/column_separator_spec.rb
220
237
  - spec/smarter_csv/convert_values_to_numeric_spec.rb
238
+ - spec/smarter_csv/duplicate_headers_spec.rb
221
239
  - spec/smarter_csv/empty_columns_spec.rb
222
240
  - spec/smarter_csv/extenstions_spec.rb
223
241
  - spec/smarter_csv/hard_sample_spec.rb
@@ -231,6 +249,9 @@ test_files:
231
249
  - spec/smarter_csv/malformed_spec.rb
232
250
  - spec/smarter_csv/no_header_spec.rb
233
251
  - spec/smarter_csv/not_downcase_header_spec.rb
252
+ - spec/smarter_csv/parse/column_separator_spec.rb
253
+ - spec/smarter_csv/parse/old_csv_library_spec.rb
254
+ - spec/smarter_csv/parse/rfc4180_and_more_spec.rb
234
255
  - spec/smarter_csv/problematic.rb
235
256
  - spec/smarter_csv/quoted_spec.rb
236
257
  - spec/smarter_csv/remove_empty_values_spec.rb