smarter_csv 1.5.1 → 1.6.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 352cf76ac0cd6b2eb4a1cac9e5056aa6e92a8a61b627d7c922e063dcf82ad675
4
- data.tar.gz: 0c6e3ab1eaee02a9361fe0b418191244d81bc558dbcd10ee1d2c5f15390d91b6
3
+ metadata.gz: 9be5f053e15e157d7d28555b4de894d2761d5918203da45f5fc4e6c5adcc2a3f
4
+ data.tar.gz: a47394f3d1f985960a64abf1a43ce6ebf9b8217af2c01a0c5f053af8c77c09ae
5
5
  SHA512:
6
- metadata.gz: 3763cd8e493e7da6560e8ce9adc58bd411f745f5af119c97d70c02667a524ccb1055b5c640ef795c3cb25b79fa5e17800018da6e76f9d358afa1c7a3513caae3
7
- data.tar.gz: '039183fdece20e80007f3f0d3e395fac8d273df6c21928a35b685ade5915503b3c55918fc7015ead0a7d768a545bb76bcfb1087006af6118dc3d22df83e68ddb'
6
+ metadata.gz: f27113af8a5771d89ac5c8783f1f69645c24bb576dafd97de17b0d8db8fff74dc396b42450802f9332c9f8b32a02ee18dabbc5dbc2daa91c75f957a678e99099
7
+ data.tar.gz: 5b2c2f3cbfc17b43c030c4c4c261962818bfbb2ce1a0ecd88f394682b75b75a64a2d8b6afcc0e4b99b97d39ef4b645de143cbb5f595e7aa9d661e66b1a53e98f
data/CHANGELOG.md CHANGED
@@ -1,7 +1,17 @@
1
1
 
2
2
  # SmarterCSV 1.x Change Log
3
3
 
4
- ## 1.5.1 (2022-04-26)
4
+ ## 1.6.1 (2022-05-06)
5
+ * unused keys in `key_mapping` generate a warning, no longer raise an exception
6
+
7
+ ## 1.6.0 (2022-05-03)
8
+ * completely rewrote line parser
9
+ * added methods `SmarterCSV.raw_headers` and `SmarterCSV.headers` to allow easy examination of how the headers are processed.
10
+
11
+ ## 1.5.2 (2022-04-29)
12
+ * added missing keys to the SmarterCSV::KeyMappingError exception message #189 (thanks to John Dell)
13
+
14
+ ## 1.5.1 (2022-04-27)
5
15
  * added raising of `KeyMappingError` if `key_mapping` refers to a non-existent key
6
16
  * added option `duplicate_header_suffix` (thanks to Skye Shaw)
7
17
  When given a non-nil string, it uses the suffix to append numbering 2..n to duplicate headers.
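The 1.6.1 entry above (unused keys in `key_mapping` now produce a warning instead of an exception) can be illustrated with a small standalone sketch. The helper name `report_unused_mapping_keys` is hypothetical and not part of the gem; it only mirrors the idea of subtracting the header keys from the mapping keys and reporting the remainder.

```ruby
# Hypothetical helper (not part of smarter_csv): reports key_mapping keys
# that do not occur in the CSV headers, without raising an exception.
def report_unused_mapping_keys(key_mapping, headers)
  missing_keys = key_mapping.keys - headers
  # Kernel#warn writes to $stderr; the gem itself prints its warning with puts.
  warn "WARNING: missing header(s): #{missing_keys.join(',')}" unless missing_keys.empty?
  missing_keys
end

headers     = [:email, :firstname, :lastname]
key_mapping = { email: :a, firstname: :b, age: :e }

# :age is mapped but does not exist in the headers, so it is only reported.
report_unused_mapping_keys(key_mapping, headers) # returns [:age]
```

Processing can continue after the warning, which is the behavioral difference from 1.5.1, where the same situation raised `KeyMappingError`.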
data/CONTRIBUTORS.md CHANGED
@@ -44,3 +44,4 @@ A Big Thank you to everyone who filed issues, sent comments, and who contributed
44
44
  * [Nicolas Guillemain](https://github.com/Viiruus)
45
45
  * [Sp6](https://github.com/sp6)
46
46
  * [Joel Fouse](https://github.com/jfouse)
47
+ * [John Dell](https://github.com/spovich)
data/README.md CHANGED
@@ -16,10 +16,12 @@
16
16
 
17
17
  # SmarterCSV
18
18
 
19
- [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
19
+ [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.com/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
20
20
 
21
21
  #### SmarterCSV 1.x
22
22
 
23
+ `smarter_csv` is now 10 years old, and still kicking! 🎉🎉🎉
24
+
23
25
  `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
24
26
  and parallel processing with Resque or Sidekiq.
25
27
 
@@ -42,11 +44,13 @@ NOTE; This Gem is only for importing CSV files - writing of CSV files is not sup
42
44
 
43
45
  ### Why?
44
46
 
45
- Ruby's CSV library's API is pretty old, and it's processing of CSV-files returning Arrays of Arrays feels 'very close to the metal'. The output is not easy to use - especially not if you want to create database records from it. Another shortcoming is that Ruby's CSV library does not have good support for huge CSV-files, e.g. there is no support for 'chunking' and/or parallel processing of the CSV-content (e.g. with Resque or Sidekiq),
47
+ Ruby's CSV library's API is pretty old, and it's processing of CSV-files returning Arrays of Arrays feels 'very close to the metal'. The output is not easy to use - especially not if you want to create database records or Sidekiq jobs with it. Another shortcoming is that Ruby's CSV library does not have good support for huge CSV-files, e.g. there is no support for 'chunking' and/or parallel processing of the CSV-content (e.g. with Sidekiq).
48
+
49
+ As the existing CSV libraries didn't fit my needs, I was writing my own CSV processing - specifically for use in connection with Rails ORMs like Mongoid, MongoMapper and ActiveRecord. In those ORMs you can easily pass a hash with attribute/value pairs to the create() method. The lower-level Mongo driver and Moped also accept larger arrays of such hashes to create a larger amount of records quickly with just one call. The same patterns are used when you pass data to Sidekiq jobs.
46
50
 
47
- As the existing CSV libraries didn't fit my needs, I was writing my own CSV processing - specifically for use in connection with Rails ORMs like Mongoid, MongoMapper or ActiveRecord. In those ORMs you can easily pass a hash with attribute/value pairs to the create() method. The lower-level Mongo driver and Moped also accept larger arrays of such hashes to create a larger amount of records quickly with just one call.
51
+ For processing large CSV files it is essential to process them in chunks, so the memory impact is minimized.
48
52
 
49
- ### Examples
53
+ ### How?
50
54
 
51
55
  The two main choices you have in terms of how to call `SmarterCSV.process` are:
52
56
  * calling `process` with or without a block
@@ -6,6 +6,7 @@ module SmarterCSV
6
6
  class MissingHeaders < SmarterCSVException; end
7
7
  class NoColSepDetected < SmarterCSVException; end
8
8
  class KeyMappingError < SmarterCSVException; end
9
+ class MalformedCSVError < SmarterCSVException; end
9
10
 
10
11
  # first parameter: filename or input object which responds to readline method
11
12
  def SmarterCSV.process(input, options={}, &block)
@@ -18,24 +19,24 @@ module SmarterCSV
18
19
  @csv_line_count = 0
19
20
  has_rails = !! defined?(Rails)
20
21
  begin
21
- f = input.respond_to?(:readline) ? input : File.open(input, "r:#{options[:file_encoding]}")
22
+ fh = input.respond_to?(:readline) ? input : File.open(input, "r:#{options[:file_encoding]}")
22
23
 
23
24
  # auto-detect the row separator
24
- options[:row_sep] = SmarterCSV.guess_line_ending(f, options) if options[:row_sep].to_sym == :auto
25
+ options[:row_sep] = SmarterCSV.guess_line_ending(fh, options) if options[:row_sep].to_sym == :auto
25
26
  # attempt to auto-detect column separator
26
- options[:col_sep] = guess_column_separator(f, options) if options[:col_sep].to_sym == :auto
27
- # preserve options, in case we need to call the CSV class
28
- csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
29
- csv_options.delete(:row_sep) if [nil, :auto].include?( options[:row_sep].to_sym )
30
- csv_options.delete(:col_sep) if [nil, :auto].include?( options[:col_sep].to_sym )
27
+ options[:col_sep] = guess_column_separator(fh, options) if options[:col_sep].to_sym == :auto
31
28
 
32
- if (options[:force_utf8] || options[:file_encoding] =~ /utf-8/i) && ( f.respond_to?(:external_encoding) && f.external_encoding != Encoding.find('UTF-8') || f.respond_to?(:encoding) && f.encoding != Encoding.find('UTF-8') )
29
+ if (options[:force_utf8] || options[:file_encoding] =~ /utf-8/i) && ( fh.respond_to?(:external_encoding) && fh.external_encoding != Encoding.find('UTF-8') || fh.respond_to?(:encoding) && fh.encoding != Encoding.find('UTF-8') )
33
30
  puts 'WARNING: you are trying to process UTF-8 input, but did not open the input with "b:utf-8" option. See README file "NOTES about File Encodings".'
34
31
  end
35
32
 
36
- options[:skip_lines].to_i.times{f.readline(options[:row_sep])} if options[:skip_lines].to_i > 0
33
+ if options[:skip_lines].to_i > 0
34
+ options[:skip_lines].to_i.times do
35
+ readline_with_counts(fh, options)
36
+ end
37
+ end
37
38
 
38
- headerA, header_size = process_headers(f, options, csv_options)
39
+ headerA, header_size = process_headers(fh, options)
39
40
 
40
41
  # in case we use chunking.. we'll need to set it up..
41
42
  if ! options[:chunk_size].nil? && options[:chunk_size].to_i > 0
@@ -48,10 +49,8 @@ module SmarterCSV
48
49
  end
49
50
 
50
51
  # now on to processing all the rest of the lines in the CSV file:
51
- while ! f.eof? # we can't use f.readlines() here, because this would read the whole file into memory at once, and eof => true
52
- line = f.readline(options[:row_sep]) # read one line
53
- @file_line_count += 1
54
- @csv_line_count += 1
52
+ while ! fh.eof? # we can't use fh.readlines() here, because this would read the whole file into memory at once, and eof => true
53
+ line = readline_with_counts(fh, options)
55
54
 
56
55
  # replace invalid byte sequence in UTF-8 with question mark to avoid errors
57
56
  line = line.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
@@ -63,9 +62,10 @@ module SmarterCSV
63
62
  # cater for the quoted csv data containing the row separator carriage return character
64
63
  # in which case the row data will be split across multiple lines (see the sample content in spec/fixtures/carriage_returns_rn.csv)
65
64
  # by detecting the existence of an uneven number of quote characters
65
+
66
66
  multiline = line.count(options[:quote_char])%2 == 1 # should handle quote_char nil
67
67
  while line.count(options[:quote_char])%2 == 1 # should handle quote_char nil
68
- next_line = f.readline(options[:row_sep])
68
+ next_line = fh.readline(options[:row_sep])
69
69
  next_line = next_line.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
70
70
  line += next_line
71
71
  @file_line_count += 1
@@ -74,16 +74,8 @@ module SmarterCSV
74
74
 
75
75
  line.chomp!(options[:row_sep])
76
76
 
77
- if (line =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
78
- dataA = begin
79
- CSV.parse( line, **csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
80
- rescue CSV::MalformedCSVError => e
81
- raise $!, "#{$!} [SmarterCSV: csv line #{@csv_line_count}]", $!.backtrace
82
- end
83
- else
84
- dataA = line.split(options[:col_sep], header_size)
85
- end
86
- dataA.map!{|x| x.sub(/(#{options[:col_sep]})+\z/, '')} # remove any unwanted trailing col_sep characters at the end
77
+ dataA, data_size = parse(line, options, header_size)
78
+
87
79
  dataA.map!{|x| x.strip} if options[:strip_whitespace]
88
80
 
89
81
  # if all values are blank, then ignore this line
@@ -138,7 +130,7 @@ module SmarterCSV
138
130
  if use_chunks
139
131
  chunk << hash # append temp result to chunk
140
132
 
141
- if chunk.size >= chunk_size || f.eof? # if chunk if full, or EOF reached
133
+ if chunk.size >= chunk_size || fh.eof? # if chunk if full, or EOF reached
142
134
  # do something with the chunk
143
135
  if block_given?
144
136
  yield chunk # do something with the hashes in the chunk in the block
@@ -179,7 +171,7 @@ module SmarterCSV
179
171
  chunk = [] # initialize for next chunk of data
180
172
  end
181
173
  ensure
182
- f.close if f.respond_to?(:close)
174
+ fh.close if fh.respond_to?(:close)
183
175
  end
184
176
  if block_given?
185
177
  return chunk_count # when we do processing through a block we only care how many chunks we processed
@@ -224,6 +216,62 @@ module SmarterCSV
224
216
  }
225
217
  end
226
218
 
219
+ def self.readline_with_counts(filehandle, options)
220
+ line = filehandle.readline(options[:row_sep])
221
+ @file_line_count += 1
222
+ @csv_line_count += 1
223
+ line
224
+ end
225
+
226
+ # parses a single line: either a CSV header and body line
227
+ # - quoting rules compared to RFC-4180 are somewhat relaxed
228
+ # - we are not assuming that quotes inside a fields need to be doubled
229
+ # - we are not assuming that all fields need to be quoted (0 is even)
230
+ # - works with multi-char col_sep
231
+ # - if header_size is given, only up to header_size fields are parsed
232
+ #
233
+ # We use header_size for parsing the body lines to make sure we always match the number of headers
234
+ # in case there are trailing col_sep characters in line
235
+ #
236
+ # Our convention is that empty fields are returned as empty strings, not as nil.
237
+ #
238
+ def self.parse(line, options, header_size = nil)
239
+ return [] if line.nil?
240
+
241
+ col_sep = options[:col_sep]
242
+ quote = options[:quote_char]
243
+ quote_count = 0
244
+ elements = []
245
+ start = 0
246
+ i = 0
247
+
248
+ while i < line.size do
249
+ if line[i...i+col_sep.size] == col_sep && quote_count.even?
250
+ break if !header_size.nil? && elements.size >= header_size
251
+
252
+ elements << cleanup_quotes(line[start...i], quote)
253
+ i += col_sep.size
254
+ start = i
255
+ else
256
+ quote_count += 1 if line[i] == quote
257
+ i += 1
258
+ end
259
+ end
260
+ elements << cleanup_quotes(line[start..-1], quote) if header_size.nil? || elements.size < header_size
261
+ [elements, elements.size]
262
+ end
263
+
264
+ def self.cleanup_quotes(field, quote)
265
+ return field if field.nil? || field !~ /#{quote}/
266
+
267
+ if field.start_with?(quote) && field.end_with?(quote)
268
+ field.delete_prefix!(quote)
269
+ field.delete_suffix!(quote)
270
+ end
271
+ field.gsub!("#{quote}#{quote}", quote)
272
+ field
273
+ end
274
+
227
275
  def self.blank?(value)
228
276
  case value
229
277
  when Array
@@ -310,13 +358,22 @@ module SmarterCSV
310
358
  return k # the most frequent one is it
311
359
  end
312
360
 
313
- def self.process_headers(filehandle, options, csv_options)
361
+ def self.raw_hearder
362
+ @raw_header
363
+ end
364
+
365
+ def self.headers
366
+ @headers
367
+ end
368
+
369
+ def self.process_headers(filehandle, options)
370
+ @raw_header = nil
371
+ @headers = nil
314
372
  if options[:headers_in_file] # extract the header line
315
373
  # process the header line in the CSV file..
316
374
  # the first line of a CSV file contains the header .. it might be commented out, so we need to read it anyhow
317
- header = filehandle.readline(options[:row_sep])
318
- @file_line_count += 1
319
- @csv_line_count += 1
375
+ header = readline_with_counts(filehandle, options)
376
+ @raw_header = header
320
377
 
321
378
  header = header.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
322
379
  header = header.sub(options[:comment_regexp],'') if options[:comment_regexp]
@@ -324,16 +381,7 @@ module SmarterCSV
324
381
 
325
382
  header = header.gsub(options[:strip_chars_from_headers], '') if options[:strip_chars_from_headers]
326
383
 
327
- if (header =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
328
- file_headerA = begin
329
- CSV.parse( header, **csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
330
- rescue CSV::MalformedCSVError => e
331
- raise $!, "#{$!} [SmarterCSV: csv line #{@csv_line_count}]", $!.backtrace
332
- end
333
- else
334
- file_headerA = header.split(options[:col_sep])
335
- end
336
- file_header_size = file_headerA.size # before mapping, which could delete keys
384
+ file_headerA, file_header_size = parse(header, options)
337
385
 
338
386
  file_headerA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') }
339
387
  file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
@@ -371,7 +419,8 @@ module SmarterCSV
371
419
  # if you want to completely delete a key, then map it to nil or to ''
372
420
  if ! key_mappingH.nil? && key_mappingH.class == Hash && key_mappingH.keys.size > 0
373
421
  # we can't map keys that are not there
374
- raise SmarterCSV::KeyMappingError unless (key_mappingH.keys - headerA).empty?
422
+ missing_keys = key_mappingH.keys - headerA
423
+ puts "WARNING: missing header(s): #{missing_keys.join(",")}" unless missing_keys.empty?
375
424
 
376
425
  headerA.map!{|x| key_mappingH.has_key?(x) ? (key_mappingH[x].nil? ? nil : key_mappingH[x]) : (options[:remove_unmapped_keys] ? nil : x)}
377
426
  end
@@ -392,6 +441,7 @@ module SmarterCSV
392
441
  raise SmarterCSV::MissingHeaders , "ERROR: missing headers: #{missing_headers.join(',')}" unless missing_headers.empty?
393
442
  end
394
443
 
444
+ @headers = headerA
395
445
  [headerA, header_size]
396
446
  end
397
447
 
@@ -1,3 +1,3 @@
1
1
  module SmarterCSV
2
- VERSION = "1.5.1"
2
+ VERSION = "1.6.1"
3
3
  end
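The rewritten line parser in the library diff above splits on `col_sep` only while the number of quote characters seen so far is even. The following is a minimal standalone sketch of that idea, not the gem's code: the names `split_csv_line`, `unquote` and `max_fields` are assumptions made for illustration, and the quote cleanup is simplified.

```ruby
# Hypothetical standalone helpers illustrating quote-parity splitting.
# They are not part of smarter_csv; they only sketch the same technique.

# Strips one pair of enclosing quotes (if present) and un-doubles inner quotes.
def unquote(field, quote)
  return field unless field.include?(quote)

  enclosed = field.size >= 2 * quote.size &&
             field.start_with?(quote) && field.end_with?(quote)
  field = field[quote.size...-quote.size] if enclosed
  field.gsub(quote + quote, quote)
end

# Splits a single line into fields. A separator only counts when it occurs
# outside of quotes, i.e. when an even number of quote chars has been seen.
def split_csv_line(line, col_sep: ',', quote: '"', max_fields: nil)
  fields = []
  quote_count = 0
  start = 0
  i = 0

  while i < line.size
    if quote_count.even? && line[i, col_sep.size] == col_sep
      # Stop early once the requested number of fields has been collected.
      break if max_fields && fields.size >= max_fields

      fields << unquote(line[start...i], quote)
      i += col_sep.size
      start = i
    else
      quote_count += 1 if line[i] == quote
      i += 1
    end
  end

  # Append the last field unless the field limit has already been reached.
  fields << unquote(line[start..-1], quote) if max_fields.nil? || fields.size < max_fields
  fields
end

split_csv_line('aaa,bbb,ccc,')                     # => ["aaa", "bbb", "ccc", ""]
split_csv_line('aaa,bbb,ccc,', max_fields: 3)      # => ["aaa", "bbb", "ccc"]
split_csv_line('is,this "three, or four",fields')  # => ["is", "this \"three, or four\"", "fields"]
```

Passing a field limit (the header size in the gem) is what keeps trailing separators from producing extra empty fields, while counting quote parity lets the separator appear inside quoted text without ending the field.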
data/smarter_csv.gemspec CHANGED
@@ -16,9 +16,9 @@ Gem::Specification.new do |spec|
16
16
  spec.executables = spec.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
17
17
  spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
18
18
  spec.require_paths = ["lib"]
19
- spec.requirements = ['csv'] # for CSV.parse() only needed in case we have quoted fields
20
19
  spec.add_development_dependency "rspec"
21
20
  spec.add_development_dependency "simplecov"
21
+ spec.add_development_dependency "awesome_print"
22
22
  # spec.add_development_dependency "guard-rspec"
23
23
 
24
24
  spec.metadata["homepage_uri"] = spec.homepage
@@ -17,12 +17,12 @@ describe 'duplicate headers' do
17
17
  }.to raise_exception(SmarterCSV::DuplicateHeaders)
18
18
  end
19
19
 
20
- it 'raises error on missing mapped headers' do
20
+ it 'does not raise error on missing mapped headers and includes missing headers in message' do
21
+ # the mapping is right, but the underlying csv file is bad
22
+ options = {:key_mapping => {:email => :a, :firstname => :b, :lastname => :c, :manager_email => :d, :age => :e} }
21
23
  expect {
22
- # the mapping is right, but the underlying csv file is bad
23
- options = {:key_mapping => {:email => :a, :firstname => :b, :lastname => :c, :manager_email => :d, :age => :e} }
24
24
  SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
25
- }.to raise_exception(SmarterCSV::KeyMappingError)
25
+ }.not_to raise_exception(SmarterCSV::KeyMappingError)
26
26
  end
27
27
  end
28
28
 
@@ -28,11 +28,11 @@ describe 'test exceptions for invalid headers' do
28
28
  }.to raise_exception(SmarterCSV::MissingHeaders)
29
29
  end
30
30
 
31
- it 'raises error on missing mapped headers' do
31
+ it 'does not raise error on missing mapped headers and includes missing headers in message' do
32
+ # :age does not exist in the CSV header
33
+ options = {:key_mapping => {:email => :a, :firstname => :b, :lastname => :c, :manager_email => :d, :age => :e} }
32
34
  expect {
33
- # :age does not exist in the CSV header
34
- options = {:key_mapping => {:email => :a, :firstname => :b, :lastname => :c, :manager_email => :d, :age => :e} }
35
35
  SmarterCSV.process("#{fixture_path}/user_import.csv", options)
36
- }.to raise_exception(SmarterCSV::KeyMappingError)
36
+ }.not_to raise_exception(SmarterCSV::KeyMappingError)
37
37
  end
38
38
  end
@@ -2,16 +2,24 @@ require 'spec_helper'
2
2
 
3
3
  fixture_path = 'spec/fixtures'
4
4
 
5
- describe 'malformed_csv' do
6
- subject { lambda { SmarterCSV.process(csv_path) } }
7
-
8
- context "malformed header" do
5
+ # according to RFC-4180 quotes inside of "words" shouldbe doubled, but our parser is robust against that.
6
+ describe 'malformed CSV quotes' do
7
+ context "malformed quotes in header" do
9
8
  let(:csv_path) { "#{fixture_path}/malformed_header.csv" }
10
- it { should raise_error(CSV::MalformedCSVError) }
9
+ it 'should be resilient against single quotes' do
10
+ data = SmarterCSV.process(csv_path)
11
+ expect(data[0]).to eq({:name=>"Arnold Schwarzenegger", :dobdob=>"1947-07-30"})
12
+ expect(data[1]).to eq({:name=>"Jeff Bridges", :dobdob=>"1949-12-04"})
13
+ end
11
14
  end
12
15
 
13
- context "malformed content" do
16
+ context "malformed quotes in content" do
14
17
  let(:csv_path) { "#{fixture_path}/malformed.csv" }
15
- it { should raise_error(CSV::MalformedCSVError) }
18
+
19
+ it 'should be resilient against single quotes' do
20
+ data = SmarterCSV.process(csv_path)
21
+ expect(data[0]).to eq({:name=>"Arnold Schwarzenegger", :dob=>"1947-07-30"})
22
+ expect(data[1]).to eq({:name=>"Jeff \"the dude\" Bridges", :dob=>"1949-12-04"})
23
+ end
16
24
  end
17
25
  end
@@ -0,0 +1,61 @@
1
+ require 'spec_helper'
2
+
3
+ describe 'parse with col_sep' do
4
+ let(:options) { {quote_char: '"'} }
5
+
6
+ it 'parses with comma' do
7
+ line = "a,b,,d"
8
+ options.merge!({col_sep: ","})
9
+ array, array_size = SmarterCSV.send(:parse, line, options)
10
+ expect(array).to eq ['a', 'b', '', 'd']
11
+ expect(array_size).to eq 4
12
+ end
13
+
14
+ it 'parses trailing commas' do
15
+ line = "a,b,c,,"
16
+ options.merge!({col_sep: ","})
17
+ array, array_size = SmarterCSV.send(:parse, line, options)
18
+ expect(array).to eq ['a', 'b', 'c', '', '']
19
+ expect(array_size).to eq 5
20
+ end
21
+
22
+ it 'parses with space' do
23
+ line = "a b  d"
24
+ options.merge!({col_sep: " "})
25
+ array, array_size = SmarterCSV.send(:parse, line, options)
26
+ expect(array).to eq ['a', 'b', '', 'd']
27
+ expect(array_size).to eq 4
28
+ end
29
+
30
+ it 'parses with tab' do
31
+ line = "a\tb\t\td"
32
+ options.merge!({col_sep: "\t"})
33
+ array, array_size = SmarterCSV.send(:parse, line, options)
34
+ expect(array).to eq ['a', 'b', '', 'd']
35
+ expect(array_size).to eq 4
36
+ end
37
+
38
+ it 'parses with multiple space separator' do
39
+ line = "a b    d"
40
+ options.merge!({col_sep: "  "})
41
+ array, array_size = SmarterCSV.send(:parse, line, options)
42
+ expect(array).to eq ['a b', '', 'd']
43
+ expect(array_size).to eq 3
44
+ end
45
+
46
+ it 'parses with multiple char separator' do
47
+ line = '<=><=>A<=>B<=>C'
48
+ options.merge!({col_sep: "<=>"})
49
+ array, array_size = SmarterCSV.send(:parse, line, options)
50
+ expect(array).to eq ["", "", "A", "B", "C"]
51
+ expect(array_size).to eq 5
52
+ end
53
+
54
+ it 'parses trailing multiple char separator' do
55
+ line = '<=><=>A<=>B<=>C<=><=>'
56
+ options.merge!({col_sep: "<=>"})
57
+ array, array_size = SmarterCSV.send(:parse, line, options)
58
+ expect(array).to eq ["", "", "A", "B", "C", "", ""]
59
+ expect(array_size).to eq 7
60
+ end
61
+ end
@@ -0,0 +1,74 @@
1
+ require 'spec_helper'
2
+
3
+ describe 'old CSV library parsing tests' do
4
+ let(:options) { {quote_char: '"', col_sep: ","} }
5
+
6
+ [ ["\t", ["\t"]],
7
+ ["foo,\"\"\"\"\"\",baz", ["foo", "\"\"", "baz"]],
8
+ ["foo,\"\"\"bar\"\"\",baz", ["foo", "\"bar\"", "baz"]],
9
+ ["\"\"\"\n\",\"\"\"\n\"", ["\"\n", "\"\n"]],
10
+ ["foo,\"\r\n\",baz", ["foo", "\r\n", "baz"]],
11
+ ["\"\"", [""]],
12
+ ["foo,\"\"\"\",baz", ["foo", "\"", "baz"]],
13
+ ["foo,\"\r.\n\",baz", ["foo", "\r.\n", "baz"]],
14
+ ["foo,\"\r\",baz", ["foo", "\r", "baz"]],
15
+ ["foo,\"\",baz", ["foo", "", "baz"]],
16
+ ["\",\"", [","]],
17
+ ["foo", ["foo"]],
18
+ [",,", ['', '', '']],
19
+ [",", ['', '']],
20
+ ["foo,\"\n\",baz", ["foo", "\n", "baz"]],
21
+ ["foo,,baz", ["foo", '', "baz"]],
22
+ ["\"\"\"\r\",\"\"\"\r\"", ["\"\r", "\"\r"]],
23
+ ["\",\",\",\"", [",", ","]],
24
+ ["foo,bar,", ["foo", "bar", '']],
25
+ [",foo,bar", ['', "foo", "bar"]],
26
+ ["foo,bar", ["foo", "bar"]],
27
+ [";", [";"]],
28
+ ["\t,\t", ["\t", "\t"]],
29
+ ["foo,\"\r\n\r\",baz", ["foo", "\r\n\r", "baz"]],
30
+ ["foo,\"\r\n\n\",baz", ["foo", "\r\n\n", "baz"]],
31
+ ["foo,\"foo,bar\",baz", ["foo", "foo,bar", "baz"]],
32
+ [";,;", [";", ";"]]
33
+ ].each do |line, result|
34
+ it "parses #{line}" do
35
+ array, array_size = SmarterCSV.send(:parse, line, options)
36
+ expect(array).to eq result
37
+ end
38
+ end
39
+
40
+ [ ["foo,\"\"\"\"\"\",baz", ["foo", "\"\"", "baz"]],
41
+ ["foo,\"\"\"bar\"\"\",baz", ["foo", "\"bar\"", "baz"]],
42
+ ["foo,\"\r\n\",baz", ["foo", "\r\n", "baz"]],
43
+ ["\"\"", [""]],
44
+ ["foo,\"\"\"\",baz", ["foo", "\"", "baz"]],
45
+ ["foo,\"\r.\n\",baz", ["foo", "\r.\n", "baz"]],
46
+ ["foo,\"\r\",baz", ["foo", "\r", "baz"]],
47
+ ["foo,\"\",baz", ["foo", "", "baz"]],
48
+ ["foo", ["foo"]],
49
+ [",,", ['', '', '']],
50
+ [",", ['', '']],
51
+ ["foo,\"\n\",baz", ["foo", "\n", "baz"]],
52
+ ["foo,,baz", ["foo", '', "baz"]],
53
+ ["foo,bar", ["foo", "bar"]],
54
+ ["foo,\"\r\n\n\",baz", ["foo", "\r\n\n", "baz"]],
55
+ ["foo,\"foo,bar\",baz", ["foo", "foo,bar", "baz"]]
56
+ ].each do |line, result|
57
+ it "parses #{line}" do
58
+ array, array_size = SmarterCSV.send(:parse, line, options)
59
+ expect(array).to eq result
60
+ end
61
+ end
62
+
63
+ it 'mixed quotes' do
64
+ line = %Q{Ten Thousand,10000, 2710 ,,"10,000","It's ""10 Grand"", baby",10K}
65
+ array, array_size = SmarterCSV.send(:parse, line, options)
66
+ expect(array).to eq ["Ten Thousand", "10000", " 2710 ", "", "10,000", "It's \"10 Grand\", baby", "10K"]
67
+ end
68
+
69
+ it 'single quotes in fields' do
70
+ line = 'Indoor Chrome,49.2"" L x 49.2"" W x 20.5"" H,Chrome,"Crystal,Metal,Wood",23.12'
71
+ array, array_size = SmarterCSV.send(:parse, line, options)
72
+ expect(array).to eq ['Indoor Chrome', '49.2" L x 49.2" W x 20.5" H', 'Chrome', 'Crystal,Metal,Wood', '23.12']
73
+ end
74
+ end
@@ -0,0 +1,170 @@
1
+ require 'spec_helper'
2
+
3
+ fixture_path = 'spec/fixtures'
4
+
5
+ describe 'fulfills RFC-4180 and more' do
6
+ let(:options) { {col_sep: ',', row_sep: $INPUT_RECORD_SEPARATOR, quote_char: '"' } }
7
+
8
+ context 'parses simple CSV' do
9
+ context 'RFC-4180' do
10
+ it 'separating on col_sep' do
11
+ line = 'aaa,bbb,ccc'
12
+ expect( SmarterCSV.send(:parse, line, options)).to eq [%w[aaa bbb ccc], 3]
13
+ end
14
+
15
+ it 'preserves whitespace' do
16
+ line = ' aaa , bbb , ccc '
17
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
18
+ [' aaa ', ' bbb ', ' ccc '], 3
19
+ ]
20
+ end
21
+ end
22
+
23
+ context 'extending RFC-4180' do
24
+ it 'with extra col_sep' do
25
+ line = 'aaa,bbb,ccc,'
26
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
27
+ ['aaa', 'bbb', 'ccc', ''], 4
28
+ ]
29
+ end
30
+
31
+ it 'with extra col_sep with given header_size' do
32
+ line = 'aaa,bbb,ccc,'
33
+ expect( SmarterCSV.send(:parse, line, options, 3)).to eq [
34
+ ['aaa', 'bbb', 'ccc'], 3
35
+ ]
36
+ end
37
+
38
+ it 'with multiple extra col_sep' do
39
+ line = 'aaa,bbb,ccc,,,'
40
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
41
+ ['aaa', 'bbb', 'ccc', '', '', ''], 6
42
+ ]
43
+ end
44
+
45
+ it 'with multiple extra col_sep' do
46
+ line = 'aaa,bbb,ccc,,,'
47
+ expect( SmarterCSV.send(:parse, line, options, 3)).to eq [
48
+ ['aaa', 'bbb', 'ccc'], 3
49
+ ]
50
+ end
51
+
52
+ it 'with multiple complex col_sep' do
53
+ line = 'aaa<=>bbb<=>ccc<=><=><=>'
54
+ expect( SmarterCSV.send(:parse, line, options.merge({col_sep: '<=>'}))).to eq [
55
+ ['aaa', 'bbb', 'ccc', '', '', ''], 6
56
+ ]
57
+ end
58
+
59
+ it 'with multiple complex col_sep with given header_size' do
60
+ line = 'aaa<=>bbb<=>ccc<=><=><=>'
61
+ expect( SmarterCSV.send(:parse, line, options.merge({col_sep: '<=>'}), 3)).to eq [
62
+ ['aaa', 'bbb', 'ccc'], 3
63
+ ]
64
+ end
65
+ end
66
+ end
67
+
68
+ context 'parses quoted CSV' do
69
+ context 'RFC-4180' do
70
+ it 'separating on col_sep' do
71
+ line = '"aaa","bbb","ccc"'
72
+ expect( SmarterCSV.send(:parse, line, options)).to eq [%w[aaa bbb ccc], 3]
73
+ end
74
+
75
+ it 'parses corner case correctly' do
76
+ line = '"Board 4""","$17.40","10000003427"'
77
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
78
+ ['Board 4"', '$17.40', '10000003427'], 3
79
+ ]
80
+ end
81
+
82
+ it 'quoted parts can contain spaces' do
83
+ line = '" aaa1 aaa2 "," bbb1 bbb2 "," ccc1 ccc2 "'
84
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
85
+ [' aaa1 aaa2 ', ' bbb1 bbb2 ', ' ccc1 ccc2 '], 3
86
+ ]
87
+ end
88
+
89
+ it 'quoted parts can contain row_sep' do
90
+ line = '"aaa1, aaa2","bbb1, bbb2","ccc1, ccc2"'
91
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
92
+ ['aaa1, aaa2', 'bbb1, bbb2', 'ccc1, ccc2'], 3
93
+ ]
94
+ end
95
+
96
+ it 'quoted parts can contain row_sep' do
97
+ line = '"aaa1, ""aaa2"", aaa3","""bbb1"", bbb2","ccc1, ""ccc2"""'
98
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
99
+ ['aaa1, "aaa2", aaa3', '"bbb1", bbb2', 'ccc1, "ccc2"'], 3
100
+ ]
101
+ end
102
+
103
+ it 'some fields are quoted' do
104
+ line = '1,"board 4""",12.95'
105
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
106
+ ['1', 'board 4"', '12.95'], 3
107
+ ]
108
+ end
109
+
110
+ it 'separating on col_sep' do
111
+ line = '"some","thing","""completely"" different"'
112
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
113
+ ['some', 'thing', '"completely" different'], 3
114
+ ]
115
+ end
116
+ end
117
+
118
+ context 'extending RFC-4180' do
119
+ it 'with extra col_sep, without given header_size' do
120
+ line = '"aaa","bbb","ccc",'
121
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
122
+ ['aaa', 'bbb', 'ccc', ''], 4
123
+ ]
124
+ end
125
+
126
+ it 'with extra col_sep, with given header_size' do
127
+ line = '"aaa","bbb","ccc",'
128
+ expect( SmarterCSV.send(:parse, line, options, 3)).to eq [%w[aaa bbb ccc], 3]
129
+ end
130
+
131
+ it 'with multiple extra col_sep, without given header_size' do
132
+ line = '"aaa","bbb","ccc",,,'
133
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
134
+ ['aaa', 'bbb', 'ccc', '', '', ''], 6
135
+ ]
136
+ end
137
+
138
+ it 'with multiple extra col_sep, with given header_size' do
139
+ line = '"aaa","bbb","ccc",,,'
140
+ expect( SmarterCSV.send(:parse, line, options, 3)).to eq [
141
+ ['aaa', 'bbb', 'ccc'], 3
142
+ ]
143
+ end
144
+
145
+ it 'with multiple complex extra col_sep, without given header_size' do
146
+ line = '"aaa"<=>"bbb"<=>"ccc"<=><=><=>'
147
+ expect( SmarterCSV.send(:parse, line, options.merge({col_sep: '<=>'}))).to eq [
148
+ ['aaa', 'bbb', 'ccc', '', '', ''], 6
149
+ ]
150
+ end
151
+
152
+ it 'with multiple complex extra col_sep, with given header_size' do
153
+ line = '"aaa"<=>"bbb"<=>"ccc"<=><=><=>'
154
+ expect( SmarterCSV.send(:parse, line, options.merge({col_sep: '<=>'}), 3)).to eq [
155
+ ['aaa', 'bbb', 'ccc'], 3
156
+ ]
157
+ end
158
+ end
159
+ end
160
+
161
+ # relaxed parsing compared to RFC-4180
162
+ context 'liberal_parsing' do
163
+ it 'parses corner case correctly' do
164
+ line = 'is,this "three, or four",fields'
165
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
166
+ ['is', 'this "three, or four"', 'fields'], 3
167
+ ]
168
+ end
169
+ end
170
+ end
@@ -3,7 +3,6 @@ require 'spec_helper'
3
3
  fixture_path = 'spec/fixtures'
4
4
 
5
5
  describe 'loading file with quoted fields' do
6
-
7
6
  it 'leaving the quotes in the data' do
8
7
  options = {}
9
8
  data = SmarterCSV.process("#{fixture_path}/quoted.csv", options)
@@ -12,6 +11,7 @@ describe 'loading file with quoted fields' do
12
11
  data[1][:description].should be_nil
13
12
  data[2][:model].should eq 'Venture "Extended Edition, Very Large"'
14
13
  data[2][:description].should be_nil
14
+ data[3][:description].should eq 'MUST SELL! air, moon roof, loaded'
15
15
  data.each do |h|
16
16
  h[:year].class.should eq Fixnum
17
17
  h[:make].should_not be_nil
@@ -20,17 +20,21 @@ describe 'loading file with quoted fields' do
20
20
  end
21
21
  end
22
22
 
23
-
23
+ # quotes inside quoted fields need to be escaped by another double-quote
24
24
  it 'removes quotes around quoted fields, but not inside data' do
25
25
  options = {}
26
26
  data = SmarterCSV.process("#{fixture_path}/quote_char.csv", options)
27
27
 
28
28
  data.length.should eq 6
29
+ data[0][:first_name].should eq "\"John"
30
+ data[0][:last_name].should eq "Cooke\""
29
31
  data[1][:first_name].should eq "Jam\ne\nson\""
30
32
  data[2][:first_name].should eq "\"Jean"
33
+ data[4][:first_name].should eq "Bo\"bbie"
34
+ data[5][:first_name].should eq 'Mica'
35
+ data[5][:last_name].should eq 'Copeland'
31
36
  end
32
37
 
33
-
34
38
  # NOTE: quotes inside headers need to be escaped by doubling them
35
39
  # e.g. 'correct ""EXAMPLE""'
36
40
  # this escaping is illegal: 'incorrect \"EXAMPLE\"' <-- this caused CSV parsing error
@@ -43,6 +47,6 @@ describe 'loading file with quoted fields' do
43
47
  data.length.should eq 3
44
48
  data.first.keys[2].should eq :isbn
45
49
  data.first.keys[3].should eq :discounted_price
50
+ data[1][:author].should eq 'Timothy "The Parser" Campbell'
46
51
  end
47
-
48
52
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: smarter_csv
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.5.1
4
+ version: 1.6.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Tilo Sloboda
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2022-04-27 00:00:00.000000000 Z
11
+ date: 2022-05-06 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rspec
@@ -38,6 +38,20 @@ dependencies:
38
38
  - - ">="
39
39
  - !ruby/object:Gem::Version
40
40
  version: '0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: awesome_print
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
41
55
  description: Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with
42
56
  optional features for processing large files in parallel, embedded comments, unusual
43
57
  field- and record-separators, flexible mapping of CSV-headers to Hash-keys
@@ -126,6 +140,9 @@ files:
126
140
  - spec/smarter_csv/malformed_spec.rb
127
141
  - spec/smarter_csv/no_header_spec.rb
128
142
  - spec/smarter_csv/not_downcase_header_spec.rb
143
+ - spec/smarter_csv/parse/column_separator_spec.rb
144
+ - spec/smarter_csv/parse/old_csv_library_spec.rb
145
+ - spec/smarter_csv/parse/rfc4180_and_more_spec.rb
129
146
  - spec/smarter_csv/problematic.rb
130
147
  - spec/smarter_csv/quoted_spec.rb
131
148
  - spec/smarter_csv/remove_empty_values_spec.rb
@@ -161,8 +178,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
161
178
  - - ">="
162
179
  - !ruby/object:Gem::Version
163
180
  version: '0'
164
- requirements:
165
- - csv
181
+ requirements: []
166
182
  rubygems_version: 3.1.6
167
183
  signing_key:
168
184
  specification_version: 4
@@ -233,6 +249,9 @@ test_files:
233
249
  - spec/smarter_csv/malformed_spec.rb
234
250
  - spec/smarter_csv/no_header_spec.rb
235
251
  - spec/smarter_csv/not_downcase_header_spec.rb
252
+ - spec/smarter_csv/parse/column_separator_spec.rb
253
+ - spec/smarter_csv/parse/old_csv_library_spec.rb
254
+ - spec/smarter_csv/parse/rfc4180_and_more_spec.rb
236
255
  - spec/smarter_csv/problematic.rb
237
256
  - spec/smarter_csv/quoted_spec.rb
238
257
  - spec/smarter_csv/remove_empty_values_spec.rb