smarter_csv 1.5.2 → 1.6.0

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 88b9932c898320fb05d5697e155dc0bd3ade887d2fcfab7b660933e230007364
4
- data.tar.gz: f0525d9c917aff44f910d4547b8e918faa3beb50d47adc29182df1fc1ec2be19
3
+ metadata.gz: fd2cf82aafc3b45257fbdfc594ed8e1d3bf2226e59cbee144b3003d8f79ec6cf
4
+ data.tar.gz: 95df862865e3123cf86194d47107f140f69f2fc91c20aba01d4004e8bffa5d74
5
5
  SHA512:
6
- metadata.gz: 330ad44b9808150f6fdf96dec65d259c2d9cf5eb25e22dc80f63095f4014b065b8aa97a2ba9b814c6cea6f4c0361e04567be403ab78b54d0518b49dc072f36ac
7
- data.tar.gz: 27531bd508b5b455a32947badfb85d7e95489ad282a837ef046864806ba7fa12539148ab2fe4c84174fba3ef085dd3adda5d7c070615d684cd99ed0f90b903a3
6
+ metadata.gz: df32ae9a380fa4fff0932d56e8a0cacadb8d4ebf7d8124e607f2ba389c3b60f875c300a2137fe04aac2b7eda77850b343af5e58c53e310f90460f96223f3228c
7
+ data.tar.gz: 107e1dbacdc6293a0c044a91cf237f50fbeab59eb5b032167f55d0fe6c2cf07b079c6cfe296368c03d2c84e64ce3c0e6ad744043397cdc217cd3ab51beb3ab09
data/CHANGELOG.md CHANGED
@@ -1,6 +1,10 @@
1
1
 
2
2
  # SmarterCSV 1.x Change Log
3
3
 
4
+ ## 1.6.0 (2022-05-03)
5
+ * completely rewrote line parser
6
+ * added methods `SmarterCSV.raw_headers` and `SmarterCSV.headers` to allow easy examination of how the headers are processed.
7
+
4
8
  ## 1.5.2 (2022-04-29)
5
9
  * added missing keys to the SmarterCSV::KeyMappingError exception message #189 (thanks to John Dell)
6
10
 
data/README.md CHANGED
@@ -16,10 +16,12 @@
16
16
 
17
17
  # SmarterCSV
18
18
 
19
- [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
19
+ [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.com/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
20
20
 
21
21
  #### SmarterCSV 1.x
22
22
 
23
+ `smarter_csv` is now 10 years old, and still kicking! 🎉🎉🎉
24
+
23
25
  `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
24
26
  and parallel processing with Resque or Sidekiq.
25
27
 
@@ -42,11 +44,13 @@ NOTE; This Gem is only for importing CSV files - writing of CSV files is not sup
42
44
 
43
45
  ### Why?
44
46
 
45
- Ruby's CSV library's API is pretty old, and it's processing of CSV-files returning Arrays of Arrays feels 'very close to the metal'. The output is not easy to use - especially not if you want to create database records from it. Another shortcoming is that Ruby's CSV library does not have good support for huge CSV-files, e.g. there is no support for 'chunking' and/or parallel processing of the CSV-content (e.g. with Resque or Sidekiq),
47
+ Ruby's CSV library's API is pretty old, and it's processing of CSV-files returning Arrays of Arrays feels 'very close to the metal'. The output is not easy to use - especially not if you want to create database records or Sidekiq jobs with it. Another shortcoming is that Ruby's CSV library does not have good support for huge CSV-files, e.g. there is no support for 'chunking' and/or parallel processing of the CSV-content (e.g. with Sidekiq).
48
+
49
+ As the existing CSV libraries didn't fit my needs, I was writing my own CSV processing - specifically for use in connection with Rails ORMs like Mongoid, MongoMapper and ActiveRecord. In those ORMs you can easily pass a hash with attribute/value pairs to the create() method. The lower-level Mongo driver and Moped also accept larger arrays of such hashes to create a larger amount of records quickly with just one call. The same patterns are used when you pass data to Sidekiq jobs.
46
50
 
47
- As the existing CSV libraries didn't fit my needs, I was writing my own CSV processing - specifically for use in connection with Rails ORMs like Mongoid, MongoMapper or ActiveRecord. In those ORMs you can easily pass a hash with attribute/value pairs to the create() method. The lower-level Mongo driver and Moped also accept larger arrays of such hashes to create a larger amount of records quickly with just one call.
51
+ For processing large CSV files it is essential to process them in chunks, so the memory impact is minimized.
48
52
 
49
- ### Examples
53
+ ### How?
50
54
 
51
55
  The two main choices you have in terms of how to call `SmarterCSV.process` are:
52
56
  * calling `process` with or without a block
@@ -6,6 +6,7 @@ module SmarterCSV
6
6
  class MissingHeaders < SmarterCSVException; end
7
7
  class NoColSepDetected < SmarterCSVException; end
8
8
  class KeyMappingError < SmarterCSVException; end
9
+ class MalformedCSVError < SmarterCSVException; end
9
10
 
10
11
  # first parameter: filename or input object which responds to readline method
11
12
  def SmarterCSV.process(input, options={}, &block)
@@ -24,10 +25,6 @@ module SmarterCSV
24
25
  options[:row_sep] = SmarterCSV.guess_line_ending(fh, options) if options[:row_sep].to_sym == :auto
25
26
  # attempt to auto-detect column separator
26
27
  options[:col_sep] = guess_column_separator(fh, options) if options[:col_sep].to_sym == :auto
27
- # preserve options, in case we need to call the CSV class
28
- csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
29
- csv_options.delete(:row_sep) if [nil, :auto].include?( options[:row_sep].to_sym )
30
- csv_options.delete(:col_sep) if [nil, :auto].include?( options[:col_sep].to_sym )
31
28
 
32
29
  if (options[:force_utf8] || options[:file_encoding] =~ /utf-8/i) && ( fh.respond_to?(:external_encoding) && fh.external_encoding != Encoding.find('UTF-8') || fh.respond_to?(:encoding) && fh.encoding != Encoding.find('UTF-8') )
33
30
  puts 'WARNING: you are trying to process UTF-8 input, but did not open the input with "b:utf-8" option. See README file "NOTES about File Encodings".'
@@ -39,7 +36,7 @@ module SmarterCSV
39
36
  end
40
37
  end
41
38
 
42
- headerA, header_size = process_headers(fh, options, csv_options)
39
+ headerA, header_size = process_headers(fh, options)
43
40
 
44
41
  # in case we use chunking.. we'll need to set it up..
45
42
  if ! options[:chunk_size].nil? && options[:chunk_size].to_i > 0
@@ -76,15 +73,8 @@ module SmarterCSV
76
73
 
77
74
  line.chomp!(options[:row_sep])
78
75
 
79
- if (line =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
80
- dataA = begin
81
- CSV.parse( line, **csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
82
- rescue CSV::MalformedCSVError => e
83
- raise $!, "#{$!} [SmarterCSV: csv line #{@csv_line_count}]", $!.backtrace
84
- end
85
- else
86
- dataA = line.split(options[:col_sep], header_size)
87
- end
76
+ dataA, data_size = parse(line, options, header_size)
77
+
88
78
  dataA.map!{|x| x.sub(/(#{options[:col_sep]})+\z/, '')} # remove any unwanted trailing col_sep characters at the end
89
79
  dataA.map!{|x| x.strip} if options[:strip_whitespace]
90
80
 
@@ -192,13 +182,6 @@ module SmarterCSV
192
182
 
193
183
  private
194
184
 
195
- def self.readline_with_counts(filehandle, options)
196
- line = filehandle.readline(options[:row_sep])
197
- @file_line_count += 1
198
- @csv_line_count += 1
199
- line
200
- end
201
-
202
185
  def self.default_options
203
186
  {
204
187
  auto_row_sep_chars: 500,
@@ -233,6 +216,62 @@ module SmarterCSV
233
216
  }
234
217
  end
235
218
 
219
+ def self.readline_with_counts(filehandle, options)
220
+ line = filehandle.readline(options[:row_sep])
221
+ @file_line_count += 1
222
+ @csv_line_count += 1
223
+ line
224
+ end
225
+
226
+ # parses a single line: either a CSV header and body line
227
+ # - quoting rules compared to RFC-4180 are somewhat relaxed
228
+ # - we are not assuming that quotes inside a fields need to be doubled
229
+ # - we are not assuming that all fields need to be quoted (0 is even)
230
+ # - works with multi-char col_sep
231
+ # - if header_size is given, only up to header_size fields are parsed
232
+ #
233
+ # We use header_size for parsing the body lines to make sure we always match the number of headers
234
+ # in case there are trailing col_sep characters in line
235
+ #
236
+ # Our convention is that empty fields are returned as empty strings, not as nil.
237
+ #
238
+ def self.parse(line, options, header_size = nil)
239
+ return [] if line.nil?
240
+
241
+ col_sep = options[:col_sep]
242
+ quote = options[:quote_char]
243
+ quote_count = 0
244
+ elements = []
245
+ start = 0
246
+ i = 0
247
+
248
+ while i < line.size do
249
+ if line[i...i+col_sep.size] == col_sep && quote_count.even?
250
+ break if !header_size.nil? && elements.size >= header_size
251
+
252
+ elements << cleanup_quotes(line[start...i], quote)
253
+ i += col_sep.size
254
+ start = i
255
+ else
256
+ quote_count += 1 if line[i] == quote
257
+ i += 1
258
+ end
259
+ end
260
+ elements << cleanup_quotes(line[start..-1], quote) if header_size.nil? || elements.size < header_size
261
+ [elements, elements.size]
262
+ end
263
+
264
+ def self.cleanup_quotes(field, quote)
265
+ return field if field.nil? || field !~ /#{quote}/
266
+
267
+ if field.start_with?(quote) && field.end_with?(quote)
268
+ field.delete_prefix!(quote)
269
+ field.delete_suffix!(quote)
270
+ end
271
+ field.gsub!("#{quote}#{quote}", quote)
272
+ field
273
+ end
274
+
236
275
  def self.blank?(value)
237
276
  case value
238
277
  when Array
@@ -319,11 +358,22 @@ module SmarterCSV
319
358
  return k # the most frequent one is it
320
359
  end
321
360
 
322
- def self.process_headers(filehandle, options, csv_options)
361
+ def self.raw_hearder
362
+ @raw_header
363
+ end
364
+
365
+ def self.headers
366
+ @headers
367
+ end
368
+
369
+ def self.process_headers(filehandle, options)
370
+ @raw_header = nil
371
+ @headers = nil
323
372
  if options[:headers_in_file] # extract the header line
324
373
  # process the header line in the CSV file..
325
374
  # the first line of a CSV file contains the header .. it might be commented out, so we need to read it anyhow
326
375
  header = readline_with_counts(filehandle, options)
376
+ @raw_header = header
327
377
 
328
378
  header = header.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
329
379
  header = header.sub(options[:comment_regexp],'') if options[:comment_regexp]
@@ -331,16 +381,7 @@ module SmarterCSV
331
381
 
332
382
  header = header.gsub(options[:strip_chars_from_headers], '') if options[:strip_chars_from_headers]
333
383
 
334
- if (header =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
335
- file_headerA = begin
336
- CSV.parse( header, **csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
337
- rescue CSV::MalformedCSVError => e
338
- raise $!, "#{$!} [SmarterCSV: csv line #{@csv_line_count}]", $!.backtrace
339
- end
340
- else
341
- file_headerA = header.split(options[:col_sep])
342
- end
343
- file_header_size = file_headerA.size # before mapping, which could delete keys
384
+ file_headerA, file_header_size = parse(header, options)
344
385
 
345
386
  file_headerA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') }
346
387
  file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
@@ -400,6 +441,7 @@ module SmarterCSV
400
441
  raise SmarterCSV::MissingHeaders , "ERROR: missing headers: #{missing_headers.join(',')}" unless missing_headers.empty?
401
442
  end
402
443
 
444
+ @headers = headerA
403
445
  [headerA, header_size]
404
446
  end
405
447
 
@@ -1,3 +1,3 @@
1
1
  module SmarterCSV
2
- VERSION = "1.5.2"
2
+ VERSION = "1.6.0"
3
3
  end
data/smarter_csv.gemspec CHANGED
@@ -16,9 +16,9 @@ Gem::Specification.new do |spec|
16
16
  spec.executables = spec.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
17
17
  spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
18
18
  spec.require_paths = ["lib"]
19
- spec.requirements = ['csv'] # for CSV.parse() only needed in case we have quoted fields
20
19
  spec.add_development_dependency "rspec"
21
20
  spec.add_development_dependency "simplecov"
21
+ spec.add_development_dependency "awesome_print"
22
22
  # spec.add_development_dependency "guard-rspec"
23
23
 
24
24
  spec.metadata["homepage_uri"] = spec.homepage
@@ -2,16 +2,24 @@ require 'spec_helper'
2
2
 
3
3
  fixture_path = 'spec/fixtures'
4
4
 
5
- describe 'malformed_csv' do
6
- subject { lambda { SmarterCSV.process(csv_path) } }
7
-
8
- context "malformed header" do
5
+ # according to RFC-4180 quotes inside of "words" shouldbe doubled, but our parser is robust against that.
6
+ describe 'malformed CSV quotes' do
7
+ context "malformed quotes in header" do
9
8
  let(:csv_path) { "#{fixture_path}/malformed_header.csv" }
10
- it { should raise_error(CSV::MalformedCSVError) }
9
+ it 'should be resilient against single quotes' do
10
+ data = SmarterCSV.process(csv_path)
11
+ expect(data[0]).to eq({:name=>"Arnold Schwarzenegger", :dobdob=>"1947-07-30"})
12
+ expect(data[1]).to eq({:name=>"Jeff Bridges", :dobdob=>"1949-12-04"})
13
+ end
11
14
  end
12
15
 
13
- context "malformed content" do
16
+ context "malformed quotes in content" do
14
17
  let(:csv_path) { "#{fixture_path}/malformed.csv" }
15
- it { should raise_error(CSV::MalformedCSVError) }
18
+
19
+ it 'should be resilient against single quotes' do
20
+ data = SmarterCSV.process(csv_path)
21
+ expect(data[0]).to eq({:name=>"Arnold Schwarzenegger", :dob=>"1947-07-30"})
22
+ expect(data[1]).to eq({:name=>"Jeff \"the dude\" Bridges", :dob=>"1949-12-04"})
23
+ end
16
24
  end
17
25
  end
@@ -0,0 +1,61 @@
1
+ require 'spec_helper'
2
+
3
+ describe 'parse with col_sep' do
4
+ let(:options) { {quote_char: '"'} }
5
+
6
+ it 'parses with comma' do
7
+ line = "a,b,,d"
8
+ options.merge!({col_sep: ","})
9
+ array, array_size = SmarterCSV.send(:parse, line, options)
10
+ expect(array).to eq ['a', 'b', '', 'd']
11
+ expect(array_size).to eq 4
12
+ end
13
+
14
+ it 'parses trailing commas' do
15
+ line = "a,b,c,,"
16
+ options.merge!({col_sep: ","})
17
+ array, array_size = SmarterCSV.send(:parse, line, options)
18
+ expect(array).to eq ['a', 'b', 'c', '', '']
19
+ expect(array_size).to eq 5
20
+ end
21
+
22
+ it 'parses with space' do
23
+ line = "a b d"
24
+ options.merge!({col_sep: " "})
25
+ array, array_size = SmarterCSV.send(:parse, line, options)
26
+ expect(array).to eq ['a', 'b', '', 'd']
27
+ expect(array_size).to eq 4
28
+ end
29
+
30
+ it 'parses with tab' do
31
+ line = "a\tb\t\td"
32
+ options.merge!({col_sep: "\t"})
33
+ array, array_size = SmarterCSV.send(:parse, line, options)
34
+ expect(array).to eq ['a', 'b', '', 'd']
35
+ expect(array_size).to eq 4
36
+ end
37
+
38
+ it 'parses with multiple space separator' do
39
+ line = "a b d"
40
+ options.merge!({col_sep: " "})
41
+ array, array_size = SmarterCSV.send(:parse, line, options)
42
+ expect(array).to eq ['a b', '', 'd']
43
+ expect(array_size).to eq 3
44
+ end
45
+
46
+ it 'parses with multiple char separator' do
47
+ line = '<=><=>A<=>B<=>C'
48
+ options.merge!({col_sep: "<=>"})
49
+ array, array_size = SmarterCSV.send(:parse, line, options)
50
+ expect(array).to eq ["", "", "A", "B", "C"]
51
+ expect(array_size).to eq 5
52
+ end
53
+
54
+ it 'parses trailing multiple char separator' do
55
+ line = '<=><=>A<=>B<=>C<=><=>'
56
+ options.merge!({col_sep: "<=>"})
57
+ array, array_size = SmarterCSV.send(:parse, line, options)
58
+ expect(array).to eq ["", "", "A", "B", "C", "", ""]
59
+ expect(array_size).to eq 7
60
+ end
61
+ end
@@ -0,0 +1,74 @@
1
+ require 'spec_helper'
2
+
3
+ describe 'old CSV library parsing tests' do
4
+ let(:options) { {quote_char: '"', col_sep: ","} }
5
+
6
+ [ ["\t", ["\t"]],
7
+ ["foo,\"\"\"\"\"\",baz", ["foo", "\"\"", "baz"]],
8
+ ["foo,\"\"\"bar\"\"\",baz", ["foo", "\"bar\"", "baz"]],
9
+ ["\"\"\"\n\",\"\"\"\n\"", ["\"\n", "\"\n"]],
10
+ ["foo,\"\r\n\",baz", ["foo", "\r\n", "baz"]],
11
+ ["\"\"", [""]],
12
+ ["foo,\"\"\"\",baz", ["foo", "\"", "baz"]],
13
+ ["foo,\"\r.\n\",baz", ["foo", "\r.\n", "baz"]],
14
+ ["foo,\"\r\",baz", ["foo", "\r", "baz"]],
15
+ ["foo,\"\",baz", ["foo", "", "baz"]],
16
+ ["\",\"", [","]],
17
+ ["foo", ["foo"]],
18
+ [",,", ['', '', '']],
19
+ [",", ['', '']],
20
+ ["foo,\"\n\",baz", ["foo", "\n", "baz"]],
21
+ ["foo,,baz", ["foo", '', "baz"]],
22
+ ["\"\"\"\r\",\"\"\"\r\"", ["\"\r", "\"\r"]],
23
+ ["\",\",\",\"", [",", ","]],
24
+ ["foo,bar,", ["foo", "bar", '']],
25
+ [",foo,bar", ['', "foo", "bar"]],
26
+ ["foo,bar", ["foo", "bar"]],
27
+ [";", [";"]],
28
+ ["\t,\t", ["\t", "\t"]],
29
+ ["foo,\"\r\n\r\",baz", ["foo", "\r\n\r", "baz"]],
30
+ ["foo,\"\r\n\n\",baz", ["foo", "\r\n\n", "baz"]],
31
+ ["foo,\"foo,bar\",baz", ["foo", "foo,bar", "baz"]],
32
+ [";,;", [";", ";"]]
33
+ ].each do |line, result|
34
+ it "parses #{line}" do
35
+ array, array_size = SmarterCSV.send(:parse, line, options)
36
+ expect(array).to eq result
37
+ end
38
+ end
39
+
40
+ [ ["foo,\"\"\"\"\"\",baz", ["foo", "\"\"", "baz"]],
41
+ ["foo,\"\"\"bar\"\"\",baz", ["foo", "\"bar\"", "baz"]],
42
+ ["foo,\"\r\n\",baz", ["foo", "\r\n", "baz"]],
43
+ ["\"\"", [""]],
44
+ ["foo,\"\"\"\",baz", ["foo", "\"", "baz"]],
45
+ ["foo,\"\r.\n\",baz", ["foo", "\r.\n", "baz"]],
46
+ ["foo,\"\r\",baz", ["foo", "\r", "baz"]],
47
+ ["foo,\"\",baz", ["foo", "", "baz"]],
48
+ ["foo", ["foo"]],
49
+ [",,", ['', '', '']],
50
+ [",", ['', '']],
51
+ ["foo,\"\n\",baz", ["foo", "\n", "baz"]],
52
+ ["foo,,baz", ["foo", '', "baz"]],
53
+ ["foo,bar", ["foo", "bar"]],
54
+ ["foo,\"\r\n\n\",baz", ["foo", "\r\n\n", "baz"]],
55
+ ["foo,\"foo,bar\",baz", ["foo", "foo,bar", "baz"]]
56
+ ].each do |line, result|
57
+ it "parses #{line}" do
58
+ array, array_size = SmarterCSV.send(:parse, line, options)
59
+ expect(array).to eq result
60
+ end
61
+ end
62
+
63
+ it 'mixed quotes' do
64
+ line = %Q{Ten Thousand,10000, 2710 ,,"10,000","It's ""10 Grand"", baby",10K}
65
+ array, array_size = SmarterCSV.send(:parse, line, options)
66
+ expect(array).to eq ["Ten Thousand", "10000", " 2710 ", "", "10,000", "It's \"10 Grand\", baby", "10K"]
67
+ end
68
+
69
+ it 'single quotes in fields' do
70
+ line = 'Indoor Chrome,49.2"" L x 49.2"" W x 20.5"" H,Chrome,"Crystal,Metal,Wood",23.12'
71
+ array, array_size = SmarterCSV.send(:parse, line, options)
72
+ expect(array).to eq ['Indoor Chrome', '49.2" L x 49.2" W x 20.5" H', 'Chrome', 'Crystal,Metal,Wood', '23.12']
73
+ end
74
+ end
@@ -0,0 +1,170 @@
1
+ require 'spec_helper'
2
+
3
+ fixture_path = 'spec/fixtures'
4
+
5
+ describe 'fulfills RFC-4180 and more' do
6
+ let(:options) { {col_sep: ',', row_sep: $INPUT_RECORD_SEPARATOR, quote_char: '"' } }
7
+
8
+ context 'parses simple CSV' do
9
+ context 'RFC-4180' do
10
+ it 'separating on col_sep' do
11
+ line = 'aaa,bbb,ccc'
12
+ expect( SmarterCSV.send(:parse, line, options)).to eq [%w[aaa bbb ccc], 3]
13
+ end
14
+
15
+ it 'preserves whitespace' do
16
+ line = ' aaa , bbb , ccc '
17
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
18
+ [' aaa ', ' bbb ', ' ccc '], 3
19
+ ]
20
+ end
21
+ end
22
+
23
+ context 'extending RFC-4180' do
24
+ it 'with extra col_sep' do
25
+ line = 'aaa,bbb,ccc,'
26
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
27
+ ['aaa', 'bbb', 'ccc', ''], 4
28
+ ]
29
+ end
30
+
31
+ it 'with extra col_sep with given header_size' do
32
+ line = 'aaa,bbb,ccc,'
33
+ expect( SmarterCSV.send(:parse, line, options, 3)).to eq [
34
+ ['aaa', 'bbb', 'ccc'], 3
35
+ ]
36
+ end
37
+
38
+ it 'with multiple extra col_sep' do
39
+ line = 'aaa,bbb,ccc,,,'
40
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
41
+ ['aaa', 'bbb', 'ccc', '', '', ''], 6
42
+ ]
43
+ end
44
+
45
+ it 'with multiple extra col_sep' do
46
+ line = 'aaa,bbb,ccc,,,'
47
+ expect( SmarterCSV.send(:parse, line, options, 3)).to eq [
48
+ ['aaa', 'bbb', 'ccc'], 3
49
+ ]
50
+ end
51
+
52
+ it 'with multiple complex col_sep' do
53
+ line = 'aaa<=>bbb<=>ccc<=><=><=>'
54
+ expect( SmarterCSV.send(:parse, line, options.merge({col_sep: '<=>'}))).to eq [
55
+ ['aaa', 'bbb', 'ccc', '', '', ''], 6
56
+ ]
57
+ end
58
+
59
+ it 'with multiple complex col_sep with given header_size' do
60
+ line = 'aaa<=>bbb<=>ccc<=><=><=>'
61
+ expect( SmarterCSV.send(:parse, line, options.merge({col_sep: '<=>'}), 3)).to eq [
62
+ ['aaa', 'bbb', 'ccc'], 3
63
+ ]
64
+ end
65
+ end
66
+ end
67
+
68
+ context 'parses quoted CSV' do
69
+ context 'RFC-4180' do
70
+ it 'separating on col_sep' do
71
+ line = '"aaa","bbb","ccc"'
72
+ expect( SmarterCSV.send(:parse, line, options)).to eq [%w[aaa bbb ccc], 3]
73
+ end
74
+
75
+ it 'parses corner case correctly' do
76
+ line = '"Board 4""","$17.40","10000003427"'
77
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
78
+ ['Board 4"', '$17.40', '10000003427'], 3
79
+ ]
80
+ end
81
+
82
+ it 'quoted parts can contain spaces' do
83
+ line = '" aaa1 aaa2 "," bbb1 bbb2 "," ccc1 ccc2 "'
84
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
85
+ [' aaa1 aaa2 ', ' bbb1 bbb2 ', ' ccc1 ccc2 '], 3
86
+ ]
87
+ end
88
+
89
+ it 'quoted parts can contain row_sep' do
90
+ line = '"aaa1, aaa2","bbb1, bbb2","ccc1, ccc2"'
91
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
92
+ ['aaa1, aaa2', 'bbb1, bbb2', 'ccc1, ccc2'], 3
93
+ ]
94
+ end
95
+
96
+ it 'quoted parts can contain row_sep' do
97
+ line = '"aaa1, ""aaa2"", aaa3","""bbb1"", bbb2","ccc1, ""ccc2"""'
98
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
99
+ ['aaa1, "aaa2", aaa3', '"bbb1", bbb2', 'ccc1, "ccc2"'], 3
100
+ ]
101
+ end
102
+
103
+ it 'some fields are quoted' do
104
+ line = '1,"board 4""",12.95'
105
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
106
+ ['1', 'board 4"', '12.95'], 3
107
+ ]
108
+ end
109
+
110
+ it 'separating on col_sep' do
111
+ line = '"some","thing","""completely"" different"'
112
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
113
+ ['some', 'thing', '"completely" different'], 3
114
+ ]
115
+ end
116
+ end
117
+
118
+ context 'extending RFC-4180' do
119
+ it 'with extra col_sep, without given header_size' do
120
+ line = '"aaa","bbb","ccc",'
121
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
122
+ ['aaa', 'bbb', 'ccc', ''], 4
123
+ ]
124
+ end
125
+
126
+ it 'with extra col_sep, with given header_size' do
127
+ line = '"aaa","bbb","ccc",'
128
+ expect( SmarterCSV.send(:parse, line, options, 3)).to eq [%w[aaa bbb ccc], 3]
129
+ end
130
+
131
+ it 'with multiple extra col_sep, without given header_size' do
132
+ line = '"aaa","bbb","ccc",,,'
133
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
134
+ ['aaa', 'bbb', 'ccc', '', '', ''], 6
135
+ ]
136
+ end
137
+
138
+ it 'with multiple extra col_sep, with given header_size' do
139
+ line = '"aaa","bbb","ccc",,,'
140
+ expect( SmarterCSV.send(:parse, line, options, 3)).to eq [
141
+ ['aaa', 'bbb', 'ccc'], 3
142
+ ]
143
+ end
144
+
145
+ it 'with multiple complex extra col_sep, without given header_size' do
146
+ line = '"aaa"<=>"bbb"<=>"ccc"<=><=><=>'
147
+ expect( SmarterCSV.send(:parse, line, options.merge({col_sep: '<=>'}))).to eq [
148
+ ['aaa', 'bbb', 'ccc', '', '', ''], 6
149
+ ]
150
+ end
151
+
152
+ it 'with multiple complex extra col_sep, with given header_size' do
153
+ line = '"aaa"<=>"bbb"<=>"ccc"<=><=><=>'
154
+ expect( SmarterCSV.send(:parse, line, options.merge({col_sep: '<=>'}), 3)).to eq [
155
+ ['aaa', 'bbb', 'ccc'], 3
156
+ ]
157
+ end
158
+ end
159
+ end
160
+
161
+ # relaxed parsing compared to RFC-4180
162
+ context 'liberal_parsing' do
163
+ it 'parses corner case correctly' do
164
+ line = 'is,this "three, or four",fields'
165
+ expect( SmarterCSV.send(:parse, line, options)).to eq [
166
+ ['is', 'this "three, or four"', 'fields'], 3
167
+ ]
168
+ end
169
+ end
170
+ end
@@ -3,7 +3,6 @@ require 'spec_helper'
3
3
  fixture_path = 'spec/fixtures'
4
4
 
5
5
  describe 'loading file with quoted fields' do
6
-
7
6
  it 'leaving the quotes in the data' do
8
7
  options = {}
9
8
  data = SmarterCSV.process("#{fixture_path}/quoted.csv", options)
@@ -12,6 +11,7 @@ describe 'loading file with quoted fields' do
12
11
  data[1][:description].should be_nil
13
12
  data[2][:model].should eq 'Venture "Extended Edition, Very Large"'
14
13
  data[2][:description].should be_nil
14
+ data[3][:description].should eq 'MUST SELL! air, moon roof, loaded'
15
15
  data.each do |h|
16
16
  h[:year].class.should eq Fixnum
17
17
  h[:make].should_not be_nil
@@ -20,17 +20,21 @@ describe 'loading file with quoted fields' do
20
20
  end
21
21
  end
22
22
 
23
-
23
+ # quotes inside quoted fields need to be escaped by another double-quote
24
24
  it 'removes quotes around quoted fields, but not inside data' do
25
25
  options = {}
26
26
  data = SmarterCSV.process("#{fixture_path}/quote_char.csv", options)
27
27
 
28
28
  data.length.should eq 6
29
+ data[0][:first_name].should eq "\"John"
30
+ data[0][:last_name].should eq "Cooke\""
29
31
  data[1][:first_name].should eq "Jam\ne\nson\""
30
32
  data[2][:first_name].should eq "\"Jean"
33
+ data[4][:first_name].should eq "Bo\"bbie"
34
+ data[5][:first_name].should eq 'Mica'
35
+ data[5][:last_name].should eq 'Copeland'
31
36
  end
32
37
 
33
-
34
38
  # NOTE: quotes inside headers need to be escaped by doubling them
35
39
  # e.g. 'correct ""EXAMPLE""'
36
40
  # this escaping is illegal: 'incorrect \"EXAMPLE\"' <-- this caused CSV parsing error
@@ -43,6 +47,6 @@ describe 'loading file with quoted fields' do
43
47
  data.length.should eq 3
44
48
  data.first.keys[2].should eq :isbn
45
49
  data.first.keys[3].should eq :discounted_price
50
+ data[1][:author].should eq 'Timothy "The Parser" Campbell'
46
51
  end
47
-
48
52
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: smarter_csv
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.5.2
4
+ version: 1.6.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Tilo Sloboda
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2022-04-29 00:00:00.000000000 Z
11
+ date: 2022-05-03 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rspec
@@ -38,6 +38,20 @@ dependencies:
38
38
  - - ">="
39
39
  - !ruby/object:Gem::Version
40
40
  version: '0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: awesome_print
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
41
55
  description: Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with
42
56
  optional features for processing large files in parallel, embedded comments, unusual
43
57
  field- and record-separators, flexible mapping of CSV-headers to Hash-keys
@@ -126,6 +140,9 @@ files:
126
140
  - spec/smarter_csv/malformed_spec.rb
127
141
  - spec/smarter_csv/no_header_spec.rb
128
142
  - spec/smarter_csv/not_downcase_header_spec.rb
143
+ - spec/smarter_csv/parse/column_separator_spec.rb
144
+ - spec/smarter_csv/parse/old_csv_library_spec.rb
145
+ - spec/smarter_csv/parse/rfc4180_and_more_spec.rb
129
146
  - spec/smarter_csv/problematic.rb
130
147
  - spec/smarter_csv/quoted_spec.rb
131
148
  - spec/smarter_csv/remove_empty_values_spec.rb
@@ -161,8 +178,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
161
178
  - - ">="
162
179
  - !ruby/object:Gem::Version
163
180
  version: '0'
164
- requirements:
165
- - csv
181
+ requirements: []
166
182
  rubygems_version: 3.1.6
167
183
  signing_key:
168
184
  specification_version: 4
@@ -233,6 +249,9 @@ test_files:
233
249
  - spec/smarter_csv/malformed_spec.rb
234
250
  - spec/smarter_csv/no_header_spec.rb
235
251
  - spec/smarter_csv/not_downcase_header_spec.rb
252
+ - spec/smarter_csv/parse/column_separator_spec.rb
253
+ - spec/smarter_csv/parse/old_csv_library_spec.rb
254
+ - spec/smarter_csv/parse/rfc4180_and_more_spec.rb
236
255
  - spec/smarter_csv/problematic.rb
237
256
  - spec/smarter_csv/quoted_spec.rb
238
257
  - spec/smarter_csv/remove_empty_values_spec.rb