smarter_csv 1.4.0 → 1.4.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: c8236e4cc8f0081efd9b74f12ad4b5342707d0a2f883414b07538160910008a3
- data.tar.gz: b04a53b0030bf6c623aa19fb15c0c6c5ca123ce2ff85d47f176884fffa0f9811
+ metadata.gz: 3be724101d41326ff480bcb723c1b40a3cabd879eb55e0c2f044372f8e5a57d0
+ data.tar.gz: 657db1421352f449bf042f8df4d5178167af048ad37836e4f2f2f8a6aea3ece0
  SHA512:
- metadata.gz: f2ddaa7bf44362c8bb4439289172d40b6ca926a67a8a35fb335473ddf7349658a629f3008ece5314c6bc5fa17145a2ae89b4d706b9c130a1642a51f2434d5e21
- data.tar.gz: b48908b657a07589886873fe251263dabbe6e2333a1fc025dfede085841544458d4498ba1d288a4a7c0de3875d1c14631cc584b2a1cb7fd0be1543b758781dd3
+ metadata.gz: 3430649df35ac8139d35b04b85e8691ca5fc3d98b7b15f0d3987855f571987bdb742e0ed6f807ddb7a2e61e61d696d529ac311bc58e30188325f1c4bb78098a4
+ data.tar.gz: 1b386af7cc7c39bc7ea934875e16f6641a2cc0c2bb5dfaa3b1f298739b1b355b2f41570e42998a2d7790a17f96feb07118b69c23d913acc634aae5901f0c9229
data/.gitignore CHANGED
@@ -6,3 +6,5 @@
  .bundle
  Gemfile.lock
  pkg/*
+ coverage/*
+ .DS_Store
data/CHANGELOG.md CHANGED
@@ -1,14 +1,18 @@
 
  # SmarterCSV 1.x Change Log
 
- ## 1.4.0 (2022-01-11)
+ ## 1.4.1 (2022-02-12)
+ * minor fix: also support `col_sep: :auto`
+ * added simplecov
+
+ ## 1.4.0 (2022-02-11)
  * dropped GPL license, smarter_csv is now only using the MIT License
  * added experimental option `col_sep: 'auto` to auto-detect the column separator (issue #183)
  The default behavior is still to assume `,` is the column separator.
  * fixed buggy behavior when using `remove_empty_values: false` (issue #168)
  * fixed Ruby 3.0 deprecation
 
- ## 1.3.0 (2022-01-06) Breaking code change if you used `--key_mappings`
+ ## 1.3.0 (2022-02-06) Breaking code change if you used `--key_mappings`
  * fix bug for key_mappings (issue #181)
  The values of the `key_mappings` hash will now be used "as is", and no longer forced to be symbols
 
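Note on the 1.4.1 entry: the string `'auto'` and the symbol `:auto` are both meant to request auto-detection. A minimal sketch of that normalization follows. The helper name `auto_requested?` is hypothetical and not part of SmarterCSV; it mirrors the `.to_sym == :auto` comparison that appears in the library hunk of this diff, and the `respond_to?` guard is an added assumption so that `nil` does not raise.

```ruby
# Hypothetical helper (not part of SmarterCSV): true when a separator option
# asks for auto-detection, given either as the symbol :auto or the string 'auto'.
def auto_requested?(value)
  # Assumption: guard with respond_to? so nil (which has no #to_sym)
  # returns false instead of raising NoMethodError.
  value.respond_to?(:to_sym) && value.to_sym == :auto
end

puts auto_requested?(:auto)
puts auto_requested?('auto')
puts auto_requested?(',')
puts auto_requested?(nil)
```

Comparing after `to_sym` keeps a single code path for both spellings, which seems to be the intent of the "minor fix".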
data/CONTRIBUTORS.md ADDED
@@ -0,0 +1,45 @@
+ # A Big Thank You to all the Contributors!!
+
+
+ A Big Thank you to everyone who filed issues, sent comments, and who contributed with pull requests:
+
+ * [Jack 0](https://github.com/xjlin0)
+ * [Alejandro](https://github.com/agaviria)
+ * [Lucas Camargo de Almeida](https://github.com/lcalmeida)
+ * [Raphaël Bleuse](https://github.com/bleuse)
+ * [feens](https://github.com/feens)
+ * [César Camacho](https://github.com/chanko)
+ * [innhyu](https://github.com/innhyu)
+ * [Benjamin Thouret](https://github.com/benichu)
+ * [Chris Hilton](https://github.com/chrismhilton)
+ * [Sean Duckett](http://github.com/sduckett)
+ * [Alex Ong](http://github.com/khaong)
+ * [Martin Nilsson](http://github.com/MrTin)
+ * [Eustáquio Rangel](http://github.com/taq)
+ * [Pavel](http://github.com/paxa)
+ * [Félix Bellanger](https://github.com/Keeguon)
+ * [Graham Wetzler](https://github.com/grahamwetzler)
+ * [Marcos G. Zimmermann](https://github.com/marcosgz)
+ * [Jordan Running](https://github.com/jrunning)
+ * [Dave Sanders](https://github.com/DaveSanders)
+ * [Hugo Lepetit](https://github.com/giglemad)
+ * [esBeee](https://github.com/esBeee)
+ * [Waldyr de Souza](https://github.com/waldyr)
+ * [Ben Maher](https://github.com/benmaher)
+ * [Wal McConnell](https://github.com/wal)
+ * [Jordan Graft](https://github.com/jordangraft)
+ * [Michael](https://github.com/polycarpou)
+ * [Kevin Coleman](https://github.com/KevinColemanInc)
+ * [Tirdad C.](https://github.com/tridadc)
+ * [Dave Myron](https://github.com/contentfree)
+ * [Ivan Ushakov](https://github.com/IvanUshakov)
+ * [Matthieu Paret](https://github.com/mtparet)
+ * [Rohit Amarnath](https://github.com/ramarnat)
+ * [Joshua Smith](https://github.com/enviable)
+ * [Colin Petruno](https://github.com/colinpetruno)
+ * [Diego Salido](https://github.com/salidux)
+ * [Elie](https://github.com/elieteyssedou)
+ * [Chris Wong](https://github.com/lightwave)
+ * [Olle Jonsson](https://github.com/olleolleolle)
+ * [Nicolas Guillemain](https://github.com/Viiruus)
+ * [Sp6](https://github.com/sp6)
data/LICENSE.txt CHANGED
@@ -1,6 +1,6 @@
  The MIT License (MIT)
 
- Copyright (c) 2022 Tilo Sloboda
+ Copyright (c) 2012..2022 Tilo Sloboda
 
  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -1,17 +1,23 @@
- # SmarterCSV
-
- [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
 
- ---------------
  #### Service Announcement
 
  * Work towards SmarterCSV 2.0 is still on it's way, with much improved features, and more streamlined options.
- Please check the 2.0-develop branch, open any issues and pull requests with mention of v2.0.
+ Please check the [2.0-develop branch](https://github.com/tilo/smarter_csv/blob/master/README.md), open any issues and pull requests with mention of v2.0.
 
- * New versions on the 1.2 branch will soon print a deprecation warning if you set :verbose to true
+ * New versions of SmarterCSV 1.x will soon print a deprecation warning if you set :verbose to true
  See below for list of deprecated options.
 
+ #### Restructured Branches
+
+ * default branch is `main` for 1.x development
+ * 2.x development is on `2.0-development`
+
  ---------------
+
+ # SmarterCSV
+
+ [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
+
  #### SmarterCSV 1.x
 
  `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
@@ -55,6 +61,7 @@ You can also set the `:row_sep` manually! Checkout Example 5 for unusual `:row_s
  #### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
  Please note how each hash contains only the keys for columns with non-null values.
 
+ ```ruby
  $ cat pets.csv
  first name,last name,dogs,cats,birds,fish
  Dan,McAllister,2,,,
@@ -70,21 +77,25 @@ Please note how each hash contains only the keys for columns with non-null value
  {:first_name=>"Miles", :last_name=>"O'Brian", :fish=>"21"},
  {:first_name=>"Nancy", :last_name=>"Homes", :dogs=>"2", :birds=>"1"}
  ]
+ ```
 
 
  #### Example 1b: How SmarterCSV processes CSV-files as chunks, returning arrays of hashes:
  Please note how the returned array contains two sub-arrays containing the chunks which were read, each chunk containing 2 hashes.
  In case the number of rows is not cleanly divisible by `:chunk_size`, the last chunk contains fewer hashes.
 
+ ```ruby
  > pets_by_owner = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}})
  => [ [ {:first=>"Dan", :last=>"McAllister", :dogs=>"2"}, {:first=>"Lucy", :last=>"Laweless", :cats=>"5"} ],
  [ {:first=>"Miles", :last=>"O'Brian", :fish=>"21"}, {:first=>"Nancy", :last=>"Homes", :dogs=>"2", :birds=>"1"} ]
  ]
+ ```
 
  #### Example 1c: How SmarterCSV processes CSV-files as chunks, and passes arrays of hashes to a given block:
  Please note how the given block is passed the data for each chunk as the parameter (array of hashes),
  and how the `process` method returns the number of chunks when called with a block
 
+ ```ruby
  > total_chunks = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}}) do |chunk|
  chunk.each do |h| # you can post-process the data from each row to your heart's content, and also create virtual attributes:
  h[:full_name] = [h[:first],h[:last]].join(' ') # create a virtual attribute
@@ -96,16 +107,16 @@ and how the `process` method returns the number of chunks when called with a blo
  [{:dogs=>"2", :full_name=>"Dan McAllister"}, {:cats=>"5", :full_name=>"Lucy Laweless"}]
  [{:fish=>"21", :full_name=>"Miles O'Brian"}, {:dogs=>"2", :birds=>"1", :full_name=>"Nancy Homes"}]
  => 2
-
+ ```
  #### Example 2: Reading a CSV-File in one Chunk, returning one Array of Hashes:
-
+ ```ruby
  filename = '/tmp/input_file.txt' # TAB delimited file, each row ending with Control-M
  recordsA = SmarterCSV.process(filename, {:col_sep => "\t", :row_sep => "\cM"}) # no block given
 
  => returns an array of hashes
-
+ ```
  #### Example 3: Populate a MySQL or MongoDB Database with SmarterCSV:
-
+ ```ruby
  # without using chunks:
  filename = '/tmp/some.csv'
  options = {:key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
@@ -116,9 +127,9 @@ and how the `process` method returns the number of chunks when called with a blo
  end
 
  => returns number of chunks / rows we processed
-
+ ```
  #### Example 4: Populate a MongoDB Database in Chunks of 100 records with SmarterCSV:
-
+ ```ruby
  # using chunks:
  filename = '/tmp/some.csv'
  options = {:chunk_size => 100, :key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
@@ -129,10 +140,10 @@ and how the `process` method returns the number of chunks when called with a blo
  end
 
  => returns number of chunks we processed
-
+ ```
 
  #### Example 5: Reading a CSV-like File, and Processing it with Resque:
-
+ ```ruby
  filename = '/tmp/strange_db_dump' # a file with CRTL-A as col_separator, and with CTRL-B\n as record_separator (hello iTunes!)
  options = {
  :col_sep => "\cA", :row_sep => "\cB\n", :comment_regexp => /^#/,
@@ -142,11 +153,11 @@ and how the `process` method returns the number of chunks when called with a blo
  Resque.enque( ResqueWorkerClass, chunk ) # pass chunks of CSV-data to Resque workers for parallel processing
  end
  => returns number of chunks
-
+ ```
  #### Example 6: Using Value Converters
 
  NOTE: If you use `key_mappings` and `value_converters`, make sure that the value converters has references the keys based on the final mapped name, not the original name in the CSV file.
-
+ ```ruby
  $ cat spec/fixtures/with_dates.csv
  first,last,date,price
  Ben,Miller,10/30/1998,$44.50
@@ -179,7 +190,7 @@ NOTE: If you use `key_mappings` and `value_converters`, make sure that the value
  => 44.50
  data[0][:price].class
  => Float
-
+ ```
  ## Parallel Processing
  [Jack](https://github.com/xjlin0) wrote an interesting article about [Speeding up CSV parsing with parallel processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing)
 
@@ -206,7 +217,7 @@ The options and the block are optional.
  | :skip_lines | nil | how many lines to skip before the first line or header line is processed |
  | :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
  ---------------------------------------------------------------------------------------------------------------------------------
- | :col_sep | ',' | column separator, can be set to 'auto' |
+ | :col_sep | ',' | column separator, can be set to :auto |
  | :force_simple_split | false | force simple splitting on :col_sep character for non-standard CSV-files. |
  | | | e.g. when :quote_char is not properly escaped |
  | :row_sep | $/ ,"\n" | row separator or record separator , defaults to system's $/ , which defaults to "\n" |
@@ -258,19 +269,19 @@ And header and data validations will also be supported in 2.x
  #### NOTES about File Encodings:
  * if you have a CSV file which contains unicode characters, you can process it as follows:
 
-
+ ```ruby
  File.open(filename, "r:bom|utf-8") do |f|
  data = SmarterCSV.process(f);
  end
-
+ ```
  * if the CSV file with unicode characters is in a remote location, similarly you need to give the encoding as an option to the `open` call:
-
+ ```ruby
  require 'open-uri'
  file_location = 'http://your.remote.org/sample.csv'
  open(file_location, 'r:utf-8') do |f| # don't forget to specify the UTF-8 encoding!!
  data = SmarterCSV.process(f)
  end
-
+ ```
  #### NOTES about CSV Headers:
  * as this method parses CSV files, it is assumed that the first line of any file will contain a valid header
  * the first line with the CSV header may or may not be commented out according to the :comment_regexp
@@ -304,64 +315,27 @@ And header and data validations will also be supported in 2.x
  ## Installation
 
  Add this line to your application's Gemfile:
-
+ ```ruby
  gem 'smarter_csv'
-
+ ```
  And then execute:
-
+ ```ruby
  $ bundle
-
+ ```
  Or install it yourself as:
-
+ ```ruby
  $ gem install smarter_csv
-
+ ```
  ## [ChangeLog](./CHANGELOG.md)
 
  ## Reporting Bugs / Feature Requests
 
  Please [open an Issue on GitHub](https://github.com/tilo/smarter_csv/issues) if you have feedback, new feature requests, or want to report a bug. Thank you!
 
+ * please include a small sample CSV file
+ * please mention your version of SmarterCSV, Ruby, Rails
 
- ## Special Thanks
-
- Many thanks to people who have filed issues and sent comments.
- And a special thanks to those who contributed pull requests:
-
- * [Jack 0](https://github.com/xjlin0)
- * [Alejandro](https://github.com/agaviria)
- * [Lucas Camargo de Almeida](https://github.com/lcalmeida)
- * [Raphaël Bleuse](https://github.com/bleuse)
- * [feens](https://github.com/feens)
- * [César Camacho](https://github.com/chanko)
- * [innhyu](https://github.com/innhyu)
- * [Benjamin Thouret](https://github.com/benichu)
- * [Chris Hilton](https://github.com/chrismhilton)
- * [Sean Duckett](http://github.com/sduckett)
- * [Alex Ong](http://github.com/khaong)
- * [Martin Nilsson](http://github.com/MrTin)
- * [Eustáquio Rangel](http://github.com/taq)
- * [Pavel](http://github.com/paxa)
- * [Félix Bellanger](https://github.com/Keeguon)
- * [Graham Wetzler](https://github.com/grahamwetzler)
- * [Marcos G. Zimmermann](https://github.com/marcosgz)
- * [Jordan Running](https://github.com/jrunning)
- * [Dave Sanders](https://github.com/DaveSanders)
- * [Hugo Lepetit](https://github.com/giglemad)
- * [esBeee](https://github.com/esBeee)
- * [Waldyr de Souza](https://github.com/waldyr)
- * [Ben Maher](https://github.com/benmaher)
- * [Wal McConnell](https://github.com/wal)
- * [Jordan Graft](https://github.com/jordangraft)
- * [Michael](https://github.com/polycarpou)
- * [Kevin Coleman](https://github.com/KevinColemanInc)
- * [Tirdad C.](https://github.com/tridadc)
- * [Dave Myron](https://github.com/contentfree)
- * [Ivan Ushakov](https://github.com/IvanUshakov)
- * [Matthieu Paret](https://github.com/mtparet)
- * [Rohit Amarnath](https://github.com/ramarnat)
- * [Joshua Smith](https://github.com/enviable)
- * [Colin Petruno](https://github.com/colinpetruno)
- * [Diego Salido](https://github.com/salidux)
+ ## [A Special Thanks to all Contributors!](CONTRIBUTORS.md) 🎉🎉🎉
 
 
  ## Contributing
data/Rakefile CHANGED
@@ -1,26 +1,19 @@
  #!/usr/bin/env rake
  require "bundler/gem_tasks"
-
  require 'rubygems'
  require 'rake'
-
  require 'rspec/core/rake_task'
 
+ task :default => :spec
+
  desc "Run RSpec"
  RSpec::Core::RakeTask.new do |t|
- t.verbose = false
+ # t.verbose = false
  end
 
- desc "Run specs for all test cases"
- task :spec_all do
- system "rake spec"
+ desc 'Run spec with coverage'
+ task :coverage do
+ ENV['COVERAGE'] = 'true'
+ Rake::Task['spec'].execute
+ `open coverage/index.html`
  end
-
- # task :spec_all do
- # %w[active_record data_mapper mongoid].each do |model_adapter|
- # puts "MODEL_ADAPTER = #{model_adapter}"
- # system "rake spec MODEL_ADAPTER=#{model_adapter}"
- # end
- # end
-
- task :default => :spec
@@ -7,16 +7,9 @@ module SmarterCSV
  class NoColSepDetected < SmarterCSVException; end
 
  def SmarterCSV.process(input, options={}, &block) # first parameter: filename or input object with readline method
- default_options = {:col_sep => ',', :row_sep => $INPUT_RECORD_SEPARATOR, :quote_char => '"', :force_simple_split => false , :verbose => false ,
- :remove_empty_values => true, :remove_zero_values => false , :remove_values_matching => nil , :remove_empty_hashes => true , :strip_whitespace => true,
- :convert_values_to_numeric => true, :strip_chars_from_headers => nil , :user_provided_headers => nil , :headers_in_file => true,
- :comment_regexp => /\A#/, :chunk_size => nil , :key_mapping_hash => nil , :downcase_header => true, :strings_as_keys => false, :file_encoding => 'utf-8',
- :remove_unmapped_keys => false, :keep_original_headers => false, :value_converters => nil, :skip_lines => nil, :force_utf8 => false, :invalid_byte_sequence => '',
- :auto_row_sep_chars => 500, :required_headers => nil
- }
  options = default_options.merge(options)
  options[:invalid_byte_sequence] = '' if options[:invalid_byte_sequence].nil?
- csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
+
  headerA = []
  result = []
  old_row_sep = $INPUT_RECORD_SEPARATOR
@@ -26,22 +19,21 @@ module SmarterCSV
  begin
  f = input.respond_to?(:readline) ? input : File.open(input, "r:#{options[:file_encoding]}")
 
+ # auto-detect the row separator
+ options[:row_sep] = SmarterCSV.guess_line_ending(f, options) if options[:row_sep].to_sym == :auto
+ $INPUT_RECORD_SEPARATOR = options[:row_sep]
  # attempt to auto-detect column separator
- options[:col_sep] = guess_column_separator(f) if options[:col_sep] == 'auto'
+ options[:col_sep] = guess_column_separator(f) if options[:col_sep].to_sym == :auto
+ # preserve options, in case we need to call the CSV class
+ csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
+ csv_options.delete(:row_sep) if [nil, :auto].include?( options[:row_sep].to_sym )
+ csv_options.delete(:col_sep) if [nil, :auto].include?( options[:col_sep].to_sym )
 
  if (options[:force_utf8] || options[:file_encoding] =~ /utf-8/i) && ( f.respond_to?(:external_encoding) && f.external_encoding != Encoding.find('UTF-8') || f.respond_to?(:encoding) && f.encoding != Encoding.find('UTF-8') )
  puts 'WARNING: you are trying to process UTF-8 input, but did not open the input with "b:utf-8" option. See README file "NOTES about File Encodings".'
  end
 
- if options[:row_sep] == :auto
- options[:row_sep] = line_ending = SmarterCSV.guess_line_ending( f, options )
- f.rewind
- end
- $INPUT_RECORD_SEPARATOR = options[:row_sep]
-
- if options[:skip_lines].to_i > 0
- options[:skip_lines].to_i.times{f.readline}
- end
+ options[:skip_lines].to_i.times{f.readline} if options[:skip_lines].to_i > 0
 
  if options[:headers_in_file] # extract the header line
  # process the header line in the CSV file..
@@ -87,7 +79,7 @@ module SmarterCSV
  else
  headerA = file_headerA
  end
- header_size = headerA.size
+ header_size = headerA.size # used for splitting lines
 
  headerA.map!{|x| x.to_sym } unless options[:strings_as_keys] || options[:keep_original_headers]
 
@@ -141,8 +133,8 @@ module SmarterCSV
  # cater for the quoted csv data containing the row separator carriage return character
  # in which case the row data will be split across multiple lines (see the sample content in spec/fixtures/carriage_returns_rn.csv)
  # by detecting the existence of an uneven number of quote characters
- multiline = line.count(options[:quote_char])%2 == 1
- while line.count(options[:quote_char])%2 == 1
+ multiline = line.count(options[:quote_char])%2 == 1 # should handle quote_char nil
+ while line.count(options[:quote_char])%2 == 1 # should handle quote_char nil
  next_line = f.readline
  next_line = next_line.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
  line += next_line
@@ -269,6 +261,39 @@ module SmarterCSV
 
  private
 
+ def self.default_options
+ {
+ auto_row_sep_chars: 500,
+ chunk_size: nil ,
+ col_sep: ',',
+ comment_regexp: /\A#/,
+ convert_values_to_numeric: true,
+ downcase_header: true,
+ file_encoding: 'utf-8',
+ force_simple_split: false ,
+ force_utf8: false,
+ headers_in_file: true,
+ invalid_byte_sequence: '',
+ keep_original_headers: false,
+ key_mapping_hash: nil ,
+ quote_char: '"',
+ remove_empty_hashes: true ,
+ remove_empty_values: true,
+ remove_unmapped_keys: false,
+ remove_values_matching: nil,
+ remove_zero_values: false,
+ required_headers: nil,
+ row_sep: $INPUT_RECORD_SEPARATOR,
+ skip_lines: nil,
+ strings_as_keys: false,
+ strip_chars_from_headers: nil,
+ strip_whitespace: true,
+ user_provided_headers: nil,
+ value_converters: nil,
+ verbose: false,
+ }
+ end
+
  def self.blank?(value)
  case value
  when Array
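Note on the `default_options` extraction: `process` now builds its effective options with `default_options.merge(options)`, so caller-supplied keys override the defaults and everything else falls through. The sketch below shows only that merge behaviour; the four keys are an illustrative subset of the hash in this hunk, and the top-level `default_options` method stands in for the module's private one.

```ruby
# Illustrative stand-in for SmarterCSV's default_options, trimmed to four
# of the keys shown in the hunk above (an assumption for brevity).
def default_options
  { col_sep: ',', quote_char: '"', chunk_size: nil, verbose: false }
end

# Caller-supplied options win over the defaults; untouched keys keep their default.
user_options = { col_sep: :auto, chunk_size: 100 }
options = default_options.merge(user_options)

p options
```

Because the defaults come from a method that returns a fresh hash on each call, the merge never mutates a shared default hash.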
@@ -347,6 +372,8 @@ module SmarterCSV
  lines += 1
  break if options[:auto_row_sep_chars] && options[:auto_row_sep_chars] > 0 && lines >= options[:auto_row_sep_chars]
  end
+ filehandle.rewind
+
  counts["\r"] += 1 if last_char == "\r"
  # find the key/value pair with the largest counter:
  k,_ = counts.max_by{|_,v| v}
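Note on this hunk: the `rewind` call moved from `process` into `guess_line_ending`, so the helper leaves the file handle where it found it. The sketch below illustrates the same idea (sample the input, count candidate line endings, rewind, return the most frequent). It is a simplified stand-in, not the library's implementation: `guess_row_sep` and its `limit` parameter are hypothetical names, and unlike the real code it does not treat quoted fields specially.

```ruby
require 'stringio'

# Hypothetical sketch (not SmarterCSV's guess_line_ending): guess the row
# separator by counting "\r\n", "\r" and "\n" in the first `limit` characters.
def guess_row_sep(io, limit = 500)
  counts = { "\n" => 0, "\r" => 0, "\r\n" => 0 }
  sample = io.read(limit).to_s
  # Rewind inside the helper so the caller can read from the start again.
  io.rewind
  # The alternation lists "\r\n" first so it is counted as one separator.
  sample.scan(/\r\n|\r|\n/) { |sep| counts[sep] += 1 }
  best, count = counts.max_by { |_, v| v }
  # Assumption: fall back to "\n" when the sample contains no line ending.
  count.zero? ? "\n" : best
end

io = StringIO.new("name,city\r\nAnfield,Liverpool\r\n")
p guess_row_sep(io)
p io.pos
```

Keeping the rewind next to the read that made it necessary means callers cannot forget it, which appears to be the motivation for the change.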
@@ -1,3 +1,3 @@
  module SmarterCSV
- VERSION = "1.4.0"
+ VERSION = "1.4.2"
  end
data/lib/smarter_csv.rb CHANGED
@@ -1,3 +1,11 @@
+ if ENV['COVERAGE']
+ require 'simplecov'
+ SimpleCov.start do
+ add_filter "/spec/"
+ add_filter "/pkg/"
+ end
+ end
+
  require 'csv'
  require "smarter_csv/version"
  require "extensions/hash.rb"
data/smarter_csv.gemspec CHANGED
@@ -18,6 +18,7 @@ Gem::Specification.new do |spec|
  spec.require_paths = ["lib"]
  spec.requirements = ['csv'] # for CSV.parse() only needed in case we have quoted fields
  spec.add_development_dependency "rspec"
+ spec.add_development_dependency "simplecov"
  # spec.add_development_dependency "guard-rspec"
 
  spec.metadata["homepage_uri"] = spec.homepage
@@ -3,7 +3,6 @@ require 'spec_helper'
  fixture_path = 'spec/fixtures'
 
  describe 'process files with line endings explicitly pre-specified' do
-
  it 'should process a file with \n for line endings and within data fields' do
  sep = "\n"
  options = {:row_sep => sep}
@@ -83,14 +82,14 @@ describe 'process files with line endings explicitly pre-specified' do
  data[1][:members].should == ["Jimmy Page", "Robert Plant", "John Bonham", "John Paul Jones"].join(text_sep)
  data[1][:albums].should == ["Led Zeppelin", "Led Zeppelin II", "Led Zeppelin III", "Led Zeppelin IV"].join(text_sep)
  end
-
  end
 
  describe 'process files with line endings in automatic mode' do
+ let(:options) { { row_sep: :auto } }
 
  it 'should process a file with \n for line endings and within data fields' do
  sep = "\n"
- data = SmarterCSV.process("#{fixture_path}/carriage_returns_n.csv", {:row_sep => :auto})
+ data = SmarterCSV.process("#{fixture_path}/carriage_returns_n.csv", options)
  data.flatten.size.should == 8
  data[0][:name].should == "Anfield"
  data[0][:street].should == "Anfield Road"
@@ -112,7 +111,29 @@ describe 'process files with line endings in automatic mode' do
 
  it 'should process a file with \r for line endings and within data fields' do
  sep = "\r"
- data = SmarterCSV.process("#{fixture_path}/carriage_returns_r.csv", {:row_sep => :auto})
+ data = SmarterCSV.process("#{fixture_path}/carriage_returns_r.csv", options)
+ data.flatten.size.should == 8
+ data[0][:name].should == "Anfield"
+ data[0][:street].should == "Anfield Road"
+ data[0][:city].should == "Liverpool"
+ data[1][:name].should == ["Highbury", "Highbury House"].join(sep)
+ data[2][:street].should == ["Sir Matt ", "Busby Way"].join(sep)
+ data[3][:city].should == ["Newcastle-upon-tyne ", "Tyne and Wear"].join(sep)
+ data[4][:name].should == ["White Hart Lane", "(The Lane)"].join(sep)
+ data[4][:street].should == ["Bill Nicholson Way ", "748 High Rd"].join(sep)
+ data[4][:city].should == ["Tottenham", "London"].join(sep)
+ data[5][:name].should == "Stamford Bridge"
+ data[5][:street].should == ["Fulham Road", "London"].join(sep)
+ data[5][:city].should be_nil
+ data[6][:name].should == ["Etihad Stadium", "Rowsley St", "Manchester"].join(sep)
+ data[7][:name].should == "Goodison"
+ data[7][:street].should == "Goodison Road"
+ data[7][:city].should == "Liverpool"
+ end
+
+ it 'also works when auto is given a string' do
+ sep = "\r"
+ data = SmarterCSV.process("#{fixture_path}/carriage_returns_r.csv", {row_sep: 'auto'})
  data.flatten.size.should == 8
  data[0][:name].should == "Anfield"
  data[0][:street].should == "Anfield Road"
@@ -134,7 +155,7 @@ describe 'process files with line endings in automatic mode' do
 
  it 'should process a file with \r\n for line endings and within data fields' do
  sep = "\r\n"
- data = SmarterCSV.process("#{fixture_path}/carriage_returns_rn.csv", {:row_sep => :auto})
+ data = SmarterCSV.process("#{fixture_path}/carriage_returns_rn.csv", options)
  data.flatten.size.should == 8
  data[0][:name].should == "Anfield"
  data[0][:street].should == "Anfield Road"
@@ -157,7 +178,7 @@ describe 'process files with line endings in automatic mode' do
  it 'should process a file with more quoted text carriage return characters (\r) than line ending characters (\n)' do
  row_sep = "\n"
  text_sep = "\r"
- data = SmarterCSV.process("#{fixture_path}/carriage_returns_quoted.csv", {:row_sep => :auto})
+ data = SmarterCSV.process("#{fixture_path}/carriage_returns_quoted.csv", options)
  data.flatten.size.should == 2
  data[0][:band].should == "New Order"
  data[0][:members].should == ["Bernard Sumner", "Peter Hook", "Stephen Morris", "Gillian Gilbert"].join(text_sep)
@@ -166,5 +187,4 @@ describe 'process files with line endings in automatic mode' do
  data[1][:members].should == ["Jimmy Page", "Robert Plant", "John Bonham", "John Paul Jones"].join(text_sep)
  data[1][:albums].should == ["Led Zeppelin", "Led Zeppelin II", "Led Zeppelin III", "Led Zeppelin IV"].join(text_sep)
  end
-
  end
@@ -48,7 +48,7 @@ describe 'can handle col_sep' do
  end
 
  describe 'auto-detection of separator' do
- options = {:col_sep => 'auto'}
+ options = {col_sep: :auto}
 
  it 'auto-detects comma separator and loads data' do
  data = SmarterCSV.process("#{fixture_path}/separator_comma.csv", options)
@@ -85,5 +85,11 @@ describe 'can handle col_sep' do
  SmarterCSV.process("#{fixture_path}/binary.csv", options)
  }.to raise_exception SmarterCSV::NoColSepDetected
  end
+
+ it 'also works when auto is given a string' do
+ data = SmarterCSV.process("#{fixture_path}/separator_pipe.csv", col_sep: 'auto')
+ data.first.keys.size.should == 4
+ data.size.should eq 3
+ end
  end
  end
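Note on these specs: they exercise column-separator auto-detection, but `guess_column_separator` itself is not part of this diff. Purely as an illustration of what such detection could look like, the sketch below picks the candidate delimiter that occurs most often in a header line. The method name `guess_col_sep`, the candidate list, and the use of `ArgumentError` (standing in for `SmarterCSV::NoColSepDetected`) are all assumptions, not the gem's code.

```ruby
# Hypothetical sketch, not SmarterCSV's guess_column_separator: choose the
# candidate delimiter with the highest count in the header line.
def guess_col_sep(header_line)
  # Assumed candidate list; the gem's actual list is not shown in this diff.
  candidates = [',', "\t", ';', ':', '|']
  counts = candidates.to_h { |sep| [sep, header_line.count(sep)] }
  sep, count = counts.max_by { |_, n| n }
  # ArgumentError stands in for SmarterCSV::NoColSepDetected here.
  raise ArgumentError, 'no column separator detected' if count.zero?
  sep
end

p guess_col_sep("first|last|dogs|cats")
```

A frequency count on the header is cheap but can be fooled by delimiters inside quoted fields, so a production version would likely need to sample more than one line.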
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: smarter_csv
  version: !ruby/object:Gem::Version
- version: 1.4.0
+ version: 1.4.2
  platform: ruby
  authors:
  - Tilo Sloboda
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2022-02-11 00:00:00.000000000 Z
+ date: 2022-02-15 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rspec
@@ -24,6 +24,20 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
+ - !ruby/object:Gem::Dependency
+ name: simplecov
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
  description: Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with
  optional features for processing large files in parallel, embedded comments, unusual
  field- and record-separators, flexible mapping of CSV-headers to Hash-keys
@@ -38,6 +52,7 @@ files:
  - ".rvmrc"
  - ".travis.yml"
  - CHANGELOG.md
+ - CONTRIBUTORS.md
  - Gemfile
  - LICENSE.txt
  - README.md
@@ -143,7 +158,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  version: '0'
  requirements:
  - csv
- rubygems_version: 3.1.4
+ rubygems_version: 3.1.6
  signing_key:
  specification_version: 4
  summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots