smarter_csv 1.4.0 → 1.4.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: c8236e4cc8f0081efd9b74f12ad4b5342707d0a2f883414b07538160910008a3
4
- data.tar.gz: b04a53b0030bf6c623aa19fb15c0c6c5ca123ce2ff85d47f176884fffa0f9811
3
+ metadata.gz: 3be724101d41326ff480bcb723c1b40a3cabd879eb55e0c2f044372f8e5a57d0
4
+ data.tar.gz: 657db1421352f449bf042f8df4d5178167af048ad37836e4f2f2f8a6aea3ece0
5
5
  SHA512:
6
- metadata.gz: f2ddaa7bf44362c8bb4439289172d40b6ca926a67a8a35fb335473ddf7349658a629f3008ece5314c6bc5fa17145a2ae89b4d706b9c130a1642a51f2434d5e21
7
- data.tar.gz: b48908b657a07589886873fe251263dabbe6e2333a1fc025dfede085841544458d4498ba1d288a4a7c0de3875d1c14631cc584b2a1cb7fd0be1543b758781dd3
6
+ metadata.gz: 3430649df35ac8139d35b04b85e8691ca5fc3d98b7b15f0d3987855f571987bdb742e0ed6f807ddb7a2e61e61d696d529ac311bc58e30188325f1c4bb78098a4
7
+ data.tar.gz: 1b386af7cc7c39bc7ea934875e16f6641a2cc0c2bb5dfaa3b1f298739b1b355b2f41570e42998a2d7790a17f96feb07118b69c23d913acc634aae5901f0c9229
data/.gitignore CHANGED
@@ -6,3 +6,5 @@
6
6
  .bundle
7
7
  Gemfile.lock
8
8
  pkg/*
9
+ coverage/*
10
+ .DS_Store
data/CHANGELOG.md CHANGED
@@ -1,14 +1,18 @@
1
1
 
2
2
  # SmarterCSV 1.x Change Log
3
3
 
4
- ## 1.4.0 (2022-01-11)
4
+ ## 1.4.1 (2022-02-12)
5
+ * minor fix: also support `col_sep: :auto`
6
+ * added simplecov
7
+
8
+ ## 1.4.0 (2022-02-11)
5
9
  * dropped GPL license, smarter_csv is now only using the MIT License
6
10
  * added experimental option `col_sep: 'auto` to auto-detect the column separator (issue #183)
7
11
  The default behavior is still to assume `,` is the column separator.
8
12
  * fixed buggy behavior when using `remove_empty_values: false` (issue #168)
9
13
  * fixed Ruby 3.0 deprecation
10
14
 
11
- ## 1.3.0 (2022-01-06) Breaking code change if you used `--key_mappings`
15
+ ## 1.3.0 (2022-02-06) Breaking code change if you used `--key_mappings`
12
16
  * fix bug for key_mappings (issue #181)
13
17
  The values of the `key_mappings` hash will now be used "as is", and no longer forced to be symbols
14
18
 
data/CONTRIBUTORS.md ADDED
@@ -0,0 +1,45 @@
1
+ # A Big Thank You to all the Contributors!!
2
+
3
+
4
+ A Big Thank you to everyone who filed issues, sent comments, and who contributed with pull requests:
5
+
6
+ * [Jack 0](https://github.com/xjlin0)
7
+ * [Alejandro](https://github.com/agaviria)
8
+ * [Lucas Camargo de Almeida](https://github.com/lcalmeida)
9
+ * [Raphaël Bleuse](https://github.com/bleuse)
10
+ * [feens](https://github.com/feens)
11
+ * [César Camacho](https://github.com/chanko)
12
+ * [innhyu](https://github.com/innhyu)
13
+ * [Benjamin Thouret](https://github.com/benichu)
14
+ * [Chris Hilton](https://github.com/chrismhilton)
15
+ * [Sean Duckett](http://github.com/sduckett)
16
+ * [Alex Ong](http://github.com/khaong)
17
+ * [Martin Nilsson](http://github.com/MrTin)
18
+ * [Eustáquio Rangel](http://github.com/taq)
19
+ * [Pavel](http://github.com/paxa)
20
+ * [Félix Bellanger](https://github.com/Keeguon)
21
+ * [Graham Wetzler](https://github.com/grahamwetzler)
22
+ * [Marcos G. Zimmermann](https://github.com/marcosgz)
23
+ * [Jordan Running](https://github.com/jrunning)
24
+ * [Dave Sanders](https://github.com/DaveSanders)
25
+ * [Hugo Lepetit](https://github.com/giglemad)
26
+ * [esBeee](https://github.com/esBeee)
27
+ * [Waldyr de Souza](https://github.com/waldyr)
28
+ * [Ben Maher](https://github.com/benmaher)
29
+ * [Wal McConnell](https://github.com/wal)
30
+ * [Jordan Graft](https://github.com/jordangraft)
31
+ * [Michael](https://github.com/polycarpou)
32
+ * [Kevin Coleman](https://github.com/KevinColemanInc)
33
+ * [Tirdad C.](https://github.com/tridadc)
34
+ * [Dave Myron](https://github.com/contentfree)
35
+ * [Ivan Ushakov](https://github.com/IvanUshakov)
36
+ * [Matthieu Paret](https://github.com/mtparet)
37
+ * [Rohit Amarnath](https://github.com/ramarnat)
38
+ * [Joshua Smith](https://github.com/enviable)
39
+ * [Colin Petruno](https://github.com/colinpetruno)
40
+ * [Diego Salido](https://github.com/salidux)
41
+ * [Elie](https://github.com/elieteyssedou)
42
+ * [Chris Wong](https://github.com/lightwave)
43
+ * [Olle Jonsson](https://github.com/olleolleolle)
44
+ * [Nicolas Guillemain](https://github.com/Viiruus)
45
+ * [Sp6](https://github.com/sp6)
data/LICENSE.txt CHANGED
@@ -1,6 +1,6 @@
1
1
  The MIT License (MIT)
2
2
 
3
- Copyright (c) 2022 Tilo Sloboda
3
+ Copyright (c) 2012..2022 Tilo Sloboda
4
4
 
5
5
  Permission is hereby granted, free of charge, to any person obtaining a copy
6
6
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -1,17 +1,23 @@
1
- # SmarterCSV
2
-
3
- [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
4
1
 
5
- ---------------
6
2
  #### Service Announcement
7
3
 
8
4
  * Work towards SmarterCSV 2.0 is still on it's way, with much improved features, and more streamlined options.
9
- Please check the 2.0-develop branch, open any issues and pull requests with mention of v2.0.
5
+ Please check the [2.0-develop branch](https://github.com/tilo/smarter_csv/blob/master/README.md), open any issues and pull requests with mention of v2.0.
10
6
 
11
- * New versions on the 1.2 branch will soon print a deprecation warning if you set :verbose to true
7
+ * New versions of SmarterCSV 1.x will soon print a deprecation warning if you set :verbose to true
12
8
  See below for list of deprecated options.
13
9
 
10
+ #### Restructured Branches
11
+
12
+ * default branch is `main` for 1.x development
13
+ * 2.x development is on `2.0-development`
14
+
14
15
  ---------------
16
+
17
+ # SmarterCSV
18
+
19
+ [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
20
+
15
21
  #### SmarterCSV 1.x
16
22
 
17
23
  `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
@@ -55,6 +61,7 @@ You can also set the `:row_sep` manually! Checkout Example 5 for unusual `:row_s
55
61
  #### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
56
62
  Please note how each hash contains only the keys for columns with non-null values.
57
63
 
64
+ ```ruby
58
65
  $ cat pets.csv
59
66
  first name,last name,dogs,cats,birds,fish
60
67
  Dan,McAllister,2,,,
@@ -70,21 +77,25 @@ Please note how each hash contains only the keys for columns with non-null value
70
77
  {:first_name=>"Miles", :last_name=>"O'Brian", :fish=>"21"},
71
78
  {:first_name=>"Nancy", :last_name=>"Homes", :dogs=>"2", :birds=>"1"}
72
79
  ]
80
+ ```
73
81
 
74
82
 
75
83
  #### Example 1b: How SmarterCSV processes CSV-files as chunks, returning arrays of hashes:
76
84
  Please note how the returned array contains two sub-arrays containing the chunks which were read, each chunk containing 2 hashes.
77
85
  In case the number of rows is not cleanly divisible by `:chunk_size`, the last chunk contains fewer hashes.
78
86
 
87
+ ```ruby
79
88
  > pets_by_owner = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}})
80
89
  => [ [ {:first=>"Dan", :last=>"McAllister", :dogs=>"2"}, {:first=>"Lucy", :last=>"Laweless", :cats=>"5"} ],
81
90
  [ {:first=>"Miles", :last=>"O'Brian", :fish=>"21"}, {:first=>"Nancy", :last=>"Homes", :dogs=>"2", :birds=>"1"} ]
82
91
  ]
92
+ ```
83
93
 
84
94
  #### Example 1c: How SmarterCSV processes CSV-files as chunks, and passes arrays of hashes to a given block:
85
95
  Please note how the given block is passed the data for each chunk as the parameter (array of hashes),
86
96
  and how the `process` method returns the number of chunks when called with a block
87
97
 
98
+ ```ruby
88
99
  > total_chunks = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}}) do |chunk|
89
100
  chunk.each do |h| # you can post-process the data from each row to your heart's content, and also create virtual attributes:
90
101
  h[:full_name] = [h[:first],h[:last]].join(' ') # create a virtual attribute
@@ -96,16 +107,16 @@ and how the `process` method returns the number of chunks when called with a blo
96
107
  [{:dogs=>"2", :full_name=>"Dan McAllister"}, {:cats=>"5", :full_name=>"Lucy Laweless"}]
97
108
  [{:fish=>"21", :full_name=>"Miles O'Brian"}, {:dogs=>"2", :birds=>"1", :full_name=>"Nancy Homes"}]
98
109
  => 2
99
-
110
+ ```
100
111
  #### Example 2: Reading a CSV-File in one Chunk, returning one Array of Hashes:
101
-
112
+ ```ruby
102
113
  filename = '/tmp/input_file.txt' # TAB delimited file, each row ending with Control-M
103
114
  recordsA = SmarterCSV.process(filename, {:col_sep => "\t", :row_sep => "\cM"}) # no block given
104
115
 
105
116
  => returns an array of hashes
106
-
117
+ ```
107
118
  #### Example 3: Populate a MySQL or MongoDB Database with SmarterCSV:
108
-
119
+ ```ruby
109
120
  # without using chunks:
110
121
  filename = '/tmp/some.csv'
111
122
  options = {:key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
@@ -116,9 +127,9 @@ and how the `process` method returns the number of chunks when called with a blo
116
127
  end
117
128
 
118
129
  => returns number of chunks / rows we processed
119
-
130
+ ```
120
131
  #### Example 4: Populate a MongoDB Database in Chunks of 100 records with SmarterCSV:
121
-
132
+ ```ruby
122
133
  # using chunks:
123
134
  filename = '/tmp/some.csv'
124
135
  options = {:chunk_size => 100, :key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
@@ -129,10 +140,10 @@ and how the `process` method returns the number of chunks when called with a blo
129
140
  end
130
141
 
131
142
  => returns number of chunks we processed
132
-
143
+ ```
133
144
 
134
145
  #### Example 5: Reading a CSV-like File, and Processing it with Resque:
135
-
146
+ ```ruby
136
147
  filename = '/tmp/strange_db_dump' # a file with CRTL-A as col_separator, and with CTRL-B\n as record_separator (hello iTunes!)
137
148
  options = {
138
149
  :col_sep => "\cA", :row_sep => "\cB\n", :comment_regexp => /^#/,
@@ -142,11 +153,11 @@ and how the `process` method returns the number of chunks when called with a blo
142
153
  Resque.enque( ResqueWorkerClass, chunk ) # pass chunks of CSV-data to Resque workers for parallel processing
143
154
  end
144
155
  => returns number of chunks
145
-
156
+ ```
146
157
  #### Example 6: Using Value Converters
147
158
 
148
159
  NOTE: If you use `key_mappings` and `value_converters`, make sure that the value converters has references the keys based on the final mapped name, not the original name in the CSV file.
149
-
160
+ ```ruby
150
161
  $ cat spec/fixtures/with_dates.csv
151
162
  first,last,date,price
152
163
  Ben,Miller,10/30/1998,$44.50
@@ -179,7 +190,7 @@ NOTE: If you use `key_mappings` and `value_converters`, make sure that the value
179
190
  => 44.50
180
191
  data[0][:price].class
181
192
  => Float
182
-
193
+ ```
183
194
  ## Parallel Processing
184
195
  [Jack](https://github.com/xjlin0) wrote an interesting article about [Speeding up CSV parsing with parallel processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing)
185
196
 
@@ -206,7 +217,7 @@ The options and the block are optional.
206
217
  | :skip_lines | nil | how many lines to skip before the first line or header line is processed |
207
218
  | :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
208
219
  ---------------------------------------------------------------------------------------------------------------------------------
209
- | :col_sep | ',' | column separator, can be set to 'auto' |
220
+ | :col_sep | ',' | column separator, can be set to :auto |
210
221
  | :force_simple_split | false | force simple splitting on :col_sep character for non-standard CSV-files. |
211
222
  | | | e.g. when :quote_char is not properly escaped |
212
223
  | :row_sep | $/ ,"\n" | row separator or record separator , defaults to system's $/ , which defaults to "\n" |
@@ -258,19 +269,19 @@ And header and data validations will also be supported in 2.x
258
269
  #### NOTES about File Encodings:
259
270
  * if you have a CSV file which contains unicode characters, you can process it as follows:
260
271
 
261
-
272
+ ```ruby
262
273
  File.open(filename, "r:bom|utf-8") do |f|
263
274
  data = SmarterCSV.process(f);
264
275
  end
265
-
276
+ ```
266
277
  * if the CSV file with unicode characters is in a remote location, similarly you need to give the encoding as an option to the `open` call:
267
-
278
+ ```ruby
268
279
  require 'open-uri'
269
280
  file_location = 'http://your.remote.org/sample.csv'
270
281
  open(file_location, 'r:utf-8') do |f| # don't forget to specify the UTF-8 encoding!!
271
282
  data = SmarterCSV.process(f)
272
283
  end
273
-
284
+ ```
274
285
  #### NOTES about CSV Headers:
275
286
  * as this method parses CSV files, it is assumed that the first line of any file will contain a valid header
276
287
  * the first line with the CSV header may or may not be commented out according to the :comment_regexp
@@ -304,64 +315,27 @@ And header and data validations will also be supported in 2.x
304
315
  ## Installation
305
316
 
306
317
  Add this line to your application's Gemfile:
307
-
318
+ ```ruby
308
319
  gem 'smarter_csv'
309
-
320
+ ```
310
321
  And then execute:
311
-
322
+ ```ruby
312
323
  $ bundle
313
-
324
+ ```
314
325
  Or install it yourself as:
315
-
326
+ ```ruby
316
327
  $ gem install smarter_csv
317
-
328
+ ```
318
329
  ## [ChangeLog](./CHANGELOG.md)
319
330
 
320
331
  ## Reporting Bugs / Feature Requests
321
332
 
322
333
  Please [open an Issue on GitHub](https://github.com/tilo/smarter_csv/issues) if you have feedback, new feature requests, or want to report a bug. Thank you!
323
334
 
335
+ * please include a small sample CSV file
336
+ * please mention your version of SmarterCSV, Ruby, Rails
324
337
 
325
- ## Special Thanks
326
-
327
- Many thanks to people who have filed issues and sent comments.
328
- And a special thanks to those who contributed pull requests:
329
-
330
- * [Jack 0](https://github.com/xjlin0)
331
- * [Alejandro](https://github.com/agaviria)
332
- * [Lucas Camargo de Almeida](https://github.com/lcalmeida)
333
- * [Raphaël Bleuse](https://github.com/bleuse)
334
- * [feens](https://github.com/feens)
335
- * [César Camacho](https://github.com/chanko)
336
- * [innhyu](https://github.com/innhyu)
337
- * [Benjamin Thouret](https://github.com/benichu)
338
- * [Chris Hilton](https://github.com/chrismhilton)
339
- * [Sean Duckett](http://github.com/sduckett)
340
- * [Alex Ong](http://github.com/khaong)
341
- * [Martin Nilsson](http://github.com/MrTin)
342
- * [Eustáquio Rangel](http://github.com/taq)
343
- * [Pavel](http://github.com/paxa)
344
- * [Félix Bellanger](https://github.com/Keeguon)
345
- * [Graham Wetzler](https://github.com/grahamwetzler)
346
- * [Marcos G. Zimmermann](https://github.com/marcosgz)
347
- * [Jordan Running](https://github.com/jrunning)
348
- * [Dave Sanders](https://github.com/DaveSanders)
349
- * [Hugo Lepetit](https://github.com/giglemad)
350
- * [esBeee](https://github.com/esBeee)
351
- * [Waldyr de Souza](https://github.com/waldyr)
352
- * [Ben Maher](https://github.com/benmaher)
353
- * [Wal McConnell](https://github.com/wal)
354
- * [Jordan Graft](https://github.com/jordangraft)
355
- * [Michael](https://github.com/polycarpou)
356
- * [Kevin Coleman](https://github.com/KevinColemanInc)
357
- * [Tirdad C.](https://github.com/tridadc)
358
- * [Dave Myron](https://github.com/contentfree)
359
- * [Ivan Ushakov](https://github.com/IvanUshakov)
360
- * [Matthieu Paret](https://github.com/mtparet)
361
- * [Rohit Amarnath](https://github.com/ramarnat)
362
- * [Joshua Smith](https://github.com/enviable)
363
- * [Colin Petruno](https://github.com/colinpetruno)
364
- * [Diego Salido](https://github.com/salidux)
338
+ ## [A Special Thanks to all Contributors!](CONTRIBUTORS.md) 🎉🎉🎉
365
339
 
366
340
 
367
341
  ## Contributing
data/Rakefile CHANGED
@@ -1,26 +1,19 @@
1
1
  #!/usr/bin/env rake
2
2
  require "bundler/gem_tasks"
3
-
4
3
  require 'rubygems'
5
4
  require 'rake'
6
-
7
5
  require 'rspec/core/rake_task'
8
6
 
7
+ task :default => :spec
8
+
9
9
  desc "Run RSpec"
10
10
  RSpec::Core::RakeTask.new do |t|
11
- t.verbose = false
11
+ # t.verbose = false
12
12
  end
13
13
 
14
- desc "Run specs for all test cases"
15
- task :spec_all do
16
- system "rake spec"
14
+ desc 'Run spec with coverage'
15
+ task :coverage do
16
+ ENV['COVERAGE'] = 'true'
17
+ Rake::Task['spec'].execute
18
+ `open coverage/index.html`
17
19
  end
18
-
19
- # task :spec_all do
20
- # %w[active_record data_mapper mongoid].each do |model_adapter|
21
- # puts "MODEL_ADAPTER = #{model_adapter}"
22
- # system "rake spec MODEL_ADAPTER=#{model_adapter}"
23
- # end
24
- # end
25
-
26
- task :default => :spec
@@ -7,16 +7,9 @@ module SmarterCSV
7
7
  class NoColSepDetected < SmarterCSVException; end
8
8
 
9
9
  def SmarterCSV.process(input, options={}, &block) # first parameter: filename or input object with readline method
10
- default_options = {:col_sep => ',', :row_sep => $INPUT_RECORD_SEPARATOR, :quote_char => '"', :force_simple_split => false , :verbose => false ,
11
- :remove_empty_values => true, :remove_zero_values => false , :remove_values_matching => nil , :remove_empty_hashes => true , :strip_whitespace => true,
12
- :convert_values_to_numeric => true, :strip_chars_from_headers => nil , :user_provided_headers => nil , :headers_in_file => true,
13
- :comment_regexp => /\A#/, :chunk_size => nil , :key_mapping_hash => nil , :downcase_header => true, :strings_as_keys => false, :file_encoding => 'utf-8',
14
- :remove_unmapped_keys => false, :keep_original_headers => false, :value_converters => nil, :skip_lines => nil, :force_utf8 => false, :invalid_byte_sequence => '',
15
- :auto_row_sep_chars => 500, :required_headers => nil
16
- }
17
10
  options = default_options.merge(options)
18
11
  options[:invalid_byte_sequence] = '' if options[:invalid_byte_sequence].nil?
19
- csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
12
+
20
13
  headerA = []
21
14
  result = []
22
15
  old_row_sep = $INPUT_RECORD_SEPARATOR
@@ -26,22 +19,21 @@ module SmarterCSV
26
19
  begin
27
20
  f = input.respond_to?(:readline) ? input : File.open(input, "r:#{options[:file_encoding]}")
28
21
 
22
+ # auto-detect the row separator
23
+ options[:row_sep] = SmarterCSV.guess_line_ending(f, options) if options[:row_sep].to_sym == :auto
24
+ $INPUT_RECORD_SEPARATOR = options[:row_sep]
29
25
  # attempt to auto-detect column separator
30
- options[:col_sep] = guess_column_separator(f) if options[:col_sep] == 'auto'
26
+ options[:col_sep] = guess_column_separator(f) if options[:col_sep].to_sym == :auto
27
+ # preserve options, in case we need to call the CSV class
28
+ csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
29
+ csv_options.delete(:row_sep) if [nil, :auto].include?( options[:row_sep].to_sym )
30
+ csv_options.delete(:col_sep) if [nil, :auto].include?( options[:col_sep].to_sym )
31
31
 
32
32
  if (options[:force_utf8] || options[:file_encoding] =~ /utf-8/i) && ( f.respond_to?(:external_encoding) && f.external_encoding != Encoding.find('UTF-8') || f.respond_to?(:encoding) && f.encoding != Encoding.find('UTF-8') )
33
33
  puts 'WARNING: you are trying to process UTF-8 input, but did not open the input with "b:utf-8" option. See README file "NOTES about File Encodings".'
34
34
  end
35
35
 
36
- if options[:row_sep] == :auto
37
- options[:row_sep] = line_ending = SmarterCSV.guess_line_ending( f, options )
38
- f.rewind
39
- end
40
- $INPUT_RECORD_SEPARATOR = options[:row_sep]
41
-
42
- if options[:skip_lines].to_i > 0
43
- options[:skip_lines].to_i.times{f.readline}
44
- end
36
+ options[:skip_lines].to_i.times{f.readline} if options[:skip_lines].to_i > 0
45
37
 
46
38
  if options[:headers_in_file] # extract the header line
47
39
  # process the header line in the CSV file..
@@ -87,7 +79,7 @@ module SmarterCSV
87
79
  else
88
80
  headerA = file_headerA
89
81
  end
90
- header_size = headerA.size
82
+ header_size = headerA.size # used for splitting lines
91
83
 
92
84
  headerA.map!{|x| x.to_sym } unless options[:strings_as_keys] || options[:keep_original_headers]
93
85
 
@@ -141,8 +133,8 @@ module SmarterCSV
141
133
  # cater for the quoted csv data containing the row separator carriage return character
142
134
  # in which case the row data will be split across multiple lines (see the sample content in spec/fixtures/carriage_returns_rn.csv)
143
135
  # by detecting the existence of an uneven number of quote characters
144
- multiline = line.count(options[:quote_char])%2 == 1
145
- while line.count(options[:quote_char])%2 == 1
136
+ multiline = line.count(options[:quote_char])%2 == 1 # should handle quote_char nil
137
+ while line.count(options[:quote_char])%2 == 1 # should handle quote_char nil
146
138
  next_line = f.readline
147
139
  next_line = next_line.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
148
140
  line += next_line
@@ -269,6 +261,39 @@ module SmarterCSV
269
261
 
270
262
  private
271
263
 
264
+ def self.default_options
265
+ {
266
+ auto_row_sep_chars: 500,
267
+ chunk_size: nil ,
268
+ col_sep: ',',
269
+ comment_regexp: /\A#/,
270
+ convert_values_to_numeric: true,
271
+ downcase_header: true,
272
+ file_encoding: 'utf-8',
273
+ force_simple_split: false ,
274
+ force_utf8: false,
275
+ headers_in_file: true,
276
+ invalid_byte_sequence: '',
277
+ keep_original_headers: false,
278
+ key_mapping_hash: nil ,
279
+ quote_char: '"',
280
+ remove_empty_hashes: true ,
281
+ remove_empty_values: true,
282
+ remove_unmapped_keys: false,
283
+ remove_values_matching: nil,
284
+ remove_zero_values: false,
285
+ required_headers: nil,
286
+ row_sep: $INPUT_RECORD_SEPARATOR,
287
+ skip_lines: nil,
288
+ strings_as_keys: false,
289
+ strip_chars_from_headers: nil,
290
+ strip_whitespace: true,
291
+ user_provided_headers: nil,
292
+ value_converters: nil,
293
+ verbose: false,
294
+ }
295
+ end
296
+
272
297
  def self.blank?(value)
273
298
  case value
274
299
  when Array
@@ -347,6 +372,8 @@ module SmarterCSV
347
372
  lines += 1
348
373
  break if options[:auto_row_sep_chars] && options[:auto_row_sep_chars] > 0 && lines >= options[:auto_row_sep_chars]
349
374
  end
375
+ filehandle.rewind
376
+
350
377
  counts["\r"] += 1 if last_char == "\r"
351
378
  # find the key/value pair with the largest counter:
352
379
  k,_ = counts.max_by{|_,v| v}
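The hunk above adds `filehandle.rewind` inside what appears to be the line-ending detection routine, while the earlier hunk drops the caller's own `f.rewind`. The effect is that the sniffing pass leaves the handle back at the start of the input, whoever calls it. The following is a minimal sketch of that idea and not the gem's implementation: `sniff_line_ending`, `max_chars`, and `quote_char` are hypothetical names chosen for illustration, and the counting logic is simplified.

```ruby
require 'stringio'

# Hypothetical sketch, NOT SmarterCSV's code: count candidate line endings
# outside of quoted fields, rewind the handle, and return the most frequent one.
def sniff_line_ending(io, max_chars: 500, quote_char: '"')
  counts = { "\n" => 0, "\r" => 0, "\r\n" => 0 }
  in_quotes = false
  last_char = nil
  seen = 0

  io.each_char do |c|
    seen += 1
    in_quotes = !in_quotes if c == quote_char
    unless in_quotes
      if last_char == "\r"
        # a "\r" is only classified once we know what follows it
        if c == "\n"
          counts["\r\n"] += 1
        else
          counts["\r"] += 1
        end
      elsif c == "\n"
        counts["\n"] += 1
      end
      last_char = c
    end
    break if seen >= max_chars
  end

  # Rewind here, so callers do not have to remember to do it.
  io.rewind

  counts["\r"] += 1 if last_char == "\r" # trailing "\r" at the end of the sample
  counts.max_by { |_, v| v }.first
end

io = StringIO.new("a,b\r\n1,2\r\n")
sniff_line_ending(io) # => "\r\n"
io.pos                # => 0, the handle is ready for the real parse
```

Keeping the rewind next to the read that moves the position means a later caller cannot forget it, which is presumably why the diff relocates it.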
@@ -1,3 +1,3 @@
1
1
  module SmarterCSV
2
- VERSION = "1.4.0"
2
+ VERSION = "1.4.2"
3
3
  end
data/lib/smarter_csv.rb CHANGED
@@ -1,3 +1,11 @@
1
+ if ENV['COVERAGE']
2
+ require 'simplecov'
3
+ SimpleCov.start do
4
+ add_filter "/spec/"
5
+ add_filter "/pkg/"
6
+ end
7
+ end
8
+
1
9
  require 'csv'
2
10
  require "smarter_csv/version"
3
11
  require "extensions/hash.rb"
data/smarter_csv.gemspec CHANGED
@@ -18,6 +18,7 @@ Gem::Specification.new do |spec|
18
18
  spec.require_paths = ["lib"]
19
19
  spec.requirements = ['csv'] # for CSV.parse() only needed in case we have quoted fields
20
20
  spec.add_development_dependency "rspec"
21
+ spec.add_development_dependency "simplecov"
21
22
  # spec.add_development_dependency "guard-rspec"
22
23
 
23
24
  spec.metadata["homepage_uri"] = spec.homepage
@@ -3,7 +3,6 @@ require 'spec_helper'
3
3
  fixture_path = 'spec/fixtures'
4
4
 
5
5
  describe 'process files with line endings explicitly pre-specified' do
6
-
7
6
  it 'should process a file with \n for line endings and within data fields' do
8
7
  sep = "\n"
9
8
  options = {:row_sep => sep}
@@ -83,14 +82,14 @@ describe 'process files with line endings explicitly pre-specified' do
83
82
  data[1][:members].should == ["Jimmy Page", "Robert Plant", "John Bonham", "John Paul Jones"].join(text_sep)
84
83
  data[1][:albums].should == ["Led Zeppelin", "Led Zeppelin II", "Led Zeppelin III", "Led Zeppelin IV"].join(text_sep)
85
84
  end
86
-
87
85
  end
88
86
 
89
87
  describe 'process files with line endings in automatic mode' do
88
+ let(:options) { { row_sep: :auto } }
90
89
 
91
90
  it 'should process a file with \n for line endings and within data fields' do
92
91
  sep = "\n"
93
- data = SmarterCSV.process("#{fixture_path}/carriage_returns_n.csv", {:row_sep => :auto})
92
+ data = SmarterCSV.process("#{fixture_path}/carriage_returns_n.csv", options)
94
93
  data.flatten.size.should == 8
95
94
  data[0][:name].should == "Anfield"
96
95
  data[0][:street].should == "Anfield Road"
@@ -112,7 +111,29 @@ describe 'process files with line endings in automatic mode' do
112
111
 
113
112
  it 'should process a file with \r for line endings and within data fields' do
114
113
  sep = "\r"
115
- data = SmarterCSV.process("#{fixture_path}/carriage_returns_r.csv", {:row_sep => :auto})
114
+ data = SmarterCSV.process("#{fixture_path}/carriage_returns_r.csv", options)
115
+ data.flatten.size.should == 8
116
+ data[0][:name].should == "Anfield"
117
+ data[0][:street].should == "Anfield Road"
118
+ data[0][:city].should == "Liverpool"
119
+ data[1][:name].should == ["Highbury", "Highbury House"].join(sep)
120
+ data[2][:street].should == ["Sir Matt ", "Busby Way"].join(sep)
121
+ data[3][:city].should == ["Newcastle-upon-tyne ", "Tyne and Wear"].join(sep)
122
+ data[4][:name].should == ["White Hart Lane", "(The Lane)"].join(sep)
123
+ data[4][:street].should == ["Bill Nicholson Way ", "748 High Rd"].join(sep)
124
+ data[4][:city].should == ["Tottenham", "London"].join(sep)
125
+ data[5][:name].should == "Stamford Bridge"
126
+ data[5][:street].should == ["Fulham Road", "London"].join(sep)
127
+ data[5][:city].should be_nil
128
+ data[6][:name].should == ["Etihad Stadium", "Rowsley St", "Manchester"].join(sep)
129
+ data[7][:name].should == "Goodison"
130
+ data[7][:street].should == "Goodison Road"
131
+ data[7][:city].should == "Liverpool"
132
+ end
133
+
134
+ it 'also works when auto is given a string' do
135
+ sep = "\r"
136
+ data = SmarterCSV.process("#{fixture_path}/carriage_returns_r.csv", {row_sep: 'auto'})
116
137
  data.flatten.size.should == 8
117
138
  data[0][:name].should == "Anfield"
118
139
  data[0][:street].should == "Anfield Road"
@@ -134,7 +155,7 @@ describe 'process files with line endings in automatic mode' do
134
155
 
135
156
  it 'should process a file with \r\n for line endings and within data fields' do
136
157
  sep = "\r\n"
137
- data = SmarterCSV.process("#{fixture_path}/carriage_returns_rn.csv", {:row_sep => :auto})
158
+ data = SmarterCSV.process("#{fixture_path}/carriage_returns_rn.csv", options)
138
159
  data.flatten.size.should == 8
139
160
  data[0][:name].should == "Anfield"
140
161
  data[0][:street].should == "Anfield Road"
@@ -157,7 +178,7 @@ describe 'process files with line endings in automatic mode' do
157
178
  it 'should process a file with more quoted text carriage return characters (\r) than line ending characters (\n)' do
158
179
  row_sep = "\n"
159
180
  text_sep = "\r"
160
- data = SmarterCSV.process("#{fixture_path}/carriage_returns_quoted.csv", {:row_sep => :auto})
181
+ data = SmarterCSV.process("#{fixture_path}/carriage_returns_quoted.csv", options)
161
182
  data.flatten.size.should == 2
162
183
  data[0][:band].should == "New Order"
163
184
  data[0][:members].should == ["Bernard Sumner", "Peter Hook", "Stephen Morris", "Gillian Gilbert"].join(text_sep)
@@ -166,5 +187,4 @@ describe 'process files with line endings in automatic mode' do
166
187
  data[1][:members].should == ["Jimmy Page", "Robert Plant", "John Bonham", "John Paul Jones"].join(text_sep)
167
188
  data[1][:albums].should == ["Led Zeppelin", "Led Zeppelin II", "Led Zeppelin III", "Led Zeppelin IV"].join(text_sep)
168
189
  end
169
-
170
190
  end
@@ -48,7 +48,7 @@ describe 'can handle col_sep' do
48
48
  end
49
49
 
50
50
  describe 'auto-detection of separator' do
51
- options = {:col_sep => 'auto'}
51
+ options = {col_sep: :auto}
52
52
 
53
53
  it 'auto-detects comma separator and loads data' do
54
54
  data = SmarterCSV.process("#{fixture_path}/separator_comma.csv", options)
@@ -85,5 +85,11 @@ describe 'can handle col_sep' do
85
85
  SmarterCSV.process("#{fixture_path}/binary.csv", options)
86
86
  }.to raise_exception SmarterCSV::NoColSepDetected
87
87
  end
88
+
89
+ it 'also works when auto is given a string' do
90
+ data = SmarterCSV.process("#{fixture_path}/separator_pipe.csv", col_sep: 'auto')
91
+ data.first.keys.size.should == 4
92
+ data.size.should eq 3
93
+ end
88
94
  end
89
95
  end
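The specs above exercise both `col_sep: :auto` and `col_sep: 'auto'`, which the library change supports by comparing `options[:col_sep].to_sym` against `:auto`. A minimal sketch of that normalization follows. `auto_requested?` is a hypothetical helper written for illustration, not a SmarterCSV method, and the `respond_to?` guard is an addition of this sketch: `nil` has no `to_sym`, so calling it unguarded on a `nil` option would raise `NoMethodError`.

```ruby
# Hypothetical helper, NOT part of SmarterCSV: treat the string 'auto'
# and the symbol :auto as the same request for auto-detection.
def auto_requested?(value)
  # Guard first: nil (and other objects without #to_sym) never mean "auto".
  value.respond_to?(:to_sym) && value.to_sym == :auto
end

auto_requested?(:auto)  # => true
auto_requested?('auto') # => true
auto_requested?(',')    # => false
auto_requested?(nil)    # => false
```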
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: smarter_csv
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.4.0
4
+ version: 1.4.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Tilo Sloboda
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2022-02-11 00:00:00.000000000 Z
11
+ date: 2022-02-15 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rspec
@@ -24,6 +24,20 @@ dependencies:
24
24
  - - ">="
25
25
  - !ruby/object:Gem::Version
26
26
  version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: simplecov
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
27
41
  description: Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with
28
42
  optional features for processing large files in parallel, embedded comments, unusual
29
43
  field- and record-separators, flexible mapping of CSV-headers to Hash-keys
@@ -38,6 +52,7 @@ files:
38
52
  - ".rvmrc"
39
53
  - ".travis.yml"
40
54
  - CHANGELOG.md
55
+ - CONTRIBUTORS.md
41
56
  - Gemfile
42
57
  - LICENSE.txt
43
58
  - README.md
@@ -143,7 +158,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
143
158
  version: '0'
144
159
  requirements:
145
160
  - csv
146
- rubygems_version: 3.1.4
161
+ rubygems_version: 3.1.6
147
162
  signing_key:
148
163
  specification_version: 4
149
164
  summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots