smarter_csv 1.4.0 → 1.4.2
- checksums.yaml +4 -4
- data/.gitignore +2 -0
- data/CHANGELOG.md +6 -2
- data/CONTRIBUTORS.md +45 -0
- data/LICENSE.txt +1 -1
- data/README.md +42 -68
- data/Rakefile +8 -15
- data/lib/smarter_csv/smarter_csv.rb +48 -21
- data/lib/smarter_csv/version.rb +1 -1
- data/lib/smarter_csv.rb +8 -0
- data/smarter_csv.gemspec +1 -0
- data/spec/smarter_csv/carriage_return_spec.rb +27 -7
- data/spec/smarter_csv/column_separator_spec.rb +7 -1
- metadata +18 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3be724101d41326ff480bcb723c1b40a3cabd879eb55e0c2f044372f8e5a57d0
+  data.tar.gz: 657db1421352f449bf042f8df4d5178167af048ad37836e4f2f2f8a6aea3ece0
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3430649df35ac8139d35b04b85e8691ca5fc3d98b7b15f0d3987855f571987bdb742e0ed6f807ddb7a2e61e61d696d529ac311bc58e30188325f1c4bb78098a4
+  data.tar.gz: 1b386af7cc7c39bc7ea934875e16f6641a2cc0c2bb5dfaa3b1f298739b1b355b2f41570e42998a2d7790a17f96feb07118b69c23d913acc634aae5901f0c9229
data/.gitignore
CHANGED
data/CHANGELOG.md
CHANGED
@@ -1,14 +1,18 @@

 # SmarterCSV 1.x Change Log

-## 1.4.
+## 1.4.1 (2022-02-12)
+* minor fix: also support `col_sep: :auto`
+* added simplecov
+
+## 1.4.0 (2022-02-11)
 * dropped GPL license, smarter_csv is now only using the MIT License
 * added experimental option `col_sep: 'auto` to auto-detect the column separator (issue #183)
 The default behavior is still to assume `,` is the column separator.
 * fixed buggy behavior when using `remove_empty_values: false` (issue #168)
 * fixed Ruby 3.0 deprecation

-## 1.3.0 (2022-
+## 1.3.0 (2022-02-06) Breaking code change if you used `--key_mappings`
 * fix bug for key_mappings (issue #181)
 The values of the `key_mappings` hash will now be used "as is", and no longer forced to be symbols

data/CONTRIBUTORS.md
ADDED
@@ -0,0 +1,45 @@
+# A Big Thank You to all the Contributors!!
+
+
+A Big Thank you to everyone who filed issues, sent comments, and who contributed with pull requests:
+
+* [Jack 0](https://github.com/xjlin0)
+* [Alejandro](https://github.com/agaviria)
+* [Lucas Camargo de Almeida](https://github.com/lcalmeida)
+* [Raphaël Bleuse](https://github.com/bleuse)
+* [feens](https://github.com/feens)
+* [César Camacho](https://github.com/chanko)
+* [innhyu](https://github.com/innhyu)
+* [Benjamin Thouret](https://github.com/benichu)
+* [Chris Hilton](https://github.com/chrismhilton)
+* [Sean Duckett](http://github.com/sduckett)
+* [Alex Ong](http://github.com/khaong)
+* [Martin Nilsson](http://github.com/MrTin)
+* [Eustáquio Rangel](http://github.com/taq)
+* [Pavel](http://github.com/paxa)
+* [Félix Bellanger](https://github.com/Keeguon)
+* [Graham Wetzler](https://github.com/grahamwetzler)
+* [Marcos G. Zimmermann](https://github.com/marcosgz)
+* [Jordan Running](https://github.com/jrunning)
+* [Dave Sanders](https://github.com/DaveSanders)
+* [Hugo Lepetit](https://github.com/giglemad)
+* [esBeee](https://github.com/esBeee)
+* [Waldyr de Souza](https://github.com/waldyr)
+* [Ben Maher](https://github.com/benmaher)
+* [Wal McConnell](https://github.com/wal)
+* [Jordan Graft](https://github.com/jordangraft)
+* [Michael](https://github.com/polycarpou)
+* [Kevin Coleman](https://github.com/KevinColemanInc)
+* [Tirdad C.](https://github.com/tridadc)
+* [Dave Myron](https://github.com/contentfree)
+* [Ivan Ushakov](https://github.com/IvanUshakov)
+* [Matthieu Paret](https://github.com/mtparet)
+* [Rohit Amarnath](https://github.com/ramarnat)
+* [Joshua Smith](https://github.com/enviable)
+* [Colin Petruno](https://github.com/colinpetruno)
+* [Diego Salido](https://github.com/salidux)
+* [Elie](https://github.com/elieteyssedou)
+* [Chris Wong](https://github.com/lightwave)
+* [Olle Jonsson](https://github.com/olleolleolle)
+* [Nicolas Guillemain](https://github.com/Viiruus)
+* [Sp6](https://github.com/sp6)
data/LICENSE.txt
CHANGED
data/README.md
CHANGED
@@ -1,17 +1,23 @@
-# SmarterCSV
-
-[![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)

----------------
 #### Service Announcement

 * Work towards SmarterCSV 2.0 is still on it's way, with much improved features, and more streamlined options.
-Please check the 2.0-develop branch, open any issues and pull requests with mention of v2.0.
+Please check the [2.0-develop branch](https://github.com/tilo/smarter_csv/blob/master/README.md), open any issues and pull requests with mention of v2.0.

-* New versions
+* New versions of SmarterCSV 1.x will soon print a deprecation warning if you set :verbose to true
 See below for list of deprecated options.

+#### Restructured Branches
+
+* default branch is `main` for 1.x development
+* 2.x development is on `2.0-development`
+
 ---------------
+
+# SmarterCSV
+
+[![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
+
 #### SmarterCSV 1.x

 `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
@@ -55,6 +61,7 @@ You can also set the `:row_sep` manually! Checkout Example 5 for unusual `:row_s
 #### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
 Please note how each hash contains only the keys for columns with non-null values.

+```ruby
 $ cat pets.csv
 first name,last name,dogs,cats,birds,fish
 Dan,McAllister,2,,,
@@ -70,21 +77,25 @@ Please note how each hash contains only the keys for columns with non-null value
 {:first_name=>"Miles", :last_name=>"O'Brian", :fish=>"21"},
 {:first_name=>"Nancy", :last_name=>"Homes", :dogs=>"2", :birds=>"1"}
 ]
+```


 #### Example 1b: How SmarterCSV processes CSV-files as chunks, returning arrays of hashes:
 Please note how the returned array contains two sub-arrays containing the chunks which were read, each chunk containing 2 hashes.
 In case the number of rows is not cleanly divisible by `:chunk_size`, the last chunk contains fewer hashes.

+```ruby
 > pets_by_owner = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}})
 => [ [ {:first=>"Dan", :last=>"McAllister", :dogs=>"2"}, {:first=>"Lucy", :last=>"Laweless", :cats=>"5"} ],
 [ {:first=>"Miles", :last=>"O'Brian", :fish=>"21"}, {:first=>"Nancy", :last=>"Homes", :dogs=>"2", :birds=>"1"} ]
 ]
+```

 #### Example 1c: How SmarterCSV processes CSV-files as chunks, and passes arrays of hashes to a given block:
 Please note how the given block is passed the data for each chunk as the parameter (array of hashes),
 and how the `process` method returns the number of chunks when called with a block

+```ruby
 > total_chunks = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}}) do |chunk|
 chunk.each do |h| # you can post-process the data from each row to your heart's content, and also create virtual attributes:
 h[:full_name] = [h[:first],h[:last]].join(' ') # create a virtual attribute
@@ -96,16 +107,16 @@ and how the `process` method returns the number of chunks when called with a blo
 [{:dogs=>"2", :full_name=>"Dan McAllister"}, {:cats=>"5", :full_name=>"Lucy Laweless"}]
 [{:fish=>"21", :full_name=>"Miles O'Brian"}, {:dogs=>"2", :birds=>"1", :full_name=>"Nancy Homes"}]
 => 2
-
+```
 #### Example 2: Reading a CSV-File in one Chunk, returning one Array of Hashes:
-
+```ruby
 filename = '/tmp/input_file.txt' # TAB delimited file, each row ending with Control-M
 recordsA = SmarterCSV.process(filename, {:col_sep => "\t", :row_sep => "\cM"}) # no block given

 => returns an array of hashes
-
+```
 #### Example 3: Populate a MySQL or MongoDB Database with SmarterCSV:
-
+```ruby
 # without using chunks:
 filename = '/tmp/some.csv'
 options = {:key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
@@ -116,9 +127,9 @@ and how the `process` method returns the number of chunks when called with a blo
 end

 => returns number of chunks / rows we processed
-
+```
 #### Example 4: Populate a MongoDB Database in Chunks of 100 records with SmarterCSV:
-
+```ruby
 # using chunks:
 filename = '/tmp/some.csv'
 options = {:chunk_size => 100, :key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
@@ -129,10 +140,10 @@ and how the `process` method returns the number of chunks when called with a blo
 end

 => returns number of chunks we processed
-
+```

 #### Example 5: Reading a CSV-like File, and Processing it with Resque:
-
+```ruby
 filename = '/tmp/strange_db_dump' # a file with CRTL-A as col_separator, and with CTRL-B\n as record_separator (hello iTunes!)
 options = {
 :col_sep => "\cA", :row_sep => "\cB\n", :comment_regexp => /^#/,
@@ -142,11 +153,11 @@ and how the `process` method returns the number of chunks when called with a blo
 Resque.enque( ResqueWorkerClass, chunk ) # pass chunks of CSV-data to Resque workers for parallel processing
 end
 => returns number of chunks
-
+```
 #### Example 6: Using Value Converters

 NOTE: If you use `key_mappings` and `value_converters`, make sure that the value converters has references the keys based on the final mapped name, not the original name in the CSV file.
-
+```ruby
 $ cat spec/fixtures/with_dates.csv
 first,last,date,price
 Ben,Miller,10/30/1998,$44.50
@@ -179,7 +190,7 @@ NOTE: If you use `key_mappings` and `value_converters`, make sure that the value
 => 44.50
 data[0][:price].class
 => Float
-
+```
 ## Parallel Processing
 [Jack](https://github.com/xjlin0) wrote an interesting article about [Speeding up CSV parsing with parallel processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing)

@@ -206,7 +217,7 @@ The options and the block are optional.
 | :skip_lines | nil | how many lines to skip before the first line or header line is processed |
 | :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
 ---------------------------------------------------------------------------------------------------------------------------------
-| :col_sep | ',' | column separator, can be set to
+| :col_sep | ',' | column separator, can be set to :auto |
 | :force_simple_split | false | force simple splitting on :col_sep character for non-standard CSV-files. |
 | | | e.g. when :quote_char is not properly escaped |
 | :row_sep | $/ ,"\n" | row separator or record separator , defaults to system's $/ , which defaults to "\n" |
@@ -258,19 +269,19 @@ And header and data validations will also be supported in 2.x
 #### NOTES about File Encodings:
 * if you have a CSV file which contains unicode characters, you can process it as follows:

-
+```ruby
 File.open(filename, "r:bom|utf-8") do |f|
 data = SmarterCSV.process(f);
 end
-
+```
 * if the CSV file with unicode characters is in a remote location, similarly you need to give the encoding as an option to the `open` call:
-
+```ruby
 require 'open-uri'
 file_location = 'http://your.remote.org/sample.csv'
 open(file_location, 'r:utf-8') do |f| # don't forget to specify the UTF-8 encoding!!
 data = SmarterCSV.process(f)
 end
-
+```
 #### NOTES about CSV Headers:
 * as this method parses CSV files, it is assumed that the first line of any file will contain a valid header
 * the first line with the CSV header may or may not be commented out according to the :comment_regexp
@@ -304,64 +315,27 @@ And header and data validations will also be supported in 2.x
 ## Installation

 Add this line to your application's Gemfile:
-
+```ruby
 gem 'smarter_csv'
-
+```
 And then execute:
-
+```ruby
 $ bundle
-
+```
 Or install it yourself as:
-
+```ruby
 $ gem install smarter_csv
-
+```
 ## [ChangeLog](./CHANGELOG.md)

 ## Reporting Bugs / Feature Requests

 Please [open an Issue on GitHub](https://github.com/tilo/smarter_csv/issues) if you have feedback, new feature requests, or want to report a bug. Thank you!

+* please include a small sample CSV file
+* please mention your version of SmarterCSV, Ruby, Rails

-## Special Thanks
-
-Many thanks to people who have filed issues and sent comments.
-And a special thanks to those who contributed pull requests:
-
-* [Jack 0](https://github.com/xjlin0)
-* [Alejandro](https://github.com/agaviria)
-* [Lucas Camargo de Almeida](https://github.com/lcalmeida)
-* [Raphaël Bleuse](https://github.com/bleuse)
-* [feens](https://github.com/feens)
-* [César Camacho](https://github.com/chanko)
-* [innhyu](https://github.com/innhyu)
-* [Benjamin Thouret](https://github.com/benichu)
-* [Chris Hilton](https://github.com/chrismhilton)
-* [Sean Duckett](http://github.com/sduckett)
-* [Alex Ong](http://github.com/khaong)
-* [Martin Nilsson](http://github.com/MrTin)
-* [Eustáquio Rangel](http://github.com/taq)
-* [Pavel](http://github.com/paxa)
-* [Félix Bellanger](https://github.com/Keeguon)
-* [Graham Wetzler](https://github.com/grahamwetzler)
-* [Marcos G. Zimmermann](https://github.com/marcosgz)
-* [Jordan Running](https://github.com/jrunning)
-* [Dave Sanders](https://github.com/DaveSanders)
-* [Hugo Lepetit](https://github.com/giglemad)
-* [esBeee](https://github.com/esBeee)
-* [Waldyr de Souza](https://github.com/waldyr)
-* [Ben Maher](https://github.com/benmaher)
-* [Wal McConnell](https://github.com/wal)
-* [Jordan Graft](https://github.com/jordangraft)
-* [Michael](https://github.com/polycarpou)
-* [Kevin Coleman](https://github.com/KevinColemanInc)
-* [Tirdad C.](https://github.com/tridadc)
-* [Dave Myron](https://github.com/contentfree)
-* [Ivan Ushakov](https://github.com/IvanUshakov)
-* [Matthieu Paret](https://github.com/mtparet)
-* [Rohit Amarnath](https://github.com/ramarnat)
-* [Joshua Smith](https://github.com/enviable)
-* [Colin Petruno](https://github.com/colinpetruno)
-* [Diego Salido](https://github.com/salidux)
+## [A Special Thanks to all Contributors!](CONTRIBUTORS.md) 🎉🎉🎉


 ## Contributing
data/Rakefile
CHANGED
@@ -1,26 +1,19 @@
|
|
1
1
|
#!/usr/bin/env rake
|
2
2
|
require "bundler/gem_tasks"
|
3
|
-
|
4
3
|
require 'rubygems'
|
5
4
|
require 'rake'
|
6
|
-
|
7
5
|
require 'rspec/core/rake_task'
|
8
6
|
|
7
|
+
task :default => :spec
|
8
|
+
|
9
9
|
desc "Run RSpec"
|
10
10
|
RSpec::Core::RakeTask.new do |t|
|
11
|
-
t.verbose = false
|
11
|
+
# t.verbose = false
|
12
12
|
end
|
13
13
|
|
14
|
-
desc
|
15
|
-
task :
|
16
|
-
|
14
|
+
desc 'Run spec with coverage'
|
15
|
+
task :coverage do
|
16
|
+
ENV['COVERAGE'] = 'true'
|
17
|
+
Rake::Task['spec'].execute
|
18
|
+
`open coverage/index.html`
|
17
19
|
end
|
18
|
-
|
19
|
-
# task :spec_all do
|
20
|
-
# %w[active_record data_mapper mongoid].each do |model_adapter|
|
21
|
-
# puts "MODEL_ADAPTER = #{model_adapter}"
|
22
|
-
# system "rake spec MODEL_ADAPTER=#{model_adapter}"
|
23
|
-
# end
|
24
|
-
# end
|
25
|
-
|
26
|
-
task :default => :spec
|
data/lib/smarter_csv/smarter_csv.rb
CHANGED
@@ -7,16 +7,9 @@ module SmarterCSV
   class NoColSepDetected < SmarterCSVException; end

   def SmarterCSV.process(input, options={}, &block) # first parameter: filename or input object with readline method
-    default_options = {:col_sep => ',', :row_sep => $INPUT_RECORD_SEPARATOR, :quote_char => '"', :force_simple_split => false , :verbose => false ,
-      :remove_empty_values => true, :remove_zero_values => false , :remove_values_matching => nil , :remove_empty_hashes => true , :strip_whitespace => true,
-      :convert_values_to_numeric => true, :strip_chars_from_headers => nil , :user_provided_headers => nil , :headers_in_file => true,
-      :comment_regexp => /\A#/, :chunk_size => nil , :key_mapping_hash => nil , :downcase_header => true, :strings_as_keys => false, :file_encoding => 'utf-8',
-      :remove_unmapped_keys => false, :keep_original_headers => false, :value_converters => nil, :skip_lines => nil, :force_utf8 => false, :invalid_byte_sequence => '',
-      :auto_row_sep_chars => 500, :required_headers => nil
-    }
     options = default_options.merge(options)
     options[:invalid_byte_sequence] = '' if options[:invalid_byte_sequence].nil?
-
+
     headerA = []
     result = []
     old_row_sep = $INPUT_RECORD_SEPARATOR
@@ -26,22 +19,21 @@ module SmarterCSV
     begin
       f = input.respond_to?(:readline) ? input : File.open(input, "r:#{options[:file_encoding]}")

+      # auto-detect the row separator
+      options[:row_sep] = SmarterCSV.guess_line_ending(f, options) if options[:row_sep].to_sym == :auto
+      $INPUT_RECORD_SEPARATOR = options[:row_sep]
       # attempt to auto-detect column separator
-      options[:col_sep] = guess_column_separator(f) if options[:col_sep] ==
+      options[:col_sep] = guess_column_separator(f) if options[:col_sep].to_sym == :auto
+      # preserve options, in case we need to call the CSV class
+      csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
+      csv_options.delete(:row_sep) if [nil, :auto].include?( options[:row_sep].to_sym )
+      csv_options.delete(:col_sep) if [nil, :auto].include?( options[:col_sep].to_sym )

       if (options[:force_utf8] || options[:file_encoding] =~ /utf-8/i) && ( f.respond_to?(:external_encoding) && f.external_encoding != Encoding.find('UTF-8') || f.respond_to?(:encoding) && f.encoding != Encoding.find('UTF-8') )
         puts 'WARNING: you are trying to process UTF-8 input, but did not open the input with "b:utf-8" option. See README file "NOTES about File Encodings".'
       end

-      if options[:
-        options[:row_sep] = line_ending = SmarterCSV.guess_line_ending( f, options )
-        f.rewind
-      end
-      $INPUT_RECORD_SEPARATOR = options[:row_sep]
-
-      if options[:skip_lines].to_i > 0
-        options[:skip_lines].to_i.times{f.readline}
-      end
+      options[:skip_lines].to_i.times{f.readline} if options[:skip_lines].to_i > 0

       if options[:headers_in_file] # extract the header line
         # process the header line in the CSV file..
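The added lines in this hunk compare `options[:row_sep].to_sym` and `options[:col_sep].to_sym` against `:auto`, which is why both `:auto` and `'auto'` now trigger detection. A minimal sketch of that normalization follows; `auto_requested?` is a hypothetical helper name, not part of smarter_csv, and the `respond_to?` guard is an addition for illustration because `nil` does not define `to_sym`.

```ruby
# Hypothetical helper (not part of smarter_csv): true when a separator
# option asks for auto-detection, whether given as :auto or 'auto'.
def auto_requested?(value)
  # String#to_sym and Symbol#to_sym both exist; nil has no #to_sym,
  # so check respond_to? before calling it.
  value.respond_to?(:to_sym) && value.to_sym == :auto
end

puts auto_requested?(:auto)   # symbol form
puts auto_requested?('auto')  # string form
puts auto_requested?(',')     # an explicit separator is left alone
puts auto_requested?(nil)     # nil is treated as "not auto" in this sketch
```

An explicit separator such as `','` or `"\t"` converts to a symbol other than `:auto`, so it passes through unchanged.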
@@ -87,7 +79,7 @@ module SmarterCSV
       else
         headerA = file_headerA
       end
-      header_size = headerA.size
+      header_size = headerA.size # used for splitting lines

       headerA.map!{|x| x.to_sym } unless options[:strings_as_keys] || options[:keep_original_headers]

@@ -141,8 +133,8 @@ module SmarterCSV
         # cater for the quoted csv data containing the row separator carriage return character
         # in which case the row data will be split across multiple lines (see the sample content in spec/fixtures/carriage_returns_rn.csv)
         # by detecting the existence of an uneven number of quote characters
-        multiline = line.count(options[:quote_char])%2 == 1
-        while line.count(options[:quote_char])%2 == 1
+        multiline = line.count(options[:quote_char])%2 == 1 # should handle quote_char nil
+        while line.count(options[:quote_char])%2 == 1 # should handle quote_char nil
           next_line = f.readline
           next_line = next_line.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
           line += next_line
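The comments in this hunk describe the multi-line rule: a line holding an odd number of quote characters ends inside a quoted field, so further lines are appended until the quotes balance. The sketch below shows that idea on a `StringIO`; `read_logical_line` is a hypothetical name, and the `eof?` guard is an assumption added here so the loop stops on unbalanced input.

```ruby
require 'stringio'

# Illustrative reader (hypothetical name): joins physical lines until the
# quote characters balance, so a quoted field may span several lines.
def read_logical_line(io, quote_char = '"')
  line = io.readline
  # An odd number of quote characters means a quoted field is still open.
  while line.count(quote_char).odd? && !io.eof?
    line += io.readline
  end
  line
end

io = StringIO.new(%(name,street\n"Highbury\nHouse",Avenell Road\n))
header = read_logical_line(io)  # one physical line
row    = read_logical_line(io)  # two physical lines joined into one record
puts header.inspect
puts row.inspect
```

`String#count` raises a `TypeError` when given `nil`, which is presumably what the `# should handle quote_char nil` remarks in the hunk point at.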
@@ -269,6 +261,39 @@ module SmarterCSV

   private

+  def self.default_options
+    {
+      auto_row_sep_chars: 500,
+      chunk_size: nil ,
+      col_sep: ',',
+      comment_regexp: /\A#/,
+      convert_values_to_numeric: true,
+      downcase_header: true,
+      file_encoding: 'utf-8',
+      force_simple_split: false ,
+      force_utf8: false,
+      headers_in_file: true,
+      invalid_byte_sequence: '',
+      keep_original_headers: false,
+      key_mapping_hash: nil ,
+      quote_char: '"',
+      remove_empty_hashes: true ,
+      remove_empty_values: true,
+      remove_unmapped_keys: false,
+      remove_values_matching: nil,
+      remove_zero_values: false,
+      required_headers: nil,
+      row_sep: $INPUT_RECORD_SEPARATOR,
+      skip_lines: nil,
+      strings_as_keys: false,
+      strip_chars_from_headers: nil,
+      strip_whitespace: true,
+      user_provided_headers: nil,
+      value_converters: nil,
+      verbose: false,
+    }
+  end
+
   def self.blank?(value)
     case value
     when Array
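With the defaults moved into `default_options`, `process` builds its effective options via `default_options.merge(options)`. The sketch below reproduces that merge with a trimmed-down stand-in hash (only three of the keys above) to show which values win; the string-key case at the end is an observation about `Hash#merge` in this sketch, not a documented smarter_csv behavior.

```ruby
# Trimmed-down stand-in for the defaults hash shown in the hunk above;
# only three of the keys are reproduced here.
def default_options
  { col_sep: ',', quote_char: '"', chunk_size: nil }
end

user_options = { col_sep: "\t", chunk_size: 100 }

# Keys given by the caller override the defaults; the rest fall through.
options = default_options.merge(user_options)
puts options[:col_sep].inspect     # caller's tab
puts options[:quote_char].inspect  # default kept
puts options[:chunk_size].inspect  # caller's 100

# Symbol and string keys are distinct hash keys, so a string key is added
# alongside the symbol default instead of replacing it.
mixed = default_options.merge('col_sep' => ';')
puts mixed[:col_sep].inspect       # still the default ','
```

Because the method returns a fresh hash on every call, merging into it cannot alter the defaults seen by later calls.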
@@ -347,6 +372,8 @@ module SmarterCSV
       lines += 1
       break if options[:auto_row_sep_chars] && options[:auto_row_sep_chars] > 0 && lines >= options[:auto_row_sep_chars]
     end
+    filehandle.rewind
+
     counts["\r"] += 1 if last_char == "\r"
     # find the key/value pair with the largest counter:
     k,_ = counts.max_by{|_,v| v}
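This hunk moves the `rewind` into the line-ending guesser itself, so the handle is back at the start before parsing begins. Only the tail of that method is visible in the diff; the following is a simplified sketch built around the visible lines (a counter per candidate, a character limit, `rewind`, then `max_by`), not the gem's implementation. The name `sketch_line_ending` and its keyword parameters are assumptions.

```ruby
require 'stringio'

# Simplified sketch, not the gem's implementation: count candidate line
# endings outside quoted fields, rewind, and return the most frequent one.
def sketch_line_ending(io, quote_char: '"', max_chars: 500)
  counts = { "\n" => 0, "\r" => 0, "\r\n" => 0 }
  in_quotes = false
  last_char = nil
  seen = 0
  io.each_char do |c|
    in_quotes = !in_quotes if c == quote_char
    next if in_quotes # ignore separators inside quoted fields
    if last_char == "\r"
      if c == "\n"
        counts["\r\n"] += 1
      else
        counts["\r"] += 1 # a lone "\r" is counted once its follower is seen
      end
    elsif c == "\n"
      counts["\n"] += 1
    end
    last_char = c
    seen += 1
    break if max_chars && seen >= max_chars
  end
  io.rewind # leave the handle at the start for the parser
  counts["\r"] += 1 if last_char == "\r" # a trailing "\r" has no follower
  counts.max_by { |_, v| v }.first
end

io = StringIO.new("a,b\r\nc,d\r\n")
puts sketch_line_ending(io).inspect
puts io.pos # position after the guess
```

Rewinding inside the helper means callers do not need to remember to do it, which matches the removal of the separate `f.rewind` from `process` earlier in this diff.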
data/lib/smarter_csv/version.rb
CHANGED
data/lib/smarter_csv.rb
CHANGED
data/smarter_csv.gemspec
CHANGED
@@ -18,6 +18,7 @@ Gem::Specification.new do |spec|
   spec.require_paths = ["lib"]
   spec.requirements = ['csv'] # for CSV.parse() only needed in case we have quoted fields
   spec.add_development_dependency "rspec"
+  spec.add_development_dependency "simplecov"
   # spec.add_development_dependency "guard-rspec"

   spec.metadata["homepage_uri"] = spec.homepage
data/spec/smarter_csv/carriage_return_spec.rb
CHANGED
@@ -3,7 +3,6 @@ require 'spec_helper'
 fixture_path = 'spec/fixtures'

 describe 'process files with line endings explicitly pre-specified' do
-
   it 'should process a file with \n for line endings and within data fields' do
     sep = "\n"
     options = {:row_sep => sep}
@@ -83,14 +82,14 @@ describe 'process files with line endings explicitly pre-specified' do
     data[1][:members].should == ["Jimmy Page", "Robert Plant", "John Bonham", "John Paul Jones"].join(text_sep)
     data[1][:albums].should == ["Led Zeppelin", "Led Zeppelin II", "Led Zeppelin III", "Led Zeppelin IV"].join(text_sep)
   end
-
 end

 describe 'process files with line endings in automatic mode' do
+  let(:options) { { row_sep: :auto } }

   it 'should process a file with \n for line endings and within data fields' do
     sep = "\n"
-    data = SmarterCSV.process("#{fixture_path}/carriage_returns_n.csv",
+    data = SmarterCSV.process("#{fixture_path}/carriage_returns_n.csv", options)
     data.flatten.size.should == 8
     data[0][:name].should == "Anfield"
     data[0][:street].should == "Anfield Road"
@@ -112,7 +111,29 @@ describe 'process files with line endings in automatic mode' do

   it 'should process a file with \r for line endings and within data fields' do
     sep = "\r"
-    data = SmarterCSV.process("#{fixture_path}/carriage_returns_r.csv",
+    data = SmarterCSV.process("#{fixture_path}/carriage_returns_r.csv", options)
+    data.flatten.size.should == 8
+    data[0][:name].should == "Anfield"
+    data[0][:street].should == "Anfield Road"
+    data[0][:city].should == "Liverpool"
+    data[1][:name].should == ["Highbury", "Highbury House"].join(sep)
+    data[2][:street].should == ["Sir Matt ", "Busby Way"].join(sep)
+    data[3][:city].should == ["Newcastle-upon-tyne ", "Tyne and Wear"].join(sep)
+    data[4][:name].should == ["White Hart Lane", "(The Lane)"].join(sep)
+    data[4][:street].should == ["Bill Nicholson Way ", "748 High Rd"].join(sep)
+    data[4][:city].should == ["Tottenham", "London"].join(sep)
+    data[5][:name].should == "Stamford Bridge"
+    data[5][:street].should == ["Fulham Road", "London"].join(sep)
+    data[5][:city].should be_nil
+    data[6][:name].should == ["Etihad Stadium", "Rowsley St", "Manchester"].join(sep)
+    data[7][:name].should == "Goodison"
+    data[7][:street].should == "Goodison Road"
+    data[7][:city].should == "Liverpool"
+  end
+
+  it 'also works when auto is given a string' do
+    sep = "\r"
+    data = SmarterCSV.process("#{fixture_path}/carriage_returns_r.csv", {row_sep: 'auto'})
     data.flatten.size.should == 8
     data[0][:name].should == "Anfield"
     data[0][:street].should == "Anfield Road"
@@ -134,7 +155,7 @@ describe 'process files with line endings in automatic mode' do

   it 'should process a file with \r\n for line endings and within data fields' do
     sep = "\r\n"
-    data = SmarterCSV.process("#{fixture_path}/carriage_returns_rn.csv",
+    data = SmarterCSV.process("#{fixture_path}/carriage_returns_rn.csv", options)
     data.flatten.size.should == 8
     data[0][:name].should == "Anfield"
     data[0][:street].should == "Anfield Road"
@@ -157,7 +178,7 @@ describe 'process files with line endings in automatic mode' do
   it 'should process a file with more quoted text carriage return characters (\r) than line ending characters (\n)' do
     row_sep = "\n"
     text_sep = "\r"
-    data = SmarterCSV.process("#{fixture_path}/carriage_returns_quoted.csv",
+    data = SmarterCSV.process("#{fixture_path}/carriage_returns_quoted.csv", options)
     data.flatten.size.should == 2
     data[0][:band].should == "New Order"
     data[0][:members].should == ["Bernard Sumner", "Peter Hook", "Stephen Morris", "Gillian Gilbert"].join(text_sep)
@@ -166,5 +187,4 @@ describe 'process files with line endings in automatic mode' do
     data[1][:members].should == ["Jimmy Page", "Robert Plant", "John Bonham", "John Paul Jones"].join(text_sep)
     data[1][:albums].should == ["Led Zeppelin", "Led Zeppelin II", "Led Zeppelin III", "Led Zeppelin IV"].join(text_sep)
   end
-
 end
data/spec/smarter_csv/column_separator_spec.rb
CHANGED
@@ -48,7 +48,7 @@ describe 'can handle col_sep' do
   end

   describe 'auto-detection of separator' do
-    options = {:
+    options = {col_sep: :auto}

     it 'auto-detects comma separator and loads data' do
       data = SmarterCSV.process("#{fixture_path}/separator_comma.csv", options)
@@ -85,5 +85,11 @@ describe 'can handle col_sep' do
         SmarterCSV.process("#{fixture_path}/binary.csv", options)
       }.to raise_exception SmarterCSV::NoColSepDetected
     end
+
+    it 'also works when auto is given a string' do
+      data = SmarterCSV.process("#{fixture_path}/separator_pipe.csv", col_sep: 'auto')
+      data.first.keys.size.should == 4
+      data.size.should eq 3
+    end
   end
 end
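These examples expect `col_sep: :auto` (or `'auto'`) to find the separator, and `SmarterCSV::NoColSepDetected` to be raised when none can be found. The diff does not show `guess_column_separator`, so the following is only a sketch of one plausible approach: count a few candidate characters over the first lines and pick the most frequent. The candidate list, the sample size, the method name, and the `NoSeparatorFound` class are all assumptions made for illustration.

```ruby
require 'stringio'

# Hypothetical stand-in for SmarterCSV::NoColSepDetected.
class NoSeparatorFound < StandardError; end

# Assumed candidate list; the gem's actual list is not shown in this diff.
CANDIDATES = [',', "\t", ';', ':', '|'].freeze

# Sketch: tally each candidate over the first few lines, rewind the handle,
# and return the most frequent candidate, or raise if none occurs at all.
def sketch_column_separator(io, sample_lines = 5)
  counts = Hash.new(0)
  io.each_line.first(sample_lines).each do |line|
    CANDIDATES.each { |sep| counts[sep] += line.count(sep) }
  end
  io.rewind
  sep, hits = counts.max_by { |_, n| n }
  raise NoSeparatorFound if sep.nil? || hits.zero?
  sep
end

puts sketch_column_separator(StringIO.new("a|b|c|d\n1|2|3|4\n")).inspect
```

A frequency count like this ignores quoting, so a field containing many commas could mislead it; that trade-off is inherent to the sketch, not a statement about the gem.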
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: smarter_csv
 version: !ruby/object:Gem::Version
-  version: 1.4.
+  version: 1.4.2
 platform: ruby
 authors:
 - Tilo Sloboda
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2022-02-
+date: 2022-02-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
@@ -24,6 +24,20 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: simplecov
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 description: Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with
   optional features for processing large files in parallel, embedded comments, unusual
   field- and record-separators, flexible mapping of CSV-headers to Hash-keys
@@ -38,6 +52,7 @@ files:
 - ".rvmrc"
 - ".travis.yml"
 - CHANGELOG.md
+- CONTRIBUTORS.md
 - Gemfile
 - LICENSE.txt
 - README.md
@@ -143,7 +158,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements:
 - csv
-rubygems_version: 3.1.
+rubygems_version: 3.1.6
 signing_key:
 specification_version: 4
 summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots