smarter_csv 1.4.0 → 1.4.2
- checksums.yaml +4 -4
- data/.gitignore +2 -0
- data/CHANGELOG.md +6 -2
- data/CONTRIBUTORS.md +45 -0
- data/LICENSE.txt +1 -1
- data/README.md +42 -68
- data/Rakefile +8 -15
- data/lib/smarter_csv/smarter_csv.rb +48 -21
- data/lib/smarter_csv/version.rb +1 -1
- data/lib/smarter_csv.rb +8 -0
- data/smarter_csv.gemspec +1 -0
- data/spec/smarter_csv/carriage_return_spec.rb +27 -7
- data/spec/smarter_csv/column_separator_spec.rb +7 -1
- metadata +18 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 3be724101d41326ff480bcb723c1b40a3cabd879eb55e0c2f044372f8e5a57d0
+data.tar.gz: 657db1421352f449bf042f8df4d5178167af048ad37836e4f2f2f8a6aea3ece0
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 3430649df35ac8139d35b04b85e8691ca5fc3d98b7b15f0d3987855f571987bdb742e0ed6f807ddb7a2e61e61d696d529ac311bc58e30188325f1c4bb78098a4
+data.tar.gz: 1b386af7cc7c39bc7ea934875e16f6641a2cc0c2bb5dfaa3b1f298739b1b355b2f41570e42998a2d7790a17f96feb07118b69c23d913acc634aae5901f0c9229
data/.gitignore
CHANGED
data/CHANGELOG.md
CHANGED
@@ -1,14 +1,18 @@
 
 # SmarterCSV 1.x Change Log
 
-## 1.4.
+## 1.4.1 (2022-02-12)
+* minor fix: also support `col_sep: :auto`
+* added simplecov
+
+## 1.4.0 (2022-02-11)
 * dropped GPL license, smarter_csv is now only using the MIT License
 * added experimental option `col_sep: 'auto` to auto-detect the column separator (issue #183)
 The default behavior is still to assume `,` is the column separator.
 * fixed buggy behavior when using `remove_empty_values: false` (issue #168)
 * fixed Ruby 3.0 deprecation
 
-## 1.3.0 (2022-
+## 1.3.0 (2022-02-06) Breaking code change if you used `--key_mappings`
 * fix bug for key_mappings (issue #181)
 The values of the `key_mappings` hash will now be used "as is", and no longer forced to be symbols
 
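The changelog entries above mention auto-detection of the column separator via `col_sep: :auto`. As a rough illustration of the idea only, here is a minimal sketch that counts a few candidate separators in the first line of the input. The names `guess_separator` and `CANDIDATE_SEPARATORS` and the candidate list are assumptions made for this example; this is not the gem's implementation, which per the specs in this diff raises `SmarterCSV::NoColSepDetected` when nothing is found, whereas the sketch returns `nil`.

```ruby
require 'stringio'

# Hypothetical candidate list, chosen for illustration only.
CANDIDATE_SEPARATORS = [',', "\t", ';', ':', '|'].freeze

# Count each candidate in the first line and return the most frequent one.
# Returns nil when none of the candidates occurs in that line.
def guess_separator(io)
  first_line = io.readline
  io.rewind # leave the handle at the start for the actual parse
  counts = CANDIDATE_SEPARATORS.map { |sep| [sep, first_line.count(sep)] }
  sep, hits = counts.max_by { |_, n| n }
  hits.zero? ? nil : sep
end

puts guess_separator(StringIO.new("first|last|dogs\nDan|McAllister|2\n")).inspect
# prints "|"
```

Looking only at the header line keeps the sketch short; a more careful detector would probably sample several lines and ignore separators inside quoted fields.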
data/CONTRIBUTORS.md
ADDED
@@ -0,0 +1,45 @@
+# A Big Thank You to all the Contributors!!
+
+
+A Big Thank you to everyone who filed issues, sent comments, and who contributed with pull requests:
+
+* [Jack 0](https://github.com/xjlin0)
+* [Alejandro](https://github.com/agaviria)
+* [Lucas Camargo de Almeida](https://github.com/lcalmeida)
+* [Raphaël Bleuse](https://github.com/bleuse)
+* [feens](https://github.com/feens)
+* [César Camacho](https://github.com/chanko)
+* [innhyu](https://github.com/innhyu)
+* [Benjamin Thouret](https://github.com/benichu)
+* [Chris Hilton](https://github.com/chrismhilton)
+* [Sean Duckett](http://github.com/sduckett)
+* [Alex Ong](http://github.com/khaong)
+* [Martin Nilsson](http://github.com/MrTin)
+* [Eustáquio Rangel](http://github.com/taq)
+* [Pavel](http://github.com/paxa)
+* [Félix Bellanger](https://github.com/Keeguon)
+* [Graham Wetzler](https://github.com/grahamwetzler)
+* [Marcos G. Zimmermann](https://github.com/marcosgz)
+* [Jordan Running](https://github.com/jrunning)
+* [Dave Sanders](https://github.com/DaveSanders)
+* [Hugo Lepetit](https://github.com/giglemad)
+* [esBeee](https://github.com/esBeee)
+* [Waldyr de Souza](https://github.com/waldyr)
+* [Ben Maher](https://github.com/benmaher)
+* [Wal McConnell](https://github.com/wal)
+* [Jordan Graft](https://github.com/jordangraft)
+* [Michael](https://github.com/polycarpou)
+* [Kevin Coleman](https://github.com/KevinColemanInc)
+* [Tirdad C.](https://github.com/tridadc)
+* [Dave Myron](https://github.com/contentfree)
+* [Ivan Ushakov](https://github.com/IvanUshakov)
+* [Matthieu Paret](https://github.com/mtparet)
+* [Rohit Amarnath](https://github.com/ramarnat)
+* [Joshua Smith](https://github.com/enviable)
+* [Colin Petruno](https://github.com/colinpetruno)
+* [Diego Salido](https://github.com/salidux)
+* [Elie](https://github.com/elieteyssedou)
+* [Chris Wong](https://github.com/lightwave)
+* [Olle Jonsson](https://github.com/olleolleolle)
+* [Nicolas Guillemain](https://github.com/Viiruus)
+* [Sp6](https://github.com/sp6)
data/LICENSE.txt
CHANGED
data/README.md
CHANGED
@@ -1,17 +1,23 @@
-# SmarterCSV
-
-[](http://travis-ci.org/tilo/smarter_csv) [](http://badge.fury.io/rb/smarter_csv)
 
----------------
 #### Service Announcement
 
 * Work towards SmarterCSV 2.0 is still on it's way, with much improved features, and more streamlined options.
-Please check the 2.0-develop branch, open any issues and pull requests with mention of v2.0.
+Please check the [2.0-develop branch](https://github.com/tilo/smarter_csv/blob/master/README.md), open any issues and pull requests with mention of v2.0.
 
-* New versions
+* New versions of SmarterCSV 1.x will soon print a deprecation warning if you set :verbose to true
 See below for list of deprecated options.
 
+#### Restructured Branches
+
+* default branch is `main` for 1.x development
+* 2.x development is on `2.0-development`
+
 ---------------
+
+# SmarterCSV
+
+[](http://travis-ci.org/tilo/smarter_csv) [](http://badge.fury.io/rb/smarter_csv)
+
 #### SmarterCSV 1.x
 
 `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
@@ -55,6 +61,7 @@ You can also set the `:row_sep` manually! Checkout Example 5 for unusual `:row_s
 #### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
 Please note how each hash contains only the keys for columns with non-null values.
 
+```ruby
 $ cat pets.csv
 first name,last name,dogs,cats,birds,fish
 Dan,McAllister,2,,,
@@ -70,21 +77,25 @@ Please note how each hash contains only the keys for columns with non-null value
 {:first_name=>"Miles", :last_name=>"O'Brian", :fish=>"21"},
 {:first_name=>"Nancy", :last_name=>"Homes", :dogs=>"2", :birds=>"1"}
 ]
+```
 
 
 #### Example 1b: How SmarterCSV processes CSV-files as chunks, returning arrays of hashes:
 Please note how the returned array contains two sub-arrays containing the chunks which were read, each chunk containing 2 hashes.
 In case the number of rows is not cleanly divisible by `:chunk_size`, the last chunk contains fewer hashes.
 
+```ruby
 > pets_by_owner = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}})
 => [ [ {:first=>"Dan", :last=>"McAllister", :dogs=>"2"}, {:first=>"Lucy", :last=>"Laweless", :cats=>"5"} ],
 [ {:first=>"Miles", :last=>"O'Brian", :fish=>"21"}, {:first=>"Nancy", :last=>"Homes", :dogs=>"2", :birds=>"1"} ]
 ]
+```
 
 #### Example 1c: How SmarterCSV processes CSV-files as chunks, and passes arrays of hashes to a given block:
 Please note how the given block is passed the data for each chunk as the parameter (array of hashes),
 and how the `process` method returns the number of chunks when called with a block
 
+```ruby
 > total_chunks = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}}) do |chunk|
 chunk.each do |h| # you can post-process the data from each row to your heart's content, and also create virtual attributes:
 h[:full_name] = [h[:first],h[:last]].join(' ') # create a virtual attribute
@@ -96,16 +107,16 @@ and how the `process` method returns the number of chunks when called with a blo
 [{:dogs=>"2", :full_name=>"Dan McAllister"}, {:cats=>"5", :full_name=>"Lucy Laweless"}]
 [{:fish=>"21", :full_name=>"Miles O'Brian"}, {:dogs=>"2", :birds=>"1", :full_name=>"Nancy Homes"}]
 => 2
-
+```
 #### Example 2: Reading a CSV-File in one Chunk, returning one Array of Hashes:
-
+```ruby
 filename = '/tmp/input_file.txt' # TAB delimited file, each row ending with Control-M
 recordsA = SmarterCSV.process(filename, {:col_sep => "\t", :row_sep => "\cM"}) # no block given
 
 => returns an array of hashes
-
+```
 #### Example 3: Populate a MySQL or MongoDB Database with SmarterCSV:
-
+```ruby
 # without using chunks:
 filename = '/tmp/some.csv'
 options = {:key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
@@ -116,9 +127,9 @@ and how the `process` method returns the number of chunks when called with a blo
 end
 
 => returns number of chunks / rows we processed
-
+```
 #### Example 4: Populate a MongoDB Database in Chunks of 100 records with SmarterCSV:
-
+```ruby
 # using chunks:
 filename = '/tmp/some.csv'
 options = {:chunk_size => 100, :key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
@@ -129,10 +140,10 @@ and how the `process` method returns the number of chunks when called with a blo
 end
 
 => returns number of chunks we processed
-
+```
 
 #### Example 5: Reading a CSV-like File, and Processing it with Resque:
-
+```ruby
 filename = '/tmp/strange_db_dump' # a file with CRTL-A as col_separator, and with CTRL-B\n as record_separator (hello iTunes!)
 options = {
 :col_sep => "\cA", :row_sep => "\cB\n", :comment_regexp => /^#/,
@@ -142,11 +153,11 @@ and how the `process` method returns the number of chunks when called with a blo
 Resque.enque( ResqueWorkerClass, chunk ) # pass chunks of CSV-data to Resque workers for parallel processing
 end
 => returns number of chunks
-
+```
 #### Example 6: Using Value Converters
 
 NOTE: If you use `key_mappings` and `value_converters`, make sure that the value converters has references the keys based on the final mapped name, not the original name in the CSV file.
-
+```ruby
 $ cat spec/fixtures/with_dates.csv
 first,last,date,price
 Ben,Miller,10/30/1998,$44.50
@@ -179,7 +190,7 @@ NOTE: If you use `key_mappings` and `value_converters`, make sure that the value
 => 44.50
 data[0][:price].class
 => Float
-
+```
 ## Parallel Processing
 [Jack](https://github.com/xjlin0) wrote an interesting article about [Speeding up CSV parsing with parallel processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing)
 
@@ -206,7 +217,7 @@ The options and the block are optional.
 | :skip_lines | nil | how many lines to skip before the first line or header line is processed |
 | :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
 ---------------------------------------------------------------------------------------------------------------------------------
-| :col_sep | ',' | column separator, can be set to
+| :col_sep | ',' | column separator, can be set to :auto |
 | :force_simple_split | false | force simple splitting on :col_sep character for non-standard CSV-files. |
 | | | e.g. when :quote_char is not properly escaped |
 | :row_sep | $/ ,"\n" | row separator or record separator , defaults to system's $/ , which defaults to "\n" |
@@ -258,19 +269,19 @@ And header and data validations will also be supported in 2.x
 #### NOTES about File Encodings:
 * if you have a CSV file which contains unicode characters, you can process it as follows:
 
-
+```ruby
 File.open(filename, "r:bom|utf-8") do |f|
 data = SmarterCSV.process(f);
 end
-
+```
 * if the CSV file with unicode characters is in a remote location, similarly you need to give the encoding as an option to the `open` call:
-
+```ruby
 require 'open-uri'
 file_location = 'http://your.remote.org/sample.csv'
 open(file_location, 'r:utf-8') do |f| # don't forget to specify the UTF-8 encoding!!
 data = SmarterCSV.process(f)
 end
-
+```
 #### NOTES about CSV Headers:
 * as this method parses CSV files, it is assumed that the first line of any file will contain a valid header
 * the first line with the CSV header may or may not be commented out according to the :comment_regexp
@@ -304,64 +315,27 @@ And header and data validations will also be supported in 2.x
 ## Installation
 
 Add this line to your application's Gemfile:
-
+```ruby
 gem 'smarter_csv'
-
+```
 And then execute:
-
+```ruby
 $ bundle
-
+```
 Or install it yourself as:
-
+```ruby
 $ gem install smarter_csv
-
+```
 ## [ChangeLog](./CHANGELOG.md)
 
 ## Reporting Bugs / Feature Requests
 
 Please [open an Issue on GitHub](https://github.com/tilo/smarter_csv/issues) if you have feedback, new feature requests, or want to report a bug. Thank you!
 
+* please include a small sample CSV file
+* please mention your version of SmarterCSV, Ruby, Rails
 
-## Special Thanks
-
-Many thanks to people who have filed issues and sent comments.
-And a special thanks to those who contributed pull requests:
-
-* [Jack 0](https://github.com/xjlin0)
-* [Alejandro](https://github.com/agaviria)
-* [Lucas Camargo de Almeida](https://github.com/lcalmeida)
-* [Raphaël Bleuse](https://github.com/bleuse)
-* [feens](https://github.com/feens)
-* [César Camacho](https://github.com/chanko)
-* [innhyu](https://github.com/innhyu)
-* [Benjamin Thouret](https://github.com/benichu)
-* [Chris Hilton](https://github.com/chrismhilton)
-* [Sean Duckett](http://github.com/sduckett)
-* [Alex Ong](http://github.com/khaong)
-* [Martin Nilsson](http://github.com/MrTin)
-* [Eustáquio Rangel](http://github.com/taq)
-* [Pavel](http://github.com/paxa)
-* [Félix Bellanger](https://github.com/Keeguon)
-* [Graham Wetzler](https://github.com/grahamwetzler)
-* [Marcos G. Zimmermann](https://github.com/marcosgz)
-* [Jordan Running](https://github.com/jrunning)
-* [Dave Sanders](https://github.com/DaveSanders)
-* [Hugo Lepetit](https://github.com/giglemad)
-* [esBeee](https://github.com/esBeee)
-* [Waldyr de Souza](https://github.com/waldyr)
-* [Ben Maher](https://github.com/benmaher)
-* [Wal McConnell](https://github.com/wal)
-* [Jordan Graft](https://github.com/jordangraft)
-* [Michael](https://github.com/polycarpou)
-* [Kevin Coleman](https://github.com/KevinColemanInc)
-* [Tirdad C.](https://github.com/tridadc)
-* [Dave Myron](https://github.com/contentfree)
-* [Ivan Ushakov](https://github.com/IvanUshakov)
-* [Matthieu Paret](https://github.com/mtparet)
-* [Rohit Amarnath](https://github.com/ramarnat)
-* [Joshua Smith](https://github.com/enviable)
-* [Colin Petruno](https://github.com/colinpetruno)
-* [Diego Salido](https://github.com/salidux)
+## [A Special Thanks to all Contributors!](CONTRIBUTORS.md) 🎉🎉🎉
 
 
 ## Contributing
data/Rakefile
CHANGED
@@ -1,26 +1,19 @@
 #!/usr/bin/env rake
 require "bundler/gem_tasks"
-
 require 'rubygems'
 require 'rake'
-
 require 'rspec/core/rake_task'
 
+task :default => :spec
+
 desc "Run RSpec"
 RSpec::Core::RakeTask.new do |t|
-t.verbose = false
+# t.verbose = false
 end
 
-desc
-task :
-
+desc 'Run spec with coverage'
+task :coverage do
+ENV['COVERAGE'] = 'true'
+Rake::Task['spec'].execute
+`open coverage/index.html`
 end
-
-# task :spec_all do
-# %w[active_record data_mapper mongoid].each do |model_adapter|
-# puts "MODEL_ADAPTER = #{model_adapter}"
-# system "rake spec MODEL_ADAPTER=#{model_adapter}"
-# end
-# end
-
-task :default => :spec
data/lib/smarter_csv/smarter_csv.rb
CHANGED
@@ -7,16 +7,9 @@ module SmarterCSV
 class NoColSepDetected < SmarterCSVException; end
 
 def SmarterCSV.process(input, options={}, &block) # first parameter: filename or input object with readline method
-default_options = {:col_sep => ',', :row_sep => $INPUT_RECORD_SEPARATOR, :quote_char => '"', :force_simple_split => false , :verbose => false ,
-:remove_empty_values => true, :remove_zero_values => false , :remove_values_matching => nil , :remove_empty_hashes => true , :strip_whitespace => true,
-:convert_values_to_numeric => true, :strip_chars_from_headers => nil , :user_provided_headers => nil , :headers_in_file => true,
-:comment_regexp => /\A#/, :chunk_size => nil , :key_mapping_hash => nil , :downcase_header => true, :strings_as_keys => false, :file_encoding => 'utf-8',
-:remove_unmapped_keys => false, :keep_original_headers => false, :value_converters => nil, :skip_lines => nil, :force_utf8 => false, :invalid_byte_sequence => '',
-:auto_row_sep_chars => 500, :required_headers => nil
-}
 options = default_options.merge(options)
 options[:invalid_byte_sequence] = '' if options[:invalid_byte_sequence].nil?
-
+
 headerA = []
 result = []
 old_row_sep = $INPUT_RECORD_SEPARATOR
@@ -26,22 +19,21 @@ module SmarterCSV
 begin
 f = input.respond_to?(:readline) ? input : File.open(input, "r:#{options[:file_encoding]}")
 
+# auto-detect the row separator
+options[:row_sep] = SmarterCSV.guess_line_ending(f, options) if options[:row_sep].to_sym == :auto
+$INPUT_RECORD_SEPARATOR = options[:row_sep]
 # attempt to auto-detect column separator
-options[:col_sep] = guess_column_separator(f) if options[:col_sep] ==
+options[:col_sep] = guess_column_separator(f) if options[:col_sep].to_sym == :auto
+# preserve options, in case we need to call the CSV class
+csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
+csv_options.delete(:row_sep) if [nil, :auto].include?( options[:row_sep].to_sym )
+csv_options.delete(:col_sep) if [nil, :auto].include?( options[:col_sep].to_sym )
 
 if (options[:force_utf8] || options[:file_encoding] =~ /utf-8/i) && ( f.respond_to?(:external_encoding) && f.external_encoding != Encoding.find('UTF-8') || f.respond_to?(:encoding) && f.encoding != Encoding.find('UTF-8') )
 puts 'WARNING: you are trying to process UTF-8 input, but did not open the input with "b:utf-8" option. See README file "NOTES about File Encodings".'
 end
 
-if options[:
-options[:row_sep] = line_ending = SmarterCSV.guess_line_ending( f, options )
-f.rewind
-end
-$INPUT_RECORD_SEPARATOR = options[:row_sep]
-
-if options[:skip_lines].to_i > 0
-options[:skip_lines].to_i.times{f.readline}
-end
+options[:skip_lines].to_i.times{f.readline} if options[:skip_lines].to_i > 0
 
 if options[:headers_in_file] # extract the header line
 # process the header line in the CSV file..
@@ -87,7 +79,7 @@ module SmarterCSV
 else
 headerA = file_headerA
 end
-header_size = headerA.size
+header_size = headerA.size # used for splitting lines
 
 headerA.map!{|x| x.to_sym } unless options[:strings_as_keys] || options[:keep_original_headers]
 
@@ -141,8 +133,8 @@ module SmarterCSV
 # cater for the quoted csv data containing the row separator carriage return character
 # in which case the row data will be split across multiple lines (see the sample content in spec/fixtures/carriage_returns_rn.csv)
 # by detecting the existence of an uneven number of quote characters
-multiline = line.count(options[:quote_char])%2 == 1
-while line.count(options[:quote_char])%2 == 1
+multiline = line.count(options[:quote_char])%2 == 1 # should handle quote_char nil
+while line.count(options[:quote_char])%2 == 1 # should handle quote_char nil
 next_line = f.readline
 next_line = next_line.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
 line += next_line
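The hunk above joins physical lines while the quote count is odd, and its new comments note that a `nil` quote character is not yet handled (`String#count` raises a `TypeError` when given `nil`). A minimal sketch of the same odd-quote-count idea with a guard for that case follows. The helper names `open_quote?` and `read_record` are hypothetical and exist only for this example; they are not the gem's API.

```ruby
require 'stringio'

# Hypothetical helper: true when the line contains an odd number of quote
# characters, i.e. a quoted field is still open. A nil or empty quote_char
# disables the check instead of raising.
def open_quote?(line, quote_char)
  return false if quote_char.nil? || quote_char.empty?
  line.count(quote_char).odd?
end

# Hypothetical helper: read one logical record, appending further physical
# lines while a quoted field is still open and input remains.
def read_record(io, quote_char = '"')
  line = io.readline
  line += io.readline while open_quote?(line, quote_char) && !io.eof?
  line
end

io = StringIO.new("name,\"Highbury\nHighbury House\",city\nAnfield,Road,Liverpool\n")
puts read_record(io).inspect
puts read_record(io).inspect
```

The first call returns both physical lines of the quoted record as one string; the second returns the following single-line record.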
@@ -269,6 +261,39 @@ module SmarterCSV
 
 private
 
+def self.default_options
+{
+auto_row_sep_chars: 500,
+chunk_size: nil ,
+col_sep: ',',
+comment_regexp: /\A#/,
+convert_values_to_numeric: true,
+downcase_header: true,
+file_encoding: 'utf-8',
+force_simple_split: false ,
+force_utf8: false,
+headers_in_file: true,
+invalid_byte_sequence: '',
+keep_original_headers: false,
+key_mapping_hash: nil ,
+quote_char: '"',
+remove_empty_hashes: true ,
+remove_empty_values: true,
+remove_unmapped_keys: false,
+remove_values_matching: nil,
+remove_zero_values: false,
+required_headers: nil,
+row_sep: $INPUT_RECORD_SEPARATOR,
+skip_lines: nil,
+strings_as_keys: false,
+strip_chars_from_headers: nil,
+strip_whitespace: true,
+user_provided_headers: nil,
+value_converters: nil,
+verbose: false,
+}
+end
+
 def self.blank?(value)
 case value
 when Array
@@ -347,6 +372,8 @@ module SmarterCSV
 lines += 1
 break if options[:auto_row_sep_chars] && options[:auto_row_sep_chars] > 0 && lines >= options[:auto_row_sep_chars]
 end
+filehandle.rewind
+
 counts["\r"] += 1 if last_char == "\r"
 # find the key/value pair with the largest counter:
 k,_ = counts.max_by{|_,v| v}
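The last hunk above moves the `rewind` call into the line-ending detection, so the caller no longer has to reset the file handle itself. A simplified sketch of that pattern is shown below: sample the start of the input, count the line-ending variants, rewind, and return the most frequent one. The name `guess_row_sep` and the regular-expression scan are assumptions made for this example. Unlike the gem's `guess_line_ending`, the sketch does not track quoted fields, so embedded line breaks inside quotes are counted as well.

```ruby
require 'stringio'

# Hypothetical helper: guess the row separator from the first max_chars
# characters and leave the handle rewound for the caller.
def guess_row_sep(io, max_chars = 500)
  sample = io.read(max_chars).to_s
  io.rewind # detection must not consume input the parser still needs
  counts = Hash.new(0)
  # "\r\n" is listed first so it is not counted as separate "\r" and "\n"
  sample.scan(/\r\n|\r|\n/) { |sep| counts[sep] += 1 }
  best, = counts.max_by { |_, n| n }
  best || $/ # fall back to the default record separator when none is seen
end

io = StringIO.new("name,city\r\nAnfield,Liverpool\r\n")
puts guess_row_sep(io).inspect
# prints "\r\n"
```

Rewinding inside the helper keeps the detection side-effect free from the caller's point of view, which matches the direction of the change in the hunk.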
data/lib/smarter_csv/version.rb
CHANGED
data/lib/smarter_csv.rb
CHANGED
data/smarter_csv.gemspec
CHANGED
@@ -18,6 +18,7 @@ Gem::Specification.new do |spec|
 spec.require_paths = ["lib"]
 spec.requirements = ['csv'] # for CSV.parse() only needed in case we have quoted fields
 spec.add_development_dependency "rspec"
+spec.add_development_dependency "simplecov"
 # spec.add_development_dependency "guard-rspec"
 
 spec.metadata["homepage_uri"] = spec.homepage
data/spec/smarter_csv/carriage_return_spec.rb
CHANGED
@@ -3,7 +3,6 @@ require 'spec_helper'
 fixture_path = 'spec/fixtures'
 
 describe 'process files with line endings explicitly pre-specified' do
-
 it 'should process a file with \n for line endings and within data fields' do
 sep = "\n"
 options = {:row_sep => sep}
@@ -83,14 +82,14 @@ describe 'process files with line endings explicitly pre-specified' do
 data[1][:members].should == ["Jimmy Page", "Robert Plant", "John Bonham", "John Paul Jones"].join(text_sep)
 data[1][:albums].should == ["Led Zeppelin", "Led Zeppelin II", "Led Zeppelin III", "Led Zeppelin IV"].join(text_sep)
 end
-
 end
 
 describe 'process files with line endings in automatic mode' do
+let(:options) { { row_sep: :auto } }
 
 it 'should process a file with \n for line endings and within data fields' do
 sep = "\n"
-data = SmarterCSV.process("#{fixture_path}/carriage_returns_n.csv",
+data = SmarterCSV.process("#{fixture_path}/carriage_returns_n.csv", options)
 data.flatten.size.should == 8
 data[0][:name].should == "Anfield"
 data[0][:street].should == "Anfield Road"
@@ -112,7 +111,29 @@ describe 'process files with line endings in automatic mode' do
 
 it 'should process a file with \r for line endings and within data fields' do
 sep = "\r"
-data = SmarterCSV.process("#{fixture_path}/carriage_returns_r.csv",
+data = SmarterCSV.process("#{fixture_path}/carriage_returns_r.csv", options)
+data.flatten.size.should == 8
+data[0][:name].should == "Anfield"
+data[0][:street].should == "Anfield Road"
+data[0][:city].should == "Liverpool"
+data[1][:name].should == ["Highbury", "Highbury House"].join(sep)
+data[2][:street].should == ["Sir Matt ", "Busby Way"].join(sep)
+data[3][:city].should == ["Newcastle-upon-tyne ", "Tyne and Wear"].join(sep)
+data[4][:name].should == ["White Hart Lane", "(The Lane)"].join(sep)
+data[4][:street].should == ["Bill Nicholson Way ", "748 High Rd"].join(sep)
+data[4][:city].should == ["Tottenham", "London"].join(sep)
+data[5][:name].should == "Stamford Bridge"
+data[5][:street].should == ["Fulham Road", "London"].join(sep)
+data[5][:city].should be_nil
+data[6][:name].should == ["Etihad Stadium", "Rowsley St", "Manchester"].join(sep)
+data[7][:name].should == "Goodison"
+data[7][:street].should == "Goodison Road"
+data[7][:city].should == "Liverpool"
+end
+
+it 'also works when auto is given a string' do
+sep = "\r"
+data = SmarterCSV.process("#{fixture_path}/carriage_returns_r.csv", {row_sep: 'auto'})
 data.flatten.size.should == 8
 data[0][:name].should == "Anfield"
 data[0][:street].should == "Anfield Road"
@@ -134,7 +155,7 @@ describe 'process files with line endings in automatic mode' do
 
 it 'should process a file with \r\n for line endings and within data fields' do
 sep = "\r\n"
-data = SmarterCSV.process("#{fixture_path}/carriage_returns_rn.csv",
+data = SmarterCSV.process("#{fixture_path}/carriage_returns_rn.csv", options)
 data.flatten.size.should == 8
 data[0][:name].should == "Anfield"
 data[0][:street].should == "Anfield Road"
@@ -157,7 +178,7 @@ describe 'process files with line endings in automatic mode' do
 it 'should process a file with more quoted text carriage return characters (\r) than line ending characters (\n)' do
 row_sep = "\n"
 text_sep = "\r"
-data = SmarterCSV.process("#{fixture_path}/carriage_returns_quoted.csv",
+data = SmarterCSV.process("#{fixture_path}/carriage_returns_quoted.csv", options)
 data.flatten.size.should == 2
 data[0][:band].should == "New Order"
 data[0][:members].should == ["Bernard Sumner", "Peter Hook", "Stephen Morris", "Gillian Gilbert"].join(text_sep)
@@ -166,5 +187,4 @@ describe 'process files with line endings in automatic mode' do
 data[1][:members].should == ["Jimmy Page", "Robert Plant", "John Bonham", "John Paul Jones"].join(text_sep)
 data[1][:albums].should == ["Led Zeppelin", "Led Zeppelin II", "Led Zeppelin III", "Led Zeppelin IV"].join(text_sep)
 end
-
 end
data/spec/smarter_csv/column_separator_spec.rb
CHANGED
@@ -48,7 +48,7 @@ describe 'can handle col_sep' do
 end
 
 describe 'auto-detection of separator' do
-options = {:
+options = {col_sep: :auto}
 
 it 'auto-detects comma separator and loads data' do
 data = SmarterCSV.process("#{fixture_path}/separator_comma.csv", options)
@@ -85,5 +85,11 @@ describe 'can handle col_sep' do
 SmarterCSV.process("#{fixture_path}/binary.csv", options)
 }.to raise_exception SmarterCSV::NoColSepDetected
 end
+
+it 'also works when auto is given a string' do
+data = SmarterCSV.process("#{fixture_path}/separator_pipe.csv", col_sep: 'auto')
+data.first.keys.size.should == 4
+data.size.should eq 3
+end
 end
 end
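The specs above pass the auto setting both as the symbol `:auto` and as the string `'auto'`, and the library hunks compare with `.to_sym == :auto`. A small sketch of that comparison as a standalone predicate is given below. The name `auto_option?` is hypothetical and not part of the gem; the `respond_to?` guard is an addition for this example so that `nil`, which has no `to_sym`, yields `false` instead of raising `NoMethodError`.

```ruby
# Hypothetical helper: true for :auto and 'auto', false for anything else,
# including nil and values that cannot be converted to a symbol.
def auto_option?(value)
  value.respond_to?(:to_sym) && value.to_sym == :auto
end

puts auto_option?(:auto)  # prints true
puts auto_option?('auto') # prints true
puts auto_option?(',')    # prints false
puts auto_option?(nil)    # prints false
```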
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: smarter_csv
 version: !ruby/object:Gem::Version
-version: 1.4.
+version: 1.4.2
 platform: ruby
 authors:
 - Tilo Sloboda
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2022-02-
+date: 2022-02-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: rspec
@@ -24,6 +24,20 @@ dependencies:
 - - ">="
 - !ruby/object:Gem::Version
 version: '0'
+- !ruby/object:Gem::Dependency
+name: simplecov
+requirement: !ruby/object:Gem::Requirement
+requirements:
+- - ">="
+- !ruby/object:Gem::Version
+version: '0'
+type: :development
+prerelease: false
+version_requirements: !ruby/object:Gem::Requirement
+requirements:
+- - ">="
+- !ruby/object:Gem::Version
+version: '0'
 description: Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with
 optional features for processing large files in parallel, embedded comments, unusual
 field- and record-separators, flexible mapping of CSV-headers to Hash-keys
@@ -38,6 +52,7 @@ files:
 - ".rvmrc"
 - ".travis.yml"
 - CHANGELOG.md
+- CONTRIBUTORS.md
 - Gemfile
 - LICENSE.txt
 - README.md
@@ -143,7 +158,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 version: '0'
 requirements:
 - csv
-rubygems_version: 3.1.
+rubygems_version: 3.1.6
 signing_key:
 specification_version: 4
 summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots