smarter_csv 1.2.6 → 1.4.0
- checksums.yaml +4 -4
- data/.travis.yml +10 -4
- data/CHANGELOG.md +165 -0
- data/Gemfile +0 -1
- data/LICENSE.txt +21 -0
- data/README.md +10 -156
- data/lib/smarter_csv/smarter_csv.rb +68 -18
- data/lib/smarter_csv/version.rb +1 -1
- data/smarter_csv.gemspec +19 -16
- data/spec/fixtures/empty_columns_1.csv +2 -0
- data/spec/fixtures/empty_columns_2.csv +2 -0
- data/spec/fixtures/key_mapping.csv +2 -0
- data/spec/fixtures/numeric.csv +1 -1
- data/spec/fixtures/separator_colon.csv +4 -0
- data/spec/fixtures/separator_comma.csv +4 -0
- data/spec/fixtures/separator_pipe.csv +4 -0
- data/spec/fixtures/{separator.csv → separator_semi.csv} +0 -0
- data/spec/fixtures/separator_tab.csv +4 -0
- data/spec/smarter_csv/blank_spec.rb +55 -0
- data/spec/smarter_csv/column_separator_spec.rb +83 -5
- data/spec/smarter_csv/empty_columns_spec.rb +74 -0
- data/spec/smarter_csv/key_mapping_spec.rb +31 -0
- data/spec/smarter_csv/malformed_spec.rb +0 -4
- metadata +32 -17
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c8236e4cc8f0081efd9b74f12ad4b5342707d0a2f883414b07538160910008a3
+  data.tar.gz: b04a53b0030bf6c623aa19fb15c0c6c5ca123ce2ff85d47f176884fffa0f9811
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f2ddaa7bf44362c8bb4439289172d40b6ca926a67a8a35fb335473ddf7349658a629f3008ece5314c6bc5fa17145a2ae89b4d706b9c130a1642a51f2434d5e21
+  data.tar.gz: b48908b657a07589886873fe251263dabbe6e2333a1fc025dfede085841544458d4498ba1d288a4a7c0de3875d1c14631cc584b2a1cb7fd0be1543b758781dd3
data/.travis.yml
CHANGED
@@ -6,10 +6,16 @@ before_install:
 
 matrix:
   include:
-    - rvm: 2.2.
-    - rvm: 2.3.
-    - rvm: 2.4.
-    - rvm:
+    - rvm: 2.2.10
+    - rvm: 2.3.8
+    - rvm: 2.4.10
+    - rvm: 2.5.8
+    - rvm: 2.6.9
+    - rvm: 2.7.5
+    - rvm: 3.0.3
+    - rvm: 3.1.0
+    - rvm: jruby-9.2.19.0
+    - rvm: jruby-9.3.3.0
       env:
         - JRUBY_OPTS="--server -Xcompile.invokedynamic=false -J-XX:+TieredCompilation -J-XX:TieredStopAtLevel=1 -J-noverify -J-Xms512m -J-Xmx1024m"
     - rvm: ruby-head
data/CHANGELOG.md
ADDED
@@ -0,0 +1,165 @@
|
|
1
|
+
|
2
|
+
# SmarterCSV 1.x Change Log
|
3
|
+
|
4
|
+
## 1.4.0 (2022-01-11)
|
5
|
+
* dropped GPL license, smarter_csv is now only using the MIT License
|
6
|
+
* added experimental option `col_sep: 'auto'` to auto-detect the column separator (issue #183)
|
7
|
+
The default behavior is still to assume `,` is the column separator.
|
8
|
+
* fixed buggy behavior when using `remove_empty_values: false` (issue #168)
|
9
|
+
* fixed Ruby 3.0 deprecation
|
10
|
+
|
11
|
+
## 1.3.0 (2022-01-06) Breaking code change if you used `--key_mappings`
|
12
|
+
* fix bug for key_mappings (issue #181)
|
13
|
+
The values of the `key_mappings` hash will now be used "as is", and no longer forced to be symbols
|
14
|
+
|
15
|
+
**Users with existing code with `--key_mappings` need to change their code** to
|
16
|
+
* either use symbols in the `key_mapping` hash
|
17
|
+
* or change the expected keys from symbols to strings
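
The 1.3.0 note above can be illustrated with a minimal sketch, assuming a simplified version of the header-mapping step that appears later in this diff. The helper name `apply_key_mapping` and the sample headers are hypothetical; only the "values are used as is" behavior comes from the changelog.

```ruby
# Hypothetical helper mirroring the header-mapping step: mapped values are
# taken from the hash unchanged (no forced conversion to symbols).
def apply_key_mapping(headers, key_mapping, remove_unmapped_keys = false)
  headers.map do |header|
    if key_mapping.key?(header)
      key_mapping[header]                 # used "as is": a String stays a String
    else
      remove_unmapped_keys ? nil : header # unmapped keys are kept unless asked otherwise
    end
  end
end

headers = [:first_name, :last_name, :age]

# String values yield String keys; symbol values keep Symbol keys.
string_mapped = apply_key_mapping(headers, { first_name: 'given_name' })
symbol_mapped = apply_key_mapping(headers, { first_name: :given_name })
```

Under this sketch, code that expects `:given_name` in the resulting hashes would need the symbol form of the mapping, which matches the two migration options listed above.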
|
18
|
+
|
19
|
+
## 1.2.9 (2021-11-22) (PULLED)
|
20
|
+
* fix bug for key_mappings (issue #181)
|
21
|
+
The values of the `key_mappings` hash will now be used "as is", and no longer forced to be symbols
|
22
|
+
|
23
|
+
## 1.2.8 (2020-02-04)
|
24
|
+
* fix deprecation warnings on Ruby 2.7 (thanks to Diego Salido)
|
25
|
+
|
26
|
+
## 1.2.7 (2020-02-03)
|
27
|
+
|
28
|
+
## 1.2.6 (2018-11-13)
|
29
|
+
* fixing error caused by calling f.close when we do not hand in a file
|
30
|
+
|
31
|
+
## 1.2.5 (2018-09-16)
|
32
|
+
* fixing issue #136 with comments in CSV files
|
33
|
+
* fixing error class hierarchy
|
34
|
+
|
35
|
+
## 1.2.4 (2018-08-06)
|
36
|
+
* using Rails blank? if it's available
|
37
|
+
|
38
|
+
## 1.2.3 (2018-01-27)
|
39
|
+
* fixed regression / test
|
40
|
+
* fixed quote_char interpolation for headers, but not data (thanks to Colin Petruno)
|
41
|
+
* bugfix (thanks to Joshua Smith for reporting)
|
42
|
+
|
43
|
+
## 1.2.0 (2018-01-20)
|
44
|
+
* add default validation that a header can only appear once
|
45
|
+
* add option `required_headers`
|
46
|
+
|
47
|
+
## 1.1.5 (2017-11-05)
|
48
|
+
* fix issue with invalid byte sequences in header (issue #103, thanks to Dave Myron)
|
49
|
+
* fix issue with invalid byte sequences in multi-line data (thanks to Ivan Ushakov)
|
50
|
+
* analyze only 500 characters by default when `:row_sep => :auto` is used.
|
51
|
+
added option `row_sep_auto_chars` to change the default if necessary. (thanks to Matthieu Paret)
|
52
|
+
|
53
|
+
## 1.1.4 (2017-01-16)
|
54
|
+
* fixing UTF-8 related bug which was introduced in 1.1.2 (thanks to Tirdad C.)
|
55
|
+
|
56
|
+
## 1.1.3 (2016-12-30)
|
57
|
+
* added warning when options indicate UTF-8 processing, but input filehandle is not opened with r:UTF-8 option
|
58
|
+
|
59
|
+
## 1.1.2 (2016-12-29)
|
60
|
+
* added option `invalid_byte_sequence` (thanks to polycarpou)
|
61
|
+
* added comments on handling of UTF-8 encoding when opening from File vs. OpenURI (thanks to KevinColemanInc)
|
62
|
+
|
63
|
+
## 1.1.1 (2016-11-26)
|
64
|
+
* added option to `skip_lines` (thanks to wal)
|
65
|
+
* added option to `force_utf8` encoding (thanks to jordangraft)
|
66
|
+
* bugfix if no headers in input data (thanks to esBeee)
|
67
|
+
* ensure input file is closed (thanks to waldyr)
|
68
|
+
* improved verbose output (thanks to benmaher)
|
69
|
+
* improved documentation
|
70
|
+
|
71
|
+
## 1.1.0 (2015-07-26)
|
72
|
+
* added feature :value_converters, which allows parsing of dates, money, and other things (thanks to Raphaël Bleuse, Lucas Camargo de Almeida, Alejandro)
|
73
|
+
* added error if :headers_in_file is set to false, and no :user_provided_headers are given (thanks to innhyu)
|
74
|
+
* added support to convert dashes to underscore characters in headers (thanks to César Camacho)
|
75
|
+
* fixing automatic detection of \r\n line-endings (thanks to feens)
|
76
|
+
|
77
|
+
## 1.0.19 (2014-10-29)
|
78
|
+
* added option :keep_original_headers to keep CSV-headers as-is (thanks to Benjamin Thouret)
|
79
|
+
|
80
|
+
## 1.0.18 (2014-10-27)
|
81
|
+
* added support for multi-line fields / csv fields containing CR (thanks to Chris Hilton) (issue #31)
|
82
|
+
|
83
|
+
## 1.0.17 (2014-01-13)
|
84
|
+
* added option to set :row_sep to :auto , for automatic detection of the row-separator (issue #22)
|
85
|
+
|
86
|
+
## 1.0.16 (2014-01-13)
|
87
|
+
* :convert_values_to_numeric option can now be qualified with :except or :only (thanks to Hugo Lepetit)
|
88
|
+
* removed deprecated `process_csv` method
|
89
|
+
|
90
|
+
## 1.0.15 (2013-12-07)
|
91
|
+
* new option:
|
92
|
+
* :remove_unmapped_keys to completely ignore columns which were not mapped with :key_mapping (thanks to Dave Sanders)
|
93
|
+
|
94
|
+
## 1.0.14 (2013-11-01)
|
95
|
+
* added GPL-2 and MIT license to GEM spec file; if you need another license contact me
|
96
|
+
|
97
|
+
## 1.0.12 (2013-10-15)
|
98
|
+
* added RSpec tests
|
99
|
+
|
100
|
+
## 1.0.11 (2013-09-28)
|
101
|
+
* bugfix : fixed issue #18 - fixing issue with last chunk not being properly returned (thanks to Jordan Running)
|
102
|
+
* added RSpec tests
|
103
|
+
|
104
|
+
## 1.0.10 (2013-06-26)
|
105
|
+
* bugfix : fixed issue #14 - passing options along to CSV.parse (thanks to Marcos Zimmermann)
|
106
|
+
|
107
|
+
## 1.0.9 (2013-06-19)
|
108
|
+
* bugfix : fixed issue #13 with negative integers and floats not being correctly converted (thanks to Graham Wetzler)
|
109
|
+
|
110
|
+
## 1.0.8 (2013-06-01)
|
111
|
+
|
112
|
+
* bugfix : fixed issue with nil values in inputs with quote-char (thanks to Félix Bellanger)
|
113
|
+
* new options:
|
114
|
+
* :force_simple_split : to force simple splitting on :col_sep character for non-standard CSV-files. e.g. without properly escaped :quote_char
|
115
|
+
* :verbose : print out line number while processing (to track down problems in input files)
|
116
|
+
|
117
|
+
## 1.0.7 (2013-05-20)
|
118
|
+
|
119
|
+
* allowing process to work with objects with a 'readline' method (thanks to taq)
|
120
|
+
* added options:
|
121
|
+
* :file_encoding : defaults to utf8 (thanks to MrTin, Paxa)
|
122
|
+
|
123
|
+
## 1.0.6 (2013-05-19)
|
124
|
+
|
125
|
+
* bugfix : quoted fields are now correctly parsed
|
126
|
+
|
127
|
+
## 1.0.5 (2013-05-08)
|
128
|
+
|
129
|
+
* bugfix : for :headers_in_file option
|
130
|
+
|
131
|
+
## 1.0.4 (2012-08-17)
|
132
|
+
|
133
|
+
* renamed the following options:
|
134
|
+
* :strip_whitepace_from_values => :strip_whitespace - removes leading/trailing whitespace from headers and values
|
135
|
+
|
136
|
+
## 1.0.3 (2012-08-16)
|
137
|
+
|
138
|
+
* added the following options:
|
139
|
+
* :strip_whitepace_from_values - removes leading/trailing whitespace from values
|
140
|
+
|
141
|
+
## 1.0.2 (2012-08-02)
|
142
|
+
|
143
|
+
* added more options for dealing with headers:
|
144
|
+
* :user_provided_headers ,user provided Array with header strings or symbols, to precisely define what the headers should be, overriding any in-file headers (default: nil)
|
145
|
+
* :headers_in_file , if the file contains headers as the first line (default: true)
|
146
|
+
|
147
|
+
## 1.0.1 (2012-07-30)
|
148
|
+
|
149
|
+
* added the following options:
|
150
|
+
* :downcase_header
|
151
|
+
* :strings_as_keys
|
152
|
+
* :remove_zero_values
|
153
|
+
* :remove_values_matching
|
154
|
+
* :remove_empty_hashes
|
155
|
+
* :convert_values_to_numeric
|
156
|
+
|
157
|
+
* renamed the following options:
|
158
|
+
* :remove_empty_fields => :remove_empty_values
|
159
|
+
|
160
|
+
|
161
|
+
## 1.0.0 (2012-07-29)
|
162
|
+
|
163
|
+
* renamed `SmarterCSV.process_csv` to `SmarterCSV.process`.
|
164
|
+
|
165
|
+
## 1.0.0.pre1 (2012-07-29)
|
data/Gemfile
CHANGED
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2022 Tilo Sloboda
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
data/README.md
CHANGED
@@ -5,15 +5,14 @@
|
|
5
5
|
---------------
|
6
6
|
#### Service Announcement
|
7
7
|
|
8
|
-
Work towards SmarterCSV 2.0 is on it's way, with much improved features, and more streamlined options.
|
8
|
+
* Work towards SmarterCSV 2.0 is still on its way, with much improved features, and more streamlined options.
|
9
|
+
Please check the 2.0-develop branch, open any issues and pull requests with mention of v2.0.
|
9
10
|
|
10
|
-
|
11
|
-
|
12
|
-
New versions on the 1.2 branch will soon print a deprecation warning if you set :verbose to true
|
13
|
-
See below for list of deprecated options.
|
11
|
+
* New versions on the 1.2 branch will soon print a deprecation warning if you set :verbose to true
|
12
|
+
See below for list of deprecated options.
|
14
13
|
|
15
14
|
---------------
|
16
|
-
#### SmarterCSV
|
15
|
+
#### SmarterCSV 1.x
|
17
16
|
|
18
17
|
`smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
|
19
18
|
and parallel processing with Resque or Sidekiq.
|
@@ -207,7 +206,7 @@ The options and the block are optional.
|
|
207
206
|
| :skip_lines | nil | how many lines to skip before the first line or header line is processed |
|
208
207
|
| :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
|
209
208
|
---------------------------------------------------------------------------------------------------------------------------------
|
210
|
-
| :col_sep | ',' | column separator
|
209
|
+
| :col_sep | ',' | column separator, can be set to 'auto' |
|
211
210
|
| :force_simple_split | false | force simple splitting on :col_sep character for non-standard CSV-files. |
|
212
211
|
| | | e.g. when :quote_char is not properly escaped |
|
213
212
|
| :row_sep | $/ ,"\n" | row separator or record separator , defaults to system's $/ , which defaults to "\n" |
|
@@ -222,7 +221,7 @@ The options and the block are optional.
|
|
222
221
|
| | | user provided Array of header strings or symbols, to define |
|
223
222
|
| | | what headers should be used, overriding any in-file headers. |
|
224
223
|
| | | You can not combine the :user_provided_headers and :key_mapping options |
|
225
|
-
| :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs
|
224
|
+
| :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs or all empty values |
|
226
225
|
| :verbose | false | print out line number while processing (to track down problems in input files) |
|
227
226
|
---------------------------------------------------------------------------------------------------------------------------------
|
228
227
|
|
@@ -248,7 +247,7 @@ And header and data validations will also be supported in 2.x
|
|
248
247
|
---------------------------------------------------------------------------------------------------------------------------------
|
249
248
|
| :value_converters | nil | supply a hash of :header => KlassName; the class needs to implement self.convert(val)|
|
250
249
|
| :remove_empty_values | true | remove values which have nil or empty strings as values |
|
251
|
-
| :remove_zero_values |
|
250
|
+
| :remove_zero_values | false | remove values which have a numeric value equal to zero / 0 |
|
252
251
|
| :remove_values_matching | nil | removes key/value pairs if value matches given regular expressions. e.g.: |
|
253
252
|
| | | /^\$0\.0+$/ to match $0.00 , or /^#VALUE!$/ to match errors in Excel spreadsheets |
|
254
253
|
| :convert_values_to_numeric | true | converts strings containing Integers or Floats to the appropriate class |
|
@@ -316,153 +315,7 @@ Or install it yourself as:
|
|
316
315
|
|
317
316
|
$ gem install smarter_csv
|
318
317
|
|
319
|
-
##
|
320
|
-
|
321
|
-
Planned in the next releases:
|
322
|
-
* programmatic header transformations
|
323
|
-
* CSV command line
|
324
|
-
|
325
|
-
## Changes
|
326
|
-
|
327
|
-
#### 1.2.6 (2018-11-13)
|
328
|
-
* fixing error caused by calling f.close when we do not hand in a file
|
329
|
-
|
330
|
-
#### 1.2.5 (2018-09-16)
|
331
|
-
* fixing issue #136 with comments in CSV files
|
332
|
-
* fixing error class hierarchy
|
333
|
-
|
334
|
-
#### 1.2.4 (2018-08-06)
|
335
|
-
* using Rails blank? if it's available
|
336
|
-
|
337
|
-
#### 1.2.3 (2018-01-27)
|
338
|
-
* fixed regression / test
|
339
|
-
* fixed quote_char interpolation for headers, but not data (thanks to Colin Petruno)
|
340
|
-
* bugfix (thanks to Joshua Smith for reporting)
|
341
|
-
|
342
|
-
#### 1.2.0 (2018-01-20)
|
343
|
-
* add default validation that a header can only appear once
|
344
|
-
* add option `required_headers`
|
345
|
-
|
346
|
-
#### 1.1.5 (2017-11-05)
|
347
|
-
* fix issue with invalid byte sequences in header (issue #103, thanks to Dave Myron)
|
348
|
-
* fix issue with invalid byte sequences in multi-line data (thanks to Ivan Ushakov)
|
349
|
-
* analyze only 500 characters by default when `:row_sep => :auto` is used.
|
350
|
-
added option `row_sep_auto_chars` to change the default if necessary. (thanks to Matthieu Paret)
|
351
|
-
|
352
|
-
#### 1.1.4 (2017-01-16)
|
353
|
-
* fixing UTF-8 related bug which was introduced in 1.1.2 (thanks to Tirdad C.)
|
354
|
-
|
355
|
-
#### 1.1.3 (2016-12-30)
|
356
|
-
* added warning when options indicate UTF-8 processing, but input filehandle is not opened with r:UTF-8 option
|
357
|
-
|
358
|
-
#### 1.1.2 (2016-12-29)
|
359
|
-
* added option `invalid_byte_sequence` (thanks to polycarpou)
|
360
|
-
* added comments on handling of UTF-8 encoding when opening from File vs. OpenURI (thanks to KevinColemanInc)
|
361
|
-
|
362
|
-
#### 1.1.1 (2016-11-26)
|
363
|
-
* added option to `skip_lines` (thanks to wal)
|
364
|
-
* added option to `force_utf8` encoding (thanks to jordangraft)
|
365
|
-
* bugfix if no headers in input data (thanks to esBeee)
|
366
|
-
* ensure input file is closed (thanks to waldyr)
|
367
|
-
* improved verbose output (thanks to benmaher)
|
368
|
-
* improved documentation
|
369
|
-
|
370
|
-
#### 1.1.0 (2015-07-26)
|
371
|
-
* added feature :value_converters, which allows parsing of dates, money, and other things (thanks to Raphaël Bleuse, Lucas Camargo de Almeida, Alejandro)
|
372
|
-
* added error if :headers_in_file is set to false, and no :user_provided_headers are given (thanks to innhyu)
|
373
|
-
* added support to convert dashes to underscore characters in headers (thanks to César Camacho)
|
374
|
-
* fixing automatic detection of \r\n line-endings (thanks to feens)
|
375
|
-
|
376
|
-
#### 1.0.19 (2014-10-29)
|
377
|
-
* added option :keep_original_headers to keep CSV-headers as-is (thanks to Benjamin Thouret)
|
378
|
-
|
379
|
-
#### 1.0.18 (2014-10-27)
|
380
|
-
* added support for multi-line fields / csv fields containing CR (thanks to Chris Hilton) (issue #31)
|
381
|
-
|
382
|
-
#### 1.0.17 (2014-01-13)
|
383
|
-
* added option to set :row_sep to :auto , for automatic detection of the row-separator (issue #22)
|
384
|
-
|
385
|
-
#### 1.0.16 (2014-01-13)
|
386
|
-
* :convert_values_to_numeric option can now be qualified with :except or :only (thanks to Hugo Lepetit)
|
387
|
-
* removed deprecated `process_csv` method
|
388
|
-
|
389
|
-
#### 1.0.15 (2013-12-07)
|
390
|
-
* new option:
|
391
|
-
* :remove_unmapped_keys to completely ignore columns which were not mapped with :key_mapping (thanks to Dave Sanders)
|
392
|
-
|
393
|
-
#### 1.0.14 (2013-11-01)
|
394
|
-
* added GPL-2 and MIT license to GEM spec file; if you need another license contact me
|
395
|
-
|
396
|
-
#### 1.0.12 (2013-10-15)
|
397
|
-
* added RSpec tests
|
398
|
-
|
399
|
-
#### 1.0.11 (2013-09-28)
|
400
|
-
* bugfix : fixed issue #18 - fixing issue with last chunk not being properly returned (thanks to Jordan Running)
|
401
|
-
* added RSpec tests
|
402
|
-
|
403
|
-
#### 1.0.10 (2013-06-26)
|
404
|
-
* bugfix : fixed issue #14 - passing options along to CSV.parse (thanks to Marcos Zimmermann)
|
405
|
-
|
406
|
-
#### 1.0.9 (2013-06-19)
|
407
|
-
* bugfix : fixed issue #13 with negative integers and floats not being correctly converted (thanks to Graham Wetzler)
|
408
|
-
|
409
|
-
#### 1.0.8 (2013-06-01)
|
410
|
-
|
411
|
-
* bugfix : fixed issue with nil values in inputs with quote-char (thanks to Félix Bellanger)
|
412
|
-
* new options:
|
413
|
-
* :force_simple_split : to force simple splitting on :col_sep character for non-standard CSV-files. e.g. without properly escaped :quote_char
|
414
|
-
* :verbose : print out line number while processing (to track down problems in input files)
|
415
|
-
|
416
|
-
#### 1.0.7 (2013-05-20)
|
417
|
-
|
418
|
-
* allowing process to work with objects with a 'readline' method (thanks to taq)
|
419
|
-
* added options:
|
420
|
-
* :file_encoding : defaults to utf8 (thanks to MrTin, Paxa)
|
421
|
-
|
422
|
-
#### 1.0.6 (2013-05-19)
|
423
|
-
|
424
|
-
* bugfix : quoted fields are now correctly parsed
|
425
|
-
|
426
|
-
#### 1.0.5 (2013-05-08)
|
427
|
-
|
428
|
-
* bugfix : for :headers_in_file option
|
429
|
-
|
430
|
-
#### 1.0.4 (2012-08-17)
|
431
|
-
|
432
|
-
* renamed the following options:
|
433
|
-
* :strip_whitepace_from_values => :strip_whitespace - removes leading/trailing whitespace from headers and values
|
434
|
-
|
435
|
-
#### 1.0.3 (2012-08-16)
|
436
|
-
|
437
|
-
* added the following options:
|
438
|
-
* :strip_whitepace_from_values - removes leading/trailing whitespace from values
|
439
|
-
|
440
|
-
#### 1.0.2 (2012-08-02)
|
441
|
-
|
442
|
-
* added more options for dealing with headers:
|
443
|
-
* :user_provided_headers ,user provided Array with header strings or symbols, to precisely define what the headers should be, overriding any in-file headers (default: nil)
|
444
|
-
* :headers_in_file , if the file contains headers as the first line (default: true)
|
445
|
-
|
446
|
-
#### 1.0.1 (2012-07-30)
|
447
|
-
|
448
|
-
* added the following options:
|
449
|
-
* :downcase_header
|
450
|
-
* :strings_as_keys
|
451
|
-
* :remove_zero_values
|
452
|
-
* :remove_values_matching
|
453
|
-
* :remove_empty_hashes
|
454
|
-
* :convert_values_to_numeric
|
455
|
-
|
456
|
-
* renamed the following options:
|
457
|
-
* :remove_empty_fields => :remove_empty_values
|
458
|
-
|
459
|
-
|
460
|
-
#### 1.0.0 (2012-07-29)
|
461
|
-
|
462
|
-
* renamed `SmarterCSV.process_csv` to `SmarterCSV.process`.
|
463
|
-
|
464
|
-
#### 1.0.0.pre1 (2012-07-29)
|
465
|
-
|
318
|
+
## [ChangeLog](./CHANGELOG.md)
|
466
319
|
|
467
320
|
## Reporting Bugs / Feature Requests
|
468
321
|
|
@@ -508,6 +361,7 @@ And a special thanks to those who contributed pull requests:
|
|
508
361
|
* [Rohit Amarnath](https://github.com/ramarnat)
|
509
362
|
* [Joshua Smith](https://github.com/enviable)
|
510
363
|
* [Colin Petruno](https://github.com/colinpetruno)
|
364
|
+
* [Diego Salido](https://github.com/salidux)
|
511
365
|
|
512
366
|
|
513
367
|
## Contributing
|
@@ -4,10 +4,10 @@ module SmarterCSV
|
|
4
4
|
class IncorrectOption < SmarterCSVException; end
|
5
5
|
class DuplicateHeaders < SmarterCSVException; end
|
6
6
|
class MissingHeaders < SmarterCSVException; end
|
7
|
-
|
7
|
+
class NoColSepDetected < SmarterCSVException; end
|
8
8
|
|
9
9
|
def SmarterCSV.process(input, options={}, &block) # first parameter: filename or input object with readline method
|
10
|
-
default_options = {:col_sep => ','
|
10
|
+
default_options = {:col_sep => ',', :row_sep => $INPUT_RECORD_SEPARATOR, :quote_char => '"', :force_simple_split => false , :verbose => false ,
|
11
11
|
:remove_empty_values => true, :remove_zero_values => false , :remove_values_matching => nil , :remove_empty_hashes => true , :strip_whitespace => true,
|
12
12
|
:convert_values_to_numeric => true, :strip_chars_from_headers => nil , :user_provided_headers => nil , :headers_in_file => true,
|
13
13
|
:comment_regexp => /\A#/, :chunk_size => nil , :key_mapping_hash => nil , :downcase_header => true, :strings_as_keys => false, :file_encoding => 'utf-8',
|
@@ -19,13 +19,16 @@ module SmarterCSV
|
|
19
19
|
csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
|
20
20
|
headerA = []
|
21
21
|
result = []
|
22
|
-
old_row_sep =
|
22
|
+
old_row_sep = $INPUT_RECORD_SEPARATOR
|
23
23
|
file_line_count = 0
|
24
24
|
csv_line_count = 0
|
25
25
|
has_rails = !! defined?(Rails)
|
26
26
|
begin
|
27
27
|
f = input.respond_to?(:readline) ? input : File.open(input, "r:#{options[:file_encoding]}")
|
28
28
|
|
29
|
+
# attempt to auto-detect column separator
|
30
|
+
options[:col_sep] = guess_column_separator(f) if options[:col_sep] == 'auto'
|
31
|
+
|
29
32
|
if (options[:force_utf8] || options[:file_encoding] =~ /utf-8/i) && ( f.respond_to?(:external_encoding) && f.external_encoding != Encoding.find('UTF-8') || f.respond_to?(:encoding) && f.encoding != Encoding.find('UTF-8') )
|
30
33
|
puts 'WARNING: you are trying to process UTF-8 input, but did not open the input with "b:utf-8" option. See README file "NOTES about File Encodings".'
|
31
34
|
end
|
@@ -34,7 +37,7 @@ module SmarterCSV
|
|
34
37
|
options[:row_sep] = line_ending = SmarterCSV.guess_line_ending( f, options )
|
35
38
|
f.rewind
|
36
39
|
end
|
37
|
-
|
40
|
+
$INPUT_RECORD_SEPARATOR = options[:row_sep]
|
38
41
|
|
39
42
|
if options[:skip_lines].to_i > 0
|
40
43
|
options[:skip_lines].to_i.times{f.readline}
|
@@ -53,21 +56,21 @@ module SmarterCSV
|
|
53
56
|
|
54
57
|
if (header =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
|
55
58
|
file_headerA = begin
|
56
|
-
CSV.parse( header, csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
|
59
|
+
CSV.parse( header, **csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
|
57
60
|
rescue CSV::MalformedCSVError => e
|
58
61
|
raise $!, "#{$!} [SmarterCSV: csv line #{csv_line_count}]", $!.backtrace
|
59
62
|
end
|
60
63
|
else
|
61
64
|
file_headerA = header.split(options[:col_sep])
|
62
65
|
end
|
66
|
+
file_header_size = file_headerA.size # before mapping, which could delete keys
|
67
|
+
|
63
68
|
file_headerA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') }
|
64
69
|
file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
|
65
70
|
unless options[:keep_original_headers]
|
66
71
|
file_headerA.map!{|x| x.gsub(/\s+|-+/,'_')}
|
67
72
|
file_headerA.map!{|x| x.downcase } if options[:downcase_header]
|
68
73
|
end
|
69
|
-
|
70
|
-
file_header_size = file_headerA.size
|
71
74
|
else
|
72
75
|
raise SmarterCSV::IncorrectOption , "ERROR: If :headers_in_file is set to false, you have to provide :user_provided_headers" if options[:user_provided_headers].nil?
|
73
76
|
end
|
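
The change from `csv_options` to `**csv_options` in the `CSV.parse` call above relates to the Ruby 3.0 deprecation fix mentioned in the changelog: on Ruby 3.0 and later, an options hash is no longer converted to keyword arguments implicitly. A minimal sketch follows, using a hypothetical `split_fields` method as a stand-in for any method that accepts keyword options (it is not part of smarter_csv or the CSV library):

```ruby
# Hypothetical stand-in for a method that takes keyword options.
def split_fields(line, col_sep: ',', quote_char: '"')
  line.split(col_sep).map { |field| field.delete(quote_char) }
end

csv_options = { col_sep: ';', quote_char: "'" }

# The double splat passes the hash as keyword arguments explicitly,
# which behaves the same on Ruby 2.x and Ruby 3.x.
fields = split_fields("'a';'b';'c'", **csv_options)
```

Here `fields` ends up as `["a", "b", "c"]`. Passing `csv_options` without the double splat would be treated as an extra positional argument on Ruby 3.x.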
@@ -84,6 +87,8 @@ module SmarterCSV
|
|
84
87
|
else
|
85
88
|
headerA = file_headerA
|
86
89
|
end
|
90
|
+
header_size = headerA.size
|
91
|
+
|
87
92
|
headerA.map!{|x| x.to_sym } unless options[:strings_as_keys] || options[:keep_original_headers]
|
88
93
|
|
89
94
|
unless options[:user_provided_headers] # wouldn't make sense to re-map user provided headers
|
@@ -92,7 +97,7 @@ module SmarterCSV
|
|
92
97
|
# do some key mapping on the keys in the file header
|
93
98
|
# if you want to completely delete a key, then map it to nil or to ''
|
94
99
|
if ! key_mappingH.nil? && key_mappingH.class == Hash && key_mappingH.keys.size > 0
|
95
|
-
headerA.map!{|x| key_mappingH.has_key?(x) ? (key_mappingH[x].nil? ? nil : key_mappingH[x]
|
100
|
+
headerA.map!{|x| key_mappingH.has_key?(x) ? (key_mappingH[x].nil? ? nil : key_mappingH[x]) : (options[:remove_unmapped_keys] ? nil : x)}
|
96
101
|
end
|
97
102
|
end
|
98
103
|
|
@@ -123,7 +128,7 @@ module SmarterCSV
|
|
123
128
|
|
124
129
|
# now on to processing all the rest of the lines in the CSV file:
|
125
130
|
while ! f.eof? # we can't use f.readlines() here, because this would read the whole file into memory at once, and eof => true
|
126
|
-
line = f.readline # read one line.. this uses the input_record_separator
|
131
|
+
line = f.readline # read one line.. this uses the input_record_separator $INPUT_RECORD_SEPARATOR which we set previously!
|
127
132
|
|
128
133
|
# replace invalid byte sequence in UTF-8 with question mark to avoid errors
|
129
134
|
line = line.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
|
@@ -145,20 +150,26 @@ module SmarterCSV
|
|
145
150
|
end
|
146
151
|
print "\nline contains uneven number of quote chars so including content through file line %d\n" % file_line_count if options[:verbose] && multiline
|
147
152
|
|
148
|
-
line.chomp! # will use
|
153
|
+
line.chomp! # will use $INPUT_RECORD_SEPARATOR which is set to options[:row_sep]
|
149
154
|
|
150
155
|
if (line =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
|
151
156
|
dataA = begin
|
152
|
-
CSV.parse( line, csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
|
157
|
+
CSV.parse( line, **csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
|
153
158
|
rescue CSV::MalformedCSVError => e
|
154
159
|
raise $!, "#{$!} [SmarterCSV: csv line #{csv_line_count}]", $!.backtrace
|
155
160
|
end
|
156
161
|
else
|
157
|
-
dataA = line.split(options[:col_sep])
|
162
|
+
dataA = line.split(options[:col_sep], header_size)
|
158
163
|
end
|
159
164
|
#### dataA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') } # this is actually not a good idea as a default
|
160
165
|
dataA.map!{|x| x.strip} if options[:strip_whitespace]
|
166
|
+
|
167
|
+
# if all values are blank, then ignore this line
|
168
|
+
# SEE: https://github.com/rails/rails/blob/32015b6f369adc839c4f0955f2d9dce50c0b6123/activesupport/lib/active_support/core_ext/object/blank.rb#L121
|
169
|
+
next if options[:remove_empty_hashes] && blank?(dataA)
|
170
|
+
|
161
171
|
hash = Hash.zip(headerA,dataA) # from Facets of Ruby library
|
172
|
+
|
162
173
|
# make sure we delete any key/value pairs from the hash, which the user wanted to delete:
|
163
174
|
# Note: Ruby < 1.9 doesn't allow empty symbol literals!
|
164
175
|
hash.delete(nil); hash.delete('');
|
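
Passing `header_size` as the limit to `String#split` in the hunk above changes how trailing empty columns are handled. Without a limit, Ruby drops trailing empty fields, so a row ending in empty columns yields fewer values than there are headers. A short sketch with made-up sample data:

```ruby
line = "alice,,"   # three columns, the last two empty
header_size = 3

without_limit = line.split(',')              # => ["alice"]
with_limit    = line.split(',', header_size) # => ["alice", "", ""]

# With a limit, any surplus fields stay joined in the last element.
surplus = "a,b,c,d".split(',', header_size)  # => ["a", "b", "c,d"]
```

Keeping the empty trailing fields is what allows `remove_empty_values: false` to return a key for every header, which appears to be the behavior addressed by issue #168.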
@@ -166,18 +177,17 @@ module SmarterCSV
|
|
166
177
|
eval('hash.delete(:"")')
|
167
178
|
end
|
168
179
|
|
169
|
-
|
170
|
-
# which caters for double \n and \r\n characters such as "1\r\n\r\n2" whereas the original check (v =~ /^\s*$/) does not
|
171
|
-
if options[:remove_empty_values]
|
180
|
+
if options[:remove_empty_values] == true
|
172
181
|
if has_rails
|
173
182
|
hash.delete_if{|k,v| v.blank?}
|
174
183
|
else
|
175
|
-
hash.delete_if{|k,v|
|
184
|
+
hash.delete_if{|k,v| blank?(v)}
|
176
185
|
end
|
177
186
|
end
|
178
187
|
|
179
188
|
hash.delete_if{|k,v| ! v.nil? && v =~ /^(\d+|\d+\.\d+)$/ && v.to_f == 0} if options[:remove_zero_values] # values are typically Strings!
|
180
189
|
hash.delete_if{|k,v| v =~ options[:remove_values_matching]} if options[:remove_values_matching]
|
190
|
+
|
181
191
|
if options[:convert_values_to_numeric]
|
182
192
|
hash.each do |k,v|
|
183
193
|
# deal with the :only / :except options to :convert_values_to_numeric
|
@@ -247,7 +257,7 @@ module SmarterCSV
|
|
247
257
|
chunk = [] # initialize for next chunk of data
|
248
258
|
end
|
249
259
|
ensure
|
250
|
-
|
260
|
+
$INPUT_RECORD_SEPARATOR = old_row_sep # make sure this stupid global variable is always reset to its previous value after we're done!
|
251
261
|
f.close if f.respond_to?(:close)
|
252
262
|
end
|
253
263
|
if block_given?
|
@@ -258,8 +268,30 @@ module SmarterCSV
   end
 
   private
-  # acts as a road-block to limit processing when iterating over all k/v pairs of a CSV-hash:
 
+  def self.blank?(value)
+    case value
+    when Array
+      value.inject(true){|result, x| result &&= elem_blank?(x)}
+    when Hash
+      value.inject(true){|result, x| result &&= elem_blank?(x.last)}
+    else
+      elem_blank?(value)
+    end
+  end
+
+  def self.elem_blank?(value)
+    case value
+    when NilClass
+      true
+    when String
+      value !~ /\S/
+    else
+      false
+    end
+  end
+
+  # acts as a road-block to limit processing when iterating over all k/v pairs of a CSV-hash:
   def self.only_or_except_limit_execution( options, option_name, key )
     if options[option_name].is_a?(Hash)
       if options[option_name].has_key?( :except )
@@ -271,6 +303,24 @@ module SmarterCSV
|
|
271
303
|
return false
|
272
304
|
end
|
273
305
|
|
306
|
+
# raise exception if none is found
|
307
|
+
def self.guess_column_separator(filehandle)
|
308
|
+
del = [',', "\t", ';', ':', '|']
|
309
|
+
n = Hash.new(0)
|
310
|
+
5.times do
|
311
|
+
line = filehandle.readline
|
312
|
+
del.each do |d|
|
313
|
+
n[d] += line.scan(d).count
|
314
|
+
end
|
315
|
+
rescue EOFError # short files
|
316
|
+
break
|
317
|
+
end
|
318
|
+
filehandle.rewind
|
319
|
+
raise SmarterCSV::NoColSepDetected if n.values.max == 0
|
320
|
+
|
321
|
+
col_sep = n.key(n.values.max)
|
322
|
+
end
|
323
|
+
|
274
324
|
# limitation: this currently reads the whole file in before making a decision
|
275
325
|
def self.guess_line_ending( filehandle, options )
|
276
326
|
counts = {"\n" => 0 , "\r" => 0, "\r\n" => 0}
|
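The `guess_column_separator` method added in the hunk above counts occurrences of five candidate separators in the first few lines and picks the most frequent one. The following is a minimal stand-alone sketch of that counting heuristic, not the gem's code: the names `sketch_guess_col_sep` and `CANDIDATE_SEPARATORS` are hypothetical, and `ArgumentError` stands in for `SmarterCSV::NoColSepDetected`, which is defined elsewhere in the gem.

```ruby
require 'stringio'

# Hypothetical stand-alone version of the counting heuristic shown in the diff.
# The candidate list mirrors the `del` array in the hunk above.
CANDIDATE_SEPARATORS = [',', "\t", ';', ':', '|'].freeze

def sketch_guess_col_sep(io, sample_lines: 5)
  counts = Hash.new(0)
  sample_lines.times do
    line = io.gets
    break if line.nil? # input has fewer lines than the sample size
    CANDIDATE_SEPARATORS.each { |sep| counts[sep] += line.scan(sep).size }
  end
  io.rewind # leave the handle where the caller expects it

  best_sep, best_count = counts.max_by { |_sep, count| count }
  # ArgumentError is a stand-in for the gem's own exception class.
  raise ArgumentError, 'no column separator detected' if best_count.nil? || best_count.zero?

  best_sep
end

puts sketch_guess_col_sep(StringIO.new("a|b|c\n1|2|3\n")).inspect
```

Because the heuristic counts raw characters, a separator candidate that appears often inside quoted fields or free text would presumably be counted as well; passing an explicit `:col_sep` avoids relying on the guess.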
data/lib/smarter_csv/version.rb
CHANGED
data/smarter_csv.gemspec
CHANGED
@@ -1,21 +1,24 @@
|
|
1
1
|
# -*- encoding: utf-8 -*-
|
2
2
|
require File.expand_path('../lib/smarter_csv/version', __FILE__)
|
3
3
|
|
4
|
-
Gem::Specification.new do |
|
5
|
-
|
6
|
-
|
7
|
-
|
8
|
-
|
9
|
-
gem.homepage = "https://github.com/tilo/smarter_csv"
|
4
|
+
Gem::Specification.new do |spec|
|
5
|
+
spec.name = "smarter_csv"
|
6
|
+
spec.version = SmarterCSV::VERSION
|
7
|
+
spec.authors = ["Tilo Sloboda"]
|
8
|
+
spec.email = ["tilo.sloboda@gmail.com"]
|
10
9
|
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
10
|
+
spec.summary = %q{Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots of optional features, e.g. chunked processing for huge CSV files}
|
11
|
+
spec.description = %q{Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with optional features for processing large files in parallel, embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers to Hash-keys}
|
12
|
+
spec.homepage = "https://github.com/tilo/smarter_csv"
|
13
|
+
spec.license = 'MIT'
|
14
|
+
|
15
|
+
spec.files = `git ls-files`.split($\)
|
16
|
+
spec.executables = spec.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
|
17
|
+
spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
|
18
|
+
spec.require_paths = ["lib"]
|
19
|
+
spec.requirements = ['csv'] # for CSV.parse() only needed in case we have quoted fields
|
20
|
+
spec.add_development_dependency "rspec"
|
21
|
+
# spec.add_development_dependency "guard-rspec"
|
22
|
+
|
23
|
+
spec.metadata["homepage_uri"] = spec.homepage
|
21
24
|
end
|
data/spec/fixtures/numeric.csv
CHANGED
File without changes
|
data/spec/smarter_csv/blank_spec.rb
ADDED
@@ -0,0 +1,55 @@
|
|
1
|
+
require 'spec_helper'
|
2
|
+
|
3
|
+
describe 'blank?' do
|
4
|
+
it 'is true for nil' do
|
5
|
+
SmarterCSV.send(:blank?, nil).should eq true
|
6
|
+
end
|
7
|
+
|
8
|
+
it 'is true for empty string' do
|
9
|
+
SmarterCSV.send(:blank?, '').should eq true
|
10
|
+
end
|
11
|
+
|
12
|
+
it 'is true for blank string' do
|
13
|
+
SmarterCSV.send(:blank?, ' ').should eq true
|
14
|
+
end
|
15
|
+
|
16
|
+
it 'is true for tab string' do
|
17
|
+
SmarterCSV.send(:blank?, " \t ").should eq true
|
18
|
+
end
|
19
|
+
|
20
|
+
it 'is false for string with content' do
|
21
|
+
SmarterCSV.send(:blank?, " 1 ").should eq false
|
22
|
+
end
|
23
|
+
|
24
|
+
it 'is false for numeric values' do
|
25
|
+
SmarterCSV.send(:blank?, 1).should eq false
|
26
|
+
end
|
27
|
+
|
28
|
+
describe 'arrays' do
|
29
|
+
it 'is true for empty arrays' do
|
30
|
+
SmarterCSV.send(:blank?, []).should eq true
|
31
|
+
end
|
32
|
+
|
33
|
+
it 'is true for blank arrays' do
|
34
|
+
SmarterCSV.send(:blank?, [nil, '', ' ', " \t "]).should eq true
|
35
|
+
end
|
36
|
+
|
37
|
+
it 'is false for non-blank arrays' do
|
38
|
+
SmarterCSV.send(:blank?, [nil, '', ' ', " 1 "]).should eq false
|
39
|
+
end
|
40
|
+
end
|
41
|
+
|
42
|
+
describe 'hashes' do
|
43
|
+
it 'is true for empty arrays' do
|
44
|
+
SmarterCSV.send(:blank?, {}).should eq true
|
45
|
+
end
|
46
|
+
|
47
|
+
it 'is true for blank arrays' do
|
48
|
+
SmarterCSV.send(:blank?, {a: nil, b: '', c: ' ', d: " \t "}).should eq true
|
49
|
+
end
|
50
|
+
|
51
|
+
it 'is false for non-blank arrays' do
|
52
|
+
SmarterCSV.send(:blank?, {a: nil, b: '', c: ' ', d: " 1 "}).should eq false
|
53
|
+
end
|
54
|
+
end
|
55
|
+
end
|
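The spec above pins down the semantics of the new private `blank?` helper: `nil` and whitespace-only strings are blank, and an Array or Hash is blank when every element (or every value) is blank. The sketch below restates those rules in stand-alone form so they can be tried without the gem. The method names `sketch_blank?` and `sketch_elem_blank?` are hypothetical, and `all?` is used here in place of the `inject` calls in the diff.

```ruby
# Hypothetical stand-alone restatement of the blank? rules exercised by the spec.
def sketch_elem_blank?(value)
  case value
  when nil    then true
  when String then value !~ /\S/ # blank when it contains no non-whitespace character
  else false                     # numbers, booleans, and other objects are never blank
  end
end

def sketch_blank?(value)
  case value
  when Array then value.all? { |x| sketch_elem_blank?(x) }
  when Hash  then value.values.all? { |x| sketch_elem_blank?(x) }
  else sketch_elem_blank?(value)
  end
end

puts sketch_blank?([nil, '', " \t "])  # an array of blank elements
puts sketch_blank?({ a: nil, d: ' 1 ' }) # one value has content
```

Note that these rules are not identical to ActiveSupport's `blank?`, which the Rails branch of `remove_empty_values` still calls: ActiveSupport treats `false` as blank and treats an Array as blank only when it is empty, so results may differ slightly between the two branches for such values.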
data/spec/smarter_csv/column_separator_spec.rb
CHANGED
@@ -2,10 +2,88 @@ require 'spec_helper'
|
|
2
2
|
|
3
3
|
fixture_path = 'spec/fixtures'
|
4
4
|
|
5
|
-
describe '
|
6
|
-
|
7
|
-
|
8
|
-
data = SmarterCSV.process("#{fixture_path}/
|
9
|
-
data.
|
5
|
+
describe 'can handle col_sep' do
|
6
|
+
|
7
|
+
it 'has default of comma as col_sep' do
|
8
|
+
data = SmarterCSV.process("#{fixture_path}/separator_comma.csv") # no options
|
9
|
+
data.first.keys.size.should == 4
|
10
|
+
data.size.should eq 3
|
11
|
+
end
|
12
|
+
|
13
|
+
describe 'with explicitly given col_sep' do
|
14
|
+
it 'loads file with comma separator' do
|
15
|
+
options = {:col_sep => ','}
|
16
|
+
data = SmarterCSV.process("#{fixture_path}/separator_comma.csv", options)
|
17
|
+
data.first.keys.size.should == 4
|
18
|
+
data.size.should eq 3
|
19
|
+
end
|
20
|
+
|
21
|
+
it 'loads file with tab separator' do
|
22
|
+
options = {:col_sep => "\t"}
|
23
|
+
data = SmarterCSV.process("#{fixture_path}/separator_tab.csv", options)
|
24
|
+
data.first.keys.size.should == 4
|
25
|
+
data.size.should eq 3
|
26
|
+
end
|
27
|
+
|
28
|
+
it 'loads file with semi-colon separator' do
|
29
|
+
options = {:col_sep => ';'}
|
30
|
+
data = SmarterCSV.process("#{fixture_path}/separator_semi.csv", options)
|
31
|
+
data.first.keys.size.should == 4
|
32
|
+
data.size.should eq 3
|
33
|
+
end
|
34
|
+
|
35
|
+
it 'loads file with colon separator' do
|
36
|
+
options = {:col_sep => ':'}
|
37
|
+
data = SmarterCSV.process("#{fixture_path}/separator_colon.csv", options)
|
38
|
+
data.first.keys.size.should == 4
|
39
|
+
data.size.should eq 3
|
40
|
+
end
|
41
|
+
|
42
|
+
it 'loads file with pipe separator' do
|
43
|
+
options = {:col_sep => '|'}
|
44
|
+
data = SmarterCSV.process("#{fixture_path}/separator_pipe.csv", options)
|
45
|
+
data.first.keys.size.should == 4
|
46
|
+
data.size.should eq 3
|
47
|
+
end
|
48
|
+
end
|
49
|
+
|
50
|
+
describe 'auto-detection of separator' do
|
51
|
+
options = {:col_sep => 'auto'}
|
52
|
+
|
53
|
+
it 'auto-detects comma separator and loads data' do
|
54
|
+
data = SmarterCSV.process("#{fixture_path}/separator_comma.csv", options)
|
55
|
+
data.first.keys.size.should == 4
|
56
|
+
data.size.should eq 3
|
57
|
+
end
|
58
|
+
|
59
|
+
it 'auto-detects tab separator and loads data' do
|
60
|
+
data = SmarterCSV.process("#{fixture_path}/separator_tab.csv", options)
|
61
|
+
data.first.keys.size.should == 4
|
62
|
+
data.size.should eq 3
|
63
|
+
end
|
64
|
+
|
65
|
+
it 'auto-detects semi-colon separator and loads data' do
|
66
|
+
data = SmarterCSV.process("#{fixture_path}/separator_semi.csv", options)
|
67
|
+
data.first.keys.size.should == 4
|
68
|
+
data.size.should eq 3
|
69
|
+
end
|
70
|
+
|
71
|
+
it 'auto-detects colon separator and loads data' do
|
72
|
+
data = SmarterCSV.process("#{fixture_path}/separator_colon.csv", options)
|
73
|
+
data.first.keys.size.should == 4
|
74
|
+
data.size.should eq 3
|
75
|
+
end
|
76
|
+
|
77
|
+
it 'auto-detects pipe separator and loads data' do
|
78
|
+
data = SmarterCSV.process("#{fixture_path}/separator_pipe.csv", options)
|
79
|
+
data.first.keys.size.should == 4
|
80
|
+
data.size.should eq 3
|
81
|
+
end
|
82
|
+
|
83
|
+
it 'does not auto-detect other separators' do
|
84
|
+
expect {
|
85
|
+
SmarterCSV.process("#{fixture_path}/binary.csv", options)
|
86
|
+
}.to raise_exception SmarterCSV::NoColSepDetected
|
87
|
+
end
|
10
88
|
end
|
11
89
|
end
|
data/spec/smarter_csv/empty_columns_spec.rb
ADDED
@@ -0,0 +1,74 @@
|
|
1
|
+
require 'spec_helper'
|
2
|
+
|
3
|
+
fixture_path = 'spec/fixtures'
|
4
|
+
|
5
|
+
describe 'can handle empty columns' do
|
6
|
+
|
7
|
+
describe 'default behavior' do
|
8
|
+
it 'has empty columns at end' do
|
9
|
+
data = SmarterCSV.process("#{fixture_path}/empty_columns_1.csv")
|
10
|
+
data.size.should eq 1
|
11
|
+
item = data.first
|
12
|
+
item[:id].should == 123
|
13
|
+
item[:col1].should == nil
|
14
|
+
item[:col2].should == nil
|
15
|
+
item[:col3].should == nil
|
16
|
+
end
|
17
|
+
|
18
|
+
it 'has empty columns in the middle' do
|
19
|
+
data = SmarterCSV.process("#{fixture_path}/empty_columns_2.csv")
|
20
|
+
data.size.should eq 1
|
21
|
+
item = data.first
|
22
|
+
item[:id].should == 123
|
23
|
+
item[:col1].should == nil
|
24
|
+
item[:col2].should == nil
|
25
|
+
item[:col3].should == 1
|
26
|
+
end
|
27
|
+
end
|
28
|
+
|
29
|
+
describe 'with remove_empty_values: true' do
|
30
|
+
options = {remove_empty_values: true}
|
31
|
+
it 'has empty columns at end' do
|
32
|
+
data = SmarterCSV.process("#{fixture_path}/empty_columns_1.csv", options)
|
33
|
+
data.size.should eq 1
|
34
|
+
item = data.first
|
35
|
+
item[:id].should == 123
|
36
|
+
item[:col1].should == nil
|
37
|
+
item[:col2].should == nil
|
38
|
+
item[:col3].should == nil
|
39
|
+
end
|
40
|
+
|
41
|
+
it 'has empty columns in the middle' do
|
42
|
+
data = SmarterCSV.process("#{fixture_path}/empty_columns_2.csv", options)
|
43
|
+
data.size.should eq 1
|
44
|
+
item = data.first
|
45
|
+
item[:id].should == 123
|
46
|
+
item[:col1].should == nil
|
47
|
+
item[:col2].should == nil
|
48
|
+
item[:col3].should == 1
|
49
|
+
end
|
50
|
+
end
|
51
|
+
|
52
|
+
describe 'with remove_empty_values: false' do
|
53
|
+
options = {remove_empty_values: false}
|
54
|
+
it 'has empty columns at end' do
|
55
|
+
data = SmarterCSV.process("#{fixture_path}/empty_columns_1.csv", options)
|
56
|
+
data.size.should eq 1
|
57
|
+
item = data.first
|
58
|
+
item[:id].should == 123
|
59
|
+
item[:col1].should == ''
|
60
|
+
item[:col2].should == ''
|
61
|
+
item[:col3].should == ''
|
62
|
+
end
|
63
|
+
|
64
|
+
it 'has empty columns in the middle' do
|
65
|
+
data = SmarterCSV.process("#{fixture_path}/empty_columns_2.csv", options)
|
66
|
+
data.size.should eq 1
|
67
|
+
item = data.first
|
68
|
+
item[:id].should == 123
|
69
|
+
item[:col1].should == ''
|
70
|
+
item[:col2].should == ''
|
71
|
+
item[:col3].should == 1
|
72
|
+
end
|
73
|
+
end
|
74
|
+
end
|
data/spec/smarter_csv/key_mapping_spec.rb
CHANGED
@@ -22,4 +22,35 @@ describe 'be_able_to' do
|
|
22
22
|
end
|
23
23
|
end
|
24
24
|
|
25
|
+
describe 'when keep_original_headers' do
|
26
|
+
it 'without key mapping' do
|
27
|
+
options = {:keep_original_headers => true}
|
28
|
+
data = SmarterCSV.process("#{fixture_path}/key_mapping.csv", options)
|
29
|
+
data.size.should == 1
|
30
|
+
data.first.keys.should == ['THIS', 'THAT', 'other']
|
31
|
+
end
|
32
|
+
|
33
|
+
it 'sets key_mapping to a symbol' do
|
34
|
+
options = {:keep_original_headers => true, :key_mapping => {'other' => :other}}
|
35
|
+
data = SmarterCSV.process("#{fixture_path}/key_mapping.csv", options)
|
36
|
+
data.size.should == 1
|
37
|
+
data.first.keys.should == ['THIS', 'THAT', :other]
|
38
|
+
end
|
39
|
+
|
40
|
+
# this previously would set the key to a symbol :OTHER, which was a bug!
|
41
|
+
it 'sets key_mapping to a string' do
|
42
|
+
options = {:keep_original_headers => true, :key_mapping => {'other' => 'OTHER'}}
|
43
|
+
data = SmarterCSV.process("#{fixture_path}/key_mapping.csv", options)
|
44
|
+
data.size.should == 1
|
45
|
+
data.first.keys.should == ['THIS', 'THAT', 'OTHER']
|
46
|
+
end
|
47
|
+
|
48
|
+
# users now have to explicitly set this to a symbol, or change the expected keys to be strings.
|
49
|
+
it 'sets key_mapping to a symbol' do
|
50
|
+
options = {:keep_original_headers => true, :key_mapping => {'other' => :OTHER}}
|
51
|
+
data = SmarterCSV.process("#{fixture_path}/key_mapping.csv", options)
|
52
|
+
data.size.should == 1
|
53
|
+
data.first.keys.should == ['THIS', 'THAT', :OTHER]
|
54
|
+
end
|
55
|
+
end
|
25
56
|
end
|
data/spec/smarter_csv/malformed_spec.rb
CHANGED
@@ -8,14 +8,10 @@ describe 'malformed_csv' do
|
|
8
8
|
context "malformed header" do
|
9
9
|
let(:csv_path) { "#{fixture_path}/malformed_header.csv" }
|
10
10
|
it { should raise_error(CSV::MalformedCSVError) }
|
11
|
-
it { should raise_error(/(Missing or stray quote in line 1|CSV::MalformedCSVError)/) }
|
12
|
-
it { should raise_error(CSV::MalformedCSVError) }
|
13
11
|
end
|
14
12
|
|
15
13
|
context "malformed content" do
|
16
14
|
let(:csv_path) { "#{fixture_path}/malformed.csv" }
|
17
15
|
it { should raise_error(CSV::MalformedCSVError) }
|
18
|
-
it { should raise_error(/(Missing or stray quote in line 1|CSV::MalformedCSVError)/) }
|
19
|
-
it { should raise_error(CSV::MalformedCSVError) }
|
20
16
|
end
|
21
17
|
end
|
metadata
CHANGED
@@ -1,16 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: smarter_csv
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 1.
|
4
|
+
version: 1.4.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
|
-
-
|
8
|
-
|
9
|
-
'
|
10
|
-
autorequire:
|
7
|
+
- Tilo Sloboda
|
8
|
+
autorequire:
|
11
9
|
bindir: bin
|
12
10
|
cert_chain: []
|
13
|
-
date:
|
11
|
+
date: 2022-02-11 00:00:00.000000000 Z
|
14
12
|
dependencies:
|
15
13
|
- !ruby/object:Gem::Dependency
|
16
14
|
name: rspec
|
@@ -30,9 +28,7 @@ description: Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes,
|
|
30
28
|
optional features for processing large files in parallel, embedded comments, unusual
|
31
29
|
field- and record-separators, flexible mapping of CSV-headers to Hash-keys
|
32
30
|
email:
|
33
|
-
-
|
34
|
-
|
35
|
-
'
|
31
|
+
- tilo.sloboda@gmail.com
|
36
32
|
executables: []
|
37
33
|
extensions: []
|
38
34
|
extra_rdoc_files: []
|
@@ -41,7 +37,9 @@ files:
|
|
41
37
|
- ".rspec"
|
42
38
|
- ".rvmrc"
|
43
39
|
- ".travis.yml"
|
40
|
+
- CHANGELOG.md
|
44
41
|
- Gemfile
|
42
|
+
- LICENSE.txt
|
45
43
|
- README.md
|
46
44
|
- Rakefile
|
47
45
|
- lib/extensions/hash.rb
|
@@ -58,8 +56,11 @@ files:
|
|
58
56
|
- spec/fixtures/chunk_cornercase.csv
|
59
57
|
- spec/fixtures/duplicate_headers.csv
|
60
58
|
- spec/fixtures/empty.csv
|
59
|
+
- spec/fixtures/empty_columns_1.csv
|
60
|
+
- spec/fixtures/empty_columns_2.csv
|
61
61
|
- spec/fixtures/ignore_comments.csv
|
62
62
|
- spec/fixtures/ignore_comments2.csv
|
63
|
+
- spec/fixtures/key_mapping.csv
|
63
64
|
- spec/fixtures/line_endings_n.csv
|
64
65
|
- spec/fixtures/line_endings_r.csv
|
65
66
|
- spec/fixtures/line_endings_rn.csv
|
@@ -74,7 +75,11 @@ files:
|
|
74
75
|
- spec/fixtures/quote_char.csv
|
75
76
|
- spec/fixtures/quoted.csv
|
76
77
|
- spec/fixtures/quoted2.csv
|
77
|
-
- spec/fixtures/
|
78
|
+
- spec/fixtures/separator_colon.csv
|
79
|
+
- spec/fixtures/separator_comma.csv
|
80
|
+
- spec/fixtures/separator_pipe.csv
|
81
|
+
- spec/fixtures/separator_semi.csv
|
82
|
+
- spec/fixtures/separator_tab.csv
|
78
83
|
- spec/fixtures/skip_lines.csv
|
79
84
|
- spec/fixtures/trading.csv
|
80
85
|
- spec/fixtures/user_import.csv
|
@@ -83,11 +88,13 @@ files:
|
|
83
88
|
- spec/fixtures/with_dates.csv
|
84
89
|
- spec/smarter_csv/binary_file2_spec.rb
|
85
90
|
- spec/smarter_csv/binary_file_spec.rb
|
91
|
+
- spec/smarter_csv/blank_spec.rb
|
86
92
|
- spec/smarter_csv/carriage_return_spec.rb
|
87
93
|
- spec/smarter_csv/chunked_reading_spec.rb
|
88
94
|
- spec/smarter_csv/close_file_spec.rb
|
89
95
|
- spec/smarter_csv/column_separator_spec.rb
|
90
96
|
- spec/smarter_csv/convert_values_to_numeric_spec.rb
|
97
|
+
- spec/smarter_csv/empty_columns_spec.rb
|
91
98
|
- spec/smarter_csv/extenstions_spec.rb
|
92
99
|
- spec/smarter_csv/header_transformation_spec.rb
|
93
100
|
- spec/smarter_csv/ignore_comments_spec.rb
|
@@ -118,9 +125,9 @@ files:
|
|
118
125
|
homepage: https://github.com/tilo/smarter_csv
|
119
126
|
licenses:
|
120
127
|
- MIT
|
121
|
-
|
122
|
-
|
123
|
-
post_install_message:
|
128
|
+
metadata:
|
129
|
+
homepage_uri: https://github.com/tilo/smarter_csv
|
130
|
+
post_install_message:
|
124
131
|
rdoc_options: []
|
125
132
|
require_paths:
|
126
133
|
- lib
|
@@ -136,9 +143,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
136
143
|
version: '0'
|
137
144
|
requirements:
|
138
145
|
- csv
|
139
|
-
|
140
|
-
|
141
|
-
signing_key:
|
146
|
+
rubygems_version: 3.1.4
|
147
|
+
signing_key:
|
142
148
|
specification_version: 4
|
143
149
|
summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots
|
144
150
|
of optional features, e.g. chunked processing for huge CSV files
|
@@ -152,8 +158,11 @@ test_files:
|
|
152
158
|
- spec/fixtures/chunk_cornercase.csv
|
153
159
|
- spec/fixtures/duplicate_headers.csv
|
154
160
|
- spec/fixtures/empty.csv
|
161
|
+
- spec/fixtures/empty_columns_1.csv
|
162
|
+
- spec/fixtures/empty_columns_2.csv
|
155
163
|
- spec/fixtures/ignore_comments.csv
|
156
164
|
- spec/fixtures/ignore_comments2.csv
|
165
|
+
- spec/fixtures/key_mapping.csv
|
157
166
|
- spec/fixtures/line_endings_n.csv
|
158
167
|
- spec/fixtures/line_endings_r.csv
|
159
168
|
- spec/fixtures/line_endings_rn.csv
|
@@ -168,7 +177,11 @@ test_files:
|
|
168
177
|
- spec/fixtures/quote_char.csv
|
169
178
|
- spec/fixtures/quoted.csv
|
170
179
|
- spec/fixtures/quoted2.csv
|
171
|
-
- spec/fixtures/
|
180
|
+
- spec/fixtures/separator_colon.csv
|
181
|
+
- spec/fixtures/separator_comma.csv
|
182
|
+
- spec/fixtures/separator_pipe.csv
|
183
|
+
- spec/fixtures/separator_semi.csv
|
184
|
+
- spec/fixtures/separator_tab.csv
|
172
185
|
- spec/fixtures/skip_lines.csv
|
173
186
|
- spec/fixtures/trading.csv
|
174
187
|
- spec/fixtures/user_import.csv
|
@@ -177,11 +190,13 @@ test_files:
|
|
177
190
|
- spec/fixtures/with_dates.csv
|
178
191
|
- spec/smarter_csv/binary_file2_spec.rb
|
179
192
|
- spec/smarter_csv/binary_file_spec.rb
|
193
|
+
- spec/smarter_csv/blank_spec.rb
|
180
194
|
- spec/smarter_csv/carriage_return_spec.rb
|
181
195
|
- spec/smarter_csv/chunked_reading_spec.rb
|
182
196
|
- spec/smarter_csv/close_file_spec.rb
|
183
197
|
- spec/smarter_csv/column_separator_spec.rb
|
184
198
|
- spec/smarter_csv/convert_values_to_numeric_spec.rb
|
199
|
+
- spec/smarter_csv/empty_columns_spec.rb
|
185
200
|
- spec/smarter_csv/extenstions_spec.rb
|
186
201
|
- spec/smarter_csv/header_transformation_spec.rb
|
187
202
|
- spec/smarter_csv/ignore_comments_spec.rb
|