smarter_csv 1.3.0 → 1.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +165 -0
- data/LICENSE.txt +21 -0
- data/README.md +9 -173
- data/lib/smarter_csv/smarter_csv.rb +65 -15
- data/lib/smarter_csv/version.rb +1 -1
- data/smarter_csv.gemspec +19 -16
- data/spec/fixtures/empty_columns_1.csv +2 -0
- data/spec/fixtures/empty_columns_2.csv +2 -0
- data/spec/fixtures/numeric.csv +1 -1
- data/spec/fixtures/separator_colon.csv +4 -0
- data/spec/fixtures/separator_comma.csv +4 -0
- data/spec/fixtures/separator_pipe.csv +4 -0
- data/spec/fixtures/{separator.csv → separator_semi.csv} +0 -0
- data/spec/fixtures/separator_tab.csv +4 -0
- data/spec/smarter_csv/blank_spec.rb +55 -0
- data/spec/smarter_csv/column_separator_spec.rb +83 -5
- data/spec/smarter_csv/empty_columns_spec.rb +74 -0
- metadata +26 -12
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: c8236e4cc8f0081efd9b74f12ad4b5342707d0a2f883414b07538160910008a3
+data.tar.gz: b04a53b0030bf6c623aa19fb15c0c6c5ca123ce2ff85d47f176884fffa0f9811
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: f2ddaa7bf44362c8bb4439289172d40b6ca926a67a8a35fb335473ddf7349658a629f3008ece5314c6bc5fa17145a2ae89b4d706b9c130a1642a51f2434d5e21
+data.tar.gz: b48908b657a07589886873fe251263dabbe6e2333a1fc025dfede085841544458d4498ba1d288a4a7c0de3875d1c14631cc584b2a1cb7fd0be1543b758781dd3
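The hunk above only replaces the SHA256 and SHA512 digests recorded for the gem's `metadata.gz` and `data.tar.gz`. The following is a minimal sketch of checking a payload against such a `checksums.yaml`, using only Ruby's standard `digest` and `yaml` libraries. The helper name `checksum_matches?` is hypothetical and is not a RubyGems API. The in-memory sample stands in for a real unpacked `.gem`.

```ruby
require 'digest'
require 'yaml'

# Hypothetical helper (not a RubyGems API): true when the SHA256 of `bytes`
# equals the digest recorded under SHA256 -> entry_name in checksums.yaml text.
def checksum_matches?(checksums_yaml, entry_name, bytes)
  checksums = YAML.safe_load(checksums_yaml)
  expected = checksums.fetch('SHA256').fetch(entry_name)
  Digest::SHA256.hexdigest(bytes) == expected.to_s
end

# In-memory example; a real check would read data.tar.gz from an unpacked .gem.
sample = 'abc'
yaml = <<~YAML
  ---
  SHA256:
    data.tar.gz: #{Digest::SHA256.hexdigest(sample)}
YAML

puts checksum_matches?(yaml, 'data.tar.gz', sample)  # prints "true"
puts checksum_matches?(yaml, 'data.tar.gz', 'abd')   # prints "false"
```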
data/CHANGELOG.md
ADDED
@@ -0,0 +1,165 @@
+
+# SmarterCSV 1.x Change Log
+
+## 1.4.0 (2022-01-11)
+* dropped GPL license, smarter_csv is now only using the MIT License
+* added experimental option `col_sep: 'auto` to auto-detect the column separator (issue #183)
+The default behavior is still to assume `,` is the column separator.
+* fixed buggy behavior when using `remove_empty_values: false` (issue #168)
+* fixed Ruby 3.0 deprecation
+
+## 1.3.0 (2022-01-06) Breaking code change if you used `--key_mappings`
+* fix bug for key_mappings (issue #181)
+The values of the `key_mappings` hash will now be used "as is", and no longer forced to be symbols
+
+**Users with existing code with `--key_mappings` need to change their code** to
+* either use symbols in the `key_mapping` hash
+* or change the expected keys from symbols to strings
+
+## 1.2.9 (2021-11-22) (PULLED)
+* fix bug for key_mappings (issue #181)
+The values of the `key_mappings` hash will now be used "as is", and no longer forced to be symbols
+
+## 1.2.8 (2020-02-04)
+* fix deprecation warnings on Ruby 2.7 (thank to Diego Salido)
+
+## 1.2.7 (2020-02-03)
+
+## 1.2.6 (2018-11-13)
+* fixing error caused by calling f.close when we do not hand in a file
+
+## 1.2.5 (2018-09-16)
+* fixing issue #136 with comments in CSV files
+* fixing error class hierarchy
+
+## 1.2.4 (2018-08-06)
+* using Rails blank? if it's available
+
+## 1.2.3 (2018-01-27)
+* fixed regression / test
+* fuxed quote_char interpolation for headers, but not data (thanks to Colin Petruno)
+* bugfix (thanks to Joshua Smith for reporting)
+
+## 1.2.0 (2018-01-20)
+* add default validation that a header can only appear once
+* add option `required_headers`
+
+## 1.1.5 (2017-11-05)
+* fix issue with invalid byte sequences in header (issue #103, thanks to Dave Myron)
+* fix issue with invalid byte sequences in multi-line data (thanks to Ivan Ushakov)
+* analyze only 500 characters by default when `:row_sep => :auto` is used.
+added option `row_sep_auto_chars` to change the default if necessary. (thanks to Matthieu Paret)
+
+## 1.1.4 (2017-01-16)
+* fixing UTF-8 related bug which was introduced in 1.1.2 (thanks to Tirdad C.)
+
+## 1.1.3 (2016-12-30)
+* added warning when options indicate UTF-8 processing, but input filehandle is not opened with r:UTF-8 option
+
+## 1.1.2 (2016-12-29)
+* added option `invalid_byte_sequence` (thanks to polycarpou)
+* added comments on handling of UTF-8 encoding when opening from File vs. OpenURI (thanks to KevinColemanInc)
+
+## 1.1.1 (2016-11-26)
+* added option to `skip_lines` (thanks to wal)
+* added option to `force_utf8` encoding (thanks to jordangraft)
+* bugfix if no headers in input data (thanks to esBeee)
+* ensure input file is closed (thanks to waldyr)
+* improved verbose output (thankd to benmaher)
+* improved documentation
+
+## 1.1.0 (2015-07-26)
+* added feature :value_converters, which allows parsing of dates, money, and other things (thanks to Raphaël Bleuse, Lucas Camargo de Almeida, Alejandro)
+* added error if :headers_in_file is set to false, and no :user_provided_headers are given (thanks to innhyu)
+* added support to convert dashes to underscore characters in headers (thanks to César Camacho)
+* fixing automatic detection of \r\n line-endings (thanks to feens)
+
+## 1.0.19 (2014-10-29)
+* added option :keep_original_headers to keep CSV-headers as-is (thanks to Benjamin Thouret)
+
+## 1.0.18 (2014-10-27)
+* added support for multi-line fields / csv fields containing CR (thanks to Chris Hilton) (issue #31)
+
+## 1.0.17 (2014-01-13)
+* added option to set :row_sep to :auto , for automatic detection of the row-separator (issue #22)
+
+## 1.0.16 (2014-01-13)
+* :convert_values_to_numeric option can now be qualified with :except or :only (thanks to Hugo Lepetit)
+* removed deprecated `process_csv` method
+
+## 1.0.15 (2013-12-07)
+* new option:
+* :remove_unmapped_keys to completely ignore columns which were not mapped with :key_mapping (thanks to Dave Sanders)
+
+## 1.0.14 (2013-11-01)
+* added GPL-2 and MIT license to GEM spec file; if you need another license contact me
+
+## 1.0.12 (2013-10-15)
+* added RSpec tests
+
+## 1.0.11 (2013-09-28)
+* bugfix : fixed issue #18 - fixing issue with last chunk not being properly returned (thanks to Jordan Running)
+* added RSpec tests
+
+## 1.0.10 (2013-06-26)
+* bugfix : fixed issue #14 - passing options along to CSV.parse (thanks to Marcos Zimmermann)
+
+## 1.0.9 (2013-06-19)
+* bugfix : fixed issue #13 with negative integers and floats not being correctly converted (thanks to Graham Wetzler)
+
+## 1.0.8 (2013-06-01)
+
+* bugfix : fixed issue with nil values in inputs with quote-char (thanks to Félix Bellanger)
+* new options:
+* :force_simple_split : to force simiple splitting on :col_sep character for non-standard CSV-files. e.g. without properly escaped :quote_char
+* :verbose : print out line number while processing (to track down problems in input files)
+
+## 1.0.7 (2013-05-20)
+
+* allowing process to work with objects with a 'readline' method (thanks to taq)
+* added options:
+* :file_encoding : defaults to utf8 (thanks to MrTin, Paxa)
+
+## 1.0.6 (2013-05-19)
+
+* bugfix : quoted fields are now correctly parsed
+
+## 1.0.5 (2013-05-08)
+
+* bugfix : for :headers_in_file option
+
+## 1.0.4 (2012-08-17)
+
+* renamed the following options:
+* :strip_whitepace_from_values => :strip_whitespace - removes leading/trailing whitespace from headers and values
+
+## 1.0.3 (2012-08-16)
+
+* added the following options:
+* :strip_whitepace_from_values - removes leading/trailing whitespace from values
+
+## 1.0.2 (2012-08-02)
+
+* added more options for dealing with headers:
+* :user_provided_headers ,user provided Array with header strings or symbols, to precisely define what the headers should be, overriding any in-file headers (default: nil)
+* :headers_in_file , if the file contains headers as the first line (default: true)
+
+## 1.0.1 (2012-07-30)
+
+* added the following options:
+* :downcase_header
+* :strings_as_keys
+* :remove_zero_values
+* :remove_values_matching
+* :remove_empty_hashes
+* :convert_values_to_numeric
+
+* renamed the following options:
+* :remove_empty_fields => :remove_empty_values
+
+
+## 1.0.0 (2012-07-29)
+
+* renamed `SmarterCSV.process_csv` to `SmarterCSV.process`.
+
+## 1.0.0.pre1 (2012-07-29)
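The 1.3.0 entry says the values of the `key_mapping` hash (written `key_mappings` in the changelog) are now used "as is" instead of being forced to symbols. The following sketch imitates that rule on an already-symbolized header row, so the effect on the resulting keys is visible. `remap_headers` is a hypothetical stand-in, not smarter_csv's internal code. It assumes headers have already been downcased and symbolized, as they are with the default options.

```ruby
# Hypothetical illustration of the 1.3.0 key_mapping rule: a mapped header
# takes the mapping's value verbatim (a Symbol stays a Symbol, a String stays a String).
def remap_headers(headers, key_mapping)
  headers.map { |h| key_mapping.key?(h) ? key_mapping[h] : h }
end

headers = [:first_name, :last_name]

p remap_headers(headers, { first_name: :given })   # prints [:given, :last_name]
p remap_headers(headers, { first_name: 'given' })  # prints ["given", :last_name]
```

Under the rule described in the changelog, code that maps to String values must read rows with String keys (`row['given']`). To keep Symbol keys, use Symbol values in the mapping.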
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2022 Tilo Sloboda
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
data/README.md
CHANGED
@@ -5,15 +5,14 @@
 ---------------
 #### Service Announcement
 
-Work towards SmarterCSV 2.0 is on it's way, with much improved features, and more streamlined options.
+* Work towards SmarterCSV 2.0 is still on it's way, with much improved features, and more streamlined options.
+Please check the 2.0-develop branch, open any issues and pull requests with mention of v2.0.
 
-
-
-New versions on the 1.2 branch will soon print a deprecation warning if you set :verbose to true
-See below for list of deprecated options.
+* New versions on the 1.2 branch will soon print a deprecation warning if you set :verbose to true
+See below for list of deprecated options.
 
 ---------------
-#### SmarterCSV
+#### SmarterCSV 1.x
 
 `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
 and parallel processing with Resque or Sidekiq.
@@ -207,7 +206,7 @@ The options and the block are optional.
 | :skip_lines | nil | how many lines to skip before the first line or header line is processed |
 | :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
 ---------------------------------------------------------------------------------------------------------------------------------
-| :col_sep | ',' | column separator
+| :col_sep | ',' | column separator, can be set to 'auto' |
 | :force_simple_split | false | force simple splitting on :col_sep character for non-standard CSV-files. |
 | | | e.g. when :quote_char is not properly escaped |
 | :row_sep | $/ ,"\n" | row separator or record separator , defaults to system's $/ , which defaults to "\n" |
@@ -222,7 +221,7 @@ The options and the block are optional.
 | | | user provided Array of header strings or symbols, to define |
 | | | what headers should be used, overriding any in-file headers. |
 | | | You can not combine the :user_provided_headers and :key_mapping options |
-| :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs
+| :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs or all empty values |
 | :verbose | false | print out line number while processing (to track down problems in input files) |
 ---------------------------------------------------------------------------------------------------------------------------------
 
@@ -248,7 +247,7 @@ And header and data validations will also be supported in 2.x
 ---------------------------------------------------------------------------------------------------------------------------------
 | :value_converters | nil | supply a hash of :header => KlassName; the class needs to implement self.convert(val)|
 | :remove_empty_values | true | remove values which have nil or empty strings as values |
-| :remove_zero_values |
+| :remove_zero_values | false | remove values which have a numeric value equal to zero / 0 |
 | :remove_values_matching | nil | removes key/value pairs if value matches given regular expressions. e.g.: |
 | | | /^\$0\.0+$/ to match $0.00 , or /^#VALUE!$/ to match errors in Excel spreadsheets |
 | :convert_values_to_numeric | true | converts strings containing Integers or Floats to the appropriate class |
@@ -316,170 +315,7 @@ Or install it yourself as:
 
 $ gem install smarter_csv
 
-##
-
-Planned in the next releases:
-* programmatic header transformations
-* CSV command line
-
-## Changes
-
-#### 1.3.0 (2022-01-06) Breaking code change if you used `--key_mappings`
-* fix bug for key_mappings (issue #181)
-The values of the `key_mappings` hash will now be used "as is", and no longer forced to be symbols
-
-**Users with existing code with `--key_mappings` need to change their code** to
-* either use symbols in the `key_mapping` hash
-* or change the expected keys from symbols to strings
-
-#### 1.2.9 (2021-11-22) (PULLED)
-* fix bug for key_mappings (issue #181)
-The values of the `key_mappings` hash will now be used "as is", and no longer forced to be symbols
-
-#### 1.2.8 (2020-02-04)
-* fix deprecation warnings on Ruby 2.7 (thank to Diego Salido)
-
-#### 1.2.7 (2020-02-03)
-
-#### 1.2.6 (2018-11-13)
-* fixing error caused by calling f.close when we do not hand in a file
-
-#### 1.2.5 (2018-09-16)
-* fixing issue #136 with comments in CSV files
-* fixing error class hierarchy
-
-#### 1.2.4 (2018-08-06)
-* using Rails blank? if it's available
-
-#### 1.2.3 (2018-01-27)
-* fixed regression / test
-* fuxed quote_char interpolation for headers, but not data (thanks to Colin Petruno)
-* bugfix (thanks to Joshua Smith for reporting)
-
-#### 1.2.0 (2018-01-20)
-* add default validation that a header can only appear once
-* add option `required_headers`
-
-#### 1.1.5 (2017-11-05)
-* fix issue with invalid byte sequences in header (issue #103, thanks to Dave Myron)
-* fix issue with invalid byte sequences in multi-line data (thanks to Ivan Ushakov)
-* analyze only 500 characters by default when `:row_sep => :auto` is used.
-added option `row_sep_auto_chars` to change the default if necessary. (thanks to Matthieu Paret)
-
-#### 1.1.4 (2017-01-16)
-* fixing UTF-8 related bug which was introduced in 1.1.2 (thanks to Tirdad C.)
-
-#### 1.1.3 (2016-12-30)
-* added warning when options indicate UTF-8 processing, but input filehandle is not opened with r:UTF-8 option
-
-#### 1.1.2 (2016-12-29)
-* added option `invalid_byte_sequence` (thanks to polycarpou)
-* added comments on handling of UTF-8 encoding when opening from File vs. OpenURI (thanks to KevinColemanInc)
-
-#### 1.1.1 (2016-11-26)
-* added option to `skip_lines` (thanks to wal)
-* added option to `force_utf8` encoding (thanks to jordangraft)
-* bugfix if no headers in input data (thanks to esBeee)
-* ensure input file is closed (thanks to waldyr)
-* improved verbose output (thankd to benmaher)
-* improved documentation
-
-#### 1.1.0 (2015-07-26)
-* added feature :value_converters, which allows parsing of dates, money, and other things (thanks to Raphaël Bleuse, Lucas Camargo de Almeida, Alejandro)
-* added error if :headers_in_file is set to false, and no :user_provided_headers are given (thanks to innhyu)
-* added support to convert dashes to underscore characters in headers (thanks to César Camacho)
-* fixing automatic detection of \r\n line-endings (thanks to feens)
-
-#### 1.0.19 (2014-10-29)
-* added option :keep_original_headers to keep CSV-headers as-is (thanks to Benjamin Thouret)
-
-#### 1.0.18 (2014-10-27)
-* added support for multi-line fields / csv fields containing CR (thanks to Chris Hilton) (issue #31)
-
-#### 1.0.17 (2014-01-13)
-* added option to set :row_sep to :auto , for automatic detection of the row-separator (issue #22)
-
-#### 1.0.16 (2014-01-13)
-* :convert_values_to_numeric option can now be qualified with :except or :only (thanks to Hugo Lepetit)
-* removed deprecated `process_csv` method
-
-#### 1.0.15 (2013-12-07)
-* new option:
-* :remove_unmapped_keys to completely ignore columns which were not mapped with :key_mapping (thanks to Dave Sanders)
-
-#### 1.0.14 (2013-11-01)
-* added GPL-2 and MIT license to GEM spec file; if you need another license contact me
-
-#### 1.0.12 (2013-10-15)
-* added RSpec tests
-
-#### 1.0.11 (2013-09-28)
-* bugfix : fixed issue #18 - fixing issue with last chunk not being properly returned (thanks to Jordan Running)
-* added RSpec tests
-
-#### 1.0.10 (2013-06-26)
-* bugfix : fixed issue #14 - passing options along to CSV.parse (thanks to Marcos Zimmermann)
-
-#### 1.0.9 (2013-06-19)
-* bugfix : fixed issue #13 with negative integers and floats not being correctly converted (thanks to Graham Wetzler)
-
-#### 1.0.8 (2013-06-01)
-
-* bugfix : fixed issue with nil values in inputs with quote-char (thanks to Félix Bellanger)
-* new options:
-* :force_simple_split : to force simiple splitting on :col_sep character for non-standard CSV-files. e.g. without properly escaped :quote_char
-* :verbose : print out line number while processing (to track down problems in input files)
-
-#### 1.0.7 (2013-05-20)
-
-* allowing process to work with objects with a 'readline' method (thanks to taq)
-* added options:
-* :file_encoding : defaults to utf8 (thanks to MrTin, Paxa)
-
-#### 1.0.6 (2013-05-19)
-
-* bugfix : quoted fields are now correctly parsed
-
-#### 1.0.5 (2013-05-08)
-
-* bugfix : for :headers_in_file option
-
-#### 1.0.4 (2012-08-17)
-
-* renamed the following options:
-* :strip_whitepace_from_values => :strip_whitespace - removes leading/trailing whitespace from headers and values
-
-#### 1.0.3 (2012-08-16)
-
-* added the following options:
-* :strip_whitepace_from_values - removes leading/trailing whitespace from values
-
-#### 1.0.2 (2012-08-02)
-
-* added more options for dealing with headers:
-* :user_provided_headers ,user provided Array with header strings or symbols, to precisely define what the headers should be, overriding any in-file headers (default: nil)
-* :headers_in_file , if the file contains headers as the first line (default: true)
-
-#### 1.0.1 (2012-07-30)
-
-* added the following options:
-* :downcase_header
-* :strings_as_keys
-* :remove_zero_values
-* :remove_values_matching
-* :remove_empty_hashes
-* :convert_values_to_numeric
-
-* renamed the following options:
-* :remove_empty_fields => :remove_empty_values
-
-
-#### 1.0.0 (2012-07-29)
-
-* renamed `SmarterCSV.process_csv` to `SmarterCSV.process`.
-
-#### 1.0.0.pre1 (2012-07-29)
-
+## [ChangeLog](./CHANGELOG.md)
 
 ## Reporting Bugs / Feature Requests
 
data/lib/smarter_csv/smarter_csv.rb
CHANGED
@@ -4,10 +4,10 @@ module SmarterCSV
 class IncorrectOption < SmarterCSVException; end
 class DuplicateHeaders < SmarterCSVException; end
 class MissingHeaders < SmarterCSVException; end
-
+class NoColSepDetected < SmarterCSVException; end
 
 def SmarterCSV.process(input, options={}, &block) # first parameter: filename or input object with readline method
-default_options = {:col_sep => ','
+default_options = {:col_sep => ',', :row_sep => $INPUT_RECORD_SEPARATOR, :quote_char => '"', :force_simple_split => false , :verbose => false ,
 :remove_empty_values => true, :remove_zero_values => false , :remove_values_matching => nil , :remove_empty_hashes => true , :strip_whitespace => true,
 :convert_values_to_numeric => true, :strip_chars_from_headers => nil , :user_provided_headers => nil , :headers_in_file => true,
 :comment_regexp => /\A#/, :chunk_size => nil , :key_mapping_hash => nil , :downcase_header => true, :strings_as_keys => false, :file_encoding => 'utf-8',
@@ -19,13 +19,16 @@ module SmarterCSV
 csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
 headerA = []
 result = []
-old_row_sep =
+old_row_sep = $INPUT_RECORD_SEPARATOR
 file_line_count = 0
 csv_line_count = 0
 has_rails = !! defined?(Rails)
 begin
 f = input.respond_to?(:readline) ? input : File.open(input, "r:#{options[:file_encoding]}")
 
+# attempt to auto-detect column separator
+options[:col_sep] = guess_column_separator(f) if options[:col_sep] == 'auto'
+
 if (options[:force_utf8] || options[:file_encoding] =~ /utf-8/i) && ( f.respond_to?(:external_encoding) && f.external_encoding != Encoding.find('UTF-8') || f.respond_to?(:encoding) && f.encoding != Encoding.find('UTF-8') )
 puts 'WARNING: you are trying to process UTF-8 input, but did not open the input with "b:utf-8" option. See README file "NOTES about File Encodings".'
 end
@@ -34,7 +37,7 @@ module SmarterCSV
 options[:row_sep] = line_ending = SmarterCSV.guess_line_ending( f, options )
 f.rewind
 end
-
+$INPUT_RECORD_SEPARATOR = options[:row_sep]
 
 if options[:skip_lines].to_i > 0
 options[:skip_lines].to_i.times{f.readline}
@@ -60,14 +63,14 @@ module SmarterCSV
 else
 file_headerA = header.split(options[:col_sep])
 end
+file_header_size = file_headerA.size # before mapping, which could delete keys
+
 file_headerA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') }
 file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
 unless options[:keep_original_headers]
 file_headerA.map!{|x| x.gsub(/\s+|-+/,'_')}
 file_headerA.map!{|x| x.downcase } if options[:downcase_header]
 end
-
-file_header_size = file_headerA.size
 else
 raise SmarterCSV::IncorrectOption , "ERROR: If :headers_in_file is set to false, you have to provide :user_provided_headers" if options[:user_provided_headers].nil?
 end
@@ -84,6 +87,8 @@ module SmarterCSV
 else
 headerA = file_headerA
 end
+header_size = headerA.size
+
 headerA.map!{|x| x.to_sym } unless options[:strings_as_keys] || options[:keep_original_headers]
 
 unless options[:user_provided_headers] # wouldn't make sense to re-map user provided headers
@@ -123,7 +128,7 @@ module SmarterCSV
 
 # now on to processing all the rest of the lines in the CSV file:
 while ! f.eof? # we can't use f.readlines() here, because this would read the whole file into memory at once, and eof => true
-line = f.readline # read one line.. this uses the input_record_separator
+line = f.readline # read one line.. this uses the input_record_separator $INPUT_RECORD_SEPARATOR which we set previously!
 
 # replace invalid byte sequence in UTF-8 with question mark to avoid errors
 line = line.force_encoding('utf-8').encode('utf-8', invalid: :replace, undef: :replace, replace: options[:invalid_byte_sequence]) if options[:force_utf8] || options[:file_encoding] !~ /utf-8/i
@@ -145,7 +150,7 @@ module SmarterCSV
 end
 print "\nline contains uneven number of quote chars so including content through file line %d\n" % file_line_count if options[:verbose] && multiline
 
-line.chomp! # will use
+line.chomp! # will use $INPUT_RECORD_SEPARATOR which is set to options[:col_sep]
 
 if (line =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
 dataA = begin
@@ -154,11 +159,17 @@ module SmarterCSV
 raise $!, "#{$!} [SmarterCSV: csv line #{csv_line_count}]", $!.backtrace
 end
 else
-dataA = line.split(options[:col_sep])
+dataA = line.split(options[:col_sep], header_size)
 end
 #### dataA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') } # this is actually not a good idea as a default
 dataA.map!{|x| x.strip} if options[:strip_whitespace]
+
+# if all values are blank, then ignore this line
+# SEE: https://github.com/rails/rails/blob/32015b6f369adc839c4f0955f2d9dce50c0b6123/activesupport/lib/active_support/core_ext/object/blank.rb#L121
+next if options[:remove_empty_hashes] && blank?(dataA)
+
 hash = Hash.zip(headerA,dataA) # from Facets of Ruby library
+
 # make sure we delete any key/value pairs from the hash, which the user wanted to delete:
 # Note: Ruby < 1.9 doesn't allow empty symbol literals!
 hash.delete(nil); hash.delete('');
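The `line.split(options[:col_sep], header_size)` change matters because Ruby's `String#split` drops trailing empty fields unless it is given a positive limit. The following standalone check exercises that core-Ruby behavior, with no smarter_csv involved. It uses a made-up three-column line whose last two values are empty.

```ruby
line = 'alice,,'
header_size = 3

# Without a limit, trailing empty strings are removed.
p line.split(',')                 # prints ["alice"]

# With a positive limit, at most header_size fields are returned and trailing
# empty fields are kept, so values stay aligned with their headers.
p line.split(',', header_size)    # prints ["alice", "", ""]

# Side effect of the limit: surplus separators end up inside the last field.
p 'a,b,c,d'.split(',', header_size)  # prints ["a", "b", "c,d"]
```

The new `empty_columns` fixtures and the `remove_empty_values: false` fix (issue #168) appear to rely on the second behavior. The third line shows the trade-off for rows that have more fields than headers.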
@@ -166,18 +177,17 @@ module SmarterCSV
 eval('hash.delete(:"")')
 end
 
-
-# which caters for double \n and \r\n characters such as "1\r\n\r\n2" whereas the original check (v =~ /^\s*$/) does not
-if options[:remove_empty_values]
+if options[:remove_empty_values] == true
 if has_rails
 hash.delete_if{|k,v| v.blank?}
 else
-hash.delete_if{|k,v|
+hash.delete_if{|k,v| blank?(v)}
 end
 end
 
 hash.delete_if{|k,v| ! v.nil? && v =~ /^(\d+|\d+\.\d+)$/ && v.to_f == 0} if options[:remove_zero_values] # values are typically Strings!
 hash.delete_if{|k,v| v =~ options[:remove_values_matching]} if options[:remove_values_matching]
+
 if options[:convert_values_to_numeric]
 hash.each do |k,v|
 # deal with the :only / :except options to :convert_values_to_numeric
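With this hunk, empty-value removal runs only when `remove_empty_values` is literally `true`, and the non-Rails path goes through the new `blank?` helper. The following is a self-contained approximation of that filter and the existing zero-value filter, applied to a plain Hash of strings. `compact_row` and its keyword arguments are hypothetical. The zero-value regex uses `\A`/`\z` anchors instead of the original's `^`/`$`, which is equivalent for single-line values.

```ruby
# Hypothetical approximation of the per-row filters in the hunk above.
def compact_row(row, remove_empty_values: true, remove_zero_values: false)
  result = row.dup
  if remove_empty_values == true
    # nil, empty, and whitespace-only strings are treated as blank
    result.delete_if { |_key, value| value.nil? || (value.is_a?(String) && value !~ /\S/) }
  end
  if remove_zero_values
    # numeric-looking strings whose value is zero, such as "0" or "0.00"
    result.delete_if do |_key, value|
      value.is_a?(String) && value.match?(/\A(\d+|\d+\.\d+)\z/) && value.to_f.zero?
    end
  end
  result
end

row = { name: 'alice', note: '  ', count: '0', price: '0.00', qty: '3' }
p compact_row(row)
p compact_row(row, remove_zero_values: true)
p compact_row(row, remove_empty_values: false)
```

Because of the strict `== true` comparison, only the literal value `true` enables empty-value removal. Other truthy values no longer do.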
@@ -247,7 +257,7 @@ module SmarterCSV
 chunk = [] # initialize for next chunk of data
 end
 ensure
-
+$INPUT_RECORD_SEPARATOR = old_row_sep # make sure this stupid global variable is always reset to it's previous value after we're done!
 f.close if f.respond_to?(:close)
 end
 if block_given?
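The `ensure` clause above resets the global record separator even when parsing raises. `$INPUT_RECORD_SEPARATOR` is the `English`-library name for `$/`. The code assigns it `options[:row_sep]`, although the `chomp!` comment in an earlier hunk mentions `options[:col_sep]`.

A design alternative that avoids mutating the global is sketched below. It is not what smarter_csv 1.4.0 does: `IO#readline` and `String#chomp` both accept the separator as an argument. `read_records` is a hypothetical helper, and `StringIO` stands in for a file.

```ruby
require 'stringio'

# Hypothetical helper: reads every record using an explicit row separator,
# leaving the global $/ (aliased as $INPUT_RECORD_SEPARATOR by 'English') untouched.
def read_records(io, row_sep)
  records = []
  records << io.readline(row_sep).chomp(row_sep) until io.eof?
  records
end

input = StringIO.new("name,qty\ralice,3\rbob,5")
p read_records(input, "\r")  # prints ["name,qty", "alice,3", "bob,5"]
```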
@@ -258,8 +268,30 @@ module SmarterCSV
 end
 
 private
-# acts as a road-block to limit processing when iterating over all k/v pairs of a CSV-hash:
 
+def self.blank?(value)
+case value
+when Array
+value.inject(true){|result, x| result &&= elem_blank?(x)}
+when Hash
+value.inject(true){|result, x| result &&= elem_blank?(x.last)}
+else
+elem_blank?(value)
+end
+end
+
+def self.elem_blank?(value)
+case value
+when NilClass
+true
+when String
+value !~ /\S/
+else
+false
+end
+end
+
+# acts as a road-block to limit processing when iterating over all k/v pairs of a CSV-hash:
 def self.only_or_except_limit_execution( options, option_name, key )
 if options[option_name].is_a?(Hash)
 if options[option_name].has_key?( :except )
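The new `blank?` helper folds over its input with `inject(true)` and `&&=`. The following standalone restatement uses `Enumerable#all?` instead. It stops at the first non-blank element and, like `inject(true)`, returns `true` for an empty Array or Hash. The `BlankCheck` module name is hypothetical; the per-element rules mirror `elem_blank?` from the hunk.

```ruby
# Hypothetical standalone restatement of blank?/elem_blank? using all?.
module BlankCheck
  def self.elem_blank?(value)
    case value
    when NilClass then true
    when String then value !~ /\S/ # empty or whitespace-only
    else false
    end
  end

  def self.blank?(value)
    case value
    when Array then value.all? { |x| elem_blank?(x) }
    when Hash  then value.values.all? { |x| elem_blank?(x) }
    else elem_blank?(value)
    end
  end
end

p BlankCheck.blank?(['', ' ', nil])        # prints true
p BlankCheck.blank?(['', 'x'])             # prints false
p BlankCheck.blank?({ a: "\t", b: nil })   # prints true
```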
@@ -271,6 +303,24 @@ module SmarterCSV
   return false
 end
 
+# raise exception if none is found
+def self.guess_column_separator(filehandle)
+  del = [',', "\t", ';', ':', '|']
+  n = Hash.new(0)
+  5.times do
+    line = filehandle.readline
+    del.each do |d|
+      n[d] += line.scan(d).count
+    end
+  rescue EOFError # short files
+    break
+  end
+  filehandle.rewind
+  raise SmarterCSV::NoColSepDetected if n.values.max == 0
+
+  col_sep = n.key(n.values.max)
+end
+
 # limitation: this currently reads the whole file in before making a decision
 def self.guess_line_ending( filehandle, options )
   counts = {"\n" => 0 , "\r" => 0, "\r\n" => 0}
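The new `guess_column_separator` counts how often each candidate delimiter (`,`, tab, `;`, `:`, `|`) appears in at most the first five lines, rewinds the handle, and returns the most frequent candidate, raising if none occurs at all. The sketch below reproduces that heuristic so it can be tried on a `StringIO` without the gem. The method name `guess_col_sep` and the use of `ArgumentError` instead of `SmarterCSV::NoColSepDetected` are assumptions made to keep the snippet self-contained.

```ruby
require 'stringio'

# Candidate delimiters, in the same order as in guess_column_separator.
CANDIDATE_SEPARATORS = [',', "\t", ';', ':', '|'].freeze

# Hypothetical helper (not SmarterCSV's API): count each candidate in at most
# the first five lines, rewind the IO, and return the most frequent candidate.
def guess_col_sep(io, candidates = CANDIDATE_SEPARATORS)
  # Seed every candidate with 0 so insertion order equals list order.
  counts = candidates.map { |sep| [sep, 0] }.to_h

  io.each_line.first(5).each do |line|
    candidates.each { |sep| counts[sep] += line.scan(sep).size }
  end
  io.rewind

  best = counts.values.max
  raise ArgumentError, 'no column separator detected' if best.zero?

  counts.key(best) # on a tie, the candidate listed first wins
end

p guess_col_sep(StringIO.new("a;b;c\n1;2;3\n")) # => ";"
p guess_col_sep(StringIO.new("a|b\n1|2\n"))     # => "|"
```

Two properties carry over from the original: delimiters inside quoted fields are counted as well, and on a tie `Hash#key` returns whichever candidate was inserted first, so the list order (comma first) acts as the tie-breaker.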
data/lib/smarter_csv/version.rb
CHANGED
data/smarter_csv.gemspec
CHANGED
@@ -1,21 +1,24 @@
 # -*- encoding: utf-8 -*-
 require File.expand_path('../lib/smarter_csv/version', __FILE__)
 
-Gem::Specification.new do |
-
-
-
-
-  gem.homepage = "https://github.com/tilo/smarter_csv"
+Gem::Specification.new do |spec|
+  spec.name = "smarter_csv"
+  spec.version = SmarterCSV::VERSION
+  spec.authors = ["Tilo Sloboda"]
+  spec.email = ["tilo.sloboda@gmail.com"]
 
-
-
-
-
-
-
-
-
-
-
+  spec.summary = %q{Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots of optional features, e.g. chunked processing for huge CSV files}
+  spec.description = %q{Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with optional features for processing large files in parallel, embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers to Hash-keys}
+  spec.homepage = "https://github.com/tilo/smarter_csv"
+  spec.license = 'MIT'
+
+  spec.files = `git ls-files`.split($\)
+  spec.executables = spec.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
+  spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
+  spec.require_paths = ["lib"]
+  spec.requirements = ['csv'] # for CSV.parse() only needed in case we have quoted fields
+  spec.add_development_dependency "rspec"
+  # spec.add_development_dependency "guard-rspec"
+
+  spec.metadata["homepage_uri"] = spec.homepage
 end
data/spec/fixtures/numeric.csv
CHANGED
File without changes
@@ -0,0 +1,55 @@
+require 'spec_helper'
+
+describe 'blank?' do
+  it 'is true for nil' do
+    SmarterCSV.send(:blank?, nil).should eq true
+  end
+
+  it 'is true for empty string' do
+    SmarterCSV.send(:blank?, '').should eq true
+  end
+
+  it 'is true for blank string' do
+    SmarterCSV.send(:blank?, ' ').should eq true
+  end
+
+  it 'is true for tab string' do
+    SmarterCSV.send(:blank?, " \t ").should eq true
+  end
+
+  it 'is false for string with content' do
+    SmarterCSV.send(:blank?, " 1 ").should eq false
+  end
+
+  it 'is false for numeic values' do
+    SmarterCSV.send(:blank?, 1).should eq false
+  end
+
+  describe 'arrays' do
+    it 'is true for empty arrays' do
+      SmarterCSV.send(:blank?, []).should eq true
+    end
+
+    it 'is true for blank arrays' do
+      SmarterCSV.send(:blank?, [nil, '', ' ', " \t "]).should eq true
+    end
+
+    it 'is false for non-blank arrays' do
+      SmarterCSV.send(:blank?, [nil, '', ' ', " 1 "]).should eq false
+    end
+  end
+
+  describe 'hashes' do
+    it 'is true for empty arrays' do
+      SmarterCSV.send(:blank?, {}).should eq true
+    end
+
+    it 'is true for blank arrays' do
+      SmarterCSV.send(:blank?, {a: nil, b: '', c: ' ', d: " \t "}).should eq true
+    end
+
+    it 'is false for non-blank arrays' do
+      SmarterCSV.send(:blank?, {a: nil, b: '', c: ' ', d: " 1 "}).should eq false
+    end
+  end
+end
@@ -2,10 +2,88 @@ require 'spec_helper'
 
 fixture_path = 'spec/fixtures'
 
-describe '
-
-
-    data = SmarterCSV.process("#{fixture_path}/
-    data.
+describe 'can handle col_sep' do
+
+  it 'has default of comma as col_sep' do
+    data = SmarterCSV.process("#{fixture_path}/separator_comma.csv") # no options
+    data.first.keys.size.should == 4
+    data.size.should eq 3
+  end
+
+  describe 'with explicitly given col_sep' do
+    it 'loads file with comma separator' do
+      options = {:col_sep => ','}
+      data = SmarterCSV.process("#{fixture_path}/separator_comma.csv", options)
+      data.first.keys.size.should == 4
+      data.size.should eq 3
+    end
+
+    it 'loads file with tab separator' do
+      options = {:col_sep => "\t"}
+      data = SmarterCSV.process("#{fixture_path}/separator_tab.csv", options)
+      data.first.keys.size.should == 4
+      data.size.should eq 3
+    end
+
+    it 'loads file with semi-colon separator' do
+      options = {:col_sep => ';'}
+      data = SmarterCSV.process("#{fixture_path}/separator_semi.csv", options)
+      data.first.keys.size.should == 4
+      data.size.should eq 3
+    end
+
+    it 'loads file with colon separator' do
+      options = {:col_sep => ':'}
+      data = SmarterCSV.process("#{fixture_path}/separator_colon.csv", options)
+      data.first.keys.size.should == 4
+      data.size.should eq 3
+    end
+
+    it 'loads file with pipe separator' do
+      options = {:col_sep => '|'}
+      data = SmarterCSV.process("#{fixture_path}/separator_pipe.csv", options)
+      data.first.keys.size.should == 4
+      data.size.should eq 3
+    end
+  end
+
+  describe 'auto-detection of separator' do
+    options = {:col_sep => 'auto'}
+
+    it 'auto-detects comma separator and loads data' do
+      data = SmarterCSV.process("#{fixture_path}/separator_comma.csv", options)
+      data.first.keys.size.should == 4
+      data.size.should eq 3
+    end
+
+    it 'auto-detects tab separator and loads data' do
+      data = SmarterCSV.process("#{fixture_path}/separator_tab.csv", options)
+      data.first.keys.size.should == 4
+      data.size.should eq 3
+    end
+
+    it 'auto-detects semi-colon separator and loads data' do
+      data = SmarterCSV.process("#{fixture_path}/separator_semi.csv", options)
+      data.first.keys.size.should == 4
+      data.size.should eq 3
+    end
+
+    it 'auto-detects colon separator and loads data' do
+      data = SmarterCSV.process("#{fixture_path}/separator_colon.csv", options)
+      data.first.keys.size.should == 4
+      data.size.should eq 3
+    end
+
+    it 'auto-detects pipe separator and loads data' do
+      data = SmarterCSV.process("#{fixture_path}/separator_pipe.csv", options)
+      data.first.keys.size.should == 4
+      data.size.should eq 3
+    end
+
+    it 'does not auto-detect other separators' do
+      expect {
+        SmarterCSV.process("#{fixture_path}/binary.csv", options)
+      }.to raise_exception SmarterCSV::NoColSepDetected
+    end
   end
 end
@@ -0,0 +1,74 @@
+require 'spec_helper'
+
+fixture_path = 'spec/fixtures'
+
+describe 'can handle empty columns' do
+
+  describe 'default behavior' do
+    it 'has empty columns at end' do
+      data = SmarterCSV.process("#{fixture_path}/empty_columns_1.csv")
+      data.size.should eq 1
+      item = data.first
+      item[:id].should == 123
+      item[:col1].should == nil
+      item[:col2].should == nil
+      item[:col3].should == nil
+    end
+
+    it 'has empty columns in the middle' do
+      data = SmarterCSV.process("#{fixture_path}/empty_columns_2.csv")
+      data.size.should eq 1
+      item = data.first
+      item[:id].should == 123
+      item[:col1].should == nil
+      item[:col2].should == nil
+      item[:col3].should == 1
+    end
+  end
+
+  describe 'with remove_empty_values: true' do
+    options = {remove_empty_values: true}
+    it 'has empty columns at end' do
+      data = SmarterCSV.process("#{fixture_path}/empty_columns_1.csv", options)
+      data.size.should eq 1
+      item = data.first
+      item[:id].should == 123
+      item[:col1].should == nil
+      item[:col2].should == nil
+      item[:col3].should == nil
+    end
+
+    it 'has empty columns in the middle' do
+      data = SmarterCSV.process("#{fixture_path}/empty_columns_2.csv", options)
+      data.size.should eq 1
+      item = data.first
+      item[:id].should == 123
+      item[:col1].should == nil
+      item[:col2].should == nil
+      item[:col3].should == 1
+    end
+  end
+
+  describe 'with remove_empty_values: false' do
+    options = {remove_empty_values: false}
+    it 'has empty columns at end' do
+      data = SmarterCSV.process("#{fixture_path}/empty_columns_1.csv", options)
+      data.size.should eq 1
+      item = data.first
+      item[:id].should == 123
+      item[:col1].should == ''
+      item[:col2].should == ''
+      item[:col3].should == ''
+    end
+
+    it 'has empty columns in the middle' do
+      data = SmarterCSV.process("#{fixture_path}/empty_columns_2.csv", options)
+      data.size.should eq 1
+      item = data.first
+      item[:id].should == 123
+      item[:col1].should == ''
+      item[:col2].should == ''
+      item[:col3].should == 1
+    end
+  end
+end
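The three `describe` blocks in `empty_columns_spec.rb` fix what a parsed row should look like: by default, or with `remove_empty_values: true`, empty fields disappear, so looking them up yields `nil`; with `remove_empty_values: false` they stay as empty strings. The hypothetical `build_row` helper below illustrates only those expectations and is not SmarterCSV's implementation. It takes already-split fields, treats `nil` and whitespace-only strings as blank (the same rule as `elem_blank?`), and approximates the numeric conversion the specs rely on (`123`, `1`) by converting integer-looking strings only.

```ruby
# Hypothetical helper mirroring the expectations in empty_columns_spec.rb;
# it is NOT how SmarterCSV builds its hashes internally.
def build_row(headers, fields, remove_empty_values: true)
  # Pair each header (as a Symbol) with its raw field; missing fields become nil.
  row = headers.map(&:to_sym).zip(fields).to_h

  # Drop nil and whitespace-only values unless asked to keep them.
  row.reject! { |_key, value| value.nil? || value !~ /\S/ } if remove_empty_values

  # Rough stand-in for numeric conversion: integer-looking strings become Integers.
  row.transform_values do |value|
    value.is_a?(String) && value.match?(/\A-?\d+\z/) ? value.to_i : value
  end
end

headers = %w[id col1 col2 col3]

trailing = build_row(headers, ['123', '', '', ''])
p trailing[:id]   # => 123
p trailing[:col1] # => nil

kept = build_row(headers, ['123', '', '', '1'], remove_empty_values: false)
p kept[:col2]     # => ""
p kept[:col3]     # => 1
```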
metadata
CHANGED
@@ -1,16 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: smarter_csv
 version: !ruby/object:Gem::Version
-  version: 1.
+  version: 1.4.0
 platform: ruby
 authors:
--
-
-'
+- Tilo Sloboda
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2022-02-
+date: 2022-02-11 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
@@ -30,9 +28,7 @@ description: Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes,
   optional features for processing large files in parallel, embedded comments, unusual
   field- and record-separators, flexible mapping of CSV-headers to Hash-keys
 email:
--
-
-'
+- tilo.sloboda@gmail.com
 executables: []
 extensions: []
 extra_rdoc_files: []
@@ -41,7 +37,9 @@ files:
 - ".rspec"
 - ".rvmrc"
 - ".travis.yml"
+- CHANGELOG.md
 - Gemfile
+- LICENSE.txt
 - README.md
 - Rakefile
 - lib/extensions/hash.rb
@@ -58,6 +56,8 @@ files:
 - spec/fixtures/chunk_cornercase.csv
 - spec/fixtures/duplicate_headers.csv
 - spec/fixtures/empty.csv
+- spec/fixtures/empty_columns_1.csv
+- spec/fixtures/empty_columns_2.csv
 - spec/fixtures/ignore_comments.csv
 - spec/fixtures/ignore_comments2.csv
 - spec/fixtures/key_mapping.csv
@@ -75,7 +75,11 @@ files:
 - spec/fixtures/quote_char.csv
 - spec/fixtures/quoted.csv
 - spec/fixtures/quoted2.csv
-- spec/fixtures/
+- spec/fixtures/separator_colon.csv
+- spec/fixtures/separator_comma.csv
+- spec/fixtures/separator_pipe.csv
+- spec/fixtures/separator_semi.csv
+- spec/fixtures/separator_tab.csv
 - spec/fixtures/skip_lines.csv
 - spec/fixtures/trading.csv
 - spec/fixtures/user_import.csv
@@ -84,11 +88,13 @@ files:
 - spec/fixtures/with_dates.csv
 - spec/smarter_csv/binary_file2_spec.rb
 - spec/smarter_csv/binary_file_spec.rb
+- spec/smarter_csv/blank_spec.rb
 - spec/smarter_csv/carriage_return_spec.rb
 - spec/smarter_csv/chunked_reading_spec.rb
 - spec/smarter_csv/close_file_spec.rb
 - spec/smarter_csv/column_separator_spec.rb
 - spec/smarter_csv/convert_values_to_numeric_spec.rb
+- spec/smarter_csv/empty_columns_spec.rb
 - spec/smarter_csv/extenstions_spec.rb
 - spec/smarter_csv/header_transformation_spec.rb
 - spec/smarter_csv/ignore_comments_spec.rb
@@ -119,8 +125,8 @@ files:
 homepage: https://github.com/tilo/smarter_csv
 licenses:
 - MIT
-
-
+metadata:
+  homepage_uri: https://github.com/tilo/smarter_csv
 post_install_message:
 rdoc_options: []
 require_paths:
@@ -152,6 +158,8 @@ test_files:
 - spec/fixtures/chunk_cornercase.csv
 - spec/fixtures/duplicate_headers.csv
 - spec/fixtures/empty.csv
+- spec/fixtures/empty_columns_1.csv
+- spec/fixtures/empty_columns_2.csv
 - spec/fixtures/ignore_comments.csv
 - spec/fixtures/ignore_comments2.csv
 - spec/fixtures/key_mapping.csv
@@ -169,7 +177,11 @@ test_files:
 - spec/fixtures/quote_char.csv
 - spec/fixtures/quoted.csv
 - spec/fixtures/quoted2.csv
-- spec/fixtures/
+- spec/fixtures/separator_colon.csv
+- spec/fixtures/separator_comma.csv
+- spec/fixtures/separator_pipe.csv
+- spec/fixtures/separator_semi.csv
+- spec/fixtures/separator_tab.csv
 - spec/fixtures/skip_lines.csv
 - spec/fixtures/trading.csv
 - spec/fixtures/user_import.csv
@@ -178,11 +190,13 @@ test_files:
 - spec/fixtures/with_dates.csv
 - spec/smarter_csv/binary_file2_spec.rb
 - spec/smarter_csv/binary_file_spec.rb
+- spec/smarter_csv/blank_spec.rb
 - spec/smarter_csv/carriage_return_spec.rb
 - spec/smarter_csv/chunked_reading_spec.rb
 - spec/smarter_csv/close_file_spec.rb
 - spec/smarter_csv/column_separator_spec.rb
 - spec/smarter_csv/convert_values_to_numeric_spec.rb
+- spec/smarter_csv/empty_columns_spec.rb
 - spec/smarter_csv/extenstions_spec.rb
 - spec/smarter_csv/header_transformation_spec.rb
 - spec/smarter_csv/ignore_comments_spec.rb