smarter_csv 1.2.0 → 1.2.3
- checksums.yaml +5 -5
- data/README.md +56 -18
- data/lib/smarter_csv/smarter_csv.rb +6 -6
- data/lib/smarter_csv/version.rb +1 -1
- data/spec/fixtures/quote_char.csv +9 -0
- data/spec/fixtures/quoted2.csv +4 -0
- data/spec/fixtures/trading.csv +3 -0
- data/spec/smarter_csv/quoted_spec.rb +27 -2
- data/spec/smarter_csv/trading_spec.rb +25 -0
- metadata +11 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
-
-  metadata.gz:
-  data.tar.gz:
+SHA256:
+  metadata.gz: c0efcd4dbad2546469ba99c1a133bc7548382e147fca66207d11065607fdb5ee
+  data.tar.gz: 98024633925ed73251dd00606f23f73ca6cbda5517c24882b95935138ab4d7e3
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ac64f3a688f9b5b4bce09e26d8ee140994bf585578ea6eed1d998cec434535f3afb4aa8c2ea222b873cb5ee113fba748412a96e1b3b23a8b5425d13aa7a6436c
+  data.tar.gz: f8c3132f1f2cf60f8f67d2278d0b3660d7aab456581256e1c2508d1397c1cc31d455035c68656777c34e90ad6238c239699b1c59cdd918cdcd63c9f5a264aaba
data/README.md
CHANGED
@@ -2,6 +2,19 @@
 
 [](http://travis-ci.org/tilo/smarter_csv) [](http://badge.fury.io/rb/smarter_csv)
 
+---------------
+#### Service Announcement
+
+Work towards SmarterCSV 2.0 is on it's way, with much improved features, and more streamlined options.
+
+Please check the 2.0-develop branch, and open issues marked v2.0 and leave your comments.
+
+New versions on the 1.2 branch will soon print a deprecation warning if you set :verbose to true
+See below for list of deprecated options.
+
+---------------
+#### SmarterCSV
+
 `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
 and parallel processing with Resque or Sidekiq.
 
@@ -182,18 +195,48 @@ The options and the block are optional.
 
 `SmarterCSV.process` supports the following options:
 
+#### Options:
+
 | Option | Default | Explanation |
 ---------------------------------------------------------------------------------------------------------------------------------
+| :chunk_size | nil | if set, determines the desired chunk-size (defaults to nil, no chunk processing) |
+| | | |
+| :file_encoding | utf-8 | Set the file encoding eg.: 'windows-1252' or 'iso-8859-1' |
+| :invalid_byte_sequence | '' | what to replace invalid byte sequences with |
+| :force_utf8 | false | force UTF-8 encoding of all lines (including headers) in the CSV file |
+| :skip_lines | nil | how many lines to skip before the first line or header line is processed |
+| :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
+---------------------------------------------------------------------------------------------------------------------------------
 | :col_sep | ',' | column separator |
+| :force_simple_split | false | force simple splitting on :col_sep character for non-standard CSV-files. |
+| | | e.g. when :quote_char is not properly escaped |
 | :row_sep | $/ ,"\n" | row separator or record separator , defaults to system's $/ , which defaults to "\n" |
 | | | This can also be set to :auto, but will process the whole cvs file first (slow!) |
 | :auto_row_sep_chars | 500 | How many characters to analyze when using `:row_sep => :auto`. nil or 0 means whole file. |
 | :quote_char | '"' | quotation character |
-
-| :
+---------------------------------------------------------------------------------------------------------------------------------
+| :headers_in_file | true | Whether or not the file contains headers as the first line. |
+| | | Important if the file does not contain headers, |
+| | | otherwise you would lose the first line of data. |
+| :user_provided_headers | nil | *careful with that axe!* |
+| | | user provided Array of header strings or symbols, to define |
+| | | what headers should be used, overriding any in-file headers. |
+| | | You can not combine the :user_provided_headers and :key_mapping options |
+| :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs |
+| :verbose | false | print out line number while processing (to track down problems in input files) |
+---------------------------------------------------------------------------------------------------------------------------------
+
+#### Deprecated 1.x Options: to be replaced in 2.0
+
+There have been a lot of 1-offs and feature creep around these options, and going forward we'll have a simpler, but more flexible way to address these features.
+
+Instead of these options, there will be a new and more flexible way to process the header fields, as well as the fields in each line of the CSV.
+And header and data validations will also be supported in 2.x
+
+| Option | Default | Explanation |
 ---------------------------------------------------------------------------------------------------------------------------------
 | :key_mapping | nil | a hash which maps headers from the CSV file to keys in the result hash |
-| :required_headers | nil | An array. Eacn of the given headers must be present
+| :required_headers | nil | An array. Eacn of the given headers must be present after header manipulation, |
 | | | or an exception is raised No validation if nil is given. |
 | :remove_unmapped_keys | false | when using :key_mapping option, should non-mapped keys / columns be removed? |
 | :downcase_header | true | downcase all column headers |
@@ -201,17 +244,7 @@ The options and the block are optional.
 | :strip_whitespace | true | remove whitespace before/after values and headers |
 | :keep_original_headers | false | keep the original headers from the CSV-file as-is. |
 | | | Disables other flags manipulating the header fields. |
-| :user_provided_headers | nil | *careful with that axe!* |
-| | | user provided Array of header strings or symbols, to define |
-| | | what headers should be used, overriding any in-file headers. |
-| | | You can not combine the :user_provided_headers and :key_mapping options |
 | :strip_chars_from_headers | nil | RegExp to remove extraneous characters from the header line (e.g. if headers are quoted) |
-| :headers_in_file | true | Whether or not the file contains headers as the first line. |
-| | | Important if the file does not contain headers, |
-| | | otherwise you would lose the first line of data. |
-| :skip_lines | nil | how many lines to skip before the first line or header line is processed |
-| :force_utf8 | false | force UTF-8 encoding of all lines (including headers) in the CSV file |
-| :invalid_byte_sequence | '' | how to replace invalid byte sequences with |
 ---------------------------------------------------------------------------------------------------------------------------------
 | :value_converters | nil | supply a hash of :header => KlassName; the class needs to implement self.convert(val)|
 | :remove_empty_values | true | remove values which have nil or empty strings as values |
@@ -220,11 +253,7 @@ The options and the block are optional.
 | | | /^\$0\.0+$/ to match $0.00 , or /^#VALUE!$/ to match errors in Excel spreadsheets |
 | :convert_values_to_numeric | true | converts strings containing Integers or Floats to the appropriate class |
 | | | also accepts either {:except => [:key1,:key2]} or {:only => :key3} |
-
-| :file_encoding | utf-8 | Set the file encoding eg.: 'windows-1252' or 'iso-8859-1' |
-| :force_simple_split | false | force simple splitting on :col_sep character for non-standard CSV-files. |
-| | | e.g. when :quote_char is not properly escaped |
-| :verbose | false | print out line number while processing (to track down problems in input files) |
+---------------------------------------------------------------------------------------------------------------------------------
 
 
 #### NOTES about File Encodings:
@@ -295,6 +324,13 @@ Planned in the next releases:
 
 ## Changes
 
+#### 1.2.3 (2018-01-27)
+* fixed regression / test
+* fuxed quote_char interpolation for headers, but not data (thanks to Colin Petruno)
+
+#### 1.2.1 (2018-01-25) ### YANKED!
+* bugfix (thanks to Joshua Smith for reporting)
+
 #### 1.2.0 (2018-01-20)
 * add default validation that a header can only appear once
 * add option `required_headers`
@@ -465,6 +501,8 @@ And a special thanks to those who contributed pull requests:
 * [Ivan Ushakov](https://github.com/IvanUshakov)
 * [Matthieu Paret](https://github.com/mtparet)
 * [Rohit Amarnath](https://github.com/ramarnat)
+* [Joshua Smith](https://github.com/enviable)
+* [Colin Petruno](https://github.com/colinpetruno)
 
 
 ## Contributing
data/lib/smarter_csv/smarter_csv.rb
CHANGED
@@ -59,7 +59,7 @@ module SmarterCSV
 else
   file_headerA = header.split(options[:col_sep])
 end
-file_headerA.map!{|x| x.gsub(%r
+file_headerA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') }
 file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
 unless options[:keep_original_headers]
   file_headerA.map!{|x| x.gsub(/\s+|-+/,'_')}
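Review note: the changed line in this hunk strips the configured quote character from every header field before the whitespace and separator clean-up. The following is a minimal standalone sketch of those three steps, not the gem's implementation. The helper name `clean_headers` and its keyword arguments are hypothetical, and `Regexp.escape` is an addition that the gem's line does not use.

```ruby
# Hypothetical helper; mirrors the header clean-up steps visible in the hunk above.
# It is not part of smarter_csv's API.
def clean_headers(header_line, col_sep: ',', quote_char: '"', strip_whitespace: true)
  headers = header_line.split(col_sep)
  # Remove every occurrence of the quote character from each header field.
  # Regexp.escape is an addition here, so characters such as '|' are matched literally.
  headers.map! { |x| x.gsub(/#{Regexp.escape(quote_char)}/, '') }
  headers.map! { |x| x.strip } if strip_whitespace
  # Replace each run of whitespace or dashes with a single underscore.
  headers.map! { |x| x.gsub(/\s+|-+/, '_') }
  headers
end

p clean_headers('"First Name", "Last-Name",Age')
# => ["First_Name", "Last_Name", "Age"]
```

Escaping matters only when the quote character is a regular-expression metacharacter; with the default `"` both forms behave the same.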
@@ -68,14 +68,14 @@ module SmarterCSV
 
   file_header_size = file_headerA.size
 else
-  raise SmarterCSV::IncorrectOption , "ERROR
+  raise SmarterCSV::IncorrectOption , "ERROR: If :headers_in_file is set to false, you have to provide :user_provided_headers" if options[:user_provided_headers].nil?
 end
 if options[:user_provided_headers] && options[:user_provided_headers].class == Array && ! options[:user_provided_headers].empty?
   # use user-provided headers
   headerA = options[:user_provided_headers]
   if defined?(file_header_size) && ! file_header_size.nil?
     if headerA.size != file_header_size
-      raise SmarterCSV::HeaderSizeMismatch , "ERROR
+      raise SmarterCSV::HeaderSizeMismatch , "ERROR: :user_provided_headers defines #{headerA.size} headers != CSV-file #{input} has #{file_header_size} headers"
     else
       # we could print out the mapping of file_headerA to headerA here
     end
@@ -100,14 +100,14 @@ module SmarterCSV
 headerA.compact.each do |k|
   duplicate_headers << k if headerA.select{|x| x == k}.size > 1
 end
-raise SmarterCSV::DuplicateHeaders , "
+raise SmarterCSV::DuplicateHeaders , "ERROR: duplicate headers: #{duplicate_headers.join(',')}" unless duplicate_headers.empty?
 
 if options[:required_headers] && options[:required_headers].is_a?(Array)
   missing_headers = []
   options[:required_headers].each do |k|
     missing_headers << k unless headerA.include?(k)
   end
-  raise SmarterCSV::MissingHeaders , "
+  raise SmarterCSV::MissingHeaders , "ERROR: missing headers: #{missing_headers.join(',')}" unless missing_headers.empty?
 end
 
 # in case we use chunking.. we'll need to set it up..
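Review note: this hunk raises when a header appears more than once or when a required header is missing. Below is a minimal sketch of the same two checks in plain Ruby. The error classes and the `validate_headers` helper are hypothetical stand-ins defined here so that the example runs without the gem. Unlike the gem's loop, the sketch reports each duplicated header only once (via `uniq`).

```ruby
# Hypothetical stand-ins for SmarterCSV::DuplicateHeaders / SmarterCSV::MissingHeaders.
class DuplicateHeaders < StandardError; end
class MissingHeaders < StandardError; end

# Hypothetical helper: returns the headers unchanged when both checks pass.
def validate_headers(headers, required_headers: nil)
  # A header is a duplicate when it occurs more than once; nil entries are ignored.
  duplicates = headers.compact.select { |k| headers.count(k) > 1 }.uniq
  raise DuplicateHeaders, "ERROR: duplicate headers: #{duplicates.join(',')}" unless duplicates.empty?

  # The required-header check only runs when an Array is given.
  if required_headers.is_a?(Array)
    missing = required_headers.reject { |k| headers.include?(k) }
    raise MissingHeaders, "ERROR: missing headers: #{missing.join(',')}" unless missing.empty?
  end
  headers
end

validate_headers([:name, :email], required_headers: [:email]) # passes

begin
  validate_headers([:name, :email, :name])
rescue DuplicateHeaders => e
  puts e.message # prints "ERROR: duplicate headers: name"
end
```

Because the checks run on the final header list, a required header has to be given in its post-manipulation form (for example `:email`, not `"E-Mail"`), which matches the README wording "after header manipulation".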
@@ -155,7 +155,7 @@ module SmarterCSV
 else
   dataA = line.split(options[:col_sep])
 end
-
+#### dataA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') } # this is actually not a good idea as a default
 dataA.map!{|x| x.strip} if options[:strip_whitespace]
 hash = Hash.zip(headerA,dataA) # from Facets of Ruby library
 # make sure we delete any key/value pairs from the hash, which the user wanted to delete:
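Review note: after this change, quote characters are no longer stripped from data fields; each row is only split, optionally stripped of whitespace, and zipped with the headers. Here is a rough sketch of that path for the plain `split` branch shown in the hunk, using core Ruby `zip`/`to_h` in place of the Facets `Hash.zip` call. The helper name `row_to_hash` is hypothetical.

```ruby
# Hypothetical helper; follows the plain-split branch of the hunk above.
def row_to_hash(headers, line, col_sep: ',', strip_whitespace: true)
  data = line.split(col_sep)
  # Note: no gsub on the quote character here, so quotes stay in the data.
  data.map! { |x| x.strip } if strip_whitespace
  # Core-Ruby equivalent of pairing header keys with row values.
  headers.zip(data).to_h
end

row = row_to_hash([:first_name, :last_name], ' "Jean" , Dupont ')
puts row[:first_name] # prints "Jean" including the surrounding double quotes
puts row[:last_name]  # prints Dupont
```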
data/lib/smarter_csv/version.rb
CHANGED
data/spec/smarter_csv/quoted_spec.rb
CHANGED
@@ -2,9 +2,9 @@ require 'spec_helper'
 
 fixture_path = 'spec/fixtures'
 
-describe '
+describe 'loading file with quoted fields' do
 
-  it '
+  it 'leaving the quotes in the data' do
     options = {}
     data = SmarterCSV.process("#{fixture_path}/quoted.csv", options)
     data.flatten.size.should == 4
@@ -20,4 +20,29 @@ describe 'be_able_to' do
     end
   end
 
+
+  it 'removes quotes around quoted fields, but not inside data' do
+    options = {}
+    data = SmarterCSV.process("#{fixture_path}/quote_char.csv", options)
+
+    data.length.should eq 6
+    data[1][:first_name].should eq "Jam\ne\nson\""
+    data[2][:first_name].should eq "\"Jean"
+  end
+
+
+  # NOTE: quotes inside headers need to be escaped by doubling them
+  # e.g. 'correct ""EXAMPLE""'
+  # this escaping is illegal: 'incorrect \"EXAMPLE\"' <-- this caused CSV parsing error
+  # in case of CSV parsing errirs, use :user_provided_headers, or key_mapping
+  #
+  it 'removes quotes around headers and extra quotes inside headers' do
+    options = {}
+    data = SmarterCSV.process("#{fixture_path}/quoted2.csv", options)
+
+    data.length.should eq 3
+    data.first.keys[2].should eq :isbn
+    data.first.keys[3].should eq :discounted_price
+  end
+
 end
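Review note: the NOTE comment in this spec says that quotes inside a quoted header must be escaped by doubling them, and that backslash escaping caused a CSV parsing error. The sketch below reproduces both cases with Ruby's `csv` library directly. Whether smarter_csv hands header parsing to that library is an assumption here; this diff does not show it.

```ruby
require 'csv'

# Doubled quotes inside a quoted field are the legal escape.
fields = CSV.parse_line('"correct ""EXAMPLE""",price')
p fields
# => ["correct \"EXAMPLE\"", "price"]

# Backslash-escaped quotes are not valid CSV quoting, so the parser raises.
begin
  CSV.parse_line('"incorrect \"EXAMPLE\"",price')
rescue CSV::MalformedCSVError => e
  puts "parse error: #{e.class}"
end
```

When a file cannot be fixed at the source, the spec comment points to `:user_provided_headers` or `:key_mapping` as the way around the malformed header line.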
data/spec/smarter_csv/trading_spec.rb
ADDED
@@ -0,0 +1,25 @@
+require 'spec_helper'
+
+fixture_path = 'spec/fixtures'
+
+# somebody reported that a column called 'options_trader' would be truncated to 'trader'
+
+describe 'loads simple file format' do
+
+  it 'with symbols as keys when using defaults' do
+    options = {}
+    data = SmarterCSV.process("#{fixture_path}/trading.csv", options)
+
+    data.flatten.size.should eq 2
+    data.each do |item|
+      # all keys should be symbols when using v1.x backwards compatible mode
+      item.keys.each{|x| x.class.should eq Symbol}
+      item[:account_id].class.should eq Fixnum
+      item[:options_trader].class.should eq String
+      item[:stock_symbol].class.should eq String
+      item[:shares_issued].class.should eq Fixnum
+      item[:purchase_date].class.should eq String
+    end
+  end
+
+end
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: smarter_csv
 version: !ruby/object:Gem::Version
-  version: 1.2.
+  version: 1.2.3
 platform: ruby
 authors:
 - 'Tilo Sloboda
@@ -10,7 +10,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-01-
+date: 2018-01-27 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
@@ -68,9 +68,12 @@ files:
 - spec/fixtures/no_header.csv
 - spec/fixtures/numeric.csv
 - spec/fixtures/pets.csv
+- spec/fixtures/quote_char.csv
 - spec/fixtures/quoted.csv
+- spec/fixtures/quoted2.csv
 - spec/fixtures/separator.csv
 - spec/fixtures/skip_lines.csv
+- spec/fixtures/trading.csv
 - spec/fixtures/user_import.csv
 - spec/fixtures/valid_unicode.csv
 - spec/fixtures/with_dashes.csv
@@ -101,6 +104,7 @@ files:
 - spec/smarter_csv/skip_lines_spec.rb
 - spec/smarter_csv/strings_as_keys_spec.rb
 - spec/smarter_csv/strip_chars_from_headers_spec.rb
+- spec/smarter_csv/trading_spec.rb
 - spec/smarter_csv/valid_unicode_spec.rb
 - spec/smarter_csv/value_converters_spec.rb
 - spec/spec.opts
@@ -128,7 +132,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 requirements:
 - csv
 rubyforge_project:
-rubygems_version: 2.
+rubygems_version: 2.7.4
 signing_key:
 specification_version: 4
 summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots
@@ -153,9 +157,12 @@ test_files:
 - spec/fixtures/no_header.csv
 - spec/fixtures/numeric.csv
 - spec/fixtures/pets.csv
+- spec/fixtures/quote_char.csv
 - spec/fixtures/quoted.csv
+- spec/fixtures/quoted2.csv
 - spec/fixtures/separator.csv
 - spec/fixtures/skip_lines.csv
+- spec/fixtures/trading.csv
 - spec/fixtures/user_import.csv
 - spec/fixtures/valid_unicode.csv
 - spec/fixtures/with_dashes.csv
@@ -186,6 +193,7 @@ test_files:
 - spec/smarter_csv/skip_lines_spec.rb
 - spec/smarter_csv/strings_as_keys_spec.rb
 - spec/smarter_csv/strip_chars_from_headers_spec.rb
+- spec/smarter_csv/trading_spec.rb
 - spec/smarter_csv/valid_unicode_spec.rb
 - spec/smarter_csv/value_converters_spec.rb
 - spec/spec.opts