smarter_csv 1.0.19 → 1.1.0
- checksums.yaml +4 -4
- data/.travis.yml +1 -0
- data/README.md +53 -3
- data/lib/smarter_csv/smarter_csv.rb +27 -4
- data/lib/smarter_csv/version.rb +1 -1
- data/spec/fixtures/money.csv +3 -0
- data/spec/fixtures/with_dashes.csv +8 -0
- data/spec/fixtures/with_dates.csv +4 -0
- data/spec/smarter_csv/header_transformation_spec.rb +21 -0
- data/spec/smarter_csv/value_converters_spec.rb +52 -0
- metadata +13 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ec1442dad9c0f71dc3264f6df10f5fe1116f9d23
+  data.tar.gz: 2025c6c7c81fc6c94fed0ed7d391eb122a464bd1
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1dd00a098dba973b2f6e0303317bd11cdc527d05ca959dd21ae78f241a640e799b623f0b719da5d146c89856d0e50a557ffa8180725d29c6582e0e10231b091a
+  data.tar.gz: 7ab34fac0386ccef0ad13f1116b5ed0669d4ffe6270f8a4b9e39da8421132c4fac4f1c1764b78ca8ce5e702df29c9819ee1359bf0b18c98b735ea53a3935e4d8
data/.travis.yml
CHANGED
data/README.md
CHANGED
@@ -1,4 +1,6 @@
-# SmarterCSV
+# SmarterCSV
+
+[](http://travis-ci.org/tilo/smarter_csv) [](http://badge.fury.io/rb/smarter_csv)
 
 `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
 and parallel processing with Resque or Sidekiq.
@@ -32,7 +34,8 @@ The two main choices you have in terms of how to call `SmarterCSV.process` are:
 * calling `process` with or without a block
 * passing a `:chunk_size` to the `process` method, and processing the CSV-file in chunks, rather than in one piece.
 
-Tip: If you are uncertain about what line endings a CSV-file uses, try specifying `:row_sep => :auto` as part of the options.
+Tip: If you are uncertain about what line endings a CSV-file uses, try specifying `:row_sep => :auto` as part of the options.
+But this could be slow, because it will try to analyze each CSV file first. If you want to speed things up, set the `:row_sep` manually! Checkout Example 5 for unusual `:row_sep` and `:col_sep`.
 
 #### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
 Please note how each hash contains only the keys for columns with non-null values.
@@ -125,6 +128,40 @@ and how the `process` method returns the number of chunks when called with a blo
 end
 => returns number of chunks
 
+#### Example 6: Using Value Converters
+
+$ cat spec/fixtures/with_dates.csv
+first,last,date,price
+Ben,Miller,10/30/1998,$44.50
+Tom,Turner,2/1/2011,$15.99
+Ken,Smith,01/09/2013,$199.99
+$ irb
+> require 'smarter_csv'
+> require 'date'
+
+# define a custom converter class, which implements self.convert(value)
+class DateConverter
+  def self.convert(value)
+    Date.strptime( value, '%m/%d/%Y') # parses custom date format into Date instance
+  end
+end
+
+class DollarConverter
+  def self.convert(value)
+    value.sub('$','').to_f
+  end
+end
+
+options = {:value_converters => {:date => DateConverter, :price => DollarConverter}}
+data = SmarterCSV.process("spec/fixtures/with_dates.csv", options)
+data[0][:date]
+=> #<Date: 1998-10-30 ((2451117j,0s,0n),+0s,2299161j)>
+data[0][:date].class
+=> Date
+data[0][:price]
+=> 44.50
+data[0][:price].class
+=> Float
 
 ## Documentation
 
@@ -141,7 +178,7 @@ The options and the block are optional.
 ---------------------------------------------------------------------------------------------------------------------------------
 | :col_sep | ',' | column separator |
 | :row_sep | $/ ,"\n" | row separator or record separator , defaults to system's $/ , which defaults to "\n" |
-| | |
+| | | This can also be set to :auto, but will process the whole cvs file first (slow!) |
 | :quote_char | '"' | quotation character |
 | :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
 | :chunk_size | nil | if set, determines the desired chunk-size (defaults to nil, no chunk processing) |
@@ -162,6 +199,7 @@ The options and the block are optional.
 | | | Important if the file does not contain headers, |
 | | | otherwise you would lose the first line of data. |
 ---------------------------------------------------------------------------------------------------------------------------------
+| :value_converters | nil | supply a hash of :header => KlassName; the class needs to implement self.convert(val)|
 | :remove_empty_values | true | remove values which have nil or empty strings as values |
 | :remove_zero_values | true | remove values which have a numeric value equal to zero / 0 |
 | :remove_values_matching | nil | removes key/value pairs if value matches given regular expressions. e.g.: |
@@ -235,6 +273,12 @@ Or install it yourself as:
 
 ## Changes
 
+#### 1.1.0 (2015-07-26)
+* added feature :value_converters, which allows parsing of dates, money, and other things (thanks to Raphaël Bleuse, Lucas Camargo de Almeida, Alejandro)
+* added error if :headers_in_file is set to false, and no :user_provided_headers are given (thanks to innhyu)
+* added support to convert dashes to underscore characters in headers (thanks to César Camacho)
+* fixing automatic detection of \r\n line-endings (thanks to feens)
+
 #### 1.0.19 (2014-10-29)
 * added option :keep_original_headers to keep CSV-headers as-is (thanks to Benjamin Thouret)
 
@@ -339,6 +383,12 @@ Please [open an Issue on GitHub](https://github.com/tilo/smarter_csv/issues) if
 Many thanks to people who have filed issues and sent comments.
 And a special thanks to those who contributed pull requests:
 
+* [Alejandro](https://github.com/agaviria)
+* [Lucas Camargo de Almeida](https://github.com/lcalmeida)
+* [Raphaël Bleuse](https://github.com/bleuse)
+* [feens](https://github.com/feens)
+* [César Camacho](https://github.com/chanko)
+* [innhyu](https://github.com/innhyu)
 * [Benjamin Thouret](https://github.com/benichu)
 * [Chris Hilton](https://github.com/chrismhilton)
 * [Sean Duckett](http://github.com/sduckett)
data/lib/smarter_csv/smarter_csv.rb
CHANGED
@@ -9,7 +9,7 @@ module SmarterCSV
       :remove_empty_values => true, :remove_zero_values => false , :remove_values_matching => nil , :remove_empty_hashes => true , :strip_whitespace => true,
       :convert_values_to_numeric => true, :strip_chars_from_headers => nil , :user_provided_headers => nil , :headers_in_file => true,
       :comment_regexp => /^#/, :chunk_size => nil , :key_mapping_hash => nil , :downcase_header => true, :strings_as_keys => false, :file_encoding => 'utf-8',
-      :remove_unmapped_keys => false, :keep_original_headers => false,
+      :remove_unmapped_keys => false, :keep_original_headers => false, :value_converters => nil,
     }
     options = default_options.merge(options)
     csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
@@ -40,13 +40,15 @@ module SmarterCSV
         file_headerA.map!{|x| x.gsub(%r/options[:quote_char]/,'') }
         file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
         unless options[:keep_original_headers]
-          file_headerA.map!{|x| x.gsub(/\s
+          file_headerA.map!{|x| x.gsub(/\s+|-+/,'_')}
           file_headerA.map!{|x| x.downcase } if options[:downcase_header]
         end
 
         # puts "HeaderA: #{file_headerA.join(' , ')}" if options[:verbose]
 
         file_header_size = file_headerA.size
+      else
+        raise SmarterCSV::IncorrectOption , "ERROR [smarter_csv]: If :headers_in_file is set to false, you have to provide :user_provided_headers" if ! options.keys.include?(:user_provided_headers)
       end
       if options[:user_provided_headers] && options[:user_provided_headers].class == Array && ! options[:user_provided_headers].empty?
         # use user-provided headers
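One observation on the new `else` branch: `default_options` already contains `:user_provided_headers => nil`, and `options` is the result of `default_options.merge(options)`. As far as this diff shows, `options.keys.include?(:user_provided_headers)` would therefore be true even when the caller passed nothing, so the `raise` may never fire. The sketch below illustrates the difference between checking for the key and checking its value. `IncorrectOption`, `DEFAULT_OPTIONS`, and `check_header_options` are stand-in names for illustration only, not the gem's API.

```ruby
# Hypothetical stand-ins for the gem's error class and default options.
class IncorrectOption < StandardError; end

DEFAULT_OPTIONS = { :headers_in_file => true, :user_provided_headers => nil }.freeze

# Merge caller options over the defaults, then validate the *value* of
# :user_provided_headers rather than the mere presence of the key.
def check_header_options(user_options = {})
  options = DEFAULT_OPTIONS.merge(user_options)
  headers = options[:user_provided_headers]
  if !options[:headers_in_file] && (headers.nil? || headers.empty?)
    raise IncorrectOption,
          'If :headers_in_file is set to false, you have to provide :user_provided_headers'
  end
  options
end

# After merging, the key is always present because the defaults supply it.
merged = DEFAULT_OPTIONS.merge({ :headers_in_file => false })
key_present = merged.keys.include?(:user_provided_headers)
```

Here `key_present` ends up true although no headers were given, which is why a value check is used in `check_header_options`.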
@@ -135,6 +137,15 @@ module SmarterCSV
           end
         end
       end
+
+      if options[:value_converters]
+        hash.each do |k,v|
+          converter = options[:value_converters][k]
+          next unless converter
+          hash[k] = converter.convert(v)
+        end
+      end
+
       next if hash.empty? if options[:remove_empty_hashes]
 
       if use_chunks
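The converter step added in this hunk can be exercised without the gem. The sketch below is an illustration only: `apply_value_converters` is a hypothetical helper name, and the two converter classes are modeled on the README example in this diff. It returns a new hash, whereas the code in the hunk updates `hash` in place.

```ruby
require 'date'

# Converter classes only need to respond to .convert(value).
class DateConverter
  def self.convert(value)
    Date.strptime(value, '%m/%d/%Y') # parse month/day/year into a Date
  end
end

class DollarConverter
  def self.convert(value)
    value.sub('$', '').to_f # drop the first dollar sign, then parse as Float
  end
end

# Hypothetical helper: apply a converter to every key that has one,
# and pass all other values through unchanged.
def apply_value_converters(row, converters)
  return row unless converters
  row.each_with_object({}) do |(key, value), out|
    converter = converters[key]
    out[key] = converter ? converter.convert(value) : value
  end
end

row = { :first => 'Ben', :date => '10/30/1998', :price => '$44.50' }
converted = apply_value_converters(row, { :date => DateConverter, :price => DollarConverter })
```

Keys without a converter, such as `:first` here, are left as they are, which matches the `next unless converter` guard in the hunk.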
@@ -212,11 +223,23 @@ module SmarterCSV
 
       # count how many of the pre-defined line-endings we find
       # ignoring those contained within quote characters
+      last_char = nil
       filehandle.each_char do |c|
         quoted_char = !quoted_char if c == options[:quote_char]
-        next if quoted_char
-
+        next if quoted_char
+
+        if last_char == "\r"
+          if c == "\n"
+            counts["\r\n"] += 1
+          else
+            counts["\r"] += 1 # \r are counted after they appeared, we might
+          end
+        elsif c == "\n"
+          counts["\n"] += 1
+        end
+        last_char = c
       end
+      counts["\r"] += 1 if last_char == "\r"
       # find the key/value pair with the largest counter:
       k,v = counts.max_by{|k,v| v}
       return k # the most frequent one is it
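The counting logic in this hunk can be tried in isolation. The following is a minimal sketch, assuming a hypothetical helper `guess_line_ending` that accepts any object responding to `each_char`. It mirrors the loop above but is not the gem's actual method.

```ruby
require 'stringio'

# Hypothetical helper: count "\n", "\r" and "\r\n" outside of quoted fields
# and return the most frequent one.
def guess_line_ending(io, quote_char = '"')
  counts = { "\n" => 0, "\r" => 0, "\r\n" => 0 }
  quoted = false
  last_char = nil
  io.each_char do |c|
    quoted = !quoted if c == quote_char
    next if quoted # ignore everything inside a quoted field

    if last_char == "\r"
      if c == "\n"
        counts["\r\n"] += 1 # a "\r" followed by "\n" is one Windows line ending
      else
        counts["\r"] += 1   # a lone "\r" is only counted once the next char is seen
      end
    elsif c == "\n"
      counts["\n"] += 1
    end
    last_char = c
  end
  counts["\r"] += 1 if last_char == "\r" # a trailing "\r" has no successor
  counts.max_by { |_, v| v }.first
end

windows = guess_line_ending(StringIO.new("a,b\r\n1,2\r\n"))
old_mac = guess_line_ending(StringIO.new("a\rb\rc\r"))
unix    = guess_line_ending(StringIO.new("a\nb\n"))
```

Deferring the `"\r"` count until the following character is known is what keeps a `"\r\n"` pair from being counted as both a `"\r"` and a `"\n"`.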
data/lib/smarter_csv/version.rb
CHANGED
data/spec/smarter_csv/header_transformation_spec.rb
ADDED
@@ -0,0 +1,21 @@
+require 'spec_helper'
+
+fixture_path = 'spec/fixtures'
+
+describe 'be_able_to' do
+  it 'loads_file_with_dashes_in_header_fields as strings' do
+    options = {:strings_as_keys => true}
+    data = SmarterCSV.process("#{fixture_path}/with_dashes.csv", options)
+    data.flatten.size.should == 5
+    data[0]['first_name'].should eq 'Dan'
+    data[0]['last_name'].should eq 'McAllister'
+  end
+
+  it 'loads_file_with_dashes_in_header_fields as symbols' do
+    options = {:strings_as_keys => false}
+    data = SmarterCSV.process("#{fixture_path}/with_dashes.csv", options)
+    data.flatten.size.should == 5
+    data[0][:first_name].should eq 'Dan'
+    data[0][:last_name].should eq 'McAllister'
+  end
+end
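The header rewrite that these specs exercise comes down to a single `gsub` with the pattern `/\s+|-+/`. Below is a minimal sketch, assuming a hypothetical helper `normalize_headers`. It combines the strip, replace, and downcase steps from the diff for illustration and is not the gem's own method.

```ruby
# Hypothetical helper: strip each header, replace runs of whitespace or
# runs of dashes with one underscore each, and optionally downcase.
def normalize_headers(headers, downcase: true)
  headers.map do |header|
    normalized = header.strip.gsub(/\s+|-+/, '_')
    downcase ? normalized.downcase : normalized
  end
end

normalize_headers(['First-Name', 'Last Name', ' e-mail  address '])
# => ["first_name", "last_name", "e_mail_address"]
```

Note that whitespace runs and dash runs are matched separately, so a header such as `'a - b'` becomes `'a___b'` (three underscores) rather than `'a_b'`.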
data/spec/smarter_csv/value_converters_spec.rb
ADDED
@@ -0,0 +1,52 @@
+require 'spec_helper'
+
+fixture_path = 'spec/fixtures'
+
+require 'date'
+class DateConverter
+  def self.convert(value)
+    Date.strptime( value, '%m/%d/%Y')
+  end
+end
+
+class CurrencyConverter
+  def self.convert(value)
+    value.sub(/[$]/,'').to_f # would be nice to add a computed column :currency => '€'
+  end
+end
+
+describe 'be_able_to' do
+  it 'convert date values into Date instances' do
+    options = {:value_converters => {:date => DateConverter}}
+    data = SmarterCSV.process("#{fixture_path}/with_dates.csv", options)
+    data.flatten.size.should == 3
+    data[0][:date].class.should eq Date
+    data[0][:date].to_s.should eq "1998-10-30"
+    data[1][:date].to_s.should eq "2011-02-01"
+    data[2][:date].to_s.should eq "2013-01-09"
+  end
+
+  it 'converts dollar prices into float values' do
+    options = {:value_converters => {:price => CurrencyConverter}}
+    data = SmarterCSV.process("#{fixture_path}/money.csv", options)
+    data.flatten.size.should == 2
+    data[0][:price].class.should eq Float
+    data[0][:price].should eq 9.99
+    data[1][:price].should eq 14.99
+  end
+
+  it 'convert can use multiple value converters' do
+    options = {:value_converters => {:date => DateConverter, :price => CurrencyConverter}}
+    data = SmarterCSV.process("#{fixture_path}/with_dates.csv", options)
+    data.flatten.size.should == 3
+    data[0][:date].class.should eq Date
+    data[0][:date].to_s.should eq "1998-10-30"
+    data[1][:date].to_s.should eq "2011-02-01"
+    data[2][:date].to_s.should eq "2013-01-09"
+
+    data[0][:price].class.should eq Float
+    data[0][:price].should eq 44.50
+    data[1][:price].should eq 15.0
+    data[2][:price].should eq 0.11
+  end
+end
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: smarter_csv
 version: !ruby/object:Gem::Version
-  version: 1.0
+  version: 1.1.0
 platform: ruby
 authors:
 - |
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2015-07-27 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
@@ -59,17 +59,21 @@ files:
 - spec/fixtures/line_endings_r.csv
 - spec/fixtures/line_endings_rn.csv
 - spec/fixtures/lots_of_columns.csv
+- spec/fixtures/money.csv
 - spec/fixtures/no_header.csv
 - spec/fixtures/numeric.csv
 - spec/fixtures/pets.csv
 - spec/fixtures/quoted.csv
 - spec/fixtures/separator.csv
+- spec/fixtures/with_dashes.csv
+- spec/fixtures/with_dates.csv
 - spec/smarter_csv/binary_file2_spec.rb
 - spec/smarter_csv/binary_file_spec.rb
 - spec/smarter_csv/carriage_return_spec.rb
 - spec/smarter_csv/chunked_reading_spec.rb
 - spec/smarter_csv/column_separator_spec.rb
 - spec/smarter_csv/convert_values_to_numeric_spec.rb
+- spec/smarter_csv/header_transformation_spec.rb
 - spec/smarter_csv/keep_headers_spec.rb
 - spec/smarter_csv/key_mapping_spec.rb
 - spec/smarter_csv/line_ending_spec.rb
@@ -84,6 +88,7 @@ files:
 - spec/smarter_csv/remove_zero_values_spec.rb
 - spec/smarter_csv/strings_as_keys_spec.rb
 - spec/smarter_csv/strip_chars_from_headers_spec.rb
+- spec/smarter_csv/value_converters_spec.rb
 - spec/spec.opts
 - spec/spec/spec_helper.rb
 - spec/spec_helper.rb
@@ -109,7 +114,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 requirements:
 - csv
 rubyforge_project:
-rubygems_version: 2.
+rubygems_version: 2.4.5
 signing_key:
 specification_version: 4
 summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots
@@ -127,17 +132,21 @@ test_files:
 - spec/fixtures/line_endings_r.csv
 - spec/fixtures/line_endings_rn.csv
 - spec/fixtures/lots_of_columns.csv
+- spec/fixtures/money.csv
 - spec/fixtures/no_header.csv
 - spec/fixtures/numeric.csv
 - spec/fixtures/pets.csv
 - spec/fixtures/quoted.csv
 - spec/fixtures/separator.csv
+- spec/fixtures/with_dashes.csv
+- spec/fixtures/with_dates.csv
 - spec/smarter_csv/binary_file2_spec.rb
 - spec/smarter_csv/binary_file_spec.rb
 - spec/smarter_csv/carriage_return_spec.rb
 - spec/smarter_csv/chunked_reading_spec.rb
 - spec/smarter_csv/column_separator_spec.rb
 - spec/smarter_csv/convert_values_to_numeric_spec.rb
+- spec/smarter_csv/header_transformation_spec.rb
 - spec/smarter_csv/keep_headers_spec.rb
 - spec/smarter_csv/key_mapping_spec.rb
 - spec/smarter_csv/line_ending_spec.rb
@@ -152,6 +161,7 @@ test_files:
 - spec/smarter_csv/remove_zero_values_spec.rb
 - spec/smarter_csv/strings_as_keys_spec.rb
 - spec/smarter_csv/strip_chars_from_headers_spec.rb
+- spec/smarter_csv/value_converters_spec.rb
 - spec/spec.opts
 - spec/spec/spec_helper.rb
 - spec/spec_helper.rb