smarter_csv 1.0.19 → 1.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 9022f349dd8ee2590c73b198fb83114a8c96932d
4
- data.tar.gz: 9c1a769c72e08e2e78d15ad444bf3f9642cd33e5
3
+ metadata.gz: ec1442dad9c0f71dc3264f6df10f5fe1116f9d23
4
+ data.tar.gz: 2025c6c7c81fc6c94fed0ed7d391eb122a464bd1
5
5
  SHA512:
6
- metadata.gz: e3ccf944663244bc4b336d9980c26f1fda874d48586a131f3c761b6885a2753ac443c80a559046e2c6670f90ba192155e10aceb0e84798add22c9a20d78653a1
7
- data.tar.gz: 69b3abf03488df9b79b796dd7efbc3612bd273fb1e4f6f156b238213ef377e22ff1852ae17fb7384722dfbba456d9ab36313e5d2c43d5a599a696d008194cd29
6
+ metadata.gz: 1dd00a098dba973b2f6e0303317bd11cdc527d05ca959dd21ae78f241a640e799b623f0b719da5d146c89856d0e50a557ffa8180725d29c6582e0e10231b091a
7
+ data.tar.gz: 7ab34fac0386ccef0ad13f1116b5ed0669d4ffe6270f8a4b9e39da8421132c4fac4f1c1764b78ca8ce5e702df29c9819ee1359bf0b18c98b735ea53a3935e4d8
@@ -6,6 +6,7 @@ rvm:
6
6
  - 1.9.3
7
7
  - 2.0.0
8
8
  - 2.1.3
9
+ - 2.2.2
9
10
  - jruby
10
11
  - ruby-head
11
12
  - jruby-head
data/README.md CHANGED
@@ -1,4 +1,6 @@
1
- # SmarterCSV [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.png?branch=master)](http://travis-ci.org/tilo/smarter_csv)
1
+ # SmarterCSV
2
+
3
+ [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.png?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
2
4
 
3
5
  `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
4
6
  and parallel processing with Resque or Sidekiq.
@@ -32,7 +34,8 @@ The two main choices you have in terms of how to call `SmarterCSV.process` are:
32
34
  * calling `process` with or without a block
33
35
  * passing a `:chunk_size` to the `process` method, and processing the CSV-file in chunks, rather than in one piece.
34
36
 
35
- Tip: If you are uncertain about what line endings a CSV-file uses, try specifying `:row_sep => :auto` as part of the options. Checkout Example 5 for unusual `:row_sep` and `:col_sep`.
37
+ Tip: If you are uncertain about what line endings a CSV-file uses, try specifying `:row_sep => :auto` as part of the options.
38
+ But this could be slow, because it will try to analyze each CSV file first. If you want to speed things up, set the `:row_sep` manually! Checkout Example 5 for unusual `:row_sep` and `:col_sep`.
36
39
 
37
40
  #### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
38
41
  Please note how each hash contains only the keys for columns with non-null values.
@@ -125,6 +128,40 @@ and how the `process` method returns the number of chunks when called with a blo
125
128
  end
126
129
  => returns number of chunks
127
130
 
131
+ #### Example 6: Using Value Converters
132
+
133
+ $ cat spec/fixtures/with_dates.csv
134
+ first,last,date,price
135
+ Ben,Miller,10/30/1998,$44.50
136
+ Tom,Turner,2/1/2011,$15.99
137
+ Ken,Smith,01/09/2013,$199.99
138
+ $ irb
139
+ > require 'smarter_csv'
140
+ > require 'date'
141
+
142
+ # define a custom converter class, which implements self.convert(value)
143
+ class DateConverter
144
+ def self.convert(value)
145
+ Date.strptime( value, '%m/%d/%Y') # parses custom date format into Date instance
146
+ end
147
+ end
148
+
149
+ class DollarConverter
150
+ def self.convert(value)
151
+ value.sub('$','').to_f
152
+ end
153
+ end
154
+
155
+ options = {:value_converters => {:date => DateConverter, :price => DollarConverter}}
156
+ data = SmarterCSV.process("spec/fixtures/with_dates.csv", options)
157
+ data[0][:date]
158
+ => #<Date: 1998-10-30 ((2451117j,0s,0n),+0s,2299161j)>
159
+ data[0][:date].class
160
+ => Date
161
+ data[0][:price]
162
+ => 44.50
163
+ data[0][:price].class
164
+ => Float
128
165
 
129
166
  ## Documentation
130
167
 
@@ -141,7 +178,7 @@ The options and the block are optional.
141
178
  ---------------------------------------------------------------------------------------------------------------------------------
142
179
  | :col_sep | ',' | column separator |
143
180
  | :row_sep | $/ ,"\n" | row separator or record separator , defaults to system's $/ , which defaults to "\n" |
144
- | | | this can also be set to :auto , but will process the whole cvs file first |
181
+ | | | This can also be set to :auto, but will process the whole cvs file first (slow!) |
145
182
  | :quote_char | '"' | quotation character |
146
183
  | :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
147
184
  | :chunk_size | nil | if set, determines the desired chunk-size (defaults to nil, no chunk processing) |
@@ -162,6 +199,7 @@ The options and the block are optional.
162
199
  | | | Important if the file does not contain headers, |
163
200
  | | | otherwise you would lose the first line of data. |
164
201
  ---------------------------------------------------------------------------------------------------------------------------------
202
+ | :value_converters | nil | supply a hash of :header => KlassName; the class needs to implement self.convert(val)|
165
203
  | :remove_empty_values | true | remove values which have nil or empty strings as values |
166
204
  | :remove_zero_values | true | remove values which have a numeric value equal to zero / 0 |
167
205
  | :remove_values_matching | nil | removes key/value pairs if value matches given regular expressions. e.g.: |
@@ -235,6 +273,12 @@ Or install it yourself as:
235
273
 
236
274
  ## Changes
237
275
 
276
+ #### 1.1.0 (2015-07-26)
277
+ * added feature :value_converters, which allows parsing of dates, money, and other things (thanks to Raphaël Bleuse, Lucas Camargo de Almeida, Alejandro)
278
+ * added error if :headers_in_file is set to false, and no :user_provided_headers are given (thanks to innhyu)
279
+ * added support to convert dashes to underscore characters in headers (thanks to César Camacho)
280
+ * fixing automatic detection of \r\n line-endings (thanks to feens)
281
+
238
282
  #### 1.0.19 (2014-10-29)
239
283
  * added option :keep_original_headers to keep CSV-headers as-is (thanks to Benjamin Thouret)
240
284
 
@@ -339,6 +383,12 @@ Please [open an Issue on GitHub](https://github.com/tilo/smarter_csv/issues) if
339
383
  Many thanks to people who have filed issues and sent comments.
340
384
  And a special thanks to those who contributed pull requests:
341
385
 
386
+ * [Alejandro](https://github.com/agaviria)
387
+ * [Lucas Camargo de Almeida](https://github.com/lcalmeida)
388
+ * [Raphaël Bleuse](https://github.com/bleuse)
389
+ * [feens](https://github.com/feens)
390
+ * [César Camacho](https://github.com/chanko)
391
+ * [innhyu](https://github.com/innhyu)
342
392
  * [Benjamin Thouret](https://github.com/benichu)
343
393
  * [Chris Hilton](https://github.com/chrismhilton)
344
394
  * [Sean Duckett](http://github.com/sduckett)
@@ -9,7 +9,7 @@ module SmarterCSV
9
9
  :remove_empty_values => true, :remove_zero_values => false , :remove_values_matching => nil , :remove_empty_hashes => true , :strip_whitespace => true,
10
10
  :convert_values_to_numeric => true, :strip_chars_from_headers => nil , :user_provided_headers => nil , :headers_in_file => true,
11
11
  :comment_regexp => /^#/, :chunk_size => nil , :key_mapping_hash => nil , :downcase_header => true, :strings_as_keys => false, :file_encoding => 'utf-8',
12
- :remove_unmapped_keys => false, :keep_original_headers => false,
12
+ :remove_unmapped_keys => false, :keep_original_headers => false, :value_converters => nil,
13
13
  }
14
14
  options = default_options.merge(options)
15
15
  csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
@@ -40,13 +40,15 @@ module SmarterCSV
40
40
  file_headerA.map!{|x| x.gsub(%r/options[:quote_char]/,'') }
41
41
  file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
42
42
  unless options[:keep_original_headers]
43
- file_headerA.map!{|x| x.gsub(/\s+/,'_')}
43
+ file_headerA.map!{|x| x.gsub(/\s+|-+/,'_')}
44
44
  file_headerA.map!{|x| x.downcase } if options[:downcase_header]
45
45
  end
46
46
 
47
47
  # puts "HeaderA: #{file_headerA.join(' , ')}" if options[:verbose]
48
48
 
49
49
  file_header_size = file_headerA.size
50
+ else
51
+ raise SmarterCSV::IncorrectOption , "ERROR [smarter_csv]: If :headers_in_file is set to false, you have to provide :user_provided_headers" if ! options.keys.include?(:user_provided_headers)
50
52
  end
51
53
  if options[:user_provided_headers] && options[:user_provided_headers].class == Array && ! options[:user_provided_headers].empty?
52
54
  # use user-provided headers
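
Reviewer note on the new `raise` in this hunk: judging only from the lines visible in this diff, `default_options` already contains `:user_provided_headers => nil` and is merged into `options` before the check runs, so `options.keys.include?(:user_provided_headers)` looks like it would always be true. The sketch below reproduces that in isolation. It assumes the `else` branch belongs to the `:headers_in_file` check, uses a reduced copy of the defaults, and `missing_headers?` is a hypothetical helper that is not part of smarter_csv.

```ruby
# Reduced copy of the defaults visible in the diff (only the two relevant keys).
default_options = { :headers_in_file => true, :user_provided_headers => nil }

# A caller who disables header parsing but forgets to pass headers.
user_options = { :headers_in_file => false }
options = default_options.merge(user_options)

# The key is present because the defaults supply it, even though the value is nil.
key_present   = options.keys.include?(:user_provided_headers)
value_present = !options[:user_provided_headers].nil?

# Hypothetical stricter check (an assumption, not the gem's code):
# look at the value instead of the key.
def missing_headers?(options)
  !options[:headers_in_file] && Array(options[:user_provided_headers]).empty?
end

puts key_present                 # the key-based check sees the key
puts value_present               # but no usable headers were given
puts missing_headers?(options)
```

If this reading is right, a value-based check such as the hypothetical one above would be needed for the error to fire.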
@@ -135,6 +137,15 @@ module SmarterCSV
135
137
  end
136
138
  end
137
139
  end
140
+
141
+ if options[:value_converters]
142
+ hash.each do |k,v|
143
+ converter = options[:value_converters][k]
144
+ next unless converter
145
+ hash[k] = converter.convert(v)
146
+ end
147
+ end
148
+
138
149
  next if hash.empty? if options[:remove_empty_hashes]
139
150
 
140
151
  if use_chunks
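
The converter loop added in this hunk can be tried outside the gem. The following is a minimal sketch, assuming the loop behaves as shown in the diff. `apply_value_converters` and the sample row are hypothetical and exist only for illustration; `DateConverter` and `DollarConverter` are copied from the README example in this diff.

```ruby
require 'date'

# Converter classes as in the README example: each implements self.convert(value).
class DateConverter
  def self.convert(value)
    Date.strptime(value, '%m/%d/%Y') # parses month/day/year strings into Date
  end
end

class DollarConverter
  def self.convert(value)
    value.sub('$', '').to_f # drops the first dollar sign, then converts to Float
  end
end

# Hypothetical stand-in for the per-row loop in the hunk above.
def apply_value_converters(hash, converters)
  return hash unless converters
  hash.each do |k, v|
    converter = converters[k]
    next unless converter # keys without a converter stay untouched
    hash[k] = converter.convert(v)
  end
  hash
end

row = { :first => 'Ben', :last => 'Miller', :date => '10/30/1998', :price => '$44.50' }
converted = apply_value_converters(row, { :date => DateConverter, :price => DollarConverter })

puts converted[:date].class  # Date
puts converted[:price].class # Float
```

Because converters are looked up by the row's hash key, the keys in `:value_converters` presumably have to match the final header form (symbols by default, strings with `:strings_as_keys`).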
@@ -212,11 +223,23 @@ module SmarterCSV
212
223
 
213
224
  # count how many of the pre-defined line-endings we find
214
225
  # ignoring those contained within quote characters
226
+ last_char = nil
215
227
  filehandle.each_char do |c|
216
228
  quoted_char = !quoted_char if c == options[:quote_char]
217
- next if quoted_char || c !~ /\r|\n|\r\n/
218
- counts[c] += 1
229
+ next if quoted_char
230
+
231
+ if last_char == "\r"
232
+ if c == "\n"
233
+ counts["\r\n"] += 1
234
+ else
235
+ counts["\r"] += 1 # \r are counted after they appeared, we might
236
+ end
237
+ elsif c == "\n"
238
+ counts["\n"] += 1
239
+ end
240
+ last_char = c
219
241
  end
242
+ counts["\r"] += 1 if last_char == "\r"
220
243
  # find the key/value pair with the largest counter:
221
244
  k,v = counts.max_by{|k,v| v}
222
245
  return k # the most frequent one is it
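
The counting logic in this hunk can be exercised on plain strings. The sketch below mirrors the new loop under the assumption that `counts` starts at zero for each of the three candidates; `guess_line_ending` is a hypothetical name and works on a String rather than a file handle.

```ruby
# Hypothetical helper mirroring the loop in the hunk above.
def guess_line_ending(text, quote_char = '"')
  counts = { "\n" => 0, "\r" => 0, "\r\n" => 0 }
  quoted = false
  last_char = nil

  text.each_char do |c|
    quoted = !quoted if c == quote_char
    next if quoted # ignore line endings inside quoted fields

    if last_char == "\r"
      if c == "\n"
        counts["\r\n"] += 1 # a CR followed by LF is one Windows-style ending
      else
        counts["\r"] += 1   # a lone CR is only counted once the next char is seen
      end
    elsif c == "\n"
      counts["\n"] += 1
    end
    last_char = c
  end
  counts["\r"] += 1 if last_char == "\r" # trailing CR at end of input

  counts.max_by { |_, v| v }.first # most frequent candidate wins
end

puts guess_line_ending("a,b\r\nc,d\r\n").inspect # "\r\n"
puts guess_line_ending("a,b\rc,d\r").inspect     # "\r"
puts guess_line_ending("a,b\nc,d\n").inspect     # "\n"
```

The old one-character regexp test could never see `"\r\n"` as a unit, since `each_char` yields one character at a time; tracking `last_char` is what makes the two-character ending countable.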
@@ -1,3 +1,3 @@
1
1
  module SmarterCSV
2
- VERSION = "1.0.19"
2
+ VERSION = "1.1.0"
3
3
  end
@@ -0,0 +1,3 @@
1
+ item,price
2
+ Book,$9.99
3
+ Mug,$14.99
@@ -0,0 +1,8 @@
1
+ First-Name,Last-Name,Dogs,Cats,Birds,Fish
2
+ Dan,McAllister,2,0,,
3
+ Lucy,Laweless,,5,0,
4
+ ,,,,,
5
+ Miles,O'Brian,0,0,0,21
6
+ Nancy,Homes,2,0,1,
7
+ Hernán,Curaçon,3,0,0,
8
+ ,,,,,
@@ -0,0 +1,4 @@
1
+ first,last,date,price
2
+ Ben,Miller,10/30/1998,$44.50
3
+ Tom,Turner,2/1/2011,$15
4
+ Ken,Smith,01/09/2013,$0.11
@@ -0,0 +1,21 @@
1
+ require 'spec_helper'
2
+
3
+ fixture_path = 'spec/fixtures'
4
+
5
+ describe 'be_able_to' do
6
+ it 'loads_file_with_dashes_in_header_fields as strings' do
7
+ options = {:strings_as_keys => true}
8
+ data = SmarterCSV.process("#{fixture_path}/with_dashes.csv", options)
9
+ data.flatten.size.should == 5
10
+ data[0]['first_name'].should eq 'Dan'
11
+ data[0]['last_name'].should eq 'McAllister'
12
+ end
13
+
14
+ it 'loads_file_with_dashes_in_header_fields as symbols' do
15
+ options = {:strings_as_keys => false}
16
+ data = SmarterCSV.process("#{fixture_path}/with_dashes.csv", options)
17
+ data.flatten.size.should == 5
18
+ data[0][:first_name].should eq 'Dan'
19
+ data[0][:last_name].should eq 'McAllister'
20
+ end
21
+ end
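
The header normalization these specs cover comes down to the changed `gsub` pattern. The following sketch applies the same regular expression to a few sample headers; the sample strings other than `First-Name` are made up for illustration, and stripping and downcasing are folded into one step here for brevity.

```ruby
# Same pattern as in the diff: runs of whitespace OR runs of dashes become one underscore.
headers = ['First-Name', 'Last Name', 'e--mail  address']
normalized = headers.map { |h| h.strip.gsub(/\s+|-+/, '_').downcase }
p normalized # ["first_name", "last_name", "e_mail_address"]

# Whitespace and dashes are separate alternatives, so a mixed run
# is replaced piece by piece rather than as a whole.
mixed = 'a - b'.gsub(/\s+|-+/, '_')
p mixed # "a___b"
```

A header such as `a - b` therefore ends up with three underscores; a character class like `/[\s-]+/` would collapse it to one, but that is a different pattern from the one in this release.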
@@ -0,0 +1,52 @@
1
+ require 'spec_helper'
2
+
3
+ fixture_path = 'spec/fixtures'
4
+
5
+ require 'date'
6
+ class DateConverter
7
+ def self.convert(value)
8
+ Date.strptime( value, '%m/%d/%Y')
9
+ end
10
+ end
11
+
12
+ class CurrencyConverter
13
+ def self.convert(value)
14
+ value.sub(/[$]/,'').to_f # would be nice to add a computed column :currency => '€'
15
+ end
16
+ end
17
+
18
+ describe 'be_able_to' do
19
+ it 'convert date values into Date instances' do
20
+ options = {:value_converters => {:date => DateConverter}}
21
+ data = SmarterCSV.process("#{fixture_path}/with_dates.csv", options)
22
+ data.flatten.size.should == 3
23
+ data[0][:date].class.should eq Date
24
+ data[0][:date].to_s.should eq "1998-10-30"
25
+ data[1][:date].to_s.should eq "2011-02-01"
26
+ data[2][:date].to_s.should eq "2013-01-09"
27
+ end
28
+
29
+ it 'converts dollar prices into float values' do
30
+ options = {:value_converters => {:price => CurrencyConverter}}
31
+ data = SmarterCSV.process("#{fixture_path}/money.csv", options)
32
+ data.flatten.size.should == 2
33
+ data[0][:price].class.should eq Float
34
+ data[0][:price].should eq 9.99
35
+ data[1][:price].should eq 14.99
36
+ end
37
+
38
+ it 'convert can use multiple value converters' do
39
+ options = {:value_converters => {:date => DateConverter, :price => CurrencyConverter}}
40
+ data = SmarterCSV.process("#{fixture_path}/with_dates.csv", options)
41
+ data.flatten.size.should == 3
42
+ data[0][:date].class.should eq Date
43
+ data[0][:date].to_s.should eq "1998-10-30"
44
+ data[1][:date].to_s.should eq "2011-02-01"
45
+ data[2][:date].to_s.should eq "2013-01-09"
46
+
47
+ data[0][:price].class.should eq Float
48
+ data[0][:price].should eq 44.50
49
+ data[1][:price].should eq 15.0
50
+ data[2][:price].should eq 0.11
51
+ end
52
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: smarter_csv
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.0.19
4
+ version: 1.1.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - |
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2014-10-29 00:00:00.000000000 Z
12
+ date: 2015-07-27 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: rspec
@@ -59,17 +59,21 @@ files:
59
59
  - spec/fixtures/line_endings_r.csv
60
60
  - spec/fixtures/line_endings_rn.csv
61
61
  - spec/fixtures/lots_of_columns.csv
62
+ - spec/fixtures/money.csv
62
63
  - spec/fixtures/no_header.csv
63
64
  - spec/fixtures/numeric.csv
64
65
  - spec/fixtures/pets.csv
65
66
  - spec/fixtures/quoted.csv
66
67
  - spec/fixtures/separator.csv
68
+ - spec/fixtures/with_dashes.csv
69
+ - spec/fixtures/with_dates.csv
67
70
  - spec/smarter_csv/binary_file2_spec.rb
68
71
  - spec/smarter_csv/binary_file_spec.rb
69
72
  - spec/smarter_csv/carriage_return_spec.rb
70
73
  - spec/smarter_csv/chunked_reading_spec.rb
71
74
  - spec/smarter_csv/column_separator_spec.rb
72
75
  - spec/smarter_csv/convert_values_to_numeric_spec.rb
76
+ - spec/smarter_csv/header_transformation_spec.rb
73
77
  - spec/smarter_csv/keep_headers_spec.rb
74
78
  - spec/smarter_csv/key_mapping_spec.rb
75
79
  - spec/smarter_csv/line_ending_spec.rb
@@ -84,6 +88,7 @@ files:
84
88
  - spec/smarter_csv/remove_zero_values_spec.rb
85
89
  - spec/smarter_csv/strings_as_keys_spec.rb
86
90
  - spec/smarter_csv/strip_chars_from_headers_spec.rb
91
+ - spec/smarter_csv/value_converters_spec.rb
87
92
  - spec/spec.opts
88
93
  - spec/spec/spec_helper.rb
89
94
  - spec/spec_helper.rb
@@ -109,7 +114,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
109
114
  requirements:
110
115
  - csv
111
116
  rubyforge_project:
112
- rubygems_version: 2.2.2
117
+ rubygems_version: 2.4.5
113
118
  signing_key:
114
119
  specification_version: 4
115
120
  summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots
@@ -127,17 +132,21 @@ test_files:
127
132
  - spec/fixtures/line_endings_r.csv
128
133
  - spec/fixtures/line_endings_rn.csv
129
134
  - spec/fixtures/lots_of_columns.csv
135
+ - spec/fixtures/money.csv
130
136
  - spec/fixtures/no_header.csv
131
137
  - spec/fixtures/numeric.csv
132
138
  - spec/fixtures/pets.csv
133
139
  - spec/fixtures/quoted.csv
134
140
  - spec/fixtures/separator.csv
141
+ - spec/fixtures/with_dashes.csv
142
+ - spec/fixtures/with_dates.csv
135
143
  - spec/smarter_csv/binary_file2_spec.rb
136
144
  - spec/smarter_csv/binary_file_spec.rb
137
145
  - spec/smarter_csv/carriage_return_spec.rb
138
146
  - spec/smarter_csv/chunked_reading_spec.rb
139
147
  - spec/smarter_csv/column_separator_spec.rb
140
148
  - spec/smarter_csv/convert_values_to_numeric_spec.rb
149
+ - spec/smarter_csv/header_transformation_spec.rb
141
150
  - spec/smarter_csv/keep_headers_spec.rb
142
151
  - spec/smarter_csv/key_mapping_spec.rb
143
152
  - spec/smarter_csv/line_ending_spec.rb
@@ -152,6 +161,7 @@ test_files:
152
161
  - spec/smarter_csv/remove_zero_values_spec.rb
153
162
  - spec/smarter_csv/strings_as_keys_spec.rb
154
163
  - spec/smarter_csv/strip_chars_from_headers_spec.rb
164
+ - spec/smarter_csv/value_converters_spec.rb
155
165
  - spec/spec.opts
156
166
  - spec/spec/spec_helper.rb
157
167
  - spec/spec_helper.rb