smarter_csv 1.0.15 → 1.0.16

@@ -7,8 +7,6 @@ rvm:
  - 2.0.0
  - jruby-18mode
  - jruby-19mode
- - rbx-18mode
- - rbx-19mode
  - ruby-head
  - jruby-head
  - ree
@@ -18,6 +16,8 @@ jdk:
  env: JRUBY_OPTS="--server -Xcompile.invokedynamic=false -J-XX:+TieredCompilation -J-XX:TieredStopAtLevel=1 -J-noverify -J-Xms512m -J-Xmx1024m"
  matrix:
  allow_failures:
+ - rbx-18mode
+ - rbx-19mode
  - rvm: jruby-head
  - rvm: ruby-head
  - rvm: ree
data/README.md CHANGED
@@ -3,6 +3,11 @@
  `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
  and parallel processing with Resque or Sidekiq.
  
+ One `smarter_csv` user wrote:
+
+ *Best gem for CSV for us yet. [...] taking an import process from 7+ hours to about 3 minutes.
+ [...] Smarter CSV was a big part and helped clean up our code ALOT*
+
  `smarter_csv` has lots of features:
  * able to process large CSV-files
  * able to chunk the input from the CSV file to avoid loading the whole CSV file into memory
@@ -143,12 +148,13 @@ The options and the block are optional.
  | :remove_values_matching | nil | removes key/value pairs if value matches given regular expressions. e.g.: |
  | | | /^\$0\.0+$/ to match $0.00 , or /^#VALUE!$/ to match errors in Excel spreadsheets |
  | :convert_values_to_numeric | true | converts strings containing Integers or Floats to the appropriate class |
+ | | | also accepts either {:except => [:key1,:key2]} or {:only => :key3} |
  | :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs |
  | :user_provided_headers | nil | *careful with that axe!* |
  | | | user provided Array of header strings or symbols, to define |
  | | | what headers should be used, overriding any in-file headers. |
  | | | You can not combine the :user_provided_headers and :key_mapping options |
- | :strip_chars_from_headers | nil | remove extraneous characters from the header line (e.g. if the headers are quoted) |
+ | :strip_chars_from_headers | nil | RegExp to remove extraneous characters from the header line (e.g. if headers are quoted) |
  | :headers_in_file | true | Whether or not the file contains headers as the first line. |
  | | | Important if the file does not contain headers, |
  | | | otherwise you would lose the first line of data. |
@@ -178,6 +184,10 @@ The options and the block are optional.
  * if the chunk_size is > 0 , then the array may contain up to chunk_size Hashes.
  * this can be very useful when passing chunked data to a post-processing step, e.g. through Resque
  
+ #### NOTES on improper quotation and unwanted characters in headers:
+ * some CSV files use un-escaped quotation characters inside fields. This can cause the import to break. To get around this, use the `:force_simple_split => true` option in combination with `:strip_chars_from_headers => /[\-"]/` . This will also significantly speed up the import.
+ If you would force a different :quote_char instead (setting it to a non-used character), then the import would be up to 5-times slower than using `:force_simple_split`.
+
  #### Known Issues:
  * if you are using 1.8.7 versions of Ruby, JRuby, or Ruby Enterprise Edition, `smarter_csv` will have problems with double-quoted fields, because of a bug in an underlying library.
  * if your CSV data contains the :row_sep character, e.g. CR, smarter_csv will not be able to handle the data, but will report `CSV::MalformedCSVError: Unclosed quoted field`.
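The note on improper quotation added in this hunk can be illustrated without the gem. This is a minimal standalone sketch, not `smarter_csv` code: it uses plain `String#split` and `String#gsub`, the sample header and data lines are made up for illustration, and only the regular expression is taken from the note. It shows that a plain split never interprets quote characters, so a stray quote stays in the field as ordinary data, and that the regex removes dashes and quotes from header names.

```ruby
# Made-up sample input: a quoted header line and a data row that contains
# an un-escaped quotation character inside a field.
header_line = %q{"first-name","last-name"}
data_line   = %q{Miles,O"Brian}

# A plain split on the column separator does not interpret quotes at all,
# so the stray quote is simply kept as part of the field.
fields = data_line.split(',')

# Strip unwanted characters from each header with the regex from the note.
headers = header_line.split(',').map { |h| h.gsub(/[\-"]/, '') }

p fields
p headers
```

The trade-off of a plain split is that a quoted field containing the separator itself would be cut in two, so this approach only suits files where that cannot happen.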
@@ -218,7 +228,11 @@ Or install it yourself as:
  
  ## Changes
  
- #### 1.0.15 (2013-11-01)
+ #### 1.0.16 (2014-01-13)
+ * :convert_values_to_numeric option can now be qualified with :except or :only (thanks to Hugo Lepetit)
+ * removed deprecated `process_csv` method
+
+ #### 1.0.15 (2013-12-07)
  * new option:
  * :remove_unmapped_keys to completely ignore columns which were not mapped with :key_mapping (thanks to Dave Sanders)
  
@@ -319,6 +333,7 @@ And a special thanks to those who contributed pull requests:
  * [Marcos G. Zimmermann](https://github.com/marcosgz)
  * [Jordan Running](https://github.com/jrunning)
  * [Dave Sanders](https://github.com/DaveSanders)
+ * [Hugo Lepetit](https://github.com/giglemad)
  
  
  ## Contributing
@@ -1,7 +1,8 @@
  module SmarterCSV
  
- class HeaderSizeMismatch < Exception
- end
+ class HeaderSizeMismatch < Exception; end
+
+ class IncorrectOption < Exception; end
  
  def SmarterCSV.process(input, options={}, &block) # first parameter: filename or input object with readline method
  default_options = {:col_sep => ',' , :row_sep => $/ , :quote_char => '"', :force_simple_split => false , :verbose => false ,
@@ -103,6 +104,10 @@ module SmarterCSV
  hash.delete_if{|k,v| v =~ options[:remove_values_matching]} if options[:remove_values_matching]
  if options[:convert_values_to_numeric]
  hash.each do |k,v|
+ # deal with the :only / :except options to :convert_values_to_numeric
+ next if SmarterCSV.only_or_except_limit_execution( options, :convert_values_to_numeric , k )
+
+ # convert if it's a numeric value:
  case v
  when /^[+-]?\d+\.\d+$/
  hash[k] = v.to_f
@@ -128,11 +133,9 @@ module SmarterCSV
  else
  
  # the last chunk may contain partial data, which also needs to be returned (BUG / ISSUE-18)
-
  
  end
  
-
  # while a chunk is being filled up we don't need to do anything else here
  
  else # no chunk handling
@@ -164,9 +167,23 @@ module SmarterCSV
  end
  end
  
- def SmarterCSV.process_csv(*args)
- warn "[DEPRECATION] `process_csv` is deprecated. Please use `process` instead."
- SmarterCSV.process(*args)
+ # def SmarterCSV.process_csv(*args)
+ # warn "[DEPRECATION] `process_csv` is deprecated. Please use `process` instead."
+ # SmarterCSV.process(*args)
+ # end
+
+ private
+ # acts as a road-block to limit processing when iterating over all k/v pairs of a CSV-hash:
+
+ def self.only_or_except_limit_execution( options, option_name, key )
+ if options[option_name].is_a?(Hash)
+ if options[option_name].has_key?( :except )
+ return true if Array( options[ option_name ][:except] ).include?(key)
+ elsif options[ option_name ].has_key?(:only)
+ return true unless Array( options[ option_name ][:only] ).include?(key)
+ end
+ end
+ return false
  end
  end
  
@@ -1,3 +1,3 @@
  module SmarterCSV
- VERSION = "1.0.15"
+ VERSION = "1.0.16"
  end
@@ -0,0 +1,5 @@
+ First Name,Last Name,Reference, Wealth
+ Dan,McAllister,0123,3.5
+ ,,,,,
+ Miles,O'Brian,2345,3
+ Nancy,Homes,2345,01
@@ -0,0 +1,48 @@
+ require 'spec_helper'
+
+ fixture_path = 'spec/fixtures'
+
+ describe 'numeric conversion of values' do
+   it 'occurs by default' do
+     options = {}
+     data = SmarterCSV.process("#{fixture_path}/numeric.csv", options)
+     data.size.should == 3
+
+     # all the keys should be symbols
+     data.each do |hash|
+       hash[:wealth].should be_a_kind_of(Numeric) unless hash[:wealth].nil?
+       hash[:reference].should be_a_kind_of(Numeric) unless hash[:reference].nil?
+     end
+   end
+
+   it 'can be prevented for all values' do
+     options = { convert_values_to_numeric: false }
+     data = SmarterCSV.process("#{fixture_path}/numeric.csv", options)
+
+     data.each do |hash|
+       hash[:wealth].should be_a_kind_of(String) unless hash[:wealth].nil?
+       hash[:reference].should be_a_kind_of(String) unless hash[:reference].nil?
+     end
+   end
+
+   it 'can be prevented for some keys' do
+     options = { convert_values_to_numeric: { except: :reference }}
+     data = SmarterCSV.process("#{fixture_path}/numeric.csv", options)
+
+     data.each do |hash|
+       hash[:wealth].should be_a_kind_of(Numeric) unless hash[:wealth].nil?
+       hash[:reference].should be_a_kind_of(String) unless hash[:reference].nil?
+     end
+   end
+
+   it 'can occur only for some keys' do
+     options = { convert_values_to_numeric: { only: :wealth }}
+     data = SmarterCSV.process("#{fixture_path}/numeric.csv", options)
+
+     data.each do |hash|
+       hash[:wealth].should be_a_kind_of(Numeric) unless hash[:wealth].nil?
+       hash[:reference].should be_a_kind_of(String) unless hash[:reference].nil?
+     end
+   end
+ end
+
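The behaviour these specs exercise can be reproduced without the gem. The following is a minimal standalone sketch, not the gem's code: `skip_conversion?` and `convert_numeric` are hypothetical names, the float pattern is the one visible in the diff above, and the integer pattern `/^[+-]?\d+$/` is an assumption about the part of the `case` statement that the hunk does not show.

```ruby
# Hypothetical stand-in for the helper added in this release: returns true
# when numeric conversion should be skipped for +key+.
def skip_conversion?(option, key)
  return false unless option.is_a?(Hash)

  if option.key?(:except)
    Array(option[:except]).include?(key)
  elsif option.key?(:only)
    !Array(option[:only]).include?(key)
  else
    false
  end
end

# Converts numeric-looking string values in +hash+, honouring :except / :only.
# +option+ mirrors :convert_values_to_numeric: true, false, or a Hash.
def convert_numeric(hash, option = true)
  return hash unless option

  hash.each do |k, v|
    next if skip_conversion?(option, k)

    case v
    when /^[+-]?\d+\.\d+$/ then hash[k] = v.to_f # pattern shown in the diff
    when /^[+-]?\d+$/      then hash[k] = v.to_i # assumed integer pattern
    end
  end
  hash
end

row = { :reference => '0123', :wealth => '3.5' }
p convert_numeric(row.dup)                        # converts both columns
p convert_numeric(row.dup, :except => :reference) # leaves :reference alone
p convert_numeric(row.dup, :only => [:wealth])    # converts :wealth only
```

With the default setting, the reference `'0123'` becomes the Integer `123` and loses its leading zero, which is the kind of column `:except` and `:only` are useful for protecting.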
metadata CHANGED
@@ -1,41 +1,45 @@
- --- !ruby/object:Gem::Specification
+ --- !ruby/object:Gem::Specification
  name: smarter_csv
- version: !ruby/object:Gem::Version
- version: 1.0.15
+ version: !ruby/object:Gem::Version
+ version: 1.0.16
+ prerelease:
  platform: ruby
- authors:
- - |
- Tilo Sloboda
+ authors:
+ - ! 'Tilo Sloboda
  
+ '
  autorequire:
  bindir: bin
  cert_chain: []
-
- date: 2013-12-07 00:00:00 Z
- dependencies:
- - !ruby/object:Gem::Dependency
+ date: 2014-01-13 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
  name: rspec
- prerelease: false
- requirement: &id001 !ruby/object:Gem::Requirement
- requirements:
- - &id002
- - ">="
- - !ruby/object:Gem::Version
- version: "0"
+ requirement: !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - ! '>='
+ - !ruby/object:Gem::Version
+ version: '0'
  type: :development
- version_requirements: *id001
- description: Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with optional features for processing large files in parallel, embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers to Hash-keys
- email:
- - |
- tilo.sloboda@gmail.com
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - ! '>='
+ - !ruby/object:Gem::Version
+ version: '0'
+ description: Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with
+ optional features for processing large files in parallel, embedded comments, unusual
+ field- and record-separators, flexible mapping of CSV-headers to Hash-keys
+ email:
+ - ! 'tilo.sloboda@gmail.com
  
+ '
  executables: []
-
  extensions: []
-
  extra_rdoc_files: []
-
- files:
+ files:
  - .gitignore
  - .rspec
  - .rvmrc
@@ -53,6 +57,7 @@ files:
  - spec/fixtures/chunk_cornercase.csv
  - spec/fixtures/lots_of_columns.csv
  - spec/fixtures/no_header.csv
+ - spec/fixtures/numeric.csv
  - spec/fixtures/pets.csv
  - spec/fixtures/quoted.csv
  - spec/fixtures/separator.csv
@@ -60,6 +65,7 @@ files:
  - spec/smarter_csv/binary_file_spec.rb
  - spec/smarter_csv/chunked_reading_spec.rb
  - spec/smarter_csv/column_separator_spec.rb
+ - spec/smarter_csv/convert_values_to_numeric_spec.rb
  - spec/smarter_csv/key_mapping_spec.rb
  - spec/smarter_csv/load_basic_spec.rb
  - spec/smarter_csv/no_header_spec.rb
@@ -75,35 +81,40 @@ files:
  - spec/spec/spec_helper.rb
  - spec/spec_helper.rb
  homepage: https://github.com/tilo/smarter_csv
- licenses:
+ licenses:
  - MIT
  - GPL-2
- metadata: {}
-
  post_install_message:
  rdoc_options: []
-
- require_paths:
+ require_paths:
  - lib
- required_ruby_version: !ruby/object:Gem::Requirement
- requirements:
- - *id002
- required_rubygems_version: !ruby/object:Gem::Requirement
- requirements:
- - *id002
- requirements:
+ required_ruby_version: !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - ! '>='
+ - !ruby/object:Gem::Version
+ version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - ! '>='
+ - !ruby/object:Gem::Version
+ version: '0'
+ requirements:
  - csv
  rubyforge_project:
- rubygems_version: 2.0.3
+ rubygems_version: 1.8.23
  signing_key:
- specification_version: 4
- summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots of optional features, e.g. chunked processing for huge CSV files
- test_files:
+ specification_version: 3
+ summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots
+ of optional features, e.g. chunked processing for huge CSV files
+ test_files:
  - spec/fixtures/basic.csv
  - spec/fixtures/binary.csv
  - spec/fixtures/chunk_cornercase.csv
  - spec/fixtures/lots_of_columns.csv
  - spec/fixtures/no_header.csv
+ - spec/fixtures/numeric.csv
  - spec/fixtures/pets.csv
  - spec/fixtures/quoted.csv
  - spec/fixtures/separator.csv
@@ -111,6 +122,7 @@ test_files:
  - spec/smarter_csv/binary_file_spec.rb
  - spec/smarter_csv/chunked_reading_spec.rb
  - spec/smarter_csv/column_separator_spec.rb
+ - spec/smarter_csv/convert_values_to_numeric_spec.rb
  - spec/smarter_csv/key_mapping_spec.rb
  - spec/smarter_csv/load_basic_spec.rb
  - spec/smarter_csv/no_header_spec.rb
checksums.yaml DELETED
@@ -1,7 +0,0 @@
- ---
- SHA1:
- metadata.gz: 7fe73934009f54b4447577c4190d8051e315544f
- data.tar.gz: b32b1635d2cb25f5feac9e06522058c319d6291d
- SHA512:
- metadata.gz: 726b149bde30de57bbabd90078b92f20a4b59d6a41dfb076b0bdb8fae6d47d7701fbdd427f154e1c523fcce3e93ae0af71c4f9426ff48edca10241ab1b50ce5b
- data.tar.gz: 4cc80396cb2a41fae08c20969b3c4ff05b13900289f81a32ef36cc830f363d710428d050b666bc0843b0641ea63486b0157ddf2c728b59a2a47e3eb9974e265d