smarter_csv 1.0.15 → 1.0.16

@@ -7,8 +7,6 @@ rvm:
  - 2.0.0
  - jruby-18mode
  - jruby-19mode
- - rbx-18mode
- - rbx-19mode
  - ruby-head
  - jruby-head
  - ree
@@ -18,6 +16,8 @@ jdk:
  env: JRUBY_OPTS="--server -Xcompile.invokedynamic=false -J-XX:+TieredCompilation -J-XX:TieredStopAtLevel=1 -J-noverify -J-Xms512m -J-Xmx1024m"
  matrix:
  allow_failures:
+ - rbx-18mode
+ - rbx-19mode
  - rvm: jruby-head
  - rvm: ruby-head
  - rvm: ree
data/README.md CHANGED
@@ -3,6 +3,11 @@
  `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
  and parallel processing with Resque or Sidekiq.
 
+ One `smarter_csv` user wrote:
+
+ *Best gem for CSV for us yet. [...] taking an import process from 7+ hours to about 3 minutes.
+ [...] Smarter CSV was a big part and helped clean up our code ALOT*
+
  `smarter_csv` has lots of features:
  * able to process large CSV-files
  * able to chunk the input from the CSV file to avoid loading the whole CSV file into memory
@@ -143,12 +148,13 @@ The options and the block are optional.
  | :remove_values_matching | nil | removes key/value pairs if value matches given regular expressions. e.g.: |
  | | | /^\$0\.0+$/ to match $0.00 , or /^#VALUE!$/ to match errors in Excel spreadsheets |
  | :convert_values_to_numeric | true | converts strings containing Integers or Floats to the appropriate class |
+ | | | also accepts either {:except => [:key1,:key2]} or {:only => :key3} |
  | :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs |
  | :user_provided_headers | nil | *careful with that axe!* |
  | | | user provided Array of header strings or symbols, to define |
  | | | what headers should be used, overriding any in-file headers. |
  | | | You can not combine the :user_provided_headers and :key_mapping options |
- | :strip_chars_from_headers | nil | remove extraneous characters from the header line (e.g. if the headers are quoted) |
+ | :strip_chars_from_headers | nil | RegExp to remove extraneous characters from the header line (e.g. if headers are quoted) |
  | :headers_in_file | true | Whether or not the file contains headers as the first line. |
  | | | Important if the file does not contain headers, |
  | | | otherwise you would lose the first line of data. |
@@ -178,6 +184,10 @@ The options and the block are optional.
  * if the chunk_size is > 0 , then the array may contain up to chunk_size Hashes.
  * this can be very useful when passing chunked data to a post-processing step, e.g. through Resque
 
+ #### NOTES on improper quotation and unwanted characters in headers:
+ * some CSV files use un-escaped quotation characters inside fields. This can cause the import to break. To get around this, use the `:force_simple_split => true` option in combination with `:strip_chars_from_headers => /[\-"]/` . This will also significantly speed up the import.
+ If you would force a different :quote_char instead (setting it to a non-used character), then the import would be up to 5-times slower than using `:force_simple_split`.
+
  #### Known Issues:
  * if you are using 1.8.7 versions of Ruby, JRuby, or Ruby Enterprise Edition, `smarter_csv` will have problems with double-quoted fields, because of a bug in an underlying library.
  * if your CSV data contains the :row_sep character, e.g. CR, smarter_csv will not be able to handle the data, but will report `CSV::MalformedCSVError: Unclosed quoted field`.
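
The NOTES hunk above describes splitting lines without quote handling and stripping unwanted characters from the header line. The following is a minimal standalone sketch of that idea, not the gem's parsing code: `simple_split` and `clean_headers` are hypothetical helper names, the sample lines are made up, and the header normalization (strip, downcase, underscores, symbols) is an assumption made for illustration.

```ruby
# Hypothetical helper: split a line on the column separator only,
# ignoring quote characters entirely (the idea behind a "simple split").
def simple_split(line, col_sep = ',')
  line.chomp.split(col_sep, -1) # -1 keeps trailing empty fields
end

# Hypothetical helper: remove unwanted characters from the header line
# with a regular expression, then turn the headers into symbols.
def clean_headers(header_line, strip_chars = /[\-"]/, col_sep = ',')
  simple_split(header_line.gsub(strip_chars, ''), col_sep).map do |h|
    h.strip.downcase.gsub(/\s+/, '_').to_sym
  end
end

# Made-up sample input: quoted headers, and a data field that contains
# un-escaped quotation characters.
header_line = %("First-Name","Last Name","Nick"\n)
data_line   = %(Dan,McAllister,Dan "the man"\n)

headers = clean_headers(header_line)
row     = headers.zip(simple_split(data_line)).to_h

p headers
p row
```

Because the data line is never run through a quote-aware parser, the stray quotation marks in the last field are kept as plain characters instead of raising an error.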
@@ -218,7 +228,11 @@ Or install it yourself as:
 
  ## Changes
 
- #### 1.0.15 (2013-11-01)
+ #### 1.0.16 (2014-01-13)
+ * :convert_values_to_numeric option can now be qualified with :except or :only (thanks to Hugo Lepetit)
+ * removed deprecated `process_csv` method
+
+ #### 1.0.15 (2013-12-07)
  * new option:
  * :remove_unmapped_keys to completely ignore columns which were not mapped with :key_mapping (thanks to Dave Sanders)
 
@@ -319,6 +333,7 @@ And a special thanks to those who contributed pull requests:
  * [Marcos G. Zimmermann](https://github.com/marcosgz)
  * [Jordan Running](https://github.com/jrunning)
  * [Dave Sanders](https://github.com/DaveSanders)
+ * [Hugo Lepetit](https://github.com/giglemad)
 
 
  ## Contributing
@@ -1,7 +1,8 @@
  module SmarterCSV
 
-   class HeaderSizeMismatch < Exception
-   end
+   class HeaderSizeMismatch < Exception; end
+
+   class IncorrectOption < Exception; end
 
    def SmarterCSV.process(input, options={}, &block) # first parameter: filename or input object with readline method
      default_options = {:col_sep => ',' , :row_sep => $/ , :quote_char => '"', :force_simple_split => false , :verbose => false ,
@@ -103,6 +104,10 @@ module SmarterCSV
  hash.delete_if{|k,v| v =~ options[:remove_values_matching]} if options[:remove_values_matching]
  if options[:convert_values_to_numeric]
  hash.each do |k,v|
+ # deal with the :only / :except options to :convert_values_to_numeric
+ next if SmarterCSV.only_or_except_limit_execution( options, :convert_values_to_numeric , k )
+
+ # convert if it's a numeric value:
  case v
  when /^[+-]?\d+\.\d+$/
  hash[k] = v.to_f
@@ -128,11 +133,9 @@ module SmarterCSV
  else
 
  # the last chunk may contain partial data, which also needs to be returned (BUG / ISSUE-18)
-
 
  end
 
-
  # while a chunk is being filled up we don't need to do anything else here
 
  else # no chunk handling
@@ -164,9 +167,23 @@ module SmarterCSV
      end
    end
 
-   def SmarterCSV.process_csv(*args)
-     warn "[DEPRECATION] `process_csv` is deprecated. Please use `process` instead."
-     SmarterCSV.process(*args)
+   # def SmarterCSV.process_csv(*args)
+   # warn "[DEPRECATION] `process_csv` is deprecated. Please use `process` instead."
+   # SmarterCSV.process(*args)
+   # end
+
+   private
+   # acts as a road-block to limit processing when iterating over all k/v pairs of a CSV-hash:
+
+   def self.only_or_except_limit_execution( options, option_name, key )
+     if options[option_name].is_a?(Hash)
+       if options[option_name].has_key?( :except )
+         return true if Array( options[ option_name ][:except] ).include?(key)
+       elsif options[ option_name ].has_key?(:only)
+         return true unless Array( options[ option_name ][:only] ).include?(key)
+       end
+     end
+     return false
    end
  end
 
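
The hunk above adds a helper that decides, per key, whether numeric conversion is skipped. The following standalone sketch mirrors that logic outside the gem, assuming only what the diff shows: `skip_conversion?` and `convert_numeric` are hypothetical names, the sample row is made up, and the integer pattern is an assumption (the hunk only shows the float branch of the `case`).

```ruby
# Hypothetical stand-in for the helper in the hunk above:
# returns true when conversion should be skipped for +key+.
def skip_conversion?(option, key)
  return false unless option.is_a?(Hash)   # plain `true` converts every key

  if option.key?(:except)
    Array(option[:except]).include?(key)   # skip the listed keys
  elsif option.key?(:only)
    !Array(option[:only]).include?(key)    # skip everything not listed
  else
    false
  end
end

# Convert numeric-looking strings in place, honoring :only / :except.
def convert_numeric(hash, option = true)
  return hash unless option

  hash.each do |k, v|
    next if skip_conversion?(option, k)

    case v
    when /^[+-]?\d+\.\d+$/ then hash[k] = v.to_f
    when /^[+-]?\d+$/      then hash[k] = v.to_i # assumed integer branch
    end
  end
end

# Made-up sample row, loosely modeled on the numeric.csv fixture.
row = { :first_name => 'Dan', :reference => '0123', :wealth => '3.5' }

converted_all    = convert_numeric(row.dup)
converted_except = convert_numeric(row.dup, { :except => :reference })
converted_only   = convert_numeric(row.dup, { :only => [:wealth] })

p converted_all
p converted_except
p converted_only
```

Wrapping the option value in `Array(...)` is what lets callers pass either a single key or a list of keys, which matches the two forms shown in the README table.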
@@ -1,3 +1,3 @@
  module SmarterCSV
- VERSION = "1.0.15"
+ VERSION = "1.0.16"
  end
@@ -0,0 +1,5 @@
+ First Name,Last Name,Reference, Wealth
+ Dan,McAllister,0123,3.5
+ ,,,,,
+ Miles,O'Brian,2345,3
+ Nancy,Homes,2345,01
@@ -0,0 +1,48 @@
+ require 'spec_helper'
+
+ fixture_path = 'spec/fixtures'
+
+ describe 'numeric conversion of values' do
+   it 'occurs by default' do
+     options = {}
+     data = SmarterCSV.process("#{fixture_path}/numeric.csv", options)
+     data.size.should == 3
+
+     # all the keys should be symbols
+     data.each do |hash|
+       hash[:wealth].should be_a_kind_of(Numeric) unless hash[:wealth].nil?
+       hash[:reference].should be_a_kind_of(Numeric) unless hash[:reference].nil?
+     end
+   end
+
+   it 'can be prevented for all values' do
+     options = { convert_values_to_numeric: false }
+     data = SmarterCSV.process("#{fixture_path}/numeric.csv", options)
+
+     data.each do |hash|
+       hash[:wealth].should be_a_kind_of(String) unless hash[:wealth].nil?
+       hash[:reference].should be_a_kind_of(String) unless hash[:reference].nil?
+     end
+   end
+
+   it 'can be prevented for some keys' do
+     options = { convert_values_to_numeric: { except: :reference }}
+     data = SmarterCSV.process("#{fixture_path}/numeric.csv", options)
+
+     data.each do |hash|
+       hash[:wealth].should be_a_kind_of(Numeric) unless hash[:wealth].nil?
+       hash[:reference].should be_a_kind_of(String) unless hash[:reference].nil?
+     end
+   end
+
+   it 'can occur only for some keys' do
+     options = { convert_values_to_numeric: { only: :wealth }}
+     data = SmarterCSV.process("#{fixture_path}/numeric.csv", options)
+
+     data.each do |hash|
+       hash[:wealth].should be_a_kind_of(Numeric) unless hash[:wealth].nil?
+       hash[:reference].should be_a_kind_of(String) unless hash[:reference].nil?
+     end
+   end
+ end
+
metadata CHANGED
@@ -1,41 +1,45 @@
- --- !ruby/object:Gem::Specification
+ --- !ruby/object:Gem::Specification
  name: smarter_csv
- version: !ruby/object:Gem::Version
- version: 1.0.15
+ version: !ruby/object:Gem::Version
+ version: 1.0.16
+ prerelease:
  platform: ruby
- authors:
- - |
- Tilo Sloboda
+ authors:
+ - ! 'Tilo Sloboda
 
+ '
  autorequire:
  bindir: bin
  cert_chain: []
-
- date: 2013-12-07 00:00:00 Z
- dependencies:
- - !ruby/object:Gem::Dependency
+ date: 2014-01-13 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
  name: rspec
- prerelease: false
- requirement: &id001 !ruby/object:Gem::Requirement
- requirements:
- - &id002
- - ">="
- - !ruby/object:Gem::Version
- version: "0"
+ requirement: !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - ! '>='
+ - !ruby/object:Gem::Version
+ version: '0'
  type: :development
- version_requirements: *id001
- description: Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with optional features for processing large files in parallel, embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers to Hash-keys
- email:
- - |
- tilo.sloboda@gmail.com
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - ! '>='
+ - !ruby/object:Gem::Version
+ version: '0'
+ description: Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with
+ optional features for processing large files in parallel, embedded comments, unusual
+ field- and record-separators, flexible mapping of CSV-headers to Hash-keys
+ email:
+ - ! 'tilo.sloboda@gmail.com
 
+ '
  executables: []
-
  extensions: []
-
  extra_rdoc_files: []
-
- files:
+ files:
  - .gitignore
  - .rspec
  - .rvmrc
@@ -53,6 +57,7 @@ files:
  - spec/fixtures/chunk_cornercase.csv
  - spec/fixtures/lots_of_columns.csv
  - spec/fixtures/no_header.csv
+ - spec/fixtures/numeric.csv
  - spec/fixtures/pets.csv
  - spec/fixtures/quoted.csv
  - spec/fixtures/separator.csv
@@ -60,6 +65,7 @@ files:
  - spec/smarter_csv/binary_file_spec.rb
  - spec/smarter_csv/chunked_reading_spec.rb
  - spec/smarter_csv/column_separator_spec.rb
+ - spec/smarter_csv/convert_values_to_numeric_spec.rb
  - spec/smarter_csv/key_mapping_spec.rb
  - spec/smarter_csv/load_basic_spec.rb
  - spec/smarter_csv/no_header_spec.rb
@@ -75,35 +81,40 @@ files:
  - spec/spec/spec_helper.rb
  - spec/spec_helper.rb
  homepage: https://github.com/tilo/smarter_csv
- licenses:
+ licenses:
  - MIT
  - GPL-2
- metadata: {}
-
  post_install_message:
  rdoc_options: []
-
- require_paths:
+ require_paths:
  - lib
- required_ruby_version: !ruby/object:Gem::Requirement
- requirements:
- - *id002
- required_rubygems_version: !ruby/object:Gem::Requirement
- requirements:
- - *id002
- requirements:
+ required_ruby_version: !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - ! '>='
+ - !ruby/object:Gem::Version
+ version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - ! '>='
+ - !ruby/object:Gem::Version
+ version: '0'
+ requirements:
  - csv
  rubyforge_project:
- rubygems_version: 2.0.3
+ rubygems_version: 1.8.23
  signing_key:
- specification_version: 4
- summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots of optional features, e.g. chunked processing for huge CSV files
- test_files:
+ specification_version: 3
+ summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots
+ of optional features, e.g. chunked processing for huge CSV files
+ test_files:
  - spec/fixtures/basic.csv
  - spec/fixtures/binary.csv
  - spec/fixtures/chunk_cornercase.csv
  - spec/fixtures/lots_of_columns.csv
  - spec/fixtures/no_header.csv
+ - spec/fixtures/numeric.csv
  - spec/fixtures/pets.csv
  - spec/fixtures/quoted.csv
  - spec/fixtures/separator.csv
@@ -111,6 +122,7 @@ test_files:
  - spec/smarter_csv/binary_file_spec.rb
  - spec/smarter_csv/chunked_reading_spec.rb
  - spec/smarter_csv/column_separator_spec.rb
+ - spec/smarter_csv/convert_values_to_numeric_spec.rb
  - spec/smarter_csv/key_mapping_spec.rb
  - spec/smarter_csv/load_basic_spec.rb
  - spec/smarter_csv/no_header_spec.rb
checksums.yaml DELETED
@@ -1,7 +0,0 @@
- ---
- SHA1:
- metadata.gz: 7fe73934009f54b4447577c4190d8051e315544f
- data.tar.gz: b32b1635d2cb25f5feac9e06522058c319d6291d
- SHA512:
- metadata.gz: 726b149bde30de57bbabd90078b92f20a4b59d6a41dfb076b0bdb8fae6d47d7701fbdd427f154e1c523fcce3e93ae0af71c4f9426ff48edca10241ab1b50ce5b
- data.tar.gz: 4cc80396cb2a41fae08c20969b3c4ff05b13900289f81a32ef36cc830f363d710428d050b666bc0843b0641ea63486b0157ddf2c728b59a2a47e3eb9974e265d