smarter_csv 1.0.15 → 1.0.16
- data/.travis.yml +2 -2
- data/README.md +17 -2
- data/lib/smarter_csv/smarter_csv.rb +24 -7
- data/lib/smarter_csv/version.rb +1 -1
- data/spec/fixtures/numeric.csv +5 -0
- data/spec/smarter_csv/convert_values_to_numeric_spec.rb +48 -0
- metadata +54 -42
- checksums.yaml +0 -7
data/.travis.yml
CHANGED
@@ -7,8 +7,6 @@ rvm:
   - 2.0.0
   - jruby-18mode
   - jruby-19mode
-  - rbx-18mode
-  - rbx-19mode
   - ruby-head
   - jruby-head
   - ree
@@ -18,6 +16,8 @@ jdk:
 env: JRUBY_OPTS="--server -Xcompile.invokedynamic=false -J-XX:+TieredCompilation -J-XX:TieredStopAtLevel=1 -J-noverify -J-Xms512m -J-Xmx1024m"
 matrix:
   allow_failures:
+    - rbx-18mode
+    - rbx-19mode
     - rvm: jruby-head
     - rvm: ruby-head
     - rvm: ree
data/README.md
CHANGED
@@ -3,6 +3,11 @@
 `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
 and parallel processing with Resque or Sidekiq.
 
+One `smarter_csv` user wrote:
+
+*Best gem for CSV for us yet. [...] taking an import process from 7+ hours to about 3 minutes.
+[...] Smarter CSV was a big part and helped clean up our code ALOT*
+
 `smarter_csv` has lots of features:
 * able to process large CSV-files
 * able to chunk the input from the CSV file to avoid loading the whole CSV file into memory
@@ -143,12 +148,13 @@ The options and the block are optional.
 | :remove_values_matching | nil | removes key/value pairs if value matches given regular expressions. e.g.: |
 | | | /^\$0\.0+$/ to match $0.00 , or /^#VALUE!$/ to match errors in Excel spreadsheets |
 | :convert_values_to_numeric | true | converts strings containing Integers or Floats to the appropriate class |
+| | | also accepts either {:except => [:key1,:key2]} or {:only => :key3} |
 | :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs |
 | :user_provided_headers | nil | *careful with that axe!* |
 | | | user provided Array of header strings or symbols, to define |
 | | | what headers should be used, overriding any in-file headers. |
 | | | You can not combine the :user_provided_headers and :key_mapping options |
-| :strip_chars_from_headers | nil | remove extraneous characters from the header line (e.g. if
+| :strip_chars_from_headers | nil | RegExp to remove extraneous characters from the header line (e.g. if headers are quoted) |
 | :headers_in_file | true | Whether or not the file contains headers as the first line. |
 | | | Important if the file does not contain headers, |
 | | | otherwise you would lose the first line of data. |
@@ -178,6 +184,10 @@ The options and the block are optional.
 * if the chunk_size is > 0 , then the array may contain up to chunk_size Hashes.
 * this can be very useful when passing chunked data to a post-processing step, e.g. through Resque
 
+#### NOTES on improper quotation and unwanted characters in headers:
+* some CSV files use un-escaped quotation characters inside fields. This can cause the import to break. To get around this, use the `:force_simple_split => true` option in combination with `:strip_chars_from_headers => /[\-"]/` . This will also significantly speed up the import.
+If you would force a different :quote_char instead (setting it to a non-used character), then the import would be up to 5-times slower than using `:force_simple_split`.
+
 #### Known Issues:
 * if you are using 1.8.7 versions of Ruby, JRuby, or Ruby Enterprise Edition, `smarter_csv` will have problems with double-quoted fields, because of a bug in an underlying library.
 * if your CSV data contains the :row_sep character, e.g. CR, smarter_csv will not be able to handle the data, but will report `CSV::MalformedCSVError: Unclosed quoted field`.
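The README note added in the hunk above pairs `:force_simple_split` with `:strip_chars_from_headers`. The following is a rough, stand-alone sketch of that idea, not the gem's actual code path: the header and data lines are made-up sample input, and plain `String#split` is assumed here as a stand-in for whatever the gem does internally when simple splitting is forced. Only the regular expression `/[\-"]/` comes from the README text.

```ruby
# Hypothetical input: quoted, hyphenated headers and a field that
# contains un-escaped quotation characters.
header_line = %q{"first-name","last-name","e-mail"}
data_line   = %q{Joe "the boss",Smith,joe@example.com}

# Regular expression quoted in the README note.
strip_chars = /[\-"]/

# Simple split on the column separator, no CSV quote parsing at all
# (an assumption about what "simple split" means).
headers = header_line.chomp.split(',').map { |h| h.gsub(strip_chars, '') }
values  = data_line.chomp.split(',')

# Build one row hash with symbol keys, as smarter_csv returns hashes.
record = Hash[headers.map(&:to_sym).zip(values)]

puts headers.inspect
puts record.inspect
```

Because nothing here interprets quotes, a stray `"` inside a field cannot unbalance the parser; the trade-off is that a column separator inside a quoted field would be split as well.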
@@ -218,7 +228,11 @@ Or install it yourself as:
 
 ## Changes
 
-#### 1.0.
+#### 1.0.16 (2014-01-13)
+* :convert_values_to_numeric option can now be qualified with :except or :only (thanks to Hugo Lepetit)
+* removed deprecated `process_csv` method
+
+#### 1.0.15 (2013-12-07)
 * new option:
 * :remove_unmapped_keys to completely ignore columns which were not mapped with :key_mapping (thanks to Dave Sanders)
 
@@ -319,6 +333,7 @@ And a special thanks to those who contributed pull requests:
 * [Marcos G. Zimmermann](https://github.com/marcosgz)
 * [Jordan Running](https://github.com/jrunning)
 * [Dave Sanders](https://github.com/DaveSanders)
+* [Hugo Lepetit](https://github.com/giglemad)
 
 
 ## Contributing
data/lib/smarter_csv/smarter_csv.rb
CHANGED
@@ -1,7 +1,8 @@
 module SmarterCSV
 
-  class HeaderSizeMismatch < Exception
-
+  class HeaderSizeMismatch < Exception; end
+
+  class IncorrectOption < Exception; end
 
   def SmarterCSV.process(input, options={}, &block) # first parameter: filename or input object with readline method
     default_options = {:col_sep => ',' , :row_sep => $/ , :quote_char => '"', :force_simple_split => false , :verbose => false ,
@@ -103,6 +104,10 @@ module SmarterCSV
 hash.delete_if{|k,v| v =~ options[:remove_values_matching]} if options[:remove_values_matching]
 if options[:convert_values_to_numeric]
   hash.each do |k,v|
+    # deal with the :only / :except options to :convert_values_to_numeric
+    next if SmarterCSV.only_or_except_limit_execution( options, :convert_values_to_numeric , k )
+
+    # convert if it's a numeric value:
     case v
     when /^[+-]?\d+\.\d+$/
       hash[k] = v.to_f
@@ -128,11 +133,9 @@ module SmarterCSV
 else
 
 # the last chunk may contain partial data, which also needs to be returned (BUG / ISSUE-18)
-
 
 end
 
-
 # while a chunk is being filled up we don't need to do anything else here
 
 else # no chunk handling
@@ -164,9 +167,23 @@ module SmarterCSV
     end
   end
 
-  def SmarterCSV.process_csv(*args)
-    warn "[DEPRECATION] `process_csv` is deprecated. Please use `process` instead."
-    SmarterCSV.process(*args)
+  # def SmarterCSV.process_csv(*args)
+  # warn "[DEPRECATION] `process_csv` is deprecated. Please use `process` instead."
+  # SmarterCSV.process(*args)
+  # end
+
+  private
+  # acts as a road-block to limit processing when iterating over all k/v pairs of a CSV-hash:
+
+  def self.only_or_except_limit_execution( options, option_name, key )
+    if options[option_name].is_a?(Hash)
+      if options[option_name].has_key?( :except )
+        return true if Array( options[ option_name ][:except] ).include?(key)
+      elsif options[ option_name ].has_key?(:only)
+        return true unless Array( options[ option_name ][:only] ).include?(key)
+      end
+    end
+    return false
   end
 end
 
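To make the effect of the new `:only` / `:except` qualifiers easier to follow, here is a minimal stand-alone sketch that mirrors the logic of `only_or_except_limit_execution` from the hunk above. It does not load the gem. The names `skip_key?` and `convert_values` are hypothetical helpers introduced for illustration, the sample row is made up, and the integer branch (`/^[+-]?\d+$/`) is an assumption, since the diff only shows the float branch of the conversion.

```ruby
# Returns true when conversion should be skipped for +key+.
# Mirrors the :except / :only checks shown in the diff.
def skip_key?(option_value, key)
  return false unless option_value.is_a?(Hash)

  if option_value.key?(:except)
    Array(option_value[:except]).include?(key)
  elsif option_value.key?(:only)
    !Array(option_value[:only]).include?(key)
  else
    false
  end
end

# Converts numeric-looking strings in +hash+ in place.
# +option_value+ plays the role of options[:convert_values_to_numeric].
def convert_values(hash, option_value)
  return hash unless option_value

  hash.each do |k, v|
    next if skip_key?(option_value, k)

    case v
    when /^[+-]?\d+\.\d+$/ then hash[k] = v.to_f # float branch, as in the diff
    when /^[+-]?\d+$/      then hash[k] = v.to_i # assumed integer branch
    end
  end
  hash
end

row = { :wealth => '3.14', :reference => '0042' } # made-up sample row

default_row = convert_values(row.dup, true)
except_row  = convert_values(row.dup, { :except => :reference })
only_row    = convert_values(row.dup, { :only => [:wealth] })

puts default_row.inspect
puts except_row.inspect
puts only_row.inspect
```

`Array(...)` is what lets a caller pass either a single key or a list of keys, which matches the README wording `{:except => [:key1,:key2]}` or `{:only => :key3}`. Keeping a value such as `'0042'` as a String is a typical reason to exclude a column, because converting it drops the leading zeros.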
data/lib/smarter_csv/version.rb
CHANGED
data/spec/smarter_csv/convert_values_to_numeric_spec.rb
ADDED
@@ -0,0 +1,48 @@
+require 'spec_helper'
+
+fixture_path = 'spec/fixtures'
+
+describe 'numeric conversion of values' do
+  it 'occurs by default' do
+    options = {}
+    data = SmarterCSV.process("#{fixture_path}/numeric.csv", options)
+    data.size.should == 3
+
+    # all the keys should be symbols
+    data.each do |hash|
+      hash[:wealth].should be_a_kind_of(Numeric) unless hash[:wealth].nil?
+      hash[:reference].should be_a_kind_of(Numeric) unless hash[:reference].nil?
+    end
+  end
+
+  it 'can be prevented for all values' do
+    options = { convert_values_to_numeric: false }
+    data = SmarterCSV.process("#{fixture_path}/numeric.csv", options)
+
+    data.each do |hash|
+      hash[:wealth].should be_a_kind_of(String) unless hash[:wealth].nil?
+      hash[:reference].should be_a_kind_of(String) unless hash[:reference].nil?
+    end
+  end
+
+  it 'can be prevented for some keys' do
+    options = { convert_values_to_numeric: { except: :reference }}
+    data = SmarterCSV.process("#{fixture_path}/numeric.csv", options)
+
+    data.each do |hash|
+      hash[:wealth].should be_a_kind_of(Numeric) unless hash[:wealth].nil?
+      hash[:reference].should be_a_kind_of(String) unless hash[:reference].nil?
+    end
+  end
+
+  it 'can occur only for some keys' do
+    options = { convert_values_to_numeric: { only: :wealth }}
+    data = SmarterCSV.process("#{fixture_path}/numeric.csv", options)
+
+    data.each do |hash|
+      hash[:wealth].should be_a_kind_of(Numeric) unless hash[:wealth].nil?
+      hash[:reference].should be_a_kind_of(String) unless hash[:reference].nil?
+    end
+  end
+end
+
metadata
CHANGED
@@ -1,41 +1,45 @@
---- !ruby/object:Gem::Specification
+--- !ruby/object:Gem::Specification
 name: smarter_csv
-version: !ruby/object:Gem::Version
-  version: 1.0.
+version: !ruby/object:Gem::Version
+  version: 1.0.16
+  prerelease:
 platform: ruby
-authors:
--
-  Tilo Sloboda
+authors:
+- ! 'Tilo Sloboda
 
+  '
 autorequire:
 bindir: bin
 cert_chain: []
-
-
-
-- !ruby/object:Gem::Dependency
+date: 2014-01-13 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
   name: rspec
-
-
-    requirements:
--
--
-
-        version: "0"
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
   type: :development
-
-
-
-
-
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+description: Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with
+  optional features for processing large files in parallel, embedded comments, unusual
+  field- and record-separators, flexible mapping of CSV-headers to Hash-keys
+email:
+- ! 'tilo.sloboda@gmail.com
 
+  '
 executables: []
-
 extensions: []
-
 extra_rdoc_files: []
-
-files:
+files:
 - .gitignore
 - .rspec
 - .rvmrc
@@ -53,6 +57,7 @@ files:
 - spec/fixtures/chunk_cornercase.csv
 - spec/fixtures/lots_of_columns.csv
 - spec/fixtures/no_header.csv
+- spec/fixtures/numeric.csv
 - spec/fixtures/pets.csv
 - spec/fixtures/quoted.csv
 - spec/fixtures/separator.csv
@@ -60,6 +65,7 @@ files:
 - spec/smarter_csv/binary_file_spec.rb
 - spec/smarter_csv/chunked_reading_spec.rb
 - spec/smarter_csv/column_separator_spec.rb
+- spec/smarter_csv/convert_values_to_numeric_spec.rb
 - spec/smarter_csv/key_mapping_spec.rb
 - spec/smarter_csv/load_basic_spec.rb
 - spec/smarter_csv/no_header_spec.rb
@@ -75,35 +81,40 @@ files:
 - spec/spec/spec_helper.rb
 - spec/spec_helper.rb
 homepage: https://github.com/tilo/smarter_csv
-licenses:
+licenses:
 - MIT
 - GPL-2
-metadata: {}
-
 post_install_message:
 rdoc_options: []
-
-require_paths:
+require_paths:
 - lib
-required_ruby_version: !ruby/object:Gem::Requirement
-
-
-
-
-
-
+required_ruby_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ! '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ! '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements:
 - csv
 rubyforge_project:
-rubygems_version:
+rubygems_version: 1.8.23
 signing_key:
-specification_version:
-summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots
-
+specification_version: 3
+summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots
+  of optional features, e.g. chunked processing for huge CSV files
+test_files:
 - spec/fixtures/basic.csv
 - spec/fixtures/binary.csv
 - spec/fixtures/chunk_cornercase.csv
 - spec/fixtures/lots_of_columns.csv
 - spec/fixtures/no_header.csv
+- spec/fixtures/numeric.csv
 - spec/fixtures/pets.csv
 - spec/fixtures/quoted.csv
 - spec/fixtures/separator.csv
@@ -111,6 +122,7 @@ test_files:
 - spec/smarter_csv/binary_file_spec.rb
 - spec/smarter_csv/chunked_reading_spec.rb
 - spec/smarter_csv/column_separator_spec.rb
+- spec/smarter_csv/convert_values_to_numeric_spec.rb
 - spec/smarter_csv/key_mapping_spec.rb
 - spec/smarter_csv/load_basic_spec.rb
 - spec/smarter_csv/no_header_spec.rb
checksums.yaml
DELETED
@@ -1,7 +0,0 @@
----
-SHA1:
-  metadata.gz: 7fe73934009f54b4447577c4190d8051e315544f
-  data.tar.gz: b32b1635d2cb25f5feac9e06522058c319d6291d
-SHA512:
-  metadata.gz: 726b149bde30de57bbabd90078b92f20a4b59d6a41dfb076b0bdb8fae6d47d7701fbdd427f154e1c523fcce3e93ae0af71c4f9426ff48edca10241ab1b50ce5b
-  data.tar.gz: 4cc80396cb2a41fae08c20969b3c4ff05b13900289f81a32ef36cc830f363d710428d050b666bc0843b0641ea63486b0157ddf2c728b59a2a47e3eb9974e265d