smarter_csv 1.0.15 → 1.0.16
- data/.travis.yml +2 -2
- data/README.md +17 -2
- data/lib/smarter_csv/smarter_csv.rb +24 -7
- data/lib/smarter_csv/version.rb +1 -1
- data/spec/fixtures/numeric.csv +5 -0
- data/spec/smarter_csv/convert_values_to_numeric_spec.rb +48 -0
- metadata +54 -42
- checksums.yaml +0 -7
data/.travis.yml
CHANGED
@@ -7,8 +7,6 @@ rvm:
 - 2.0.0
 - jruby-18mode
 - jruby-19mode
-- rbx-18mode
-- rbx-19mode
 - ruby-head
 - jruby-head
 - ree
@@ -18,6 +16,8 @@ jdk:
 env: JRUBY_OPTS="--server -Xcompile.invokedynamic=false -J-XX:+TieredCompilation -J-XX:TieredStopAtLevel=1 -J-noverify -J-Xms512m -J-Xmx1024m"
 matrix:
 allow_failures:
+- rbx-18mode
+- rbx-19mode
 - rvm: jruby-head
 - rvm: ruby-head
 - rvm: ree
data/README.md
CHANGED
@@ -3,6 +3,11 @@
 `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
 and parallel processing with Resque or Sidekiq.

+One `smarter_csv` user wrote:
+
+*Best gem for CSV for us yet. [...] taking an import process from 7+ hours to about 3 minutes.
+[...] Smarter CSV was a big part and helped clean up our code ALOT*
+
 `smarter_csv` has lots of features:
 * able to process large CSV-files
 * able to chunk the input from the CSV file to avoid loading the whole CSV file into memory
@@ -143,12 +148,13 @@ The options and the block are optional.
 | :remove_values_matching | nil | removes key/value pairs if value matches given regular expressions. e.g.: |
 | | | /^\$0\.0+$/ to match $0.00 , or /^#VALUE!$/ to match errors in Excel spreadsheets |
 | :convert_values_to_numeric | true | converts strings containing Integers or Floats to the appropriate class |
+| | | also accepts either {:except => [:key1,:key2]} or {:only => :key3} |
 | :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs |
 | :user_provided_headers | nil | *careful with that axe!* |
 | | | user provided Array of header strings or symbols, to define |
 | | | what headers should be used, overriding any in-file headers. |
 | | | You can not combine the :user_provided_headers and :key_mapping options |
-| :strip_chars_from_headers | nil | remove extraneous characters from the header line (e.g. if
+| :strip_chars_from_headers | nil | RegExp to remove extraneous characters from the header line (e.g. if headers are quoted) |
 | :headers_in_file | true | Whether or not the file contains headers as the first line. |
 | | | Important if the file does not contain headers, |
 | | | otherwise you would lose the first line of data. |
@@ -178,6 +184,10 @@ The options and the block are optional.
 * if the chunk_size is > 0 , then the array may contain up to chunk_size Hashes.
 * this can be very useful when passing chunked data to a post-processing step, e.g. through Resque

+#### NOTES on improper quotation and unwanted characters in headers:
+* some CSV files use un-escaped quotation characters inside fields. This can cause the import to break. To get around this, use the `:force_simple_split => true` option in combination with `:strip_chars_from_headers => /[\-"]/` . This will also significantly speed up the import.
+If you would force a different :quote_char instead (setting it to a non-used character), then the import would be up to 5-times slower than using `:force_simple_split`.
+
 #### Known Issues:
 * if you are using 1.8.7 versions of Ruby, JRuby, or Ruby Enterprise Edition, `smarter_csv` will have problems with double-quoted fields, because of a bug in an underlying library.
 * if your CSV data contains the :row_sep character, e.g. CR, smarter_csv will not be able to handle the data, but will report `CSV::MalformedCSVError: Unclosed quoted field`.
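Reviewer note on the README hunk above: the new NOTES section recommends combining `:force_simple_split => true` with `:strip_chars_from_headers => /[\-"]/`. The standalone Ruby sketch below is not the gem's code. It only illustrates why such a combination can tolerate stray quotation characters, under the assumption that a "simple split" is a plain `String#split` on the column separator with no quote handling. The helper names `simple_split_header` and `simple_split_row` are hypothetical, and the downcasing and symbolizing of headers are illustrative choices.

```ruby
# Standalone sketch, not smarter_csv's implementation.
# Assumption: a "simple split" is a plain String#split on the column
# separator, with no interpretation of quote characters.

# Hypothetical helper: remove unwanted characters from the header line,
# then split it into keys.
def simple_split_header(line, col_sep = ',', strip_chars = nil)
  line = line.chomp
  line = line.gsub(strip_chars, '') if strip_chars
  # Downcasing and symbolizing are illustrative choices for this sketch.
  line.split(col_sep, -1).map { |h| h.strip.downcase.gsub(/\s+/, '_').to_sym }
end

# Hypothetical helper: split a data line without looking at quotes at all.
# The -1 limit keeps trailing empty fields.
def simple_split_row(line, col_sep = ',')
  line.chomp.split(col_sep, -1)
end

# Quoted headers are cleaned by the same regular expression the README suggests.
headers = simple_split_header(%Q{"First-Name","Last Name"\n}, ',', /[\-"]/)

# An un-escaped quote inside a field does not break a plain split.
values = simple_split_row(%Q{Bob "the builder",Smith\n})

record = Hash[headers.zip(values)]
```

The trade-off of a plain split is that a field containing the separator itself would be cut in two, so this approach only suits files where that cannot happen.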
@@ -218,7 +228,11 @@ Or install it yourself as:

 ## Changes

-#### 1.0.
+#### 1.0.16 (2014-01-13)
+* :convert_values_to_numeric option can now be qualified with :except or :only (thanks to Hugo Lepetit)
+* removed deprecated `process_csv` method
+
+#### 1.0.15 (2013-12-07)
 * new option:
 * :remove_unmapped_keys to completely ignore columns which were not mapped with :key_mapping (thanks to Dave Sanders)

@@ -319,6 +333,7 @@ And a special thanks to those who contributed pull requests:
 * [Marcos G. Zimmermann](https://github.com/marcosgz)
 * [Jordan Running](https://github.com/jrunning)
 * [Dave Sanders](https://github.com/DaveSanders)
+* [Hugo Lepetit](https://github.com/giglemad)


 ## Contributing
data/lib/smarter_csv/smarter_csv.rb
CHANGED
@@ -1,7 +1,8 @@
 module SmarterCSV

-  class HeaderSizeMismatch < Exception
-
+  class HeaderSizeMismatch < Exception; end
+
+  class IncorrectOption < Exception; end

   def SmarterCSV.process(input, options={}, &block) # first parameter: filename or input object with readline method
     default_options = {:col_sep => ',' , :row_sep => $/ , :quote_char => '"', :force_simple_split => false , :verbose => false ,
@@ -103,6 +104,10 @@ module SmarterCSV
 hash.delete_if{|k,v| v =~ options[:remove_values_matching]} if options[:remove_values_matching]
 if options[:convert_values_to_numeric]
   hash.each do |k,v|
+    # deal with the :only / :except options to :convert_values_to_numeric
+    next if SmarterCSV.only_or_except_limit_execution( options, :convert_values_to_numeric , k )
+
+    # convert if it's a numeric value:
     case v
     when /^[+-]?\d+\.\d+$/
       hash[k] = v.to_f
@@ -128,11 +133,9 @@ module SmarterCSV
   else

     # the last chunk may contain partial data, which also needs to be returned (BUG / ISSUE-18)
-

   end

-
   # while a chunk is being filled up we don't need to do anything else here

 else # no chunk handling
@@ -164,9 +167,23 @@ module SmarterCSV
     end
   end

-  def SmarterCSV.process_csv(*args)
-    warn "[DEPRECATION] `process_csv` is deprecated. Please use `process` instead."
-    SmarterCSV.process(*args)
+  # def SmarterCSV.process_csv(*args)
+  # warn "[DEPRECATION] `process_csv` is deprecated. Please use `process` instead."
+  # SmarterCSV.process(*args)
+  # end
+
+  private
+  # acts as a road-block to limit processing when iterating over all k/v pairs of a CSV-hash:
+
+  def self.only_or_except_limit_execution( options, option_name, key )
+    if options[option_name].is_a?(Hash)
+      if options[option_name].has_key?( :except )
+        return true if Array( options[ option_name ][:except] ).include?(key)
+      elsif options[ option_name ].has_key?(:only)
+        return true unless Array( options[ option_name ][:only] ).include?(key)
+      end
+    end
+    return false
   end
 end

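Reviewer note on the hunks above: the new helper `only_or_except_limit_execution` returns `true` when a key should be skipped, and the conversion loop calls it with `next if` before looking at the value. The standalone Ruby sketch below restates that gate outside the gem so the three option shapes (`true`, `{:except => ...}`, `{:only => ...}`) can be tried in isolation. `skip_key?` and `convert_numeric` are hypothetical names, not part of smarter_csv, and the Integer branch is an assumption, since the diff context shows only the Float pattern.

```ruby
# Standalone sketch mirroring the gate added in this diff.
# skip_key? and convert_numeric are hypothetical names for illustration.

# Returns true when the key must be left untouched for the given option.
def skip_key?(options, option_name, key)
  setting = options[option_name]
  return false unless setting.is_a?(Hash)

  if setting.key?(:except)
    # :except takes precedence when both keys are present, as in the diff.
    Array(setting[:except]).include?(key)
  elsif setting.key?(:only)
    !Array(setting[:only]).include?(key)
  else
    false
  end
end

# Converts numeric-looking strings in place and returns the hash.
def convert_numeric(hash, options)
  return hash unless options[:convert_values_to_numeric]

  hash.each do |k, v|
    next if skip_key?(options, :convert_values_to_numeric, k)

    case v
    when /^[+-]?\d+\.\d+$/   # Float pattern, taken from the diff context
      hash[k] = v.to_f
    when /^[+-]?\d+$/        # Integer pattern: an assumption for this sketch
      hash[k] = v.to_i
    end
  end
  hash
end

row = { :wealth => "3.5", :reference => "42", :name => "Ann" }

all_converted = convert_numeric(row.dup, { :convert_values_to_numeric => true })
except_ref    = convert_numeric(row.dup, { :convert_values_to_numeric => { :except => :reference } })
only_wealth   = convert_numeric(row.dup, { :convert_values_to_numeric => { :only => :wealth } })
```

Wrapping the option value in `Array(...)` is what lets callers pass either a single key or a list of keys, which matches the two forms shown in the README table row added by this release.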
data/lib/smarter_csv/version.rb
CHANGED
data/spec/smarter_csv/convert_values_to_numeric_spec.rb
ADDED
@@ -0,0 +1,48 @@
+require 'spec_helper'
+
+fixture_path = 'spec/fixtures'
+
+describe 'numeric conversion of values' do
+  it 'occurs by default' do
+    options = {}
+    data = SmarterCSV.process("#{fixture_path}/numeric.csv", options)
+    data.size.should == 3
+
+    # all the keys should be symbols
+    data.each do |hash|
+      hash[:wealth].should be_a_kind_of(Numeric) unless hash[:wealth].nil?
+      hash[:reference].should be_a_kind_of(Numeric) unless hash[:reference].nil?
+    end
+  end
+
+  it 'can be prevented for all values' do
+    options = { convert_values_to_numeric: false }
+    data = SmarterCSV.process("#{fixture_path}/numeric.csv", options)
+
+    data.each do |hash|
+      hash[:wealth].should be_a_kind_of(String) unless hash[:wealth].nil?
+      hash[:reference].should be_a_kind_of(String) unless hash[:reference].nil?
+    end
+  end
+
+  it 'can be prevented for some keys' do
+    options = { convert_values_to_numeric: { except: :reference }}
+    data = SmarterCSV.process("#{fixture_path}/numeric.csv", options)
+
+    data.each do |hash|
+      hash[:wealth].should be_a_kind_of(Numeric) unless hash[:wealth].nil?
+      hash[:reference].should be_a_kind_of(String) unless hash[:reference].nil?
+    end
+  end
+
+  it 'can occur only for some keys' do
+    options = { convert_values_to_numeric: { only: :wealth }}
+    data = SmarterCSV.process("#{fixture_path}/numeric.csv", options)
+
+    data.each do |hash|
+      hash[:wealth].should be_a_kind_of(Numeric) unless hash[:wealth].nil?
+      hash[:reference].should be_a_kind_of(String) unless hash[:reference].nil?
+    end
+  end
+end
+
metadata
CHANGED
@@ -1,41 +1,45 @@
---- !ruby/object:Gem::Specification
+--- !ruby/object:Gem::Specification
 name: smarter_csv
-version: !ruby/object:Gem::Version
-version: 1.0.
+version: !ruby/object:Gem::Version
+version: 1.0.16
+prerelease:
 platform: ruby
-authors:
--
-Tilo Sloboda
+authors:
+- ! 'Tilo Sloboda

+'
 autorequire:
 bindir: bin
 cert_chain: []
-
-
-
-- !ruby/object:Gem::Dependency
+date: 2014-01-13 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
 name: rspec
-
-
-requirements:
--
--
-
-version: "0"
+requirement: !ruby/object:Gem::Requirement
+none: false
+requirements:
+- - ! '>='
+- !ruby/object:Gem::Version
+version: '0'
 type: :development
-
-
-
-
-
+prerelease: false
+version_requirements: !ruby/object:Gem::Requirement
+none: false
+requirements:
+- - ! '>='
+- !ruby/object:Gem::Version
+version: '0'
+description: Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with
+optional features for processing large files in parallel, embedded comments, unusual
+field- and record-separators, flexible mapping of CSV-headers to Hash-keys
+email:
+- ! 'tilo.sloboda@gmail.com

+'
 executables: []
-
 extensions: []
-
 extra_rdoc_files: []
-
-files:
+files:
 - .gitignore
 - .rspec
 - .rvmrc
@@ -53,6 +57,7 @@ files:
 - spec/fixtures/chunk_cornercase.csv
 - spec/fixtures/lots_of_columns.csv
 - spec/fixtures/no_header.csv
+- spec/fixtures/numeric.csv
 - spec/fixtures/pets.csv
 - spec/fixtures/quoted.csv
 - spec/fixtures/separator.csv
@@ -60,6 +65,7 @@ files:
 - spec/smarter_csv/binary_file_spec.rb
 - spec/smarter_csv/chunked_reading_spec.rb
 - spec/smarter_csv/column_separator_spec.rb
+- spec/smarter_csv/convert_values_to_numeric_spec.rb
 - spec/smarter_csv/key_mapping_spec.rb
 - spec/smarter_csv/load_basic_spec.rb
 - spec/smarter_csv/no_header_spec.rb
@@ -75,35 +81,40 @@ files:
 - spec/spec/spec_helper.rb
 - spec/spec_helper.rb
 homepage: https://github.com/tilo/smarter_csv
-licenses:
+licenses:
 - MIT
 - GPL-2
-metadata: {}
-
 post_install_message:
 rdoc_options: []
-
-require_paths:
+require_paths:
 - lib
-required_ruby_version: !ruby/object:Gem::Requirement
-
-
-
-
-
-
+required_ruby_version: !ruby/object:Gem::Requirement
+none: false
+requirements:
+- - ! '>='
+- !ruby/object:Gem::Version
+version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+none: false
+requirements:
+- - ! '>='
+- !ruby/object:Gem::Version
+version: '0'
+requirements:
 - csv
 rubyforge_project:
-rubygems_version:
+rubygems_version: 1.8.23
 signing_key:
-specification_version:
-summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots
-
+specification_version: 3
+summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots
+of optional features, e.g. chunked processing for huge CSV files
+test_files:
 - spec/fixtures/basic.csv
 - spec/fixtures/binary.csv
 - spec/fixtures/chunk_cornercase.csv
 - spec/fixtures/lots_of_columns.csv
 - spec/fixtures/no_header.csv
+- spec/fixtures/numeric.csv
 - spec/fixtures/pets.csv
 - spec/fixtures/quoted.csv
 - spec/fixtures/separator.csv
@@ -111,6 +122,7 @@ test_files:
 - spec/smarter_csv/binary_file_spec.rb
 - spec/smarter_csv/chunked_reading_spec.rb
 - spec/smarter_csv/column_separator_spec.rb
+- spec/smarter_csv/convert_values_to_numeric_spec.rb
 - spec/smarter_csv/key_mapping_spec.rb
 - spec/smarter_csv/load_basic_spec.rb
 - spec/smarter_csv/no_header_spec.rb
checksums.yaml
DELETED
@@ -1,7 +0,0 @@
----
-SHA1:
-metadata.gz: 7fe73934009f54b4447577c4190d8051e315544f
-data.tar.gz: b32b1635d2cb25f5feac9e06522058c319d6291d
-SHA512:
-metadata.gz: 726b149bde30de57bbabd90078b92f20a4b59d6a41dfb076b0bdb8fae6d47d7701fbdd427f154e1c523fcce3e93ae0af71c4f9426ff48edca10241ab1b50ce5b
-data.tar.gz: 4cc80396cb2a41fae08c20969b3c4ff05b13900289f81a32ef36cc830f363d710428d050b666bc0843b0641ea63486b0157ddf2c728b59a2a47e3eb9974e265d