smarter_csv 1.2.0 → 1.2.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
- SHA1:
- metadata.gz: 251eb1eff306211d163c6be6880ba45e3eb29985
- data.tar.gz: 9e16d1d4aaab86df7a65d73e6a23ace8f481ba4e
+ SHA256:
+ metadata.gz: c0efcd4dbad2546469ba99c1a133bc7548382e147fca66207d11065607fdb5ee
+ data.tar.gz: 98024633925ed73251dd00606f23f73ca6cbda5517c24882b95935138ab4d7e3
  SHA512:
- metadata.gz: e2ad758ddb7d644777df1b9f38b73c7c4c0753d76507834bf68e86394ae017b1f5caa16867efd82f4c0443d4a7c1b4c1d7f704d9c3796e71083bde689ee7aeba
- data.tar.gz: 8f7126500628a17fda587a05678aca9140d0ebb7ca168fc3144b1d9c00d2be6ec901a43900bd3d558aceb0a8dcfc139aba7347e6dbad01b2ff83c19dc3816763
+ metadata.gz: ac64f3a688f9b5b4bce09e26d8ee140994bf585578ea6eed1d998cec434535f3afb4aa8c2ea222b873cb5ee113fba748412a96e1b3b23a8b5425d13aa7a6436c
+ data.tar.gz: f8c3132f1f2cf60f8f67d2278d0b3660d7aab456581256e1c2508d1397c1cc31d455035c68656777c34e90ad6238c239699b1c59cdd918cdcd63c9f5a264aaba
data/README.md CHANGED
@@ -2,6 +2,19 @@
 
  [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
 
+ ---------------
+ #### Service Announcement
+
+ Work towards SmarterCSV 2.0 is on it's way, with much improved features, and more streamlined options.
+
+ Please check the 2.0-develop branch, and open issues marked v2.0 and leave your comments.
+
+ New versions on the 1.2 branch will soon print a deprecation warning if you set :verbose to true
+ See below for list of deprecated options.
+
+ ---------------
+ #### SmarterCSV
+
  `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
  and parallel processing with Resque or Sidekiq.
 
@@ -182,18 +195,48 @@ The options and the block are optional.
 
  `SmarterCSV.process` supports the following options:
 
+ #### Options:
+
  | Option | Default | Explanation |
  ---------------------------------------------------------------------------------------------------------------------------------
+ | :chunk_size | nil | if set, determines the desired chunk-size (defaults to nil, no chunk processing) |
+ | | | |
+ | :file_encoding | utf-8 | Set the file encoding eg.: 'windows-1252' or 'iso-8859-1' |
+ | :invalid_byte_sequence | '' | what to replace invalid byte sequences with |
+ | :force_utf8 | false | force UTF-8 encoding of all lines (including headers) in the CSV file |
+ | :skip_lines | nil | how many lines to skip before the first line or header line is processed |
+ | :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
+ ---------------------------------------------------------------------------------------------------------------------------------
  | :col_sep | ',' | column separator |
+ | :force_simple_split | false | force simple splitting on :col_sep character for non-standard CSV-files. |
+ | | | e.g. when :quote_char is not properly escaped |
  | :row_sep | $/ ,"\n" | row separator or record separator , defaults to system's $/ , which defaults to "\n" |
  | | | This can also be set to :auto, but will process the whole cvs file first (slow!) |
  | :auto_row_sep_chars | 500 | How many characters to analyze when using `:row_sep => :auto`. nil or 0 means whole file. |
  | :quote_char | '"' | quotation character |
- | :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
- | :chunk_size | nil | if set, determines the desired chunk-size (defaults to nil, no chunk processing) |
+ ---------------------------------------------------------------------------------------------------------------------------------
+ | :headers_in_file | true | Whether or not the file contains headers as the first line. |
+ | | | Important if the file does not contain headers, |
+ | | | otherwise you would lose the first line of data. |
+ | :user_provided_headers | nil | *careful with that axe!* |
+ | | | user provided Array of header strings or symbols, to define |
+ | | | what headers should be used, overriding any in-file headers. |
+ | | | You can not combine the :user_provided_headers and :key_mapping options |
+ | :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs |
+ | :verbose | false | print out line number while processing (to track down problems in input files) |
+ ---------------------------------------------------------------------------------------------------------------------------------
+
+ #### Deprecated 1.x Options: to be replaced in 2.0
+
+ There have been a lot of 1-offs and feature creep around these options, and going forward we'll have a simpler, but more flexible way to address these features.
+
+ Instead of these options, there will be a new and more flexible way to process the header fields, as well as the fields in each line of the CSV.
+ And header and data validations will also be supported in 2.x
+
+ | Option | Default | Explanation |
  ---------------------------------------------------------------------------------------------------------------------------------
  | :key_mapping | nil | a hash which maps headers from the CSV file to keys in the result hash |
- | :required_headers | nil | An array. Eacn of the given headers must be present in the CSV file, |
+ | :required_headers | nil | An array. Eacn of the given headers must be present after header manipulation, |
  | | | or an exception is raised No validation if nil is given. |
  | :remove_unmapped_keys | false | when using :key_mapping option, should non-mapped keys / columns be removed? |
  | :downcase_header | true | downcase all column headers |
@@ -201,17 +244,7 @@ The options and the block are optional.
  | :strip_whitespace | true | remove whitespace before/after values and headers |
  | :keep_original_headers | false | keep the original headers from the CSV-file as-is. |
  | | | Disables other flags manipulating the header fields. |
- | :user_provided_headers | nil | *careful with that axe!* |
- | | | user provided Array of header strings or symbols, to define |
- | | | what headers should be used, overriding any in-file headers. |
- | | | You can not combine the :user_provided_headers and :key_mapping options |
  | :strip_chars_from_headers | nil | RegExp to remove extraneous characters from the header line (e.g. if headers are quoted) |
- | :headers_in_file | true | Whether or not the file contains headers as the first line. |
- | | | Important if the file does not contain headers, |
- | | | otherwise you would lose the first line of data. |
- | :skip_lines | nil | how many lines to skip before the first line or header line is processed |
- | :force_utf8 | false | force UTF-8 encoding of all lines (including headers) in the CSV file |
- | :invalid_byte_sequence | '' | how to replace invalid byte sequences with |
  ---------------------------------------------------------------------------------------------------------------------------------
  | :value_converters | nil | supply a hash of :header => KlassName; the class needs to implement self.convert(val)|
  | :remove_empty_values | true | remove values which have nil or empty strings as values |
@@ -220,11 +253,7 @@ The options and the block are optional.
  | | | /^\$0\.0+$/ to match $0.00 , or /^#VALUE!$/ to match errors in Excel spreadsheets |
  | :convert_values_to_numeric | true | converts strings containing Integers or Floats to the appropriate class |
  | | | also accepts either {:except => [:key1,:key2]} or {:only => :key3} |
- | :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs |
- | :file_encoding | utf-8 | Set the file encoding eg.: 'windows-1252' or 'iso-8859-1' |
- | :force_simple_split | false | force simple splitting on :col_sep character for non-standard CSV-files. |
- | | | e.g. when :quote_char is not properly escaped |
- | :verbose | false | print out line number while processing (to track down problems in input files) |
+ ---------------------------------------------------------------------------------------------------------------------------------
 
 
  #### NOTES about File Encodings:
@@ -295,6 +324,13 @@ Planned in the next releases:
 
  ## Changes
 
+ #### 1.2.3 (2018-01-27)
+ * fixed regression / test
+ * fuxed quote_char interpolation for headers, but not data (thanks to Colin Petruno)
+
+ #### 1.2.1 (2018-01-25) ### YANKED!
+ * bugfix (thanks to Joshua Smith for reporting)
+
  #### 1.2.0 (2018-01-20)
  * add default validation that a header can only appear once
  * add option `required_headers`
@@ -465,6 +501,8 @@ And a special thanks to those who contributed pull requests:
  * [Ivan Ushakov](https://github.com/IvanUshakov)
  * [Matthieu Paret](https://github.com/mtparet)
  * [Rohit Amarnath](https://github.com/ramarnat)
+ * [Joshua Smith](https://github.com/enviable)
+ * [Colin Petruno](https://github.com/colinpetruno)
 
 
  ## Contributing
@@ -59,7 +59,7 @@ module SmarterCSV
  else
  file_headerA = header.split(options[:col_sep])
  end
- file_headerA.map!{|x| x.gsub(%r/options[:quote_char]/,'') }
+ file_headerA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') }
  file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
  unless options[:keep_original_headers]
  file_headerA.map!{|x| x.gsub(/\s+|-+/,'_')}
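The change in this hunk is small but explains the `options_trader` regression covered by the new trading spec. Without `#{}`, `%r/options[:quote_char]/` is not interpolated: Ruby reads it as the literal text `options` followed by one character from the class `[:quote_char]`. A minimal sketch of the difference follows; the `DEFAULTS` hash and the three method names are hypothetical stand-ins for this illustration, and the `Regexp.escape` variant is a possible hardening that is not part of this diff.

```ruby
# Hypothetical stand-in for the gem's options hash (default quote character assumed).
DEFAULTS = { quote_char: '"' }.freeze

# Before the fix: no interpolation. The pattern is the literal text "options"
# followed by any one of the characters : q u o t e _ c h a r.
def strip_quotes_buggy(text, options = DEFAULTS)
  text.gsub(%r/options[:quote_char]/, '')
end

# After the fix: the configured quote character is interpolated into the pattern.
def strip_quotes_fixed(text, options = DEFAULTS)
  text.gsub(%r/#{options[:quote_char]}/, '')
end

# Possible hardening (an assumption, not in the diff): escape the character so
# that a quote_char such as '|' is matched literally rather than as alternation.
def strip_quotes_escaped(text, options = DEFAULTS)
  text.gsub(Regexp.new(Regexp.escape(options[:quote_char])), '')
end

puts strip_quotes_buggy('options_trader')   # prints: trader
puts strip_quotes_buggy('"ISBN"')           # prints: "ISBN"
puts strip_quotes_fixed('options_trader')   # prints: options_trader
puts strip_quotes_fixed('"ISBN"')           # prints: ISBN
puts strip_quotes_escaped('|ISBN|', quote_char: '|')  # prints: ISBN
```

The first call shows the reported symptom: `options_` matches the un-interpolated pattern and is deleted, leaving `trader`.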
@@ -68,14 +68,14 @@ module SmarterCSV
 
  file_header_size = file_headerA.size
  else
- raise SmarterCSV::IncorrectOption , "ERROR [smarter_csv]: If :headers_in_file is set to false, you have to provide :user_provided_headers" if options[:user_provided_headers].nil?
+ raise SmarterCSV::IncorrectOption , "ERROR: If :headers_in_file is set to false, you have to provide :user_provided_headers" if options[:user_provided_headers].nil?
  end
  if options[:user_provided_headers] && options[:user_provided_headers].class == Array && ! options[:user_provided_headers].empty?
  # use user-provided headers
  headerA = options[:user_provided_headers]
  if defined?(file_header_size) && ! file_header_size.nil?
  if headerA.size != file_header_size
- raise SmarterCSV::HeaderSizeMismatch , "ERROR [smarter_csv]: :user_provided_headers defines #{headerA.size} headers != CSV-file #{input} has #{file_header_size} headers"
+ raise SmarterCSV::HeaderSizeMismatch , "ERROR: :user_provided_headers defines #{headerA.size} headers != CSV-file #{input} has #{file_header_size} headers"
  else
  # we could print out the mapping of file_headerA to headerA here
  end
@@ -100,14 +100,14 @@ module SmarterCSV
  headerA.compact.each do |k|
  duplicate_headers << k if headerA.select{|x| x == k}.size > 1
  end
- raise SmarterCSV::DuplicateHeaders , "ERORR [smarter_csv]: duplicate headers: #{duplicate_headers.join(',')}" unless duplicate_headers.empty?
+ raise SmarterCSV::DuplicateHeaders , "ERROR: duplicate headers: #{duplicate_headers.join(',')}" unless duplicate_headers.empty?
 
  if options[:required_headers] && options[:required_headers].is_a?(Array)
  missing_headers = []
  options[:required_headers].each do |k|
  missing_headers << k unless headerA.include?(k)
  end
- raise SmarterCSV::MissingHeaders , "ERORR [smarter_csv]: missing headers: #{missing_headers.join(',')}" unless missing_headers.empty?
+ raise SmarterCSV::MissingHeaders , "ERROR: missing headers: #{missing_headers.join(',')}" unless missing_headers.empty?
  end
 
  # in case we use chunking.. we'll need to set it up..
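For readers following the two validations whose messages change in this hunk, the logic can be sketched in isolation. This is a simplified restatement, not the gem's code: the method names are hypothetical, and unlike the loop in the diff (which appends a duplicated header once per occurrence) this version reports each duplicate only once.

```ruby
# Hypothetical helper: headers that occur more than once, ignoring nil entries.
def duplicate_headers(headers)
  headers.compact.select { |k| headers.count(k) > 1 }.uniq
end

# Hypothetical helper: required headers that are absent from the header list.
# Array(nil) is [], so passing nil means "no validation", as in the README.
def missing_headers(headers, required)
  Array(required).reject { |k| headers.include?(k) }
end

headers = [:id, :name, :name, nil, :email]

p duplicate_headers(headers)               # prints: [:name]
p missing_headers(headers, [:id, :phone])  # prints: [:phone]
p missing_headers(headers, nil)            # prints: []
```

In the gem, a non-empty result from either check leads to `SmarterCSV::DuplicateHeaders` or `SmarterCSV::MissingHeaders` being raised, as the hunk above shows.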
@@ -155,7 +155,7 @@ module SmarterCSV
  else
  dataA = line.split(options[:col_sep])
  end
- dataA.map!{|x| x.gsub(%r/options[:quote_char]/,'') }
+ #### dataA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') } # this is actually not a good idea as a default
  dataA.map!{|x| x.strip} if options[:strip_whitespace]
  hash = Hash.zip(headerA,dataA) # from Facets of Ruby library
  # make sure we delete any key/value pairs from the hash, which the user wanted to delete:
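The disabled line above would have removed every quote character from data fields, including ones that legitimately belong to the value. The sketch below uses Ruby's standard `csv` library directly (an assumption for illustration; it does not call smarter_csv) on two lines taken from the `quoted2.csv` fixture added in this release, to show what a doubled quote parses to and why stripping is reasonable for headers but lossy for data.

```ruby
require 'csv'

# Lines copied from the quoted2.csv fixture in this diff.
header = %q{"REVIEW DATE","AUTHOR","""ISBN""","DISCOUNTED ""PRICE"""}
row    = %q{"1998/07/15","Timothy ""The Parser"" Campbell",0968411304,18.99}

# The parser removes the enclosing quotes and turns each doubled quote into one.
header_fields = CSV.parse_line(header)
row_fields    = CSV.parse_line(row)

p header_fields[2]   # prints: "\"ISBN\""
p row_fields[1]      # prints: "Timothy \"The Parser\" Campbell"

# Headers: dropping the remaining quotes gives clean key names.
p header_fields.map { |h| h.gsub(/"/, '') }
# prints: ["REVIEW DATE", "AUTHOR", "ISBN", "DISCOUNTED PRICE"]

# Data: the same substitution would silently change the value.
p row_fields[1].gsub(/"/, '')   # prints: "Timothy The Parser Campbell"
```

This matches the new specs, which expect `:isbn` and `:discounted_price` as keys while data values such as `"\"Jean"` keep their embedded quotes.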
@@ -1,3 +1,3 @@
  module SmarterCSV
- VERSION = "1.2.0"
+ VERSION = "1.2.3"
  end
@@ -0,0 +1,9 @@
+ "ID","FIRST_NAME","LAST_NAME"
+ "1","""John","Cooke"""
+ "2","Jam
+ e
+ son""","McCollum"
+ "3","""Jean","Conn"
+ "4","Jenny","Traer"
+ "5","Bo""bbie","Faga"
+ "6","Mica","Copeland"
@@ -0,0 +1,4 @@
+ "REVIEW DATE","AUTHOR","""ISBN""","DISCOUNTED ""PRICE"""
+ "1985/01/21","Douglas Adams",0345391802,5.95
+ "1998/07/15","Timothy ""The Parser"" Campbell",0968411304,18.99
+ "1999/12/03","Richard Friedman",0060630353,5.95
@@ -0,0 +1,3 @@
+ Account_ID,options_trader,Stock_Symbol,Shares Issued,Purchase Date
+ 0002310234,Mike Smith,TSLA,2300,2011-08-19
+ 0024923423,John Doe,AAPL,1300,2013-03-21
@@ -2,9 +2,9 @@ require 'spec_helper'
 
  fixture_path = 'spec/fixtures'
 
- describe 'be_able_to' do
+ describe 'loading file with quoted fields' do
 
- it 'loads_file_with_quoted_fields' do
+ it 'leaving the quotes in the data' do
  options = {}
  data = SmarterCSV.process("#{fixture_path}/quoted.csv", options)
  data.flatten.size.should == 4
@@ -20,4 +20,29 @@ describe 'be_able_to' do
  end
  end
 
+
+ it 'removes quotes around quoted fields, but not inside data' do
+ options = {}
+ data = SmarterCSV.process("#{fixture_path}/quote_char.csv", options)
+
+ data.length.should eq 6
+ data[1][:first_name].should eq "Jam\ne\nson\""
+ data[2][:first_name].should eq "\"Jean"
+ end
+
+
+ # NOTE: quotes inside headers need to be escaped by doubling them
+ # e.g. 'correct ""EXAMPLE""'
+ # this escaping is illegal: 'incorrect \"EXAMPLE\"' <-- this caused CSV parsing error
+ # in case of CSV parsing errirs, use :user_provided_headers, or key_mapping
+ #
+ it 'removes quotes around headers and extra quotes inside headers' do
+ options = {}
+ data = SmarterCSV.process("#{fixture_path}/quoted2.csv", options)
+
+ data.length.should eq 3
+ data.first.keys[2].should eq :isbn
+ data.first.keys[3].should eq :discounted_price
+ end
+
  end
@@ -0,0 +1,25 @@
+ require 'spec_helper'
+
+ fixture_path = 'spec/fixtures'
+
+ # somebody reported that a column called 'options_trader' would be truncated to 'trader'
+
+ describe 'loads simple file format' do
+
+ it 'with symbols as keys when using defaults' do
+ options = {}
+ data = SmarterCSV.process("#{fixture_path}/trading.csv", options)
+
+ data.flatten.size.should eq 2
+ data.each do |item|
+ # all keys should be symbols when using v1.x backwards compatible mode
+ item.keys.each{|x| x.class.should eq Symbol}
+ item[:account_id].class.should eq Fixnum
+ item[:options_trader].class.should eq String
+ item[:stock_symbol].class.should eq String
+ item[:shares_issued].class.should eq Fixnum
+ item[:purchase_date].class.should eq String
+ end
+ end
+
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: smarter_csv
  version: !ruby/object:Gem::Version
- version: 1.2.0
+ version: 1.2.3
  platform: ruby
  authors:
  - 'Tilo Sloboda
@@ -10,7 +10,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2018-01-20 00:00:00.000000000 Z
+ date: 2018-01-27 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rspec
@@ -68,9 +68,12 @@ files:
  - spec/fixtures/no_header.csv
  - spec/fixtures/numeric.csv
  - spec/fixtures/pets.csv
+ - spec/fixtures/quote_char.csv
  - spec/fixtures/quoted.csv
+ - spec/fixtures/quoted2.csv
  - spec/fixtures/separator.csv
  - spec/fixtures/skip_lines.csv
+ - spec/fixtures/trading.csv
  - spec/fixtures/user_import.csv
  - spec/fixtures/valid_unicode.csv
  - spec/fixtures/with_dashes.csv
@@ -101,6 +104,7 @@ files:
  - spec/smarter_csv/skip_lines_spec.rb
  - spec/smarter_csv/strings_as_keys_spec.rb
  - spec/smarter_csv/strip_chars_from_headers_spec.rb
+ - spec/smarter_csv/trading_spec.rb
  - spec/smarter_csv/valid_unicode_spec.rb
  - spec/smarter_csv/value_converters_spec.rb
  - spec/spec.opts
@@ -128,7 +132,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - csv
  rubyforge_project:
- rubygems_version: 2.6.13
+ rubygems_version: 2.7.4
  signing_key:
  specification_version: 4
  summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots
@@ -153,9 +157,12 @@ test_files:
  - spec/fixtures/no_header.csv
  - spec/fixtures/numeric.csv
  - spec/fixtures/pets.csv
+ - spec/fixtures/quote_char.csv
  - spec/fixtures/quoted.csv
+ - spec/fixtures/quoted2.csv
  - spec/fixtures/separator.csv
  - spec/fixtures/skip_lines.csv
+ - spec/fixtures/trading.csv
  - spec/fixtures/user_import.csv
  - spec/fixtures/valid_unicode.csv
  - spec/fixtures/with_dashes.csv
@@ -186,6 +193,7 @@ test_files:
  - spec/smarter_csv/skip_lines_spec.rb
  - spec/smarter_csv/strings_as_keys_spec.rb
  - spec/smarter_csv/strip_chars_from_headers_spec.rb
+ - spec/smarter_csv/trading_spec.rb
  - spec/smarter_csv/valid_unicode_spec.rb
  - spec/smarter_csv/value_converters_spec.rb
  - spec/spec.opts