smarter_csv 1.2.0 → 1.2.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: 251eb1eff306211d163c6be6880ba45e3eb29985
4
- data.tar.gz: 9e16d1d4aaab86df7a65d73e6a23ace8f481ba4e
2
+ SHA256:
3
+ metadata.gz: c0efcd4dbad2546469ba99c1a133bc7548382e147fca66207d11065607fdb5ee
4
+ data.tar.gz: 98024633925ed73251dd00606f23f73ca6cbda5517c24882b95935138ab4d7e3
5
5
  SHA512:
6
- metadata.gz: e2ad758ddb7d644777df1b9f38b73c7c4c0753d76507834bf68e86394ae017b1f5caa16867efd82f4c0443d4a7c1b4c1d7f704d9c3796e71083bde689ee7aeba
7
- data.tar.gz: 8f7126500628a17fda587a05678aca9140d0ebb7ca168fc3144b1d9c00d2be6ec901a43900bd3d558aceb0a8dcfc139aba7347e6dbad01b2ff83c19dc3816763
6
+ metadata.gz: ac64f3a688f9b5b4bce09e26d8ee140994bf585578ea6eed1d998cec434535f3afb4aa8c2ea222b873cb5ee113fba748412a96e1b3b23a8b5425d13aa7a6436c
7
+ data.tar.gz: f8c3132f1f2cf60f8f67d2278d0b3660d7aab456581256e1c2508d1397c1cc31d455035c68656777c34e90ad6238c239699b1c59cdd918cdcd63c9f5a264aaba
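The checksums.yaml hunk shows this release recording SHA256 digests where 1.2.0 recorded SHA1. SHA512 entries are present in both. The sketch below checks archive members against a table of that shape. It is not RubyGems' own verification code. `checksums_match?` is a hypothetical helper, and it assumes the member files (`metadata.gz`, `data.tar.gz`) have already been read into a Hash.

```ruby
require 'digest'
require 'yaml'

# Algorithm names as they appear in checksums.yaml, mapped to Ruby's Digest classes.
DIGESTS = {
  'SHA1'   => Digest::SHA1,
  'SHA256' => Digest::SHA256,
  'SHA512' => Digest::SHA512
}.freeze

# Hypothetical helper: true only if every file listed under every algorithm
# in +yaml_text+ hashes to the recorded hex digest.
# +contents+ maps a member name (e.g. "data.tar.gz") to its raw bytes.
def checksums_match?(yaml_text, contents)
  YAML.safe_load(yaml_text).all? do |algorithm, files|
    digest = DIGESTS.fetch(algorithm) # KeyError for an algorithm we do not know
    files.all? do |name, expected|
      contents.key?(name) && digest.hexdigest(contents[name]) == expected
    end
  end
end
```

A tampered member fails every algorithm that lists it, so keeping both SHA256 and SHA512 entries does not weaken the check.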
data/README.md CHANGED
@@ -2,6 +2,19 @@
2
2
 
3
3
  [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
4
4
 
5
+ ---------------
6
+ #### Service Announcement
7
+
8
+ Work towards SmarterCSV 2.0 is on its way, with much improved features, and more streamlined options.
9
+
10
+ Please check the 2.0-develop branch, and open issues marked v2.0 and leave your comments.
11
+
12
+ New versions on the 1.2 branch will soon print a deprecation warning if you set :verbose to true.
13
+ See below for list of deprecated options.
14
+
15
+ ---------------
16
+ #### SmarterCSV
17
+
5
18
  `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
6
19
  and parallel processing with Resque or Sidekiq.
7
20
 
@@ -182,18 +195,48 @@ The options and the block are optional.
182
195
 
183
196
  `SmarterCSV.process` supports the following options:
184
197
 
198
+ #### Options:
199
+
185
200
  | Option | Default | Explanation |
186
201
  ---------------------------------------------------------------------------------------------------------------------------------
202
+ | :chunk_size | nil | if set, determines the desired chunk-size (defaults to nil, no chunk processing) |
203
+ | | | |
204
+ | :file_encoding | utf-8 | Set the file encoding eg.: 'windows-1252' or 'iso-8859-1' |
205
+ | :invalid_byte_sequence | '' | what to replace invalid byte sequences with |
206
+ | :force_utf8 | false | force UTF-8 encoding of all lines (including headers) in the CSV file |
207
+ | :skip_lines | nil | how many lines to skip before the first line or header line is processed |
208
+ | :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
209
+ ---------------------------------------------------------------------------------------------------------------------------------
187
210
  | :col_sep | ',' | column separator |
211
+ | :force_simple_split | false | force simple splitting on :col_sep character for non-standard CSV-files. |
212
+ | | | e.g. when :quote_char is not properly escaped |
188
213
  | :row_sep | $/ ,"\n" | row separator or record separator , defaults to system's $/ , which defaults to "\n" |
189
214
  | | | This can also be set to :auto, but will process the whole csv file first (slow!) |
190
215
  | :auto_row_sep_chars | 500 | How many characters to analyze when using `:row_sep => :auto`. nil or 0 means whole file. |
191
216
  | :quote_char | '"' | quotation character |
192
- | :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
193
- | :chunk_size | nil | if set, determines the desired chunk-size (defaults to nil, no chunk processing) |
217
+ ---------------------------------------------------------------------------------------------------------------------------------
218
+ | :headers_in_file | true | Whether or not the file contains headers as the first line. |
219
+ | | | Important if the file does not contain headers, |
220
+ | | | otherwise you would lose the first line of data. |
221
+ | :user_provided_headers | nil | *careful with that axe!* |
222
+ | | | user provided Array of header strings or symbols, to define |
223
+ | | | what headers should be used, overriding any in-file headers. |
224
+ | | | You can not combine the :user_provided_headers and :key_mapping options |
225
+ | :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs |
226
+ | :verbose | false | print out line number while processing (to track down problems in input files) |
227
+ ---------------------------------------------------------------------------------------------------------------------------------
228
+
229
+ #### Deprecated 1.x Options: to be replaced in 2.0
230
+
231
+ There have been a lot of 1-offs and feature creep around these options, and going forward we'll have a simpler, but more flexible way to address these features.
232
+
233
+ Instead of these options, there will be a new and more flexible way to process the header fields, as well as the fields in each line of the CSV.
234
+ Header and data validations will also be supported in 2.x.
235
+
236
+ | Option | Default | Explanation |
194
237
  ---------------------------------------------------------------------------------------------------------------------------------
195
238
  | :key_mapping | nil | a hash which maps headers from the CSV file to keys in the result hash |
196
- | :required_headers | nil | An array. Eacn of the given headers must be present in the CSV file, |
239
+ | :required_headers | nil | An array. Each of the given headers must be present after header manipulation, |
197
240
  | | | or an exception is raised No validation if nil is given. |
198
241
  | :remove_unmapped_keys | false | when using :key_mapping option, should non-mapped keys / columns be removed? |
199
242
  | :downcase_header | true | downcase all column headers |
@@ -201,17 +244,7 @@ The options and the block are optional.
201
244
  | :strip_whitespace | true | remove whitespace before/after values and headers |
202
245
  | :keep_original_headers | false | keep the original headers from the CSV-file as-is. |
203
246
  | | | Disables other flags manipulating the header fields. |
204
- | :user_provided_headers | nil | *careful with that axe!* |
205
- | | | user provided Array of header strings or symbols, to define |
206
- | | | what headers should be used, overriding any in-file headers. |
207
- | | | You can not combine the :user_provided_headers and :key_mapping options |
208
247
  | :strip_chars_from_headers | nil | RegExp to remove extraneous characters from the header line (e.g. if headers are quoted) |
209
- | :headers_in_file | true | Whether or not the file contains headers as the first line. |
210
- | | | Important if the file does not contain headers, |
211
- | | | otherwise you would lose the first line of data. |
212
- | :skip_lines | nil | how many lines to skip before the first line or header line is processed |
213
- | :force_utf8 | false | force UTF-8 encoding of all lines (including headers) in the CSV file |
214
- | :invalid_byte_sequence | '' | how to replace invalid byte sequences with |
215
248
  ---------------------------------------------------------------------------------------------------------------------------------
216
249
  | :value_converters | nil | supply a hash of :header => KlassName; the class needs to implement self.convert(val)|
217
250
  | :remove_empty_values | true | remove values which have nil or empty strings as values |
@@ -220,11 +253,7 @@ The options and the block are optional.
220
253
  | | | /^\$0\.0+$/ to match $0.00 , or /^#VALUE!$/ to match errors in Excel spreadsheets |
221
254
  | :convert_values_to_numeric | true | converts strings containing Integers or Floats to the appropriate class |
222
255
  | | | also accepts either {:except => [:key1,:key2]} or {:only => :key3} |
223
- | :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs |
224
- | :file_encoding | utf-8 | Set the file encoding eg.: 'windows-1252' or 'iso-8859-1' |
225
- | :force_simple_split | false | force simple splitting on :col_sep character for non-standard CSV-files. |
226
- | | | e.g. when :quote_char is not properly escaped |
227
- | :verbose | false | print out line number while processing (to track down problems in input files) |
256
+ ---------------------------------------------------------------------------------------------------------------------------------
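The :required_headers row now states that the check runs after header manipulation. The sketch below restates the two header checks from the library code in this diff (duplicate headers first, then required headers) in plain Ruby. It assumes the headers are already normalized symbols. `validate_headers!` and the two error classes are hypothetical stand-ins for the gem's `SmarterCSV::DuplicateHeaders` and `SmarterCSV::MissingHeaders`. `Array#tally` needs Ruby 2.7 or later.

```ruby
# Hypothetical stand-ins for SmarterCSV::DuplicateHeaders / SmarterCSV::MissingHeaders.
class DuplicateHeaders < StandardError; end
class MissingHeaders < StandardError; end

# Hypothetical helper: validates already-normalized header symbols.
# Duplicate check first, then the required-header check (only if an Array is given).
def validate_headers!(headers, required_headers = nil)
  # compact drops nil headers, as the original loop does with headerA.compact;
  # tally counts occurrences so each duplicate is reported once.
  duplicates = headers.compact.tally.select { |_header, count| count > 1 }.keys
  raise DuplicateHeaders, "ERROR: duplicate headers: #{duplicates.join(',')}" unless duplicates.empty?

  return headers unless required_headers.is_a?(Array)

  missing = required_headers - headers
  raise MissingHeaders, "ERROR: missing headers: #{missing.join(',')}" unless missing.empty?

  headers
end

# Usage: a :key_mapping-style rename (hypothetical mapping) is applied before validation.
mapping = { first_name: :name }
renamed = [:id, :first_name].map { |h| mapping.fetch(h, h) }
validate_headers!(renamed, [:name]) # passes; [:first_name] would raise MissingHeaders
```

Under the README's reading, a renamed column has to be required under its new key. Unlike the original `select`-based loop, `tally` lists each duplicated header once in the error message.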
228
257
 
229
258
 
230
259
  #### NOTES about File Encodings:
@@ -295,6 +324,13 @@ Planned in the next releases:
295
324
 
296
325
  ## Changes
297
326
 
327
+ #### 1.2.3 (2018-01-27)
328
+ * fixed regression / test
329
+ * fixed quote_char interpolation for headers, but not data (thanks to Colin Petruno)
330
+
331
+ #### 1.2.1 (2018-01-25) ### YANKED!
332
+ * bugfix (thanks to Joshua Smith for reporting)
333
+
298
334
  #### 1.2.0 (2018-01-20)
299
335
  * add default validation that a header can only appear once
300
336
  * add option `required_headers`
@@ -465,6 +501,8 @@ And a special thanks to those who contributed pull requests:
465
501
  * [Ivan Ushakov](https://github.com/IvanUshakov)
466
502
  * [Matthieu Paret](https://github.com/mtparet)
467
503
  * [Rohit Amarnath](https://github.com/ramarnat)
504
+ * [Joshua Smith](https://github.com/enviable)
505
+ * [Colin Petruno](https://github.com/colinpetruno)
468
506
 
469
507
 
470
508
  ## Contributing
@@ -59,7 +59,7 @@ module SmarterCSV
59
59
  else
60
60
  file_headerA = header.split(options[:col_sep])
61
61
  end
62
- file_headerA.map!{|x| x.gsub(%r/options[:quote_char]/,'') }
62
+ file_headerA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') }
63
63
  file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
64
64
  unless options[:keep_original_headers]
65
65
  file_headerA.map!{|x| x.gsub(/\s+|-+/,'_')}
@@ -68,14 +68,14 @@ module SmarterCSV
68
68
 
69
69
  file_header_size = file_headerA.size
70
70
  else
71
- raise SmarterCSV::IncorrectOption , "ERROR [smarter_csv]: If :headers_in_file is set to false, you have to provide :user_provided_headers" if options[:user_provided_headers].nil?
71
+ raise SmarterCSV::IncorrectOption , "ERROR: If :headers_in_file is set to false, you have to provide :user_provided_headers" if options[:user_provided_headers].nil?
72
72
  end
73
73
  if options[:user_provided_headers] && options[:user_provided_headers].class == Array && ! options[:user_provided_headers].empty?
74
74
  # use user-provided headers
75
75
  headerA = options[:user_provided_headers]
76
76
  if defined?(file_header_size) && ! file_header_size.nil?
77
77
  if headerA.size != file_header_size
78
- raise SmarterCSV::HeaderSizeMismatch , "ERROR [smarter_csv]: :user_provided_headers defines #{headerA.size} headers != CSV-file #{input} has #{file_header_size} headers"
78
+ raise SmarterCSV::HeaderSizeMismatch , "ERROR: :user_provided_headers defines #{headerA.size} headers != CSV-file #{input} has #{file_header_size} headers"
79
79
  else
80
80
  # we could print out the mapping of file_headerA to headerA here
81
81
  end
@@ -100,14 +100,14 @@ module SmarterCSV
100
100
  headerA.compact.each do |k|
101
101
  duplicate_headers << k if headerA.select{|x| x == k}.size > 1
102
102
  end
103
- raise SmarterCSV::DuplicateHeaders , "ERORR [smarter_csv]: duplicate headers: #{duplicate_headers.join(',')}" unless duplicate_headers.empty?
103
+ raise SmarterCSV::DuplicateHeaders , "ERROR: duplicate headers: #{duplicate_headers.join(',')}" unless duplicate_headers.empty?
104
104
 
105
105
  if options[:required_headers] && options[:required_headers].is_a?(Array)
106
106
  missing_headers = []
107
107
  options[:required_headers].each do |k|
108
108
  missing_headers << k unless headerA.include?(k)
109
109
  end
110
- raise SmarterCSV::MissingHeaders , "ERORR [smarter_csv]: missing headers: #{missing_headers.join(',')}" unless missing_headers.empty?
110
+ raise SmarterCSV::MissingHeaders , "ERROR: missing headers: #{missing_headers.join(',')}" unless missing_headers.empty?
111
111
  end
112
112
 
113
113
  # in case we use chunking.. we'll need to set it up..
@@ -155,7 +155,7 @@ module SmarterCSV
155
155
  else
156
156
  dataA = line.split(options[:col_sep])
157
157
  end
158
- dataA.map!{|x| x.gsub(%r/options[:quote_char]/,'') }
158
+ #### dataA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') } # this is actually not a good idea as a default
159
159
  dataA.map!{|x| x.strip} if options[:strip_whitespace]
160
160
  hash = Hash.zip(headerA,dataA) # from Facets of Ruby library
161
161
  # make sure we delete any key/value pairs from the hash, which the user wanted to delete:
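The 1.2.3 header fix interpolates `options[:quote_char]` into the regexp, which works for the default `'"'`. The plain-Ruby sketch below shows one caveat, and it makes no claim about later versions of the gem. If the quote character is a regexp metacharacter, unescaped interpolation changes the pattern's meaning, and `Regexp.escape` keeps it literal. `strip_quote_char` is a hypothetical helper.

```ruby
# Hypothetical helper: remove every occurrence of +quote_char+ from +field+,
# treating it literally even if it is a regexp metacharacter.
def strip_quote_char(field, quote_char)
  field.gsub(/#{Regexp.escape(quote_char)}/, '')
end

# Plain interpolation, as in 1.2.3, works for the default double quote ...
'"ID"'.gsub(%r/#{'"'}/, '')     # => "ID"

# ... but with quote_char = '|' the pattern becomes /|/, an empty alternation
# that matches the empty string at every position, so nothing is removed.
'|ID|'.gsub(%r/#{'|'}/, '')     # => "|ID|"

# Escaping yields /\|/, which matches the pipe character itself.
strip_quote_char('|ID|', '|')   # => "ID"
```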
@@ -1,3 +1,3 @@
1
1
  module SmarterCSV
2
- VERSION = "1.2.0"
2
+ VERSION = "1.2.3"
3
3
  end
data/spec/fixtures/quote_char.csv ADDED
@@ -0,0 +1,9 @@
1
+ "ID","FIRST_NAME","LAST_NAME"
2
+ "1","""John","Cooke"""
3
+ "2","Jam
4
+ e
5
+ son""","McCollum"
6
+ "3","""Jean","Conn"
7
+ "4","Jenny","Traer"
8
+ "5","Bo""bbie","Faga"
9
+ "6","Mica","Copeland"
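This fixture exercises the behaviour behind commenting out the data-level quote stripping in the library code. The sketch below uses Ruby's standard `csv` library, not smarter_csv, to show RFC 4180-style parsing of the same text. A doubled quote becomes one literal quote, and a quoted field may contain line breaks. Deleting every quote character, by contrast, destroys the escaped quote in `Bo""bbie`.

```ruby
require 'csv'

# The quote_char.csv fixture from this diff, verbatim.
QUOTE_CHAR_CSV = <<~'CSV_TEXT'
  "ID","FIRST_NAME","LAST_NAME"
  "1","""John","Cooke"""
  "2","Jam
  e
  son""","McCollum"
  "3","""Jean","Conn"
  "4","Jenny","Traer"
  "5","Bo""bbie","Faga"
  "6","Mica","Copeland"
CSV_TEXT

table = CSV.parse(QUOTE_CHAR_CSV, headers: true)
table.size                # => 6 (data rows; the header row is not counted)
table[1]['FIRST_NAME']    # => "Jam\ne\nson\"" (embedded newlines are kept)
table[4]['FIRST_NAME']    # => "Bo\"bbie" (the doubled quote becomes one quote)

# Naive approach: split on the separator and delete every quote character.
naive = '"5","Bo""bbie","Faga"'.split(',').map { |x| x.gsub('"', '') }
naive                     # => ["5", "Bobbie", "Faga"]
```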
data/spec/fixtures/quoted2.csv ADDED
@@ -0,0 +1,4 @@
1
+ "REVIEW DATE","AUTHOR","""ISBN""","DISCOUNTED ""PRICE"""
2
+ "1985/01/21","Douglas Adams",0345391802,5.95
3
+ "1998/07/15","Timothy ""The Parser"" Campbell",0968411304,18.99
4
+ "1999/12/03","Richard Friedman",0060630353,5.95
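The quoted2.csv header line doubles its embedded quotes, and the new spec expects the keys `:isbn` and `:discounted_price`. The function below approximates the default header steps visible in the library hunks (remove quote_char, strip, replace runs of whitespace or dashes with `_`), plus the `:downcase_header` default and symbol keys. Splitting the line with Ruby's `CSV.parse_line` is an assumption, and `header_keys` is a hypothetical name.

```ruby
require 'csv'

# Hypothetical approximation of default header normalization in 1.2.3:
# remove quote_char, strip, runs of whitespace/dashes -> '_', downcase, symbolize.
def header_keys(header_line, col_sep: ',', quote_char: '"')
  CSV.parse_line(header_line, col_sep: col_sep, quote_char: quote_char).map do |field|
    field.to_s
         .gsub(/#{Regexp.escape(quote_char)}/, '')
         .strip
         .gsub(/\s+|-+/, '_')
         .downcase
         .to_sym
  end
end

header_keys('"REVIEW DATE","AUTHOR","""ISBN""","DISCOUNTED ""PRICE"""')
# => [:review_date, :author, :isbn, :discounted_price]
```

Because the CSV parser has already turned `""` into `"`, the quote removal step only deletes the quotes that were part of the header text, which yields `isbn` and `discounted_price`.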
data/spec/fixtures/trading.csv ADDED
@@ -0,0 +1,3 @@
1
+ Account_ID,options_trader,Stock_Symbol,Shares Issued,Purchase Date
2
+ 0002310234,Mike Smith,TSLA,2300,2011-08-19
3
+ 0024923423,John Doe,AAPL,1300,2013-03-21
@@ -2,9 +2,9 @@ require 'spec_helper'
2
2
 
3
3
  fixture_path = 'spec/fixtures'
4
4
 
5
- describe 'be_able_to' do
5
+ describe 'loading file with quoted fields' do
6
6
 
7
- it 'loads_file_with_quoted_fields' do
7
+ it 'leaving the quotes in the data' do
8
8
  options = {}
9
9
  data = SmarterCSV.process("#{fixture_path}/quoted.csv", options)
10
10
  data.flatten.size.should == 4
@@ -20,4 +20,29 @@ describe 'be_able_to' do
20
20
  end
21
21
  end
22
22
 
23
+
24
+ it 'removes quotes around quoted fields, but not inside data' do
25
+ options = {}
26
+ data = SmarterCSV.process("#{fixture_path}/quote_char.csv", options)
27
+
28
+ data.length.should eq 6
29
+ data[1][:first_name].should eq "Jam\ne\nson\""
30
+ data[2][:first_name].should eq "\"Jean"
31
+ end
32
+
33
+
34
+ # NOTE: quotes inside headers need to be escaped by doubling them
35
+ # e.g. 'correct ""EXAMPLE""'
36
+ # this escaping is illegal: 'incorrect \"EXAMPLE\"' <-- this caused CSV parsing error
37
+ # in case of CSV parsing errors, use :user_provided_headers, or key_mapping
38
+ #
39
+ it 'removes quotes around headers and extra quotes inside headers' do
40
+ options = {}
41
+ data = SmarterCSV.process("#{fixture_path}/quoted2.csv", options)
42
+
43
+ data.length.should eq 3
44
+ data.first.keys[2].should eq :isbn
45
+ data.first.keys[3].should eq :discounted_price
46
+ end
47
+
23
48
  end
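The NOTE in this spec says that quotes inside a header must be doubled and that backslash-escaping causes a parsing error. Ruby's standard `csv` library applies the same doubling rule, and the sketch below uses it to show the difference. `parse_header` is a hypothetical helper that falls back to caller-supplied names, in the spirit of the spec's advice to use :user_provided_headers. It is not the gem's API.

```ruby
require 'csv'

# Hypothetical helper: parse a header line, or return +fallback+ when the line
# is not valid CSV (e.g. quotes escaped with backslashes instead of doubled).
def parse_header(line, fallback)
  CSV.parse_line(line)
rescue CSV::MalformedCSVError
  fallback
end

# Doubled quotes are valid CSV and parse to a literal quote character.
parse_header('"id","correct ""EXAMPLE"""', %w[id example])
# => ["id", "correct \"EXAMPLE\""]

# Backslash-escaped quotes close the field early, so parsing fails and the
# fallback names are used instead.
parse_header('"id","incorrect \"EXAMPLE\""', %w[id example])
# => ["id", "example"]
```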
data/spec/smarter_csv/trading_spec.rb ADDED
@@ -0,0 +1,25 @@
1
+ require 'spec_helper'
2
+
3
+ fixture_path = 'spec/fixtures'
4
+
5
+ # somebody reported that a column called 'options_trader' would be truncated to 'trader'
6
+
7
+ describe 'loads simple file format' do
8
+
9
+ it 'with symbols as keys when using defaults' do
10
+ options = {}
11
+ data = SmarterCSV.process("#{fixture_path}/trading.csv", options)
12
+
13
+ data.flatten.size.should eq 2
14
+ data.each do |item|
15
+ # all keys should be symbols when using v1.x backwards compatible mode
16
+ item.keys.each{|x| x.class.should eq Symbol}
17
+ item[:account_id].class.should eq Fixnum
18
+ item[:options_trader].class.should eq String
19
+ item[:stock_symbol].class.should eq String
20
+ item[:shares_issued].class.should eq Fixnum
21
+ item[:purchase_date].class.should eq String
22
+ end
23
+ end
24
+
25
+ end
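The comment at the top of this spec records a report that an `options_trader` column was truncated to `trader`. One plausible explanation can be reproduced in plain Ruby without the gem. The 1.2.0 pattern `%r/options[:quote_char]/` never interpolated the option. It therefore matched the literal word `options` followed by any single character from the class `[:quote_char]`, and `_` is in that class.

```ruby
header = 'Account_ID,options_trader,Stock_Symbol,Shares Issued,Purchase Date'
fields = header.split(',')

# 1.2.0 pattern: literal "options" plus one character from the class
# [:quote_char], i.e. one of : q u o t e _ c h a r. "options_" matches and is deleted.
v1_2_0 = fields.map { |x| x.gsub(%r/options[:quote_char]/, '') }
# => ["Account_ID", "trader", "Stock_Symbol", "Shares Issued", "Purchase Date"]

# 1.2.3 pattern: #{} inserts the configured quote character, here '"'.
options = { quote_char: '"' }
v1_2_3 = fields.map { |x| x.gsub(%r/#{options[:quote_char]}/, '') }
# => ["Account_ID", "options_trader", "Stock_Symbol", "Shares Issued", "Purchase Date"]
```

Separately, the spec's `Fixnum` checks rely on a constant that was deprecated in Ruby 2.4 and removed in Ruby 3.2. On current Rubies, the equivalent class check is against `Integer`.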
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: smarter_csv
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.2.0
4
+ version: 1.2.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - 'Tilo Sloboda
@@ -10,7 +10,7 @@ authors:
10
10
  autorequire:
11
11
  bindir: bin
12
12
  cert_chain: []
13
- date: 2018-01-20 00:00:00.000000000 Z
13
+ date: 2018-01-27 00:00:00.000000000 Z
14
14
  dependencies:
15
15
  - !ruby/object:Gem::Dependency
16
16
  name: rspec
@@ -68,9 +68,12 @@ files:
68
68
  - spec/fixtures/no_header.csv
69
69
  - spec/fixtures/numeric.csv
70
70
  - spec/fixtures/pets.csv
71
+ - spec/fixtures/quote_char.csv
71
72
  - spec/fixtures/quoted.csv
73
+ - spec/fixtures/quoted2.csv
72
74
  - spec/fixtures/separator.csv
73
75
  - spec/fixtures/skip_lines.csv
76
+ - spec/fixtures/trading.csv
74
77
  - spec/fixtures/user_import.csv
75
78
  - spec/fixtures/valid_unicode.csv
76
79
  - spec/fixtures/with_dashes.csv
@@ -101,6 +104,7 @@ files:
101
104
  - spec/smarter_csv/skip_lines_spec.rb
102
105
  - spec/smarter_csv/strings_as_keys_spec.rb
103
106
  - spec/smarter_csv/strip_chars_from_headers_spec.rb
107
+ - spec/smarter_csv/trading_spec.rb
104
108
  - spec/smarter_csv/valid_unicode_spec.rb
105
109
  - spec/smarter_csv/value_converters_spec.rb
106
110
  - spec/spec.opts
@@ -128,7 +132,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
128
132
  requirements:
129
133
  - csv
130
134
  rubyforge_project:
131
- rubygems_version: 2.6.13
135
+ rubygems_version: 2.7.4
132
136
  signing_key:
133
137
  specification_version: 4
134
138
  summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots
@@ -153,9 +157,12 @@ test_files:
153
157
  - spec/fixtures/no_header.csv
154
158
  - spec/fixtures/numeric.csv
155
159
  - spec/fixtures/pets.csv
160
+ - spec/fixtures/quote_char.csv
156
161
  - spec/fixtures/quoted.csv
162
+ - spec/fixtures/quoted2.csv
157
163
  - spec/fixtures/separator.csv
158
164
  - spec/fixtures/skip_lines.csv
165
+ - spec/fixtures/trading.csv
159
166
  - spec/fixtures/user_import.csv
160
167
  - spec/fixtures/valid_unicode.csv
161
168
  - spec/fixtures/with_dashes.csv
@@ -186,6 +193,7 @@ test_files:
186
193
  - spec/smarter_csv/skip_lines_spec.rb
187
194
  - spec/smarter_csv/strings_as_keys_spec.rb
188
195
  - spec/smarter_csv/strip_chars_from_headers_spec.rb
196
+ - spec/smarter_csv/trading_spec.rb
189
197
  - spec/smarter_csv/valid_unicode_spec.rb
190
198
  - spec/smarter_csv/value_converters_spec.rb
191
199
  - spec/spec.opts