smarter_csv 1.1.0 → 1.1.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: ec1442dad9c0f71dc3264f6df10f5fe1116f9d23
4
- data.tar.gz: 2025c6c7c81fc6c94fed0ed7d391eb122a464bd1
3
+ metadata.gz: 34d6c592bebe9d6b1d8f87f9f59ecf4a7d3b3a9d
4
+ data.tar.gz: fe722f38c4962a312c4db7e14cb72e735426db82
5
5
  SHA512:
6
- metadata.gz: 1dd00a098dba973b2f6e0303317bd11cdc527d05ca959dd21ae78f241a640e799b623f0b719da5d146c89856d0e50a557ffa8180725d29c6582e0e10231b091a
7
- data.tar.gz: 7ab34fac0386ccef0ad13f1116b5ed0669d4ffe6270f8a4b9e39da8421132c4fac4f1c1764b78ca8ce5e702df29c9819ee1359bf0b18c98b735ea53a3935e4d8
6
+ metadata.gz: 75f5c56cfdeeef41be34f17bfbec30ae201c41463ad9aa6c7da7b5031c63fa1da27ef5120c245329eec108551fae722a5d156e09ead62c2e6aa34ce8edfe4cd8
7
+ data.tar.gz: 2d3b83fa5e7f4eada8d7f03df50890441f594b8db6d34df4d2c07164c73beda7b9963d596ce43d2ad74ffddb83efb4bdea8ca83d780d7f3129c258c6d21bdb70
data/README.md CHANGED
@@ -20,7 +20,7 @@ One `smarter_csv` user wrote:
20
20
  * able to ignore "columns" in the input (delete columns)
21
21
  * able to eliminate nil or empty fields from the result hashes (default)
22
22
 
23
- NOTE: This Gem is only for importing CSV files - writing of CSV files is not supported.
23
+ NOTE: This Gem is only for importing CSV files - writing of CSV files is not supported at this time.
24
24
 
25
25
  ### Why?
26
26
 
@@ -130,6 +130,8 @@ and how the `process` method returns the number of chunks when called with a blo
130
130
 
131
131
  #### Example 6: Using Value Converters
132
132
 
133
+ NOTE: If you use `key_mappings` and `value_converters`, make sure that the value converters reference the keys by their final mapped names, not the original names in the CSV file.
134
+
133
135
  $ cat spec/fixtures/with_dates.csv
134
136
  first,last,date,price
135
137
  Ben,Miller,10/30/1998,$44.50
@@ -163,6 +165,9 @@ and how the `process` method returns the number of chunks when called with a blo
163
165
  data[0][:price].class
164
166
  => Float
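The note on combining key mappings with value converters can be illustrated without the gem. The following is a minimal sketch in plain Ruby, assuming that keys are renamed before values are converted, as the note implies. `DollarConverter` and `map_then_convert` are hypothetical names made up for this illustration; they are not part of smarter_csv.

```ruby
# Hypothetical converter; the README only requires that a converter
# class implements self.convert(val).
class DollarConverter
  def self.convert(value)
    value.sub('$', '').to_f
  end
end

# Illustrative helper (not the gem's code): rename keys first,
# then look up a converter by the *renamed* key.
def map_then_convert(row, key_mapping, value_converters)
  mapped = row.each_with_object({}) do |(key, value), out|
    out[key_mapping.fetch(key, key)] = value
  end
  mapped.each_with_object({}) do |(key, value), out|
    converter = value_converters[key]
    out[key] = converter ? converter.convert(value) : value
  end
end

row         = { first: 'Ben', price: '$44.50' }
key_mapping = { price: :amount }

# Converter registered under the final mapped name: the value is converted.
by_mapped_name = map_then_convert(row, key_mapping, { amount: DollarConverter })

# Converter registered under the original CSV name: it is never found.
by_original_name = map_then_convert(row, key_mapping, { price: DollarConverter })
```

In this sketch `by_mapped_name[:amount]` is a Float, while `by_original_name[:amount]` is still the raw string, which is the pitfall the note warns about.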
165
167
 
168
+ ## Parallel Processing
169
+ [Jack](https://github.com/xjlin0) wrote an interesting article about [Speeding up CSV parsing with parallel processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing/)
170
+
166
171
  ## Documentation
167
172
 
168
173
  The `process` method reads and processes a "generalized" CSV file and returns the contents either as an Array of Hashes,
@@ -198,6 +203,8 @@ The options and the block are optional.
198
203
  | :headers_in_file | true | Whether or not the file contains headers as the first line. |
199
204
  | | | Important if the file does not contain headers, |
200
205
  | | | otherwise you would lose the first line of data. |
206
+ | :skip_lines | nil | how many lines to skip before the first line or header line is processed |
207
+ | :force_utf8 | false | force UTF-8 encoding of all lines (including headers) in the CSV file |
201
208
  ---------------------------------------------------------------------------------------------------------------------------------
202
209
  | :value_converters | nil | supply a hash of :header => KlassName; the class needs to implement self.convert(val)|
203
210
  | :remove_empty_values | true | remove values which have nil or empty strings as values |
@@ -270,9 +277,22 @@ Or install it yourself as:
270
277
 
271
278
  $ gem install smarter_csv
272
279
 
280
+ ## Upcoming
281
+
282
+ Planned in the next releases:
283
+ * programmatic header transformations
284
+ * CSV command line
273
285
 
274
286
  ## Changes
275
287
 
288
+ #### 1.1.1 (2016-11-26)
289
+ * added option to `skip_lines` (thanks to wal)
290
+ * added option to `force_utf8` encoding (thanks to jordangraft)
291
+ * bugfix if no headers in input data (thanks to esBeee)
292
+ * ensure input file is closed (thanks to waldyr)
293
+ * improved verbose output (thanks to benmaher)
294
+ * improved documentation
295
+
276
296
  #### 1.1.0 (2015-07-26)
277
297
  * added feature :value_converters, which allows parsing of dates, money, and other things (thanks to Raphaël Bleuse, Lucas Camargo de Almeida, Alejandro)
278
298
  * added error if :headers_in_file is set to false, and no :user_provided_headers are given (thanks to innhyu)
@@ -383,6 +403,7 @@ Please [open an Issue on GitHub](https://github.com/tilo/smarter_csv/issues) if
383
403
  Many thanks to people who have filed issues and sent comments.
384
404
  And a special thanks to those who contributed pull requests:
385
405
 
406
+ * [Jack 0](https://github.com/xjlin0)
386
407
  * [Alejandro](https://github.com/agaviria)
387
408
  * [Lucas Camargo de Almeida](https://github.com/lcalmeida)
388
409
  * [Raphaël Bleuse](https://github.com/bleuse)
@@ -402,6 +423,11 @@ And a special thanks to those who contributed pull requests:
402
423
  * [Jordan Running](https://github.com/jrunning)
403
424
  * [Dave Sanders](https://github.com/DaveSanders)
404
425
  * [Hugo Lepetit](https://github.com/giglemad)
426
+ * [esBeee](https://github.com/esBeee)
427
+ * [Waldyr de Souza](https://github.com/waldyr)
428
+ * [Ben Maher](https://github.com/benmaher)
429
+ * [Wal McConnell](https://github.com/wal)
430
+ * [Jordan Graft](https://github.com/jordangraft)
405
431
 
406
432
 
407
433
  ## Contributing
@@ -9,14 +9,15 @@ module SmarterCSV
9
9
  :remove_empty_values => true, :remove_zero_values => false , :remove_values_matching => nil , :remove_empty_hashes => true , :strip_whitespace => true,
10
10
  :convert_values_to_numeric => true, :strip_chars_from_headers => nil , :user_provided_headers => nil , :headers_in_file => true,
11
11
  :comment_regexp => /^#/, :chunk_size => nil , :key_mapping_hash => nil , :downcase_header => true, :strings_as_keys => false, :file_encoding => 'utf-8',
12
- :remove_unmapped_keys => false, :keep_original_headers => false, :value_converters => nil,
12
+ :remove_unmapped_keys => false, :keep_original_headers => false, :value_converters => nil, :skip_lines => nil, :force_utf8 => false
13
13
  }
14
14
  options = default_options.merge(options)
15
15
  csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
16
16
  headerA = []
17
17
  result = []
18
18
  old_row_sep = $/
19
- line_count = 0
19
+ file_line_count = 0
20
+ csv_line_count = 0
20
21
  begin
21
22
  f = input.respond_to?(:readline) ? input : File.open(input, "r:#{options[:file_encoding]}")
22
23
 
@@ -26,14 +27,25 @@ module SmarterCSV
26
27
  end
27
28
  $/ = options[:row_sep]
28
29
 
30
+ if options[:skip_lines].to_i > 0
31
+ options[:skip_lines].to_i.times{f.readline}
32
+ end
33
+
29
34
  if options[:headers_in_file] # extract the header line
30
35
  # process the header line in the CSV file..
31
36
  # the first line of a CSV file contains the header .. it might be commented out, so we need to read it anyhow
32
37
  header = f.readline.sub(options[:comment_regexp],'').chomp(options[:row_sep])
33
- line_count += 1
38
+ header = header.encode!('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '') if options[:force_utf8]
39
+ file_line_count += 1
40
+ csv_line_count += 1
34
41
  header = header.gsub(options[:strip_chars_from_headers], '') if options[:strip_chars_from_headers]
42
+
35
43
  if (header =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
36
- file_headerA = CSV.parse( header, csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
44
+ file_headerA = begin
45
+ CSV.parse( header, csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
46
+ rescue CSV::MalformedCSVError => e
47
+ raise $!, "#{$!} [SmarterCSV: csv line #{csv_line_count}]", $!.backtrace
48
+ end
37
49
  else
38
50
  file_headerA = header.split(options[:col_sep])
39
51
  end
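The `:force_utf8` branch in the hunk above relies on `String#encode` with `'binary'` as the source encoding. A minimal sketch of what that call does to a line, using made-up sample bytes: because the source is treated as binary, every byte at or above 0x80 has no UTF-8 mapping and is replaced with the empty string. Note that this also drops bytes belonging to valid multi-byte UTF-8 characters.

```ruby
# Sample bytes (an assumption for illustration): "café" encoded as UTF-8,
# followed by a stray 0xFF byte inside the second field.
raw = "caf\xC3\xA9,na\xFFme\n".b

# Same arguments as in the diff, but using the non-destructive encode.
clean = raw.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')

puts clean.inspect
puts clean.encoding
puts clean.valid_encoding?
```

The result is always valid UTF-8, but accented characters are lost along with the invalid bytes, so this option trades fidelity for robustness.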
@@ -44,11 +56,9 @@ module SmarterCSV
44
56
  file_headerA.map!{|x| x.downcase } if options[:downcase_header]
45
57
  end
46
58
 
47
- # puts "HeaderA: #{file_headerA.join(' , ')}" if options[:verbose]
48
-
49
59
  file_header_size = file_headerA.size
50
60
  else
51
- raise SmarterCSV::IncorrectOption , "ERROR [smarter_csv]: If :headers_in_file is set to false, you have to provide :user_provided_headers" if ! options.keys.include?(:user_provided_headers)
61
+ raise SmarterCSV::IncorrectOption , "ERROR [smarter_csv]: If :headers_in_file is set to false, you have to provide :user_provided_headers" if options[:user_provided_headers].nil?
52
62
  end
53
63
  if options[:user_provided_headers] && options[:user_provided_headers].class == Array && ! options[:user_provided_headers].empty?
54
64
  # use user-provided headers
@@ -88,22 +98,30 @@ module SmarterCSV
88
98
  # now on to processing all the rest of the lines in the CSV file:
89
99
  while ! f.eof? # we can't use f.readlines() here, because this would read the whole file into memory at once, and eof => true
90
100
  line = f.readline # read one line.. this uses the input_record_separator $/ which we set previously!
91
- line_count += 1
92
- print "processing line %10d\r" % line_count if options[:verbose]
101
+ line = line.encode!('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '') if options[:force_utf8]
102
+ file_line_count += 1
103
+ csv_line_count += 1
104
+ print "processing file line %10d, csv line %10d\r" % [file_line_count, csv_line_count] if options[:verbose]
93
105
  next if line =~ options[:comment_regexp] # ignore all comment lines if there are any
94
106
 
95
107
  # cater for the quoted csv data containing the row separator carriage return character
96
108
  # in which case the row data will be split across multiple lines (see the sample content in spec/fixtures/carriage_returns_rn.csv)
97
109
  # by detecting the existence of an uneven number of quote characters
110
+ multiline = line.count(options[:quote_char])%2 == 1
98
111
  while line.count(options[:quote_char])%2 == 1
99
- print "line contains uneven number of quote chars so including content of next line" if options[:verbose]
100
112
  line += f.readline
113
+ file_line_count += 1
101
114
  end
115
+ print "\nline contains uneven number of quote chars so including content through file line %d\n" % file_line_count if options[:verbose] && multiline
102
116
 
103
117
  line.chomp! # will use $/ which is set to options[:row_sep]
104
118
 
105
119
  if (line =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
106
- dataA = CSV.parse( line, csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
120
+ dataA = begin
121
+ CSV.parse( line, csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
122
+ rescue CSV::MalformedCSVError => e
123
+ raise $!, "#{$!} [SmarterCSV: csv line #{csv_line_count}]", $!.backtrace
124
+ end
107
125
  else
108
126
  dataA = line.split(options[:col_sep])
109
127
  end
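The multi-line handling in the hunk above counts quote characters: an odd count means a quoted field is still open, so the next physical line belongs to the same CSV row. A minimal standalone sketch of that idea follows. `read_logical_row` is a hypothetical helper, and the `eof?` guard is an addition that the diff itself does not have.

```ruby
require 'stringio'

# Hypothetical helper: read one logical CSV row, which may span
# several physical lines when a quoted field contains a newline.
def read_logical_row(io, quote_char = '"')
  line = io.readline
  # An odd number of quote characters means a quoted field is still open.
  # The eof? guard (not in the original code) avoids an EOFError on
  # input with an unterminated quote.
  line += io.readline while line.count(quote_char).odd? && !io.eof?
  line
end

io = StringIO.new(%(name,notes\nBen,"likes\ncsv"\nAmy,plain\n))

header = read_logical_row(io)
first  = read_logical_row(io)   # spans two physical lines
second = read_logical_row(io)
```

Escaped quotes inside a field (`""`) add two to the count, so they do not change the parity this check depends on.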
@@ -176,6 +194,10 @@ module SmarterCSV
176
194
  end
177
195
  end
178
196
  end
197
+
198
+ # print new line to retain last processing line message
199
+ print "\n" if options[:verbose]
200
+
179
201
  # last chunk:
180
202
  if ! chunk.nil? && chunk.size > 0
181
203
  # do something with the chunk
@@ -189,6 +211,7 @@ module SmarterCSV
189
211
  end
190
212
  ensure
191
213
  $/ = old_row_sep # make sure this stupid global variable is always reset to its previous value after we're done!
214
+ f.close
192
215
  end
193
216
  if block_given?
194
217
  return chunk_count # when we do processing through a block we only care how many chunks we processed
@@ -197,11 +220,6 @@ module SmarterCSV
197
220
  end
198
221
  end
199
222
 
200
- # def SmarterCSV.process_csv(*args)
201
- # warn "[DEPRECATION] `process_csv` is deprecated. Please use `process` instead."
202
- # SmarterCSV.process(*args)
203
- # end
204
-
205
223
  private
206
224
  # acts as a road-block to limit processing when iterating over all k/v pairs of a CSV-hash:
207
225
 
@@ -241,8 +259,7 @@ module SmarterCSV
241
259
  end
242
260
  counts["\r"] += 1 if last_char == "\r"
243
261
  # find the key/value pair with the largest counter:
244
- k,v = counts.max_by{|k,v| v}
262
+ k,_ = counts.max_by{|_,v| v}
245
263
  return k # the most frequent one is it
246
264
  end
247
265
  end
248
-
@@ -1,3 +1,3 @@
1
1
  module SmarterCSV
2
- VERSION = "1.1.0"
2
+ VERSION = "1.1.1"
3
3
  end
@@ -0,0 +1,3 @@
1
+ "name","dob"
2
+ "Arnold Schwarzenegger","1947-07-30"
3
+ "Jeff "the dude" Bridges","1949-12-04"
@@ -0,0 +1,3 @@
1
+ "name","dob"dob""
2
+ "Arnold Schwarzenegger","1947-07-30"
3
+ "Jeff Bridges","1949-12-04"
@@ -0,0 +1,8 @@
1
+ Lines
2
+ To
3
+ Skip
4
+ first name,last name,dogs,cats,birds,fish
5
+ Dan,McAllister,2,,,
6
+ Lucy,Laweless,,5,,
7
+ Miles,O'Brian,,,,21
8
+ Nancy,Homes,2,,1,
@@ -0,0 +1,15 @@
1
+ require 'spec_helper'
2
+
3
+ fixture_path = 'spec/fixtures'
4
+
5
+ describe 'be_able_to' do
6
+ it 'close file after using it' do
7
+ options = {:col_sep => "\cA", :row_sep => "\cB", :comment_regexp => /^#/, :strings_as_keys => true}
8
+
9
+ file = File.new("#{fixture_path}/binary.csv")
10
+
11
+ SmarterCSV.process(file, options)
12
+
13
+ file.closed?.should == true
14
+ end
15
+ end
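The spec above checks that the input is closed after processing, including an IO object handed in by the caller. A minimal sketch of the same pattern in plain Ruby: `count_lines` is a hypothetical helper, not the gem's code, and the `if io` guard is an assumption added here so that a failed `File.open` does not trigger a second error inside `ensure`.

```ruby
require 'stringio'

# Hypothetical helper: accept either a path or an IO-like object,
# and always close it when done.
def count_lines(input)
  io = input.respond_to?(:readline) ? input : File.open(input, 'r')
  io.each_line.count
ensure
  # io is nil if File.open raised, so guard before closing.
  io.close if io && !io.closed?
end

io    = StringIO.new("a,b\n1,2\n")
lines = count_lines(io)

puts lines
puts io.closed?
```

One consequence of this design is that a caller who passes in an open IO cannot keep reading from it afterwards.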
@@ -0,0 +1,21 @@
1
+ require 'spec_helper'
2
+
3
+ fixture_path = 'spec/fixtures'
4
+
5
+ describe 'malformed_csv' do
6
+ subject { lambda { SmarterCSV.process(csv_path) } }
7
+
8
+ context "malformed header" do
9
+ let(:csv_path) { "#{fixture_path}/malformed_header.csv" }
10
+ it { should raise_error(CSV::MalformedCSVError) }
11
+ it { should raise_error(/(Missing or stray quote in line 1|CSV::MalformedCSVError)/) }
12
+ it { should raise_error(CSV::MalformedCSVError) }
13
+ end
14
+
15
+ context "malformed content" do
16
+ let(:csv_path) { "#{fixture_path}/malformed.csv" }
17
+ it { should raise_error(CSV::MalformedCSVError) }
18
+ it { should raise_error(/(Missing or stray quote in line 1|CSV::MalformedCSVError)/) }
19
+ it { should raise_error(CSV::MalformedCSVError) }
20
+ end
21
+ end
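The malformed-CSV specs above exercise the new rescue blocks, which re-raise `CSV::MalformedCSVError` with the CSV line number appended to the message. A minimal sketch of that re-raise pattern using Ruby's standard `csv` library: `parse_csv_line` is a hypothetical helper, and the sample row is taken from the `malformed.csv` fixture in this diff.

```ruby
require 'csv'

# Hypothetical helper: parse one line and, on failure, re-raise the same
# exception with extra context while keeping the original backtrace.
def parse_csv_line(line, csv_line_number)
  CSV.parse_line(line)
rescue CSV::MalformedCSVError => e
  raise e, "#{e.message} [csv line #{csv_line_number}]", e.backtrace
end

message = begin
  parse_csv_line(%("Jeff "the dude" Bridges","1949-12-04"), 3)
  nil
rescue CSV::MalformedCSVError => e
  e.message
end

puts message
```

The exact wording of the underlying parser message differs between versions of the `csv` library, which is presumably why the specs match on a loose regular expression.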
@@ -0,0 +1,29 @@
1
+ require 'spec_helper'
2
+
3
+ fixture_path = 'spec/fixtures'
4
+
5
+ describe 'be_able_to' do
6
+ it 'loads_csv_file_skipping_lines' do
7
+ options = {skip_lines: 3}
8
+ data = SmarterCSV.process("#{fixture_path}/skip_lines.csv", options)
9
+ data.size.should == 4
10
+
11
+ data.each do |item|
12
+ item.keys.each do |key|
13
+ [:first_name,:last_name,:dogs,:cats,:birds,:fish].should include(key)
14
+ end
15
+ end
16
+ end
17
+
18
+ it 'loads_csv_with_user_defined_headers' do
19
+ options = {:skip_lines => 3, :headers_in_file => true, :user_provided_headers => [:a,:b,:c,:d,:e,:f]}
20
+ data = SmarterCSV.process("#{fixture_path}/skip_lines.csv", options)
21
+ data.size.should == 4
22
+
23
+ data.each do |item|
24
+ item.keys.each do |key|
25
+ [:a,:b,:c,:d,:e,:f].should include( key )
26
+ end
27
+ end
28
+ end
29
+ end
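The `:skip_lines` option tested above discards a fixed number of physical lines before the header is read. A minimal standalone sketch of that behaviour: `skip_then_read_header` is a hypothetical helper, and the sample data is a shortened form of the `skip_lines.csv` fixture.

```ruby
require 'stringio'

# Hypothetical helper: discard the first N lines, then read the header.
# nil.to_i is 0, so a nil option skips nothing.
def skip_then_read_header(io, skip_lines)
  skip_lines.to_i.times { io.readline }
  io.readline.chomp.split(',')
end

io = StringIO.new("Lines\nTo\nSkip\nfirst name,last name,dogs\nDan,McAllister,2\n")

header    = skip_then_read_header(io, 3)
first_row = io.readline.chomp.split(',')
```

Because the lines are consumed with `readline`, asking to skip more lines than the input contains raises `EOFError` in this sketch.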
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: smarter_csv
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.1.0
4
+ version: 1.1.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - |
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2015-07-27 00:00:00.000000000 Z
12
+ date: 2016-11-26 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: rspec
@@ -59,18 +59,22 @@ files:
59
59
  - spec/fixtures/line_endings_r.csv
60
60
  - spec/fixtures/line_endings_rn.csv
61
61
  - spec/fixtures/lots_of_columns.csv
62
+ - spec/fixtures/malformed.csv
63
+ - spec/fixtures/malformed_header.csv
62
64
  - spec/fixtures/money.csv
63
65
  - spec/fixtures/no_header.csv
64
66
  - spec/fixtures/numeric.csv
65
67
  - spec/fixtures/pets.csv
66
68
  - spec/fixtures/quoted.csv
67
69
  - spec/fixtures/separator.csv
70
+ - spec/fixtures/skip_lines.csv
68
71
  - spec/fixtures/with_dashes.csv
69
72
  - spec/fixtures/with_dates.csv
70
73
  - spec/smarter_csv/binary_file2_spec.rb
71
74
  - spec/smarter_csv/binary_file_spec.rb
72
75
  - spec/smarter_csv/carriage_return_spec.rb
73
76
  - spec/smarter_csv/chunked_reading_spec.rb
77
+ - spec/smarter_csv/close_file_spec.rb
74
78
  - spec/smarter_csv/column_separator_spec.rb
75
79
  - spec/smarter_csv/convert_values_to_numeric_spec.rb
76
80
  - spec/smarter_csv/header_transformation_spec.rb
@@ -78,6 +82,7 @@ files:
78
82
  - spec/smarter_csv/key_mapping_spec.rb
79
83
  - spec/smarter_csv/line_ending_spec.rb
80
84
  - spec/smarter_csv/load_basic_spec.rb
85
+ - spec/smarter_csv/malformed_spec.rb
81
86
  - spec/smarter_csv/no_header_spec.rb
82
87
  - spec/smarter_csv/not_downcase_header_spec.rb
83
88
  - spec/smarter_csv/quoted_spec.rb
@@ -86,6 +91,7 @@ files:
86
91
  - spec/smarter_csv/remove_not_mapped_keys_spec.rb
87
92
  - spec/smarter_csv/remove_values_matching_spec.rb
88
93
  - spec/smarter_csv/remove_zero_values_spec.rb
94
+ - spec/smarter_csv/skip_lines_spec.rb
89
95
  - spec/smarter_csv/strings_as_keys_spec.rb
90
96
  - spec/smarter_csv/strip_chars_from_headers_spec.rb
91
97
  - spec/smarter_csv/value_converters_spec.rb
@@ -132,18 +138,22 @@ test_files:
132
138
  - spec/fixtures/line_endings_r.csv
133
139
  - spec/fixtures/line_endings_rn.csv
134
140
  - spec/fixtures/lots_of_columns.csv
141
+ - spec/fixtures/malformed.csv
142
+ - spec/fixtures/malformed_header.csv
135
143
  - spec/fixtures/money.csv
136
144
  - spec/fixtures/no_header.csv
137
145
  - spec/fixtures/numeric.csv
138
146
  - spec/fixtures/pets.csv
139
147
  - spec/fixtures/quoted.csv
140
148
  - spec/fixtures/separator.csv
149
+ - spec/fixtures/skip_lines.csv
141
150
  - spec/fixtures/with_dashes.csv
142
151
  - spec/fixtures/with_dates.csv
143
152
  - spec/smarter_csv/binary_file2_spec.rb
144
153
  - spec/smarter_csv/binary_file_spec.rb
145
154
  - spec/smarter_csv/carriage_return_spec.rb
146
155
  - spec/smarter_csv/chunked_reading_spec.rb
156
+ - spec/smarter_csv/close_file_spec.rb
147
157
  - spec/smarter_csv/column_separator_spec.rb
148
158
  - spec/smarter_csv/convert_values_to_numeric_spec.rb
149
159
  - spec/smarter_csv/header_transformation_spec.rb
@@ -151,6 +161,7 @@ test_files:
151
161
  - spec/smarter_csv/key_mapping_spec.rb
152
162
  - spec/smarter_csv/line_ending_spec.rb
153
163
  - spec/smarter_csv/load_basic_spec.rb
164
+ - spec/smarter_csv/malformed_spec.rb
154
165
  - spec/smarter_csv/no_header_spec.rb
155
166
  - spec/smarter_csv/not_downcase_header_spec.rb
156
167
  - spec/smarter_csv/quoted_spec.rb
@@ -159,6 +170,7 @@ test_files:
159
170
  - spec/smarter_csv/remove_not_mapped_keys_spec.rb
160
171
  - spec/smarter_csv/remove_values_matching_spec.rb
161
172
  - spec/smarter_csv/remove_zero_values_spec.rb
173
+ - spec/smarter_csv/skip_lines_spec.rb
162
174
  - spec/smarter_csv/strings_as_keys_spec.rb
163
175
  - spec/smarter_csv/strip_chars_from_headers_spec.rb
164
176
  - spec/smarter_csv/value_converters_spec.rb