smarter_csv 1.1.0 → 1.1.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: ec1442dad9c0f71dc3264f6df10f5fe1116f9d23
- data.tar.gz: 2025c6c7c81fc6c94fed0ed7d391eb122a464bd1
+ metadata.gz: 34d6c592bebe9d6b1d8f87f9f59ecf4a7d3b3a9d
+ data.tar.gz: fe722f38c4962a312c4db7e14cb72e735426db82
  SHA512:
- metadata.gz: 1dd00a098dba973b2f6e0303317bd11cdc527d05ca959dd21ae78f241a640e799b623f0b719da5d146c89856d0e50a557ffa8180725d29c6582e0e10231b091a
- data.tar.gz: 7ab34fac0386ccef0ad13f1116b5ed0669d4ffe6270f8a4b9e39da8421132c4fac4f1c1764b78ca8ce5e702df29c9819ee1359bf0b18c98b735ea53a3935e4d8
+ metadata.gz: 75f5c56cfdeeef41be34f17bfbec30ae201c41463ad9aa6c7da7b5031c63fa1da27ef5120c245329eec108551fae722a5d156e09ead62c2e6aa34ce8edfe4cd8
+ data.tar.gz: 2d3b83fa5e7f4eada8d7f03df50890441f594b8db6d34df4d2c07164c73beda7b9963d596ce43d2ad74ffddb83efb4bdea8ca83d780d7f3129c258c6d21bdb70
data/README.md CHANGED
@@ -20,7 +20,7 @@ One `smarter_csv` user wrote:
  * able to ignore "columns" in the input (delete columns)
  * able to eliminate nil or empty fields from the result hashes (default)
 
- NOTE; This Gem is only for importing CSV files - writing of CSV files is not supported.
+ NOTE: This Gem is only for importing CSV files - writing of CSV files is not supported at this time.
 
  ### Why?
 
@@ -130,6 +130,8 @@ and how the `process` method returns the number of chunks when called with a blo
 
  #### Example 6: Using Value Converters
 
+ NOTE: If you use `key_mappings` and `value_converters`, make sure that the value converters reference the keys by their final mapped names, not the original names in the CSV file.
+
  $ cat spec/fixtures/with_dates.csv
  first,last,date,price
  Ben,Miller,10/30/1998,$44.50
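The note added in this hunk says converters are looked up by the mapped key name. A minimal sketch of why that matters, assuming keys are renamed before converters run; `map_then_convert` and `DateConverter` are hypothetical stand-ins written for this illustration, not the gem's own code:

```ruby
require 'date'

# Hypothetical converter, following the README's contract that a converter
# class implements self.convert(value).
class DateConverter
  def self.convert(value)
    Date.strptime(value, '%m/%d/%Y')
  end
end

# Simplified stand-in for the processing order (not the gem's real code):
# rename keys first, then apply converters looked up by the renamed key.
def map_then_convert(row, key_mapping, value_converters)
  mapped = row.each_with_object({}) do |(key, value), out|
    out[key_mapping.fetch(key, key)] = value
  end
  mapped.each_with_object({}) do |(key, value), out|
    converter = value_converters[key]
    out[key] = converter ? converter.convert(value) : value
  end
end

row = { first: 'Ben', date: '10/30/1998' }
mapping = { date: :purchased_on }

good = map_then_convert(row, mapping, { purchased_on: DateConverter })
bad  = map_then_convert(row, mapping, { date: DateConverter })

puts good[:purchased_on].class # Date
puts bad[:purchased_on].class  # String
```

In the second call the converter is registered under the original CSV name, so it never matches and the value stays a String.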
@@ -163,6 +165,9 @@ and how the `process` method returns the number of chunks when called with a blo
  data[0][:price].class
  => Float
 
+ ## Parallel Processing
+ [Jack](https://github.com/xjlin0) wrote an interesting article about [Speeding up CSV parsing with parallel processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing/)
+
  ## Documentation
 
  The `process` method reads and processes a "generalized" CSV file and returns the contents either as an Array of Hashes,
@@ -198,6 +203,8 @@ The options and the block are optional.
  | :headers_in_file | true | Whether or not the file contains headers as the first line. |
  | | | Important if the file does not contain headers, |
  | | | otherwise you would lose the first line of data. |
+ | :skip_lines | nil | how many lines to skip before the first line or header line is processed |
+ | :force_utf8 | false | force UTF-8 encoding of all lines (including headers) in the CSV file |
  ---------------------------------------------------------------------------------------------------------------------------------
  | :value_converters | nil | supply a hash of :header => KlassName; the class needs to implement self.convert(val)|
  | :remove_empty_values | true | remove values which have nil or empty strings as values |
@@ -270,9 +277,22 @@ Or install it yourself as:
 
  $ gem install smarter_csv
 
+ ## Upcoming
+
+ Planned in the next releases:
+ * programmatic header transformations
+ * CSV command line
 
  ## Changes
 
+ #### 1.1.1 (2016-11-26)
+ * added option to `skip_lines` (thanks to wal)
+ * added option to `force_utf8` encoding (thanks to jordangraft)
+ * bugfix if no headers in input data (thanks to esBeee)
+ * ensure input file is closed (thanks to waldyr)
+ * improved verbose output (thanks to benmaher)
+ * improved documentation
+
  #### 1.1.0 (2015-07-26)
  * added feature :value_converters, which allows parsing of dates, money, and other things (thanks to Raphaël Bleuse, Lucas Camargo de Almeida, Alejandro)
  * added error if :headers_in_file is set to false, and no :user_provided_headers are given (thanks to innhyu)
@@ -383,6 +403,7 @@ Please [open an Issue on GitHub](https://github.com/tilo/smarter_csv/issues) if
  Many thanks to people who have filed issues and sent comments.
  And a special thanks to those who contributed pull requests:
 
+ * [Jack 0](https://github.com/xjlin0)
  * [Alejandro](https://github.com/agaviria)
  * [Lucas Camargo de Almeida](https://github.com/lcalmeida)
  * [Raphaël Bleuse](https://github.com/bleuse)
@@ -402,6 +423,11 @@ And a special thanks to those who contributed pull requests:
  * [Jordan Running](https://github.com/jrunning)
  * [Dave Sanders](https://github.com/DaveSanders)
  * [Hugo Lepetit](https://github.com/giglemad)
+ * [esBeee](https://github.com/esBeee)
+ * [Waldyr de Souza](https://github.com/waldyr)
+ * [Ben Maher](https://github.com/benmaher)
+ * [Wal McConnell](https://github.com/wal)
+ * [Jordan Graft](https://github.com/jordangraft)
 
 
  ## Contributing
@@ -9,14 +9,15 @@ module SmarterCSV
  :remove_empty_values => true, :remove_zero_values => false , :remove_values_matching => nil , :remove_empty_hashes => true , :strip_whitespace => true,
  :convert_values_to_numeric => true, :strip_chars_from_headers => nil , :user_provided_headers => nil , :headers_in_file => true,
  :comment_regexp => /^#/, :chunk_size => nil , :key_mapping_hash => nil , :downcase_header => true, :strings_as_keys => false, :file_encoding => 'utf-8',
- :remove_unmapped_keys => false, :keep_original_headers => false, :value_converters => nil,
+ :remove_unmapped_keys => false, :keep_original_headers => false, :value_converters => nil, :skip_lines => nil, :force_utf8 => false
  }
  options = default_options.merge(options)
  csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
  headerA = []
  result = []
  old_row_sep = $/
- line_count = 0
+ file_line_count = 0
+ csv_line_count = 0
  begin
  f = input.respond_to?(:readline) ? input : File.open(input, "r:#{options[:file_encoding]}")
 
@@ -26,14 +27,25 @@ module SmarterCSV
  end
  $/ = options[:row_sep]
 
+ if options[:skip_lines].to_i > 0
+ options[:skip_lines].to_i.times{f.readline}
+ end
+
  if options[:headers_in_file] # extract the header line
  # process the header line in the CSV file..
  # the first line of a CSV file contains the header .. it might be commented out, so we need to read it anyhow
  header = f.readline.sub(options[:comment_regexp],'').chomp(options[:row_sep])
- line_count += 1
+ header = header.encode!('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '') if options[:force_utf8]
+ file_line_count += 1
+ csv_line_count += 1
  header = header.gsub(options[:strip_chars_from_headers], '') if options[:strip_chars_from_headers]
+
  if (header =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
- file_headerA = CSV.parse( header, csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
+ file_headerA = begin
+ CSV.parse( header, csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
+ rescue CSV::MalformedCSVError => e
+ raise $!, "#{$!} [SmarterCSV: csv line #{csv_line_count}]", $!.backtrace
+ end
  else
  file_headerA = header.split(options[:col_sep])
  end
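The `:force_utf8` branch in this hunk re-encodes each line from `'binary'` to UTF-8 and replaces anything unconvertible with an empty string. A small sketch of what that `String#encode` call does in isolation; the `force_utf8` helper name is hypothetical, and it uses the non-mutating `encode` rather than `encode!` so it also works on frozen strings:

```ruby
# Hypothetical helper using the same String#encode arguments as the diff.
# Treating the source as binary means every byte >= 0x80 has no UTF-8
# mapping and is replaced by '' (that is, dropped).
def force_utf8(line)
  line.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

raw = "caf\xE9,price\n".b   # a Latin-1 byte, tagged as binary
clean = force_utf8(raw)

puts clean.encoding         # UTF-8
puts clean.valid_encoding?  # true
puts clean.inspect          # "caf,price\n"
```

One consequence worth noting: because the source encoding is declared as binary, valid multibyte UTF-8 characters are dropped as well, so the result is always valid UTF-8 but limited to the ASCII bytes of the input.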
@@ -44,11 +56,9 @@ module SmarterCSV
  file_headerA.map!{|x| x.downcase } if options[:downcase_header]
  end
 
- # puts "HeaderA: #{file_headerA.join(' , ')}" if options[:verbose]
-
  file_header_size = file_headerA.size
  else
- raise SmarterCSV::IncorrectOption , "ERROR [smarter_csv]: If :headers_in_file is set to false, you have to provide :user_provided_headers" if ! options.keys.include?(:user_provided_headers)
+ raise SmarterCSV::IncorrectOption , "ERROR [smarter_csv]: If :headers_in_file is set to false, you have to provide :user_provided_headers" if options[:user_provided_headers].nil?
  end
  if options[:user_provided_headers] && options[:user_provided_headers].class == Array && ! options[:user_provided_headers].empty?
  # use user-provided headers
@@ -88,22 +98,30 @@ module SmarterCSV
  # now on to processing all the rest of the lines in the CSV file:
  while ! f.eof? # we can't use f.readlines() here, because this would read the whole file into memory at once, and eof => true
  line = f.readline # read one line.. this uses the input_record_separator $/ which we set previously!
- line_count += 1
- print "processing line %10d\r" % line_count if options[:verbose]
+ line = line.encode!('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '') if options[:force_utf8]
+ file_line_count += 1
+ csv_line_count += 1
+ print "processing file line %10d, csv line %10d\r" % [file_line_count, csv_line_count] if options[:verbose]
  next if line =~ options[:comment_regexp] # ignore all comment lines if there are any
 
  # cater for the quoted csv data containing the row separator carriage return character
  # in which case the row data will be split across multiple lines (see the sample content in spec/fixtures/carriage_returns_rn.csv)
  # by detecting the existence of an uneven number of quote characters
+ multiline = line.count(options[:quote_char])%2 == 1
  while line.count(options[:quote_char])%2 == 1
- print "line contains uneven number of quote chars so including content of next line" if options[:verbose]
  line += f.readline
+ file_line_count += 1
  end
+ print "\nline contains uneven number of quote chars so including content through file line %d\n" % file_line_count if options[:verbose] && multiline
 
  line.chomp! # will use $/ which is set to options[:col_sep]
 
  if (line =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
- dataA = CSV.parse( line, csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
+ dataA = begin
+ CSV.parse( line, csv_options ).flatten.collect!{|x| x.nil? ? '' : x} # to deal with nil values from CSV.parse
+ rescue CSV::MalformedCSVError => e
+ raise $!, "#{$!} [SmarterCSV: csv line #{csv_line_count}]", $!.backtrace
+ end
  else
  dataA = line.split(options[:col_sep])
  end
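This hunk splits the old `line_count` into `file_line_count` (physical lines read) and `csv_line_count` (logical records), which diverge when a quoted field contains the row separator. A self-contained sketch of that bookkeeping; `read_logical_rows` is a hypothetical helper, and the `!io.eof?` guard inside the inner loop is an addition of this sketch that the diff does not have:

```ruby
require 'stringio'

# Hypothetical helper mirroring the loop in the diff: keep appending physical
# lines while the record contains an odd number of quote characters.
def read_logical_rows(io, quote_char = '"')
  file_line_count = 0
  csv_line_count = 0
  rows = []
  until io.eof?
    line = io.readline
    file_line_count += 1
    csv_line_count += 1
    # The eof? guard is this sketch's own safety net for unbalanced quotes.
    while line.count(quote_char).odd? && !io.eof?
      line += io.readline
      file_line_count += 1
    end
    rows << line.chomp
  end
  [rows, file_line_count, csv_line_count]
end

io = StringIO.new("name,comment\nBen,\"line one\nline two\"\nAnn,ok\n")
rows, file_lines, csv_lines = read_logical_rows(io)

puts file_lines # 4
puts csv_lines  # 3
```

The second record spans two physical lines, so the file counter runs one ahead of the CSV counter from that point on.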
@@ -176,6 +194,10 @@ module SmarterCSV
  end
  end
  end
+
+ # print new line to retain last processing line message
+ print "\n" if options[:verbose]
+
  # last chunk:
  if ! chunk.nil? && chunk.size > 0
  # do something with the chunk
@@ -189,6 +211,7 @@ module SmarterCSV
  end
  ensure
  $/ = old_row_sep # make sure this stupid global variable is always reset to it's previous value after we're done!
+ f.close
  end
  if block_given?
  return chunk_count # when we do processing through a block we only care how many chunks we processed
@@ -197,11 +220,6 @@ module SmarterCSV
  end
  end
 
- # def SmarterCSV.process_csv(*args)
- # warn "[DEPRECATION] `process_csv` is deprecated. Please use `process` instead."
- # SmarterCSV.process(*args)
- # end
-
  private
  # acts as a road-block to limit processing when iterating over all k/v pairs of a CSV-hash:
 
@@ -241,8 +259,7 @@ module SmarterCSV
  end
  counts["\r"] += 1 if last_char == "\r"
  # find the key/value pair with the largest counter:
- k,v = counts.max_by{|k,v| v}
+ k,_ = counts.max_by{|_,v| v}
  return k # the most frequent one is it
  end
  end
-
@@ -1,3 +1,3 @@
  module SmarterCSV
- VERSION = "1.1.0"
+ VERSION = "1.1.1"
  end
@@ -0,0 +1,3 @@
+ "name","dob"
+ "Arnold Schwarzenegger","1947-07-30"
+ "Jeff "the dude" Bridges","1949-12-04"
@@ -0,0 +1,3 @@
+ "name","dob"dob""
+ "Arnold Schwarzenegger","1947-07-30"
+ "Jeff Bridges","1949-12-04"
@@ -0,0 +1,8 @@
+ Lines
+ To
+ Skip
+ first name,last name,dogs,cats,birds,fish
+ Dan,McAllister,2,,,
+ Lucy,Laweless,,5,,
+ Miles,O'Brian,,,,21
+ Nancy,Homes,2,,1,
@@ -0,0 +1,15 @@
+ require 'spec_helper'
+
+ fixture_path = 'spec/fixtures'
+
+ describe 'be_able_to' do
+ it 'close file after using it' do
+ options = {:col_sep => "\cA", :row_sep => "\cB", :comment_regexp => /^#/, :strings_as_keys => true}
+
+ file = File.new("#{fixture_path}/binary.csv")
+
+ SmarterCSV.process(file, options)
+
+ file.closed?.should == true
+ end
+ end
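The spec above checks that the input is closed once processing finishes, which the `f.close` added to the `ensure` block provides. A minimal sketch of the same pattern outside the gem; `count_lines` is a hypothetical function, and the nil/closed guard in its `ensure` is an assumption of this sketch (the diff calls `f.close` unconditionally):

```ruby
require 'stringio'

# Hypothetical reader: accepts a path or an IO-like object, as the diff's
# `input.respond_to?(:readline)` check does, and always closes it afterwards.
def count_lines(input)
  io = input.respond_to?(:readline) ? input : File.open(input, 'r')
  count = 0
  count += 1 while io.gets
  count
ensure
  # Guard against File.open having failed (io is nil) or a double close.
  io.close if io && !io.closed?
end

io = StringIO.new("a,b\n1,2\n")
puts count_lines(io) # 2
puts io.closed?      # true
```

As in the spec, an IO object passed in by the caller ends up closed too, so callers that want to keep reading from it would need to reopen it.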
@@ -0,0 +1,21 @@
+ require 'spec_helper'
+
+ fixture_path = 'spec/fixtures'
+
+ describe 'malformed_csv' do
+ subject { lambda { SmarterCSV.process(csv_path) } }
+
+ context "malformed header" do
+ let(:csv_path) { "#{fixture_path}/malformed_header.csv" }
+ it { should raise_error(CSV::MalformedCSVError) }
+ it { should raise_error(/(Missing or stray quote in line 1|CSV::MalformedCSVError)/) }
+ it { should raise_error(CSV::MalformedCSVError) }
+ end
+
+ context "malformed content" do
+ let(:csv_path) { "#{fixture_path}/malformed.csv" }
+ it { should raise_error(CSV::MalformedCSVError) }
+ it { should raise_error(/(Missing or stray quote in line 1|CSV::MalformedCSVError)/) }
+ it { should raise_error(CSV::MalformedCSVError) }
+ end
+ end
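These specs exercise the new `rescue CSV::MalformedCSVError` blocks, which re-raise the same exception class with the CSV line number appended to the message. A rough sketch of that re-raise pattern using only Ruby's standard `csv` library; `parse_line` is a hypothetical wrapper, and the exact wording of the underlying parser message varies between `csv` versions, so only the appended suffix is relied on here:

```ruby
require 'csv'

# Hypothetical wrapper: parse one line and, on failure, re-raise the same
# exception with extra context while keeping the original backtrace.
def parse_line(line, csv_line_count)
  CSV.parse(line).flatten
rescue CSV::MalformedCSVError => e
  raise e, "#{e.message} [SmarterCSV: csv line #{csv_line_count}]", e.backtrace
end

p parse_line('"Arnold Schwarzenegger","1947-07-30"', 2)

begin
  parse_line('"Jeff "the dude" Bridges","1949-12-04"', 3)
rescue CSV::MalformedCSVError => e
  puts e.message.end_with?('[SmarterCSV: csv line 3]') # true
end
```

Passing the exception object to `raise` along with a new message keeps the class intact, so callers that rescue `CSV::MalformedCSVError` continue to work.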
@@ -0,0 +1,29 @@
+ require 'spec_helper'
+
+ fixture_path = 'spec/fixtures'
+
+ describe 'be_able_to' do
+ it 'loads_csv_file_skipping_lines' do
+ options = {skip_lines: 3}
+ data = SmarterCSV.process("#{fixture_path}/skip_lines.csv", options)
+ data.size.should == 4
+
+ data.each do |item|
+ item.keys.each do |key|
+ [:first_name,:last_name,:dogs,:cats,:birds,:fish].should include(key)
+ end
+ end
+ end
+
+ it 'loads_csv_with_user_defined_headers' do
+ options = {:skip_lines => 3, :headers_in_file => true, :user_provided_headers => [:a,:b,:c,:d,:e,:f]}
+ data = SmarterCSV.process("#{fixture_path}/skip_lines.csv", options)
+ data.size.should == 4
+
+ data.each do |item|
+ item.keys.each do |key|
+ [:a,:b,:c,:d,:e,:f].should include( key )
+ end
+ end
+ end
+ end
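The `:skip_lines` option tested above discards a fixed number of physical lines before the header is read. A small sketch of that step on an in-memory input shaped like the `skip_lines.csv` fixture; `header_after_skipping` is a hypothetical helper, not part of the gem:

```ruby
require 'stringio'

# Hypothetical helper: discard `skip_lines` physical lines, then return the
# next line as the header. nil.to_i is 0, so a nil option skips nothing.
def header_after_skipping(io, skip_lines)
  skip_lines.to_i.times { io.readline }
  io.readline.chomp
end

io = StringIO.new("Lines\nTo\nSkip\nfirst name,last name,dogs\nDan,McAllister,2\n")
puts header_after_skipping(io, 3) # first name,last name,dogs
```

Because the diff uses `readline`, asking to skip more lines than the input contains would raise `EOFError`; the sketch behaves the same way.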
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: smarter_csv
  version: !ruby/object:Gem::Version
- version: 1.1.0
+ version: 1.1.1
  platform: ruby
  authors:
  - |
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2015-07-27 00:00:00.000000000 Z
+ date: 2016-11-26 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rspec
@@ -59,18 +59,22 @@ files:
  - spec/fixtures/line_endings_r.csv
  - spec/fixtures/line_endings_rn.csv
  - spec/fixtures/lots_of_columns.csv
+ - spec/fixtures/malformed.csv
+ - spec/fixtures/malformed_header.csv
  - spec/fixtures/money.csv
  - spec/fixtures/no_header.csv
  - spec/fixtures/numeric.csv
  - spec/fixtures/pets.csv
  - spec/fixtures/quoted.csv
  - spec/fixtures/separator.csv
+ - spec/fixtures/skip_lines.csv
  - spec/fixtures/with_dashes.csv
  - spec/fixtures/with_dates.csv
  - spec/smarter_csv/binary_file2_spec.rb
  - spec/smarter_csv/binary_file_spec.rb
  - spec/smarter_csv/carriage_return_spec.rb
  - spec/smarter_csv/chunked_reading_spec.rb
+ - spec/smarter_csv/close_file_spec.rb
  - spec/smarter_csv/column_separator_spec.rb
  - spec/smarter_csv/convert_values_to_numeric_spec.rb
  - spec/smarter_csv/header_transformation_spec.rb
@@ -78,6 +82,7 @@ files:
  - spec/smarter_csv/key_mapping_spec.rb
  - spec/smarter_csv/line_ending_spec.rb
  - spec/smarter_csv/load_basic_spec.rb
+ - spec/smarter_csv/malformed_spec.rb
  - spec/smarter_csv/no_header_spec.rb
  - spec/smarter_csv/not_downcase_header_spec.rb
  - spec/smarter_csv/quoted_spec.rb
@@ -86,6 +91,7 @@ files:
  - spec/smarter_csv/remove_not_mapped_keys_spec.rb
  - spec/smarter_csv/remove_values_matching_spec.rb
  - spec/smarter_csv/remove_zero_values_spec.rb
+ - spec/smarter_csv/skip_lines_spec.rb
  - spec/smarter_csv/strings_as_keys_spec.rb
  - spec/smarter_csv/strip_chars_from_headers_spec.rb
  - spec/smarter_csv/value_converters_spec.rb
@@ -132,18 +138,22 @@ test_files:
  - spec/fixtures/line_endings_r.csv
  - spec/fixtures/line_endings_rn.csv
  - spec/fixtures/lots_of_columns.csv
+ - spec/fixtures/malformed.csv
+ - spec/fixtures/malformed_header.csv
  - spec/fixtures/money.csv
  - spec/fixtures/no_header.csv
  - spec/fixtures/numeric.csv
  - spec/fixtures/pets.csv
  - spec/fixtures/quoted.csv
  - spec/fixtures/separator.csv
+ - spec/fixtures/skip_lines.csv
  - spec/fixtures/with_dashes.csv
  - spec/fixtures/with_dates.csv
  - spec/smarter_csv/binary_file2_spec.rb
  - spec/smarter_csv/binary_file_spec.rb
  - spec/smarter_csv/carriage_return_spec.rb
  - spec/smarter_csv/chunked_reading_spec.rb
+ - spec/smarter_csv/close_file_spec.rb
  - spec/smarter_csv/column_separator_spec.rb
  - spec/smarter_csv/convert_values_to_numeric_spec.rb
  - spec/smarter_csv/header_transformation_spec.rb
@@ -151,6 +161,7 @@ test_files:
  - spec/smarter_csv/key_mapping_spec.rb
  - spec/smarter_csv/line_ending_spec.rb
  - spec/smarter_csv/load_basic_spec.rb
+ - spec/smarter_csv/malformed_spec.rb
  - spec/smarter_csv/no_header_spec.rb
  - spec/smarter_csv/not_downcase_header_spec.rb
  - spec/smarter_csv/quoted_spec.rb
@@ -159,6 +170,7 @@ test_files:
  - spec/smarter_csv/remove_not_mapped_keys_spec.rb
  - spec/smarter_csv/remove_values_matching_spec.rb
  - spec/smarter_csv/remove_zero_values_spec.rb
+ - spec/smarter_csv/skip_lines_spec.rb
  - spec/smarter_csv/strings_as_keys_spec.rb
  - spec/smarter_csv/strip_chars_from_headers_spec.rb
  - spec/smarter_csv/value_converters_spec.rb
  - spec/smarter_csv/value_converters_spec.rb