smarter_csv 1.0.18 → 1.0.19

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 7ac58be862e87f8334e8620b317e2d4cc534b881
4
- data.tar.gz: 91d2b6f6d80b70cfd3878e2a82abf3cb5c7e66fb
3
+ metadata.gz: 9022f349dd8ee2590c73b198fb83114a8c96932d
4
+ data.tar.gz: 9c1a769c72e08e2e78d15ad444bf3f9642cd33e5
5
5
  SHA512:
6
- metadata.gz: 90c70c2eee91b085414aefbd0c1a59268950a2ef3d0ba4a33ec74bc7795b0712ac98e6a9f61f211774dc5c467fb6dcf0ec84c972ba587491bb480dd31915d0d7
7
- data.tar.gz: e2688ae5b2a0f3e36e07b0e8d90847541efadb9263ccdc215b470296a078618cfd93ccd342abbcf466648f5ee5423bae5bb9bbfa69baab0157d5a426eff853bd
6
+ metadata.gz: e3ccf944663244bc4b336d9980c26f1fda874d48586a131f3c761b6885a2753ac443c80a559046e2c6670f90ba192155e10aceb0e84798add22c9a20d78653a1
7
+ data.tar.gz: 69b3abf03488df9b79b796dd7efbc3612bd273fb1e4f6f156b238213ef377e22ff1852ae17fb7384722dfbba456d9ab36313e5d2c43d5a599a696d008194cd29
data/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # SmarterCSV [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.png?branch=master)](http://travis-ci.org/tilo/smarter_csv)
2
2
 
3
- `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
3
+ `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
4
4
  and parallel processing with Resque or Sidekiq.
5
5
 
6
6
  One `smarter_csv` user wrote:
@@ -32,6 +32,8 @@ The two main choices you have in terms of how to call `SmarterCSV.process` are:
32
32
  * calling `process` with or without a block
33
33
  * passing a `:chunk_size` to the `process` method, and processing the CSV-file in chunks, rather than in one piece.
34
34
 
35
+ Tip: If you are uncertain about what line endings a CSV-file uses, try specifying `:row_sep => :auto` as part of the options. Checkout Example 5 for unusual `:row_sep` and `:col_sep`.
36
+
35
37
  #### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
36
38
  Please note how each hash contains only the keys for columns with non-null values.
37
39
 
@@ -40,15 +42,15 @@ Please note how each hash contains only the keys for columns with non-null value
40
42
  Dan,McAllister,2,,,
41
43
  Lucy,Laweless,,5,,
42
44
  Miles,O'Brian,,,,21
43
- Nancy,Homes,2,,1,
45
+ Nancy,Homes,2,,1,
44
46
  $ irb
45
47
  > require 'smarter_csv'
46
- => true
48
+ => true
47
49
  > pets_by_owner = SmarterCSV.process('/tmp/pets.csv')
48
50
  => [ {:first_name=>"Dan", :last_name=>"McAllister", :dogs=>"2"},
49
- {:first_name=>"Lucy", :last_name=>"Laweless", :cats=>"5"},
50
- {:first_name=>"Miles", :last_name=>"O'Brian", :fish=>"21"},
51
- {:first_name=>"Nancy", :last_name=>"Homes", :dogs=>"2", :birds=>"1"}
51
+ {:first_name=>"Lucy", :last_name=>"Laweless", :cats=>"5"},
52
+ {:first_name=>"Miles", :last_name=>"O'Brian", :fish=>"21"},
53
+ {:first_name=>"Nancy", :last_name=>"Homes", :dogs=>"2", :birds=>"1"}
52
54
  ]
53
55
 
54
56
 
@@ -57,7 +59,7 @@ Please note how the returned array contains two sub-arrays containing the chunks
57
59
  In case the number of rows is not cleanly divisible by `:chunk_size`, the last chunk contains fewer hashes.
58
60
 
59
61
  > pets_by_owner = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}})
60
- => [ [ {:first=>"Dan", :last=>"McAllister", :dogs=>"2"}, {:first=>"Lucy", :last=>"Laweless", :cats=>"5"} ],
62
+ => [ [ {:first=>"Dan", :last=>"McAllister", :dogs=>"2"}, {:first=>"Lucy", :last=>"Laweless", :cats=>"5"} ],
61
63
  [ {:first=>"Miles", :last=>"O'Brian", :fish=>"21"}, {:first=>"Nancy", :last=>"Homes", :dogs=>"2", :birds=>"1"} ]
62
64
  ]
63
65
 
@@ -75,7 +77,7 @@ and how the `process` method returns the number of chunks when called with a blo
75
77
 
76
78
  [{:dogs=>"2", :full_name=>"Dan McAllister"}, {:cats=>"5", :full_name=>"Lucy Laweless"}]
77
79
  [{:fish=>"21", :full_name=>"Miles O'Brian"}, {:dogs=>"2", :birds=>"1", :full_name=>"Nancy Homes"}]
78
- => 2
80
+ => 2
79
81
 
80
82
  #### Example 2: Reading a CSV-File in one Chunk, returning one Array of Hashes:
81
83
 
@@ -88,20 +90,21 @@ and how the `process` method returns the number of chunks when called with a blo
88
90
 
89
91
  # without using chunks:
90
92
  filename = '/tmp/some.csv'
91
- n = SmarterCSV.process(filename, {:key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}) do |array|
93
+ options = {:key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
94
+ n = SmarterCSV.process(filename, options) do |array|
92
95
  # we're passing a block in, to process each resulting hash / =row (the block takes array of hashes)
93
96
  # when chunking is not enabled, there is only one hash in each array
94
97
  MyModel.create( array.first )
95
98
  end
96
99
 
97
- => returns number of chunks / rows we processed
98
-
100
+ => returns number of chunks / rows we processed
99
101
 
100
102
  #### Example 4: Populate a MongoDB Database in Chunks of 100 records with SmarterCSV:
101
103
 
102
104
  # using chunks:
103
105
  filename = '/tmp/some.csv'
104
- n = SmarterCSV.process(filename, {:chunk_size => 100, :key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}) do |chunk|
106
+ options = {:chunk_size => 100, :key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
107
+ n = SmarterCSV.process(filename, options) do |chunk|
105
108
  # we're passing a block in, to process each resulting hash / row (block takes array of hashes)
106
109
  # when chunking is enabled, there are up to :chunk_size hashes in each chunk
107
110
  MyModel.collection.insert( chunk ) # insert up to 100 records at a time
@@ -112,9 +115,12 @@ and how the `process` method returns the number of chunks when called with a blo
112
115
 
113
116
  #### Example 5: Reading a CSV-like File, and Processing it with Resque:
114
117
 
115
- filename = '/tmp/strange_db_dump' # a file with CRTL-A as col_separator, and with CTRL-B\n as record_separator (hello iTunes)
116
- n = SmarterCSV.process(filename, {:col_sep => "\cA", :row_sep => "\cB\n", :comment_regexp => /^#/,
117
- :chunk_size => 100 , :key_mapping => {:export_date => nil, :name => :genre}}) do |chunk|
118
+ filename = '/tmp/strange_db_dump' # a file with CRTL-A as col_separator, and with CTRL-B\n as record_separator (hello iTunes!)
119
+ options = {
120
+ :col_sep => "\cA", :row_sep => "\cB\n", :comment_regexp => /^#/,
121
+ :chunk_size => 100 , :key_mapping => {:export_date => nil, :name => :genre}
122
+ }
123
+ n = SmarterCSV.process(filename, options) do |chunk|
118
124
  Resque.enque( ResqueWorkerClass, chunk ) # pass chunks of CSV-data to Resque workers for parallel processing
119
125
  end
120
126
  => returns number of chunks
@@ -139,18 +145,14 @@ The options and the block are optional.
139
145
  | :quote_char | '"' | quotation character |
140
146
  | :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
141
147
  | :chunk_size | nil | if set, determines the desired chunk-size (defaults to nil, no chunk processing) |
148
+ ---------------------------------------------------------------------------------------------------------------------------------
142
149
  | :key_mapping | nil | a hash which maps headers from the CSV file to keys in the result hash |
143
150
  | :remove_unmapped_keys | false | when using :key_mapping option, should non-mapped keys / columns be removed? |
144
151
  | :downcase_header | true | downcase all column headers |
145
152
  | :strings_as_keys | false | use strings instead of symbols as the keys in the result hashes |
146
153
  | :strip_whitespace | true | remove whitespace before/after values and headers |
147
- | :remove_empty_values | true | remove values which have nil or empty strings as values |
148
- | :remove_zero_values | true | remove values which have a numeric value equal to zero / 0 |
149
- | :remove_values_matching | nil | removes key/value pairs if value matches given regular expressions. e.g.: |
150
- | | | /^\$0\.0+$/ to match $0.00 , or /^#VALUE!$/ to match errors in Excel spreadsheets |
151
- | :convert_values_to_numeric | true | converts strings containing Integers or Floats to the appropriate class |
152
- | | | also accepts either {:except => [:key1,:key2]} or {:only => :key3} |
153
- | :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs |
154
+ | :keep_original_headers | false | keep the original headers from the CSV-file as-is. |
155
+ | | | Disables other flags manipulating the header fields. |
154
156
  | :user_provided_headers | nil | *careful with that axe!* |
155
157
  | | | user provided Array of header strings or symbols, to define |
156
158
  | | | what headers should be used, overriding any in-file headers. |
@@ -159,6 +161,14 @@ The options and the block are optional.
159
161
  | :headers_in_file | true | Whether or not the file contains headers as the first line. |
160
162
  | | | Important if the file does not contain headers, |
161
163
  | | | otherwise you would lose the first line of data. |
164
+ ---------------------------------------------------------------------------------------------------------------------------------
165
+ | :remove_empty_values | true | remove values which have nil or empty strings as values |
166
+ | :remove_zero_values | true | remove values which have a numeric value equal to zero / 0 |
167
+ | :remove_values_matching | nil | removes key/value pairs if value matches given regular expressions. e.g.: |
168
+ | | | /^\$0\.0+$/ to match $0.00 , or /^#VALUE!$/ to match errors in Excel spreadsheets |
169
+ | :convert_values_to_numeric | true | converts strings containing Integers or Floats to the appropriate class |
170
+ | | | also accepts either {:except => [:key1,:key2]} or {:only => :key3} |
171
+ | :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs |
162
172
  | :file_encoding | utf-8 | Set the file encoding eg.: 'windows-1252' or 'iso-8859-1' |
163
173
  | :force_simple_split | false | force simiple splitting on :col_sep character for non-standard CSV-files. |
164
174
  | | | e.g. when :quote_char is not properly escaped |
@@ -225,18 +235,21 @@ Or install it yourself as:
225
235
 
226
236
  ## Changes
227
237
 
238
+ #### 1.0.19 (2014-10-29)
239
+ * added option :keep_original_headers to keep CSV-headers as-is (thanks to Benjamin Thouret)
240
+
228
241
  #### 1.0.18 (2014-10-27)
229
242
  * added support for multi-line fields / csv fields containing CR (thanks to Chris Hilton) (issue #31)
230
-
243
+
231
244
  #### 1.0.17 (2014-01-13)
232
245
  * added option to set :row_sep to :auto , for automatic detection of the row-separator (issue #22)
233
246
 
234
247
  #### 1.0.16 (2014-01-13)
235
248
  * :convert_values_to_numeric option can now be qualified with :except or :only (thanks to Hugo Lepetit)
236
249
  * removed deprecated `process_csv` method
237
-
250
+
238
251
  #### 1.0.15 (2013-12-07)
239
- * new option:
252
+ * new option:
240
253
  * :remove_unmapped_keys to completely ignore columns which were not mapped with :key_mapping (thanks to Dave Sanders)
241
254
 
242
255
  #### 1.0.14 (2013-11-01)
@@ -281,12 +294,12 @@ Or install it yourself as:
281
294
 
282
295
  #### 1.0.4 (2012-08-17)
283
296
 
284
- * renamed the following options:
297
+ * renamed the following options:
285
298
  * :strip_whitepace_from_values => :strip_whitespace - removes leading/trailing whitespace from headers and values
286
299
 
287
300
  #### 1.0.3 (2012-08-16)
288
301
 
289
- * added the following options:
302
+ * added the following options:
290
303
  * :strip_whitepace_from_values - removes leading/trailing whitespace from values
291
304
 
292
305
  #### 1.0.2 (2012-08-02)
@@ -297,7 +310,7 @@ Or install it yourself as:
297
310
 
298
311
  #### 1.0.1 (2012-07-30)
299
312
 
300
- * added the following options:
313
+ * added the following options:
301
314
  * :downcase_header
302
315
  * :strings_as_keys
303
316
  * :remove_zero_values
@@ -307,7 +320,7 @@ Or install it yourself as:
307
320
 
308
321
  * renamed the following options:
309
322
  * :remove_empty_fields => :remove_empty_values
310
-
323
+
311
324
 
312
325
  #### 1.0.0 (2012-07-29)
313
326
 
@@ -323,15 +336,16 @@ Please [open an Issue on GitHub](https://github.com/tilo/smarter_csv/issues) if
323
336
 
324
337
  ## Special Thanks
325
338
 
326
- Many thanks to people who have filed issues and sent comments.
339
+ Many thanks to people who have filed issues and sent comments.
327
340
  And a special thanks to those who contributed pull requests:
328
341
 
342
+ * [Benjamin Thouret](https://github.com/benichu)
329
343
  * [Chris Hilton](https://github.com/chrismhilton)
330
344
  * [Sean Duckett](http://github.com/sduckett)
331
- * [Alex Ong](http://github.com/khaong)
332
- * [Martin Nilsson](http://github.com/MrTin)
333
- * [Eustáquio Rangel](http://github.com/taq)
334
- * [Pavel](http://github.com/paxa)
345
+ * [Alex Ong](http://github.com/khaong)
346
+ * [Martin Nilsson](http://github.com/MrTin)
347
+ * [Eustáquio Rangel](http://github.com/taq)
348
+ * [Pavel](http://github.com/paxa)
335
349
  * [Félix Bellanger](https://github.com/Keeguon)
336
350
  * [Graham Wetzler](https://github.com/grahamwetzler)
337
351
  * [Marcos G. Zimmermann](https://github.com/marcosgz)
@@ -9,7 +9,7 @@ module SmarterCSV
9
9
  :remove_empty_values => true, :remove_zero_values => false , :remove_values_matching => nil , :remove_empty_hashes => true , :strip_whitespace => true,
10
10
  :convert_values_to_numeric => true, :strip_chars_from_headers => nil , :user_provided_headers => nil , :headers_in_file => true,
11
11
  :comment_regexp => /^#/, :chunk_size => nil , :key_mapping_hash => nil , :downcase_header => true, :strings_as_keys => false, :file_encoding => 'utf-8',
12
- :remove_unmapped_keys => false,
12
+ :remove_unmapped_keys => false, :keep_original_headers => false,
13
13
  }
14
14
  options = default_options.merge(options)
15
15
  csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
@@ -39,8 +39,10 @@ module SmarterCSV
39
39
  end
40
40
  file_headerA.map!{|x| x.gsub(%r/options[:quote_char]/,'') }
41
41
  file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
42
- file_headerA.map!{|x| x.gsub(/\s+/,'_')}
43
- file_headerA.map!{|x| x.downcase } if options[:downcase_header]
42
+ unless options[:keep_original_headers]
43
+ file_headerA.map!{|x| x.gsub(/\s+/,'_')}
44
+ file_headerA.map!{|x| x.downcase } if options[:downcase_header]
45
+ end
44
46
 
45
47
  # puts "HeaderA: #{file_headerA.join(' , ')}" if options[:verbose]
46
48
 
@@ -59,7 +61,7 @@ module SmarterCSV
59
61
  else
60
62
  headerA = file_headerA
61
63
  end
62
- headerA.map!{|x| x.to_sym } unless options[:strings_as_keys]
64
+ headerA.map!{|x| x.to_sym } unless options[:strings_as_keys] || options[:keep_original_headers]
63
65
 
64
66
  unless options[:user_provided_headers] # wouldn't make sense to re-map user provided headers
65
67
  key_mappingH = options[:key_mapping]
@@ -90,12 +92,12 @@ module SmarterCSV
90
92
 
91
93
  # cater for the quoted csv data containing the row separator carriage return character
92
94
  # in which case the row data will be split across multiple lines (see the sample content in spec/fixtures/carriage_returns_rn.csv)
93
- # by detecting the existence of an uneven number of quote characters
95
+ # by detecting the existence of an uneven number of quote characters
94
96
  while line.count(options[:quote_char])%2 == 1
95
97
  print "line contains uneven number of quote chars so including content of next line" if options[:verbose]
96
98
  line += f.readline
97
99
  end
98
-
100
+
99
101
  line.chomp! # will use $/ which is set to options[:col_sep]
100
102
 
101
103
  if (line =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
@@ -1,3 +1,3 @@
1
1
  module SmarterCSV
2
- VERSION = "1.0.18"
2
+ VERSION = "1.0.19"
3
3
  end
@@ -0,0 +1,24 @@
1
+ require 'spec_helper'
2
+
3
+ fixture_path = 'spec/fixtures'
4
+
5
+ describe 'be_able_to' do
6
+ it 'not_downcase_headers' do
7
+ options = {:keep_original_headers => true}
8
+ data = SmarterCSV.process("#{fixture_path}/basic.csv", options)
9
+ data.size.should == 5
10
+ # all the keys should be string
11
+ data.each{|item| item.keys.each{|x| x.class.should be == String}}
12
+
13
+ data.each do |item|
14
+ item.keys.each do |key|
15
+ ['First Name','Last Name','Dogs','Cats','Birds','Fish'].should include( key )
16
+ end
17
+ end
18
+
19
+ data.each do |h|
20
+ h.size.should <= 6
21
+ end
22
+ end
23
+
24
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: smarter_csv
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.0.18
4
+ version: 1.0.19
5
5
  platform: ruby
6
6
  authors:
7
7
  - |
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2014-10-28 00:00:00.000000000 Z
12
+ date: 2014-10-29 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: rspec
@@ -70,6 +70,7 @@ files:
70
70
  - spec/smarter_csv/chunked_reading_spec.rb
71
71
  - spec/smarter_csv/column_separator_spec.rb
72
72
  - spec/smarter_csv/convert_values_to_numeric_spec.rb
73
+ - spec/smarter_csv/keep_headers_spec.rb
73
74
  - spec/smarter_csv/key_mapping_spec.rb
74
75
  - spec/smarter_csv/line_ending_spec.rb
75
76
  - spec/smarter_csv/load_basic_spec.rb
@@ -137,6 +138,7 @@ test_files:
137
138
  - spec/smarter_csv/chunked_reading_spec.rb
138
139
  - spec/smarter_csv/column_separator_spec.rb
139
140
  - spec/smarter_csv/convert_values_to_numeric_spec.rb
141
+ - spec/smarter_csv/keep_headers_spec.rb
140
142
  - spec/smarter_csv/key_mapping_spec.rb
141
143
  - spec/smarter_csv/line_ending_spec.rb
142
144
  - spec/smarter_csv/load_basic_spec.rb