smarter_csv 1.0.18 → 1.0.19

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 7ac58be862e87f8334e8620b317e2d4cc534b881
- data.tar.gz: 91d2b6f6d80b70cfd3878e2a82abf3cb5c7e66fb
+ metadata.gz: 9022f349dd8ee2590c73b198fb83114a8c96932d
+ data.tar.gz: 9c1a769c72e08e2e78d15ad444bf3f9642cd33e5
  SHA512:
- metadata.gz: 90c70c2eee91b085414aefbd0c1a59268950a2ef3d0ba4a33ec74bc7795b0712ac98e6a9f61f211774dc5c467fb6dcf0ec84c972ba587491bb480dd31915d0d7
- data.tar.gz: e2688ae5b2a0f3e36e07b0e8d90847541efadb9263ccdc215b470296a078618cfd93ccd342abbcf466648f5ee5423bae5bb9bbfa69baab0157d5a426eff853bd
+ metadata.gz: e3ccf944663244bc4b336d9980c26f1fda874d48586a131f3c761b6885a2753ac443c80a559046e2c6670f90ba192155e10aceb0e84798add22c9a20d78653a1
+ data.tar.gz: 69b3abf03488df9b79b796dd7efbc3612bd273fb1e4f6f156b238213ef377e22ff1852ae17fb7384722dfbba456d9ab36313e5d2c43d5a599a696d008194cd29
data/README.md CHANGED
@@ -1,6 +1,6 @@
  # SmarterCSV [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.png?branch=master)](http://travis-ci.org/tilo/smarter_csv)

- `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
+ `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
  and parallel processing with Resque or Sidekiq.

  One `smarter_csv` user wrote:
@@ -32,6 +32,8 @@ The two main choices you have in terms of how to call `SmarterCSV.process` are:
  * calling `process` with or without a block
  * passing a `:chunk_size` to the `process` method, and processing the CSV-file in chunks, rather than in one piece.

+ Tip: If you are uncertain about what line endings a CSV-file uses, try specifying `:row_sep => :auto` as part of the options. Checkout Example 5 for unusual `:row_sep` and `:col_sep`.
+
  #### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
  Please note how each hash contains only the keys for columns with non-null values.

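As an aside to the tip added above: a minimal sketch of how `:row_sep => :auto` might be used. The file path is hypothetical; the option itself dates back to release 1.0.17 (see the changelog further down).

    require 'smarter_csv'

    # let SmarterCSV detect whether the file ends its lines with \n, \r\n or \r
    # ('/tmp/unknown_line_endings.csv' is a made-up path for illustration)
    rows = SmarterCSV.process('/tmp/unknown_line_endings.csv', {:row_sep => :auto})
    rows.each { |row| puts row.inspect }    # one hash per data row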
@@ -40,15 +42,15 @@ Please note how each hash contains only the keys for columns with non-null value
  Dan,McAllister,2,,,
  Lucy,Laweless,,5,,
  Miles,O'Brian,,,,21
- Nancy,Homes,2,,1,
+ Nancy,Homes,2,,1,
  $ irb
  > require 'smarter_csv'
- => true
+ => true
  > pets_by_owner = SmarterCSV.process('/tmp/pets.csv')
  => [ {:first_name=>"Dan", :last_name=>"McAllister", :dogs=>"2"},
- {:first_name=>"Lucy", :last_name=>"Laweless", :cats=>"5"},
- {:first_name=>"Miles", :last_name=>"O'Brian", :fish=>"21"},
- {:first_name=>"Nancy", :last_name=>"Homes", :dogs=>"2", :birds=>"1"}
+ {:first_name=>"Lucy", :last_name=>"Laweless", :cats=>"5"},
+ {:first_name=>"Miles", :last_name=>"O'Brian", :fish=>"21"},
+ {:first_name=>"Nancy", :last_name=>"Homes", :dogs=>"2", :birds=>"1"}
  ]


@@ -57,7 +59,7 @@ Please note how the returned array contains two sub-arrays containing the chunks
  In case the number of rows is not cleanly divisible by `:chunk_size`, the last chunk contains fewer hashes.

  > pets_by_owner = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}})
- => [ [ {:first=>"Dan", :last=>"McAllister", :dogs=>"2"}, {:first=>"Lucy", :last=>"Laweless", :cats=>"5"} ],
+ => [ [ {:first=>"Dan", :last=>"McAllister", :dogs=>"2"}, {:first=>"Lucy", :last=>"Laweless", :cats=>"5"} ],
  [ {:first=>"Miles", :last=>"O'Brian", :fish=>"21"}, {:first=>"Nancy", :last=>"Homes", :dogs=>"2", :birds=>"1"} ]
  ]

@@ -75,7 +77,7 @@ and how the `process` method returns the number of chunks when called with a blo

  [{:dogs=>"2", :full_name=>"Dan McAllister"}, {:cats=>"5", :full_name=>"Lucy Laweless"}]
  [{:fish=>"21", :full_name=>"Miles O'Brian"}, {:dogs=>"2", :birds=>"1", :full_name=>"Nancy Homes"}]
- => 2
+ => 2

  #### Example 2: Reading a CSV-File in one Chunk, returning one Array of Hashes:

@@ -88,20 +90,21 @@ and how the `process` method returns the number of chunks when called with a blo

  # without using chunks:
  filename = '/tmp/some.csv'
- n = SmarterCSV.process(filename, {:key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}) do |array|
+ options = {:key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
+ n = SmarterCSV.process(filename, options) do |array|
  # we're passing a block in, to process each resulting hash / =row (the block takes array of hashes)
  # when chunking is not enabled, there is only one hash in each array
  MyModel.create( array.first )
  end

- => returns number of chunks / rows we processed
-
+ => returns number of chunks / rows we processed

  #### Example 4: Populate a MongoDB Database in Chunks of 100 records with SmarterCSV:

  # using chunks:
  filename = '/tmp/some.csv'
- n = SmarterCSV.process(filename, {:chunk_size => 100, :key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}) do |chunk|
+ options = {:chunk_size => 100, :key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
+ n = SmarterCSV.process(filename, options) do |chunk|
  # we're passing a block in, to process each resulting hash / row (block takes array of hashes)
  # when chunking is enabled, there are up to :chunk_size hashes in each chunk
  MyModel.collection.insert( chunk ) # insert up to 100 records at a time
@@ -112,9 +115,12 @@ and how the `process` method returns the number of chunks when called with a blo

  #### Example 5: Reading a CSV-like File, and Processing it with Resque:

- filename = '/tmp/strange_db_dump' # a file with CRTL-A as col_separator, and with CTRL-B\n as record_separator (hello iTunes)
- n = SmarterCSV.process(filename, {:col_sep => "\cA", :row_sep => "\cB\n", :comment_regexp => /^#/,
- :chunk_size => 100 , :key_mapping => {:export_date => nil, :name => :genre}}) do |chunk|
+ filename = '/tmp/strange_db_dump' # a file with CRTL-A as col_separator, and with CTRL-B\n as record_separator (hello iTunes!)
+ options = {
+ :col_sep => "\cA", :row_sep => "\cB\n", :comment_regexp => /^#/,
+ :chunk_size => 100 , :key_mapping => {:export_date => nil, :name => :genre}
+ }
+ n = SmarterCSV.process(filename, options) do |chunk|
  Resque.enque( ResqueWorkerClass, chunk ) # pass chunks of CSV-data to Resque workers for parallel processing
  end
  => returns number of chunks
@@ -139,18 +145,14 @@ The options and the block are optional.
  | :quote_char | '"' | quotation character |
  | :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
  | :chunk_size | nil | if set, determines the desired chunk-size (defaults to nil, no chunk processing) |
+ ---------------------------------------------------------------------------------------------------------------------------------
  | :key_mapping | nil | a hash which maps headers from the CSV file to keys in the result hash |
  | :remove_unmapped_keys | false | when using :key_mapping option, should non-mapped keys / columns be removed? |
  | :downcase_header | true | downcase all column headers |
  | :strings_as_keys | false | use strings instead of symbols as the keys in the result hashes |
  | :strip_whitespace | true | remove whitespace before/after values and headers |
- | :remove_empty_values | true | remove values which have nil or empty strings as values |
- | :remove_zero_values | true | remove values which have a numeric value equal to zero / 0 |
- | :remove_values_matching | nil | removes key/value pairs if value matches given regular expressions. e.g.: |
- | | | /^\$0\.0+$/ to match $0.00 , or /^#VALUE!$/ to match errors in Excel spreadsheets |
- | :convert_values_to_numeric | true | converts strings containing Integers or Floats to the appropriate class |
- | | | also accepts either {:except => [:key1,:key2]} or {:only => :key3} |
- | :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs |
+ | :keep_original_headers | false | keep the original headers from the CSV-file as-is. |
+ | | | Disables other flags manipulating the header fields. |
  | :user_provided_headers | nil | *careful with that axe!* |
  | | | user provided Array of header strings or symbols, to define |
  | | | what headers should be used, overriding any in-file headers. |
@@ -159,6 +161,14 @@ The options and the block are optional.
  | :headers_in_file | true | Whether or not the file contains headers as the first line. |
  | | | Important if the file does not contain headers, |
  | | | otherwise you would lose the first line of data. |
+ ---------------------------------------------------------------------------------------------------------------------------------
+ | :remove_empty_values | true | remove values which have nil or empty strings as values |
+ | :remove_zero_values | true | remove values which have a numeric value equal to zero / 0 |
+ | :remove_values_matching | nil | removes key/value pairs if value matches given regular expressions. e.g.: |
+ | | | /^\$0\.0+$/ to match $0.00 , or /^#VALUE!$/ to match errors in Excel spreadsheets |
+ | :convert_values_to_numeric | true | converts strings containing Integers or Floats to the appropriate class |
+ | | | also accepts either {:except => [:key1,:key2]} or {:only => :key3} |
+ | :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs |
  | :file_encoding | utf-8 | Set the file encoding eg.: 'windows-1252' or 'iso-8859-1' |
  | :force_simple_split | false | force simiple splitting on :col_sep character for non-standard CSV-files. |
  | | | e.g. when :quote_char is not properly escaped |
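A minimal sketch of the newly documented `:keep_original_headers` option, mirroring the new spec added further down in this diff; the file path and header row are assumed for illustration.

    require 'smarter_csv'

    # assume a hypothetical CSV whose header row reads: First Name,Last Name,Dogs,Cats,Birds,Fish
    data = SmarterCSV.process('/tmp/pets_with_original_headers.csv', {:keep_original_headers => true})
    data.first.keys.first   # => "First Name"  (String key, kept verbatim)
    # with the default options the same column would come back as the Symbol :first_name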
@@ -225,18 +235,21 @@ Or install it yourself as:

  ## Changes

+ #### 1.0.19 (2014-10-29)
+ * added option :keep_original_headers to keep CSV-headers as-is (thanks to Benjamin Thouret)
+
  #### 1.0.18 (2014-10-27)
  * added support for multi-line fields / csv fields containing CR (thanks to Chris Hilton) (issue #31)
-
+
  #### 1.0.17 (2014-01-13)
  * added option to set :row_sep to :auto , for automatic detection of the row-separator (issue #22)

  #### 1.0.16 (2014-01-13)
  * :convert_values_to_numeric option can now be qualified with :except or :only (thanks to Hugo Lepetit)
  * removed deprecated `process_csv` method
-
+
  #### 1.0.15 (2013-12-07)
- * new option:
+ * new option:
  * :remove_unmapped_keys to completely ignore columns which were not mapped with :key_mapping (thanks to Dave Sanders)

  #### 1.0.14 (2013-11-01)
@@ -281,12 +294,12 @@ Or install it yourself as:

  #### 1.0.4 (2012-08-17)

- * renamed the following options:
+ * renamed the following options:
  * :strip_whitepace_from_values => :strip_whitespace - removes leading/trailing whitespace from headers and values

  #### 1.0.3 (2012-08-16)

- * added the following options:
+ * added the following options:
  * :strip_whitepace_from_values - removes leading/trailing whitespace from values

  #### 1.0.2 (2012-08-02)
@@ -297,7 +310,7 @@ Or install it yourself as:

  #### 1.0.1 (2012-07-30)

- * added the following options:
+ * added the following options:
  * :downcase_header
  * :strings_as_keys
  * :remove_zero_values
@@ -307,7 +320,7 @@ Or install it yourself as:

  * renamed the following options:
  * :remove_empty_fields => :remove_empty_values
-
+

  #### 1.0.0 (2012-07-29)

@@ -323,15 +336,16 @@ Please [open an Issue on GitHub](https://github.com/tilo/smarter_csv/issues) if

  ## Special Thanks

- Many thanks to people who have filed issues and sent comments.
+ Many thanks to people who have filed issues and sent comments.
  And a special thanks to those who contributed pull requests:

+ * [Benjamin Thouret](https://github.com/benichu)
  * [Chris Hilton](https://github.com/chrismhilton)
  * [Sean Duckett](http://github.com/sduckett)
- * [Alex Ong](http://github.com/khaong)
- * [Martin Nilsson](http://github.com/MrTin)
- * [Eustáquio Rangel](http://github.com/taq)
- * [Pavel](http://github.com/paxa)
+ * [Alex Ong](http://github.com/khaong)
+ * [Martin Nilsson](http://github.com/MrTin)
+ * [Eustáquio Rangel](http://github.com/taq)
+ * [Pavel](http://github.com/paxa)
  * [Félix Bellanger](https://github.com/Keeguon)
  * [Graham Wetzler](https://github.com/grahamwetzler)
  * [Marcos G. Zimmermann](https://github.com/marcosgz)
@@ -9,7 +9,7 @@ module SmarterCSV
  :remove_empty_values => true, :remove_zero_values => false , :remove_values_matching => nil , :remove_empty_hashes => true , :strip_whitespace => true,
  :convert_values_to_numeric => true, :strip_chars_from_headers => nil , :user_provided_headers => nil , :headers_in_file => true,
  :comment_regexp => /^#/, :chunk_size => nil , :key_mapping_hash => nil , :downcase_header => true, :strings_as_keys => false, :file_encoding => 'utf-8',
- :remove_unmapped_keys => false,
+ :remove_unmapped_keys => false, :keep_original_headers => false,
  }
  options = default_options.merge(options)
  csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
@@ -39,8 +39,10 @@ module SmarterCSV
  end
  file_headerA.map!{|x| x.gsub(%r/options[:quote_char]/,'') }
  file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
- file_headerA.map!{|x| x.gsub(/\s+/,'_')}
- file_headerA.map!{|x| x.downcase } if options[:downcase_header]
+ unless options[:keep_original_headers]
+ file_headerA.map!{|x| x.gsub(/\s+/,'_')}
+ file_headerA.map!{|x| x.downcase } if options[:downcase_header]
+ end

  # puts "HeaderA: #{file_headerA.join(' , ')}" if options[:verbose]

@@ -59,7 +61,7 @@ module SmarterCSV
  else
  headerA = file_headerA
  end
- headerA.map!{|x| x.to_sym } unless options[:strings_as_keys]
+ headerA.map!{|x| x.to_sym } unless options[:strings_as_keys] || options[:keep_original_headers]

  unless options[:user_provided_headers] # wouldn't make sense to re-map user provided headers
  key_mappingH = options[:key_mapping]
@@ -90,12 +92,12 @@ module SmarterCSV

  # cater for the quoted csv data containing the row separator carriage return character
  # in which case the row data will be split across multiple lines (see the sample content in spec/fixtures/carriage_returns_rn.csv)
- # by detecting the existence of an uneven number of quote characters
+ # by detecting the existence of an uneven number of quote characters
  while line.count(options[:quote_char])%2 == 1
  print "line contains uneven number of quote chars so including content of next line" if options[:verbose]
  line += f.readline
  end
-
+
  line.chomp! # will use $/ which is set to options[:col_sep]

  if (line =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
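To make the effect of the new flag concrete, here is a simplified, standalone restatement of the header handling changed in the hunks above; the header value is hypothetical, and the real code operates on the whole header array rather than a single string.

    # sketch of the header pipeline with :keep_original_headers => true
    options = {:keep_original_headers => true, :downcase_header => true, :strings_as_keys => false}
    header  = 'First Name'                       # hypothetical raw header read from the CSV file
    unless options[:keep_original_headers]
      header = header.gsub(/\s+/,'_')                         # would give "First_Name"
      header = header.downcase if options[:downcase_header]   # would give "first_name"
    end
    header = header.to_sym unless options[:strings_as_keys] || options[:keep_original_headers]
    header   # => "First Name" here; with the default options it would be :first_name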
@@ -1,3 +1,3 @@
  module SmarterCSV
- VERSION = "1.0.18"
+ VERSION = "1.0.19"
  end
@@ -0,0 +1,24 @@
+ require 'spec_helper'
+
+ fixture_path = 'spec/fixtures'
+
+ describe 'be_able_to' do
+ it 'not_downcase_headers' do
+ options = {:keep_original_headers => true}
+ data = SmarterCSV.process("#{fixture_path}/basic.csv", options)
+ data.size.should == 5
+ # all the keys should be string
+ data.each{|item| item.keys.each{|x| x.class.should be == String}}
+
+ data.each do |item|
+ item.keys.each do |key|
+ ['First Name','Last Name','Dogs','Cats','Birds','Fish'].should include( key )
+ end
+ end
+
+ data.each do |h|
+ h.size.should <= 6
+ end
+ end
+
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: smarter_csv
  version: !ruby/object:Gem::Version
- version: 1.0.18
+ version: 1.0.19
  platform: ruby
  authors:
  - |
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-10-28 00:00:00.000000000 Z
+ date: 2014-10-29 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rspec
@@ -70,6 +70,7 @@ files:
  - spec/smarter_csv/chunked_reading_spec.rb
  - spec/smarter_csv/column_separator_spec.rb
  - spec/smarter_csv/convert_values_to_numeric_spec.rb
+ - spec/smarter_csv/keep_headers_spec.rb
  - spec/smarter_csv/key_mapping_spec.rb
  - spec/smarter_csv/line_ending_spec.rb
  - spec/smarter_csv/load_basic_spec.rb
@@ -137,6 +138,7 @@ test_files:
  - spec/smarter_csv/chunked_reading_spec.rb
  - spec/smarter_csv/column_separator_spec.rb
  - spec/smarter_csv/convert_values_to_numeric_spec.rb
+ - spec/smarter_csv/keep_headers_spec.rb
  - spec/smarter_csv/key_mapping_spec.rb
  - spec/smarter_csv/line_ending_spec.rb
  - spec/smarter_csv/load_basic_spec.rb