smarter_csv 1.0.18 → 1.0.19 (RubyGems)

Files changed (6)

checksums.yaml +4 -4
data/README.md +48 -34
data/lib/smarter_csv/smarter_csv.rb +8 -6
data/lib/smarter_csv/version.rb +1 -1
data/spec/smarter_csv/keep_headers_spec.rb +24 -0
metadata +4 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 7ac58be862e87f8334e8620b317e2d4cc534b881
-  data.tar.gz: 91d2b6f6d80b70cfd3878e2a82abf3cb5c7e66fb
+  metadata.gz: 9022f349dd8ee2590c73b198fb83114a8c96932d
+  data.tar.gz: 9c1a769c72e08e2e78d15ad444bf3f9642cd33e5
 SHA512:
-  metadata.gz: 90c70c2eee91b085414aefbd0c1a59268950a2ef3d0ba4a33ec74bc7795b0712ac98e6a9f61f211774dc5c467fb6dcf0ec84c972ba587491bb480dd31915d0d7
-  data.tar.gz: e2688ae5b2a0f3e36e07b0e8d90847541efadb9263ccdc215b470296a078618cfd93ccd342abbcf466648f5ee5423bae5bb9bbfa69baab0157d5a426eff853bd
+  metadata.gz: e3ccf944663244bc4b336d9980c26f1fda874d48586a131f3c761b6885a2753ac443c80a559046e2c6670f90ba192155e10aceb0e84798add22c9a20d78653a1
+  data.tar.gz: 69b3abf03488df9b79b796dd7efbc3612bd273fb1e4f6f156b238213ef377e22ff1852ae17fb7384722dfbba456d9ab36313e5d2c43d5a599a696d008194cd29

data/README.md CHANGED

@@ -1,6 +1,6 @@
 # SmarterCSV  [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.png?branch=master)](http://travis-ci.org/tilo/smarter_csv)
-`smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
+`smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
 and parallel processing with Resque or Sidekiq.
 One `smarter_csv` user wrote:
@@ -32,6 +32,8 @@ The two main choices you have in terms of how to call `SmarterCSV.process` are:
  * calling `process` with or without a block
  * passing a `:chunk_size` to the `process` method, and processing the CSV-file in chunks, rather than in one piece.
+Tip: If you are uncertain about what line endings a CSV-file uses, try specifying `:row_sep => :auto` as part of the options. Checkout Example 5 for unusual `:row_sep` and `:col_sep`.
 #### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
 Please note how each hash contains only the keys for columns with non-null values.
@@ -40,15 +42,15 @@ Please note how each hash contains only the keys for columns with non-null value
      Dan,McAllister,2,,,
      Lucy,Laweless,,5,,
      Miles,O'Brian,,,,21
-     Nancy,Homes,2,,1,
+     Nancy,Homes,2,,1,
      $ irb
      > require 'smarter_csv'
-      => true
+      => true
      > pets_by_owner = SmarterCSV.process('/tmp/pets.csv')
       => [ {:first_name=>"Dan", :last_name=>"McAllister", :dogs=>"2"},
-           {:first_name=>"Lucy", :last_name=>"Laweless", :cats=>"5"},
-           {:first_name=>"Miles", :last_name=>"O'Brian", :fish=>"21"},
-           {:first_name=>"Nancy", :last_name=>"Homes", :dogs=>"2", :birds=>"1"}
+           {:first_name=>"Lucy", :last_name=>"Laweless", :cats=>"5"},
+           {:first_name=>"Miles", :last_name=>"O'Brian", :fish=>"21"},
+           {:first_name=>"Nancy", :last_name=>"Homes", :dogs=>"2", :birds=>"1"}
          ]
@@ -57,7 +59,7 @@ Please note how the returned array contains two sub-arrays containing the chunks
 In case the number of rows is not cleanly divisible by `:chunk_size`, the last chunk contains fewer hashes.
      > pets_by_owner = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}})
-       => [ [ {:first=>"Dan", :last=>"McAllister", :dogs=>"2"}, {:first=>"Lucy", :last=>"Laweless", :cats=>"5"} ],
+       => [ [ {:first=>"Dan", :last=>"McAllister", :dogs=>"2"}, {:first=>"Lucy", :last=>"Laweless", :cats=>"5"} ],
             [ {:first=>"Miles", :last=>"O'Brian", :fish=>"21"}, {:first=>"Nancy", :last=>"Homes", :dogs=>"2", :birds=>"1"} ]
           ]
@@ -75,7 +77,7 @@ and how the `process` method returns the number of chunks when called with a blo
        [{:dogs=>"2", :full_name=>"Dan McAllister"}, {:cats=>"5", :full_name=>"Lucy Laweless"}]
        [{:fish=>"21", :full_name=>"Miles O'Brian"}, {:dogs=>"2", :birds=>"1", :full_name=>"Nancy Homes"}]
-        => 2
+        => 2
 #### Example 2: Reading a CSV-File in one Chunk, returning one Array of Hashes:
@@ -88,20 +90,21 @@ and how the `process` method returns the number of chunks when called with a blo
     # without using chunks:
     filename = '/tmp/some.csv'
-    n = SmarterCSV.process(filename, {:key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}) do |array|
+    options = {:key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
+    n = SmarterCSV.process(filename, options) do |array|
           # we're passing a block in, to process each resulting hash / =row (the block takes array of hashes)
           # when chunking is not enabled, there is only one hash in each array
           MyModel.create( array.first )
     end
-     => returns number of chunks / rows we processed
+     => returns number of chunks / rows we processed
 #### Example 4: Populate a MongoDB Database in Chunks of 100 records with SmarterCSV:
     # using chunks:
     filename = '/tmp/some.csv'
-    n = SmarterCSV.process(filename, {:chunk_size => 100, :key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}) do |chunk|
+    options = {:chunk_size => 100, :key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
+    n = SmarterCSV.process(filename, options) do |chunk|
           # we're passing a block in, to process each resulting hash / row (block takes array of hashes)
           # when chunking is enabled, there are up to :chunk_size hashes in each chunk
           MyModel.collection.insert( chunk )   # insert up to 100 records at a time
@@ -112,9 +115,12 @@ and how the `process` method returns the number of chunks when called with a blo
 #### Example 5: Reading a CSV-like File, and Processing it with Resque:
-    filename = '/tmp/strange_db_dump'   # a file with CRTL-A as col_separator, and with CTRL-B\n as record_separator (hello iTunes)
-    n = SmarterCSV.process(filename, {:col_sep => "\cA", :row_sep => "\cB\n", :comment_regexp => /^#/,
-            :chunk_size => 100 , :key_mapping => {:export_date => nil, :name => :genre}}) do |chunk|
+    filename = '/tmp/strange_db_dump'   # a file with CRTL-A as col_separator, and with CTRL-B\n as record_separator (hello iTunes!)
+    options = {
+      :col_sep => "\cA", :row_sep => "\cB\n", :comment_regexp => /^#/,
+      :chunk_size => 100 , :key_mapping => {:export_date => nil, :name => :genre}
+    }
+    n = SmarterCSV.process(filename, options) do |chunk|
         Resque.enque( ResqueWorkerClass, chunk ) # pass chunks of CSV-data to Resque workers for parallel processing
     end
     => returns number of chunks
@@ -139,18 +145,14 @@ The options and the block are optional.
      | :quote_char                 |   '"'    | quotation character                                                                  |
      | :comment_regexp             |   /^#/   | regular expression which matches comment lines (see NOTE about the CSV header)       |
      | :chunk_size                 |   nil    | if set, determines the desired chunk-size (defaults to nil, no chunk processing)     |
+     ---------------------------------------------------------------------------------------------------------------------------------
      | :key_mapping                |   nil    | a hash which maps headers from the CSV file to keys in the result hash               |
      | :remove_unmapped_keys       |   false  | when using :key_mapping option, should non-mapped keys / columns be removed?         |
      | :downcase_header            |   true   | downcase all column headers                                                          |
      | :strings_as_keys            |   false  | use strings instead of symbols as the keys in the result hashes                      |
      | :strip_whitespace           |   true   | remove whitespace before/after values and headers                                    |
-     | :remove_empty_values        |   true   | remove values which have nil or empty strings as values                              |
-     | :remove_zero_values         |   true   | remove values which have a numeric value equal to zero / 0                           |
-     | :remove_values_matching     |   nil    | removes key/value pairs if value matches given regular expressions. e.g.:            |
-     |                             |          | /^\$0\.0+$/ to match $0.00 , or /^#VALUE!$/ to match errors in Excel spreadsheets    |
-     | :convert_values_to_numeric  |   true   | converts strings containing Integers or Floats to the appropriate class              |
-     |                             |          |      also accepts either {:except => [:key1,:key2]} or {:only => :key3}              |
-     | :remove_empty_hashes        |   true   | remove / ignore any hashes which don't have any key/value pairs                      |
+     | :keep_original_headers      |   false  | keep the original headers from the CSV-file as-is.                                   |
+     |                             |          | Disables other flags manipulating the header fields.                                 |
      | :user_provided_headers      |   nil    | *careful with that axe!*                                                             |
      |                             |          | user provided Array of header strings or symbols, to define                          |
      |                             |          | what headers should be used, overriding any in-file headers.                         |
@@ -159,6 +161,14 @@ The options and the block are optional.
      | :headers_in_file            |   true   | Whether or not the file contains headers as the first line.                          |
      |                             |          | Important if the file does not contain headers,                                      |
      |                             |          | otherwise you would lose the first line of data.                                     |
+     ---------------------------------------------------------------------------------------------------------------------------------
+     | :remove_empty_values        |   true   | remove values which have nil or empty strings as values                              |
+     | :remove_zero_values         |   true   | remove values which have a numeric value equal to zero / 0                           |
+     | :remove_values_matching     |   nil    | removes key/value pairs if value matches given regular expressions. e.g.:            |
+     |                             |          | /^\$0\.0+$/ to match $0.00 , or /^#VALUE!$/ to match errors in Excel spreadsheets    |
+     | :convert_values_to_numeric  |   true   | converts strings containing Integers or Floats to the appropriate class              |
+     |                             |          |      also accepts either {:except => [:key1,:key2]} or {:only => :key3}              |
+     | :remove_empty_hashes        |   true   | remove / ignore any hashes which don't have any key/value pairs                      |
      | :file_encoding              |   utf-8  | Set the file encoding eg.: 'windows-1252' or 'iso-8859-1'                            |
      | :force_simple_split         |   false  | force simiple splitting on :col_sep character for non-standard CSV-files.            |
      |                             |          | e.g. when :quote_char is not properly escaped                                        |
@@ -225,18 +235,21 @@ Or install it yourself as:
 ## Changes
+#### 1.0.19 (2014-10-29)
+ * added option :keep_original_headers to keep CSV-headers as-is (thanks to Benjamin Thouret)
 #### 1.0.18 (2014-10-27)
  * added support for multi-line fields / csv fields containing CR (thanks to Chris Hilton) (issue #31)
 #### 1.0.17 (2014-01-13)
  * added option to set :row_sep to :auto , for automatic detection of the row-separator (issue #22)
 #### 1.0.16 (2014-01-13)
  * :convert_values_to_numeric option can now be qualified with :except or :only (thanks to Hugo Lepetit)
  * removed deprecated `process_csv` method
 #### 1.0.15 (2013-12-07)
- * new option:
+ * new option:
    * :remove_unmapped_keys  to completely ignore columns which were not mapped with :key_mapping (thanks to Dave Sanders)
 #### 1.0.14 (2013-11-01)
@@ -281,12 +294,12 @@ Or install it yourself as:
 #### 1.0.4 (2012-08-17)
- * renamed the following options:
+ * renamed the following options:
     * :strip_whitepace_from_values => :strip_whitespace   - removes leading/trailing whitespace from headers and values
 #### 1.0.3 (2012-08-16)
- * added the following options:
+ * added the following options:
     * :strip_whitepace_from_values   - removes leading/trailing whitespace from values
 #### 1.0.2 (2012-08-02)
@@ -297,7 +310,7 @@ Or install it yourself as:
 #### 1.0.1 (2012-07-30)
- * added the following options:
+ * added the following options:
     * :downcase_header
     * :strings_as_keys
     * :remove_zero_values
@@ -307,7 +320,7 @@ Or install it yourself as:
  * renamed the following options:
     * :remove_empty_fields => :remove_empty_values
 #### 1.0.0 (2012-07-29)
@@ -323,15 +336,16 @@ Please [open an Issue on GitHub](https://github.com/tilo/smarter_csv/issues) if
 ## Special Thanks
-Many thanks to people who have filed issues and sent comments.
+Many thanks to people who have filed issues and sent comments.
 And a special thanks to those who contributed pull requests:
+ * [Benjamin Thouret](https://github.com/benichu)
  * [Chris Hilton](https://github.com/chrismhilton)
  * [Sean Duckett](http://github.com/sduckett)
- * [Alex Ong](http://github.com/khaong)
- * [Martin Nilsson](http://github.com/MrTin)
- * [Eustáquio Rangel](http://github.com/taq)
- * [Pavel](http://github.com/paxa)
+ * [Alex Ong](http://github.com/khaong)
+ * [Martin Nilsson](http://github.com/MrTin)
+ * [Eustáquio Rangel](http://github.com/taq)
+ * [Pavel](http://github.com/paxa)
  * [Félix Bellanger](https://github.com/Keeguon)
  * [Graham Wetzler](https://github.com/grahamwetzler)
  * [Marcos G. Zimmermann](https://github.com/marcosgz)
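
The README tip added in this version suggests `:row_sep => :auto` when the line endings of a CSV-file are unknown. As a rough illustration of what such detection can look like, here is a minimal sketch that counts candidate line endings in a sample string. The helper name `guess_row_sep` and the counting heuristic are assumptions made for this example; they are not taken from the gem's source, which may detect the separator differently.

```ruby
# Hypothetical helper, NOT part of smarter_csv: guess the row separator
# by counting candidate line endings in a sample of the input.
def guess_row_sep(sample)
  crlf = sample.scan("\r\n").size
  # Count only bare CR and bare LF, so CRLF pairs are not counted twice.
  cr = sample.count("\r") - crlf
  lf = sample.count("\n") - crlf
  counts = { "\r\n" => crlf, "\n" => lf, "\r" => cr }
  best, hits = counts.max_by { |_sep, n| n }
  hits.zero? ? "\n" : best   # fall back to LF if no line ending was found
end

unix_sample    = "a,b\n1,2\n3,4\n"
windows_sample = "a,b\r\n1,2\r\n3,4\r\n"
old_mac_sample = "a,b\r1,2\r3,4\r"

unix_sep    = guess_row_sep(unix_sample)
windows_sep = guess_row_sep(windows_sample)
old_mac_sep = guess_row_sep(old_mac_sample)
```

A heuristic like this can be fooled by quoted multi-line fields, so passing an explicit `:row_sep` remains the safer choice when the format is known.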

data/lib/smarter_csv/smarter_csv.rb CHANGED

@@ -9,7 +9,7 @@ module SmarterCSV
       :remove_empty_values => true, :remove_zero_values => false , :remove_values_matching => nil , :remove_empty_hashes => true , :strip_whitespace => true,
       :convert_values_to_numeric => true, :strip_chars_from_headers => nil , :user_provided_headers => nil , :headers_in_file => true,
       :comment_regexp => /^#/, :chunk_size => nil , :key_mapping_hash => nil , :downcase_header => true, :strings_as_keys => false, :file_encoding => 'utf-8',
-      :remove_unmapped_keys => false,
+      :remove_unmapped_keys => false, :keep_original_headers => false,
     }
     options = default_options.merge(options)
     csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
@@ -39,8 +39,10 @@ module SmarterCSV
         end
         file_headerA.map!{|x| x.gsub(%r/options[:quote_char]/,'') }
         file_headerA.map!{|x| x.strip}  if options[:strip_whitespace]
-        file_headerA.map!{|x| x.gsub(/\s+/,'_')}
-        file_headerA.map!{|x| x.downcase }   if options[:downcase_header]
+        unless options[:keep_original_headers]
+          file_headerA.map!{|x| x.gsub(/\s+/,'_')}
+          file_headerA.map!{|x| x.downcase }   if options[:downcase_header]
+        end
 #        puts "HeaderA: #{file_headerA.join(' , ')}" if options[:verbose]
@@ -59,7 +61,7 @@ module SmarterCSV
       else
         headerA = file_headerA
       end
-      headerA.map!{|x| x.to_sym } unless options[:strings_as_keys]
+      headerA.map!{|x| x.to_sym } unless options[:strings_as_keys] || options[:keep_original_headers]
       unless options[:user_provided_headers] # wouldn't make sense to re-map user provided headers
         key_mappingH = options[:key_mapping]
@@ -90,12 +92,12 @@ module SmarterCSV
         # cater for the quoted csv data containing the row separator carriage return character
         # in which case the row data will be split across multiple lines (see the sample content in spec/fixtures/carriage_returns_rn.csv)
-        # by detecting the existence of an uneven number of quote characters
+        # by detecting the existence of an uneven number of quote characters
         while line.count(options[:quote_char])%2 == 1
           print "line contains uneven number of quote chars so including content of next line" if options[:verbose]
           line += f.readline
         end
         line.chomp!    # will use $/ which is set to options[:col_sep]
         if (line =~ %r{#{options[:quote_char]}}) and (! options[:force_simple_split])
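
The header handling changed in the hunks above can be read as a small pipeline: strip whitespace, then (unless `:keep_original_headers` is set) replace whitespace runs with underscores and downcase, then convert to symbols unless `:strings_as_keys` or `:keep_original_headers` is set. The following standalone sketch mirrors that order of operations. The method name `normalize_headers` is hypothetical; the gem performs these steps inline inside `SmarterCSV.process`, and quote-character removal is left out here.

```ruby
# Standalone sketch of the header handling shown in the diff.
# normalize_headers is a hypothetical name; smarter_csv does this inline.
def normalize_headers(headers, options = {})
  opts = {
    :strip_whitespace => true, :downcase_header => true,
    :strings_as_keys => false, :keep_original_headers => false
  }.merge(options)

  # Stripping still applies when :keep_original_headers is set,
  # because it happens before the new `unless` guard.
  result = headers.map { |h| opts[:strip_whitespace] ? h.strip : h }

  unless opts[:keep_original_headers]
    result = result.map { |h| h.gsub(/\s+/, '_') }
    result = result.map(&:downcase) if opts[:downcase_header]
  end

  # Keys stay Strings if either flag asks for it.
  unless opts[:strings_as_keys] || opts[:keep_original_headers]
    result = result.map(&:to_sym)
  end
  result
end

raw = ['First Name', ' Last Name ', 'Dogs']
default_keys  = normalize_headers(raw)
string_keys   = normalize_headers(raw, :strings_as_keys => true)
original_keys = normalize_headers(raw, :keep_original_headers => true)
```

With the defaults, the headers become `:first_name`, `:last_name`, and `:dogs`; with `:keep_original_headers => true` they stay `"First Name"`, `"Last Name"`, and `"Dogs"`, which is what the new spec file checks for.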

data/lib/smarter_csv/version.rb CHANGED

@@ -1,3 +1,3 @@
 module SmarterCSV
-  VERSION = "1.0.18"
+  VERSION = "1.0.19"
 end

data/spec/smarter_csv/keep_headers_spec.rb ADDED

@@ -0,0 +1,24 @@
+require 'spec_helper'
+fixture_path = 'spec/fixtures'
+describe 'be_able_to' do
+  it 'not_downcase_headers' do
+    options = {:keep_original_headers => true}
+    data = SmarterCSV.process("#{fixture_path}/basic.csv", options)
+    data.size.should == 5
+    # all the keys should be string
+    data.each{|item| item.keys.each{|x| x.class.should be == String}}
+    data.each do |item|
+      item.keys.each do |key|
+        ['First Name','Last Name','Dogs','Cats','Birds','Fish'].should include( key )
+      end
+    end
+    data.each do |h|
+      h.size.should <= 6
+    end
+  end
+end
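
The new spec asserts three things about the result: every key is a `String`, every key is one of the original header names, and no hash has more than six entries (empty values are dropped, so a hash may have fewer). The sketch below reproduces those checks in plain Ruby against hypothetical in-memory rows. The sample rows are an assumption, since the contents of `spec/fixtures/basic.csv` are not part of this diff, and the values are kept as strings rather than converted to numbers.

```ruby
# Hypothetical stand-in for spec/fixtures/basic.csv; the real fixture's
# contents are not shown in this diff.
headers = ['First Name', 'Last Name', 'Dogs', 'Cats', 'Birds', 'Fish']
rows = [
  ['Dan',  'McAllister', '2', '',  '', ''],
  ['Lucy', 'Laweless',   '',  '5', '', '']
]

# Pair each value with its untouched header, then drop empty values,
# imitating the :remove_empty_values default.
data = rows.map do |values|
  headers.zip(values).reject { |_key, value| value.nil? || value.empty? }.to_h
end

# The same invariants the spec checks, written as plain assertions.
data.each do |item|
  item.keys.each do |key|
    raise "key is not a String: #{key.inspect}" unless key.is_a?(String)
    raise "unexpected key: #{key}" unless headers.include?(key)
  end
  raise "too many keys" unless item.size <= headers.size
end
```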

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: smarter_csv
 version: !ruby/object:Gem::Version
-  version: 1.0.18
+  version: 1.0.19
 platform: ruby
 authors:
 - |
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-10-28 00:00:00.000000000 Z
+date: 2014-10-29 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
@@ -70,6 +70,7 @@ files:
 - spec/smarter_csv/chunked_reading_spec.rb
 - spec/smarter_csv/column_separator_spec.rb
 - spec/smarter_csv/convert_values_to_numeric_spec.rb
+- spec/smarter_csv/keep_headers_spec.rb
 - spec/smarter_csv/key_mapping_spec.rb
 - spec/smarter_csv/line_ending_spec.rb
 - spec/smarter_csv/load_basic_spec.rb
@@ -137,6 +138,7 @@ test_files:
 - spec/smarter_csv/chunked_reading_spec.rb
 - spec/smarter_csv/column_separator_spec.rb
 - spec/smarter_csv/convert_values_to_numeric_spec.rb
+- spec/smarter_csv/keep_headers_spec.rb
 - spec/smarter_csv/key_mapping_spec.rb
 - spec/smarter_csv/line_ending_spec.rb
 - spec/smarter_csv/load_basic_spec.rb