smarter_csv 1.2.7 → 1.4.2

data/README.md CHANGED
@@ -1,19 +1,24 @@
1
- # SmarterCSV
2
-
3
- [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
4
1
 
5
- ---------------
6
2
  #### Service Announcement
7
3
 
8
- Work towards SmarterCSV 2.0 is on it's way, with much improved features, and more streamlined options.
4
+ * Work towards SmarterCSV 2.0 is still on its way, with much improved features and more streamlined options.
5
+ Please check the [2.0-develop branch](https://github.com/tilo/smarter_csv/blob/master/README.md), open any issues and pull requests with mention of v2.0.
6
+
7
+ * New versions of SmarterCSV 1.x will soon print a deprecation warning if you set :verbose to true
8
+ See below for list of deprecated options.
9
9
 
10
- Please check the 2.0-develop branch, and open issues marked v2.0 and leave your comments.
10
+ #### Restructured Branches
11
11
 
12
- New versions on the 1.2 branch will soon print a deprecation warning if you set :verbose to true
13
- See below for list of deprecated options.
12
+ * default branch is `main` for 1.x development
13
+ * 2.x development is on `2.0-development`
14
14
 
15
15
  ---------------
16
- #### SmarterCSV
16
+
17
+ # SmarterCSV
18
+
19
+ [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
20
+
21
+ #### SmarterCSV 1.x
17
22
 
18
23
  `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
19
24
  and parallel processing with Resque or Sidekiq.
@@ -56,6 +61,7 @@ You can also set the `:row_sep` manually! Checkout Example 5 for unusual `:row_s
56
61
  #### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
57
62
  Please note how each hash contains only the keys for columns with non-null values.
58
63
 
64
+ ```ruby
59
65
  $ cat pets.csv
60
66
  first name,last name,dogs,cats,birds,fish
61
67
  Dan,McAllister,2,,,
@@ -71,21 +77,25 @@ Please note how each hash contains only the keys for columns with non-null value
71
77
  {:first_name=>"Miles", :last_name=>"O'Brian", :fish=>"21"},
72
78
  {:first_name=>"Nancy", :last_name=>"Homes", :dogs=>"2", :birds=>"1"}
73
79
  ]
80
+ ```
74
81
 
75
82
 
76
83
  #### Example 1b: How SmarterCSV processes CSV-files as chunks, returning arrays of hashes:
77
84
  Please note how the returned array contains two sub-arrays containing the chunks which were read, each chunk containing 2 hashes.
78
85
  In case the number of rows is not cleanly divisible by `:chunk_size`, the last chunk contains fewer hashes.
79
86
 
87
+ ```ruby
80
88
  > pets_by_owner = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}})
81
89
  => [ [ {:first=>"Dan", :last=>"McAllister", :dogs=>"2"}, {:first=>"Lucy", :last=>"Laweless", :cats=>"5"} ],
82
90
  [ {:first=>"Miles", :last=>"O'Brian", :fish=>"21"}, {:first=>"Nancy", :last=>"Homes", :dogs=>"2", :birds=>"1"} ]
83
91
  ]
92
+ ```
84
93
 
85
94
  #### Example 1c: How SmarterCSV processes CSV-files as chunks, and passes arrays of hashes to a given block:
86
95
  Please note how the given block is passed the data for each chunk as the parameter (array of hashes),
87
96
  and how the `process` method returns the number of chunks when called with a block
88
97
 
98
+ ```ruby
89
99
  > total_chunks = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}}) do |chunk|
90
100
  chunk.each do |h| # you can post-process the data from each row to your heart's content, and also create virtual attributes:
91
101
  h[:full_name] = [h[:first],h[:last]].join(' ') # create a virtual attribute
@@ -97,16 +107,16 @@ and how the `process` method returns the number of chunks when called with a blo
97
107
  [{:dogs=>"2", :full_name=>"Dan McAllister"}, {:cats=>"5", :full_name=>"Lucy Laweless"}]
98
108
  [{:fish=>"21", :full_name=>"Miles O'Brian"}, {:dogs=>"2", :birds=>"1", :full_name=>"Nancy Homes"}]
99
109
  => 2
100
-
110
+ ```
101
111
  #### Example 2: Reading a CSV-File in one Chunk, returning one Array of Hashes:
102
-
112
+ ```ruby
103
113
  filename = '/tmp/input_file.txt' # TAB delimited file, each row ending with Control-M
104
114
  recordsA = SmarterCSV.process(filename, {:col_sep => "\t", :row_sep => "\cM"}) # no block given
105
115
 
106
116
  => returns an array of hashes
107
-
117
+ ```
108
118
  #### Example 3: Populate a MySQL or MongoDB Database with SmarterCSV:
109
-
119
+ ```ruby
110
120
  # without using chunks:
111
121
  filename = '/tmp/some.csv'
112
122
  options = {:key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
@@ -117,9 +127,9 @@ and how the `process` method returns the number of chunks when called with a blo
117
127
  end
118
128
 
119
129
  => returns number of chunks / rows we processed
120
-
130
+ ```
121
131
  #### Example 4: Populate a MongoDB Database in Chunks of 100 records with SmarterCSV:
122
-
132
+ ```ruby
123
133
  # using chunks:
124
134
  filename = '/tmp/some.csv'
125
135
  options = {:chunk_size => 100, :key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
@@ -130,10 +140,10 @@ and how the `process` method returns the number of chunks when called with a blo
130
140
  end
131
141
 
132
142
  => returns number of chunks we processed
133
-
143
+ ```
134
144
 
135
145
  #### Example 5: Reading a CSV-like File, and Processing it with Resque:
136
-
146
+ ```ruby
137
147
  filename = '/tmp/strange_db_dump' # a file with CTRL-A as col_separator, and with CTRL-B\n as record_separator (hello iTunes!)
138
148
  options = {
139
149
  :col_sep => "\cA", :row_sep => "\cB\n", :comment_regexp => /^#/,
@@ -143,11 +153,11 @@ and how the `process` method returns the number of chunks when called with a blo
143
153
  Resque.enqueue( ResqueWorkerClass, chunk ) # pass chunks of CSV-data to Resque workers for parallel processing
144
154
  end
145
155
  => returns number of chunks
146
-
156
+ ```
147
157
  #### Example 6: Using Value Converters
148
158
 
149
159
  NOTE: If you use `key_mappings` and `value_converters`, make sure that the value converters reference the keys by their final mapped names, not the original names in the CSV file.
150
-
160
+ ```ruby
151
161
  $ cat spec/fixtures/with_dates.csv
152
162
  first,last,date,price
153
163
  Ben,Miller,10/30/1998,$44.50
@@ -180,7 +190,7 @@ NOTE: If you use `key_mappings` and `value_converters`, make sure that the value
180
190
  => 44.50
181
191
  data[0][:price].class
182
192
  => Float
183
-
193
+ ```
184
194
  ## Parallel Processing
185
195
  [Jack](https://github.com/xjlin0) wrote an interesting article about [Speeding up CSV parsing with parallel processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing)
186
196
 
@@ -207,7 +217,7 @@ The options and the block are optional.
207
217
  | :skip_lines | nil | how many lines to skip before the first line or header line is processed |
208
218
  | :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
209
219
  ---------------------------------------------------------------------------------------------------------------------------------
210
- | :col_sep | ',' | column separator |
220
+ | :col_sep | ',' | column separator, can be set to :auto |
211
221
  | :force_simple_split | false | force simple splitting on :col_sep character for non-standard CSV-files. |
212
222
  | | | e.g. when :quote_char is not properly escaped |
213
223
  | :row_sep | $/ ,"\n" | row separator or record separator , defaults to system's $/ , which defaults to "\n" |
@@ -222,7 +232,7 @@ The options and the block are optional.
222
232
  | | | user provided Array of header strings or symbols, to define |
223
233
  | | | what headers should be used, overriding any in-file headers. |
224
234
  | | | You can not combine the :user_provided_headers and :key_mapping options |
225
- | :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs |
235
+ | :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs or all empty values |
226
236
  | :verbose | false | print out line number while processing (to track down problems in input files) |
227
237
  ---------------------------------------------------------------------------------------------------------------------------------
228
238
 
@@ -248,7 +258,7 @@ And header and data validations will also be supported in 2.x
248
258
  ---------------------------------------------------------------------------------------------------------------------------------
249
259
  | :value_converters | nil | supply a hash of :header => KlassName; the class needs to implement self.convert(val)|
250
260
  | :remove_empty_values | true | remove values which have nil or empty strings as values |
251
- | :remove_zero_values | true | remove values which have a numeric value equal to zero / 0 |
261
+ | :remove_zero_values | false | remove values which have a numeric value equal to zero / 0 |
252
262
  | :remove_values_matching | nil | removes key/value pairs if value matches given regular expressions. e.g.: |
253
263
  | | | /^\$0\.0+$/ to match $0.00 , or /^#VALUE!$/ to match errors in Excel spreadsheets |
254
264
  | :convert_values_to_numeric | true | converts strings containing Integers or Floats to the appropriate class |
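Note on the two option-table hunks above: the README now documents `:remove_zero_values` as defaulting to `false` (previously `true`), and describes `:remove_empty_hashes` as also covering rows whose values are all empty. The diff does not say whether the first change is a behavior change or a documentation correction. The plain-Ruby sketch below only illustrates what the two descriptions mean for a row; `drop_zero_values` and `empty_row?` are hypothetical helpers written for this note and are not SmarterCSV's implementation.

```ruby
# Illustrative only: plain-Ruby stand-ins for the documented option semantics.
# Both helpers are hypothetical and are NOT part of SmarterCSV.

# Mimics the ":remove_zero_values" description:
# drop key/value pairs whose value is numerically zero.
def drop_zero_values(row)
  row.reject { |_key, value| value.is_a?(Numeric) && value.zero? }
end

# Mimics the new ":remove_empty_hashes" wording: a row counts as empty when
# it has no pairs at all, or when every value is nil or a blank string.
def empty_row?(row)
  row.values.all? { |value| value.nil? || value.to_s.strip.empty? }
end

rows = [
  { first: "Dan", dogs: 2, cats: 0 },
  { first: nil, dogs: "" },
  {}
]

kept          = rows.reject { |row| empty_row?(row) }
with_zeros    = kept                                      # zeros kept, as with a `false` setting
without_zeros = kept.map { |row| drop_zero_values(row) }  # zeros dropped, as with a `true` setting
```

Callers who depend on one behavior or the other may prefer to pass `:remove_zero_values` explicitly so that results do not depend on the documented default.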
@@ -259,19 +269,19 @@ And header and data validations will also be supported in 2.x
259
269
  #### NOTES about File Encodings:
260
270
  * if you have a CSV file which contains unicode characters, you can process it as follows:
261
271
 
262
-
272
+ ```ruby
263
273
  File.open(filename, "r:bom|utf-8") do |f|
264
274
  data = SmarterCSV.process(f);
265
275
  end
266
-
276
+ ```
267
277
  * if the CSV file with unicode characters is in a remote location, similarly you need to give the encoding as an option to the `open` call:
268
-
278
+ ```ruby
269
279
  require 'open-uri'
270
280
  file_location = 'http://your.remote.org/sample.csv'
271
281
  open(file_location, 'r:utf-8') do |f| # don't forget to specify the UTF-8 encoding!!
272
282
  data = SmarterCSV.process(f)
273
283
  end
274
-
284
+ ```
275
285
  #### NOTES about CSV Headers:
276
286
  * as this method parses CSV files, it is assumed that the first line of any file will contain a valid header
277
287
  * the first line with the CSV header may or may not be commented out according to the :comment_regexp
@@ -305,209 +315,27 @@ And header and data validations will also be supported in 2.x
305
315
  ## Installation
306
316
 
307
317
  Add this line to your application's Gemfile:
308
-
318
+ ```ruby
309
319
  gem 'smarter_csv'
310
-
320
+ ```
311
321
  And then execute:
312
-
322
+ ```ruby
313
323
  $ bundle
314
-
324
+ ```
315
325
  Or install it yourself as:
316
-
326
+ ```ruby
317
327
  $ gem install smarter_csv
318
-
319
- ## Upcoming
320
-
321
- Planned in the next releases:
322
- * programmatic header transformations
323
- * CSV command line
324
-
325
- ## Changes
326
-
327
- #### 1.2.6 (2018-11-13)
328
- * fixing error caused by calling f.close when we do not hand in a file
329
-
330
- #### 1.2.5 (2018-09-16)
331
- * fixing issue #136 with comments in CSV files
332
- * fixing error class hierarchy
333
-
334
- #### 1.2.4 (2018-08-06)
335
- * using Rails blank? if it's available
336
-
337
- #### 1.2.3 (2018-01-27)
338
- * fixed regression / test
339
- * fuxed quote_char interpolation for headers, but not data (thanks to Colin Petruno)
340
- * bugfix (thanks to Joshua Smith for reporting)
341
-
342
- #### 1.2.0 (2018-01-20)
343
- * add default validation that a header can only appear once
344
- * add option `required_headers`
345
-
346
- #### 1.1.5 (2017-11-05)
347
- * fix issue with invalid byte sequences in header (issue #103, thanks to Dave Myron)
348
- * fix issue with invalid byte sequences in multi-line data (thanks to Ivan Ushakov)
349
- * analyze only 500 characters by default when `:row_sep => :auto` is used.
350
- added option `row_sep_auto_chars` to change the default if necessary. (thanks to Matthieu Paret)
351
-
352
- #### 1.1.4 (2017-01-16)
353
- * fixing UTF-8 related bug which was introduced in 1.1.2 (thanks to Tirdad C.)
354
-
355
- #### 1.1.3 (2016-12-30)
356
- * added warning when options indicate UTF-8 processing, but input filehandle is not opened with r:UTF-8 option
357
-
358
- #### 1.1.2 (2016-12-29)
359
- * added option `invalid_byte_sequence` (thanks to polycarpou)
360
- * added comments on handling of UTF-8 encoding when opening from File vs. OpenURI (thanks to KevinColemanInc)
361
-
362
- #### 1.1.1 (2016-11-26)
363
- * added option to `skip_lines` (thanks to wal)
364
- * added option to `force_utf8` encoding (thanks to jordangraft)
365
- * bugfix if no headers in input data (thanks to esBeee)
366
- * ensure input file is closed (thanks to waldyr)
367
- * improved verbose output (thankd to benmaher)
368
- * improved documentation
369
-
370
- #### 1.1.0 (2015-07-26)
371
- * added feature :value_converters, which allows parsing of dates, money, and other things (thanks to Raphaël Bleuse, Lucas Camargo de Almeida, Alejandro)
372
- * added error if :headers_in_file is set to false, and no :user_provided_headers are given (thanks to innhyu)
373
- * added support to convert dashes to underscore characters in headers (thanks to César Camacho)
374
- * fixing automatic detection of \r\n line-endings (thanks to feens)
375
-
376
- #### 1.0.19 (2014-10-29)
377
- * added option :keep_original_headers to keep CSV-headers as-is (thanks to Benjamin Thouret)
378
-
379
- #### 1.0.18 (2014-10-27)
380
- * added support for multi-line fields / csv fields containing CR (thanks to Chris Hilton) (issue #31)
381
-
382
- #### 1.0.17 (2014-01-13)
383
- * added option to set :row_sep to :auto , for automatic detection of the row-separator (issue #22)
384
-
385
- #### 1.0.16 (2014-01-13)
386
- * :convert_values_to_numeric option can now be qualified with :except or :only (thanks to Hugo Lepetit)
387
- * removed deprecated `process_csv` method
388
-
389
- #### 1.0.15 (2013-12-07)
390
- * new option:
391
- * :remove_unmapped_keys to completely ignore columns which were not mapped with :key_mapping (thanks to Dave Sanders)
392
-
393
- #### 1.0.14 (2013-11-01)
394
- * added GPL-2 and MIT license to GEM spec file; if you need another license contact me
395
-
396
- #### 1.0.12 (2013-10-15)
397
- * added RSpec tests
398
-
399
- #### 1.0.11 (2013-09-28)
400
- * bugfix : fixed issue #18 - fixing issue with last chunk not being properly returned (thanks to Jordan Running)
401
- * added RSpec tests
402
-
403
- #### 1.0.10 (2013-06-26)
404
- * bugfix : fixed issue #14 - passing options along to CSV.parse (thanks to Marcos Zimmermann)
405
-
406
- #### 1.0.9 (2013-06-19)
407
- * bugfix : fixed issue #13 with negative integers and floats not being correctly converted (thanks to Graham Wetzler)
408
-
409
- #### 1.0.8 (2013-06-01)
410
-
411
- * bugfix : fixed issue with nil values in inputs with quote-char (thanks to Félix Bellanger)
412
- * new options:
413
- * :force_simple_split : to force simiple splitting on :col_sep character for non-standard CSV-files. e.g. without properly escaped :quote_char
414
- * :verbose : print out line number while processing (to track down problems in input files)
415
-
416
- #### 1.0.7 (2013-05-20)
417
-
418
- * allowing process to work with objects with a 'readline' method (thanks to taq)
419
- * added options:
420
- * :file_encoding : defaults to utf8 (thanks to MrTin, Paxa)
421
-
422
- #### 1.0.6 (2013-05-19)
423
-
424
- * bugfix : quoted fields are now correctly parsed
425
-
426
- #### 1.0.5 (2013-05-08)
427
-
428
- * bugfix : for :headers_in_file option
429
-
430
- #### 1.0.4 (2012-08-17)
431
-
432
- * renamed the following options:
433
- * :strip_whitepace_from_values => :strip_whitespace - removes leading/trailing whitespace from headers and values
434
-
435
- #### 1.0.3 (2012-08-16)
436
-
437
- * added the following options:
438
- * :strip_whitepace_from_values - removes leading/trailing whitespace from values
439
-
440
- #### 1.0.2 (2012-08-02)
441
-
442
- * added more options for dealing with headers:
443
- * :user_provided_headers ,user provided Array with header strings or symbols, to precisely define what the headers should be, overriding any in-file headers (default: nil)
444
- * :headers_in_file , if the file contains headers as the first line (default: true)
445
-
446
- #### 1.0.1 (2012-07-30)
447
-
448
- * added the following options:
449
- * :downcase_header
450
- * :strings_as_keys
451
- * :remove_zero_values
452
- * :remove_values_matching
453
- * :remove_empty_hashes
454
- * :convert_values_to_numeric
455
-
456
- * renamed the following options:
457
- * :remove_empty_fields => :remove_empty_values
458
-
459
-
460
- #### 1.0.0 (2012-07-29)
461
-
462
- * renamed `SmarterCSV.process_csv` to `SmarterCSV.process`.
463
-
464
- #### 1.0.0.pre1 (2012-07-29)
465
-
328
+ ```
329
+ ## [ChangeLog](./CHANGELOG.md)
466
330
 
467
331
  ## Reporting Bugs / Feature Requests
468
332
 
469
333
  Please [open an Issue on GitHub](https://github.com/tilo/smarter_csv/issues) if you have feedback, new feature requests, or want to report a bug. Thank you!
470
334
 
335
+ * please include a small sample CSV file
336
+ * please mention your version of SmarterCSV, Ruby, Rails
471
337
 
472
- ## Special Thanks
473
-
474
- Many thanks to people who have filed issues and sent comments.
475
- And a special thanks to those who contributed pull requests:
476
-
477
- * [Jack 0](https://github.com/xjlin0)
478
- * [Alejandro](https://github.com/agaviria)
479
- * [Lucas Camargo de Almeida](https://github.com/lcalmeida)
480
- * [Raphaël Bleuse](https://github.com/bleuse)
481
- * [feens](https://github.com/feens)
482
- * [César Camacho](https://github.com/chanko)
483
- * [innhyu](https://github.com/innhyu)
484
- * [Benjamin Thouret](https://github.com/benichu)
485
- * [Chris Hilton](https://github.com/chrismhilton)
486
- * [Sean Duckett](http://github.com/sduckett)
487
- * [Alex Ong](http://github.com/khaong)
488
- * [Martin Nilsson](http://github.com/MrTin)
489
- * [Eustáquio Rangel](http://github.com/taq)
490
- * [Pavel](http://github.com/paxa)
491
- * [Félix Bellanger](https://github.com/Keeguon)
492
- * [Graham Wetzler](https://github.com/grahamwetzler)
493
- * [Marcos G. Zimmermann](https://github.com/marcosgz)
494
- * [Jordan Running](https://github.com/jrunning)
495
- * [Dave Sanders](https://github.com/DaveSanders)
496
- * [Hugo Lepetit](https://github.com/giglemad)
497
- * [esBeee](https://github.com/esBeee)
498
- * [Waldyr de Souza](https://github.com/waldyr)
499
- * [Ben Maher](https://github.com/benmaher)
500
- * [Wal McConnell](https://github.com/wal)
501
- * [Jordan Graft](https://github.com/jordangraft)
502
- * [Michael](https://github.com/polycarpou)
503
- * [Kevin Coleman](https://github.com/KevinColemanInc)
504
- * [Tirdad C.](https://github.com/tridadc)
505
- * [Dave Myron](https://github.com/contentfree)
506
- * [Ivan Ushakov](https://github.com/IvanUshakov)
507
- * [Matthieu Paret](https://github.com/mtparet)
508
- * [Rohit Amarnath](https://github.com/ramarnat)
509
- * [Joshua Smith](https://github.com/enviable)
510
- * [Colin Petruno](https://github.com/colinpetruno)
338
+ ## [A Special Thanks to all Contributors!](CONTRIBUTORS.md) 🎉🎉🎉
511
339
 
512
340
 
513
341
  ## Contributing
data/Rakefile CHANGED
@@ -1,26 +1,19 @@
1
1
  #!/usr/bin/env rake
2
2
  require "bundler/gem_tasks"
3
-
4
3
  require 'rubygems'
5
4
  require 'rake'
6
-
7
5
  require 'rspec/core/rake_task'
8
6
 
7
+ task :default => :spec
8
+
9
9
  desc "Run RSpec"
10
10
  RSpec::Core::RakeTask.new do |t|
11
- t.verbose = false
11
+ # t.verbose = false
12
12
  end
13
13
 
14
- desc "Run specs for all test cases"
15
- task :spec_all do
16
- system "rake spec"
14
+ desc 'Run spec with coverage'
15
+ task :coverage do
16
+ ENV['COVERAGE'] = 'true'
17
+ Rake::Task['spec'].execute
18
+ `open coverage/index.html`
17
19
  end
18
-
19
- # task :spec_all do
20
- # %w[active_record data_mapper mongoid].each do |model_adapter|
21
- # puts "MODEL_ADAPTER = #{model_adapter}"
22
- # system "rake spec MODEL_ADAPTER=#{model_adapter}"
23
- # end
24
- # end
25
-
26
- task :default => :spec
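Note on the new `coverage` task: it calls `Rake::Task['spec'].execute` rather than `invoke`. In Rake, `execute` runs only the task's own actions and skips its prerequisites, while `invoke` runs prerequisites first and runs each task at most once. The sketch below shows the difference in isolation; the task names `:prepare` and `:checks` are made up for this illustration and do not appear in the Rakefile.

```ruby
require 'rake'

log = []

# Hypothetical tasks, defined only for this illustration.
Rake::Task.define_task(:prepare) { log << :prepare }
Rake::Task.define_task({ checks: :prepare }) { log << :checks }

# execute: runs the :checks action only; the :prepare prerequisite is skipped.
Rake::Task[:checks].execute
after_execute = log.dup

log.clear

# invoke: resolves prerequisites, so :prepare runs before :checks.
Rake::Task[:checks].invoke
after_invoke = log.dup
```

For the `spec` task as defined in this Rakefile, which declares no prerequisites, the two calls should behave the same; the distinction matters only if prerequisites are added later.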