smarter_csv 1.2.7 → 1.4.2

data/README.md CHANGED
@@ -1,19 +1,24 @@
1
- # SmarterCSV
2
-
3
- [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
4
1
 
5
- ---------------
6
2
  #### Service Announcement
7
3
 
8
- Work towards SmarterCSV 2.0 is on it's way, with much improved features, and more streamlined options.
4
+ * Work towards SmarterCSV 2.0 is still on it's way, with much improved features, and more streamlined options.
5
+ Please check the [2.0-develop branch](https://github.com/tilo/smarter_csv/blob/master/README.md), open any issues and pull requests with mention of v2.0.
6
+
7
+ * New versions of SmarterCSV 1.x will soon print a deprecation warning if you set :verbose to true
8
+ See below for list of deprecated options.
9
9
 
10
- Please check the 2.0-develop branch, and open issues marked v2.0 and leave your comments.
10
+ #### Restructured Branches
11
11
 
12
- New versions on the 1.2 branch will soon print a deprecation warning if you set :verbose to true
13
- See below for list of deprecated options.
12
+ * default branch is `main` for 1.x development
13
+ * 2.x development is on `2.0-development`
14
14
 
15
15
  ---------------
16
- #### SmarterCSV
16
+
17
+ # SmarterCSV
18
+
19
+ [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
20
+
21
+ #### SmarterCSV 1.x
17
22
 
18
23
  `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
19
24
  and parallel processing with Resque or Sidekiq.
@@ -56,6 +61,7 @@ You can also set the `:row_sep` manually! Checkout Example 5 for unusual `:row_s
56
61
  #### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
57
62
  Please note how each hash contains only the keys for columns with non-null values.
58
63
 
64
+ ```ruby
59
65
  $ cat pets.csv
60
66
  first name,last name,dogs,cats,birds,fish
61
67
  Dan,McAllister,2,,,
@@ -71,21 +77,25 @@ Please note how each hash contains only the keys for columns with non-null value
71
77
  {:first_name=>"Miles", :last_name=>"O'Brian", :fish=>"21"},
72
78
  {:first_name=>"Nancy", :last_name=>"Homes", :dogs=>"2", :birds=>"1"}
73
79
  ]
80
+ ```
74
81
 
75
82
 
76
83
  #### Example 1b: How SmarterCSV processes CSV-files as chunks, returning arrays of hashes:
77
84
  Please note how the returned array contains two sub-arrays containing the chunks which were read, each chunk containing 2 hashes.
78
85
  In case the number of rows is not cleanly divisible by `:chunk_size`, the last chunk contains fewer hashes.
79
86
 
87
+ ```ruby
80
88
  > pets_by_owner = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}})
81
89
  => [ [ {:first=>"Dan", :last=>"McAllister", :dogs=>"2"}, {:first=>"Lucy", :last=>"Laweless", :cats=>"5"} ],
82
90
  [ {:first=>"Miles", :last=>"O'Brian", :fish=>"21"}, {:first=>"Nancy", :last=>"Homes", :dogs=>"2", :birds=>"1"} ]
83
91
  ]
92
+ ```
84
93
 
85
94
  #### Example 1c: How SmarterCSV processes CSV-files as chunks, and passes arrays of hashes to a given block:
86
95
  Please note how the given block is passed the data for each chunk as the parameter (array of hashes),
87
96
  and how the `process` method returns the number of chunks when called with a block
88
97
 
98
+ ```ruby
89
99
  > total_chunks = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}}) do |chunk|
90
100
  chunk.each do |h| # you can post-process the data from each row to your heart's content, and also create virtual attributes:
91
101
  h[:full_name] = [h[:first],h[:last]].join(' ') # create a virtual attribute
@@ -97,16 +107,16 @@ and how the `process` method returns the number of chunks when called with a blo
97
107
  [{:dogs=>"2", :full_name=>"Dan McAllister"}, {:cats=>"5", :full_name=>"Lucy Laweless"}]
98
108
  [{:fish=>"21", :full_name=>"Miles O'Brian"}, {:dogs=>"2", :birds=>"1", :full_name=>"Nancy Homes"}]
99
109
  => 2
100
-
110
+ ```
101
111
  #### Example 2: Reading a CSV-File in one Chunk, returning one Array of Hashes:
102
-
112
+ ```ruby
103
113
  filename = '/tmp/input_file.txt' # TAB delimited file, each row ending with Control-M
104
114
  recordsA = SmarterCSV.process(filename, {:col_sep => "\t", :row_sep => "\cM"}) # no block given
105
115
 
106
116
  => returns an array of hashes
107
-
117
+ ```
108
118
  #### Example 3: Populate a MySQL or MongoDB Database with SmarterCSV:
109
-
119
+ ```ruby
110
120
  # without using chunks:
111
121
  filename = '/tmp/some.csv'
112
122
  options = {:key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
@@ -117,9 +127,9 @@ and how the `process` method returns the number of chunks when called with a blo
117
127
  end
118
128
 
119
129
  => returns number of chunks / rows we processed
120
-
130
+ ```
121
131
  #### Example 4: Populate a MongoDB Database in Chunks of 100 records with SmarterCSV:
122
-
132
+ ```ruby
123
133
  # using chunks:
124
134
  filename = '/tmp/some.csv'
125
135
  options = {:chunk_size => 100, :key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
@@ -130,10 +140,10 @@ and how the `process` method returns the number of chunks when called with a blo
130
140
  end
131
141
 
132
142
  => returns number of chunks we processed
133
-
143
+ ```
134
144
 
135
145
  #### Example 5: Reading a CSV-like File, and Processing it with Resque:
136
-
146
+ ```ruby
137
147
  filename = '/tmp/strange_db_dump' # a file with CRTL-A as col_separator, and with CTRL-B\n as record_separator (hello iTunes!)
138
148
  options = {
139
149
  :col_sep => "\cA", :row_sep => "\cB\n", :comment_regexp => /^#/,
@@ -143,11 +153,11 @@ and how the `process` method returns the number of chunks when called with a blo
143
153
  Resque.enque( ResqueWorkerClass, chunk ) # pass chunks of CSV-data to Resque workers for parallel processing
144
154
  end
145
155
  => returns number of chunks
146
-
156
+ ```
147
157
  #### Example 6: Using Value Converters
148
158
 
149
159
  NOTE: If you use `key_mappings` and `value_converters`, make sure that the value converters has references the keys based on the final mapped name, not the original name in the CSV file.
150
-
160
+ ```ruby
151
161
  $ cat spec/fixtures/with_dates.csv
152
162
  first,last,date,price
153
163
  Ben,Miller,10/30/1998,$44.50
@@ -180,7 +190,7 @@ NOTE: If you use `key_mappings` and `value_converters`, make sure that the value
180
190
  => 44.50
181
191
  data[0][:price].class
182
192
  => Float
183
-
193
+ ```
184
194
  ## Parallel Processing
185
195
  [Jack](https://github.com/xjlin0) wrote an interesting article about [Speeding up CSV parsing with parallel processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing)
186
196
 
@@ -207,7 +217,7 @@ The options and the block are optional.
207
217
  | :skip_lines | nil | how many lines to skip before the first line or header line is processed |
208
218
  | :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
209
219
  ---------------------------------------------------------------------------------------------------------------------------------
210
- | :col_sep | ',' | column separator |
220
+ | :col_sep | ',' | column separator, can be set to :auto |
211
221
  | :force_simple_split | false | force simple splitting on :col_sep character for non-standard CSV-files. |
212
222
  | | | e.g. when :quote_char is not properly escaped |
213
223
  | :row_sep | $/ ,"\n" | row separator or record separator , defaults to system's $/ , which defaults to "\n" |
@@ -222,7 +232,7 @@ The options and the block are optional.
222
232
  | | | user provided Array of header strings or symbols, to define |
223
233
  | | | what headers should be used, overriding any in-file headers. |
224
234
  | | | You can not combine the :user_provided_headers and :key_mapping options |
225
- | :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs |
235
+ | :remove_empty_hashes | true | remove / ignore any hashes which don't have any key/value pairs or all empty values |
226
236
  | :verbose | false | print out line number while processing (to track down problems in input files) |
227
237
  ---------------------------------------------------------------------------------------------------------------------------------
228
238
 
@@ -248,7 +258,7 @@ And header and data validations will also be supported in 2.x
248
258
  ---------------------------------------------------------------------------------------------------------------------------------
249
259
  | :value_converters | nil | supply a hash of :header => KlassName; the class needs to implement self.convert(val)|
250
260
  | :remove_empty_values | true | remove values which have nil or empty strings as values |
251
- | :remove_zero_values | true | remove values which have a numeric value equal to zero / 0 |
261
+ | :remove_zero_values | false | remove values which have a numeric value equal to zero / 0 |
252
262
  | :remove_values_matching | nil | removes key/value pairs if value matches given regular expressions. e.g.: |
253
263
  | | | /^\$0\.0+$/ to match $0.00 , or /^#VALUE!$/ to match errors in Excel spreadsheets |
254
264
  | :convert_values_to_numeric | true | converts strings containing Integers or Floats to the appropriate class |
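Note on the hunk above: the documented default of `:remove_zero_values` changes from `true` to `false`, so callers who relied on zero values being dropped would presumably need to pass `:remove_zero_values => true` explicitly after upgrading. The plain-Ruby sketch below only illustrates the visible difference on a single row. It is not SmarterCSV's implementation, and `drop_zero_values` is a hypothetical helper name.

```ruby
# Plain-Ruby illustration only; this is NOT how SmarterCSV implements the option.
# `drop_zero_values` is a hypothetical helper.
def drop_zero_values(row)
  # Remove key/value pairs whose value is numerically zero.
  row.reject { |_key, value| value.is_a?(Numeric) && value.zero? }
end

row = { first_name: 'Dan', dogs: 2, cats: 0 }

like_old_default = drop_zero_values(row) # comparable to :remove_zero_values => true
like_new_default = row                   # comparable to :remove_zero_values => false

p like_old_default
p like_new_default
```

With the old default the `:cats` key disappears from the row; with the new default it is kept with the value `0`.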
@@ -259,19 +269,19 @@ And header and data validations will also be supported in 2.x
259
269
  #### NOTES about File Encodings:
260
270
  * if you have a CSV file which contains unicode characters, you can process it as follows:
261
271
 
262
-
272
+ ```ruby
263
273
  File.open(filename, "r:bom|utf-8") do |f|
264
274
  data = SmarterCSV.process(f);
265
275
  end
266
-
276
+ ```
267
277
  * if the CSV file with unicode characters is in a remote location, similarly you need to give the encoding as an option to the `open` call:
268
-
278
+ ```ruby
269
279
  require 'open-uri'
270
280
  file_location = 'http://your.remote.org/sample.csv'
271
281
  open(file_location, 'r:utf-8') do |f| # don't forget to specify the UTF-8 encoding!!
272
282
  data = SmarterCSV.process(f)
273
283
  end
274
-
284
+ ```
275
285
  #### NOTES about CSV Headers:
276
286
  * as this method parses CSV files, it is assumed that the first line of any file will contain a valid header
277
287
  * the first line with the CSV header may or may not be commented out according to the :comment_regexp
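Note on the encoding examples in this hunk: the `"r:bom|utf-8"` mode tells Ruby to consume a leading UTF-8 byte order mark when the file is opened, which matters because the BOM would otherwise become part of the first header name. The sketch below uses only the standard library and a hypothetical sample file to show the difference; it does not call SmarterCSV.

```ruby
require 'tempfile'

# Returns the first line of the file at `path`, read with the given open mode.
def first_line(path, mode)
  File.open(path, mode) { |f| f.readline.chomp }
end

# Hypothetical sample data: a UTF-8 BOM followed by a CSV header and one row.
tmp = Tempfile.new(['bom_sample', '.csv'])
tmp.close
File.binwrite(tmp.path, "\uFEFFfirst name,last name\nDan,McAllister\n")

with_bom    = first_line(tmp.path, 'r:utf-8')     # BOM stays in the data
without_bom = first_line(tmp.path, 'r:bom|utf-8') # BOM is consumed on open

puts with_bom.start_with?("\uFEFF")
puts without_bom.start_with?("\uFEFF")

tmp.unlink
```

Opening the handle with the BOM-aware mode before passing it to `SmarterCSV.process` keeps the first header free of the invisible BOM character.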
@@ -305,209 +315,27 @@ And header and data validations will also be supported in 2.x
305
315
  ## Installation
306
316
 
307
317
  Add this line to your application's Gemfile:
308
-
318
+ ```ruby
309
319
  gem 'smarter_csv'
310
-
320
+ ```
311
321
  And then execute:
312
-
322
+ ```ruby
313
323
  $ bundle
314
-
324
+ ```
315
325
  Or install it yourself as:
316
-
326
+ ```ruby
317
327
  $ gem install smarter_csv
318
-
319
- ## Upcoming
320
-
321
- Planned in the next releases:
322
- * programmatic header transformations
323
- * CSV command line
324
-
325
- ## Changes
326
-
327
- #### 1.2.6 (2018-11-13)
328
- * fixing error caused by calling f.close when we do not hand in a file
329
-
330
- #### 1.2.5 (2018-09-16)
331
- * fixing issue #136 with comments in CSV files
332
- * fixing error class hierarchy
333
-
334
- #### 1.2.4 (2018-08-06)
335
- * using Rails blank? if it's available
336
-
337
- #### 1.2.3 (2018-01-27)
338
- * fixed regression / test
339
- * fuxed quote_char interpolation for headers, but not data (thanks to Colin Petruno)
340
- * bugfix (thanks to Joshua Smith for reporting)
341
-
342
- #### 1.2.0 (2018-01-20)
343
- * add default validation that a header can only appear once
344
- * add option `required_headers`
345
-
346
- #### 1.1.5 (2017-11-05)
347
- * fix issue with invalid byte sequences in header (issue #103, thanks to Dave Myron)
348
- * fix issue with invalid byte sequences in multi-line data (thanks to Ivan Ushakov)
349
- * analyze only 500 characters by default when `:row_sep => :auto` is used.
350
- added option `row_sep_auto_chars` to change the default if necessary. (thanks to Matthieu Paret)
351
-
352
- #### 1.1.4 (2017-01-16)
353
- * fixing UTF-8 related bug which was introduced in 1.1.2 (thanks to Tirdad C.)
354
-
355
- #### 1.1.3 (2016-12-30)
356
- * added warning when options indicate UTF-8 processing, but input filehandle is not opened with r:UTF-8 option
357
-
358
- #### 1.1.2 (2016-12-29)
359
- * added option `invalid_byte_sequence` (thanks to polycarpou)
360
- * added comments on handling of UTF-8 encoding when opening from File vs. OpenURI (thanks to KevinColemanInc)
361
-
362
- #### 1.1.1 (2016-11-26)
363
- * added option to `skip_lines` (thanks to wal)
364
- * added option to `force_utf8` encoding (thanks to jordangraft)
365
- * bugfix if no headers in input data (thanks to esBeee)
366
- * ensure input file is closed (thanks to waldyr)
367
- * improved verbose output (thankd to benmaher)
368
- * improved documentation
369
-
370
- #### 1.1.0 (2015-07-26)
371
- * added feature :value_converters, which allows parsing of dates, money, and other things (thanks to Raphaël Bleuse, Lucas Camargo de Almeida, Alejandro)
372
- * added error if :headers_in_file is set to false, and no :user_provided_headers are given (thanks to innhyu)
373
- * added support to convert dashes to underscore characters in headers (thanks to César Camacho)
374
- * fixing automatic detection of \r\n line-endings (thanks to feens)
375
-
376
- #### 1.0.19 (2014-10-29)
377
- * added option :keep_original_headers to keep CSV-headers as-is (thanks to Benjamin Thouret)
378
-
379
- #### 1.0.18 (2014-10-27)
380
- * added support for multi-line fields / csv fields containing CR (thanks to Chris Hilton) (issue #31)
381
-
382
- #### 1.0.17 (2014-01-13)
383
- * added option to set :row_sep to :auto , for automatic detection of the row-separator (issue #22)
384
-
385
- #### 1.0.16 (2014-01-13)
386
- * :convert_values_to_numeric option can now be qualified with :except or :only (thanks to Hugo Lepetit)
387
- * removed deprecated `process_csv` method
388
-
389
- #### 1.0.15 (2013-12-07)
390
- * new option:
391
- * :remove_unmapped_keys to completely ignore columns which were not mapped with :key_mapping (thanks to Dave Sanders)
392
-
393
- #### 1.0.14 (2013-11-01)
394
- * added GPL-2 and MIT license to GEM spec file; if you need another license contact me
395
-
396
- #### 1.0.12 (2013-10-15)
397
- * added RSpec tests
398
-
399
- #### 1.0.11 (2013-09-28)
400
- * bugfix : fixed issue #18 - fixing issue with last chunk not being properly returned (thanks to Jordan Running)
401
- * added RSpec tests
402
-
403
- #### 1.0.10 (2013-06-26)
404
- * bugfix : fixed issue #14 - passing options along to CSV.parse (thanks to Marcos Zimmermann)
405
-
406
- #### 1.0.9 (2013-06-19)
407
- * bugfix : fixed issue #13 with negative integers and floats not being correctly converted (thanks to Graham Wetzler)
408
-
409
- #### 1.0.8 (2013-06-01)
410
-
411
- * bugfix : fixed issue with nil values in inputs with quote-char (thanks to Félix Bellanger)
412
- * new options:
413
- * :force_simple_split : to force simiple splitting on :col_sep character for non-standard CSV-files. e.g. without properly escaped :quote_char
414
- * :verbose : print out line number while processing (to track down problems in input files)
415
-
416
- #### 1.0.7 (2013-05-20)
417
-
418
- * allowing process to work with objects with a 'readline' method (thanks to taq)
419
- * added options:
420
- * :file_encoding : defaults to utf8 (thanks to MrTin, Paxa)
421
-
422
- #### 1.0.6 (2013-05-19)
423
-
424
- * bugfix : quoted fields are now correctly parsed
425
-
426
- #### 1.0.5 (2013-05-08)
427
-
428
- * bugfix : for :headers_in_file option
429
-
430
- #### 1.0.4 (2012-08-17)
431
-
432
- * renamed the following options:
433
- * :strip_whitepace_from_values => :strip_whitespace - removes leading/trailing whitespace from headers and values
434
-
435
- #### 1.0.3 (2012-08-16)
436
-
437
- * added the following options:
438
- * :strip_whitepace_from_values - removes leading/trailing whitespace from values
439
-
440
- #### 1.0.2 (2012-08-02)
441
-
442
- * added more options for dealing with headers:
443
- * :user_provided_headers ,user provided Array with header strings or symbols, to precisely define what the headers should be, overriding any in-file headers (default: nil)
444
- * :headers_in_file , if the file contains headers as the first line (default: true)
445
-
446
- #### 1.0.1 (2012-07-30)
447
-
448
- * added the following options:
449
- * :downcase_header
450
- * :strings_as_keys
451
- * :remove_zero_values
452
- * :remove_values_matching
453
- * :remove_empty_hashes
454
- * :convert_values_to_numeric
455
-
456
- * renamed the following options:
457
- * :remove_empty_fields => :remove_empty_values
458
-
459
-
460
- #### 1.0.0 (2012-07-29)
461
-
462
- * renamed `SmarterCSV.process_csv` to `SmarterCSV.process`.
463
-
464
- #### 1.0.0.pre1 (2012-07-29)
465
-
328
+ ```
329
+ ## [ChangeLog](./CHANGELOG.md)
466
330
 
467
331
  ## Reporting Bugs / Feature Requests
468
332
 
469
333
  Please [open an Issue on GitHub](https://github.com/tilo/smarter_csv/issues) if you have feedback, new feature requests, or want to report a bug. Thank you!
470
334
 
335
+ * please include a small sample CSV file
336
+ * please mention your version of SmarterCSV, Ruby, Rails
471
337
 
472
- ## Special Thanks
473
-
474
- Many thanks to people who have filed issues and sent comments.
475
- And a special thanks to those who contributed pull requests:
476
-
477
- * [Jack 0](https://github.com/xjlin0)
478
- * [Alejandro](https://github.com/agaviria)
479
- * [Lucas Camargo de Almeida](https://github.com/lcalmeida)
480
- * [Raphaël Bleuse](https://github.com/bleuse)
481
- * [feens](https://github.com/feens)
482
- * [César Camacho](https://github.com/chanko)
483
- * [innhyu](https://github.com/innhyu)
484
- * [Benjamin Thouret](https://github.com/benichu)
485
- * [Chris Hilton](https://github.com/chrismhilton)
486
- * [Sean Duckett](http://github.com/sduckett)
487
- * [Alex Ong](http://github.com/khaong)
488
- * [Martin Nilsson](http://github.com/MrTin)
489
- * [Eustáquio Rangel](http://github.com/taq)
490
- * [Pavel](http://github.com/paxa)
491
- * [Félix Bellanger](https://github.com/Keeguon)
492
- * [Graham Wetzler](https://github.com/grahamwetzler)
493
- * [Marcos G. Zimmermann](https://github.com/marcosgz)
494
- * [Jordan Running](https://github.com/jrunning)
495
- * [Dave Sanders](https://github.com/DaveSanders)
496
- * [Hugo Lepetit](https://github.com/giglemad)
497
- * [esBeee](https://github.com/esBeee)
498
- * [Waldyr de Souza](https://github.com/waldyr)
499
- * [Ben Maher](https://github.com/benmaher)
500
- * [Wal McConnell](https://github.com/wal)
501
- * [Jordan Graft](https://github.com/jordangraft)
502
- * [Michael](https://github.com/polycarpou)
503
- * [Kevin Coleman](https://github.com/KevinColemanInc)
504
- * [Tirdad C.](https://github.com/tridadc)
505
- * [Dave Myron](https://github.com/contentfree)
506
- * [Ivan Ushakov](https://github.com/IvanUshakov)
507
- * [Matthieu Paret](https://github.com/mtparet)
508
- * [Rohit Amarnath](https://github.com/ramarnat)
509
- * [Joshua Smith](https://github.com/enviable)
510
- * [Colin Petruno](https://github.com/colinpetruno)
338
+ ## [A Special Thanks to all Contributors!](CONTRIBUTORS.md) 🎉🎉🎉
511
339
 
512
340
 
513
341
  ## Contributing
data/Rakefile CHANGED
@@ -1,26 +1,19 @@
1
1
  #!/usr/bin/env rake
2
2
  require "bundler/gem_tasks"
3
-
4
3
  require 'rubygems'
5
4
  require 'rake'
6
-
7
5
  require 'rspec/core/rake_task'
8
6
 
7
+ task :default => :spec
8
+
9
9
  desc "Run RSpec"
10
10
  RSpec::Core::RakeTask.new do |t|
11
- t.verbose = false
11
+ # t.verbose = false
12
12
  end
13
13
 
14
- desc "Run specs for all test cases"
15
- task :spec_all do
16
- system "rake spec"
14
+ desc 'Run spec with coverage'
15
+ task :coverage do
16
+ ENV['COVERAGE'] = 'true'
17
+ Rake::Task['spec'].execute
18
+ `open coverage/index.html`
17
19
  end
18
-
19
- # task :spec_all do
20
- # %w[active_record data_mapper mongoid].each do |model_adapter|
21
- # puts "MODEL_ADAPTER = #{model_adapter}"
22
- # system "rake spec MODEL_ADAPTER=#{model_adapter}"
23
- # end
24
- # end
25
-
26
- task :default => :spec
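Note on the new `coverage` task: it calls `Rake::Task['spec'].execute` rather than `invoke`. In Rake, `execute` runs only the task's own actions, while `invoke` runs prerequisites first and runs a task at most once per process. The sketch below demonstrates that difference with hypothetical task names (`:prepare`, `:demo`) that are not part of the gem's Rakefile.

```ruby
require 'rake'

# Hypothetical tasks, defined only to illustrate the two call styles.
log = []
Rake::Task.define_task(:prepare) { log << :prepare }
Rake::Task.define_task(:demo => :prepare) { log << :demo }

Rake::Task[:demo].execute # runs the :demo action only, skipping :prepare
Rake::Task[:demo].invoke  # runs :prepare first, then :demo
Rake::Task[:demo].invoke  # does nothing: the task is already marked as invoked

p log
```

Since the `spec` task in this Rakefile declares no prerequisites, `execute` and a first `invoke` should behave the same there. Also note that the `open coverage/index.html` line relies on the `open` command, which is typically only available on macOS.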