smarter_csv 1.4.0 → 1.5.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: c8236e4cc8f0081efd9b74f12ad4b5342707d0a2f883414b07538160910008a3
- data.tar.gz: b04a53b0030bf6c623aa19fb15c0c6c5ca123ce2ff85d47f176884fffa0f9811
+ metadata.gz: 352cf76ac0cd6b2eb4a1cac9e5056aa6e92a8a61b627d7c922e063dcf82ad675
+ data.tar.gz: 0c6e3ab1eaee02a9361fe0b418191244d81bc558dbcd10ee1d2c5f15390d91b6
  SHA512:
- metadata.gz: f2ddaa7bf44362c8bb4439289172d40b6ca926a67a8a35fb335473ddf7349658a629f3008ece5314c6bc5fa17145a2ae89b4d706b9c130a1642a51f2434d5e21
- data.tar.gz: b48908b657a07589886873fe251263dabbe6e2333a1fc025dfede085841544458d4498ba1d288a4a7c0de3875d1c14631cc584b2a1cb7fd0be1543b758781dd3
+ metadata.gz: 3763cd8e493e7da6560e8ce9adc58bd411f745f5af119c97d70c02667a524ccb1055b5c640ef795c3cb25b79fa5e17800018da6e76f9d358afa1c7a3513caae3
+ data.tar.gz: '039183fdece20e80007f3f0d3e395fac8d273df6c21928a35b685ade5915503b3c55918fc7015ead0a7d768a545bb76bcfb1087006af6118dc3d22df83e68ddb'
data/.gitignore CHANGED
@@ -6,3 +6,5 @@
  .bundle
  Gemfile.lock
  pkg/*
+ coverage/*
+ .DS_Store
data/CHANGELOG.md CHANGED
@@ -1,14 +1,36 @@

  # SmarterCSV 1.x Change Log

- ## 1.4.0 (2022-01-11)
+ ## 1.5.1 (2022-04-26)
+ * added raising of `KeyMappingError` if `key_mapping` refers to a non-existent key
+ * added option `duplicate_header_suffix` (thanks to Skye Shaw)
+ When given a non-nil string, it uses the suffix to append numbering 2..n to duplicate headers.
+ If your code will need to process arbitrary CSV files, please set `duplicate_header_suffix`.
+
+ ## 1.5.0 (2022-04-25)
+ * fixed bug with trailing col_sep characters, introduced in 1.4.0
+ * Fix deprecation warning in Ruby 3.0.3 / $INPUT_RECORD_SEPARATOR (thanks to Joel Fouse)
+
+ * changed default for `comment_regexp` to be `nil` for a safer default behavior (thanks to David Lazar)
+ **Note**
+ This no longer assumes that lines starting with `#` are comments.
+ If you want to treat lines starting with '#' as comments, use `comment_regexp: /\A#/`
+
+ ## 1.4.2 (2022-02-12)
+ * fixed issue with simplecov
+
+ ## 1.4.1 (2022-02-12) (PULLED)
+ * minor fix: also support `col_sep: :auto`
+ * added simplecov
+
+ ## 1.4.0 (2022-02-11)
  * dropped GPL license, smarter_csv is now only using the MIT License
  * added experimental option `col_sep: 'auto'` to auto-detect the column separator (issue #183)
  The default behavior is still to assume `,` is the column separator.
  * fixed buggy behavior when using `remove_empty_values: false` (issue #168)
  * fixed Ruby 3.0 deprecation

- ## 1.3.0 (2022-01-06) Breaking code change if you used `--key_mappings`
+ ## 1.3.0 (2022-02-06) Breaking code change if you used `--key_mappings`
  * fix bug for key_mappings (issue #181)
  The values of the `key_mappings` hash will now be used "as is", and no longer forced to be symbols
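
The `comment_regexp` change in 1.5.0 can be illustrated with a small sketch in plain Ruby. It does not call SmarterCSV: `strip_comments` is a hypothetical helper that only mimics the behavior the change log describes, where `nil` (the new default) keeps every line and lines are dropped only when a regexp such as `/\A#/` is passed in.

```ruby
# Hypothetical helper, not part of SmarterCSV: drops the lines that match
# comment_regexp, or keeps everything when comment_regexp is nil.
def strip_comments(lines, comment_regexp = nil)
  return lines if comment_regexp.nil?

  lines.reject { |line| line.match?(comment_regexp) }
end

lines = ["#id,name", "1,Ann", "# a note", "2,Bob"]

kept_by_default = strip_comments(lines)         # all four lines are kept
kept_with_regexp = strip_comments(lines, /\A#/) # only the two data lines remain
```

In Ruby, `\A` anchors at the start of the string, while `^` also matches after any embedded newline, which is presumably why the change log recommends `/\A#/` over the old `/^#/`.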
 
data/CONTRIBUTORS.md ADDED
@@ -0,0 +1,46 @@
+ # A Big Thank You to all the Contributors!!
+
+
+ A Big Thank you to everyone who filed issues, sent comments, and who contributed with pull requests:
+
+ * [Jack 0](https://github.com/xjlin0)
+ * [Alejandro](https://github.com/agaviria)
+ * [Lucas Camargo de Almeida](https://github.com/lcalmeida)
+ * [Raphaël Bleuse](https://github.com/bleuse)
+ * [feens](https://github.com/feens)
+ * [César Camacho](https://github.com/chanko)
+ * [innhyu](https://github.com/innhyu)
+ * [Benjamin Thouret](https://github.com/benichu)
+ * [Chris Hilton](https://github.com/chrismhilton)
+ * [Sean Duckett](http://github.com/sduckett)
+ * [Alex Ong](http://github.com/khaong)
+ * [Martin Nilsson](http://github.com/MrTin)
+ * [Eustáquio Rangel](http://github.com/taq)
+ * [Pavel](http://github.com/paxa)
+ * [Félix Bellanger](https://github.com/Keeguon)
+ * [Graham Wetzler](https://github.com/grahamwetzler)
+ * [Marcos G. Zimmermann](https://github.com/marcosgz)
+ * [Jordan Running](https://github.com/jrunning)
+ * [Dave Sanders](https://github.com/DaveSanders)
+ * [Hugo Lepetit](https://github.com/giglemad)
+ * [esBeee](https://github.com/esBeee)
+ * [Waldyr de Souza](https://github.com/waldyr)
+ * [Ben Maher](https://github.com/benmaher)
+ * [Wal McConnell](https://github.com/wal)
+ * [Jordan Graft](https://github.com/jordangraft)
+ * [Michael](https://github.com/polycarpou)
+ * [Kevin Coleman](https://github.com/KevinColemanInc)
+ * [Tirdad C.](https://github.com/tridadc)
+ * [Dave Myron](https://github.com/contentfree)
+ * [Ivan Ushakov](https://github.com/IvanUshakov)
+ * [Matthieu Paret](https://github.com/mtparet)
+ * [Rohit Amarnath](https://github.com/ramarnat)
+ * [Joshua Smith](https://github.com/enviable)
+ * [Colin Petruno](https://github.com/colinpetruno)
+ * [Diego Salido](https://github.com/salidux)
+ * [Elie](https://github.com/elieteyssedou)
+ * [Chris Wong](https://github.com/lightwave)
+ * [Olle Jonsson](https://github.com/olleolleolle)
+ * [Nicolas Guillemain](https://github.com/Viiruus)
+ * [Sp6](https://github.com/sp6)
+ * [Joel Fouse](https://github.com/jfouse)
data/LICENSE.txt CHANGED
@@ -1,6 +1,6 @@
  The MIT License (MIT)

- Copyright (c) 2022 Tilo Sloboda
+ Copyright (c) 2012..2022 Tilo Sloboda

  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -1,17 +1,23 @@
- # SmarterCSV
-
- [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)

- ---------------
  #### Service Announcement

  * Work towards SmarterCSV 2.0 is still on its way, with much improved features, and more streamlined options.
- Please check the 2.0-develop branch, open any issues and pull requests with mention of v2.0.
+ Please check the [2.0-develop branch](https://github.com/tilo/smarter_csv/blob/master/README.md), open any issues and pull requests with mention of v2.0.

- * New versions on the 1.2 branch will soon print a deprecation warning if you set :verbose to true
+ * New versions of SmarterCSV 1.x will soon print a deprecation warning if you set :verbose to true
  See below for list of deprecated options.

+ #### Restructured Branches
+
+ * default branch is `main` for 1.x development
+ * 2.x development is on `2.0-development`
+
  ---------------
+
+ # SmarterCSV
+
+ [![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.svg?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
+
  #### SmarterCSV 1.x

  `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord,
@@ -55,6 +61,7 @@ You can also set the `:row_sep` manually! Checkout Example 5 for unusual `:row_s
  #### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
  Please note how each hash contains only the keys for columns with non-null values.

+ ```ruby
  $ cat pets.csv
  first name,last name,dogs,cats,birds,fish
  Dan,McAllister,2,,,
@@ -70,21 +77,25 @@ Please note how each hash contains only the keys for columns with non-null value
  {:first_name=>"Miles", :last_name=>"O'Brian", :fish=>"21"},
  {:first_name=>"Nancy", :last_name=>"Homes", :dogs=>"2", :birds=>"1"}
  ]
+ ```


  #### Example 1b: How SmarterCSV processes CSV-files as chunks, returning arrays of hashes:
  Please note how the returned array contains two sub-arrays containing the chunks which were read, each chunk containing 2 hashes.
  In case the number of rows is not cleanly divisible by `:chunk_size`, the last chunk contains fewer hashes.

+ ```ruby
  > pets_by_owner = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}})
  => [ [ {:first=>"Dan", :last=>"McAllister", :dogs=>"2"}, {:first=>"Lucy", :last=>"Laweless", :cats=>"5"} ],
  [ {:first=>"Miles", :last=>"O'Brian", :fish=>"21"}, {:first=>"Nancy", :last=>"Homes", :dogs=>"2", :birds=>"1"} ]
  ]
+ ```
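
The chunking behavior described in Example 1b can be reproduced on an in-memory array with Ruby's `Enumerable#each_slice`. This is only an illustration, not SmarterCSV's implementation, and the `rows` data is made up for the sketch.

```ruby
# Five made-up rows, so that the row count is not divisible by the chunk size.
rows = [
  { first: "Dan" }, { first: "Lucy" }, { first: "Miles" },
  { first: "Nancy" }, { first: "Zoe" }
]

chunk_size = 2

# each_slice groups consecutive rows; the final group holds the remainder.
chunks = rows.each_slice(chunk_size).to_a
chunk_sizes = chunks.map(&:size)
```

With five rows and a chunk size of 2, the first two chunks hold two hashes each and the last chunk holds the single remaining hash.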
 
  #### Example 1c: How SmarterCSV processes CSV-files as chunks, and passes arrays of hashes to a given block:
  Please note how the given block is passed the data for each chunk as the parameter (array of hashes),
  and how the `process` method returns the number of chunks when called with a block

+ ```ruby
  > total_chunks = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}}) do |chunk|
      chunk.each do |h| # you can post-process the data from each row to your heart's content, and also create virtual attributes:
        h[:full_name] = [h[:first],h[:last]].join(' ') # create a virtual attribute
@@ -96,16 +107,16 @@ and how the `process` method returns the number of chunks when called with a blo
  [{:dogs=>"2", :full_name=>"Dan McAllister"}, {:cats=>"5", :full_name=>"Lucy Laweless"}]
  [{:fish=>"21", :full_name=>"Miles O'Brian"}, {:dogs=>"2", :birds=>"1", :full_name=>"Nancy Homes"}]
  => 2
-
+ ```
  #### Example 2: Reading a CSV-File in one Chunk, returning one Array of Hashes:
-
+ ```ruby
  filename = '/tmp/input_file.txt' # TAB delimited file, each row ending with Control-M
  recordsA = SmarterCSV.process(filename, {:col_sep => "\t", :row_sep => "\cM"}) # no block given

  => returns an array of hashes
-
+ ```
  #### Example 3: Populate a MySQL or MongoDB Database with SmarterCSV:
-
+ ```ruby
  # without using chunks:
  filename = '/tmp/some.csv'
  options = {:key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
@@ -116,9 +127,9 @@ and how the `process` method returns the number of chunks when called with a blo
  end

  => returns number of chunks / rows we processed
-
+ ```
  #### Example 4: Populate a MongoDB Database in Chunks of 100 records with SmarterCSV:
-
+ ```ruby
  # using chunks:
  filename = '/tmp/some.csv'
  options = {:chunk_size => 100, :key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
@@ -129,10 +140,10 @@ and how the `process` method returns the number of chunks when called with a blo
  end

  => returns number of chunks we processed
-
+ ```

  #### Example 5: Reading a CSV-like File, and Processing it with Resque:
-
+ ```ruby
  filename = '/tmp/strange_db_dump' # a file with CTRL-A as col_separator, and with CTRL-B\n as record_separator (hello iTunes!)
  options = {
    :col_sep => "\cA", :row_sep => "\cB\n", :comment_regexp => /^#/,
@@ -142,11 +153,11 @@ and how the `process` method returns the number of chunks when called with a blo
    Resque.enqueue( ResqueWorkerClass, chunk ) # pass chunks of CSV-data to Resque workers for parallel processing
  end
  => returns number of chunks
-
+ ```
  #### Example 6: Using Value Converters

  NOTE: If you use `key_mappings` and `value_converters`, make sure that the value converters reference the keys by their final mapped names, not by the original names in the CSV file.
-
+ ```ruby
  $ cat spec/fixtures/with_dates.csv
  first,last,date,price
  Ben,Miller,10/30/1998,$44.50
@@ -179,7 +190,7 @@ NOTE: If you use `key_mappings` and `value_converters`, make sure that the value
  => 44.50
  data[0][:price].class
  => Float
-
+ ```
  ## Parallel Processing
  [Jack](https://github.com/xjlin0) wrote an interesting article about [Speeding up CSV parsing with parallel processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing)

@@ -204,9 +215,9 @@ The options and the block are optional.
  | :invalid_byte_sequence | '' | what to replace invalid byte sequences with |
  | :force_utf8 | false | force UTF-8 encoding of all lines (including headers) in the CSV file |
  | :skip_lines | nil | how many lines to skip before the first line or header line is processed |
- | :comment_regexp | /^#/ | regular expression which matches comment lines (see NOTE about the CSV header) |
+ | :comment_regexp | nil | regular expression to ignore comment lines (see NOTE on CSV header), e.g. /\A#/ |
  ---------------------------------------------------------------------------------------------------------------------------------
- | :col_sep | ',' | column separator, can be set to 'auto' |
+ | :col_sep | ',' | column separator, can be set to :auto |
  | :force_simple_split | false | force simple splitting on :col_sep character for non-standard CSV-files. |
  | | | e.g. when :quote_char is not properly escaped |
  | :row_sep | $/ ,"\n" | row separator or record separator , defaults to system's $/ , which defaults to "\n" |
@@ -217,6 +228,7 @@ The options and the block are optional.
  | :headers_in_file | true | Whether or not the file contains headers as the first line. |
  | | | Important if the file does not contain headers, |
  | | | otherwise you would lose the first line of data. |
+ | :duplicate_header_suffix | nil | If set, adds numbers to duplicated headers and separates them by the given suffix |
  | :user_provided_headers | nil | *careful with that axe!* |
  | | | user provided Array of header strings or symbols, to define |
  | | | what headers should be used, overriding any in-file headers. |
@@ -258,27 +270,36 @@ And header and data validations will also be supported in 2.x
  #### NOTES about File Encodings:
  * if you have a CSV file which contains unicode characters, you can process it as follows:

-
+ ```ruby
  File.open(filename, "r:bom|utf-8") do |f|
    data = SmarterCSV.process(f)
  end
-
+ ```
  * if the CSV file with unicode characters is in a remote location, similarly you need to give the encoding as an option to the `open` call:
-
+ ```ruby
  require 'open-uri'
  file_location = 'http://your.remote.org/sample.csv'
  URI.open(file_location, 'r:utf-8') do |f| # don't forget to specify the UTF-8 encoding!!
    data = SmarterCSV.process(f)
  end
+ ```
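
The effect of the `"r:bom|utf-8"` mode shown in the encoding notes can be checked with plain Ruby, without SmarterCSV. The sketch below writes a made-up two-line CSV that starts with a UTF-8 byte order mark (BOM) into a temporary file and reads the header line back in both modes; the file name and contents are assumptions chosen for illustration.

```ruby
require 'tempfile'

# Write a small UTF-8 file that starts with a byte order mark (EF BB BF).
file = Tempfile.new(['bom_sample', '.csv'])
file.binmode
file.write("\xEF\xBB\xBFname,city\nZoë,Köln\n".b)
file.close

# Plain UTF-8 mode keeps the BOM as the first character of the header line.
plain_header = File.open(file.path, 'r:utf-8') { |f| f.gets }

# "bom|utf-8" skips the BOM, so the header line starts with "name".
bom_header = File.open(file.path, 'r:bom|utf-8') { |f| f.gets }

file.unlink
```

Without the `bom|` prefix the first header would carry an invisible leading character, which is the kind of mismatch the encoding notes are meant to prevent.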
 
  #### NOTES about CSV Headers:
  * as this method parses CSV files, it is assumed that the first line of any file will contain a valid header
- * the first line with the CSV header may or may not be commented out according to the :comment_regexp
+ * the first line with the header might be commented out, in which case you will need to set `comment_regexp: /\A#/`
+ This is no longer handled automatically since 1.5.0.
  * any occurrences of :comment_regexp or :row_sep will be stripped from the first line with the CSV header
  * any of the keys in the header line will be downcased, spaces replaced by underscore, and converted to Ruby symbols before being used as keys in the returned Hashes
  * you can not combine the :user_provided_headers and :key_mapping options
  * if the incorrect number of headers are provided via :user_provided_headers, exception SmarterCSV::HeaderSizeMismatch is raised
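
The header rules listed above (downcase, spaces replaced by underscores, converted to symbols) can be sketched in a few lines of plain Ruby. `normalize_header` is a hypothetical helper written for this illustration; it is not the gem's own code and ignores the other header options.

```ruby
# Hypothetical helper mirroring the documented rules: downcase the header,
# replace spaces with underscores, and convert the result to a Symbol.
def normalize_header(header)
  header.downcase.gsub(' ', '_').to_sym
end

header_line = "first name,last name,Dogs"
keys = header_line.split(',').map { |h| normalize_header(h) }
```

For the header line used here, `keys` holds `:first_name`, `:last_name` and `:dogs`, which matches the key style seen in Example 1a.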
 
+ #### NOTES on Duplicate Headers:
+ As a corner case, it is possible that a CSV file contains multiple headers with the same name.
+ * If that happens, by default `smarter_csv` will raise a `DuplicateHeaders` error.
+ * If you set `duplicate_header_suffix` to a non-nil string, it will use it to append numbers 2..n to the duplicate headers. To disambiguate the headers further, you can use `key_mapping` to assign meaningful names.
+ * If your code will need to process arbitrary CSV files, please set `duplicate_header_suffix`.
+ * Another way to deal with duplicate headers is to use `user_provided_headers` to ignore any headers in the file.
+
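
The numbering scheme described for `duplicate_header_suffix` can be sketched as follows. `disambiguate_headers` is a hypothetical stand-alone helper, and the exact strings SmarterCSV produces are an assumption here: the sketch keeps the first occurrence unchanged and appends the suffix plus a running number 2..n to later duplicates.

```ruby
# Hypothetical helper, not SmarterCSV's own code: the first occurrence keeps
# its name, later duplicates get the suffix followed by a number 2..n.
def disambiguate_headers(headers, suffix)
  seen = Hash.new(0)
  headers.map do |header|
    seen[header] += 1
    seen[header] == 1 ? header : "#{header}#{suffix}#{seen[header]}"
  end
end

unique_headers = disambiguate_headers(%w[name email email email], '_')
# => ["name", "email", "email_2", "email_3"]
```

Names such as `email_2` can then be given more meaningful keys via `key_mapping`, as the notes above suggest.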
  #### NOTES on Key Mapping:
  * keys in the header line of the file can be re-mapped to a chosen set of symbols, so the resulting Hashes can be better used internally in your application (e.g. when directly creating MongoDB entries with them)
  * if you want to completely delete a key, then map it to nil or to '', they will be automatically deleted from any result Hash
@@ -304,64 +325,27 @@ And header and data validations will also be supported in 2.x
  ## Installation

  Add this line to your application's Gemfile:
-
+ ```ruby
  gem 'smarter_csv'
-
+ ```
  And then execute:
-
+ ```ruby
  $ bundle
-
+ ```
  Or install it yourself as:
-
+ ```ruby
  $ gem install smarter_csv
-
+ ```
  ## [ChangeLog](./CHANGELOG.md)

  ## Reporting Bugs / Feature Requests

  Please [open an Issue on GitHub](https://github.com/tilo/smarter_csv/issues) if you have feedback, new feature requests, or want to report a bug. Thank you!

+ * please include a small sample CSV file
+ * please mention your version of SmarterCSV, Ruby, Rails

- ## Special Thanks
-
- Many thanks to people who have filed issues and sent comments.
- And a special thanks to those who contributed pull requests:
-
- * [Jack 0](https://github.com/xjlin0)
- * [Alejandro](https://github.com/agaviria)
- * [Lucas Camargo de Almeida](https://github.com/lcalmeida)
- * [Raphaël Bleuse](https://github.com/bleuse)
- * [feens](https://github.com/feens)
- * [César Camacho](https://github.com/chanko)
- * [innhyu](https://github.com/innhyu)
- * [Benjamin Thouret](https://github.com/benichu)
- * [Chris Hilton](https://github.com/chrismhilton)
- * [Sean Duckett](http://github.com/sduckett)
- * [Alex Ong](http://github.com/khaong)
- * [Martin Nilsson](http://github.com/MrTin)
- * [Eustáquio Rangel](http://github.com/taq)
- * [Pavel](http://github.com/paxa)
- * [Félix Bellanger](https://github.com/Keeguon)
- * [Graham Wetzler](https://github.com/grahamwetzler)
- * [Marcos G. Zimmermann](https://github.com/marcosgz)
- * [Jordan Running](https://github.com/jrunning)
- * [Dave Sanders](https://github.com/DaveSanders)
- * [Hugo Lepetit](https://github.com/giglemad)
- * [esBeee](https://github.com/esBeee)
- * [Waldyr de Souza](https://github.com/waldyr)
- * [Ben Maher](https://github.com/benmaher)
- * [Wal McConnell](https://github.com/wal)
- * [Jordan Graft](https://github.com/jordangraft)
- * [Michael](https://github.com/polycarpou)
- * [Kevin Coleman](https://github.com/KevinColemanInc)
- * [Tirdad C.](https://github.com/tridadc)
- * [Dave Myron](https://github.com/contentfree)
- * [Ivan Ushakov](https://github.com/IvanUshakov)
- * [Matthieu Paret](https://github.com/mtparet)
- * [Rohit Amarnath](https://github.com/ramarnat)
- * [Joshua Smith](https://github.com/enviable)
- * [Colin Petruno](https://github.com/colinpetruno)
- * [Diego Salido](https://github.com/salidux)
+ ## [A Special Thanks to all Contributors!](CONTRIBUTORS.md) 🎉🎉🎉


  ## Contributing
data/Rakefile CHANGED
@@ -1,26 +1,19 @@
  #!/usr/bin/env rake
  require "bundler/gem_tasks"
-
  require 'rubygems'
  require 'rake'
-
  require 'rspec/core/rake_task'

+ task :default => :spec
+
  desc "Run RSpec"
  RSpec::Core::RakeTask.new do |t|
-   t.verbose = false
+   # t.verbose = false
  end

- desc "Run specs for all test cases"
- task :spec_all do
-   system "rake spec"
+ desc 'Run spec with coverage'
+ task :coverage do
+   ENV['COVERAGE'] = 'true'
+   Rake::Task['spec'].execute
+   `open coverage/index.html`
  end
-
- # task :spec_all do
- #   %w[active_record data_mapper mongoid].each do |model_adapter|
- #     puts "MODEL_ADAPTER = #{model_adapter}"
- #     system "rake spec MODEL_ADAPTER=#{model_adapter}"
- #   end
- # end
-
- task :default => :spec