smarter_csv 1.8.0 → 1.8.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 55400b3977ce35c58d60c4101362b68d99f2dbf7cb6a63956ae3b6ab79fcf1ac
- data.tar.gz: 41f46d3e4de69a7924ecd2214ba4e37766106469d1b8b257fd752a96204a47fd
+ metadata.gz: 4a6ec6f3a579d9c1e6bfc2c3c9006f64d8c7b705eeca6ec048ea56c688f8ea1c
+ data.tar.gz: ba9a4a289adcc2fc398ae608f9570c28baac57b877852f3ea37c78fa57f2d7e3
  SHA512:
- metadata.gz: 24ecc14cf9c65efe5c11e4bd20753420aa8ccd7385171cd21eac2e1be92c4896087cdc2a18799fa111c0f36154ad4481daed7f08b752f4fae2b5f27241b8cf6c
- data.tar.gz: c1d70e18a7ae8057e58cbf73b62f4896dd7030bc5fd2e927669e5ea829f9a3c11daeb9c8b83296dbb46e6f0d23034245b7207882a77b54cc1ca128a581175359
+ metadata.gz: d8f516501a5539e30789e2d18c4d051f50372786d8df1272192c2bc7997470cf5d5e1ae94b776d0d580cb62ed8ffb0f6591ccc2be5d60eae6e421f22f0c92f94
+ data.tar.gz: 2993c59278adb531cf2299c0aa3869c637868f2f4e6421f845e4ade35011f94ee6e226f43e07b206c67c354c2eb38c0a41981ffbdbff00950690b94b06b2aacd
data/CHANGELOG.md CHANGED
@@ -1,8 +1,21 @@
 
  # SmarterCSV 1.x Change Log
 
- ## 1.8.0 (2023-03-18)
+ ## 1.8.2 (2023-03-21)
+ * bugfix: do not raise `NoColSepDetected` for CSV files with only one column in most cases (issue #222)
+ If the first lines contain non-ASCII characters, and no col_sep is detected, it will still raise `NoColSepDetected`
+
+ ## 1.8.1 (2023-03-19)
+ * added validation against invalid values for :col_sep, :row_sep, :quote_char (issue #216)
+ * deprecated `required_headers`; use `required_keys` instead (issue #140)
+ * fixed issue with require statement
+
+ ## 1.8.0 (2023-03-18) BREAKING
  * NEW DEFAULTS: `col_sep: :auto`, `row_sep: :auto`. Fully automatic detection by default.
+
+ MAKE SURE to rescue `NoColSepDetected` if your CSV files can have unexpected formats,
+ e.g. from users uploading them to a service, and handle those cases.
+
  * ignore Byte Order Marker (BOM) in first line in file (issues #27, #219)
 
  ## 1.7.4 (2023-01-13)
data/README.md CHANGED
@@ -73,6 +73,15 @@ $ hexdump -C spec/fixtures/bom_test_feff.csv
  00000040 73 2c 35 36 37 38 0d 0a |s,5678..|
  ```
 
+ ### Examples
+
+ Here are some examples to demonstrate the versatility of SmarterCSV.
+
+ **It is generally recommended to rescue `SmarterCSVException` or its subclasses.**
+
+ By default SmarterCSV determines the `row_sep` and `col_sep` values automatically. In cases where the automatic detection fails, an exception will be raised, e.g. `NoColSepDetected`. Rescuing these exceptions makes sure that you don't miss processing CSV files when users upload files with unexpected formats.
+
+ In rare cases you may have to manually set these values, after going through the troubleshooting procedure described above.
 
  #### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
  Please note how each hash contains only the keys for columns with non-null values.
@@ -267,7 +276,8 @@ And header and data validations will also be supported in 2.x
  ---------------------------------------------------------------------------------------------------------------------------------
  | :key_mapping | nil | a hash which maps headers from the CSV file to keys in the result hash |
  | :silence_missing_key | false | ignore missing keys in `key_mapping` if true |
- | :required_headers | nil | An array. Each of the given headers must be present after header manipulation, |
+ | :required_keys | nil | An array. Specify the required names AFTER header transformation. |
+ | :required_headers | nil | (DEPRECATED / renamed) Use `required_keys` instead |
  | | | or an exception is raised No validation if nil is given. |
  | :remove_unmapped_keys | false | when using :key_mapping option, should non-mapped keys / columns be removed? |
  | :downcase_header | true | downcase all column headers |
data/TO_DO_v2.md ADDED
@@ -0,0 +1,14 @@
+ # SmarterCSV v2.0 TO DO List
+
+ * add enumerable to speed up parallel processing [issue #66](https://github.com/tilo/smarter_csv/issues/66), [issue #32](https://github.com/tilo/smarter_csv/issues/32)
+ * use Procs for validations and transformations [issue #118](https://github.com/tilo/smarter_csv/issues/118)
+ * make @errors and @warnings work [issue #118](https://github.com/tilo/smarter_csv/issues/118)
+ * skip file opening, allow reading from CSV string, e.g. reading from S3 file [issue #120](https://github.com/tilo/smarter_csv/issues/120).
+ Or stream large file from S3 (linked in the issue)
+ * Collect all Errors, before surfacing them. Avoid throwing an exception on the first error [issue #133](https://github.com/tilo/smarter_csv/issues/133)
+ * Don't call rewind on filehandle
+ * [2.0 BUG] :convert_values_to_numeric_unless_leading_zeros drops leading zeros [issue #151](https://github.com/tilo/smarter_csv/issues/151)
+ * [2.0 BUG] convert_to_float saves Proc as @@convert_to_integer [issue #157](https://github.com/tilo/smarter_csv/issues/157)
+ * Provide an example for custom Procs for hash_transformations in the docs [issue #174](https://github.com/tilo/smarter_csv/issues/174)
+ * Replace remove_empty_values: false [issue #213](https://github.com/tilo/smarter_csv/issues/213)
+
data/ext/smarter_csv/smarter_csv.c CHANGED
@@ -27,7 +27,6 @@ static VALUE rb_parse_csv_line(VALUE self, VALUE line, VALUE col_sep, VALUE quot
  long col_sep_len = RSTRING_LEN(col_sep);
 
  char *quoteP = RSTRING_PTR(quote_char);
- long quote_len = RSTRING_LEN(quote_char);
  long quote_count = 0;
 
  bool col_sep_found = true;
data/lib/smarter_csv/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module SmarterCSV
- VERSION = "1.8.0"
+ VERSION = "1.8.2"
  end
data/lib/smarter_csv.rb CHANGED
@@ -3,24 +3,25 @@
  require_relative "extensions/hash"
  require_relative "smarter_csv/version"
 
- # require_relative "smarter_csv/smarter_csv" unless ENV['CI'] # does not compile/link in CI?
- require 'smarter_csv.bundle' unless ENV['CI'] # does not compile/link in CI?
+ require_relative "smarter_csv/smarter_csv" unless ENV['CI'] # does not compile/link in CI?
+ # require 'smarter_csv.bundle' unless ENV['CI'] # local testing
 
  module SmarterCSV
  class SmarterCSVException < StandardError; end
  class HeaderSizeMismatch < SmarterCSVException; end
  class IncorrectOption < SmarterCSVException; end
+ class ValidationError < SmarterCSVException; end
  class DuplicateHeaders < SmarterCSVException; end
  class MissingHeaders < SmarterCSVException; end
  class NoColSepDetected < SmarterCSVException; end
- class KeyMappingError < SmarterCSVException; end
- class MalformedCSVError < SmarterCSVException; end
+ class KeyMappingError < SmarterCSVException; end # CURRENTLY UNUSED -> version 1.9.0
 
  # first parameter: filename or input object which responds to readline method
  def SmarterCSV.process(input, options = {}, &block)
  options = default_options.merge(options)
  options[:invalid_byte_sequence] = '' if options[:invalid_byte_sequence].nil?
  puts "SmarterCSV OPTIONS: #{options.inspect}" if options[:verbose]
+ validate_options!(options)
 
  headerA = []
  result = []
@@ -214,7 +215,7 @@ module SmarterCSV
  headers_in_file: true,
  invalid_byte_sequence: '',
  keep_original_headers: false,
- key_mapping_hash: nil,
+ key_mapping: nil,
  quote_char: '"',
  remove_empty_hashes: true,
  remove_empty_values: true,
@@ -222,6 +223,7 @@ module SmarterCSV
  remove_values_matching: nil,
  remove_zero_values: false,
  required_headers: nil,
+ required_keys: nil,
  row_sep: :auto, # was: $/,
  silence_missing_keys: false,
  skip_lines: nil,
@@ -391,15 +393,28 @@ module SmarterCSV
  def guess_column_separator(filehandle, options)
  skip_lines(filehandle, options)
 
- possible_delimiters = [',', "\t", ';', ':', '|']
+ delimiters = [',', "\t", ';', ':', '|']
 
- candidates = if options.fetch(:headers_in_file)
- candidated_column_separators_from_headers(filehandle, options, possible_delimiters)
- else
- candidated_column_separators_from_contents(filehandle, options, possible_delimiters)
- end
+ line = nil
+ has_header = options[:headers_in_file]
+ candidates = Hash.new(0)
+ count = has_header ? 1 : 5
+ count.times do
+ line = readline_with_counts(filehandle, options)
+ delimiters.each do |d|
+ candidates[d] += line.scan(d).count
+ end
+ rescue EOFError # short files
+ break
+ end
+ rewind(filehandle)
 
- raise SmarterCSV::NoColSepDetected if candidates.values.max == 0
+ if candidates.values.max == 0
+ # if the header only contains word characters (a single column), default to ','
+ return ',' if line =~ /^\w+$/
+
+ raise SmarterCSV::NoColSepDetected
+ end
 
  candidates.key(candidates.values.max)
  end
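The detection logic in the hunk above can be tried in isolation. The following is a minimal standalone sketch, not the gem's code: `guess_separator`, `DELIMITERS`, and `NoSeparatorFound` are hypothetical names, and plain `IO#readline` stands in for the gem's `readline_with_counts`, so `skip_lines` and custom row separators are not handled.

```ruby
require "stringio"

# Candidate delimiters, in the same order as in the diff above.
DELIMITERS = [",", "\t", ";", ":", "|"].freeze

# Hypothetical stand-in for SmarterCSV::NoColSepDetected.
class NoSeparatorFound < StandardError; end

# Counts each candidate delimiter in the first line (with a header)
# or in up to five lines (without one) and returns the most frequent.
def guess_separator(io, has_header: true)
  counts = Hash.new(0)
  line = nil
  (has_header ? 1 : 5).times do
    line = io.readline
    DELIMITERS.each { |d| counts[d] += line.scan(d).count }
  rescue EOFError # file has fewer lines than requested
    break
  end
  io.rewind

  if counts.values.max.to_i.zero?
    # Single-column fallback: the last line read holds only word characters.
    return "," if line&.match?(/^\w+$/)

    raise NoSeparatorFound
  end

  counts.key(counts.values.max)
end

guess_separator(StringIO.new("a;b;c\n1;2;3\n")) # => ";"
```

A one-column file such as `"name\nalice\n"` falls through to the `","` default, while a line that contains no candidate delimiter and anything other than word characters raises.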
@@ -486,13 +501,13 @@ module SmarterCSV
 
  unless options[:user_provided_headers] # wouldn't make sense to re-map user provided headers
  key_mappingH = options[:key_mapping]
+
  # do some key mapping on the keys in the file header
  # if you want to completely delete a key, then map it to nil or to ''
  if !key_mappingH.nil? && key_mappingH.class == Hash && key_mappingH.keys.size > 0
  unless options[:silence_missing_keys]
  # if silence_missing_keys are not set, raise error if missing header
  missing_keys = key_mappingH.keys - headerA
-
  puts "WARNING: missing header(s): #{missing_keys.join(",")}" unless missing_keys.empty?
  end
 
@@ -510,12 +525,21 @@ module SmarterCSV
  raise SmarterCSV::DuplicateHeaders, "ERROR: duplicate headers: #{duplicate_headers.join(',')}"
  end
 
- if options[:required_headers] && options[:required_headers].is_a?(Array)
- missing_headers = []
- options[:required_headers].each do |k|
- missing_headers << k unless headerA.include?(k)
+ # deprecate required_headers
+ if !options[:required_headers].nil?
+ puts "DEPRECATION WARNING: please use 'required_keys' instead of 'required headers'"
+ if options[:required_keys].nil?
+ options[:required_keys] = options[:required_headers]
+ options[:required_headers] = nil
  end
- raise SmarterCSV::MissingHeaders, "ERROR: missing headers: #{missing_headers.join(',')}" unless missing_headers.empty?
+ end
+
+ if options[:required_keys] && options[:required_keys].is_a?(Array)
+ missing_keys = []
+ options[:required_keys].each do |k|
+ missing_keys << k unless headerA.include?(k)
+ end
+ raise SmarterCSV::MissingHeaders, "ERROR: missing attributes: #{missing_keys.join(',')}" unless missing_keys.empty?
  end
 
  @headers = headerA
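The `required_headers` fallback in the hunk above can be sketched on a plain options hash. This is an illustration under assumptions, not the gem's code: `normalize_required_keys` and `missing_required_keys` are hypothetical helper names, the warning text is not the gem's, and the sketch copies the hash and warns on stderr where the gem mutates `options` and uses `puts`.

```ruby
# Hypothetical helper: copies a deprecated :required_headers value into
# :required_keys unless :required_keys was given explicitly.
def normalize_required_keys(options)
  opts = options.dup
  unless opts[:required_headers].nil?
    warn "required_headers is deprecated, use required_keys" # wording is this sketch's own
    if opts[:required_keys].nil?
      opts[:required_keys] = opts[:required_headers]
      opts[:required_headers] = nil
    end
  end
  opts
end

# Hypothetical helper: returns the required keys absent from the
# already transformed header list.
def missing_required_keys(headers, options)
  required = normalize_required_keys(options)[:required_keys]
  return [] unless required.is_a?(Array)

  required.reject { |key| headers.include?(key) }
end

missing_required_keys([:first_name, :email], { required_headers: [:email, :age] }) # => [:age]
```

When both options are passed, `required_keys` takes precedence and the deprecated value is left untouched, which matches the nested `if` in the diff.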
@@ -546,7 +570,7 @@ module SmarterCSV
 
  def remove_bom(str)
  str_as_hex = str.bytes.map{|x| x.to_s(16)}
- # if string does not start with one of the bytes above, there is no BOM
+ # if string does not start with one of the bytes, there is no BOM
  return str unless %w[ef fe ff 0].include?(str_as_hex[0])
 
  return str.byteslice(4..-1) if [UTF_32_BOM, UTF_32LE_BOM].include?(str_as_hex[0..3])
@@ -557,34 +581,19 @@ module SmarterCSV
  str
  end
 
- def candidated_column_separators_from_headers(filehandle, options, delimiters)
- candidates = Hash.new(0)
- line = readline_with_counts(filehandle, options.slice(:row_sep))
-
- delimiters.each do |d|
- candidates[d] += line.scan(d).count
- end
-
- rewind(filehandle)
-
- candidates
+ def validate_options!(options)
+ keys = options.keys
+ errors = []
+ errors << "invalid row_sep" if keys.include?(:row_sep) && !option_valid?(options[:row_sep])
+ errors << "invalid col_sep" if keys.include?(:col_sep) && !option_valid?(options[:col_sep])
+ errors << "invalid quote_char" if keys.include?(:quote_char) && !option_valid?(options[:quote_char])
+ raise SmarterCSV::ValidationError, errors.inspect if errors.any?
  end
 
- def candidated_column_separators_from_contents(filehandle, options, delimiters)
- candidates = Hash.new(0)
-
- 5.times do
- line = readline_with_counts(filehandle, options.slice(:row_sep))
- delimiters.each do |d|
- candidates[d] += line.scan(d).count
- end
- rescue EOFError # short files
- break
- end
-
- rewind(filehandle)
-
- candidates
+ def option_valid?(str)
+ return true if str.is_a?(Symbol) && str == :auto
+ return true if str.is_a?(String) && !str.empty?
+ false
  end
  end
  end
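The validation added in the last hunk accepts `:auto` or any non-empty string for the three separator options and collects every violation before raising. A compact standalone sketch of the same rule follows; `check_options!`, `separator_valid?`, and `OptionError` are hypothetical names standing in for `validate_options!`, `option_valid?`, and `SmarterCSV::ValidationError`.

```ruby
# Hypothetical stand-in for SmarterCSV::ValidationError.
class OptionError < StandardError; end

# A separator option is valid if it is :auto or a non-empty String.
def separator_valid?(value)
  value == :auto || (value.is_a?(String) && !value.empty?)
end

# Checks only the options that are present and reports all invalid
# ones in a single exception.
def check_options!(options)
  errors = %i[row_sep col_sep quote_char]
           .select { |key| options.key?(key) && !separator_valid?(options[key]) }
           .map { |key| "invalid #{key}" }
  raise OptionError, errors.inspect if errors.any?
end

check_options!({ col_sep: ",", row_sep: :auto, quote_char: '"' }) # passes, returns nil
```

Passing `col_sep: ""` together with `quote_char: nil` raises once, with both problems listed in the message, so a caller can fix all options in one pass.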
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: smarter_csv
  version: !ruby/object:Gem::Version
- version: 1.8.0
+ version: 1.8.2
  platform: ruby
  authors:
  - Tilo Sloboda
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2023-03-19 00:00:00.000000000 Z
+ date: 2023-03-22 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: awesome_print
@@ -112,6 +112,7 @@ files:
  - LICENSE.txt
  - README.md
  - Rakefile
+ - TO_DO_v2.md
  - ext/smarter_csv/extconf.rb
  - ext/smarter_csv/smarter_csv.c
  - lib/extensions/hash.rb