smarter_csv 1.8.0 → 1.8.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 55400b3977ce35c58d60c4101362b68d99f2dbf7cb6a63956ae3b6ab79fcf1ac
-   data.tar.gz: 41f46d3e4de69a7924ecd2214ba4e37766106469d1b8b257fd752a96204a47fd
+   metadata.gz: 4a6ec6f3a579d9c1e6bfc2c3c9006f64d8c7b705eeca6ec048ea56c688f8ea1c
+   data.tar.gz: ba9a4a289adcc2fc398ae608f9570c28baac57b877852f3ea37c78fa57f2d7e3
  SHA512:
-   metadata.gz: 24ecc14cf9c65efe5c11e4bd20753420aa8ccd7385171cd21eac2e1be92c4896087cdc2a18799fa111c0f36154ad4481daed7f08b752f4fae2b5f27241b8cf6c
-   data.tar.gz: c1d70e18a7ae8057e58cbf73b62f4896dd7030bc5fd2e927669e5ea829f9a3c11daeb9c8b83296dbb46e6f0d23034245b7207882a77b54cc1ca128a581175359
+   metadata.gz: d8f516501a5539e30789e2d18c4d051f50372786d8df1272192c2bc7997470cf5d5e1ae94b776d0d580cb62ed8ffb0f6591ccc2be5d60eae6e421f22f0c92f94
+   data.tar.gz: 2993c59278adb531cf2299c0aa3869c637868f2f4e6421f845e4ade35011f94ee6e226f43e07b206c67c354c2eb38c0a41981ffbdbff00950690b94b06b2aacd
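The new SHA256 value for `data.tar.gz` can be checked locally. The following is a minimal sketch using Ruby's standard `digest` library; the helper name `sha256_matches?` and the constant are illustrative and not part of the gem, and the usage comment assumes the `.gem` archive has already been unpacked so that `data.tar.gz` sits in the current directory.

```ruby
require "digest"

# SHA256 of data.tar.gz for 1.8.2, copied from the checksums.yaml hunk above.
EXPECTED_DATA_SHA256 =
  "ba9a4a289adcc2fc398ae608f9570c28baac57b877852f3ea37c78fa57f2d7e3"

# Hypothetical helper: true when the SHA256 hex digest of `bytes`
# equals `expected_hex` (comparison is case-insensitive).
def sha256_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Assumed usage, once data.tar.gz has been extracted from the .gem file:
#   sha256_matches?(File.binread("data.tar.gz"), EXPECTED_DATA_SHA256)
```

A mismatch would suggest the downloaded archive differs from what the registry published, so it is worth re-downloading before drawing conclusions.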
data/CHANGELOG.md CHANGED
@@ -1,8 +1,21 @@
 
  # SmarterCSV 1.x Change Log
 
- ## 1.8.0 (2023-03-18)
+ ## 1.8.2 (2023-03-21)
+ * bugfix: do not raise `NoColSepDetected` for CSV files with only one column in most cases (issue #222)
+ If the first lines contain non-ASCII characters, and no col_sep is detected, it will still raise `NoColSepDetected`
+
+ ## 1.8.1 (2023-03-19)
+ * added validation against invalid values for :col_sep, :row_sep, :quote_char (issue #216)
+ * deprecating `required_headers` and replace with `required_keys` (issue #140)
+ * fixed issue with require statement
+
+ ## 1.8.0 (2023-03-18) BREAKING
  * NEW DEFAULTS: `col_sep: :auto`, `row_sep: :auto`. Fully automatic detection by default.
+
+ MAKE SURE to rescue `NoColSepDetected` if your CSV files can have unexpected formats,
+ e.g. from users uploading them to a service, and handle those cases.
+
  * ignore Byte Order Marker (BOM) in first line in file (issues #27, #219)
 
  ## 1.7.4 (2023-01-13)
data/README.md CHANGED
@@ -73,6 +73,15 @@ $ hexdump -C spec/fixtures/bom_test_feff.csv
  00000040 73 2c 35 36 37 38 0d 0a |s,5678..|
  ```
 
+ ### Examples
+
+ Here are some examples to demonstrate the versatility of SmarterCSV.
+
+ **It is generally recommended to rescue `SmarterCSVException` or it's sub-classes.**
+
+ By default SmarterCSV determines the `row_sep` and `col_sep` values automatically. In cases where the automatic detection fails, an exception will be raised, e.g. `NoColSepDetected`. Rescuing from these exceptions will make sure that you don't miss processing CSV files, in case users upload CSV files with unexpected formats.
+
+ In rare cases you may have to manually set these values, after going through the troubleshooting procedure described above.
 
  #### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
  Please note how each hash contains only the keys for columns with non-null values.
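The README text added in this hunk recommends rescuing `SmarterCSVException` or one of its subclasses. Below is a minimal sketch of that pattern. To keep it runnable without the gem installed, it re-declares a stand-in hierarchy under a hypothetical `DemoCSV` module, and `import_upload` and its `parser` argument are illustrative names, not part of SmarterCSV.

```ruby
# Stand-in hierarchy mirroring two of the classes declared in
# lib/smarter_csv.rb; re-declared so the sketch runs without the gem.
module DemoCSV
  class SmarterCSVException < StandardError; end
  class NoColSepDetected < SmarterCSVException; end
end

# Hypothetical wrapper: `parser` is any callable that takes a path and
# returns rows, standing in for a call such as SmarterCSV.process(path).
def import_upload(path, parser)
  { ok: true, rows: parser.call(path) }
rescue DemoCSV::SmarterCSVException => e
  # Rescuing the base class also catches NoColSepDetected and its siblings.
  { ok: false, error: e.class.name }
end
```

With the real gem, the same shape would presumably rescue `SmarterCSV::SmarterCSVException` around `SmarterCSV.process`, so a single unreadable upload is reported instead of aborting the whole import.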
@@ -267,7 +276,8 @@ And header and data validations will also be supported in 2.x
  ---------------------------------------------------------------------------------------------------------------------------------
  | :key_mapping | nil | a hash which maps headers from the CSV file to keys in the result hash |
  | :silence_missing_key | false | ignore missing keys in `key_mapping` if true |
- | :required_headers | nil | An array. Each of the given headers must be present after header manipulation, |
+ | :required_keys | nil | An array. Specify the required names AFTER header transformation. |
+ | :required_headers | nil | (DEPRECATED / renamed) Use `required_keys` instead |
  | | | or an exception is raised No validation if nil is given. |
  | :remove_unmapped_keys | false | when using :key_mapping option, should non-mapped keys / columns be removed? |
  | :downcase_header | true | downcase all column headers |
data/TO_DO_v2.md ADDED
@@ -0,0 +1,14 @@
+ # SmarterCSV v2.0 TO DO List
+
+ * add enumerable to speed up parallel processing [issue #66](https://github.com/tilo/smarter_csv/issues/66), [issue #32](https://github.com/tilo/smarter_csv/issues/32)
+ * use Procs for validations and transformatoins [issue #118](https://github.com/tilo/smarter_csv/issues/118)
+ * make @errors and @warnings work [issue #118](https://github.com/tilo/smarter_csv/issues/118)
+ * skip file opening, allow reading from CSV string, e.g. reading from S3 file [issue #120](https://github.com/tilo/smarter_csv/issues/120).
+ Or stream large file from S3 (linked in the issue)
+ * Collect all Errors, before surfacing them. Avoid throwing an exception on the first error [issue #133](https://github.com/tilo/smarter_csv/issues/133)
+ * Don't call rewind on filehandle
+ * [2.0 BUG] :convert_values_to_numeric_unless_leading_zeros drops leading zeros [issue #151](https://github.com/tilo/smarter_csv/issues/151)
+ * [2.0 BUG] convert_to_float saves Proc as @@convert_to_integer [issue #157](https://github.com/tilo/smarter_csv/issues/157)
+ * Provide an example for custom Procs for hash_transformations in the docs [issue #174](https://github.com/tilo/smarter_csv/issues/174)
+ * Replace remove_empty_values: false [issue #213](https://github.com/tilo/smarter_csv/issues/213)
+
data/ext/smarter_csv/smarter_csv.c CHANGED
@@ -27,7 +27,6 @@ static VALUE rb_parse_csv_line(VALUE self, VALUE line, VALUE col_sep, VALUE quot
    long col_sep_len = RSTRING_LEN(col_sep);
 
    char *quoteP = RSTRING_PTR(quote_char);
-   long quote_len = RSTRING_LEN(quote_char);
    long quote_count = 0;
 
    bool col_sep_found = true;
data/lib/smarter_csv/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module SmarterCSV
-   VERSION = "1.8.0"
+   VERSION = "1.8.2"
  end
data/lib/smarter_csv.rb CHANGED
@@ -3,24 +3,25 @@
  require_relative "extensions/hash"
  require_relative "smarter_csv/version"
 
- # require_relative "smarter_csv/smarter_csv" unless ENV['CI'] # does not compile/link in CI?
- require 'smarter_csv.bundle' unless ENV['CI'] # does not compile/link in CI?
+ require_relative "smarter_csv/smarter_csv" unless ENV['CI'] # does not compile/link in CI?
+ # require 'smarter_csv.bundle' unless ENV['CI'] # local testing
 
  module SmarterCSV
    class SmarterCSVException < StandardError; end
    class HeaderSizeMismatch < SmarterCSVException; end
    class IncorrectOption < SmarterCSVException; end
+   class ValidationError < SmarterCSVException; end
    class DuplicateHeaders < SmarterCSVException; end
    class MissingHeaders < SmarterCSVException; end
    class NoColSepDetected < SmarterCSVException; end
-   class KeyMappingError < SmarterCSVException; end
-   class MalformedCSVError < SmarterCSVException; end
+   class KeyMappingError < SmarterCSVException; end # CURRENTLY UNUSED -> version 1.9.0
 
    # first parameter: filename or input object which responds to readline method
    def SmarterCSV.process(input, options = {}, &block)
      options = default_options.merge(options)
      options[:invalid_byte_sequence] = '' if options[:invalid_byte_sequence].nil?
      puts "SmarterCSV OPTIONS: #{options.inspect}" if options[:verbose]
+     validate_options!(options)
 
      headerA = []
      result = []
@@ -214,7 +215,7 @@ module SmarterCSV
        headers_in_file: true,
        invalid_byte_sequence: '',
        keep_original_headers: false,
-       key_mapping_hash: nil,
+       key_mapping: nil,
        quote_char: '"',
        remove_empty_hashes: true,
        remove_empty_values: true,
@@ -222,6 +223,7 @@ module SmarterCSV
        remove_values_matching: nil,
        remove_zero_values: false,
        required_headers: nil,
+       required_keys: nil,
        row_sep: :auto, # was: $/,
        silence_missing_keys: false,
        skip_lines: nil,
@@ -391,15 +393,28 @@ module SmarterCSV
    def guess_column_separator(filehandle, options)
      skip_lines(filehandle, options)
 
-     possible_delimiters = [',', "\t", ';', ':', '|']
+     delimiters = [',', "\t", ';', ':', '|']
 
-     candidates = if options.fetch(:headers_in_file)
-       candidated_column_separators_from_headers(filehandle, options, possible_delimiters)
-     else
-       candidated_column_separators_from_contents(filehandle, options, possible_delimiters)
-     end
+     line = nil
+     has_header = options[:headers_in_file]
+     candidates = Hash.new(0)
+     count = has_header ? 1 : 5
+     count.times do
+       line = readline_with_counts(filehandle, options)
+       delimiters.each do |d|
+         candidates[d] += line.scan(d).count
+       end
+     rescue EOFError # short files
+       break
+     end
+     rewind(filehandle)
 
-     raise SmarterCSV::NoColSepDetected if candidates.values.max == 0
+     if candidates.values.max == 0
+       # if the header only contains
+       return ',' if line =~ /^\w+$/
+
+       raise SmarterCSV::NoColSepDetected
+     end
 
      candidates.key(candidates.values.max)
    end
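The detection logic in this hunk can be tried in isolation. The following is a simplified sketch, not the gem's implementation: it takes an array of strings instead of a filehandle, skips `skip_lines` and `rewind`, and uses a locally declared `NoColSepDetected` as a stand-in. The method name `guess_col_sep` and the `has_header:` keyword are illustrative.

```ruby
# Local stand-in for SmarterCSV::NoColSepDetected.
class NoColSepDetected < StandardError; end

DELIMITERS = [",", "\t", ";", ":", "|"].freeze

# Hypothetical helper: counts candidate delimiters in the first line
# (with a header) or the first five lines (without), as in the diff.
def guess_col_sep(lines, has_header: true)
  sample = lines.first(has_header ? 1 : 5)
  counts = Hash.new(0)
  sample.each do |line|
    DELIMITERS.each { |d| counts[d] += line.scan(d).count }
  end

  if counts.values.max.to_i.zero?
    # Single-column file: the last sampled line is one ASCII word.
    return "," if sample.last.to_s =~ /^\w+$/

    raise NoColSepDetected
  end

  counts.key(counts.values.max)
end
```

Because `\w` in a Ruby regexp matches only ASCII word characters by default, a single-column header containing non-ASCII characters still raises, which matches the caveat in the 1.8.2 changelog entry.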
@@ -486,13 +501,13 @@ module SmarterCSV
 
      unless options[:user_provided_headers] # wouldn't make sense to re-map user provided headers
        key_mappingH = options[:key_mapping]
+
        # do some key mapping on the keys in the file header
        # if you want to completely delete a key, then map it to nil or to ''
        if !key_mappingH.nil? && key_mappingH.class == Hash && key_mappingH.keys.size > 0
          unless options[:silence_missing_keys]
            # if silence_missing_keys are not set, raise error if missing header
            missing_keys = key_mappingH.keys - headerA
-
            puts "WARNING: missing header(s): #{missing_keys.join(",")}" unless missing_keys.empty?
          end
 
@@ -510,12 +525,21 @@ module SmarterCSV
        raise SmarterCSV::DuplicateHeaders, "ERROR: duplicate headers: #{duplicate_headers.join(',')}"
      end
 
-     if options[:required_headers] && options[:required_headers].is_a?(Array)
-       missing_headers = []
-       options[:required_headers].each do |k|
-         missing_headers << k unless headerA.include?(k)
+     # deprecate required_headers
+     if !options[:required_headers].nil?
+       puts "DEPRECATION WARNING: please use 'required_keys' instead of 'required headers'"
+       if options[:required_keys].nil?
+         options[:required_keys] = options[:required_headers]
+         options[:required_headers] = nil
        end
-       raise SmarterCSV::MissingHeaders, "ERROR: missing headers: #{missing_headers.join(',')}" unless missing_headers.empty?
+     end
+
+     if options[:required_keys] && options[:required_keys].is_a?(Array)
+       missing_keys = []
+       options[:required_keys].each do |k|
+         missing_keys << k unless headerA.include?(k)
+       end
+       raise SmarterCSV::MissingHeaders, "ERROR: missing attributes: #{missing_keys.join(',')}" unless missing_keys.empty?
      end
 
      @headers = headerA
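The `required_headers` to `required_keys` fallback in this hunk can be sketched as a standalone function. This is an illustration under assumptions, not the gem's code: `check_required_keys!` is a hypothetical name, `MissingHeaders` is a local stand-in, the options hash is copied rather than mutated, and the deprecation notice goes to stderr via `warn` where the diff uses `puts`.

```ruby
# Local stand-in for SmarterCSV::MissingHeaders.
class MissingHeaders < StandardError; end

# Hypothetical helper: returns `headers` unchanged when every required
# key is present, otherwise raises MissingHeaders.
def check_required_keys!(headers, options)
  opts = options.dup

  unless opts[:required_headers].nil?
    warn "DEPRECATION WARNING: please use 'required_keys' instead of 'required_headers'"
    # The deprecated option is only used when required_keys is not given.
    opts[:required_keys] = opts[:required_headers] if opts[:required_keys].nil?
    opts[:required_headers] = nil
  end

  required = opts[:required_keys]
  return headers unless required.is_a?(Array)

  missing = required - headers
  raise MissingHeaders, "ERROR: missing attributes: #{missing.join(',')}" unless missing.empty?

  headers
end
```

As in the diff, an explicitly supplied `required_keys` takes precedence, so passing both options validates against `required_keys` only.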
@@ -546,7 +570,7 @@ module SmarterCSV
 
    def remove_bom(str)
      str_as_hex = str.bytes.map{|x| x.to_s(16)}
-     # if string does not start with one of the bytes above, there is no BOM
+     # if string does not start with one of the bytes, there is no BOM
      return str unless %w[ef fe ff 0].include?(str_as_hex[0])
 
      return str.byteslice(4..-1) if [UTF_32_BOM, UTF_32LE_BOM].include?(str_as_hex[0..3])
@@ -557,34 +581,19 @@ module SmarterCSV
      str
    end
 
-   def candidated_column_separators_from_headers(filehandle, options, delimiters)
-     candidates = Hash.new(0)
-     line = readline_with_counts(filehandle, options.slice(:row_sep))
-
-     delimiters.each do |d|
-       candidates[d] += line.scan(d).count
-     end
-
-     rewind(filehandle)
-
-     candidates
+   def validate_options!(options)
+     keys = options.keys
+     errors = []
+     errors << "invalid row_sep" if keys.include?(:row_sep) && !option_valid?(options[:row_sep])
+     errors << "invalid col_sep" if keys.include?(:col_sep) && !option_valid?(options[:col_sep])
+     errors << "invalid quote_char" if keys.include?(:quote_char) && !option_valid?(options[:quote_char])
+     raise SmarterCSV::ValidationError, errors.inspect if errors.any?
    end
 
-   def candidated_column_separators_from_contents(filehandle, options, delimiters)
-     candidates = Hash.new(0)
-
-     5.times do
-       line = readline_with_counts(filehandle, options.slice(:row_sep))
-       delimiters.each do |d|
-         candidates[d] += line.scan(d).count
-       end
-     rescue EOFError # short files
-       break
-     end
-
-     rewind(filehandle)
-
-     candidates
+   def option_valid?(str)
+     return true if str.is_a?(Symbol) && str == :auto
+     return true if str.is_a?(String) && !str.empty?
+     false
    end
  end
  end
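To see which values the new validation accepts, the rule can be restated as a small standalone sketch. The predicate follows `option_valid?` from the hunk above; `invalid_separator_options` is a hypothetical helper that returns the offending option names instead of raising `ValidationError`.

```ruby
# Same rule as option_valid? in the diff: :auto or a non-empty String.
def option_valid?(value)
  value == :auto || (value.is_a?(String) && !value.empty?)
end

# Hypothetical helper: lists which of the three checked options are
# present in `options` but hold an invalid value.
def invalid_separator_options(options)
  %i[row_sep col_sep quote_char].select do |key|
    options.key?(key) && !option_valid?(options[key])
  end
end
```

Under this rule an empty string, `nil`, or any symbol other than `:auto` is rejected, while an option that is absent from the hash is not checked at all.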
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: smarter_csv
  version: !ruby/object:Gem::Version
-   version: 1.8.0
+   version: 1.8.2
  platform: ruby
  authors:
  - Tilo Sloboda
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2023-03-19 00:00:00.000000000 Z
+ date: 2023-03-22 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: awesome_print
@@ -112,6 +112,7 @@ files:
  - LICENSE.txt
  - README.md
  - Rakefile
+ - TO_DO_v2.md
  - ext/smarter_csv/extconf.rb
  - ext/smarter_csv/smarter_csv.c
  - lib/extensions/hash.rb