smarter_csv 1.8.0 → 1.8.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +14 -1
- data/README.md +11 -1
- data/TO_DO_v2.md +14 -0
- data/ext/smarter_csv/smarter_csv.c +0 -1
- data/lib/smarter_csv/version.rb +1 -1
- data/lib/smarter_csv.rb +54 -45
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 4a6ec6f3a579d9c1e6bfc2c3c9006f64d8c7b705eeca6ec048ea56c688f8ea1c
|
4
|
+
data.tar.gz: ba9a4a289adcc2fc398ae608f9570c28baac57b877852f3ea37c78fa57f2d7e3
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: d8f516501a5539e30789e2d18c4d051f50372786d8df1272192c2bc7997470cf5d5e1ae94b776d0d580cb62ed8ffb0f6591ccc2be5d60eae6e421f22f0c92f94
|
7
|
+
data.tar.gz: 2993c59278adb531cf2299c0aa3869c637868f2f4e6421f845e4ade35011f94ee6e226f43e07b206c67c354c2eb38c0a41981ffbdbff00950690b94b06b2aacd
|
data/CHANGELOG.md
CHANGED
@@ -1,8 +1,21 @@
|
|
1
1
|
|
2
2
|
# SmarterCSV 1.x Change Log
|
3
3
|
|
4
|
-
## 1.8.
|
4
|
+
## 1.8.2 (2023-03-21)
|
5
|
+
* bugfix: do not raise `NoColSepDetected` for CSV files with only one column in most cases (issue #222)
|
6
|
+
If the first lines contain non-ASCII characters, and no col_sep is detected, it will still raise `NoColSepDetected`
|
7
|
+
|
8
|
+
## 1.8.1 (2023-03-19)
|
9
|
+
* added validation against invalid values for :col_sep, :row_sep, :quote_char (issue #216)
|
10
|
+
* deprecated `required_headers`; use `required_keys` instead (issue #140)
|
11
|
+
* fixed issue with require statement
|
12
|
+
|
13
|
+
## 1.8.0 (2023-03-18) BREAKING
|
5
14
|
* NEW DEFAULTS: `col_sep: :auto`, `row_sep: :auto`. Fully automatic detection by default.
|
15
|
+
|
16
|
+
MAKE SURE to rescue `NoColSepDetected` if your CSV files can have unexpected formats,
|
17
|
+
e.g. from users uploading them to a service, and handle those cases.
|
18
|
+
|
6
19
|
* ignore Byte Order Marker (BOM) in first line in file (issues #27, #219)
|
7
20
|
|
8
21
|
## 1.7.4 (2023-01-13)
|
data/README.md
CHANGED
@@ -73,6 +73,15 @@ $ hexdump -C spec/fixtures/bom_test_feff.csv
|
|
73
73
|
00000040 73 2c 35 36 37 38 0d 0a |s,5678..|
|
74
74
|
```
|
75
75
|
|
76
|
+
### Examples
|
77
|
+
|
78
|
+
Here are some examples to demonstrate the versatility of SmarterCSV.
|
79
|
+
|
80
|
+
**It is generally recommended to rescue `SmarterCSVException` or its sub-classes.**
|
81
|
+
|
82
|
+
By default SmarterCSV determines the `row_sep` and `col_sep` values automatically. In cases where the automatic detection fails, an exception will be raised, e.g. `NoColSepDetected`. Rescuing from these exceptions will make sure that you don't miss processing CSV files, in case users upload CSV files with unexpected formats.
|
83
|
+
|
84
|
+
In rare cases you may have to manually set these values, after going through the troubleshooting procedure described above.
|
76
85
|
|
77
86
|
#### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
|
78
87
|
Please note how each hash contains only the keys for columns with non-null values.
|
@@ -267,7 +276,8 @@ And header and data validations will also be supported in 2.x
|
|
267
276
|
---------------------------------------------------------------------------------------------------------------------------------
|
268
277
|
| :key_mapping | nil | a hash which maps headers from the CSV file to keys in the result hash |
|
269
278
|
| :silence_missing_key | false | ignore missing keys in `key_mapping` if true |
|
270
|
-
| :
|
279
|
+
| :required_keys | nil | An array. Specify the required names AFTER header transformation. |
|
280
|
+
| :required_headers | nil | (DEPRECATED / renamed) Use `required_keys` instead |
|
271
281
|
| | | or an exception is raised No validation if nil is given. |
|
272
282
|
| :remove_unmapped_keys | false | when using :key_mapping option, should non-mapped keys / columns be removed? |
|
273
283
|
| :downcase_header | true | downcase all column headers |
|
data/TO_DO_v2.md
ADDED
@@ -0,0 +1,14 @@
|
|
1
|
+
# SmarterCSV v2.0 TO DO List
|
2
|
+
|
3
|
+
* add enumerable to speed up parallel processing [issue #66](https://github.com/tilo/smarter_csv/issues/66), [issue #32](https://github.com/tilo/smarter_csv/issues/32)
|
4
|
+
* use Procs for validations and transformations [issue #118](https://github.com/tilo/smarter_csv/issues/118)
|
5
|
+
* make @errors and @warnings work [issue #118](https://github.com/tilo/smarter_csv/issues/118)
|
6
|
+
* skip file opening, allow reading from CSV string, e.g. reading from S3 file [issue #120](https://github.com/tilo/smarter_csv/issues/120).
|
7
|
+
Or stream large file from S3 (linked in the issue)
|
8
|
+
* Collect all Errors, before surfacing them. Avoid throwing an exception on the first error [issue #133](https://github.com/tilo/smarter_csv/issues/133)
|
9
|
+
* Don't call rewind on filehandle
|
10
|
+
* [2.0 BUG] :convert_values_to_numeric_unless_leading_zeros drops leading zeros [issue #151](https://github.com/tilo/smarter_csv/issues/151)
|
11
|
+
* [2.0 BUG] convert_to_float saves Proc as @@convert_to_integer [issue #157](https://github.com/tilo/smarter_csv/issues/157)
|
12
|
+
* Provide an example for custom Procs for hash_transformations in the docs [issue #174](https://github.com/tilo/smarter_csv/issues/174)
|
13
|
+
* Replace remove_empty_values: false [issue #213](https://github.com/tilo/smarter_csv/issues/213)
|
14
|
+
|
data/ext/smarter_csv/smarter_csv.c
CHANGED
@@ -27,7 +27,6 @@ static VALUE rb_parse_csv_line(VALUE self, VALUE line, VALUE col_sep, VALUE quot
|
|
27
27
|
long col_sep_len = RSTRING_LEN(col_sep);
|
28
28
|
|
29
29
|
char *quoteP = RSTRING_PTR(quote_char);
|
30
|
-
long quote_len = RSTRING_LEN(quote_char);
|
31
30
|
long quote_count = 0;
|
32
31
|
|
33
32
|
bool col_sep_found = true;
|
data/lib/smarter_csv/version.rb
CHANGED
data/lib/smarter_csv.rb
CHANGED
@@ -3,24 +3,25 @@
|
|
3
3
|
require_relative "extensions/hash"
|
4
4
|
require_relative "smarter_csv/version"
|
5
5
|
|
6
|
-
|
7
|
-
require 'smarter_csv.bundle' unless ENV['CI'] #
|
6
|
+
require_relative "smarter_csv/smarter_csv" unless ENV['CI'] # does not compile/link in CI?
|
7
|
+
# require 'smarter_csv.bundle' unless ENV['CI'] # local testing
|
8
8
|
|
9
9
|
module SmarterCSV
|
10
10
|
class SmarterCSVException < StandardError; end
|
11
11
|
class HeaderSizeMismatch < SmarterCSVException; end
|
12
12
|
class IncorrectOption < SmarterCSVException; end
|
13
|
+
class ValidationError < SmarterCSVException; end
|
13
14
|
class DuplicateHeaders < SmarterCSVException; end
|
14
15
|
class MissingHeaders < SmarterCSVException; end
|
15
16
|
class NoColSepDetected < SmarterCSVException; end
|
16
|
-
class KeyMappingError < SmarterCSVException; end
|
17
|
-
class MalformedCSVError < SmarterCSVException; end
|
17
|
+
class KeyMappingError < SmarterCSVException; end # CURRENTLY UNUSED -> version 1.9.0
|
18
18
|
|
19
19
|
# first parameter: filename or input object which responds to readline method
|
20
20
|
def SmarterCSV.process(input, options = {}, &block)
|
21
21
|
options = default_options.merge(options)
|
22
22
|
options[:invalid_byte_sequence] = '' if options[:invalid_byte_sequence].nil?
|
23
23
|
puts "SmarterCSV OPTIONS: #{options.inspect}" if options[:verbose]
|
24
|
+
validate_options!(options)
|
24
25
|
|
25
26
|
headerA = []
|
26
27
|
result = []
|
@@ -214,7 +215,7 @@ module SmarterCSV
|
|
214
215
|
headers_in_file: true,
|
215
216
|
invalid_byte_sequence: '',
|
216
217
|
keep_original_headers: false,
|
217
|
-
|
218
|
+
key_mapping: nil,
|
218
219
|
quote_char: '"',
|
219
220
|
remove_empty_hashes: true,
|
220
221
|
remove_empty_values: true,
|
@@ -222,6 +223,7 @@ module SmarterCSV
|
|
222
223
|
remove_values_matching: nil,
|
223
224
|
remove_zero_values: false,
|
224
225
|
required_headers: nil,
|
226
|
+
required_keys: nil,
|
225
227
|
row_sep: :auto, # was: $/,
|
226
228
|
silence_missing_keys: false,
|
227
229
|
skip_lines: nil,
|
@@ -391,15 +393,28 @@ module SmarterCSV
|
|
391
393
|
def guess_column_separator(filehandle, options)
|
392
394
|
skip_lines(filehandle, options)
|
393
395
|
|
394
|
-
|
396
|
+
delimiters = [',', "\t", ';', ':', '|']
|
395
397
|
|
396
|
-
|
397
|
-
|
398
|
-
|
399
|
-
|
400
|
-
|
398
|
+
line = nil
|
399
|
+
has_header = options[:headers_in_file]
|
400
|
+
candidates = Hash.new(0)
|
401
|
+
count = has_header ? 1 : 5
|
402
|
+
count.times do
|
403
|
+
line = readline_with_counts(filehandle, options)
|
404
|
+
delimiters.each do |d|
|
405
|
+
candidates[d] += line.scan(d).count
|
406
|
+
end
|
407
|
+
rescue EOFError # short files
|
408
|
+
break
|
409
|
+
end
|
410
|
+
rewind(filehandle)
|
401
411
|
|
402
|
-
|
412
|
+
if candidates.values.max == 0
|
413
|
+
# if the header only contains
|
414
|
+
return ',' if line =~ /^\w+$/
|
415
|
+
|
416
|
+
raise SmarterCSV::NoColSepDetected
|
417
|
+
end
|
403
418
|
|
404
419
|
candidates.key(candidates.values.max)
|
405
420
|
end
|
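The new `guess_column_separator` logic in the hunk above can be tried outside the gem with a small standalone sketch. It is not the gem's code: it assumes a plain `IO#readline` in place of `readline_with_counts`, skips the `skip_lines` step, and uses a hypothetical top-level `NoColSepDetected` class as a stand-in for `SmarterCSV::NoColSepDetected`. The `headers_in_file:` keyword is likewise an illustrative substitute for the options hash.

```ruby
require "stringio"

# Candidate delimiters, as listed in the hunk above.
DELIMITERS = [",", "\t", ";", ":", "|"].freeze

# Hypothetical stand-in for SmarterCSV::NoColSepDetected.
class NoColSepDetected < StandardError; end

# Count delimiter occurrences in the first line (when a header is expected)
# or in up to five lines (when it is not), then pick the most frequent one.
def guess_column_separator(io, headers_in_file: true)
  candidates = Hash.new(0)
  line = nil
  count = headers_in_file ? 1 : 5

  count.times do
    begin
      line = io.readline
      DELIMITERS.each { |d| candidates[d] += line.scan(d).count }
    rescue EOFError # short files: stop reading, keep what was counted
      break
    end
  end
  io.rewind

  if candidates.values.max.to_i.zero?
    # A single word on the line suggests a one-column file; default to ','.
    return "," if line.to_s.match?(/^\w+$/)

    raise NoColSepDetected
  end

  candidates.key(candidates.values.max)
end
```

With this sketch, a one-column input such as `"name\nalice\n"` falls back to `","`, while a header like `"first name"` (no delimiter, not a single word) still raises, which mirrors the behaviour described in the 1.8.2 changelog entry.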
@@ -486,13 +501,13 @@ module SmarterCSV
|
|
486
501
|
|
487
502
|
unless options[:user_provided_headers] # wouldn't make sense to re-map user provided headers
|
488
503
|
key_mappingH = options[:key_mapping]
|
504
|
+
|
489
505
|
# do some key mapping on the keys in the file header
|
490
506
|
# if you want to completely delete a key, then map it to nil or to ''
|
491
507
|
if !key_mappingH.nil? && key_mappingH.class == Hash && key_mappingH.keys.size > 0
|
492
508
|
unless options[:silence_missing_keys]
|
493
509
|
# if silence_missing_keys are not set, raise error if missing header
|
494
510
|
missing_keys = key_mappingH.keys - headerA
|
495
|
-
|
496
511
|
puts "WARNING: missing header(s): #{missing_keys.join(",")}" unless missing_keys.empty?
|
497
512
|
end
|
498
513
|
|
@@ -510,12 +525,21 @@ module SmarterCSV
|
|
510
525
|
raise SmarterCSV::DuplicateHeaders, "ERROR: duplicate headers: #{duplicate_headers.join(',')}"
|
511
526
|
end
|
512
527
|
|
513
|
-
|
514
|
-
|
515
|
-
|
516
|
-
|
528
|
+
# deprecate required_headers
|
529
|
+
if !options[:required_headers].nil?
|
530
|
+
puts "DEPRECATION WARNING: please use 'required_keys' instead of 'required_headers'"
|
531
|
+
if options[:required_keys].nil?
|
532
|
+
options[:required_keys] = options[:required_headers]
|
533
|
+
options[:required_headers] = nil
|
517
534
|
end
|
518
|
-
|
535
|
+
end
|
536
|
+
|
537
|
+
if options[:required_keys] && options[:required_keys].is_a?(Array)
|
538
|
+
missing_keys = []
|
539
|
+
options[:required_keys].each do |k|
|
540
|
+
missing_keys << k unless headerA.include?(k)
|
541
|
+
end
|
542
|
+
raise SmarterCSV::MissingHeaders, "ERROR: missing attributes: #{missing_keys.join(',')}" unless missing_keys.empty?
|
519
543
|
end
|
520
544
|
|
521
545
|
@headers = headerA
|
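The deprecation path in the hunk above (fall back from `required_headers` to `required_keys`, then check the transformed headers) can be sketched in isolation. The method name `check_required_keys!` and the top-level `MissingHeaders` class are hypothetical stand-ins for illustration; in the gem this logic sits inline in the header-processing code and raises `SmarterCSV::MissingHeaders`. The sketch also writes the warning to stderr with `warn`, whereas the diff uses `puts`.

```ruby
# Hypothetical stand-in for SmarterCSV::MissingHeaders.
class MissingHeaders < StandardError; end

# headers: the keys after header transformation (symbols by default).
# options: a hash that may carry :required_keys or the deprecated
#          :required_headers.
def check_required_keys!(headers, options)
  unless options[:required_headers].nil?
    warn "DEPRECATION WARNING: please use 'required_keys' instead of 'required_headers'"
    # Only take over the old value if the new option was not given.
    if options[:required_keys].nil?
      options[:required_keys] = options[:required_headers]
      options[:required_headers] = nil
    end
  end

  required = options[:required_keys]
  return headers unless required.is_a?(Array)

  missing = required - headers
  raise MissingHeaders, "ERROR: missing attributes: #{missing.join(',')}" unless missing.empty?

  headers
end
```

Because the check runs against the transformed headers, required names should be given in their final form (for example `:first_name`, not `"First Name"`), as the README table entry for `:required_keys` states.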
@@ -546,7 +570,7 @@ module SmarterCSV
|
|
546
570
|
|
547
571
|
def remove_bom(str)
|
548
572
|
str_as_hex = str.bytes.map{|x| x.to_s(16)}
|
549
|
-
# if string does not start with one of the bytes
|
573
|
+
# if string does not start with one of the bytes, there is no BOM
|
550
574
|
return str unless %w[ef fe ff 0].include?(str_as_hex[0])
|
551
575
|
|
552
576
|
return str.byteslice(4..-1) if [UTF_32_BOM, UTF_32LE_BOM].include?(str_as_hex[0..3])
|
@@ -557,34 +581,19 @@ module SmarterCSV
|
|
557
581
|
str
|
558
582
|
end
|
559
583
|
|
560
|
-
def
|
561
|
-
|
562
|
-
|
563
|
-
|
564
|
-
|
565
|
-
|
566
|
-
|
567
|
-
|
568
|
-
rewind(filehandle)
|
569
|
-
|
570
|
-
candidates
|
584
|
+
def validate_options!(options)
|
585
|
+
keys = options.keys
|
586
|
+
errors = []
|
587
|
+
errors << "invalid row_sep" if keys.include?(:row_sep) && !option_valid?(options[:row_sep])
|
588
|
+
errors << "invalid col_sep" if keys.include?(:col_sep) && !option_valid?(options[:col_sep])
|
589
|
+
errors << "invalid quote_char" if keys.include?(:quote_char) && !option_valid?(options[:quote_char])
|
590
|
+
raise SmarterCSV::ValidationError, errors.inspect if errors.any?
|
571
591
|
end
|
572
592
|
|
573
|
-
def
|
574
|
-
|
575
|
-
|
576
|
-
|
577
|
-
line = readline_with_counts(filehandle, options.slice(:row_sep))
|
578
|
-
delimiters.each do |d|
|
579
|
-
candidates[d] += line.scan(d).count
|
580
|
-
end
|
581
|
-
rescue EOFError # short files
|
582
|
-
break
|
583
|
-
end
|
584
|
-
|
585
|
-
rewind(filehandle)
|
586
|
-
|
587
|
-
candidates
|
593
|
+
def option_valid?(str)
|
594
|
+
return true if str.is_a?(Symbol) && str == :auto
|
595
|
+
return true if str.is_a?(String) && !str.empty?
|
596
|
+
false
|
588
597
|
end
|
589
598
|
end
|
590
599
|
end
|
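The option validation added in 1.8.1 (`validate_options!` and `option_valid?` in the last hunk above) accepts `:auto` or a non-empty string for `:row_sep`, `:col_sep`, and `:quote_char`. A minimal standalone sketch of that rule follows; the top-level `ValidationError` class is a hypothetical stand-in for `SmarterCSV::ValidationError`, and the loop over the three keys is a compaction of the three explicit lines in the diff, not the gem's own wording.

```ruby
# Hypothetical stand-in for SmarterCSV::ValidationError.
class ValidationError < StandardError; end

# A separator or quote option is valid if it is :auto or a non-empty String.
def option_valid?(value)
  return true if value == :auto
  return true if value.is_a?(String) && !value.empty?

  false
end

# Collect every invalid option first, then raise once with all of them.
def validate_options!(options)
  errors = %i[row_sep col_sep quote_char].each_with_object([]) do |key, acc|
    acc << "invalid #{key}" if options.key?(key) && !option_valid?(options[key])
  end
  raise ValidationError, errors.inspect if errors.any?

  options
end
```

Note that a key that is absent is not checked at all; only keys that are present with an unusable value (such as `nil`, `""`, or a symbol other than `:auto`) are reported.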
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: smarter_csv
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 1.8.
|
4
|
+
version: 1.8.2
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Tilo Sloboda
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2023-03-
|
11
|
+
date: 2023-03-22 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: awesome_print
|
@@ -112,6 +112,7 @@ files:
|
|
112
112
|
- LICENSE.txt
|
113
113
|
- README.md
|
114
114
|
- Rakefile
|
115
|
+
- TO_DO_v2.md
|
115
116
|
- ext/smarter_csv/extconf.rb
|
116
117
|
- ext/smarter_csv/smarter_csv.c
|
117
118
|
- lib/extensions/hash.rb
|