smarter_csv 1.8.1 → 1.8.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +9 -1
- data/README.md +4 -1
- data/TO_DO_v2.md +14 -0
- data/lib/smarter_csv/version.rb +1 -1
- data/lib/smarter_csv.rb +20 -37
- metadata +3 -2
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 4a6ec6f3a579d9c1e6bfc2c3c9006f64d8c7b705eeca6ec048ea56c688f8ea1c
|
|
4
|
+
data.tar.gz: ba9a4a289adcc2fc398ae608f9570c28baac57b877852f3ea37c78fa57f2d7e3
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: d8f516501a5539e30789e2d18c4d051f50372786d8df1272192c2bc7997470cf5d5e1ae94b776d0d580cb62ed8ffb0f6591ccc2be5d60eae6e421f22f0c92f94
|
|
7
|
+
data.tar.gz: 2993c59278adb531cf2299c0aa3869c637868f2f4e6421f845e4ade35011f94ee6e226f43e07b206c67c354c2eb38c0a41981ffbdbff00950690b94b06b2aacd
|
data/CHANGELOG.md
CHANGED
|
@@ -1,13 +1,21 @@
|
|
|
1
1
|
|
|
2
2
|
# SmarterCSV 1.x Change Log
|
|
3
3
|
|
|
4
|
+
## 1.8.2 (2023-03-21)
|
|
5
|
+
* bugfix: do not raise `NoColSepDetected` for CSV files with only one column in most cases (issue #222)
|
|
6
|
+
If the first lines contain non-ASCII characters, and no col_sep is detected, it will still raise `NoColSepDetected`
|
|
7
|
+
|
|
4
8
|
## 1.8.1 (2023-03-19)
|
|
5
9
|
* added validation against invalid values for :col_sep, :row_sep, :quote_char (issue #216)
|
|
6
10
|
* deprecating `required_headers` and replace with `required_keys` (issue #140)
|
|
7
11
|
* fixed issue with require statement
|
|
8
12
|
|
|
9
|
-
## 1.8.0 (2023-03-18)
|
|
13
|
+
## 1.8.0 (2023-03-18) BREAKING
|
|
10
14
|
* NEW DEFAULTS: `col_sep: :auto`, `row_sep: :auto`. Fully automatic detection by default.
|
|
15
|
+
|
|
16
|
+
MAKE SURE to rescue `NoColSepDetected` if your CSV files can have unexpected formats,
|
|
17
|
+
e.g. from users uploading them to a service, and handle those cases.
|
|
18
|
+
|
|
11
19
|
* ignore Byte Order Marker (BOM) in first line in file (issues #27, #219)
|
|
12
20
|
|
|
13
21
|
## 1.7.4 (2023-01-13)
|
data/README.md
CHANGED
|
@@ -77,7 +77,10 @@ $ hexdump -C spec/fixtures/bom_test_feff.csv
|
|
|
77
77
|
|
|
78
78
|
Here are some examples to demonstrate the versatility of SmarterCSV.
|
|
79
79
|
|
|
80
|
-
|
|
80
|
+
**It is generally recommended to rescue `SmarterCSVException` or its sub-classes.**
|
|
81
|
+
|
|
82
|
+
By default SmarterCSV determines the `row_sep` and `col_sep` values automatically. In cases where the automatic detection fails, an exception will be raised, e.g. `NoColSepDetected`. Rescuing from these exceptions will make sure that you don't miss processing CSV files, in case users upload CSV files with unexpected formats.
|
|
83
|
+
|
|
81
84
|
In rare cases you may have to manually set these values, after going through the troubleshooting procedure described above.
|
|
82
85
|
|
|
83
86
|
#### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
|
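The README's advice to rescue `SmarterCSVException` might look roughly like this in application code. This is a minimal sketch, not code from the gem: `process_upload` and its `error_class:` parameter are hypothetical names introduced here, and the commented usage assumes the exception classes live in the `SmarterCSV` namespace, as the `SmarterCSV::NoColSepDetected` reference in `lib/smarter_csv.rb` suggests.

```ruby
# Hypothetical helper (not part of smarter_csv): runs a CSV-processing block
# and turns a failure of the given error class into a result hash, so one bad
# upload does not abort the caller. Errors of other classes still propagate.
def process_upload(path, error_class: StandardError)
  { ok: true, rows: yield(path) }
rescue error_class => e
  { ok: false, error: "#{e.class}: #{e.message}" }
end

# With the gem loaded, a call might look like this (namespace is an assumption):
#
#   result = process_upload('upload.csv', error_class: SmarterCSV::SmarterCSVException) do |p|
#     SmarterCSV.process(p)
#   end
#   warn result[:error] unless result[:ok]
```

Rescuing the base class rather than only `NoColSepDetected` also covers the other detection and validation failures, at the cost of less specific handling per error type.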
data/TO_DO_v2.md
ADDED
|
@@ -0,0 +1,14 @@
|
|
|
1
|
+
# SmarterCSV v2.0 TO DO List
|
|
2
|
+
|
|
3
|
+
* add enumerable to speed up parallel processing [issue #66](https://github.com/tilo/smarter_csv/issues/66), [issue #32](https://github.com/tilo/smarter_csv/issues/32)
|
|
4
|
+
* use Procs for validations and transformations [issue #118](https://github.com/tilo/smarter_csv/issues/118)
|
|
5
|
+
* make @errors and @warnings work [issue #118](https://github.com/tilo/smarter_csv/issues/118)
|
|
6
|
+
* skip file opening, allow reading from CSV string, e.g. reading from S3 file [issue #120](https://github.com/tilo/smarter_csv/issues/120).
|
|
7
|
+
Or stream large file from S3 (linked in the issue)
|
|
8
|
+
* Collect all Errors, before surfacing them. Avoid throwing an exception on the first error [issue #133](https://github.com/tilo/smarter_csv/issues/133)
|
|
9
|
+
* Don't call rewind on filehandle
|
|
10
|
+
* [2.0 BUG] :convert_values_to_numeric_unless_leading_zeros drops leading zeros [issue #151](https://github.com/tilo/smarter_csv/issues/151)
|
|
11
|
+
* [2.0 BUG] convert_to_float saves Proc as @@convert_to_integer [issue #157](https://github.com/tilo/smarter_csv/issues/157)
|
|
12
|
+
* Provide an example for custom Procs for hash_transformations in the docs [issue #174](https://github.com/tilo/smarter_csv/issues/174)
|
|
13
|
+
* Replace remove_empty_values: false [issue #213](https://github.com/tilo/smarter_csv/issues/213)
|
|
14
|
+
|
data/lib/smarter_csv/version.rb
CHANGED
data/lib/smarter_csv.rb
CHANGED
|
@@ -393,15 +393,28 @@ module SmarterCSV
|
|
|
393
393
|
def guess_column_separator(filehandle, options)
|
|
394
394
|
skip_lines(filehandle, options)
|
|
395
395
|
|
|
396
|
-
|
|
396
|
+
delimiters = [',', "\t", ';', ':', '|']
|
|
397
397
|
|
|
398
|
-
|
|
399
|
-
|
|
400
|
-
|
|
401
|
-
|
|
402
|
-
|
|
398
|
+
line = nil
|
|
399
|
+
has_header = options[:headers_in_file]
|
|
400
|
+
candidates = Hash.new(0)
|
|
401
|
+
count = has_header ? 1 : 5
|
|
402
|
+
count.times do
|
|
403
|
+
line = readline_with_counts(filehandle, options)
|
|
404
|
+
delimiters.each do |d|
|
|
405
|
+
candidates[d] += line.scan(d).count
|
|
406
|
+
end
|
|
407
|
+
rescue EOFError # short files
|
|
408
|
+
break
|
|
409
|
+
end
|
|
410
|
+
rewind(filehandle)
|
|
403
411
|
|
|
404
|
-
|
|
412
|
+
if candidates.values.max == 0
|
|
413
|
+
# if the header only contains
|
|
414
|
+
return ',' if line =~ /^\w+$/
|
|
415
|
+
|
|
416
|
+
raise SmarterCSV::NoColSepDetected
|
|
417
|
+
end
|
|
405
418
|
|
|
406
419
|
candidates.key(candidates.values.max)
|
|
407
420
|
end
|
|
@@ -582,35 +595,5 @@ module SmarterCSV
|
|
|
582
595
|
return true if str.is_a?(String) && !str.empty?
|
|
583
596
|
false
|
|
584
597
|
end
|
|
585
|
-
|
|
586
|
-
def candidated_column_separators_from_headers(filehandle, options, delimiters)
|
|
587
|
-
candidates = Hash.new(0)
|
|
588
|
-
line = readline_with_counts(filehandle, options.slice(:row_sep))
|
|
589
|
-
|
|
590
|
-
delimiters.each do |d|
|
|
591
|
-
candidates[d] += line.scan(d).count
|
|
592
|
-
end
|
|
593
|
-
|
|
594
|
-
rewind(filehandle)
|
|
595
|
-
|
|
596
|
-
candidates
|
|
597
|
-
end
|
|
598
|
-
|
|
599
|
-
def candidated_column_separators_from_contents(filehandle, options, delimiters)
|
|
600
|
-
candidates = Hash.new(0)
|
|
601
|
-
|
|
602
|
-
5.times do
|
|
603
|
-
line = readline_with_counts(filehandle, options.slice(:row_sep))
|
|
604
|
-
delimiters.each do |d|
|
|
605
|
-
candidates[d] += line.scan(d).count
|
|
606
|
-
end
|
|
607
|
-
rescue EOFError # short files
|
|
608
|
-
break
|
|
609
|
-
end
|
|
610
|
-
|
|
611
|
-
rewind(filehandle)
|
|
612
|
-
|
|
613
|
-
candidates
|
|
614
|
-
end
|
|
615
598
|
end
|
|
616
599
|
end
|
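The reworked `guess_column_separator` counts candidate delimiters in the header line (or in up to five lines when there is no header), falls back to `,` for a single ASCII-word column, and otherwise raises. The standalone sketch below mirrors that heuristic on an array of lines instead of a filehandle. The names `guess_col_sep` and `DELIMITERS`, the `has_header:` keyword, and the local `NoColSepDetected` class are stand-ins introduced for illustration, and the regex is tightened to `\A...\z` after a `chomp`, so this is not the gem's code.

```ruby
# Standalone sketch of the delimiter-counting heuristic; not the gem's code.
DELIMITERS = [',', "\t", ';', ':', '|'].freeze

# Local stand-in for SmarterCSV::NoColSepDetected.
class NoColSepDetected < StandardError; end

def guess_col_sep(lines, has_header: true)
  # Sample only the header line, or up to five lines without a header.
  sample = lines.first(has_header ? 1 : 5)

  counts = Hash.new(0)
  sample.each do |line|
    DELIMITERS.each { |d| counts[d] += line.scan(d).count }
  end

  if counts.values.max.to_i.zero?
    # A single column made only of ASCII word characters: any separator works.
    return ',' if sample.last.to_s.chomp =~ /\A\w+\z/

    raise NoColSepDetected
  end

  # Most frequent delimiter wins; ties go to the earliest entry in DELIMITERS.
  counts.key(counts.values.max)
end
```

Because Ruby's `\w` matches only ASCII word characters by default, a single-column header with non-ASCII characters still raises, which matches the caveat in the 1.8.2 changelog entry.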
metadata
CHANGED
|
@@ -1,14 +1,14 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: smarter_csv
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 1.8.
|
|
4
|
+
version: 1.8.2
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Tilo Sloboda
|
|
8
8
|
autorequire:
|
|
9
9
|
bindir: bin
|
|
10
10
|
cert_chain: []
|
|
11
|
-
date: 2023-03-
|
|
11
|
+
date: 2023-03-22 00:00:00.000000000 Z
|
|
12
12
|
dependencies:
|
|
13
13
|
- !ruby/object:Gem::Dependency
|
|
14
14
|
name: awesome_print
|
|
@@ -112,6 +112,7 @@ files:
|
|
|
112
112
|
- LICENSE.txt
|
|
113
113
|
- README.md
|
|
114
114
|
- Rakefile
|
|
115
|
+
- TO_DO_v2.md
|
|
115
116
|
- ext/smarter_csv/extconf.rb
|
|
116
117
|
- ext/smarter_csv/smarter_csv.c
|
|
117
118
|
- lib/extensions/hash.rb
|