smarter_csv 1.8.1 → 1.8.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: a7aa350efc77f90c6986a7573e733b5d9d02930c94465f17d2227b346263a6ce
-   data.tar.gz: 42351edf3e618b8c025f266796897aa0c3572d77e42788a05b1ee37ce8bdeed2
+   metadata.gz: 4a6ec6f3a579d9c1e6bfc2c3c9006f64d8c7b705eeca6ec048ea56c688f8ea1c
+   data.tar.gz: ba9a4a289adcc2fc398ae608f9570c28baac57b877852f3ea37c78fa57f2d7e3
  SHA512:
-   metadata.gz: 8bd9d59d7260a8e90ce472917801b98d088e37de5b1e912914f820f2efbbeb0491f5056d47575debdf1bccb8b9b8670cd089647efa15ec93b02413747dcfe702
-   data.tar.gz: 861364c6213af99c11cd3b9a59b2cf46f8c8e850ee2273e4f1b790714c9cd0ca66a734d64233737e086669c2b6aa51415f1343c3d61811547ec3c715d7a1620c
+   metadata.gz: d8f516501a5539e30789e2d18c4d051f50372786d8df1272192c2bc7997470cf5d5e1ae94b776d0d580cb62ed8ffb0f6591ccc2be5d60eae6e421f22f0c92f94
+   data.tar.gz: 2993c59278adb531cf2299c0aa3869c637868f2f4e6421f845e4ade35011f94ee6e226f43e07b206c67c354c2eb38c0a41981ffbdbff00950690b94b06b2aacd
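
The SHA256 and SHA512 entries in checksums.yaml are hex digests of the two archives packed inside the `.gem` file. As a minimal sketch of how one such entry could be checked, the helper below hashes a byte string with Ruby's standard `digest` library and compares the result with an expected hex value. The name `digest_matches?` is hypothetical and not part of RubyGems, and extracting `data.tar.gz` from the gem is left out.

```ruby
require 'digest'

# Hypothetical helper, not part of RubyGems: returns true when the SHA256
# hex digest of +bytes+ equals +expected_hex+ (compared case-insensitively).
def digest_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Usage, assuming `data` holds the raw bytes of data.tar.gz from the 1.8.2 gem:
#   digest_matches?(data, "ba9a4a289adcc2fc398ae608f9570c28baac57b877852f3ea37c78fa57f2d7e3")
```

The same shape works for the SHA512 entries by swapping in `Digest::SHA512`.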
data/CHANGELOG.md CHANGED
@@ -1,13 +1,21 @@

  # SmarterCSV 1.x Change Log

+ ## 1.8.2 (2023-03-21)
+ * bugfix: do not raise `NoColSepDetected` for CSV files with only one column in most cases (issue #222)
+   If the first lines contain non-ASCII characters, and no col_sep is detected, it will still raise `NoColSepDetected`
+
  ## 1.8.1 (2023-03-19)
  * added validation against invalid values for :col_sep, :row_sep, :quote_char (issue #216)
  * deprecating `required_headers` and replace with `required_keys` (issue #140)
  * fixed issue with require statement

- ## 1.8.0 (2023-03-18)
+ ## 1.8.0 (2023-03-18) BREAKING
  * NEW DEFAULTS: `col_sep: :auto`, `row_sep: :auto`. Fully automatic detection by default.
+
+   MAKE SURE to rescue `NoColSepDetected` if your CSV files can have unexpected formats,
+   e.g. from users uploading them to a service, and handle those cases.
+
  * ignore Byte Order Marker (BOM) in first line in file (issues #27, #219)

  ## 1.7.4 (2023-01-13)
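
The changelog advises rescuing `NoColSepDetected` when files may arrive in unexpected formats. One way this could look is sketched below. The wrapper `import_csv` and the stub error class are hypothetical and exist only so the example runs without the gem installed; with the gem loaded, the comments show which real names would be passed instead.

```ruby
# Hypothetical wrapper (not part of SmarterCSV): calls the given parser and
# converts a separator-detection failure into a result the caller can handle.
def import_csv(path, parser:, detection_error:)
  { ok: true, rows: parser.call(path) }
rescue detection_error => e
  { ok: false, rows: [], error: e.class.name }
end

# Stand-in so the sketch runs without the gem. With the gem installed, pass
#   parser:          SmarterCSV.method(:process)
#   detection_error: SmarterCSV::NoColSepDetected
class StubNoColSepDetected < StandardError; end

good_parser = ->(_path) { [{ name: "Ann" }] }
bad_parser  = ->(_path) { raise StubNoColSepDetected }

ok_result  = import_csv("users.csv", parser: good_parser, detection_error: StubNoColSepDetected)
bad_result = import_csv("upload.bin", parser: bad_parser, detection_error: StubNoColSepDetected)
```

Returning a small result hash rather than re-raising lets an upload handler report the bad file to the user and carry on with the next one.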
data/README.md CHANGED
@@ -77,7 +77,10 @@ $ hexdump -C spec/fixtures/bom_test_feff.csv

  Here are some examples to demonstrate the versatility of SmarterCSV.

- By default SmarterCSV determines the `row_sep` and `col_sep` values automatically.
+ **It is generally recommended to rescue `SmarterCSVException` or its subclasses.**
+
+ By default SmarterCSV determines the `row_sep` and `col_sep` values automatically. In cases where the automatic detection fails, an exception will be raised, e.g. `NoColSepDetected`. Rescuing from these exceptions will make sure that you don't miss processing CSV files, in case users upload CSV files with unexpected formats.
+
  In rare cases you may have to manually set these values, after going through the troubleshooting procedure described above.

  #### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
data/TO_DO_v2.md ADDED
@@ -0,0 +1,14 @@
+ # SmarterCSV v2.0 TO DO List
+
+ * add enumerable to speed up parallel processing [issue #66](https://github.com/tilo/smarter_csv/issues/66), [issue #32](https://github.com/tilo/smarter_csv/issues/32)
+ * use Procs for validations and transformations [issue #118](https://github.com/tilo/smarter_csv/issues/118)
+ * make @errors and @warnings work [issue #118](https://github.com/tilo/smarter_csv/issues/118)
+ * skip file opening, allow reading from CSV string, e.g. reading from S3 file [issue #120](https://github.com/tilo/smarter_csv/issues/120).
+   Or stream large file from S3 (linked in the issue)
+ * Collect all Errors, before surfacing them. Avoid throwing an exception on the first error [issue #133](https://github.com/tilo/smarter_csv/issues/133)
+ * Don't call rewind on filehandle
+ * [2.0 BUG] :convert_values_to_numeric_unless_leading_zeros drops leading zeros [issue #151](https://github.com/tilo/smarter_csv/issues/151)
+ * [2.0 BUG] convert_to_float saves Proc as @@convert_to_integer [issue #157](https://github.com/tilo/smarter_csv/issues/157)
+ * Provide an example for custom Procs for hash_transformations in the docs [issue #174](https://github.com/tilo/smarter_csv/issues/174)
+ * Replace remove_empty_values: false [issue #213](https://github.com/tilo/smarter_csv/issues/213)
+
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module SmarterCSV
-   VERSION = "1.8.1"
+   VERSION = "1.8.2"
  end
data/lib/smarter_csv.rb CHANGED
@@ -393,15 +393,28 @@ module SmarterCSV
      def guess_column_separator(filehandle, options)
        skip_lines(filehandle, options)

-       possible_delimiters = [',', "\t", ';', ':', '|']
+       delimiters = [',', "\t", ';', ':', '|']

-       candidates = if options.fetch(:headers_in_file)
-         candidated_column_separators_from_headers(filehandle, options, possible_delimiters)
-       else
-         candidated_column_separators_from_contents(filehandle, options, possible_delimiters)
-       end
+       line = nil
+       has_header = options[:headers_in_file]
+       candidates = Hash.new(0)
+       count = has_header ? 1 : 5
+       count.times do
+         line = readline_with_counts(filehandle, options)
+         delimiters.each do |d|
+           candidates[d] += line.scan(d).count
+         end
+       rescue EOFError # short files
+         break
+       end
+       rewind(filehandle)

-       raise SmarterCSV::NoColSepDetected if candidates.values.max == 0
+       if candidates.values.max == 0
+         # if the header only contains word characters (a single column), default to ','
+         return ',' if line =~ /^\w+$/
+
+         raise SmarterCSV::NoColSepDetected
+       end

        candidates.key(candidates.values.max)
      end
@@ -582,35 +595,5 @@ module SmarterCSV
        return true if str.is_a?(String) && !str.empty?
        false
      end
-
-     def candidated_column_separators_from_headers(filehandle, options, delimiters)
-       candidates = Hash.new(0)
-       line = readline_with_counts(filehandle, options.slice(:row_sep))
-
-       delimiters.each do |d|
-         candidates[d] += line.scan(d).count
-       end
-
-       rewind(filehandle)
-
-       candidates
-     end
-
-     def candidated_column_separators_from_contents(filehandle, options, delimiters)
-       candidates = Hash.new(0)
-
-       5.times do
-         line = readline_with_counts(filehandle, options.slice(:row_sep))
-         delimiters.each do |d|
-           candidates[d] += line.scan(d).count
-         end
-       rescue EOFError # short files
-         break
-       end
-
-       rewind(filehandle)
-
-       candidates
-     end
    end
  end
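
The rewritten `guess_column_separator` counts each candidate delimiter over the header line (or up to five lines when there is no header), picks the most frequent one, and falls back to `,` when the last line read consists only of word characters. The standalone sketch below restates that approach so it can be run without the gem. All names in it (`sketch_guess_col_sep`, `SKETCH_DELIMITERS`, `SketchNoColSepDetected`) are hypothetical, and it is a simplification: it reads plain newline-terminated lines from an IO object and ignores `skip_lines` and custom `row_sep` handling.

```ruby
require 'stringio'

# Stand-in for SmarterCSV::NoColSepDetected so the sketch runs without the gem.
class SketchNoColSepDetected < StandardError; end

SKETCH_DELIMITERS = [',', "\t", ';', ':', '|'].freeze

# Simplified restatement of the counting approach shown in the diff.
def sketch_guess_col_sep(io, has_header: true)
  # Read one line when a header is present, otherwise up to five lines.
  lines = io.each_line.first(has_header ? 1 : 5).map(&:chomp)
  io.rewind

  # Count occurrences of every candidate delimiter across the sampled lines.
  counts = SKETCH_DELIMITERS.map { |d| [d, lines.sum { |l| l.scan(d).count }] }.to_h
  best = counts.values.max
  return counts.key(best) if best > 0

  # No delimiter seen: treat a last line made only of ASCII word characters
  # as a single-column file and fall back to ','.
  return ',' if lines.last.to_s.match?(/\A\w+\z/)

  raise SketchNoColSepDetected
end
```

Because Ruby's `\w` matches only ASCII word characters, a single-column header such as `café` still raises in this sketch, which mirrors the caveat in the 1.8.2 changelog entry.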
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: smarter_csv
  version: !ruby/object:Gem::Version
-   version: 1.8.1
+   version: 1.8.2
  platform: ruby
  authors:
  - Tilo Sloboda
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2023-03-19 00:00:00.000000000 Z
+ date: 2023-03-22 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: awesome_print
@@ -112,6 +112,7 @@ files:
  - LICENSE.txt
  - README.md
  - Rakefile
+ - TO_DO_v2.md
  - ext/smarter_csv/extconf.rb
  - ext/smarter_csv/smarter_csv.c
  - lib/extensions/hash.rb