smarter_csv 1.8.1 → 1.8.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: a7aa350efc77f90c6986a7573e733b5d9d02930c94465f17d2227b346263a6ce
4
- data.tar.gz: 42351edf3e618b8c025f266796897aa0c3572d77e42788a05b1ee37ce8bdeed2
3
+ metadata.gz: 4a6ec6f3a579d9c1e6bfc2c3c9006f64d8c7b705eeca6ec048ea56c688f8ea1c
4
+ data.tar.gz: ba9a4a289adcc2fc398ae608f9570c28baac57b877852f3ea37c78fa57f2d7e3
5
5
  SHA512:
6
- metadata.gz: 8bd9d59d7260a8e90ce472917801b98d088e37de5b1e912914f820f2efbbeb0491f5056d47575debdf1bccb8b9b8670cd089647efa15ec93b02413747dcfe702
7
- data.tar.gz: 861364c6213af99c11cd3b9a59b2cf46f8c8e850ee2273e4f1b790714c9cd0ca66a734d64233737e086669c2b6aa51415f1343c3d61811547ec3c715d7a1620c
6
+ metadata.gz: d8f516501a5539e30789e2d18c4d051f50372786d8df1272192c2bc7997470cf5d5e1ae94b776d0d580cb62ed8ffb0f6591ccc2be5d60eae6e421f22f0c92f94
7
+ data.tar.gz: 2993c59278adb531cf2299c0aa3869c637868f2f4e6421f845e4ade35011f94ee6e226f43e07b206c67c354c2eb38c0a41981ffbdbff00950690b94b06b2aacd
data/CHANGELOG.md CHANGED
@@ -1,13 +1,21 @@
1
1
 
2
2
  # SmarterCSV 1.x Change Log
3
3
 
4
+ ## 1.8.2 (2023-03-21)
5
+ * bugfix: do not raise `NoColSepDetected` for CSV files with only one column in most cases (issue #222)
6
+ If the first lines contain non-ASCII characters, and no col_sep is detected, it will still raise `NoColSepDetected`
7
+
4
8
  ## 1.8.1 (2023-03-19)
5
9
  * added validation against invalid values for :col_sep, :row_sep, :quote_char (issue #216)
6
10
  * deprecated `required_headers` in favor of `required_keys` (issue #140)
7
11
  * fixed issue with require statement
8
12
 
9
- ## 1.8.0 (2023-03-18)
13
+ ## 1.8.0 (2023-03-18) BREAKING
10
14
  * NEW DEFAULTS: `col_sep: :auto`, `row_sep: :auto`. Fully automatic detection by default.
15
+
16
+ MAKE SURE to rescue `NoColSepDetected` if your CSV files can have unexpected formats,
17
+ e.g. from users uploading them to a service, and handle those cases.
18
+
11
19
  * ignore Byte Order Marker (BOM) in first line in file (issues #27, #219)
12
20
 
13
21
  ## 1.7.4 (2023-01-13)
data/README.md CHANGED
@@ -77,7 +77,10 @@ $ hexdump -C spec/fixtures/bom_test_feff.csv
77
77
 
78
78
  Here are some examples to demonstrate the versatility of SmarterCSV.
79
79
 
80
- By default SmarterCSV determines the `row_sep` and `col_sep` values automatically.
80
+ **It is generally recommended to rescue `SmarterCSVException` or its subclasses.**
81
+
82
+ By default SmarterCSV determines the `row_sep` and `col_sep` values automatically. In cases where the automatic detection fails, an exception will be raised, e.g. `NoColSepDetected`. Rescuing from these exceptions will make sure that you don't miss processing CSV files, in case users upload CSV files with unexpected formats.
83
+
81
84
  In rare cases you may have to manually set these values, after going through the troubleshooting procedure described above.
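The README text above recommends rescuing `SmarterCSVException` or its subclasses. A minimal sketch of that pattern follows. The helper name `import_upload`, its `parser:` keyword, and the returned hash shape are hypothetical and not part of the gem. The sketch assumes `NoColSepDetected` is a subclass of `SmarterCSVException`, as the README wording implies, and the `LoadError` fallback exists only so the snippet runs where the gem is not installed.

```ruby
begin
  require 'smarter_csv'
rescue LoadError
  # Fallback so this sketch runs without the gem: a stand-in that only
  # mirrors the exception names mentioned in the README and CHANGELOG.
  module SmarterCSV
    class SmarterCSVException < StandardError; end
    class NoColSepDetected < SmarterCSVException; end
  end
end

# Hypothetical helper: parses an uploaded file and reports failures
# as data instead of letting the exception propagate.
def import_upload(path, parser: SmarterCSV.method(:process))
  { status: :ok, rows: parser.call(path) }
rescue SmarterCSV::NoColSepDetected => e
  # Most specific class first: automatic col_sep detection failed.
  { status: :undetected_separator, error: e.class.name }
rescue SmarterCSV::SmarterCSVException => e
  # Any other error raised by the gem for this file.
  { status: :invalid_csv, error: e.class.name }
end
```

With the gem installed, `import_upload("upload.csv")` calls `SmarterCSV.process` with its default options. Listing the subclass before the base class matters, because Ruby picks the first `rescue` clause that matches.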
82
85
 
83
86
  #### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
data/TO_DO_v2.md ADDED
@@ -0,0 +1,14 @@
1
+ # SmarterCSV v2.0 TO DO List
2
+
3
+ * add enumerable to speed up parallel processing [issue #66](https://github.com/tilo/smarter_csv/issues/66), [issue #32](https://github.com/tilo/smarter_csv/issues/32)
4
+ * use Procs for validations and transformations [issue #118](https://github.com/tilo/smarter_csv/issues/118)
5
+ * make @errors and @warnings work [issue #118](https://github.com/tilo/smarter_csv/issues/118)
6
+ * skip file opening, allow reading from CSV string, e.g. reading from S3 file [issue #120](https://github.com/tilo/smarter_csv/issues/120).
7
+ Or stream large file from S3 (linked in the issue)
8
+ * Collect all Errors, before surfacing them. Avoid throwing an exception on the first error [issue #133](https://github.com/tilo/smarter_csv/issues/133)
9
+ * Don't call rewind on filehandle
10
+ * [2.0 BUG] :convert_values_to_numeric_unless_leading_zeros drops leading zeros [issue #151](https://github.com/tilo/smarter_csv/issues/151)
11
+ * [2.0 BUG] convert_to_float saves Proc as @@convert_to_integer [issue #157](https://github.com/tilo/smarter_csv/issues/157)
12
+ * Provide an example for custom Procs for hash_transformations in the docs [issue #174](https://github.com/tilo/smarter_csv/issues/174)
13
+ * Replace remove_empty_values: false [issue #213](https://github.com/tilo/smarter_csv/issues/213)
14
+
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module SmarterCSV
4
- VERSION = "1.8.1"
4
+ VERSION = "1.8.2"
5
5
  end
data/lib/smarter_csv.rb CHANGED
@@ -393,15 +393,28 @@ module SmarterCSV
393
393
  def guess_column_separator(filehandle, options)
394
394
  skip_lines(filehandle, options)
395
395
 
396
- possible_delimiters = [',', "\t", ';', ':', '|']
396
+ delimiters = [',', "\t", ';', ':', '|']
397
397
 
398
- candidates = if options.fetch(:headers_in_file)
399
- candidated_column_separators_from_headers(filehandle, options, possible_delimiters)
400
- else
401
- candidated_column_separators_from_contents(filehandle, options, possible_delimiters)
402
- end
398
+ line = nil
399
+ has_header = options[:headers_in_file]
400
+ candidates = Hash.new(0)
401
+ count = has_header ? 1 : 5
402
+ count.times do
403
+ line = readline_with_counts(filehandle, options)
404
+ delimiters.each do |d|
405
+ candidates[d] += line.scan(d).count
406
+ end
407
+ rescue EOFError # short files
408
+ break
409
+ end
410
+ rewind(filehandle)
403
411
 
404
- raise SmarterCSV::NoColSepDetected if candidates.values.max == 0
412
+ if candidates.values.max == 0
413
+ # if the header only contains word characters (a single column), default to ','
414
+ return ',' if line =~ /^\w+$/
415
+
416
+ raise SmarterCSV::NoColSepDetected
417
+ end
405
418
 
406
419
  candidates.key(candidates.values.max)
407
420
  end
@@ -582,35 +595,5 @@ module SmarterCSV
582
595
  return true if str.is_a?(String) && !str.empty?
583
596
  false
584
597
  end
585
-
586
- def candidated_column_separators_from_headers(filehandle, options, delimiters)
587
- candidates = Hash.new(0)
588
- line = readline_with_counts(filehandle, options.slice(:row_sep))
589
-
590
- delimiters.each do |d|
591
- candidates[d] += line.scan(d).count
592
- end
593
-
594
- rewind(filehandle)
595
-
596
- candidates
597
- end
598
-
599
- def candidated_column_separators_from_contents(filehandle, options, delimiters)
600
- candidates = Hash.new(0)
601
-
602
- 5.times do
603
- line = readline_with_counts(filehandle, options.slice(:row_sep))
604
- delimiters.each do |d|
605
- candidates[d] += line.scan(d).count
606
- end
607
- rescue EOFError # short files
608
- break
609
- end
610
-
611
- rewind(filehandle)
612
-
613
- candidates
614
- end
615
598
  end
616
599
  end
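The rewritten `guess_column_separator` above can be approximated outside the gem as follows. This is a simplified sketch, not the gem's code: `guess_col_sep` is a hypothetical name, `IO#readline` stands in for the gem's `readline_with_counts`, `skip_lines` and custom `row_sep` handling are left out, and it raises `ArgumentError` rather than `SmarterCSV::NoColSepDetected`. Treating an empty input as "nothing detected" is also an assumption of the sketch.

```ruby
require 'stringio'

DELIMITERS = [',', "\t", ';', ':', '|'].freeze

# Counts candidate delimiters in the header line (or in up to five
# content lines when there is no header) and returns the most frequent.
def guess_col_sep(io, has_header: true)
  candidates = Hash.new(0)
  line = nil

  (has_header ? 1 : 5).times do
    line = io.readline
    DELIMITERS.each { |d| candidates[d] += line.scan(d).count }
  rescue EOFError # short files: stop with what was read so far
    break
  end
  io.rewind

  # to_i covers the empty-input case, where no counts were recorded.
  if candidates.values.max.to_i.zero?
    # A line of only ASCII word characters looks like a single column.
    return ',' if line =~ /^\w+$/

    raise ArgumentError, 'no column separator detected'
  end

  candidates.key(candidates.values.max)
end
```

Because `\w` matches only ASCII word characters in Ruby by default, a single-column header containing non-ASCII characters or spaces still takes the error path, which is consistent with the caveat in the 1.8.2 changelog entry.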
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: smarter_csv
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.8.1
4
+ version: 1.8.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Tilo Sloboda
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2023-03-19 00:00:00.000000000 Z
11
+ date: 2023-03-22 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: awesome_print
@@ -112,6 +112,7 @@ files:
112
112
  - LICENSE.txt
113
113
  - README.md
114
114
  - Rakefile
115
+ - TO_DO_v2.md
115
116
  - ext/smarter_csv/extconf.rb
116
117
  - ext/smarter_csv/smarter_csv.c
117
118
  - lib/extensions/hash.rb