csv_sniffer 0.0.1 → 0.0.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 0c4ec9e742582c39e7ee5b57d4dce731292bc0e0
- data.tar.gz: 3aaa46390b01a030b317724be479da6724a2612c
+ metadata.gz: b9e303592691083e6ac15f46740577447dfeea43
+ data.tar.gz: 0d7c5ab5c866d2b16d5591a20db8ff62ac0abeb3
  SHA512:
- metadata.gz: 976a4029144696e77fdb7904a9be48270b10ca4e4b03b99c26a99875c0845a4032e5cf587ce32dddf8b1332789fd5af0062c95a73b6e10c34a404be262311b79
- data.tar.gz: 232fb8c1acacbf5932962e0429bf0f287ec510f96e7d7f731b613662155a636dd89b3f87a5d1659aea8e0d760a648c6fd1a33563a0486fc83e240b72db005620
+ metadata.gz: c7bd0b58ee2ae274f149e0212fb9f8434b2a240304b0b37f1237d815ff573a2ae14459cb27f573ce39622d88680460a8c4a859dda94e7297556eca72fc510608
+ data.tar.gz: caa05cdbb7763a1e72b85468f2720f10b2354ba7b97c7b6fb05582807b1f3f0bcdf9f078a8888bdfb811f30022904307f98989e252d059f51e8c75206dcb886e
data/LICENSE CHANGED
@@ -1,21 +1,21 @@
- The MIT License (MIT)
-
- Copyright (c) 2015 Tim Ojo
-
- Permission is hereby granted, free of charge, to any person obtaining a copy
- of this software and associated documentation files (the "Software"), to deal
- in the Software without restriction, including without limitation the rights
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- copies of the Software, and to permit persons to whom the Software is
- furnished to do so, subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included in
- all copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- THE SOFTWARE.
+ The MIT License (MIT)
+
+ Copyright (c) 2015 Tim Ojo
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md CHANGED
@@ -1,65 +1,68 @@
- # CSV Sniffer
-
- CSV Sniffer is intended to provide utilities that will allow a user heuristically detect the delimiter character in use, whether the values in the CSV file are quote enclosed, whether the file contains a header, and more. The library is intended to detect information to be used as configuration inputs for CSV parsers. For delimiter detection the following delimiters are currently supported `[",", "\t", "|", ";"]`
-
- To ensure high performance and a low memory footprint, the library uses as little information as needed to make accurate decisions. Contributors are welcome to
- improve the algorithms in use.
-
-
- ## Installation
-
- ```
- $ gem install csv_sniffer
- ```
-
- ## Usage
-
- Given a `some_data.csv` file:
-
- ```csv
- Name;Phone
- John Doe ;555-481-2345
- Jane C. Doe;555-123-4567
- ```
-
- Detection usage is as follows:
-
- ```rb
- require "csv_sniffer"
-
- delim = CsvSniffer.detect_delimiter("/path/to/some_data.csv") #=> ";"
- is_quote_enclosed = CsvSniffer.is_quote_enclosed?("/path/to/some_data.csv") #=> False
- ```
-
- See [`test.rb`](test.rb) for more examples.
-
-
- ## Tests
-
- ```
- $ ruby test.rb
- ```
-
- ## License
-
- The MIT License (MIT)
-
- Copyright © 2015 Tim Ojo
-
- Permission is hereby granted, free of charge, to any person obtaining a copy
- of this software and associated documentation files (the "Software"), to deal
- in the Software without restriction, including without limitation the rights
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- copies of the Software, and to permit persons to whom the Software is
- furnished to do so, subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included in
- all copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- THE SOFTWARE.
+ # CSV Sniffer
+
+ CSV Sniffer is intended to provide utilities that will allow a user heuristically detect the delimiter character in use, whether the values in the CSV file are quote enclosed, whether the file contains a header, and more. The library is intended to detect information to be used as configuration inputs for CSV parsers. For delimiter detection the following delimiters are currently supported `[",", "\t", "|", ";"]`
+
+ To ensure high performance and a low memory footprint, the library uses as little information as needed to make accurate decisions. Contributors are welcome to
+ improve the algorithms in use.
+
+
+ ## Installation
+
+ ```
+ $ gem install csv_sniffer
+ ```
+
+ ## Usage
+
+ Given a `some_data.csv` file:
+
+ ```csv
+ Name;Phone
+ John Doe ;555-481-2345
+ Jane C. Doe;555-123-4567
+ ```
+
+ Detection usage is as follows:
+
+ ```rb
+ require "csv_sniffer"
+
+ delim = CsvSniffer.detect_delimiter("/path/to/some_data.csv") #=> ";"
+ is_quote_enclosed = CsvSniffer.is_quote_enclosed?("/path/to/some_data.csv") #=> False
+ ```
+
+ See [`test_csv_sniffer.rb`](test/test_csv_sniffer.rb) for more examples.
+
+
+ ## Tests
+
+ ```
+ $ ruby test.rb
+ ```
+
+ ## To Do
+ - Header detection
+
+ ## License
+
+ The MIT License (MIT)
+
+ Copyright © 2015 Tim Ojo
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
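The README describes the detected values as configuration inputs for a CSV parser. A minimal sketch of that hand-off follows, assuming Ruby's standard `csv` library as the parser. To stay self-contained it hard-codes the values the sniffer would be expected to report for the `some_data.csv` sample instead of calling the gem. The names `detected_delim`, `detected_quote`, `sample` and `rows` are illustrative and do not come from the package.

```ruby
require "csv"
require "tempfile"

# Hypothetical stand-ins for the sniffer's results on the README sample.
# With the gem installed these would come from
# CsvSniffer.detect_delimiter(path) and CsvSniffer.get_quote_char(path).
detected_delim = ";"
detected_quote = nil # nil means the file is not quote enclosed

# Recreate the README's some_data.csv in a temporary file.
sample = Tempfile.new(["some_data", ".csv"])
sample.write("Name;Phone\nJohn Doe ;555-481-2345\nJane C. Doe;555-123-4567\n")
sample.flush

# Hand the detected values to the parser. When no quote character was
# detected, fall back to the double quote that CSV uses by default.
rows = CSV.read(sample.path, col_sep: detected_delim, quote_char: detected_quote || '"')

rows.each { |row| p row }
sample.close!
```

The trailing space in `"John Doe "` survives parsing, so callers that need trimmed cells would still have to strip them.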
data/Rakefile CHANGED
@@ -1,8 +1,8 @@
- require 'rake/testtask'
-
- Rake::TestTask.new do |t|
- t.libs << 'test'
- end
-
- desc "Run tests"
- task :default => :test
+ require 'rake/testtask'
+
+ Rake::TestTask.new do |t|
+ t.libs << 'test'
+ end
+
+ desc "Run tests"
+ task :default => :test
data/csv_sniffer.gemspec CHANGED
@@ -1,15 +1,15 @@
- Gem::Specification.new do |s|
- s.name = 'csv_sniffer'
- s.version = '0.0.1'
- s.date = '2015-10-09'
- s.summary = "CSV library for heuristic detection of CSV properties"
- s.description = "CSV Sniffer is intended to provide utilities that will allow a user detect the delimiter character in use, whether the values in the CSV file are quote enclosed, whether the file contains a header, and more. The library is intended to detect information to be used as configuration inputs for CSV parsers."
- s.authors = ["Tim Ojo"]
- s.email = 'ojo.tim@gmail.com'
- s.homepage = 'https://github.com/tim-ojo/csv_sniffer'
- s.license = 'MIT'
-
- s.files = `git ls-files`.split($/)
- s.test_files = s.files.grep(/^test/)
- s.add_development_dependency "test-unit"
- end
+ Gem::Specification.new do |s|
+ s.name = 'csv_sniffer'
+ s.version = '0.0.2'
+ s.date = '2015-10-10'
+ s.summary = "CSV library for heuristic detection of CSV properties"
+ s.description = "CSV Sniffer is intended to provide utilities that will allow a user detect the delimiter character in use, whether the values in the CSV file are quote enclosed, whether the file contains a header, and more. The library is intended to detect information to be used as configuration inputs for CSV parsers."
+ s.authors = ["Tim Ojo"]
+ s.email = 'ojo.tim@gmail.com'
+ s.homepage = 'https://github.com/tim-ojo/csv_sniffer'
+ s.license = 'MIT'
+
+ s.files = `git ls-files`.split($/)
+ s.test_files = s.files.grep(/^test/)
+ s.add_development_dependency "test-unit"
+ end
data/lib/csv_sniffer.rb CHANGED
@@ -1,106 +1,137 @@
- # This class contains functions to heuristically decipher certain information from a CSV file
- class CsvSniffer
-
- # Reads the first line of the csv and returns true if the line starts and ends with " or '
- def self.is_quote_enclosed?(filepath)
- line = File.open(filepath, &:readline)
- line.chomp!.strip!
- return line.start_with?('"') && line.end_with?('"') || line.start_with?("'") && line.end_with?("'")
- end
-
- def self.get_quote_char(filepath)
- if is_quote_enclosed?(filepath)
- line = File.open(filepath, &:readline)
- line.chomp!.strip!
- return line[0]
- else
- return nil
- end
- end
-
- # If the csv is quote enclosed then just get the delimiter after the first cell. Otherwise...
- # Get the first line and count how many of the possible delimiters are present. If there is >1 of one of the
- # delimiters and 0 of the others then, then we pick the max. If there are more than 0 of any of the others then
- # we repeat the counting procedure for the next 50 lines until the condition is satisfied.
- # If the condition is never satisfied then we simply pick the delimiter that occurs the most frequently, defaulting
- # to the comma. Unless that delimeter's count is equal to the tab or pipe delimiter's count. In that case we return \t or |
- def self.detect_delimiter (filepath)
-
- if is_quote_enclosed?(filepath)
- line = File.open(filepath, &:readline)
- line.chomp!.strip!
- m = /["'].+?["']([,|;\t])/.match(line)
- if (m)
- return m[1]
- end
- end
-
- lineCount = 0
- File.foreach(filepath) do |line|
- detectedDelim = max_delim_when_others_are_zero(line)
- if detectedDelim != '0' #=> '0' is a sentinel value that indicates no delim found
- return detectedDelim
- end
-
- lineCount += 1;
- break if lineCount == 50
- end
-
- # If I got here I'm going to pick the default by counting the delimiters on the first line and returning the max
- line = File.open(filepath, &:readline)
- freqOfPossibleDelims = get_freq_of_possible_delims(line)
-
- maxFreq = 0
- maxFreqIndex = 0
- freqOfPossibleDelims.each_with_index do |delimFreq, i|
- if (delimFreq > maxFreq)
- maxFreq = delimFreq
- maxFreqIndex = i
- end
- end
-
- # Favor "\t" and "|" over ","
- if (maxFreq == freqOfPossibleDelims[1])
- return "\t"
- elsif (maxFreq == freqOfPossibleDelims[3])
- return "|"
- else
- return [",", "\t", ";", "|"][maxFreqIndex]
- end
- end
-
- def self.max_delim_when_others_are_zero (line)
- freqOfPossibleDelims = get_freq_of_possible_delims(line)
-
- maxFreq = 0
- maxFreqIndex = 0
- zeroCount = 0
- freqOfPossibleDelims.each_with_index do |delimFreq, i|
- if (delimFreq > maxFreq)
- maxFreq = delimFreq
- maxFreqIndex = i
- end
- zeroCount += 1 if delimFreq == 0
- end
-
- if zeroCount >= 3
- return [',', '\t', ';', '|'][maxFreqIndex]
- else
- return '0' #=> '0' is a sentinel value that indicates no delim found
- end
- end
-
- def self.get_freq_of_possible_delims (line)
- freqOfPossibleDelims = Array.new(4) #=> [0 = ','] [1 = '\t'] [2 = ';'] [3 = '|']
- freqOfPossibleDelims[0] = line.count ","
- freqOfPossibleDelims[1] = line.count "\t"
- freqOfPossibleDelims[2] = line.count ";"
- freqOfPossibleDelims[3] = line.count "|"
-
- return freqOfPossibleDelims
- end
-
- private_class_method :max_delim_when_others_are_zero
- private_class_method :get_freq_of_possible_delims
-
- end
+ # This class contains functions to heuristically decipher certain information from a CSV file
+ class CsvSniffer
+
+ # Reads the first line of the csv and returns true if the line starts and ends with " or '
+ #
+ # Example:
+ # CsvSniffer.is_quote_enclosed?("path/to/file")
+ # => true
+ #
+ # Arguments:
+ # filepath: (String)
+
+ def self.is_quote_enclosed?(filepath)
+ line = File.open(filepath, &:readline)
+ line.chomp!.strip!
+ return line.start_with?('"') && line.end_with?('"') || line.start_with?("'") && line.end_with?("'")
+ end
+
+
+ # Gets the quote character in use in the file if one exists. Returns "'", """ or nil
+ #
+ # Example:
+ # CsvSniffer.get_quote_char("path/to/file")
+ # => "
+ #
+ # Arguments:
+ # filepath: (String)
+
+ def self.get_quote_char(filepath)
+ if is_quote_enclosed?(filepath)
+ line = File.open(filepath, &:readline)
+ line.chomp!.strip!
+ return line[0]
+ else
+ return nil
+ end
+ end
+
+
+ # Heuristically detects the delimiter used in the CSV file and returns it
+ #
+ # Example:
+ # CsvSniffer.detect_delimiter("path/to/file")
+ # => "|"
+ #
+ # Arguments:
+ # filepath: (String)
+
+ def self.detect_delimiter (filepath)
+ # If the csv is quote enclosed then just get the delimiter after the first cell. Otherwise...
+ # Get the first line and count how many of the possible delimiters are present. If there is >1 of one of the
+ # delimiters and 0 of the others then, then we pick the max. If there are more than 0 of any of the others then
+ # we repeat the counting procedure for the next 50 lines until the condition is satisfied.
+ # If the condition is never satisfied then we simply pick the delimiter that occurs the most frequently, defaulting
+ # to the comma. Unless that delimeter's count is equal to the tab or pipe delimiter's count. In that case we return \t or |
+
+ if is_quote_enclosed?(filepath)
+ line = File.open(filepath, &:readline)
+ line.chomp!.strip!
+ m = /["'].+?["']([,|;\t])/.match(line)
+ if (m)
+ return m[1]
+ end
+ end
+
+ lineCount = 0
+ File.foreach(filepath) do |line|
+ detectedDelim = max_delim_when_others_are_zero(line)
+ if detectedDelim != '0' #=> '0' is a sentinel value that indicates no delim found
+ return detectedDelim
+ end
+
+ lineCount += 1;
+ break if lineCount == 50
+ end
+
+ # If I got here I'm going to pick the default by counting the delimiters on the first line and returning the max
+ line = File.open(filepath, &:readline)
+ freqOfPossibleDelims = get_freq_of_possible_delims(line)
+
+ maxFreq = 0
+ maxFreqIndex = 0
+ freqOfPossibleDelims.each_with_index do |delimFreq, i|
+ if (delimFreq > maxFreq)
+ maxFreq = delimFreq
+ maxFreqIndex = i
+ end
+ end
+
+ # Favor "\t" and "|" over ","
+ if (maxFreq == freqOfPossibleDelims[1])
+ return "\t"
+ elsif (maxFreq == freqOfPossibleDelims[3])
+ return "|"
+ else
+ return [",", "\t", ";", "|"][maxFreqIndex]
+ end
+ end
+
+
+ def self.max_delim_when_others_are_zero (line)
+ freqOfPossibleDelims = get_freq_of_possible_delims(line)
+
+ maxFreq = 0
+ maxFreqIndex = 0
+ zeroCount = 0
+ freqOfPossibleDelims.each_with_index do |delimFreq, i|
+ if (delimFreq > maxFreq)
+ maxFreq = delimFreq
+ maxFreqIndex = i
+ end
+ zeroCount += 1 if delimFreq == 0
+ end
+
+ if zeroCount >= 3
+ return [',', '\t', ';', '|'][maxFreqIndex]
+ else
+ return '0' #=> '0' is a sentinel value that indicates no delim found
+ end
+ end
+
+
+ def self.get_freq_of_possible_delims (line)
+ freqOfPossibleDelims = Array.new(4) #=> [0 = ','] [1 = '\t'] [2 = ';'] [3 = '|']
+ freqOfPossibleDelims[0] = line.count ","
+ freqOfPossibleDelims[1] = line.count "\t"
+ freqOfPossibleDelims[2] = line.count ";"
+ freqOfPossibleDelims[3] = line.count "|"
+
+ return freqOfPossibleDelims
+ end
+
+
+ private_class_method :max_delim_when_others_are_zero
+ private_class_method :get_freq_of_possible_delims
+
+ end
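The comment block inside `detect_delimiter` describes a two-pass heuristic: look for a line on which only one candidate delimiter occurs, and otherwise fall back to the most frequent candidate on the first line, favoring tab and pipe on ties. The sketch below is a simplified restatement of that idea, not the gem's code. It takes an array of lines instead of a file path, skips the quote-enclosed shortcut, and all method names (`sketch_quote_enclosed?`, `delimiter_counts`, `sketch_detect_delimiter`) are hypothetical. It also uses the non-bang `strip`, because `chomp!` and `strip!` return `nil` when they change nothing, and a double-quoted `"\t"` so that a real tab character is returned.

```ruby
# Illustrative sketch only; names and structure are assumptions, not the gem's API.

# True when the stripped line starts and ends with the same quote character.
# Non-bang strip always returns a String, so the chained calls cannot hit nil.
def sketch_quote_enclosed?(line)
  text = line.strip
  ['"', "'"].any? { |q| text.length > 1 && text.start_with?(q) && text.end_with?(q) }
end

# Count how often each candidate delimiter occurs in one line.
def delimiter_counts(line)
  [",", "\t", ";", "|"].map { |d| [d, line.count(d)] }.to_h
end

def sketch_detect_delimiter(lines, max_lines: 50)
  # Pass 1: accept the first line on which exactly one candidate occurs.
  lines.first(max_lines).each do |line|
    present = delimiter_counts(line).select { |_, n| n > 0 }
    return present.keys.first if present.size == 1
  end

  # Pass 2: most frequent candidate on the first line, comma when none occur.
  counts = delimiter_counts(lines.first.to_s)
  max = counts.values.max
  return "," if max.zero?

  # On ties, prefer tab, then pipe, then comma, then semicolon.
  ["\t", "|", ",", ";"].find { |d| counts[d] == max }
end

p sketch_detect_delimiter(["John Doe;555-123-4567;Good\tdude", "Jane C. Doe;555-000-1234 ; Great gal"])
p sketch_detect_delimiter(["Doe, John|555-123-4567", "Doe, Jane C. |555-000-1234"])
p sketch_quote_enclosed?("'Name' |'Number'\t\n")
```

The sample inputs are taken from the fixtures in the package's test file. Splitting the work into a per-line count and a separate decision step keeps the tie-breaking rule in one place.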
data/test/test_csv_sniffer.rb CHANGED
@@ -1,71 +1,71 @@
- require 'minitest/autorun'
- require 'tempfile'
- require 'csv_sniffer'
-
- class CsvSnifferTest < Minitest::Test
-
- @@file1 = Tempfile.new('file1')
- @@file1.puts "Name,Number"
- @@file1.puts "John Doe,555-123-4567"
- @@file1.puts "Jane C. Doe,555-000-1234"
- @@file1.rewind
-
- @@file2 = Tempfile.new('file2')
- @@file2.puts "'Name' |'Number'\t"
- @@file2.puts "'John Doe'|'555-123-4567'"
- @@file2.puts "'Jane C. Doe'|'555-000-1234'"
- @@file2.rewind
-
- @@file3 = Tempfile.new('file3')
- @@file3.puts "John Doe;555-123-4567;Good\tdude"
- @@file3.puts "Jane C. Doe;555-000-1234 ; Great gal"
- @@file3.rewind
-
- @@file4 = Tempfile.new('file4')
- @@file4.puts "Doe, John\t555-123-4567"
- @@file4.puts "Jane C. Doe\t555-000-1234\t"
- @@file4.rewind
-
- @@file5 = Tempfile.new('file5')
- @@file5.puts '"Doe,,,,,, John"|"555-123-4567"'
- @@file5.puts '"Jane C. Doe"|"555-000-1234\t"'
- @@file5.rewind
-
- @@file6 = Tempfile.new('file6')
- @@file6.puts 'Doe, John|555-123-4567'
- @@file6.puts 'Doe, Jane C. |555-000-1234'
- @@file6.rewind
-
- def test_file1
- assert_equal ",", CsvSniffer.detect_delimiter(@@file1.path)
- assert_equal false, CsvSniffer.is_quote_enclosed?(@@file1.path)
- assert_equal nil, CsvSniffer.get_quote_char(@@file1.path)
- end
-
- def test_file2
- assert_equal "|", CsvSniffer.detect_delimiter(@@file2.path)
- assert_equal true, CsvSniffer.is_quote_enclosed?(@@file2.path)
- assert_equal "'", CsvSniffer.get_quote_char(@@file2.path)
- end
-
- def test_file3
- assert_equal ";", CsvSniffer.detect_delimiter(@@file3.path)
- assert_equal false, CsvSniffer.is_quote_enclosed?(@@file3.path)
- end
-
- def test_file4
- assert_equal "\\t", CsvSniffer.detect_delimiter(@@file4.path)
- assert_equal nil, CsvSniffer.get_quote_char(@@file4.path)
- end
-
- def test_file5
- assert_equal "|", CsvSniffer.detect_delimiter(@@file5.path)
- assert_equal true, CsvSniffer.is_quote_enclosed?(@@file5.path)
- assert_equal '"', CsvSniffer.get_quote_char(@@file5.path)
- end
-
- def test_file6
- assert_equal "|", CsvSniffer.detect_delimiter(@@file6.path)
- end
-
- end
+ require 'minitest/autorun'
+ require 'tempfile'
+ require 'csv_sniffer'
+
+ class CsvSnifferTest < Minitest::Test
+
+ @@file1 = Tempfile.new('file1')
+ @@file1.puts "Name,Number"
+ @@file1.puts "John Doe,555-123-4567"
+ @@file1.puts "Jane C. Doe,555-000-1234"
+ @@file1.rewind
+
+ @@file2 = Tempfile.new('file2')
+ @@file2.puts "'Name' |'Number'\t"
+ @@file2.puts "'John Doe'|'555-123-4567'"
+ @@file2.puts "'Jane C. Doe'|'555-000-1234'"
+ @@file2.rewind
+
+ @@file3 = Tempfile.new('file3')
+ @@file3.puts "John Doe;555-123-4567;Good\tdude"
+ @@file3.puts "Jane C. Doe;555-000-1234 ; Great gal"
+ @@file3.rewind
+
+ @@file4 = Tempfile.new('file4')
+ @@file4.puts "Doe, John\t555-123-4567"
+ @@file4.puts "Jane C. Doe\t555-000-1234\t"
+ @@file4.rewind
+
+ @@file5 = Tempfile.new('file5')
+ @@file5.puts '"Doe,,,,,, John"|"555-123-4567"'
+ @@file5.puts '"Jane C. Doe"|"555-000-1234\t"'
+ @@file5.rewind
+
+ @@file6 = Tempfile.new('file6')
+ @@file6.puts 'Doe, John|555-123-4567'
+ @@file6.puts 'Doe, Jane C. |555-000-1234'
+ @@file6.rewind
+
+ def test_file1
+ assert_equal ",", CsvSniffer.detect_delimiter(@@file1.path)
+ assert_equal false, CsvSniffer.is_quote_enclosed?(@@file1.path)
+ assert_equal nil, CsvSniffer.get_quote_char(@@file1.path)
+ end
+
+ def test_file2
+ assert_equal "|", CsvSniffer.detect_delimiter(@@file2.path)
+ assert_equal true, CsvSniffer.is_quote_enclosed?(@@file2.path)
+ assert_equal "'", CsvSniffer.get_quote_char(@@file2.path)
+ end
+
+ def test_file3
+ assert_equal ";", CsvSniffer.detect_delimiter(@@file3.path)
+ assert_equal false, CsvSniffer.is_quote_enclosed?(@@file3.path)
+ end
+
+ def test_file4
+ assert_equal "\\t", CsvSniffer.detect_delimiter(@@file4.path)
+ assert_equal nil, CsvSniffer.get_quote_char(@@file4.path)
+ end
+
+ def test_file5
+ assert_equal "|", CsvSniffer.detect_delimiter(@@file5.path)
+ assert_equal true, CsvSniffer.is_quote_enclosed?(@@file5.path)
+ assert_equal '"', CsvSniffer.get_quote_char(@@file5.path)
+ end
+
+ def test_file6
+ assert_equal "|", CsvSniffer.detect_delimiter(@@file6.path)
+ end
+
+ end
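`test_file4` expects the two-character string `"\\t"` rather than a tab character. That is consistent with the single-quoted `'\t'` in the array returned by `max_delim_when_others_are_zero`, since Ruby does not process the `\t` escape inside single quotes. A short sketch of that quoting rule, using only core `String` methods, follows. The variable names are illustrative.

```ruby
single = '\t' # single quotes: a backslash followed by the letter t
double = "\t" # double quotes: one tab character

puts single.length    # 2
puts double.length    # 1
puts single == "\\t"  # true
puts single == double # false

# Only the real tab splits a tab-separated line.
row = "Doe, John\t555-123-4567"
p row.split(single) # ["Doe, John\t555-123-4567"]
p row.split(double) # ["Doe, John", "555-123-4567"]
```

A caller that passes the detected value straight to a parser as a column separator would therefore need the double-quoted form to split tab-separated rows.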
metadata CHANGED
@@ -1,27 +1,27 @@
  --- !ruby/object:Gem::Specification
  name: csv_sniffer
  version: !ruby/object:Gem::Version
- version: 0.0.1
+ version: 0.0.2
  platform: ruby
  authors:
  - Tim Ojo
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2015-10-09 00:00:00.000000000 Z
+ date: 2015-10-10 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: test-unit
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  description: CSV Sniffer is intended to provide utilities that will allow a user detect
@@ -49,17 +49,17 @@ require_paths:
  - lib
  required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.4.7
+ rubygems_version: 2.0.14
  signing_key:
  specification_version: 4
  summary: CSV library for heuristic detection of CSV properties