csvlint 0.1.0 → 0.1.1

checksums.yaml CHANGED
@@ -1,15 +1,15 @@
1
1
  ---
2
2
  !binary "U0hBMQ==":
3
3
  metadata.gz: !binary |-
4
- NzY4MzE5MGI1OWI4YzkzY2U1MTZlMWJjNDM4YzM5ZjNiMWI5ZWIyMA==
4
+ MDgwMGE1ZmY5ZWE2MzM2ODhhMTFhZjM5NTkyZTNkZWExZjFkN2I0Mw==
5
5
  data.tar.gz: !binary |-
6
- OTM5MWJmYzNkYTBiMjM4ZGU3Yjc2NTY0Njk4OTEzMDJhNmI2Yjg5Mg==
6
+ OTRhZTljMTc5ZDlmZDVlMmQ2ZDU2MTFmYmRlNGUyMGIyMTI3NTBmOQ==
7
7
  SHA512:
8
8
  metadata.gz: !binary |-
9
- YzU4NzJmZTlkYWE1NTY0MGY4NzY0N2Q1NDYxMzI1N2YxYjZhZThmNDZhMDVj
10
- Mjc1YmMyZGZkODhlNDE4OWZiYWM5NmVjNTNiOGRmMjI0OGY4ZjJkY2I4OTNl
11
- ZmYwMGMwN2YwOWNlYmUzNjZjNWU3OTEzYTIwZDYzMGMzZDBiOGQ=
9
+ ZDE4YWVlNGYzY2E5OGJjM2ZkZjA0MzQwMWE1Yzg5YTI3YzM0MjNkMmFhNjFj
10
+ YmY3YzYzN2E5NDk1NDM4YmNmMGY1ZWRlNmExZWI0NmYzZmQzNTc2N2ZiOTMx
11
+ YzU2MjhiMmE4YTc2NTVmNzE4YWI4ZDZjOWM5MDlhNTRlY2RkMzE=
12
12
  data.tar.gz: !binary |-
13
- YTkwNWVlNjlmZGZkZDJjNmUyZGQxMzBkYzViOTc1NzlhNjYzYjU3ZDljMmRl
14
- YzExNDFkMDUxOThkMjJkNDRkZWYxZjNlMzdmYTRmNjI1MTliNGYzOWQxYzY3
15
- MGIxMjA1YzhkYjUwOWZkOGU2NjViYjNlNjBhMTY4NTY0N2E4MmI=
13
+ NDY1ODhkODdlZDJlYzMyMTQ1NzFlYzAyNTYyN2YzMGE1NTM2Yzg1NWRiYTQ5
14
+ NmQ1ODE5MzNmOTU4ZjhmNmZjNWQwNWU1N2E5OTdmOWMyZmZiYjE4ZmVhNmVh
15
+ YzdmN2VmNjY1NjFkMDdmYTc0YzZiNTk2YWQxMWZkMmRkNmVhODg=
data/.ruby-version ADDED
@@ -0,0 +1 @@
1
+ 2.1.4
data/.travis.yml CHANGED
@@ -1,5 +1,9 @@
1
1
  rvm:
2
2
  - 2.0.0
3
+ - 2.1.0
4
+ - 2.2.0
5
+ sudo: false
6
+ cache: bundler
3
7
  notifications:
4
8
  irc:
5
9
  channels:
data/README.md CHANGED
@@ -24,12 +24,35 @@ Or install it yourself as:
24
24
 
25
25
  ## Usage
26
26
 
27
+ You can either use this gem within your own Ruby code, or as a standalone command line application
28
+
29
+ ## On the command line
30
+
31
+ After installing the gem, you can validate a CSV on the command line like so:
32
+
33
+ csvlint myfile.csv
34
+
35
+ You will then see the validation result, together with any warnings or errors e.g.
36
+
37
+ ```
38
+ myfile.csv is INVALID
39
+ 1. blank_rows. Row: 3
40
+ 1. title_row.
41
+ 2. inconsistent_values. Column: 14
42
+ ```
43
+
44
+ You can also optionally pass a schema file like so:
45
+
46
+ csvlint myfile.csv --schema=schema.json
47
+
48
+ ## In your own Ruby code
49
+
27
50
  Currently the gem supports retrieving a CSV accessible from a URL, File, or an IO-style object (e.g. StringIO)
28
51
 
29
52
  require 'csvlint'
30
53
 
31
54
  validator = Csvlint::Validator.new( "http://example.org/data.csv" )
32
- validator = Csvlint::Validator.new( File.new("/path/to/my/data.csv" )
55
+ validator = Csvlint::Validator.new( File.new("/path/to/my/data.csv" ))
33
56
  validator = Csvlint::Validator.new( StringIO.new( my_data_in_a_string ) )
34
57
 
35
58
  When validating from a URL the range of errors and warnings is wider as the library will also check HTTP headers for
@@ -61,13 +84,13 @@ best practices
61
84
  ## Controlling CSV Parsing
62
85
 
63
86
  The validator supports configuration of the [CSV Dialect](http://dataprotocols.org/csv-dialect/) used in a data file. This is specified by
64
- passing an options hash to the constructor:
87
+ passing a dialect hash to the constructor:
65
88
 
66
- opts = {
89
+ dialect = {
67
90
  "header" => true,
68
91
  "delimiter" => ","
69
92
  }
70
- validator = Csvlint::Validator.new( "http://example.org/data.csv", opts )
93
+ validator = Csvlint::Validator.new( "http://example.org/data.csv", dialect )
71
94
 
72
95
  The options should be a Hash that conforms to the [CSV Dialect](http://dataprotocols.org/csv-dialect/) JSON structure.
73
96
 
@@ -205,6 +228,19 @@ Schema validation provides some additional types of error and warning messages:
205
228
  * `:below_minimum` (error) -- a column with a `minimum` constraint contains a value that is below the minimum
206
229
  * `:above_maximum` (error) -- a column with a `maximum` constraint contains a value that is above the maximum
207
230
 
231
+ ## Other validation options
232
+
233
+ You can also provide an optional options hash as the fourth argument to Validator#new. Supported options are:
234
+
235
+ * :limit_lines -- only check this number of lines of the CSV file. Good for a quick check on huge files.
236
+
237
+ ```
238
+ options = {
239
+ limit_lines: 100
240
+ }
241
+ validator = Csvlint::Validator.new( "http://example.org/data.csv", nil, nil, options )
242
+ ```
243
+
208
244
  ## Contributing
209
245
 
210
246
  1. Fork it
data/bin/csvlint CHANGED
@@ -3,15 +3,57 @@ $:.unshift File.join( File.dirname(__FILE__), "..", "lib")
3
3
 
4
4
  require 'csvlint'
5
5
  require 'colorize'
6
+ require 'json'
7
+ require 'optparse'
8
+ require 'pp'
9
+
10
+ options = {}
11
+ opts = OptionParser.new
12
+
13
+ opts.banner = "Usage: csvlint [options] [file]"
14
+
15
+ opts.on("-d", "--dump-errors", "Pretty print error and warning objects.") do |d|
16
+ options[:dump] = d
17
+ end
18
+
19
+ opts.on("-s", "--schema-file FILENAME", "Schema file") do |s|
20
+ options[:schema_file] = s
21
+ end
22
+
23
+ opts.on_tail("-h", "--help",
24
+ "Show this message") do
25
+ puts opts
26
+ exit
27
+ end
28
+
29
+ begin
30
+ opts.parse!
31
+ rescue OptionParser::InvalidOption => e
32
+ puts e
33
+ puts opts
34
+ exit(1)
35
+ end
36
+
37
+ def print_error(index, error, dump, color)
6
38
 
7
- def print_error(index, error, color=:red)
8
39
  location = ""
9
40
  location += error.row.to_s if error.row
10
41
  location += "#{error.row ? "," : ""}#{error.column.to_s}" if error.column
11
42
  if error.row || error.column
12
43
  location = "#{error.row ? "Row" : "Column"}: #{location}"
13
44
  end
14
- puts "#{index+1}. #{error.type}. #{location}".colorize(color)
45
+ output_string = "#{index+1}. #{error.type}. #{location}"
46
+
47
+ if $stdout.tty?
48
+ puts output_string.colorize(color)
49
+ else
50
+ puts output_string
51
+ end
52
+
53
+ if dump
54
+ pp error
55
+ end
56
+
15
57
  end
16
58
 
17
59
  if ARGV.length == 0 && !$stdin.tty?
@@ -23,29 +65,45 @@ else
23
65
  begin
24
66
  source = File.new( source ) unless source =~ /^http(s)?/
25
67
  rescue Errno::ENOENT
26
- puts "File not found"
68
+ puts "#{source} not found"
27
69
  exit 1
28
70
  end
29
71
  end
30
72
  else
31
- puts "Usage: csvlint {file or URL} or {input} | csvlint"
73
+ puts "No CSV data to validate."
74
+ puts opts
32
75
  exit 1
33
76
  end
34
77
  end
35
78
 
36
- validator = Csvlint::Validator.new( source )
79
+ schema = nil
80
+ if options[:schema_file]
81
+ begin
82
+ schemafile = File.read( options[:schema_file] )
83
+ rescue Errno::ENOENT
84
+ puts "#{options[:schema_file]} not found"
85
+ exit 1
86
+ end
87
+ schema = Csvlint::Schema.from_json_table(nil, JSON.parse(schemafile))
88
+ end
89
+
90
+ validator = Csvlint::Validator.new( source, nil, schema )
37
91
 
38
- puts "#{ARGV[0] || "CSV"} is #{validator.valid? ? "VALID".green : "INVALID".red}"
92
+ if $stdout.tty?
93
+ puts "#{ARGV[0] || "CSV"} is #{validator.valid? ? "VALID".green : "INVALID".red}"
94
+ else
95
+ puts "#{ARGV[0] || "CSV"} is #{validator.valid? ? "VALID" : "INVALID"}"
96
+ end
39
97
 
40
98
  if validator.errors.size > 0
41
99
  validator.errors.each_with_index do |error, i|
42
- print_error(i, error)
100
+ print_error(i, error, options[:dump], :red)
43
101
  end
44
102
  end
45
103
 
46
104
  if validator.warnings.size > 0
47
105
  validator.warnings.each_with_index do |error, i|
48
- print_error(i, error, :yellow)
106
+ print_error(i, error, options[:dump], :yellow)
49
107
  end
50
108
  end
51
109
 
@@ -0,0 +1,2 @@
1
+ Foo,Bsr,Baz
2
+ Qux,Teaspoon,Doge
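Reviewer note on the option handling added to `bin/csvlint` earlier in this diff: the README example uses `--schema=schema.json` while the script registers `--schema-file`. The sketch below suggests why both spellings should behave the same, assuming OptionParser's default handling of unambiguous long-option abbreviations. The `parse_cli` helper is hypothetical and only mirrors the two switches from the diff; it is not part of csvlint.

```ruby
require 'optparse'

# Hypothetical helper (not in csvlint): parse an argument list into an
# options hash plus the remaining positional arguments.
def parse_cli(argv)
  options = {}
  parser = OptionParser.new
  parser.banner = "Usage: csvlint [options] [file]"

  # Same switches as the diff registers in bin/csvlint.
  parser.on("-d", "--dump-errors", "Pretty print error and warning objects.") do |d|
    options[:dump] = d
  end

  parser.on("-s", "--schema-file FILENAME", "Schema file") do |s|
    options[:schema_file] = s
  end

  # parse works on a copy of argv and returns the non-option arguments.
  rest = parser.parse(argv)
  [options, rest]
end

options, rest = parse_cli(["--schema=schema.json", "myfile.csv"])
puts options.inspect
puts rest.inspect
```

Working on a copy of the argument list, rather than calling `parse!` on `ARGV` as the script does, keeps the sketch easy to exercise from a test.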
@@ -1,6 +1,6 @@
1
1
  Feature: Parse CSV
2
-
3
- Scenario: Sucessfully parse a valid CSV
2
+
3
+ Scenario: Successfully parse a valid CSV
4
4
  Given I have a CSV with the following content:
5
5
  """
6
6
  "Foo","Bar","Baz"
@@ -10,12 +10,12 @@ Feature: Parse CSV
10
10
  And it is stored at the url "http://example.com/example1.csv"
11
11
  When I ask if the CSV is valid
12
12
  Then I should get the value of true
13
-
13
+
14
14
  Scenario: Successfully parse a CSV with newlines in quoted fields
15
15
  Given I have a CSV with the following content:
16
16
  """
17
17
  "a","b","c"
18
- "d","e","this is
18
+ "d","e","this is
19
19
  valid"
20
20
  "a","b","c"
21
21
  """
@@ -27,14 +27,14 @@ valid"
27
27
  Given I have a CSV with the following content:
28
28
  """
29
29
  "a","b","c"
30
- "d","this is
31
- valid","as is this
30
+ "d","this is
31
+ valid","as is this
32
32
  too"
33
33
  """
34
34
  And it is stored at the url "http://example.com/example1.csv"
35
35
  When I ask if the CSV is valid
36
36
  Then I should get the value of true
37
-
37
+
38
38
  Scenario: Successfully report an invalid CSV
39
39
  Given I have a CSV with the following content:
40
40
  """
@@ -43,7 +43,7 @@ too"
43
43
  And it is stored at the url "http://example.com/example1.csv"
44
44
  When I ask if the CSV is valid
45
45
  Then I should get the value of false
46
-
46
+
47
47
  Scenario: Successfully report a CSV with incorrect quoting
48
48
  Given I have a CSV with the following content:
49
49
  """
@@ -51,8 +51,8 @@ too"
51
51
  """
52
52
  And it is stored at the url "http://example.com/example1.csv"
53
53
  When I ask if the CSV is valid
54
- Then I should get the value of false
55
-
54
+ Then I should get the value of false
55
+
56
56
  Scenario: Successfully report a CSV with incorrect whitespace
57
57
  Given I have a CSV with the following content:
58
58
  """
@@ -60,8 +60,8 @@ too"
60
60
  """
61
61
  And it is stored at the url "http://example.com/example1.csv"
62
62
  When I ask if the CSV is valid
63
- Then I should get the value of false
64
-
63
+ Then I should get the value of false
64
+
65
65
  Scenario: Successfully report a CSV with ragged rows
66
66
  Given I have a CSV with the following content:
67
67
  """
@@ -4,10 +4,10 @@ end
4
4
 
5
5
  Then(/^the "(.*?)" should be "(.*?)"$/) do |type, encoding|
6
6
  validator = Csvlint::Validator.new( @url, default_csv_options )
7
- validator.send(type.to_sym).should == encoding
7
+ expect( validator.send(type.to_sym) ).to eq( encoding )
8
8
  end
9
9
 
10
10
  Then(/^the metadata content type should be "(.*?)"$/) do |content_type|
11
11
  validator = Csvlint::Validator.new( @url, default_csv_options )
12
- validator.headers['content-type'].should == content_type
12
+ expect( validator.headers['content-type'] ).to eq( content_type )
13
13
  end
@@ -1,5 +1,5 @@
1
1
  Given(/^I have a CSV with the following content:$/) do |string|
2
- @csv = string
2
+ @csv = string.to_s
3
3
  end
4
4
 
5
5
  Given(/^it is stored at the url "(.*?)"$/) do |url|
@@ -17,7 +17,7 @@ end
17
17
 
18
18
  When(/^I ask if the CSV is valid$/) do
19
19
  @csv_options ||= default_csv_options
20
- @validator = Csvlint::Validator.new( @url, @csv_options )
20
+ @validator = Csvlint::Validator.new( @url, @csv_options )
21
21
  @valid = @validator.valid?
22
22
  end
23
23
 
@@ -1,36 +1,36 @@
1
1
  When(/^I ask if there are errors$/) do
2
2
  @csv_options ||= default_csv_options
3
-
3
+
4
4
  if @schema_json
5
5
  @schema = Csvlint::Schema.from_json_table( @schema_url || "http://example.org ", JSON.parse(@schema_json) )
6
6
  end
7
-
8
- @validator = Csvlint::Validator.new( @url, @csv_options, @schema )
7
+
8
+ @validator = Csvlint::Validator.new( @url, @csv_options, @schema )
9
9
  @errors = @validator.errors
10
10
  end
11
11
 
12
- Then(/^there should be (\d+) error$/) do |count|
13
- @errors.count.should == count.to_i
12
+ Then(/^there should be (\d+) error$/) do |count|
13
+ expect( @errors.count ).to eq( count.to_i )
14
14
  end
15
15
 
16
16
  Then(/^that error should have the type "(.*?)"$/) do |type|
17
- @errors.first.type.should == type.to_sym
17
+ expect( @errors.first.type ).to eq( type.to_sym )
18
18
  end
19
19
 
20
20
  Then(/^that error should have the row "(.*?)"$/) do |row|
21
- @errors.first.row.should == row.to_i
21
+ expect( @errors.first.row ).to eq( row.to_i )
22
22
  end
23
23
 
24
24
  Then(/^that error should have the column "(.*?)"$/) do |column|
25
- @errors.first.column.should == column.to_i
25
+ expect( @errors.first.column ).to eq( column.to_i )
26
26
  end
27
27
 
28
28
  Then(/^that error should have the content "(.*)"$/) do |content|
29
- @errors.first.content.chomp.should == content.chomp
29
+ expect( @errors.first.content.chomp ).to eq( content.chomp )
30
30
  end
31
31
 
32
32
  Then(/^that error should have no content$/) do
33
- @errors.first.content.should == nil
33
+ expect( @errors.first.content ).to eq( nil )
34
34
  end
35
35
 
36
36
  Given(/^I have a CSV that doesn't exist$/) do
@@ -40,4 +40,4 @@ end
40
40
 
41
41
  Then(/^there should be no "(.*?)" errors$/) do |type|
42
42
  @errors.each do |error| error.type.should_not == type.to_sym end
43
- end
43
+ end
@@ -10,9 +10,9 @@ Given(/^I ask if there are info messages$/) do
10
10
  end
11
11
 
12
12
  Then(/^there should be (\d+) info messages?$/) do |num|
13
- @info_messages.count.should == num.to_i
13
+ expect( @info_messages.count ).to eq( num.to_i )
14
14
  end
15
15
 
16
16
  Then(/^one of the messages should have the type "(.*?)"$/) do |msg_type|
17
- @info_messages.find{|x| x.type == msg_type.to_sym}.should be_present
17
+ expect( @info_messages.find{|x| x.type == msg_type.to_sym} ).to be_present
18
18
  end
@@ -21,12 +21,12 @@ When(/^I ask if there are warnings$/) do
21
21
  @schema = Csvlint::Schema.from_json_table( @schema_url || "http://example.org ", JSON.parse(@schema_json) )
22
22
  end
23
23
 
24
- @validator = Csvlint::Validator.new( @url, @csv_options, @schema )
24
+ @validator = Csvlint::Validator.new( @url, @csv_options, @schema )
25
25
  @warnings = @validator.warnings
26
26
  end
27
27
 
28
28
  Then(/^there should be (\d+) warnings$/) do |count|
29
- @warnings.count.should == count.to_i
29
+ expect( @warnings.count ).to eq( count.to_i )
30
30
  end
31
31
 
32
32
  Given(/^the content type is set to "(.*?)"$/) do |type|
@@ -34,13 +34,13 @@ Given(/^the content type is set to "(.*?)"$/) do |type|
34
34
  end
35
35
 
36
36
  Then(/^that warning should have the row "(.*?)"$/) do |row|
37
- @warnings.first.row.should == row.to_i
37
+ expect( @warnings.first.row ).to eq( row.to_i )
38
38
  end
39
39
 
40
40
  Then(/^that warning should have the column "(.*?)"$/) do |column|
41
- @warnings.first.column.should == column.to_i
41
+ expect( @warnings.first.column ).to eq( column.to_i )
42
42
  end
43
43
 
44
44
  Then(/^that warning should have the type "(.*?)"$/) do |type|
45
- @warnings.first.type.should == type.to_sym
46
- end
45
+ expect( @warnings.first.type ).to eq( type.to_sym )
46
+ end
@@ -1,17 +1,12 @@
1
+ require 'coveralls'
2
+ Coveralls.wear_merged!('test_frameworks')
3
+
1
4
  $:.unshift File.join( File.dirname(__FILE__), "..", "..", "lib")
2
5
 
3
- require 'simplecov'
4
- require 'simplecov-rcov'
5
6
  require 'rspec/expectations'
6
7
  require 'csvlint'
7
- require 'coveralls'
8
8
  require 'pry'
9
9
 
10
- Coveralls.wear_merged!
11
-
12
- SimpleCov.formatter = SimpleCov::Formatter::RcovFormatter
13
- SimpleCov.start
14
-
15
10
  require 'spork'
16
11
 
17
12
  Spork.each_run do
@@ -148,4 +148,12 @@ Feature: Get validation errors
148
148
  And it is stored at the url "http://example.com/example1.csv"
149
149
  And I ask if there are errors
150
150
  Then there should be 1 error
151
- And that error should have the type "line_breaks"
151
+ And that error should have the type "line_breaks"
152
+
153
+
154
+ Scenario: inconsistent line endings with unquoted fields in file cause an error
155
+ Given I have a CSV file called "inconsistent-line-endings-unquoted.csv"
156
+ And it is stored at the url "http://example.com/example1.csv"
157
+ And I ask if there are errors
158
+ Then there should be 1 error
159
+ And that error should have the type "line_breaks"
data/lib/csvlint.rb CHANGED
@@ -1,9 +1,14 @@
1
1
  require 'csv'
2
+ require 'date'
2
3
  require 'open-uri'
3
- require 'mime/types'
4
+ require 'set'
4
5
  require 'tempfile'
5
6
 
6
- require 'csvlint/types'
7
+ require 'active_support/core_ext/date/conversions'
8
+ require 'active_support/core_ext/time/conversions'
9
+ require 'mime/types'
10
+ require 'open_uri_redirections'
11
+
7
12
  require 'csvlint/error_message'
8
13
  require 'csvlint/error_collector'
9
14
  require 'csvlint/validate'
data/lib/csvlint/field.rb CHANGED
@@ -2,7 +2,6 @@ module Csvlint
2
2
 
3
3
  class Field
4
4
  include Csvlint::ErrorCollector
5
- include Csvlint::Types
6
5
 
7
6
  attr_reader :name, :constraints, :title, :description
8
7
 
@@ -98,5 +97,73 @@ module Csvlint
98
97
  end
99
98
  return parsed
100
99
  end
100
+
101
+ TYPE_VALIDATIONS = {
102
+ 'http://www.w3.org/2001/XMLSchema#string' => lambda { |value, constraints| value },
103
+ 'http://www.w3.org/2001/XMLSchema#int' => lambda { |value, constraints| Integer value },
104
+ 'http://www.w3.org/2001/XMLSchema#integer' => lambda { |value, constraints| Integer value },
105
+ 'http://www.w3.org/2001/XMLSchema#float' => lambda { |value, constraints| Float value },
106
+ 'http://www.w3.org/2001/XMLSchema#double' => lambda { |value, constraints| Float value },
107
+ 'http://www.w3.org/2001/XMLSchema#anyURI' => lambda do |value, constraints|
108
+ u = URI.parse value
109
+ raise ArgumentError unless u.kind_of?(URI::HTTP) || u.kind_of?(URI::HTTPS)
110
+ u
111
+ end,
112
+ 'http://www.w3.org/2001/XMLSchema#boolean' => lambda do |value, constraints|
113
+ return true if ['true', '1'].include? value
114
+ return false if ['false', '0'].include? value
115
+ raise ArgumentError
116
+ end,
117
+ 'http://www.w3.org/2001/XMLSchema#nonPositiveInteger' => lambda do |value, constraints|
118
+ i = Integer value
119
+ raise ArgumentError unless i <= 0
120
+ i
121
+ end,
122
+ 'http://www.w3.org/2001/XMLSchema#negativeInteger' => lambda do |value, constraints|
123
+ i = Integer value
124
+ raise ArgumentError unless i < 0
125
+ i
126
+ end,
127
+ 'http://www.w3.org/2001/XMLSchema#nonNegativeInteger' => lambda do |value, constraints|
128
+ i = Integer value
129
+ raise ArgumentError unless i >= 0
130
+ i
131
+ end,
132
+ 'http://www.w3.org/2001/XMLSchema#positiveInteger' => lambda do |value, constraints|
133
+ i = Integer value
134
+ raise ArgumentError unless i > 0
135
+ i
136
+ end,
137
+ 'http://www.w3.org/2001/XMLSchema#dateTime' => lambda do |value, constraints|
138
+ date_pattern = constraints["datePattern"] || "%Y-%m-%dT%H:%M:%SZ"
139
+ d = DateTime.strptime(value, date_pattern)
140
+ raise ArgumentError unless d.strftime(date_pattern) == value
141
+ d
142
+ end,
143
+ 'http://www.w3.org/2001/XMLSchema#date' => lambda do |value, constraints|
144
+ date_pattern = constraints["datePattern"] || "%Y-%m-%d"
145
+ d = Date.strptime(value, date_pattern)
146
+ raise ArgumentError unless d.strftime(date_pattern) == value
147
+ d
148
+ end,
149
+ 'http://www.w3.org/2001/XMLSchema#time' => lambda do |value, constraints|
150
+ date_pattern = constraints["datePattern"] || "%H:%M:%S"
151
+ d = DateTime.strptime(value, date_pattern)
152
+ raise ArgumentError unless d.strftime(date_pattern) == value
153
+ d
154
+ end,
155
+ 'http://www.w3.org/2001/XMLSchema#gYear' => lambda do |value, constraints|
156
+ date_pattern = constraints["datePattern"] || "%Y"
157
+ d = Date.strptime(value, date_pattern)
158
+ raise ArgumentError unless d.strftime(date_pattern) == value
159
+ d
160
+ end,
161
+ 'http://www.w3.org/2001/XMLSchema#gYearMonth' => lambda do |value, constraints|
162
+ date_pattern = constraints["datePattern"] || "%Y-%m"
163
+ d = Date.strptime(value, date_pattern)
164
+ raise ArgumentError unless d.strftime(date_pattern) == value
165
+ d
166
+ end,
167
+ }
101
168
  end
102
169
  end
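Reviewer note on the `TYPE_VALIDATIONS` table added above: each entry is a lambda that takes the value and the column constraints and raises `ArgumentError` when the value does not fit the type. A minimal sketch of how such a table could be consulted follows. The `CHECKS` constant and the `type_valid?` helper are hypothetical names for illustration, and treating an unknown type as valid is an assumption rather than something this diff shows.

```ruby
require 'date'

XSD = 'http://www.w3.org/2001/XMLSchema#'

# Cut-down, hypothetical version of the table: two entries only.
CHECKS = {
  "#{XSD}integer" => lambda { |value, constraints| Integer(value) },
  "#{XSD}date" => lambda { |value, constraints|
    pattern = constraints["datePattern"] || "%Y-%m-%d"
    d = Date.strptime(value, pattern)
    # Round trip: strptime accepts some loosely formatted input, so
    # re-format the parsed date and require an exact match.
    raise ArgumentError unless d.strftime(pattern) == value
    d
  }
}

# Hypothetical helper: true when the value passes the check for its type.
def type_valid?(type, value, constraints = {})
  check = CHECKS[type]
  return true unless check # assumption: unknown types are not checked
  check.call(value, constraints)
  true
rescue ArgumentError
  false
end

puts type_valid?("#{XSD}integer", "42")
puts type_valid?("#{XSD}date", "2015-07-13")
puts type_valid?("#{XSD}date", "13/07/2015", "datePattern" => "%d/%m/%Y")
```

The round trip through `strftime` is what makes a custom `datePattern` strict: a value only passes if formatting the parsed date reproduces it exactly.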
@@ -1,5 +1,3 @@
1
- require "set"
2
-
3
1
  module Csvlint
4
2
 
5
3
  class Schema
@@ -1,11 +1,8 @@
1
- require "open_uri_redirections"
2
-
3
1
  module Csvlint
4
2
 
5
3
  class Validator
6
4
 
7
5
  include Csvlint::ErrorCollector
8
- include Csvlint::Types
9
6
 
10
7
  attr_reader :encoding, :content_type, :extension, :headers, :line_breaks, :dialect, :csv_header, :schema, :data
11
8
 
@@ -13,9 +10,10 @@ module Csvlint
13
10
  "Missing or stray quote" => :stray_quote,
14
11
  "Illegal quoting" => :whitespace,
15
12
  "Unclosed quoted field" => :unclosed_quote,
13
+ "Unquoted fields do not allow \\r or \\n" => :line_breaks,
16
14
  }
17
15
 
18
- def initialize(source, dialect = nil, schema = nil)
16
+ def initialize(source, dialect = nil, schema = nil, options = {})
19
17
  @source = source
20
18
  @formats = []
21
19
  @schema = schema
@@ -31,7 +29,7 @@ module Csvlint
31
29
  }.merge(dialect || {})
32
30
 
33
31
  @csv_header = @dialect["header"]
34
-
32
+ @limit_lines = options[:limit_lines]
35
33
  @csv_options = dialect_to_csv_options(@dialect)
36
34
  @extension = parse_extension(source)
37
35
  reset
@@ -111,19 +109,22 @@ module Csvlint
111
109
  end
112
110
  row = nil
113
111
  loop do
114
- current_line = current_line + 1
112
+ current_line += 1
113
+ if @limit_lines && current_line > @limit_lines
114
+ break
115
+ end
115
116
  begin
116
117
  wrapper.reset_line
117
118
  row = csv.shift
118
119
  @data << row
119
120
  if row
120
121
  if current_line == 1 && header?
121
- row = row.reject {|r| r.blank? }
122
+ row = row.reject{|col| col.nil? || col.empty?}
122
123
  validate_header(row)
123
124
  @col_counts << row.size
124
125
  else
125
- build_formats(row, current_line)
126
- @col_counts << row.reject {|r| r.blank? }.size
126
+ build_formats(row)
127
+ @col_counts << row.reject{|col| col.nil? || col.empty?}.size
127
128
  @expected_columns = row.size unless @expected_columns != 0
128
129
 
129
130
  build_errors(:blank_rows, :structure, current_line, nil, wrapper.line) if row.reject{ |c| c.nil? || c.empty? }.size == 0
@@ -150,7 +151,7 @@ module Csvlint
150
151
  end
151
152
  end
152
153
  rescue ArgumentError => ae
153
- build_errors(:invalid_encoding, :structure, current_line, wrapper.line) unless reported_invalid_encoding
154
+ build_errors(:invalid_encoding, :structure, current_line, nil, wrapper.line) unless reported_invalid_encoding
154
155
  reported_invalid_encoding = true
155
156
  end
156
157
  end
@@ -178,7 +179,7 @@ module Csvlint
178
179
  end
179
180
 
180
181
  def fetch_error(error)
181
- e = error.message.match(/^([a-z ]+) (i|o)n line ([0-9]+)\.?$/i)
182
+ e = error.message.match(/^(.+?)(?: [io]n)? \(?line \d+\)?\.?$/i)
182
183
  message = e[1] rescue nil
183
184
  ERROR_MATCHERS.fetch(message, :unknown_error)
184
185
  end
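Reviewer note on the `fetch_error` change above: the new pattern strips an optional " in"/" on" and a line suffix that may or may not be parenthesised, so the remaining text can be looked up in `ERROR_MATCHERS`. The sketch below exercises the same regular expression in isolation. The sample messages are assumed shapes of `CSV::MalformedCSVError` messages, whose exact wording varies between Ruby versions, and `classify` is a hypothetical stand-in for `fetch_error`.

```ruby
# Same keys as the table in the diff; the last one contains a literal
# backslash followed by "r" / "n", not control characters.
ERROR_MATCHERS = {
  "Missing or stray quote" => :stray_quote,
  "Illegal quoting" => :whitespace,
  "Unclosed quoted field" => :unclosed_quote,
  "Unquoted fields do not allow \\r or \\n" => :line_breaks
}

# Pattern copied from the new fetch_error.
LINE_SUFFIX = /^(.+?)(?: [io]n)? \(?line \d+\)?\.?$/i

# Hypothetical stand-in for fetch_error that takes the message text directly.
def classify(message)
  match = message.match(LINE_SUFFIX)
  ERROR_MATCHERS.fetch(match && match[1], :unknown_error)
end

puts classify("Unclosed quoted field on line 3.")
puts classify("Illegal quoting in line 2.")
puts classify("Unquoted fields do not allow \\r or \\n (line 2).")
puts classify("Something unexpected")
```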
@@ -195,40 +196,52 @@ module Csvlint
195
196
  }
196
197
  end
197
198
 
198
- def build_formats(row, line)
199
+ def build_formats(row)
199
200
  row.each_with_index do |col, i|
200
- next if col.blank?
201
- @formats[i] ||= []
202
-
203
- SIMPLE_FORMATS.each do |type, lambda|
204
- begin
205
- if lambda.call(col)
206
- @format = type
207
- end
208
- rescue ArgumentError, URI::InvalidURIError
209
- end
201
+ next if col.nil? || col.empty?
202
+ @formats[i] ||= Hash.new(0)
203
+
204
+ format = if col.strip[FORMATS[:numeric]]
205
+ :numeric
206
+ elsif uri?(col)
207
+ :uri
208
+ elsif col[FORMATS[:date_db]] && date_format?(Date, col, '%Y-%m-%d')
209
+ :date_db
210
+ elsif col[FORMATS[:date_short]] && date_format?(Date, col, '%e %b')
211
+ :date_short
212
+ elsif col[FORMATS[:date_rfc822]] && date_format?(Date, col, '%e %b %Y')
213
+ :date_rfc822
214
+ elsif col[FORMATS[:date_long]] && date_format?(Date, col, '%B %e, %Y')
215
+ :date_long
216
+ elsif col[FORMATS[:dateTime_time]] && date_format?(Time, col, '%H:%M')
217
+ :dateTime_time
218
+ elsif col[FORMATS[:dateTime_hms]] && date_format?(Time, col, '%H:%M:%S')
219
+ :dateTime_hms
220
+ elsif col[FORMATS[:dateTime_db]] && date_format?(Time, col, '%Y-%m-%d %H:%M:%S')
221
+ :dateTime_db
222
+ elsif col[FORMATS[:dateTime_iso8601]] && date_format?(Time, col, '%Y-%m-%dT%H:%M:%SZ')
223
+ :dateTime_iso8601
224
+ elsif col[FORMATS[:dateTime_short]] && date_format?(Time, col, '%d %b %H:%M')
225
+ :dateTime_short
226
+ elsif col[FORMATS[:dateTime_long]] && date_format?(Time, col, '%B %d, %Y %H:%M')
227
+ :dateTime_long
228
+ else
229
+ :string
210
230
  end
211
231
 
212
- @formats[i] << @format
232
+ @formats[i][format] += 1
213
233
  end
214
234
  end
215
235
 
216
236
  def check_consistency
217
- percentages = []
218
-
219
- SIMPLE_FORMATS.keys.each do |type|
220
- @formats.each_with_index do |format,i|
221
- percentages[i] ||= {}
222
- unless format.nil?
223
- percentages[i][type] = format.count(type) / format.size.to_f
237
+ @formats.each_with_index do |format,i|
238
+ if format
239
+ total = format.values.reduce(:+).to_f
240
+ if format.none?{|_,count| count / total >= 0.9}
241
+ build_warnings(:inconsistent_values, :schema, nil, i + 1)
224
242
  end
225
243
  end
226
244
  end
227
-
228
- percentages.each_with_index do |col, i|
229
- next if col.values.blank?
230
- build_warnings(:inconsistent_values, :schema, nil, i+1) if col.values.max < 0.9
231
- end
232
245
  end
233
246
 
234
247
  private
@@ -248,6 +261,36 @@ module Csvlint
248
261
  File.extname(parsed.path)
249
262
  end
250
263
  end
251
-
264
+
265
+ def uri?(value)
266
+ if value.strip[FORMATS[:uri]]
267
+ uri = URI.parse(value)
268
+ uri.kind_of?(URI::HTTP) || uri.kind_of?(URI::HTTPS)
269
+ end
270
+ rescue URI::InvalidURIError
271
+ false
272
+ end
273
+
274
+ def date_format?(klass, value, format)
275
+ klass.strptime(value, format).strftime(format) == value
276
+ rescue ArgumentError # invalid date
277
+ false
278
+ end
279
+
280
+ FORMATS = {
281
+ :string => nil,
282
+ :numeric => /\A[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?\z/,
283
+ :uri => /\Ahttps?:/,
284
+ :date_db => /\A\d{4,}-\d\d-\d\d\z/, # "12345-01-01"
285
+ :date_long => /\A(?:#{Date::MONTHNAMES.join('|')}) [ \d]\d, \d{4,}\z/, # "January 1, 12345"
286
+ :date_rfc822 => /\A[ \d]\d (?:#{Date::ABBR_MONTHNAMES.join('|')}) \d{4,}\z/, # " 1 Jan 12345"
287
+ :date_short => /\A[ \d]\d (?:#{Date::ABBR_MONTHNAMES.join('|')})\z/, # "1 Jan"
288
+ :dateTime_db => /\A\d{4,}-\d\d-\d\d \d\d:\d\d:\d\d\z/, # "12345-01-01 00:00:00"
289
+ :dateTime_hms => /\A\d\d:\d\d:\d\d\z/, # "00:00:00"
290
+ :dateTime_iso8601 => /\A\d{4,}-\d\d-\d\dT\d\d:\d\d:\d\dZ\z/, # "12345-01-01T00:00:00Z"
291
+ :dateTime_long => /\A(?:#{Date::MONTHNAMES.join('|')}) \d\d, \d{4,} \d\d:\d\d\z/, # "January 01, 12345 00:00"
292
+ :dateTime_short => /\A\d\d (?:#{Date::ABBR_MONTHNAMES.join('|')}) \d\d:\d\d\z/, # "01 Jan 00:00"
293
+ :dateTime_time => /\A\d\d:\d\d\z/, # "00:00"
294
+ }.freeze
252
295
  end
253
- end
296
+ end
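Reviewer note on the reworked `build_formats` / `check_consistency` pair above: each column now keeps a `Hash.new(0)` tally of detected formats, and a column is flagged when no single format accounts for at least 90% of its non-blank values. A minimal sketch of that threshold check follows. `inconsistent_columns` is a hypothetical helper that returns column numbers instead of calling `build_warnings`, and the input mirrors the hashes used in the updated spec.

```ruby
# Hypothetical helper: returns the 1-based indexes of columns in which no
# single format reaches the threshold share of the tallied values.
def inconsistent_columns(formats, threshold = 0.9)
  flagged = []
  formats.each_with_index do |counts, i|
    next unless counts # columns that never held a value have no tally
    total = counts.values.reduce(:+).to_f
    flagged << i + 1 if counts.none? { |_, count| count / total >= threshold }
  end
  flagged
end

# Tallies are built the way the diff does it: a Hash defaulting to 0.
column = Hash.new(0)
[:string, :numeric, :string].each { |format| column[format] += 1 }

formats = [{ :string => 3 }, column, { :numeric => 3 }]
p inconsistent_columns(formats) # => [2]
```

Counting per format avoids storing one entry per cell, which is the main practical difference from the array-based version this diff removes.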
@@ -1,3 +1,3 @@
1
1
  module Csvlint
2
- VERSION = "0.1.0"
2
+ VERSION = "0.1.1"
3
3
  end
data/spec/spec_helper.rb CHANGED
@@ -1,11 +1,9 @@
1
- require 'simplecov'
2
- require 'simplecov-rcov'
1
+ require 'coveralls'
2
+ Coveralls.wear_merged!('test_frameworks')
3
+
3
4
  require 'csvlint'
4
5
  require 'pry'
5
6
  require 'webmock/rspec'
6
- require 'coveralls'
7
-
8
- Coveralls.wear_merged!
9
7
 
10
8
  RSpec.configure do |config|
11
9
  config.treat_symbols_as_metadata_keys_with_true_values = true
@@ -159,21 +159,21 @@ describe Csvlint::Validator do
159
159
  context "build_formats" do
160
160
 
161
161
  {
162
- "string" => "foo",
163
- "numeric" => "1",
164
- "uri" => "http://www.example.com",
165
- "dateTime_iso8601" => "2013-01-01T13:00:00Z",
166
- "date_db" => "2013-01-01",
167
- "dateTime_hms" => "13:00:00"
162
+ :string => "foo",
163
+ :numeric => "1",
164
+ :uri => "http://www.example.com",
165
+ :dateTime_iso8601 => "2013-01-01T13:00:00Z",
166
+ :date_db => "2013-01-01",
167
+ :dateTime_hms => "13:00:00"
168
168
  }.each do |type, content|
169
169
  it "should return the format of #{type} correctly" do
170
170
  row = [content]
171
171
 
172
172
  validator = Csvlint::Validator.new("http://example.com/example.csv")
173
- validator.build_formats(row, 1)
173
+ validator.build_formats(row)
174
174
  formats = validator.instance_variable_get("@formats")
175
175
 
176
- formats[0].first.should == type
176
+ formats[0].keys.first.should == type
177
177
  end
178
178
  end
179
179
 
@@ -181,18 +181,18 @@ describe Csvlint::Validator do
181
181
  row = ["12", "3.1476"]
182
182
 
183
183
  validator = Csvlint::Validator.new("http://example.com/example.csv")
184
- validator.build_formats(row, 1)
184
+ validator.build_formats(row)
185
185
  formats = validator.instance_variable_get("@formats")
186
186
 
187
- formats[0].first.should == "numeric"
188
- formats[1].first.should == "numeric"
187
+ formats[0].keys.first.should == :numeric
188
+ formats[1].keys.first.should == :numeric
189
189
  end
190
190
 
191
191
  it "should ignore blank arrays" do
192
192
  row = []
193
193
 
194
194
  validator = Csvlint::Validator.new("http://example.com/example.csv")
195
- validator.build_formats(row, 1)
195
+ validator.build_formats(row)
196
196
  formats = validator.instance_variable_get("@formats")
197
197
  formats.should == []
198
198
  end
@@ -207,16 +207,12 @@ describe Csvlint::Validator do
207
207
  validator = Csvlint::Validator.new("http://example.com/example.csv")
208
208
 
209
209
  rows.each_with_index do |row, i|
210
- validator.build_formats(row, i)
210
+ validator.build_formats(row)
211
211
  end
212
212
 
213
213
  formats = validator.instance_variable_get("@formats")
214
214
 
215
- formats.should == [
216
- ["string",
217
- "string",
218
- "string"]
219
- ]
215
+ formats.should == [{:string => 3}]
220
216
  end
221
217
 
222
218
  it "should return formats correctly if a row is blank" do
@@ -228,15 +224,15 @@ describe Csvlint::Validator do
228
224
  validator = Csvlint::Validator.new("http://example.com/example.csv")
229
225
 
230
226
  rows.each_with_index do |row, i|
231
- validator.build_formats(row, i)
227
+ validator.build_formats(row)
232
228
  end
233
229
 
234
230
  formats = validator.instance_variable_get("@formats")
235
231
 
236
232
  formats.should == [
237
- ["string"],
238
- ["numeric"],
239
- ["string"]
233
+ {:string => 1},
234
+ {:numeric => 1},
235
+ {:string => 1},
240
236
  ]
241
237
  end
242
238
 
@@ -246,9 +242,9 @@ describe Csvlint::Validator do
246
242
 
247
243
  it "should return a warning if columns have inconsistent values" do
248
244
  formats = [
249
- ["string", "string", "string"],
250
- ["string", "numeric", "string"],
251
- ["numeric", "numeric", "numeric"],
245
+ {:string => 3},
246
+ {:string => 2, :numeric => 1},
247
+ {:numeric => 3},
252
248
  ]
253
249
 
254
250
  validator = Csvlint::Validator.new("http://example.com/example.csv")
@@ -290,6 +286,17 @@ describe Csvlint::Validator do
290
286
  expect( data[2] ).to eql ['3','2','1']
291
287
  end
292
288
 
289
+ it "should limit number of lines read" do
290
+ stub_request(:get, "http://example.com/example.csv").to_return(:status => 200,
291
+ :headers=>{"Content-Type" => "text/csv; header=present"},
292
+ :body => File.read(File.join(File.dirname(__FILE__),'..','features','fixtures','valid.csv')))
293
+ validator = Csvlint::Validator.new("http://example.com/example.csv", nil, nil, limit_lines: 2)
294
+ expect( validator.valid? ).to eql(true)
295
+ data = validator.data
296
+ expect( data.count ).to eql 2
297
+ expect( data[0] ).to eql ['Foo','Bar','Baz']
298
+ end
299
+
293
300
  it "should follow redirects to SSL" do
294
301
  stub_request(:get, "http://example.com/redirect").to_return(:status => 301, :headers=>{"Location" => "https://example.com/example.csv"})
295
302
  stub_request(:get, "https://example.com/example.csv").to_return(:status => 200,
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: csvlint
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.1.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - pezholio
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2014-11-27 00:00:00.000000000 Z
11
+ date: 2015-07-13 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: mime-types
@@ -259,6 +259,7 @@ extra_rdoc_files: []
  files:
  - .coveralls.yml
  - .gitignore
+ - .ruby-version
  - .travis.yml
  - Gemfile
  - LICENSE.md
@@ -271,6 +272,7 @@ files:
  - features/csv_options.feature
  - features/fixtures/cr-line-endings.csv
  - features/fixtures/crlf-line-endings.csv
+ - features/fixtures/inconsistent-line-endings-unquoted.csv
  - features/fixtures/inconsistent-line-endings.csv
  - features/fixtures/invalid-byte-sequence.csv
  - features/fixtures/lf-line-endings.csv
@@ -300,7 +302,6 @@ files:
  - lib/csvlint/error_message.rb
  - lib/csvlint/field.rb
  - lib/csvlint/schema.rb
- - lib/csvlint/types.rb
  - lib/csvlint/validate.rb
  - lib/csvlint/version.rb
  - lib/csvlint/wrapped_io.rb
@@ -328,7 +329,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
        version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.4.2
+ rubygems_version: 2.4.5
  signing_key:
  specification_version: 4
  summary: CSV Validator
@@ -337,6 +338,7 @@ test_files:
  - features/csv_options.feature
  - features/fixtures/cr-line-endings.csv
  - features/fixtures/crlf-line-endings.csv
+ - features/fixtures/inconsistent-line-endings-unquoted.csv
  - features/fixtures/inconsistent-line-endings.csv
  - features/fixtures/invalid-byte-sequence.csv
  - features/fixtures/lf-line-endings.csv
data/lib/csvlint/types.rb DELETED
@@ -1,137 +0,0 @@
- require 'set'
- require 'date'
- require 'active_support/core_ext/date/conversions'
- require 'active_support/core_ext/time/conversions'
-
- module Csvlint
-   module Types
-     SIMPLE_FORMATS = {
-       'string' => lambda { |value| true },
-       'numeric' => lambda { |value| value.strip[/\A[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?\z/] },
-       'uri' => lambda do |value|
-         if value.strip[/\Ahttps?:/]
-           u = URI.parse(value)
-           u.kind_of?(URI::HTTP) || u.kind_of?(URI::HTTPS)
-         end
-       end
-     }
-
-     def self.date_format(klass, value, format, pattern)
-       if value[pattern]
-         klass.strptime(value, format).strftime(format) == value
-       end
-     end
-
-     def self.included(base)
-       [
-         [ :db, "%Y-%m-%d",
-           /\A\d{4,}-\d\d-\d\d\z/],
-         [ :number, "%Y%m%d",
-           /\A\d{8}\z/],
-         [ :short, "%e %b",
-           /\A[ \d]\d (?:#{Date::ABBR_MONTHNAMES.join('|')})\z/],
-         [ :rfc822, "%e %b %Y",
-           /\A[ \d]\d (?:#{Date::ABBR_MONTHNAMES.join('|')}) \d{4,}\z/],
-         [ :long, "%B %e, %Y",
-           /\A(?:#{Date::MONTHNAMES.join('|')}) [ \d]\d, \d{4,}\z/],
-       ].each do |type,format,pattern|
-         SIMPLE_FORMATS["date_#{type}"] = lambda do |value|
-           date_format(Date, value, format, pattern)
-         end
-       end
-
-       # strptime doesn't support widths like %9N, unlike strftime.
-       # @see http://ruby-doc.org/stdlib-2.0/libdoc/date/rdoc/DateTime.html
-       [
-         [ :time, "%H:%M",
-           /\A\d\d:\d\d\z/],
-         [ :hms, "%H:%M:%S",
-           /\A\d\d:\d\d:\d\d\z/],
-         [ :db, "%Y-%m-%d %H:%M:%S",
-           /\A\d{4,}-\d\d-\d\d \d\d:\d\d:\d\d\z/],
-         [ :iso8601, "%Y-%m-%dT%H:%M:%SZ",
-           /\A\d{4,}-\d\d-\d\dT\d\d:\d\d:\d\dZ\z/],
-         [ :number, "%Y%m%d%H%M%S",
-           /\A\d{14}\z/],
-         [ :nsec, "%Y%m%d%H%M%S%N",
-           /\A\d{23}\z/],
-         [ :short, "%d %b %H:%M",
-           /\A\d\d (?:#{Date::ABBR_MONTHNAMES.join('|')}) \d\d:\d\d\z/],
-         [ :long, "%B %d, %Y %H:%M",
-           /\A(?:#{Date::MONTHNAMES.join('|')}) \d\d, \d{4,} \d\d:\d\d\z/],
-       ].each do |type,format,pattern|
-         SIMPLE_FORMATS["dateTime_#{type}"] = lambda do |value|
-           date_format(Time, value, format, pattern)
-         end
-       end
-     end
-
-     TYPE_VALIDATIONS = {
-       'http://www.w3.org/2001/XMLSchema#string' => lambda { |value, constraints| value },
-       'http://www.w3.org/2001/XMLSchema#int' => lambda { |value, constraints| Integer value },
-       'http://www.w3.org/2001/XMLSchema#integer' => lambda { |value, constraints| Integer value },
-       'http://www.w3.org/2001/XMLSchema#float' => lambda { |value, constraints| Float value },
-       'http://www.w3.org/2001/XMLSchema#double' => lambda { |value, constraints| Float value },
-       'http://www.w3.org/2001/XMLSchema#anyURI' => lambda do |value, constraints|
-         u = URI.parse value
-         raise ArgumentError unless u.kind_of?(URI::HTTP) || u.kind_of?(URI::HTTPS)
-         u
-       end,
-       'http://www.w3.org/2001/XMLSchema#boolean' => lambda do |value, constraints|
-         return true if ['true', '1'].include? value
-         return false if ['false', '0'].include? value
-         raise ArgumentError
-       end,
-       'http://www.w3.org/2001/XMLSchema#nonPositiveInteger' => lambda do |value, constraints|
-         i = Integer value
-         raise ArgumentError unless i <= 0
-         i
-       end,
-       'http://www.w3.org/2001/XMLSchema#negativeInteger' => lambda do |value, constraints|
-         i = Integer value
-         raise ArgumentError unless i < 0
-         i
-       end,
-       'http://www.w3.org/2001/XMLSchema#nonNegativeInteger' => lambda do |value, constraints|
-         i = Integer value
-         raise ArgumentError unless i >= 0
-         i
-       end,
-       'http://www.w3.org/2001/XMLSchema#positiveInteger' => lambda do |value, constraints|
-         i = Integer value
-         raise ArgumentError unless i > 0
-         i
-       end,
-       'http://www.w3.org/2001/XMLSchema#dateTime' => lambda do |value, constraints|
-         date_pattern = constraints["datePattern"] || "%Y-%m-%dT%H:%M:%SZ"
-         d = DateTime.strptime(value, date_pattern)
-         raise ArgumentError unless d.strftime(date_pattern) == value
-         d
-       end,
-       'http://www.w3.org/2001/XMLSchema#date' => lambda do |value, constraints|
-         date_pattern = constraints["datePattern"] || "%Y-%m-%d"
-         d = Date.strptime(value, date_pattern)
-         raise ArgumentError unless d.strftime(date_pattern) == value
-         d
-       end,
-       'http://www.w3.org/2001/XMLSchema#time' => lambda do |value, constraints|
-         date_pattern = constraints["datePattern"] || "%H:%M:%S"
-         d = DateTime.strptime(value, date_pattern)
-         raise ArgumentError unless d.strftime(date_pattern) == value
-         d
-       end,
-       'http://www.w3.org/2001/XMLSchema#gYear' => lambda do |value, constraints|
-         date_pattern = constraints["datePattern"] || "%Y"
-         d = Date.strptime(value, date_pattern)
-         raise ArgumentError unless d.strftime(date_pattern) == value
-         d
-       end,
-       'http://www.w3.org/2001/XMLSchema#gYearMonth' => lambda do |value, constraints|
-         date_pattern = constraints["datePattern"] || "%Y-%m"
-         d = Date.strptime(value, date_pattern)
-         raise ArgumentError unless d.strftime(date_pattern) == value
-         d
-       end,
-     }
-   end
- end
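
Note on the removed file: every date validator in `data/lib/csvlint/types.rb` uses the same round-trip check. It parses the value with `strptime`, re-serialises it with `strftime` using the same format, and accepts the value only if the result equals the input. The sketch below shows that pattern on its own, as an illustration only. The helper name `strict_date?` is hypothetical and not part of csvlint, and rescuing `ArgumentError` to return `false` is an assumption made here for convenience (the removed lambdas raise instead).

```ruby
require 'date'

# Hypothetical helper, not part of csvlint: true only when `value` parses
# with `format` AND formats back to exactly the same string.
def strict_date?(value, format)
  Date.strptime(value, format).strftime(format) == value
rescue ArgumentError
  # Unparseable or impossible dates (e.g. 30 February) are treated as invalid.
  false
end

puts strict_date?("2014-11-27", "%Y-%m-%d")
puts strict_date?("2014-1-5", "%Y-%m-%d")
puts strict_date?("2014-02-30", "%Y-%m-%d")
```

The round trip matters because `strptime` alone is lenient about details such as zero padding; comparing the re-formatted string with the input rejects values that parse but do not match the format exactly.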