csvlint 0.1.0 → 0.1.1

checksums.yaml CHANGED
@@ -1,15 +1,15 @@
1
1
  ---
2
2
  !binary "U0hBMQ==":
3
3
  metadata.gz: !binary |-
4
- NzY4MzE5MGI1OWI4YzkzY2U1MTZlMWJjNDM4YzM5ZjNiMWI5ZWIyMA==
4
+ MDgwMGE1ZmY5ZWE2MzM2ODhhMTFhZjM5NTkyZTNkZWExZjFkN2I0Mw==
5
5
  data.tar.gz: !binary |-
6
- OTM5MWJmYzNkYTBiMjM4ZGU3Yjc2NTY0Njk4OTEzMDJhNmI2Yjg5Mg==
6
+ OTRhZTljMTc5ZDlmZDVlMmQ2ZDU2MTFmYmRlNGUyMGIyMTI3NTBmOQ==
7
7
  SHA512:
8
8
  metadata.gz: !binary |-
9
- YzU4NzJmZTlkYWE1NTY0MGY4NzY0N2Q1NDYxMzI1N2YxYjZhZThmNDZhMDVj
10
- Mjc1YmMyZGZkODhlNDE4OWZiYWM5NmVjNTNiOGRmMjI0OGY4ZjJkY2I4OTNl
11
- ZmYwMGMwN2YwOWNlYmUzNjZjNWU3OTEzYTIwZDYzMGMzZDBiOGQ=
9
+ ZDE4YWVlNGYzY2E5OGJjM2ZkZjA0MzQwMWE1Yzg5YTI3YzM0MjNkMmFhNjFj
10
+ YmY3YzYzN2E5NDk1NDM4YmNmMGY1ZWRlNmExZWI0NmYzZmQzNTc2N2ZiOTMx
11
+ YzU2MjhiMmE4YTc2NTVmNzE4YWI4ZDZjOWM5MDlhNTRlY2RkMzE=
12
12
  data.tar.gz: !binary |-
13
- YTkwNWVlNjlmZGZkZDJjNmUyZGQxMzBkYzViOTc1NzlhNjYzYjU3ZDljMmRl
14
- YzExNDFkMDUxOThkMjJkNDRkZWYxZjNlMzdmYTRmNjI1MTliNGYzOWQxYzY3
15
- MGIxMjA1YzhkYjUwOWZkOGU2NjViYjNlNjBhMTY4NTY0N2E4MmI=
13
+ NDY1ODhkODdlZDJlYzMyMTQ1NzFlYzAyNTYyN2YzMGE1NTM2Yzg1NWRiYTQ5
14
+ NmQ1ODE5MzNmOTU4ZjhmNmZjNWQwNWU1N2E5OTdmOWMyZmZiYjE4ZmVhNmVh
15
+ YzdmN2VmNjY1NjFkMDdmYTc0YzZiNTk2YWQxMWZkMmRkNmVhODg=
data/.ruby-version ADDED
@@ -0,0 +1 @@
1
+ 2.1.4
data/.travis.yml CHANGED
@@ -1,5 +1,9 @@
1
1
  rvm:
2
2
  - 2.0.0
3
+ - 2.1.0
4
+ - 2.2.0
5
+ sudo: false
6
+ cache: bundler
3
7
  notifications:
4
8
  irc:
5
9
  channels:
data/README.md CHANGED
@@ -24,12 +24,35 @@ Or install it yourself as:
24
24
 
25
25
  ## Usage
26
26
 
27
+ You can either use this gem within your own Ruby code, or as a standalone command line application.
28
+
29
+ ## On the command line
30
+
31
+ After installing the gem, you can validate a CSV on the command line like so:
32
+
33
+ csvlint myfile.csv
34
+
35
+ You will then see the validation result, together with any warnings or errors e.g.
36
+
37
+ ```
38
+ myfile.csv is INVALID
39
+ 1. blank_rows. Row: 3
40
+ 1. title_row.
41
+ 2. inconsistent_values. Column: 14
42
+ ```
43
+
44
+ You can also optionally pass a schema file like so:
45
+
46
+ csvlint myfile.csv --schema=schema.json
47
+
48
+ ## In your own Ruby code
49
+
27
50
  Currently the gem supports retrieving a CSV accessible from a URL, File, or an IO-style object (e.g. StringIO)
28
51
 
29
52
  require 'csvlint'
30
53
 
31
54
  validator = Csvlint::Validator.new( "http://example.org/data.csv" )
32
- validator = Csvlint::Validator.new( File.new("/path/to/my/data.csv" )
55
+ validator = Csvlint::Validator.new( File.new("/path/to/my/data.csv" ))
33
56
  validator = Csvlint::Validator.new( StringIO.new( my_data_in_a_string ) )
34
57
 
35
58
  When validating from a URL the range of errors and warnings is wider as the library will also check HTTP headers for
@@ -61,13 +84,13 @@ best practices
61
84
  ## Controlling CSV Parsing
62
85
 
63
86
  The validator supports configuration of the [CSV Dialect](http://dataprotocols.org/csv-dialect/) used in a data file. This is specified by
64
- passing an options hash to the constructor:
87
+ passing a dialect hash to the constructor:
65
88
 
66
- opts = {
89
+ dialect = {
67
90
  "header" => true,
68
91
  "delimiter" => ","
69
92
  }
70
- validator = Csvlint::Validator.new( "http://example.org/data.csv", opts )
93
+ validator = Csvlint::Validator.new( "http://example.org/data.csv", dialect )
71
94
 
72
95
  The options should be a Hash that conforms to the [CSV Dialect](http://dataprotocols.org/csv-dialect/) JSON structure.
73
96
 
@@ -205,6 +228,19 @@ Schema validation provides some additional types of error and warning messages:
205
228
  * `:below_minimum` (error) -- a column with a `minimum` constraint contains a value that is below the minimum
206
229
  * `:above_maximum` (error) -- a column with a `maximum` constraint contains a value that is above the maximum
207
230
 
231
+ ## Other validation options
232
+
233
+ You can also provide an optional options hash as the fourth argument to Validator#new. Supported options are:
234
+
235
+ * :limit_lines -- only check this number of lines of the CSV file. Good for a quick check on huge files.
236
+
237
+ ```
238
+ options = {
239
+ limit_lines: 100
240
+ }
241
+ validator = Csvlint::Validator.new( "http://example.org/data.csv", nil, nil, options )
242
+ ```
243
+
208
244
  ## Contributing
209
245
 
210
246
  1. Fork it
data/bin/csvlint CHANGED
@@ -3,15 +3,57 @@ $:.unshift File.join( File.dirname(__FILE__), "..", "lib")
3
3
 
4
4
  require 'csvlint'
5
5
  require 'colorize'
6
+ require 'json'
7
+ require 'optparse'
8
+ require 'pp'
9
+
10
+ options = {}
11
+ opts = OptionParser.new
12
+
13
+ opts.banner = "Usage: csvlint [options] [file]"
14
+
15
+ opts.on("-d", "--dump-errors", "Pretty print error and warning objects.") do |d|
16
+ options[:dump] = d
17
+ end
18
+
19
+ opts.on("-s", "--schema-file FILENAME", "Schema file") do |s|
20
+ options[:schema_file] = s
21
+ end
22
+
23
+ opts.on_tail("-h", "--help",
24
+ "Show this message") do
25
+ puts opts
26
+ exit
27
+ end
28
+
29
+ begin
30
+ opts.parse!
31
+ rescue OptionParser::InvalidOption => e
32
+ puts e
33
+ puts opts
34
+ exit(1)
35
+ end
36
+
37
+ def print_error(index, error, dump, color)
6
38
 
7
- def print_error(index, error, color=:red)
8
39
  location = ""
9
40
  location += error.row.to_s if error.row
10
41
  location += "#{error.row ? "," : ""}#{error.column.to_s}" if error.column
11
42
  if error.row || error.column
12
43
  location = "#{error.row ? "Row" : "Column"}: #{location}"
13
44
  end
14
- puts "#{index+1}. #{error.type}. #{location}".colorize(color)
45
+ output_string = "#{index+1}. #{error.type}. #{location}"
46
+
47
+ if $stdout.tty?
48
+ puts output_string.colorize(color)
49
+ else
50
+ puts output_string
51
+ end
52
+
53
+ if dump
54
+ pp error
55
+ end
56
+
15
57
  end
16
58
 
17
59
  if ARGV.length == 0 && !$stdin.tty?
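Note on the hunk above: the new option handling can be tried in isolation. The following is a minimal sketch, not the script itself; `parse_cli` is a hypothetical helper introduced here, and it uses the non-destructive `parse` rather than `parse!` so it can be called on any array. It also suggests why the README's `--schema=schema.json` spelling is accepted: OptionParser completes unambiguous abbreviations of long option names, so `--schema` resolves to `--schema-file`.

```ruby
require 'optparse'

# Hypothetical helper: mirrors the two switches defined in bin/csvlint and
# returns the collected options plus the leftover positional arguments.
def parse_cli(argv)
  options = {}
  parser = OptionParser.new
  parser.banner = "Usage: csvlint [options] [file]"

  parser.on("-d", "--dump-errors", "Pretty print error and warning objects.") do |d|
    options[:dump] = d
  end

  parser.on("-s", "--schema-file FILENAME", "Schema file") do |s|
    options[:schema_file] = s
  end

  rest = parser.parse(argv) # returns the arguments that were not options
  [options, rest]
end

# "--schema" is an unambiguous abbreviation of "--schema-file".
options, rest = parse_cli(%w[--schema=schema.json -d myfile.csv])
p options
p rest # → ["myfile.csv"]
```

An unknown switch such as `--nope` raises `OptionParser::InvalidOption`, which is the exception the script rescues before printing the usage text.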
@@ -23,29 +65,45 @@ else
23
65
  begin
24
66
  source = File.new( source ) unless source =~ /^http(s)?/
25
67
  rescue Errno::ENOENT
26
- puts "File not found"
68
+ puts "#{source} not found"
27
69
  exit 1
28
70
  end
29
71
  end
30
72
  else
31
- puts "Usage: csvlint {file or URL} or {input} | csvlint"
73
+ puts "No CSV data to validate."
74
+ puts opts
32
75
  exit 1
33
76
  end
34
77
  end
35
78
 
36
- validator = Csvlint::Validator.new( source )
79
+ schema = nil
80
+ if options[:schema_file]
81
+ begin
82
+ schemafile = File.read( options[:schema_file] )
83
+ rescue Errno::ENOENT
84
+ puts "#{options[:schema_file]} not found"
85
+ exit 1
86
+ end
87
+ schema = Csvlint::Schema.from_json_table(nil, JSON.parse(schemafile))
88
+ end
89
+
90
+ validator = Csvlint::Validator.new( source, nil, schema )
37
91
 
38
- puts "#{ARGV[0] || "CSV"} is #{validator.valid? ? "VALID".green : "INVALID".red}"
92
+ if $stdout.tty?
93
+ puts "#{ARGV[0] || "CSV"} is #{validator.valid? ? "VALID".green : "INVALID".red}"
94
+ else
95
+ puts "#{ARGV[0] || "CSV"} is #{validator.valid? ? "VALID" : "INVALID"}"
96
+ end
39
97
 
40
98
  if validator.errors.size > 0
41
99
  validator.errors.each_with_index do |error, i|
42
- print_error(i, error)
100
+ print_error(i, error, options[:dump], :red)
43
101
  end
44
102
  end
45
103
 
46
104
  if validator.warnings.size > 0
47
105
  validator.warnings.each_with_index do |error, i|
48
- print_error(i, error, :yellow)
106
+ print_error(i, error, options[:dump], :yellow)
49
107
  end
50
108
  end
51
109
 
@@ -0,0 +1,2 @@
1
+ Foo,Bsr,Baz
2
+ Qux,Teaspoon,Doge
@@ -1,6 +1,6 @@
1
1
  Feature: Parse CSV
2
-
3
- Scenario: Sucessfully parse a valid CSV
2
+
3
+ Scenario: Successfully parse a valid CSV
4
4
  Given I have a CSV with the following content:
5
5
  """
6
6
  "Foo","Bar","Baz"
@@ -10,12 +10,12 @@ Feature: Parse CSV
10
10
  And it is stored at the url "http://example.com/example1.csv"
11
11
  When I ask if the CSV is valid
12
12
  Then I should get the value of true
13
-
13
+
14
14
  Scenario: Successfully parse a CSV with newlines in quoted fields
15
15
  Given I have a CSV with the following content:
16
16
  """
17
17
  "a","b","c"
18
- "d","e","this is
18
+ "d","e","this is
19
19
  valid"
20
20
  "a","b","c"
21
21
  """
@@ -27,14 +27,14 @@ valid"
27
27
  Given I have a CSV with the following content:
28
28
  """
29
29
  "a","b","c"
30
- "d","this is
31
- valid","as is this
30
+ "d","this is
31
+ valid","as is this
32
32
  too"
33
33
  """
34
34
  And it is stored at the url "http://example.com/example1.csv"
35
35
  When I ask if the CSV is valid
36
36
  Then I should get the value of true
37
-
37
+
38
38
  Scenario: Successfully report an invalid CSV
39
39
  Given I have a CSV with the following content:
40
40
  """
@@ -43,7 +43,7 @@ too"
43
43
  And it is stored at the url "http://example.com/example1.csv"
44
44
  When I ask if the CSV is valid
45
45
  Then I should get the value of false
46
-
46
+
47
47
  Scenario: Successfully report a CSV with incorrect quoting
48
48
  Given I have a CSV with the following content:
49
49
  """
@@ -51,8 +51,8 @@ too"
51
51
  """
52
52
  And it is stored at the url "http://example.com/example1.csv"
53
53
  When I ask if the CSV is valid
54
- Then I should get the value of false
55
-
54
+ Then I should get the value of false
55
+
56
56
  Scenario: Successfully report a CSV with incorrect whitespace
57
57
  Given I have a CSV with the following content:
58
58
  """
@@ -60,8 +60,8 @@ too"
60
60
  """
61
61
  And it is stored at the url "http://example.com/example1.csv"
62
62
  When I ask if the CSV is valid
63
- Then I should get the value of false
64
-
63
+ Then I should get the value of false
64
+
65
65
  Scenario: Successfully report a CSV with ragged rows
66
66
  Given I have a CSV with the following content:
67
67
  """
@@ -4,10 +4,10 @@ end
4
4
 
5
5
  Then(/^the "(.*?)" should be "(.*?)"$/) do |type, encoding|
6
6
  validator = Csvlint::Validator.new( @url, default_csv_options )
7
- validator.send(type.to_sym).should == encoding
7
+ expect( validator.send(type.to_sym) ).to eq( encoding )
8
8
  end
9
9
 
10
10
  Then(/^the metadata content type should be "(.*?)"$/) do |content_type|
11
11
  validator = Csvlint::Validator.new( @url, default_csv_options )
12
- validator.headers['content-type'].should == content_type
12
+ expect( validator.headers['content-type'] ).to eq( content_type )
13
13
  end
@@ -1,5 +1,5 @@
1
1
  Given(/^I have a CSV with the following content:$/) do |string|
2
- @csv = string
2
+ @csv = string.to_s
3
3
  end
4
4
 
5
5
  Given(/^it is stored at the url "(.*?)"$/) do |url|
@@ -17,7 +17,7 @@ end
17
17
 
18
18
  When(/^I ask if the CSV is valid$/) do
19
19
  @csv_options ||= default_csv_options
20
- @validator = Csvlint::Validator.new( @url, @csv_options )
20
+ @validator = Csvlint::Validator.new( @url, @csv_options )
21
21
  @valid = @validator.valid?
22
22
  end
23
23
 
@@ -1,36 +1,36 @@
1
1
  When(/^I ask if there are errors$/) do
2
2
  @csv_options ||= default_csv_options
3
-
3
+
4
4
  if @schema_json
5
5
  @schema = Csvlint::Schema.from_json_table( @schema_url || "http://example.org ", JSON.parse(@schema_json) )
6
6
  end
7
-
8
- @validator = Csvlint::Validator.new( @url, @csv_options, @schema )
7
+
8
+ @validator = Csvlint::Validator.new( @url, @csv_options, @schema )
9
9
  @errors = @validator.errors
10
10
  end
11
11
 
12
- Then(/^there should be (\d+) error$/) do |count|
13
- @errors.count.should == count.to_i
12
+ Then(/^there should be (\d+) error$/) do |count|
13
+ expect( @errors.count ).to eq( count.to_i )
14
14
  end
15
15
 
16
16
  Then(/^that error should have the type "(.*?)"$/) do |type|
17
- @errors.first.type.should == type.to_sym
17
+ expect( @errors.first.type ).to eq( type.to_sym )
18
18
  end
19
19
 
20
20
  Then(/^that error should have the row "(.*?)"$/) do |row|
21
- @errors.first.row.should == row.to_i
21
+ expect( @errors.first.row ).to eq( row.to_i )
22
22
  end
23
23
 
24
24
  Then(/^that error should have the column "(.*?)"$/) do |column|
25
- @errors.first.column.should == column.to_i
25
+ expect( @errors.first.column ).to eq( column.to_i )
26
26
  end
27
27
 
28
28
  Then(/^that error should have the content "(.*)"$/) do |content|
29
- @errors.first.content.chomp.should == content.chomp
29
+ expect( @errors.first.content.chomp ).to eq( content.chomp )
30
30
  end
31
31
 
32
32
  Then(/^that error should have no content$/) do
33
- @errors.first.content.should == nil
33
+ expect( @errors.first.content ).to eq( nil )
34
34
  end
35
35
 
36
36
  Given(/^I have a CSV that doesn't exist$/) do
@@ -40,4 +40,4 @@ end
40
40
 
41
41
  Then(/^there should be no "(.*?)" errors$/) do |type|
42
42
  @errors.each do |error| error.type.should_not == type.to_sym end
43
- end
43
+ end
@@ -10,9 +10,9 @@ Given(/^I ask if there are info messages$/) do
10
10
  end
11
11
 
12
12
  Then(/^there should be (\d+) info messages?$/) do |num|
13
- @info_messages.count.should == num.to_i
13
+ expect( @info_messages.count ).to eq( num.to_i )
14
14
  end
15
15
 
16
16
  Then(/^one of the messages should have the type "(.*?)"$/) do |msg_type|
17
- @info_messages.find{|x| x.type == msg_type.to_sym}.should be_present
17
+ expect( @info_messages.find{|x| x.type == msg_type.to_sym} ).to be_present
18
18
  end
@@ -21,12 +21,12 @@ When(/^I ask if there are warnings$/) do
21
21
  @schema = Csvlint::Schema.from_json_table( @schema_url || "http://example.org ", JSON.parse(@schema_json) )
22
22
  end
23
23
 
24
- @validator = Csvlint::Validator.new( @url, @csv_options, @schema )
24
+ @validator = Csvlint::Validator.new( @url, @csv_options, @schema )
25
25
  @warnings = @validator.warnings
26
26
  end
27
27
 
28
28
  Then(/^there should be (\d+) warnings$/) do |count|
29
- @warnings.count.should == count.to_i
29
+ expect( @warnings.count ).to eq( count.to_i )
30
30
  end
31
31
 
32
32
  Given(/^the content type is set to "(.*?)"$/) do |type|
@@ -34,13 +34,13 @@ Given(/^the content type is set to "(.*?)"$/) do |type|
34
34
  end
35
35
 
36
36
  Then(/^that warning should have the row "(.*?)"$/) do |row|
37
- @warnings.first.row.should == row.to_i
37
+ expect( @warnings.first.row ).to eq( row.to_i )
38
38
  end
39
39
 
40
40
  Then(/^that warning should have the column "(.*?)"$/) do |column|
41
- @warnings.first.column.should == column.to_i
41
+ expect( @warnings.first.column ).to eq( column.to_i )
42
42
  end
43
43
 
44
44
  Then(/^that warning should have the type "(.*?)"$/) do |type|
45
- @warnings.first.type.should == type.to_sym
46
- end
45
+ expect( @warnings.first.type ).to eq( type.to_sym )
46
+ end
@@ -1,17 +1,12 @@
1
+ require 'coveralls'
2
+ Coveralls.wear_merged!('test_frameworks')
3
+
1
4
  $:.unshift File.join( File.dirname(__FILE__), "..", "..", "lib")
2
5
 
3
- require 'simplecov'
4
- require 'simplecov-rcov'
5
6
  require 'rspec/expectations'
6
7
  require 'csvlint'
7
- require 'coveralls'
8
8
  require 'pry'
9
9
 
10
- Coveralls.wear_merged!
11
-
12
- SimpleCov.formatter = SimpleCov::Formatter::RcovFormatter
13
- SimpleCov.start
14
-
15
10
  require 'spork'
16
11
 
17
12
  Spork.each_run do
@@ -148,4 +148,12 @@ Feature: Get validation errors
148
148
  And it is stored at the url "http://example.com/example1.csv"
149
149
  And I ask if there are errors
150
150
  Then there should be 1 error
151
- And that error should have the type "line_breaks"
151
+ And that error should have the type "line_breaks"
152
+
153
+
154
+ Scenario: inconsistent line endings with unquoted fields in file cause an error
155
+ Given I have a CSV file called "inconsistent-line-endings-unquoted.csv"
156
+ And it is stored at the url "http://example.com/example1.csv"
157
+ And I ask if there are errors
158
+ Then there should be 1 error
159
+ And that error should have the type "line_breaks"
data/lib/csvlint.rb CHANGED
@@ -1,9 +1,14 @@
1
1
  require 'csv'
2
+ require 'date'
2
3
  require 'open-uri'
3
- require 'mime/types'
4
+ require 'set'
4
5
  require 'tempfile'
5
6
 
6
- require 'csvlint/types'
7
+ require 'active_support/core_ext/date/conversions'
8
+ require 'active_support/core_ext/time/conversions'
9
+ require 'mime/types'
10
+ require 'open_uri_redirections'
11
+
7
12
  require 'csvlint/error_message'
8
13
  require 'csvlint/error_collector'
9
14
  require 'csvlint/validate'
data/lib/csvlint/field.rb CHANGED
@@ -2,7 +2,6 @@ module Csvlint
2
2
 
3
3
  class Field
4
4
  include Csvlint::ErrorCollector
5
- include Csvlint::Types
6
5
 
7
6
  attr_reader :name, :constraints, :title, :description
8
7
 
@@ -98,5 +97,73 @@ module Csvlint
98
97
  end
99
98
  return parsed
100
99
  end
100
+
101
+ TYPE_VALIDATIONS = {
102
+ 'http://www.w3.org/2001/XMLSchema#string' => lambda { |value, constraints| value },
103
+ 'http://www.w3.org/2001/XMLSchema#int' => lambda { |value, constraints| Integer value },
104
+ 'http://www.w3.org/2001/XMLSchema#integer' => lambda { |value, constraints| Integer value },
105
+ 'http://www.w3.org/2001/XMLSchema#float' => lambda { |value, constraints| Float value },
106
+ 'http://www.w3.org/2001/XMLSchema#double' => lambda { |value, constraints| Float value },
107
+ 'http://www.w3.org/2001/XMLSchema#anyURI' => lambda do |value, constraints|
108
+ u = URI.parse value
109
+ raise ArgumentError unless u.kind_of?(URI::HTTP) || u.kind_of?(URI::HTTPS)
110
+ u
111
+ end,
112
+ 'http://www.w3.org/2001/XMLSchema#boolean' => lambda do |value, constraints|
113
+ return true if ['true', '1'].include? value
114
+ return false if ['false', '0'].include? value
115
+ raise ArgumentError
116
+ end,
117
+ 'http://www.w3.org/2001/XMLSchema#nonPositiveInteger' => lambda do |value, constraints|
118
+ i = Integer value
119
+ raise ArgumentError unless i <= 0
120
+ i
121
+ end,
122
+ 'http://www.w3.org/2001/XMLSchema#negativeInteger' => lambda do |value, constraints|
123
+ i = Integer value
124
+ raise ArgumentError unless i < 0
125
+ i
126
+ end,
127
+ 'http://www.w3.org/2001/XMLSchema#nonNegativeInteger' => lambda do |value, constraints|
128
+ i = Integer value
129
+ raise ArgumentError unless i >= 0
130
+ i
131
+ end,
132
+ 'http://www.w3.org/2001/XMLSchema#positiveInteger' => lambda do |value, constraints|
133
+ i = Integer value
134
+ raise ArgumentError unless i > 0
135
+ i
136
+ end,
137
+ 'http://www.w3.org/2001/XMLSchema#dateTime' => lambda do |value, constraints|
138
+ date_pattern = constraints["datePattern"] || "%Y-%m-%dT%H:%M:%SZ"
139
+ d = DateTime.strptime(value, date_pattern)
140
+ raise ArgumentError unless d.strftime(date_pattern) == value
141
+ d
142
+ end,
143
+ 'http://www.w3.org/2001/XMLSchema#date' => lambda do |value, constraints|
144
+ date_pattern = constraints["datePattern"] || "%Y-%m-%d"
145
+ d = Date.strptime(value, date_pattern)
146
+ raise ArgumentError unless d.strftime(date_pattern) == value
147
+ d
148
+ end,
149
+ 'http://www.w3.org/2001/XMLSchema#time' => lambda do |value, constraints|
150
+ date_pattern = constraints["datePattern"] || "%H:%M:%S"
151
+ d = DateTime.strptime(value, date_pattern)
152
+ raise ArgumentError unless d.strftime(date_pattern) == value
153
+ d
154
+ end,
155
+ 'http://www.w3.org/2001/XMLSchema#gYear' => lambda do |value, constraints|
156
+ date_pattern = constraints["datePattern"] || "%Y"
157
+ d = Date.strptime(value, date_pattern)
158
+ raise ArgumentError unless d.strftime(date_pattern) == value
159
+ d
160
+ end,
161
+ 'http://www.w3.org/2001/XMLSchema#gYearMonth' => lambda do |value, constraints|
162
+ date_pattern = constraints["datePattern"] || "%Y-%m"
163
+ d = Date.strptime(value, date_pattern)
164
+ raise ArgumentError unless d.strftime(date_pattern) == value
165
+ d
166
+ end,
167
+ }
101
168
  end
102
169
  end
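Note on the hunk above: `TYPE_VALIDATIONS` is a table of lambdas that either return the converted value or raise `ArgumentError`. A reduced sketch of that pattern follows. The short keys (`'integer'`, `'positiveInteger'`, `'boolean'`) and the `valid_for_type?` wrapper are assumptions made for illustration; the gem keys its table by full XML Schema URIs and passes a constraints hash as a second argument.

```ruby
# Illustrative table only: short keys stand in for the XSD type URIs.
VALIDATORS = {
  'integer' => lambda { |value| Integer(value) },
  'positiveInteger' => lambda do |value|
    i = Integer(value)
    raise ArgumentError unless i > 0
    i
  end,
  'boolean' => lambda do |value|
    # `return` inside a lambda returns from the lambda, not the caller.
    return true if ['true', '1'].include?(value)
    return false if ['false', '0'].include?(value)
    raise ArgumentError
  end,
}

# Hypothetical wrapper: a value is valid when its lambda does not raise.
def valid_for_type?(type, value)
  VALIDATORS.fetch(type).call(value)
  true
rescue ArgumentError
  false
end

p valid_for_type?('integer', '12')        # → true
p valid_for_type?('integer', '1.5')       # → false
p valid_for_type?('positiveInteger', '0') # → false
p valid_for_type?('boolean', 'yes')       # → false
```

Keeping conversion and validation in one callable means a caller gets the parsed value for free when the check passes, which is presumably why the lambdas return `i`, `u`, or `d` rather than a bare boolean.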
@@ -1,5 +1,3 @@
1
- require "set"
2
-
3
1
  module Csvlint
4
2
 
5
3
  class Schema
@@ -1,11 +1,8 @@
1
- require "open_uri_redirections"
2
-
3
1
  module Csvlint
4
2
 
5
3
  class Validator
6
4
 
7
5
  include Csvlint::ErrorCollector
8
- include Csvlint::Types
9
6
 
10
7
  attr_reader :encoding, :content_type, :extension, :headers, :line_breaks, :dialect, :csv_header, :schema, :data
11
8
 
@@ -13,9 +10,10 @@ module Csvlint
13
10
  "Missing or stray quote" => :stray_quote,
14
11
  "Illegal quoting" => :whitespace,
15
12
  "Unclosed quoted field" => :unclosed_quote,
13
+ "Unquoted fields do not allow \\r or \\n" => :line_breaks,
16
14
  }
17
15
 
18
- def initialize(source, dialect = nil, schema = nil)
16
+ def initialize(source, dialect = nil, schema = nil, options = {})
19
17
  @source = source
20
18
  @formats = []
21
19
  @schema = schema
@@ -31,7 +29,7 @@ module Csvlint
31
29
  }.merge(dialect || {})
32
30
 
33
31
  @csv_header = @dialect["header"]
34
-
32
+ @limit_lines = options[:limit_lines]
35
33
  @csv_options = dialect_to_csv_options(@dialect)
36
34
  @extension = parse_extension(source)
37
35
  reset
@@ -111,19 +109,22 @@ module Csvlint
111
109
  end
112
110
  row = nil
113
111
  loop do
114
- current_line = current_line + 1
112
+ current_line += 1
113
+ if @limit_lines && current_line > @limit_lines
114
+ break
115
+ end
115
116
  begin
116
117
  wrapper.reset_line
117
118
  row = csv.shift
118
119
  @data << row
119
120
  if row
120
121
  if current_line == 1 && header?
121
- row = row.reject {|r| r.blank? }
122
+ row = row.reject{|col| col.nil? || col.empty?}
122
123
  validate_header(row)
123
124
  @col_counts << row.size
124
125
  else
125
- build_formats(row, current_line)
126
- @col_counts << row.reject {|r| r.blank? }.size
126
+ build_formats(row)
127
+ @col_counts << row.reject{|col| col.nil? || col.empty?}.size
127
128
  @expected_columns = row.size unless @expected_columns != 0
128
129
 
129
130
  build_errors(:blank_rows, :structure, current_line, nil, wrapper.line) if row.reject{ |c| c.nil? || c.empty? }.size == 0
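Note on the hunk above: the `limit_lines` check runs before a row is read, so at most `limit_lines` rows are consumed. A minimal sketch of the same counter-and-break shape, assuming a plain line-oriented IO instead of the gem's CSV wrapper (`first_lines` is a hypothetical name):

```ruby
require 'stringio'

# Hypothetical helper: collect lines from an IO, stopping once the
# optional limit has been reached. A nil limit reads everything.
def first_lines(io, limit_lines = nil)
  lines = []
  current_line = 0
  io.each_line do |line|
    current_line += 1
    # Same guard as the diff: bail out before processing line limit + 1.
    break if limit_lines && current_line > limit_lines
    lines << line.chomp
  end
  lines
end

p first_lines(StringIO.new("a,b\nc,d\ne,f\n"), 2) # → ["a,b", "c,d"]
p first_lines(StringIO.new("a,b\nc,d\ne,f\n"))    # → ["a,b", "c,d", "e,f"]
```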
@@ -150,7 +151,7 @@ module Csvlint
150
151
  end
151
152
  end
152
153
  rescue ArgumentError => ae
153
- build_errors(:invalid_encoding, :structure, current_line, wrapper.line) unless reported_invalid_encoding
154
+ build_errors(:invalid_encoding, :structure, current_line, nil, wrapper.line) unless reported_invalid_encoding
154
155
  reported_invalid_encoding = true
155
156
  end
156
157
  end
@@ -178,7 +179,7 @@ module Csvlint
178
179
  end
179
180
 
180
181
  def fetch_error(error)
181
- e = error.message.match(/^([a-z ]+) (i|o)n line ([0-9]+)\.?$/i)
182
+ e = error.message.match(/^(.+?)(?: [io]n)? \(?line \d+\)?\.?$/i)
182
183
  message = e[1] rescue nil
183
184
  ERROR_MATCHERS.fetch(message, :unknown_error)
184
185
  end
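Note on the hunk above: the old pattern only accepted letters and spaces before `on line N` or `in line N`, so it could not match a message that contains backslashes or ends in `(line N).`. The sketch below exercises the new pattern on its own. The sample messages are assumptions about what the CSV parser emits, inferred from the keys in `ERROR_MATCHERS`; `base_message` is a hypothetical helper.

```ruby
# The pattern introduced in this release, copied from the diff.
LINE_SUFFIX = /^(.+?)(?: [io]n)? \(?line \d+\)?\.?$/i

# Hypothetical helper: strip the trailing line reference, or return nil
# when the message does not end in one.
def base_message(message)
  match = message.match(LINE_SUFFIX)
  match && match[1]
end

puts base_message('Unclosed quoted field on line 2.')
# → Unclosed quoted field
puts base_message('Illegal quoting in line 7.')
# → Illegal quoting
puts base_message('Unquoted fields do not allow \r or \n (line 3).')
# → Unquoted fields do not allow \r or \n
```

The single-quoted strings keep `\r` and `\n` as literal backslash sequences, matching the double-escaped key added to `ERROR_MATCHERS`.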
@@ -195,40 +196,52 @@ module Csvlint
195
196
  }
196
197
  end
197
198
 
198
- def build_formats(row, line)
199
+ def build_formats(row)
199
200
  row.each_with_index do |col, i|
200
- next if col.blank?
201
- @formats[i] ||= []
202
-
203
- SIMPLE_FORMATS.each do |type, lambda|
204
- begin
205
- if lambda.call(col)
206
- @format = type
207
- end
208
- rescue ArgumentError, URI::InvalidURIError
209
- end
201
+ next if col.nil? || col.empty?
202
+ @formats[i] ||= Hash.new(0)
203
+
204
+ format = if col.strip[FORMATS[:numeric]]
205
+ :numeric
206
+ elsif uri?(col)
207
+ :uri
208
+ elsif col[FORMATS[:date_db]] && date_format?(Date, col, '%Y-%m-%d')
209
+ :date_db
210
+ elsif col[FORMATS[:date_short]] && date_format?(Date, col, '%e %b')
211
+ :date_short
212
+ elsif col[FORMATS[:date_rfc822]] && date_format?(Date, col, '%e %b %Y')
213
+ :date_rfc822
214
+ elsif col[FORMATS[:date_long]] && date_format?(Date, col, '%B %e, %Y')
215
+ :date_long
216
+ elsif col[FORMATS[:dateTime_time]] && date_format?(Time, col, '%H:%M')
217
+ :dateTime_time
218
+ elsif col[FORMATS[:dateTime_hms]] && date_format?(Time, col, '%H:%M:%S')
219
+ :dateTime_hms
220
+ elsif col[FORMATS[:dateTime_db]] && date_format?(Time, col, '%Y-%m-%d %H:%M:%S')
221
+ :dateTime_db
222
+ elsif col[FORMATS[:dateTime_iso8601]] && date_format?(Time, col, '%Y-%m-%dT%H:%M:%SZ')
223
+ :dateTime_iso8601
224
+ elsif col[FORMATS[:dateTime_short]] && date_format?(Time, col, '%d %b %H:%M')
225
+ :dateTime_short
226
+ elsif col[FORMATS[:dateTime_long]] && date_format?(Time, col, '%B %d, %Y %H:%M')
227
+ :dateTime_long
228
+ else
229
+ :string
210
230
  end
211
231
 
212
- @formats[i] << @format
232
+ @formats[i][format] += 1
213
233
  end
214
234
  end
215
235
 
216
236
  def check_consistency
217
- percentages = []
218
-
219
- SIMPLE_FORMATS.keys.each do |type|
220
- @formats.each_with_index do |format,i|
221
- percentages[i] ||= {}
222
- unless format.nil?
223
- percentages[i][type] = format.count(type) / format.size.to_f
237
+ @formats.each_with_index do |format,i|
238
+ if format
239
+ total = format.values.reduce(:+).to_f
240
+ if format.none?{|_,count| count / total >= 0.9}
241
+ build_warnings(:inconsistent_values, :schema, nil, i + 1)
224
242
  end
225
243
  end
226
244
  end
227
-
228
- percentages.each_with_index do |col, i|
229
- next if col.values.blank?
230
- build_warnings(:inconsistent_values, :schema, nil, i+1) if col.values.max < 0.9
231
- end
232
245
  end
233
246
 
234
247
  private
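Note on the hunk above: `@formats` changes from an array of format names per column to a tally hash per column, and `check_consistency` flags a column when no single format accounts for at least 90% of its non-empty cells. A standalone sketch of that rule, using the same sample data as the updated spec (`inconsistent_columns` and the `threshold` parameter are hypothetical names; the gem hard-codes 0.9 and emits warnings instead of returning a list):

```ruby
# Hypothetical helper: return the 1-based indexes of columns in which no
# single format reaches the threshold share of observed values.
def inconsistent_columns(formats, threshold = 0.9)
  inconsistent = []
  formats.each_with_index do |tally, i|
    next unless tally # columns that never held a value have no tally
    total = tally.values.reduce(:+).to_f
    dominant = tally.any? { |_, count| count / total >= threshold }
    inconsistent << i + 1 unless dominant
  end
  inconsistent
end

formats = [
  { :string => 3 },
  { :string => 2, :numeric => 1 },
  { :numeric => 3 },
]
p inconsistent_columns(formats) # → [2]
```

Counting per format avoids storing one entry per cell, so memory use depends on the number of distinct formats in a column rather than on the number of rows.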
@@ -248,6 +261,36 @@ module Csvlint
248
261
  File.extname(parsed.path)
249
262
  end
250
263
  end
251
-
264
+
265
+ def uri?(value)
266
+ if value.strip[FORMATS[:uri]]
267
+ uri = URI.parse(value)
268
+ uri.kind_of?(URI::HTTP) || uri.kind_of?(URI::HTTPS)
269
+ end
270
+ rescue URI::InvalidURIError
271
+ false
272
+ end
273
+
274
+ def date_format?(klass, value, format)
275
+ klass.strptime(value, format).strftime(format) == value
276
+ rescue ArgumentError # invalid date
277
+ false
278
+ end
279
+
280
+ FORMATS = {
281
+ :string => nil,
282
+ :numeric => /\A[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?\z/,
283
+ :uri => /\Ahttps?:/,
284
+ :date_db => /\A\d{4,}-\d\d-\d\d\z/, # "12345-01-01"
285
+ :date_long => /\A(?:#{Date::MONTHNAMES.join('|')}) [ \d]\d, \d{4,}\z/, # "January 1, 12345"
286
+ :date_rfc822 => /\A[ \d]\d (?:#{Date::ABBR_MONTHNAMES.join('|')}) \d{4,}\z/, # " 1 Jan 12345"
287
+ :date_short => /\A[ \d]\d (?:#{Date::ABBR_MONTHNAMES.join('|')})\z/, # "1 Jan"
288
+ :dateTime_db => /\A\d{4,}-\d\d-\d\d \d\d:\d\d:\d\d\z/, # "12345-01-01 00:00:00"
289
+ :dateTime_hms => /\A\d\d:\d\d:\d\d\z/, # "00:00:00"
290
+ :dateTime_iso8601 => /\A\d{4,}-\d\d-\d\dT\d\d:\d\d:\d\dZ\z/, # "12345-01-01T00:00:00Z"
291
+ :dateTime_long => /\A(?:#{Date::MONTHNAMES.join('|')}) \d\d, \d{4,} \d\d:\d\d\z/, # "January 01, 12345 00:00"
292
+ :dateTime_short => /\A\d\d (?:#{Date::ABBR_MONTHNAMES.join('|')}) \d\d:\d\d\z/, # "01 Jan 00:00"
293
+ :dateTime_time => /\A\d\d:\d\d\z/, # "00:00"
294
+ }.freeze
252
295
  end
253
- end
296
+ end
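Note on the hunk above: format detection now uses a cheap regular expression as a prefilter and only then confirms the match with a `strptime`/`strftime` round trip. The sketch below reduces this to two of the formats. The two patterns are copied from `FORMATS`; `round_trips?` and `guess_format` are hypothetical names standing in for `date_format?` and the `if`/`elsif` chain in `build_formats`.

```ruby
require 'date'

NUMERIC = /\A[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?\z/
DATE_DB = /\A\d{4,}-\d\d-\d\d\z/

# A value is accepted only if parsing and re-formatting reproduces it
# exactly; impossible dates raise ArgumentError and are rejected.
def round_trips?(klass, value, format)
  klass.strptime(value, format).strftime(format) == value
rescue ArgumentError
  false
end

# Hypothetical two-format version of the classification chain.
def guess_format(value)
  if value.strip[NUMERIC]
    :numeric
  elsif value[DATE_DB] && round_trips?(Date, value, '%Y-%m-%d')
    :date_db
  else
    :string
  end
end

p guess_format("12")         # → :numeric
p guess_format(" 3.1476 ")   # → :numeric
p guess_format("2013-01-01") # → :date_db
p guess_format("2013-13-01") # → :string
p guess_format("foo")        # → :string
```

The last date-shaped value passes the regular expression but names month 13, so the round trip rejects it and the value falls through to `:string`.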
@@ -1,3 +1,3 @@
1
1
  module Csvlint
2
- VERSION = "0.1.0"
2
+ VERSION = "0.1.1"
3
3
  end
data/spec/spec_helper.rb CHANGED
@@ -1,11 +1,9 @@
1
- require 'simplecov'
2
- require 'simplecov-rcov'
1
+ require 'coveralls'
2
+ Coveralls.wear_merged!('test_frameworks')
3
+
3
4
  require 'csvlint'
4
5
  require 'pry'
5
6
  require 'webmock/rspec'
6
- require 'coveralls'
7
-
8
- Coveralls.wear_merged!
9
7
 
10
8
  RSpec.configure do |config|
11
9
  config.treat_symbols_as_metadata_keys_with_true_values = true
@@ -159,21 +159,21 @@ describe Csvlint::Validator do
159
159
  context "build_formats" do
160
160
 
161
161
  {
162
- "string" => "foo",
163
- "numeric" => "1",
164
- "uri" => "http://www.example.com",
165
- "dateTime_iso8601" => "2013-01-01T13:00:00Z",
166
- "date_db" => "2013-01-01",
167
- "dateTime_hms" => "13:00:00"
162
+ :string => "foo",
163
+ :numeric => "1",
164
+ :uri => "http://www.example.com",
165
+ :dateTime_iso8601 => "2013-01-01T13:00:00Z",
166
+ :date_db => "2013-01-01",
167
+ :dateTime_hms => "13:00:00"
168
168
  }.each do |type, content|
169
169
  it "should return the format of #{type} correctly" do
170
170
  row = [content]
171
171
 
172
172
  validator = Csvlint::Validator.new("http://example.com/example.csv")
173
- validator.build_formats(row, 1)
173
+ validator.build_formats(row)
174
174
  formats = validator.instance_variable_get("@formats")
175
175
 
176
- formats[0].first.should == type
176
+ formats[0].keys.first.should == type
177
177
  end
178
178
  end
179
179
 
@@ -181,18 +181,18 @@ describe Csvlint::Validator do
181
181
  row = ["12", "3.1476"]
182
182
 
183
183
  validator = Csvlint::Validator.new("http://example.com/example.csv")
184
- validator.build_formats(row, 1)
184
+ validator.build_formats(row)
185
185
  formats = validator.instance_variable_get("@formats")
186
186
 
187
- formats[0].first.should == "numeric"
188
- formats[1].first.should == "numeric"
187
+ formats[0].keys.first.should == :numeric
188
+ formats[1].keys.first.should == :numeric
189
189
  end
190
190
 
191
191
  it "should ignore blank arrays" do
192
192
  row = []
193
193
 
194
194
  validator = Csvlint::Validator.new("http://example.com/example.csv")
195
- validator.build_formats(row, 1)
195
+ validator.build_formats(row)
196
196
  formats = validator.instance_variable_get("@formats")
197
197
  formats.should == []
198
198
  end
@@ -207,16 +207,12 @@ describe Csvlint::Validator do
207
207
  validator = Csvlint::Validator.new("http://example.com/example.csv")
208
208
 
209
209
  rows.each_with_index do |row, i|
210
- validator.build_formats(row, i)
210
+ validator.build_formats(row)
211
211
  end
212
212
 
213
213
  formats = validator.instance_variable_get("@formats")
214
214
 
215
- formats.should == [
216
- ["string",
217
- "string",
218
- "string"]
219
- ]
215
+ formats.should == [{:string => 3}]
220
216
  end
221
217
 
222
218
  it "should return formats correctly if a row is blank" do
@@ -228,15 +224,15 @@ describe Csvlint::Validator do
228
224
  validator = Csvlint::Validator.new("http://example.com/example.csv")
229
225
 
230
226
  rows.each_with_index do |row, i|
231
- validator.build_formats(row, i)
227
+ validator.build_formats(row)
232
228
  end
233
229
 
234
230
  formats = validator.instance_variable_get("@formats")
235
231
 
236
232
  formats.should == [
237
- ["string"],
238
- ["numeric"],
239
- ["string"]
233
+ {:string => 1},
234
+ {:numeric => 1},
235
+ {:string => 1},
240
236
  ]
241
237
  end
242
238
 
@@ -246,9 +242,9 @@ describe Csvlint::Validator do
246
242
 
247
243
  it "should return a warning if columns have inconsistent values" do
248
244
  formats = [
249
- ["string", "string", "string"],
250
- ["string", "numeric", "string"],
251
- ["numeric", "numeric", "numeric"],
245
+ {:string => 3},
246
+ {:string => 2, :numeric => 1},
247
+ {:numeric => 3},
252
248
  ]
253
249
 
254
250
  validator = Csvlint::Validator.new("http://example.com/example.csv")
@@ -290,6 +286,17 @@ describe Csvlint::Validator do
290
286
  expect( data[2] ).to eql ['3','2','1']
291
287
  end
292
288
 
289
+ it "should limit number of lines read" do
290
+ stub_request(:get, "http://example.com/example.csv").to_return(:status => 200,
291
+ :headers=>{"Content-Type" => "text/csv; header=present"},
292
+ :body => File.read(File.join(File.dirname(__FILE__),'..','features','fixtures','valid.csv')))
293
+ validator = Csvlint::Validator.new("http://example.com/example.csv", nil, nil, limit_lines: 2)
294
+ expect( validator.valid? ).to eql(true)
295
+ data = validator.data
296
+ expect( data.count ).to eql 2
297
+ expect( data[0] ).to eql ['Foo','Bar','Baz']
298
+ end
299
+
293
300
  it "should follow redirects to SSL" do
294
301
  stub_request(:get, "http://example.com/redirect").to_return(:status => 301, :headers=>{"Location" => "https://example.com/example.csv"})
295
302
  stub_request(:get, "https://example.com/example.csv").to_return(:status => 200,
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: csvlint
  version: !ruby/object:Gem::Version
- version: 0.1.0
+ version: 0.1.1
  platform: ruby
  authors:
  - pezholio
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-11-27 00:00:00.000000000 Z
+ date: 2015-07-13 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: mime-types
@@ -259,6 +259,7 @@ extra_rdoc_files: []
  files:
  - .coveralls.yml
  - .gitignore
+ - .ruby-version
  - .travis.yml
  - Gemfile
  - LICENSE.md
@@ -271,6 +272,7 @@ files:
  - features/csv_options.feature
  - features/fixtures/cr-line-endings.csv
  - features/fixtures/crlf-line-endings.csv
+ - features/fixtures/inconsistent-line-endings-unquoted.csv
  - features/fixtures/inconsistent-line-endings.csv
  - features/fixtures/invalid-byte-sequence.csv
  - features/fixtures/lf-line-endings.csv
@@ -300,7 +302,6 @@ files:
  - lib/csvlint/error_message.rb
  - lib/csvlint/field.rb
  - lib/csvlint/schema.rb
- - lib/csvlint/types.rb
  - lib/csvlint/validate.rb
  - lib/csvlint/version.rb
  - lib/csvlint/wrapped_io.rb
@@ -328,7 +329,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.4.2
+ rubygems_version: 2.4.5
  signing_key:
  specification_version: 4
  summary: CSV Validator
@@ -337,6 +338,7 @@ test_files:
  - features/csv_options.feature
  - features/fixtures/cr-line-endings.csv
  - features/fixtures/crlf-line-endings.csv
+ - features/fixtures/inconsistent-line-endings-unquoted.csv
  - features/fixtures/inconsistent-line-endings.csv
  - features/fixtures/invalid-byte-sequence.csv
  - features/fixtures/lf-line-endings.csv
data/lib/csvlint/types.rb DELETED
@@ -1,137 +0,0 @@
- require 'set'
- require 'date'
- require 'active_support/core_ext/date/conversions'
- require 'active_support/core_ext/time/conversions'
-
- module Csvlint
- module Types
- SIMPLE_FORMATS = {
- 'string' => lambda { |value| true },
- 'numeric' => lambda { |value| value.strip[/\A[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?\z/] },
- 'uri' => lambda do |value|
- if value.strip[/\Ahttps?:/]
- u = URI.parse(value)
- u.kind_of?(URI::HTTP) || u.kind_of?(URI::HTTPS)
- end
- end
- }
-
- def self.date_format(klass, value, format, pattern)
- if value[pattern]
- klass.strptime(value, format).strftime(format) == value
- end
- end
-
- def self.included(base)
- [
- [ :db, "%Y-%m-%d",
- /\A\d{4,}-\d\d-\d\d\z/],
- [ :number, "%Y%m%d",
- /\A\d{8}\z/],
- [ :short, "%e %b",
- /\A[ \d]\d (?:#{Date::ABBR_MONTHNAMES.join('|')})\z/],
- [ :rfc822, "%e %b %Y",
- /\A[ \d]\d (?:#{Date::ABBR_MONTHNAMES.join('|')}) \d{4,}\z/],
- [ :long, "%B %e, %Y",
- /\A(?:#{Date::MONTHNAMES.join('|')}) [ \d]\d, \d{4,}\z/],
- ].each do |type,format,pattern|
- SIMPLE_FORMATS["date_#{type}"] = lambda do |value|
- date_format(Date, value, format, pattern)
- end
- end
-
- # strptime doesn't support widths like %9N, unlike strftime.
- # @see http://ruby-doc.org/stdlib-2.0/libdoc/date/rdoc/DateTime.html
- [
- [ :time, "%H:%M",
- /\A\d\d:\d\d\z/],
- [ :hms, "%H:%M:%S",
- /\A\d\d:\d\d:\d\d\z/],
- [ :db, "%Y-%m-%d %H:%M:%S",
- /\A\d{4,}-\d\d-\d\d \d\d:\d\d:\d\d\z/],
- [ :iso8601, "%Y-%m-%dT%H:%M:%SZ",
- /\A\d{4,}-\d\d-\d\dT\d\d:\d\d:\d\dZ\z/],
- [ :number, "%Y%m%d%H%M%S",
- /\A\d{14}\z/],
- [ :nsec, "%Y%m%d%H%M%S%N",
- /\A\d{23}\z/],
- [ :short, "%d %b %H:%M",
- /\A\d\d (?:#{Date::ABBR_MONTHNAMES.join('|')}) \d\d:\d\d\z/],
- [ :long, "%B %d, %Y %H:%M",
- /\A(?:#{Date::MONTHNAMES.join('|')}) \d\d, \d{4,} \d\d:\d\d\z/],
- ].each do |type,format,pattern|
- SIMPLE_FORMATS["dateTime_#{type}"] = lambda do |value|
- date_format(Time, value, format, pattern)
- end
- end
- end
-
- TYPE_VALIDATIONS = {
- 'http://www.w3.org/2001/XMLSchema#string' => lambda { |value, constraints| value },
- 'http://www.w3.org/2001/XMLSchema#int' => lambda { |value, constraints| Integer value },
- 'http://www.w3.org/2001/XMLSchema#integer' => lambda { |value, constraints| Integer value },
- 'http://www.w3.org/2001/XMLSchema#float' => lambda { |value, constraints| Float value },
- 'http://www.w3.org/2001/XMLSchema#double' => lambda { |value, constraints| Float value },
- 'http://www.w3.org/2001/XMLSchema#anyURI' => lambda do |value, constraints|
- u = URI.parse value
- raise ArgumentError unless u.kind_of?(URI::HTTP) || u.kind_of?(URI::HTTPS)
- u
- end,
- 'http://www.w3.org/2001/XMLSchema#boolean' => lambda do |value, constraints|
- return true if ['true', '1'].include? value
- return false if ['false', '0'].include? value
- raise ArgumentError
- end,
- 'http://www.w3.org/2001/XMLSchema#nonPositiveInteger' => lambda do |value, constraints|
- i = Integer value
- raise ArgumentError unless i <= 0
- i
- end,
- 'http://www.w3.org/2001/XMLSchema#negativeInteger' => lambda do |value, constraints|
- i = Integer value
- raise ArgumentError unless i < 0
- i
- end,
- 'http://www.w3.org/2001/XMLSchema#nonNegativeInteger' => lambda do |value, constraints|
- i = Integer value
- raise ArgumentError unless i >= 0
- i
- end,
- 'http://www.w3.org/2001/XMLSchema#positiveInteger' => lambda do |value, constraints|
- i = Integer value
- raise ArgumentError unless i > 0
- i
- end,
- 'http://www.w3.org/2001/XMLSchema#dateTime' => lambda do |value, constraints|
- date_pattern = constraints["datePattern"] || "%Y-%m-%dT%H:%M:%SZ"
- d = DateTime.strptime(value, date_pattern)
- raise ArgumentError unless d.strftime(date_pattern) == value
- d
- end,
- 'http://www.w3.org/2001/XMLSchema#date' => lambda do |value, constraints|
- date_pattern = constraints["datePattern"] || "%Y-%m-%d"
- d = Date.strptime(value, date_pattern)
- raise ArgumentError unless d.strftime(date_pattern) == value
- d
- end,
- 'http://www.w3.org/2001/XMLSchema#time' => lambda do |value, constraints|
- date_pattern = constraints["datePattern"] || "%H:%M:%S"
- d = DateTime.strptime(value, date_pattern)
- raise ArgumentError unless d.strftime(date_pattern) == value
- d
- end,
- 'http://www.w3.org/2001/XMLSchema#gYear' => lambda do |value, constraints|
- date_pattern = constraints["datePattern"] || "%Y"
- d = Date.strptime(value, date_pattern)
- raise ArgumentError unless d.strftime(date_pattern) == value
- d
- end,
- 'http://www.w3.org/2001/XMLSchema#gYearMonth' => lambda do |value, constraints|
- date_pattern = constraints["datePattern"] || "%Y-%m"
- d = Date.strptime(value, date_pattern)
- raise ArgumentError unless d.strftime(date_pattern) == value
- d
- end,
- }
- end
- end