csvlint 0.1.2 → 0.1.3
- checksums.yaml +8 -8
- data/CHANGELOG.md +137 -0
- data/csvlint.gemspec +1 -0
- data/features/csvupload.feature +153 -0
- data/features/schema_validation.feature +43 -1
- data/lib/csvlint/error_collector.rb +3 -1
- data/lib/csvlint/error_message.rb +1 -0
- data/lib/csvlint/field.rb +34 -17
- data/lib/csvlint/schema.rb +21 -18
- data/lib/csvlint/validate.rb +59 -48
- data/lib/csvlint/version.rb +1 -1
- metadata +19 -2
checksums.yaml
CHANGED
@@ -1,15 +1,15 @@
 ---
 !binary "U0hBMQ==":
 metadata.gz: !binary |-
-
+YmFkYTMwMzU2Njg0ZDk5NmJkOTU2ZjczNWVjZTRjNTVmZWEyOTY2NA==
 data.tar.gz: !binary |-
-
+ODIzOTA1MmQ0MzI3ZTZlOTc2NTU5NmFkNzgxODkxODU2MjFhY2VmYw==
 SHA512:
 metadata.gz: !binary |-
-
-
-
+Y2IwYTYzYjg1N2JmMmFmN2Q1NjIwYTQxMTU0Y2Y5YTM3YzFkYjkyZWI1MWEy
+ODJiZTZjNDhlMDc0MjliZGRkMDQyZDRkMDdlNzRjMmFhN2IyYTJkZDU4MWZl
+NzdmYWQwZDI5MTI1Nzk5ZDQxZjRiZGVlMTI3NzMwNmI2NGZiZTM=
 data.tar.gz: !binary |-
-
-
-
+NDBlNGVhMTVkM2IzNDIyNWEzMjU5NDI4YjZjNjlmODg5YWNiOTMzNDk4ZGQ5
+ODVmMTM2MWJjNTIwNzdhNmNkNTNhZGUyZTUyZTgyYzM4Njc4YWUwYzA1YWJl
+MmYwNmFkMTVlMGIyNDE5ODVkN2E3NWE5ZWFlYzQ1NGVjNzY1Y2Q=
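The `!binary` values in checksums.yaml are Base64 text. As a minimal sketch (the constant and method names are illustrative, not part of RubyGems or csvlint), the strings copied from the diff above can be decoded with core Ruby to see what they hold:

```ruby
# Both Base64 strings are copied verbatim from the checksums.yaml diff above.
CHECKSUM_KEY    = "U0hBMQ=="
METADATA_DIGEST = "YmFkYTMwMzU2Njg0ZDk5NmJkOTU2ZjczNWVjZTRjNTVmZWEyOTY2NA=="

# Hypothetical helper: String#unpack1("m") decodes Base64 without any require.
def decode_binary(value)
  value.unpack1("m")
end

puts decode_binary(CHECKSUM_KEY)            # prints "SHA1"
puts decode_binary(METADATA_DIGEST).length  # length of the decoded digest text
```

The key decodes to the digest algorithm name, and the value decodes to a hexadecimal digest string rather than raw bytes.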
data/CHANGELOG.md
ADDED
@@ -0,0 +1,137 @@
+# Change Log
+
+## [Unreleased](https://github.com/theodi/csvlint.rb/tree/HEAD)
+
+[Full Changelog](https://github.com/theodi/csvlint.rb/compare/0.1.2...HEAD)
+
+**Merged pull requests:**
+
+- Error reporting schema expanded test suite [\#138](https://github.com/theodi/csvlint.rb/pull/138) ([quadrophobiac](https://github.com/quadrophobiac))
+
+- Validate header size improvement [\#137](https://github.com/theodi/csvlint.rb/pull/137) ([adamc00](https://github.com/adamc00))
+
+- Invalid schema [\#132](https://github.com/theodi/csvlint.rb/pull/132) ([bcouston](https://github.com/bcouston))
+
+## [0.1.2](https://github.com/theodi/csvlint.rb/tree/0.1.2) (2015-07-15)
+
+[Full Changelog](https://github.com/theodi/csvlint.rb/compare/0.1.1...0.1.2)
+
+**Closed issues:**
+
+- When an encoding error is thrown the line content is put into the column field in the error object [\#131](https://github.com/theodi/csvlint.rb/issues/131)
+
+**Merged pull requests:**
+
+- Catch invalid URIs [\#133](https://github.com/theodi/csvlint.rb/pull/133) ([pezholio](https://github.com/pezholio))
+
+- Emit a warning when the CSV header does not match the supplied schema [\#127](https://github.com/theodi/csvlint.rb/pull/127) ([adamc00](https://github.com/adamc00))
+
+## [0.1.1](https://github.com/theodi/csvlint.rb/tree/0.1.1) (2015-07-13)
+
+[Full Changelog](https://github.com/theodi/csvlint.rb/compare/0.1.0...0.1.1)
+
+**Closed issues:**
+
+- Add Command Line Support [\#128](https://github.com/theodi/csvlint.rb/issues/128)
+
+- BUG: Incorrect inconsistent\_values error on numeric columns [\#106](https://github.com/theodi/csvlint.rb/issues/106)
+
+**Merged pull requests:**
+
+- Fixes line content incorrectly being put into the row column field when there is an encoding error. [\#130](https://github.com/theodi/csvlint.rb/pull/130) ([glacier](https://github.com/glacier))
+
+- Add command line help [\#129](https://github.com/theodi/csvlint.rb/pull/129) ([pezholio](https://github.com/pezholio))
+
+- Remove stray q character. [\#125](https://github.com/theodi/csvlint.rb/pull/125) ([adamc00](https://github.com/adamc00))
+
+- csvlint utility can take arguments to specify a schema and pp errors [\#124](https://github.com/theodi/csvlint.rb/pull/124) ([adamc00](https://github.com/adamc00))
+
+- Fixed warning - use expect\( \) rather than .should [\#123](https://github.com/theodi/csvlint.rb/pull/123) ([jezhiggins](https://github.com/jezhiggins))
+
+- Fixed spelling mistake [\#121](https://github.com/theodi/csvlint.rb/pull/121) ([jezhiggins](https://github.com/jezhiggins))
+
+- Avoid using \#blank? if unnecessary [\#120](https://github.com/theodi/csvlint.rb/pull/120) ([jpmckinney](https://github.com/jpmckinney))
+
+- eliminate some date and time formats, related \#105 [\#119](https://github.com/theodi/csvlint.rb/pull/119) ([jpmckinney](https://github.com/jpmckinney))
+
+- Match another CSV error about line endings [\#118](https://github.com/theodi/csvlint.rb/pull/118) ([jpmckinney](https://github.com/jpmckinney))
+
+- fixed typo mistake in README [\#117](https://github.com/theodi/csvlint.rb/pull/117) ([railsfactory-kumaresan](https://github.com/railsfactory-kumaresan))
+
+- Integrate @jpmickinney's build\_formats improvements [\#112](https://github.com/theodi/csvlint.rb/pull/112) ([Floppy](https://github.com/Floppy))
+
+- make limit\_lines into a non-dialect option [\#110](https://github.com/theodi/csvlint.rb/pull/110) ([Floppy](https://github.com/Floppy))
+
+- fix coveralls stats [\#109](https://github.com/theodi/csvlint.rb/pull/109) ([Floppy](https://github.com/Floppy))
+
+- Speed up \#build\_formats \(changes its API\) [\#103](https://github.com/theodi/csvlint.rb/pull/103) ([jpmckinney](https://github.com/jpmckinney))
+
+- Limit lines [\#101](https://github.com/theodi/csvlint.rb/pull/101) ([Hoedic](https://github.com/Hoedic))
+
+## [0.1.0](https://github.com/theodi/csvlint.rb/tree/0.1.0) (2014-11-27)
+
+**Implemented enhancements:**
+
+- Blank values shouldn't count as inconsistencies [\#90](https://github.com/theodi/csvlint.rb/issues/90)
+
+- Make sure we don't check schema column count and ragged row count together [\#66](https://github.com/theodi/csvlint.rb/issues/66)
+
+- Include the failed constraints in error message when doing field validation [\#64](https://github.com/theodi/csvlint.rb/issues/64)
+
+- Include the column value in error message when field validation fails [\#63](https://github.com/theodi/csvlint.rb/issues/63)
+
+- Expose optional JSON table schema fields [\#55](https://github.com/theodi/csvlint.rb/issues/55)
+
+- Ensure header rows are properly handled and validated [\#48](https://github.com/theodi/csvlint.rb/issues/48)
+
+- Support zipped CSV? [\#30](https://github.com/theodi/csvlint.rb/issues/30)
+
+- Improve feedback on inconsistent values [\#29](https://github.com/theodi/csvlint.rb/issues/29)
+
+- Reported error positions are not massively useful [\#15](https://github.com/theodi/csvlint.rb/issues/15)
+
+**Fixed bugs:**
+
+- undefined method `\[\]' for nil:NilClass from fetch\_error [\#71](https://github.com/theodi/csvlint.rb/issues/71)
+
+- Inconsistent column bases [\#69](https://github.com/theodi/csvlint.rb/issues/69)
+
+- Improve error handling in Schema loading [\#42](https://github.com/theodi/csvlint.rb/issues/42)
+
+- Recover from some line ending problems [\#41](https://github.com/theodi/csvlint.rb/issues/41)
+
+- Inconsistent values due to number format differences [\#32](https://github.com/theodi/csvlint.rb/issues/32)
+
+- New lines in quoted fields are valid [\#31](https://github.com/theodi/csvlint.rb/issues/31)
+
+- Wrongly reporting incorrect file extension [\#23](https://github.com/theodi/csvlint.rb/issues/23)
+
+- Incorrect extension reported when URL has query options at the end [\#14](https://github.com/theodi/csvlint.rb/issues/14)
+
+**Closed issues:**
+
+- Get gem continuously deploying [\#93](https://github.com/theodi/csvlint.rb/issues/93)
+
+- Publish on rubygems.org [\#92](https://github.com/theodi/csvlint.rb/issues/92)
+
+- Duplicate column names [\#87](https://github.com/theodi/csvlint.rb/issues/87)
+
+- Return code is always 0 \(except when it isn't\) [\#85](https://github.com/theodi/csvlint.rb/issues/85)
+
+- Can't pipe data to csvlint [\#84](https://github.com/theodi/csvlint.rb/issues/84)
+
+- They have some validator running if someone wants to inspect it for "inspiration" [\#27](https://github.com/theodi/csvlint.rb/issues/27)
+
+- Allow CSV parsing options to be configured as a parameter [\#6](https://github.com/theodi/csvlint.rb/issues/6)
+
+- Use explicit CSV parsing options [\#5](https://github.com/theodi/csvlint.rb/issues/5)
+
+- Improving encoding detection [\#2](https://github.com/theodi/csvlint.rb/issues/2)
+
+**Merged pull requests:**
+
+- Continuously deploy gem [\#102](https://github.com/theodi/csvlint.rb/pull/102) ([pezholio](https://github.com/pezholio))
+
+
+
+\* *This Change Log was automatically generated by [github_changelog_generator](https://github.com/skywinder/Github-Changelog-Generator)*
data/csvlint.gemspec
CHANGED
data/features/csvupload.feature
ADDED
@@ -0,0 +1,153 @@
+Feature: Collect all the tests that should trigger dialect check related errors
+
+Scenario: Title rows, I wish to trigger a :title_row type message
+Given I have a CSV file called "title-row.csv"
+And it is stored at the url "http://example.com/example1.csv"
+And I ask if there are warnings
+Then there should be 1 warnings
+And that warning should have the type "title_row"
+
+# :nonrfc_line_breaks
+
+Scenario: LF line endings in file give an info message of type :nonrfc_line_breaks
+Given I have a CSV file called "lf-line-endings.csv"
+And it is stored at the url "http://example.com/example1.csv"
+And I set header to "true"
+And I ask if there are info messages
+Then there should be 2 info messages
+And one of the messages should have the type "nonrfc_line_breaks"
+
+Scenario: CR line endings in file give an info message of type :nonrfc_line_breaks
+Given I have a CSV file called "cr-line-endings.csv"
+And it is stored at the url "http://example.com/example1.csv"
+And I set header to "true"
+And I ask if there are info messages
+Then there should be 2 info messages
+And one of the messages should have the type "nonrfc_line_breaks"
+
+Scenario: CRLF line endings in file produces no info messages of type :nonrfc_line_breaks
+Given I have a CSV file called "crlf-line-endings.csv"
+And it is stored at the url "http://example.com/example1.csv"
+And I set header to "true"
+And I ask if there are info messages
+Then there should be 1 info message
+
+# :line_breaks
+
+Scenario: Incorrect line endings specified in settings
+Given I have a CSV file called "cr-line-endings.csv"
+And I set the line endings to linefeed
+And it is stored at the url "http://example.com/example1.csv"
+And I ask if there are errors
+Then there should be 1 error
+And that error should have the type "line_breaks"
+
+Scenario: inconsistent line endings in file cause an error
+Given I have a CSV file called "inconsistent-line-endings.csv"
+And it is stored at the url "http://example.com/example1.csv"
+And I ask if there are errors
+Then there should be 1 error
+And that error should have the type "line_breaks"
+
+
+Scenario: inconsistent line endings with unquoted fields in file cause an error
+Given I have a CSV file called "inconsistent-line-endings-unquoted.csv"
+And it is stored at the url "http://example.com/example1.csv"
+And I ask if there are errors
+Then there should be 1 error
+And that error should have the type "line_breaks"
+
+#:unclosed_quote
+
+Scenario: CSV with incorrect quoting
+Given I have a CSV with the following content:
+"""
+"col1","col2","col3"
+"Foo","Bar","Baz
+"""
+And it is stored at the url "http://example.com/example1.csv"
+When I ask if there are errors
+Then there should be 1 error
+And that error should have the type "unclosed_quote"
+And that error should have the row "2"
+And that error should have the content ""Foo","Bar","Baz"
+
+# :invalid_encoding
+
+Scenario: Report invalid Encoding
+Given I have a CSV file called "invalid-byte-sequence.csv"
+And I set an encoding header of "UTF-8"
+And it is stored at the url "http://example.com/example1.csv"
+When I ask if there are errors
+Then there should be 1 error
+And that error should have the type "invalid_encoding"
+
+Scenario: Report invalid file
+#should this throw an excel error?
+Given I have a CSV file called "spreadsheet.xls"
+And it is stored at the url "http://example.com/example1.csv"
+When I ask if there are errors
+Then there should be 1 error
+And that error should have the type "invalid_encoding"
+
+# :blank_rows
+
+Scenario: Successfully report a CSV with blank rows
+Given I have a CSV with the following content:
+"""
+"col1","col2","col3"
+"Foo","Bar","Baz"
+"","",
+"Baz","Bar","Foo"
+"""
+And it is stored at the url "http://example.com/example1.csv"
+When I ask if there are errors
+Then there should be 1 error
+And that error should have the type "blank_rows"
+And that error should have the row "3"
+And that error should have the content ""","","
+
+Scenario: Successfully report a CSV with multiple trailing empty rows
+Given I have a CSV with the following content:
+"""
+"col1","col2","col3"
+"Foo","Bar","Baz"
+"Foo","Bar","Baz"
+
+
+"""
+And it is stored at the url "http://example.com/example1.csv"
+When I ask if there are errors
+Then there should be 1 error
+And that error should have the type "blank_rows"
+And that error should have the row "4"
+
+Scenario: Successfully report a CSV with an empty row
+Given I have a CSV with the following content:
+"""
+"col1","col2","col3"
+"Foo","Bar","Baz"
+
+"Foo","Bar","Baz"
+"""
+And it is stored at the url "http://example.com/example1.csv"
+When I ask if there are errors
+Then there should be 1 error
+And that error should have the type "blank_rows"
+And that error should have the row "3"
+
+#:check_options
+
+Scenario: Warn if options seem to return invalid data
+Given I have a CSV with the following content:
+"""
+'Foo';'Bar';'Baz'
+'1';'2';'3'
+'3';'2';'1'
+"""
+And I set the delimiter to ","
+And I set quotechar to """
+And it is stored at the url "http://example.com/example1.csv"
+And I ask if there are warnings
+Then there should be 1 warnings
+And that warning should have the type "check_options"
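The `blank_rows` scenarios above rely on a simple rule that also appears in validate.rb: a parsed row counts as blank when every cell is nil or empty. A minimal sketch of that rule follows; `blank_row?` is a hypothetical helper name used only for illustration, and the sample rows are assumptions about what a CSV parser might yield for the scenario data.

```ruby
# Hypothetical helper (not part of csvlint): true when no cell carries a value.
# Equivalent to the gem's inline check
#   row.reject { |c| c.nil? || c.empty? }.size == 0
def blank_row?(row)
  row.all? { |cell| cell.nil? || cell.empty? }
end

# Assumed parser output for the rows used in the scenarios above.
puts blank_row?(["", "", nil])          # a row such as "","",
puts blank_row?([])                     # a completely empty line
puts blank_row?(["Foo", "Bar", "Baz"])  # a normal data row
```

Note that an empty array also satisfies the rule, which matches the scenarios that expect a `blank_rows` error for a completely empty line.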
data/features/schema_validation.feature
CHANGED
@@ -60,4 +60,46 @@ Feature: Schema Validation
 """
 When I ask if there are warnings
 Then there should be 1 warnings
-
+
+Scenario: Schema with valid regex
+Given I have a CSV with the following content:
+"""
+"firstname","id","email"
+"Bob","1234","bob@example.org"
+"Alice","5","alice@example.com"
+"""
+And it is stored at the url "http://example.com/example1.csv"
+And I have a schema with the following content:
+"""
+{
+"fields": [
+{ "name": "Name", "constraints": { "required": true, "pattern": "^[A-Za-z0-9_]*$" } },
+{ "name": "Id", "constraints": { "required": true, "minLength": 1 } },
+{ "name": "Email", "constraints": { "required": true } }
+]
+}
+"""
+When I ask if there are errors
+Then there should be 0 error
+
+Scenario: Schema with invalid regex
+Given I have a CSV with the following content:
+"""
+"firstname","id","email"
+"Bob","1234","bob@example.org"
+"Alice","5","alice@example.com"
+"""
+And it is stored at the url "http://example.com/example1.csv"
+And I have a schema with the following content:
+"""
+{
+"fields": [
+{ "name": "Name", "constraints": { "required": true, "pattern": "((" } },
+{ "name": "Id", "constraints": { "required": true, "minLength": 1 } },
+{ "name": "Email", "constraints": { "required": true } }
+]
+}
+"""
+When I ask if there are errors
+Then there should be 1 error
+And that error should have the type "invalid_regex"
data/lib/csvlint/error_collector.rb
CHANGED
@@ -1,13 +1,15 @@
 module Csvlint
 module ErrorCollector
 attr_reader :errors, :warnings, :info_messages
-
+# Creates a validation error
 def build_errors(type, category = nil, row = nil, column = nil, content = nil, constraints = {})
 @errors << Csvlint::ErrorMessage.new(type, category, row, column, content, constraints)
 end
+# Creates a validation warning
 def build_warnings(type, category = nil, row = nil, column = nil, content = nil, constraints = {})
 @warnings << Csvlint::ErrorMessage.new(type, category, row, column, content, constraints)
 end
+# Creates a validation information message
 def build_info_messages(type, category = nil, row = nil, column = nil, content = nil, constraints = {})
 @info_messages << Csvlint::ErrorMessage.new(type, category, row, column, content, constraints)
 end
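The collector shown in this hunk is a mixin: any class that includes it gains `build_*` methods that append message objects to per-severity arrays. The sketch below illustrates the pattern in isolation. `SketchMessage`, `SketchCollector`, and `SketchValidator` are hypothetical stand-ins, and the `reset` and `valid?` bodies are assumptions, since their definitions are not part of this diff.

```ruby
# Hypothetical stand-in for Csvlint::ErrorMessage, with the same six fields.
SketchMessage = Struct.new(:type, :category, :row, :column, :content, :constraints)

# Hypothetical stand-in for Csvlint::ErrorCollector.
module SketchCollector
  attr_reader :errors, :warnings, :info_messages

  # Assumed behaviour: start each validation run with empty lists.
  def reset
    @errors = []
    @warnings = []
    @info_messages = []
  end

  def build_errors(type, category = nil, row = nil, column = nil, content = nil, constraints = {})
    @errors << SketchMessage.new(type, category, row, column, content, constraints)
  end

  def build_warnings(type, category = nil, row = nil, column = nil, content = nil, constraints = {})
    @warnings << SketchMessage.new(type, category, row, column, content, constraints)
  end

  # Assumed behaviour: valid means no errors were collected.
  def valid?
    errors.empty?
  end
end

class SketchValidator
  include SketchCollector

  def initialize
    reset
  end
end

validator = SketchValidator.new
validator.build_warnings(:title_row, :structure)
validator.build_errors(:blank_rows, :structure, 3, nil, '"","",')
puts validator.errors.map(&:type).inspect
puts validator.valid?
```

Keeping errors, warnings, and info messages in separate arrays lets callers decide which severities should fail a validation run.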
data/lib/csvlint/field.rb
CHANGED
@@ -1,10 +1,10 @@
 module Csvlint
-
+
 class Field
 include Csvlint::ErrorCollector
 
 attr_reader :name, :constraints, :title, :description
-
+
 def initialize(name, constraints={}, title=nil, description=nil)
 @name = name
 @constraints = constraints || {}
@@ -13,9 +13,12 @@ module Csvlint
 @description = description
 reset
 end
-
-def validate_column(value, row=nil, column=nil)
+
+def validate_column(value, row=nil, column=nil, all_errors=[])
 reset
+unless all_errors.any?{|error| ((error.type == :invalid_regex) && (error.column == column))}
+validate_regex(value, row, column)
+end
 validate_length(value, row, column)
 validate_values(value, row, column)
 parsed = validate_type(value, row, column)
@@ -26,11 +29,11 @@ module Csvlint
 private
 def validate_length(value, row, column)
 if constraints["required"] == true
-build_errors(:missing_value, :schema, row, column, value,
+build_errors(:missing_value, :schema, row, column, value,
 { "required" => true }) if value.nil? || value.length == 0
 end
 if constraints["minLength"]
-build_errors(:min_length, :schema, row, column, value,
+build_errors(:min_length, :schema, row, column, value,
 { "minLength" => constraints["minLength"] }) if value.nil? || value.length < constraints["minLength"]
 end
 if constraints["maxLength"]
@@ -38,12 +41,26 @@ module Csvlint
 { "maxLength" => constraints["maxLength"] } ) if !value.nil? && value.length > constraints["maxLength"]
 end
 end
-
-def
-
-
-
+
+def validate_regex(value, row, column)
+pattern = constraints["pattern"]
+if pattern
+begin
+Regexp.new(pattern)
+build_errors(:pattern, :schema, row, column, value,
+{ "pattern" => constraints["pattern"] } ) if !value.nil? && !value.match( constraints["pattern"] )
+rescue RegexpError
+build_errors(:invalid_regex, :schema, nil, column, ("#{name}: Constraints: Pattern: #{pattern}"),
+{ "pattern" => constraints["pattern"] })
+end
 end
+end
+
+def validate_values(value, row, column)
+# If a pattern exists, raise an invalid regex error if it is not in
+# valid regex form, else, if the value of the relevant field in the csv
+# does not match the given regex pattern in the schema, raise a
+# pattern error.
 if constraints["unique"] == true
 if @uniques.include? value
 build_errors(:unique, :schema, row, column, value, { "unique" => true })
@@ -52,7 +69,7 @@ module Csvlint
 end
 end
 end
-
+
 def validate_type(value, row, column)
 if constraints["type"] && value != ""
 parsed = convert_to_type(value)
@@ -66,21 +83,21 @@ module Csvlint
 end
 return nil
 end
-
+
 def validate_range(value, row, column)
 #TODO: we're ignoring issues with converting ranges to actual types, maybe we
 #should generate a warning? The schema is invalid
 if constraints["minimum"]
 minimumValue = convert_to_type( constraints["minimum"] )
 if minimumValue
-build_errors(:below_minimum, :schema, row, column, value,
+build_errors(:below_minimum, :schema, row, column, value,
 { "minimum" => constraints["minimum"] }) unless value >= minimumValue
 end
 end
 if constraints["maximum"]
 maximumValue = convert_to_type( constraints["maximum"] )
 if maximumValue
-build_errors(:above_maximum, :schema, row, column, value,
+build_errors(:above_maximum, :schema, row, column, value,
 { "maximum" => constraints["maximum"] }) unless value <= maximumValue
 end
 end
@@ -96,7 +113,7 @@ module Csvlint
 end
 end
 return parsed
-end
+end
 
 TYPE_VALIDATIONS = {
 'http://www.w3.org/2001/XMLSchema#string' => lambda { |value, constraints| value },
@@ -170,4 +187,4 @@ module Csvlint
 end,
 }
 end
-end
+end
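The new `validate_regex` method distinguishes two outcomes: a schema pattern that cannot be compiled (`:invalid_regex`) and a value that fails a well-formed pattern (`:pattern`). A minimal sketch of that decision, detached from the gem's error objects, might look like this; `check_pattern` is a hypothetical helper that returns a symbol rather than building an error.

```ruby
# Hypothetical helper (not csvlint's API): classify a value against a schema pattern.
# Returns :invalid_regex when the pattern cannot be compiled,
# :pattern when the value does not match, and nil when there is nothing to report.
def check_pattern(pattern, value)
  return nil if pattern.nil?

  regex = Regexp.new(pattern) # raises RegexpError for malformed patterns such as "(("
  return :pattern if !value.nil? && !regex.match(value)

  nil
rescue RegexpError
  :invalid_regex
end

# Patterns and values taken from the schema_validation.feature scenarios.
puts check_pattern("^[A-Za-z0-9_]*$", "Bob").inspect
puts check_pattern("((", "Bob").inspect
```

Compiling the pattern before matching is what allows a broken schema to be reported as a schema problem instead of surfacing as an exception during row validation.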
data/lib/csvlint/schema.rb
CHANGED
@@ -1,11 +1,11 @@
 module Csvlint
-
+
 class Schema
-
+
 include Csvlint::ErrorCollector
-
+
 attr_reader :uri, :fields, :title, :description
-
+
 def initialize(uri, fields=[], title=nil, description=nil)
 @uri = uri
 @fields = fields
@@ -17,16 +17,16 @@ module Csvlint
 def validate_header(header)
 reset
 
-found_header = header.
-expected_header = @fields.map{ |f| f.name }.
+found_header = header.to_csv(:row_sep => '')
+expected_header = @fields.map{ |f| f.name }.to_csv(:row_sep => '')
 if found_header != expected_header
 build_warnings(:malformed_header, :schema, 1, nil, found_header, expected_header)
 end
 
 return valid?
 end
-
-def validate_row(values, row=nil)
+
+def validate_row(values, row=nil, all_errors=[])
 reset
 if values.length < fields.length
 fields[values.size..-1].each_with_index do |field, i|
@@ -38,34 +38,37 @@ module Csvlint
 build_warnings(:extra_column, :schema, row, fields.size+i+1)
 end
 end
-
+
 fields.each_with_index do |field,i|
 value = values[i] || ""
-result = field.validate_column(value, row, i+1)
+result = field.validate_column(value, row, i+1, all_errors)
 @errors += fields[i].errors
-@warnings += fields[i].warnings
+@warnings += fields[i].warnings
 end
-
+
 return valid?
 end
-
+
 def Schema.from_json_table(uri, json)
 fields = []
 json["fields"].each do |field_desc|
-fields << Csvlint::Field.new( field_desc["name"] , field_desc["constraints"],
+fields << Csvlint::Field.new( field_desc["name"] , field_desc["constraints"],
 field_desc["title"], field_desc["description"] )
 end if json["fields"]
 return Schema.new( uri , fields, json["title"], json["description"] )
 end
-
+
+# Difference in functionality between from_json_table and load_from_json_table
+# needs to be specified
+
 def Schema.load_from_json_table(uri)
 begin
 json = JSON.parse( open(uri).read )
 return Schema.from_json_table(uri,json)
 rescue
-return nil
+return Schema.new(nil, [], "malformed", "malformed")
 end
 end
-
+
 end
-end
+end
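`Schema.load_from_json_table` now returns a placeholder schema titled "malformed" instead of `nil` when loading fails, so callers always receive an object. The sketch below illustrates that fallback using a string input rather than a URI. `SketchSchema` and `load_schema_from_string` are hypothetical names, and the rescue is narrowed to `JSON::ParserError` here, whereas the gem uses a bare `rescue`.

```ruby
require "json"

# Hypothetical stand-in for Csvlint::Schema, holding the same four attributes.
SketchSchema = Struct.new(:uri, :fields, :title, :description)

# Hypothetical loader: parse schema JSON, falling back to a "malformed" placeholder.
def load_schema_from_string(uri, text)
  json = JSON.parse(text)
  SketchSchema.new(uri, json["fields"] || [], json["title"], json["description"])
rescue JSON::ParserError
  # Mirrors the diff above: no URI, no fields, title and description set to "malformed".
  SketchSchema.new(nil, [], "malformed", "malformed")
end

good = load_schema_from_string("http://example.com/schema.json", '{"title": "People", "fields": []}')
bad  = load_schema_from_string("http://example.com/schema.json", "not json at all")

puts good.title
puts bad.title
```

Returning a placeholder keeps downstream code free of `nil` checks, at the cost of requiring callers to inspect the title or description to detect a failed load.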
data/lib/csvlint/validate.rb
CHANGED
@@ -1,25 +1,26 @@
 module Csvlint
-
+
 class Validator
-
+
 include Csvlint::ErrorCollector
-
+
 attr_reader :encoding, :content_type, :extension, :headers, :line_breaks, :dialect, :csv_header, :schema, :data
-
+
 ERROR_MATCHERS = {
 "Missing or stray quote" => :stray_quote,
 "Illegal quoting" => :whitespace,
 "Unclosed quoted field" => :unclosed_quote,
 "Unquoted fields do not allow \\r or \\n" => :line_breaks,
 }
-
-def initialize(source, dialect = nil, schema = nil, options = {})
+
+def initialize(source, dialect = nil, schema = nil, options = {})
+
 @source = source
 @formats = []
 @schema = schema
-
+
 @supplied_dialect = dialect != nil
-
+
 @dialect = {
 "header" => true,
 "delimiter" => ",",
@@ -27,18 +28,19 @@ module Csvlint
 "lineTerminator" => :auto,
 "quoteChar" => '"'
 }.merge(dialect || {})
-
+
 @csv_header = @dialect["header"]
 @limit_lines = options[:limit_lines]
 @csv_options = dialect_to_csv_options(@dialect)
-@extension = parse_extension(source)
+@extension = parse_extension(source) unless @source.nil?
 reset
 validate
+
 end
-
+
 def validate
-single_col = false
-io = nil
+single_col = false
+io = nil
 begin
 io = @source.respond_to?(:gets) ? @source : open(@source, :allow_redirections=>:all)
 validate_metadata(io)
@@ -47,19 +49,19 @@ module Csvlint
 unless sum.nil?
 build_warnings(:title_row, :structure) if @col_counts.first < (sum / @col_counts.size.to_f)
 end
-build_warnings(:check_options, :structure) if @expected_columns == 1
-check_consistency
+build_warnings(:check_options, :structure) if @expected_columns == 1
+check_consistency
 rescue OpenURI::HTTPError, Errno::ENOENT
 build_errors(:not_found)
 ensure
 io.close if io && io.respond_to?(:close)
 end
 end
-
+
 def validate_metadata(io)
 @encoding = io.charset rescue nil
 @content_type = io.content_type rescue nil
-@headers = io.meta rescue nil
+@headers = io.meta rescue nil
 assumed_header = undeclared_header = !@supplied_dialect
 if @headers
 if @headers["content-type"] =~ /text\/csv/
@@ -74,31 +76,33 @@ module Csvlint
 assumed_header = false
 end
 if @headers["content-type"] !~ /charset=/
-build_warnings(:no_encoding, :context)
+build_warnings(:no_encoding, :context)
 else
 build_warnings(:encoding, :context) if @encoding != "utf-8"
 end
 build_warnings(:no_content_type, :context) if @content_type == nil
 build_warnings(:excel, :context) if @content_type == nil && @extension =~ /.xls(x)?/
 build_errors(:wrong_content_type, :context) unless (@content_type && @content_type =~ /text\/csv/)
-
+
 if undeclared_header
 build_errors(:undeclared_header, :structure)
 assumed_header = false
 end
-
+
 end
 build_info_messages(:assumed_header, :structure) if assumed_header
 end
-
+
+# analyses the provided csv and builds errors, warnings and info messages
 def parse_csv(io)
 @expected_columns = 0
 current_line = 0
 reported_invalid_encoding = false
+all_errors = []
 @col_counts = []
-
-@csv_options[:encoding] = @encoding
-
+
+@csv_options[:encoding] = @encoding
+
 begin
 wrapper = WrappedIO.new( io )
 csv = CSV.new( wrapper, @csv_options )
@@ -110,37 +114,38 @@ module Csvlint
 row = nil
 loop do
 current_line += 1
-if @limit_lines && current_line > @limit_lines
+if @limit_lines && current_line > @limit_lines
 break
 end
 begin
 wrapper.reset_line
 row = csv.shift
 @data << row
-if row
+if row
 if current_line == 1 && header?
 row = row.reject{|col| col.nil? || col.empty?}
 validate_header(row)
 @col_counts << row.size
-else
+else
 build_formats(row)
 @col_counts << row.reject{|col| col.nil? || col.empty?}.size
 @expected_columns = row.size unless @expected_columns != 0
-
+
 build_errors(:blank_rows, :structure, current_line, nil, wrapper.line) if row.reject{ |c| c.nil? || c.empty? }.size == 0
-
+# Builds errors and warnings related to the provided schema file
 if @schema
-@schema.validate_row(row, current_line)
+@schema.validate_row(row, current_line, all_errors)
 @errors += @schema.errors
+all_errors += @schema.errors
 @warnings += @schema.warnings
 else
 build_errors(:ragged_rows, :structure, current_line, nil, wrapper.line) if !row.empty? && row.size != @expected_columns
 end
-
+
 end
-else
+else
 break
-end
+end
 rescue CSV::MalformedCSVError => e
 type = fetch_error(e)
 if type == :stray_quote && !wrapper.line.match(csv.row_sep)
@@ -154,8 +159,8 @@ module Csvlint
 build_errors(:invalid_encoding, :structure, current_line, nil, wrapper.line) unless reported_invalid_encoding
 reported_invalid_encoding = true
 end
-end
-
+end
+
 def validate_header(header)
 names = Set.new
 header.each_with_index do |name,i|
@@ -173,18 +178,18 @@ module Csvlint
 end
 return valid?
 end
-
+
 def header?
 @csv_header
 end
-
+
 def fetch_error(error)
 e = error.message.match(/^(.+?)(?: [io]n)? \(?line \d+\)?\.?$/i)
 message = e[1] rescue nil
 ERROR_MATCHERS.fetch(message, :unknown_error)
 end
-
-def dialect_to_csv_options(dialect)
+
+def dialect_to_csv_options(dialect)
 skipinitialspace = dialect["skipInitialSpace"] || true
 delimiter = dialect["delimiter"]
 delimiter = delimiter + " " if !skipinitialspace
@@ -195,8 +200,8 @@ module Csvlint
 :skip_blanks => false
 }
 end
-
-def build_formats(row)
+
+def build_formats(row)
 row.each_with_index do |col, i|
 next if col.nil? || col.empty?
 @formats[i] ||= Hash.new(0)
@@ -228,11 +233,11 @@ module Csvlint
 else
 :string
 end
-
+
 @formats[i][format] += 1
 end
 end
-
+
 def check_consistency
 @formats.each_with_index do |format,i|
 if format
@@ -243,10 +248,11 @@ module Csvlint
 end
 end
 end
-
+
 private
-
+
 def parse_extension(source)
+# byebug
 case source
 when File
 return File.extname( source.path )
@@ -254,11 +260,16 @@ module Csvlint
 return ""
 when StringIO
 return ""
-
+when Tempfile
+# this is triggered when the revalidate dialect use case happens
|
258
265
|
return ""
|
259
266
|
else
|
260
|
-
|
261
|
-
|
267
|
+
begin
|
268
|
+
parsed = URI.parse(source)
|
269
|
+
File.extname(parsed.path)
|
270
|
+
rescue URI::InvalidURIError
|
271
|
+
return ""
|
272
|
+
end
|
262
273
|
end
|
263
274
|
end
|
264
275
|
|
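The `parse_extension` hunk above adds a `Tempfile` branch and fills the previously empty `else` with a guarded `URI.parse`. The following is a minimal standalone sketch of that behaviour, not the gem's actual method: `sketch_parse_extension` is a hypothetical name, and the branches that return an empty string are collapsed into one for brevity.

```ruby
require "uri"
require "stringio"
require "tempfile"

# Hypothetical standalone helper mirroring the branches visible in the diff.
# It is an illustration only and is not part of csvlint.
def sketch_parse_extension(source)
  case source
  when Tempfile, StringIO
    # In-memory or temporary sources carry no meaningful extension.
    ""
  when File
    File.extname(source.path)
  else
    begin
      # Treat anything else as a URL string and read the extension off its path.
      File.extname(URI.parse(source).path)
    rescue URI::InvalidURIError
      # A string that cannot be parsed as a URI yields no extension.
      ""
    end
  end
end

puts sketch_parse_extension("http://example.com/data.csv").inspect
puts sketch_parse_extension("http://example.com/a b.csv").inspect
puts sketch_parse_extension(StringIO.new("a,b\n1,2\n")).inspect
```

The rescue clause means a malformed URL is treated the same way as a source with no extension, instead of raising out of the validator.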
data/lib/csvlint/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: csvlint
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.1.
|
4
|
+
version: 0.1.3
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- pezholio
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2015-07-
|
11
|
+
date: 2015-07-24 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: mime-types
|
@@ -248,6 +248,20 @@ dependencies:
|
|
248
248
|
- - ! '>='
|
249
249
|
- !ruby/object:Gem::Version
|
250
250
|
version: '0'
|
251
|
+
- !ruby/object:Gem::Dependency
|
252
|
+
name: github_changelog_generator
|
253
|
+
requirement: !ruby/object:Gem::Requirement
|
254
|
+
requirements:
|
255
|
+
- - ! '>='
|
256
|
+
- !ruby/object:Gem::Version
|
257
|
+
version: '0'
|
258
|
+
type: :development
|
259
|
+
prerelease: false
|
260
|
+
version_requirements: !ruby/object:Gem::Requirement
|
261
|
+
requirements:
|
262
|
+
- - ! '>='
|
263
|
+
- !ruby/object:Gem::Version
|
264
|
+
version: '0'
|
251
265
|
description: CSV Validator
|
252
266
|
email:
|
253
267
|
- pezholio@gmail.com
|
@@ -261,6 +275,7 @@ files:
|
|
261
275
|
- .gitignore
|
262
276
|
- .ruby-version
|
263
277
|
- .travis.yml
|
278
|
+
- CHANGELOG.md
|
264
279
|
- Gemfile
|
265
280
|
- LICENSE.md
|
266
281
|
- README.md
|
@@ -270,6 +285,7 @@ files:
|
|
270
285
|
- csvlint.gemspec
|
271
286
|
- features/check_format.feature
|
272
287
|
- features/csv_options.feature
|
288
|
+
- features/csvupload.feature
|
273
289
|
- features/fixtures/cr-line-endings.csv
|
274
290
|
- features/fixtures/crlf-line-endings.csv
|
275
291
|
- features/fixtures/inconsistent-line-endings-unquoted.csv
|
@@ -336,6 +352,7 @@ summary: CSV Validator
|
|
336
352
|
test_files:
|
337
353
|
- features/check_format.feature
|
338
354
|
- features/csv_options.feature
|
355
|
+
- features/csvupload.feature
|
339
356
|
- features/fixtures/cr-line-endings.csv
|
340
357
|
- features/fixtures/crlf-line-endings.csv
|
341
358
|
- features/fixtures/inconsistent-line-endings-unquoted.csv
|
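The metadata hunks above record a new development dependency on `github_changelog_generator` with a `>= 0` requirement. A declaration along the following lines would produce that entry. This is a sketch under assumptions: the one-line change to `csvlint.gemspec` is not shown in this section, so the exact wording there may differ, and the specification below is trimmed to the few fields needed to run.

```ruby
require "rubygems"

# Trimmed, illustrative specification; the real gemspec has many more fields.
spec = Gem::Specification.new do |s|
  s.name    = "csvlint"
  s.version = "0.1.3"
  s.summary = "CSV Validator"
  s.authors = ["pezholio"]

  # With no version constraint given, RubyGems falls back to its default requirement.
  s.add_development_dependency "github_changelog_generator"
end

dep = spec.dependencies.find { |d| d.name == "github_changelog_generator" }
puts "#{dep.name} #{dep.requirement} (#{dep.type})"
```

Because the dependency is typed `:development`, it is not installed for consumers of the gem by default.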