csvlint 0.1.4 → 0.2.0
- checksums.yaml +8 -8
- data/.gitignore +7 -1
- data/CHANGELOG.md +19 -1
- data/README.md +93 -36
- data/bin/csvlint +68 -27
- data/csvlint.gemspec +2 -0
- data/features/csvw_schema_validation.feature +127 -0
- data/features/fixtures/spreadsheet.xlsx +0 -0
- data/features/sources.feature +3 -4
- data/features/step_definitions/parse_csv_steps.rb +13 -1
- data/features/step_definitions/schema_validation_steps.rb +27 -1
- data/features/step_definitions/sources_steps.rb +1 -1
- data/features/step_definitions/validation_errors_steps.rb +48 -1
- data/features/step_definitions/validation_info_steps.rb +5 -1
- data/features/step_definitions/validation_warnings_steps.rb +15 -1
- data/features/support/load_tests.rb +114 -0
- data/features/validation_errors.feature +12 -24
- data/features/validation_warnings.feature +18 -6
- data/lib/csvlint.rb +10 -0
- data/lib/csvlint/csvw/column.rb +359 -0
- data/lib/csvlint/csvw/date_format.rb +182 -0
- data/lib/csvlint/csvw/metadata_error.rb +13 -0
- data/lib/csvlint/csvw/number_format.rb +211 -0
- data/lib/csvlint/csvw/property_checker.rb +761 -0
- data/lib/csvlint/csvw/table.rb +204 -0
- data/lib/csvlint/csvw/table_group.rb +165 -0
- data/lib/csvlint/schema.rb +40 -23
- data/lib/csvlint/validate.rb +142 -19
- data/lib/csvlint/version.rb +1 -1
- data/spec/csvw/column_spec.rb +112 -0
- data/spec/csvw/date_format_spec.rb +49 -0
- data/spec/csvw/number_format_spec.rb +403 -0
- data/spec/csvw/table_group_spec.rb +143 -0
- data/spec/csvw/table_spec.rb +90 -0
- data/spec/schema_spec.rb +27 -1
- data/spec/spec_helper.rb +0 -1
- data/spec/validator_spec.rb +16 -10
- metadata +53 -2
checksums.yaml
CHANGED
@@ -1,15 +1,15 @@
 ---
 !binary "U0hBMQ==":
   metadata.gz: !binary |-
-
+    YjlmZmFlNGZjOWQ5MmNlNDZiOTUxMWY0NGExYTRkYjhhNzdlNjAyNA==
   data.tar.gz: !binary |-
-
+    ODFjZmJkZmI0Nzg2NmMzN2ViOGNiNDlmODA0NDcxMzM0Zjk4NTgwOQ==
 SHA512:
   metadata.gz: !binary |-
-
-
-
+    ZTIyMGVkYjIyMjc2ZWViNTBhYmZkMWIxN2E1OTU0OTFhNGMxNzBlYzg0OTI4
+    NDRkMzY2YzgxNmQwZGZiZDE5M2M2NzYwMzk3ZWZjMDc3YWM0YzQ0NTczY2U3
+    MGZjNTUwMGI2MzgzZDQxYzkzMzBiNzI3NmJkZTIxYjZiYjc5MDA=
   data.tar.gz: !binary |-
-
-
-
+    NTI1M2I5Yzc3NGNhOTg3Y2VkMmM3ZGM1ZTdiZWNmMzM0ZTY5ODljODNmNWYy
+    MDA0NGVlMGFhNDQ2ZjZjYjI0Nzc2OTdhMWRmODI5YTEzMGRmNTQxZjAyOTA5
+    YjVmMjk4NDIyOWEzMzIxMTBlYjQ4YTgwZmE4MWZlYTQ4MjMzZmE=
data/.gitignore
CHANGED
data/CHANGELOG.md
CHANGED
@@ -2,7 +2,25 @@
 
 ## [Unreleased](https://github.com/theodi/csvlint.rb/tree/HEAD)
 
-[Full Changelog](https://github.com/theodi/csvlint.rb/compare/0.1.
+[Full Changelog](https://github.com/theodi/csvlint.rb/compare/0.1.4...HEAD)
+
+**Closed issues:**
+
+- CSV on the web support [\#141](https://github.com/theodi/csvlint.rb/issues/141)
+
+**Merged pull requests:**
+
+- Recover from `ArgumentError`s when attempting to locate a schema and detect bad schema when JSON is malformed [\#152](https://github.com/theodi/csvlint.rb/pull/152) ([pezholio](https://github.com/pezholio))
+
+- Catch errors if link headers are don't have particular values [\#151](https://github.com/theodi/csvlint.rb/pull/151) ([pezholio](https://github.com/pezholio))
+
+- Rescue excel warning [\#149](https://github.com/theodi/csvlint.rb/pull/149) ([quadrophobiac](https://github.com/quadrophobiac))
+
+- CSVW-based validation! [\#142](https://github.com/theodi/csvlint.rb/pull/142) ([JeniT](https://github.com/JeniT))
+
+## [0.1.4](https://github.com/theodi/csvlint.rb/tree/0.1.4) (2015-08-06)
+
+[Full Changelog](https://github.com/theodi/csvlint.rb/compare/0.1.3...0.1.4)
 
 **Merged pull requests:**
 
data/README.md
CHANGED
@@ -31,13 +31,13 @@ You can either use this gem within your own Ruby code, or as a standolone comman
 After installing the gem, you can validate a CSV on the command line like so:
 
     csvlint myfile.csv
-
+
 You will then see the validation result, together with any warnings or errors e.g.
 
 ```
 myfile.csv is INVALID
 1. blank_rows. Row: 3
-1. title_row.
+1. title_row.
 2. inconsistent_values. Column: 14
 ```
 
@@ -50,40 +50,40 @@ You can also optionally pass a schema file like so:
 Currently the gem supports retrieving a CSV accessible from a URL, File, or an IO-style object (e.g. StringIO)
 
     require 'csvlint'
-
+
     validator = Csvlint::Validator.new( "http://example.org/data.csv" )
     validator = Csvlint::Validator.new( File.new("/path/to/my/data.csv" ))
     validator = Csvlint::Validator.new( StringIO.new( my_data_in_a_string ) )
 
-When validating from a URL the range of errors and warnings is wider as the library will also check HTTP headers for
+When validating from a URL the range of errors and warnings is wider as the library will also check HTTP headers for
 best practices
-
-    #invoke the validation
+
+    #invoke the validation
     validator.validate
-
+
     #check validation status
     validator.valid?
-
+
     #access array of errors, each is an Csvlint::ErrorMessage object
     validator.errors
-
+
     #access array of warnings
     validator.warnings
-
+
     #access array of information messages
     validator.info_messages
-
+
     #get some information about the CSV file that was validated
     validator.encoding
     validator.content_type
     validator.extension
-
+
     #retrieve HTTP headers from request
     validator.headers
 
 ## Controlling CSV Parsing
 
-The validator supports configuration of the [CSV Dialect](http://dataprotocols.org/csv-dialect/) used in a data file. This is specified by
+The validator supports configuration of the [CSV Dialect](http://dataprotocols.org/csv-dialect/) used in a data file. This is specified by
 passing a dialect hash to the constructor:
 
     dialect = {
@@ -94,17 +94,17 @@ passing a dialect hash to the constructor:
 
 The options should be a Hash that conforms to the [CSV Dialect](http://dataprotocols.org/csv-dialect/) JSON structure.
 
-While these options configure the parser to correctly process the file, the validator will still raise errors or warnings for CSV
+While these options configure the parser to correctly process the file, the validator will still raise errors or warnings for CSV
 structure that it considers to be invalid, e.g. a missing header or different delimiters.
 
-Note that the parser will also check for a `header` parameter on the `Content-Type` header returned when fetching a remote CSV file. As
+Note that the parser will also check for a `header` parameter on the `Content-Type` header returned when fetching a remote CSV file. As
 specified in [RFC 4180](http://www.ietf.org/rfc/rfc4180.txt) the values for this can be `present` and `absent`, e.g:
 
     Content-Type: text/csv; header=present
 
 ## Error Reporting
 
-The validator provides feedback on a validation result using instances of `Csvlint::ErrorMessage`. Errors are divided into errors, warnings and information
+The validator provides feedback on a validation result using instances of `Csvlint::ErrorMessage`. Errors are divided into errors, warnings and information
 messages. A validation attempt is successful if there are no errors.
 
 Messages provide context including:
@@ -122,7 +122,7 @@ The following types of error can be reported:
 * `:wrong_content_type` -- content type is not `text/csv`
 * `:ragged_rows` -- row has a different number of columns (than the first row in the file)
 * `:blank_rows` -- completely empty row, e.g. blank line or a line where all column values are empty
-* `:invalid_encoding` -- encoding error when parsing row, e.g. because of invalid characters
+* `:invalid_encoding` -- encoding error when parsing row, e.g. because of invalid characters
 * `:not_found` -- HTTP 404 error when retrieving the data
 * `:stray_quote` -- missing or stray quote
 * `:unclosed_quote` -- unclosed quoted field
@@ -153,36 +153,66 @@ There are also information messages available:
 
 ## Schema Validation
 
-The library supports validating data against a schema. A schema configuration can be provided as a Hash or parsed from JSON. The structure currently
-follows JSON Table Schema with some extensions.
+The library supports validating data against a schema. A schema configuration can be provided as a Hash or parsed from JSON. The structure currently
+follows JSON Table Schema with some extensions and rudinmentary [CSV on the Web Metadata](http://www.w3.org/TR/tabular-metadata/).
 
-An example schema file is:
+An example JSON Table Schema schema file is:
 
     {
         "fields": [
-            {
-                "name": "id",
-                "constraints": { "required": true }
+            {
+                "name": "id",
+                "constraints": { "required": true }
             },
-            {
-                "name": "price",
-                "constraints": { "required": true, "minLength": 1 }
+            {
+                "name": "price",
+                "constraints": { "required": true, "minLength": 1 }
             },
-            {
-                "name": "postcode",
-                "constraints": {
-                    "required": true,
-                    "pattern": "[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}"
-                }
+            {
+                "name": "postcode",
+                "constraints": {
+                    "required": true,
+                    "pattern": "[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}"
+                }
             }
         ]
     }
 
-
+An equivalent CSV on the Web Metadata file is:
+
+    {
+        "@context": "http://www.w3.org/ns/csvw",
+        "url": "http://example.com/example1.csv",
+        "tableSchema": {
+            "columns": [
+                {
+                    "name": "id",
+                    "required": true
+                },
+                {
+                    "name": "price",
+                    "required": true,
+                    "datatype": { "base": "string", "minLength": 1 }
+                },
+                {
+                    "name": "postcode",
+                    "required": true
+                }
+            ]
+        }
+    }
 
-
+Parsing and validating with a schema (of either kind):
+
+    schema = Csvlint::Schema.load_from_json(uri)
     validator = Csvlint::Validator.new( "http://example.org/data.csv", nil, schema )
 
+### CSV on the Web Validation Support
+
+This gem passes all the validation tests in the [official CSV on the Web test suite](http://w3c.github.io/csvw/tests/) (though there might still be errors or parts of the [CSV on the Web standard](http://www.w3.org/TR/tabular-metadata/) that aren't tested by that test suite).
+
+### JSON Table Schema Support
+
 Supported constraints:
 
 * `required` -- there must be a value for this field in every row
@@ -192,7 +222,7 @@ Supported constraints:
 * `pattern` -- values must match the provided regular expression
 * `type` -- specifies an XML Schema data type. Values of the column must be a valid value for that type
 * `minimum` -- specify a minimum range for values, the value will be parsed as specified by `type`
-* `maximum` -- specify a maximum range for values, the value will be parsed as specified by `type`
+* `maximum` -- specify a maximum range for values, the value will be parsed as specified by `type`
 * `datePattern` -- specify a `strftime` compatible date pattern to be used when parsing date values and min/max constraints
 
 Supported data types (this is still a work in progress):
@@ -214,7 +244,7 @@ Supported data types (this is still a work in progress):
 * Time -- `http://www.w3.org/2001/XMLSchema#time`
 
 Use of an unknown data type will result in the column failing to validate.
-
+
 Schema validation provides some additional types of error and warning messages:
 
 * `:missing_value` (error) -- a column marked as `required` in the schema has no value
@@ -248,3 +278,30 @@ validator = Csvlint::Validator.new( "http://example.org/data.csv", nil, nil, opt
 3. Commit your changes (`git commit -am 'Add some feature'`)
 4. Push to the branch (`git push origin my-new-feature`)
 5. Create new Pull Request
+
+### Testing
+
+The codebase includes both rspec and cucumber tests, which can be run together using:
+
+    $ rake
+
+or separately:
+
+    $ rake spec
+    $ rake features
+
+When the cucumber tests are first run, a script will create tests based on the latest version of the [CSV on the Web test suite](http://w3c.github.io/csvw/tests/), including creating a local cache of the test files. This requires an internet connection and some patience. Following that download, the tests will run locally; there's also a batch script:
+
+    $ bin/run-csvw-tests
+
+which will run the tests from the command line.
+
+If you need to refresh the CSV on the Web tests:
+
+    $ rm bin/run-csvw-tests
+    $ rm features/csvw_validation_tests.feature
+    $ rm -r features/fixtures/csvw
+
+and then run the cucumber tests again or:
+
+    $ ruby features/support/load_tests.rb
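The README change says `Csvlint::Schema.load_from_json` accepts a schema "of either kind". The sketch below shows one plausible way to tell the two formats apart, using only the two example documents from the README. It is an assumption for illustration, not the gem's actual logic: `schema_kind` is a hypothetical helper, and the rule it applies (CSV on the Web metadata carries `@context`, JSON Table Schema carries a top-level `fields` array) is inferred from those examples.

```ruby
require "json"

# Hypothetical helper (not part of csvlint): guess which schema format a
# JSON document uses. Assumption: CSVW metadata has an "@context" key,
# while JSON Table Schema has a top-level "fields" array.
def schema_kind(json_text)
  doc = JSON.parse(json_text)
  return :unknown unless doc.is_a?(Hash)
  return :csvw if doc.key?("@context")
  return :json_table_schema if doc["fields"].is_a?(Array)
  :unknown
end

json_table = '{ "fields": [ { "name": "id", "constraints": { "required": true } } ] }'
csvw = '{ "@context": "http://www.w3.org/ns/csvw", ' \
       '"url": "http://example.com/example1.csv", ' \
       '"tableSchema": { "columns": [ { "name": "id", "required": true } ] } }'

puts schema_kind(json_table)
puts schema_kind(csvw)
```

Malformed JSON raises `JSON::ParserError` from `JSON.parse`, which is the same exception the updated `bin/csvlint` rescues when loading a schema.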
data/bin/csvlint
CHANGED
@@ -16,8 +16,8 @@ opts.on("-d", "--dump-errors", "Pretty print error and warning objects.") do |d|
   options[:dump] = d
 end
 
-opts.on("-s", "--schema
-  options[:
+opts.on("-s", "--schema FILENAME", "Schema file") do |s|
+  options[:schema] = s
 end
 
 opts.on_tail("-h", "--help",
@@ -35,14 +35,15 @@ rescue OptionParser::InvalidOption => e
 end
 
 def print_error(index, error, dump, color)
-
   location = ""
   location += error.row.to_s if error.row
   location += "#{error.row ? "," : ""}#{error.column.to_s}" if error.column
   if error.row || error.column
     location = "#{error.row ? "Row" : "Column"}: #{location}"
   end
-  output_string = "#{index+1}. #{error.type}
+  output_string = "#{index+1}. #{error.type}"
+  output_string += ". #{location}" unless location.empty?
+  output_string += ". #{error.content}" if error.content
 
   if $stdout.tty?
     puts output_string.colorize(color)
@@ -56,6 +57,30 @@ def print_error(index, error, dump, color)
 
 end
 
+def validate_csv(source, schema, dump)
+  validator = Csvlint::Validator.new( source, nil, schema )
+
+  if $stdout.tty?
+    puts "#{source.path || source || "CSV"} is #{validator.valid? ? "VALID".green : "INVALID".red}"
+  else
+    puts "#{source.path || source || "CSV"} is #{validator.valid? ? "VALID" : "INVALID"}"
+  end
+
+  if validator.errors.size > 0
+    validator.errors.each_with_index do |error, i|
+      print_error(i, error, dump, :red)
+    end
+  end
+
+  if validator.warnings.size > 0
+    validator.warnings.each_with_index do |error, i|
+      print_error(i, error, dump, :yellow)
+    end
+  end
+
+  return validator.valid?
+end
+
 if ARGV.length == 0 && !$stdin.tty?
   source = StringIO.new(ARGF.read)
 else
@@ -63,13 +88,13 @@ else
     source = ARGV[0]
     unless source =~ /^http(s)?/
       begin
-        source = File.new( source ) unless source =~ /^http(s)?/
+        source = File.new( source ) unless source =~ /^http(s)?/
       rescue Errno::ENOENT
         puts "#{source} not found"
         exit 1
       end
     end
-
+  elsif !options[:schema]
     puts "No CSV data to validate."
     puts opts
     exit 1
@@ -77,34 +102,50 @@ else
 end
 
 schema = nil
-if options[:
+if options[:schema]
   begin
-
+    schema = Csvlint::Schema.load_from_json(options[:schema])
+  rescue JSON::ParserError => e
+    output_string = "invalid metadata: malformed JSON"
+    if $stdout.tty?
+      puts output_string.colorize(:red)
+    else
+      puts output_string
+    end
+    exit 1
+  rescue Csvlint::Csvw::MetadataError => e
+    output_string = "invalid metadata: #{e.message}#{" at " + e.path if e.path}"
+    if $stdout.tty?
+      puts output_string.colorize(:red)
+    else
+      puts output_string
+    end
+    exit 1
   rescue Errno::ENOENT
-    puts "#{options[:
+    puts "#{options[:schema]} not found"
     exit 1
   end
-  schema = Csvlint::Schema.from_json_table(nil, JSON.parse(schemafile))
-end
-
-validator = Csvlint::Validator.new( source, nil, schema )
-
-if $stdout.tty?
-  puts "#{ARGV[0] || "CSV"} is #{validator.valid? ? "VALID".green : "INVALID".red}"
-else
-  puts "#{ARGV[0] || "CSV"} is #{validator.valid? ? "VALID" : "INVALID"}"
 end
 
-
-
-
+valid = true
+if source.nil?
+  unless schema.instance_of? Csvlint::Csvw::TableGroup
+    puts "No CSV data to validate."
+    puts opts
+    exit 1
   end
-
-
-
-
-
+  schema.tables.keys.each do |source|
+    begin
+      source = source.sub("file:","")
+      source = File.new( source )
+    rescue Errno::ENOENT
+      puts "#{source} not found"
+      exit 1
+    end unless source =~ /^http(s)?/
+    valid &= validate_csv(source, schema, options[:dump])
   end
+else
+  valid = validate_csv(source, schema, options[:dump])
 end
 
-exit 1 unless
+exit 1 unless valid
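The reworked `print_error` builds its message in three steps: index and type first, then the location if there is one, then the error content if present. The sketch below isolates that string-building logic so it can be run without the gem. `FakeError` and `format_error` are hypothetical names introduced here; `FakeError` stands in for `Csvlint::ErrorMessage` and carries only the four fields the diff reads.

```ruby
# Hypothetical stand-in for Csvlint::ErrorMessage with only the fields
# that print_error reads in the diff above.
FakeError = Struct.new(:type, :row, :column, :content)

# Mirrors the string-building part of the new print_error (no colour,
# no dump); returns the line instead of printing it.
def format_error(index, error)
  location = ""
  location += error.row.to_s if error.row
  location += "#{error.row ? "," : ""}#{error.column}" if error.column
  if error.row || error.column
    location = "#{error.row ? "Row" : "Column"}: #{location}"
  end

  output = "#{index + 1}. #{error.type}"
  output += ". #{location}" unless location.empty?
  output += ". #{error.content}" if error.content
  output
end

puts format_error(0, FakeError.new(:blank_rows, 3, nil, nil))
puts format_error(1, FakeError.new(:inconsistent_values, nil, 14, nil))
puts format_error(2, FakeError.new(:title_row, nil, nil, nil))
```

Appending each part conditionally means an error with neither a location nor content is printed as just its index and type, with no trailing separator.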
data/csvlint.gemspec
CHANGED
@@ -23,6 +23,8 @@ Gem::Specification.new do |spec|
   spec.add_dependency "open_uri_redirections"
   spec.add_dependency "activesupport"
   spec.add_dependency "addressable"
+  spec.add_dependency "escape_utils"
+  spec.add_dependency "uri_template"
 
   spec.add_development_dependency "bundler", "~> 1.3"
   spec.add_development_dependency "rake"
data/features/csvw_schema_validation.feature
ADDED
@@ -0,0 +1,127 @@
+Feature: CSVW Schema Validation
+
+  Scenario: Valid CSV
+    Given I have a CSV with the following content:
+    """
+    "Bob","1234","bob@example.org"
+    "Alice","5","alice@example.com"
+    """
+    And it is stored at the url "http://example.com/example1.csv"
+    And I have metadata with the following content:
+    """
+    {
+      "@context": "http://www.w3.org/ns/csvw",
+      "url": "http://example.com/example1.csv",
+      "dialect": { "header": false },
+      "tableSchema": {
+        "columns": [
+          { "name": "Name", "required": true },
+          { "name": "Id", "required": true, "datatype": { "base": "string", "minLength": 1 } },
+          { "name": "Email", "required": true }
+        ]
+      }
+    }
+    """
+    When I ask if there are errors
+    Then there should be 0 error
+
+  Scenario: Schema invalid CSV
+    Given I have a CSV with the following content:
+    """
+    "Bob","1234","bob@example.org"
+    "Alice","5","alice@example.com"
+    """
+    And it is stored at the url "http://example.com/example1.csv"
+    And I have metadata with the following content:
+    """
+    {
+      "@context": "http://www.w3.org/ns/csvw",
+      "url": "http://example.com/example1.csv",
+      "dialect": { "header": false },
+      "tableSchema": {
+        "columns": [
+          { "name": "Name", "required": true },
+          { "name": "Id", "required": true, "datatype": { "base": "string", "minLength": 3 } },
+          { "name": "Email", "required": true }
+        ]
+      }
+    }
+    """
+    When I ask if there are errors
+    Then there should be 1 error
+
+  Scenario: CSV with incorrect header
+    Given I have a CSV with the following content:
+    """
+    "name","id","contact"
+    "Bob","1234","bob@example.org"
+    "Alice","5","alice@example.com"
+    """
+    And it is stored at the url "http://example.com/example1.csv"
+    And I have metadata with the following content:
+    """
+    {
+      "@context": "http://www.w3.org/ns/csvw",
+      "url": "http://example.com/example1.csv",
+      "tableSchema": {
+        "columns": [
+          { "titles": "name", "required": true },
+          { "titles": "id", "required": true, "datatype": { "base": "string", "minLength": 1 } },
+          { "titles": "email", "required": true }
+        ]
+      }
+    }
+    """
+    When I ask if there are errors
+    Then there should be 1 error
+
+  Scenario: Schema with valid regex
+    Given I have a CSV with the following content:
+    """
+    "firstname","id","email"
+    "Bob","1234","bob@example.org"
+    "Alice","5","alice@example.com"
+    """
+    And it is stored at the url "http://example.com/example1.csv"
+    And I have metadata with the following content:
+    """
+    {
+      "@context": "http://www.w3.org/ns/csvw",
+      "url": "http://example.com/example1.csv",
+      "tableSchema": {
+        "columns": [
+          { "titles": "firstname", "required": true, "datatype": { "base": "string", "format": "^[A-Za-z0-9_]*$" } },
+          { "titles": "id", "required": true, "datatype": { "base": "string", "minLength": 1 } },
+          { "titles": "email", "required": true }
+        ]
+      }
+    }
+    """
+    When I ask if there are warnings
+    Then there should be 0 warnings
+
+  Scenario: Schema with invalid regex
+    Given I have a CSV with the following content:
+    """
+    "firstname","id","email"
+    "Bob","1234","bob@example.org"
+    "Alice","5","alice@example.com"
+    """
+    And it is stored at the url "http://example.com/example1.csv"
+    And I have metadata with the following content:
+    """
+    {
+      "@context": "http://www.w3.org/ns/csvw",
+      "url": "http://example.com/example1.csv",
+      "tableSchema": {
+        "columns": [
+          { "titles": "firstname", "required": true, "datatype": { "base": "string", "format": "((" } },
+          { "titles": "id", "required": true, "datatype": { "base": "string", "minLength": 1 } },
+          { "titles": "email", "required": true }
+        ]
+      }
+    }
+    """
+    When I ask if there are warnings
+    Then there should be 1 warnings
+    And that warning should have the type "invalid_regex"
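Two of the scenarios above turn on simple checks: a `minLength` constraint on the `Id` column, and a `format` pattern that may not compile. The sketch below reproduces those two checks in plain Ruby against the scenario data. Both helpers are hypothetical and are not csvlint's implementation; in particular, compiling the pattern with Ruby's `Regexp` is an assumption, since the CSV on the Web specification defines its own regular expression syntax.

```ruby
# Hypothetical helpers for illustration only; not csvlint's own checks.

# True when the pattern compiles as a Ruby regular expression.
def regex_valid?(pattern)
  Regexp.new(pattern)
  true
rescue RegexpError
  false
end

# Returns the values that are shorter than min_length.
def min_length_failures(values, min_length)
  values.select { |value| value.length < min_length }
end

ids = ["1234", "5"] # the Id column from the scenarios

puts min_length_failures(ids, 1).inspect # minLength 1, "Valid CSV" scenario
puts min_length_failures(ids, 3).inspect # minLength 3, "Schema invalid CSV" scenario
puts regex_valid?("^[A-Za-z0-9_]*$")
puts regex_valid?("((")
```

With `minLength` 3 only the value `"5"` fails, which lines up with the single error the "Schema invalid CSV" scenario expects.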