datapackage 0.0.1
- data/LICENSE.md +20 -0
- data/README.md +184 -0
- data/etc/README.md +18 -0
- data/etc/csvddf-dialect-schema.json +24 -0
- data/etc/datapackage-schema.json +208 -0
- data/etc/jsontable-schema.json +34 -0
- data/lib/datapackage/package.rb +160 -0
- data/lib/datapackage/validator.rb +176 -0
- data/lib/datapackage/version.rb +3 -0
- data/lib/datapackage.rb +11 -0
- metadata +152 -0
data/LICENSE.md
ADDED
@@ -0,0 +1,20 @@
+Copyright 2013 The Open Data Institute
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,184 @@
+# DataPackage.rb
+
+A Ruby library for working with [Data Packages](http://dataprotocols.org/data-packages/).
+
+[![Build Status](http://jenkins.theodi.org/job/datapackage.rb-master/badge/icon)](http://jenkins.theodi.org/job/datapackage.rb-master/)
+[![Code Climate](https://codeclimate.com/github/theodi/datapackage.rb.png)](https://codeclimate.com/github/theodi/datapackage.rb)
+[![Dependency Status](https://gemnasium.com/theodi/datapackage.rb.png)](https://gemnasium.com/theodi/datapackage.rb)
+
+The library intends to support:
+
+* Parsing and using data package metadata and data
+* Validating data packages to ensure they conform with the Data Package specification
+
+## Installation
+
+Add the gem to your Gemfile:
+
+    gem 'datapackage.rb', :git => "git://github.com/theodi/datapackage.rb.git"
+
+Note: gem release to come
+
+## Basic Usage
+
+Require the gem, if you need to:
+
+    require 'datapackage.rb'
+
+Parsing a datapackage from a remote location:
+
+    package = DataPackage::Package.new( "http://example.org/datasets/a" )
+
+This assumes that `http://example.org/datasets/a/datapackage.json` exists. Alternatively, load a specific JSON file:
+
+    package = DataPackage::Package.new( "http://example.org/datasets/a/datapackage.json" )
+
+Similarly, you can load a package from a local JSON file, or specify a directory:
+
+    package = DataPackage::Package.new( "/my/data/package" )
+    package = DataPackage::Package.new( "/my/data/package/datapackage.json" )
+
+There is a set of helper methods for accessing data from the package, e.g.:
+
+    package = DataPackage::Package.new( "/my/data/package" )
+    package.name
+    package.title
+    package.licenses
+    package.resources
+
+These currently just return the raw JSON structure, but this might change in future.
+
+## Package Validation
+
+The library supports validating packages. It can be used to validate both the metadata for the package (`datapackage.json`)
+and the integrity of the package itself, e.g. whether the data files exist.
+
+### Validating a Package
+
+Quickly checking the validity of a package can be achieved as follows:
+
+    package.valid?
+
+To expose more detail on errors and warnings:
+
+    messages = package.validate() # or package.validate(:datapackage)
+
+This returns a hash with two keys: `:errors` and `:warnings`. These are arrays of messages.
+
+Warnings might include notes on missing metadata elements (e.g. package `licenses`) which are not required by the DataPackage specification
+but which SHOULD be included.
+
+It is possible to treat all warnings as errors by performing strict validation:
+
+    package.valid?(:datapackage, true)
+
+Warnings are currently generated for:
+
+* Missing `README.md` files from packages
+* Missing `licenses` key from `datapackage.json`
+* Missing `datapackage_version` key from `datapackage.json`
+
+### Selecting a Validation Profile
+
+The library contains two validation classes, one for the core Data Package specification and the other for the Simple Data Format
+rules. By default the library uses the more liberal Data Package rules.
+
+The required profile can be specified in one of two ways. Either as a parameter to the validation methods:
+
+    package.valid?(:datapackage)
+    package.valid?(:simpledataformat)
+    package.validate(:datapackage)
+    package.validate(:simpledataformat)
+
+Or, by using a `DataPackage::Validator` class:
+
+    validator = DataPackage::SimpleDataFormatValidator.new
+    validator.valid?( package )
+    validator.validate( package )
+
+### Approach to Validation
+
+The library will support validating packages against the general rules specified in the
+[DataPackage specification](http://dataprotocols.org/data-packages/) as well as the stricter requirements given in the
+[Simple Data Format specification](http://dataprotocols.org/simple-data-format/) (SDF).
+
+SDF is essentially a profile of DataPackage which includes some additional restrictions on
+how data should be published and described. For example, all data is to be published as CSV files.
+
+The validation in the library is divided into two parts:
+
+* Metadata validation -- checking that the structure of the `datapackage.json` file is correct
+* Integrity checking -- checking that the overall package and data files appear to be in order
+
+#### Metadata Validation
+
+The basic structure of `datapackage.json` files is validated using [JSON Schema](http://json-schema.org/). This provides a simple
+declarative way to describe the expected structure of the package metadata.
+
+The schema files can be found in the `etc` directory of the project and could be used in other applications. The schema files can
+be customised to support local validation checks (see below).
+
+#### Integrity Checking
+
+While the metadata for a package might be correct, there are other ways in which the package could be invalid. For example,
+data files might be missing or incorrectly described.
+
+The metadata validation is therefore supplemented with some custom code that performs some other checks:
+
+* (Both profiles) All resources described in the package must be accessible, e.g. the local file exists or a URL responds successfully to a `HEAD` request
+* (`:simpledataformat`) All resources must be CSV files
+* (`:simpledataformat`) All resources must have a valid JSON Table Schema
+* (`:simpledataformat`) CSV `dialect` descriptions must be valid
+* (`:simpledataformat`) All fields declared in the schema must be present in the CSV file
+* (`:simpledataformat`) All fields present in the CSV file must be present in the schema
+
+### Customising the Validation Code
+
+The library provides several extension points for customising the way that packages are validated.
+
+#### Supplying Custom JSON Schemas
+
+Custom JSON schemas can be provided to allow validation to be tweaked for local conventions. An options hash can be
+provided to the constructor of a `DataPackage::Validator` object; this can be used to map schema names to custom
+schemas.
+
+(Any options passed to the constructor of a `DataPackage::Package` object will also be passed to its validator)
+
+For example, to create a new validation profile called `my-validation-rules` and then apply it:
+
+    opts = {
+      :schema => {
+        :"my-validation-rules" => "/path/to/json/schema.json"
+      }
+    }
+    package = DataPackage::Package.new( url, opts )
+    package.valid?(:"my-validation-rules")
+
+This will cause the code to create a custom `DataPackage::Validator` instance that will apply the supplied schema. This class
+does not provide any integrity checks.
+
+To mix a custom schema with the existing integrity checking, you must manually create a `Validator` instance, e.g.:
+
+    opts = {
+      :schema => {
+        :"my-validation-rules" => "/path/to/json/schema.json"
+      }
+    }
+    validator = DataPackage::SimpleDataFormatValidator.new(:"my-validation-rules", opts)
+    validator.valid?( package )
+
+Custom schemas must be valid JSON files that conform to the JSON Schema v4 specification. The absolute path to the schema file must be
+provided.
+
+Validation is performed using the [json-schema](https://github.com/hoxworth/json-schema) gem which has some documented restrictions.
+
+The built-in schema files can also be overridden in this way, e.g. by specifying an alternate location for the `:datapackage` schema.
+
+#### Custom Integrity Checking
+
+Integrity checking can be customized by creating new sub-classes of `DataPackage::Validator` or one of the existing sub-classes.
+
+The following methods can be implemented:
+
+* `validate_metadata(package, messages)` -- perform additional metadata checking after JSON Schema validation has run.
+* `validate_resource(package, resource, messages)` -- called for each resource in the package
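The two hooks named at the end of the README can be illustrated without the gem installed. What follows is only a sketch under stated assumptions: `SketchValidator` is a hypothetical, cut-down imitation of `DataPackage::Validator` (it skips the JSON Schema step), `NamedResourceValidator` and its two rules are invented for illustration, and the package is a plain Hash rather than a `DataPackage::Package`.

```ruby
# Hypothetical stand-in for the gem's base validator, reduced to the two
# hooks the README names. It does not apply any JSON Schema.
class SketchValidator
  def validate(package)
    messages = { :errors => [], :warnings => [] }
    validate_metadata(package, messages)
    (package["resources"] || []).each do |resource|
      validate_resource(package, resource, messages)
    end
    messages
  end

  # Strict mode treats warnings as failures, as described in the README.
  def valid?(package, strict = false)
    messages = validate(package)
    return messages[:errors].empty? unless strict
    messages[:errors].empty? && messages[:warnings].empty?
  end

  protected

  # Hooks for subclasses; the defaults do nothing.
  def validate_metadata(package, messages); end
  def validate_resource(package, resource, messages); end
end

# Invented local rules: warn when the package has no "title",
# and report an error for any resource without a "name".
class NamedResourceValidator < SketchValidator
  protected

  def validate_metadata(package, messages)
    messages[:warnings] << "The package does not include a 'title' property" unless package["title"]
  end

  def validate_resource(package, resource, messages)
    messages[:errors] << "Resource #{resource["path"]} has no name" unless resource["name"]
  end
end

package = {
  "name" => "example",
  "resources" => [
    { "name" => "gdp", "path" => "gdp.csv" },
    { "path" => "notes.csv" }
  ]
}

validator = NamedResourceValidator.new
messages = validator.validate(package)
```

With the real gem, a subclass would presumably extend `DataPackage::Validator`, or `DataPackage::DataPackageValidator` to keep the resource existence checks, and would receive a `DataPackage::Package` instead of a Hash.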
data/etc/README.md
ADDED
@@ -0,0 +1,18 @@
+This directory contains some JSON Schema documents for validating:
+
+* `datapackage-schema.json` -- [datapackage.json](http://dataprotocols.org/data-packages/) package files
+* `jsontable-schema.json` -- [JSON Table Schemas](http://dataprotocols.org/json-table-schema/) objects
+* `csvddf-dialect-schema.json` -- [CSV Dialect Description Format](http://dataprotocols.org/csv-dialect/) dialect objects
+
+The JSON Table Schemas and CSV Dialect Description Format both define JSON object structures that can appear in `datapackage.json` files (via the `schema` and `dialect` keywords). In the main `datapackage-schema.json` object, these keywords are only validated as simple objects.
+
+In the application the subsidiary schemas are automatically applied to relevant keys. This could be improved by using JSON Schema cross-referencing.
+
+Other potential improvements include:
+
+* Add `data` keyword validation to `datapackage-schema.json`
+* Add `format` keywords for validating email addresses and date/date-times
+* Or, add `pattern` for validating dates
+* Improve regexes used in various places
+
+
data/etc/csvddf-dialect-schema.json
ADDED
@@ -0,0 +1,24 @@
+{
+  "$schema": "http://json-schema.org/draft-04/schema#",
+  "title": "CSVDDF",
+  "description": "JSON Schema for validating CSVDDF dialect structures",
+  "type": "object",
+  "properties": {
+    "delimiter": {
+      "type": "string"
+    },
+    "doublequote": {
+      "type": "boolean"
+    },
+    "lineterminator": {
+      "type": "string"
+    },
+    "quotechar": {
+      "type": "string"
+    },
+    "skipinitialspace": {
+      "type": "boolean"
+    }
+  },
+  "required": [ "delimiter", "doublequote", "lineterminator", "quotechar", "skipinitialspace" ]
+}
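The dialect schema above marks all five keys as required, so a `dialect` object that sets only a delimiter would not pass. A minimal plain-Ruby sketch of that one rule, assuming a hypothetical helper `missing_dialect_keys` (the gem itself delegates this check to the json-schema library):

```ruby
# The five keys listed under "required" in csvddf-dialect-schema.json.
REQUIRED_DIALECT_KEYS = %w[delimiter doublequote lineterminator quotechar skipinitialspace]

# Hypothetical helper: returns the required keys that the dialect hash lacks.
def missing_dialect_keys(dialect)
  REQUIRED_DIALECT_KEYS.reject { |key| dialect.key?(key) }
end

complete = {
  "delimiter" => ",",
  "doublequote" => true,
  "lineterminator" => "\n",
  "quotechar" => "\"",
  "skipinitialspace" => false
}
partial = { "delimiter" => ";" }

missing_from_complete = missing_dialect_keys(complete)
missing_from_partial = missing_dialect_keys(partial)
```

This covers only key presence; the type checks (`string` versus `boolean`) in the schema are not reproduced here.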
data/etc/datapackage-schema.json
ADDED
@@ -0,0 +1,208 @@
+{
+  "$schema": "http://json-schema.org/draft-04/schema#",
+  "title": "DataPackage",
+  "description": "JSON Schema for validating datapackage.json files",
+  "type": "object",
+  "properties": {
+    "name": {
+      "type": "string",
+      "pattern": "^([a-z\\.\\_\\-])+$"
+    },
+    "licences": {
+      "type": "array",
+      "items": {
+        "type": "object",
+        "properties": {
+          "id": { "type": "string" },
+          "url": { "type": "string" }
+        },
+        "anyOf": [
+          { "title": "id required", "required": ["id"] },
+          { "title": "url required", "required": ["url"] }
+        ]
+      }
+    },
+    "datapackage_version": {
+      "type": "string"
+    },
+    "title": {
+      "type": "string"
+    },
+    "description": {
+      "type": "string"
+    },
+    "homepage": {
+      "type": "string"
+    },
+    "version": {
+      "type": "string"
+    },
+    "sources": {
+      "type": "array",
+      "items": {
+        "type": "object",
+        "properties": {
+          "name": { "type": "string" },
+          "web": { "type": "string" },
+          "email": { "type": "string" }
+        },
+        "anyOf": [
+          { "title": "name required", "required": ["name"] },
+          { "title": "web required", "required": ["web"] },
+          { "title": "email required", "required": ["email"] }
+        ]
+      }
+    },
+    "keywords": {
+      "type": "array",
+      "items": {
+        "type": "string"
+      }
+    },
+    "last_modified": {
+      "type": "string"
+    },
+    "image": {
+      "type": "string"
+    },
+    "bugs": {
+      "type": "string"
+    },
+    "maintainers": {
+      "type": "array",
+      "items": {
+        "type": "object",
+        "properties": {
+          "name": {
+            "type": "string"
+          },
+          "email": {
+            "type": "string"
+          },
+          "web": {
+            "type": "string"
+          }
+        },
+        "required": ["name"]
+      }
+    },
+    "contributors": {
+      "type": "array",
+      "items": {
+        "type": "object",
+        "properties": {
+          "name": {
+            "type": "string"
+          },
+          "email": {
+            "type": "string"
+          },
+          "web": {
+            "type": "string"
+          }
+        },
+        "required": ["name"]
+      }
+    },
+    "publisher": {
+      "type": "array",
+      "items": {
+        "type": "object",
+        "properties": {
+          "name": {
+            "type": "string"
+          },
+          "email": {
+            "type": "string"
+          },
+          "web": {
+            "type": "string"
+          }
+        },
+        "required": ["name"]
+      }
+    },
+    "dependencies": {
+      "type": "object"
+    },
+    "resources": {
+      "type": "array",
+      "minItems": 1,
+      "items": {
+        "type": "object",
+        "properties": {
+          "url": {
+            "type": "string"
+          },
+          "path": {
+            "type": "string"
+          },
+          "name": {
+            "type": "string"
+          },
+          "format": {
+            "type": "string"
+          },
+          "mediatype": {
+            "type": "string",
+            "pattern": "^(.+)/(.+)$"
+          },
+          "encoding": {
+            "type": "string"
+          },
+          "bytes": {
+            "type": "integer"
+          },
+          "hash": {
+            "type": "string",
+            "pattern": "^([a-fA-F0-9]{32})$"
+          },
+          "modified": {
+            "type": "string"
+          },
+          "schema": {
+            "type": "object"
+          },
+          "dialect": {
+            "type": "object"
+          },
+          "sources": {
+            "type": "array",
+            "items": {
+              "type": "object",
+              "properties": {
+                "name": { "type": "string" },
+                "web": { "type": "string" },
+                "email": { "type": "string" }
+              },
+              "anyOf": [
+                { "title": "name required", "required": ["name"] },
+                { "title": "web required", "required": ["web"] },
+                { "title": "email required", "required": ["email"] }
+              ]
+            }
+          },
+          "licences": {
+            "type": "array",
+            "items": {
+              "type": "object",
+              "properties": {
+                "id": { "type": "string" },
+                "url": { "type": "string" }
+              },
+              "anyOf": [
+                { "title": "id required", "required": ["id"] },
+                { "title": "url required", "required": ["url"] }
+              ]
+            }
+          }
+        },
+        "anyOf": [
+          { "title": "url required", "required": ["url"] },
+          { "title": "path required", "required": ["path"] }
+        ]
+      }
+    }
+  },
+  "required": ["name", "resources"]
+}
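The `pattern` constraints in the schema above are easy to misread, so here is a small sketch of what the `name` and `hash` patterns accept. The helper names `valid_name?` and `valid_hash?` are hypothetical, and the patterns are re-anchored with `\A` and `\z` because Ruby's `^` and `$` match at line boundaries; this is an illustration of the regular expressions only, not of how the json-schema gem evaluates them.

```ruby
# Patterns transcribed from datapackage-schema.json, anchored to the whole string.
NAME_PATTERN = /\A([a-z._\-])+\z/
HASH_PATTERN = /\A([a-fA-F0-9]{32})\z/

# Hypothetical helpers for illustration.
def valid_name?(name)
  !!(NAME_PATTERN =~ name)
end

def valid_hash?(hash)
  !!(HASH_PATTERN =~ hash)
end

lowercase_ok   = valid_name?("gdp-by.country")   # lowercase letters, ".", "_" and "-" only
uppercase_ok   = valid_name?("GDP")              # uppercase letters are outside the class
digits_ok      = valid_name?("gdp2013")          # digits are outside the class too
md5_length_ok  = valid_hash?("d41d8cd98f00b204e9800998ecf8427e")          # 32 hex characters
sha1_length_ok = valid_hash?("da39a3ee5e6b4b0d3255bfef95601890afd80709")  # 40 hex characters
```

As written, the `name` pattern rejects digits and the `hash` pattern only admits 32-character hexadecimal strings, which is the length of an MD5 digest.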
data/etc/jsontable-schema.json
ADDED
@@ -0,0 +1,34 @@
+{
+  "$schema": "http://json-schema.org/draft-04/schema#",
+  "title": "JSON Table Schema",
+  "description": "JSON Schema for validating JSON Table structures",
+  "type": "object",
+  "properties": {
+    "fields": {
+      "type": "array",
+      "minItems": 1,
+      "items": {
+        "type": "object",
+        "properties": {
+          "name": {
+            "type": "string"
+          },
+          "title": {
+            "type": "string"
+          },
+          "description": {
+            "type": "string"
+          },
+          "type": {
+            "enum": [ "string", "number", "integer", "date", "time", "datetime", "boolean", "binary", "object", "geopoint", "geojson", "array", "any" ]
+          },
+          "format": {
+            "type": "string"
+          }
+        },
+        "required": ["name"]
+      }
+    }
+  },
+  "required": ["fields"]
+}
data/lib/datapackage/package.rb
ADDED
@@ -0,0 +1,160 @@
+require 'open-uri'
+
+module DataPackage
+
+  class Package
+
+    attr_reader :metadata, :opts
+
+    # Parse a data package
+    #
+    # Supports reading data from JSON file, directory, and a URL
+    #
+    # package:: Hash or a String
+    # opts:: Options used to customize reading and parsing
+    def initialize(package, opts={})
+      @opts = opts
+      #TODO base directory/url
+      if package.class == Hash
+        @metadata = package
+      else
+        if !package.start_with?("http") && File.directory?(package)
+          package = File.join(package, opts[:default_filename] || "datapackage.json")
+        end
+        if package.start_with?("http") && !package.end_with?("datapackage.json")
+          package = URI.join(package, "datapackage.json")
+        end
+        @location = package.to_s
+        @metadata = JSON.parse( open(package).read )
+      end
+    end
+
+    #Returns the directory for a local file package or base url for a remote
+    #Returns nil for an in-memory object (because it has no base as yet)
+    def base
+      #user can override base
+      return @opts[:base] if @opts[:base]
+      return "" unless @location
+      #work out base directory or uri
+      if local?
+        return File.dirname( @location )
+      else
+        return @location.split("/")[0..-2].join("/")
+      end
+    end
+
+    #Is this a local package? Returns true if created from an in-memory object or a file/directory reference
+    def local?
+      return !@location.start_with?("http") if @location
+      return true
+    end
+
+    def name
+      @metadata["name"]
+    end
+
+    def title
+      @metadata["title"]
+    end
+
+    def description
+      @metadata["description"]
+    end
+
+    def homepage
+      @metadata["homepage"]
+    end
+
+    def licenses
+      @metadata["licenses"] || []
+    end
+    alias_method :licences, :licenses
+
+    #What version of datapackage specification is this using?
+    def datapackage_version
+      @metadata["datapackage_version"]
+    end
+
+    #What is the version of this specific data package?
+    def version
+      @metadata["version"]
+    end
+
+    def sources
+      @metadata["sources"] || []
+    end
+
+    def keywords
+      @metadata["keywords"] || []
+    end
+
+    def last_modified
+      DateTime.parse @metadata["last_modified"] rescue nil
+    end
+
+    def image
+      @metadata["image"]
+    end
+
+    def maintainers
+      @metadata["maintainers"] || []
+    end
+
+    def contributors
+      @metadata["contributors"] || []
+    end
+
+    def publisher
+      @metadata["publisher"] || []
+    end
+
+    def resources
+      @metadata["resources"] || []
+    end
+
+    def dependencies
+      @metadata["dependencies"]
+    end
+
+    def property(property, default=nil)
+      @metadata[property] || default
+    end
+
+    def valid?(profile=:datapackage, strict=false)
+      validator = DataPackage::Validator.create(profile, @opts)
+      return validator.valid?(self, strict)
+    end
+
+    def validate(profile=:datapackage)
+      validator = DataPackage::Validator.create(profile, @opts)
+      return validator.validate(self)
+    end
+
+    def resolve_resource(resource)
+      return resource["url"] || resolve( resource["path"] )
+    end
+
+    def resolve(path)
+      if local?
+        return File.join( base , path) if base != ""
+        return path
+      else
+        return URI.join(base, path)
+      end
+    end
+
+    def resource_exists?(location)
+      if !location.to_s.start_with?("http")
+        return File.exists?( location )
+      else
+        begin
+          status = RestClient.head( location ).code
+          return status == 200
+        rescue => e
+          return false
+        end
+      end
+    end
+
+  end
+end
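`Package#initialize` and `Package#resolve` above both lean on `URI.join`, which resolves references the way RFC 3986 describes: when the base URL has no trailing slash, its last path segment is replaced rather than extended. The sketch below uses only Ruby's standard `uri` library to show that behaviour with the example URLs from the README; the variable names are illustrative, and the `base` expression is copied from `Package#base`.

```ruby
require 'uri'

# Base without a trailing slash: the final segment "a" is replaced.
without_slash = URI.join("http://example.org/datasets/a", "datapackage.json").to_s

# Base with a trailing slash: the file name is appended under "a/".
with_slash = URI.join("http://example.org/datasets/a/", "datapackage.json").to_s

# The same expression Package#base uses for a remote location.
location = "http://example.org/datasets/a/datapackage.json"
base = location.split("/")[0..-2].join("/")

# Resolving a relative resource path against that base, as Package#resolve does.
resolved = URI.join(base, "data/gdp.csv").to_s
```

Read against the code as shown, this suggests that remote packages resolve most predictably when the package URL ends in a slash or names `datapackage.json` directly, and that a caller could pass an explicit `:base` option ending in `/` if relative resource paths need to stay under the package directory. That is an inference from the standard library's behaviour, not something this diff's tests confirm.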
data/lib/datapackage/validator.rb
ADDED
@@ -0,0 +1,176 @@
+module DataPackage
+
+  #Base class for validators
+  class Validator
+
+    def Validator.create(profile, opts={})
+      if profile == :simpledataformat
+        return SimpleDataFormatValidator.new(profile, opts)
+      end
+      if profile == :datapackage
+        return DataPackageValidator.new(profile, opts)
+      end
+      return Validator.new(profile, opts)
+    end
+
+    def initialize(schema_name, opts={})
+      @schema_name = schema_name
+      @opts = opts
+    end
+
+    def valid?(package, strict=false)
+      messages = validate( package )
+      return messages[:errors].empty? if !strict
+      return messages[:errors].empty? && messages[:warnings].empty?
+    end
+
+    def validate( package )
+      return validate_integrity( package, validate_with_schema(package) )
+    end
+
+    def validate_with_schema(package)
+      schema = load_schema(@schema_name)
+      messages = {
+        :errors => JSON::Validator.fully_validate(schema, package.metadata, :errors_as_objects => true),
+        :warnings => []
+      }
+      validate_metadata(package, messages)
+      return messages
+    end
+
+    def validate_integrity(package, messages={ :errors=>[], :warnings=>[] } )
+      package.resources.each do |resource|
+        validate_resource(package, resource, messages)
+      end
+
+      messages
+    end
+
+    protected
+
+    #implement to perform additional validation on metadata
+    def validate_metadata(package, messages)
+    end
+
+    #implement for per-resource validation
+    def validate_resource(package, resource, messages)
+    end
+
+    protected
+
+    def load_schema(profile)
+      if @opts[:schema] && @opts[:schema][profile]
+        if !File.exists?( @opts[:schema][profile] )
+          raise "User supplied schema file does not exist: #{@opts[:schema][profile]}"
+        end
+        return JSON.parse( File.read( @opts[:schema][profile] ) )
+      end
+      schema_file = file_in_etc_directory( "#{profile}-schema.json" )
+      if !File.exists?( schema_file )
+        raise "Unable to read schema file #{schema_file} for validation profile #{profile}"
+      end
+      return JSON.parse( File.read( schema_file ) )
+    end
+
+    private
+
+    def file_in_etc_directory(filename)
+      File.join( File.dirname(__FILE__), "..", "..", "etc", filename )
+    end
+
+  end
+
+  #Extends base class with some additional checks for DataPackage conformance.
+  #
+  #These include some warnings about missing metadata elements and an existence
+  #check for all resources
+  class DataPackageValidator < Validator
+    def initialize(schema_name=:datapackage, opts={})
+      super(:datapackage, opts)
+    end
+
+    def validate_metadata(package, messages)
+      #not required, but recommended
+      prefix = "The package does not include a"
+      messages[:warnings] << "#{prefix} 'licenses' property" if package.licenses.empty?
+      messages[:warnings] << "#{prefix} 'datapackage_version' property" unless package.datapackage_version
+      messages[:warnings] << "#{prefix} README.md file" unless package.resource_exists?( package.resolve("README.md") )
+    end
+
+    def validate_resource(package, resource, messages)
+      if !package.resource_exists?( package.resolve_resource( resource ) )
+        messages[:errors] << "Resource #{resource["url"] || resource["path"]} does not exist"
+      end
+    end
+
+  end
+
+  #Validator that checks whether a package conforms to the Simple Data Format profile
+  class SimpleDataFormatValidator < DataPackageValidator
+
+    def initialize(schema_name=:datapackage, opts={})
+      super(:datapackage, opts)
+      @jsontable_schema = load_schema(:jsontable)
+      @csvddf_schema = load_schema("csvddf-dialect")
+    end
+
+    def validate_resource(package, resource, messages)
+      super(package, resource, messages)
+
+      if !csv?(resource)
+        messages[:errors] << "#{resource["name"]} is not a CSV file"
+      else
+        if !resource["schema"]
+          messages[:errors] << "#{resource["name"]} does not have a schema"
+        else
+          messages[:errors] +=
+            JSON::Validator.fully_validate(@jsontable_schema,
+              resource["schema"], :errors_as_objects => true)
+        end
+        if resource["dialect"]
+          messages[:errors] +=
+            JSON::Validator.fully_validate(@csvddf_schema,
+              resource["dialect"], :errors_as_objects => true)
+        end
+
+        if resource["schema"] && resource["schema"]["fields"]
+          fields = resource["schema"]["fields"]
+          declared_fields = fields.map{ |f| f["name"] }
+          headers = headers(package, resource)
+
+          #set algebra to finding fields missing from schema and/or CSV file
+          missing_fields = declared_fields - headers
+          if missing_fields != []
+            messages[:errors] <<
+              "Declared schema has fields not present in CSV file (#{missing_fields.join(",")})"
+          end
+          undeclared_fields = headers - declared_fields
+          if undeclared_fields != []
+            messages[:errors] << "CSV file has fields missing from schema (#{undeclared_fields.join(",")})"
+          end
+        end
+
+      end
+
+    end
+
+    def csv?(resource)
+      resource["mediatype"] == "text/csv" ||
+        resource["format"] == "csv"
+    end
+
+    def headers(package, resource)
+      headers = []
+      opts = dialect_to_csv_options(resource["dialect"])
+      CSV.open( package.resolve_resource(resource), "r", opts) do |csv|
+        headers = csv.shift
+      end
+      return headers
+    end
+
+    def dialect_to_csv_options(dialect)
+      return {}
+    end
+  end
+
+end
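The field comparison inside `SimpleDataFormatValidator#validate_resource` is plain Array difference. The sketch below isolates that step so the two directions of the check are easier to see; `field_errors` is a hypothetical helper, the sample field names are invented, and the header row is passed in as an array instead of being read from a CSV file.

```ruby
# Hypothetical helper mirroring the set-difference step of the validator.
# declared: field names from the resource's JSON Table Schema
# headers:  column names from the first row of the CSV file
def field_errors(declared, headers)
  errors = []

  # Fields named in the schema but absent from the CSV header row.
  missing_fields = declared - headers
  unless missing_fields.empty?
    errors << "Declared schema has fields not present in CSV file (#{missing_fields.join(",")})"
  end

  # Columns in the CSV header row that the schema never declares.
  undeclared_fields = headers - declared
  unless undeclared_fields.empty?
    errors << "CSV file has fields missing from schema (#{undeclared_fields.join(",")})"
  end

  errors
end

matching   = field_errors(["country", "year", "gdp"], ["country", "year", "gdp"])
mismatched = field_errors(["country", "year", "gdp"], ["country", "year", "population"])
```

Because `Array#-` ignores order, a CSV file whose columns appear in a different order from the schema's `fields` list would still pass this particular check.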
data/lib/datapackage.rb
ADDED
metadata
ADDED
@@ -0,0 +1,152 @@
+--- !ruby/object:Gem::Specification
+name: datapackage
+version: !ruby/object:Gem::Version
+  version: 0.0.1
+prerelease:
+platform: ruby
+authors:
+- Leigh Dodds
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2013-12-05 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: json
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: json-schema
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: rest-client
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: simplecov-rcov
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: fakeweb
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: '1.3'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: '1.3'
+description:
+email:
+- leigh@ldodds.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- etc/jsontable-schema.json
+- etc/datapackage-schema.json
+- etc/README.md
+- etc/csvddf-dialect-schema.json
+- lib/datapackage.rb
+- lib/datapackage/validator.rb
+- lib/datapackage/package.rb
+- lib/datapackage/version.rb
+- LICENSE.md
+- README.md
+homepage: http://github.com/theodi/datapackage.rb
+licenses: []
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ! '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ! '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 1.8.23
+signing_key:
+specification_version: 3
+summary: Library for working with data packages
+test_files: []
+has_rdoc: