datapackage 0.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/LICENSE.md +20 -0
- data/README.md +184 -0
- data/etc/README.md +18 -0
- data/etc/csvddf-dialect-schema.json +24 -0
- data/etc/datapackage-schema.json +208 -0
- data/etc/jsontable-schema.json +34 -0
- data/lib/datapackage/package.rb +160 -0
- data/lib/datapackage/validator.rb +176 -0
- data/lib/datapackage/version.rb +3 -0
- data/lib/datapackage.rb +11 -0
- metadata +152 -0
data/LICENSE.md
ADDED
@@ -0,0 +1,20 @@
|
|
1
|
+
Copyright 2013 The Open Data Institute
|
2
|
+
|
3
|
+
Permission is hereby granted, free of charge, to any person obtaining
|
4
|
+
a copy of this software and associated documentation files (the
|
5
|
+
"Software"), to deal in the Software without restriction, including
|
6
|
+
without limitation the rights to use, copy, modify, merge, publish,
|
7
|
+
distribute, sublicense, and/or sell copies of the Software, and to
|
8
|
+
permit persons to whom the Software is furnished to do so, subject to
|
9
|
+
the following conditions:
|
10
|
+
|
11
|
+
The above copyright notice and this permission notice shall be
|
12
|
+
included in all copies or substantial portions of the Software.
|
13
|
+
|
14
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
15
|
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
16
|
+
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
17
|
+
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
18
|
+
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
19
|
+
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
20
|
+
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/README.md
ADDED
@@ -0,0 +1,184 @@
|
|
1
|
+
# DataPackage.rb
|
2
|
+
|
3
|
+
A Ruby library for working with [Data Packages](http://dataprotocols.org/data-packages/).
|
4
|
+
|
5
|
+
[](http://jenkins.theodi.org/job/datapackage.rb-master/)
|
6
|
+
[](https://codeclimate.com/github/theodi/datapackage.rb)
|
7
|
+
[](https://gemnasium.com/theodi/datapackage.rb)
|
8
|
+
|
9
|
+
The library is intended to support:
|
10
|
+
|
11
|
+
* Parsing and using data package metadata and data
|
12
|
+
* Validating data packages to ensure they conform with the Data Package specification
|
13
|
+
|
14
|
+
## Installation
|
15
|
+
|
16
|
+
Add the gem to your Gemfile:
|
17
|
+
|
18
|
+
gem 'datapackage.rb', :git => "git://github.com/theodi/datapackage.rb.git"
|
19
|
+
|
20
|
+
Note: gem release to come
|
21
|
+
|
22
|
+
## Basic Usage
|
23
|
+
|
24
|
+
Require the gem, if you need to:
|
25
|
+
|
26
|
+
require 'datapackage.rb'
|
27
|
+
|
28
|
+
Parsing a datapackage from a remote location:
|
29
|
+
|
30
|
+
package = DataPackage::Package.new( "http://example.org/datasets/a/" )
|
31
|
+
|
32
|
+
This assumes that `http://example.org/datasets/a/datapackage.json` exists. Alternatively, load a specific JSON file:
|
33
|
+
|
34
|
+
package = DataPackage::Package.new( "http://example.org/datasets/a/datapackage.json" )
|
35
|
+
|
36
|
+
Similarly, you can load a package from a local directory or a local JSON file:
|
37
|
+
|
38
|
+
package = DataPackage::Package.new( "/my/data/package" )
|
39
|
+
package = DataPackage::Package.new( "/my/data/package/datapackage.json" )
|
40
|
+
|
41
|
+
There is a set of helper methods for accessing data from the package, e.g.:
|
42
|
+
|
43
|
+
package = DataPackage::Package.new( "/my/data/package" )
|
44
|
+
package.name
|
45
|
+
package.title
|
46
|
+
package.licenses
|
47
|
+
package.resources
|
48
|
+
|
49
|
+
These currently just return the raw JSON structure, but this might change in future.
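
To illustrate what "raw JSON structure" means in practice, the sketch below parses a made-up descriptor with Ruby's standard `json` library instead of the gem itself. The package name, license id and resource path are invented for the example. Assuming the accessors simply hand back the parsed values, callers receive arrays of hashes with string keys:

```ruby
require 'json'

# Hypothetical datapackage.json content, invented for illustration.
descriptor = <<~JSON
  {
    "name": "example-package",
    "title": "Example Package",
    "licenses": [ { "id": "odc-pddl" } ],
    "resources": [ { "path": "data.csv", "format": "csv" } ]
  }
JSON

metadata = JSON.parse(descriptor)

# Stand-ins for package.licenses and package.resources: plain parsed
# structures, falling back to an empty array when the key is absent.
licenses  = metadata["licenses"] || []
resources = metadata["resources"] || []

license_ids    = licenses.map { |license| license["id"] }
resource_paths = resources.map { |resource| resource["path"] }

puts license_ids.inspect
puts resource_paths.inspect
```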
|
50
|
+
|
51
|
+
## Package Validation
|
52
|
+
|
53
|
+
The library supports validating packages. It can be used to validate both the metadata for the package (`datapackage.json`)
|
54
|
+
and the integrity of the package itself, e.g. whether the data files exist.
|
55
|
+
|
56
|
+
### Validating a Package
|
57
|
+
|
58
|
+
Quickly checking the validity of a package can be achieved as follows:
|
59
|
+
|
60
|
+
package.valid?
|
61
|
+
|
62
|
+
To expose more detail on errors and warnings:
|
63
|
+
|
64
|
+
messages = package.validate() # or package.validate(:datapackage)
|
65
|
+
|
66
|
+
This returns a hash with two keys: `:errors` and `:warnings`. Each is an array of messages.
|
67
|
+
|
68
|
+
Warnings might include notes on missing metadata elements (e.g. package `licenses`) which are not required by the DataPackage specification
|
69
|
+
but which SHOULD be included.
|
70
|
+
|
71
|
+
It is possible to treat all warnings as errors by performing strict validation:
|
72
|
+
|
73
|
+
package.valid?(:datapackage, true)
|
74
|
+
|
75
|
+
Warnings are currently generated for:
|
76
|
+
|
77
|
+
* Missing `README.md` files from packages
|
78
|
+
* Missing `licenses` key from `datapackage.json`
|
79
|
+
* Missing `datapackage_version` key from `datapackage.json`
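
As a rough sketch of how strict validation differs from the default, the helper below applies the same rule to a messages hash of the shape described above. The helper name `acceptable?` and the sample warning text are illustrative assumptions, not part of the library's API:

```ruby
# Decide whether a validation result counts as valid.
# In strict mode any warning is treated like an error.
def acceptable?(messages, strict = false)
  return messages[:errors].empty? unless strict
  messages[:errors].empty? && messages[:warnings].empty?
end

# Hypothetical result: no errors, one warning.
messages = {
  :errors   => [],
  :warnings => ["The package does not include a 'licenses' property"]
}

lenient_result = acceptable?(messages)        # warnings are tolerated
strict_result  = acceptable?(messages, true)  # warnings cause failure

puts lenient_result
puts strict_result
```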
|
80
|
+
|
81
|
+
### Selecting a Validation Profile
|
82
|
+
|
83
|
+
The library contains two validation classes, one for the core Data Package specification and the other for the Simple Data Format
|
84
|
+
rules. By default the library uses the more liberal Data Package rules.
|
85
|
+
|
86
|
+
The required profile can be specified in one of two ways. Either as a parameter to the validation methods:
|
87
|
+
|
88
|
+
package.valid?(:datapackage)
|
89
|
+
package.valid?(:simpledataformat)
|
90
|
+
package.validate(:datapackage)
|
91
|
+
package.validate(:simpledataformat)
|
92
|
+
|
93
|
+
Or by using a `DataPackage::Validator` subclass:
|
94
|
+
|
95
|
+
validator = DataPackage::SimpleDataFormatValidator.new
|
96
|
+
validator.valid?( package )
|
97
|
+
validator.validate( package )
|
98
|
+
|
99
|
+
### Approach to Validation
|
100
|
+
|
101
|
+
The library will support validating packages against the general rules specified in the
|
102
|
+
[DataPackage specification](http://dataprotocols.org/data-packages/) as well as the stricter requirements given in the
|
103
|
+
[Simple Data Format specification](http://dataprotocols.org/simple-data-format/) (SDF).
|
104
|
+
|
105
|
+
SDF is essentially a profile of DataPackage which includes some additional restrictions on
|
106
|
+
how data should be published and described. For example all data is to be published as CSV files.
|
107
|
+
|
108
|
+
The validation in the library is divided into two parts:
|
109
|
+
|
110
|
+
* Metadata validation -- checking that the structure of the `datapackage.json` file is correct
|
111
|
+
* Integrity checking -- checking that the overall package and data files appear to be in order
|
112
|
+
|
113
|
+
#### Metadata Validation
|
114
|
+
|
115
|
+
The basic structure of `datapackage.json` files is validated using [JSON Schema](http://json-schema.org/). This provides a simple
|
116
|
+
declarative way to describe the expected structure of the package metadata.
|
117
|
+
|
118
|
+
The schema files can be found in the `etc` directory of the project and could be used in other applications. The schema files can
|
119
|
+
be customised to support local validation checks (see below).
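
To give a feel for what a declarative schema check involves, the sketch below hand-rolls just two JSON Schema keywords, `required` and `pattern`, in plain Ruby. It is not a JSON Schema implementation and is not how the library validates. The schema fragment is taken from the bundled `etc/datapackage-schema.json`; the helper name `check_metadata` and the sample metadata are assumptions made for illustration:

```ruby
# Fragment of the bundled schema: only the keywords this sketch understands.
schema = {
  "properties" => {
    "name" => { "type" => "string", "pattern" => "^([a-z\\.\\_\\-])+$" }
  },
  "required" => ["name", "resources"]
}

# Collect error messages for missing required keys and pattern mismatches.
def check_metadata(metadata, schema)
  errors = []
  schema["required"].each do |key|
    errors << "missing required property '#{key}'" unless metadata.key?(key)
  end
  schema["properties"].each do |key, rules|
    next unless rules["pattern"] && metadata[key].is_a?(String)
    unless Regexp.new(rules["pattern"]).match?(metadata[key])
      errors << "property '#{key}' does not match #{rules["pattern"]}"
    end
  end
  errors
end

good = check_metadata({ "name" => "my-package", "resources" => [{ "path" => "data.csv" }] }, schema)
bad  = check_metadata({ "name" => "My Package" }, schema)

puts good.inspect
puts bad.inspect
```

Note that the `name` pattern only admits lowercase letters, dots, underscores and hyphens, so a name containing digits would also be rejected.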
|
120
|
+
|
121
|
+
#### Integrity Checking
|
122
|
+
|
123
|
+
While the metadata for a package might be correct, there are other ways in which the package could be invalid. For example,
|
124
|
+
data files might be missing or incorrectly described.
|
125
|
+
|
126
|
+
The metadata validation is therefore supplemented with custom code that performs additional checks:
|
127
|
+
|
128
|
+
* (Both profiles) All resources described in the package must be accessible, e.g. the local file exists or a URL responds successfully to a `HEAD`
|
129
|
+
* (`:simpledataformat`) All resources must be CSV files
|
130
|
+
* (`:simpledataformat`) All resources must have a valid JSON Table Schema
|
131
|
+
* (`:simpledataformat`) CSV `dialect` descriptions must be valid
|
132
|
+
* (`:simpledataformat`) All fields declared in the schema must be present in the CSV file
|
133
|
+
* (`:simpledataformat`) All fields present in the CSV file must be present in the schema
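
The last two checks amount to comparing two lists of names in both directions. A minimal sketch using Ruby array difference, with invented field names and header values standing in for a real schema and CSV file:

```ruby
# Field names declared in a hypothetical JSON Table Schema.
declared_fields = ["id", "name", "amount"]

# Header row of a hypothetical CSV file.
headers = ["id", "name", "date"]

errors = []

# Declared in the schema but absent from the CSV header row.
missing_fields = declared_fields - headers
unless missing_fields.empty?
  errors << "Declared schema has fields not present in CSV file (#{missing_fields.join(",")})"
end

# Present in the CSV header row but not declared in the schema.
undeclared_fields = headers - declared_fields
unless undeclared_fields.empty?
  errors << "CSV file has fields missing from schema (#{undeclared_fields.join(",")})"
end

puts errors
```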
|
134
|
+
|
135
|
+
### Customising the Validation Code
|
136
|
+
|
137
|
+
The library provides several extension points for customising the way that packages are validated.
|
138
|
+
|
139
|
+
#### Supplying Custom JSON Schemas
|
140
|
+
|
141
|
+
Custom JSON schemas can be provided to allow validation to be tweaked for local conventions. An options hash can be
|
142
|
+
provided to the constructor of a `DataPackage::Validator` object; this can be used to map schema names to custom
|
143
|
+
schemas.
|
144
|
+
|
145
|
+
(Any options passed to the constructor of a `DataPackage::Package` object will also be passed to its validator)
|
146
|
+
|
147
|
+
For example, to create a new validation profile called `my-validation-rules` and then apply it:
|
148
|
+
|
149
|
+
opts = {
|
150
|
+
:schema => {
|
151
|
+
:"my-validation-rules" => "/path/to/json/schema.json"
|
152
|
+
}
|
153
|
+
}
|
154
|
+
package = DataPackage::Package.new( url, opts )
|
155
|
+
package.valid?(:"my-validation-rules")
|
156
|
+
|
157
|
+
This will cause the code to create a custom `DataPackage::Validator` instance that will apply the supplied schema. This class
|
158
|
+
does not provide any integrity checks.
|
159
|
+
|
160
|
+
To mix a custom schema with the existing integrity checking, you must manually create a `Validator` instance. For example:
|
161
|
+
|
162
|
+
opts = {
|
163
|
+
:schema => {
|
164
|
+
:"my-validation-rules" => "/path/to/json/schema.json"
|
165
|
+
}
|
166
|
+
}
|
167
|
+
validator = DataPackage::SimpleDataFormatValidator.new(:"my-validation-rules", opts)
|
168
|
+
validator.valid?( package )
|
169
|
+
|
170
|
+
Custom schemas must be valid JSON files that conform to the JSON Schema v4 specification. The absolute path to the schema file must be
|
171
|
+
provided.
|
172
|
+
|
173
|
+
Validation is performed using the [json-schema](https://github.com/hoxworth/json-schema) gem which has some documented restrictions.
|
174
|
+
|
175
|
+
The built-in schema files can also be overridden in this way, e.g. by specifying an alternate location for the `:datapackage` schema.
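
The lookup order implied here, a user-supplied path first and the bundled file otherwise, can be sketched as follows. The helper name `schema_path_for`, the temporary directory standing in for the gem's `etc` directory, and the file name `custom.json` are assumptions made for illustration:

```ruby
require 'json'
require 'tmpdir'

# Return the schema file to use for a profile: a path supplied through
# opts[:schema] wins, otherwise fall back to "<profile>-schema.json".
def schema_path_for(profile, opts, etc_dir)
  custom = opts[:schema] && opts[:schema][profile]
  if custom
    raise "User supplied schema file does not exist: #{custom}" unless File.exist?(custom)
    return custom
  end
  File.join(etc_dir, "#{profile}-schema.json")
end

custom_path  = nil
bundled_path = nil

Dir.mktmpdir do |dir|
  # Write a trivial schema so the existence check passes.
  custom_file = File.join(dir, "custom.json")
  File.write(custom_file, JSON.generate({ "type" => "object" }))

  opts = { :schema => { :"my-validation-rules" => custom_file } }

  custom_path  = schema_path_for(:"my-validation-rules", opts, dir)
  bundled_path = schema_path_for(:datapackage, opts, dir)
end

puts File.basename(custom_path)
puts File.basename(bundled_path)
```

A profile name containing hyphens has to be written as a quoted symbol (`:"my-validation-rules"`), since a bare `:my-validation-rules` is not a single Ruby symbol.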
|
176
|
+
|
177
|
+
#### Custom Integrity Checking
|
178
|
+
|
179
|
+
Integrity checking can be customized by creating new sub-classes of `DataPackage::Validator` or one of the existing sub-classes.
|
180
|
+
|
181
|
+
The following methods can be implemented:
|
182
|
+
|
183
|
+
* `validate_metadata(package, messages)` -- performs additional metadata checks after JSON Schema validation has run
|
184
|
+
* `validate_resource(package, resource, messages)` -- called for each resource in the package
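
The two hooks can be sketched with a small stand-in base class, so the example runs without the gem. `SketchValidator` is a simplified stand-in for `DataPackage::Validator` that takes a plain metadata hash rather than a package object, and the subclass name and its rules are invented for illustration:

```ruby
# Simplified stand-in for the library's base validator: it builds the
# messages hash and calls the two overridable hooks.
class SketchValidator
  def validate(metadata)
    messages = { :errors => [], :warnings => [] }
    validate_metadata(metadata, messages)
    (metadata["resources"] || []).each do |resource|
      validate_resource(metadata, resource, messages)
    end
    messages
  end

  protected

  # Hooks are no-ops by default; subclasses override what they need.
  def validate_metadata(metadata, messages); end

  def validate_resource(metadata, resource, messages); end
end

# Hypothetical local rules: recommend a title, require resource names.
class NamedResourceValidator < SketchValidator
  protected

  def validate_metadata(metadata, messages)
    messages[:warnings] << "The package has no 'title'" unless metadata["title"]
  end

  def validate_resource(metadata, resource, messages)
    messages[:errors] << "Resource #{resource["path"]} has no name" unless resource["name"]
  end
end

result = NamedResourceValidator.new.validate(
  "name" => "example",
  "resources" => [{ "path" => "a.csv" }, { "path" => "b.csv", "name" => "b" }]
)

puts result.inspect
```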
|
data/etc/README.md
ADDED
@@ -0,0 +1,18 @@
|
|
1
|
+
This directory contains some JSON Schema documents for validating:
|
2
|
+
|
3
|
+
* `datapackage-schema.json` -- [datapackage.json](http://dataprotocols.org/data-packages/) package files
|
4
|
+
* `jsontable-schema.json` -- [JSON Table Schemas](http://dataprotocols.org/json-table-schema/) objects
|
5
|
+
* `csvddf-dialect-schema.json` -- [CSV Dialect Description Format](http://dataprotocols.org/csv-dialect/) dialect objects
|
6
|
+
|
7
|
+
The JSON Table Schemas and CSV Dialect Description Format both define JSON object structures that can appear in `datapackage.json` files (via the `schema` and `dialect` keywords). In the main `datapackage-schema.json` object, these keywords are only validated as simple objects.
|
8
|
+
|
9
|
+
In the application the subsidiary schemas are automatically applied to relevant keys. This could be improved by using JSON Schema cross-referencing.
|
10
|
+
|
11
|
+
Other potential improvements include:
|
12
|
+
|
13
|
+
* Add `data` keyword validation to `datapackage-schema.json`
|
14
|
+
* Add `format` keywords for validating email addresses and date/date-times
|
15
|
+
* Or, add `pattern` for validating dates
|
16
|
+
* Improve regexes used in various places
|
17
|
+
|
18
|
+
|
data/etc/csvddf-dialect-schema.json
ADDED
@@ -0,0 +1,24 @@
|
|
1
|
+
{
|
2
|
+
"$schema": "http://json-schema.org/draft-04/schema#",
|
3
|
+
"title": "CSVDDF",
|
4
|
+
"description": "JSON Schema for validating CSVDDF dialect structures",
|
5
|
+
"type": "object",
|
6
|
+
"properties": {
|
7
|
+
"delimiter": {
|
8
|
+
"type": "string"
|
9
|
+
},
|
10
|
+
"doublequote": {
|
11
|
+
"type": "boolean"
|
12
|
+
},
|
13
|
+
"lineterminator": {
|
14
|
+
"type": "string"
|
15
|
+
},
|
16
|
+
"quotechar": {
|
17
|
+
"type": "string"
|
18
|
+
},
|
19
|
+
"skipinitialspace": {
|
20
|
+
"type": "boolean"
|
21
|
+
}
|
22
|
+
},
|
23
|
+
"required": [ "delimiter", "doublequote", "lineterminator", "quotechar", "skipinitialspace" ]
|
24
|
+
}
|
data/etc/datapackage-schema.json
ADDED
@@ -0,0 +1,208 @@
|
|
1
|
+
{
|
2
|
+
"$schema": "http://json-schema.org/draft-04/schema#",
|
3
|
+
"title": "DataPackage",
|
4
|
+
"description": "JSON Schema for validating datapackage.json files",
|
5
|
+
"type": "object",
|
6
|
+
"properties": {
|
7
|
+
"name": {
|
8
|
+
"type": "string",
|
9
|
+
"pattern": "^([a-z\\.\\_\\-])+$"
|
10
|
+
},
|
11
|
+
"licences": {
|
12
|
+
"type": "array",
|
13
|
+
"items": {
|
14
|
+
"type": "object",
|
15
|
+
"properties": {
|
16
|
+
"id": { "type": "string" },
|
17
|
+
"url": { "type": "string" }
|
18
|
+
},
|
19
|
+
"anyOf": [
|
20
|
+
{ "title": "id required", "required": ["id"] },
|
21
|
+
{ "title": "url required", "required": ["url"] }
|
22
|
+
]
|
23
|
+
}
|
24
|
+
},
|
25
|
+
"datapackage_version": {
|
26
|
+
"type": "string"
|
27
|
+
},
|
28
|
+
"title": {
|
29
|
+
"type": "string"
|
30
|
+
},
|
31
|
+
"description": {
|
32
|
+
"type": "string"
|
33
|
+
},
|
34
|
+
"homepage": {
|
35
|
+
"type": "string"
|
36
|
+
},
|
37
|
+
"version": {
|
38
|
+
"type": "string"
|
39
|
+
},
|
40
|
+
"sources": {
|
41
|
+
"type": "array",
|
42
|
+
"items": {
|
43
|
+
"type": "object",
|
44
|
+
"properties": {
|
45
|
+
"name": { "type": "string" },
|
46
|
+
"web": { "type": "string" },
|
47
|
+
"email": { "type": "string" }
|
48
|
+
},
|
49
|
+
"anyOf": [
|
50
|
+
{ "title": "name required", "required": ["name"] },
|
51
|
+
{ "title": "web required", "required": ["web"] },
|
52
|
+
{ "title": "email required", "required": ["email"] }
|
53
|
+
]
|
54
|
+
}
|
55
|
+
},
|
56
|
+
"keywords": {
|
57
|
+
"type": "array",
|
58
|
+
"items": {
|
59
|
+
"type": "string"
|
60
|
+
}
|
61
|
+
},
|
62
|
+
"last_modified": {
|
63
|
+
"type": "string"
|
64
|
+
},
|
65
|
+
"image": {
|
66
|
+
"type": "string"
|
67
|
+
},
|
68
|
+
"bugs": {
|
69
|
+
"type": "string"
|
70
|
+
},
|
71
|
+
"maintainers": {
|
72
|
+
"type": "array",
|
73
|
+
"items": {
|
74
|
+
"type": "object",
|
75
|
+
"properties": {
|
76
|
+
"name": {
|
77
|
+
"type": "string"
|
78
|
+
},
|
79
|
+
"email": {
|
80
|
+
"type": "string"
|
81
|
+
},
|
82
|
+
"web": {
|
83
|
+
"type": "string"
|
84
|
+
}
|
85
|
+
},
|
86
|
+
"required": ["name"]
|
87
|
+
}
|
88
|
+
},
|
89
|
+
"contributors": {
|
90
|
+
"type": "array",
|
91
|
+
"items": {
|
92
|
+
"type": "object",
|
93
|
+
"properties": {
|
94
|
+
"name": {
|
95
|
+
"type": "string"
|
96
|
+
},
|
97
|
+
"email": {
|
98
|
+
"type": "string"
|
99
|
+
},
|
100
|
+
"web": {
|
101
|
+
"type": "string"
|
102
|
+
}
|
103
|
+
},
|
104
|
+
"required": ["name"]
|
105
|
+
}
|
106
|
+
},
|
107
|
+
"publisher": {
|
108
|
+
"type": "array",
|
109
|
+
"items": {
|
110
|
+
"type": "object",
|
111
|
+
"properties": {
|
112
|
+
"name": {
|
113
|
+
"type": "string"
|
114
|
+
},
|
115
|
+
"email": {
|
116
|
+
"type": "string"
|
117
|
+
},
|
118
|
+
"web": {
|
119
|
+
"type": "string"
|
120
|
+
}
|
121
|
+
},
|
122
|
+
"required": ["name"]
|
123
|
+
}
|
124
|
+
},
|
125
|
+
"dependencies": {
|
126
|
+
"type": "object"
|
127
|
+
},
|
128
|
+
"resources": {
|
129
|
+
"type": "array",
|
130
|
+
"minItems": 1,
|
131
|
+
"items": {
|
132
|
+
"type": "object",
|
133
|
+
"properties": {
|
134
|
+
"url": {
|
135
|
+
"type": "string"
|
136
|
+
},
|
137
|
+
"path": {
|
138
|
+
"type": "string"
|
139
|
+
},
|
140
|
+
"name": {
|
141
|
+
"type": "string"
|
142
|
+
},
|
143
|
+
"format": {
|
144
|
+
"type": "string"
|
145
|
+
},
|
146
|
+
"mediatype": {
|
147
|
+
"type": "string",
|
148
|
+
"pattern": "^(.+)/(.+)$"
|
149
|
+
},
|
150
|
+
"encoding": {
|
151
|
+
"type": "string"
|
152
|
+
},
|
153
|
+
"bytes": {
|
154
|
+
"type": "integer"
|
155
|
+
},
|
156
|
+
"hash": {
|
157
|
+
"type": "string",
|
158
|
+
"pattern": "^([a-fA-F0-9]{32})$"
|
159
|
+
},
|
160
|
+
"modified": {
|
161
|
+
"type": "string"
|
162
|
+
},
|
163
|
+
"schema": {
|
164
|
+
"type": "object"
|
165
|
+
},
|
166
|
+
"dialect": {
|
167
|
+
"type": "object"
|
168
|
+
},
|
169
|
+
"sources": {
|
170
|
+
"type": "array",
|
171
|
+
"items": {
|
172
|
+
"type": "object",
|
173
|
+
"properties": {
|
174
|
+
"name": { "type": "string" },
|
175
|
+
"web": { "type": "string" },
|
176
|
+
"email": { "type": "string" }
|
177
|
+
},
|
178
|
+
"anyOf": [
|
179
|
+
{ "title": "name required", "required": ["name"] },
|
180
|
+
{ "title": "web required", "required": ["web"] },
|
181
|
+
{ "title": "email required", "required": ["email"] }
|
182
|
+
]
|
183
|
+
}
|
184
|
+
},
|
185
|
+
"licences": {
|
186
|
+
"type": "array",
|
187
|
+
"items": {
|
188
|
+
"type": "object",
|
189
|
+
"properties": {
|
190
|
+
"id": { "type": "string" },
|
191
|
+
"url": { "type": "string" }
|
192
|
+
},
|
193
|
+
"anyOf": [
|
194
|
+
{ "title": "id required", "required": ["id"] },
|
195
|
+
{ "title": "url required", "required": ["url"] }
|
196
|
+
]
|
197
|
+
}
|
198
|
+
}
|
199
|
+
},
|
200
|
+
"anyOf": [
|
201
|
+
{ "title": "url required", "required": ["url"] },
|
202
|
+
{ "title": "path required", "required": ["path"] }
|
203
|
+
]
|
204
|
+
}
|
205
|
+
}
|
206
|
+
},
|
207
|
+
"required": ["name", "resources"]
|
208
|
+
}
|
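
The `hash` pattern in this schema accepts exactly 32 hexadecimal characters, which is the length of an MD5 hex digest. The schema does not name an algorithm, so MD5 is an assumption here. The sketch below uses Ruby's standard `digest` library and an invented file body to show which digests the pattern would accept:

```ruby
require 'digest/md5'
require 'digest/sha1'

# Pattern copied from the "hash" property of the resource definition.
hash_pattern = /^([a-fA-F0-9]{32})$/

# Invented CSV content standing in for a resource file.
content = "id,name\n1,example\n"

md5_hex  = Digest::MD5.hexdigest(content)   # 32 hex characters
sha1_hex = Digest::SHA1.hexdigest(content)  # 40 hex characters

md5_accepted  = hash_pattern.match?(md5_hex)
sha1_accepted = hash_pattern.match?(sha1_hex)

puts md5_hex.length
puts sha1_hex.length
puts md5_accepted
puts sha1_accepted
```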
data/etc/jsontable-schema.json
ADDED
@@ -0,0 +1,34 @@
|
|
1
|
+
{
|
2
|
+
"$schema": "http://json-schema.org/draft-04/schema#",
|
3
|
+
"title": "JSON Table Schema",
|
4
|
+
"description": "JSON Schema for validating JSON Table structures",
|
5
|
+
"type": "object",
|
6
|
+
"properties": {
|
7
|
+
"fields": {
|
8
|
+
"type": "array",
|
9
|
+
"minItems": 1,
|
10
|
+
"items": {
|
11
|
+
"type": "object",
|
12
|
+
"properties": {
|
13
|
+
"name": {
|
14
|
+
"type": "string"
|
15
|
+
},
|
16
|
+
"title": {
|
17
|
+
"type": "string"
|
18
|
+
},
|
19
|
+
"description": {
|
20
|
+
"type": "string"
|
21
|
+
},
|
22
|
+
"type": {
|
23
|
+
"enum": [ "string", "number", "integer", "date", "time", "datetime", "boolean", "binary", "object", "geopoint", "geojson", "array", "any" ]
|
24
|
+
},
|
25
|
+
"format": {
|
26
|
+
"type": "string"
|
27
|
+
}
|
28
|
+
},
|
29
|
+
"required": ["name"]
|
30
|
+
}
|
31
|
+
}
|
32
|
+
},
|
33
|
+
"required": ["fields"]
|
34
|
+
}
|
data/lib/datapackage/package.rb
ADDED
@@ -0,0 +1,160 @@
|
|
1
|
+
require 'open-uri'
|
2
|
+
|
3
|
+
module DataPackage
|
4
|
+
|
5
|
+
class Package
|
6
|
+
|
7
|
+
attr_reader :metadata, :opts
|
8
|
+
|
9
|
+
# Parse a data package
|
10
|
+
#
|
11
|
+
# Supports reading data from JSON file, directory, and a URL
|
12
|
+
#
|
13
|
+
# package:: Hash or a String
|
14
|
+
# opts:: Options used to customize reading and parsing
|
15
|
+
def initialize(package, opts={})
|
16
|
+
@opts = opts
|
17
|
+
#TODO base directory/url
|
18
|
+
if package.class == Hash
|
19
|
+
@metadata = package
|
20
|
+
else
|
21
|
+
if !package.start_with?("http") && File.directory?(package)
|
22
|
+
package = File.join(package, opts[:default_filename] || "datapackage.json")
|
23
|
+
end
|
24
|
+
if package.start_with?("http") && !package.end_with?("datapackage.json")
|
25
|
+
package = URI.join(package, "datapackage.json")
|
26
|
+
end
|
27
|
+
@location = package.to_s
|
28
|
+
@metadata = JSON.parse( open(package).read )
|
29
|
+
end
|
30
|
+
end
|
31
|
+
|
32
|
+
#Returns the directory for a local file package or base url for a remote
|
33
|
+
#Returns an empty string for an in-memory object (because it has no base as yet)
|
34
|
+
def base
|
35
|
+
#user can override base
|
36
|
+
return @opts[:base] if @opts[:base]
|
37
|
+
return "" unless @location
|
38
|
+
#work out base directory or uri
|
39
|
+
if local?
|
40
|
+
return File.dirname( @location )
|
41
|
+
else
|
42
|
+
return @location.split("/")[0..-2].join("/")
|
43
|
+
end
|
44
|
+
end
|
45
|
+
|
46
|
+
#Is this a local package? Returns true if created from an in-memory object or a file/directory reference
|
47
|
+
def local?
|
48
|
+
return !@location.start_with?("http") if @location
|
49
|
+
return true
|
50
|
+
end
|
51
|
+
|
52
|
+
def name
|
53
|
+
@metadata["name"]
|
54
|
+
end
|
55
|
+
|
56
|
+
def title
|
57
|
+
@metadata["title"]
|
58
|
+
end
|
59
|
+
|
60
|
+
def description
|
61
|
+
@metadata["description"]
|
62
|
+
end
|
63
|
+
|
64
|
+
def homepage
|
65
|
+
@metadata["homepage"]
|
66
|
+
end
|
67
|
+
|
68
|
+
def licenses
|
69
|
+
@metadata["licenses"] || []
|
70
|
+
end
|
71
|
+
alias_method :licences, :licenses
|
72
|
+
|
73
|
+
#What version of datapackage specification is this using?
|
74
|
+
def datapackage_version
|
75
|
+
@metadata["datapackage_version"]
|
76
|
+
end
|
77
|
+
|
78
|
+
#What is the version of this specific data package?
|
79
|
+
def version
|
80
|
+
@metadata["version"]
|
81
|
+
end
|
82
|
+
|
83
|
+
def sources
|
84
|
+
@metadata["sources"] || []
|
85
|
+
end
|
86
|
+
|
87
|
+
def keywords
|
88
|
+
@metadata["keywords"] || []
|
89
|
+
end
|
90
|
+
|
91
|
+
def last_modified
|
92
|
+
DateTime.parse @metadata["last_modified"] rescue nil
|
93
|
+
end
|
94
|
+
|
95
|
+
def image
|
96
|
+
@metadata["image"]
|
97
|
+
end
|
98
|
+
|
99
|
+
def maintainers
|
100
|
+
@metadata["maintainers"] || []
|
101
|
+
end
|
102
|
+
|
103
|
+
def contributors
|
104
|
+
@metadata["contributors"] || []
|
105
|
+
end
|
106
|
+
|
107
|
+
def publisher
|
108
|
+
@metadata["publisher"] || []
|
109
|
+
end
|
110
|
+
|
111
|
+
def resources
|
112
|
+
@metadata["resources"] || []
|
113
|
+
end
|
114
|
+
|
115
|
+
def dependencies
|
116
|
+
@metadata["dependencies"]
|
117
|
+
end
|
118
|
+
|
119
|
+
def property(property, default=nil)
|
120
|
+
@metadata[property] || default
|
121
|
+
end
|
122
|
+
|
123
|
+
def valid?(profile=:datapackage, strict=false)
|
124
|
+
validator = DataPackage::Validator.create(profile, @opts)
|
125
|
+
return validator.valid?(self, strict)
|
126
|
+
end
|
127
|
+
|
128
|
+
def validate(profile=:datapackage)
|
129
|
+
validator = DataPackage::Validator.create(profile, @opts)
|
130
|
+
return validator.validate(self)
|
131
|
+
end
|
132
|
+
|
133
|
+
def resolve_resource(resource)
|
134
|
+
return resource["url"] || resolve( resource["path"] )
|
135
|
+
end
|
136
|
+
|
137
|
+
def resolve(path)
|
138
|
+
if local?
|
139
|
+
return File.join( base , path) if base != ""
|
140
|
+
return path
|
141
|
+
else
|
142
|
+
return URI.join(base, path)
|
143
|
+
end
|
144
|
+
end
|
145
|
+
|
146
|
+
def resource_exists?(location)
|
147
|
+
if !location.to_s.start_with?("http")
|
148
|
+
return File.exist?( location )
|
149
|
+
else
|
150
|
+
begin
|
151
|
+
status = RestClient.head( location ).code
|
152
|
+
return status == 200
|
153
|
+
rescue => e
|
154
|
+
return false
|
155
|
+
end
|
156
|
+
end
|
157
|
+
end
|
158
|
+
|
159
|
+
end
|
160
|
+
end
|
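
One behaviour of the URL handling in this file can be checked with Ruby's standard `uri` library alone. `URI.join` follows RFC 3986 reference resolution, so a base without a trailing slash has its last path segment replaced rather than extended. The sketch below reuses the example URLs from the README; the `data.csv` resource path is an assumption made for illustration:

```ruby
require 'uri'

# Base URL without a trailing slash: the final segment "a" is replaced.
without_slash = URI.join("http://example.org/datasets/a", "datapackage.json").to_s

# Base URL with a trailing slash: the file name is appended as expected.
with_slash = URI.join("http://example.org/datasets/a/", "datapackage.json").to_s

# Mirrors how a remote base is derived: drop the last path segment,
# which leaves a base with no trailing slash.
location = "http://example.org/datasets/a/datapackage.json"
base = location.split("/")[0..-2].join("/")

# Resolving a relative resource path (hypothetical "data.csv") against it,
# first as-is and then with a slash appended.
resolved_plain = URI.join(base, "data.csv").to_s
resolved_slash = URI.join(base + "/", "data.csv").to_s

puts without_slash
puts with_slash
puts resolved_plain
puts resolved_slash
```

If this reading is right, passing remote package locations with a trailing slash, and appending one to the computed base before resolving relative paths, would keep resolved URLs inside the package directory.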
data/lib/datapackage/validator.rb
ADDED
@@ -0,0 +1,176 @@
|
|
1
|
+
module DataPackage
|
2
|
+
|
3
|
+
#Base class for validators
|
4
|
+
class Validator
|
5
|
+
|
6
|
+
def Validator.create(profile, opts={})
|
7
|
+
if profile == :simpledataformat
|
8
|
+
return SimpleDataFormatValidator.new(profile, opts)
|
9
|
+
end
|
10
|
+
if profile == :datapackage
|
11
|
+
return DataPackageValidator.new(profile, opts)
|
12
|
+
end
|
13
|
+
return Validator.new(profile, opts)
|
14
|
+
end
|
15
|
+
|
16
|
+
def initialize(schema_name, opts={})
|
17
|
+
@schema_name = schema_name
|
18
|
+
@opts = opts
|
19
|
+
end
|
20
|
+
|
21
|
+
def valid?(package, strict=false)
|
22
|
+
messages = validate( package )
|
23
|
+
return messages[:errors].empty? if !strict
|
24
|
+
return messages[:errors].empty? && messages[:warnings].empty?
|
25
|
+
end
|
26
|
+
|
27
|
+
def validate( package )
|
28
|
+
return validate_integrity( package, validate_with_schema(package) )
|
29
|
+
end
|
30
|
+
|
31
|
+
def validate_with_schema(package)
|
32
|
+
schema = load_schema(@schema_name)
|
33
|
+
messages = {
|
34
|
+
:errors => JSON::Validator.fully_validate(schema, package.metadata, :errors_as_objects => true),
|
35
|
+
:warnings => []
|
36
|
+
}
|
37
|
+
validate_metadata(package, messages)
|
38
|
+
return messages
|
39
|
+
end
|
40
|
+
|
41
|
+
def validate_integrity(package, messages={ :errors=>[], :warnings=>[] } )
|
42
|
+
package.resources.each do |resource|
|
43
|
+
validate_resource(package, resource, messages)
|
44
|
+
end
|
45
|
+
|
46
|
+
messages
|
47
|
+
end
|
48
|
+
|
49
|
+
protected
|
50
|
+
|
51
|
+
#implement to perform additional validation on metadata
|
52
|
+
def validate_metadata(package, messages)
|
53
|
+
end
|
54
|
+
|
55
|
+
#implement for per-resource validation
|
56
|
+
def validate_resource(package, resource, messages)
|
57
|
+
end
|
58
|
+
|
59
|
+
protected
|
60
|
+
|
61
|
+
def load_schema(profile)
|
62
|
+
if @opts[:schema] && @opts[:schema][profile]
|
63
|
+
if !File.exist?( @opts[:schema][profile] )
|
64
|
+
raise "User supplied schema file does not exist: #{@opts[:schema][profile]}"
|
65
|
+
end
|
66
|
+
return JSON.parse( File.read( @opts[:schema][profile] ) )
|
67
|
+
end
|
68
|
+
schema_file = file_in_etc_directory( "#{profile}-schema.json" )
|
69
|
+
if !File.exist?( schema_file )
|
70
|
+
raise "Unable to read schema file #{schema_file} for validation profile #{profile}"
|
71
|
+
end
|
72
|
+
return JSON.parse( File.read( schema_file ) )
|
73
|
+
end
|
74
|
+
|
75
|
+
private
|
76
|
+
|
77
|
+
def file_in_etc_directory(filename)
|
78
|
+
File.join( File.dirname(__FILE__), "..", "..", "etc", filename )
|
79
|
+
end
|
80
|
+
|
81
|
+
end
|
82
|
+
|
83
|
+
#Extends base class with some additional checks for DataPackage conformance.
|
84
|
+
#
|
85
|
+
#These include some warnings about missing metadata elements and an existence
|
86
|
+
#check for all resources
|
87
|
+
class DataPackageValidator < Validator
|
88
|
+
def initialize(schema_name=:datapackage, opts={})
|
89
|
+
super(:datapackage, opts)
|
90
|
+
end
|
91
|
+
|
92
|
+
def validate_metadata(package, messages)
|
93
|
+
#not required, but recommended
|
94
|
+
prefix = "The package does not include a"
|
95
|
+
messages[:warnings] << "#{prefix} 'licenses' property" if package.licenses.empty?
|
96
|
+
messages[:warnings] << "#{prefix} 'datapackage_version' property" unless package.datapackage_version
|
97
|
+
messages[:warnings] << "#{prefix} README.md file" unless package.resource_exists?( package.resolve("README.md") )
|
98
|
+
end
|
99
|
+
|
100
|
+
def validate_resource(package, resource, messages)
|
101
|
+
if !package.resource_exists?( package.resolve_resource( resource ) )
|
102
|
+
messages[:errors] << "Resource #{resource["url"] || resource["path"]} does not exist"
|
103
|
+
end
|
104
|
+
end
|
105
|
+
|
106
|
+
end
|
107
|
+
|
108
|
+
#Validator that checks whether a package conforms to the Simple Data Format profile
|
109
|
+
class SimpleDataFormatValidator < DataPackageValidator
|
110
|
+
|
111
|
+
def initialize(schema_name=:datapackage, opts={})
|
112
|
+
super(:datapackage, opts)
|
113
|
+
@jsontable_schema = load_schema(:jsontable)
|
114
|
+
@csvddf_schema = load_schema("csvddf-dialect")
|
115
|
+
end
|
116
|
+
|
117
|
+
def validate_resource(package, resource, messages)
|
118
|
+
super(package, resource, messages)
|
119
|
+
|
120
|
+
if !csv?(resource)
|
121
|
+
messages[:errors] << "#{resource["name"]} is not a CSV file"
|
122
|
+
else
|
123
|
+
if !resource["schema"]
|
124
|
+
messages[:errors] << "#{resource["name"]} does not have a schema"
|
125
|
+
else
|
126
|
+
messages[:errors] +=
|
127
|
+
JSON::Validator.fully_validate(@jsontable_schema,
|
128
|
+
resource["schema"], :errors_as_objects => true)
|
129
|
+
end
|
130
|
+
if resource["dialect"]
|
131
|
+
messages[:errors] +=
|
132
|
+
JSON::Validator.fully_validate(@csvddf_schema,
|
133
|
+
resource["dialect"], :errors_as_objects => true)
|
134
|
+
end
|
135
|
+
|
136
|
+
if resource["schema"] && resource["schema"]["fields"]
|
137
|
+
fields = resource["schema"]["fields"]
|
138
|
+
declared_fields = fields.map{ |f| f["name"] }
|
139
|
+
headers = headers(package, resource)
|
140
|
+
|
141
|
+
#use set difference to find fields missing from the schema and/or the CSV file
|
142
|
+
missing_fields = declared_fields - headers
|
143
|
+
if missing_fields != []
|
144
|
+
messages[:errors] <<
|
145
|
+
"Declared schema has fields not present in CSV file (#{missing_fields.join(",")})"
|
146
|
+
end
|
147
|
+
undeclared_fields = headers - declared_fields
|
148
|
+
if undeclared_fields != []
|
149
|
+
messages[:errors] << "CSV file has fields missing from schema (#{undeclared_fields.join(",")})"
|
150
|
+
end
|
151
|
+
end
|
152
|
+
|
153
|
+
end
|
154
|
+
|
155
|
+
end
|
156
|
+
|
157
|
+
def csv?(resource)
|
158
|
+
resource["mediatype"] == "text/csv" ||
|
159
|
+
resource["format"] == "csv"
|
160
|
+
end
|
161
|
+
|
162
|
+
def headers(package, resource)
|
163
|
+
headers = []
|
164
|
+
opts = dialect_to_csv_options(resource["dialect"])
|
165
|
+
CSV.open( package.resolve_resource(resource), "r", opts) do |csv|
|
166
|
+
headers = csv.shift
|
167
|
+
end
|
168
|
+
return headers
|
169
|
+
end
|
170
|
+
|
171
|
+
def dialect_to_csv_options(dialect)
|
172
|
+
return {}
|
173
|
+
end
|
174
|
+
end
|
175
|
+
|
176
|
+
end
|
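
`dialect_to_csv_options` in this version returns an empty hash, so a `dialect` declared for a resource does not affect how the header row is read. One possible mapping is sketched below, under the assumption that only the three CSVDDF keys with direct equivalents among Ruby's `CSV` options (`:col_sep`, `:quote_char`, `:row_sep`) are translated. It is a suggestion, not the library's implementation:

```ruby
# Translate a CSVDDF dialect hash into options for Ruby's CSV library.
# Keys without a direct CSV equivalent (doublequote, skipinitialspace)
# are ignored in this sketch.
def dialect_to_csv_options(dialect)
  return {} unless dialect

  mapping = {
    "delimiter"      => :col_sep,
    "quotechar"      => :quote_char,
    "lineterminator" => :row_sep
  }

  mapping.each_with_object({}) do |(dialect_key, csv_key), opts|
    opts[csv_key] = dialect[dialect_key] if dialect.key?(dialect_key)
  end
end

# Hypothetical dialect for a semicolon-separated file.
csv_options = dialect_to_csv_options(
  "delimiter"   => ";",
  "quotechar"   => "'",
  "doublequote" => true
)

puts csv_options.inspect
```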
data/lib/datapackage.rb
ADDED
metadata
ADDED
@@ -0,0 +1,152 @@
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
2
|
+
name: datapackage
|
3
|
+
version: !ruby/object:Gem::Version
|
4
|
+
version: 0.0.1
|
5
|
+
prerelease:
|
6
|
+
platform: ruby
|
7
|
+
authors:
|
8
|
+
- Leigh Dodds
|
9
|
+
autorequire:
|
10
|
+
bindir: bin
|
11
|
+
cert_chain: []
|
12
|
+
date: 2013-12-05 00:00:00.000000000 Z
|
13
|
+
dependencies:
|
14
|
+
- !ruby/object:Gem::Dependency
|
15
|
+
name: json
|
16
|
+
requirement: !ruby/object:Gem::Requirement
|
17
|
+
none: false
|
18
|
+
requirements:
|
19
|
+
- - ! '>='
|
20
|
+
- !ruby/object:Gem::Version
|
21
|
+
version: '0'
|
22
|
+
type: :runtime
|
23
|
+
prerelease: false
|
24
|
+
version_requirements: !ruby/object:Gem::Requirement
|
25
|
+
none: false
|
26
|
+
requirements:
|
27
|
+
- - ! '>='
|
28
|
+
- !ruby/object:Gem::Version
|
29
|
+
version: '0'
|
30
|
+
- !ruby/object:Gem::Dependency
|
31
|
+
name: json-schema
|
32
|
+
requirement: !ruby/object:Gem::Requirement
|
33
|
+
none: false
|
34
|
+
requirements:
|
35
|
+
- - ! '>='
|
36
|
+
- !ruby/object:Gem::Version
|
37
|
+
version: '0'
|
38
|
+
type: :runtime
|
39
|
+
prerelease: false
|
40
|
+
version_requirements: !ruby/object:Gem::Requirement
|
41
|
+
none: false
|
42
|
+
requirements:
|
43
|
+
- - ! '>='
|
44
|
+
- !ruby/object:Gem::Version
|
45
|
+
version: '0'
|
46
|
+
- !ruby/object:Gem::Dependency
|
47
|
+
name: rest-client
|
48
|
+
requirement: !ruby/object:Gem::Requirement
|
49
|
+
none: false
|
50
|
+
requirements:
|
51
|
+
- - ! '>='
|
52
|
+
- !ruby/object:Gem::Version
|
53
|
+
version: '0'
|
54
|
+
type: :runtime
|
55
|
+
prerelease: false
|
56
|
+
version_requirements: !ruby/object:Gem::Requirement
|
57
|
+
none: false
|
58
|
+
requirements:
|
59
|
+
- - ! '>='
|
60
|
+
- !ruby/object:Gem::Version
|
61
|
+
version: '0'
|
62
|
+
- !ruby/object:Gem::Dependency
|
63
|
+
name: rspec
|
64
|
+
requirement: !ruby/object:Gem::Requirement
|
65
|
+
none: false
|
66
|
+
requirements:
|
67
|
+
- - ! '>='
|
68
|
+
- !ruby/object:Gem::Version
|
69
|
+
version: '0'
|
70
|
+
type: :development
|
71
|
+
prerelease: false
|
72
|
+
version_requirements: !ruby/object:Gem::Requirement
|
73
|
+
none: false
|
74
|
+
requirements:
|
75
|
+
- - ! '>='
|
76
|
+
- !ruby/object:Gem::Version
|
77
|
+
version: '0'
|
78
|
+
- !ruby/object:Gem::Dependency
|
79
|
+
name: simplecov-rcov
|
80
|
+
requirement: !ruby/object:Gem::Requirement
|
81
|
+
none: false
|
82
|
+
requirements:
|
83
|
+
- - ! '>='
|
84
|
+
- !ruby/object:Gem::Version
|
85
|
+
version: '0'
|
86
|
+
type: :development
|
87
|
+
prerelease: false
|
88
|
+
version_requirements: !ruby/object:Gem::Requirement
|
89
|
+
none: false
|
90
|
+
requirements:
|
91
|
+
- - ! '>='
|
92
|
+
- !ruby/object:Gem::Version
|
93
|
+
version: '0'
|
94
|
+
- !ruby/object:Gem::Dependency
|
95
|
+
name: fakeweb
|
96
|
+
requirement: !ruby/object:Gem::Requirement
|
97
|
+
none: false
|
98
|
+
requirements:
|
99
|
+
- - ~>
|
100
|
+
- !ruby/object:Gem::Version
|
101
|
+
version: '1.3'
|
102
|
+
type: :development
|
103
|
+
prerelease: false
|
104
|
+
version_requirements: !ruby/object:Gem::Requirement
|
105
|
+
none: false
|
106
|
+
requirements:
|
107
|
+
- - ~>
|
108
|
+
- !ruby/object:Gem::Version
|
109
|
+
version: '1.3'
|
110
|
+
description:
|
111
|
+
email:
|
112
|
+
- leigh@ldodds.com
|
113
|
+
executables: []
|
114
|
+
extensions: []
|
115
|
+
extra_rdoc_files: []
|
116
|
+
files:
|
117
|
+
- etc/jsontable-schema.json
|
118
|
+
- etc/datapackage-schema.json
|
119
|
+
- etc/README.md
|
120
|
+
- etc/csvddf-dialect-schema.json
|
121
|
+
- lib/datapackage.rb
|
122
|
+
- lib/datapackage/validator.rb
|
123
|
+
- lib/datapackage/package.rb
|
124
|
+
- lib/datapackage/version.rb
|
125
|
+
- LICENSE.md
|
126
|
+
- README.md
|
127
|
+
homepage: http://github.com/theodi/datapackage.rb
|
128
|
+
licenses: []
|
129
|
+
post_install_message:
|
130
|
+
rdoc_options: []
|
131
|
+
require_paths:
|
132
|
+
- lib
|
133
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
134
|
+
none: false
|
135
|
+
requirements:
|
136
|
+
- - ! '>='
|
137
|
+
- !ruby/object:Gem::Version
|
138
|
+
version: '0'
|
139
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
140
|
+
none: false
|
141
|
+
requirements:
|
142
|
+
- - ! '>='
|
143
|
+
- !ruby/object:Gem::Version
|
144
|
+
version: '0'
|
145
|
+
requirements: []
|
146
|
+
rubyforge_project:
|
147
|
+
rubygems_version: 1.8.23
|
148
|
+
signing_key:
|
149
|
+
specification_version: 3
|
150
|
+
summary: Library for working with data packages
|
151
|
+
test_files: []
|
152
|
+
has_rdoc:
|