datapackage 0.0.1

data/LICENSE.md ADDED
@@ -0,0 +1,20 @@
1
+ Copyright 2013 The Open Data Institute
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,184 @@
1
+ # DataPackage.rb
2
+
3
+ A ruby library for working with [Data Packages](http://dataprotocols.org/data-packages/).
4
+
5
+ [![Build Status](http://jenkins.theodi.org/job/datapackage.rb-master/badge/icon)](http://jenkins.theodi.org/job/datapackage.rb-master/)
6
+ [![Code Climate](https://codeclimate.com/github/theodi/datapackage.rb.png)](https://codeclimate.com/github/theodi/datapackage.rb)
7
+ [![Dependency Status](https://gemnasium.com/theodi/datapackage.rb.png)](https://gemnasium.com/theodi/datapackage.rb)
8
+
9
+ The library is intended to support:
10
+
11
+ * Parsing and using data package metadata and data
12
+ * Validating data packages to ensure they conform with the Data Package specification
13
+
14
+ ## Installation
15
+
16
+ Add the gem into your Gemfile:
17
+
18
+ gem 'datapackage.rb', :git => "git://github.com/theodi/datapackage.rb.git"
19
+
20
+ Note: gem release to come
21
+
22
+ ## Basic Usage
23
+
24
+ Require the gem, if you need to:
25
+
26
+ require 'datapackage.rb'
27
+
28
+ Parsing a datapackage from a remote location:
29
+
30
+ package = DataPackage::Package.new( "http://example.org/datasets/a" )
31
+
32
+ This assumes that `http://example.org/datasets/a/datapackage.json` exists. Alternatively, load a specific JSON file:
33
+
34
+ package = DataPackage::Package.new( "http://example.org/datasets/a/datapackage.json" )
35
+
36
+ Similarly you can load a package from a local JSON file, or specify a directory:
37
+
38
+ package = DataPackage::Package.new( "/my/data/package" )
39
+ package = DataPackage::Package.new( "/my/data/package/datapackage.json" )
40
+
41
+ There is a set of helper methods for accessing data from the package, e.g.:
42
+
43
+ package = DataPackage::Package.new( "/my/data/package" )
44
+ package.name
45
+ package.title
46
+ package.licenses
47
+ package.resources
48
+
49
+ These currently just return the raw JSON structure, but this might change in future.
50
+
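The location handling described under Basic Usage can be sketched in plain Ruby. This is an illustrative stand-in rather than the library's own code: the helper name `descriptor_location` is hypothetical, and it assumes that a remote location without a `datapackage.json` suffix should be treated as a directory.

```ruby
require 'uri'

# Hypothetical helper (not part of the library): work out where the
# datapackage.json descriptor lives for a given package location.
def descriptor_location(location, default_filename = "datapackage.json")
  if location.start_with?("http")
    # An explicit reference to the descriptor is used as-is
    return location if location.end_with?(default_filename)
    # Treat the URL as a directory. URI.join needs the trailing slash,
    # otherwise it replaces the last path segment instead of appending.
    base = location.end_with?("/") ? location : "#{location}/"
    URI.join(base, default_filename).to_s
  elsif File.directory?(location)
    # Local directory: look for the descriptor inside it
    File.join(location, default_filename)
  else
    # Anything else is assumed to be a path to a JSON file
    location
  end
end

puts descriptor_location("http://example.org/datasets/a")
puts descriptor_location("http://example.org/datasets/a/datapackage.json")
```

Both calls print the same descriptor URL, which is the behaviour the usage notes describe for remote packages.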
51
+ ## Package Validation
52
+
53
+ The library supports validating packages. It can be used to validate both the metadata for the package (`datapackage.json`)
54
+ and the integrity of the package itself, e.g. whether the data files exist.
55
+
56
+ ### Validating a Package
57
+
58
+ Quickly checking the validity of a package can be achieved as follows:
59
+
60
+ package.valid?
61
+
62
+ To expose more detail on errors and warnings:
63
+
64
+ messages = package.validate() # or package.validate(:datapackage)
65
+
66
+ This returns a hash with two keys: `:errors` and `:warnings`. These are arrays of messages.
67
+
68
+ Warnings might include notes on missing metadata elements (e.g. package `licenses`) which are not required by the DataPackage specification
69
+ but which SHOULD be included.
70
+
71
+ It is possible to treat all warnings as errors by performing strict validation:
72
+
73
+ package.valid?(:datapackage, true)
74
+
75
+ Warnings are currently generated for:
76
+
77
+ * Missing `README.md` files from packages
78
+ * Missing `licenses` key from `datapackage.json`
79
+ * Missing `datapackage_version` key from `datapackage.json`
80
+
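The relationship between errors, warnings and strict validation can be illustrated with a small standalone sketch. The helper name `acceptable?` and the sample message are assumptions made for illustration; only the `:errors` / `:warnings` shape comes from the description above.

```ruby
# Shape of a validation result: two arrays of messages (sample data)
messages = {
  :errors   => [],
  :warnings => ["The package does not include a 'licenses' property"]
}

# Hypothetical helper: errors always fail validation, warnings only
# fail it when strict checking is requested.
def acceptable?(messages, strict = false)
  return messages[:errors].empty? unless strict
  messages[:errors].empty? && messages[:warnings].empty?
end

puts acceptable?(messages)        # warnings are tolerated by default
puts acceptable?(messages, true)  # strict: the warning counts as a failure
```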
81
+ ### Selecting a Validation Profile
82
+
83
+ The library contains two validation classes, one for the core Data Package specification and the other for the Simple Data Format
84
+ rules. By default the library uses the more liberal Data Package rules.
85
+
86
+ The required profile can be specified in one of two ways. Either as a parameter to the validation methods:
87
+
88
+ package.valid?(:datapackage)
89
+ package.valid?(:simpledataformat)
90
+ package.validate(:datapackage)
91
+ package.validate(:simpledataformat)
92
+
93
+ Or, by using a `DataPackage::Validator` class:
94
+
95
+ validator = DataPackage::SimpleDataFormatValidator.new
96
+ validator.valid?( package )
97
+ validator.validate( package )
98
+
99
+ ### Approach to Validation
100
+
101
+ The library will support validating packages against the general rules specified in the
102
+ [DataPackage specification](http://dataprotocols.org/data-packages/) as well as the stricter requirements given in the
103
+ [Simple Data Format specification](http://dataprotocols.org/simple-data-format/) (SDF).
104
+
105
+ SDF is essentially a profile of DataPackage which includes some additional restrictions on
106
+ how data should be published and described. For example all data is to be published as CSV files.
107
+
108
+ The validation in the library is divided into two parts:
109
+
110
+ * Metadata validation -- checking that the structure of the `datapackage.json` file is correct
111
+ * Integrity checking -- checking that the overall package and data files appear to be in order
112
+
113
+ #### Metadata Validation
114
+
115
+ The basic structure of `datapackage.json` files is validated using [JSON Schema](http://json-schema.org/). This provides a simple
116
+ declarative way to describe the expected structure of the package metadata.
117
+
118
+ The schema files can be found in the `etc` directory of the project and could be used in other applications. The schema files can
119
+ be customised to support local validation checks (see below).
120
+
121
+ #### Integrity Checking
122
+
123
+ While the metadata for a package might be correct, there are other ways in which the package could be invalid. For example,
124
+ data files might be missing or incorrectly described.
125
+
126
+ The metadata validation is therefore supplemented with some custom code that performs some other checks:
127
+
128
+ * (Both profiles) All resources described in the package must be accessible, e.g. the local file exists or a URL responds successfully to a `HEAD` request
129
+ * (`:simpledataformat`) All resources must be CSV files
130
+ * (`:simpledataformat`) All resources must have a valid JSON Table Schema
131
+ * (`:simpledataformat`) CSV `dialect` descriptions must be valid
132
+ * (`:simpledataformat`) All fields declared in the schema must be present in the CSV file
133
+ * (`:simpledataformat`) All fields present in the CSV file must be present in the schema
134
+
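The last two checks in the list above amount to comparing two lists of field names. A minimal sketch using array difference, with made-up field names:

```ruby
# Field names declared in a resource's schema (illustrative data)
declared_fields = ["id", "name", "amount"]
# Header row read from the corresponding CSV file (illustrative data)
headers = ["id", "name", "date"]

# Array difference yields the names present on one side only
missing_fields    = declared_fields - headers
undeclared_fields = headers - declared_fields

errors = []
unless missing_fields.empty?
  errors << "Declared schema has fields not present in CSV file (#{missing_fields.join(",")})"
end
unless undeclared_fields.empty?
  errors << "CSV file has fields missing from schema (#{undeclared_fields.join(",")})"
end

puts errors
```

Both directions are checked, so a package only passes when the schema and the CSV header agree on the full set of field names.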
135
+ ### Customising the Validation Code
136
+
137
+ The library provides several extension points for customising the way that packages are validated.
138
+
139
+ #### Supplying Custom JSON Schemas
140
+
141
+ Custom JSON schemas can be provided to allow validation to be tweaked for local conventions. An options hash can be
142
+ provided to the constructor of a `DataPackage::Validator` object; this can be used to map schema names to custom
143
+ schemas.
144
+
145
+ (Any options passed to the constructor of a `DataPackage::Package` object will also be passed to its validator)
146
+
147
+ For example to create a new validation profile called `my-validation-rules` and then apply it:
148
+
149
+ opts = {
150
+ :schema => {
151
+ :"my-validation-rules" => "/path/to/json/schema.json"
152
+ }
153
+ }
154
+ package = DataPackage::Package.new( url, opts )
155
+ package.valid?(:"my-validation-rules")
156
+
157
+ This will cause the code to create a custom `DataPackage::Validator` instance that will apply the supplied schema. This class
158
+ does not provide any integrity checks.
159
+
160
+ To mix a custom schema with the existing integrity checking, you must manually create a `Validator` instance. E.g:
161
+
162
+ opts = {
163
+ :schema => {
164
+ :"my-validation-rules" => "/path/to/json/schema.json"
165
+ }
166
+ }
167
+ validator = DataPackage::SimpleDataFormatValidator.new(:"my-validation-rules", opts)
168
+ validator.valid?( package )
169
+
170
+ Custom schemas must be valid JSON files that conform to the JSON Schema v4 specification. The absolute path to the schema file must be
171
+ provided.
172
+
173
+ Validation is performed using the [json-schema](https://github.com/hoxworth/json-schema) gem which has some documented restrictions.
174
+
175
+ The built-in schema files can also be overridden in this way, e.g. by specifying an alternate location for the `:datapackage` schema.
176
+
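The precedence between user-supplied and built-in schemas can be sketched as a simple lookup. The helper name `schema_path`, the `etc_dir` argument and the example paths are hypothetical; the sketch only assumes the `:schema` options hash described above and the `<profile>-schema.json` naming used in the `etc` directory.

```ruby
# Hypothetical helper: pick the schema file for a validation profile.
# A path supplied in opts[:schema] wins over the built-in file.
def schema_path(profile, opts = {}, etc_dir = "etc")
  custom = opts[:schema] && opts[:schema][profile]
  return custom if custom
  File.join(etc_dir, "#{profile}-schema.json")
end

opts = {
  :schema => {
    # A new profile name; the quotes are needed because of the hyphens
    :"my-validation-rules" => "/path/to/json/schema.json",
    # Overriding a built-in profile with a local copy
    :datapackage           => "/path/to/local/datapackage-schema.json"
  }
}

puts schema_path(:"my-validation-rules", opts)
puts schema_path(:datapackage, opts)
puts schema_path(:jsontable, opts)
```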
177
+ #### Custom Integrity Checking
178
+
179
+ Integrity checking can be customized by creating new sub-classes of `DataPackage::Validator` or one of the existing sub-classes.
180
+
181
+ The following methods can be implemented:
182
+
183
+ * `validate_metadata(package, messages)` -- perform additional metadata checking after JSON Schema validation has been applied.
184
+ * `validate_resource(package, resource, messages)` -- called for each resource in the package
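A sketch of what such a sub-class might look like. To keep the example runnable on its own, `SketchValidator` is a simplified stand-in for the library's base class (it is not the gem's code), the package is represented as a plain hash, and the two local rules are invented for illustration.

```ruby
# Stand-in for the library's base class, reduced to the two hooks named
# above so that this sketch runs without the gem. NOT the gem's code.
class SketchValidator
  def validate(package)
    messages = { :errors => [], :warnings => [] }
    validate_metadata(package, messages)
    package["resources"].each do |resource|
      validate_resource(package, resource, messages)
    end
    messages
  end

  protected

  # Hooks: no-ops by default, overridden by sub-classes
  def validate_metadata(package, messages); end
  def validate_resource(package, resource, messages); end
end

# Hypothetical local conventions: recommend a description and
# require every resource to carry a name.
class LocalRulesValidator < SketchValidator
  protected

  def validate_metadata(package, messages)
    unless package["description"]
      messages[:warnings] << "The package does not include a 'description' property"
    end
  end

  def validate_resource(package, resource, messages)
    unless resource["name"]
      messages[:errors] << "Resource #{resource["path"]} has no name"
    end
  end
end

package  = { "name" => "example", "resources" => [ { "path" => "data.csv" } ] }
messages = LocalRulesValidator.new.validate(package)
puts messages.inspect
```

The hooks only append to the shared `messages` hash, so several layers of sub-classes can contribute checks by calling `super` before adding their own.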
data/etc/README.md ADDED
@@ -0,0 +1,18 @@
1
+ This directory contains some JSON Schema documents for validating:
2
+
3
+ * `datapackage-schema.json` -- [datapackage.json](http://dataprotocols.org/data-packages/) package files
4
+ * `jsontable-schema.json` -- [JSON Table Schemas](http://dataprotocols.org/json-table-schema/) objects
5
+ * `csvddf-dialect-schema.json` -- [CSV Dialect Description Format](http://dataprotocols.org/csv-dialect/) dialect objects
6
+
7
+ The JSON Table Schemas and CSV Dialect Description Format both define JSON object structures that can appear in `datapackage.json` files (via the `schema` and `dialect` keywords). In the main `datapackage-schema.json` object, these keywords are only validated as simple objects.
8
+
9
+ In the application the subsidiary schemas are automatically applied to relevant keys. This could be improved by using JSON Schema cross-referencing.
10
+
11
+ Other potential improvements include:
12
+
13
+ * Add `data` keyword validation to `datapackage-schema.json`
14
+ * Add `format` keywords for validating email addresses and date/date-times
15
+ * Or, add `pattern` for validating dates
16
+ * Improve the regular expressions used in various places
17
+
18
+
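As an illustration of the cross-referencing idea, the `schema` and `dialect` keywords could point at the subsidiary schemas with `$ref` instead of being declared as plain objects. The fragment below is only a sketch: it assumes the three schema files sit side by side, and whether relative references are resolved depends on the validator and on how the schema is loaded.

```ruby
require 'json'

# Possible replacement for the "schema" and "dialect" property definitions
# in datapackage-schema.json (an assumption, not the shipped schema)
resource_properties = {
  "schema"  => { "$ref" => "jsontable-schema.json" },
  "dialect" => { "$ref" => "csvddf-dialect-schema.json" }
}

puts JSON.pretty_generate(resource_properties)
```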
@@ -0,0 +1,24 @@
1
+ {
2
+ "$schema": "http://json-schema.org/draft-04/schema#",
3
+ "title": "CSVDDF",
4
+ "description": "JSON Schema for validating CSVDDF dialect structures",
5
+ "type": "object",
6
+ "properties": {
7
+ "delimiter": {
8
+ "type": "string"
9
+ },
10
+ "doublequote": {
11
+ "type": "boolean"
12
+ },
13
+ "lineterminator": {
14
+ "type": "string"
15
+ },
16
+ "quotechar": {
17
+ "type": "string"
18
+ },
19
+ "skipinitialspace": {
20
+ "type": "boolean"
21
+ }
22
+ },
23
+ "required": [ "delimiter", "doublequote", "lineterminator", "quotechar", "skipinitialspace" ]
24
+ }
@@ -0,0 +1,208 @@
1
+ {
2
+ "$schema": "http://json-schema.org/draft-04/schema#",
3
+ "title": "DataPackage",
4
+ "description": "JSON Schema for validating datapackage.json files",
5
+ "type": "object",
6
+ "properties": {
7
+ "name": {
8
+ "type": "string",
9
+ "pattern": "^([a-z\\.\\_\\-])+$"
10
+ },
11
+ "licenses": {
12
+ "type": "array",
13
+ "items": {
14
+ "type": "object",
15
+ "properties": {
16
+ "id": { "type": "string" },
17
+ "url": { "type": "string" }
18
+ },
19
+ "anyOf": [
20
+ { "title": "id required", "required": ["id"] },
21
+ { "title": "url required", "required": ["url"] }
22
+ ]
23
+ }
24
+ },
25
+ "datapackage_version": {
26
+ "type": "string"
27
+ },
28
+ "title": {
29
+ "type": "string"
30
+ },
31
+ "description": {
32
+ "type": "string"
33
+ },
34
+ "homepage": {
35
+ "type": "string"
36
+ },
37
+ "version": {
38
+ "type": "string"
39
+ },
40
+ "sources": {
41
+ "type": "array",
42
+ "items": {
43
+ "type": "object",
44
+ "properties": {
45
+ "name": { "type": "string" },
46
+ "web": { "type": "string" },
47
+ "email": { "type": "string" }
48
+ },
49
+ "anyOf": [
50
+ { "title": "name required", "required": ["name"] },
51
+ { "title": "web required", "required": ["web"] },
52
+ { "title": "email required", "required": ["email"] }
53
+ ]
54
+ }
55
+ },
56
+ "keywords": {
57
+ "type": "array",
58
+ "items": {
59
+ "type": "string"
60
+ }
61
+ },
62
+ "last_modified": {
63
+ "type": "string"
64
+ },
65
+ "image": {
66
+ "type": "string"
67
+ },
68
+ "bugs": {
69
+ "type": "string"
70
+ },
71
+ "maintainers": {
72
+ "type": "array",
73
+ "items": {
74
+ "type": "object",
75
+ "properties": {
76
+ "name": {
77
+ "type": "string"
78
+ },
79
+ "email": {
80
+ "type": "string"
81
+ },
82
+ "web": {
83
+ "type": "string"
84
+ }
85
+ },
86
+ "required": ["name"]
87
+ }
88
+ },
89
+ "contributors": {
90
+ "type": "array",
91
+ "items": {
92
+ "type": "object",
93
+ "properties": {
94
+ "name": {
95
+ "type": "string"
96
+ },
97
+ "email": {
98
+ "type": "string"
99
+ },
100
+ "web": {
101
+ "type": "string"
102
+ }
103
+ },
104
+ "required": ["name"]
105
+ }
106
+ },
107
+ "publisher": {
108
+ "type": "array",
109
+ "items": {
110
+ "type": "object",
111
+ "properties": {
112
+ "name": {
113
+ "type": "string"
114
+ },
115
+ "email": {
116
+ "type": "string"
117
+ },
118
+ "web": {
119
+ "type": "string"
120
+ }
121
+ },
122
+ "required": ["name"]
123
+ }
124
+ },
125
+ "dependencies": {
126
+ "type": "object"
127
+ },
128
+ "resources": {
129
+ "type": "array",
130
+ "minItems": 1,
131
+ "items": {
132
+ "type": "object",
133
+ "properties": {
134
+ "url": {
135
+ "type": "string"
136
+ },
137
+ "path": {
138
+ "type": "string"
139
+ },
140
+ "name": {
141
+ "type": "string"
142
+ },
143
+ "format": {
144
+ "type": "string"
145
+ },
146
+ "mediatype": {
147
+ "type": "string",
148
+ "pattern": "^(.+)/(.+)$"
149
+ },
150
+ "encoding": {
151
+ "type": "string"
152
+ },
153
+ "bytes": {
154
+ "type": "integer"
155
+ },
156
+ "hash": {
157
+ "type": "string",
158
+ "pattern": "^([a-fA-F0-9]{32})$"
159
+ },
160
+ "modified": {
161
+ "type": "string"
162
+ },
163
+ "schema": {
164
+ "type": "object"
165
+ },
166
+ "dialect": {
167
+ "type": "object"
168
+ },
169
+ "sources": {
170
+ "type": "array",
171
+ "items": {
172
+ "type": "object",
173
+ "properties": {
174
+ "name": { "type": "string" },
175
+ "web": { "type": "string" },
176
+ "email": { "type": "string" }
177
+ },
178
+ "anyOf": [
179
+ { "title": "name required", "required": ["name"] },
180
+ { "title": "web required", "required": ["web"] },
181
+ { "title": "email required", "required": ["email"] }
182
+ ]
183
+ }
184
+ },
185
+ "licenses": {
186
+ "type": "array",
187
+ "items": {
188
+ "type": "object",
189
+ "properties": {
190
+ "id": { "type": "string" },
191
+ "url": { "type": "string" }
192
+ },
193
+ "anyOf": [
194
+ { "title": "id required", "required": ["id"] },
195
+ { "title": "url required", "required": ["url"] }
196
+ ]
197
+ }
198
+ }
199
+ },
200
+ "anyOf": [
201
+ { "title": "url required", "required": ["url"] },
202
+ { "title": "path required", "required": ["path"] }
203
+ ]
204
+ }
205
+ }
206
+ },
207
+ "required": ["name", "resources"]
208
+ }
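The two non-trivial `pattern` constraints in this schema can be tried out directly in Ruby. This is a quick standalone check rather than how the library applies them (that is done by the JSON Schema validator); the sample values are made up.

```ruby
# Patterns copied from the "name" and "hash" properties of the schema
NAME_PATTERN = Regexp.new('^([a-z\.\_\-])+$')
HASH_PATTERN = Regexp.new('^([a-fA-F0-9]{32})$')

# Lowercase letters, dots, underscores and hyphens are accepted;
# digits and uppercase letters are not.
%w[my-data.package dataset2 MyData].each do |name|
  puts "#{name}: #{NAME_PATTERN.match?(name)}"
end

# 32 hexadecimal characters, i.e. the length of an MD5 digest
puts HASH_PATTERN.match?("d41d8cd98f00b204e9800998ecf8427e")
```

Note that the `name` pattern as written rejects names containing digits, which may be stricter than intended.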
@@ -0,0 +1,34 @@
1
+ {
2
+ "$schema": "http://json-schema.org/draft-04/schema#",
3
+ "title": "JSON Table Schema",
4
+ "description": "JSON Schema for validating JSON Table structures",
5
+ "type": "object",
6
+ "properties": {
7
+ "fields": {
8
+ "type": "array",
9
+ "minItems": 1,
10
+ "items": {
11
+ "type": "object",
12
+ "properties": {
13
+ "name": {
14
+ "type": "string"
15
+ },
16
+ "title": {
17
+ "type": "string"
18
+ },
19
+ "description": {
20
+ "type": "string"
21
+ },
22
+ "type": {
23
+ "enum": [ "string", "number", "integer", "date", "time", "datetime", "boolean", "binary", "object", "geopoint", "geojson", "array", "any" ]
24
+ },
25
+ "format": {
26
+ "type": "string"
27
+ }
28
+ },
29
+ "required": ["name"]
30
+ }
31
+ }
32
+ },
33
+ "required": ["fields"]
34
+ }
@@ -0,0 +1,160 @@
1
+ require 'open-uri'
2
+
3
+ module DataPackage
4
+
5
+ class Package
6
+
7
+ attr_reader :metadata, :opts
8
+
9
+ # Parse a data package
10
+ #
11
+ # Supports reading data from JSON file, directory, and a URL
12
+ #
13
+ # package:: Hash or a String
14
+ # opts:: Options used to customize reading and parsing
15
+ def initialize(package, opts={})
16
+ @opts = opts
17
+ #TODO base directory/url
18
+ if package.is_a?(Hash)
19
+ @metadata = package
20
+ else
21
+ if !package.start_with?("http") && File.directory?(package)
22
+ package = File.join(package, opts[:default_filename] || "datapackage.json")
23
+ end
24
+ if package.start_with?("http") && !package.end_with?("datapackage.json")
25
+ package = URI.join(package.end_with?("/") ? package : "#{package}/", "datapackage.json")
26
+ end
27
+ @location = package.to_s
28
+ @metadata = JSON.parse( open(package).read )
29
+ end
30
+ end
31
+
32
+ #Returns the directory for a local file package or base url for a remote
33
+ #Returns an empty string for an in-memory object (because it has no base as yet)
34
+ def base
35
+ #user can override base
36
+ return @opts[:base] if @opts[:base]
37
+ return "" unless @location
38
+ #work out base directory or uri
39
+ if local?
40
+ return File.dirname( @location )
41
+ else
42
+ return @location.split("/")[0..-2].join("/")
43
+ end
44
+ end
45
+
46
+ #Is this a local package? Returns true if created from an in-memory object or a file/directory reference
47
+ def local?
48
+ return !@location.start_with?("http") if @location
49
+ return true
50
+ end
51
+
52
+ def name
53
+ @metadata["name"]
54
+ end
55
+
56
+ def title
57
+ @metadata["title"]
58
+ end
59
+
60
+ def description
61
+ @metadata["description"]
62
+ end
63
+
64
+ def homepage
65
+ @metadata["homepage"]
66
+ end
67
+
68
+ def licenses
69
+ @metadata["licenses"] || []
70
+ end
71
+ alias_method :licences, :licenses
72
+
73
+ #What version of datapackage specification is this using?
74
+ def datapackage_version
75
+ @metadata["datapackage_version"]
76
+ end
77
+
78
+ #What is the version of this specific data package?
79
+ def version
80
+ @metadata["version"]
81
+ end
82
+
83
+ def sources
84
+ @metadata["sources"] || []
85
+ end
86
+
87
+ def keywords
88
+ @metadata["keywords"] || []
89
+ end
90
+
91
+ def last_modified
92
+ DateTime.parse @metadata["last_modified"] rescue nil
93
+ end
94
+
95
+ def image
96
+ @metadata["image"]
97
+ end
98
+
99
+ def maintainers
100
+ @metadata["maintainers"] || []
101
+ end
102
+
103
+ def contributors
104
+ @metadata["contributors"] || []
105
+ end
106
+
107
+ def publisher
108
+ @metadata["publisher"] || []
109
+ end
110
+
111
+ def resources
112
+ @metadata["resources"] || []
113
+ end
114
+
115
+ def dependencies
116
+ @metadata["dependencies"]
117
+ end
118
+
119
+ def property(property, default=nil)
120
+ @metadata[property] || default
121
+ end
122
+
123
+ def valid?(profile=:datapackage, strict=false)
124
+ validator = DataPackage::Validator.create(profile, @opts)
125
+ return validator.valid?(self, strict)
126
+ end
127
+
128
+ def validate(profile=:datapackage)
129
+ validator = DataPackage::Validator.create(profile, @opts)
130
+ return validator.validate(self)
131
+ end
132
+
133
+ def resolve_resource(resource)
134
+ return resource["url"] || resolve( resource["path"] )
135
+ end
136
+
137
+ def resolve(path)
138
+ if local?
139
+ return File.join( base , path) if base != ""
140
+ return path
141
+ else
142
+ return URI.join(base.end_with?("/") ? base : "#{base}/", path)
143
+ end
144
+ end
145
+
146
+ def resource_exists?(location)
147
+ if !location.to_s.start_with?("http")
148
+ return File.exist?( location )
149
+ else
150
+ begin
151
+ status = RestClient.head( location ).code
152
+ return status == 200
153
+ rescue => e
154
+ return false
155
+ end
156
+ end
157
+ end
158
+
159
+ end
160
+ end
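Resolving a relative resource path against a remote base depends on whether the base ends with a slash. A short standalone illustration using only the standard library (the URLs and paths are examples):

```ruby
require 'uri'

base = "http://example.org/datasets/a"

# Without a trailing slash the last path segment ("a") is replaced
puts URI.join(base, "data/file.csv")

# With a trailing slash the path is appended below the base
puts URI.join("#{base}/", "data/file.csv")

# Local packages can rely on File.join, which inserts the separator itself
puts File.join("/my/data/package", "data/file.csv")
```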
@@ -0,0 +1,176 @@
1
+ module DataPackage
2
+
3
+ #Base class for validators
4
+ class Validator
5
+
6
+ def Validator.create(profile, opts={})
7
+ if profile == :simpledataformat
8
+ return SimpleDataFormatValidator.new(profile, opts)
9
+ end
10
+ if profile == :datapackage
11
+ return DataPackageValidator.new(profile, opts)
12
+ end
13
+ return Validator.new(profile, opts)
14
+ end
15
+
16
+ def initialize(schema_name, opts={})
17
+ @schema_name = schema_name
18
+ @opts = opts
19
+ end
20
+
21
+ def valid?(package, strict=false)
22
+ messages = validate( package )
23
+ return messages[:errors].empty? if !strict
24
+ return messages[:errors].empty? && messages[:warnings].empty?
25
+ end
26
+
27
+ def validate( package )
28
+ return validate_integrity( package, validate_with_schema(package) )
29
+ end
30
+
31
+ def validate_with_schema(package)
32
+ schema = load_schema(@schema_name)
33
+ messages = {
34
+ :errors => JSON::Validator.fully_validate(schema, package.metadata, :errors_as_objects => true),
35
+ :warnings => []
36
+ }
37
+ validate_metadata(package, messages)
38
+ return messages
39
+ end
40
+
41
+ def validate_integrity(package, messages={ :errors=>[], :warnings=>[] } )
42
+ package.resources.each do |resource|
43
+ validate_resource(package, resource, messages)
44
+ end
45
+
46
+ messages
47
+ end
48
+
49
+ protected
50
+
51
+ #implement to perform additional validation on metadata
52
+ def validate_metadata(package, messages)
53
+ end
54
+
55
+ #implement for per-resource validation
56
+ def validate_resource(package, resource, messages)
57
+ end
58
+
59
+ protected
60
+
61
+ def load_schema(profile)
62
+ if @opts[:schema] && @opts[:schema][profile]
63
+ if !File.exist?( @opts[:schema][profile] )
64
+ raise "User supplied schema file does not exist: #{@opts[:schema][profile]}"
65
+ end
66
+ return JSON.parse( File.read( @opts[:schema][profile] ) )
67
+ end
68
+ schema_file = file_in_etc_directory( "#{profile}-schema.json" )
69
+ if !File.exist?( schema_file )
70
+ raise "Unable to read schema file #{schema_file} for validation profile #{profile}"
71
+ end
72
+ return JSON.parse( File.read( schema_file ) )
73
+ end
74
+
75
+ private
76
+
77
+ def file_in_etc_directory(filename)
78
+ File.join( File.dirname(__FILE__), "..", "..", "etc", filename )
79
+ end
80
+
81
+ end
82
+
83
+ #Extends base class with some additional checks for DataPackage conformance.
84
+ #
85
+ #These include some warnings about missing metadata elements and an existence
86
+ #check for all resources
87
+ class DataPackageValidator < Validator
88
+ def initialize(schema_name=:datapackage, opts={})
89
+ super(:datapackage, opts)
90
+ end
91
+
92
+ def validate_metadata(package, messages)
93
+ #not required, but recommended
94
+ prefix = "The package does not include a"
95
+ messages[:warnings] << "#{prefix} 'licenses' property" if package.licenses.empty?
96
+ messages[:warnings] << "#{prefix} 'datapackage_version' property" unless package.datapackage_version
97
+ messages[:warnings] << "#{prefix} README.md file" unless package.resource_exists?( package.resolve("README.md") )
98
+ end
99
+
100
+ def validate_resource(package, resource, messages)
101
+ if !package.resource_exists?( package.resolve_resource( resource ) )
102
+ messages[:errors] << "Resource #{resource["url"] || resource["path"]} does not exist"
103
+ end
104
+ end
105
+
106
+ end
107
+
108
+ #Validator that checks whether a package conforms to the Simple Data Format profile
109
+ class SimpleDataFormatValidator < DataPackageValidator
110
+
111
+ def initialize(schema_name=:datapackage, opts={})
112
+ super(:datapackage, opts)
113
+ @jsontable_schema = load_schema(:jsontable)
114
+ @csvddf_schema = load_schema("csvddf-dialect")
115
+ end
116
+
117
+ def validate_resource(package, resource, messages)
118
+ super(package, resource, messages)
119
+
120
+ if !csv?(resource)
121
+ messages[:errors] << "#{resource["name"]} is not a CSV file"
122
+ else
123
+ if !resource["schema"]
124
+ messages[:errors] << "#{resource["name"]} does not have a schema"
125
+ else
126
+ messages[:errors] +=
127
+ JSON::Validator.fully_validate(@jsontable_schema,
128
+ resource["schema"], :errors_as_objects => true)
129
+ end
130
+ if resource["dialect"]
131
+ messages[:errors] +=
132
+ JSON::Validator.fully_validate(@csvddf_schema,
133
+ resource["dialect"], :errors_as_objects => true)
134
+ end
135
+
136
+ if resource["schema"] && resource["schema"]["fields"]
137
+ fields = resource["schema"]["fields"]
138
+ declared_fields = fields.map{ |f| f["name"] }
139
+ headers = headers(package, resource)
140
+
141
+ #use set algebra to find fields missing from the schema and/or the CSV file
142
+ missing_fields = declared_fields - headers
143
+ if missing_fields != []
144
+ messages[:errors] <<
145
+ "Declared schema has fields not present in CSV file (#{missing_fields.join(",")})"
146
+ end
147
+ undeclared_fields = headers - declared_fields
148
+ if undeclared_fields != []
149
+ messages[:errors] << "CSV file has fields missing from schema (#{undeclared_fields.join(",")})"
150
+ end
151
+ end
152
+
153
+ end
154
+
155
+ end
156
+
157
+ def csv?(resource)
158
+ resource["mediatype"] == "text/csv" ||
159
+ resource["format"] == "csv"
160
+ end
161
+
162
+ def headers(package, resource)
163
+ headers = []
164
+ opts = dialect_to_csv_options(resource["dialect"])
165
+ CSV.open( package.resolve_resource(resource), "r", opts) do |csv|
166
+ headers = csv.shift
167
+ end
168
+ return headers
169
+ end
170
+
171
+ def dialect_to_csv_options(dialect)
172
+ return {}
173
+ end
174
+ end
175
+
176
+ end
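`dialect_to_csv_options` currently returns an empty hash, so dialect descriptions do not influence how the CSV header is read. One possible mapping is sketched below. It is an assumption about how the stub might be filled in, not the library's behaviour: only dialect keys with a direct counterpart among Ruby's CSV options (`col_sep`, `row_sep`, `quote_char`) are translated, and `doublequote` and `skipinitialspace` are ignored.

```ruby
# Hypothetical mapping from CSVDDF dialect keys to Ruby CSV option names
DIALECT_TO_CSV = {
  "delimiter"      => :col_sep,
  "lineterminator" => :row_sep,
  "quotechar"      => :quote_char
}

# Translate a dialect object into an options hash; unknown keys are skipped
def dialect_to_csv_options(dialect)
  return {} unless dialect
  DIALECT_TO_CSV.each_with_object({}) do |(key, option), opts|
    opts[option] = dialect[key] if dialect.key?(key)
  end
end

dialect = {
  "delimiter"        => ";",
  "lineterminator"   => "\n",
  "quotechar"        => "'",
  "doublequote"      => true,
  "skipinitialspace" => false
}
puts dialect_to_csv_options(dialect).inspect
```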
@@ -0,0 +1,3 @@
1
+ module DataPackage
2
+ VERSION = "0.0.1"
3
+ end
@@ -0,0 +1,11 @@
1
+ require 'date'
2
+ require 'uri'
3
+ require 'net/http'
4
+ require 'csv'
5
+ require 'json'
6
+ require 'json-schema'
7
+ require 'rest-client'
8
+
9
+ require 'datapackage/version'
10
+ require 'datapackage/validator'
11
+ require 'datapackage/package'
metadata ADDED
@@ -0,0 +1,152 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: datapackage
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ prerelease:
6
+ platform: ruby
7
+ authors:
8
+ - Leigh Dodds
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2013-12-05 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: json
16
+ requirement: !ruby/object:Gem::Requirement
17
+ none: false
18
+ requirements:
19
+ - - ! '>='
20
+ - !ruby/object:Gem::Version
21
+ version: '0'
22
+ type: :runtime
23
+ prerelease: false
24
+ version_requirements: !ruby/object:Gem::Requirement
25
+ none: false
26
+ requirements:
27
+ - - ! '>='
28
+ - !ruby/object:Gem::Version
29
+ version: '0'
30
+ - !ruby/object:Gem::Dependency
31
+ name: json-schema
32
+ requirement: !ruby/object:Gem::Requirement
33
+ none: false
34
+ requirements:
35
+ - - ! '>='
36
+ - !ruby/object:Gem::Version
37
+ version: '0'
38
+ type: :runtime
39
+ prerelease: false
40
+ version_requirements: !ruby/object:Gem::Requirement
41
+ none: false
42
+ requirements:
43
+ - - ! '>='
44
+ - !ruby/object:Gem::Version
45
+ version: '0'
46
+ - !ruby/object:Gem::Dependency
47
+ name: rest-client
48
+ requirement: !ruby/object:Gem::Requirement
49
+ none: false
50
+ requirements:
51
+ - - ! '>='
52
+ - !ruby/object:Gem::Version
53
+ version: '0'
54
+ type: :runtime
55
+ prerelease: false
56
+ version_requirements: !ruby/object:Gem::Requirement
57
+ none: false
58
+ requirements:
59
+ - - ! '>='
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ - !ruby/object:Gem::Dependency
63
+ name: rspec
64
+ requirement: !ruby/object:Gem::Requirement
65
+ none: false
66
+ requirements:
67
+ - - ! '>='
68
+ - !ruby/object:Gem::Version
69
+ version: '0'
70
+ type: :development
71
+ prerelease: false
72
+ version_requirements: !ruby/object:Gem::Requirement
73
+ none: false
74
+ requirements:
75
+ - - ! '>='
76
+ - !ruby/object:Gem::Version
77
+ version: '0'
78
+ - !ruby/object:Gem::Dependency
79
+ name: simplecov-rcov
80
+ requirement: !ruby/object:Gem::Requirement
81
+ none: false
82
+ requirements:
83
+ - - ! '>='
84
+ - !ruby/object:Gem::Version
85
+ version: '0'
86
+ type: :development
87
+ prerelease: false
88
+ version_requirements: !ruby/object:Gem::Requirement
89
+ none: false
90
+ requirements:
91
+ - - ! '>='
92
+ - !ruby/object:Gem::Version
93
+ version: '0'
94
+ - !ruby/object:Gem::Dependency
95
+ name: fakeweb
96
+ requirement: !ruby/object:Gem::Requirement
97
+ none: false
98
+ requirements:
99
+ - - ~>
100
+ - !ruby/object:Gem::Version
101
+ version: '1.3'
102
+ type: :development
103
+ prerelease: false
104
+ version_requirements: !ruby/object:Gem::Requirement
105
+ none: false
106
+ requirements:
107
+ - - ~>
108
+ - !ruby/object:Gem::Version
109
+ version: '1.3'
110
+ description:
111
+ email:
112
+ - leigh@ldodds.com
113
+ executables: []
114
+ extensions: []
115
+ extra_rdoc_files: []
116
+ files:
117
+ - etc/jsontable-schema.json
118
+ - etc/datapackage-schema.json
119
+ - etc/README.md
120
+ - etc/csvddf-dialect-schema.json
121
+ - lib/datapackage.rb
122
+ - lib/datapackage/validator.rb
123
+ - lib/datapackage/package.rb
124
+ - lib/datapackage/version.rb
125
+ - LICENSE.md
126
+ - README.md
127
+ homepage: http://github.com/theodi/datapackage.rb
128
+ licenses: []
129
+ post_install_message:
130
+ rdoc_options: []
131
+ require_paths:
132
+ - lib
133
+ required_ruby_version: !ruby/object:Gem::Requirement
134
+ none: false
135
+ requirements:
136
+ - - ! '>='
137
+ - !ruby/object:Gem::Version
138
+ version: '0'
139
+ required_rubygems_version: !ruby/object:Gem::Requirement
140
+ none: false
141
+ requirements:
142
+ - - ! '>='
143
+ - !ruby/object:Gem::Version
144
+ version: '0'
145
+ requirements: []
146
+ rubyforge_project:
147
+ rubygems_version: 1.8.23
148
+ signing_key:
149
+ specification_version: 3
150
+ summary: Library for working with data packages
151
+ test_files: []
152
+ has_rdoc: