datapackage 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/LICENSE.md ADDED
@@ -0,0 +1,20 @@
1
+ Copyright 2013 The Open Data Institute
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,184 @@
1
+ # DataPackage.rb
2
+
3
+ A ruby library for working with [Data Packages](http://dataprotocols.org/data-packages/).
4
+
5
+ [![Build Status](http://jenkins.theodi.org/job/datapackage.rb-master/badge/icon)](http://jenkins.theodi.org/job/datapackage.rb-master/)
6
+ [![Code Climate](https://codeclimate.com/github/theodi/datapackage.rb.png)](https://codeclimate.com/github/theodi/datapackage.rb)
7
+ [![Dependency Status](https://gemnasium.com/theodi/datapackage.rb.png)](https://gemnasium.com/theodi/datapackage.rb)
8
+
9
+ The library is intended to support:
10
+
11
+ * Parsing and using data package metadata and data
12
+ * Validating data packages to ensure they conform with the Data Package specification
13
+
14
+ ## Installation
15
+
16
+ Add the gem into your Gemfile:
17
+
18
+ gem 'datapackage.rb', :git => "git://github.com/theodi/datapackage.rb.git"
19
+
20
+ Note: gem release to come
21
+
22
+ ## Basic Usage
23
+
24
+ Require the gem, if you need to:
25
+
26
+ require 'datapackage.rb'
27
+
28
+ Parsing a datapackage from a remote location:
29
+
30
+ package = DataPackage::Package.new( "http://example.org/datasets/a" )
31
+
32
+ This assumes that `http://example.org/datasets/a/datapackage.json` exists. Alternatively, load a specific JSON file:
33
+
34
+ package = DataPackage::Package.new( "http://example.org/datasets/a/datapackage.json" )
35
+
36
+ Similarly you can load a package from a local JSON file, or specify a directory:
37
+
38
+ package = DataPackage::Package.new( "/my/data/package" )
39
+ package = DataPackage::Package.new( "/my/data/package/datapackage.json" )
40
+
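As a rough illustration of how a remote location might map onto its `datapackage.json`, here is a minimal sketch using only the standard library. `descriptor_url` is a hypothetical helper, not part of the gem. One detail worth knowing is that `URI.join` resolves relative to the last path segment, so the sketch adds a trailing slash before joining.

```ruby
require 'uri'

# Hypothetical helper (not part of the library): works out where the
# datapackage.json for a remote package location would be fetched from.
def descriptor_url(location)
  # A location that already names the descriptor is used unchanged.
  return location if location.end_with?("datapackage.json")
  # URI.join replaces the final path segment unless the base ends in "/",
  # so add a trailing slash to keep that segment in the result.
  base = location.end_with?("/") ? location : "#{location}/"
  URI.join(base, "datapackage.json").to_s
end

puts descriptor_url("http://example.org/datasets/a")
puts descriptor_url("http://example.org/datasets/a/datapackage.json")
```

Without the trailing slash, joining `datapackage.json` onto `http://example.org/datasets/a` would drop the `a` segment.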
41
+ There is a set of helper methods for accessing data from the package, e.g.:
42
+
43
+ package = DataPackage::Package.new( "/my/data/package" )
44
+ package.name
45
+ package.title
46
+ package.licenses
47
+ package.resources
48
+
49
+ These currently just return the raw JSON structure, but this might change in future.
50
+
51
+ ## Package Validation
52
+
53
+ The library supports validating packages. It can be used to validate both the metadata for the package (`datapackage.json`)
54
+ and the integrity of the package itself, e.g. whether the data files exist.
55
+
56
+ ### Validating a Package
57
+
58
+ Quickly checking the validity of a package can be achieved as follows:
59
+
60
+ package.valid?
61
+
62
+ To expose more detail on errors and warnings:
63
+
64
+ messages = package.validate() # or package.validate(:datapackage)
65
+
66
+ This returns an object with two keys: `:errors` and `:warnings`. These are arrays of messages.
67
+
68
+ Warnings might include notes on missing metadata elements (e.g. package `licenses`) which are not required by the DataPackage specification
69
+ but which SHOULD be included.
70
+
71
+ It is possible to treat all warnings as errors by performing strict validation:
72
+
73
+ package.valid?(:datapackage, true)
74
+
75
+ Warnings are currently generated for:
76
+
77
+ * Missing `README.md` files from packages
78
+ * Missing `licenses` key from `datapackage.json`
79
+ * Missing `datapackage_version` key from `datapackage.json`
80
+
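The relationship between the messages hash and strict validation can be sketched as follows. This is a standalone illustration, not the library's own code: `passes?` is a hypothetical helper and the hash is built by hand rather than returned by `package.validate`.

```ruby
# Hand-built stand-in for the hash returned by package.validate.
# The warning text matches the one the library emits for a missing licenses key.
messages = {
  :errors   => [],
  :warnings => ["The package does not include a 'licenses' property"]
}

# Hypothetical helper: errors always fail the check, while warnings
# only fail it when strict validation is requested.
def passes?(messages, strict = false)
  return messages[:errors].empty? unless strict
  messages[:errors].empty? && messages[:warnings].empty?
end

puts passes?(messages)        # lenient check ignores warnings
puts passes?(messages, true)  # strict check treats warnings as failures
```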
81
+ ### Selecting a Validation Profile
82
+
83
+ The library contains two validation classes, one for the core Data Package specification and the other for the Simple Data Format
84
+ rules. By default the library uses the more liberal Data Package rules.
85
+
86
+ The required profile can be specified in one of two ways. Either as a parameter to the validation methods:
87
+
88
+ package.valid?(:datapackage)
89
+ package.valid?(:simpledataformat)
90
+ package.validate(:datapackage)
91
+ package.validate(:simpledataformat)
92
+
93
+ Or, by using a `DataPackage::Validator` class:
94
+
95
+ validator = DataPackage::SimpleDataFormatValidator.new
96
+ validator.valid?( package )
97
+ validator.validate( package )
98
+
99
+ ### Approach to Validation
100
+
101
+ The library will support validating packages against the general rules specified in the
102
+ [DataPackage specification](http://dataprotocols.org/data-packages/) as well as the stricter requirements given in the
103
+ [Simple Data Format specification](http://dataprotocols.org/simple-data-format/) (SDF).
104
+
105
+ SDF is essentially a profile of DataPackage which includes some additional restrictions on
106
+ how data should be published and described. For example all data is to be published as CSV files.
107
+
108
+ The validation in the library is divided into two parts:
109
+
110
+ * Metadata validation -- checking that the structure of the `datapackage.json` file is correct
111
+ * Integrity checking -- checking that the overall package and data files appear to be in order
112
+
113
+ #### Metadata Validation
114
+
115
+ The basic structure of `datapackage.json` files is validated using [JSON Schema](http://json-schema.org/). This provides a simple
116
+ declarative way to describe the expected structure of the package metadata.
117
+
118
+ The schema files can be found in the `etc` directory of the project and could be used in other applications. The schema files can
119
+ be customised to support local validation checks (see below).
120
+
121
+ #### Integrity Checking
122
+
123
+ While the metadata for a package might be correct, there are other ways in which the package could be invalid. For example,
124
+ data files might be missing or incorrectly described.
125
+
126
+ The metadata validation is therefore supplemented with some custom code that performs some other checks:
127
+
128
+ * (Both profiles) All resources described in the package must be accessible, e.g. the local file exists or a URL responds successfully to a `HEAD`
129
+ * (`:simpledataformat`) All resources must be CSV files
130
+ * (`:simpledataformat`) All resources must have a valid JSON Table Schema
131
+ * (`:simpledataformat`) CSV `dialect` descriptions must be valid
132
+ * (`:simpledataformat`) All fields declared in the schema must be present in the CSV file
133
+ * (`:simpledataformat`) All fields present in the CSV file must be present in the schema
134
+
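The last two `:simpledataformat` checks amount to comparing two lists of names in both directions. A minimal sketch, assuming made-up sample data and reading the header row with the standard `csv` library:

```ruby
require 'csv'

# Made-up sample data for illustration only.
csv_text = "id,name,amount\n1,Widget,9.99\n"
schema = {
  "fields" => [ { "name" => "id" }, { "name" => "name" }, { "name" => "price" } ]
}

# Field names declared in the schema and column names found in the CSV header.
declared_fields = schema["fields"].map { |f| f["name"] }
headers = CSV.parse(csv_text).first

# Array difference gives the mismatches in each direction.
missing_fields = declared_fields - headers
undeclared_fields = headers - declared_fields

puts "Declared but not in CSV: #{missing_fields.join(",")}"
puts "In CSV but not declared: #{undeclared_fields.join(",")}"
```

Either list being non-empty would count as an error under the Simple Data Format profile.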
135
+ ### Customising the Validation Code
136
+
137
+ The library provides several extension points for customising the way that packages are validated.
138
+
139
+ #### Supplying Custom JSON Schemas
140
+
141
+ Custom JSON schemas can be provided to allow validation to be tweaked for local conventions. An options hash can be
142
+ provided to the constructor of a `DataPackage::Validator` object; this can be used to map schema names to custom
143
+ schemas.
144
+
145
+ (Any options passed to the constructor of a `DataPackage::Package` object will also be passed to its validator)
146
+
147
+ For example to create a new validation profile called `my-validation-rules` and then apply it:
148
+
149
+ opts = {
150
+ :schema => {
151
+ :"my-validation-rules" => "/path/to/json/schema.json"
152
+ }
153
+ }
154
+ package = DataPackage::Package.new( url, opts )
155
+ package.valid?(:"my-validation-rules")
156
+
157
+ This will cause the code to create a custom `DataPackage::Validator` instance that will apply the supplied schema. This class
158
+ does not provide any integrity checks.
159
+
160
+ To mix a custom schema with the existing integrity checking, you must manually create a `Validator` instance. E.g:
161
+
162
+ opts = {
163
+ :schema => {
164
+ :"my-validation-rules" => "/path/to/json/schema.json"
165
+ }
166
+ }
167
+ validator = DataPackage::SimpleDataFormatValidator.new(:"my-validation-rules", opts)
168
+ validator.valid?( package )
169
+
170
+ Custom schemas must be valid JSON files that conform to the JSON Schema v4 specification. The absolute path to the schema file must be
171
+ provided.
172
+
173
+ Validation is performed using the [json-schema](https://github.com/hoxworth/json-schema) gem which has some documented restrictions.
174
+
175
+ The built-in schema files can also be overridden in this way, e.g. by specifying an alternate location for the `:datapackage` schema.
176
+
177
+ #### Custom Integrity Checking
178
+
179
+ Integrity checking can be customized by creating new sub-classes of `DataPackage::Validator` or one of the existing sub-classes.
180
+
181
+ The following methods can be implemented:
182
+
183
+ * `validate_metadata(package, messages)` -- perform additional metadata checking after the JSON Schema has been applied.
184
+ * `validate_resource(package, resource, messages)` -- called for each resource in the package
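The hook pattern described above can be sketched without the gem installed. `SketchValidator` below is a simplified stand-in for `DataPackage::Validator` (the real base class also applies a JSON Schema first and receives a `Package` object rather than a plain Hash), and `NamedResourceValidator` is a hypothetical local rule.

```ruby
# Simplified stand-in for DataPackage::Validator so the sketch runs on its own.
class SketchValidator
  def validate(package)
    messages = { :errors => [], :warnings => [] }
    validate_metadata(package, messages)
    package["resources"].each do |resource|
      validate_resource(package, resource, messages)
    end
    messages
  end

  protected

  # Hooks for subclasses; the defaults do nothing.
  def validate_metadata(package, messages); end
  def validate_resource(package, resource, messages); end
end

# Hypothetical local rule: warn about any resource without a name.
class NamedResourceValidator < SketchValidator
  protected

  def validate_resource(package, resource, messages)
    unless resource["name"]
      messages[:warnings] << "Resource #{resource["path"]} has no name"
    end
  end
end

# A plain Hash stands in for a parsed package here.
package = { "name" => "demo", "resources" => [ { "path" => "data.csv" } ] }
messages = NamedResourceValidator.new.validate(package)
puts messages[:warnings]
```

With the real library, the subclass would extend `DataPackage::Validator` (or one of its sub-classes) and append to the same `messages` hash.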
data/etc/README.md ADDED
@@ -0,0 +1,18 @@
1
+ This directory contains some JSON Schema documents for validating:
2
+
3
+ * `datapackage-schema.json` -- [datapackage.json](http://dataprotocols.org/data-packages/) package files
4
+ * `jsontable-schema.json` -- [JSON Table Schemas](http://dataprotocols.org/json-table-schema/) objects
5
+ * `csvddf-dialect-schema.json` -- [CSV Dialect Description Format](http://dataprotocols.org/csv-dialect/) dialect objects
6
+
7
+ The JSON Table Schemas and CSV Dialect Description Format both define JSON object structures that can appear in `datapackage.json` files (via the `schema` and `dialect` keywords). In the main `datapackage-schema.json` object, these keywords are only validated as simple objects.
8
+
9
+ In the application the subsidiary schemas are automatically applied to relevant keys. This could be improved by using JSON Schema cross-referencing.
10
+
11
+ Other potential improvements include:
12
+
13
+ * Add `data` keyword validation to `datapackage-schema.json`
14
+ * Add `format` keywords for validating email addresses and date/date-times
15
+ * Or, add `pattern` for validating dates
16
+ * Improve the regexes used in various places
17
+
18
+
data/etc/csvddf-dialect-schema.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "$schema": "http://json-schema.org/draft-04/schema#",
3
+ "title": "CSVDDF",
4
+ "description": "JSON Schema for validating CSVDDF dialect structures",
5
+ "type": "object",
6
+ "properties": {
7
+ "delimiter": {
8
+ "type": "string"
9
+ },
10
+ "doublequote": {
11
+ "type": "boolean"
12
+ },
13
+ "lineterminator": {
14
+ "type": "string"
15
+ },
16
+ "quotechar": {
17
+ "type": "string"
18
+ },
19
+ "skipinitialspace": {
20
+ "type": "boolean"
21
+ }
22
+ },
23
+ "required": [ "delimiter", "doublequote", "lineterminator", "quotechar", "skipinitialspace" ]
24
+ }
data/etc/datapackage-schema.json ADDED
@@ -0,0 +1,208 @@
1
+ {
2
+ "$schema": "http://json-schema.org/draft-04/schema#",
3
+ "title": "DataPackage",
4
+ "description": "JSON Schema for validating datapackage.json files",
5
+ "type": "object",
6
+ "properties": {
7
+ "name": {
8
+ "type": "string",
9
+ "pattern": "^([a-z\\.\\_\\-])+$"
10
+ },
11
+ "licenses": {
12
+ "type": "array",
13
+ "items": {
14
+ "type": "object",
15
+ "properties": {
16
+ "id": { "type": "string" },
17
+ "url": { "type": "string" }
18
+ },
19
+ "anyOf": [
20
+ { "title": "id required", "required": ["id"] },
21
+ { "title": "url required", "required": ["url"] }
22
+ ]
23
+ }
24
+ },
25
+ "datapackage_version": {
26
+ "type": "string"
27
+ },
28
+ "title": {
29
+ "type": "string"
30
+ },
31
+ "description": {
32
+ "type": "string"
33
+ },
34
+ "homepage": {
35
+ "type": "string"
36
+ },
37
+ "version": {
38
+ "type": "string"
39
+ },
40
+ "sources": {
41
+ "type": "array",
42
+ "items": {
43
+ "type": "object",
44
+ "properties": {
45
+ "name": { "type": "string" },
46
+ "web": { "type": "string" },
47
+ "email": { "type": "string" }
48
+ },
49
+ "anyOf": [
50
+ { "title": "name required", "required": ["name"] },
51
+ { "title": "web required", "required": ["web"] },
52
+ { "title": "email required", "required": ["email"] }
53
+ ]
54
+ }
55
+ },
56
+ "keywords": {
57
+ "type": "array",
58
+ "items": {
59
+ "type": "string"
60
+ }
61
+ },
62
+ "last_modified": {
63
+ "type": "string"
64
+ },
65
+ "image": {
66
+ "type": "string"
67
+ },
68
+ "bugs": {
69
+ "type": "string"
70
+ },
71
+ "maintainers": {
72
+ "type": "array",
73
+ "items": {
74
+ "type": "object",
75
+ "properties": {
76
+ "name": {
77
+ "type": "string"
78
+ },
79
+ "email": {
80
+ "type": "string"
81
+ },
82
+ "web": {
83
+ "type": "string"
84
+ }
85
+ },
86
+ "required": ["name"]
87
+ }
88
+ },
89
+ "contributors": {
90
+ "type": "array",
91
+ "items": {
92
+ "type": "object",
93
+ "properties": {
94
+ "name": {
95
+ "type": "string"
96
+ },
97
+ "email": {
98
+ "type": "string"
99
+ },
100
+ "web": {
101
+ "type": "string"
102
+ }
103
+ },
104
+ "required": ["name"]
105
+ }
106
+ },
107
+ "publisher": {
108
+ "type": "array",
109
+ "items": {
110
+ "type": "object",
111
+ "properties": {
112
+ "name": {
113
+ "type": "string"
114
+ },
115
+ "email": {
116
+ "type": "string"
117
+ },
118
+ "web": {
119
+ "type": "string"
120
+ }
121
+ },
122
+ "required": ["name"]
123
+ }
124
+ },
125
+ "dependencies": {
126
+ "type": "object"
127
+ },
128
+ "resources": {
129
+ "type": "array",
130
+ "minItems": 1,
131
+ "items": {
132
+ "type": "object",
133
+ "properties": {
134
+ "url": {
135
+ "type": "string"
136
+ },
137
+ "path": {
138
+ "type": "string"
139
+ },
140
+ "name": {
141
+ "type": "string"
142
+ },
143
+ "format": {
144
+ "type": "string"
145
+ },
146
+ "mediatype": {
147
+ "type": "string",
148
+ "pattern": "^(.+)/(.+)$"
149
+ },
150
+ "encoding": {
151
+ "type": "string"
152
+ },
153
+ "bytes": {
154
+ "type": "integer"
155
+ },
156
+ "hash": {
157
+ "type": "string",
158
+ "pattern": "^([a-fA-F0-9]{32})$"
159
+ },
160
+ "modified": {
161
+ "type": "string"
162
+ },
163
+ "schema": {
164
+ "type": "object"
165
+ },
166
+ "dialect": {
167
+ "type": "object"
168
+ },
169
+ "sources": {
170
+ "type": "array",
171
+ "items": {
172
+ "type": "object",
173
+ "properties": {
174
+ "name": { "type": "string" },
175
+ "web": { "type": "string" },
176
+ "email": { "type": "string" }
177
+ },
178
+ "anyOf": [
179
+ { "title": "name required", "required": ["name"] },
180
+ { "title": "web required", "required": ["web"] },
181
+ { "title": "email required", "required": ["email"] }
182
+ ]
183
+ }
184
+ },
185
+ "licenses": {
186
+ "type": "array",
187
+ "items": {
188
+ "type": "object",
189
+ "properties": {
190
+ "id": { "type": "string" },
191
+ "url": { "type": "string" }
192
+ },
193
+ "anyOf": [
194
+ { "title": "id required", "required": ["id"] },
195
+ { "title": "url required", "required": ["url"] }
196
+ ]
197
+ }
198
+ }
199
+ },
200
+ "anyOf": [
201
+ { "title": "url required", "required": ["url"] },
202
+ { "title": "path required", "required": ["path"] }
203
+ ]
204
+ }
205
+ }
206
+ },
207
+ "required": ["name", "resources"]
208
+ }
data/etc/jsontable-schema.json ADDED
@@ -0,0 +1,34 @@
1
+ {
2
+ "$schema": "http://json-schema.org/draft-04/schema#",
3
+ "title": "JSON Table Schema",
4
+ "description": "JSON Schema for validating JSON Table structures",
5
+ "type": "object",
6
+ "properties": {
7
+ "fields": {
8
+ "type": "array",
9
+ "minItems": 1,
10
+ "items": {
11
+ "type": "object",
12
+ "properties": {
13
+ "name": {
14
+ "type": "string"
15
+ },
16
+ "title": {
17
+ "type": "string"
18
+ },
19
+ "description": {
20
+ "type": "string"
21
+ },
22
+ "type": {
23
+ "enum": [ "string", "number", "integer", "date", "time", "datetime", "boolean", "binary", "object", "geopoint", "geojson", "array", "any" ]
24
+ },
25
+ "format": {
26
+ "type": "string"
27
+ }
28
+ },
29
+ "required": ["name"]
30
+ }
31
+ }
32
+ },
33
+ "required": ["fields"]
34
+ }
data/lib/datapackage/package.rb ADDED
@@ -0,0 +1,160 @@
1
+ require 'open-uri'
2
+
3
+ module DataPackage
4
+
5
+ class Package
6
+
7
+ attr_reader :metadata, :opts
8
+
9
+ # Parse a data package
10
+ #
11
+ # Supports reading data from JSON file, directory, and a URL
12
+ #
13
+ # package:: Hash or a String
14
+ # opts:: Options used to customize reading and parsing
15
+ def initialize(package, opts={})
16
+ @opts = opts
17
+ #TODO base directory/url
18
+ if package.class == Hash
19
+ @metadata = package
20
+ else
21
+ if !package.start_with?("http") && File.directory?(package)
22
+ package = File.join(package, opts[:default_filename] || "datapackage.json")
23
+ end
24
+ if package.start_with?("http") && !package.end_with?("datapackage.json")
25
+ package = URI.join(package.end_with?("/") ? package : "#{package}/", "datapackage.json")
26
+ end
27
+ @location = package.to_s
28
+ @metadata = JSON.parse( open(package).read )
29
+ end
30
+ end
31
+
32
+ #Returns the directory for a local file package or base url for a remote
33
+ #Returns an empty string for an in-memory object (because it has no base as yet)
34
+ def base
35
+ #user can override base
36
+ return @opts[:base] if @opts[:base]
37
+ return "" unless @location
38
+ #work out base directory or uri
39
+ if local?
40
+ return File.dirname( @location )
41
+ else
42
+ return @location.split("/")[0..-2].join("/")
43
+ end
44
+ end
45
+
46
+ #Is this a local package? Returns true if created from an in-memory object or a file/directory reference
47
+ def local?
48
+ return !@location.start_with?("http") if @location
49
+ return true
50
+ end
51
+
52
+ def name
53
+ @metadata["name"]
54
+ end
55
+
56
+ def title
57
+ @metadata["title"]
58
+ end
59
+
60
+ def description
61
+ @metadata["description"]
62
+ end
63
+
64
+ def homepage
65
+ @metadata["homepage"]
66
+ end
67
+
68
+ def licenses
69
+ @metadata["licenses"] || []
70
+ end
71
+ alias_method :licences, :licenses
72
+
73
+ #What version of datapackage specification is this using?
74
+ def datapackage_version
75
+ @metadata["datapackage_version"]
76
+ end
77
+
78
+ #What is the version of this specific data package?
79
+ def version
80
+ @metadata["version"]
81
+ end
82
+
83
+ def sources
84
+ @metadata["sources"] || []
85
+ end
86
+
87
+ def keywords
88
+ @metadata["keywords"] || []
89
+ end
90
+
91
+ def last_modified
92
+ DateTime.parse @metadata["last_modified"] rescue nil
93
+ end
94
+
95
+ def image
96
+ @metadata["image"]
97
+ end
98
+
99
+ def maintainers
100
+ @metadata["maintainers"] || []
101
+ end
102
+
103
+ def contributors
104
+ @metadata["contributors"] || []
105
+ end
106
+
107
+ def publisher
108
+ @metadata["publisher"] || []
109
+ end
110
+
111
+ def resources
112
+ @metadata["resources"] || []
113
+ end
114
+
115
+ def dependencies
116
+ @metadata["dependencies"]
117
+ end
118
+
119
+ def property(property, default=nil)
120
+ @metadata[property] || default
121
+ end
122
+
123
+ def valid?(profile=:datapackage, strict=false)
124
+ validator = DataPackage::Validator.create(profile, @opts)
125
+ return validator.valid?(self, strict)
126
+ end
127
+
128
+ def validate(profile=:datapackage)
129
+ validator = DataPackage::Validator.create(profile, @opts)
130
+ return validator.validate(self)
131
+ end
132
+
133
+ def resolve_resource(resource)
134
+ return resource["url"] || resolve( resource["path"] )
135
+ end
136
+
137
+ def resolve(path)
138
+ if local?
139
+ return File.join( base , path) if base != ""
140
+ return path
141
+ else
142
+ return URI.join(base, path)
143
+ end
144
+ end
145
+
146
+ def resource_exists?(location)
147
+ if !location.to_s.start_with?("http")
148
+ return File.exist?( location )
149
+ else
150
+ begin
151
+ status = RestClient.head( location ).code
152
+ return status == 200
153
+ rescue => e
154
+ return false
155
+ end
156
+ end
157
+ end
158
+
159
+ end
160
+ end
data/lib/datapackage/validator.rb ADDED
@@ -0,0 +1,176 @@
1
+ module DataPackage
2
+
3
+ #Base class for validators
4
+ class Validator
5
+
6
+ def Validator.create(profile, opts={})
7
+ if profile == :simpledataformat
8
+ return SimpleDataFormatValidator.new(profile, opts)
9
+ end
10
+ if profile == :datapackage
11
+ return DataPackageValidator.new(profile, opts)
12
+ end
13
+ return Validator.new(profile, opts)
14
+ end
15
+
16
+ def initialize(schema_name, opts={})
17
+ @schema_name = schema_name
18
+ @opts = opts
19
+ end
20
+
21
+ def valid?(package, strict=false)
22
+ messages = validate( package )
23
+ return messages[:errors].empty? if !strict
24
+ return messages[:errors].empty? && messages[:warnings].empty?
25
+ end
26
+
27
+ def validate( package )
28
+ return validate_integrity( package, validate_with_schema(package) )
29
+ end
30
+
31
+ def validate_with_schema(package)
32
+ schema = load_schema(@schema_name)
33
+ messages = {
34
+ :errors => JSON::Validator.fully_validate(schema, package.metadata, :errors_as_objects => true),
35
+ :warnings => []
36
+ }
37
+ validate_metadata(package, messages)
38
+ return messages
39
+ end
40
+
41
+ def validate_integrity(package, messages={ :errors=>[], :warnings=>[] } )
42
+ package.resources.each do |resource|
43
+ validate_resource(package, resource, messages)
44
+ end
45
+
46
+ messages
47
+ end
48
+
49
+ protected
50
+
51
+ #implement to perform additional validation on metadata
52
+ def validate_metadata(package, messages)
53
+ end
54
+
55
+ #implement for per-resource validation
56
+ def validate_resource(package, resource, messages)
57
+ end
58
+
59
+ protected
60
+
61
+ def load_schema(profile)
62
+ if @opts[:schema] && @opts[:schema][profile]
63
+ if !File.exist?( @opts[:schema][profile] )
64
+ raise "User supplied schema file does not exist: #{@opts[:schema][profile]}"
65
+ end
66
+ return JSON.parse( File.read( @opts[:schema][profile] ) )
67
+ end
68
+ schema_file = file_in_etc_directory( "#{profile}-schema.json" )
69
+ if !File.exist?( schema_file )
70
+ raise "Unable to read schema file #{schema_file} for validation profile #{profile}"
71
+ end
72
+ return JSON.parse( File.read( schema_file ) )
73
+ end
74
+
75
+ private
76
+
77
+ def file_in_etc_directory(filename)
78
+ File.join( File.dirname(__FILE__), "..", "..", "etc", filename )
79
+ end
80
+
81
+ end
82
+
83
+ #Extends base class with some additional checks for DataPackage conformance.
84
+ #
85
+ #These include some warnings about missing metadata elements and an existence
86
+ #check for all resources
87
+ class DataPackageValidator < Validator
88
+ def initialize(schema_name=:datapackage, opts={})
89
+ super(:datapackage, opts)
90
+ end
91
+
92
+ def validate_metadata(package, messages)
93
+ #not required, but recommended
94
+ prefix = "The package does not include a"
95
+ messages[:warnings] << "#{prefix} 'licenses' property" if package.licenses.empty?
96
+ messages[:warnings] << "#{prefix} 'datapackage_version' property" unless package.datapackage_version
97
+ messages[:warnings] << "#{prefix} README.md file" unless package.resource_exists?( package.resolve("README.md") )
98
+ end
99
+
100
+ def validate_resource(package, resource, messages)
101
+ if !package.resource_exists?( package.resolve_resource( resource ) )
102
+ messages[:errors] << "Resource #{resource["url"] || resource["path"]} does not exist"
103
+ end
104
+ end
105
+
106
+ end
107
+
108
+ #Validator that checks whether a package conforms to the Simple Data Format profile
109
+ class SimpleDataFormatValidator < DataPackageValidator
110
+
111
+ def initialize(schema_name=:datapackage, opts={})
112
+ super(:datapackage, opts)
113
+ @jsontable_schema = load_schema(:jsontable)
114
+ @csvddf_schema = load_schema("csvddf-dialect")
115
+ end
116
+
117
+ def validate_resource(package, resource, messages)
118
+ super(package, resource, messages)
119
+
120
+ if !csv?(resource)
121
+ messages[:errors] << "#{resource["name"]} is not a CSV file"
122
+ else
123
+ if !resource["schema"]
124
+ messages[:errors] << "#{resource["name"]} does not have a schema"
125
+ else
126
+ messages[:errors] +=
127
+ JSON::Validator.fully_validate(@jsontable_schema,
128
+ resource["schema"], :errors_as_objects => true)
129
+ end
130
+ if resource["dialect"]
131
+ messages[:errors] +=
132
+ JSON::Validator.fully_validate(@csvddf_schema,
133
+ resource["dialect"], :errors_as_objects => true)
134
+ end
135
+
136
+ if resource["schema"] && resource["schema"]["fields"]
137
+ fields = resource["schema"]["fields"]
138
+ declared_fields = fields.map{ |f| f["name"] }
139
+ headers = headers(package, resource)
140
+
141
+ #use set algebra to find fields missing from the schema and/or the CSV file
142
+ missing_fields = declared_fields - headers
143
+ if missing_fields != []
144
+ messages[:errors] <<
145
+ "Declared schema has fields not present in CSV file (#{missing_fields.join(",")})"
146
+ end
147
+ undeclared_fields = headers - declared_fields
148
+ if undeclared_fields != []
149
+ messages[:errors] << "CSV file has fields missing from schema (#{undeclared_fields.join(",")})"
150
+ end
151
+ end
152
+
153
+ end
154
+
155
+ end
156
+
157
+ def csv?(resource)
158
+ resource["mediatype"] == "text/csv" ||
159
+ resource["format"] == "csv"
160
+ end
161
+
162
+ def headers(package, resource)
163
+ headers = []
164
+ opts = dialect_to_csv_options(resource["dialect"])
165
+ CSV.open( package.resolve_resource(resource), "r", opts) do |csv|
166
+ headers = csv.shift
167
+ end
168
+ return headers
169
+ end
170
+
171
+ def dialect_to_csv_options(dialect)
172
+ return {}
173
+ end
174
+ end
175
+
176
+ end
data/lib/datapackage/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module DataPackage
2
+ VERSION = "0.0.1"
3
+ end
data/lib/datapackage.rb ADDED
@@ -0,0 +1,11 @@
1
+ require 'date'
2
+ require 'uri'
3
+ require 'net/http'
4
+ require 'csv'
5
+ require 'json'
6
+ require 'json-schema'
7
+ require 'rest-client'
8
+
9
+ require 'datapackage/version'
10
+ require 'datapackage/validator'
11
+ require 'datapackage/package'
metadata ADDED
@@ -0,0 +1,152 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: datapackage
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ prerelease:
6
+ platform: ruby
7
+ authors:
8
+ - Leigh Dodds
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2013-12-05 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: json
16
+ requirement: !ruby/object:Gem::Requirement
17
+ none: false
18
+ requirements:
19
+ - - ! '>='
20
+ - !ruby/object:Gem::Version
21
+ version: '0'
22
+ type: :runtime
23
+ prerelease: false
24
+ version_requirements: !ruby/object:Gem::Requirement
25
+ none: false
26
+ requirements:
27
+ - - ! '>='
28
+ - !ruby/object:Gem::Version
29
+ version: '0'
30
+ - !ruby/object:Gem::Dependency
31
+ name: json-schema
32
+ requirement: !ruby/object:Gem::Requirement
33
+ none: false
34
+ requirements:
35
+ - - ! '>='
36
+ - !ruby/object:Gem::Version
37
+ version: '0'
38
+ type: :runtime
39
+ prerelease: false
40
+ version_requirements: !ruby/object:Gem::Requirement
41
+ none: false
42
+ requirements:
43
+ - - ! '>='
44
+ - !ruby/object:Gem::Version
45
+ version: '0'
46
+ - !ruby/object:Gem::Dependency
47
+ name: rest-client
48
+ requirement: !ruby/object:Gem::Requirement
49
+ none: false
50
+ requirements:
51
+ - - ! '>='
52
+ - !ruby/object:Gem::Version
53
+ version: '0'
54
+ type: :runtime
55
+ prerelease: false
56
+ version_requirements: !ruby/object:Gem::Requirement
57
+ none: false
58
+ requirements:
59
+ - - ! '>='
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ - !ruby/object:Gem::Dependency
63
+ name: rspec
64
+ requirement: !ruby/object:Gem::Requirement
65
+ none: false
66
+ requirements:
67
+ - - ! '>='
68
+ - !ruby/object:Gem::Version
69
+ version: '0'
70
+ type: :development
71
+ prerelease: false
72
+ version_requirements: !ruby/object:Gem::Requirement
73
+ none: false
74
+ requirements:
75
+ - - ! '>='
76
+ - !ruby/object:Gem::Version
77
+ version: '0'
78
+ - !ruby/object:Gem::Dependency
79
+ name: simplecov-rcov
80
+ requirement: !ruby/object:Gem::Requirement
81
+ none: false
82
+ requirements:
83
+ - - ! '>='
84
+ - !ruby/object:Gem::Version
85
+ version: '0'
86
+ type: :development
87
+ prerelease: false
88
+ version_requirements: !ruby/object:Gem::Requirement
89
+ none: false
90
+ requirements:
91
+ - - ! '>='
92
+ - !ruby/object:Gem::Version
93
+ version: '0'
94
+ - !ruby/object:Gem::Dependency
95
+ name: fakeweb
96
+ requirement: !ruby/object:Gem::Requirement
97
+ none: false
98
+ requirements:
99
+ - - ~>
100
+ - !ruby/object:Gem::Version
101
+ version: '1.3'
102
+ type: :development
103
+ prerelease: false
104
+ version_requirements: !ruby/object:Gem::Requirement
105
+ none: false
106
+ requirements:
107
+ - - ~>
108
+ - !ruby/object:Gem::Version
109
+ version: '1.3'
110
+ description:
111
+ email:
112
+ - leigh@ldodds.com
113
+ executables: []
114
+ extensions: []
115
+ extra_rdoc_files: []
116
+ files:
117
+ - etc/jsontable-schema.json
118
+ - etc/datapackage-schema.json
119
+ - etc/README.md
120
+ - etc/csvddf-dialect-schema.json
121
+ - lib/datapackage.rb
122
+ - lib/datapackage/validator.rb
123
+ - lib/datapackage/package.rb
124
+ - lib/datapackage/version.rb
125
+ - LICENSE.md
126
+ - README.md
127
+ homepage: http://github.com/theodi/datapackage.rb
128
+ licenses: []
129
+ post_install_message:
130
+ rdoc_options: []
131
+ require_paths:
132
+ - lib
133
+ required_ruby_version: !ruby/object:Gem::Requirement
134
+ none: false
135
+ requirements:
136
+ - - ! '>='
137
+ - !ruby/object:Gem::Version
138
+ version: '0'
139
+ required_rubygems_version: !ruby/object:Gem::Requirement
140
+ none: false
141
+ requirements:
142
+ - - ! '>='
143
+ - !ruby/object:Gem::Version
144
+ version: '0'
145
+ requirements: []
146
+ rubyforge_project:
147
+ rubygems_version: 1.8.23
148
+ signing_key:
149
+ specification_version: 3
150
+ summary: Library for working with data packages
151
+ test_files: []
152
+ has_rdoc: