datapackage 0.0.4 → 0.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 577f0cccc791f7f81d19b087038ec6061dcee12a
4
- data.tar.gz: 9cac27e6e3eaa0be69575afd50a7193e5a7fb23c
3
+ metadata.gz: 0dbfa361366c6830bb4624347821cc07c754d903
4
+ data.tar.gz: 4fc8b53abca06e6885d686cde540a21ad591aa49
5
5
  SHA512:
6
- metadata.gz: ff2eaf11c9605f6fc20019b285747f952de6563da74e6fc2f5d0eb6d3f650fc32d1d40f44e44776a3f17f56152fa2479d6ff5a37d3f253eae4420144054db829
7
- data.tar.gz: e83f013ca8d5be9c002b43abda33c89d906f9162722a4044267c652e2e96d3d5612d39c5727cc094941fc97e3f62e1f53862bd8d0499975a3d837dac45ab6f90
6
+ metadata.gz: 53d75e66e0b49d2a92350d826f8e33e371f76b162fd8c94ecccf566064aa3e0e5c6f98aa2ec89deafa7126a359d804383df03163e91035721f7f4c22efcbcda9
7
+ data.tar.gz: 4e4b96bef09d7e46f06e7eac048ebacf45aebf3bcb50a6b9f44bf1ff578e7153fc49e5cb64239b76d84a6c936128c5e8394d2f50c67de25203565f0d997cc7df
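The `checksums.yaml` hunk above records a SHA1 and a SHA512 digest for each of `metadata.gz` and `data.tar.gz`. As a minimal sketch of how such digests could be recomputed and compared, the following uses Ruby's standard `Digest` library. The helper names `gem_checksums` and `checksum_matches?` are hypothetical, and the sketch works on a string of bytes rather than unpacking a real `.gem` archive.

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Hypothetical helper: compute the two digests that checksums.yaml records
# for a blob of bytes (for example, the contents of data.tar.gz).
def gem_checksums(bytes)
  {
    'SHA1'   => Digest::SHA1.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes)
  }
end

# Hypothetical helper: compare a recomputed digest with an expected hex string.
def checksum_matches?(bytes, algorithm, expected_hex)
  gem_checksums(bytes).fetch(algorithm) == expected_hex.strip.downcase
end

sums = gem_checksums('abc')
puts sums['SHA1']
puts sums['SHA512'].length
```

In practice the bytes would come from something like `File.binread` on the extracted archive member, and the expected value from the matching entry in `checksums.yaml`.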
data/LICENSE.md CHANGED
@@ -1,4 +1,6 @@
1
- Copyright 2013 The Open Data Institute
1
+ ##Copyright (c) 2016 The Open Data Institute
2
+
3
+ #MIT License
2
4
 
3
5
  Permission is hereby granted, free of charge, to any person obtaining
4
6
  a copy of this software and associated documentation files (the
@@ -17,4 +19,4 @@ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
19
  NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
20
  LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
21
  OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md CHANGED
@@ -1,11 +1,14 @@
1
+ [![Build Status](http://img.shields.io/travis/theodi/datapackage.rb.svg?style=flat-square)](https://travis-ci.org/theodi/datapackage.rb)
2
+ [![Dependency Status](http://img.shields.io/gemnasium/theodi/datapackage.rb.svg?style=flat-square)](https://gemnasium.com/theodi/datapackage.rb)
3
+ [![Coverage Status](http://img.shields.io/coveralls/theodi/datapackage.rb.svg?style=flat-square)](https://coveralls.io/r/theodi/datapackage.rb)
4
+ [![Code Climate](http://img.shields.io/codeclimate/github/theodi/datapackage.rb.svg?style=flat-square)](https://codeclimate.com/github/theodi/datapackage.rb)
5
+ [![Gem Version](http://img.shields.io/gem/v/datapackage.svg?style=flat-square)](https://rubygems.org/gems/datapackage)
6
+ [![License](http://img.shields.io/:license-mit-blue.svg?style=flat-square)](http://theodi.mit-license.org)
7
+
1
8
  # DataPackage.rb
2
9
 
3
10
  A Ruby library for working with [Data Packages](http://dataprotocols.org/data-packages/).
4
11
 
5
- [![Build Status](http://jenkins.theodi.org/job/datapackage.rb-master/badge/icon)](http://jenkins.theodi.org/job/datapackage.rb-master/)
6
- [![Code Climate](https://codeclimate.com/github/theodi/datapackage.rb.png)](https://codeclimate.com/github/theodi/datapackage.rb)
7
- [![Dependency Status](https://gemnasium.com/theodi/datapackage.rb.png)](https://gemnasium.com/theodi/datapackage.rb)
8
-
9
12
  The library is intended to support:
10
13
 
11
14
  * Parsing and using data package metadata and data
@@ -15,179 +18,115 @@ The library is intending to support:
15
18
 
16
19
  Add the gem into your Gemfile:
17
20
 
18
- gem 'datapackage.rb', :git => "git://github.com/theodi/datapackage.rb.git"
21
+ ```
22
+ gem 'datapackage'
23
+ ```
19
24
 
20
25
  Or:
21
26
 
22
- sudo gem install datapackage
27
+ ```
28
+ gem install datapackage
29
+ ```
23
30
 
24
- ## Basic Usage
31
+ ## Reading a Data Package
25
32
 
26
33
  Require the gem, if you need to:
27
34
 
28
- require 'datapackage.rb'
29
-
30
- Parsing a datapackage from a remote location:
31
-
32
- package = DataPackage::Package.new( "http://example.org/datasets/a" )
33
-
34
- This assumes that `http://example.org/datasets/a/datapackage.json` exists, or specifically load a JSON file:
35
-
36
- package = DataPackage::Package.new( "http://example.org/datasets/a/datapackage.json" )
37
-
38
- Similarly you can load a package from a local JSON file, or specify a directory:
39
-
40
- package = DataPackage::Package.new( "/my/data/package" )
41
- package = DataPackage::Package.new( "/my/data/package/datapackage.json" )
42
-
43
- There are a set of helper methods for accessing data from the package, e.g:
44
-
45
- package = DataPackage::Package.new( "/my/data/package" )
46
- package.name
47
- package.title
48
- package.licenses
49
- package.resources
50
-
51
- These currently just return the raw JSON structure, but this might change in future.
52
-
53
- ## Package Validation
54
-
55
- The library supports validating packages. It can be used to validate both the metadata for the package (`datapackage.json`)
56
- and the integrity of the package itself, e.g. whether the data files exist.
57
-
58
- ### Validating a Package
59
-
60
- Quickly checking the validity of a package can be achieved as follows:
35
+ ```ruby
36
+ require 'datapackage.rb'
37
+ ```
61
38
 
62
- package.valid?
63
-
64
- To expose more detail on errors and warnings:
39
+ Parsing a Data Package from a remote location:
65
40
 
66
- messages = package.validate() # or package.validate(:datapackage)
41
+ ```ruby
42
+ package = DataPackage::Package.new( "http://example.org/datasets/a" )
43
+ ```
67
44
 
68
- This returns an object with two keys: `:errors` and `:warnings`. The values of these keys are arrays of message objects.
69
- Message objects are formatted as follows:
70
-
71
- {
72
- :type => :metadata|:integrity,
73
- :message => "message for user",
74
- :fragment => "/path/to/responsible/element"
75
- }
76
-
77
- It is possible to treat all warnings as errors by performing strict validation:
78
-
79
- package.valid?(true)
80
-
81
- Examples of warnings might include notes on missing metadata elements (e.g. package `licenses`) which are not required by the
82
- DataPackage specification but which SHOULD be included.
83
-
84
- Warnings are currently generated for:
85
-
86
- * Missing `README.md` files from packages
87
- * Missing `licenses` key from `datapackage.json`
88
- * Missing `datapackage_version` key from `datapackage.json`
89
-
90
- ### Selecting a Validation Profile
91
-
92
- The library contains two validation classes, one for the core Data Package specification and the other for the Simple Data Format
93
- rules. By default the library uses the more liberal Data Package rules.
94
-
95
- The required profile can be specified in one of two ways. Either as a parameter to the validation methods:
96
-
97
- package.valid?(:datapackage)
98
- package.valid?(:simpledataformat)
99
- package.validate(:datapackage)
100
- package.validate(:simpledataformat)
101
-
102
- Or, by using a `DataPackage::Validation` class:
45
+ This assumes that `http://example.org/datasets/a/datapackage.json` exists. Alternatively, load a specific JSON file:
103
46
 
104
- validator = DataPackage::SimpleDataFormatValidator.new
105
- validator.valid?( package )
106
- validator.validate( package )
47
+ ```ruby
48
+ package = DataPackage::Package.new( "http://example.org/datasets/a/datapackage.json" )
49
+ ```
107
50
 
108
- ### Approach to Validation
51
+ Similarly you can load a package from a local JSON file, or specify a directory:
109
52
 
110
- The library will support validating packages against the general rules specified in the
111
- [DataPackage specification](http://dataprotocols.org/data-packages/) as well as the stricter requirements given in the
112
- [Simple Data Format specification](http://dataprotocols.org/simple-data-format/) (SDF).
113
-
114
- SDF is essentially a profile of DataPackage which includes some additional restrictions on
115
- how data should be published and described. For example all data is to be published as CSV files.
53
+ ```ruby
54
+ package = DataPackage::Package.new( "/my/data/package" )
55
+ package = DataPackage::Package.new( "/my/data/package/datapackage.json" )
56
+ ```
116
57
 
117
- The validation in the library is divided into two parts:
58
+ There is a set of helper methods for accessing data from the package, e.g.:
118
59
 
119
- * Metadata validation -- checking that the structure of the `datapackage.json` file is correct
120
- * Integrity checking -- checking that the overall package and data files appear to be in order
60
+ ```ruby
61
+ package = DataPackage::Package.new( "/my/data/package" )
62
+ package.name
63
+ package.title
64
+ package.description
65
+ package.homepage
66
+ package.license
67
+ ```
121
68
 
122
- #### Metadata Validation
69
+ ## Creating a Data Package
123
70
 
124
- The basic structure of `datapackage.json` files are validated using [JSON Schema](http://json-schema.org/). This provides a simple
125
- declarative way to describe the expected structure of the package metadata.
71
+ ```ruby
72
+ package = DataPackage::Package.new
126
73
 
127
- The schema files can be found in the `etc` directory of the project and could be used in other applications. The schema files can
128
- be customised to support local validation checks (see below).
74
+ package.name = 'my_sleep_duration'
75
+ package.resources = [
76
+ {'name': 'data'}
77
+ ]
129
78
 
130
- #### Integrity Checking
79
+ resource = package.resources[0]
80
+ resource.descriptor['data'] = [
81
+ 7, 8, 5, 6, 9, 7, 8
82
+ ]
131
83
 
132
- While the metadata for a package might be correct, there are other ways in which the package could be invalid. For example,
133
- data files might be missing or incorrectly described.
84
+ File.open('datapackage.json', 'w') do |f|
85
+ f.write(package.to_json)
86
+ end
134
87
 
135
- The metadata validation is therefore supplemented with some custom code that performs some other checks:
88
+ # {"name": "my_sleep_duration", "resources": [{"name": "data", "data": [7, 8, 5, 6, 9, 7, 8]}]}
89
+ ```
136
90
 
137
- * (Both profiles) All resources described in the package must be accessible, e.g. the local file exists or a URL responds successfully to a `HEAD`
138
- * (`:simpledataformat`) All resources must be CSV files
139
- * (`:simpledataformat`) All resources must have a valid JSON Table Schema
140
- * (`:simpledataformat`) CSV `dialect` descriptions must be valid
141
- * (`:simpledataformat`) All fields declared in the schema must be present in the CSV file
142
- * (`:simpledataformat`) All fields present in the CSV file must be present in the schema
91
+ ## Validating a Data Package
143
92
 
144
- ### Customising the Validation Code
93
+ ```ruby
94
+ package = DataPackage::Package.new('http://data.okfn.org/data/core/gdp/datapackage.json')
145
95
 
146
- The library provides several extension points for customising the way that packages are validated.
96
+ package.valid?
97
+ #=> true
98
+ package.errors
99
+ #=> [] # An array of errors
100
+ ```
147
101
 
148
- #### Supplying Custom JSON Schemas
102
+ ## Using a different schema
149
103
 
150
- Custom JSON schemas can be provided to allow validation to be tweaked for local conventions. An options hash can be
151
- provided to the constructor of a `DataPackage::Validator` object, this can be used to map schema names to custom
152
- schemas.
104
+ By default, the gem uses the standard [Data Package Schema](http://specs.frictionlessdata.io/data-packages/), but alternative schemas are available.
153
105
 
154
- (Any options passed to the constructor of a `DataPackage::Package` object will also be passed to its validator)
155
-
156
- For example to create a new validation profile called `my-validation-rules` and then apply it:
106
+ ### Schemas in the local cache
157
107
 
158
- opts = {
159
- :schema => {
160
- :my-validation-rules => "/path/to/json/schema.json"
161
- }
162
- }
163
- package = DataPackage::Package.new( url )
164
- package.valid?(:my-validation-rules)
108
+ The gem comes with schemas for the standard Data Package Schema, as well as the [Tabular Data Package Schema](http://specs.frictionlessdata.io/tabular-data-package/), and the [Fiscal Data Package Schema](http://fiscal.dataprotocols.org/spec/). These can be referred to via an identifier, expressed as a symbol.
165
109
 
166
- This will cause the code to create a custom `DataPackage::Validator` instance that will apply the supplied schema. This class
167
- does not provide any integrity checks.
110
+ ```ruby
111
+ package = DataPackage::Package.new(nil, :tabular) # Or :fiscal
112
+ ```
168
113
 
169
- To mix a custom schema with the existing integrity checking, you must manually create a `Validator` instance. E.g:
114
+ ### Schemas from elsewhere
170
115
 
171
- opts = {
172
- :schema => {
173
- :my-validation-rules => "/path/to/json/schema.json"
174
- }
175
- }
176
- validator = DataPackage::SimpleDataFormatValidator(:my-validation-rules, opts)
177
- validator.valid?( package )
116
+ If you have a schema stored in an alternative registry, you can pass a `registry_url` option to the initializer.
178
117
 
179
- Custom schemas must be valid JSON files that conforms to the JSON Schema v4 specification. The absolute path to the schema file must be
180
- provided.
118
+ ```ruby
119
+ package = DataPackage::Package.new(nil, :identifier, {registry_url: 'http://example.org/my-registry.csv'} )
120
+ ```
181
121
 
182
- Validation is performed using the [json-schema](https://github.com/hoxworth/json-schema) gem which has some documented restrictions.
183
-
184
- The built-in schema files can also be overridden in this way, e.g. by specifying an alternate location for the `:datapackage` schema.
122
+ ## Developer notes
185
123
 
186
- #### Custom Integrity Checking
124
+ These notes are intended to help people who want to contribute to this package itself. If you just want to use it, you can safely ignore them.
187
125
 
188
- Integrity checking can be customized by creating new sub-classes of `DataPackage::Validator` or one of the existing sub-classes.
126
+ ### Updating the local schemas cache
189
127
 
190
- The following methods can be implemented:
128
+ We cache the schemas from https://github.com/dataprotocols/schemas using git-subtree. To update it, use:
191
129
 
192
- * `validate_metadata(package, messages)` -- perform additional metadata checking after JSON schema is provided.
193
- * `validate_resource(package, resource, messages)` -- called for each resource in the package
130
+ ```
131
+ git subtree pull --prefix datapackage/schemas https://github.com/dataprotocols/schemas.git master --squash
132
+ ```
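The "Reading a Data Package" section of the README says that a containing URL such as `http://example.org/datasets/a` is treated as if `datapackage.json` lives beneath it. One detail worth checking in that case is how `URI.join` resolves relative references: following RFC 3986, it replaces the last path segment when the base has no trailing slash. The sketch below illustrates this with the standard library only. The helper `descriptor_url` is hypothetical and is not part of the gem.

```ruby
require 'uri'

# Hypothetical helper: given a remote location, work out where the
# descriptor file is expected to live.
def descriptor_url(location, filename = 'datapackage.json')
  return location if location.end_with?(filename)

  # Without a trailing slash, URI.join would replace the last path segment
  # ("a") instead of appending to it, so add the slash first.
  base = location.end_with?('/') ? location : "#{location}/"
  URI.join(base, filename).to_s
end

# Plain URI.join drops the final segment "a":
puts URI.join('http://example.org/datasets/a', 'datapackage.json')
# => http://example.org/datasets/datapackage.json

# Normalizing the trailing slash keeps it:
puts descriptor_url('http://example.org/datasets/a')
# => http://example.org/datasets/a/datapackage.json
```

Passing the containing URL with a trailing slash, or the full path to `datapackage.json`, avoids the ambiguity altogether.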
@@ -0,0 +1,12 @@
1
+ module DataPackage
2
+ class RegistryError < StandardError; end
3
+
4
+ class SchemaException < Exception
5
+ attr_reader :status, :message
6
+
7
+ def initialize status
8
+ @status = status
9
+ @message = status
10
+ end
11
+ end
12
+ end
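The two error classes in this hunk have different parents: one derives from `StandardError`, the other directly from `Exception`. That choice affects callers, because a bare `rescue` (like the one in `update_resources!` further down) only catches `StandardError` and its subclasses. The sketch below demonstrates the difference with two illustrative classes that are not part of the gem.

```ruby
# Illustrative classes only; they are not part of the gem.
class DemoStandardError < StandardError; end
class DemoException < Exception; end

# Returns true when a bare `rescue` catches the raised error, and false
# when the error escapes it and is only caught by `rescue Exception`.
def bare_rescue_catches?(error_class)
  begin
    begin
      raise error_class, 'boom'
    rescue
      return true
    end
  rescue Exception
    false
  end
end

puts bare_rescue_catches?(DemoStandardError) # => true
puts bare_rescue_catches?(DemoException)     # => false
```

Code that needs to handle an error derived directly from `Exception` has to name that class (or `Exception`) explicitly in its `rescue` clause.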
@@ -1,160 +1,181 @@
1
1
  require 'open-uri'
2
2
 
3
3
  module DataPackage
4
-
5
- class Package
6
-
7
- attr_reader :metadata, :opts
8
-
9
- # Parse a data package
10
- #
11
- # Supports reading data from JSON file, directory, and a URL
12
- #
13
- # package:: Hash or a String
14
- # opts:: Options used to customize reading and parsing
15
- def initialize(package, opts={})
16
- @opts = opts
17
- #TODO base directory/url
18
- if package.class == Hash
19
- @metadata = package
20
- else
21
- if !package.start_with?("http") && File.directory?(package)
22
- package = File.join(package, opts[:default_filename] || "datapackage.json")
23
- end
24
- if package.start_with?("http") && !package.end_with?("datapackage.json")
25
- package = URI.join(package, "datapackage.json")
26
- end
27
- @location = package.to_s
28
- @metadata = JSON.parse( open(package).read )
29
- end
30
- end
31
-
32
- #Returns the directory for a local file package or base url for a remote
33
- #Returns nil for an in-memory object (because it has no base as yet)
34
- def base
35
- #user can override base
36
- return @opts[:base] if @opts[:base]
37
- return "" unless @location
38
- #work out base directory or uri
39
- if local?
40
- return File.dirname( @location )
41
- else
42
- return @location.split("/")[0..-2].join("/")
43
- end
44
- end
45
-
46
- #Is this a local package? Returns true if created from an in-memory object or a file/directory reference
47
- def local?
48
- return !@location.start_with?("http") if @location
49
- return true
50
- end
51
-
52
- def name
53
- @metadata["name"]
54
- end
55
-
56
- def title
57
- @metadata["title"]
58
- end
59
-
60
- def description
61
- @metadata["description"]
62
- end
63
-
64
- def homepage
65
- @metadata["homepage"]
66
- end
67
-
68
- def licenses
69
- @metadata["licenses"] || []
70
- end
71
- alias_method :licences, :licenses
72
-
73
- #What version of datapackage specification is this using?
74
- def datapackage_version
75
- @metadata["datapackage_version"]
76
- end
77
-
78
- #What is the version of this specific data package?
79
- def version
80
- @metadata["version"]
81
- end
82
-
83
- def sources
84
- @metadata["sources"] || []
85
- end
86
-
87
- def keywords
88
- @metadata["keywords"] || []
89
- end
90
-
91
- def last_modified
92
- DateTime.parse @metadata["last_modified"] rescue nil
93
- end
94
-
95
- def image
96
- @metadata["image"]
97
- end
98
-
99
- def maintainers
100
- @metadata["maintainers"] || []
101
- end
102
-
103
- def contributors
104
- @metadata["contributors"] || []
105
- end
106
-
107
- def publisher
108
- @metadata["publisher"] || []
109
- end
110
-
111
- def resources
112
- @metadata["resources"] || []
113
- end
114
-
115
- def dependencies
116
- @metadata["dependencies"]
117
- end
118
-
119
- def property(property, default=nil)
120
- @metadata[property] || default
121
- end
122
-
123
- def valid?(profile=:datapackage, strict=false)
124
- validator = DataPackage::Validator.create(profile, @opts)
125
- return validator.valid?(self, strict)
126
- end
127
-
128
- def validate(profile=:datapackage)
129
- validator = DataPackage::Validator.create(profile, @opts)
130
- return validator.validate(self)
131
- end
132
-
133
- def resolve_resource(resource)
134
- return resource["url"] || resolve( resource["path"] )
135
- end
136
-
137
- def resolve(path)
138
- if local?
139
- return File.join( base , path) if base != ""
140
- return path
141
- else
142
- return URI.join(base, path)
143
- end
144
- end
145
-
146
- def resource_exists?(location)
147
- if !location.to_s.start_with?("http")
148
- return File.exists?( location )
149
- else
150
- begin
151
- status = RestClient.head( location ).code
152
- return status == 200
153
- rescue => e
154
- return false
155
- end
156
- end
157
- end
158
-
4
+ class Package < Hash
5
+ attr_reader :opts, :errors
6
+ attr_writer :resources
7
+
8
+ # Parse or create a data package
9
+ #
10
+ # Supports reading data from JSON file, directory, and a URL
11
+ #
12
+ # package:: Hash or a String
13
+ # schema:: Hash, Symbol or String
14
+ # opts:: Options used to customize reading and parsing
15
+ def initialize(package = nil, schema = :base, opts = {})
16
+ @opts = opts
17
+ @schema = DataPackage::Schema.new(schema || :base)
18
+ @dead_resources = []
19
+
20
+ self.merge! parse_package(package)
21
+ define_properties!
22
+ load_resources!
23
+ end
24
+
25
+ def parse_package(package)
26
+ # TODO: base directory/url
27
+ if package.nil?
28
+ {}
29
+ elsif package.class == Hash
30
+ package
31
+ else
32
+ read_package(package)
33
+ end
34
+ end
35
+
36
+ # Returns the directory for a local file package or base url for a remote
37
+ # Returns an empty string for an in-memory object (because it has no base as yet)
38
+ def base
39
+ # user can override base
40
+ return @opts[:base] if @opts[:base]
41
+ return '' unless @location
42
+ # work out base directory or uri
43
+ if local?
44
+ return File.dirname(@location)
45
+ else
46
+ return @location.split('/')[0..-2].join('/')
47
+ end
48
+ end
49
+
50
+ # Is this a local package? Returns true if created from an in-memory object or a file/directory reference
51
+ def local?
52
+ return @local if @local
53
+ return !@location.start_with?('http') if @location
54
+ true
55
+ end
56
+
57
+ def resources
58
+ update_resources!
59
+ @resources
60
+ end
61
+
62
+ def property(property, default = nil)
63
+ self[property] || default
64
+ end
65
+
66
+ def valid?
67
+ validate
68
+ @valid
69
+ end
70
+
71
+ def validate
72
+ @errors = @schema.validation_errors(self)
73
+ @valid = @schema.valid?(self)
74
+ end
75
+
76
+ def resource_exists?(location)
77
+ !@dead_resources.include?(location)
78
+ end
79
+
80
+ def to_json(*args)
81
+ super # delegate to Hash#to_json; calling self.to_json here would recurse without end
82
+ end
83
+
84
+ private
85
+
86
+ def define_properties!
87
+ (@schema['properties'] || {}).each do |k, v|
88
+ next if k == 'resources'
89
+ define_singleton_method("#{k.to_sym}=", proc { |p| set_property(k, p) })
90
+ define_singleton_method(k.to_sym.to_s, proc { property k, default_value(v) })
91
+ end
92
+ end
93
+
94
+ def load_resources!
95
+ @resources = (self['resources'] || [])
96
+ update_resources!
97
+ end
98
+
99
+ def update_resources!
100
+ @resources.map! do |resource|
101
+ begin
102
+ load_resource(resource)
103
+ rescue
104
+ @dead_resources << resource['path']
105
+ nil
106
+ end
107
+ end
108
+ end
109
+
110
+ def load_resource(resource)
111
+ if resource.is_a?(Resource)
112
+ resource
113
+ else
114
+ Resource.load(resource, base)
115
+ end
116
+ end
117
+
118
+ def default_value(schema_data)
119
+ case schema_data['type']
120
+ when 'string'
121
+ nil
122
+ when 'array'
123
+ []
124
+ when 'object'
125
+ {}
126
+ end
127
+ end
128
+
129
+ def set_property(key, value)
130
+ self[key] = value
131
+ end
132
+
133
+ def read_package(package)
134
+ if is_directory?(package)
135
+ package = File.join(package, opts[:default_filename] || 'datapackage.json')
136
+ elsif is_containing_url?(package)
137
+ package = URI.join(package, 'datapackage.json')
138
+ end
139
+
140
+ @location = package.to_s
141
+
142
+ if File.extname(package.to_s) == '.zip'
143
+ unzip_package(package)
144
+ else
145
+ JSON.parse open(package).read
146
+ end
147
+ end
148
+
149
+ def is_directory?(package)
150
+ !package.start_with?('http') && File.directory?(package)
151
+ end
152
+
153
+ def is_containing_url?(package)
154
+ package.start_with?('http') && !package.end_with?('datapackage.json', 'datapackage.zip')
155
+ end
156
+
157
+ def write_to_tempfile(url)
158
+ tempfile = Tempfile.new('datapackage')
159
+ tempfile.write(open(url).read)
160
+ tempfile.rewind
161
+ tempfile
162
+ end
163
+
164
+ def unzip_package(package)
165
+ package = write_to_tempfile(package) if package.start_with?('http')
166
+ dir = Dir.mktmpdir
167
+ Zip::File.open(package) do |zip_file|
168
+ # Extract all the files
169
+ zip_file.each { |entry| entry.extract("#{dir}/#{File.basename entry.name}") }
170
+ # Get and parse the datapackage metadata
171
+ entry = zip_file.glob("*/#{opts[:default_filename] || 'datapackage.json'}").first
172
+ package = JSON.parse(entry.get_input_stream.read)
173
+ end
174
+ # Set the base dir to the directory we unzipped to
175
+ @opts[:base] = dir
176
+ # This is now a local file, not a URL
177
+ @local = true
178
+ package
159
179
  end
160
- end
180
+ end
181
+ end
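The new `Package` class is a `Hash` subclass that derives its accessors from the schema's `properties` at construction time (`define_properties!`), with a type-dependent default for missing values. The following is a minimal, self-contained sketch of that idea. `SketchPackage` and the inline schema are hypothetical stand-ins. The real class builds its schema through `DataPackage::Schema` and also loads resources, which this sketch leaves out.

```ruby
# SketchPackage is a hypothetical stand-in, not the gem's Package class.
class SketchPackage < Hash
  def initialize(schema, data = {})
    super()
    merge!(data)
    define_properties!(schema)
  end

  private

  # Define a reader and a writer on this instance for every schema property.
  def define_properties!(schema)
    (schema['properties'] || {}).each do |key, spec|
      define_singleton_method("#{key}=") { |value| self[key] = value }
      define_singleton_method(key) { self[key] || default_for(spec) }
    end
  end

  # Arrays and objects get an empty default; everything else defaults to nil.
  def default_for(spec)
    case spec['type']
    when 'array'  then []
    when 'object' then {}
    end
  end
end

schema = {
  'properties' => {
    'name'     => { 'type' => 'string' },
    'keywords' => { 'type' => 'array' }
  }
}

package = SketchPackage.new(schema, 'title' => 'Demo')
package.name = 'my_sleep_duration'
puts package.name
p package.keywords
p package['title']
```

Only keys listed in the schema get accessor methods. Other keys, such as `title` in this sketch, remain reachable through ordinary `Hash` access.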