data_kitten 1.0.0

checksums.yaml ADDED
@@ -0,0 +1,15 @@
+ ---
+ !binary "U0hBMQ==":
+   metadata.gz: !binary |-
+     MGU0NTIxNWM3YjhlNDU1MDNiYTk2YzkwYzE3ZjZmODU5YWEyMDE4NQ==
+   data.tar.gz: !binary |-
+     YWNhOTQ2ODVjZTI0MTUxYmI3MjRhNjZlODBkNjQ5NmZjNjc5MTI1NA==
+ SHA512:
+   metadata.gz: !binary |-
+     Zjc4OWRjYmRmYzcxNzJjZmNkMGVmMmRhZmIyODVmMTAwZmJmZGZlNzdlZmNl
+     Yjk5M2ZlYzU4NTllMTcyYzdkNTk4NTJkMGQ2NzcxMDhhY2MzNjZiOTcwZWU1
+     M2NhOWYwY2Y5M2NhZWRhZjUyM2M5ODg2YjM5NDVkYTlhZDgyMzM=
+   data.tar.gz: !binary |-
+     OWUyMjlhYzA1YzA0NDY4ZGE1OTNjMTA3ZDA2MjMyNDkzYmVlZDY3N2ZlYzIw
+     MWNmOTNmZjZkNzM2YzE0OTYxZjRkNWMzOGU5NmMwMWRjMTM0NTFkYTZiYmY4
+     YzFjMzljNzcyYzU3ZTQ3YmQzZDJjNWFiOWVhZmNlYjU3NGNmODA=
data/LICENSE.md ADDED
@@ -0,0 +1,20 @@
+ Copyright 2013 The Open Data Institute
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,73 @@
+ [![Build Status](http://img.shields.io/travis/theodi/data_kitten.svg)](https://travis-ci.org/theodi/data_kitten)
+ [![Dependency Status](http://img.shields.io/gemnasium/theodi/data_kitten.svg)](https://gemnasium.com/theodi/data_kitten)
+ [![Coverage Status](http://img.shields.io/coveralls/theodi/data_kitten.svg)](https://coveralls.io/r/theodi/data_kitten)
+ [![Code Climate](http://img.shields.io/codeclimate/github/theodi/data_kitten.svg)](https://codeclimate.com/github/theodi/data_kitten)
+ [![Gem Version](http://img.shields.io/gem/v/data_kitten.svg)](https://rubygems.org/gems/data_kitten)
+ [![License](http://img.shields.io/:license-mit-blue.svg)](http://theodi.mit-license.org)
+ [![Badges](http://img.shields.io/:badges-7/7-ff6799.svg)](https://github.com/pikesley/badger)
+
+ # data_kitten
+
+ ![DATAS - I HAZ THEM](https://gs1.wac.edgecastcdn.net/8019B6/data.tumblr.com/67399f2b335ef62d562dc9eb41c0db16/tumblr_mmy9g7rA8M1s4aj1ho1_500.jpg)
+
+ A collection of classes that represent Datasets and other concepts, modeled on [DCAT](http://www.w3.org/TR/vocab-dcat/).
+
+ The module is designed to automatically interrogate data sources and give back data
+ and metadata in a consistent format. The best place to start is probably `Dataset`.
+
+ It is designed to handle data from multiple `Sources` (such as git repositories, local files, remote URLs),
+ `Hosts` (GitHub, etc), and `PublishingFormats` (DataPackage, RDFa, microdata, DSPL, etc).
+
+ Currently supports Datapackages in git repositories (including but not limited to GitHub repos).
+ Wider support will follow.
+
+ # Documentation
+
+ Full YARD documentation is available on [Rubydoc.info](http://rubydoc.info/github/theodi/data_kitten/master/frames).
+
+ # Licence
+
+ This code is open source under the MIT license. See the LICENSE.md file for full details.
+
+ # Requirements
+
+ * Git ~> 1.2.6
+
+ # Usage
+
+ Pop the gem into your Gemfile:
+
+     gem 'data_kitten', :git => "git://github.com/theodi/data_kitten.git"
+
+ Require if you need to:
+
+     require 'data_kitten'
+
+ Request a dataset:
+
+     dataset = DataKitten::Dataset.new(access_url: "https://github.com/theodi/dataset-mod-disposals.git")
+
+ Use the results:
+
+     dataset.supported?
+     dataset.origin
+     dataset.host
+     dataset.data_title
+     dataset.documentation_url
+     dataset.release_type
+     dataset.time_sensitive?
+     dataset.publishing_format
+     dataset.maintainers
+     dataset.publishers
+     dataset.licenses
+     dataset.contributors
+     dataset.crowdsourced?
+     dataset.contributor_agreement_url
+     dataset.distributions
+     dataset.change_history
+
+     # And more to come!
+
+ See example usage in a Rails project at [https://github.com/theodi/git-data-viewer](https://github.com/theodi/git-data-viewer)
+
+ ![actual_data_kitten](http://i.imgur.com/wXZEkh7.gif)
data/bin/data_kitten ADDED
@@ -0,0 +1,22 @@
+ #!/usr/bin/env ruby
+ $:.unshift File.join( File.dirname(__FILE__), "..", "lib")
+
+ require 'data_kitten'
+ require 'pp'
+
+ if ARGV.length == 0
+   puts "Usage: data_kitten <access_url>"
+   exit 1
+ end
+
+ dataset = DataKitten::Dataset.new(access_url: ARGV[0])
+
+ if dataset.publishing_format == nil
+   puts "Unable to determine format for dataset metadata"
+   exit 1
+ end
+
+ (dataset.public_methods - Object.public_methods).sort.delete_if {|x| x.to_s =~ /=/ }.each do |method|
+   puts "#{method}: #{dataset.send(method).pretty_inspect}"
+ end
+
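The final loop in the script prints every reader-style method on the dataset. As a minimal sketch of how that selection behaves, the snippet below runs the same expression against a stand-in object. `StandInDataset` and its URL are hypothetical and exist only for illustration; they are not part of data_kitten.

```ruby
require 'pp'

# Hypothetical stand-in for DataKitten::Dataset, used only to show
# which method names the selection expression keeps.
class StandInDataset
  attr_accessor :access_url

  def initialize(access_url)
    @access_url = access_url
  end

  def keywords
    []
  end

  def supported?
    false
  end
end

dataset = StandInDataset.new("https://example.org/data.git")

# Same expression as in the script: drop names that Object itself
# responds to, sort, then drop anything containing "=" (setters).
readers = (dataset.public_methods - Object.public_methods)
          .sort
          .delete_if { |x| x.to_s =~ /=/ }

readers.each do |method|
  puts "#{method}: #{dataset.send(method).pretty_inspect}"
end
```

The `/=/` filter removes setters such as `access_url=`. Because each remaining method is invoked through `send` with no arguments, the approach only suits objects whose public methods all take zero arguments. Note also that `Object.public_methods` lists methods of `Object` as a class, so a dataset method whose name matches one of those (for example `name`) would be filtered out as well.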
@@ -0,0 +1,43 @@
+ require 'csv'
+ require 'uri'
+ require 'cgi'
+ require 'git'
+ require 'json'
+ require 'rest-client'
+ require 'rdf'
+ require 'linkeddata'
+ require 'nokogiri'
+ require 'uri'
+ require 'curb'
+ require 'datapackage'
+
+ require 'data_kitten/license'
+ require 'data_kitten/rights'
+ require 'data_kitten/agent'
+ require 'data_kitten/source'
+ require 'data_kitten/temporal'
+ require 'data_kitten/dataset'
+ require 'data_kitten/distribution_format'
+ require 'data_kitten/distribution'
+
+ # A collection of classes that represent Datasets and other concepts, modeled on {http://www.w3.org/TR/vocab-dcat/ DCAT}.
+ #
+ # The module is designed to automatically interrogate data sources and give back data and metadata in a consistent
+ # format. The best starting place is probably by having a look at {Dataset}.
+ #
+ # It is designed to handle data from multiple {Sources} (such as git repositories, local files, remote URLs),
+ # {Hosts} (GitHub, etc), and {PublishingFormats} (DataPackage, RDFa, microdata, DSPL, etc).
+ #
+ # Currently supports Datapackages in git repositories (including but not limited to GitHub repos). Wider support will follow.
+ #
+ # https://gs1.wac.edgecastcdn.net/8019B6/data.tumblr.com/67399f2b335ef62d562dc9eb41c0db16/tumblr_mmy9g7rA8M1s4aj1ho1_500.jpg
+ #
+ # @example Load a Dataset from a git repository
+ #   dataset = Dataset.new(access_url: 'git://github.com/theodi/dataset-metadata-survey.git')
+ #   dataset.supported?                # => true
+ #   dataset.origin                    # => :git
+ #   dataset.host                      # => :github
+ #   dataset.publishing_format         # => :datapackage
+ #   dataset.distributions             # => [Distribution<#1>, Distribution<#2>]
+ #   dataset.distributions[0].headers  # => ['col1', 'col2']
+ #   dataset.distributions[0].data[0]  # => {'col1' => 'value_1', 'col2' => 'value_2'}
@@ -0,0 +1,38 @@
+ module DataKitten
+
+   # A person or organisation.
+   #
+   # Naming is based on {http://xmlns.com/foaf/spec/#term_Agent foaf:Agent}, but with useful aliases for other vocabularies.
+   class Agent
+
+     # Create a new Agent
+     #
+     # @param [Hash] options the details of the Agent.
+     # @option options [String] :name The Agent's name
+     # @option options [String] :homepage The homepage URL for the Agent
+     # @option options [String] :mbox Email address for the Agent
+     #
+     def initialize(options)
+       @name = options[:name]
+       @homepage = options[:homepage]
+       @mbox = options[:mbox]
+     end
+
+     # @!attribute name
+     #   @return [String] the name of the Agent
+     attr_accessor :name
+
+     # @!attribute homepage
+     #   @return [String] the homepage URL of the Agent
+     attr_accessor :homepage
+     alias_method :url, :homepage
+     alias_method :uri, :homepage
+
+     # @!attribute mbox
+     #   @return [String] the email address of the Agent
+     attr_accessor :mbox
+     alias_method :email, :mbox
+
+   end
+
+ end
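To illustrate the aliases, here is a short usage sketch. The class body is copied from the file above (comments omitted) so that the snippet runs on its own; the name, homepage and email values are made up.

```ruby
module DataKitten
  # Copy of the Agent class from the diff, without the YARD comments.
  class Agent
    def initialize(options)
      @name = options[:name]
      @homepage = options[:homepage]
      @mbox = options[:mbox]
    end

    attr_accessor :name

    attr_accessor :homepage
    alias_method :url, :homepage
    alias_method :uri, :homepage

    attr_accessor :mbox
    alias_method :email, :mbox
  end
end

# Illustrative values only.
agent = DataKitten::Agent.new(
  name: "Example Maintainer",
  homepage: "http://example.org/",
  mbox: "maintainer@example.org"
)

puts agent.url    # same value as agent.homepage
puts agent.uri    # same value as agent.homepage
puts agent.email  # same value as agent.mbox

# The aliases cover the readers only; no url= or email= is defined.
puts agent.respond_to?(:url=)
```

Since `alias_method` is applied to the reader names only, the aliases are read-only: `agent.url` works, but changing the value still goes through `agent.homepage=`.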
@@ -0,0 +1,227 @@
+ require 'data_kitten/origins'
+ require 'data_kitten/hosts'
+ require 'data_kitten/publishing_formats'
+
+ module DataKitten
+
+   # Represents a single dataset from some origin (see {http://www.w3.org/TR/vocab-dcat/#class-dataset dcat:Dataset}
+   # for relevant vocabulary).
+   #
+   # Designed to be created with a URI to the dataset, and then to work out metadata from there.
+   #
+   # Currently supports Datasets hosted in Git (and optionally on GitHub), and which
+   # use the Datapackage metadata format.
+   #
+   # @example Load a Dataset from a git repository
+   #   dataset = Dataset.new(access_url: 'git://github.com/theodi/dataset-metadata-survey.git')
+   #   dataset.supported?         # => true
+   #   dataset.origin             # => :git
+   #   dataset.host               # => :github
+   #   dataset.publishing_format  # => :datapackage
+   #
+   class Dataset
+
+     include DataKitten::Origins
+     include DataKitten::Hosts
+     include DataKitten::PublishingFormats
+
+     # @!attribute access_url
+     #   @return [String] the URL that gives access to the dataset
+     attr_accessor :access_url
+     alias_method :uri, :access_url
+     alias_method :url, :access_url
+
+     # Create a new Dataset object
+     #
+     # @param [Hash] options the details of the Dataset.
+     # @option options [String] :access_url A URL that can be used to access the Dataset.
+     #   The class will attempt to auto-load metadata from this URL.
+     #
+     def initialize(options)
+       @access_url = options[:access_url]
+       detect_origin
+       detect_host
+       detect_publishing_format
+     end
+
+     # Can metadata be loaded for this Dataset?
+     #
+     # @return [Boolean] true if metadata can be loaded, false if it's
+     #   an unknown origin type, or has an unknown metadata format.
+     def supported?
+       !(origin.nil? || publishing_format.nil?)
+     end
+
+     # The origin type of the dataset.
+     #
+     # @return [Symbol] The origin type. For instance, datasets loaded from git
+     #   repositories will return +:git+. If no origin type is
+     #   identified, will return +nil+.
+     def origin
+       nil
+     end
+
+     # Where the dataset is hosted.
+     #
+     # @return [Symbol] The host. For instance, data loaded from github repositories
+     #   will return +:github+. This can be used to control extra host-specific
+     #   behaviour if required. If no host type is identified, will return +nil+.
+     def host
+       nil
+     end
+
+     # The human-readable title of the dataset.
+     #
+     # @return [String] the title of the dataset.
+     def data_title
+       nil
+     end
+
+     # A brief description of the dataset
+     #
+     # @return [String] the description of the dataset.
+     def description
+       nil
+     end
+
+     # Keywords for the dataset
+     #
+     # @return [Array<String>] an array of keywords
+     def keywords
+       []
+     end
+
+     # Human-readable documentation for the dataset.
+     #
+     # @return [String] the URL of the documentation.
+     def documentation_url
+       nil
+     end
+
+     # What type of dataset is this?
+     # Options are: +:web_service+ for API-accessible data, or +:one_off+ for downloadable data dumps.
+     #
+     # @return [Symbol] the release type.
+     def release_type
+       false
+     end
+
+     # Date the dataset was released
+     #
+     # @return [Date] the release date of the dataset
+     def issued
+       nil
+     end
+     alias_method :release_date, :issued
+
+     # Date the dataset was last modified
+     #
+     # @return [Date] the dataset's last modified date
+     def modified
+       nil
+     end
+
+     # The temporal coverage of the dataset
+     #
+     # @return [Object<Temporal>] the start and end dates of the dataset's temporal coverage
+     def temporal
+       nil
+     end
+
+     # Where the data is sourced from
+     #
+     # @return [Array<Source>] the sources of the data, each as a Source object.
+     def sources
+       []
+     end
+
+     # Is the information time-sensitive?
+     #
+     # @return [Boolean] whether the information will go out of date.
+     def time_sensitive?
+       false
+     end
+
+     # The publishing format for the dataset.
+     #
+     # @return [Symbol] The format. For instance, datasets that publish metadata in
+     #   Datapackage format will return +:datapackage+. If no format
+     #   is identified, will return +nil+.
+     def publishing_format
+       nil
+     end
+
+     # A list of maintainers
+     #
+     # @return [Array<Agent>] An array of maintainers, each as an Agent object.
+     def maintainers
+       []
+     end
+
+     # A list of publishers
+     #
+     # @return [Array<Agent>] An array of publishers, each as an Agent object.
+     def publishers
+       []
+     end
+
+     # A list of licenses
+     #
+     # @return [Array<License>] An array of licenses, each as a License object.
+     def licenses
+       []
+     end
+
+     # The rights statement for the data
+     #
+     # @return [Object<Rights>] How the content and data can be used, as well as copyright notice and attribution URL
+     def rights
+       nil
+     end
+
+     # A list of contributors
+     #
+     # @return [Array<Agent>] An array of contributors to the dataset, each as an Agent object.
+     def contributors
+       []
+     end
+
+     # Has the data been crowdsourced?
+     #
+     # @return [Boolean] Whether the data has been crowdsourced or not.
+     def crowdsourced?
+       false
+     end
+
+     # The URL of the contributor license agreement
+     #
+     # @return [String] A URL for the agreement that contributors accept.
+     def contributor_agreement_url
+       nil
+     end
+
+     # A list of distributions. Has aliases for popular alternative vocabularies.
+     #
+     # @return [Array<Distribution>] An array of Distribution objects.
+     def distributions
+       []
+     end
+     alias_method :files, :distributions
+     alias_method :resources, :distributions
+
+     # How frequently the data is updated.
+     #
+     # @return [String] The frequency of update expressed as a dct:Frequency.
+     def update_frequency
+       nil
+     end
+
+     # A history of changes to the Dataset
+     #
+     # @return [Array] An array of changes. Exact format depends on the origin and publishing format.
+     def change_history
+       []
+     end
+
+   end
+ end
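Every accessor in `Dataset` returns an empty default (`nil`, `false` or `[]`), and the `detect_*` methods called from `initialize` are not part of this file. One way such defaults could be specialised at runtime is to `extend` the instance with a module, since methods from a module that is merely included cannot override methods defined in the class body itself. The sketch below assumes that mechanism. `SketchDataset`, `GitRepoSketch`, the sample file name and the `.git` suffix check are all hypothetical and are not taken from data_kitten.

```ruby
# Hypothetical module standing in for an origin-specific implementation.
module GitRepoSketch
  def origin
    :git
  end

  def distributions
    ["data.csv"]
  end
end

class SketchDataset
  attr_reader :access_url

  def initialize(options)
    @access_url = options[:access_url]
    detect_origin
  end

  # Defaults, in the same style as Dataset above.
  def origin
    nil
  end

  def publishing_format
    nil
  end

  def distributions
    []
  end
  alias_method :files, :distributions

  def supported?
    !(origin.nil? || publishing_format.nil?)
  end

  private

  # Assumed detection rule: treat URLs ending in ".git" as git origins
  # and extend this one instance with the specialised methods.
  def detect_origin
    extend(GitRepoSketch) if access_url.to_s.end_with?(".git")
  end
end

git_backed = SketchDataset.new(access_url: "git://example.org/repo.git")
unknown    = SketchDataset.new(access_url: "http://example.org/page.html")

puts git_backed.origin.inspect         # overridden by the module
puts git_backed.supported?             # publishing_format is still nil
puts git_backed.distributions.inspect  # overridden by the module
puts git_backed.files.inspect          # alias still bound to the default
puts unknown.origin.inspect            # default, nothing detected
```

Two points follow from this sketch. `supported?` only becomes true once both an origin and a publishing format have been detected. And if overrides do arrive through `extend`, an alias created with `alias_method` keeps calling the default it was bound to, so aliases such as `files` would need to be redefined alongside the overriding method.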