chimps 0.1.1 → 0.1.2
- data/README.rdoc +292 -0
- data/VERSION +1 -1
- data/lib/chimps/commands/query.rb +3 -1
- data/lib/chimps/commands/update.rb +3 -0
- data/lib/chimps/commands/upload.rb +1 -1
- data/lib/chimps/config.rb +3 -3
- data/lib/chimps/request.rb +2 -2
- data/lib/chimps/utils/uses_yaml_data.rb +9 -5
- data/lib/chimps/workflows/uploader.rb +3 -3
- metadata +3 -3
- data/README.textile +0 -65
data/README.rdoc
ADDED
@@ -0,0 +1,292 @@
|
|
1
|
+
Infochimps[http://infochimps.org] offers two APIs for users to access
|
2
|
+
and modify data:
|
3
|
+
|
4
|
+
- an XML & JSON based {RESTful API}[http://infochimps.org/api] to list, show, create, update, and destroy datasets and associated resources on Infochimps[http://infochimps.org]
|
5
|
+
- a JSON based {Query API}[http://api.infochimps.com] to query particular rows in datasets
|
6
|
+
|
7
|
+
Chimps provides a Ruby wrapper for both of these APIs (built on
|
8
|
+
RestClient) as well as a command-line tool.
|
9
|
+
|
10
|
+
See the above links for details on the sorts of parameters the
|
11
|
+
Infochimps APIs expect and the output they provide.
|
12
|
+
|
13
|
+
= Installation
|
14
|
+
|
15
|
+
Chimps is hosted as a gem on Gemcutter[http://gemcutter.org]. You can see your current gem sources with
|
16
|
+
|
17
|
+
gem sources
|
18
|
+
|
19
|
+
If you don't see <tt>http://gemcutter.org</tt> you'll have to add it
|
20
|
+
with
|
21
|
+
|
22
|
+
gem sources -a http://gemcutter.org
|
23
|
+
|
24
|
+
Then you can install Chimps with
|
25
|
+
|
26
|
+
gem install chimps
|
27
|
+
|
28
|
+
== API keys
|
29
|
+
|
30
|
+
You'll need an API key and secret from Infochimps before you can start
|
31
|
+
adding or modifying datasets via the REST API. {Sign up for an
|
32
|
+
Infochimps account}[http://infochimps.org/signup] and register for an
|
33
|
+
API key.
|
34
|
+
|
35
|
+
You'll need a separate API key to use the Query API; {register for one
|
36
|
+
now}[http://api.infochimps.com/features-and-pricing].
|
37
|
+
|
38
|
+
Once you've registered for the API key(s) you'll need to put them in your
|
39
|
+
<tt>~/.chimps</tt> file which should look like
|
40
|
+
|
41
|
+
# -*-yaml-*-
|
42
|
+
:site:
|
43
|
+
:username: monkeyboy
|
44
|
+
:key: xxxxxxxxxxxxxxxx
|
45
|
+
:secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
|
46
|
+
:query:
|
47
|
+
:username: monkeyboy
|
48
|
+
:key: xxxxxxxxxxxxxxxxx
|
49
|
+
:secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
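The colon-prefixed keys in this file (<tt>:site:</tt>, <tt>:key:</tt>) load as Ruby symbols. The following is a minimal sketch of reading such a file with Ruby's standard YAML library, assuming the credentials are nested under <tt>:site:</tt> and <tt>:query:</tt>. The constant and variable names are illustrative, the credentials are placeholders, and this is not necessarily how Chimps itself loads the file.

```ruby
require 'yaml'

# Placeholder identity file contents, written as nested maps (an assumption
# about the layout; the values are not real credentials).
SAMPLE_IDENTITY = <<~YAML
  :site:
    :username: monkeyboy
    :key: xxxxxxxxxxxxxxxx
    :secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  :query:
    :username: monkeyboy
    :key: xxxxxxxxxxxxxxxxx
    :secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
YAML

# safe_load rejects symbols unless they are permitted explicitly.
identity = YAML.safe_load(SAMPLE_IDENTITY, permitted_classes: [Symbol])

puts identity[:site][:username]
puts identity[:query].keys.inspect
```

The keyword form of <tt>permitted_classes</tt> needs a reasonably recent Ruby (Psych 3.1 or later).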
|
50
|
+
|
51
|
+
= Usage
|
52
|
+
|
53
|
+
Chimps can be used as a library in your own code or as a command-line
|
54
|
+
tool.
|
55
|
+
|
56
|
+
== Chimps on the Command Line
|
57
|
+
|
58
|
+
You can use Chimps directly on the command line to interact with
|
59
|
+
Infochimps.
|
60
|
+
|
61
|
+
Try running
|
62
|
+
|
63
|
+
chimps help
|
64
|
+
|
65
|
+
to get started as well as
|
66
|
+
|
67
|
+
chimps help COMMAND
|
68
|
+
|
69
|
+
for help on a specific command. When running in verbose mode (with
|
70
|
+
<tt>-v</tt>), Chimps will print helpful diagnostics on each query it's
|
71
|
+
performing.
|
72
|
+
|
73
|
+
=== Testing
|
74
|
+
|
75
|
+
You can test whether or not you have access to the Infochimps REST API
|
76
|
+
with
|
77
|
+
|
78
|
+
chimps test
|
79
|
+
|
80
|
+
Chimps will try to print informative error messages if it finds it
|
81
|
+
can't authenticate you.
|
82
|
+
|
83
|
+
=== Searching
|
84
|
+
|
85
|
+
Search datasets
|
86
|
+
|
87
|
+
chimps search 'statistical abstract'
|
88
|
+
|
89
|
+
or other kinds of models
|
90
|
+
|
91
|
+
chimps search -m source 'Department of Justice'
|
92
|
+
|
93
|
+
This _does_ _not_ require credentials for the RESTful API.
|
94
|
+
|
95
|
+
=== Listing
|
96
|
+
|
97
|
+
You can list your datasets
|
98
|
+
|
99
|
+
chimps list
|
100
|
+
|
101
|
+
or all datasets
|
102
|
+
|
103
|
+
chimps list -a
|
104
|
+
|
105
|
+
=== Showing
|
106
|
+
|
107
|
+
You can get more information about a particular dataset (as a YAML
|
108
|
+
document)
|
109
|
+
|
110
|
+
chimps show my-awesome-dataset
|
111
|
+
|
112
|
+
This _does_ _not_ require credentials for the RESTful API.
|
113
|
+
|
114
|
+
=== Creating
|
115
|
+
|
116
|
+
You can create a dataset, passing properties directly on the command
|
117
|
+
line
|
118
|
+
|
119
|
+
chimps create title="My Awesome Dataset" description="Curt, but informative."
|
120
|
+
16011 my-awesome-dataset 2010-05-25T22:52:16Z My Awesome Dataset
|
121
|
+
|
122
|
+
or from a YAML input file
|
123
|
+
|
124
|
+
chimps create my_awesome_dataset.yaml
|
125
|
+
16011 my-awesome-dataset 2010-05-25T22:52:16Z My Awesome Dataset
|
126
|
+
|
127
|
+
Examples of input files are in the <tt>examples</tt> directory of the
|
128
|
+
Chimps distribution.
|
129
|
+
|
130
|
+
=== Updating
|
131
|
+
|
132
|
+
You can also update an existing dataset
|
133
|
+
|
134
|
+
chimps update my-awesome-dataset title="My TOTALLY Awesome Dataset"
|
135
|
+
|
136
|
+
Passing in data works just like the <tt>create</tt> command.
|
137
|
+
|
138
|
+
=== Destroying
|
139
|
+
|
140
|
+
You can destroy datasets as well
|
141
|
+
|
142
|
+
chimps destroy my-awesome-dataset
|
143
|
+
|
144
|
+
=== Downloading
|
145
|
+
|
146
|
+
You can download a dataset from Infochimps
|
147
|
+
|
148
|
+
chimps download my-awesome-dataset
|
149
|
+
|
150
|
+
which will put it in the current directory.
|
151
|
+
|
152
|
+
You can also specify a format or package.
|
153
|
+
|
154
|
+
chimps download -f csv -p tar.bz2 my-awesome-dataset
|
155
|
+
|
156
|
+
=== Uploading
|
157
|
+
|
158
|
+
You can upload data from your local machine to an existing dataset at
|
159
|
+
Infochimps
|
160
|
+
|
161
|
+
chimps upload my-awesome-dataset /path/to/my/data/*
|
162
|
+
16005 boozer 2010-05-20T13:58:07Z boozer
|
163
|
+
|
164
|
+
Chimps will package all the files you specify into a single archive
|
165
|
+
and upload it. You can annotate the upload with a particular format
|
166
|
+
(though Chimps will try and guess). Chimps will NOT make an archive
|
167
|
+
if you only attempt to upload a single file and it is already an
|
168
|
+
archive.
|
169
|
+
|
170
|
+
Chimps uses the {Infinite
|
171
|
+
Monkeywrench}[http://github.com/infochimps/imw] to process the data
|
172
|
+
for uploads.
|
173
|
+
|
174
|
+
=== Batch Jobs
|
175
|
+
|
176
|
+
Chimps allows you to perform batch requests against the Infochimps REST
|
177
|
+
API in which many changes are made through a single API call.
|
178
|
+
|
179
|
+
chimps batch batch_data.yaml
|
180
|
+
Status Resource ID Errors
|
181
|
+
created source 13671
|
182
|
+
created dataset 16013
|
183
|
+
invalid Title is too short (minimum is 4 characters)
|
184
|
+
|
185
|
+
The contents in <tt>batch_data.yaml</tt> specify an array of resources
|
186
|
+
to update or create. Each resource's data can be attached to local
|
187
|
+
paths to upload. These paths will be packaged and uploaded (just as
|
188
|
+
in the +upload+ command) after the batch update finishes.
|
189
|
+
|
190
|
+
Errors in a particular resource will not cause the whole batch job to
|
191
|
+
fail (as above).
|
192
|
+
|
193
|
+
Learn more about the format of the <tt>batch_data.yaml</tt> file by
|
194
|
+
looking at the example in the +examples+ directory of the Chimps
|
195
|
+
distribution or by visiting the {Infochimps REST
|
196
|
+
API}[http://infochimps.org/api].
|
197
|
+
|
198
|
+
=== Querying
|
199
|
+
|
200
|
+
You can also use Chimps to make queries against the Infochimps Query
|
201
|
+
API.
|
202
|
+
|
203
|
+
chimps query soc/net/tw/influence screen_name=infochimps
|
204
|
+
{"replies_out":13,"account_age":602,"statuses":166,"id":15748351,"replies_in":22,"screen_name":"infochimps"}
|
205
|
+
|
206
|
+
where parameters to include for a _single_ query can be passed in on
|
207
|
+
the command line.
|
208
|
+
|
209
|
+
If you pass in the path to a YAML file then it must consist of an
|
210
|
+
array of such parameter hashes and will result in multiple queries
|
211
|
+
being made (to the same dataset)
|
212
|
+
|
213
|
+
chimps query soc/net/tw/influence query.yaml
|
214
|
+
{"replies_out":13,"account_age":602,"statuses":166,"id":15748351,"replies_in":22,"screen_name":"infochimps"}
|
215
|
+
{"replies_out":940,"account_age":440,"statuses":5015,"id":19058681,"replies_in":88909,"screen_name":"aplusk"}
|
216
|
+
{"replies_out":0,"account_age":1123,"statuses":634,"id":813286,"replies_in":14541,"screen_name":"BarackObama"}
|
217
|
+
|
218
|
+
== Chimps as a Library
|
219
|
+
|
220
|
+
You can also use Chimps in your own code to handle making requests of
|
221
|
+
Infochimps.
|
222
|
+
|
223
|
+
=== Using the REST API
|
224
|
+
|
225
|
+
The Chimps::Request class makes requests against the REST API. Create
|
226
|
+
a request by specifying a path on the Infochimps server (it _must_ end
|
227
|
+
with <tt>.json</tt>).
|
228
|
+
|
229
|
+
list_dataset_request = Chimps::Request.new('/datasets.json')
|
230
|
+
list_dataset_request.get
|
231
|
+
|
232
|
+
Some requests need to be signed. Assuming you've properly initialized
|
233
|
+
the <tt>Chimps::CONFIG</tt> Hash with the proper values (identical to
|
234
|
+
the arrangement of the <tt>~/.chimps</tt> configuration file) you can
|
235
|
+
simply ask the request to sign itself
|
236
|
+
|
237
|
+
authenticated_list_datasets_request = Chimps::Request.new('/datasets.json', :authenticate => true)
|
238
|
+
|
239
|
+
You can also pass in query params
|
240
|
+
|
241
|
+
authenticated_list_datasets_request_with_params = Chimps::Request.new('/datasets.json', :query_params => { :id => 'infochimps' }, :authenticate => true)
|
242
|
+
|
243
|
+
For POST and PUT requests you can also include data, which will also
|
244
|
+
be signed if you ask.
|
245
|
+
|
246
|
+
authenticated_create_dataset_request = Chimps::Request.new('/datasets.json', :data => { :title => "My Awesome Dataset", :description => "An amazing description." }, :authenticate => true)
|
247
|
+
authenticated_create_dataset_request.post
|
248
|
+
|
249
|
+
The +get+, +post+, +put+, and +delete+ methods of a Chimps::Request
|
250
|
+
all return a Chimps::Response which automatically parses the response
|
251
|
+
body into Ruby data structures.
|
252
|
+
|
253
|
+
=== Using the Query API
|
254
|
+
|
255
|
+
The Chimps::QueryRequest class makes requests against the Query API.
|
256
|
+
It works similarly to Chimps::Request, except that the
|
257
|
+
path supplied is the path to the corresponding dataset on the {Query
|
258
|
+
API}[http://api.infochimps.com].
|
259
|
+
|
260
|
+
All QueryRequests must be authenticated.
|
261
|
+
|
262
|
+
authenticated_query_request = Chimps::QueryRequest.new('soc/net/tw/trstrank.json', :query_params => { :screen_name => 'infochimps' } )
|
263
|
+
authenticated_query_request.get
|
264
|
+
|
265
|
+
=== Using Workflows
|
266
|
+
|
267
|
+
In addition to making single requests, Chimps also has a few workflows
|
268
|
+
which automate sequences of requests needed for certain complex tasks
|
269
|
+
(like uploading or downloading of data, both of which require
|
270
|
+
authorization tokens).
|
271
|
+
|
272
|
+
The three workflows implemented so far include
|
273
|
+
|
274
|
+
- Chimps::Workflows::Uploader
|
275
|
+
- Chimps::Workflows::Downloader
|
276
|
+
- Chimps::Workflows::BatchUpdater
|
277
|
+
|
278
|
+
Consult the documentation for each workflow to learn how to use it. A
|
279
|
+
brief example of how to use the Downloader:
|
280
|
+
|
281
|
+
downloader = Chimps::Workflows::Downloader.new(:dataset => 'my-awesome-dataset')
|
282
|
+
downloader.execute! # performs download
|
283
|
+
|
284
|
+
= Contributing
|
285
|
+
|
286
|
+
Chimps is an open source project created by the Infochimps team to
|
287
|
+
encourage adoption of the Infochimps APIs. The official repository is
|
288
|
+
hosted on GitHub
|
289
|
+
|
290
|
+
http://github.com/infochimps/chimps
|
291
|
+
|
292
|
+
Feel free to clone it and send pull requests.
|
data/VERSION
CHANGED
@@ -1 +1 @@
|
|
1
|
-
0.1.
|
1
|
+
0.1.2
|
data/lib/chimps/commands/query.rb
CHANGED
@@ -26,7 +26,9 @@ You can learn about the main Infochimps site API at
|
|
26
26
|
EOF
|
27
27
|
|
28
28
|
include Chimps::Utils::UsesYamlData
|
29
|
-
|
29
|
+
def ignore_first_arg_on_command_line
|
30
|
+
true
|
31
|
+
end
|
30
32
|
|
31
33
|
# The dataset to query.
|
32
34
|
#
|
data/lib/chimps/commands/upload.rb
CHANGED
@@ -15,7 +15,7 @@ upload this archive to Infochimps. The local archive defaults to a
|
|
15
15
|
sensible name in the current directory but can also be customized.
|
16
16
|
|
17
17
|
If the only file to be packaged is already a package (.zip, .tar,
|
18
|
-
.tar.gz,
|
18
|
+
.tar.gz, &c.) then it will not be packaged again.
|
19
19
|
EOF
|
20
20
|
|
21
21
|
# The path to the archive
|
data/lib/chimps/config.rb
CHANGED
@@ -1,13 +1,13 @@
|
|
1
1
|
module Chimps
|
2
2
|
|
3
3
|
# Default configuration for Chimps. User-specific configuration
|
4
|
-
#
|
4
|
+
# lives in a YAML file <tt>~/.chimps</tt>.
|
5
5
|
CONFIG = {
|
6
6
|
:query => {
|
7
|
-
:host => 'http://api.infochimps.com'
|
7
|
+
:host => ENV["CHIMPS_QUERY_HOST"] || 'http://api.infochimps.com'
|
8
8
|
},
|
9
9
|
:site => {
|
10
|
-
:host => 'http://infochimps.org'
|
10
|
+
:host => ENV["CHIMPS_HOST"] || 'http://infochimps.org'
|
11
11
|
},
|
12
12
|
:identity_file => File.expand_path(ENV["CHIMPS_RC"] || "~/.chimps"),
|
13
13
|
:verbose => nil,
|
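The two changed lines in this hunk let an environment variable override each default host. The following is a rough sketch of that fallback in isolation. The <tt>host_for</tt> helper and the localhost URL are hypothetical and exist only for illustration; they are not part of Chimps.

```ruby
# Hypothetical helper mirroring the `ENV[...] || default` pattern from the hunk.
# The env argument defaults to the real ENV but accepts a plain Hash so the
# behaviour can be shown without touching the process environment.
def host_for(env_var, default, env = ENV)
  env[env_var] || default
end

default_host  = host_for("CHIMPS_HOST", "http://infochimps.org", {})
override_host = host_for("CHIMPS_HOST", "http://infochimps.org",
                         { "CHIMPS_HOST" => "http://localhost:3000" })

puts default_host
puts override_host
```

Note that an environment variable set to the empty string is still truthy in Ruby, so with this pattern it would replace the default rather than fall back to it.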
data/lib/chimps/request.rb
CHANGED
@@ -74,7 +74,7 @@ module Chimps
|
|
74
74
|
#
|
75
75
|
# @return [String]
|
76
76
|
def host
|
77
|
-
@host ||=
|
77
|
+
@host ||= Chimps::CONFIG[:site][:host]
|
78
78
|
end
|
79
79
|
|
80
80
|
# Return the URL for this request with the (signed, if necessary)
|
@@ -256,7 +256,7 @@ module Chimps
|
|
256
256
|
#
|
257
257
|
# @return [String]
|
258
258
|
def host
|
259
|
-
@host ||=
|
259
|
+
@host ||= Chimps::CONFIG[:query][:host]
|
260
260
|
end
|
261
261
|
|
262
262
|
# Authenticate this request by stuffing the <tt>:requested_at</tt>
|
data/lib/chimps/utils/uses_yaml_data.rb
CHANGED
@@ -2,8 +2,12 @@ module Chimps
|
|
2
2
|
module Utils
|
3
3
|
module UsesYamlData
|
4
4
|
|
5
|
-
|
6
|
-
|
5
|
+
def ignore_yaml_files_on_command_line
|
6
|
+
false
|
7
|
+
end
|
8
|
+
def ignore_first_arg_on_command_line
|
9
|
+
false
|
10
|
+
end
|
7
11
|
|
8
12
|
attr_reader :data_file
|
9
13
|
|
@@ -41,7 +45,7 @@ module Chimps
|
|
41
45
|
def params_from_command_line
|
42
46
|
returning([]) do |d|
|
43
47
|
argv.each_with_index do |arg, index|
|
44
|
-
next if index == 0 &&
|
48
|
+
next if index == 0 && ignore_first_arg_on_command_line
|
45
49
|
next unless arg =~ /^(\w+) *=(.*)$/
|
46
50
|
name, value = $1.downcase.to_sym, $2.strip
|
47
51
|
d << { name => value } # always a hash
|
@@ -52,7 +56,7 @@ module Chimps
|
|
52
56
|
def yaml_files_from_command_line
|
53
57
|
returning([]) do |d|
|
54
58
|
argv.each_with_index do |arg, index|
|
55
|
-
next if index == 0 &&
|
59
|
+
next if index == 0 && ignore_first_arg_on_command_line
|
56
60
|
next if arg =~ /^(\w+) *=(.*)$/
|
57
61
|
path = File.expand_path(arg)
|
58
62
|
raise CLIError.new("No such path #{path}") unless File.exist?(path)
|
@@ -62,7 +66,7 @@ module Chimps
|
|
62
66
|
end
|
63
67
|
|
64
68
|
def data_from_command_line
|
65
|
-
if
|
69
|
+
if ignore_yaml_files_on_command_line
|
66
70
|
params_from_command_line
|
67
71
|
else
|
68
72
|
yaml_files_from_command_line + params_from_command_line
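# The hunks above parse <tt>name=value</tt> arguments with a regular expression
# and can skip the first argument when a command sets
# <tt>ignore_first_arg_on_command_line</tt>. The same guard appears in
# <tt>yaml_files_from_command_line</tt>, where a leading dataset path would
# otherwise be expanded as a local file and rejected if it does not exist.
# Below is a standalone sketch of the parameter parsing only. The
# <tt>params_from</tt> method and its <tt>skip_first</tt> keyword are
# illustrative names, not the Chimps API.

```ruby
# Same pattern as in the hunk: a word, optional spaces, '=', then the value.
PARAM_PATTERN = /^(\w+) *=(.*)$/

# Collects name=value arguments as single-entry hashes with symbol keys.
def params_from(argv, skip_first: false)
  argv.each_with_index.each_with_object([]) do |(arg, index), params|
    next if index == 0 && skip_first
    next unless arg =~ PARAM_PATTERN
    params << { $1.downcase.to_sym => $2.strip }
  end
end

query_params  = params_from(["soc/net/tw/influence", "screen_name=infochimps"], skip_first: true)
create_params = params_from(["Title = My Data", "notes.yaml"])

p query_params
p create_params
```

# Arguments without an equals sign, such as <tt>notes.yaml</tt>, are left for the YAML-file path handling.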
|
data/lib/chimps/workflows/uploader.rb
CHANGED
@@ -130,13 +130,13 @@ module Chimps
|
|
130
130
|
#
|
131
131
|
# @return [String]
|
132
132
|
def readme_url
|
133
|
-
File.join(Chimps::CONFIG[:host], "/README-infochimps")
|
133
|
+
File.join(Chimps::CONFIG[:site][:host], "/README-infochimps")
|
134
134
|
end
|
135
135
|
|
136
136
|
# The URL to the ICSS file for this dataset on Infochimps
|
137
137
|
# servers
|
138
138
|
def icss_url
|
139
|
-
File.join(Chimps::CONFIG[:host], "datasets", "#{dataset}.yaml")
|
139
|
+
File.join(Chimps::CONFIG[:site][:host], "datasets", "#{dataset}.yaml")
|
140
140
|
end
|
141
141
|
|
142
142
|
# Both the local paths and remote paths to package.
|
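The two URL fixes in this hunk move the host lookup under <tt>:site</tt>. The following sketch suggests why the old lookup could not work, assuming the configuration hash has no top-level <tt>:host</tt> key (as the visible part of config.rb indicates). The <tt>CONFIG</tt> constant here is a trimmed stand-in, not the real <tt>Chimps::CONFIG</tt>.

```ruby
# Trimmed stand-in for the configuration hash shown in config.rb.
CONFIG = { :site => { :host => 'http://infochimps.org' } }

# Old lookup: CONFIG[:host] is nil here, and File.join refuses nil.
old_error =
  begin
    File.join(CONFIG[:host], "datasets", "my-awesome-dataset.yaml")
    nil
  rescue TypeError => e
    e.class
  end

# New lookup: the nested key yields the host string.
icss_url   = File.join(CONFIG[:site][:host], "datasets", "my-awesome-dataset.yaml")
readme_url = File.join(CONFIG[:site][:host], "/README-infochimps")

puts old_error.inspect
puts icss_url
puts readme_url
```

<tt>File.join</tt> collapses the duplicate separator, so the leading slash in <tt>"/README-infochimps"</tt> does not produce a double slash after the host.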
@@ -193,7 +193,7 @@ module Chimps
|
|
193
193
|
return if skip_packaging?
|
194
194
|
archiver = IMW::Tools::Archiver.new(archive.name, input_paths)
|
195
195
|
result = archiver.package(archive.path)
|
196
|
-
raise PackagingError.new("Unable to package files for upload. Temporary files left in #{archiver.tmp_dir}") if result.is_a?(
|
196
|
+
raise PackagingError.new("Unable to package files for upload. Temporary files left in #{archiver.tmp_dir}") if result.is_a?(StandardError) || (!archiver.success?)
|
197
197
|
archiver.clean!
|
198
198
|
end
|
199
199
|
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: chimps
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.1.
|
4
|
+
version: 0.1.2
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Dhruv Bansal
|
@@ -70,13 +70,13 @@ extensions: []
|
|
70
70
|
|
71
71
|
extra_rdoc_files:
|
72
72
|
- LICENSE
|
73
|
-
- README.
|
73
|
+
- README.rdoc
|
74
74
|
files:
|
75
75
|
- .document
|
76
76
|
- .gitignore
|
77
77
|
- CHANGELOG.textile
|
78
78
|
- LICENSE
|
79
|
-
- README.
|
79
|
+
- README.rdoc
|
80
80
|
- Rakefile
|
81
81
|
- VERSION
|
82
82
|
- bin/chimps
|
data/README.textile
DELETED
@@ -1,65 +0,0 @@
|
|
1
|
-
h2. Awesome Chimp Tricks
|
2
|
-
|
3
|
-
h3. Searching
|
4
|
-
|
5
|
-
Search datasets
|
6
|
-
|
7
|
-
chimps search statisical abstract
|
8
|
-
|
9
|
-
Search sources
|
10
|
-
|
11
|
-
chimps search -m source department of justice
|
12
|
-
|
13
|
-
Search datasets with particular tags
|
14
|
-
|
15
|
-
chimps search -t government,finance statistical abstract
|
16
|
-
|
17
|
-
or categories
|
18
|
-
|
19
|
-
chimps search -c education statistical abstract
|
20
|
-
|
21
|
-
h3. Browsing
|
22
|
-
|
23
|
-
chimps describe dataset 3923
|
24
|
-
chimps describe source us-doj
|
25
|
-
chimps describe field length
|
26
|
-
|
27
|
-
h3. Downloading
|
28
|
-
|
29
|
-
chimps download 39283
|
30
|
-
|
31
|
-
h3. Creating
|
32
|
-
|
33
|
-
chimps create data.yaml
|
34
|
-
|
35
|
-
also
|
36
|
-
|
37
|
-
chimps schema source
|
38
|
-
chimps schema dataset
|
39
|
-
|
40
|
-
and of course
|
41
|
-
|
42
|
-
chimps upload 39283 path/to/my/data
|
43
|
-
|
44
|
-
h3. General Options
|
45
|
-
|
46
|
-
Work as someone other than the usual user
|
47
|
-
|
48
|
-
chimps -i path/to/my/identify_file.yml create data.yaml
|
49
|
-
|
50
|
-
|
51
|
-
h2. Settings and Credentials
|
52
|
-
|
53
|
-
Create a file in @~/.chimps@
|
54
|
-
|
55
|
-
<pre><code>
|
56
|
-
# -*-yaml-*-
|
57
|
-
:site:
|
58
|
-
:username: monkeyboy
|
59
|
-
:key: xxxxxxxxxxxxxxxx
|
60
|
-
:secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
|
61
|
-
:query:
|
62
|
-
:username: monkeyboy
|
63
|
-
:key: xxxxxxxxxxxxxxxxx
|
64
|
-
:secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
|
65
|
-
</code></pre>
|