chimps 0.1.1 → 0.1.2
- data/README.rdoc +292 -0
- data/VERSION +1 -1
- data/lib/chimps/commands/query.rb +3 -1
- data/lib/chimps/commands/update.rb +3 -0
- data/lib/chimps/commands/upload.rb +1 -1
- data/lib/chimps/config.rb +3 -3
- data/lib/chimps/request.rb +2 -2
- data/lib/chimps/utils/uses_yaml_data.rb +9 -5
- data/lib/chimps/workflows/uploader.rb +3 -3
- metadata +3 -3
- data/README.textile +0 -65
data/README.rdoc
ADDED
@@ -0,0 +1,292 @@
|
|
1
|
+
Infochimps[http://infochimps.org] offers two APIs for users to access
|
2
|
+
and modify data:
|
3
|
+
|
4
|
+
- an XML & JSON based {RESTful API}[http://infochimps.org/api] to list, show, create, update, and destroy datasets and associated resources on Infochimps[http://infochimps.org]
|
5
|
+
- a JSON based {Query API}[http://api.infochimps.com] to query particular rows in datasets
|
6
|
+
|
7
|
+
Chimps provides a Ruby wrapper for both of these APIs (built on
|
8
|
+
RestClient) as well as a command-line tool.
|
9
|
+
|
10
|
+
See the above links for details on the sorts of parameters the
|
11
|
+
Infochimps APIs expect and the output they provide.
|
12
|
+
|
13
|
+
= Installation
|
14
|
+
|
15
|
+
Chimps is hosted as a gem on Gemcutter[http://gemcutter.org]. You can see your current gem sources with
|
16
|
+
|
17
|
+
gem sources
|
18
|
+
|
19
|
+
If you don't see <tt>http://gemcutter.org</tt> you'll have to add it
|
20
|
+
with
|
21
|
+
|
22
|
+
gem sources -a http://gemcutter.org
|
23
|
+
|
24
|
+
Then you can install Chimps with
|
25
|
+
|
26
|
+
gem install chimps
|
27
|
+
|
28
|
+
== API keys
|
29
|
+
|
30
|
+
You'll need an API key and secret from Infochimps before you can start
|
31
|
+
adding or modifying datasets via the REST API. {Sign up for an
|
32
|
+
Infochimps account}[http://infochimps.org/signup] and register for an
|
33
|
+
API key.
|
34
|
+
|
35
|
+
You'll need a separate API key to use the Query API; {register for one
|
36
|
+
now}[http://api.infochimps.com/features-and-pricing].
|
37
|
+
|
38
|
+
Once you've registered for the API(s) you'll need to put your keys in your
|
39
|
+
<tt>~/.chimps</tt> file which should look like
|
40
|
+
|
41
|
+
# -*-yaml-*-
|
42
|
+
:site:
|
43
|
+
  :username: monkeyboy
|
44
|
+
  :key: xxxxxxxxxxxxxxxx
|
45
|
+
  :secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
|
46
|
+
:query:
|
47
|
+
  :username: monkeyboy
|
48
|
+
  :key: xxxxxxxxxxxxxxxxx
|
49
|
+
  :secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
|
50
|
+
|
51
|
+
= Usage
|
52
|
+
|
53
|
+
Chimps can be used as a library in your own code or as a command-line
|
54
|
+
tool.
|
55
|
+
|
56
|
+
== Chimps on the Command Line
|
57
|
+
|
58
|
+
You can use Chimps directly on the command line to interact with
|
59
|
+
Infochimps.
|
60
|
+
|
61
|
+
Try running
|
62
|
+
|
63
|
+
chimps help
|
64
|
+
|
65
|
+
to get started as well as
|
66
|
+
|
67
|
+
chimps help COMMAND
|
68
|
+
|
69
|
+
for help on a specific command. When running in verbose mode (with
|
70
|
+
<tt>-v</tt>), Chimps will print helpful diagnostics on each query it's
|
71
|
+
performing.
|
72
|
+
|
73
|
+
=== Testing
|
74
|
+
|
75
|
+
You can test whether or not you have access to the Infochimps REST API
|
76
|
+
with
|
77
|
+
|
78
|
+
chimps test
|
79
|
+
|
80
|
+
Chimps will try to print informative error messages if it finds it
|
81
|
+
can't authenticate you.
|
82
|
+
|
83
|
+
=== Searching
|
84
|
+
|
85
|
+
Search datasets
|
86
|
+
|
87
|
+
chimps search 'statistical abstract'
|
88
|
+
|
89
|
+
or other kinds of models
|
90
|
+
|
91
|
+
chimps search -m source 'Department of Justice'
|
92
|
+
|
93
|
+
This _does_ _not_ require credentials for the RESTful API.
|
94
|
+
|
95
|
+
=== Listing
|
96
|
+
|
97
|
+
You can list your datasets
|
98
|
+
|
99
|
+
chimps list
|
100
|
+
|
101
|
+
or all datasets
|
102
|
+
|
103
|
+
chimps list -a
|
104
|
+
|
105
|
+
=== Showing
|
106
|
+
|
107
|
+
You can get more information about a particular dataset (as a YAML
|
108
|
+
document)
|
109
|
+
|
110
|
+
chimps show my-awesome-dataset
|
111
|
+
|
112
|
+
This _does_ _not_ require credentials for the RESTful API.
|
113
|
+
|
114
|
+
=== Creating
|
115
|
+
|
116
|
+
You can create a dataset, passing properties directly on the command
|
117
|
+
line
|
118
|
+
|
119
|
+
chimps create title="My Awesome Dataset" description="Curt, but informative."
|
120
|
+
16011 my-awesome-dataset 2010-05-25T22:52:16Z My Awesome Dataset
|
121
|
+
|
122
|
+
or from a YAML input file
|
123
|
+
|
124
|
+
chimps create my_awesome_dataset.yaml
|
125
|
+
16011 my-awesome-dataset 2010-05-25T22:52:16Z My Awesome Dataset
|
126
|
+
|
127
|
+
Examples of input files are in the <tt>examples</tt> directory of the
|
128
|
+
Chimps distribution.
|
129
|
+
|
130
|
+
=== Updating
|
131
|
+
|
132
|
+
You can also update an existing dataset
|
133
|
+
|
134
|
+
chimps update my-awesome-dataset title="My TOTALLY Awesome Dataset"
|
135
|
+
|
136
|
+
Passing in data works just like the <tt>create</tt> command.
|
137
|
+
|
138
|
+
=== Destroying
|
139
|
+
|
140
|
+
You can destroy datasets as well
|
141
|
+
|
142
|
+
chimps destroy my-awesome-dataset
|
143
|
+
|
144
|
+
=== Downloading
|
145
|
+
|
146
|
+
You can download a dataset from Infochimps
|
147
|
+
|
148
|
+
chimps download my-awesome-dataset
|
149
|
+
|
150
|
+
which will put it in the current directory.
|
151
|
+
|
152
|
+
You can also specify a format or package.
|
153
|
+
|
154
|
+
chimps download -f csv -p tar.bz2 my-awesome-dataset
|
155
|
+
|
156
|
+
=== Uploading
|
157
|
+
|
158
|
+
You can upload data from your local machine to an existing dataset at
|
159
|
+
Infochimps
|
160
|
+
|
161
|
+
chimps upload my-awesome-dataset /path/to/my/data/*
|
162
|
+
16005 boozer 2010-05-20T13:58:07Z boozer
|
163
|
+
|
164
|
+
Chimps will package all the files you specify into a single archive
|
165
|
+
and upload it. You can annotate the upload with a particular format
|
166
|
+
(though Chimps will try and guess). Chimps will NOT make an archive
|
167
|
+
if you only attempt to upload a single file and it is already an
|
168
|
+
archive.
|
169
|
+
|
170
|
+
Chimps uses the {Infinite
|
171
|
+
Monkeywrench}[http://github.com/infochimps/imw] to process the data
|
172
|
+
for uploads.
|
173
|
+
|
174
|
+
=== Batch Jobs
|
175
|
+
|
176
|
+
Chimps allows you to perform batch requests against the Infochimps REST
|
177
|
+
API in which many changes are made through a single API call.
|
178
|
+
|
179
|
+
chimps batch batch_data.yaml
|
180
|
+
Status Resource ID Errors
|
181
|
+
created source 13671
|
182
|
+
created dataset 16013
|
183
|
+
invalid Title is too short (minimum is 4 characters)
|
184
|
+
|
185
|
+
The contents in <tt>batch_data.yaml</tt> specify an array of resources
|
186
|
+
to update or create. Each resource's data can be attached to local
|
187
|
+
paths to upload. These paths will be packaged and uploaded (just as
|
188
|
+
in the +upload+ command) after the batch update finishes.
|
189
|
+
|
190
|
+
Errors in a particular resource will not cause the whole batch job to
|
191
|
+
fail (as above).
|
192
|
+
|
193
|
+
Learn more about the format of the <tt>batch_data.yaml</tt> file by
|
194
|
+
looking at the example in the +examples+ directory of the Chimps
|
195
|
+
distribution or by visiting the {Infochimps REST
|
196
|
+
API}[http://infochimps.org/api].
|
197
|
+
|
198
|
+
=== Querying
|
199
|
+
|
200
|
+
You can also use Chimps to make queries against the Infochimps Query
|
201
|
+
API.
|
202
|
+
|
203
|
+
chimps query soc/net/tw/influence screen_name=infochimps
|
204
|
+
{"replies_out":13,"account_age":602,"statuses":166,"id":15748351,"replies_in":22,"screen_name":"infochimps"}
|
205
|
+
|
206
|
+
where parameters to include for a _single_ query can be passed in on
|
207
|
+
the command line.
|
208
|
+
|
209
|
+
If you pass in the path to a YAML file then it must consist of an
|
210
|
+
array of such parameter hashes and will result in multiple queries
|
211
|
+
being made (to the same dataset)
|
212
|
+
|
213
|
+
chimps query soc/net/tw/influence query.yaml
|
214
|
+
{"replies_out":13,"account_age":602,"statuses":166,"id":15748351,"replies_in":22,"screen_name":"infochimps"}
|
215
|
+
{"replies_out":940,"account_age":440,"statuses":5015,"id":19058681,"replies_in":88909,"screen_name":"aplusk"}
|
216
|
+
{"replies_out":0,"account_age":1123,"statuses":634,"id":813286,"replies_in":14541,"screen_name":"BarackObama"}
|
217
|
+
|
218
|
+
== Chimps as a Library
|
219
|
+
|
220
|
+
You can also use Chimps in your own code to handle making requests of
|
221
|
+
Infochimps.
|
222
|
+
|
223
|
+
=== Using the REST API
|
224
|
+
|
225
|
+
The Chimps::Request class makes requests against the REST API. Create
|
226
|
+
a request by specifying a path on the Infochimps server (it _must_ end
|
227
|
+
with <tt>.json</tt>).
|
228
|
+
|
229
|
+
list_dataset_request = Chimps::Request.new('/datasets.json')
|
230
|
+
list_dataset_request.get
|
231
|
+
|
232
|
+
Some requests need to be signed. Assuming you've properly initialized
|
233
|
+
the <tt>Chimps::CONFIG</tt> Hash with the proper values (identical to
|
234
|
+
the arrangement of the <tt>~/.chimps</tt> configuration file) you can
|
235
|
+
simply ask the request to sign itself
|
236
|
+
|
237
|
+
authenticated_list_datasets_request = Chimps::Request.new('/datasets.json', :authenticate => true)
|
238
|
+
|
239
|
+
You can also pass in query params
|
240
|
+
|
241
|
+
authenticated_list_datasets_request_with_params = Chimps::Request.new('/datasets.json', :query_params => { :id => 'infochimps' }, :authenticate => true)
|
242
|
+
|
243
|
+
For POST and PUT requests you can also include data, which will also
|
244
|
+
be signed if you ask.
|
245
|
+
|
246
|
+
authenticated_create_dataset_request = Chimps::Request.new('/datasets.json', :data => { :title => "My Awesome Dataset", :description => "An amazing description." }, :authenticate => true)
|
247
|
+
authenticated_create_dataset_request.post
|
248
|
+
|
249
|
+
The +get+, +post+, +put+, and +delete+ methods of a Chimps::Request
|
250
|
+
all return a Chimps::Response which automatically parses the response
|
251
|
+
body into Ruby data structures.
|
252
|
+
|
253
|
+
=== Using the Query API
|
254
|
+
|
255
|
+
The Chimps::QueryRequest class makes requests against the Query API.
|
256
|
+
It works similarly to Chimps::Request except that the
|
257
|
+
path supplied is the path to the corresponding dataset on the {Query
|
258
|
+
API}[http://api.infochimps.com].
|
259
|
+
|
260
|
+
All QueryRequests must be authenticated.
|
261
|
+
|
262
|
+
authenticated_query_request = Chimps::QueryRequest.new('soc/net/tw/trstrank.json', :query_params => { :screen_name => 'infochimps' } )
|
263
|
+
authenticated_query_request.get
|
264
|
+
|
265
|
+
=== Using Workflows
|
266
|
+
|
267
|
+
In addition to making single requests, Chimps also has a few workflows
|
268
|
+
which automate sequences of requests needed for certain complex tasks
|
269
|
+
(like uploading or downloading of data, both of which require
|
270
|
+
authorization tokens).
|
271
|
+
|
272
|
+
The three workflows implemented so far are
|
273
|
+
|
274
|
+
- Chimps::Workflows::Uploader
|
275
|
+
- Chimps::Workflows::Downloader
|
276
|
+
- Chimps::Workflows::BatchUpdater
|
277
|
+
|
278
|
+
Consult the documentation for each workflow to learn how to use it. A
|
279
|
+
brief example of how to use the Downloader:
|
280
|
+
|
281
|
+
downloader = Chimps::Workflows::Downloader.new(:dataset => 'my-awesome-dataset')
|
282
|
+
downloader.execute! # performs download
|
283
|
+
|
284
|
+
= Contributing
|
285
|
+
|
286
|
+
Chimps is an open source project created by the Infochimps team to
|
287
|
+
encourage adoption of the Infochimps APIs. The official repository is
|
288
|
+
hosted on GitHub
|
289
|
+
|
290
|
+
http://github.com/infochimps/chimps
|
291
|
+
|
292
|
+
Feel free to clone it and send pull requests.
|
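Note on the README's <tt>~/.chimps</tt> layout: the keys are written as Ruby symbols (<tt>:site:</tt>, <tt>:key:</tt>), so a loader has to permit symbols when parsing. The following is a minimal sketch using only the standard library, not the gem's actual loading code. The helper names +load_credentials+ and +merge_config+ are hypothetical, and the deep merge over defaults is an assumption about how user settings and default hosts might be combined.

```ruby
require 'yaml'

# Sample text mirroring the README's ~/.chimps layout (nested under :site and :query).
SAMPLE_RC = <<~YAML
  :site:
    :username: monkeyboy
    :key: xxxxxxxxxxxxxxxx
    :secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  :query:
    :username: monkeyboy
    :key: xxxxxxxxxxxxxxxxx
    :secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
YAML

# Hypothetical helper: parse the text, permitting Symbol so ":site:" becomes :site.
def load_credentials(text)
  YAML.safe_load(text, permitted_classes: [Symbol]) || {}
end

# Hypothetical helper: merge user settings over defaults, recursing into nested hashes
# so that a default :host survives next to a user-supplied :key.
def merge_config(defaults, user)
  defaults.merge(user) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? merge_config(old, new) : new
  end
end

defaults = { :site => { :host => 'http://infochimps.org' } }
config = merge_config(defaults, load_credentials(SAMPLE_RC))
puts config[:site][:username]
puts config[:site][:host]
```

A flat (unindented) file would parse <tt>:site:</tt> as an empty value, which is why the nesting matters here.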
data/VERSION
CHANGED
@@ -1 +1 @@
|
|
1
|
-
0.1.
|
1
|
+
0.1.2
|
data/lib/chimps/commands/query.rb
CHANGED
@@ -26,7 +26,9 @@ You can learn about the main Infochimps site API at
|
|
26
26
|
EOF
|
27
27
|
|
28
28
|
include Chimps::Utils::UsesYamlData
|
29
|
-
|
29
|
+
def ignore_first_arg_on_command_line
|
30
|
+
true
|
31
|
+
end
|
30
32
|
|
31
33
|
# The dataset to query.
|
32
34
|
#
|
data/lib/chimps/commands/upload.rb
CHANGED
@@ -15,7 +15,7 @@ upload this archive to Infochimps. The local archive defaults to a
|
|
15
15
|
sensible name in the current directory but can also be customized.
|
16
16
|
|
17
17
|
If the only file to be packaged is already a package (.zip, .tar,
|
18
|
-
.tar.gz,
|
18
|
+
.tar.gz, &c.) then it will not be packaged again.
|
19
19
|
EOF
|
20
20
|
|
21
21
|
# The path to the archive
|
data/lib/chimps/config.rb
CHANGED
@@ -1,13 +1,13 @@
|
|
1
1
|
module Chimps
|
2
2
|
|
3
3
|
# Default configuration for Chimps. User-specific configuration
|
4
|
-
#
|
4
|
+
# lives in a YAML file <tt>~/.chimps</tt>.
|
5
5
|
CONFIG = {
|
6
6
|
:query => {
|
7
|
-
:host => 'http://api.infochimps.com'
|
7
|
+
:host => ENV["CHIMPS_QUERY_HOST"] || 'http://api.infochimps.com'
|
8
8
|
},
|
9
9
|
:site => {
|
10
|
-
:host => 'http://infochimps.org'
|
10
|
+
:host => ENV["CHIMPS_HOST"] || 'http://infochimps.org'
|
11
11
|
},
|
12
12
|
:identity_file => File.expand_path(ENV["CHIMPS_RC"] || "~/.chimps"),
|
13
13
|
:verbose => nil,
|
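Note on the config.rb change: both hosts now fall back to a default only when the matching environment variable is unset. The pattern can be sketched in isolation as below. The function +build_config+ and the <tt>localhost</tt> URL are hypothetical; only the variable names and default hosts come from the diff.

```ruby
# Build a config hash whose hosts can be overridden through the environment.
# `env` defaults to the real ENV but accepts any Hash for experimentation.
def build_config(env = ENV)
  {
    :query => { :host => env["CHIMPS_QUERY_HOST"] || 'http://api.infochimps.com' },
    :site  => { :host => env["CHIMPS_HOST"]       || 'http://infochimps.org' }
  }
end

default_config = build_config({})
staging_config = build_config({ "CHIMPS_HOST" => "http://localhost:3000" })

puts default_config[:site][:host]
puts staging_config[:site][:host]
puts staging_config[:query][:host]
```

Two caveats follow from plain Ruby semantics: a constant such as +CONFIG+ is evaluated once when the file is loaded, so the variables must be set before the library is required, and an empty string is truthy, so <tt>CHIMPS_HOST=""</tt> would yield an empty host instead of the default.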
data/lib/chimps/request.rb
CHANGED
@@ -74,7 +74,7 @@ module Chimps
|
|
74
74
|
#
|
75
75
|
# @return [String]
|
76
76
|
def host
|
77
|
-
@host ||=
|
77
|
+
@host ||= Chimps::CONFIG[:site][:host]
|
78
78
|
end
|
79
79
|
|
80
80
|
# Return the URL for this request with the (signed, if necessary)
|
@@ -256,7 +256,7 @@ module Chimps
|
|
256
256
|
#
|
257
257
|
# @return [String]
|
258
258
|
def host
|
259
|
-
@host ||=
|
259
|
+
@host ||= Chimps::CONFIG[:query][:host]
|
260
260
|
end
|
261
261
|
|
262
262
|
# Authenticate this request by stuffing the <tt>:requested_at</tt>
|
data/lib/chimps/utils/uses_yaml_data.rb
CHANGED
@@ -2,8 +2,12 @@ module Chimps
|
|
2
2
|
module Utils
|
3
3
|
module UsesYamlData
|
4
4
|
|
5
|
-
|
6
|
-
|
5
|
+
def ignore_yaml_files_on_command_line
|
6
|
+
false
|
7
|
+
end
|
8
|
+
def ignore_first_arg_on_command_line
|
9
|
+
false
|
10
|
+
end
|
7
11
|
|
8
12
|
attr_reader :data_file
|
9
13
|
|
@@ -41,7 +45,7 @@ module Chimps
|
|
41
45
|
def params_from_command_line
|
42
46
|
returning([]) do |d|
|
43
47
|
argv.each_with_index do |arg, index|
|
44
|
-
next if index == 0 &&
|
48
|
+
next if index == 0 && ignore_first_arg_on_command_line
|
45
49
|
next unless arg =~ /^(\w+) *=(.*)$/
|
46
50
|
name, value = $1.downcase.to_sym, $2.strip
|
47
51
|
d << { name => value } # always a hash
|
@@ -52,7 +56,7 @@ module Chimps
|
|
52
56
|
def yaml_files_from_command_line
|
53
57
|
returning([]) do |d|
|
54
58
|
argv.each_with_index do |arg, index|
|
55
|
-
next if index == 0 &&
|
59
|
+
next if index == 0 && ignore_first_arg_on_command_line
|
56
60
|
next if arg =~ /^(\w+) *=(.*)$/
|
57
61
|
path = File.expand_path(arg)
|
58
62
|
raise CLIError.new("No such path #{path}") unless File.exist?(path)
|
@@ -62,7 +66,7 @@ module Chimps
|
|
62
66
|
end
|
63
67
|
|
64
68
|
def data_from_command_line
|
65
|
-
if
|
69
|
+
if ignore_yaml_files_on_command_line
|
66
70
|
params_from_command_line
|
67
71
|
else
|
68
72
|
yaml_files_from_command_line + params_from_command_line
|
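Note on the uses_yaml_data.rb change: the module now exposes hook methods that default to +false+, and a command such as +query+ overrides one to skip its first argument (the dataset path). A standalone sketch of that hook pattern follows. The names +CommandLineData+, +PlainCommand+, and +QueryLikeCommand+ are hypothetical, +argv+ is passed in explicitly, and the file-existence check from the real code is left out so the example runs anywhere.

```ruby
module CommandLineData
  PARAM_PATTERN = /^(\w+) *=(.*)$/

  # Default hook: treat every argument as data.
  def ignore_first_arg_on_command_line
    false
  end

  # Collect name=value arguments as an array of single-entry hashes.
  def params_from_command_line(argv)
    argv.each_with_index.each_with_object([]) do |(arg, index), params|
      next if index == 0 && ignore_first_arg_on_command_line
      match = PARAM_PATTERN.match(arg)
      next unless match
      params << { match[1].downcase.to_sym => match[2].strip }
    end
  end

  # Collect the remaining arguments, which would be treated as YAML paths.
  def yaml_paths_from_command_line(argv)
    argv.each_with_index.reject do |arg, index|
      (index == 0 && ignore_first_arg_on_command_line) || arg =~ PARAM_PATTERN
    end.map(&:first)
  end
end

# Uses the defaults: every argument is data.
class PlainCommand
  include CommandLineData
end

# Overrides the hook: the first argument names a dataset, not a data file.
class QueryLikeCommand
  include CommandLineData

  def ignore_first_arg_on_command_line
    true
  end
end

argv = ['soc/net/tw/influence', 'screen_name=infochimps', 'query.yaml']
p QueryLikeCommand.new.params_from_command_line(argv)
p QueryLikeCommand.new.yaml_paths_from_command_line(argv)
p PlainCommand.new.yaml_paths_from_command_line(argv)
```

Without the override, the dataset path would be collected as if it were a YAML file, which is the case the last line illustrates.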
data/lib/chimps/workflows/uploader.rb
CHANGED
@@ -130,13 +130,13 @@ module Chimps
|
|
130
130
|
#
|
131
131
|
# @return [String]
|
132
132
|
def readme_url
|
133
|
-
File.join(Chimps::CONFIG[:host], "/README-infochimps")
|
133
|
+
File.join(Chimps::CONFIG[:site][:host], "/README-infochimps")
|
134
134
|
end
|
135
135
|
|
136
136
|
# The URL to the ICSS file for this dataset on Infochimps
|
137
137
|
# servers
|
138
138
|
def icss_url
|
139
|
-
File.join(Chimps::CONFIG[:host], "datasets", "#{dataset}.yaml")
|
139
|
+
File.join(Chimps::CONFIG[:site][:host], "datasets", "#{dataset}.yaml")
|
140
140
|
end
|
141
141
|
|
142
142
|
# Both the local paths and remote paths to package.
|
@@ -193,7 +193,7 @@ module Chimps
|
|
193
193
|
return if skip_packaging?
|
194
194
|
archiver = IMW::Tools::Archiver.new(archive.name, input_paths)
|
195
195
|
result = archiver.package(archive.path)
|
196
|
-
raise PackagingError.new("Unable to package files for upload. Temporary files left in #{archiver.tmp_dir}") if result.is_a?(
|
196
|
+
raise PackagingError.new("Unable to package files for upload. Temporary files left in #{archiver.tmp_dir}") if result.is_a?(StandardError) || (!archiver.success?)
|
197
197
|
archiver.clean!
|
198
198
|
end
|
199
199
|
|
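Note on the packaging check in uploader.rb: the new condition treats packaging as failed either when the returned value is an exception or when the archiver reports no success. The sketch below shows that two-sided check with a stand-in object. +FakeArchiver+ and <tt>package!</tt> are hypothetical and do not reflect the real IMW archiver API beyond the +package+ and <tt>success?</tt> calls visible in the diff.

```ruby
class PackagingError < StandardError; end

# Hypothetical stand-in for an archiver: `package` hands back a stored result
# and `success?` reports whether the archive was actually produced.
FakeArchiver = Struct.new(:result, :ok) do
  def package(_path)
    result
  end

  def success?
    ok
  end
end

# Raise if packaging returned an error object OR the archiver reports failure.
def package!(archiver, path)
  result = archiver.package(path)
  if result.is_a?(StandardError) || !archiver.success?
    raise PackagingError, "Unable to package files for upload"
  end
  result
end

puts package!(FakeArchiver.new(:done, true), 'archive.tar.bz2')

begin
  package!(FakeArchiver.new(:done, false), 'archive.tar.bz2')
rescue PackagingError => e
  puts "failed: #{e.message}"
end
```

Checking both conditions covers an archiver that signals failure through its status without returning an exception object.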
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: chimps
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.1.
|
4
|
+
version: 0.1.2
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Dhruv Bansal
|
@@ -70,13 +70,13 @@ extensions: []
|
|
70
70
|
|
71
71
|
extra_rdoc_files:
|
72
72
|
- LICENSE
|
73
|
-
- README.
|
73
|
+
- README.rdoc
|
74
74
|
files:
|
75
75
|
- .document
|
76
76
|
- .gitignore
|
77
77
|
- CHANGELOG.textile
|
78
78
|
- LICENSE
|
79
|
-
- README.
|
79
|
+
- README.rdoc
|
80
80
|
- Rakefile
|
81
81
|
- VERSION
|
82
82
|
- bin/chimps
|
data/README.textile
DELETED
@@ -1,65 +0,0 @@
|
|
1
|
-
h2. Awesome Chimp Tricks
|
2
|
-
|
3
|
-
h3. Searching
|
4
|
-
|
5
|
-
Search datasets
|
6
|
-
|
7
|
-
chimps search statisical abstract
|
8
|
-
|
9
|
-
Search sources
|
10
|
-
|
11
|
-
chimps search -m source department of justice
|
12
|
-
|
13
|
-
Search datasets with particular tags
|
14
|
-
|
15
|
-
chimps search -t government,finance statistical abstract
|
16
|
-
|
17
|
-
or categories
|
18
|
-
|
19
|
-
chimps search -c education statistical abstract
|
20
|
-
|
21
|
-
h3. Browsing
|
22
|
-
|
23
|
-
chimps describe dataset 3923
|
24
|
-
chimps describe source us-doj
|
25
|
-
chimps describe field length
|
26
|
-
|
27
|
-
h3. Downloading
|
28
|
-
|
29
|
-
chimps download 39283
|
30
|
-
|
31
|
-
h3. Creating
|
32
|
-
|
33
|
-
chimps create data.yaml
|
34
|
-
|
35
|
-
also
|
36
|
-
|
37
|
-
chimps schema source
|
38
|
-
chimps schema dataset
|
39
|
-
|
40
|
-
and of course
|
41
|
-
|
42
|
-
chimps upload 39283 path/to/my/data
|
43
|
-
|
44
|
-
h3. General Options
|
45
|
-
|
46
|
-
Work as someone other than the usual user
|
47
|
-
|
48
|
-
chimps -i path/to/my/identify_file.yml create data.yaml
|
49
|
-
|
50
|
-
|
51
|
-
h2. Settings and Credentials
|
52
|
-
|
53
|
-
Create a file in @~/.chimps@
|
54
|
-
|
55
|
-
<pre><code>
|
56
|
-
# -*-yaml-*-
|
57
|
-
:site:
|
58
|
-
:username: monkeyboy
|
59
|
-
:key: xxxxxxxxxxxxxxxx
|
60
|
-
:secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
|
61
|
-
:query:
|
62
|
-
:username: monkeyboy
|
63
|
-
:key: xxxxxxxxxxxxxxxxx
|
64
|
-
:secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
|
65
|
-
</code></pre>
|