chimps 0.1.1 → 0.1.2

data/README.rdoc ADDED
@@ -0,0 +1,292 @@
1
+ Infochimps[http://infochimps.org] offers two APIs for users to access
2
+ and modify data:
3
+
4
+ - an XML & JSON based {RESTful API}[http://infochimps.org/api] to list, show, create, update, and destroy datasets and associated resources on Infochimps[http://infochimps.org]
5
+ - a JSON based {Query API}[http://api.infochimps.com] to query particular rows in datasets
6
+
7
+ Chimps provides a Ruby wrapper for both of these APIs (built on
8
+ RestClient) as well as a command-line tool.
9
+
10
+ See the above links for details on the sorts of parameters the
11
+ Infochimps APIs expect and the output they provide.
12
+
13
+ = Installation
14
+
15
+ Chimps is hosted as a gem on Gemcutter[http://gemcutter.org]. You can see your current gem sources with
16
+
17
+ gem sources
18
+
19
+ If you don't see <tt>http://gemcutter.org</tt> you'll have to add it
20
+ with
21
+
22
+ gem sources -a http://gemcutter.org
23
+
24
+ Then you can install Chimps with
25
+
26
+ gem install chimps
27
+
28
+ == API keys
29
+
30
+ You'll need an API key and secret from Infochimps before you can start
31
+ adding or modifying datasets via the REST API. {Sign up for an
32
+ Infochimps account}[http://infochimps.org/signup] and register for an
33
+ API key.
34
+
35
+ You'll need a separate API key to use the Query API; {register for one
36
+ now}[http://api.infochimps.com/features-and-pricing].
37
+
38
+ Once you've registered for the API(s) you'll need to put your keys in your
39
+ <tt>~/.chimps</tt> file which should look like
40
+
41
+ # -*-yaml-*-
42
+ :site:
43
+ :username: monkeyboy
44
+ :key: xxxxxxxxxxxxxxxx
45
+ :secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
46
+ :query:
47
+ :username: monkeyboy
48
+ :key: xxxxxxxxxxxxxxxxx
49
+ :secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
50
+
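A rough illustration of how a file laid out like the example above parses in Ruby. This is a sketch only: `load_credentials` is a hypothetical helper, the nested two-space indentation is assumed, and nothing here claims to be how Chimps itself reads the file. The point is that the leading colons turn every key into a Symbol.

```ruby
require 'yaml'

# Hypothetical helper, not part of Chimps: parse credentials text laid out
# like the ~/.chimps example.
def load_credentials(yaml_text)
  # Keys such as ":site" load as Symbols, so Symbol must be permitted explicitly.
  YAML.safe_load(yaml_text, permitted_classes: [Symbol])
end

sample = <<~CONFIG
  :site:
    :username: monkeyboy
    :key: xxxxxxxxxxxxxxxx
  :query:
    :username: monkeyboy
    :key: xxxxxxxxxxxxxxxxx
CONFIG

creds = load_credentials(sample)
puts creds[:site][:username]   # prints "monkeyboy"
puts creds.key?("site")        # prints "false": the keys are Symbols, not Strings
```

Looking values up with String keys (`creds["site"]`) returns nil for a file written this way, which is an easy mistake to make when building the configuration Hash by hand.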
51
+ = Usage
52
+
53
+ Chimps can be used as a library in your own code or as a command-line
54
+ tool.
55
+
56
+ == Chimps on the Command Line
57
+
58
+ You can use Chimps directly on the command line to interact with
59
+ Infochimps.
60
+
61
+ Try running
62
+
63
+ chimps help
64
+
65
+ to get started as well as
66
+
67
+ chimps help COMMAND
68
+
69
+ for help on a specific command. When running in verbose mode (with
70
+ <tt>-v</tt>), Chimps will print helpful diagnostics on each query it's
71
+ performing.
72
+
73
+ === Testing
74
+
75
+ You can test whether or not you have access to the Infochimps REST API
76
+ with
77
+
78
+ chimps test
79
+
80
+ Chimps will try and print informative error messages if it finds it
81
+ can't authenticate you.
82
+
83
+ === Searching
84
+
85
+ Search datasets
86
+
87
+ chimps search 'statistical abstract'
88
+
89
+ or other kinds of models
90
+
91
+ chimps search -m source 'Department of Justice'
92
+
93
+ This _does_ _not_ require credentials for the RESTful API.
94
+
95
+ === Listing
96
+
97
+ You can list your datasets
98
+
99
+ chimps list
100
+
101
+ or all datasets
102
+
103
+ chimps list -a
104
+
105
+ === Showing
106
+
107
+ You can get more information about a particular dataset (as a YAML
108
+ document)
109
+
110
+ chimps show my-awesome-dataset
111
+
112
+ This _does_ _not_ require credentials for the RESTful API.
113
+
114
+ === Creating
115
+
116
+ You can create a dataset, passing properties directly on the command
117
+ line
118
+
119
+ chimps create title="My Awesome Dataset" description="Curt, but informative."
120
+ 16011 my-awesome-dataset 2010-05-25T22:52:16Z My Awesome Dataset
121
+
122
+ or from a YAML input file
123
+
124
+ chimps create my_awesome_dataset.yaml
125
+ 16011 my-awesome-dataset 2010-05-25T22:52:16Z My Awesome Dataset
126
+
127
+ Examples of input files are in the <tt>examples</tt> directory of the
128
+ Chimps distribution.
129
+
130
+ === Updating
131
+
132
+ You can also update an existing dataset
133
+
134
+ chimps update my-awesome-dataset title="My TOTALLY Awesome Dataset"
135
+
136
+ Passing in data works just like the <tt>create</tt> command.
137
+
138
+ === Destroying
139
+
140
+ You can destroy datasets as well
141
+
142
+ chimps destroy my-awesome-dataset
143
+
144
+ === Downloading
145
+
146
+ You can download a dataset from Infochimps
147
+
148
+ chimps download my-awesome-dataset
149
+
150
+ which will put it in the current directory.
151
+
152
+ You can also specify a format or package.
153
+
154
+ chimps download -f csv -p tar.bz2 my-awesome-dataset
155
+
156
+ === Uploading
157
+
158
+ You can upload data from your local machine to an existing dataset at
159
+ Infochimps
160
+
161
+ chimps upload my-awesome-dataset /path/to/my/data/*
162
+ 16005 boozer 2010-05-20T13:58:07Z boozer
163
+
164
+ Chimps will package all the files you specify into a single archive
165
+ and upload it. You can annotate the upload with a particular format
166
+ (though Chimps will try and guess). Chimps will NOT make an archive
167
+ if you only attempt to upload a single file and it is already an
168
+ archive.
169
+
170
+ Chimps uses the {Infinite
171
+ Monkeywrench}[http://github.com/infochimps/imw] to process the data
172
+ for uploads.
173
+
174
+ === Batch Jobs
175
+
176
+ Chimps allows you to perform batch requests against the Infochimps REST
177
+ API in which many changes are made through a single API call.
178
+
179
+ chimps batch batch_data.yaml
180
+ Status Resource ID Errors
181
+ created source 13671
182
+ created dataset 16013
183
+ invalid Title is too short (minimum is 4 characters)
184
+
185
+ The contents in <tt>batch_data.yaml</tt> specify an array of resources
186
+ to update or create. Each resource's data can be attached to local
187
+ paths to upload. These paths will be packaged and uploaded (just as
188
+ in the +upload+ command) after the batch update finishes.
189
+
190
+ Errors in a particular resource will not cause the whole batch job to
191
+ fail (as above).
192
+
193
+ Learn more about the format of the <tt>batch_data.yaml</tt> file by
194
+ looking at the example in the +examples+ directory of the Chimps
195
+ distribution or by visiting the {Infochimps REST
196
+ API}[http://infochimps.org/api].
197
+
198
+ === Querying
199
+
200
+ You can also use Chimps to make queries against the Infochimps Query
201
+ API.
202
+
203
+ chimps query soc/net/tw/influence screen_name=infochimps
204
+ {"replies_out":13,"account_age":602,"statuses":166,"id":15748351,"replies_in":22,"screen_name":"infochimps"}
205
+
206
+ where parameters to include for a _single_ query can be passed in on
207
+ the command line.
208
+
209
+ If you pass in the path to a YAML file then it must consist of an
210
+ array of such parameter hashes and will result in multiple queries
211
+ being made (to the same dataset)
212
+
213
+ chimps query soc/net/tw/influence query.yaml
214
+ {"replies_out":13,"account_age":602,"statuses":166,"id":15748351,"replies_in":22,"screen_name":"infochimps"}
215
+ {"replies_out":940,"account_age":440,"statuses":5015,"id":19058681,"replies_in":88909,"screen_name":"aplusk"}
216
+ {"replies_out":0,"account_age":1123,"statuses":634,"id":813286,"replies_in":14541,"screen_name":"BarackObama"}
217
+
218
+ == Chimps as a Library
219
+
220
+ You can also use Chimps in your own code to handle making requests of
221
+ Infochimps.
222
+
223
+ === Using the REST API
224
+
225
+ The Chimps::Request class makes requests against the REST API. Create
226
+ a request by specifying a path on the Infochimps server (it _must_ end
227
+ with <tt>.json</tt>).
228
+
229
+ list_dataset_request = Chimps::Request.new('/datasets.json')
230
+ list_dataset_request.get
231
+
232
+ Some requests need to be signed. Assuming you've properly initialized
233
+ the <tt>Chimps::CONFIG</tt> Hash with the proper values (identical to
234
+ the arrangement of the <tt>~/.chimps</tt> configuration file) you can
235
+ simply ask the request to sign itself
236
+
237
+ authenticated_list_datasets_request = Chimps::Request.new('/datasets.json', :authenticate => true)
238
+
239
+ You can also pass in query params
240
+
241
+ authenticated_list_datasets_request_with_params = Chimps::Request.new('/datasets.json', :query_params => { :id => 'infochimps' }, :authenticate => true)
242
+
243
+ For POST and PUT requests you can also include data, which will also
244
+ be signed if you ask.
245
+
246
+ authenticated_create_dataset_request = Chimps::Request.new('/datasets.json', :data => { :title => "My Awesome Dataset", :description => "An amazing description." }, :authenticate => true)
247
+ authenticated_create_dataset_request.post
248
+
249
+ The +get+, +post+, +put+, and +delete+ methods of a Chimps::Request
250
+ all return a Chimps::Response which automatically parses the response
251
+ body into Ruby data structures.
252
+
253
+ === Using the Query API
254
+
255
+ The Chimps::QueryRequest class makes requests against the Query API.
256
+ It works similarly to Chimps::Request except that the
257
+ path supplied is the path to the corresponding dataset on the {Query
258
+ API}[http://api.infochimps.com].
259
+
260
+ All QueryRequests must be authenticated.
261
+
262
+ authenticated_query_request = Chimps::QueryRequest.new('soc/net/tw/trstrank.json', :query_params => { :screen_name => 'infochimps' } )
263
+ authenticated_query_request.get
264
+
265
+ === Using Workflows
266
+
267
+ In addition to making single requests, Chimps also has a few workflows
268
+ which automate sequences of requests needed for certain complex tasks
269
+ (like uploading or downloading of data, both of which require
270
+ authorization tokens).
271
+
272
+ The three workflows implemented so far are
273
+
274
+ - Chimps::Workflows::Uploader
275
+ - Chimps::Workflows::Downloader
276
+ - Chimps::Workflows::BatchUpdater
277
+
278
+ Consult the documentation for each workflow to learn how to use it. A
279
+ brief example of how to use the Downloader:
280
+
281
+ downloader = Chimps::Workflows::Downloader.new(:dataset => 'my-awesome-dataset')
282
+ downloader.execute! # performs download
283
+
284
+ = Contributing
285
+
286
+ Chimps is an open source project created by the Infochimps team to
287
+ encourage adoption of the Infochimps APIs. The official repository is
288
+ hosted on GitHub
289
+
290
+ http://github.com/infochimps/chimps
291
+
292
+ Feel free to clone it and send pull requests.
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.1.1
1
+ 0.1.2
@@ -26,7 +26,9 @@ You can learn about the main Infochimps site API at
26
26
  EOF
27
27
 
28
28
  include Chimps::Utils::UsesYamlData
29
- IGNORE_FIRST_ARG_ON_COMMAND_LINE = true # must come after include
29
+ def ignore_first_arg_on_command_line
30
+ true
31
+ end
30
32
 
31
33
  # The dataset to query.
32
34
  #
@@ -20,6 +20,9 @@ EOF
20
20
  MODELS = %w[dataset source license]
21
21
  include Chimps::Utils::UsesModel
22
22
  include Chimps::Utils::UsesYamlData
23
+ def ignore_first_arg_on_command_line
24
+ true
25
+ end
23
26
 
24
27
  # Issue the PUT request.
25
28
  def execute!
@@ -15,7 +15,7 @@ upload this archive to Infochimps. The local archive defaults to a
15
15
  sensible name in the current directory but can also be customized.
16
16
 
17
17
  If the only file to be packaged is already a package (.zip, .tar,
18
- .tar.gz, &.c) then it will not be packaged again.
18
+ .tar.gz, &c.) then it will not be packaged again.
19
19
  EOF
20
20
 
21
21
  # The path to the archive
data/lib/chimps/config.rb CHANGED
@@ -1,13 +1,13 @@
1
1
  module Chimps
2
2
 
3
3
  # Default configuration for Chimps. User-specific configuration
4
- # usually lives in a YAML file <tt>~/.chimps</tt>.
4
+ # lives in a YAML file <tt>~/.chimps</tt>.
5
5
  CONFIG = {
6
6
  :query => {
7
- :host => 'http://api.infochimps.com'
7
+ :host => ENV["CHIMPS_QUERY_HOST"] || 'http://api.infochimps.com'
8
8
  },
9
9
  :site => {
10
- :host => 'http://infochimps.org'
10
+ :host => ENV["CHIMPS_HOST"] || 'http://infochimps.org'
11
11
  },
12
12
  :identity_file => File.expand_path(ENV["CHIMPS_RC"] || "~/.chimps"),
13
13
  :verbose => nil,
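A small sketch of the `ENV[...] || default` pattern this hunk moves into `CONFIG`. The `host_from` helper is hypothetical, and the environment is passed in as a plain Hash so the behaviour can be shown without touching the real `ENV`.

```ruby
# Hypothetical helper mirroring `ENV["CHIMPS_HOST"] || 'http://infochimps.org'`.
def host_from(env, var, default)
  env[var] || default
end

default_host = "http://infochimps.org"

# Variable unset: the default is used.
puts host_from({}, "CHIMPS_HOST", default_host)
# prints "http://infochimps.org"

# Variable set: it overrides the default.
puts host_from({ "CHIMPS_HOST" => "http://localhost:3000" }, "CHIMPS_HOST", default_host)
# prints "http://localhost:3000"

# An empty string is truthy in Ruby, so an exported-but-empty variable
# still wins over the default.
puts host_from({ "CHIMPS_HOST" => "" }, "CHIMPS_HOST", default_host).inspect
# prints "\"\""
```

Because the lookup now sits inside the `CONFIG` literal, the environment is read once when the file is loaded rather than when a request first asks for its host.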
@@ -74,7 +74,7 @@ module Chimps
74
74
  #
75
75
  # @return [String]
76
76
  def host
77
- @host ||= ENV["CHIMPS_HOST"] || Chimps::CONFIG[:site][:host]
77
+ @host ||= Chimps::CONFIG[:site][:host]
78
78
  end
79
79
 
80
80
  # Return the URL for this request with the (signed, if necessary)
@@ -256,7 +256,7 @@ module Chimps
256
256
  #
257
257
  # @return [String]
258
258
  def host
259
- @host ||= ENV["CHIMPS_QUERY_HOST"] || Chimps::CONFIG[:query][:host]
259
+ @host ||= Chimps::CONFIG[:query][:host]
260
260
  end
261
261
 
262
262
  # Authenticate this request by stuffing the <tt>:requested_at</tt>
@@ -2,8 +2,12 @@ module Chimps
2
2
  module Utils
3
3
  module UsesYamlData
4
4
 
5
- IGNORE_YAML_FILES_ON_COMMAND_LINE = false
6
- IGNORE_FIRST_ARG_ON_COMMAND_LINE = true
5
+ def ignore_yaml_files_on_command_line
6
+ false
7
+ end
8
+ def ignore_first_arg_on_command_line
9
+ false
10
+ end
7
11
 
8
12
  attr_reader :data_file
9
13
 
@@ -41,7 +45,7 @@ module Chimps
41
45
  def params_from_command_line
42
46
  returning([]) do |d|
43
47
  argv.each_with_index do |arg, index|
44
- next if index == 0 && IGNORE_FIRST_ARG_ON_COMMAND_LINE
48
+ next if index == 0 && ignore_first_arg_on_command_line
45
49
  next unless arg =~ /^(\w+) *=(.*)$/
46
50
  name, value = $1.downcase.to_sym, $2.strip
47
51
  d << { name => value } # always a hash
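To make the parsing in this hunk concrete, here is a standalone sketch using the same regular expression. `params_from` and its `skip_first` keyword are hypothetical names introduced for illustration; the real method reads `argv` and the `ignore_first_arg_on_command_line` hook instead.

```ruby
# Hypothetical standalone version of the name=value parsing shown in the hunk.
def params_from(argv, skip_first: false)
  pattern = /^(\w+) *=(.*)$/ # a word, optional spaces, "=", then the value
  argv.each_with_index.each_with_object([]) do |(arg, index), params|
    next if index == 0 && skip_first
    match = pattern.match(arg)
    next unless match # arguments without "=" (e.g. YAML paths) are not params
    params << { match[1].downcase.to_sym => match[2].strip }
  end
end

update_args = ['my-awesome-dataset', 'title=My TOTALLY Awesome Dataset']
parsed = params_from(update_args, skip_first: true)
puts parsed.length          # prints "1"
puts parsed.first[:title]   # prints "My TOTALLY Awesome Dataset"

# Names are downcased and values stripped; "data.yaml" is ignored here.
puts params_from(['Title = spaced ', 'data.yaml']).first[:title]  # prints "spaced"
```

Each parameter becomes its own single-entry Hash, matching the "always a hash" comment in the original code.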
@@ -52,7 +56,7 @@ module Chimps
52
56
  def yaml_files_from_command_line
53
57
  returning([]) do |d|
54
58
  argv.each_with_index do |arg, index|
55
- next if index == 0 && IGNORE_FIRST_ARG_ON_COMMAND_LINE
59
+ next if index == 0 && ignore_first_arg_on_command_line
56
60
  next if arg =~ /^(\w+) *=(.*)$/
57
61
  path = File.expand_path(arg)
58
62
  raise CLIError.new("No such path #{path}") unless File.exist?(path)
@@ -62,7 +66,7 @@ module Chimps
62
66
  end
63
67
 
64
68
  def data_from_command_line
65
- if self.class::IGNORE_YAML_FILES_ON_COMMAND_LINE
69
+ if ignore_yaml_files_on_command_line
66
70
  params_from_command_line
67
71
  else
68
72
  yaml_files_from_command_line + params_from_command_line
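The diff does not say why the constants became methods, but a likely reason (an assumption on my part) is Ruby's lexical constant lookup: a bare constant referenced inside a module's method resolves to the module's own constant, even when the including class defines one with the same name. The module and class names below are hypothetical and only illustrate the language behaviour.

```ruby
# Constant-based flag: the reference inside the module is resolved lexically.
module UsesFlagConstant
  IGNORE_FIRST = false

  def skip_first?
    IGNORE_FIRST # always the module's constant, whatever the includer defines
  end
end

class ConstantCommand
  include UsesFlagConstant
  IGNORE_FIRST = true # has no effect on skip_first?
end

# Method-based flag: ordinary method dispatch lets the includer override it.
module UsesFlagMethod
  def ignore_first?
    false
  end

  def skip_first?
    ignore_first?
  end
end

class MethodCommand
  include UsesFlagMethod

  def ignore_first?
    true
  end
end

puts ConstantCommand.new.skip_first?  # prints "false"
puts MethodCommand.new.skip_first?    # prints "true"
```

The old code worked around this in one place with `self.class::IGNORE_YAML_FILES_ON_COMMAND_LINE`, while the two bare `IGNORE_FIRST_ARG_ON_COMMAND_LINE` references did not, which is consistent with this reading.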
@@ -130,13 +130,13 @@ module Chimps
130
130
  #
131
131
  # @return [String]
132
132
  def readme_url
133
- File.join(Chimps::CONFIG[:host], "/README-infochimps")
133
+ File.join(Chimps::CONFIG[:site][:host], "/README-infochimps")
134
134
  end
135
135
 
136
136
  # The URL to the ICSS file for this dataset on Infochimps
137
137
  # servers
138
138
  def icss_url
139
- File.join(Chimps::CONFIG[:host], "datasets", "#{dataset}.yaml")
139
+ File.join(Chimps::CONFIG[:site][:host], "datasets", "#{dataset}.yaml")
140
140
  end
141
141
 
142
142
  # Both the local paths and remote paths to package.
@@ -193,7 +193,7 @@ module Chimps
193
193
  return if skip_packaging?
194
194
  archiver = IMW::Tools::Archiver.new(archive.name, input_paths)
195
195
  result = archiver.package(archive.path)
196
- raise PackagingError.new("Unable to package files for upload. Temporary files left in #{archiver.tmp_dir}") if result.is_a?(RuntimeError) || (!archiver.success?)
196
+ raise PackagingError.new("Unable to package files for upload. Temporary files left in #{archiver.tmp_dir}") if result.is_a?(StandardError) || (!archiver.success?)
197
197
  archiver.clean!
198
198
  end
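The change from `RuntimeError` to `StandardError` widens the check, since `RuntimeError` is only one subclass of `StandardError`. A minimal sketch follows; `failed_with?` is a hypothetical helper, and which error classes the IMW archiver actually returns is not shown in this diff.

```ruby
# Hypothetical helper: does the result look like a returned error of this class?
def failed_with?(result, error_class)
  result.is_a?(error_class)
end

missing_file = Errno::ENOENT.new("data.csv") # a StandardError, not a RuntimeError
generic      = RuntimeError.new("boom")

puts failed_with?(missing_file, RuntimeError)   # prints "false": the old check missed it
puts failed_with?(missing_file, StandardError)  # prints "true"
puts failed_with?(generic, StandardError)       # prints "true": still caught
```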
199
199
 
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: chimps
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.1
4
+ version: 0.1.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Dhruv Bansal
@@ -70,13 +70,13 @@ extensions: []
70
70
 
71
71
  extra_rdoc_files:
72
72
  - LICENSE
73
- - README.textile
73
+ - README.rdoc
74
74
  files:
75
75
  - .document
76
76
  - .gitignore
77
77
  - CHANGELOG.textile
78
78
  - LICENSE
79
- - README.textile
79
+ - README.rdoc
80
80
  - Rakefile
81
81
  - VERSION
82
82
  - bin/chimps
data/README.textile DELETED
@@ -1,65 +0,0 @@
1
- h2. Awesome Chimp Tricks
2
-
3
- h3. Searching
4
-
5
- Search datasets
6
-
7
- chimps search statisical abstract
8
-
9
- Search sources
10
-
11
- chimps search -m source department of justice
12
-
13
- Search datasets with particular tags
14
-
15
- chimps search -t government,finance statistical abstract
16
-
17
- or categories
18
-
19
- chimps search -c education statistical abstract
20
-
21
- h3. Browsing
22
-
23
- chimps describe dataset 3923
24
- chimps describe source us-doj
25
- chimps describe field length
26
-
27
- h3. Downloading
28
-
29
- chimps download 39283
30
-
31
- h3. Creating
32
-
33
- chimps create data.yaml
34
-
35
- also
36
-
37
- chimps schema source
38
- chimps schema dataset
39
-
40
- and of course
41
-
42
- chimps upload 39283 path/to/my/data
43
-
44
- h3. General Options
45
-
46
- Work as someone other than the usual user
47
-
48
- chimps -i path/to/my/identify_file.yml create data.yaml
49
-
50
-
51
- h2. Settings and Credentials
52
-
53
- Create a file in @~/.chimps@
54
-
55
- <pre><code>
56
- # -*-yaml-*-
57
- :site:
58
- :username: monkeyboy
59
- :key: xxxxxxxxxxxxxxxx
60
- :secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
61
- :query:
62
- :username: monkeyboy
63
- :key: xxxxxxxxxxxxxxxxx
64
- :secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
65
- </code></pre>