chimps 0.1.1 → 0.1.2

data/README.rdoc ADDED
@@ -0,0 +1,292 @@
1
+ Infochimps[http://infochimps.org] offers two APIs for users to access
2
+ and modify data:
3
+
4
+ - an XML & JSON based {RESTful API}[http://infochimps.org/api] to list, show, create, update, and destroy datasets and associated resources on Infochimps[http://infochimps.org]
5
+ - a JSON based {Query API}[http://api.infochimps.com] to query particular rows in datasets
6
+
7
+ Chimps provides a Ruby wrapper for both of these APIs (built on
8
+ RestClient) as well as a command-line tool.
9
+
10
+ See the above links for details on the sorts of parameters the
11
+ Infochimps APIs expect and the output they provide.
12
+
13
+ = Installation
14
+
15
+ Chimps is hosted as a gem on Gemcutter[http://gemcutter.org]. You can see your current gem sources with
16
+
17
+ gem sources
18
+
19
+ If you don't see <tt>http://gemcutter.org</tt> you'll have to add it
20
+ with
21
+
22
+ gem sources -a http://gemcutter.org
23
+
24
+ Then you can install Chimps with
25
+
26
+ gem install chimps
27
+
28
+ == API keys
29
+
30
+ You'll need an API key and secret from Infochimps before you can start
31
+ adding or modifying datasets via the REST API. {Sign up for an
32
+ Infochimps account}[http://infochimps.org/signup] and register for an
33
+ API key.
34
+
35
+ You'll need a separate API key to use the Query API; {register for one
36
+ now}[http://api.infochimps.com/features-and-pricing].
37
+
38
+ Once you've registered for the API(s) you'll need to put your keys in your
39
+ <tt>~/.chimps</tt> file which should look like
40
+
41
+ # -*-yaml-*-
42
+ :site:
43
+ :username: monkeyboy
44
+ :key: xxxxxxxxxxxxxxxx
45
+ :secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
46
+ :query:
47
+ :username: monkeyboy
48
+ :key: xxxxxxxxxxxxxxxxx
49
+ :secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
50
+
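A minimal sketch of how a file shaped like this could be read and layered over defaults. It assumes the <tt>:username</tt>, <tt>:key</tt>, and <tt>:secret</tt> entries are nested (indented) under <tt>:site:</tt> and <tt>:query:</tt>, and that Ruby 2.6 or later is available for the <tt>permitted_classes</tt> keyword. The helper names +parse_identity+ and +merge_identity+ are hypothetical and are not part of Chimps.

```ruby
require 'yaml'

# Hypothetical defaults; only the two :host values appear in the gem's config.
DEFAULTS = {
  :site  => { :host => 'http://infochimps.org' },
  :query => { :host => 'http://api.infochimps.com' }
}.freeze

# Parse identity-file text. Symbol must be permitted explicitly because
# the keys are written as Ruby symbols (":site:", ":key:").
def parse_identity(yaml_text)
  YAML.safe_load(yaml_text, permitted_classes: [Symbol]) || {}
end

# Merge one level deep so :site and :query keep their default :host
# while gaining the user's :username, :key, and :secret.
def merge_identity(defaults, identity)
  defaults.merge(identity) do |_section, default_values, user_values|
    if default_values.is_a?(Hash) && user_values.is_a?(Hash)
      default_values.merge(user_values)
    else
      user_values
    end
  end
end
```

Merging one level deep, rather than replacing whole sections, is what lets a short identity file coexist with built-in host settings.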
51
+ = Usage
52
+
53
+ Chimps can be used as a library in your own code or as a command-line
54
+ tool.
55
+
56
+ == Chimps on the Command Line
57
+
58
+ You can use Chimps directly on the command line to interact with
59
+ Infochimps.
60
+
61
+ Try running
62
+
63
+ chimps help
64
+
65
+ to get started as well as
66
+
67
+ chimps help COMMAND
68
+
69
+ for help on a specific command. When running in verbose mode (with
70
+ <tt>-v</tt>), Chimps will print helpful diagnostics on each query it's
71
+ performing.
72
+
73
+ === Testing
74
+
75
+ You can test whether or not you have access to the Infochimps REST API
76
+ with
77
+
78
+ chimps test
79
+
80
+ Chimps will try and print informative error messages if it finds it
81
+ can't authenticate you.
82
+
83
+ === Searching
84
+
85
+ Search datasets
86
+
87
+ chimps search 'statistical abstract'
88
+
89
+ or other kinds of models
90
+
91
+ chimps search -m source 'Department of Justice'
92
+
93
+ This _does_ _not_ require credentials for the RESTful API.
94
+
95
+ === Listing
96
+
97
+ You can list your datasets
98
+
99
+ chimps list
100
+
101
+ or all datasets
102
+
103
+ chimps list -a
104
+
105
+ === Showing
106
+
107
+ You can get more information about a particular dataset (as a YAML
108
+ document)
109
+
110
+ chimps show my-awesome-dataset
111
+
112
+ This _does_ _not_ require credentials for the RESTful API.
113
+
114
+ === Creating
115
+
116
+ You can create a dataset, passing properties directly on the command
117
+ line
118
+
119
+ chimps create title="My Awesome Dataset" description="Curt, but informative."
120
+ 16011 my-awesome-dataset 2010-05-25T22:52:16Z My Awesome Dataset
121
+
122
+ or from a YAML input file
123
+
124
+ chimps create my_awesome_dataset.yaml
125
+ 16011 my-awesome-dataset 2010-05-25T22:52:16Z My Awesome Dataset
126
+
127
+ Examples of input files are in the <tt>examples</tt> directory of the
128
+ Chimps distribution.
129
+
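The <tt>key=value</tt> arguments shown above can be separated from file paths with a single pattern. The sketch below borrows the regular expression visible in the gem's +params_from_command_line+ but simplifies the result: it collects everything into one Hash, whereas the gem builds an array of single-key hashes. The method name +split_args+ is hypothetical.

```ruby
# Split command-line arguments into key=value parameters and other
# arguments (treated here as candidate YAML file paths).
def split_args(argv)
  params = {}
  paths  = []
  argv.each do |arg|
    if arg =~ /\A(\w+) *=(.*)\z/m
      # Keys are lower-cased symbols; values lose surrounding whitespace.
      params[$1.downcase.to_sym] = $2.strip
    else
      paths << arg
    end
  end
  [params, paths]
end
```

Because the shell strips the quotes in <tt>title="My Awesome Dataset"</tt>, the program receives one argument containing spaces, which the <tt>(.*)</tt> group captures whole.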
130
+ === Updating
131
+
132
+ You can also update an existing dataset
133
+
134
+ chimps update my-awesome-dataset title="My TOTALLY Awesome Dataset"
135
+
136
+ Passing in data works just like the <tt>create</tt> command.
137
+
138
+ === Destroying
139
+
140
+ You can destroy datasets as well
141
+
142
+ chimps destroy my-awesome-dataset
143
+
144
+ === Downloading
145
+
146
+ You can download a dataset from Infochimps
147
+
148
+ chimps download my-awesome-dataset
149
+
150
+ which will put it in the current directory.
151
+
152
+ You can also specify a format or package.
153
+
154
+ chimps download -f csv -p tar.bz2 my-awesome-dataset
155
+
156
+ === Uploading
157
+
158
+ You can upload data from your local machine to an existing dataset at
159
+ Infochimps
160
+
161
+ chimps upload my-awesome-dataset /path/to/my/data/*
162
+ 16005 boozer 2010-05-20T13:58:07Z boozer
163
+
164
+ Chimps will package all the files you specify into a single archive
165
+ and upload it. You can annotate the upload with a particular format
166
+ (though Chimps will try and guess). Chimps will NOT make an archive
167
+ if you only attempt to upload a single file and it is already an
168
+ archive.
169
+
170
+ Chimps uses the {Infinite
171
+ Monkeywrench}[http://github.com/infochimps/imw] to process the data
172
+ for uploads.
173
+
174
+ === Batch Jobs
175
+
176
+ Chimps allows you to perform batch requests against the Infochimps REST
177
+ API in which many changes are effected through a single API call.
178
+
179
+ chimps batch batch_data.yaml
180
+ Status Resource ID Errors
181
+ created source 13671
182
+ created dataset 16013
183
+ invalid Title is too short (minimum is 4 characters)
184
+
185
+ The contents in <tt>batch_data.yaml</tt> specify an array of resources
186
+ to update or create. Each resource's data can be attached to local
187
+ paths to upload. These paths will be packaged and uploaded (just as
188
+ in the +upload+ command) after the batch update finishes.
189
+
190
+ Errors in a particular resource will not cause the whole batch job to
191
+ fail (as above).
192
+
193
+ Learn more about the format of the <tt>batch_data.yaml</tt> file by
194
+ looking at the example in the +examples+ directory of the Chimps
195
+ distribution or by visiting the {Infochimps REST
196
+ API}[http://infochimps.org/api].
197
+
198
+ === Querying
199
+
200
+ You can also use Chimps to make queries against the Infochimps Query
201
+ API.
202
+
203
+ chimps query soc/net/tw/influence screen_name=infochimps
204
+ {"replies_out":13,"account_age":602,"statuses":166,"id":15748351,"replies_in":22,"screen_name":"infochimps"}
205
+
206
+ where parameters to include for a _single_ query can be passed in on
207
+ the command line.
208
+
209
+ If you pass in the path to a YAML file then it must consist of an
210
+ array of such parameter hashes and will result in multiple queries
211
+ being made (to the same dataset)
212
+
213
+ chimps query soc/net/tw/influence query.yaml
214
+ {"replies_out":13,"account_age":602,"statuses":166,"id":15748351,"replies_in":22,"screen_name":"infochimps"}
215
+ {"replies_out":940,"account_age":440,"statuses":5015,"id":19058681,"replies_in":88909,"screen_name":"aplusk"}
216
+ {"replies_out":0,"account_age":1123,"statuses":634,"id":813286,"replies_in":14541,"screen_name":"BarackObama"}
217
+
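As a rough illustration of "an array of such parameter hashes", the following sketch builds YAML text for three queries matching the screen names in the output above. Whether Chimps expects string keys or symbol keys in this file is not stated here, so the string keys below are an assumption.

```ruby
require 'yaml'

# One Hash per query; all are sent to the same dataset.
# String keys are an assumption, not confirmed by the README.
queries = [
  { 'screen_name' => 'infochimps' },
  { 'screen_name' => 'aplusk' },
  { 'screen_name' => 'BarackObama' }
]

# YAML text that could be saved as query.yaml.
yaml_text = YAML.dump(queries)
```

Writing +yaml_text+ to <tt>query.yaml</tt> with <tt>File.write</tt> would produce a file of the shape the command expects, under the assumption above.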
218
+ == Chimps as a Library
219
+
220
+ You can also use Chimps in your own code to handle making requests of
221
+ Infochimps.
222
+
223
+ === Using the REST API
224
+
225
+ The Chimps::Request class makes requests against the REST API. Create
226
+ a request by specifying a path on the Infochimps server (it _must_ end
227
+ with <tt>.json</tt>).
228
+
229
+ list_dataset_request = Chimps::Request.new('/datasets.json')
230
+ list_dataset_request.get
231
+
232
+ Some requests need to be signed. Assuming you've properly initialized
233
+ the <tt>Chimps::CONFIG</tt> Hash with the proper values (identical to
234
+ the arrangement of the <tt>~/.chimps</tt> configuration file) you can
235
+ simply ask the request to sign itself
236
+
237
+ authenticated_list_datasets_request = Chimps::Request.new('/datasets.json', :authenticate => true)
238
+
239
+ You can also pass in query params
240
+
241
+ authenticated_list_datasets_request_with_params = Chimps::Request.new('/datasets.json', :query_params => { :id => 'infochimps' }, :authenticate => true)
242
+
243
+ For POST and PUT requests you can also include data, which will also
244
+ be signed if you ask.
245
+
246
+ authenticated_create_dataset_request = Chimps::Request.new('/datasets.json', :data => { :title => "My Awesome Dataset", :description => "An amazing description." }, :authenticate => true)
247
+ authenticated_create_dataset_request.post
248
+
249
+ The +get+, +post+, +put+, and +delete+ methods of a Chimps::Request
250
+ all return a Chimps::Response which automatically parses the response
251
+ body into Ruby data structures.
252
+
253
+ === Using the Query API
254
+
255
+ The Chimps::QueryRequest class makes requests against the Query API.
256
+ It works similarly to Chimps::Request except that the
257
+ path supplied is the path to the corresponding dataset on the {Query
258
+ API}[http://api.infochimps.com].
259
+
260
+ All QueryRequests must be authenticated.
261
+
262
+ authenticated_query_request = Chimps::QueryRequest.new('soc/net/tw/trstrank.json', :query_params => { :screen_name => 'infochimps' } )
263
+ authenticated_query_request.get
264
+
265
+ === Using Workflows
266
+
267
+ In addition to making single requests, Chimps also has a few workflows
268
+ which automate sequences of requests needed for certain complex tasks
269
+ (like uploading or downloading of data, both of which require
270
+ authorization tokens).
271
+
272
+ The three workflows implemented so far include
273
+
274
+ - Chimps::Workflows::Uploader
275
+ - Chimps::Workflows::Downloader
276
+ - Chimps::Workflows::BatchUpdater
277
+
278
+ Consult the documentation for each workflow to learn how to use it. A
279
+ brief example of how to use the Downloader:
280
+
281
+ downloader = Chimps::Workflows::Downloader.new(:dataset => 'my-awesome-dataset')
282
+ downloader.execute! # performs download
283
+
284
+ = Contributing
285
+
286
+ Chimps is an open source project created by the Infochimps team to
287
+ encourage adoption of the Infochimps APIs. The official repository is
288
+ hosted on GitHub
289
+
290
+ http://github.com/infochimps/chimps
291
+
292
+ Feel free to clone it and send pull requests.
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.1.1
1
+ 0.1.2
@@ -26,7 +26,9 @@ You can learn about the main Infochimps site API at
26
26
  EOF
27
27
 
28
28
  include Chimps::Utils::UsesYamlData
29
- IGNORE_FIRST_ARG_ON_COMMAND_LINE = true # must come after include
29
+ def ignore_first_arg_on_command_line
30
+ true
31
+ end
30
32
 
31
33
  # The dataset to query.
32
34
  #
@@ -20,6 +20,9 @@ EOF
20
20
  MODELS = %w[dataset source license]
21
21
  include Chimps::Utils::UsesModel
22
22
  include Chimps::Utils::UsesYamlData
23
+ def ignore_first_arg_on_command_line
24
+ true
25
+ end
23
26
 
24
27
  # Issue the PUT request.
25
28
  def execute!
@@ -15,7 +15,7 @@ upload this archive to Infochimps. The local archive defaults to a
15
15
  sensible name in the current directory but can also be customized.
16
16
 
17
17
  If the only file to be packaged is already a package (.zip, .tar,
18
- .tar.gz, &.c) then it will not be packaged again.
18
+ .tar.gz, &c.) then it will not be packaged again.
19
19
  EOF
20
20
 
21
21
  # The path to the archive
data/lib/chimps/config.rb CHANGED
@@ -1,13 +1,13 @@
1
1
  module Chimps
2
2
 
3
3
  # Default configuration for Chimps. User-specific configuration
4
- # usually lives in a YAML file <tt>~/.chimps</tt>.
4
+ # lives in a YAML file <tt>~/.chimps</tt>.
5
5
  CONFIG = {
6
6
  :query => {
7
- :host => 'http://api.infochimps.com'
7
+ :host => ENV["CHIMPS_QUERY_HOST"] || 'http://api.infochimps.com'
8
8
  },
9
9
  :site => {
10
- :host => 'http://infochimps.org'
10
+ :host => ENV["CHIMPS_HOST"] || 'http://infochimps.org'
11
11
  },
12
12
  :identity_file => File.expand_path(ENV["CHIMPS_RC"] || "~/.chimps"),
13
13
  :verbose => nil,
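The hunk above moves the environment lookup into <tt>CONFIG</tt> itself, so the <tt>ENV[...] || default</tt> expression is evaluated once, when the Hash literal is built. A small sketch of that fallback pattern follows; +resolve_host+ is a hypothetical helper that takes the environment as a plain Hash so it can be exercised without touching the real +ENV+.

```ruby
# Return the override named +name+ from +env+, or +default+ when unset.
# Mirrors the `ENV["CHIMPS_HOST"] || 'http://infochimps.org'` idiom.
def resolve_host(env, name, default)
  env[name] || default
end

unset    = resolve_host({}, 'CHIMPS_HOST', 'http://infochimps.org')
override = resolve_host({ 'CHIMPS_HOST' => 'http://localhost:3000' },
                        'CHIMPS_HOST', 'http://infochimps.org')
# An empty string is truthy in Ruby, so it wins over the default.
blank    = resolve_host({ 'CHIMPS_HOST' => '' },
                        'CHIMPS_HOST', 'http://infochimps.org')
```

One consequence worth noting: because only +nil+ and +false+ are falsy in Ruby, exporting an empty <tt>CHIMPS_HOST</tt> would select the empty string rather than the default. The localhost URL is purely illustrative.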
@@ -74,7 +74,7 @@ module Chimps
74
74
  #
75
75
  # @return [String]
76
76
  def host
77
- @host ||= ENV["CHIMPS_HOST"] || Chimps::CONFIG[:site][:host]
77
+ @host ||= Chimps::CONFIG[:site][:host]
78
78
  end
79
79
 
80
80
  # Return the URL for this request with the (signed, if necessary)
@@ -256,7 +256,7 @@ module Chimps
256
256
  #
257
257
  # @return [String]
258
258
  def host
259
- @host ||= ENV["CHIMPS_QUERY_HOST"] || Chimps::CONFIG[:query][:host]
259
+ @host ||= Chimps::CONFIG[:query][:host]
260
260
  end
261
261
 
262
262
  # Authenticate this request by stuffing the <tt>:requested_at</tt>
@@ -2,8 +2,12 @@ module Chimps
2
2
  module Utils
3
3
  module UsesYamlData
4
4
 
5
- IGNORE_YAML_FILES_ON_COMMAND_LINE = false
6
- IGNORE_FIRST_ARG_ON_COMMAND_LINE = true
5
+ def ignore_yaml_files_on_command_line
6
+ false
7
+ end
8
+ def ignore_first_arg_on_command_line
9
+ false
10
+ end
7
11
 
8
12
  attr_reader :data_file
9
13
 
@@ -41,7 +45,7 @@ module Chimps
41
45
  def params_from_command_line
42
46
  returning([]) do |d|
43
47
  argv.each_with_index do |arg, index|
44
- next if index == 0 && IGNORE_FIRST_ARG_ON_COMMAND_LINE
48
+ next if index == 0 && ignore_first_arg_on_command_line
45
49
  next unless arg =~ /^(\w+) *=(.*)$/
46
50
  name, value = $1.downcase.to_sym, $2.strip
47
51
  d << { name => value } # always a hash
@@ -52,7 +56,7 @@ module Chimps
52
56
  def yaml_files_from_command_line
53
57
  returning([]) do |d|
54
58
  argv.each_with_index do |arg, index|
55
- next if index == 0 && IGNORE_FIRST_ARG_ON_COMMAND_LINE
59
+ next if index == 0 && ignore_first_arg_on_command_line
56
60
  next if arg =~ /^(\w+) *=(.*)$/
57
61
  path = File.expand_path(arg)
58
62
  raise CLIError.new("No such path #{path}") unless File.exist?(path)
@@ -62,7 +66,7 @@ module Chimps
62
66
  end
63
67
 
64
68
  def data_from_command_line
65
- if self.class::IGNORE_YAML_FILES_ON_COMMAND_LINE
69
+ if ignore_yaml_files_on_command_line
66
70
  params_from_command_line
67
71
  else
68
72
  yaml_files_from_command_line + params_from_command_line
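The hunks above replace module constants with overridable methods. One plausible motivation is Ruby's constant lookup: a bare constant referenced inside a module's method resolves lexically to the module's own constant, so a constant of the same name defined in the including class is never consulted. The sketch below demonstrates that difference with hypothetical module and class names; it is not code from Chimps.

```ruby
module FlagByConstant
  SKIP_FIRST = false

  def skip_first_via_constant
    SKIP_FIRST # lexical lookup: always FlagByConstant::SKIP_FIRST
  end
end

module FlagByMethod
  def skip_first
    false
  end

  def skip_first_via_method
    skip_first # dynamic dispatch: the including class may override
  end
end

class ConstantCommand
  include FlagByConstant
  SKIP_FIRST = true # separate constant; the module method never sees it
end

class MethodCommand
  include FlagByMethod

  def skip_first
    true
  end
end

constant_result = ConstantCommand.new.skip_first_via_constant
method_result   = MethodCommand.new.skip_first_via_method
```

Here +constant_result+ stays +false+ despite the class-level constant, while +method_result+ is +true+, which is the per-command behavior the method-based version makes possible.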
@@ -130,13 +130,13 @@ module Chimps
130
130
  #
131
131
  # @return [String]
132
132
  def readme_url
133
- File.join(Chimps::CONFIG[:host], "/README-infochimps")
133
+ File.join(Chimps::CONFIG[:site][:host], "/README-infochimps")
134
134
  end
135
135
 
136
136
  # The URL to the ICSS file for this dataset on Infochimps
137
137
  # servers
138
138
  def icss_url
139
- File.join(Chimps::CONFIG[:host], "datasets", "#{dataset}.yaml")
139
+ File.join(Chimps::CONFIG[:site][:host], "datasets", "#{dataset}.yaml")
140
140
  end
141
141
 
142
142
  # Both the local paths and remote paths to package.
@@ -193,7 +193,7 @@ module Chimps
193
193
  return if skip_packaging?
194
194
  archiver = IMW::Tools::Archiver.new(archive.name, input_paths)
195
195
  result = archiver.package(archive.path)
196
- raise PackagingError.new("Unable to package files for upload. Temporary files left in #{archiver.tmp_dir}") if result.is_a?(RuntimeError) || (!archiver.success?)
196
+ raise PackagingError.new("Unable to package files for upload. Temporary files left in #{archiver.tmp_dir}") if result.is_a?(StandardError) || (!archiver.success?)
197
197
  archiver.clean!
198
198
  end
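The change from +RuntimeError+ to +StandardError+ widens the set of returned error objects that count as a packaging failure. The sketch below shows the difference using standard Ruby exception classes; the +failure?+ helper is hypothetical, and which error classes the archiver can actually return is not stated in this diff.

```ruby
# Treat any StandardError instance returned by a packaging step as failure.
def failure?(result)
  result.is_a?(StandardError)
end

argument_error = ArgumentError.new('bad input')
missing_file   = Errno::ENOENT.new('data.csv')
runtime_error  = RuntimeError.new('generic failure')

# ArgumentError and Errno::ENOENT descend from StandardError
# but not from RuntimeError, so the older check would miss them.
missed_by_old_check = [argument_error, missing_file].reject do |e|
  e.is_a?(RuntimeError)
end
```

Both +argument_error+ and +missing_file+ end up in +missed_by_old_check+, yet <tt>failure?</tt> returns +true+ for each of them.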
199
199
 
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: chimps
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.1
4
+ version: 0.1.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Dhruv Bansal
@@ -70,13 +70,13 @@ extensions: []
70
70
 
71
71
  extra_rdoc_files:
72
72
  - LICENSE
73
- - README.textile
73
+ - README.rdoc
74
74
  files:
75
75
  - .document
76
76
  - .gitignore
77
77
  - CHANGELOG.textile
78
78
  - LICENSE
79
- - README.textile
79
+ - README.rdoc
80
80
  - Rakefile
81
81
  - VERSION
82
82
  - bin/chimps
data/README.textile DELETED
@@ -1,65 +0,0 @@
1
- h2. Awesome Chimp Tricks
2
-
3
- h3. Searching
4
-
5
- Search datasets
6
-
7
- chimps search statisical abstract
8
-
9
- Search sources
10
-
11
- chimps search -m source department of justice
12
-
13
- Search datasets with particular tags
14
-
15
- chimps search -t government,finance statistical abstract
16
-
17
- or categories
18
-
19
- chimps search -c education statistical abstract
20
-
21
- h3. Browsing
22
-
23
- chimps describe dataset 3923
24
- chimps describe source us-doj
25
- chimps describe field length
26
-
27
- h3. Downloading
28
-
29
- chimps download 39283
30
-
31
- h3. Creating
32
-
33
- chimps create data.yaml
34
-
35
- also
36
-
37
- chimps schema source
38
- chimps schema dataset
39
-
40
- and of course
41
-
42
- chimps upload 39283 path/to/my/data
43
-
44
- h3. General Options
45
-
46
- Work as someone other than the usual user
47
-
48
- chimps -i path/to/my/identify_file.yml create data.yaml
49
-
50
-
51
- h2. Settings and Credentials
52
-
53
- Create a file in @~/.chimps@
54
-
55
- <pre><code>
56
- # -*-yaml-*-
57
- :site:
58
- :username: monkeyboy
59
- :key: xxxxxxxxxxxxxxxx
60
- :secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
61
- :query:
62
- :username: monkeyboy
63
- :key: xxxxxxxxxxxxxxxxx
64
- :secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
65
- </code></pre>