mobilize-base 1.0.0
- data/.gitignore +9 -0
- data/Gemfile +4 -0
- data/LICENSE.txt +20 -0
- data/README.md +509 -0
- data/Rakefile +34 -0
- data/lib/mobilize-base/extensions/array.rb +22 -0
- data/lib/mobilize-base/extensions/google_drive.rb +296 -0
- data/lib/mobilize-base/extensions/hash.rb +86 -0
- data/lib/mobilize-base/extensions/object.rb +6 -0
- data/lib/mobilize-base/extensions/resque.rb +180 -0
- data/lib/mobilize-base/extensions/string.rb +94 -0
- data/lib/mobilize-base/handlers/emailer.rb +24 -0
- data/lib/mobilize-base/handlers/gdriver.rb +309 -0
- data/lib/mobilize-base/handlers/mongoer.rb +32 -0
- data/lib/mobilize-base/jobtracker.rb +208 -0
- data/lib/mobilize-base/models/dataset.rb +70 -0
- data/lib/mobilize-base/models/job.rb +253 -0
- data/lib/mobilize-base/models/requestor.rb +223 -0
- data/lib/mobilize-base/tasks/mobilize-base.rake +2 -0
- data/lib/mobilize-base/tasks.rb +43 -0
- data/lib/mobilize-base/version.rb +5 -0
- data/lib/mobilize-base.rb +76 -0
- data/lib/samples/gdrive.yml +27 -0
- data/lib/samples/jobtracker.yml +24 -0
- data/lib/samples/mongoid.yml +21 -0
- data/lib/samples/resque.yml +12 -0
- data/mobilize-base.gemspec +35 -0
- data/test/mobilize_test.rb +125 -0
- data/test/redis-test.conf +540 -0
- data/test/test_helper.rb +23 -0
- metadata +260 -0
data/.gitignore
ADDED
data/Gemfile
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,20 @@
Copyright (c) Cassio Paes-Leme

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,509 @@
Mobilize
========

Mobilize is an end-to-end data transfer workflow manager with:
* a Google Spreadsheets UI through [google-drive-ruby][google_drive_ruby];
* a queue manager through [Resque][resque];
* a persistent caching / database layer through [Mongoid][mongoid];
* gems for data transfers to/from Hive, MySQL, and HTTP endpoints
  (coming soon).

Mobilize-Base includes all the core scheduling and processing
functionality, allowing you to:
* put workers on the Mobilize Resque queue;
* create [Requestors](#section_Start_Create_Requestor) and their associated Google Spreadsheet [Jobspecs](#section_Start_Create_Requestor);
* poll for [Jobs](#section_Start_Create_Job) on Jobspecs (currently gsheet to gsheet only) and add them to Resque;
* monitor the status of Jobs on a rolling log.

Table Of Contents
-----------------
* [Overview](#section_Overview)
* [Install](#section_Install)
  * [Redis](#section_Install_Redis)
  * [MongoDB](#section_Install_MongoDB)
  * [Mobilize-Base](#section_Install_Mobilize-Base)
  * [Default Folders and Files](#section_Install_Folders_and_Files)
* [Configure](#section_Configure)
  * [Google Drive](#section_Configure_Google_Drive)
  * [Jobtracker](#section_Configure_Jobtracker)
  * [Mongoid](#section_Configure_Mongoid)
  * [Resque](#section_Configure_Resque)
* [Start](#section_Start)
  * [Start resque-web](#section_Start_Start_resque-web)
  * [Set Environment](#section_Start_Set_Environment)
  * [Create Requestor](#section_Start_Create_Requestor)
  * [Start Workers](#section_Start_Start_Workers)
  * [View Logs](#section_Start_View_Logs)
  * [Start Jobtracker](#section_Start_Start_Jobtracker)
  * [Create Job](#section_Start_Create_Job)
  * [Run Test](#section_Start_Run_Test)
* [Meta](#section_Meta)
* [Author](#section_Author)

<a name='section_Overview'></a>
Overview
-----------

* Mobilize is a fun, centralized way to access data that lives in many different technologies through one interface that everyone understands: spreadsheets.
* Mobilize can transfer data to and from diverse technologies such as Hive, HDFS, HBase, various APIs, and different databases, so people who are already comfortable with spreadsheets can work with these systems and stay productive.
* The spreadsheets are currently hosted in the cloud on Google Spreadsheets, so you can access them anywhere, even on a tablet.
* Mobilize is pluggable and extensible, so if you later want to access data from a new database technology, you can add a module for it.


<a name='section_Install'></a>
Install
------------

Mobilize requires Ruby 1.9.3, and has been tested on OS X and Ubuntu.

[RVM][rvm] is great for managing your rubies.

<a name='section_Install_Redis'></a>
### Redis

Redis is a prerequisite for running Resque.

Please refer to the [Resque Redis Section][resque_redis] for complete
instructions.

<a name='section_Install_MongoDB'></a>
### MongoDB

MongoDB is used to persist caches between reads and writes, keep track
of Requestors and Jobs, and store Datasets that map to endpoints.

Please refer to the [MongoDB Quickstart Page][mongodb_quickstart] to get started.

The settings for database and port are set in config/mongoid.yml
and are best left as default. Please refer to [Configure
Mongoid](#section_Configure_Mongoid) for details.

<a name='section_Install_Mobilize-Base'></a>
### Mobilize-Base

Mobilize-Base declares every gem it needs as a dependency, so installing it pulls them in.

Add this to your Gemfile:

``` ruby
gem "mobilize-base", "~>1.0"
```

or run

    $ gem install mobilize-base

for a ruby-wide install.

<a name='section_Install_Folders_and_Files'></a>
### Folders and Files

Mobilize requires a config folder and a log folder.

If you're on Rails, it will use the built-in config and log folders.

Otherwise, it will use log and config folders in the project folder (the
same one that contains your Rakefile).

### Rakefile

Inside the Rakefile in your project's root folder, make sure you have:

``` ruby
require 'mobilize-base/tasks'
```

This defines the rake tasks needed to set up and run the environment.

### Config and Log Folders

Run

    $ rake mobilize:setup

Mobilize will create config and log folders at the project root
level (the same folder as the Rakefile).

It will also create all required config files, which are detailed below.

Resque will create a mobilize-resque-`<environment>`.log in the log folder,
and rotate over 10 files of 10MB each.

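As a rough illustration of the rotation described above (10 files of 10MB each), Ruby's standard `Logger` accepts a file count and a size limit. This is a sketch under assumptions, not Mobilize's actual logger setup: the `rotating_logger` helper and the temporary directory standing in for the log folder are made up for the example.

``` ruby
require 'logger'
require 'tmpdir'

# Hypothetical helper; Mobilize's own logger setup may differ.
def rotating_logger(log_dir, environment)
  path = File.join(log_dir, "mobilize-resque-#{environment}.log")
  # Logger.new(path, shift_age, shift_size):
  # keep 10 files and rotate once the current file reaches 10MB.
  Logger.new(path, 10, 10 * 1024 * 1024)
end

LOG_DIR = Dir.mktmpdir # stand-in for the project's log folder
logger = rotating_logger(LOG_DIR, 'development')
logger.info('worker started')
logger.close
```

Passing the count and size to `Logger.new` keeps rotation inside the process, so no external log rotation tool is needed for this file.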
<a name='section_Configure'></a>
Configure
------------

All Mobilize configurations live in files in `config/*.yml`. Samples can
be found below or on GitHub in the [lib/samples][git_samples] folder.

<a name='section_Configure_Google_Drive'></a>
### Configure Google Drive

Google Drive needs:
* an owner email address and password. You can set up separate owners
  for different environments, as in the file below, which keeps your
  mission-critical workers from hitting rate-limit errors.
* one or more admins with email attributes. These are the people
  who should be given write permissions to ALL Mobilize sheets, for
  maintenance purposes.
* one or more workers with email and pw attributes. These accounts are used
  to queue up Google reads and writes. A worker can be the same as the owner
  account for testing purposes or low-volume environments.

__Mobilize only allows one Resque
worker at a time to use a Google Drive worker account for
reading/writing.__

Sample gdrive.yml:

``` yml

development:
  owner:
    email: 'owner_development@host.com'
    pw: "google_drive_password"
  admins:
    - {email: 'admin@host.com'}
  workers:
    - {email: 'worker_development001@host.com', pw: "worker001_google_drive_password"}
    - {email: 'worker_development002@host.com', pw: "worker002_google_drive_password"}
test:
  owner:
    email: 'owner_test@host.com'
    pw: "google_drive_password"
  admins:
    - {email: 'admin@host.com'}
  workers:
    - {email: 'worker_test001@host.com', pw: "worker001_google_drive_password"}
    - {email: 'worker_test002@host.com', pw: "worker002_google_drive_password"}
production:
  owner:
    email: 'owner_production@host.com'
    pw: "google_drive_password"
  admins:
    - {email: 'admin@host.com'}
  workers:
    - {email: 'worker_production001@host.com', pw: "worker001_google_drive_password"}
    - {email: 'worker_production002@host.com', pw: "worker002_google_drive_password"}

```

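To make the "one Resque worker per Google Drive worker account" rule concrete, here is a minimal sketch of picking a free worker account from a gdrive.yml-style config. It is an illustration only: the `free_worker_emails` helper and the `busy_emails` list are hypothetical, and Mobilize's own bookkeeping may work differently.

``` ruby
require 'yaml'

# One environment of the sample gdrive.yml, inlined for the example.
GDRIVE_YML = <<YML
development:
  owner:
    email: 'owner_development@host.com'
    pw: "google_drive_password"
  admins:
    - {email: 'admin@host.com'}
  workers:
    - {email: 'worker_development001@host.com', pw: "worker001_google_drive_password"}
    - {email: 'worker_development002@host.com', pw: "worker002_google_drive_password"}
YML

GDRIVE_CONFIG = YAML.load(GDRIVE_YML)

# Hypothetical helper: emails of worker accounts not currently in use.
# busy_emails stands in for whatever tracks accounts already checked out.
def free_worker_emails(config, environment, busy_emails)
  workers = config.fetch(environment).fetch('workers')
  workers.map { |worker| worker.fetch('email') } - busy_emails
end

puts free_worker_emails(GDRIVE_CONFIG, 'development',
                        ['worker_development001@host.com'])
```

With two worker accounts configured, at most two Resque workers could be reading or writing Google sheets at once under this rule, so adding accounts is one way to raise throughput.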
<a name='section_Configure_Jobtracker'></a>
### Configure Jobtracker

The Jobtracker sits on your Resque and does 2 things:
* check for Requestors that are due for polling;
* send out notifications when:
  * there are failed jobs on Resque;
  * there are jobs on Resque that have run beyond the max run time.

Emails are sent using ActionMailer, through the owner Google Drive
account.

To this end, it needs these parameters, for which there is a sample
below and in the [lib/samples][git_samples] folder:

``` yml
development:
  cycle_freq: 10 #10 secs between Jobtracker sweeps
  notification_freq: 3600 #1 hour between failure/timeout notifications
  requestor_refresh_freq: 300 #5 min between requestor checks
  max_run_time: 14400 # if a job runs for 4h+, notification will be sent
  admins: #emails to send notifications to
    - {email: 'admin@host.com'}
test:
  cycle_freq: 10 #10 secs between Jobtracker sweeps
  notification_freq: 3600 #1 hour between failure/timeout notifications
  requestor_refresh_freq: 300 #5 min between requestor checks
  max_run_time: 14400 # if a job runs for 4h+, notification will be sent
  admins: #emails to send notifications to
    - {email: 'admin@host.com'}

production:
  cycle_freq: 10 #10 secs between Jobtracker sweeps
  notification_freq: 3600 #1 hour between failure/timeout notifications
  requestor_refresh_freq: 300 #5 min between requestor checks
  max_run_time: 14400 # if a job runs for 4h+, notification will be sent
  admins: #emails to send notifications to
    - {email: 'admin@host.com'}
```

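The two checks the Jobtracker performs can be sketched in terms of the settings above. This is a hedged illustration, not the Jobtracker's actual code: the method names `requestor_due?` and `timed_out?` are hypothetical, and only the config keys come from the sample file.

``` ruby
# Values taken from the sample jobtracker.yml, in seconds.
JOBTRACKER_CONFIG = {
  'cycle_freq' => 10,
  'notification_freq' => 3600,
  'requestor_refresh_freq' => 300,
  'max_run_time' => 14_400
}

# Hypothetical check: a Requestor is due for polling if it has never run
# or if at least requestor_refresh_freq seconds have passed since last_run.
def requestor_due?(last_run, now, config)
  last_run.nil? || (now - last_run) >= config['requestor_refresh_freq']
end

# Hypothetical check: a job has timed out once it has run longer than
# max_run_time seconds.
def timed_out?(started_at, now, config)
  (now - started_at) > config['max_run_time']
end

now = Time.now
puts requestor_due?(now - 600, now, JOBTRACKER_CONFIG) # polled 10 min ago
puts timed_out?(now - 3600, now, JOBTRACKER_CONFIG)    # running for 1 hour
```

Subtracting two `Time` objects yields seconds as a Float, which is why the config values can be compared directly without conversion.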
<a name='section_Configure_Mongoid'></a>
### Configure Mongoid

Mongoid is the abstraction layer on top of MongoDB so we can interact
with it in an ActiveRecord-like fashion.

It needs the below parameters, which can be found in the [lib/samples][git_samples] folder.

You shouldn't need to change anything in this file.

``` yml
development:
  sessions:
    default:
      database: mobilize-development
      persist_in_safe_mode: true
      hosts:
        - 127.0.0.1:27017
test:
  sessions:
    default:
      database: mobilize-test
      persist_in_safe_mode: true
      hosts:
        - 127.0.0.1:27017
production:
  sessions:
    default:
      database: mobilize-production
      persist_in_safe_mode: true
      hosts:
        - 127.0.0.1:27017
```

<a name='section_Configure_Resque'></a>
### Configure Resque

Resque keeps track of Jobs, Workers and logging.

It needs the below parameters, which can be found in the [lib/samples][git_samples] folder.

* queue_name - the name of the Resque queue where you would like the Jobtracker and Resque Workers to
  run. Default is mobilize.
* max_workers - the total number of simultaneous workers you would like
  on your queue. Default is 4 for development and test, 36 in
  production, but feel free to adjust depending on your hardware.
* redis_port - you should probably leave this alone; it specifies the
  default Redis port (6379) for development and production and a separate one (9736) for testing.

``` yml
development:
  queue_name: 'mobilize'
  max_workers: 4
  redis_port: 6379
test:
  queue_name: 'mobilize'
  max_workers: 4
  redis_port: 9736
production:
  queue_name: 'mobilize'
  max_workers: 36
  redis_port: 6379
```

<a name='section_Start'></a>
Start
-----

A Mobilize instance can be considered "started" or "running" when you have:

1. Resque workers running on the Mobilize queue;
2. A Jobtracker running on one of the Resque workers;
3. One or more Requestors created in your MongoDB;
4. One or more Jobs created in a Requestor's Jobspec.

<a name='section_Start_Start_resque-web'></a>
### Start resque-web

To start resque-web, which is a kickass UI layer built in Sinatra,
you'll need to install the resque gem explicitly, as in

    $ gem install resque

then, you can do

    $ resque-web

and it'll start an instance on 127.0.0.1:5678

You'll want to keep an eye on this as it tracks your workers in real
time and allows you to keep track of failed jobs. More detail is in the
[Resque Standalone section][resque-web].

<a name='section_Start_Set_Environment'></a>
### Set Environment

Mobilize takes the environment from your Rails.env if you're running
Rails. You can specify "development", "test", or "production", as per
the yml files.

Otherwise, it takes it from the MOBILIZE_ENV environment variable, which
you can set from irb as below, and assumes "development" if none is set:

``` ruby
> ENV['MOBILIZE_ENV'] = 'production'
> require 'mobilize-base'
```
This affects all parameters as set in the yml files, including the
database.

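The lookup order described in this section can be sketched as a small function. This is one reading of the text, not the gem's actual implementation: `resolve_env` is a hypothetical name, `rails_env` stands in for `Rails.env` (nil when Rails is not loaded), and rejecting unknown names is an assumption added for the example.

``` ruby
VALID_ENVS = %w[development test production]

# Hypothetical resolver: Rails environment first, then MOBILIZE_ENV,
# then the "development" default.
def resolve_env(rails_env, env_vars)
  name = rails_env || env_vars['MOBILIZE_ENV'] || 'development'
  # Assumption: only the three environments used in the yml files are valid.
  unless VALID_ENVS.include?(name)
    raise ArgumentError, "unknown environment: #{name}"
  end
  name
end

puts resolve_env(nil, 'MOBILIZE_ENV' => 'production')
puts resolve_env(nil, {})
```

Outside of this example you would pass `ENV` as the second argument, since it answers `[]` like a Hash.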
<a name='section_Start_Create_Requestor'></a>
### Create Requestor

Requestors are people who use the Mobilize service to move data from one
endpoint to another. They each have a Jobspec, which is a Google sheet
that contains one or more Jobs.

To create a requestor, use the Requestor.find_or_create_by_email
command in irb (replace the user with your own email, or any email
Google recognizes).

``` ruby
> Requestor.find_or_create_by_email("user@host.com")
```

<a name='section_Start_Start_Workers'></a>
### Start Workers

Workers are rake tasks that load the Mobilize environment and allow the
processing of the Jobtracker, Requestors and Jobs.

To start as many workers as are defined by max_workers in your resque.yml, do:

``` ruby
> Jobtracker.prep_workers
```

If you have workers already running and would like to kill and refresh
them, do:

``` ruby
> Jobtracker.restart_workers!
```

Note that this will kill any workers on the Mobilize queue.

<a name='section_Start_View_Logs'></a>
### View Logs

At this point, you'll want to start viewing the logs for the Resque
workers, which are stored under your log folder. You can do:

    $ tail -f log/mobilize-resque-`<environment>`.log

to view them.

<a name='section_Start_Start_Jobtracker'></a>
### Start Jobtracker

Once the Resque workers are running, and you have at least one Requestor
set up, it's time to start the Jobtracker:

``` ruby
> Jobtracker.start
```

The Jobtracker will automatically enqueue any Requestors that have not
been processed within the requestor_refresh_freq period defined in
jobtracker.yml, and create their Jobspecs if they do not exist. You can
see this process on your Resque UI and in the log file.

<a name='section_Start_Create_Job'></a>
### Create Job

Now it's time to go onto the Jobspec and add a Job to be processed.

To do this, you should log into your Google Drive with either the
owner's account, an admin account, or the Jobspec Requestor's account. These
are the accounts with edit permissions to a given Jobspec.

Navigate to the Jobs tab on the Jobspec (denoted by `Jobspec_<requestor name>`)
and enter values under each header:

* name This is the name of the job you would like to add. Names must be unique across all your jobs, otherwise you will get an error.

* active Set this to blank or FALSE if you want to turn off a job.

* schedule This uses human-readable syntax to schedule jobs. It accepts the following:
  * every `<integer>` hour -- fire the job at increments of `<integer>` hours, minimum of 1 hour
  * every `<integer>` day -- fire the job at increments of `<integer>` days, minimum of 1
  * every `<integer>` day after `<HH:MM>` -- fire the job at increments of `<integer>` days, after HH:MM UTC time
  * every `<integer>` day_of_week after `<HH:MM>` -- fire the job on specified day of week, after HH:MM UTC time; 1=Sunday
  * every `<integer>` day_of_month after `<HH:MM>` -- fire the job on specified day of month, after HH:MM UTC time
  * once -- fire the job once if active is set to TRUE, set active to FALSE right after
  * after `<jobname>` -- fire the job after the job named `<jobname>`

* status Mobilize writes this field with the last status returned by the job.

* last_error Mobilize writes any errors to this field, and wipes it if
  the job completes successfully.

* destination_url Mobilize writes this field with a link to the last dataset returned by the job, blank if none.

* read_handler This is where the job reads its data from. For
  mobilize-base, you should enter "gsheet".

* write_handler This is where the job writes its data to. For
  mobilize-base, you should enter "gsheet".

* param_source This is the path to an array of data, as read from a Google sheet,
  that is relayed to the job.
  The format is `<google docs book>/<google docs sheet>`, so if you
  wanted to read from the "output" sheet on the "monthly results" book you
  would write in `<monthly results>/<output>`. For a sheet in the Jobspec
  itself you could write simply `<output>`.

* params This is a hash of data, expressed in JSON. Not relevant to
  mobilize-base.

* destination This is the destination for the data, relayed to the job.
  For a gsheet write_handler, this would be the name of the sheet to be
  written to, similar to param_source.

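To show how the schedule strings above break down, here is a minimal parsing sketch. It is not Mobilize's parser: `parse_schedule`, the returned Hash keys, and the strictness rules (no time of day on hourly schedules, a required time for day_of_week and day_of_month) are assumptions read off the list of accepted forms.

``` ruby
# Hypothetical pattern for the "every ..." forms listed above.
EVERY_PATTERN = /\Aevery (\d+) (hour|day|day_of_week|day_of_month)(?: after (\d{1,2}):(\d{2}))?\z/

# Hypothetical parser: returns a Hash describing the schedule,
# or raises ArgumentError for strings outside the listed forms.
def parse_schedule(text)
  text = text.strip
  return { kind: :once } if text == 'once'

  after_job = /\Aafter (.+)\z/.match(text)
  return { kind: :after_job, job: after_job[1] } if after_job

  m = EVERY_PATTERN.match(text)
  raise ArgumentError, "unrecognized schedule: #{text}" if m.nil?

  count, unit, has_time = m[1].to_i, m[2], !m[3].nil?
  raise ArgumentError, 'interval must be at least 1' if count < 1
  raise ArgumentError, 'hour schedules take no time of day' if unit == 'hour' && has_time
  raise ArgumentError, "#{unit} needs 'after HH:MM'" if unit.start_with?('day_of') && !has_time

  schedule = { kind: :every, count: count, unit: unit }
  schedule[:after_utc] = [m[3].to_i, m[4].to_i] if has_time # [hour, minute] in UTC
  schedule
end

p parse_schedule('every 1 day after 09:30')
p parse_schedule('after load_users')
```

Returning a plain Hash keeps the sketch independent of any scheduler; a real implementation would still need to compare the parsed schedule against the job's last run time.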
<a name='section_Start_Run_Test'></a>
### Run Test

To run tests, you will need to:

1) clone the repository;

2) from the project folder, run

    $ rake mobilize:setup

and populate the "test" environment in the config files with the
necessary details;

3) run

    $ rake test

This will create a test Jobspec with a sample job. These will run off a
test redis instance which will be killed once the tests finish.

<a name='section_Meta'></a>
Meta
----

* Code: `git clone git://github.com/ngmoco/mobilize-base.git`
* Home: <https://github.com/ngmoco/mobilize-base>
* Bugs: <https://github.com//mobilize-base/issues>
* Gems: <http://rubygems.org/gems/mobilize-base>

<a name='section_Author'></a>
Author
------

Cassio Paes-Leme :: cpaesleme@ngmoco.com :: @cpaesleme

<a name='section_Special_Thanks'></a>
Special Thanks
--------------

* Al Thompson and Sagar Mehta for awesome design advice and discussions
* Elliott Clark for enlightening me to the wonders of Resque
* Bob Colner for pointing me to google-drive-ruby when I tried to
  reinvent the wheel
* ngmoco:) and DeNA Global for supporting and adopting the Mobilize
  platform

[google_drive_ruby]: https://github.com/gimite/google-drive-ruby
[resque]: https://github.com/defunkt/resque
[mongoid]: http://mongoid.org/en/mongoid/index.html
[resque_redis]: https://github.com/defunkt/resque#section_Installing_Redis
[mongodb_quickstart]: http://www.mongodb.org/display/DOCS/Quickstart
[git_samples]: https://github.ngmoco.com/Ngpipes/mobilize-base/tree/master/lib/samples
[rvm]: https://rvm.io/
[resque-web]: https://github.com/defunkt/resque#standalone
data/Rakefile
ADDED
@@ -0,0 +1,34 @@
require 'rubygems'

begin
  require 'bundler/setup'
rescue LoadError => e
  warn e.message
  warn "Run `gem install bundler` to install Bundler"
  exit -1
end

#
# Bundler
#
require "bundler/gem_tasks"

#
# Setup
#
$LOAD_PATH.unshift 'lib'
require 'mobilize-base/tasks'


#
# Tests
#
require 'rake/testtask'

Rake::TestTask.new do |test|
  test.verbose = true
  test.libs << "test"
  test.libs << "lib"
  test.test_files = FileList['test/**/*_test.rb']
end
task :default => :test
data/lib/mobilize-base/extensions/array.rb
ADDED
@@ -0,0 +1,22 @@
# Convenience extensions to Array.
class Array
  def sel(&blk) # shorthand for select
    return self.select(&blk)
  end
  def group_count # maps each distinct element to its number of occurrences
    counts = Hash.new(0)
    self.each { |m| counts[m] += 1 }
    return counts
  end
  def sum # adds all elements with +; returns nil for an empty array
    return self.inject{|sum,x| sum + x }
  end
  def hash_array_to_tsv # renders an array of hashes as TSV with a header row
    if self.first.nil? or self.first.class!=Hash
      return "" # empty array, or elements that are not hashes
    end
    header = self.first.keys.join("\t") # column names come from the first hash
    rows = self.map{|r| r.values.join("\t")}
    ([header] + rows).join("\n")
  end
end
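A short usage sketch for two of the helpers above. To keep the example runnable on its own it repeats compact copies of `group_count` and `hash_array_to_tsv`; the sample data (handler names, job rows) is made up for illustration.

``` ruby
# Compact copies of two helpers from array.rb so the example is self-contained.
class Array
  def group_count
    counts = Hash.new(0)
    each { |m| counts[m] += 1 }
    counts
  end

  def hash_array_to_tsv
    return "" if first.nil? || first.class != Hash
    header = first.keys.join("\t")
    rows = map { |r| r.values.join("\t") }
    ([header] + rows).join("\n")
  end
end

# Count how often each (made-up) handler name appears.
HANDLER_COUNTS = %w[gsheet gsheet hive].group_count

# Render (made-up) job rows as TSV; the header comes from the first hash's keys.
JOBS_TSV = [
  { 'name' => 'job1', 'active' => true },
  { 'name' => 'job2', 'active' => false }
].hash_array_to_tsv

p HANDLER_COUNTS
puts JOBS_TSV
```

Note that `hash_array_to_tsv` takes its header from the first hash only, so rows with different keys or key order would not line up with the header.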