mobilize-base 1.36 → 1.293
- data/README.md +666 -1
- data/lib/mobilize-base.rb +1 -12
- data/lib/mobilize-base/extensions/array.rb +3 -8
- data/lib/mobilize-base/extensions/google_drive/acl.rb +1 -1
- data/lib/mobilize-base/extensions/google_drive/client_login_fetcher.rb +1 -2
- data/lib/mobilize-base/extensions/google_drive/file.rb +37 -11
- data/lib/mobilize-base/extensions/string.rb +6 -11
- data/lib/mobilize-base/extensions/yaml.rb +7 -10
- data/lib/mobilize-base/handlers/gbook.rb +38 -25
- data/lib/mobilize-base/handlers/gdrive.rb +4 -20
- data/lib/mobilize-base/handlers/gfile.rb +10 -64
- data/lib/mobilize-base/handlers/gridfs.rb +24 -19
- data/lib/mobilize-base/handlers/gsheet.rb +29 -45
- data/lib/mobilize-base/handlers/resque.rb +10 -17
- data/lib/mobilize-base/jobtracker.rb +196 -22
- data/lib/mobilize-base/models/job.rb +77 -107
- data/lib/mobilize-base/models/runner.rb +122 -36
- data/lib/mobilize-base/models/stage.rb +37 -18
- data/lib/mobilize-base/tasks.rb +13 -50
- data/lib/mobilize-base/version.rb +1 -1
- data/lib/samples/gdrive.yml +0 -15
- data/lib/samples/gridfs.yml +3 -0
- data/lib/samples/gsheet.yml +4 -4
- data/lib/samples/jobtracker.yml +6 -0
- data/mobilize-base.gemspec +3 -3
- data/test/base_job_rows.yml +11 -0
- data/test/mobilize-base_test.rb +106 -0
- data/test/test_base_1.yml +3 -0
- data/test/test_helper.rb +0 -155
- metadata +24 -36
- data/lib/mobilize-base/extensions/time.rb +0 -20
- data/lib/mobilize-base/helpers/job_helper.rb +0 -54
- data/lib/mobilize-base/helpers/jobtracker_helper.rb +0 -143
- data/lib/mobilize-base/helpers/runner_helper.rb +0 -83
- data/lib/mobilize-base/helpers/stage_helper.rb +0 -38
- data/lib/samples/gfile.yml +0 -9
- data/test/fixtures/base1_stage1.in.yml +0 -10
- data/test/fixtures/integration_expected.yml +0 -25
- data/test/fixtures/integration_jobs.yml +0 -12
- data/test/fixtures/is_due.yml +0 -97
- data/test/integration/mobilize-base_test.rb +0 -57
- data/test/unit/mobilize-base_test.rb +0 -33
data/README.md
CHANGED
@@ -1,4 +1,669 @@
|
|
1
1
|
Mobilize
|
2
2
|
========
|
3
3
|
|
4
|
-
|
4
|
+
Mobilize is an end-to-end data transfer workflow manager with:
|
5
|
+
* a Google Spreadsheets UI through [google-drive-ruby][google_drive_ruby];
|
6
|
+
* a queue manager through [Resque][resque];
|
7
|
+
* a persistent caching / database layer through [Mongoid][mongoid];
|
8
|
+
* gems for data transfers to/from Hive, MySQL, and HTTP endpoints
|
9
|
+
(coming soon).
|
10
|
+
|
11
|
+
Mobilize-Base includes all the core scheduling and processing
|
12
|
+
functionality, allowing you to:
|
13
|
+
* put workers on the Mobilize Resque queue.
|
14
|
+
* create [Users](#section_Start_Create_User) and their associated Google Spreadsheet [Runners](#section_Start_Create_User);
|
15
|
+
* poll for [Jobs](#section_Start_Create_Job) on Runners (currently gsheet to gsheet only) and add them to Resque;
|
16
|
+
* monitor the status of Jobs on a rolling log.
|
17
|
+
|
18
|
+
Table Of Contents
|
19
|
+
-----------------
|
20
|
+
* [Overview](#section_Overview)
|
21
|
+
* [Install](#section_Install)
|
22
|
+
* [Redis](#section_Install_Redis)
|
23
|
+
* [MongoDB](#section_Install_MongoDB)
|
24
|
+
* [Mobilize-Base](#section_Install_Mobilize-Base)
|
25
|
+
* [Default Folders and Files](#section_Install_Folders_and_Files)
|
26
|
+
* [Configure](#section_Configure)
|
27
|
+
* [Google Drive](#section_Configure_Google_Drive)
|
28
|
+
* [Google Sheets](#section_Configure_Google_Sheets)
|
29
|
+
* [Jobtracker](#section_Configure_Jobtracker)
|
30
|
+
* [Resque](#section_Configure_Resque)
|
31
|
+
* [Resque-Web](#section_Configure_Resque-Web)
|
32
|
+
* [Gridfs](#section_Configure_Gridfs)
|
33
|
+
* [Mongoid](#section_Configure_Mongoid)
|
34
|
+
* [Start](#section_Start)
|
35
|
+
* [Start Resque-Web](#section_Start_Start_Resque-Web)
|
36
|
+
* [Set Environment](#section_Start_Set_Environment)
|
37
|
+
* [Create User](#section_Start_Create_User)
|
38
|
+
* [Start Workers](#section_Start_Start_Workers)
|
39
|
+
* [View Logs](#section_Start_View_Logs)
|
40
|
+
* [Start Jobtracker](#section_Start_Start_Jobtracker)
|
41
|
+
* [Create Job](#section_Start_Create_Job)
|
42
|
+
* [Run Test](#section_Start_Run_Test)
|
43
|
+
* [Add Gbooks and Gsheets](#section_Start_Add_Gbooks_And_Gsheets)
|
44
|
+
* [Meta](#section_Meta)
|
45
|
+
* [Author](#section_Author)
|
46
|
+
* [Special Thanks](#section_Special_Thanks)
|
47
|
+
|
48
|
+
|
49
|
+
<a name='section_Overview'></a>
|
50
|
+
Overview
|
51
|
+
-----------
|
52
|
+
|
53
|
+
* Mobilize is a script deployment and data visualization framework with
|
54
|
+
a Google Spreadsheets UI.
|
55
|
+
* Mobilize uses Resque for parallelization and queueing, MongoDB for caching,
|
56
|
+
and Google Drive for hosting, user input and display.
|
57
|
+
* The [mobilize-ssh][mobilize-ssh] gem allows you to run scripts and
|
58
|
+
copy files between different machines, and have output directed to a
|
59
|
+
spreadsheet for viewing and processing.
|
60
|
+
* The platform is easily extensible: add your own rake tasks and
|
61
|
+
handlers by following a few simple conventions, and you can have your own
|
62
|
+
Mobilize gem up and running in no time.
|
63
|
+
|
64
|
+
<a name='section_Install'></a>
|
65
|
+
Install
|
66
|
+
------------
|
67
|
+
|
68
|
+
Mobilize requires Ruby 1.9.3, and has been tested on OSX and Ubuntu.
|
69
|
+
|
70
|
+
[RVM][rvm] is great for managing your rubies.
|
71
|
+
|
72
|
+
<a name='section_Install_Redis'></a>
|
73
|
+
### Redis
|
74
|
+
|
75
|
+
Redis is a prerequisite for running Resque.
|
76
|
+
|
77
|
+
Please refer to the [Resque Redis Section][resque_redis] for complete
|
78
|
+
instructions.
|
79
|
+
|
80
|
+
<a name='section_Install_MongoDB'></a>
|
81
|
+
### MongoDB
|
82
|
+
|
83
|
+
MongoDB is used to persist caches between reads and writes, keep track
|
84
|
+
of Users and Jobs, and store Datasets that map to endpoints.
|
85
|
+
|
86
|
+
Please refer to the [MongoDB Quickstart Page][mongodb_quickstart] to get started.
|
87
|
+
|
88
|
+
The settings for database and port are set in config/mongoid.yml
|
89
|
+
and are best left as default. Please refer to [Configure
|
90
|
+
Mongoid](#section_Configure_Mongoid) for details.
|
91
|
+
|
92
|
+
<a name='section_Install_Mobilize-Base'></a>
|
93
|
+
### Mobilize-Base
|
94
|
+
|
95
|
+
Mobilize-Base contains all of the gems it needs to run.
|
96
|
+
|
97
|
+
Add this to your Gemfile:
|
98
|
+
|
99
|
+
``` ruby
|
100
|
+
gem "mobilize-base"
|
101
|
+
```
|
102
|
+
|
103
|
+
or do
|
104
|
+
|
105
|
+
$ gem install mobilize-base
|
106
|
+
|
107
|
+
for a ruby-wide install.
|
108
|
+
|
109
|
+
<a name='section_Install_Folders_and_Files'></a>
|
110
|
+
### Folders and Files
|
111
|
+
|
112
|
+
Mobilize requires a config folder and a log folder.
|
113
|
+
|
114
|
+
If you're on Rails, it will use the built-in config and log folders.
|
115
|
+
|
116
|
+
Otherwise, it will use log and config folders in the project folder (the
|
117
|
+
same one that contains your Rakefile)
|
118
|
+
|
119
|
+
### Rakefile
|
120
|
+
|
121
|
+
Inside the Rakefile in your project's root folder, make sure you have:
|
122
|
+
|
123
|
+
``` ruby
|
124
|
+
require 'mobilize-base/tasks'
|
125
|
+
```
|
126
|
+
|
127
|
+
This defines rake tasks essential to run the environment.
|
128
|
+
|
129
|
+
### Config and Log Folders
|
130
|
+
|
131
|
+
run
|
132
|
+
|
133
|
+
$ rake mobilize_base:setup
|
134
|
+
|
135
|
+
Mobilize will create config/mobilize/ and log/ folders at the project root
|
136
|
+
level (the same folder as the Rakefile).
|
137
|
+
|
138
|
+
(You can override these by passing
|
139
|
+
MOBILIZE_CONFIG_DIR and/or MOBILIZE_LOG_DIR arguments to the command.
|
140
|
+
All directories must end with a '/'.)
|
141
|
+
|
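The trailing-slash rule can be illustrated with a small sketch. The helper name `mobilize_dir`, the settings hash, and the default paths are assumptions made for this example; they are not part of the mobilize-base API.

``` ruby
# Hypothetical helper, not part of mobilize-base: resolve a folder from
# environment-style settings and enforce the trailing '/' rule.
def mobilize_dir(settings, key, default)
  dir = settings.fetch(key, default)
  raise ArgumentError, "#{key} must end with '/': #{dir}" unless dir.end_with?("/")
  dir
end

# Assumed defaults, mirroring the folders created by the setup task.
settings   = { "MOBILIZE_LOG_DIR" => "tmp/log/" }
config_dir = mobilize_dir(settings, "MOBILIZE_CONFIG_DIR", "config/mobilize/")
log_dir    = mobilize_dir(settings, "MOBILIZE_LOG_DIR", "log/")
```

In a real shell session you would pass `ENV` instead of the sample hash.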
142
|
+
The script will also create samples for all required config files, which are detailed below.
|
143
|
+
|
144
|
+
Resque will create a mobilize-resque-`<environment>`.log in the log folder,
|
145
|
+
and rotate across 10 files of 10MB each.
|
146
|
+
|
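Ruby's standard `Logger` supports this kind of size-based rotation out of the box. The sketch below is only an illustration of the 10 files / 10MB policy; the helper name `resque_logger` and the temporary folder are assumptions, and Mobilize's own logger setup may differ.

``` ruby
require 'logger'
require 'tmpdir'

# Hypothetical helper: build a rotating logger that keeps 10 files
# of up to 10MB each, named after the environment.
def resque_logger(log_dir, environment)
  path = File.join(log_dir, "mobilize-resque-#{environment}.log")
  Logger.new(path, 10, 10 * 1024 * 1024)
end

log_dir = Dir.mktmpdir            # stand-in for your log/ folder
logger  = resque_logger(log_dir, "development")
logger.info("worker started")
logger.close
```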
147
|
+
<a name='section_Configure'></a>
|
148
|
+
Configure
|
149
|
+
------------
|
150
|
+
|
151
|
+
All Mobilize configurations live in files in `config/mobilize/*.yml` by
|
152
|
+
default. Samples can
|
153
|
+
be found below or on github in the [lib/samples][git_samples] folder.
|
154
|
+
|
155
|
+
<a name='section_Configure_Google_Drive'></a>
|
156
|
+
### Configure Google Drive
|
157
|
+
|
158
|
+
gdrive.yml needs:
|
159
|
+
* a domain, which can be gmail.com but may be different depending on
|
160
|
+
your organization. All gdrive accounts should have
|
161
|
+
the same domain, and all Users should have emails in this domain.
|
162
|
+
* an owner name and password. You can set up separate owners
|
163
|
+
for different environments as in the below file, which will keep your
|
164
|
+
mission critical workers from getting rate-limit errors.
|
165
|
+
* one or more admins with email attributes -- these will be for people
|
166
|
+
who should be given write permissions to all Mobilize books in the
|
167
|
+
environment for maintenance purposes.
|
168
|
+
* one or more workers with name and pw attributes -- they will be used
|
169
|
+
to queue up google reads and writes. This can be the same as the owner
|
170
|
+
account for testing purposes or low-volume environments.
|
171
|
+
|
172
|
+
__Mobilize only allows one Resque
|
173
|
+
worker at a time to use a Google drive worker account for
|
174
|
+
reading/writing, which is called a gdrive_slot.__
|
175
|
+
|
176
|
+
Sample gdrive.yml:
|
177
|
+
|
178
|
+
``` yml
|
179
|
+
---
|
180
|
+
development:
|
181
|
+
domain: host.com
|
182
|
+
owner:
|
183
|
+
name: owner_development
|
184
|
+
pw: google_drive_password
|
185
|
+
admins:
|
186
|
+
- name: admin
|
187
|
+
workers:
|
188
|
+
- name: worker_development001
|
189
|
+
pw: worker001_google_drive_password
|
190
|
+
- name: worker_development002
|
191
|
+
pw: worker002_google_drive_password
|
192
|
+
test:
|
193
|
+
domain: host.com
|
194
|
+
owner:
|
195
|
+
name: owner_test
|
196
|
+
pw: google_drive_password
|
197
|
+
admins:
|
198
|
+
- name: admin
|
199
|
+
workers:
|
200
|
+
- name: worker_test001
|
201
|
+
pw: worker001_google_drive_password
|
202
|
+
- name: worker_test002
|
203
|
+
pw: worker002_google_drive_password
|
204
|
+
production:
|
205
|
+
domain: host.com
|
206
|
+
owner:
|
207
|
+
name: owner_production
|
208
|
+
pw: google_drive_password
|
209
|
+
admins:
|
210
|
+
- name: admin
|
211
|
+
workers:
|
212
|
+
- name: worker_production001
|
213
|
+
pw: worker001_google_drive_password
|
214
|
+
- name: worker_production002
|
215
|
+
pw: worker002_google_drive_password
|
216
|
+
```
|
217
|
+
|
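The gdrive_slot rule (one Resque worker per Google Drive worker account at a time) can be pictured as a small reservation pool. This is a sketch under assumptions: the `SlotPool` class and its in-memory hash are hypothetical, whereas a real deployment has to track slots somewhere all workers can see.

``` ruby
# Hypothetical in-memory pool: each Google Drive worker account can be
# held by at most one Resque worker at a time.
class SlotPool
  def initialize(accounts)
    @holders = {}
    accounts.each { |account| @holders[account] = nil }
  end

  # Reserve the first free account for worker_id; nil when all are busy.
  def acquire(worker_id)
    account, _holder = @holders.find { |_, holder| holder.nil? }
    return nil unless account
    @holders[account] = worker_id
    account
  end

  # Free every account currently held by worker_id.
  def release(worker_id)
    @holders.each { |account, holder| @holders[account] = nil if holder == worker_id }
  end
end

pool   = SlotPool.new(%w[worker_development001 worker_development002])
first  = pool.acquire("resque-1")
second = pool.acquire("resque-2")
third  = pool.acquire("resque-3")   # no free slot left
```

With two worker accounts configured, a third Resque worker has to wait until a slot is released, which is why adding worker accounts raises Google Drive throughput.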
218
|
+
<a name='section_Configure_Google_Sheets'></a>
|
219
|
+
### Configure Google Sheets
|
220
|
+
|
221
|
+
gsheet.yml needs:
|
222
|
+
* max_cells, which is the number of cells a sheet is allowed to have
|
223
|
+
written to it at one time. Default is 400k cells, which is the max per
|
224
|
+
book. Google Drive will throw its own exception if
|
225
|
+
you try to write more than that.
|
226
|
+
* Because Google Docs ties date formatting to the Locale for the
|
227
|
+
spreadsheet, there are 2 date format parameters:
|
228
|
+
* read_date_format, which is the format that should be read FROM google
|
229
|
+
sheets for date columns.
|
230
|
+
* sheet_date_format, which is the format that the google sheet is in.
|
231
|
+
* A date column is defined as one where the column header = "date" or "Date", or ends with "_date" or "Date".
|
232
|
+
* The defaults are set to US locale for sheet_date_format, because in 'Murica (US) we
|
233
|
+
use %m/%d/%Y for some reason, and to %Y-%m-%d format for
|
234
|
+
reading, which is more standard and sorts well as a string. If your
|
235
|
+
locale is NOT 'Murica you will want to change these.
|
236
|
+
|
237
|
+
Sample gsheet.yml:
|
238
|
+
|
239
|
+
``` yml
|
240
|
+
---
|
241
|
+
development:
|
242
|
+
max_cells: 400000
|
243
|
+
read_date_format: "%Y-%m-%d"
|
244
|
+
sheet_date_format: "%m/%d/%Y"
|
245
|
+
test:
|
246
|
+
max_cells: 400000
|
247
|
+
read_date_format: "%Y-%m-%d"
|
248
|
+
sheet_date_format: "%m/%d/%Y"
|
249
|
+
staging:
|
250
|
+
max_cells: 400000
|
251
|
+
read_date_format: "%Y-%m-%d"
|
252
|
+
sheet_date_format: "%m/%d/%Y"
|
253
|
+
production:
|
254
|
+
max_cells: 400000
|
255
|
+
read_date_format: "%Y-%m-%d"
|
256
|
+
sheet_date_format: "%m/%d/%Y"
|
257
|
+
```
|
258
|
+
|
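To make the two date parameters concrete, here is a minimal sketch of the header rule and the format conversion, using the default formats from the sample file. The helper names `date_column?` and `read_date` are assumptions for illustration and are not Mobilize methods.

``` ruby
require 'date'

# Defaults copied from the sample gsheet.yml.
SHEET_DATE_FORMAT = "%m/%d/%Y"
READ_DATE_FORMAT  = "%Y-%m-%d"

# Hypothetical helper: a header marks a date column when it is "date" or
# "Date", or ends with "_date" or "Date".
def date_column?(header)
  header == "date" || header.end_with?("_date", "Date")
end

# Hypothetical helper: convert a cell from the sheet format to the read format.
def read_date(cell)
  Date.strptime(cell, SHEET_DATE_FORMAT).strftime(READ_DATE_FORMAT)
end

converted = read_date("12/31/2012")
```

Because the read format puts the year first, the converted strings sort chronologically.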
259
|
+
<a name='section_Configure_Jobtracker'></a>
|
260
|
+
### Configure Jobtracker
|
261
|
+
|
262
|
+
The Jobtracker sits on your Resque and does 2 things:
|
263
|
+
* check for Users that are due for polling;
|
264
|
+
* send out notifications when:
|
265
|
+
* there are failed jobs on Resque;
|
266
|
+
* there are jobs on Resque that have run beyond the max run time.
|
267
|
+
|
268
|
+
Emails are sent using ActionMailer, through the owner Google Drive
|
269
|
+
account.
|
270
|
+
|
271
|
+
To this end, it needs these parameters, for which there is a sample
|
272
|
+
below and in the [lib/samples][git_samples] folder:
|
273
|
+
|
274
|
+
``` yml
|
275
|
+
---
|
276
|
+
development:
|
277
|
+
cycle_freq: 10 #time between Jobtracker sweeps
|
278
|
+
notification_freq: 3600 #1 hour between failure/timeout notifications
|
279
|
+
runner_read_freq: 300 #5 min between runner reads
|
280
|
+
max_run_time: 14400 # if a job runs for 4h+, notification will be sent
|
281
|
+
extensions: [] #additional Mobilize modules to load workers with
|
282
|
+
admins: #emails to send notifications to
|
283
|
+
- email: admin@host.com
|
284
|
+
test:
|
285
|
+
cycle_freq: 10 #time between Jobtracker sweeps
|
286
|
+
notification_freq: 3600 #1 hour between failure/timeout notifications
|
287
|
+
runner_read_freq: 300 #5 min between runner reads
|
288
|
+
max_run_time: 14400 # if a job runs for 4h+, notification will be sent
|
289
|
+
extensions: [] #additional Mobilize modules to load workers with
|
290
|
+
admins: #emails to send notifications to
|
291
|
+
- email: admin@host.com
|
292
|
+
production:
|
293
|
+
cycle_freq: 10 #time between Jobtracker sweeps
|
294
|
+
notification_freq: 3600 #1 hour between failure/timeout notifications
|
295
|
+
runner_read_freq: 300 #5 min between runner reads
|
296
|
+
max_run_time: 14400 # if a job runs for 4h+, notification will be sent
|
297
|
+
extensions: [] #additional Mobilize modules to load workers with
|
298
|
+
admins: #emails to send notifications to
|
299
|
+
- email: admin@host.com
|
300
|
+
```
|
301
|
+
|
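The timing parameters are all expressed in seconds. As a rough sketch of how they could drive the checks described above (the method names are hypothetical and the real Jobtracker logic may differ):

``` ruby
require 'yaml'

JOBTRACKER_YML = <<YML
development:
  cycle_freq: 10
  notification_freq: 3600
  runner_read_freq: 300
  max_run_time: 14400
YML

config = YAML.load(JOBTRACKER_YML).fetch("development")

# Hypothetical check: a Runner is due once runner_read_freq seconds have passed.
def runner_due?(last_read_at, now, config)
  last_read_at.nil? || (now - last_read_at) >= config["runner_read_freq"]
end

# Hypothetical check: a job is overdue once it has run longer than max_run_time.
def overdue?(started_at, now, config)
  (now - started_at) > config["max_run_time"]
end

# Hypothetical check: throttle notifications to one per notification_freq.
def notify?(last_notified_at, now, config)
  last_notified_at.nil? || (now - last_notified_at) >= config["notification_freq"]
end

now = Time.at(20_000)
job_overdue = overdue?(Time.at(0), now, config)
```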
302
|
+
<a name='section_Configure_Resque'></a>
|
303
|
+
### Configure Resque
|
304
|
+
|
305
|
+
Resque keeps track of Jobs, Workers and logging.
|
306
|
+
|
307
|
+
It needs the below parameters, which can be found in the [lib/samples][git_samples] folder.
|
308
|
+
|
309
|
+
* queue_name - the name of the Resque queue where you would like the Jobtracker and Resque Workers to
|
310
|
+
run. Default is mobilize.
|
311
|
+
* max_workers - the total number of simultaneous workers you would like
|
312
|
+
on your queue. Default is 4 for development and test, 36 in
|
313
|
+
production, but feel free to adjust depending on your hardware.
|
314
|
+
* redis_port - you should probably leave this alone, it specifies the
|
315
|
+
default port for dev and prod and a separate one for testing.
|
316
|
+
* web_port - this specifies the port under which resque-web operates
|
317
|
+
|
318
|
+
``` yml
|
319
|
+
---
|
320
|
+
development:
|
321
|
+
queue_name: mobilize
|
322
|
+
max_workers: 4
|
323
|
+
redis_port: 6379
|
324
|
+
web_port: 8282
|
325
|
+
test:
|
326
|
+
queue_name: mobilize
|
327
|
+
max_workers: 4
|
328
|
+
redis_port: 9736
|
329
|
+
web_port: 8282
|
330
|
+
production:
|
331
|
+
queue_name: mobilize
|
332
|
+
max_workers: 36
|
333
|
+
redis_port: 6379
|
334
|
+
web_port: 8282
|
335
|
+
```
|
336
|
+
|
337
|
+
<a name='section_Configure_Resque-Web'></a>
|
338
|
+
### Configure Resque-Web
|
339
|
+
|
340
|
+
Please change your default username and password in the resque_web.rb
|
341
|
+
file in your config folder, reproduced below:
|
342
|
+
|
343
|
+
``` ruby
|
344
|
+
#comment out the below if you want no authentication on your web portal (not recommended)
|
345
|
+
Resque::Server.use(Rack::Auth::Basic) do |user, password|
|
346
|
+
[user, password] == ['admin', 'changeyourpassword']
|
347
|
+
end
|
348
|
+
```
|
349
|
+
|
350
|
+
This file is passed as a config file argument to
|
351
|
+
mobilize_base:resque_web task, as detailed in [Start Resque-Web](#section_Start_Start_Resque-Web).
|
352
|
+
|
353
|
+
<a name='section_Configure_Gridfs'></a>
|
354
|
+
### Configure Gridfs
|
355
|
+
|
356
|
+
Mobilize stores cached data in MongoDB Gridfs.
|
357
|
+
It needs the below parameters, which can be found in the [lib/samples][git_samples] folder.
|
358
|
+
|
359
|
+
* max_versions - the number of __different__ versions of data to keep
|
360
|
+
for a given cache. Default is 10. This is meant mostly to allow you to
|
361
|
+
restore Runners from cache if necessary.
|
362
|
+
* max_compressed_write_size - the amount of compressed data Gridfs will
|
363
|
+
allow. If you try to write more than this, an exception will be thrown.
|
364
|
+
|
365
|
+
``` yml
|
366
|
+
---
|
367
|
+
development:
|
368
|
+
max_versions: 10 #number of versions of cache to keep in gridfs
|
369
|
+
max_compressed_write_size: 1000000000 #~1GB
|
370
|
+
test:
|
371
|
+
max_versions: 10 #number of versions of cache to keep in gridfs
|
372
|
+
max_compressed_write_size: 1000000000 #~1GB
|
373
|
+
production:
|
374
|
+
max_versions: 10 #number of versions of cache to keep in gridfs
|
375
|
+
max_compressed_write_size: 1000000000 #~1GB
|
376
|
+
```
|
377
|
+
|
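A minimal sketch of what these two limits mean in practice. It assumes Zlib compression and an in-memory array of versions purely for illustration; the helper names are hypothetical and the actual Gridfs handler may work differently.

``` ruby
require 'zlib'

MAX_VERSIONS              = 10
MAX_COMPRESSED_WRITE_SIZE = 1_000_000_000 # ~1GB

# Hypothetical helper: compress data and refuse writes above the limit.
def compress_for_cache(data, limit = MAX_COMPRESSED_WRITE_SIZE)
  zipped = Zlib::Deflate.deflate(data)
  if zipped.bytesize > limit
    raise ArgumentError, "compressed size #{zipped.bytesize} exceeds #{limit}"
  end
  zipped
end

# Hypothetical helper: keep only the newest max_versions entries, and skip
# a write that is identical to the latest version.
def add_version(versions, data, max_versions = MAX_VERSIONS)
  return versions if versions.last == data
  (versions + [data]).last(max_versions)
end

zipped   = compress_for_cache("a" * 1000)
versions = add_version((1..10).map(&:to_s), "11")
```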
378
|
+
<a name='section_Configure_Mongoid'></a>
|
379
|
+
### Configure Mongoid
|
380
|
+
|
381
|
+
Mongoid is the abstraction layer on top of MongoDB so we can interact
|
382
|
+
with it in an ActiveRecord-like fashion.
|
383
|
+
|
384
|
+
It needs the below parameters, which can be found in the [lib/samples][git_samples] folder.
|
385
|
+
|
386
|
+
You shouldn't need to change anything in this file.
|
387
|
+
|
388
|
+
``` yml
|
389
|
+
---
|
390
|
+
development:
|
391
|
+
sessions:
|
392
|
+
default:
|
393
|
+
database: mobilize-development
|
394
|
+
persist_in_safe_mode: true
|
395
|
+
hosts:
|
396
|
+
- 127.0.0.1:27017
|
397
|
+
test:
|
398
|
+
sessions:
|
399
|
+
default:
|
400
|
+
database: mobilize-test
|
401
|
+
persist_in_safe_mode: true
|
402
|
+
hosts:
|
403
|
+
- 127.0.0.1:27017
|
404
|
+
production:
|
405
|
+
sessions:
|
406
|
+
default:
|
407
|
+
database: mobilize-production
|
408
|
+
persist_in_safe_mode: true
|
409
|
+
hosts:
|
410
|
+
- 127.0.0.1:27017
|
411
|
+
```
|
412
|
+
|
413
|
+
<a name='section_Start'></a>
|
414
|
+
Start
|
415
|
+
-----
|
416
|
+
|
417
|
+
A Mobilize instance can be considered "started" or "running" when you have:
|
418
|
+
|
419
|
+
1. Resque workers running on the Mobilize queue;
|
420
|
+
2. A Jobtracker running on one of the Resque workers;
|
421
|
+
3. One or more Users created in your MongoDB;
|
422
|
+
4. One or more Jobs created in a User's Runner.
|
423
|
+
|
424
|
+
<a name='section_Start_Start_Resque-Web'></a>
|
425
|
+
### Start resque-web
|
426
|
+
|
427
|
+
Mobilize ships with its own rake task to start resque web -- you can do
|
428
|
+
the following:
|
429
|
+
|
430
|
+
|
431
|
+
$ MOBILIZE_ENV=<environment> rake mobilize_base:resque_web
|
432
|
+
|
433
|
+
This will start a resque_web instance with the port specified in your
|
434
|
+
resque.yml and the config/auth scheme specified in your resque_web.rb.
|
435
|
+
|
436
|
+
More detail is available in the
|
437
|
+
[Resque-Web Standalone section][resque-web].
|
438
|
+
|
439
|
+
<a name='section_Start_Set_Environment'></a>
|
440
|
+
### Set Environment
|
441
|
+
|
442
|
+
Mobilize takes the environment from your Rails.env if you're running
|
443
|
+
Rails, or assumes "development." You can specify "development", "test",
|
444
|
+
or "production," as per the yml files.
|
445
|
+
|
446
|
+
Otherwise, it takes it from the MOBILIZE_ENV environment variable, as in:
|
447
|
+
|
448
|
+
``` ruby
|
449
|
+
> ENV['MOBILIZE_ENV'] = 'production'
|
450
|
+
> require 'mobilize-base'
|
451
|
+
```
|
452
|
+
This affects all parameters as set in the yml files, including the
|
453
|
+
database.
|
454
|
+
|
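The sketch below shows one way the environment could select a section of a yml file. The lookup order (Rails environment, then MOBILIZE_ENV, then "development") is an assumption drawn from the description above, and `mobilize_env` and `settings_for` are hypothetical helpers rather than Mobilize methods.

``` ruby
require 'yaml'

# Hypothetical helper: pick the environment name.
def mobilize_env(rails_env = nil, env = ENV)
  rails_env || env['MOBILIZE_ENV'] || 'development'
end

# Hypothetical helper: return the settings for one environment.
def settings_for(yaml_text, environment)
  YAML.load(yaml_text).fetch(environment)
end

RESQUE_YML = <<YML
development:
  queue_name: mobilize
  max_workers: 4
  redis_port: 6379
test:
  queue_name: mobilize
  max_workers: 4
  redis_port: 9736
YML

environment = mobilize_env(nil, 'MOBILIZE_ENV' => 'test')
redis_port  = settings_for(RESQUE_YML, environment)['redis_port']
```

Here the test environment resolves to the separate Redis port from the sample resque.yml, so test runs do not touch the development queue.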
455
|
+
<a name='section_Start_Create_User'></a>
|
456
|
+
### Create User
|
457
|
+
|
458
|
+
Users are people who use the Mobilize service to move data from one
|
459
|
+
endpoint to another. They each have a Runner, which is a google sheet
|
460
|
+
that contains one or more Jobs.
|
461
|
+
|
462
|
+
To create a User, use the User.find_or_create_by_name
|
463
|
+
command (replace the user with your own name, or any name
|
464
|
+
in your domain).
|
465
|
+
|
466
|
+
``` ruby
|
467
|
+
irb> User.find_or_create_by_name("user_name")
|
468
|
+
```
|
469
|
+
|
470
|
+
<a name='section_Start_Start_Workers'></a>
|
471
|
+
### Start Workers
|
472
|
+
|
473
|
+
Workers are rake tasks that load the Mobilize environment and allow the
|
474
|
+
processing of the Jobtracker, Users and Jobs.
|
475
|
+
|
476
|
+
Starting workers will launch as many workers as the max_workers value in your resque.yml.
|
477
|
+
|
478
|
+
To start workers, do:
|
479
|
+
|
480
|
+
``` ruby
|
481
|
+
> Jobtracker.prep_workers
|
482
|
+
```
|
483
|
+
|
484
|
+
If you have workers already running and would like to kill and refresh
|
485
|
+
them, do:
|
486
|
+
|
487
|
+
``` ruby
|
488
|
+
> Jobtracker.restart_workers!
|
489
|
+
```
|
490
|
+
|
491
|
+
Note that restart will kill any workers on the Mobilize queue.
|
492
|
+
|
493
|
+
<a name='section_Start_View_Logs'></a>
|
494
|
+
### View Logs
|
495
|
+
|
496
|
+
At this point, you'll want to start viewing the logs for the Resque
|
497
|
+
workers -- they will be stored under your log folder, by default log/. You can do:
|
498
|
+
|
499
|
+
$ tail -f log/mobilize-resque-`<environment>`.log
|
500
|
+
|
501
|
+
to view them.
|
502
|
+
|
503
|
+
<a name='section_Start_Start_Jobtracker'></a>
|
504
|
+
### Start Jobtracker
|
505
|
+
|
506
|
+
Once the Resque workers are running, and you have at least one User
|
507
|
+
set up, it's time to start the Jobtracker:
|
508
|
+
|
509
|
+
``` ruby
|
510
|
+
> Jobtracker.start
|
511
|
+
```
|
512
|
+
|
513
|
+
The Jobtracker will automatically enqueue any Users that have not
|
514
|
+
been processed in the runner_read_freq period defined in the
|
515
|
+
jobtracker.yml, and create their Runners if they do not exist. You can
|
516
|
+
see this process on your Resque UI and in the log file.
|
517
|
+
|
518
|
+
<a name='section_Start_Create_Job'></a>
|
519
|
+
### Create Job
|
520
|
+
|
521
|
+
Now it's time to go onto the Runner and add a Job to be processed.
|
522
|
+
|
523
|
+
To do this, you should log into your Google Drive with either the
|
524
|
+
owner's account, an admin account, or the Runner User's account. These
|
525
|
+
will be the accounts with edit permissions to a given Runner.
|
526
|
+
|
527
|
+
Navigate to the Jobs tab on the Runner `(denoted by Runner(<requestor
|
528
|
+
name>))` and enter values under each header:
|
529
|
+
|
530
|
+
* name This is the name of the job you would like to add. Names must be unique across all your jobs, otherwise you will get an error
|
531
|
+
|
532
|
+
* active set this to blank or FALSE if you want to turn off a job
|
533
|
+
|
534
|
+
* trigger This uses human readable syntax to schedule jobs. It accepts the following:
|
535
|
+
* every `<integer>` hour -- fire the job at increments of `<integer>` hours, minimum of 1 hour
|
536
|
+
* every `<integer>` day -- fire the job at increments of `<integer>` days, minimum of 1
|
537
|
+
* every `<integer>` day after `<HH:MM>` -- fire the job at increments of `<integer>` days, after HH:MM UTC time
|
538
|
+
* every `<integer>` day_of_week after `<HH:MM>` -- fire the job on specified day of week, after HH:MM UTC time; 1=Sunday
|
539
|
+
* every `<integer>` day_of_month after `<HH:MM>` -- fire the job on specified day of month, after HH:MM UTC time
|
540
|
+
* once -- fire the job once if active is set to TRUE, then set active to FALSE right after
|
541
|
+
* after `<jobname>` -- fire the job after the job named `<jobname>`
|
542
|
+
|
543
|
+
* status Mobilize writes this field with the last status returned by the job
|
544
|
+
|
545
|
+
* stage1..stage5 List of stages to be performed by the job.
|
546
|
+
* Stages have this syntax: `<handler>.<call> <params>`.
|
547
|
+
* handler specifies the file that should receive the stage
|
548
|
+
* the call specifies the method within the file. The method should
|
549
|
+
be called `"<handler>.<call>_by_stage_path"`
|
550
|
+
* the params the method accepts, which are custom to each
|
551
|
+
stage. These should be of the form `<key1>: <value1>, <key2>: <value2>`, where
|
552
|
+
`<key>` is an unquoted string and `<value>` is a quoted string, an
|
553
|
+
integer, an array (delimited by square braces), or a hash (delimited by
|
554
|
+
curly braces).
|
555
|
+
* For mobilize-base, the following stage is available:
|
556
|
+
* gsheet.write `source: <input_path>`, which reads the sheet.
|
557
|
+
* The input_path should be of the form:
|
558
|
+
* `<gbook_name>/<gsheet_name>` or just `<gsheet_name>` if the target is in
|
559
|
+
the Runner itself.
|
560
|
+
* `gfile://<gfile_name>` if the target is a file.
|
561
|
+
* The file must be owned by the Gdrive owner.
|
562
|
+
* The test uses "gfile://test_base_1.tsv".
|
563
|
+
* The stage_name should be of the form `<stage_column>`. The test uses "stage1" for the first test
|
564
|
+
and "base1.out" for the second test. The first
|
565
|
+
takes the output from the first stage and the second reads it straight
|
566
|
+
from the referenced sheet.
|
567
|
+
* All stages accept retry parameters:
|
568
|
+
* retries: an integer specifying the number of times that the system will try it again before giving up.
|
569
|
+
* delay: an integer specifying the number of seconds between retries.
|
570
|
+
* always_on: if true, keeps the job on regardless of stage failures. The job will retry from the beginning with the same frequency as the Runner refresh rate.
|
571
|
+
* If a stage fails after all retries, it will output its standard error to a tab in the Runner with the name of the job, the name of the stage, and a ".err" extension
|
572
|
+
* The tab will be headed "response" and will contain the exception and backtrace for the error.
|
573
|
+
* The test uses "Requestor_mobilize(test)/base1.out" and
|
574
|
+
"Runner_mobilize(test)/base2.out" for target sheets.
|
575
|
+
|
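The trigger and stage syntax described above can be sketched as two small parsers. These are illustrations only: `parse_trigger` and `parse_stage` are hypothetical names, the returned hashes are an invented shape, and reading the params as a YAML flow mapping is an assumption, not necessarily how Mobilize parses them.

``` ruby
require 'yaml'

# Hypothetical parser for the trigger column.
def parse_trigger(trigger)
  case trigger.strip
  when /\Aonce\z/i
    { :type => :once }
  when /\Aafter\s+(.+)\z/i
    { :type => :after, :job => $1 }
  when /\Aevery\s+(\d+)\s+(day_of_week|day_of_month|hour|day)(?:\s+after\s+(\d{1,2}:\d{2}))?\z/i
    interval, unit, after = $1.to_i, $2.downcase, $3
    raise ArgumentError, "interval must be at least 1" if interval < 1
    { :type => :every, :interval => interval, :unit => unit, :after => after }
  else
    raise ArgumentError, "unrecognized trigger: #{trigger}"
  end
end

# Hypothetical parser for a stage cell of the form `<handler>.<call> <params>`.
def parse_stage(cell)
  target, params = cell.strip.split(/\s+/, 2)
  handler, call  = target.split(".", 2)
  { :handler => handler,
    :call    => call,
    :method  => "#{call}_by_stage_path",      # naming convention from the text
    :params  => YAML.load("{#{params}}") }    # assumption: params as YAML flow mapping
end

trigger = parse_trigger("every 1 day after 13:00")
stage   = parse_stage('gsheet.write source: "gfile://test_base_1.tsv", retries: 3')
```

Treating the params as a YAML flow mapping is convenient because it already covers quoted strings, integers, arrays in square braces, and hashes in curly braces.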
576
|
+
<a name='section_Start_Run_Test'></a>
|
577
|
+
### Run Test
|
578
|
+
|
579
|
+
To run tests, you will need to
|
580
|
+
|
581
|
+
1) clone the repository
|
582
|
+
|
583
|
+
From the project folder, run
|
584
|
+
|
585
|
+
2) rake mobilize_base:setup
|
586
|
+
|
587
|
+
and populate the "test" environment in the config files with the
|
588
|
+
necessary details.
|
589
|
+
|
590
|
+
3) $ rake test
|
591
|
+
|
592
|
+
This will create a test Runner with a sample job. These will run off a
|
593
|
+
test redis instance which will be killed once the tests finish.
|
594
|
+
|
595
|
+
<a name='section_Start_'></a>
|
596
|
+
### Run Test
|
597
|
+
|
598
|
+
To run tests, you will need to
|
599
|
+
|
600
|
+
1) clone the repository
|
601
|
+
|
602
|
+
From the project folder, run
|
603
|
+
|
604
|
+
2) rake mobilize_base:setup
|
605
|
+
|
606
|
+
and populate the "test" environment in the config files with the
|
607
|
+
necessary details.
|
608
|
+
|
609
|
+
3) $ rake test
|
610
|
+
|
611
|
+
This will create a test Runner with a sample job. These will run off a
|
612
|
+
test redis instance. This instance will be kept alive so you can test
|
613
|
+
additional Mobilize modules. (see [mobilize-ssh][mobilize-ssh] for more)
|
614
|
+
|
615
|
+
<a name='section_Start_Add_Gbooks_And_Gsheets'></a>
|
616
|
+
### Add Gbooks and Gsheets
|
617
|
+
|
618
|
+
A User's Runner should be kept clean, preferably with only the jobs
|
619
|
+
sheet. The test keeps everything in the
|
620
|
+
Runner, but in reality you will want to create lots of different books
|
621
|
+
to share with different people in your organization.
|
622
|
+
|
623
|
+
To add a new Gbook, create one as you normally would, then make sure the
|
624
|
+
Owner is the same user as specified in your gdrive.yml/owner/name value.
|
625
|
+
Mobilize will handle the rest, extending permissions to workers and
|
626
|
+
admins.
|
627
|
+
|
628
|
+
Also make sure any Gsheets you specify for __read__ operations exist
|
629
|
+
prior to calling the job, or there will be an error. __Write__
|
630
|
+
operations will create the book and sheet if they do not already exist,
|
631
|
+
already under ownership of the owner account.
|
632
|
+
|
633
|
+
<a name='section_Meta'></a>
|
634
|
+
Meta
|
635
|
+
----
|
636
|
+
|
637
|
+
* Code: `git clone git://github.com/ngmoco/mobilize-base.git`
|
638
|
+
* Home: <https://github.com/ngmoco/mobilize-base>
|
639
|
+
* Bugs: <https://github.com/ngmoco/mobilize-base/issues>
|
640
|
+
* Gems: <http://rubygems.org/gems/mobilize-base>
|
641
|
+
|
642
|
+
<a name='section_Author'></a>
|
643
|
+
Author
|
644
|
+
------
|
645
|
+
|
646
|
+
Cassio Paes-Leme :: cpaesleme@ngmoco.com :: @cpaesleme
|
647
|
+
|
648
|
+
<a name='section_Special_Thanks'></a>
|
649
|
+
Special Thanks
|
650
|
+
--------------
|
651
|
+
|
652
|
+
* Al Thompson and Sagar Mehta for awesome design advice and discussions
|
653
|
+
* Elliott Clark for enlightening me to the wonders of Resque
|
654
|
+
* Bob Colner for pointing me to google-drive-ruby when I tried to
|
655
|
+
reinvent the wheel
|
656
|
+
* ngmoco:) and DeNA Global for supporting and adopting the Mobilize
|
657
|
+
platform
|
658
|
+
* gimite, defunkt, 10gen, and the countless other github heroes and
|
659
|
+
crewmembers.
|
660
|
+
|
661
|
+
[google_drive_ruby]: https://github.com/gimite/google-drive-ruby
|
662
|
+
[resque]: https://github.com/defunkt/resque
|
663
|
+
[mongoid]: http://mongoid.org/en/mongoid/index.html
|
664
|
+
[resque_redis]: https://github.com/defunkt/resque#section_Installing_Redis
|
665
|
+
[mongodb_quickstart]: http://www.mongodb.org/display/DOCS/Quickstart
|
666
|
+
[git_samples]: https://github.com/ngmoco/mobilize-base/tree/master/lib/samples
|
667
|
+
[rvm]: https://rvm.io/
|
668
|
+
[resque-web]: https://github.com/defunkt/resque#standalone
|
669
|
+
[mobilize-ssh]: https://github.com/ngmoco/mobilize-ssh
|