mobilize-base 1.36 → 1.293
- data/README.md +666 -1
- data/lib/mobilize-base.rb +1 -12
- data/lib/mobilize-base/extensions/array.rb +3 -8
- data/lib/mobilize-base/extensions/google_drive/acl.rb +1 -1
- data/lib/mobilize-base/extensions/google_drive/client_login_fetcher.rb +1 -2
- data/lib/mobilize-base/extensions/google_drive/file.rb +37 -11
- data/lib/mobilize-base/extensions/string.rb +6 -11
- data/lib/mobilize-base/extensions/yaml.rb +7 -10
- data/lib/mobilize-base/handlers/gbook.rb +38 -25
- data/lib/mobilize-base/handlers/gdrive.rb +4 -20
- data/lib/mobilize-base/handlers/gfile.rb +10 -64
- data/lib/mobilize-base/handlers/gridfs.rb +24 -19
- data/lib/mobilize-base/handlers/gsheet.rb +29 -45
- data/lib/mobilize-base/handlers/resque.rb +10 -17
- data/lib/mobilize-base/jobtracker.rb +196 -22
- data/lib/mobilize-base/models/job.rb +77 -107
- data/lib/mobilize-base/models/runner.rb +122 -36
- data/lib/mobilize-base/models/stage.rb +37 -18
- data/lib/mobilize-base/tasks.rb +13 -50
- data/lib/mobilize-base/version.rb +1 -1
- data/lib/samples/gdrive.yml +0 -15
- data/lib/samples/gridfs.yml +3 -0
- data/lib/samples/gsheet.yml +4 -4
- data/lib/samples/jobtracker.yml +6 -0
- data/mobilize-base.gemspec +3 -3
- data/test/base_job_rows.yml +11 -0
- data/test/mobilize-base_test.rb +106 -0
- data/test/test_base_1.yml +3 -0
- data/test/test_helper.rb +0 -155
- metadata +24 -36
- data/lib/mobilize-base/extensions/time.rb +0 -20
- data/lib/mobilize-base/helpers/job_helper.rb +0 -54
- data/lib/mobilize-base/helpers/jobtracker_helper.rb +0 -143
- data/lib/mobilize-base/helpers/runner_helper.rb +0 -83
- data/lib/mobilize-base/helpers/stage_helper.rb +0 -38
- data/lib/samples/gfile.yml +0 -9
- data/test/fixtures/base1_stage1.in.yml +0 -10
- data/test/fixtures/integration_expected.yml +0 -25
- data/test/fixtures/integration_jobs.yml +0 -12
- data/test/fixtures/is_due.yml +0 -97
- data/test/integration/mobilize-base_test.rb +0 -57
- data/test/unit/mobilize-base_test.rb +0 -33
data/README.md
CHANGED
@@ -1,4 +1,669 @@
|
|
1
1
|
Mobilize
|
2
2
|
========
|
3
3
|
|
4
|
-
|
4
|
+
Mobilize is an end-to-end data transfer workflow manager with:
|
5
|
+
* a Google Spreadsheets UI through [google-drive-ruby][google_drive_ruby];
|
6
|
+
* a queue manager through [Resque][resque];
|
7
|
+
* a persistent caching / database layer through [Mongoid][mongoid];
|
8
|
+
* gems for data transfers to/from Hive, MySQL, and HTTP endpoints
|
9
|
+
(coming soon).
|
10
|
+
|
11
|
+
Mobilize-Base includes all the core scheduling and processing
|
12
|
+
functionality, allowing you to:
|
13
|
+
* put workers on the Mobilize Resque queue.
|
14
|
+
* create [Users](#section_Start_Create_User) and their associated Google Spreadsheet [Runners](#section_Start_Create_User);
|
15
|
+
* poll for [Jobs](#section_Start_Create_Job) on Runners (currently gsheet to gsheet only) and add them to Resque;
|
16
|
+
* monitor the status of Jobs on a rolling log.
|
17
|
+
|
18
|
+
Table Of Contents
|
19
|
+
-----------------
|
20
|
+
* [Overview](#section_Overview)
|
21
|
+
* [Install](#section_Install)
|
22
|
+
* [Redis](#section_Install_Redis)
|
23
|
+
* [MongoDB](#section_Install_MongoDB)
|
24
|
+
* [Mobilize-Base](#section_Install_Mobilize-Base)
|
25
|
+
* [Default Folders and Files](#section_Install_Folders_and_Files)
|
26
|
+
* [Configure](#section_Configure)
|
27
|
+
* [Google Drive](#section_Configure_Google_Drive)
|
28
|
+
* [Google Sheets](#section_Configure_Google_Sheets)
|
29
|
+
* [Jobtracker](#section_Configure_Jobtracker)
|
30
|
+
* [Resque](#section_Configure_Resque)
|
31
|
+
* [Resque-Web](#section_Configure_Resque-Web)
|
32
|
+
* [Gridfs](#section_Configure_Gridfs)
|
33
|
+
* [Mongoid](#section_Configure_Mongoid)
|
34
|
+
* [Start](#section_Start)
|
35
|
+
* [Start Resque-Web](#section_Start_Start_Resque-Web)
|
36
|
+
* [Set Environment](#section_Start_Set_Environment)
|
37
|
+
* [Create User](#section_Start_Create_User)
|
38
|
+
* [Start Workers](#section_Start_Start_Workers)
|
39
|
+
* [View Logs](#section_Start_View_Logs)
|
40
|
+
* [Start Jobtracker](#section_Start_Start_Jobtracker)
|
41
|
+
* [Create Job](#section_Start_Create_Job)
|
42
|
+
* [Run Test](#section_Start_Run_Test)
|
43
|
+
* [Add Gbooks and Gsheets](#section_Start_Add_Gbooks_And_Gsheets)
|
44
|
+
* [Meta](#section_Meta)
|
45
|
+
* [Author](#section_Author)
|
46
|
+
* [Special Thanks](#section_Special_Thanks)
|
47
|
+
|
48
|
+
|
49
|
+
<a name='section_Overview'></a>
|
50
|
+
Overview
|
51
|
+
-----------
|
52
|
+
|
53
|
+
* Mobilize is a script deployment and data visualization framework with
|
54
|
+
a Google Spreadsheets UI.
|
55
|
+
* Mobilize uses Resque for parallelization and queuing, MongoDB for caching,
|
56
|
+
and Google Drive for hosting, user input and display.
|
57
|
+
* The [mobilize-ssh][mobilize-ssh] gem allows you to run scripts and
|
58
|
+
copy files between different machines, and have output directed to a
|
59
|
+
spreadsheet for viewing and processing.
|
60
|
+
* The platform is easily extensible: add your own rake tasks and
|
61
|
+
handlers by following a few simple conventions, and you can have your own
|
62
|
+
Mobilize gem up and running in no time.
|
63
|
+
|
64
|
+
<a name='section_Install'></a>
|
65
|
+
Install
|
66
|
+
------------
|
67
|
+
|
68
|
+
Mobilize requires Ruby 1.9.3, and has been tested on OSX and Ubuntu.
|
69
|
+
|
70
|
+
[RVM][rvm] is great for managing your rubies.
|
71
|
+
|
72
|
+
<a name='section_Install_Redis'></a>
|
73
|
+
### Redis
|
74
|
+
|
75
|
+
Redis is a prerequisite for running Resque.
|
76
|
+
|
77
|
+
Please refer to the [Resque Redis Section][resque_redis] for complete
|
78
|
+
instructions.
|
79
|
+
|
80
|
+
<a name='section_Install_MongoDB'></a>
|
81
|
+
### MongoDB
|
82
|
+
|
83
|
+
MongoDB is used to persist caches between reads and writes, keep track
|
84
|
+
of Users and Jobs, and store Datasets that map to endpoints.
|
85
|
+
|
86
|
+
Please refer to the [MongoDB Quickstart Page][mongodb_quickstart] to get started.
|
87
|
+
|
88
|
+
The settings for database and port are set in config/mongoid.yml
|
89
|
+
and are best left as default. Please refer to [Configure
|
90
|
+
Mongoid](#section_Configure_Mongoid) for details.
|
91
|
+
|
92
|
+
<a name='section_Install_Mobilize-Base'></a>
|
93
|
+
### Mobilize-Base
|
94
|
+
|
95
|
+
Mobilize-Base contains all of the gems it needs to run.
|
96
|
+
|
97
|
+
Add this to your Gemfile:
|
98
|
+
|
99
|
+
``` ruby
|
100
|
+
gem "mobilize-base"
|
101
|
+
```
|
102
|
+
|
103
|
+
or do
|
104
|
+
|
105
|
+
$ gem install mobilize-base
|
106
|
+
|
107
|
+
for a ruby-wide install.
|
108
|
+
|
109
|
+
<a name='section_Install_Folders_and_Files'></a>
|
110
|
+
### Folders and Files
|
111
|
+
|
112
|
+
Mobilize requires a config folder and a log folder.
|
113
|
+
|
114
|
+
If you're on Rails, it will use the built-in config and log folders.
|
115
|
+
|
116
|
+
Otherwise, it will use log and config folders in the project folder (the
|
117
|
+
same one that contains your Rakefile).
|
118
|
+
|
119
|
+
### Rakefile
|
120
|
+
|
121
|
+
Inside the Rakefile in your project's root folder, make sure you have:
|
122
|
+
|
123
|
+
``` ruby
|
124
|
+
require 'mobilize-base/tasks'
|
125
|
+
```
|
126
|
+
|
127
|
+
This defines rake tasks essential to run the environment.
|
128
|
+
|
129
|
+
### Config and Log Folders
|
130
|
+
|
131
|
+
Run
|
132
|
+
|
133
|
+
$ rake mobilize_base:setup
|
134
|
+
|
135
|
+
Mobilize will create config/mobilize/ and log/ folders at the project root
|
136
|
+
level (the same folder as the Rakefile).
|
137
|
+
|
138
|
+
(You can override these by passing
|
139
|
+
MOBILIZE_CONFIG_DIR and/or MOBILIZE_LOG_DIR arguments to the command.
|
140
|
+
All directories must end with a '/'.)
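
The override rule above can be sketched as follows. This is a minimal illustration, not the gem's own code: `resolve_dir` is a hypothetical helper, and it assumes the two settings arrive as environment variables.

```ruby
# Hypothetical helper (not part of mobilize-base): resolve a folder from an
# environment variable, falling back to the default under the project root.
def resolve_dir(env_key, default, env = ENV)
  dir = env.fetch(env_key, default)
  # The README requires every directory to end with a '/'.
  raise ArgumentError, "#{env_key} must end with '/': #{dir}" unless dir.end_with?("/")
  dir
end

config_dir = resolve_dir("MOBILIZE_CONFIG_DIR", "config/mobilize/")
log_dir    = resolve_dir("MOBILIZE_LOG_DIR", "log/")
puts "config: #{config_dir}, log: #{log_dir}"
```

Failing early on a missing trailing slash keeps later path concatenation (directory plus file name) from silently producing a wrong path.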
|
141
|
+
|
142
|
+
The script will also create samples for all required config files, which are detailed below.
|
143
|
+
|
144
|
+
Resque will create a mobilize-resque-`<environment>`.log in the log folder,
|
145
|
+
and loop over 10 files, 10MB each.
|
146
|
+
|
147
|
+
<a name='section_Configure'></a>
|
148
|
+
Configure
|
149
|
+
------------
|
150
|
+
|
151
|
+
All Mobilize configurations live in files in `config/mobilize/*.yml` by
|
152
|
+
default. Samples can
|
153
|
+
be found below or on github in the [lib/samples][git_samples] folder.
|
154
|
+
|
155
|
+
<a name='section_Configure_Google_Drive'></a>
|
156
|
+
### Configure Google Drive
|
157
|
+
|
158
|
+
gdrive.yml needs:
|
159
|
+
* a domain, which can be gmail.com but may be different depending on
|
160
|
+
your organization. All gdrive accounts should have
|
161
|
+
the same domain, and all Users should have emails in this domain.
|
162
|
+
* an owner name and password. You can set up separate owners
|
163
|
+
for different environments as in the below file, which will keep your
|
164
|
+
mission critical workers from getting rate-limit errors.
|
165
|
+
* one or more admins with email attributes -- these will be for people
|
166
|
+
who should be given write permissions to all Mobilize books in the
|
167
|
+
environment for maintenance purposes.
|
168
|
+
* one or more workers with name and pw attributes -- they will be used
|
169
|
+
to queue up google reads and writes. This can be the same as the owner
|
170
|
+
account for testing purposes or low-volume environments.
|
171
|
+
|
172
|
+
__Mobilize only allows one Resque
|
173
|
+
worker at a time to use a Google drive worker account for
|
174
|
+
reading/writing, which is called a gdrive_slot.__
|
175
|
+
|
176
|
+
Sample gdrive.yml:
|
177
|
+
|
178
|
+
``` yml
|
179
|
+
---
|
180
|
+
development:
|
181
|
+
domain: host.com
|
182
|
+
owner:
|
183
|
+
name: owner_development
|
184
|
+
pw: google_drive_password
|
185
|
+
admins:
|
186
|
+
- name: admin
|
187
|
+
workers:
|
188
|
+
- name: worker_development001
|
189
|
+
pw: worker001_google_drive_password
|
190
|
+
- name: worker_development002
|
191
|
+
pw: worker002_google_drive_password
|
192
|
+
test:
|
193
|
+
domain: host.com
|
194
|
+
owner:
|
195
|
+
name: owner_test
|
196
|
+
pw: google_drive_password
|
197
|
+
admins:
|
198
|
+
- name: admin
|
199
|
+
workers:
|
200
|
+
- name: worker_test001
|
201
|
+
pw: worker001_google_drive_password
|
202
|
+
- name: worker_test002
|
203
|
+
pw: worker002_google_drive_password
|
204
|
+
production:
|
205
|
+
domain: host.com
|
206
|
+
owner:
|
207
|
+
name: owner_production
|
208
|
+
pw: google_drive_password
|
209
|
+
admins:
|
210
|
+
- name: admin
|
211
|
+
workers:
|
212
|
+
- name: worker_production001
|
213
|
+
pw: worker001_google_drive_password
|
214
|
+
- name: worker_production002
|
215
|
+
pw: worker002_google_drive_password
|
216
|
+
```
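
To make the gdrive_slot idea concrete, here is a small sketch that loads a trimmed copy of the sample and picks a worker account that no other Resque worker currently holds. The helper names `gdrive_config` and `free_gdrive_slot` are hypothetical, and the list of busy accounts is passed in by hand; how Mobilize tracks slots internally is not shown in this README.

```ruby
require "yaml"

# Trimmed copy of the sample above, inlined so the sketch runs on its own.
GDRIVE_YML = <<~YML
  development:
    domain: host.com
    owner:
      name: owner_development
      pw: google_drive_password
    workers:
      - name: worker_development001
        pw: worker001_google_drive_password
      - name: worker_development002
        pw: worker002_google_drive_password
YML

# Hypothetical helper: return the settings for one environment.
def gdrive_config(yaml_text, env)
  YAML.safe_load(yaml_text).fetch(env)
end

# Hypothetical helper: pick the first worker account that is not already in
# use, since only one Resque worker may hold a given account at a time.
def free_gdrive_slot(config, busy_names)
  config["workers"].map { |w| w["name"] }.find { |name| !busy_names.include?(name) }
end

config = gdrive_config(GDRIVE_YML, "development")
puts free_gdrive_slot(config, ["worker_development001"])
```

When every account is busy the helper returns `nil`, which a caller could treat as "wait and try again".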
|
217
|
+
|
218
|
+
<a name='section_Configure_Google_Sheets'></a>
|
219
|
+
### Configure Google Sheets
|
220
|
+
|
221
|
+
gsheet.yml needs:
|
222
|
+
* max_cells, which is the number of cells a sheet is allowed to have
|
223
|
+
written to it at one time. Default is 400k cells, which is the max per
|
224
|
+
book. Google Drive will throw its own exception if
|
225
|
+
you try to write more than that.
|
226
|
+
* Because Google Docs ties date formatting to the Locale for the
|
227
|
+
spreadsheet, there are 2 date format parameters:
|
228
|
+
* read_date_format, which is the format that should be read FROM google
|
229
|
+
sheets for date columns.
|
230
|
+
* sheet_date_format, which is the format that the google sheet is in.
|
231
|
+
* A date column is defined as one where the column header = "date" or "Date", or ends with "_date" or "Date".
|
232
|
+
* The defaults are set to US locale for sheet_date_format, because in 'Murica (US) we
|
233
|
+
use %m/%d/%Y for some reason, and to %Y-%m-%d format for
|
234
|
+
reading, which is more standard and sorts well as a string. If your
|
235
|
+
locale is NOT 'Murica you will want to change these.
|
236
|
+
|
237
|
+
Sample gsheet.yml
|
238
|
+
|
239
|
+
``` yml
|
240
|
+
---
|
241
|
+
development:
|
242
|
+
max_cells: 400000
|
243
|
+
read_date_format: "%Y-%m-%d"
|
244
|
+
sheet_date_format: "%m/%d/%Y"
|
245
|
+
test:
|
246
|
+
max_cells: 400000
|
247
|
+
read_date_format: "%Y-%m-%d"
|
248
|
+
sheet_date_format: "%m/%d/%Y"
|
249
|
+
staging:
|
250
|
+
max_cells: 400000
|
251
|
+
read_date_format: "%Y-%m-%d"
|
252
|
+
sheet_date_format: "%m/%d/%Y"
|
253
|
+
production:
|
254
|
+
max_cells: 400000
|
255
|
+
read_date_format: "%Y-%m-%d"
|
256
|
+
sheet_date_format: "%m/%d/%Y"
|
257
|
+
```
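
The two date parameters can be illustrated with a short sketch that applies the date-column rule and converts a value from the sheet format to the read format. The helper names are hypothetical; only the formats and the header rule come from the text above.

```ruby
require "date"

SHEET_DATE_FORMAT = "%m/%d/%Y" # format the Google sheet displays
READ_DATE_FORMAT  = "%Y-%m-%d" # format handed back when reading

# A date column is one whose header is "date" or "Date",
# or ends with "_date" or "Date".
def date_column?(header)
  %w[date Date].include?(header) || header.end_with?("_date", "Date")
end

# Hypothetical helper: re-express a sheet value in the read format.
def to_read_format(value)
  Date.strptime(value, SHEET_DATE_FORMAT).strftime(READ_DATE_FORMAT)
end

row = { "start_date" => "12/31/2012", "name" => "base1" }
converted = row.map do |header, value|
  [header, date_column?(header) ? to_read_format(value) : value]
end.to_h
puts converted
```

Because `%Y-%m-%d` strings sort in calendar order, downstream steps can compare the converted values as plain strings.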
|
258
|
+
|
259
|
+
<a name='section_Configure_Jobtracker'></a>
|
260
|
+
### Configure Jobtracker
|
261
|
+
|
262
|
+
The Jobtracker sits on your Resque and does 2 things:
|
263
|
+
* check for Users that are due for polling;
|
264
|
+
* send out notifications when:
|
265
|
+
* there are failed jobs on Resque;
|
266
|
+
* there are jobs on Resque that have run beyond the max run time.
|
267
|
+
|
268
|
+
Emails are sent using ActionMailer, through the owner Google Drive
|
269
|
+
account.
|
270
|
+
|
271
|
+
To this end, it needs these parameters, for which there is a sample
|
272
|
+
below and in the [lib/samples][git_samples] folder:
|
273
|
+
|
274
|
+
``` yml
|
275
|
+
---
|
276
|
+
development:
|
277
|
+
cycle_freq: 10 #time between Jobtracker sweeps
|
278
|
+
notification_freq: 3600 #1 hour between failure/timeout notifications
|
279
|
+
runner_read_freq: 300 #5 min between runner reads
|
280
|
+
max_run_time: 14400 # if a job runs for 4h+, notification will be sent
|
281
|
+
extensions: [] #additional Mobilize modules to load workers with
|
282
|
+
admins: #emails to send notifications to
|
283
|
+
- email: admin@host.com
|
284
|
+
test:
|
285
|
+
cycle_freq: 10 #time between Jobtracker sweeps
|
286
|
+
notification_freq: 3600 #1 hour between failure/timeout notifications
|
287
|
+
runner_read_freq: 300 #5 min between runner reads
|
288
|
+
max_run_time: 14400 # if a job runs for 4h+, notification will be sent
|
289
|
+
extensions: [] #additional Mobilize modules to load workers with
|
290
|
+
admins: #emails to send notifications to
|
291
|
+
- email: admin@host.com
|
292
|
+
production:
|
293
|
+
cycle_freq: 10 #time between Jobtracker sweeps
|
294
|
+
notification_freq: 3600 #1 hour between failure/timeout notifications
|
295
|
+
runner_read_freq: 300 #5 min between runner reads
|
296
|
+
max_run_time: 14400 # if a job runs for 4h+, notification will be sent
|
297
|
+
extensions: [] #additional Mobilize modules to load workers with
|
298
|
+
admins: #emails to send notifications to
|
299
|
+
- email: admin@host.com
|
300
|
+
```
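
As a rough illustration of how these timings interact, the sketch below checks whether a Runner is due for a read and whether a job has exceeded the maximum run time. `runner_due?` and `overdue_job?` are hypothetical names, and the comparisons are an assumption about the semantics rather than the Jobtracker's actual logic.

```ruby
# Values taken from the sample above, all in seconds.
JOBTRACKER_CONFIG = {
  "runner_read_freq" => 300,
  "max_run_time"     => 14_400
}.freeze

# Hypothetical check: a Runner is due when it has never been read, or when
# the last read is at least runner_read_freq seconds old.
def runner_due?(last_read_at, now, config = JOBTRACKER_CONFIG)
  last_read_at.nil? || (now - last_read_at) >= config["runner_read_freq"]
end

# Hypothetical check: a job warrants a notification once it has been
# running for longer than max_run_time seconds.
def overdue_job?(started_at, now, config = JOBTRACKER_CONFIG)
  (now - started_at) > config["max_run_time"]
end

now = Time.utc(2013, 1, 1, 12, 0, 0)
puts runner_due?(now - 301, now)    # last read just over 5 minutes ago
puts overdue_job?(now - 3_600, now) # job has been running for 1 hour
```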
|
301
|
+
|
302
|
+
<a name='section_Configure_Resque'></a>
|
303
|
+
### Configure Resque
|
304
|
+
|
305
|
+
Resque keeps track of Jobs, Workers and logging.
|
306
|
+
|
307
|
+
It needs the below parameters, which can be found in the [lib/samples][git_samples] folder.
|
308
|
+
|
309
|
+
* queue_name - the name of the Resque queue where you would like the Jobtracker and Resque Workers to
|
310
|
+
run. Default is mobilize.
|
311
|
+
* max_workers - the total number of simultaneous workers you would like
|
312
|
+
on your queue. Default is 4 for development and test, 36 in
|
313
|
+
production, but feel free to adjust depending on your hardware.
|
314
|
+
* redis_port - you should probably leave this alone; it specifies the
|
315
|
+
default port for dev and prod and a separate one for testing.
|
316
|
+
* web_port - this specifies the port under which resque-web operates
|
317
|
+
|
318
|
+
``` yml
|
319
|
+
---
|
320
|
+
development:
|
321
|
+
queue_name: mobilize
|
322
|
+
max_workers: 4
|
323
|
+
redis_port: 6379
|
324
|
+
web_port: 8282
|
325
|
+
test:
|
326
|
+
queue_name: mobilize
|
327
|
+
max_workers: 4
|
328
|
+
redis_port: 9736
|
329
|
+
web_port: 8282
|
330
|
+
production:
|
331
|
+
queue_name: mobilize
|
332
|
+
max_workers: 36
|
333
|
+
redis_port: 6379
|
334
|
+
web_port: 8282
|
335
|
+
```
|
336
|
+
|
337
|
+
<a name='section_Configure_Resque-Web'></a>
|
338
|
+
### Configure Resque-Web
|
339
|
+
|
340
|
+
Please change your default username and password in the resque_web.rb
|
341
|
+
file in your config folder, reproduced below:
|
342
|
+
|
343
|
+
``` ruby
|
344
|
+
#comment out the below if you want no authentication on your web portal (not recommended)
|
345
|
+
Resque::Server.use(Rack::Auth::Basic) do |user, password|
|
346
|
+
[user, password] == ['admin', 'changeyourpassword']
|
347
|
+
end
|
348
|
+
```
|
349
|
+
|
350
|
+
This file is passed as a config file argument to
|
351
|
+
mobilize_base:resque_web task, as detailed in [Start Resque-Web](#section_Start_Start_Resque-Web).
|
352
|
+
|
353
|
+
<a name='section_Configure_Gridfs'></a>
|
354
|
+
### Configure Gridfs
|
355
|
+
|
356
|
+
Mobilize stores cached data in MongoDB Gridfs.
|
357
|
+
It needs the below parameters, which can be found in the [lib/samples][git_samples] folder.
|
358
|
+
|
359
|
+
* max_versions - the number of __different__ versions of data to keep
|
360
|
+
for a given cache. Default is 10. This is meant mostly to allow you to
|
361
|
+
restore Runners from cache if necessary.
|
362
|
+
* max_compressed_write_size - the amount of compressed data Gridfs will
|
363
|
+
allow. If you try to write more than this, an exception will be thrown.
|
364
|
+
|
365
|
+
``` yml
|
366
|
+
---
|
367
|
+
development:
|
368
|
+
max_versions: 10 #number of versions of cache to keep in gridfs
|
369
|
+
max_compressed_write_size: 1000000000 #~1GB
|
370
|
+
test:
|
371
|
+
max_versions: 10 #number of versions of cache to keep in gridfs
|
372
|
+
max_compressed_write_size: 1000000000 #~1GB
|
373
|
+
production:
|
374
|
+
max_versions: 10 #number of versions of cache to keep in gridfs
|
375
|
+
max_compressed_write_size: 1000000000 #~1GB
|
376
|
+
```
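
The effect of the two Gridfs settings can be sketched with an in-memory stand-in. `VersionedCache` is hypothetical and does not touch MongoDB; using zlib for compression and treating "different" as "different from the most recent write" are both assumptions made for the illustration.

```ruby
require "zlib"

# Hypothetical in-memory stand-in for a versioned cache; no MongoDB involved.
class VersionedCache
  attr_reader :versions

  def initialize(max_versions: 10, max_size: 1_000_000_000)
    @max_versions = max_versions
    @max_size = max_size
    @versions = []
  end

  def write(data)
    compressed = Zlib::Deflate.deflate(data) # zlib is an assumption
    if compressed.bytesize > @max_size
      raise ArgumentError, "compressed size #{compressed.bytesize} exceeds #{@max_size}"
    end
    # Only different versions are kept, so skip a write identical to the latest.
    return if @versions.last == compressed
    @versions << compressed
    # Drop the oldest versions once the limit is exceeded.
    @versions.shift while @versions.size > @max_versions
  end

  def read
    @versions.empty? ? nil : Zlib::Inflate.inflate(@versions.last)
  end
end

cache = VersionedCache.new(max_versions: 2)
%w[a a b c].each { |data| cache.write(data) }
puts cache.versions.size
puts cache.read
```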
|
377
|
+
|
378
|
+
<a name='section_Configure_Mongoid'></a>
|
379
|
+
### Configure Mongoid
|
380
|
+
|
381
|
+
Mongoid is the abstraction layer on top of MongoDB so we can interact
|
382
|
+
with it in an ActiveRecord-like fashion.
|
383
|
+
|
384
|
+
It needs the below parameters, which can be found in the [lib/samples][git_samples] folder.
|
385
|
+
|
386
|
+
You shouldn't need to change anything in this file.
|
387
|
+
|
388
|
+
``` yml
|
389
|
+
---
|
390
|
+
development:
|
391
|
+
sessions:
|
392
|
+
default:
|
393
|
+
database: mobilize-development
|
394
|
+
persist_in_safe_mode: true
|
395
|
+
hosts:
|
396
|
+
- 127.0.0.1:27017
|
397
|
+
test:
|
398
|
+
sessions:
|
399
|
+
default:
|
400
|
+
database: mobilize-test
|
401
|
+
persist_in_safe_mode: true
|
402
|
+
hosts:
|
403
|
+
- 127.0.0.1:27017
|
404
|
+
production:
|
405
|
+
sessions:
|
406
|
+
default:
|
407
|
+
database: mobilize-production
|
408
|
+
persist_in_safe_mode: true
|
409
|
+
hosts:
|
410
|
+
- 127.0.0.1:27017
|
411
|
+
```
|
412
|
+
|
413
|
+
<a name='section_Start'></a>
|
414
|
+
Start
|
415
|
+
-----
|
416
|
+
|
417
|
+
A Mobilize instance can be considered "started" or "running" when you have:
|
418
|
+
|
419
|
+
1. Resque workers running on the Mobilize queue;
|
420
|
+
2. A Jobtracker running on one of the Resque workers;
|
421
|
+
3. One or more Users created in your MongoDB;
|
422
|
+
4. One or more Jobs created in a User's Runner.
|
423
|
+
|
424
|
+
<a name='section_Start_Start_Resque-Web'></a>
|
425
|
+
### Start Resque-Web
|
426
|
+
|
427
|
+
Mobilize ships with its own rake task to start resque web -- you can do
|
428
|
+
the following:
|
429
|
+
|
430
|
+
|
431
|
+
$ MOBILIZE_ENV=<environment> rake mobilize_base:resque_web
|
432
|
+
|
433
|
+
This will start a resque_web instance with the port specified in your
|
434
|
+
resque.yml and the config/auth scheme specified in your resque_web.rb.
|
435
|
+
|
436
|
+
More detail is available in the
|
437
|
+
[Resque-Web Standalone section][resque-web].
|
438
|
+
|
439
|
+
<a name='section_Start_Set_Environment'></a>
|
440
|
+
### Set Environment
|
441
|
+
|
442
|
+
Mobilize takes the environment from your Rails.env if you're running
|
443
|
+
Rails, or assumes "development." You can specify "development", "test",
|
444
|
+
or "production," as per the yml files.
|
445
|
+
|
446
|
+
Outside Rails, you can set it through the MOBILIZE_ENV parameter, as in:
|
447
|
+
|
448
|
+
``` ruby
|
449
|
+
> ENV['MOBILIZE_ENV'] = 'production'
|
450
|
+
> require 'mobilize-base'
|
451
|
+
```
|
452
|
+
This affects all parameters as set in the yml files, including the
|
453
|
+
database.
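
The lookup order described above can be sketched like this. `mobilize_env` is a hypothetical function written for illustration; the gem's own resolution code may differ in detail.

```ruby
VALID_ENVS = %w[development test production].freeze

# Hypothetical resolver: prefer Rails.env when Rails is loaded, then the
# MOBILIZE_ENV variable, and finally fall back to "development".
def mobilize_env(env = ENV)
  name =
    if defined?(Rails) && Rails.respond_to?(:env)
      Rails.env.to_s
    else
      env.fetch("MOBILIZE_ENV", "development")
    end
  raise ArgumentError, "unknown environment: #{name}" unless VALID_ENVS.include?(name)
  name
end

puts mobilize_env({ "MOBILIZE_ENV" => "production" })
```

Rejecting unknown names up front avoids loading a yml file that has no matching section.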
|
454
|
+
|
455
|
+
<a name='section_Start_Create_User'></a>
|
456
|
+
### Create User
|
457
|
+
|
458
|
+
Users are people who use the Mobilize service to move data from one
|
459
|
+
endpoint to another. They each have a Runner, which is a google sheet
|
460
|
+
that contains one or more Jobs.
|
461
|
+
|
462
|
+
To create a User, use the User.find_or_create_by_name
|
463
|
+
command (replace the user with your own name, or any name
|
464
|
+
in your domain).
|
465
|
+
|
466
|
+
``` ruby
|
467
|
+
irb> User.find_or_create_by_name("user_name")
|
468
|
+
```
|
469
|
+
|
470
|
+
<a name='section_Start_Start_Workers'></a>
|
471
|
+
### Start Workers
|
472
|
+
|
473
|
+
Workers are rake tasks that load the Mobilize environment and allow the
|
474
|
+
processing of the Jobtracker, Users and Jobs.
|
475
|
+
|
476
|
+
These will start as many workers as are defined in your resque.yml.
|
477
|
+
|
478
|
+
To start workers, do:
|
479
|
+
|
480
|
+
``` ruby
|
481
|
+
> Jobtracker.prep_workers
|
482
|
+
```
|
483
|
+
|
484
|
+
If you have workers already running and would like to kill and refresh
|
485
|
+
them, do:
|
486
|
+
|
487
|
+
``` ruby
|
488
|
+
> Jobtracker.restart_workers!
|
489
|
+
```
|
490
|
+
|
491
|
+
Note that restart will kill any workers on the Mobilize queue.
|
492
|
+
|
493
|
+
<a name='section_Start_View_Logs'></a>
|
494
|
+
### View Logs
|
495
|
+
|
496
|
+
At this point, you'll want to start viewing the logs for the Resque
|
497
|
+
workers -- they will be stored under your log folder, by default log/. You can do:
|
498
|
+
|
499
|
+
$ tail -f log/mobilize-resque-`<environment>`.log
|
500
|
+
|
501
|
+
to view them.
|
502
|
+
|
503
|
+
<a name='section_Start_Start_Jobtracker'></a>
|
504
|
+
### Start Jobtracker
|
505
|
+
|
506
|
+
Once the Resque workers are running, and you have at least one User
|
507
|
+
set up, it's time to start the Jobtracker:
|
508
|
+
|
509
|
+
``` ruby
|
510
|
+
> Jobtracker.start
|
511
|
+
```
|
512
|
+
|
513
|
+
The Jobtracker will automatically enqueue any Users that have not
|
514
|
+
been processed in the runner_read_freq period defined in the
|
515
|
+
jobtracker.yml, and create their Runners if they do not exist. You can
|
516
|
+
see this process on your Resque UI and in the log file.
|
517
|
+
|
518
|
+
<a name='section_Start_Create_Job'></a>
|
519
|
+
### Create Job
|
520
|
+
|
521
|
+
Now it's time to go onto the Runner and add a Job to be processed.
|
522
|
+
|
523
|
+
To do this, you should log into your Google Drive with either the
|
524
|
+
owner's account, an admin account, or the Runner User's account. These
|
525
|
+
will be the accounts with edit permissions to a given Runner.
|
526
|
+
|
527
|
+
Navigate to the Jobs tab on the Runner `(denoted by Runner(<requestor
|
528
|
+
name>))` and enter values under each header:
|
529
|
+
|
530
|
+
* name This is the name of the job you would like to add. Names must be unique across all your jobs, otherwise you will get an error
|
531
|
+
|
532
|
+
* active set this to blank or FALSE if you want to turn off a job
|
533
|
+
|
534
|
+
* trigger This uses human readable syntax to schedule jobs. It accepts the following:
|
535
|
+
* every `<integer>` hour -- fire the job at increments of `<integer>` hours, minimum of 1 hour
|
536
|
+
* every `<integer>` day -- fire the job at increments of `<integer>` days, minimum of 1
|
537
|
+
* every `<integer>` day after `<HH:MM>` -- fire the job at increments of `<integer>` days, after HH:MM UTC time
|
538
|
+
* every `<integer>` day_of_week after `<HH:MM>` -- fire the job on specified day of week, after HH:MM UTC time; 1=Sunday
|
539
|
+
* every `<integer>` day_of_month after `<HH:MM>` -- fire the job on specified day of month, after HH:MM UTC time
|
540
|
+
* once -- fire the job once if active is set to TRUE, set active to FALSE right after
|
541
|
+
* after `<jobname>` -- fire the job after the job named `<jobname>`
|
542
|
+
|
543
|
+
* status Mobilize writes this field with the last status returned by the job
|
544
|
+
|
545
|
+
* stage1..stage5 List of stages to be performed by the job.
|
546
|
+
* Stages have this syntax: `<handler>.<call> <params>`.
|
547
|
+
* handler specifies the file that should receive the stage
|
548
|
+
* the call specifies the method within the file. The method should
|
549
|
+
be called `"<handler>.<call>_by_stage_path"`
|
550
|
+
* the params the method accepts, which are custom to each
|
551
|
+
stage. These should be of the form `<key1>: <value1>, <key2>: <value2>`, where
|
552
|
+
`<key>` is an unquoted string and `<value>` is a quoted string, an
|
553
|
+
integer, an array (delimited by square braces), or a hash (delimited by
|
554
|
+
curly braces).
|
555
|
+
* For mobilize-base, the following stage is available:
|
556
|
+
* gsheet.write `source: <input_path>`, which reads the sheet.
|
557
|
+
* The input_path should be of the form:
|
558
|
+
* `<gbook_name>/<gsheet_name>` or just `<gsheet_name>` if the target is in
|
559
|
+
the Runner itself.
|
560
|
+
* `gfile://<gfile_name>` if the target is a file.
|
561
|
+
* The file must be owned by the Gdrive owner.
|
562
|
+
* The test uses "gfile://test_base_1.tsv".
|
563
|
+
* The stage_name should be of the form `<stage_column>`. The test uses "stage1" for the first test
|
564
|
+
and "base1.out" for the second test. The first
|
565
|
+
takes the output from the first stage and the second reads it straight
|
566
|
+
from the referenced sheet.
|
567
|
+
* All stages accept retry parameters:
|
568
|
+
* retries: an integer specifying the number of times that the system will try it again before giving up.
|
569
|
+
* delay: an integer specifying the number of seconds between retries.
|
570
|
+
* always_on: if true, keeps the job on regardless of stage failures. The job will retry from the beginning with the same frequency as the Runner refresh rate.
|
571
|
+
* If a stage fails after all retries, it will output its standard error to a tab in the Runner with the name of the job, the name of the stage, and a ".err" extension
|
572
|
+
* The tab will be headed "response" and will contain the exception and backtrace for the error.
|
573
|
+
* The test uses "Requestor_mobilize(test)/base1.out" and
|
574
|
+
"Runner_mobilize(test)/base2.out" for target sheets.
|
575
|
+
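
The trigger and stage syntax described in the list above can be sketched as two small parsers. Both `parse_trigger` and `parse_stage` are hypothetical names, and reading the stage params as a YAML flow mapping is an assumption chosen because the `<key>: <value>` form happens to fit it; this is not the gem's parser.

```ruby
require "yaml"

EVERY_TRIGGER = /\Aevery (\d+) (hour|day_of_week|day_of_month|day)(?: after (\d{1,2}:\d{2}))?\z/
AFTER_TRIGGER = /\Aafter (.+)\z/

# Hypothetical parser for the trigger column.
def parse_trigger(text)
  text = text.strip
  if (m = EVERY_TRIGGER.match(text))
    trigger = { type: :every, count: m[1].to_i, unit: m[2].to_sym }
    trigger[:after] = m[3] if m[3] # HH:MM, interpreted as UTC
    trigger
  elsif text == "once"
    { type: :once }
  elsif (m = AFTER_TRIGGER.match(text))
    { type: :after, job: m[1] }
  else
    raise ArgumentError, "unrecognized trigger: #{text}"
  end
end

# Hypothetical parser for a stage cell: split "<handler>.<call> <params>"
# and read the params as a YAML flow mapping (an assumption).
def parse_stage(text)
  target, params = text.strip.split(" ", 2)
  handler, call = target.split(".", 2)
  { handler: handler, call: call, params: YAML.safe_load("{#{params}}") }
end

p parse_trigger("every 2 day after 13:30")
p parse_stage('gsheet.write source: "gfile://test_base_1.tsv", retries: 3, delay: 10')
```

Wrapping the params in braces lets quoted strings, integers, arrays, and hashes all be read by one call, which matches the value types the stage syntax allows.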
|
576
|
+
<a name='section_Start_Run_Test'></a>
|
577
|
+
### Run Test
|
578
|
+
|
579
|
+
To run tests, you will need to
|
580
|
+
|
581
|
+
1) clone the repository
|
582
|
+
|
583
|
+
From the project folder, run
|
584
|
+
|
585
|
+
2) rake mobilize_base:setup
|
586
|
+
|
587
|
+
and populate the "test" environment in the config files with the
|
588
|
+
necessary details.
|
589
|
+
|
590
|
+
3) $ rake test
|
591
|
+
|
592
|
+
This will create a test Runner with a sample job. These will run off a
|
593
|
+
test redis instance which will be killed once the tests finish.
|
594
|
+
|
595
|
+
<a name='section_Start_'></a>
|
596
|
+
### Run Test
|
597
|
+
|
598
|
+
To run tests, you will need to
|
599
|
+
|
600
|
+
1) clone the repository
|
601
|
+
|
602
|
+
From the project folder, run
|
603
|
+
|
604
|
+
2) rake mobilize_base:setup
|
605
|
+
|
606
|
+
and populate the "test" environment in the config files with the
|
607
|
+
necessary details.
|
608
|
+
|
609
|
+
3) $ rake test
|
610
|
+
|
611
|
+
This will create a test Runner with a sample job. These will run off a
|
612
|
+
test redis instance. This instance will be kept alive so you can test
|
613
|
+
additional Mobilize modules. (see [mobilize-ssh][mobilize-ssh] for more)
|
614
|
+
|
615
|
+
<a name='section_Start_Add_Gbooks_And_Gsheets'></a>
|
616
|
+
### Add Gbooks and Gsheets
|
617
|
+
|
618
|
+
A User's Runner should be kept clean, preferably with only the jobs
|
619
|
+
sheet. The test keeps everything in the
|
620
|
+
Runner, but in reality you will want to create lots of different books
|
621
|
+
to share with different people in your organization.
|
622
|
+
|
623
|
+
To add a new Gbook, create one as you normally would, then make sure the
|
624
|
+
Owner is the same user as specified in your gdrive.yml/owner/name value.
|
625
|
+
Mobilize will handle the rest, extending permissions to workers and
|
626
|
+
admins.
|
627
|
+
|
628
|
+
Also make sure any Gsheets you specify for __read__ operations exist
|
629
|
+
prior to calling the job, or there will be an error. __Write__
|
630
|
+
operations will create the book and sheet if they do not already exist,
|
631
|
+
under ownership of the owner account.
|
632
|
+
|
633
|
+
<a name='section_Meta'></a>
|
634
|
+
Meta
|
635
|
+
----
|
636
|
+
|
637
|
+
* Code: `git clone git://github.com/ngmoco/mobilize-base.git`
|
638
|
+
* Home: <https://github.com/ngmoco/mobilize-base>
|
639
|
+
* Bugs: <https://github.com/ngmoco/mobilize-base/issues>
|
640
|
+
* Gems: <http://rubygems.org/gems/mobilize-base>
|
641
|
+
|
642
|
+
<a name='section_Author'></a>
|
643
|
+
Author
|
644
|
+
------
|
645
|
+
|
646
|
+
Cassio Paes-Leme :: cpaesleme@ngmoco.com :: @cpaesleme
|
647
|
+
|
648
|
+
<a name='section_Special_Thanks'></a>
|
649
|
+
Special Thanks
|
650
|
+
--------------
|
651
|
+
|
652
|
+
* Al Thompson and Sagar Mehta for awesome design advice and discussions
|
653
|
+
* Elliott Clark for enlightening me to the wonders of Resque
|
654
|
+
* Bob Colner for pointing me to google-drive-ruby when I tried to
|
655
|
+
reinvent the wheel
|
656
|
+
* ngmoco:) and DeNA Global for supporting and adopting the Mobilize
|
657
|
+
platform
|
658
|
+
* gimite, defunkt, 10gen, and the countless other github heroes and
|
659
|
+
crewmembers.
|
660
|
+
|
661
|
+
[google_drive_ruby]: https://github.com/gimite/google-drive-ruby
|
662
|
+
[resque]: https://github.com/defunkt/resque
|
663
|
+
[mongoid]: http://mongoid.org/en/mongoid/index.html
|
664
|
+
[resque_redis]: https://github.com/defunkt/resque#section_Installing_Redis
|
665
|
+
[mongodb_quickstart]: http://www.mongodb.org/display/DOCS/Quickstart
|
666
|
+
[git_samples]: https://github.com/ngmoco/mobilize-base/tree/master/lib/samples
|
667
|
+
[rvm]: https://rvm.io/
|
668
|
+
[resque-web]: https://github.com/defunkt/resque#standalone
|
669
|
+
[mobilize-ssh]: https://github.com/ngmoco/mobilize-ssh
|