mobilize-hive 1.35 → 1.36
- data/README.md +3 -253
- data/lib/mobilize-hive/handlers/hive.rb +47 -43
- data/lib/mobilize-hive/helpers/hive_helper.rb +4 -0
- data/lib/mobilize-hive/tasks.rb +1 -0
- data/lib/mobilize-hive/version.rb +1 -1
- data/lib/mobilize-hive.rb +3 -0
- data/lib/samples/hive.yml +6 -0
- data/mobilize-hive.gemspec +1 -1
- data/test/{hive_test_1.hql → fixtures/hive1.hql} +0 -0
- data/test/{hive_test_1_in.yml → fixtures/hive1.in.yml} +0 -0
- data/test/{hive_test_1_schema.yml → fixtures/hive1.schema.yml} +0 -0
- data/test/fixtures/hive1.sql +1 -0
- data/test/fixtures/hive4_stage1.in +1 -0
- data/test/fixtures/hive4_stage2.in.yml +4 -0
- data/test/fixtures/integration_expected.yml +69 -0
- data/test/fixtures/integration_jobs.yml +34 -0
- data/test/integration/mobilize-hive_test.rb +43 -0
- data/test/test_helper.rb +1 -0
- metadata +24 -16
- data/test/hive_job_rows.yml +0 -34
- data/test/mobilize-hive_test.rb +0 -112
data/README.md
CHANGED
@@ -1,254 +1,4 @@
-Mobilize
-
+Mobilize
+========
 
-
-* read, write, and copy hive files through Google Spreadsheets.
-
-Table Of Contents
------------------
-* [Overview](#section_Overview)
-* [Install](#section_Install)
-  * [Mobilize-Hive](#section_Install_Mobilize-Hive)
-  * [Install Dirs and Files](#section_Install_Dirs_and_Files)
-* [Configure](#section_Configure)
-  * [Hive](#section_Configure_Hive)
-* [Start](#section_Start)
-  * [Create Job](#section_Start_Create_Job)
-  * [Run Test](#section_Start_Run_Test)
-* [Meta](#section_Meta)
-  * [Special Thanks](#section_Special_Thanks)
-  * [Author](#section_Author)
-
-<a name='section_Overview'></a>
-Overview
------------
-
-* Mobilize-hive adds Hive methods to mobilize-hdfs.
-
-<a name='section_Install'></a>
-Install
-------------
-
-Make sure you go through all the steps in the
-[mobilize-base][mobilize-base],
-[mobilize-ssh][mobilize-ssh],
-[mobilize-hdfs][mobilize-hdfs],
-install sections first.
-
-<a name='section_Install_Mobilize-Hive'></a>
-### Mobilize-Hive
-
-add this to your Gemfile:
-
-``` ruby
-gem "mobilize-hive"
-```
-
-or do
-
-    $ gem install mobilize-hive
-
-for a ruby-wide install.
-
-<a name='section_Install_Dirs_and_Files'></a>
-### Dirs and Files
-
-### Rakefile
-
-Inside the Rakefile in your project's root dir, make sure you have:
-
-``` ruby
-require 'mobilize-base/tasks'
-require 'mobilize-ssh/tasks'
-require 'mobilize-hdfs/tasks'
-require 'mobilize-hive/tasks'
-```
-
-This defines rake tasks essential to run the environment.
-
-### Config Dir
-
-run
-
-    $ rake mobilize_hive:setup
-
-This will copy over a sample hive.yml to your config dir.
-
-<a name='section_Configure'></a>
-Configure
-------------
-
-<a name='section_Configure_Hive'></a>
-### Configure Hive
-
-* Hive is big data. That means we need to be careful when reading from
-the cluster as it could easily fill up our mongodb instance, RAM, local disk
-space, etc.
-* To achieve this, all hive operations, stage outputs, etc. are
-executed and stored on the cluster only.
-* The exceptions are:
-  * writing to the cluster from an external source, such as a google
-sheet. Here there
-is no risk as the external source has much more strict size limits than
-hive.
-  * reading from the cluster, such as for posting to google sheet. In
-this case, the read_limit parameter dictates the maximum amount that can
-be read. If the data is bigger than the read limit, an exception will be
-raised.
-
-The Hive configuration consists of:
-* clusters - this defines aliases for clusters, which are used as
-parameters for Hive stages. They should have the same name as those
-in hadoop.yml. Each cluster has:
-  * max_slots - defines the total number of simultaneous slots to be
-used for hive jobs on this cluster
-  * output_db - defines the db which should be used to hold stage outputs.
-    * This db must have open permissions (777) so any user on the system can
-write to it -- the tables inside will be owned by the users themselves.
-  * exec_path - defines the path to the hive executable
-
-Sample hive.yml:
-
-``` yml
----
-development:
-  clusters:
-    dev_cluster:
-      max_slots: 5
-      output_db: mobilize
-      exec_path: /path/to/hive
-test:
-  clusters:
-    test_cluster:
-      max_slots: 5
-      output_db: mobilize
-      exec_path: /path/to/hive
-production:
-  clusters:
-    prod_cluster:
-      max_slots: 5
-      output_db: mobilize
-      exec_path: /path/to/hive
-```
-
-<a name='section_Start'></a>
-Start
------
-
-<a name='section_Start_Create_Job'></a>
-### Create Job
-
-* For mobilize-hive, the following stages are available.
-  * cluster and user are optional for all of the below.
-    * cluster defaults to the first cluster listed;
-    * user is treated the same way as in [mobilize-ssh][mobilize-ssh].
-  * params are also optional for all of the below. They replace HQL in sources.
-    * params are passed as a YML or JSON, as in:
-      * `hive.run source:<source_path>, params:{'date': '2013-03-01', 'unit': 'widgets'}`
-      * this example replaces all the keys, preceded by '@' in all source hqls with the value.
-      * The preceding '@' is used to keep from replacing instances
-of "date" and "unit" in the HQL; you should have `@date` and `@unit` in your actual HQL
-if you'd like to replace those tokens.
-    * in addition, the following params are substituted automatically:
-      * `$utc_date` - replaced with YYYY-MM-DD date, UTC
-      * `$utc_time` - replaced with HH:MM time, UTC
-      * any occurrence of these values in HQL will be replaced at runtime.
-* hive.run `hql:<hql> || source:<gsheet_path>, user:<user>, cluster:<cluster>`, which executes the
-script in the hql or source sheet and returns any output specified at the
-end. If the cmd or last query in source is a select statement, column headers will be
-returned as well.
-* hive.write `hql:<hql> || source:<source_path>, target:<hive_path>, partitions:<partition_path>, user:<user>, cluster:<cluster>, schema:<gsheet_path>, drop:<true/false>`,
-which writes the source or query result to the selected hive table.
-  * hive_path
-    * should be of the form `<hive_db>/<table_name>` or `<hive_db>.<table_name>`.
-  * source:
-    * can be a gsheet_path, hdfs_path, or hive_path (no partitions)
-    * for gsheet and hdfs path,
-      * if the file ends in .*ql, it's treated the same as passing hql
-      * otherwise it is treated as a tsv with the first row as column headers
-  * target:
-    * Should be a hive_path, as in `<hive_db>/<table_name>` or `<hive_db>.<table_name>`.
-  * partitions:
-    * Due to Hive limitation, partition names CANNOT be reserved keywords when writing from tsv (gsheet or hdfs source)
-    * Partitions should be specified as a path, as in partitions:`<partition1>/<partition2>`.
-  * schema:
-    * optional. gsheet_path to column schema.
-    * two columns: name, datatype
-    * Any columns not defined here will receive "string" as the datatype
-    * partitions can have their datatypes overridden here as well
-    * columns named here that are not in the dataset will be ignored
-  * drop:
-    * optional. drops the target table before performing write
-    * defaults to false
-
-<a name='section_Start_Run_Test'></a>
-### Run Test
-
-To run tests, you will need to
-
-1) go through [mobilize-base][mobilize-base], [mobilize-ssh][mobilize-ssh], [mobilize-hdfs][mobilize-hdfs] tests first
-
-2) clone the mobilize-hive repository
-
-From the project folder, run
-
-3) $ rake mobilize_hive:setup
-
-Copy over the config files from the mobilize-base, mobilize-ssh,
-mobilize-hdfs projects into the config dir, and populate the values in the hive.yml file.
-
-Make sure you use the same names for your hive clusters as you do in
-hadoop.yml.
-
-3) $ rake test
-
-* The test runs these jobs:
-  * hive_test_1:
-    * `hive.write target:"mobilize/hive_test_1/act_date",source:"Runner_mobilize(test)/hive_test_1.in", schema:"hive_test_1.schema", drop:true`
-    * `hive.run source:"hive_test_1.hql"`
-    * `hive.run cmd:"show databases"`
-    * `gsheet.write source:"stage2", target:"hive_test_1_stage_2.out"`
-    * `gsheet.write source:"stage3", target:"hive_test_1_stage_3.out"`
-    * hive_test_1.hql runs a select statement on the table created in the
-write command.
-    * at the end of the test, there should be two sheets, one with a
-sum of the data as in your write query, one with the results of the show
-databases command.
-  * hive_test_2:
-    * `hive.write source:"hdfs://user/mobilize/test/test_hdfs_1.out", target:"mobilize.hive_test_2", drop:true`
-    * `hive.run cmd:"select * from mobilize.hive_test_2"`
-    * `gsheet.write source:"stage2", target:"hive_test_2.out"`
-    * this test uses the output from the first hdfs test as an input, so make sure you've run that first.
-  * hive_test_3:
-    * `hive.write source:"hive://mobilize.hive_test_1",target:"mobilize/hive_test_3/date/product",drop:true`
-    * `hive.run hql:"select act_date as ```date```,product,category,value from mobilize.hive_test_1;"`
-    * `hive.write source:"stage2",target:"mobilize/hive_test_3/date/product", drop:false`
-    * `gsheet.write source:"hive://mobilize/hive_test_3", target:"hive_test_3.out"`
-
-
-<a name='section_Meta'></a>
-Meta
-----
-
-* Code: `git clone git://github.com/dena/mobilize-hive.git`
-* Home: <https://github.com/dena/mobilize-hive>
-* Bugs: <https://github.com/dena/mobilize-hive/issues>
-* Gems: <http://rubygems.org/gems/mobilize-hive>
-
-<a name='section_Special_Thanks'></a>
-Special Thanks
---------------
-* This release goes to Toby Negrin, who championed this project with
-DeNA and gave me the support to get it properly architected, tested, and documented.
-* Also many thanks to the Analytics team at DeNA who build and maintain
-our Big Data infrastructure.
-
-<a name='section_Author'></a>
-Author
-------
-
-Cassio Paes-Leme :: cpaesleme@dena.com :: @cpaesleme
-
-[mobilize-base]: https://github.com/dena/mobilize-base
-[mobilize-ssh]: https://github.com/dena/mobilize-ssh
-[mobilize-hdfs]: https://github.com/dena/mobilize-hdfs
+Please refer to the mobilize-server wiki: https://github.com/DeNA/mobilize-server/wiki
data/lib/mobilize-hive/handlers/hive.rb
CHANGED

@@ -94,22 +94,29 @@ module Mobilize
 
     #run a generic hive command, with the option of passing a file hash to be locally available
     def Hive.run(cluster,hql,user_name,params=nil,file_hash=nil)
-
-
+      preps = Hive.prepends.map do |p|
+        prefix = "set "
+        suffix = ";"
+        prep_out = p
+        prep_out = "#{prefix}#{prep_out}" unless prep_out.starts_with?(prefix)
+        prep_out = "#{prep_out}#{suffix}" unless prep_out.ends_with?(suffix)
+        prep_out
+      end.join
+      hql = "#{preps}#{hql}"
       filename = hql.to_md5
       file_hash||= {}
       file_hash[filename] = hql
-      #add in default params
       params ||= {}
-      params = params.merge(Hive.default_params)
       #replace any params in the file_hash and command
       params.each do |k,v|
         file_hash.each do |name,data|
-
-
-
-
-
+          data.gsub!("@#{k}",v)
+        end
+      end
+      #add in default params
+      Hive.default_params.each do |k,v|
+        file_hash.each do |name,data|
+          data.gsub!(k,v)
         end
       end
       #silent mode so we don't have logs in stderr; clip output
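This hunk is where the new `prepends` support lands: each configured entry is normalized into a `set <key>=<value>;` statement and prefixed to the HQL before it is hashed and shipped to the cluster. The same hunk also reorders param handling: user params are substituted as `@key` tokens first, and only then are the default params (`$utc_date`, `$utc_time`) swapped in, rather than merging defaults into the user hash. A standalone sketch of the normalization, with plain-Ruby `start_with?`/`end_with?` standing in for the gem's `starts_with?`/`ends_with?` extensions and an illustrative second entry:

``` ruby
# Normalize each prepend into "set <key>=<value>;" the way the new
# Hive.run logic does, then prefix the result to the HQL.
prepends = ["hive.stats.autogather=false", "set mapred.job.queue.name=default;"]

preps = prepends.map do |p|
  prep_out = p
  prep_out = "set #{prep_out}" unless prep_out.start_with?("set ")
  prep_out = "#{prep_out};" unless prep_out.end_with?(";")
  prep_out
end.join

puts "#{preps}show databases;"
# => set hive.stats.autogather=false;set mapred.job.queue.name=default;show databases;
```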
@@ -155,9 +162,9 @@ module Mobilize
       Gdrive.unslot_worker_by_path(stage_path)
 
       #check for select at end
-      hql_array = hql.split("
-      last_statement = hql_array.last
-      if last_statement.to_s.starts_with?("select")
+      hql_array = hql.split("\n").reject{|l| l.starts_with?("--") or l.strip.length==0}.join("\n").split(";").map{|h| h.strip}
+      last_statement = hql_array.last
+      if last_statement.to_s.downcase.starts_with?("select")
         #nil if no prior commands
         prior_hql = hql_array[0..-2].join(";") if hql_array.length > 1
         select_hql = hql_array.last
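The rewritten select detection first strips `--` comment lines and blank lines, then splits on semicolons, and lowercases the check, so a script whose final statement is an uppercase `SELECT` or is preceded by comments is now recognized as returning rows. A small sketch of that pipeline (illustrative input, plain-Ruby method names):

``` ruby
hql = "-- daily rollup\nuse mobilize;\n\nSELECT act_date, sum(value) FROM hive1 GROUP BY act_date"

# Drop comment/blank lines, split into statements, strip whitespace.
hql_array = hql.split("\n")
               .reject { |l| l.start_with?("--") || l.strip.empty? }
               .join("\n")
               .split(";")
               .map(&:strip)

last_statement = hql_array.last
puts last_statement.to_s.downcase.start_with?("select") # => true
```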
@@ -181,41 +188,37 @@ module Mobilize
       response
     end
 
-    def Hive.schema_hash(schema_path,user_name,gdrive_slot)
-      if schema_path.index("
-
-
+    def Hive.schema_hash(schema_path,stage_path,user_name,gdrive_slot)
+      handler = if schema_path.index("://")
+                  schema_path.split("://").first
+                else
+                  "gsheet"
+                end
+      dst = "Mobilize::#{handler.downcase.capitalize}".constantize.path_to_dst(schema_path,stage_path,gdrive_slot)
+      out_raw = dst.read(user_name,gdrive_slot)
+      #determine the datatype for schema; accept json, yaml, tsv
+      if schema_path.ends_with?(".yml")
+        out_ha = begin;YAML.load(out_raw);rescue ScriptError, StandardError;nil;end if out_ha.nil?
       else
-
-
-        r = u.runner
-        runner_sheet = r.gbook(gdrive_slot).worksheet_by_title(schema_path)
-        out_tsv = if runner_sheet
-                    runner_sheet.read(user_name)
-                  else
-                    #check for gfile. will fail if there isn't one.
-                    Gfile.find_by_path(schema_path).read(user_name)
-                  end
+        out_ha = begin;JSON.parse(out_raw);rescue ScriptError, StandardError;nil;end
+        out_ha = out_raw.tsv_to_hash_array if out_ha.nil?
       end
-      #use Gridfs to cache gdrive results
-      file_name = schema_path.split("/").last
-      out_url = "gridfs://#{schema_path}/#{file_name}"
-      Dataset.write_by_url(out_url,out_tsv,user_name)
-      schema_tsv = Dataset.find_by_url(out_url).read(user_name,gdrive_slot)
       schema_hash = {}
-
-      schema_hash[
+      out_ha.each do |hash|
+        schema_hash[hash['name']] = hash['datatype']
       end
       schema_hash
     end
 
-    def Hive.hql_to_table(cluster, db, table, part_array, source_hql, user_name, job_name, drop=false, schema_hash=nil,
+    def Hive.hql_to_table(cluster, db, table, part_array, source_hql, user_name, job_name, drop=false, schema_hash=nil, run_params=nil)
       table_path = [db,table].join(".")
       table_stats = Hive.table_stats(cluster, db, table, user_name)
       url = "hive://" + [cluster,db,table,part_array.compact.join("/")].join("/")
 
-
-
+      #decomment hql
+
+      source_hql_array = source_hql.split("\n").reject{|l| l.starts_with?("--") or l.strip.length==0}.join("\n").split(";").map{|h| h.strip}
+      last_select_i = source_hql_array.rindex{|s| s.downcase.starts_with?("select")}
       #find the last select query -- it should be used for the temp table creation
       last_select_hql = (source_hql_array[last_select_i..-1].join(";")+";")
       #if there is anything prior to the last select, add it in prior to table creation
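`Hive.schema_hash` also changed shape: instead of reading the runner's gbook directly and caching the result through Gridfs, it now resolves the schema source through the handler named in the path (defaulting to gsheet) and accepts YAML, JSON, or TSV payloads. A rough standalone sketch of the parse-and-collect step, using a hypothetical `tsv_to_hash_array` stand-in for the gem's String extension:

``` ruby
require 'yaml'
require 'json'

# Minimal stand-in for mobilize's String#tsv_to_hash_array.
def tsv_to_hash_array(tsv)
  header, *rows = tsv.split("\n").map { |l| l.split("\t") }
  rows.map { |r| Hash[header.zip(r)] }
end

def schema_hash_from(schema_path, out_raw)
  # *.yml goes through YAML; everything else is tried as JSON,
  # falling back to TSV, mirroring the hunk above.
  out_ha = if schema_path.end_with?(".yml")
             YAML.load(out_raw) rescue nil
           else
             (JSON.parse(out_raw) rescue nil) || tsv_to_hash_array(out_raw)
           end
  out_ha.each_with_object({}) { |h, acc| acc[h['name']] = h['datatype'] }
end

p schema_hash_from("hive1.schema", "name\tdatatype\nact_date\tdate\nvalue\tint")
# => {"act_date"=>"date", "value"=>"int"}
```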
@@ -228,7 +231,7 @@ module Mobilize
       temp_set_hql = "set mapred.job.name=#{job_name} (temp table);"
       temp_drop_hql = "drop table if exists #{temp_table_path};"
       temp_create_hql = "#{temp_set_hql}#{prior_hql}#{temp_drop_hql}create table #{temp_table_path} as #{last_select_hql}"
-      response = Hive.run(cluster,temp_create_hql,user_name,
+      response = Hive.run(cluster,temp_create_hql,user_name,run_params)
       raise response['stderr'] if response['stderr'].to_s.ie{|s| s.index("FAILED") or s.index("KILLED")}
 
       source_table_stats = Hive.table_stats(cluster,temp_db,temp_table_name,user_name)
@@ -267,7 +270,7 @@ module Mobilize
                            target_insert_hql,
                            temp_drop_hql].join
 
-        response = Hive.run(cluster, target_full_hql, user_name,
+        response = Hive.run(cluster, target_full_hql, user_name, run_params)
 
         raise response['stderr'] if response['stderr'].to_s.ie{|s| s.index("FAILED") or s.index("KILLED")}
 
@@ -319,7 +322,7 @@ module Mobilize
         part_set_hql = "set hive.cli.print.header=true;set mapred.job.name=#{job_name} (permutations);"
         part_select_hql = "select distinct #{target_part_stmt} from #{temp_table_path};"
         part_perm_hql = part_set_hql + part_select_hql
-        response = Hive.run(cluster, part_perm_hql, user_name,
+        response = Hive.run(cluster, part_perm_hql, user_name, run_params)
         raise response['stderr'] if response['stderr'].to_s.ie{|s| s.index("FAILED") or s.index("KILLED")}
         part_perm_tsv = response['stdout']
         #having gotten the permutations, ensure they are dropped
@@ -332,7 +335,7 @@ module Mobilize
 
         part_drop_hql = part_hash_array.map do |h|
           part_drop_stmt = h.map do |name,value|
-            part_defs[name[1..-2]]=="string" ? "#{name}='#{value}'" : "#{name}=#{value}"
+            part_defs[name[1..-2]].downcase=="string" ? "#{name}='#{value}'" : "#{name}=#{value}"
           end.join(",")
           "use #{db};alter table #{table} drop if exists partition (#{part_drop_stmt});"
         end.join
@@ -345,7 +348,7 @@ module Mobilize
 
         target_full_hql = [target_set_hql, target_create_hql, target_insert_hql, temp_drop_hql].join
 
-        response = Hive.run(cluster, target_full_hql, user_name,
+        response = Hive.run(cluster, target_full_hql, user_name, run_params)
         raise response['stderr'] if response['stderr'].to_s.ie{|s| s.index("FAILED") or s.index("KILLED")}
       else
         error_msg = "Incompatible partition specs"
@@ -500,7 +503,7 @@ module Mobilize
       job_name = s.path.sub("Runner_","")
 
       schema_hash = if params['schema']
-                      Hive.schema_hash(params['schema'],user_name,gdrive_slot)
+                      Hive.schema_hash(params['schema'],stage_path,user_name,gdrive_slot)
                     else
                       {}
                     end
@@ -543,7 +546,8 @@ module Mobilize
       result = begin
         url = if source_hql
                 #include any params (or nil) at the end
-
+                run_params = params['params']
+                Hive.hql_to_table(cluster, db, table, part_array, source_hql, user_name, job_name, drop, schema_hash,run_params)
              elsif source_tsv
                 Hive.tsv_to_table(cluster, db, table, part_array, source_tsv, user_name, drop, schema_hash)
              elsif source
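The remaining hunks all do one thing: thread the new `run_params` argument (pulled from the stage's `params['params']` in the last hunk) through every `Hive.run` call inside `hql_to_table`, so user-supplied params now reach the temp-table, permutation, and insert statements of a `hive.write` stage as well. Per the `Hive.run` hunk above, each key is substituted into the HQL as an `@key` token; a minimal sketch of that substitution (values illustrative):

``` ruby
# What Hive.run does with run_params once they arrive: gsub each
# "@key" token in the HQL with its value.
run_params = { 'date' => '2013-01-01' }
hql = "select * from mobilize.hive1 where act_date='@date';"

run_params.each { |k, v| hql.gsub!("@#{k}", v) }
puts hql
# => select * from mobilize.hive1 where act_date='2013-01-01';
```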
data/lib/mobilize-hive/helpers/hive_helper.rb
CHANGED

@@ -26,6 +26,10 @@ module Mobilize
       (1..self.clusters[cluster]['max_slots']).to_a.map{|s| "#{cluster}_#{s.to_s}"}
     end
 
+    def self.prepends
+      self.config['prepends']
+    end
+
     def self.slot_worker_by_cluster_and_path(cluster,path)
       working_slots = Mobilize::Resque.jobs.map{|j| begin j['args'][1]['hive_slot'];rescue;nil;end}.compact.uniq
       self.slot_ids(cluster).each do |slot_id|
data/lib/mobilize-hive/tasks.rb
CHANGED
data/lib/mobilize-hive.rb
CHANGED
data/lib/samples/hive.yml
CHANGED
@@ -1,17 +1,23 @@
 ---
 development:
+  prepends:
+  - "hive.stats.autogather=false"
   clusters:
     dev_cluster:
       max_slots: 5
       temp_table_db: mobilize
       exec_path: /path/to/hive
 test:
+  prepends:
+  - "hive.stats.autogather=false"
   clusters:
     test_cluster:
       max_slots: 5
       temp_table_db: mobilize
       exec_path: /path/to/hive
 production:
+  prepends:
+  - "hive.stats.autogather=false"
   clusters:
     prod_cluster:
       max_slots: 5
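With these sample entries, every HQL statement run on each environment executes with `set hive.stats.autogather=false;` prepended (see the normalization sketch under the `Hive.run` hunk above), disabling Hive's automatic statistics collection on writes.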
data/mobilize-hive.gemspec
CHANGED
@@ -16,5 +16,5 @@ Gem::Specification.new do |gem|
   gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
   gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
   gem.require_paths = ["lib"]
-  gem.add_runtime_dependency "mobilize-hdfs","1.35"
+  gem.add_runtime_dependency "mobilize-hdfs","1.36"
 end
data/test/{hive_test_1.hql → fixtures/hive1.hql}
RENAMED
File without changes

data/test/{hive_test_1_in.yml → fixtures/hive1.in.yml}
RENAMED
File without changes

data/test/{hive_test_1_schema.yml → fixtures/hive1.schema.yml}
RENAMED
File without changes
data/test/fixtures/hive1.sql
ADDED

@@ -0,0 +1 @@
+select act_date,product, sum(value) as sum from mobilize.hive_test_1 group by act_date,product;
data/test/fixtures/hive4_stage1.in
ADDED

@@ -0,0 +1 @@
+
data/test/fixtures/integration_expected.yml
ADDED

@@ -0,0 +1,69 @@
+---
+- path: "Runner_mobilize(test)/jobs"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive1/stage1"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive1/stage2"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive1/stage3"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive1/stage4"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive1/stage5"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive2/stage1"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive2/stage2"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive2/stage3"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive3/stage1"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive3/stage2"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive3/stage3"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive3/stage4"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive4/stage1"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive4/stage2"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive4/stage3"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive4/stage4"
+  state: working
+  count: 1
+  confirmed_ats: []
data/test/fixtures/integration_jobs.yml
ADDED

@@ -0,0 +1,34 @@
+---
+- name: hive1
+  active: true
+  trigger: once
+  status: ""
+  stage1: hive.write target:"mobilize/hive1", partitions:"act_date", drop:true,
+    source:"Runner_mobilize(test)/hive1.in", schema:"hive1.schema"
+  stage2: hive.run source:"hive1.sql"
+  stage3: hive.run hql:"show databases;"
+  stage4: gsheet.write source:"stage2", target:"hive1_stage2.out"
+  stage5: gsheet.write source:"stage3", target:"hive1_stage3.out"
+- name: hive2
+  active: true
+  trigger: after hive1
+  status: ""
+  stage1: hive.write source:"hdfs://user/mobilize/test/hdfs1.out", target:"mobilize.hive2", drop:true
+  stage2: hive.run hql:"select * from mobilize.hive2;"
+  stage3: gsheet.write source:"stage2", target:"hive2.out"
+- name: hive3
+  active: true
+  trigger: after hive2
+  status: ""
+  stage1: hive.run hql:"select '@date' as `date`,product,category,value from mobilize.hive1;", params:{'date':'2013-01-01'}
+  stage2: hive.write source:"stage1",target:"mobilize/hive3", partitions:"date/product", drop:true
+  stage3: hive.write hql:"select * from mobilize.hive3;",target:"mobilize/hive3", partitions:"date/product", drop:false
+  stage4: gsheet.write source:"hive://mobilize/hive3", target:"hive3.out"
+- name: hive4
+  active: true
+  trigger: after hive3
+  status: ""
+  stage1: hive.write source:"hive4_stage1.in", target:"mobilize/hive1", partitions:"act_date"
+  stage2: hive.write source:"hive4_stage2.in", target:"mobilize/hive1", partitions:"act_date"
+  stage3: hive.run hql:"select '@date $utc_time' as `date_time`,product,category,value from mobilize.hive1;", params:{'date':'$utc_date'}
+  stage4: gsheet.write source:stage3, target:"hive4.out"
data/test/integration/mobilize-hive_test.rb
ADDED

@@ -0,0 +1,43 @@
+require 'test_helper'
+describe "Mobilize" do
+  # enqueues 4 workers on Resque
+  it "runs integration test" do
+
+    puts "restart workers"
+    Mobilize::Jobtracker.restart_workers!
+
+    u = TestHelper.owner_user
+    r = u.runner
+    user_name = u.name
+    gdrive_slot = u.email
+
+    puts "add test data"
+    ["hive1.in","hive4_stage1.in","hive4_stage2.in","hive1.schema","hive1.sql"].each do |fixture_name|
+      target_url = "gsheet://#{r.title}/#{fixture_name}"
+      TestHelper.write_fixture(fixture_name, target_url, 'replace')
+    end
+
+    puts "add/update jobs"
+    u.jobs.each{|j| j.delete}
+    jobs_fixture_name = "integration_jobs"
+    jobs_target_url = "gsheet://#{r.title}/jobs"
+    TestHelper.write_fixture(jobs_fixture_name, jobs_target_url, 'update')
+
+    puts "job rows added, force enqueue runner, wait for stages"
+    #wait for stages to complete
+    expected_fixture_name = "integration_expected"
+    Mobilize::Jobtracker.stop!
+    r.enqueue!
+    TestHelper.confirm_expected_jobs(expected_fixture_name,2100)
+
+    puts "update job status and activity"
+    r.update_gsheet(gdrive_slot)
+
+    puts "check posted data"
+    assert TestHelper.check_output("gsheet://#{r.title}/hive1_stage2.out", 'min_length' => 219) == true
+    assert TestHelper.check_output("gsheet://#{r.title}/hive1_stage3.out", 'min_length' => 3) == true
+    assert TestHelper.check_output("gsheet://#{r.title}/hive2.out", 'min_length' => 599) == true
+    assert TestHelper.check_output("gsheet://#{r.title}/hive3.out", 'min_length' => 347) == true
+    assert TestHelper.check_output("gsheet://#{r.title}/hive4.out", 'min_length' => 432) == true
+  end
+end
data/test/test_helper.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: mobilize-hive
 version: !ruby/object:Gem::Version
-  version: '1.35'
+  version: '1.36'
 prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-
+date: 2013-05-21 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: mobilize-hdfs
@@ -18,7 +18,7 @@ dependencies:
     requirements:
     - - '='
       - !ruby/object:Gem::Version
-        version: '1.35'
+        version: '1.36'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
@@ -26,7 +26,7 @@ dependencies:
     requirements:
     - - '='
      - !ruby/object:Gem::Version
-        version: '1.35'
+        version: '1.36'
 description: Adds hive read, write, and run support to mobilize-hdfs
 email:
 - cpaesleme@dena.com
@@ -46,11 +46,15 @@ files:
 - lib/mobilize-hive/version.rb
 - lib/samples/hive.yml
 - mobilize-hive.gemspec
-- test/hive_test_1.hql
-- test/hive_test_1_in.yml
-- test/hive_test_1_schema.yml
-- test/hive_job_rows.yml
-- test/mobilize-hive_test.rb
+- test/fixtures/hive1.hql
+- test/fixtures/hive1.in.yml
+- test/fixtures/hive1.schema.yml
+- test/fixtures/hive1.sql
+- test/fixtures/hive4_stage1.in
+- test/fixtures/hive4_stage2.in.yml
+- test/fixtures/integration_expected.yml
+- test/fixtures/integration_jobs.yml
+- test/integration/mobilize-hive_test.rb
 - test/redis-test.conf
 - test/test_helper.rb
 homepage: http://github.com/dena/mobilize-hive
@@ -67,7 +71,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: '0'
     segments:
     - 0
-    hash:
+    hash: 837156919845089008
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements:
@@ -76,7 +80,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
     segments:
     - 0
-    hash:
+    hash: 837156919845089008
 requirements: []
 rubyforge_project:
 rubygems_version: 1.8.25
@@ -84,10 +88,14 @@ signing_key:
 specification_version: 3
 summary: Adds hive read, write, and run support to mobilize-hdfs
 test_files:
-- test/hive_test_1.hql
-- test/hive_test_1_in.yml
-- test/hive_test_1_schema.yml
-- test/hive_job_rows.yml
-- test/mobilize-hive_test.rb
+- test/fixtures/hive1.hql
+- test/fixtures/hive1.in.yml
+- test/fixtures/hive1.schema.yml
+- test/fixtures/hive1.sql
+- test/fixtures/hive4_stage1.in
+- test/fixtures/hive4_stage2.in.yml
+- test/fixtures/integration_expected.yml
+- test/fixtures/integration_jobs.yml
+- test/integration/mobilize-hive_test.rb
 - test/redis-test.conf
 - test/test_helper.rb
data/test/hive_job_rows.yml
DELETED
@@ -1,34 +0,0 @@
----
-- name: hive_test_1
-  active: true
-  trigger: once
-  status: ""
-  stage1: hive.write target:"mobilize/hive_test_1", partitions:"act_date", drop:true,
-    source:"Runner_mobilize(test)/hive_test_1.in", schema:"hive_test_1.schema"
-  stage2: hive.run source:"hive_test_1.hql"
-  stage3: hive.run hql:"show databases;"
-  stage4: gsheet.write source:"stage2", target:"hive_test_1_stage_2.out"
-  stage5: gsheet.write source:"stage3", target:"hive_test_1_stage_3.out"
-- name: hive_test_2
-  active: true
-  trigger: after hive_test_1
-  status: ""
-  stage1: hive.write source:"hdfs://user/mobilize/test/test_hdfs_1.out", target:"mobilize.hive_test_2", drop:true
-  stage2: hive.run hql:"select * from mobilize.hive_test_2;"
-  stage3: gsheet.write source:"stage2", target:"hive_test_2.out"
-- name: hive_test_3
-  active: true
-  trigger: after hive_test_2
-  status: ""
-  stage1: hive.run hql:"select '@date' as `date`,product,category,value from mobilize.hive_test_1;", params:{'date':'2013-01-01'}
-  stage2: hive.write source:"stage1",target:"mobilize/hive_test_3", partitions:"date/product", drop:true
-  stage3: hive.write hql:"select * from mobilize.hive_test_3;",target:"mobilize/hive_test_3", partitions:"date/product", drop:false
-  stage4: gsheet.write source:"hive://mobilize/hive_test_3", target:"hive_test_3.out"
-- name: hive_test_4
-  active: true
-  trigger: after hive_test_3
-  status: ""
-  stage1: hive.write source:"hive_test_4_stage_1.in", target:"mobilize/hive_test_1", partitions:"act_date"
-  stage2: hive.write source:"hive_test_4_stage_2.in", target:"mobilize/hive_test_1", partitions:"act_date"
-  stage3: hive.run hql:"select '$utc_date $utc_time' as `date_time`,product,category,value from mobilize.hive_test_1;"
-  stage4: gsheet.write source:stage3, target:"hive_test_4.out"
data/test/mobilize-hive_test.rb
DELETED
@@ -1,112 +0,0 @@
-require 'test_helper'
-
-describe "Mobilize" do
-
-  def before
-    puts 'nothing before'
-  end
-
-  # enqueues 4 workers on Resque
-  it "runs integration test" do
-
-    puts "restart workers"
-    Mobilize::Jobtracker.restart_workers!
-
-    gdrive_slot = Mobilize::Gdrive.owner_email
-    puts "create user 'mobilize'"
-    user_name = gdrive_slot.split("@").first
-    u = Mobilize::User.where(:name=>user_name).first
-    r = u.runner
-
-    puts "add test_source data"
-    hive_1_in_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1.in",gdrive_slot)
-    [hive_1_in_sheet].each {|s| s.delete if s}
-    hive_1_in_sheet = Mobilize::Gsheet.find_or_create_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1.in",gdrive_slot)
-    hive_1_in_tsv = YAML.load_file("#{Mobilize::Base.root}/test/hive_test_1_in.yml").hash_array_to_tsv
-    hive_1_in_sheet.write(hive_1_in_tsv,Mobilize::Gdrive.owner_name)
-
-    #create blank sheet
-    hive_4_stage_1_in_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_4_stage_1.in",gdrive_slot)
-    [hive_4_stage_1_in_sheet].each {|s| s.delete if s}
-    hive_4_stage_1_in_sheet = Mobilize::Gsheet.find_or_create_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_4_stage_1.in",gdrive_slot)
-
-    #create sheet w just headers
-    hive_4_stage_2_in_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_4_stage_2.in",gdrive_slot)
-    [hive_4_stage_2_in_sheet].each {|s| s.delete if s}
-    hive_4_stage_2_in_sheet = Mobilize::Gsheet.find_or_create_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_4_stage_2.in",gdrive_slot)
-    hive_4_stage_2_in_sheet_header = hive_1_in_tsv.tsv_header_array.join("\t")
-    hive_4_stage_2_in_sheet.write(hive_4_stage_2_in_sheet_header,Mobilize::Gdrive.owner_name)
-
-    hive_1_schema_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1.schema",gdrive_slot)
-    [hive_1_schema_sheet].each {|s| s.delete if s}
-    hive_1_schema_sheet = Mobilize::Gsheet.find_or_create_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1.schema",gdrive_slot)
-    hive_1_schema_tsv = YAML.load_file("#{Mobilize::Base.root}/test/hive_test_1_schema.yml").hash_array_to_tsv
-    hive_1_schema_sheet.write(hive_1_schema_tsv,Mobilize::Gdrive.owner_name)
-
-    hive_1_hql_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1.hql",gdrive_slot)
-    [hive_1_hql_sheet].each {|s| s.delete if s}
-    hive_1_hql_sheet = Mobilize::Gsheet.find_or_create_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1.hql",gdrive_slot)
-    hive_1_hql_tsv = File.open("#{Mobilize::Base.root}/test/hive_test_1.hql").read
-    hive_1_hql_sheet.write(hive_1_hql_tsv,Mobilize::Gdrive.owner_name)
-
-    jobs_sheet = r.gsheet(gdrive_slot)
-
-    test_job_rows = ::YAML.load_file("#{Mobilize::Base.root}/test/hive_job_rows.yml")
-    test_job_rows.map{|j| r.jobs(j['name'])}.each{|j| j.delete if j}
-    jobs_sheet.add_or_update_rows(test_job_rows)
-
-    hive_1_stage_2_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1_stage_2.out",gdrive_slot)
-    [hive_1_stage_2_target_sheet].each{|s| s.delete if s}
-    hive_1_stage_3_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1_stage_3.out",gdrive_slot)
-    [hive_1_stage_3_target_sheet].each{|s| s.delete if s}
-    hive_2_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_2.out",gdrive_slot)
-    [hive_2_target_sheet].each{|s| s.delete if s}
-    hive_3_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_3.out",gdrive_slot)
-    [hive_3_target_sheet].each{|s| s.delete if s}
-    hive_4_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_4.out",gdrive_slot)
-    [hive_4_target_sheet].each{|s| s.delete if s}
-
-    puts "job row added, force enqueued requestor, wait for stages"
-    r.enqueue!
-    wait_for_stages(2100)
-
-    puts "jobtracker posted data to test sheet"
-    hive_1_stage_2_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1_stage_2.out",gdrive_slot)
-    hive_1_stage_3_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1_stage_3.out",gdrive_slot)
-    hive_2_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_2.out",gdrive_slot)
-    hive_3_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_3.out",gdrive_slot)
-    hive_4_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_4.out",gdrive_slot)
-
-    assert hive_1_stage_2_target_sheet.read(u.name).length == 219
-    assert hive_1_stage_3_target_sheet.read(u.name).length > 3
-    assert hive_2_target_sheet.read(u.name).length == 599
-    assert hive_3_target_sheet.read(u.name).length == 347
-    assert hive_4_target_sheet.read(u.name).length == 432
-  end
-
-  def wait_for_stages(time_limit=600,stage_limit=120,wait_length=10)
-    time = 0
-    time_since_stage = 0
-    #check for 10 min
-    while time < time_limit and time_since_stage < stage_limit
-      sleep wait_length
-      job_classes = Mobilize::Resque.jobs.map{|j| j['class']}
-      if job_classes.include?("Mobilize::Stage")
-        time_since_stage = 0
-        puts "saw stage at #{time.to_s} seconds"
-      else
-        time_since_stage += wait_length
-        puts "#{time_since_stage.to_s} seconds since stage seen"
-      end
-      time += wait_length
-      puts "total wait time #{time.to_s} seconds"
-    end
-
-    if time >= time_limit
-      raise "Timed out before stage completion"
-    end
-  end
-
-
-
-end