mobilize-hive 1.35 → 1.36
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/README.md +3 -253
- data/lib/mobilize-hive/handlers/hive.rb +47 -43
- data/lib/mobilize-hive/helpers/hive_helper.rb +4 -0
- data/lib/mobilize-hive/tasks.rb +1 -0
- data/lib/mobilize-hive/version.rb +1 -1
- data/lib/mobilize-hive.rb +3 -0
- data/lib/samples/hive.yml +6 -0
- data/mobilize-hive.gemspec +1 -1
- data/test/{hive_test_1.hql → fixtures/hive1.hql} +0 -0
- data/test/{hive_test_1_in.yml → fixtures/hive1.in.yml} +0 -0
- data/test/{hive_test_1_schema.yml → fixtures/hive1.schema.yml} +0 -0
- data/test/fixtures/hive1.sql +1 -0
- data/test/fixtures/hive4_stage1.in +1 -0
- data/test/fixtures/hive4_stage2.in.yml +4 -0
- data/test/fixtures/integration_expected.yml +69 -0
- data/test/fixtures/integration_jobs.yml +34 -0
- data/test/integration/mobilize-hive_test.rb +43 -0
- data/test/test_helper.rb +1 -0
- metadata +24 -16
- data/test/hive_job_rows.yml +0 -34
- data/test/mobilize-hive_test.rb +0 -112
data/README.md
CHANGED

```diff
@@ -1,254 +1,4 @@
-Mobilize
-
+Mobilize
+========
 
-
-* read, write, and copy hive files through Google Spreadsheets.
-
-Table Of Contents
------------------
-* [Overview](#section_Overview)
-* [Install](#section_Install)
-  * [Mobilize-Hive](#section_Install_Mobilize-Hive)
-  * [Install Dirs and Files](#section_Install_Dirs_and_Files)
-* [Configure](#section_Configure)
-  * [Hive](#section_Configure_Hive)
-* [Start](#section_Start)
-  * [Create Job](#section_Start_Create_Job)
-  * [Run Test](#section_Start_Run_Test)
-* [Meta](#section_Meta)
-* [Special Thanks](#section_Special_Thanks)
-* [Author](#section_Author)
-
-<a name='section_Overview'></a>
-Overview
------------
-
-* Mobilize-hive adds Hive methods to mobilize-hdfs.
-
-<a name='section_Install'></a>
-Install
-------------
-
-Make sure you go through all the steps in the
-[mobilize-base][mobilize-base],
-[mobilize-ssh][mobilize-ssh],
-[mobilize-hdfs][mobilize-hdfs],
-install sections first.
-
-<a name='section_Install_Mobilize-Hive'></a>
-### Mobilize-Hive
-
-add this to your Gemfile:
-
-``` ruby
-gem "mobilize-hive"
-```
-
-or do
-
-    $ gem install mobilize-hive
-
-for a ruby-wide install.
-
-<a name='section_Install_Dirs_and_Files'></a>
-### Dirs and Files
-
-### Rakefile
-
-Inside the Rakefile in your project's root dir, make sure you have:
-
-``` ruby
-require 'mobilize-base/tasks'
-require 'mobilize-ssh/tasks'
-require 'mobilize-hdfs/tasks'
-require 'mobilize-hive/tasks'
-```
-
-This defines rake tasks essential to run the environment.
-
-### Config Dir
-
-run
-
-    $ rake mobilize_hive:setup
-
-This will copy over a sample hive.yml to your config dir.
-
-<a name='section_Configure'></a>
-Configure
-------------
-
-<a name='section_Configure_Hive'></a>
-### Configure Hive
-
-* Hive is big data. That means we need to be careful when reading from
-the cluster as it could easily fill up our mongodb instance, RAM, local disk
-space, etc.
-* To achieve this, all hive operations, stage outputs, etc. are
-executed and stored on the cluster only.
-* The exceptions are:
-  * writing to the cluster from an external source, such as a google
-sheet. Here there
-is no risk as the external source has much more strict size limits than
-hive.
-  * reading from the cluster, such as for posting to google sheet. In
-this case, the read_limit parameter dictates the maximum amount that can
-be read. If the data is bigger than the read limit, an exception will be
-raised.
-
-The Hive configuration consists of:
-* clusters - this defines aliases for clusters, which are used as
-parameters for Hive stages. They should have the same name as those
-in hadoop.yml. Each cluster has:
-  * max_slots - defines the total number of simultaneous slots to be
-used for hive jobs on this cluster
-  * output_db - defines the db which should be used to hold stage outputs.
-    * This db must have open permissions (777) so any user on the system can
-write to it -- the tables inside will be owned by the users themselves.
-  * exec_path - defines the path to the hive executable
-
-Sample hive.yml:
-
-``` yml
----
-development:
-  clusters:
-    dev_cluster:
-      max_slots: 5
-      output_db: mobilize
-      exec_path: /path/to/hive
-test:
-  clusters:
-    test_cluster:
-      max_slots: 5
-      output_db: mobilize
-      exec_path: /path/to/hive
-production:
-  clusters:
-    prod_cluster:
-      max_slots: 5
-      output_db: mobilize
-      exec_path: /path/to/hive
-```
-
-<a name='section_Start'></a>
-Start
------
-
-<a name='section_Start_Create_Job'></a>
-### Create Job
-
-* For mobilize-hive, the following stages are available.
-  * cluster and user are optional for all of the below.
-    * cluster defaults to the first cluster listed;
-    * user is treated the same way as in [mobilize-ssh][mobilize-ssh].
-  * params are also optional for all of the below. They replace HQL in sources.
-    * params are passed as a YML or JSON, as in:
-      * `hive.run source:<source_path>, params:{'date': '2013-03-01', 'unit': 'widgets'}`
-      * this example replaces all the keys, preceded by '@' in all source hqls with the value.
-      * The preceding '@' is used to keep from replacing instances
-of "date" and "unit" in the HQL; you should have `@date` and `@unit` in your actual HQL
-if you'd like to replace those tokens.
-    * in addition, the following params are substituted automatically:
-      * `$utc_date` - replaced with YYYY-MM-DD date, UTC
-      * `$utc_time` - replaced with HH:MM time, UTC
-      * any occurrence of these values in HQL will be replaced at runtime.
-  * hive.run `hql:<hql> || source:<gsheet_path>, user:<user>, cluster:<cluster>`, which executes the
-script in the hql or source sheet and returns any output specified at the
-end. If the cmd or last query in source is a select statement, column headers will be
-returned as well.
-  * hive.write `hql:<hql> || source:<source_path>, target:<hive_path>, partitions:<partition_path>, user:<user>, cluster:<cluster>, schema:<gsheet_path>, drop:<true/false>`,
-which writes the source or query result to the selected hive table.
-    * hive_path
-      * should be of the form `<hive_db>/<table_name>` or `<hive_db>.<table_name>`.
-    * source:
-      * can be a gsheet_path, hdfs_path, or hive_path (no partitions)
-      * for gsheet and hdfs path,
-        * if the file ends in .*ql, it's treated the same as passing hql
-        * otherwise it is treated as a tsv with the first row as column headers
-    * target:
-      * Should be a hive_path, as in `<hive_db>/<table_name>` or `<hive_db>.<table_name>`.
-    * partitions:
-      * Due to Hive limitation, partition names CANNOT be reserved keywords when writing from tsv (gsheet or hdfs source)
-      * Partitions should be specified as a path, as in partitions:`<partition1>/<partition2>`.
-    * schema:
-      * optional. gsheet_path to column schema.
-      * two columns: name, datatype
-      * Any columns not defined here will receive "string" as the datatype
-      * partitions can have their datatypes overridden here as well
-      * columns named here that are not in the dataset will be ignored
-    * drop:
-      * optional. drops the target table before performing write
-      * defaults to false
-
-<a name='section_Start_Run_Test'></a>
-### Run Test
-
-To run tests, you will need to
-
-1) go through [mobilize-base][mobilize-base], [mobilize-ssh][mobilize-ssh], [mobilize-hdfs][mobilize-hdfs] tests first
-
-2) clone the mobilize-hive repository
-
-From the project folder, run
-
-3) $ rake mobilize_hive:setup
-
-Copy over the config files from the mobilize-base, mobilize-ssh,
-mobilize-hdfs projects into the config dir, and populate the values in the hive.yml file.
-
-Make sure you use the same names for your hive clusters as you do in
-hadoop.yml.
-
-3) $ rake test
-
-* The test runs these jobs:
-  * hive_test_1:
-    * `hive.write target:"mobilize/hive_test_1/act_date",source:"Runner_mobilize(test)/hive_test_1.in", schema:"hive_test_1.schema", drop:true`
-    * `hive.run source:"hive_test_1.hql"`
-    * `hive.run cmd:"show databases"`
-    * `gsheet.write source:"stage2", target:"hive_test_1_stage_2.out"`
-    * `gsheet.write source:"stage3", target:"hive_test_1_stage_3.out"`
-    * hive_test_1.hql runs a select statement on the table created in the
-write command.
-    * at the end of the test, there should be two sheets, one with a
-sum of the data as in your write query, one with the results of the show
-databases command.
-  * hive_test_2:
-    * `hive.write source:"hdfs://user/mobilize/test/test_hdfs_1.out", target:"mobilize.hive_test_2", drop:true`
-    * `hive.run cmd:"select * from mobilize.hive_test_2"`
-    * `gsheet.write source:"stage2", target:"hive_test_2.out"`
-    * this test uses the output from the first hdfs test as an input, so make sure you've run that first.
-  * hive_test_3:
-    * `hive.write source:"hive://mobilize.hive_test_1",target:"mobilize/hive_test_3/date/product",drop:true`
-    * `hive.run hql:"select act_date as `date`,product,category,value from mobilize.hive_test_1;"`
-    * `hive.write source:"stage2",target:"mobilize/hive_test_3/date/product", drop:false`
-    * `gsheet.write source:"hive://mobilize/hive_test_3", target:"hive_test_3.out"`
-
-
-<a name='section_Meta'></a>
-Meta
-----
-
-* Code: `git clone git://github.com/dena/mobilize-hive.git`
-* Home: <https://github.com/dena/mobilize-hive>
-* Bugs: <https://github.com/dena/mobilize-hive/issues>
-* Gems: <http://rubygems.org/gems/mobilize-hive>
-
-<a name='section_Special_Thanks'></a>
-Special Thanks
---------------
-* This release goes to Toby Negrin, who championed this project with
-DeNA and gave me the support to get it properly architected, tested, and documented.
-* Also many thanks to the Analytics team at DeNA who build and maintain
-our Big Data infrastructure.
-
-<a name='section_Author'></a>
-Author
-------
-
-Cassio Paes-Leme :: cpaesleme@dena.com :: @cpaesleme
-
-[mobilize-base]: https://github.com/dena/mobilize-base
-[mobilize-ssh]: https://github.com/dena/mobilize-ssh
-[mobilize-hdfs]: https://github.com/dena/mobilize-hdfs
+
+Please refer to the mobilize-server wiki: https://github.com/DeNA/mobilize-server/wiki
```
data/lib/mobilize-hive/handlers/hive.rb
CHANGED

```diff
@@ -94,22 +94,29 @@ module Mobilize
 
       #run a generic hive command, with the option of passing a file hash to be locally available
       def Hive.run(cluster,hql,user_name,params=nil,file_hash=nil)
-
-
+        preps = Hive.prepends.map do |p|
+          prefix = "set "
+          suffix = ";"
+          prep_out = p
+          prep_out = "#{prefix}#{prep_out}" unless prep_out.starts_with?(prefix)
+          prep_out = "#{prep_out}#{suffix}" unless prep_out.ends_with?(suffix)
+          prep_out
+        end.join
+        hql = "#{preps}#{hql}"
         filename = hql.to_md5
         file_hash||= {}
         file_hash[filename] = hql
-        #add in default params
         params ||= {}
-        params = params.merge(Hive.default_params)
         #replace any params in the file_hash and command
         params.each do |k,v|
           file_hash.each do |name,data|
-
-
-
-
-
+            data.gsub!("@#{k}",v)
+          end
+        end
+        #add in default params
+        Hive.default_params.each do |k,v|
+          file_hash.each do |name,data|
+            data.gsub!(k,v)
           end
         end
         #silent mode so we don't have logs in stderr; clip output
```
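The new prepend logic normalizes each configured setting into a `set …;` statement before prefixing it to the HQL. A standalone sketch of that normalization (`prepend_settings` is a hypothetical helper name, using plain Ruby's `start_with?`/`end_with?` in place of the gem's `starts_with?`/`ends_with?` extensions):

```ruby
# Normalize configured prepends into "set <k>=<v>;" statements and
# prefix them to the HQL, mirroring the new Hive.run behavior.
def prepend_settings(hql, prepends)
  preps = prepends.map do |p|
    out = p.dup
    out = "set #{out}" unless out.start_with?("set ")
    out = "#{out};"    unless out.end_with?(";")
    out
  end.join
  "#{preps}#{hql}"
end

prepend_settings("select 1;", ["hive.stats.autogather=false"])
# => "set hive.stats.autogather=false;select 1;"
```

Settings that already carry the `set ` prefix or the trailing `;` pass through unchanged, so the same config value works whether or not it is written as a full statement.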
```diff
@@ -155,9 +162,9 @@ module Mobilize
         Gdrive.unslot_worker_by_path(stage_path)
 
         #check for select at end
-        hql_array = hql.split("
-        last_statement = hql_array.last
-        if last_statement.to_s.starts_with?("select")
+        hql_array = hql.split("\n").reject{|l| l.starts_with?("--") or l.strip.length==0}.join("\n").split(";").map{|h| h.strip}
+        last_statement = hql_array.last
+        if last_statement.to_s.downcase.starts_with?("select")
           #nil if no prior commands
           prior_hql = hql_array[0..-2].join(";") if hql_array.length > 1
           select_hql = hql_array.last
```
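The rewritten end-of-script check first strips `--` comment lines and blanks, splits the script on `;`, and lower-cases the final statement before testing for `select`. A minimal sketch of the same splitting (`hql_statements` is a hypothetical helper name, standard-Ruby method names):

```ruby
# Drop "--" comment lines and blanks, split the script on ";",
# and trim each statement, as the updated Hive.run does before
# checking whether the script ends in a select.
def hql_statements(hql)
  hql.split("\n").
      reject { |l| l.start_with?("--") || l.strip.empty? }.
      join("\n").
      split(";").
      map(&:strip)
end

stmts = hql_statements("-- setup\nuse mobilize;\n\nSELECT * from hive1")
# stmts => ["use mobilize", "SELECT * from hive1"]
stmts.last.downcase.start_with?("select")  # => true
```

The added `downcase` is what lets an uppercase `SELECT` at the end of a script trigger output capture, which the old version missed.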
```diff
@@ -181,41 +188,37 @@ module Mobilize
         response
       end
 
-      def Hive.schema_hash(schema_path,user_name,gdrive_slot)
-        if schema_path.index("
-
-
+      def Hive.schema_hash(schema_path,stage_path,user_name,gdrive_slot)
+        handler = if schema_path.index("://")
+                    schema_path.split("://").first
+                  else
+                    "gsheet"
+                  end
+        dst = "Mobilize::#{handler.downcase.capitalize}".constantize.path_to_dst(schema_path,stage_path,gdrive_slot)
+        out_raw = dst.read(user_name,gdrive_slot)
+        #determine the datatype for schema; accept json, yaml, tsv
+        if schema_path.ends_with?(".yml")
+          out_ha = begin;YAML.load(out_raw);rescue ScriptError, StandardError;nil;end if out_ha.nil?
         else
-
-
-          r = u.runner
-          runner_sheet = r.gbook(gdrive_slot).worksheet_by_title(schema_path)
-          out_tsv = if runner_sheet
-                      runner_sheet.read(user_name)
-                    else
-                      #check for gfile. will fail if there isn't one.
-                      Gfile.find_by_path(schema_path).read(user_name)
-                    end
+          out_ha = begin;JSON.parse(out_raw);rescue ScriptError, StandardError;nil;end
+          out_ha = out_raw.tsv_to_hash_array if out_ha.nil?
         end
-        #use Gridfs to cache gdrive results
-        file_name = schema_path.split("/").last
-        out_url = "gridfs://#{schema_path}/#{file_name}"
-        Dataset.write_by_url(out_url,out_tsv,user_name)
-        schema_tsv = Dataset.find_by_url(out_url).read(user_name,gdrive_slot)
         schema_hash = {}
-
-        schema_hash[
+        out_ha.each do |hash|
+          schema_hash[hash['name']] = hash['datatype']
         end
         schema_hash
       end
 
-      def Hive.hql_to_table(cluster, db, table, part_array, source_hql, user_name, job_name, drop=false, schema_hash=nil,
+      def Hive.hql_to_table(cluster, db, table, part_array, source_hql, user_name, job_name, drop=false, schema_hash=nil, run_params=nil)
         table_path = [db,table].join(".")
         table_stats = Hive.table_stats(cluster, db, table, user_name)
         url = "hive://" + [cluster,db,table,part_array.compact.join("/")].join("/")
 
-
-
+        #decomment hql
+
+        source_hql_array = source_hql.split("\n").reject{|l| l.starts_with?("--") or l.strip.length==0}.join("\n").split(";").map{|h| h.strip}
+        last_select_i = source_hql_array.rindex{|s| s.downcase.starts_with?("select")}
         #find the last select query -- it should be used for the temp table creation
         last_select_hql = (source_hql_array[last_select_i..-1].join(";")+";")
         #if there is anything prior to the last select, add it in prior to table creation
```
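The rewritten `Hive.schema_hash` reduces the parsed schema rows to a `{column_name => datatype}` hash. A self-contained sketch of that reduction (`schema_hash_from_rows` is a hypothetical helper name; the YAML branch is shown, while the real method also accepts JSON and TSV):

```ruby
require 'yaml'

# Reduce an array of {name, datatype} rows into a column => datatype
# hash, as the rewritten Hive.schema_hash does after parsing.
def schema_hash_from_rows(rows)
  rows.each_with_object({}) { |h, out| out[h['name']] = h['datatype'] }
end

rows = YAML.load(<<~YML)
  - name: act_date
    datatype: date
  - name: value
    datatype: int
YML
schema_hash_from_rows(rows)
# => {"act_date"=>"date", "value"=>"int"}
```

Per the old README, columns absent from this hash default to `string` when the table is created.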
```diff
@@ -228,7 +231,7 @@ module Mobilize
         temp_set_hql = "set mapred.job.name=#{job_name} (temp table);"
         temp_drop_hql = "drop table if exists #{temp_table_path};"
         temp_create_hql = "#{temp_set_hql}#{prior_hql}#{temp_drop_hql}create table #{temp_table_path} as #{last_select_hql}"
-        response = Hive.run(cluster,temp_create_hql,user_name,
+        response = Hive.run(cluster,temp_create_hql,user_name,run_params)
         raise response['stderr'] if response['stderr'].to_s.ie{|s| s.index("FAILED") or s.index("KILLED")}
 
         source_table_stats = Hive.table_stats(cluster,temp_db,temp_table_name,user_name)
@@ -267,7 +270,7 @@ module Mobilize
                              target_insert_hql,
                              temp_drop_hql].join
 
-          response = Hive.run(cluster, target_full_hql, user_name,
+          response = Hive.run(cluster, target_full_hql, user_name, run_params)
 
           raise response['stderr'] if response['stderr'].to_s.ie{|s| s.index("FAILED") or s.index("KILLED")}
 
@@ -319,7 +322,7 @@ module Mobilize
           part_set_hql = "set hive.cli.print.header=true;set mapred.job.name=#{job_name} (permutations);"
           part_select_hql = "select distinct #{target_part_stmt} from #{temp_table_path};"
           part_perm_hql = part_set_hql + part_select_hql
-          response = Hive.run(cluster, part_perm_hql, user_name,
+          response = Hive.run(cluster, part_perm_hql, user_name, run_params)
           raise response['stderr'] if response['stderr'].to_s.ie{|s| s.index("FAILED") or s.index("KILLED")}
           part_perm_tsv = response['stdout']
           #having gotten the permutations, ensure they are dropped
@@ -332,7 +335,7 @@ module Mobilize
 
           part_drop_hql = part_hash_array.map do |h|
             part_drop_stmt = h.map do |name,value|
-              part_defs[name[1..-2]]=="string" ? "#{name}='#{value}'" : "#{name}=#{value}"
+              part_defs[name[1..-2]].downcase=="string" ? "#{name}='#{value}'" : "#{name}=#{value}"
             end.join(",")
             "use #{db};alter table #{table} drop if exists partition (#{part_drop_stmt});"
           end.join
@@ -345,7 +348,7 @@ module Mobilize
 
           target_full_hql = [target_set_hql, target_create_hql, target_insert_hql, temp_drop_hql].join
 
-          response = Hive.run(cluster, target_full_hql, user_name,
+          response = Hive.run(cluster, target_full_hql, user_name, run_params)
           raise response['stderr'] if response['stderr'].to_s.ie{|s| s.index("FAILED") or s.index("KILLED")}
         else
           error_msg = "Incompatible partition specs"
```
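The one-line change in the partition-drop hunk makes the string-datatype check case-insensitive when deciding whether to quote a partition value. A sketch of the surrounding clause construction (`part_drop_clause` is a hypothetical helper; the real code reads partition names wrapped in backticks via `name[1..-2]`, which this simplification omits):

```ruby
# Quote a partition value only when its declared datatype is "string"
# (compared case-insensitively after this change), then join the pairs
# into an "alter table ... drop ... partition" clause.
def part_drop_clause(db, table, part_hash, part_defs)
  stmt = part_hash.map do |name, value|
    part_defs[name].downcase == "string" ? "#{name}='#{value}'" : "#{name}=#{value}"
  end.join(",")
  "use #{db};alter table #{table} drop if exists partition (#{stmt});"
end

part_drop_clause("mobilize", "hive3",
                 { "date" => "2013-01-01", "product" => 5 },
                 { "date" => "STRING", "product" => "int" })
# => "use mobilize;alter table hive3 drop if exists partition (date='2013-01-01',product=5);"
```

Before the change, a schema declaring `STRING` in uppercase would have left the value unquoted and produced invalid HQL.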
```diff
@@ -500,7 +503,7 @@ module Mobilize
       job_name = s.path.sub("Runner_","")
 
       schema_hash = if params['schema']
-                      Hive.schema_hash(params['schema'],user_name,gdrive_slot)
+                      Hive.schema_hash(params['schema'],stage_path,user_name,gdrive_slot)
                     else
                       {}
                     end
@@ -543,7 +546,8 @@ module Mobilize
       result = begin
                  url = if source_hql
                    #include any params (or nil) at the end
-
+                   run_params = params['params']
+                   Hive.hql_to_table(cluster, db, table, part_array, source_hql, user_name, job_name, drop, schema_hash,run_params)
                  elsif source_tsv
                    Hive.tsv_to_table(cluster, db, table, part_array, source_tsv, user_name, drop, schema_hash)
                  elsif source
```
data/lib/mobilize-hive/helpers/hive_helper.rb
CHANGED

```diff
@@ -26,6 +26,10 @@ module Mobilize
       (1..self.clusters[cluster]['max_slots']).to_a.map{|s| "#{cluster}_#{s.to_s}"}
     end
 
+    def self.prepends
+      self.config['prepends']
+    end
+
     def self.slot_worker_by_cluster_and_path(cluster,path)
       working_slots = Mobilize::Resque.jobs.map{|j| begin j['args'][1]['hive_slot'];rescue;nil;end}.compact.uniq
       self.slot_ids(cluster).each do |slot_id|
```
data/lib/mobilize-hive/tasks.rb
CHANGED
data/lib/mobilize-hive.rb
CHANGED
data/lib/samples/hive.yml
CHANGED
```diff
@@ -1,17 +1,23 @@
 ---
 development:
+  prepends:
+    - "hive.stats.autogather=false"
   clusters:
     dev_cluster:
       max_slots: 5
       temp_table_db: mobilize
       exec_path: /path/to/hive
 test:
+  prepends:
+    - "hive.stats.autogather=false"
   clusters:
     test_cluster:
       max_slots: 5
       temp_table_db: mobilize
       exec_path: /path/to/hive
 production:
+  prepends:
+    - "hive.stats.autogather=false"
   clusters:
     prod_cluster:
       max_slots: 5
```
data/mobilize-hive.gemspec
CHANGED
```diff
@@ -16,5 +16,5 @@ Gem::Specification.new do |gem|
   gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
   gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
   gem.require_paths = ["lib"]
-  gem.add_runtime_dependency "mobilize-hdfs","1.
+  gem.add_runtime_dependency "mobilize-hdfs","1.36"
 end
```
data/test/{hive_test_1.hql → fixtures/hive1.hql}
RENAMED
File without changes

data/test/{hive_test_1_in.yml → fixtures/hive1.in.yml}
RENAMED
File without changes

data/test/{hive_test_1_schema.yml → fixtures/hive1.schema.yml}
RENAMED
File without changes
data/test/fixtures/hive1.sql
ADDED

```diff
@@ -0,0 +1 @@
+select act_date,product, sum(value) as sum from mobilize.hive_test_1 group by act_date,product;
```

data/test/fixtures/hive4_stage1.in
ADDED

```diff
@@ -0,0 +1 @@
+
```
data/test/fixtures/integration_expected.yml
ADDED

```diff
@@ -0,0 +1,69 @@
+---
+- path: "Runner_mobilize(test)/jobs"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive1/stage1"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive1/stage2"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive1/stage3"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive1/stage4"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive1/stage5"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive2/stage1"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive2/stage2"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive2/stage3"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive3/stage1"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive3/stage2"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive3/stage3"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive3/stage4"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive4/stage1"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive4/stage2"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive4/stage3"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hive4/stage4"
+  state: working
+  count: 1
+  confirmed_ats: []
```
data/test/fixtures/integration_jobs.yml
ADDED

```diff
@@ -0,0 +1,34 @@
+---
+- name: hive1
+  active: true
+  trigger: once
+  status: ""
+  stage1: hive.write target:"mobilize/hive1", partitions:"act_date", drop:true,
+    source:"Runner_mobilize(test)/hive1.in", schema:"hive1.schema"
+  stage2: hive.run source:"hive1.sql"
+  stage3: hive.run hql:"show databases;"
+  stage4: gsheet.write source:"stage2", target:"hive1_stage2.out"
+  stage5: gsheet.write source:"stage3", target:"hive1_stage3.out"
+- name: hive2
+  active: true
+  trigger: after hive1
+  status: ""
+  stage1: hive.write source:"hdfs://user/mobilize/test/hdfs1.out", target:"mobilize.hive2", drop:true
+  stage2: hive.run hql:"select * from mobilize.hive2;"
+  stage3: gsheet.write source:"stage2", target:"hive2.out"
+- name: hive3
+  active: true
+  trigger: after hive2
+  status: ""
+  stage1: hive.run hql:"select '@date' as `date`,product,category,value from mobilize.hive1;", params:{'date':'2013-01-01'}
+  stage2: hive.write source:"stage1",target:"mobilize/hive3", partitions:"date/product", drop:true
+  stage3: hive.write hql:"select * from mobilize.hive3;",target:"mobilize/hive3", partitions:"date/product", drop:false
+  stage4: gsheet.write source:"hive://mobilize/hive3", target:"hive3.out"
+- name: hive4
+  active: true
+  trigger: after hive3
+  status: ""
+  stage1: hive.write source:"hive4_stage1.in", target:"mobilize/hive1", partitions:"act_date"
+  stage2: hive.write source:"hive4_stage2.in", target:"mobilize/hive1", partitions:"act_date"
+  stage3: hive.run hql:"select '@date $utc_time' as `date_time`,product,category,value from mobilize.hive1;", params:{'date':'$utc_date'}
+  stage4: gsheet.write source:stage3, target:"hive4.out"
```
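The hive3 and hive4 jobs above rely on `@`-prefixed param substitution (e.g. `params:{'date':'2013-01-01'}` filling the `@date` token). A sketch of that substitution (`substitute_params` is a hypothetical helper name, mirroring the `data.gsub!("@#{k}",v)` loop in the handler):

```ruby
# Replace each "@<key>" token in the HQL with its param value,
# as hive.run's params option does before execution.
def substitute_params(hql, params)
  params.reduce(hql) { |out, (k, v)| out.gsub("@#{k}", v) }
end

hql = "select '@date' as `date`,product,category,value from mobilize.hive1;"
substitute_params(hql, "date" => "2013-01-01")
# => "select '2013-01-01' as `date`,product,category,value from mobilize.hive1;"
```

Only `@`-prefixed tokens are rewritten, so the bare column alias `` `date` `` survives untouched; the automatic `$utc_date`/`$utc_time` values used by hive4 are filled in by a separate default-params pass.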
data/test/integration/mobilize-hive_test.rb
ADDED

```diff
@@ -0,0 +1,43 @@
+require 'test_helper'
+describe "Mobilize" do
+  # enqueues 4 workers on Resque
+  it "runs integration test" do
+
+    puts "restart workers"
+    Mobilize::Jobtracker.restart_workers!
+
+    u = TestHelper.owner_user
+    r = u.runner
+    user_name = u.name
+    gdrive_slot = u.email
+
+    puts "add test data"
+    ["hive1.in","hive4_stage1.in","hive4_stage2.in","hive1.schema","hive1.sql"].each do |fixture_name|
+      target_url = "gsheet://#{r.title}/#{fixture_name}"
+      TestHelper.write_fixture(fixture_name, target_url, 'replace')
+    end
+
+    puts "add/update jobs"
+    u.jobs.each{|j| j.delete}
+    jobs_fixture_name = "integration_jobs"
+    jobs_target_url = "gsheet://#{r.title}/jobs"
+    TestHelper.write_fixture(jobs_fixture_name, jobs_target_url, 'update')
+
+    puts "job rows added, force enqueue runner, wait for stages"
+    #wait for stages to complete
+    expected_fixture_name = "integration_expected"
+    Mobilize::Jobtracker.stop!
+    r.enqueue!
+    TestHelper.confirm_expected_jobs(expected_fixture_name,2100)
+
+    puts "update job status and activity"
+    r.update_gsheet(gdrive_slot)
+
+    puts "check posted data"
+    assert TestHelper.check_output("gsheet://#{r.title}/hive1_stage2.out", 'min_length' => 219) == true
+    assert TestHelper.check_output("gsheet://#{r.title}/hive1_stage3.out", 'min_length' => 3) == true
+    assert TestHelper.check_output("gsheet://#{r.title}/hive2.out", 'min_length' => 599) == true
+    assert TestHelper.check_output("gsheet://#{r.title}/hive3.out", 'min_length' => 347) == true
+    assert TestHelper.check_output("gsheet://#{r.title}/hive4.out", 'min_length' => 432) == true
+  end
+end
```
data/test/test_helper.rb
CHANGED
metadata
CHANGED
```diff
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: mobilize-hive
 version: !ruby/object:Gem::Version
-  version: '1.
+  version: '1.36'
 prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-
+date: 2013-05-21 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: mobilize-hdfs
@@ -18,7 +18,7 @@ dependencies:
     requirements:
     - - '='
      - !ruby/object:Gem::Version
-        version: '1.
+        version: '1.36'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
@@ -26,7 +26,7 @@ dependencies:
     requirements:
     - - '='
     - !ruby/object:Gem::Version
-      version: '1.
+      version: '1.36'
 description: Adds hive read, write, and run support to mobilize-hdfs
 email:
 - cpaesleme@dena.com
@@ -46,11 +46,15 @@ files:
 - lib/mobilize-hive/version.rb
 - lib/samples/hive.yml
 - mobilize-hive.gemspec
-- test/
-- test/
-- test/
-- test/
-- test/
+- test/fixtures/hive1.hql
+- test/fixtures/hive1.in.yml
+- test/fixtures/hive1.schema.yml
+- test/fixtures/hive1.sql
+- test/fixtures/hive4_stage1.in
+- test/fixtures/hive4_stage2.in.yml
+- test/fixtures/integration_expected.yml
+- test/fixtures/integration_jobs.yml
+- test/integration/mobilize-hive_test.rb
 - test/redis-test.conf
 - test/test_helper.rb
 homepage: http://github.com/dena/mobilize-hive
@@ -67,7 +71,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: '0'
     segments:
     - 0
-    hash:
+    hash: 837156919845089008
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements:
@@ -76,7 +80,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
     segments:
     - 0
-    hash:
+    hash: 837156919845089008
 requirements: []
 rubyforge_project:
 rubygems_version: 1.8.25
@@ -84,10 +88,14 @@ signing_key:
 specification_version: 3
 summary: Adds hive read, write, and run support to mobilize-hdfs
 test_files:
-- test/
-- test/
-- test/
-- test/
-- test/
+- test/fixtures/hive1.hql
+- test/fixtures/hive1.in.yml
+- test/fixtures/hive1.schema.yml
+- test/fixtures/hive1.sql
+- test/fixtures/hive4_stage1.in
+- test/fixtures/hive4_stage2.in.yml
+- test/fixtures/integration_expected.yml
+- test/fixtures/integration_jobs.yml
+- test/integration/mobilize-hive_test.rb
 - test/redis-test.conf
 - test/test_helper.rb
```
data/test/hive_job_rows.yml DELETED
@@ -1,34 +0,0 @@
----
-- name: hive_test_1
-  active: true
-  trigger: once
-  status: ""
-  stage1: hive.write target:"mobilize/hive_test_1", partitions:"act_date", drop:true,
-    source:"Runner_mobilize(test)/hive_test_1.in", schema:"hive_test_1.schema"
-  stage2: hive.run source:"hive_test_1.hql"
-  stage3: hive.run hql:"show databases;"
-  stage4: gsheet.write source:"stage2", target:"hive_test_1_stage_2.out"
-  stage5: gsheet.write source:"stage3", target:"hive_test_1_stage_3.out"
-- name: hive_test_2
-  active: true
-  trigger: after hive_test_1
-  status: ""
-  stage1: hive.write source:"hdfs://user/mobilize/test/test_hdfs_1.out", target:"mobilize.hive_test_2", drop:true
-  stage2: hive.run hql:"select * from mobilize.hive_test_2;"
-  stage3: gsheet.write source:"stage2", target:"hive_test_2.out"
-- name: hive_test_3
-  active: true
-  trigger: after hive_test_2
-  status: ""
-  stage1: hive.run hql:"select '@date' as `date`,product,category,value from mobilize.hive_test_1;", params:{'date':'2013-01-01'}
-  stage2: hive.write source:"stage1",target:"mobilize/hive_test_3", partitions:"date/product", drop:true
-  stage3: hive.write hql:"select * from mobilize.hive_test_3;",target:"mobilize/hive_test_3", partitions:"date/product", drop:false
-  stage4: gsheet.write source:"hive://mobilize/hive_test_3", target:"hive_test_3.out"
-- name: hive_test_4
-  active: true
-  trigger: after hive_test_3
-  status: ""
-  stage1: hive.write source:"hive_test_4_stage_1.in", target:"mobilize/hive_test_1", partitions:"act_date"
-  stage2: hive.write source:"hive_test_4_stage_2.in", target:"mobilize/hive_test_1", partitions:"act_date"
-  stage3: hive.run hql:"select '$utc_date $utc_time' as `date_time`,product,category,value from mobilize.hive_test_1;"
-  stage4: gsheet.write source:stage3, target:"hive_test_4.out"
data/test/mobilize-hive_test.rb DELETED
@@ -1,112 +0,0 @@
-require 'test_helper'
-
-describe "Mobilize" do
-
-  def before
-    puts 'nothing before'
-  end
-
-  # enqueues 4 workers on Resque
-  it "runs integration test" do
-
-    puts "restart workers"
-    Mobilize::Jobtracker.restart_workers!
-
-    gdrive_slot = Mobilize::Gdrive.owner_email
-    puts "create user 'mobilize'"
-    user_name = gdrive_slot.split("@").first
-    u = Mobilize::User.where(:name=>user_name).first
-    r = u.runner
-
-    puts "add test_source data"
-    hive_1_in_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1.in",gdrive_slot)
-    [hive_1_in_sheet].each {|s| s.delete if s}
-    hive_1_in_sheet = Mobilize::Gsheet.find_or_create_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1.in",gdrive_slot)
-    hive_1_in_tsv = YAML.load_file("#{Mobilize::Base.root}/test/hive_test_1_in.yml").hash_array_to_tsv
-    hive_1_in_sheet.write(hive_1_in_tsv,Mobilize::Gdrive.owner_name)
-
-    #create blank sheet
-    hive_4_stage_1_in_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_4_stage_1.in",gdrive_slot)
-    [hive_4_stage_1_in_sheet].each {|s| s.delete if s}
-    hive_4_stage_1_in_sheet = Mobilize::Gsheet.find_or_create_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_4_stage_1.in",gdrive_slot)
-
-    #create sheet w just headers
-    hive_4_stage_2_in_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_4_stage_2.in",gdrive_slot)
-    [hive_4_stage_2_in_sheet].each {|s| s.delete if s}
-    hive_4_stage_2_in_sheet = Mobilize::Gsheet.find_or_create_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_4_stage_2.in",gdrive_slot)
-    hive_4_stage_2_in_sheet_header = hive_1_in_tsv.tsv_header_array.join("\t")
-    hive_4_stage_2_in_sheet.write(hive_4_stage_2_in_sheet_header,Mobilize::Gdrive.owner_name)
-
-    hive_1_schema_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1.schema",gdrive_slot)
-    [hive_1_schema_sheet].each {|s| s.delete if s}
-    hive_1_schema_sheet = Mobilize::Gsheet.find_or_create_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1.schema",gdrive_slot)
-    hive_1_schema_tsv = YAML.load_file("#{Mobilize::Base.root}/test/hive_test_1_schema.yml").hash_array_to_tsv
-    hive_1_schema_sheet.write(hive_1_schema_tsv,Mobilize::Gdrive.owner_name)
-
-    hive_1_hql_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1.hql",gdrive_slot)
-    [hive_1_hql_sheet].each {|s| s.delete if s}
-    hive_1_hql_sheet = Mobilize::Gsheet.find_or_create_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1.hql",gdrive_slot)
-    hive_1_hql_tsv = File.open("#{Mobilize::Base.root}/test/hive_test_1.hql").read
-    hive_1_hql_sheet.write(hive_1_hql_tsv,Mobilize::Gdrive.owner_name)
-
-    jobs_sheet = r.gsheet(gdrive_slot)
-
-    test_job_rows = ::YAML.load_file("#{Mobilize::Base.root}/test/hive_job_rows.yml")
-    test_job_rows.map{|j| r.jobs(j['name'])}.each{|j| j.delete if j}
-    jobs_sheet.add_or_update_rows(test_job_rows)
-
-    hive_1_stage_2_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1_stage_2.out",gdrive_slot)
-    [hive_1_stage_2_target_sheet].each{|s| s.delete if s}
-    hive_1_stage_3_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1_stage_3.out",gdrive_slot)
-    [hive_1_stage_3_target_sheet].each{|s| s.delete if s}
-    hive_2_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_2.out",gdrive_slot)
-    [hive_2_target_sheet].each{|s| s.delete if s}
-    hive_3_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_3.out",gdrive_slot)
-    [hive_3_target_sheet].each{|s| s.delete if s}
-    hive_4_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_4.out",gdrive_slot)
-    [hive_4_target_sheet].each{|s| s.delete if s}
-
-    puts "job row added, force enqueued requestor, wait for stages"
-    r.enqueue!
-    wait_for_stages(2100)
-
-    puts "jobtracker posted data to test sheet"
-    hive_1_stage_2_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1_stage_2.out",gdrive_slot)
-    hive_1_stage_3_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1_stage_3.out",gdrive_slot)
-    hive_2_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_2.out",gdrive_slot)
-    hive_3_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_3.out",gdrive_slot)
-    hive_4_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_4.out",gdrive_slot)
-
-    assert hive_1_stage_2_target_sheet.read(u.name).length == 219
-    assert hive_1_stage_3_target_sheet.read(u.name).length > 3
-    assert hive_2_target_sheet.read(u.name).length == 599
-    assert hive_3_target_sheet.read(u.name).length == 347
-    assert hive_4_target_sheet.read(u.name).length == 432
-  end
-
-  def wait_for_stages(time_limit=600,stage_limit=120,wait_length=10)
-    time = 0
-    time_since_stage = 0
-    #check for 10 min
-    while time < time_limit and time_since_stage < stage_limit
-      sleep wait_length
-      job_classes = Mobilize::Resque.jobs.map{|j| j['class']}
-      if job_classes.include?("Mobilize::Stage")
-        time_since_stage = 0
-        puts "saw stage at #{time.to_s} seconds"
-      else
-        time_since_stage += wait_length
-        puts "#{time_since_stage.to_s} seconds since stage seen"
-      end
-      time += wait_length
-      puts "total wait time #{time.to_s} seconds"
-    end
-
-    if time >= time_limit
-      raise "Timed out before stage completion"
-    end
-  end
-
-
-
-end