mobilize-hdfs 1.351 → 1.361
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/README.md +3 -233
- data/lib/mobilize-hdfs.rb +3 -0
- data/lib/mobilize-hdfs/handlers/hdfs.rb +1 -11
- data/lib/mobilize-hdfs/tasks.rb +3 -2
- data/lib/mobilize-hdfs/version.rb +1 -1
- data/mobilize-hdfs.gemspec +1 -1
- data/test/fixtures/hdfs1.in.yml +91 -0
- data/test/fixtures/integration_expected.yml +17 -0
- data/test/fixtures/integration_jobs.yml +10 -0
- data/test/integration/mobilize-hdfs_test.rb +42 -0
- data/test/test_helper.rb +1 -0
- metadata +14 -10
- data/test/hdfs_job_rows.yml +0 -10
- data/test/mobilize-hdfs_test.rb +0 -70
data/README.md CHANGED
@@ -1,234 +1,4 @@
-Mobilize
-
+Mobilize
+========
 
-
-* read, write, and copy hdfs files through Google
-Spreadsheets.
-
-Table Of Contents
------------------
-* [Overview](#section_Overview)
-* [Install](#section_Install)
-  * [Mobilize-Hdfs](#section_Install_Mobilize-Hdfs)
-  * [Install Dirs and Files](#section_Install_Dirs_and_Files)
-* [Configure](#section_Configure)
-  * [Hadoop](#section_Configure_Hadoop)
-* [Start](#section_Start)
-  * [Create Job](#section_Start_Create_Job)
-  * [Run Test](#section_Start_Run_Test)
-* [Meta](#section_Meta)
-  * [Author](#section_Author)
-
-<a name='section_Overview'></a>
-Overview
------------
-
-* Mobilize-hdfs adds Hdfs methods to mobilize-ssh.
-
-<a name='section_Install'></a>
-Install
-------------
-
-Make sure you go through all the steps in the
-[mobilize-base][mobilize-base] and [mobilize-ssh][mobilize-ssh]
-install sections first.
-
-<a name='section_Install_Mobilize-Hdfs'></a>
-### Mobilize-Hdfs
-
-add this to your Gemfile:
-
-``` ruby
-gem "mobilize-hdfs"
-```
-
-or do
-
-    $ gem install mobilize-hdfs
-
-for a ruby-wide install.
-
-<a name='section_Install_Dirs_and_Files'></a>
-### Dirs and Files
-
-### Rakefile
-
-Inside the Rakefile in your project's root dir, make sure you have:
-
-``` ruby
-require 'mobilize-base/tasks'
-require 'mobilize-ssh/tasks'
-require 'mobilize-hdfs/tasks'
-```
-
-This defines rake tasks essential to run the environment.
-
-### Config Dir
-
-run
-
-    $ rake mobilize_hdfs:setup
-
-This will copy over a sample hadoop.yml to your config dir.
-
-<a name='section_Configure'></a>
-Configure
-------------
-
-<a name='section_Configure_Hadoop'></a>
-### Configure Hadoop
-
-* Hadoop is big data. That means we need to be careful when reading from
-the cluster as it could easily fill up our mongodb instance, RAM, local disk
-space, etc.
-* To achieve this, all hadoop operations, stage outputs, etc. are
-executed and stored on the cluster only.
-* The exceptions are:
-  * writing to the cluster from an external source, such as a google
-sheet. Here there
-is no risk as the external source has much more strict size limits than
-hdfs.
-  * reading from the cluster, such as for posting to google sheet. In
-this case, the read_limit parameter dictates the maximum amount that can
-be read. If the data is bigger than the read limit, an exception will be
-raised.
-
-The Hadoop configuration consists of:
-* output_dir, which is the absolute path to the directory in HDFS that will store stage
-outputs. Directory names should end with a slash (/). It will choose the
-first cluster as the default cluster to write to.
-* read_limit, which is the maximum size data that can be read from the
-cluster. Default is 1GB.
-* clusters - this defines aliases for clusters, which are used as
-parameters for Hdfs stages. Cluster aliases contain 5 parameters:
-  * namenode - defines the name and port for accessing the namenode
-    * name - namenode full name, as in namenode1.host.com
-    * port - namenode port, by default 50070
-  * gateway_node - defines the node that executes the cluster commands.
-  * exec_path - defines the path to the hadoop
-This node must be defined in ssh.yml according to the specs in
-[mobilize-ssh][mobilize-ssh]. The gateway node can be the same for
-multiple clusters, depending on your cluster setup.
-
-Sample hadoop.yml:
-
-``` yml
----
-development:
-  output_dir: /user/mobilize/development/
-  read_limit: 1000000000
-  clusters:
-    dev_cluster:
-      namenode:
-        name: dev_namenode.host.com
-        port: 50070
-      gateway_node: dev_hadoop_host
-      exec_path: /path/to/hadoop
-    dev_cluster_2:
-      namenode:
-        name: dev_namenode_2.host.com
-        port: 50070
-      gateway_node: dev_hadoop_host
-      exec_path: /path/to/hadoop
-test:
-  output_dir: /user/mobilize/test/
-  read_limit: 1000000000
-  clusters:
-    test_cluster:
-      namenode:
-        name: test_namenode.host.com
-        port: 50070
-      gateway_node: test_hadoop_host
-      exec_path: /path/to/hadoop
-    test_cluster_2:
-      namenode:
-        name: test_namenode_2.host.com
-        port: 50070
-      gateway_node: test_hadoop_host
-      exec_path: /path/to/hadoop
-production:
-  output_dir: /user/mobilize/production/
-  read_limit: 1000000000
-  clusters:
-    prod_cluster:
-      namenode:
-        name: prod_namenode.host.com
-        port: 50070
-      gateway_node: prod_hadoop_host
-      exec_path: /path/to/hadoop
-    prod_cluster_2:
-      namenode:
-        name: prod_namenode_2.host.com
-        port: 50070
-      gateway_node: prod_hadoop_host
-      exec_path: /path/to/hadoop
-```
-
-<a name='section_Start'></a>
-Start
------
-
-<a name='section_Start_Create_Job'></a>
-### Create Job
-
-* For mobilize-hdfs, the following stages are available.
-  * cluster and user are optional for all of the below.
-    * cluster defaults to output_cluster;
-    * user is treated the same way as in [mobilize-ssh][mobilize-ssh].
-  * hdfs.write `source:<full_path>, target:<hdfs_full_path>, user:<user>`
-    * The full_path can use `<gsheet_path>` or `<hdfs_path>`. The test uses "test_hdfs_1.in".
-    * `<hdfs_path>` is the cluster alias followed by absolute path on the cluster.
-      * if a full path is supplied without a preceding cluster alias (e.g. "/user/mobilize/test/test_hdfs_1.in"),
-the first listed cluster will be used as the default.
-      * The test uses "/user/mobilize/test/test_hdfs_1.in" for the initial
-write, then "test_cluster_2/user/mobilize/test/test_hdfs_copy.out" for
-the cross-cluster write.
-    * both cluster arguments and user are optional. If writing from
-one cluster to another, your source_cluster gateway_node must be able to
-access both clusters.
-
-<a name='section_Start_Run_Test'></a>
-### Run Test
-
-To run tests, you will need to
-
-1) go through the [mobilize-base][mobilize-base] and [mobilize-ssh][mobilize-ssh] tests first
-
-2) clone the mobilize-hdfs repository
-
-From the project folder, run
-
-3) $ rake mobilize_hdfs:setup
-
-Copy over the config files from the mobilize-base and mobilize-ssh
-projects into the config dir, and populate the values in the hadoop.yml file.
-
-If you don't have two clusters, you can populate test_cluster_2 with the
-same cluster as your first.
-
-3) $ rake test
-
-* The test runs a 3 stage job:
-  * test_hdfs_1:
-    * `hdfs.write target:"/user/mobilize/test/test_hdfs_1.out", source:"test_hdfs_1.in"`
-    * `hdfs.write source:"/user/mobilize/test/test_hdfs_1.out",target:"test_cluster_2/user/mobilize/test/test_hdfs_1_copy.out"`
-    * `gsheet.write source:"hdfs://test_cluster_2/user/mobilize/test/test_hdfs_1_copy.out", target:"test_hdfs_1_copy.out"`
-  * at the end of the test, there should be a sheet named "test_hdfs_1_copy.out" with the same data as test_hdfs_1.in
-
-<a name='section_Meta'></a>
-Meta
-----
-
-* Code: `git clone git://github.com/dena/mobilize-hdfs.git`
-* Home: <https://github.com/dena/mobilize-hdfs>
-* Bugs: <https://github.com/dena/mobilize-hdfs/issues>
-* Gems: <http://rubygems.org/gems/mobilize-hdfs>
-
-<a name='section_Author'></a>
-Author
-------
-
-Cassio Paes-Leme :: cpaesleme@dena.com :: @cpaesleme
-
-[mobilize-base]: https://github.com/dena/mobilize-base
-[mobilize-ssh]: https://github.com/dena/mobilize-ssh
+Please refer to the mobilize-server wiki: https://github.com/DeNA/mobilize-server/wiki
data/lib/mobilize-hdfs/handlers/hdfs.rb CHANGED
@@ -127,17 +127,7 @@ module Mobilize
       path = path.starts_with?("/") ? path : "/#{path}"
     end
     url = "hdfs://#{cluster}#{path}"
-
-    begin
-      response = Hadoop.run(cluster, "fs -tail '#{hdfs_url}'", user_name)
-      if response['exit_code']==0 or is_target
-        return "hdfs://#{cluster}#{path}"
-      else
-        raise "Unable to find #{url} with error: #{response['stderr']}"
-      end
-    rescue => exc
-      raise Exception, "Unable to find #{url} with error: #{exc.to_s}", exc.backtrace
-    end
+    return url
   end

   def Hdfs.user_name_by_stage_path(stage_path,cluster=nil)
data/lib/mobilize-hdfs/tasks.rb CHANGED
@@ -1,6 +1,7 @@
-
+require 'yaml'
+namespace :mobilize do
   desc "Set up config and log folders and files"
-  task :
+  task :setup_hdfs do
     sample_dir = File.dirname(__FILE__) + '/../samples/'
     sample_files = Dir.entries(sample_dir)
     config_dir = (ENV['MOBILIZE_CONFIG_DIR'] ||= "config/mobilize/")
data/mobilize-hdfs.gemspec CHANGED
@@ -16,5 +16,5 @@ Gem::Specification.new do |gem|
   gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
   gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
   gem.require_paths = ["lib"]
-  gem.add_runtime_dependency "mobilize-ssh","1.351"
+  gem.add_runtime_dependency "mobilize-ssh","1.361"
 end
data/test/fixtures/hdfs1.in.yml ADDED
@@ -0,0 +1,91 @@
+---
+- test0: test0
+  test1: test1
+  test2: test2
+  test3: test3
+  test4: test4
+  test5: test5
+  test6: test6
+  test7: test7
+  test8: test8
+  test9: test9
+- test0: test0
+  test1: test1
+  test2: test2
+  test3: test3
+  test4: test4
+  test5: test5
+  test6: test6
+  test7: test7
+  test8: test8
+  test9: test9
+- test0: test0
+  test1: test1
+  test2: test2
+  test3: test3
+  test4: test4
+  test5: test5
+  test6: test6
+  test7: test7
+  test8: test8
+  test9: test9
+- test0: test0
+  test1: test1
+  test2: test2
+  test3: test3
+  test4: test4
+  test5: test5
+  test6: test6
+  test7: test7
+  test8: test8
+  test9: test9
+- test0: test0
+  test1: test1
+  test2: test2
+  test3: test3
+  test4: test4
+  test5: test5
+  test6: test6
+  test7: test7
+  test8: test8
+  test9: test9
+- test0: test0
+  test1: test1
+  test2: test2
+  test3: test3
+  test4: test4
+  test5: test5
+  test6: test6
+  test7: test7
+  test8: test8
+  test9: test9
+- test0: test0
+  test1: test1
+  test2: test2
+  test3: test3
+  test4: test4
+  test5: test5
+  test6: test6
+  test7: test7
+  test8: test8
+  test9: test9
+- test0: test0
+  test1: test1
+  test2: test2
+  test3: test3
+  test4: test4
+  test5: test5
+  test6: test6
+  test7: test7
+  test8: test8
+  test9: test9
+- test0: test0
+  test1: test1
+  test2: test2
+  test3: test3
+  test4: test4
+  test5: test5
+  test6: test6
+  test7: test7
+  test8: test8
+  test9: test9
data/test/fixtures/integration_expected.yml ADDED
@@ -0,0 +1,17 @@
+---
+- path: "Runner_mobilize(test)/jobs"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hdfs1/stage1"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hdfs1/stage2"
+  state: working
+  count: 1
+  confirmed_ats: []
+- path: "Runner_mobilize(test)/jobs/hdfs1/stage3"
+  state: working
+  count: 1
+  confirmed_ats: []
data/test/fixtures/integration_jobs.yml ADDED
@@ -0,0 +1,10 @@
+- name: hdfs1
+  active: true
+  trigger: once
+  status: ""
+  stage1: hdfs.write target:"/user/mobilize/test/hdfs1.out",
+    source:"hdfs1.in"
+  stage2: hdfs.write source:"/user/mobilize/test/hdfs1.out",
+    target:"test_cluster_2/user/mobilize/test/hdfs1_copy.out",
+  stage3: gsheet.write source:"hdfs://test_cluster_2/user/mobilize/test/hdfs1_copy.out",
+    target:"Runner_mobilize(test)/hdfs1_copy.out"
data/test/integration/mobilize-hdfs_test.rb ADDED
@@ -0,0 +1,42 @@
+require 'test_helper'
+describe "Mobilize" do
+  # enqueues 4 workers on Resque
+  it "runs integration test" do
+
+    puts "restart workers"
+    Mobilize::Jobtracker.restart_workers!
+
+    u = TestHelper.owner_user
+    r = u.runner
+    user_name = u.name
+    gdrive_slot = u.email
+
+    puts "add test data"
+    ["hdfs1.in"].each do |fixture_name|
+      target_url = "gsheet://#{r.title}/#{fixture_name}"
+      TestHelper.write_fixture(fixture_name, target_url, 'replace')
+    end
+
+    puts "add/update jobs"
+    u.jobs.each{|j| j.delete}
+    jobs_fixture_name = "integration_jobs"
+    jobs_target_url = "gsheet://#{r.title}/jobs"
+    TestHelper.write_fixture(jobs_fixture_name, jobs_target_url, 'update')
+
+    puts "job rows added, force enqueue runner, wait for stages"
+    #wait for stages to complete
+    expected_fixture_name = "integration_expected"
+    Mobilize::Jobtracker.stop!
+    r.enqueue!
+    TestHelper.confirm_expected_jobs(expected_fixture_name)
+
+    puts "update job status and activity"
+    r.update_gsheet(gdrive_slot)
+
+    puts "check posted data"
+    ['hdfs1_copy.out'].each do |out_name|
+      url = "gsheet://#{r.title}/#{out_name}"
+      assert TestHelper.check_output(url, 'min_length' => 599) == true
+    end
+  end
+end
data/test/test_helper.rb
CHANGED
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: mobilize-hdfs
 version: !ruby/object:Gem::Version
-  version: '1.351'
+  version: '1.361'
 prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-
+date: 2013-05-31 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: mobilize-ssh
@@ -18,7 +18,7 @@ dependencies:
     requirements:
     - - '='
      - !ruby/object:Gem::Version
-        version: '1.351'
+        version: '1.361'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
@@ -26,7 +26,7 @@ dependencies:
     requirements:
    - - '='
      - !ruby/object:Gem::Version
-        version: '1.351'
+        version: '1.361'
 description: Adds hdfs read, write, and copy support to mobilize-ssh
 email:
 - cpaesleme@dena.com
@@ -46,8 +46,10 @@ files:
 - lib/mobilize-hdfs/version.rb
 - lib/samples/hadoop.yml
 - mobilize-hdfs.gemspec
-- test/hdfs_job_rows.yml
-- test/mobilize-hdfs_test.rb
+- test/fixtures/hdfs1.in.yml
+- test/fixtures/integration_expected.yml
+- test/fixtures/integration_jobs.yml
+- test/integration/mobilize-hdfs_test.rb
 - test/redis-test.conf
 - test/test_helper.rb
 homepage: http://github.com/dena/mobilize-hdfs
@@ -64,7 +66,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: '0'
       segments:
       - 0
-      hash:
+      hash: 2190980204946063989
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements:
@@ -73,7 +75,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
       segments:
      - 0
-      hash:
+      hash: 2190980204946063989
 requirements: []
 rubyforge_project:
 rubygems_version: 1.8.25
@@ -81,7 +83,9 @@ signing_key:
 specification_version: 3
 summary: Adds hdfs read, write, and copy support to mobilize-ssh
 test_files:
-- test/hdfs_job_rows.yml
-- test/mobilize-hdfs_test.rb
+- test/fixtures/hdfs1.in.yml
+- test/fixtures/integration_expected.yml
+- test/fixtures/integration_jobs.yml
+- test/integration/mobilize-hdfs_test.rb
 - test/redis-test.conf
 - test/test_helper.rb
data/test/hdfs_job_rows.yml DELETED
@@ -1,10 +0,0 @@
-- name: test_hdfs_1
-  active: true
-  trigger: once
-  status: ""
-  stage1: hdfs.write target:"/user/mobilize/test/test_hdfs_1.out",
-    source:"test_hdfs_1.in"
-  stage2: hdfs.write source:"/user/mobilize/test/test_hdfs_1.out",
-    target:"test_cluster_2/user/mobilize/test/test_hdfs_1_copy.out",
-  stage3: gsheet.write source:"hdfs://test_cluster_2/user/mobilize/test/test_hdfs_1_copy.out",
-    target:"Runner_mobilize(test)/test_hdfs_1_copy.out"
data/test/mobilize-hdfs_test.rb DELETED
@@ -1,70 +0,0 @@
-require 'test_helper'
-
-describe "Mobilize" do
-
-  def before
-    puts 'nothing before'
-  end
-
-  # enqueues 4 workers on Resque
-  it "runs integration test" do
-
-    puts "restart workers"
-    Mobilize::Jobtracker.restart_workers!
-
-    gdrive_slot = Mobilize::Gdrive.owner_email
-    puts "create user 'mobilize'"
-    user_name = gdrive_slot.split("@").first
-    u = Mobilize::User.where(:name=>user_name).first
-    r = u.runner
-    hdfs_1_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/test_hdfs_1.in",gdrive_slot)
-    [hdfs_1_sheet].each {|s| s.delete if s}
-
-    puts "add test_source data"
-    hdfs_1_sheet = Mobilize::Gsheet.find_or_create_by_path("#{r.path.split("/")[0..-2].join("/")}/test_hdfs_1.in",gdrive_slot)
-    hdfs_1_tsv = ([%w{test0 test1 test2 test3 test4 test5 test6 test7 test8 test9}.join("\t")]*10).join("\n")
-    hdfs_1_sheet.write(hdfs_1_tsv,u.name)
-
-    jobs_sheet = r.gsheet(gdrive_slot)
-
-    test_job_rows = ::YAML.load_file("#{Mobilize::Base.root}/test/hdfs_job_rows.yml")
-    test_job_rows.map{|j| r.jobs(j['name'])}.each{|j| j.delete if j}
-    jobs_sheet.add_or_update_rows(test_job_rows)
-
-    hdfs_1_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/test_hdfs_1_copy.out",gdrive_slot)
-    [hdfs_1_target_sheet].each {|s| s.delete if s}
-
-    puts "job row added, force enqueued requestor, wait for stages"
-    r.enqueue!
-    wait_for_stages
-
-    puts "jobtracker posted data to test sheet"
-    test_destination_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/test_hdfs_1_copy.out",gdrive_slot)
-
-    assert test_destination_sheet.read(u.name).length == 599
-  end
-
-  def wait_for_stages(time_limit=600,stage_limit=120,wait_length=10)
-    time = 0
-    time_since_stage = 0
-    #check for 10 min
-    while time < time_limit and time_since_stage < stage_limit
-      sleep wait_length
-      job_classes = Mobilize::Resque.jobs.map{|j| j['class']}
-      if job_classes.include?("Mobilize::Stage")
-        time_since_stage = 0
-        puts "saw stage at #{time.to_s} seconds"
-      else
-        time_since_stage += wait_length
-        puts "#{time_since_stage.to_s} seconds since stage seen"
-      end
-      time += wait_length
-      puts "total wait time #{time.to_s} seconds"
-    end
-
-    if time >= time_limit
-      raise "Timed out before stage completion"
-    end
-  end
-
-end