pwrake 2.1.3 → 2.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: db880d8483f79d6812bd1920557350a7b4311378
- data.tar.gz: a1d2779a59eb21dd4df2b8ee5e9a00054a50654e
+ metadata.gz: f781cdfe39747649da0fafb2ebb37e579d294483
+ data.tar.gz: 5985be277367367065e9fef64188fc43ec8a2de4
  SHA512:
- metadata.gz: 4bc21c0d7a5ef317d5d03883ca918e2c2218e0ec28e13167079127820e0c4800923f4d23c6891e4eff5dcdfb010db4b2d6214a6d9eedf5c6c5597a1353467d7a
- data.tar.gz: b851a5ffb98ddccf77b52bbd5c4339a900d617ce8018e9c68ee665999ac9a70a8b5295be21c88832a8d9f20cd257da81b9b4c91715f487fde0711a60db4e9931
+ metadata.gz: 847964dd1146902e2f6f5988a26188184845e1b7879ed43a311560266e90d33d3c5febe7bb6727cc2003075b540046028a6cbaea6c28451888866d1e43aa1cfe
+ data.tar.gz: 76af1e8e961e9ff22e1c6ca0626a416eb73f2f43a9ce83c0c4b72a311841fd319b6a99c1fbf9594cbf5a4efafe8dc00833dae772550b705837f58e40f48ad69a
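The SHA512 entries above are the digests RubyGems records for the two archives packed inside the `.gem` file. Below is a minimal sketch for checking an extracted archive against one of these values. It assumes you have already unpacked the `.gem`, which is a plain tar file, so that `data.tar.gz` sits on disk. The helper name `checksum_ok?` is hypothetical.

```ruby
require "digest"

# Hypothetical helper: compare a file's SHA512 hex digest with an expected
# value copied from checksums.yaml (hex comparison is case-insensitive).
def checksum_ok?(path, expected_sha512)
  Digest::SHA512.file(path).hexdigest == expected_sha512.downcase
end

# Usage, assuming the archive was extracted with `tar -xf pwrake-2.2.0.gem`:
#   expected = "<SHA512 of data.tar.gz from the + line above>"
#   puts checksum_ok?("data.tar.gz", expected)
```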
data/.gitignore CHANGED
@@ -30,3 +30,5 @@ tmp
  rhosts
  spec/*/*.dat
  spec/*/*.csv
+ spec/*/hoge*
+ spec/*/Pwrake*
data/README.md CHANGED
@@ -3,9 +3,9 @@
  Parallel Workflow extension for Rake, runs on multicores, clusters, clouds.
  * Author: Masahiro Tanaka
 
- ([README in Japanese](https://github.com/masa16/pwrake/wiki/Pwrakeとは)),
- ([GitHub Repository](https://github.com/masa16/pwrake)),
- ([RubyGems](https://rubygems.org/gems/pwrake))
+ [README in Japanese](https://github.com/masa16/pwrake/wiki/Pwrakeとは),
+ [GitHub Repository](https://github.com/masa16/pwrake),
+ [RubyGems](https://rubygems.org/gems/pwrake)
 
  ## Features
 
@@ -14,7 +14,7 @@ Parallel Workflow extension for Rake, runs on multicores, clusters, clouds.
  * The tasks which do not have mutual dependencies are automatically executed in parallel.
  * The `multitask` which is a parallel task definition of Rake is no more necessary.
  * Parallel and distributed execution is possible using a computer cluster which consists of multiple compute nodes.
- * Cluster settings: SSH login, and the directory sharing using a shared filesystem, e.g., NFS, Gfarm.
+ * Cluster settings: SSH login (or MPI), and the directory sharing using a shared filesystem, e.g., NFS, Gfarm.
  * Pwrake automatically connects to remote hosts using SSH. You do not need to start a daemon.
  * Remote host names and the number of cores to use are provided in a hostfile.
  * [Gfarm file system](http://sourceforge.net/projects/gfarm/) utilizes storage of compute nodes. It provides the high-performance parallel I/O.
@@ -68,7 +68,15 @@ In this case, you need the rehash of command paths:
 
  4. Run `pwrake` with an option `--hostfile` or `-F`:
 
- $ pwrake --hostfile=hosts
+ $ pwrake -F hosts
+
+ ### Use MPI to start remote worker
+
+ 1. Setup MPI on your cluster.
+ 2. Install [MPipe gem](https://rubygems.org/gems/mpipe). (requires `mpicc`)
+ 3. Run `pwrake-mpi` command.
+
+ $ pwrake-mpi -F hosts
 
  ## Options
 
@@ -115,8 +123,6 @@ In this case, you need the rehash of command paths:
  WORK_DIR default=$PWD
  FILESYSTEM default(autodetect)|gfarm
  SSH_OPTION SSH option
- SHELL_COMMAND default=$SHELL
- SHELL_RC Run-Command when shell starts
  PASS_ENV (Array) Environment variables passed to SSH
  HEARTBEAT default=240 - Hearbeat interval in seconds
  RETRY default=1 - The number of retry
@@ -177,7 +183,7 @@ Properties (The leftmost item is default):
 
  gem install ffi
 
- ## For Graph Partitioning
+ ## Scheduling with Graph Partitioning
 
  * Compile and Install METIS 5.1.0 (http://www.cs.umn.edu/~metis/). This requires CMake.
 
@@ -187,15 +193,22 @@ Properties (The leftmost item is default):
  --with-metis-include=/usr/local/include \
  --with-metis-lib=/usr/local/lib
 
+ * Option (`pwrake_conf.yaml`):
+
+ GRAPH_PARTITION: true
+
+ * See publication: [M. Tanaka and O. Tatebe, “Workflow Scheduling to Minimize Data Movement Using Multi-constraint Graph Partitioning,” in CCGrid 2012](http://ieeexplore.ieee.org/abstract/document/6217406/)
+
  ## Current version
 
- * Pwrake version 2.0.0
+ * Pwrake version 2.2.0
 
  ## Tested Platform
 
- * Ruby 2.2.2
- * Rake 10.4.2
- * CentOS 6.7
+
+ * Ruby 2.4.0
+ * Rake 12.0.0
+ * CentOS 7.3
 
  ## Acknowledgment
 
@@ -0,0 +1,41 @@
+ #!/usr/bin/env ruby
+
+ begin
+ require 'rake'
+ rescue LoadError
+ require 'rubygems'
+ require 'rake'
+ end
+
+ libpath = File.absolute_path(File.dirname(__FILE__))+"/../lib"
+ $LOAD_PATH.unshift libpath
+
+ require "pwrake/version"
+ require "pwrake/master/master_application"
+ require "shellwords"
+
+ module Pwrake
+ module MasterApplication
+ def run
+ standard_exception_handling do
+ init("pwrake") # <- parse options here
+ opts = Option.new
+ hosts = opts.host_map.map{|b,a| a.map{|h| h.name}}.flatten
+ if opts['MASTER_IS_FIRST_HOST']
+ [hosts[0],*hosts]
+ else
+ [Socket.gethostname,*hosts]
+ end
+ end
+ end
+ end;end
+
+ class Rake::Application
+ prepend Pwrake::MasterApplication
+ end
+
+ hosts = Rake.application.run.join(',')
+ args = ARGV.map{|x| Shellwords.escape(x)}.join(" ")
+
+ cmd="mpirun -wdir \"$HOME\" -host #{hosts} pwrake-mpi-run \"$PWD\" #{args}"
+ exec cmd
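The new launcher uses Pwrake's own option parsing to resolve the host list, then hands everything to `mpirun`. The first entry in `-host` is either the master's own hostname or, with `MASTER_IS_FIRST_HOST`, a repeat of the first hostfile entry. Presumably this is so that MPI rank 0, which becomes the branch, lands on that machine; the actual rank placement depends on the MPI implementation. The string assembly can be isolated as a pure function for testing. `build_mpirun_command` is a hypothetical name, and the quoting mirrors the script above.

```ruby
require "shellwords"

# Hypothetical helper mirroring the command string built by the launcher.
#   master_host  - host listed first (intended for rank 0, the branch)
#   worker_hosts - hostnames taken from the hostfile
#   argv         - the user's pwrake arguments, shell-escaped one by one
def build_mpirun_command(master_host, worker_hosts, argv)
  hosts = [master_host, *worker_hosts].join(",")
  args = argv.map { |x| Shellwords.escape(x) }.join(" ")
  # $HOME and $PWD are left for the shell to expand, as in the original script.
  "mpirun -wdir \"$HOME\" -host #{hosts} pwrake-mpi-run \"$PWD\" #{args}"
end

# build_mpirun_command("master01", %w[node01 node02], ["-F", "hosts", "out dir/result.txt"])
# → mpirun -wdir "$HOME" -host master01,node01,node02 pwrake-mpi-run "$PWD" -F hosts out\ dir/result.txt
```

Escaping each argument separately keeps a file name with a space as one word when `exec` passes the whole string to the shell.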
@@ -0,0 +1,16 @@
+ #!/usr/bin/env ruby
+
+ libpath = File.absolute_path(File.dirname(__FILE__))+"/../lib"
+ $LOAD_PATH.unshift libpath
+
+ require "mpipe"
+ MPipe.init
+
+ wdir = ARGV.shift
+
+ if MPipe::Comm.rank == 0
+ Dir.chdir(wdir)
+ require "pwrake/mpi/branch"
+ else
+ require "pwrake/mpi/worker"
+ end
@@ -9,6 +9,12 @@ module Pwrake
 
  class Branch
 
+ @@io_class = IO
+
+ def self.io_class=(io_class)
+ @@io_class = io_class
+ end
+
  def initialize(opts,r,w)
  Thread.abort_on_exception = true
  @option = opts
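Making the IO class a class-level setting lets a caller swap in a different implementation before any `Branch` is constructed. Presumably this lets the MPI entry point use an MPipe-backed class instead of `IO`. How `NBIO::Selector` uses the class is not shown in this diff. The sketch below shows the same injection pattern. `DemoSelector` and `FakeIO` are hypothetical stand-ins, and the only assumed contract is a class-level `select(readers, writers, errors, timeout)` like `IO.select`.

```ruby
# Hypothetical selector that delegates readiness checks to an injectable class.
class DemoSelector
  def initialize(io_class = IO)
    @io_class = io_class
  end

  # Returns the readers that are ready, or [] when select times out (nil).
  def ready_readers(readers, timeout)
    result = @io_class.select(readers, nil, nil, timeout)
    result ? result[0] : []
  end
end

# Hypothetical stand-in with the same class-level signature as IO.select.
class FakeIO
  def self.select(readers, _writers, _errors, _timeout)
    [readers.first(1), [], []]   # pretend only the first reader is ready
  end
end

# Usage: the default uses real pipes; FakeIO needs no file descriptors at all.
r, w = IO.pipe
w.write("x")
ready = DemoSelector.new.ready_readers([r], 0)          # [r], data is buffered
fake = DemoSelector.new(FakeIO).ready_readers([:a, :b], 0)
```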
@@ -16,10 +22,17 @@ module Pwrake
  @shells = []
  @ior = r
  @iow = w
- @selector = NBIO::Selector.new
+ @selector = NBIO::Selector.new(@@io_class)
  @master_rd = NBIO::Reader.new(@selector,@ior)
  @master_wt = NBIO::Writer.new(@selector,@iow)
  @shell_start_interval = @option['SHELL_START_INTERVAL']
+
+ # init_logger
+ Log.set_logger(@option)
+ if dir = @option['LOG_DIR']
+ fn = File.join(dir,@option["COMMAND_CSV_FILE"])
+ Shell.profiler.open(fn,@option['GNU_TIME'],@option['PLOT_PARALLELISM'])
+ end
  end
 
  # Rakefile is loaded after 'init' before 'run'
@@ -33,34 +46,6 @@ module Pwrake
  Log.debug "Branch#run end"
  end
 
- attr_reader :logger
-
- def init_logger
- if dir = @option['LOG_DIR']
- logfile = File.join(dir,@option['LOG_FILE'])
- @logger = Logger.new(logfile)
- else
- if @option['DEBUG']
- @logger = Logger.new($stderr)
- else
- @logger = Logger.new(File::NULL)
- end
- end
-
- if @option['DEBUG']
- @logger.level = Logger::DEBUG
- elsif @option['TRACE']
- @logger.level = Logger::INFO
- else
- @logger.level = Logger::WARN
- end
-
- if dir = @option['LOG_DIR']
- fn = File.join(dir,@option["COMMAND_CSV_FILE"])
- Shell.profiler.open(fn,@option['GNU_TIME'],@option['PLOT_PARALLELISM'])
- end
- end
-
  def setup_worker
  @cs = CommunicatorSet.new(@master_rd,@selector,@option.worker_option)
  @cs.create_communicators
@@ -76,6 +61,9 @@ module Pwrake
  @cs.each_value do |comm|
  # set WorkerChannel#ncore at Master
  @master_wt.put_line "ncore:#{comm.id}:#{comm.ncore}"
+ comm.ipaddr.each do |ipa|
+ @master_wt.put_line "ip:#{comm.id}:#{ipa}"
+ end
  end
  @master_wt.put_line "ncore:done"
  end.resume
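After this change, the branch reports one `ncore:` line for each worker communicator, followed by one `ip:` line per address in `comm.ipaddr`. The list ends with `ncore:done`. The master-side reader is not part of this diff. The following is a hypothetical sketch of how such a line stream could be folded into a per-worker table. The hash layout and the string-valued ids are assumptions, not Pwrake's own data structures.

```ruby
# Hypothetical master-side parser for the branch's status lines:
#   "ncore:<id>:<n>", "ip:<id>:<addr>", terminated by "ncore:done".
def parse_worker_info(lines)
  info = Hash.new { |h, id| h[id] = { ncore: nil, ip: [] } }
  lines.each do |line|
    case line.chomp
    when "ncore:done"
      break
    when /\Ancore:([^:]+):(\d+)\z/
      info[$1][:ncore] = $2.to_i
    when /\Aip:([^:]+):(.+)\z/
      # The id stops at the first ':', so IPv6 addresses keep their colons.
      info[$1][:ip] << $2
    end
  end
  info
end
```

Anything after `ncore:done` is ignored, which matches the branch sending that line only once all communicators have been reported.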
@@ -6,51 +6,41 @@ module Pwrake
  # The TaskManager module is a mixin for managing tasks.
  module BranchApplication
 
- def logger
- @branch.logger
- end
-
  def run_branch(r,w)
- #standard_exception_handling do
- init("pwrake_branch")
- opts = Marshal.load(r)
- if !opts.kind_of?(Hash)
- raise "opts is not a Hash: opts=#{opts.inspect}"
- end
- @branch = Branch.new(opts,r,w)
- @branch.init_logger
- opts.feedback_options
- load_rakefile
- w.puts "pwrake_branch start"
- w.flush
- begin
- @branch.run
- rescue => e
- Log.fatal e
- $stderr.puts e
- $stderr.puts e.backtrace
- @branch.kill
- ensure
- @branch.finish
- end
- #end
+ init("pwrake_branch")
+ opts = Marshal.load(r)
+ if !opts.kind_of?(Hash)
+ raise "opts is not a Hash: opts=#{opts.inspect}"
+ end
+ @branch = Branch.new(opts,r,w)
+ opts.feedback_options
+ load_rakefile
+ w.puts "pwrake_branch start"
+ w.flush
+ begin
+ @branch.run
+ rescue => e
+ Log.fatal e
+ $stderr.puts e
+ $stderr.puts e.backtrace
+ @branch.kill
+ ensure
+ @branch.finish
+ end
  end
 
  def run_branch_in_thread(r,w,opts)
- #standard_exception_handling do
- @branch = Branch.new(opts,r,w)
- @branch.init_logger
- begin
- @branch.run
- rescue => e
- Log.fatal e
- $stderr.puts e
- $stderr.puts e.backtrace
- @branch.kill
- ensure
- @branch.finish
- end
- #end
+ @branch = Branch.new(opts,r,w)
+ begin
+ @branch.run
+ rescue => e
+ Log.fatal e
+ $stderr.puts e
+ $stderr.puts e.backtrace
+ @branch.kill
+ ensure
+ @branch.finish
+ end
  end
 
  end
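Both entry points now share the same shape. They construct the branch (which, after this release, also sets up the logger), run it, kill it on any `StandardError`, and always call `finish`. The control flow can be checked in isolation with a stub. `StubBranch` and `run_guarded` are hypothetical names, and `warn` stands in for the `Log.fatal` and `$stderr` output.

```ruby
# Hypothetical stub that records which lifecycle methods were called.
class StubBranch
  attr_reader :calls

  def initialize(fail_on_run)
    @fail_on_run = fail_on_run
    @calls = []
  end

  def run
    @calls << :run
    raise "worker lost" if @fail_on_run
  end

  def kill
    @calls << :kill
  end

  def finish
    @calls << :finish
  end
end

# Same begin/rescue/ensure shape as run_branch_in_thread above.
def run_guarded(branch)
  begin
    branch.run
  rescue => e
    warn "branch error: #{e.message}"   # stands in for Log.fatal and $stderr output
    branch.kill
  ensure
    branch.finish
  end
  branch.calls
end
```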
@@ -38,6 +38,7 @@ class Communicator
  attr_reader :id, :host, :ncore, :channel
  attr_reader :reader, :writer, :handler
  attr_reader :shells
+ attr_reader :ipaddr
 
  def initialize(set,id,host,ncore,selector,option)
  @set = set
@@ -47,6 +48,7 @@ class Communicator
  @selector = selector
  @option = option
  @shells = {}
+ @ipaddr = []
  end
 
  def inspect
@@ -58,10 +60,9 @@ class Communicator
  CommChannel.new(@host,i,q,@writer,[@ior,@iow,@ioe])
  end
 
- def connect(worker_code)
+ def setup_pipe(worker_code)
  rb_cmd = "ruby -e 'eval ARGF.read(#{worker_code.size})'"
- if ['localhost','localhost.localdomain','127.0.0.1'].include? @host
- #if /^localhost/ =~ @host
+ if %w[127.0.0.1 ::1].include?(IPSocket.getaddress(@host))
  cmd = rb_cmd
  else
  cmd = "ssh -x -T #{@option[:ssh_option]} #{@host} \"#{rb_cmd}\""
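The local-host test now resolves the name and compares the resulting address, instead of matching a fixed list of spellings. Any alias that resolves to `127.0.0.1` or `::1` therefore skips SSH. The sketch below is a standalone version of the same check. `loopback_host?` is a hypothetical name. Unlike the diff, it rescues `SocketError` so that an unresolvable name is treated as remote rather than raising; whether that is the right policy for Pwrake is an assumption.

```ruby
require "socket"

LOOPBACK_ADDRS = %w[127.0.0.1 ::1].freeze

# Hypothetical helper: true when +host+ resolves to one of the two loopback
# addresses checked in the diff. Unresolvable names are treated as non-local.
def loopback_host?(host)
  LOOPBACK_ADDRS.include?(IPSocket.getaddress(host))
rescue SocketError
  false
end
```

Only the two listed addresses count as local. A hostname mapped to another loopback address, such as `127.0.1.1`, would still be reached through SSH.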
@@ -74,18 +75,33 @@ class Communicator
  w0.close
  w1.close
  r2.close
+ # send worker_code
+ @iow.write(worker_code)
+ end
+
+ def connect(worker_code)
+ setup_pipe(worker_code)
+
+ # send ncore and options
+ opts = Marshal.dump(@option)
+ s = [@ncore||0, opts.size].pack("V2")
+ @iow.write(s)
+ @iow.write(opts)
+
  sel = @set.selector
  @reader = NBIO::MultiReader.new(sel,@ior)
  @rd_err = NBIO::Reader.new(sel,@ioe)
  @writer = NBIO::Writer.new(sel,@iow)
  @handler = NBIO::Handler.new(@reader,@writer,@host)
- #
- @writer.write(worker_code)
- @writer.write(Marshal.dump(@ncore))
- @writer.write(Marshal.dump(@option))
+
  # read ncore
  while s = @reader.get_line
- if /^ncore:(.*)$/ =~ s
+ case s
+ when /^ip:(.*)$/
+ a = $1
+ @ipaddr.push(a)
+ Log.debug "ip=#{a} @#{@host}"
+ when /^ncore:(.*)$/
  a = $1
  Log.debug "ncore=#{a} @#{@host}"
  if /^(\d+)$/ =~ a
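`connect` now writes straight to the pipe in three parts:

1. The worker code. Its length is embedded in the `ruby -e 'eval ARGF.read(N)'` command line, so the remote Ruby reads exactly that much.
2. An 8-byte header from `pack("V2")`: two unsigned 32-bit little-endian integers holding the core count (`nil` sent as 0) and the length of the marshaled options.
3. The options themselves.

The worker's reading side is not shown in this diff. The sketch below round-trips the header and payload through a binary pipe; `write_handshake` and `read_handshake` are hypothetical names. It uses `bytesize`, which equals the diff's `size` here because `Marshal.dump` returns a binary string.

```ruby
# Writer side, following connect above: 8-byte header, then the Marshal payload.
def write_handshake(io, ncore, option)
  opts = Marshal.dump(option)
  io.write([ncore || 0, opts.bytesize].pack("V2"))
  io.write(opts)
end

# Hypothetical reader side: unpack the header, then read exactly opts_size bytes.
def read_handshake(io)
  ncore, opts_size = io.read(8).unpack("V2")
  option = Marshal.load(io.read(opts_size))
  [ncore, option]
end

# Usage over a real pipe, as between Communicator and a worker process.
r, w = IO.pipe
r.binmode
w.binmode
write_handshake(w, nil, { heartbeat: 240, ssh_option: "-p 2222" })
ncore, option = read_handshake(r)   # ncore is 0 because nil was sent
```

A length-prefixed payload lets the reader consume exactly one options blob from a stream that continues to carry other traffic afterward.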
@@ -134,9 +150,14 @@ class Communicator
  err_out = []
  begin
  finish_shells
- @handler.exit
- while s = @rd_err.get_line
- err_out << s
+ if @handler
+ @handler.exit
+ @handler = nil
+ end
+ if @rd_err
+ while s = @rd_err.get_line
+ err_out << s
+ end
  end
  rescue => e
  m = Log.bt(e)
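The new guards make `finish` safe to call when `connect` never got as far as creating the handler and the error reader, for example when setting up the pipe failed. Setting `@handler = nil` also stops a second call from sending `exit` twice. Below is a reduced, hypothetical model of that guard pattern. `DemoComm` is illustrative only: a lambda stands in for the handler and an array for the error stream.

```ruby
# Hypothetical reduced model of the nil-guarded cleanup in Communicator#finish.
class DemoComm
  attr_reader :exit_count

  def initialize(connected)
    @exit_count = 0
    # Only a connected communicator has a handler and an error-line source.
    @handler = connected ? -> { @exit_count += 1 } : nil
    @rd_err  = connected ? ["warning: something\n"] : nil
  end

  def finish
    err_out = []
    if @handler
      @handler.call            # stands in for @handler.exit
      @handler = nil           # a second finish will not exit again
    end
    if @rd_err
      while (s = @rd_err.shift)   # stands in for @rd_err.get_line
        err_out << s
      end
    end
    err_out
  end
end
```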
@@ -11,10 +11,16 @@ class CommunicatorSet
  @selector = selector
  @option = option
  @communicators = {}
+ @error_host = []
  @initial_communicators = []
  if hb = @option[:heartbeat]
  @heartbeat_timeout = hb + 15
  end
+ init_hosts
+ end
+
+ def init_hosts
+ # for pwrake-mpi
  end
 
  attr_reader :selector
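`init_hosts` is an empty hook called at the end of the constructor, and its comment marks it as a customization point for pwrake-mpi. How the MPI code fills it in is not part of this diff. One way to override such a hook without editing the class is `Module#prepend`, the same mechanism the new launcher uses on `Rake::Application`. This is a hypothetical sketch; `DemoSet`, `MpiHosts` and the host names are illustrative only.

```ruby
# Hypothetical class with an empty hook called from the constructor.
class DemoSet
  attr_reader :hosts

  def initialize
    @hosts = []
    init_hosts
  end

  def init_hosts
    # no-op by default
  end
end

# Prepending places MpiHosts#init_hosts ahead of DemoSet#init_hosts in lookup.
module MpiHosts
  def init_hosts
    @hosts = %w[node01 node02]   # illustrative values
    super                        # still runs the original (no-op) hook
  end
end

plain = DemoSet.new.hosts        # []
DemoSet.prepend(MpiHosts)
patched = DemoSet.new.hosts      # ["node01", "node02"]
```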
@@ -4,10 +4,38 @@ module Pwrake
 
  module Log
 
+ @@logger = nil
+
  module_function
 
+ def set_logger(option)
+ unless @@logger
+ if logdir = option['LOG_DIR']
+ ::FileUtils.mkdir_p(logdir)
+ logfile = File.join(logdir, option['LOG_FILE'])
+ @@logger = Logger.new(logfile)
+ else
+ if option['DEBUG']
+ @@logger = Logger.new($stderr)
+ else
+ @@logger = Logger.new(File::NULL)
+ end
+ end
+
+ if option['DEBUG']
+ @@logger.level = Logger::DEBUG
+ else
+ @@logger.level = Logger::INFO
+ end
+
+ at_exit{@@logger.close}
+ end
+ end
+
  def method_missing(meth_id,*args)
- Rake.application.logger.send(meth_id,*args)
+ if @@logger
+ @@logger.send(meth_id,*args)
+ end
  end
 
  def bt(e)
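With this release, `Log` owns a single module-level logger. `set_logger` creates it once, and later calls are no-ops. Until it exists, `method_missing` silently drops messages, so code that logs before the branch is constructed no longer depends on `Rake.application.logger`. The old `TRACE`/`WARN` split is also gone: unless `DEBUG` is set, the level is now `INFO`. Below is a self-contained sketch of the same pattern. `DemoLog` is a hypothetical stand-in that takes an IO argument instead of the option hash and writes to a `StringIO` instead of a file.

```ruby
require "logger"
require "stringio"

# Hypothetical stand-in for Pwrake::Log's lazily-set module-level logger.
module DemoLog
  @@logger = nil

  module_function

  # First call wins; later calls leave the existing logger in place.
  def set_logger(io, debug: false)
    unless @@logger
      @@logger = Logger.new(io)
      @@logger.level = debug ? Logger::DEBUG : Logger::INFO
    end
  end

  # Forward info/debug/error/... to the logger, or drop the call if none is set.
  def method_missing(meth_id, *args)
    @@logger.send(meth_id, *args) if @@logger
  end
end
```

The sketch omits the original's `at_exit { @@logger.close }` because a `StringIO` needs no closing. Because `method_missing` catches every name, a misspelled call such as `Log.infoo` is silently ignored before `set_logger` runs. After that, the same call raises `NoMethodError` from the logger.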