ood_core 0.11.1 → 0.13.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: bf5cfe29bd0770daa8404169e04fcc8fcdc9a89b88f83bbfbc8675040b119ccf
4
- data.tar.gz: c6082f2c7b751c0b7f247dcc790ae35bc1a372830b6f7c529d1b5574eed714ee
3
+ metadata.gz: 3296708d7bc47f3379a9e4a6c845d3f25c5ccefb599f4b92406d9dffdaef220b
4
+ data.tar.gz: b6af9e90b67bc9a7a52203808d849d8800336b30b09bdb8ed204526d01bc92e9
5
5
  SHA512:
6
- metadata.gz: 30c82f37cf6c974c04a3d8c9bc9da21e47014b7aecfa03424575196e42aa0ebebb89aba6717e73073a1ed3d963d8391fa76da263a4e3a7e4dc4250a2cb32f830
7
- data.tar.gz: 9c7be268d29f4dd6c9cec57ce783e7eb0777c76e3df7025062e77868985a61e410cd24ec7d39c2a4e91a039d85a85c41fad75e28a408c8e00940597e5a1fb1ff
6
+ metadata.gz: 623ac6e6f8081d68a3e925d1150c9f20a0f613ccfb6837519d1b95d04533a72caa403c54327aad85dcea9c0694cc23941f40307d942623c095f53fed7fc32026
7
+ data.tar.gz: 0d785a9ade36b2f6f62f9ae55672091346aa4fb76bf358e6c00d4bc007623b8d1798813474665fc7b4d850d89e041fae5c2fefc9719fbe9f53a161a76127eaad
@@ -6,12 +6,40 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
6
6
  and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
7
7
 
8
8
  ## [Unreleased]
9
+ ## [0.13.0] - 2020-08-10
10
+ ### Added
11
+ - CloudyCluster CCQ Adapter
12
+
13
+ ## [0.12.0] - 2020-08-05
14
+ ### Added
15
+ - qos option to Slurm and Torque [#205](https://github.com/OSC/ood_core/pull/205)
16
+ - native hash returned in qstat for SGE adapter [#198](https://github.com/OSC/ood_core/pull/198)
17
+ - option for specifying `submit_host` to submit jobs via ssh on other host [#204](https://github.com/OSC/ood_core/pull/204)
18
+
19
+ ### Fixed
20
+ - SGE adapter handles milliseconds instead of seconds when milliseconds are used [#206](https://github.com/OSC/ood_core/issues/206)
21
+ - Torque's native "hash" for job submission now handles env var values with spaces [#202](https://github.com/OSC/ood_core/pull/202)
22
+
23
+ ## [0.11.4] - 2020-05-27
24
+ ### Fixed
25
+ - Environment exports in SLURM while implementing [#158](https://github.com/OSC/ood_core/issues/158)
26
+ and [#109](https://github.com/OSC/ood_core/issues/109) in [#163](https://github.com/OSC/ood_core/pull/163)
27
+
28
+ ## [0.11.3] - 2020-05-11
29
+ ### Fixed
30
+ - LinuxHost Adapter to work with any login shell ([#188](https://github.com/OSC/ood_core/pull/188))
31
+ - LinuxHost Adapter needs to display long lines in pstree to successfully parse
32
+ output ([#188](https://github.com/OSC/ood_core/pull/188))
33
+
34
+ ## [0.11.2] - 2020-04-23
35
+ ### Fixed
36
+ - fix signature of `LinuxHost#info_where_owner`
9
37
 
10
- ## [0.11.1] - 2012-03-18
38
+ ## [0.11.1] - 2020-03-18
11
39
  ### Changed
12
40
  - Only the version changed. Had to republish to rubygems.org
13
41
 
14
- ## [0.11.0] - 2012-03-18
42
+ ## [0.11.0] - 2020-03-18
15
43
  ### Added
16
44
  - Added directive prefixes to each adapter (e.g. `#QSUB`) ([#161](https://github.com/OSC/ood_core/issues/161))
17
45
  - LHA supports `submit_host` field in native ([#164](https://github.com/OSC/ood_core/issues/164))
@@ -219,7 +247,12 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
219
247
  ### Added
220
248
  - Initial release!
221
249
 
222
- [Unreleased]: https://github.com/OSC/ood_core/compare/v0.11.1...HEAD
250
+ [Unreleased]: https://github.com/OSC/ood_core/compare/v0.13.0...HEAD
251
+ [0.13.0]: https://github.com/OSC/ood_core/compare/v0.12.0...v0.13.0
252
+ [0.12.0]: https://github.com/OSC/ood_core/compare/v0.11.4...v0.12.0
253
+ [0.11.4]: https://github.com/OSC/ood_core/compare/v0.11.3...v0.11.4
254
+ [0.11.3]: https://github.com/OSC/ood_core/compare/v0.11.2...v0.11.3
255
+ [0.11.2]: https://github.com/OSC/ood_core/compare/v0.11.1...v0.11.2
223
256
  [0.11.1]: https://github.com/OSC/ood_core/compare/v0.11.0...v0.11.1
224
257
  [0.11.0]: https://github.com/OSC/ood_core/compare/v0.10.0...v0.11.0
225
258
  [0.10.0]: https://github.com/OSC/ood_core/compare/v0.9.3...v0.10.0
data/README.md CHANGED
@@ -4,12 +4,13 @@
4
4
  ![GitHub Release](https://img.shields.io/github/release/osc/ood_core.svg)
5
5
  ![GitHub License](https://img.shields.io/github/license/osc/ood_core.svg)
6
6
 
7
- Welcome to your new gem! In this directory, you'll find the files you need to
8
- be able to package up your Ruby library into a gem. Put your Ruby code in the
9
- file `lib/ood_core`. To experiment with that code, run `bin/console` for an
10
- interactive prompt.
7
+ - Website: http://openondemand.org/
8
+ - Website repo with JOSS publication: https://github.com/OSC/Open-OnDemand
9
+ - Documentation: https://osc.github.io/ood-documentation/master/
10
+ - Main code repo: https://github.com/OSC/ondemand
11
+ - Core library repo: https://github.com/OSC/ood_core
11
12
 
12
- TODO: Delete this and the text above, and describe your gem
13
+ OnDemand core library with adapters for each batch scheduler.
13
14
 
14
15
  ## Installation
15
16
 
@@ -0,0 +1,267 @@
1
+ require "ood_core/job/adapters/helper"
2
+ require "tempfile"
3
+
4
+ module OodCore
5
+ module Job
6
+ class Factory
7
+ using Refinements::HashExtensions
8
+
9
+ # Build the Cloudy Cluster adapter from a configuration
10
+ # @param config [#to_h] the configuration for job adapter
11
+ # @option config [Object] :image (nil) The default VM image to use
12
+ # @option config [Object] :cloud (gcp) The cloud provider being used [gcp,aws]
13
+ # @option config [Object] :scheduler (nil) The name of the scheduler to use
14
+ # @option config [Object] :sge_root (nil) Path to SGE root, note that
15
+ # @option config [#to_h] :bin (nil) Path to CC client binaries
16
+ # @option config [#to_h] :bin_overrides ({}) Optional overrides to CC client executables
17
+ def self.build_ccq(config)
18
+ Adapters::CCQ.new(config.to_h.symbolize_keys)
19
+ end
20
+ end
21
+
22
+ module Adapters
23
+
24
+ class PromptError < StandardError; end
25
+
26
+ class CCQ < Adapter
27
+ using Refinements::ArrayExtensions
28
+
29
+ attr_reader :image, :cloud, :scheduler, :bin, :bin_overrides, :jobid_regex
30
+
31
+ def initialize(config)
32
+ @image = config.fetch(:image, nil)
33
+ @cloud = config.fetch(:cloud, gcp_provider)
34
+ @scheduler = config.fetch(:scheduler, nil)
35
+ @bin = config.fetch(:bin, '/opt/CloudyCluster/srv/CCQ')
36
+ @bin_overrides = config.fetch(:bin_overrides, {})
37
+ @jobid_regex = config.fetch(:jobid_regex, "job id is: (?<job_id>\\d+) you")
38
+ end
39
+
40
+ # Submit a job with the attributes defined in the job template instance
41
+ # @param script [Script] script object that describes the script and
42
+ # attributes for the submitted job
43
+ # @param after [#to_s, Array<#to_s>] not used
44
+ # @param afterok [#to_s, Array<#to_s>] not used
45
+ # @param afternotok [#to_s, Array<#to_s>] not used
46
+ # @param afterany [#to_s, Array<#to_s>] not used
47
+ # @return [String] the job id returned after successfully submitting a
48
+ # job
49
+ # @see Adapter#submit
50
+ def submit(script, after: [], afterok: [], afternotok: [], afterany: [])
51
+ script_file = make_script_file(script.content)
52
+ args = []
53
+
54
+ # cluster configuration args
55
+ args.concat ["-s", scheduler] unless scheduler.nil?
56
+ args.concat [image_arg, image] unless image.nil?
57
+
58
+ args.concat ["-o", script.output_path.to_s] unless script.output_path.nil?
59
+ args.concat ["-e", script.error_path.to_s] unless script.error_path.nil?
60
+ args.concat ["-tl", seconds_to_duration(script.wall_time)] unless script.wall_time.nil?
61
+ args.concat ["-js", script_file.path.to_s]
62
+
63
+ args.concat script.native if script.native
64
+
65
+ output = call("ccqsub", args: args)
66
+ parse_job_id_from_ccqsub(output)
67
+ ensure
68
+ script_file.close
69
+ end
70
+
71
+ # Retrieve info for all jobs from the resource manager
72
+ # @return [Array<Info>] information describing submitted jobs
73
+ def info_all(attrs: nil)
74
+ args = []
75
+ args.concat ["-s", scheduler] unless scheduler.nil?
76
+
77
+ stat_output = call("ccqstat", args: args)
78
+ info_from_ccqstat(stat_output)
79
+ end
80
+
81
+ # Retrieve job info from the resource manager
82
+ # @param id [#to_s] the id of the job
83
+ # @return [Info] information describing submitted job
84
+ def info(id)
85
+ args = []
86
+ args.concat ["-s", scheduler] unless scheduler.nil?
87
+ args.concat ["-ji", id]
88
+
89
+ stat_output = call("ccqstat", args: args)
90
+
91
+ # WARNING: the code path here differs from info_all because the output
92
+ # from ccqstat -ji $JOBID is much more data than just the 4
93
+ # columns that ccqstat gives.
94
+ info_from_ccqstat_extended(stat_output)
95
+ end
96
+
97
+ # Retrieve job status from resource manager
98
+ # @param id [#to_s] the id of the job
99
+ # @return [Status] status of job
100
+ # @see Adapter#status
101
+ def status(id)
102
+ info(id).status
103
+ end
104
+
105
+ # This adapter does not implement hold and will always raise
106
+ # an exception.
107
+ # @param id [#to_s] the id of the job
108
+ # @raise [NotImplementedError] always
109
+ # @return [void]
110
+ def hold(_)
111
+ raise NotImplementedError, "subclass did not define #hold"
112
+ end
113
+
114
+ # This adapter does not implement release and will always raise
115
+ # an exception.
116
+ # @param id [#to_s] the id of the job
117
+ # @raise [NotImplementedError] always
118
+ # @return [void]
119
+ def release(_)
120
+ raise NotImplementedError, "subclass did not define #release"
121
+ end
122
+
123
+ # Delete the submitted job
124
+ # @param id [#to_s] the id of the job
125
+ # @return [void]
126
+ def delete(id)
127
+ call("ccqdel", args: [id])
128
+ end
129
+
130
+ def directive_prefix
131
+ '#CC'
132
+ end
133
+
134
+ private
135
+
136
+ # Mapping of state codes
137
+ STATE_MAP =
138
+ {
139
+ 'Error' => :suspended, # not running, but infrastructure still possibly exists
140
+ 'CreatingCG' => :queued, # creating control group
141
+ 'Pending' => :queued, # in queue
142
+ 'Submitted' => :queued, #
143
+ 'Provisioning' => :queued, # node is being provisioned
144
+ 'Running' => :running, #
145
+ 'Completed' => :completed, #
146
+ }.freeze
147
+
148
+ def gcp_provider
149
+ 'gcp'
150
+ end
151
+
152
+ def aws_provider
153
+ 'aws'
154
+ end
155
+
156
+ def image_arg
157
+ if cloud == gcp_provider
158
+ '-gcpgi'
159
+ else
160
+ '-awsami'
161
+ end
162
+ end
163
+
164
+ def call(cmd, args: [], env: {}, stdin: "")
165
+ cmd = OodCore::Job::Adapters::Helper.bin_path(cmd, bin, bin_overrides)
166
+ args = args.map(&:to_s)
167
+ env = env.to_h
168
+ o, e, s = Open3.capture3(env, cmd, *args, stdin_data: stdin.to_s)
169
+ s.success? ? o : interpret_and_raise(e, cmd)
170
+ end
171
+
172
+ # helper function to interpret an error the command had given and
173
+ # raise a different error.
174
+ def interpret_and_raise(error, command)
175
+ # a special case with CCQ that prompts the user for username & password
176
+ # so let's be helpful and tell the user what to do.
177
+ if error.end_with?("EOFError: EOF when reading a line\n")
178
+ raise(
179
+ PromptError,
180
+ "The #{command} command was prompted. You need to generate the certificate " +
181
+ "manually in a shell by running 'ccqstat'\nand entering your username/password"
182
+ )
183
+ else
184
+ raise(JobAdapterError, error)
185
+ end
186
+ end
187
+
188
+ # Convert seconds to duration
189
+ def seconds_to_duration(seconds)
190
+ format("%02d:%02d:%02d", seconds / 3600, seconds / 60 % 60, seconds % 60)
191
+ end
192
+
193
+ # helper to make a script file. We can't pipe it into ccq so we have to
194
+ # write a file.
195
+ def make_script_file(content)
196
+ file = Tempfile.new(tmp_file_name)
197
+ file.write(content.to_s)
198
+ file.flush
199
+ file
200
+ end
201
+
202
+ def tmp_file_name
203
+ 'ccq_ood_script_'
204
+ end
205
+
206
+ def parse_job_id_from_ccqsub(output)
207
+ match_data = /#{jobid_regex}/.match(output)
208
+ # match_data could be nil, OR re-configured jobid_regex could be looking for a different named group
209
+ job_id = match_data&.named_captures&.fetch('job_id', nil)
210
+ raise JobAdapterError, "Could not extract job id out of ccqsub output '#{output}'" if job_id.nil?
211
+ job_id
212
+ end
213
+
214
+ # build an OodCore::Job::Info object from extended ccqstat output
215
+ def info_from_ccqstat_extended(data)
216
+ raw = extended_data_to_hash(data)
217
+ data_hash = { native: raw }
218
+ data_hash[:status] = get_state(raw['status'])
219
+ data_hash[:id] = raw['name']
220
+ data_hash[:job_name] = raw['jobName']
221
+ data_hash[:job_owner] = raw['userName']
222
+ data_hash[:submit_host] = raw['submitHostInstanceId']
223
+ data_hash[:dispatch_time] = raw['startTime'].to_i
224
+ data_hash[:submission_time] = raw['dateSubmitted'].to_i
225
+ data_hash[:queue_name] = raw['criteriaPriority']
226
+
227
+ Info.new(data_hash)
228
+ end
229
+
230
+ # extended data is just lines of 'key: value' pairs, so parse
231
+ # it and stick it all in a hash.
232
+ def extended_data_to_hash(data)
233
+ Hash[data.to_s.scan(/(\w+): (\S+)/)]
234
+ end
235
+
236
+ def info_from_ccqstat(data)
237
+ infos = []
238
+
239
+ data.to_s.each_line do |line|
240
+ words = line.split(/\s/).reject(&:empty?)
241
+ next if !words.empty? && words[0] == "Id" # just skip the header
242
+
243
+ infos << Info.new(line_to_hash(words)) if words.size == 5
244
+ end
245
+
246
+ infos
247
+ end
248
+
249
+ def line_to_hash(words)
250
+ return unless words.size == 5
251
+
252
+ data_hash = {}
253
+ data_hash[:id] = words[0]
254
+ data_hash[:job_name] = words[1]
255
+ data_hash[:job_owner] = words[2]
256
+ data_hash[:status] = get_state(words[4])
257
+
258
+ data_hash
259
+ end
260
+
261
+ def get_state(state)
262
+ STATE_MAP.fetch(state, :undetermined)
263
+ end
264
+ end
265
+ end
266
+ end
267
+ end
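The parsing helpers in the new CCQ adapter can be tried outside the gem. The following is a minimal sketch that copies three of them into standalone methods. The sample `ccqsub` and `ccqstat` strings are invented for illustration and may not match real CloudyCluster output.

```ruby
# Standalone copies of three private helpers from the CCQ adapter diff.
# The sample strings passed in at the bottom are hypothetical.

# Default pattern from the adapter; the named group must be called job_id.
JOBID_REGEX = "job id is: (?<job_id>\\d+) you"

# Pull the job id out of ccqsub output, or return nil when nothing matches.
def parse_job_id(output, jobid_regex = JOBID_REGEX)
  match_data = /#{jobid_regex}/.match(output)
  match_data&.named_captures&.fetch("job_id", nil)
end

# Turn "key: value" lines into a hash. Each value stops at the first
# whitespace character because of the \S+ capture.
def extended_data_to_hash(data)
  Hash[data.to_s.scan(/(\w+): (\S+)/)]
end

# Format a wall time given in seconds as HH:MM:SS.
def seconds_to_duration(seconds)
  format("%02d:%02d:%02d", seconds / 3600, seconds / 60 % 60, seconds % 60)
end

puts parse_job_id("Submitted. The job id is: 4321 you may check its status")
puts extended_data_to_hash("name: 4321\njobName: demo\nstatus: Running\n").inspect
puts seconds_to_duration(3725)
```

Because the value capture is `\S+`, any field whose value contains spaces would be cut at the first space, which is worth keeping in mind when reading the `native` hash.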
@@ -12,7 +12,26 @@ module OodCore
12
12
  def self.bin_path(cmd, bin_default, bin_overrides)
13
13
  bin_overrides.fetch(cmd.to_s) { Pathname.new(bin_default.to_s).join(cmd.to_s).to_s }
14
14
  end
15
+
16
+ # Wraps a command so that it is executed on another host via ssh
17
+ # @param submit_host [String] where to submit the command
18
+ # @param cmd [String] the desired command to execute on another host
19
+ # @param cmd_args [Array] arguments to the command specified above
20
+ # @param strict_host_checking [Bool] whether to use strict_host_checking
21
+ # @param env [Hash] env variables to be set w/ssh
22
+ #
23
+ # @return cmd [String] command wrapped in ssh if submit_host is present
24
+ # @return args [Array] command arguments including ssh_flags and original command
25
+ def self.ssh_wrap(submit_host, cmd, cmd_args, strict_host_checking = true, env = {})
26
+ return cmd, cmd_args if submit_host.to_s.empty?
27
+
28
+ check_host = strict_host_checking ? "yes" : "no"
29
+ args = ['-o', 'BatchMode=yes', '-o', 'UserKnownHostsFile=/dev/null', '-o', "StrictHostKeyChecking=#{check_host}", "#{submit_host}"]
30
+ env.each{|key, value| args.push("export #{key}=#{value};")}
31
+
32
+ return 'ssh', args + [cmd] + cmd_args
33
+ end
15
34
  end
16
35
  end
17
36
  end
18
- end
37
+ end
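To see what `ssh_wrap` returns, the method body from the hunk above can be lifted into a standalone function. This is only a sketch: the host name, command, and environment variable below are hypothetical, and the real method lives on `OodCore::Job::Adapters::Helper`.

```ruby
# Standalone copy of the ssh_wrap logic shown in the diff, so it can be
# run without loading ood_core.
def ssh_wrap(submit_host, cmd, cmd_args, strict_host_checking = true, env = {})
  # Without a submit host the command is returned unchanged.
  return cmd, cmd_args if submit_host.to_s.empty?

  check_host = strict_host_checking ? "yes" : "no"
  args = ["-o", "BatchMode=yes", "-o", "UserKnownHostsFile=/dev/null",
          "-o", "StrictHostKeyChecking=#{check_host}", submit_host.to_s]
  # Each env entry becomes an "export KEY=value;" word ahead of the command.
  env.each { |key, value| args.push("export #{key}=#{value};") }

  return "ssh", args + [cmd] + cmd_args
end

# No submit host: pass-through.
cmd, args = ssh_wrap(nil, "sbatch", ["--parsable"])
p cmd, args

# Hypothetical submit host: the command is wrapped in an ssh invocation.
cmd, args = ssh_wrap("login01.example.edu", "sbatch", ["--parsable"], false, { "FOO" => "bar" })
p cmd, args
```

Environment values are interpolated as-is, so a value containing spaces or shell metacharacters would likely need quoting by the caller before it is passed in.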
@@ -106,7 +106,7 @@ module OodCore
106
106
  # @param owner [#to_s, Array<#to_s>] the owner(s) of the jobs
107
107
  # @raise [JobAdapterError] if something goes wrong getting job info
108
108
  # @return [Array<Info>] information describing submitted jobs
109
- def info_where_owner(owner: nil, attrs: nil)
109
+ def info_where_owner(_, attrs: nil)
110
110
  info_all
111
111
  end
112
112
 
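The signature change above matters if callers pass the owner positionally, which is an assumption here based on the parameter documentation rather than something this diff shows. A minimal sketch with two hypothetical stand-in classes illustrates the difference:

```ruby
# Hypothetical stand-ins for the adapter before and after the change.
class OldStyleAdapter
  # Keyword-only owner: a positional string is rejected.
  def info_where_owner(owner: nil, attrs: nil)
    []
  end
end

class NewStyleAdapter
  # Positional owner (ignored), matching a caller that passes it positionally.
  def info_where_owner(_, attrs: nil)
    []
  end
end

begin
  OldStyleAdapter.new.info_where_owner("alice")
rescue ArgumentError => e
  puts "old signature: #{e.class}"
end

p NewStyleAdapter.new.info_where_owner("alice")
```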
@@ -57,7 +57,7 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
57
57
  # @param hostname [#to_s] The hostname to submit the work to
58
58
  # @param script [OodCore::Job::Script] The script object defining the work
59
59
  def start_remote_session(script)
60
- cmd = ssh_cmd(submit_host(script))
60
+ cmd = ssh_cmd(submit_host(script), ['/usr/bin/env', 'bash'])
61
61
 
62
62
  session_name = unique_session_name
63
63
  output = call(*cmd, stdin: wrapped_script(script, session_name))
@@ -67,13 +67,13 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
67
67
  end
68
68
 
69
69
  def stop_remote_session(session_name, hostname)
70
- cmd = ssh_cmd(hostname)
70
+ cmd = ssh_cmd(hostname, ['/usr/bin/env', 'bash'])
71
71
 
72
72
  kill_cmd = <<~SCRIPT
73
73
  # Get the tmux pane PID for the target session
74
74
  pane_pid=$(tmux list-panes -aF '\#{session_name} \#{pane_pid}' | grep '#{session_name}' | cut -f 2 -d ' ')
75
75
  # Find the Singularity sinit PID child of the pane process
76
- pane_sinit_pid=$(pstree -p "$pane_pid" | grep -o 'sinit([[:digit:]]*' | grep -o '[[:digit:]]*')
76
+ pane_sinit_pid=$(pstree -p -l "$pane_pid" | grep -o 'sinit([[:digit:]]*' | grep -o '[[:digit:]]*')
77
77
  # Kill sinit which stops both Singularity-based processes and the tmux session
78
78
  kill "$pane_sinit_pid"
79
79
  SCRIPT
@@ -116,19 +116,23 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
116
116
  s.success? ? o : raise(Error, e)
117
117
  end
118
118
 
119
- # The SSH invocation to send a command
119
+ # The full command to ssh into the destination host and execute the command.
120
+ # SSH options include:
120
121
  # -t Force pseudo-terminal allocation (required to allow tmux to run)
121
122
  # -o BatchMode=yes (set mode to be non-interactive)
122
123
  # if ! strict_host_checking
123
124
  # -o UserKnownHostsFile=/dev/null (do not update the user's known hosts file)
124
125
  # -o StrictHostKeyChecking=no (do not check the user's known hosts file)
125
- def ssh_cmd(destination_host)
126
+ #
127
+ # @param destination_host [#to_s] the destination host you wish to ssh into
128
+ # @param cmd [Array<#to_s>] the command to be executed on the destination host
129
+ def ssh_cmd(destination_host, cmd)
126
130
  if strict_host_checking
127
131
  [
128
132
  'ssh', '-t',
129
133
  '-o', 'BatchMode=yes',
130
134
  "#{username}@#{destination_host}"
131
- ]
135
+ ].concat(cmd)
132
136
  else
133
137
  [
134
138
  'ssh', '-t',
@@ -136,7 +140,7 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
136
140
  '-o', 'UserKnownHostsFile=/dev/null',
137
141
  '-o', 'StrictHostKeyChecking=no',
138
142
  "#{username}@#{destination_host}"
139
- ]
143
+ ].concat(cmd)
140
144
  end
141
145
  end
142
146
 
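The reworked `ssh_cmd` takes the remote command as a second argument instead of leaving each caller to append it. A condensed sketch of that shape follows. Here `username` and `strict_host_checking` are passed as keyword arguments purely so the example runs on its own (in the launcher they are instance state), the host and user names are hypothetical, and the non-strict option list is reconstructed from the doc comment rather than copied from the full source.

```ruby
# Condensed sketch of the new ssh_cmd shape: ssh options, then the target,
# then the command to execute on the remote host.
def ssh_cmd(destination_host, cmd, username:, strict_host_checking: true)
  base = ["ssh", "-t", "-o", "BatchMode=yes"]
  unless strict_host_checking
    base += ["-o", "UserKnownHostsFile=/dev/null", "-o", "StrictHostKeyChecking=no"]
  end
  base + ["#{username}@#{destination_host}"] + cmd.map(&:to_s)
end

# Running bash explicitly means a script sent over stdin is interpreted by
# bash whatever the remote user's login shell is.
p ssh_cmd("node01", ["/usr/bin/env", "bash"], username: "alice")

# Other callers pass their own remote command, e.g. a tmux query.
p ssh_cmd("node01", ["tmux", "list-panes"], username: "alice", strict_host_checking: false)
```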
@@ -170,6 +174,7 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
170
174
  'session_name' => session_name,
171
175
  'singularity_bin' => singularity_bin,
172
176
  'singularity_image' => singularity_image(script.native),
177
+ 'ssh_hosts' => ssh_hosts,
173
178
  'tmux_bin' => tmux_bin,
174
179
  }.each{
175
180
  |key, value| bnd.local_variable_set(key, value)
@@ -245,7 +250,7 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
245
250
  ['#{session_name}', '#{session_created}', '#{pane_pid}'].join(UNIT_SEPARATOR)
246
251
  )
247
252
  keys = [:session_name, :session_created, :session_pid]
248
- cmd = ssh_cmd(destination_host) + ['tmux', 'list-panes', '-aF', format_str]
253
+ cmd = ssh_cmd(destination_host, ['tmux', 'list-panes', '-aF', format_str])
249
254
 
250
255
  call(*cmd).split(
251
256
  "\n"