ood_core 0.11.1 → 0.13.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: bf5cfe29bd0770daa8404169e04fcc8fcdc9a89b88f83bbfbc8675040b119ccf
4
- data.tar.gz: c6082f2c7b751c0b7f247dcc790ae35bc1a372830b6f7c529d1b5574eed714ee
3
+ metadata.gz: 3296708d7bc47f3379a9e4a6c845d3f25c5ccefb599f4b92406d9dffdaef220b
4
+ data.tar.gz: b6af9e90b67bc9a7a52203808d849d8800336b30b09bdb8ed204526d01bc92e9
5
5
  SHA512:
6
- metadata.gz: 30c82f37cf6c974c04a3d8c9bc9da21e47014b7aecfa03424575196e42aa0ebebb89aba6717e73073a1ed3d963d8391fa76da263a4e3a7e4dc4250a2cb32f830
7
- data.tar.gz: 9c7be268d29f4dd6c9cec57ce783e7eb0777c76e3df7025062e77868985a61e410cd24ec7d39c2a4e91a039d85a85c41fad75e28a408c8e00940597e5a1fb1ff
6
+ metadata.gz: 623ac6e6f8081d68a3e925d1150c9f20a0f613ccfb6837519d1b95d04533a72caa403c54327aad85dcea9c0694cc23941f40307d942623c095f53fed7fc32026
7
+ data.tar.gz: 0d785a9ade36b2f6f62f9ae55672091346aa4fb76bf358e6c00d4bc007623b8d1798813474665fc7b4d850d89e041fae5c2fefc9719fbe9f53a161a76127eaad
data/CHANGELOG.md CHANGED
@@ -6,12 +6,40 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
6
6
  and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
7
7
 
8
8
  ## [Unreleased]
9
+ ## [0.13.0] - 2020-08-10
10
+ ### Added
11
+ - CloudyCluster CCQ Adapter
12
+
13
+ ## [0.12.0] - 2020-08-05
14
+ ### Added
15
+ - qos option to Slurm and Torque [#205](https://github.com/OSC/ood_core/pull/205)
16
+ - native hash returned in qstat for SGE adapter [#198](https://github.com/OSC/ood_core/pull/198)
17
+ - option for specifying `submit_host` to submit jobs via ssh on other host [#204](https://github.com/OSC/ood_core/pull/204)
18
+
19
+ ### Fixed
20
+ - SGE handle milliseconds instead of seconds when milliseconds used [#206](https://github.com/OSC/ood_core/issues/206)
21
+ - Torque's native "hash" for job submission now handles env vars values with spaces [#202](https://github.com/OSC/ood_core/pull/202)
22
+
23
+ ## [0.11.4] - 2020-05-27
24
+ ### Fixed
25
+ - Environment exports in SLURM while implementing [#158](https://github.com/OSC/ood_core/issues/158)
26
+ and [#109](https://github.com/OSC/ood_core/issues/109) in [#163](https://github.com/OSC/ood_core/pull/163)
27
+
28
+ ## [0.11.3] - 2020-05-11
29
+ ### Fixed
30
+ - LinuxHost Adapter to work with any login shell ([#188](https://github.com/OSC/ood_core/pull/188))
31
+ - LinuxHost Adapter needs to display long lines in pstree to successfully parse
32
+ output ([#188](https://github.com/OSC/ood_core/pull/188))
33
+
34
+ ## [0.11.2] - 2020-04-23
35
+ ### Fixed
36
+ - fix signature of `LinuxHost#info_where_owner`
9
37
 
10
- ## [0.11.1] - 2012-03-18
38
+ ## [0.11.1] - 2020-03-18
11
39
  ### Changed
12
40
  - Only the version changed. Had to republish to rubygems.org
13
41
 
14
- ## [0.11.0] - 2012-03-18
42
+ ## [0.11.0] - 2020-03-18
15
43
  ### Added
16
44
  - Added directive prefixes to each adapter (e.g. `#QSUB`) ([#161](https://github.com/OSC/ood_core/issues/161))
17
45
  - LHA supports `submit_host` field in native ([#164](https://github.com/OSC/ood_core/issues/164))
@@ -219,7 +247,12 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
219
247
  ### Added
220
248
  - Initial release!
221
249
 
222
- [Unreleased]: https://github.com/OSC/ood_core/compare/v0.11.1...HEAD
250
+ [Unreleased]: https://github.com/OSC/ood_core/compare/v0.13.0...HEAD
251
+ [0.13.0]: https://github.com/OSC/ood_core/compare/v0.12.0...v0.13.0
252
+ [0.12.0]: https://github.com/OSC/ood_core/compare/v0.11.4...v0.12.0
253
+ [0.11.4]: https://github.com/OSC/ood_core/compare/v0.11.3...v0.11.4
254
+ [0.11.3]: https://github.com/OSC/ood_core/compare/v0.11.2...v0.11.3
255
+ [0.11.2]: https://github.com/OSC/ood_core/compare/v0.11.1...v0.11.2
223
256
  [0.11.1]: https://github.com/OSC/ood_core/compare/v0.11.0...v0.11.1
224
257
  [0.11.0]: https://github.com/OSC/ood_core/compare/v0.10.0...v0.11.0
225
258
  [0.10.0]: https://github.com/OSC/ood_core/compare/v0.9.3...v0.10.0
data/README.md CHANGED
@@ -4,12 +4,13 @@
4
4
  ![GitHub Release](https://img.shields.io/github/release/osc/ood_core.svg)
5
5
  ![GitHub License](https://img.shields.io/github/license/osc/ood_core.svg)
6
6
 
7
- Welcome to your new gem! In this directory, you'll find the files you need to
8
- be able to package up your Ruby library into a gem. Put your Ruby code in the
9
- file `lib/ood_core`. To experiment with that code, run `bin/console` for an
10
- interactive prompt.
7
+ - Website: http://openondemand.org/
8
+ - Website repo with JOSS publication: https://github.com/OSC/Open-OnDemand
9
+ - Documentation: https://osc.github.io/ood-documentation/master/
10
+ - Main code repo: https://github.com/OSC/ondemand
11
+ - Core library repo: https://github.com/OSC/ood_core
11
12
 
12
- TODO: Delete this and the text above, and describe your gem
13
+ OnDemand core library with adapters for each batch scheduler.
13
14
 
14
15
  ## Installation
15
16
 
@@ -0,0 +1,267 @@
1
+ require "ood_core/job/adapters/helper"
2
+ require "tempfile"
3
+
4
+ module OodCore
5
+ module Job
6
+ class Factory
7
+ using Refinements::HashExtensions
8
+
9
+ # Build the Cloudy Cluster adapter from a configuration
10
+ # @param config [#to_h] the configuration for job adapter
11
+ # @option config [Object] :image (nil) The default VM image to use
12
+ # @option config [Object] :cloud (gcp) The cloud provider being used [gcp,aws]
13
+ # @option config [Object] :scheduler (nil) The name of the scheduler to use
14
+ # @option config [Object] :sge_root (nil) Path to SGE root, note that
15
+ # @option config [#to_h] :bin (nil) Path to CC client binaries
16
+ # @option config [#to_h] :bin_overrides ({}) Optional overrides to CC client executables
17
+ def self.build_ccq(config)
18
+ Adapters::CCQ.new(config.to_h.symbolize_keys)
19
+ end
20
+ end
21
+
22
+ module Adapters
23
+
24
+ class PromptError < StandardError; end
25
+
26
+ class CCQ < Adapter
27
+ using Refinements::ArrayExtensions
28
+
29
+ attr_reader :image, :cloud, :scheduler, :bin, :bin_overrides, :jobid_regex
30
+
31
+ def initialize(config)
32
+ @image = config.fetch(:image, nil)
33
+ @cloud = config.fetch(:cloud, gcp_provider)
34
+ @scheduler = config.fetch(:scheduler, nil)
35
+ @bin = config.fetch(:bin, '/opt/CloudyCluster/srv/CCQ')
36
+ @bin_overrides = config.fetch(:bin_overrides, {})
37
+ @jobid_regex = config.fetch(:jobid_regex, "job id is: (?<job_id>\\d+) you")
38
+ end
39
+
40
+ # Submit a job with the attributes defined in the job template instance
41
+ # @param script [Script] script object that describes the script and
42
+ # attributes for the submitted job
43
+ # @param after [#to_s, Array<#to_s>] not used
44
+ # @param afterok [#to_s, Array<#to_s>] not used
45
+ # @param afternotok [#to_s, Array<#to_s>] not used
46
+ # @param afterany [#to_s, Array<#to_s>] not used
47
+ # @return [String] the job id returned after successfully submitting a
48
+ # job
49
+ # @see Adapter#submit
50
+ def submit(script, after: [], afterok: [], afternotok: [], afterany: [])
51
+ script_file = make_script_file(script.content)
52
+ args = []
53
+
54
+ # cluster configuration args
55
+ args.concat ["-s", scheduler] unless scheduler.nil?
56
+ args.concat [image_arg, image] unless image.nil?
57
+
58
+ args.concat ["-o", script.output_path.to_s] unless script.output_path.nil?
59
+ args.concat ["-e", script.error_path.to_s] unless script.error_path.nil?
60
+ args.concat ["-tl", seconds_to_duration(script.wall_time)] unless script.wall_time.nil?
61
+ args.concat ["-js", script_file.path.to_s]
62
+
63
+ args.concat script.native if script.native
64
+
65
+ output = call("ccqsub", args: args)
66
+ parse_job_id_from_ccqsub(output)
67
+ ensure
68
+ script_file.close
69
+ end
70
+
71
+ # Retrieve info for all jobs from the resource manager
72
+ # @return [Array<Info>] information describing submitted jobs
73
+ def info_all(attrs: nil)
74
+ args = []
75
+ args.concat ["-s", scheduler] unless scheduler.nil?
76
+
77
+ stat_output = call("ccqstat", args: args)
78
+ info_from_ccqstat(stat_output)
79
+ end
80
+
81
+ # Retrieve job info from the resource manager
82
+ # @param id [#to_s] the id of the job
83
+ # @return [Info] information describing submitted job
84
+ def info(id)
85
+ args = []
86
+ args.concat ["-s", scheduler] unless scheduler.nil?
87
+ args.concat ["-ji", id]
88
+
89
+ stat_output = call("ccqstat", args: args)
90
+
91
+ # WARNING: the code path differs here from info_all because the output
92
+ # from ccqstat -ji $JOBID is much more data than just the 4
93
+ # columns that ccqstat gives.
94
+ info_from_ccqstat_extended(stat_output)
95
+ end
96
+
97
+ # Retrieve job status from resource manager
98
+ # @param id [#to_s] the id of the job
99
+ # @return [Status] status of job
100
+ # @see Adapter#status
101
+ def status(id)
102
+ info(id).status
103
+ end
104
+
105
+ # This adapter does not implement hold and will always raise
106
+ # an exception.
107
+ # @param id [#to_s] the id of the job
108
+ # @raise [NotImplementedError] always
109
+ # @return [void]
110
+ def hold(_)
111
+ raise NotImplementedError, "subclass did not define #hold"
112
+ end
113
+
114
+ # This adapter does not implement release and will always raise
115
+ # an exception.
116
+ # @param id [#to_s] the id of the job
117
+ # @raise [NotImplementedError] always
118
+ # @return [void]
119
+ def release(_)
120
+ raise NotImplementedError, "subclass did not define #release"
121
+ end
122
+
123
+ # Delete the submitted job
124
+ # @param id [#to_s] the id of the job
125
+ # @return [void]
126
+ def delete(id)
127
+ call("ccqdel", args: [id])
128
+ end
129
+
130
+ def directive_prefix
131
+ '#CC'
132
+ end
133
+
134
+ private
135
+
136
+ # Mapping of state codes
137
+ STATE_MAP =
138
+ {
139
+ 'Error' => :suspended, # not running, but infrastructure still possibly exists
140
+ 'CreatingCG' => :queued, # creating control group
141
+ 'Pending' => :queued, # in queue
142
+ 'Submitted' => :queued, #
143
+ 'Provisioning' => :queued, # node is being provisioned
144
+ 'Running' => :running, #
145
+ 'Completed' => :completed, #
146
+ }.freeze
147
+
148
+ def gcp_provider
149
+ 'gcp'
150
+ end
151
+
152
+ def aws_provider
153
+ 'aws'
154
+ end
155
+
156
+ def image_arg
157
+ if cloud == gcp_provider
158
+ '-gcpgi'
159
+ else
160
+ '-awsami'
161
+ end
162
+ end
163
+
164
+ def call(cmd, args: [], env: {}, stdin: "")
165
+ cmd = OodCore::Job::Adapters::Helper.bin_path(cmd, bin, bin_overrides)
166
+ args = args.map(&:to_s)
167
+ env = env.to_h
168
+ o, e, s = Open3.capture3(env, cmd, *args, stdin_data: stdin.to_s)
169
+ s.success? ? o : interpret_and_raise(e, cmd)
170
+ end
171
+
172
+ # helper function to interpret an error the command had given and
173
+ # raise a different error.
174
+ def interpret_and_raise(error, command)
175
+ # a special case with CCQ that prompts the user for username & password
176
+ # so let's be helpful and tell the user what to do.
177
+ if error.end_with?("EOFError: EOF when reading a line\n")
178
+ raise(
179
+ PromptError,
180
+ "The #{command} command was prompted. You need to generate the certificate " +
181
+ "manually in a shell by running 'ccqstat'\nand entering your username/password"
182
+ )
183
+ else
184
+ raise(JobAdapterError, error)
185
+ end
186
+ end
187
+
188
+ # Convert seconds to duration
189
+ def seconds_to_duration(seconds)
190
+ format("%02d:%02d:%02d", seconds / 3600, seconds / 60 % 60, seconds % 60)
191
+ end
192
+
193
+ # helper to make a script file. We can't pipe it into ccq so we have to
194
+ # write a file.
195
+ def make_script_file(content)
196
+ file = Tempfile.new(tmp_file_name)
197
+ file.write(content.to_s)
198
+ file.flush
199
+ file
200
+ end
201
+
202
+ def tmp_file_name
203
+ 'ccq_ood_script_'
204
+ end
205
+
206
+ def parse_job_id_from_ccqsub(output)
207
+ match_data = /#{jobid_regex}/.match(output)
208
+ # match_data could be nil, OR re-configured jobid_regex could be looking for a different named group
209
+ job_id = match_data&.named_captures&.fetch('job_id', nil)
210
+ raise JobAdapterError, "Could not extract job id out of ccqsub output '#{output}'" if job_id.nil?
211
+ job_id
212
+ end
213
+
214
+ # parse an Ood::Job::Info object from extended ccqstat output
215
+ def info_from_ccqstat_extended(data)
216
+ raw = extended_data_to_hash(data)
217
+ data_hash = { native: raw }
218
+ data_hash[:status] = get_state(raw['status'])
219
+ data_hash[:id] = raw['name']
220
+ data_hash[:job_name] = raw['jobName']
221
+ data_hash[:job_owner] = raw['userName']
222
+ data_hash[:submit_host] = raw['submitHostInstanceId']
223
+ data_hash[:dispatch_time] = raw['startTime'].to_i
224
+ data_hash[:submission_time] = raw['dateSubmitted'].to_i
225
+ data_hash[:queue_name] = raw['criteriaPriority']
226
+
227
+ Info.new(data_hash)
228
+ end
229
+
230
+ # extended data is just lines of 'key: value' pairs, so parse
231
+ # it and stick it all in a hash.
232
+ def extended_data_to_hash(data)
233
+ Hash[data.to_s.scan(/(\w+): (\S+)/)]
234
+ end
235
+
236
+ def info_from_ccqstat(data)
237
+ infos = []
238
+
239
+ data.to_s.each_line do |line|
240
+ words = line.split(/\s/).reject(&:empty?)
241
+ next if !words.empty? && words[0] == "Id" # just skip the header
242
+
243
+ infos << Info.new(line_to_hash(words)) if words.size == 5
244
+ end
245
+
246
+ infos
247
+ end
248
+
249
+ def line_to_hash(words)
250
+ return unless words.size == 5
251
+
252
+ data_hash = {}
253
+ data_hash[:id] = words[0]
254
+ data_hash[:job_name] = words[1]
255
+ data_hash[:job_owner] = words[2]
256
+ data_hash[:status] = get_state(words[4])
257
+
258
+ data_hash
259
+ end
260
+
261
+ def get_state(state)
262
+ STATE_MAP.fetch(state, :undetermined)
263
+ end
264
+ end
265
+ end
266
+ end
267
+ end
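The parsing helpers in the new CCQ adapter are small enough to try in isolation. The following is a minimal standalone sketch that restates three of them outside the class. The sample `ccqsub` and `ccqstat` strings are hypothetical stand-ins written for illustration, not captured CloudyCluster output, and `parse_job_id` is a simplified name for the adapter's `parse_job_id_from_ccqsub` without the error handling.

```ruby
# Standalone restatement of three private helpers from the CCQ adapter.
# The sample strings further down are hypothetical, not real CLI output.

# Default pattern used by the adapter to find the job id in ccqsub output.
DEFAULT_JOBID_REGEX = "job id is: (?<job_id>\\d+) you"

# Convert a number of seconds into an HH:MM:SS duration string.
def seconds_to_duration(seconds)
  format("%02d:%02d:%02d", seconds / 3600, seconds / 60 % 60, seconds % 60)
end

# Return the job id as a String, or nil when the pattern does not match
# or the configured pattern has no `job_id` named group.
def parse_job_id(output, jobid_regex = DEFAULT_JOBID_REGEX)
  match_data = /#{jobid_regex}/.match(output)
  match_data&.named_captures&.fetch("job_id", nil)
end

# Turn lines of "key: value" text into a Hash of strings.
def extended_data_to_hash(data)
  Hash[data.to_s.scan(/(\w+): (\S+)/)]
end

duration = seconds_to_duration(3725)
job_id   = parse_job_id("The job id is: 4321 you may use it to check status")
missing  = parse_job_id("no identifier in this text")
raw      = extended_data_to_hash("status: Running\njobName: demo\nuserName: alice\n")

puts duration
puts job_id.inspect
puts missing.inspect
puts raw.inspect
```

One property worth noting from the sketch: because the value pattern is `\S+`, a value that contains spaces is cut at the first whitespace character.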
@@ -12,7 +12,26 @@ module OodCore
12
12
  def self.bin_path(cmd, bin_default, bin_overrides)
13
13
  bin_overrides.fetch(cmd.to_s) { Pathname.new(bin_default.to_s).join(cmd.to_s).to_s }
14
14
  end
15
+
16
+ # Builds a command that runs the given command on another host via ssh
17
+ # @param submit_host [String] where to submit the command
18
+ # @param cmd [String] the desired command to execute on another host
19
+ # @param cmd_args [Array] arguments to the command specified above
20
+ # @param strict_host_checking [Bool] whether to use strict_host_checking
21
+ # @param env [Hash] env variables to be set w/ssh
22
+ #
23
+ # @return cmd [String] command wrapped in ssh if submit_host is present
24
+ # @return args [Array] command arguments including ssh_flags and original command
25
+ def self.ssh_wrap(submit_host, cmd, cmd_args, strict_host_checking = true, env = {})
26
+ return cmd, cmd_args if submit_host.to_s.empty?
27
+
28
+ check_host = strict_host_checking ? "yes" : "no"
29
+ args = ['-o', 'BatchMode=yes', '-o', 'UserKnownHostsFile=/dev/null', '-o', "StrictHostKeyChecking=#{check_host}", "#{submit_host}"]
30
+ env.each{|key, value| args.push("export #{key}=#{value};")}
31
+
32
+ return 'ssh', args + [cmd] + cmd_args
33
+ end
15
34
  end
16
35
  end
17
36
  end
18
- end
37
+ end
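To make the behavior of the new `ssh_wrap` helper concrete, here is a minimal standalone sketch that copies its logic into a top-level method and calls it twice. The host name `login01.example.edu`, the `sbatch` command and the `FOO` variable are hypothetical values chosen for illustration.

```ruby
# Standalone copy of the ssh_wrap logic from the hunk above.
# Returns [command, args]; the pair is unchanged when no submit host is given.
def ssh_wrap(submit_host, cmd, cmd_args, strict_host_checking = true, env = {})
  return cmd, cmd_args if submit_host.to_s.empty?

  check_host = strict_host_checking ? "yes" : "no"
  args = ["-o", "BatchMode=yes", "-o", "UserKnownHostsFile=/dev/null",
          "-o", "StrictHostKeyChecking=#{check_host}", submit_host.to_s]
  # Each env entry becomes an `export KEY=value;` prefix on the remote side.
  env.each { |key, value| args.push("export #{key}=#{value};") }

  return "ssh", args + [cmd] + cmd_args
end

# Without a submit host the command passes through untouched.
local_cmd, local_args = ssh_wrap(nil, "sbatch", ["job.sh"])

# With a (hypothetical) submit host the command is wrapped in ssh.
remote_cmd, remote_args = ssh_wrap(
  "login01.example.edu", "sbatch", ["job.sh"], false, { "FOO" => "bar" }
)

puts [local_cmd, *local_args].join(" ")
puts [remote_cmd, *remote_args].join(" ")
```

In this sketch the environment values are interpolated as-is, so a value containing spaces or shell metacharacters would presumably need quoting by the caller before it reaches the remote shell.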
@@ -106,7 +106,7 @@ module OodCore
106
106
  # @param owner [#to_s, Array<#to_s>] the owner(s) of the jobs
107
107
  # @raise [JobAdapterError] if something goes wrong getting job info
108
108
  # @return [Array<Info>] information describing submitted jobs
109
- def info_where_owner(owner: nil, attrs: nil)
109
+ def info_where_owner(_, attrs: nil)
110
110
  info_all
111
111
  end
112
112
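The signature change above matters if callers pass the owner positionally, which the new `(_, attrs: nil)` form suggests. The sketch below uses two hypothetical classes, `OldSignature` and `NewSignature`, to show how the keyword-only form rejects such a call while the positional form accepts it. The return value `:all_jobs` is a placeholder for the real `info_all` result.

```ruby
# Hypothetical stand-ins for the adapter before and after the change.
class OldSignature
  # Keyword-only form: a positional owner argument is not accepted.
  def info_where_owner(owner: nil, attrs: nil)
    :all_jobs
  end
end

class NewSignature
  # Positional owner (ignored) plus the attrs keyword.
  def info_where_owner(_, attrs: nil)
    :all_jobs
  end
end

old_result = begin
  OldSignature.new.info_where_owner("alice", attrs: nil)
rescue ArgumentError => e
  e.class
end

new_result = NewSignature.new.info_where_owner("alice", attrs: nil)

puts old_result.inspect
puts new_result.inspect
```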
 
@@ -57,7 +57,7 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
57
57
  # @param hostname [#to_s] The hostname to submit the work to
58
58
  # @param script [OodCore::Job::Script] The script object defining the work
59
59
  def start_remote_session(script)
60
- cmd = ssh_cmd(submit_host(script))
60
+ cmd = ssh_cmd(submit_host(script), ['/usr/bin/env', 'bash'])
61
61
 
62
62
  session_name = unique_session_name
63
63
  output = call(*cmd, stdin: wrapped_script(script, session_name))
@@ -67,13 +67,13 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
67
67
  end
68
68
 
69
69
  def stop_remote_session(session_name, hostname)
70
- cmd = ssh_cmd(hostname)
70
+ cmd = ssh_cmd(hostname, ['/usr/bin/env', 'bash'])
71
71
 
72
72
  kill_cmd = <<~SCRIPT
73
73
  # Get the tmux pane PID for the target session
74
74
  pane_pid=$(tmux list-panes -aF '\#{session_name} \#{pane_pid}' | grep '#{session_name}' | cut -f 2 -d ' ')
75
75
  # Find the Singularity sinit PID child of the pane process
76
- pane_sinit_pid=$(pstree -p "$pane_pid" | grep -o 'sinit([[:digit:]]*' | grep -o '[[:digit:]]*')
76
+ pane_sinit_pid=$(pstree -p -l "$pane_pid" | grep -o 'sinit([[:digit:]]*' | grep -o '[[:digit:]]*')
77
77
  # Kill sinit which stops both Singularity-based processes and the tmux session
78
78
  kill "$pane_sinit_pid"
79
79
  SCRIPT
@@ -116,19 +116,23 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
116
116
  s.success? ? o : raise(Error, e)
117
117
  end
118
118
 
119
- # The SSH invocation to send a command
119
+ # The full command to ssh into the destination host and execute the command.
120
+ # SSH options include:
120
121
  # -t Force pseudo-terminal allocation (required to allow tmux to run)
121
122
  # -o BatchMode=yes (set mode to be non-interactive)
122
123
  # if ! strict_host_checking
123
124
  # -o UserKnownHostsFile=/dev/null (do not update the user's known hosts file)
124
125
  # -o StrictHostKeyChecking=no (do not check the user's known hosts file)
125
- def ssh_cmd(destination_host)
126
+ #
127
+ # @param destination_host [#to_s] the destination host you wish to ssh into
128
+ # @param cmd [Array<#to_s>] the command to be executed on the destination host
129
+ def ssh_cmd(destination_host, cmd)
126
130
  if strict_host_checking
127
131
  [
128
132
  'ssh', '-t',
129
133
  '-o', 'BatchMode=yes',
130
134
  "#{username}@#{destination_host}"
131
- ]
135
+ ].concat(cmd)
132
136
  else
133
137
  [
134
138
  'ssh', '-t',
@@ -136,7 +140,7 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
136
140
  '-o', 'UserKnownHostsFile=/dev/null',
137
141
  '-o', 'StrictHostKeyChecking=no',
138
142
  "#{username}@#{destination_host}"
139
- ]
143
+ ].concat(cmd)
140
144
  end
141
145
  end
142
146
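The effect of passing the remote command into `ssh_cmd` can be sketched with a simplified standalone variant. This version takes `username` and `strict_host_checking` as arguments instead of reading them from the launcher instance, so it is an illustration under those assumptions rather than the launcher's actual method. The user and host names are hypothetical.

```ruby
# Simplified standalone variant of ssh_cmd: builds the full argv for ssh,
# ending with the command to run on the destination host.
def ssh_cmd(username, destination_host, cmd, strict_host_checking: true)
  argv = ["ssh", "-t", "-o", "BatchMode=yes"]
  unless strict_host_checking
    argv += ["-o", "UserKnownHostsFile=/dev/null",
             "-o", "StrictHostKeyChecking=no"]
  end
  argv + ["#{username}@#{destination_host}"] + cmd.map(&:to_s)
end

# Start or stop a session: run bash explicitly instead of the login shell.
session_argv = ssh_cmd("alice", "node01.example.edu", ["/usr/bin/env", "bash"])

# List tmux panes without strict host key checking.
list_argv = ssh_cmd("alice", "node01.example.edu", ["tmux", "list-panes"],
                    strict_host_checking: false)

puts session_argv.join(" ")
puts list_argv.join(" ")
```

Making the command a parameter lets the session start and stop paths request `/usr/bin/env bash` explicitly, which lines up with the 0.11.3 changelog entry about working with any login shell.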
 
@@ -170,6 +174,7 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
170
174
  'session_name' => session_name,
171
175
  'singularity_bin' => singularity_bin,
172
176
  'singularity_image' => singularity_image(script.native),
177
+ 'ssh_hosts' => ssh_hosts,
173
178
  'tmux_bin' => tmux_bin,
174
179
  }.each{
175
180
  |key, value| bnd.local_variable_set(key, value)
@@ -245,7 +250,7 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
245
250
  ['#{session_name}', '#{session_created}', '#{pane_pid}'].join(UNIT_SEPARATOR)
246
251
  )
247
252
  keys = [:session_name, :session_created, :session_pid]
248
- cmd = ssh_cmd(destination_host) + ['tmux', 'list-panes', '-aF', format_str]
253
+ cmd = ssh_cmd(destination_host, ['tmux', 'list-panes', '-aF', format_str])
249
254
 
250
255
  call(*cmd).split(
251
256
  "\n"