ood_core 0.10.0 → 0.12.0
- checksums.yaml +5 -5
- data/.travis.yml +2 -2
- data/CHANGELOG.md +54 -1
- data/README.md +6 -5
- data/lib/ood_core.rb +1 -0
- data/lib/ood_core/batch_connect/template.rb +44 -2
- data/lib/ood_core/cluster.rb +15 -0
- data/lib/ood_core/clusters.rb +19 -5
- data/lib/ood_core/invalid_cluster.rb +37 -0
- data/lib/ood_core/job/adapter.rb +35 -4
- data/lib/ood_core/job/adapters/drmaa.rb +1 -1
- data/lib/ood_core/job/adapters/helper.rb +20 -1
- data/lib/ood_core/job/adapters/linux_host.rb +5 -1
- data/lib/ood_core/job/adapters/linux_host/launcher.rb +22 -9
- data/lib/ood_core/job/adapters/linux_host/templates/script_wrapper.erb.sh +15 -1
- data/lib/ood_core/job/adapters/lsf.rb +5 -0
- data/lib/ood_core/job/adapters/lsf/batch.rb +5 -3
- data/lib/ood_core/job/adapters/lsf/helper.rb +29 -23
- data/lib/ood_core/job/adapters/pbspro.rb +58 -33
- data/lib/ood_core/job/adapters/sge.rb +4 -0
- data/lib/ood_core/job/adapters/sge/batch.rb +7 -7
- data/lib/ood_core/job/adapters/sge/helper.rb +19 -18
- data/lib/ood_core/job/adapters/sge/qstat_xml_j_r_listener.rb +54 -8
- data/lib/ood_core/job/adapters/sge/qstat_xml_r_listener.rb +25 -2
- data/lib/ood_core/job/adapters/slurm.rb +85 -38
- data/lib/ood_core/job/adapters/torque.rb +34 -22
- data/lib/ood_core/job/adapters/torque/batch.rb +29 -12
- data/lib/ood_core/job/array_ids.rb +18 -53
- data/lib/ood_core/job/script.rb +19 -2
- data/lib/ood_core/version.rb +1 -1
- data/ood_core.gemspec +2 -1
- metadata +20 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
|
-
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
2
|
+
SHA256:
|
3
|
+
metadata.gz: bb944d43beb0aced99e13efb2ef10bf33f9666c705c50ca5ae1727751de43073
|
4
|
+
data.tar.gz: 6e3cd66160be3bbd63124d6f2ddc794bce4ae64e385977828faba5dfd28ff838
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 176e331a856c1e6958c444426d5c1b41aa881e90a69dca507b07f5463eb81355689e8391e0bf27823fc42a9484789f623ffd566b9d6c414c9cf741a7cafd1def
|
7
|
+
data.tar.gz: 15481101ad3120d3e8457612f2b8a8be4f1e268b38538b18b710f27887836f7a47eac3bf2e89d8f73745ae96ae78e21cd5ba5afefb4161cf95f435d6f2fdf001
|
data/.travis.yml
CHANGED
data/CHANGELOG.md
CHANGED
@@ -6,6 +6,53 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
|
|
6
6
|
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
|
7
7
|
|
8
8
|
## [Unreleased]
|
9
|
+
## [0.12.0] - 2020-08-05
|
10
|
+
### Added
|
11
|
+
- qos option to Slurm and Torque [#205](https://github.com/OSC/ood_core/pull/205)
|
12
|
+
- native hash returned in qstat for SGE adapter [#198](https://github.com/OSC/ood_core/pull/198)
|
13
|
+
- option for specifying `submit_host` to submit jobs via ssh on other host [#204](https://github.com/OSC/ood_core/pull/204)
|
14
|
+
|
15
|
+
### Fixed
|
16
|
+
- SGE handle milliseconds instead of seconds when milliseconds used [#206](https://github.com/OSC/ood_core/issues/206)
|
17
|
+
- Torque's native "hash" for job submission now handles env vars values with spaces [#202](https://github.com/OSC/ood_core/pull/202)
|
18
|
+
|
19
|
+
## [0.11.4] - 2020-05-27
|
20
|
+
### Fixed
|
21
|
+
- Environment exports in SLURM while implementing [#158](https://github.com/OSC/ood_core/issues/158)
|
22
|
+
and [#109](https://github.com/OSC/ood_core/issues/109) in [#163](https://github.com/OSC/ood_core/pull/163)
|
23
|
+
|
24
|
+
## [0.11.3] - 2020-05-11
|
25
|
+
### Fixed
|
26
|
+
- LinuxhHost Adapter to work with any login shell ([#188](https://github.com/OSC/ood_core/pull/188))
|
27
|
+
- LinuxhHost Adapter needs to display long lines in pstree to successfully parse
|
28
|
+
output ([#188](https://github.com/OSC/ood_core/pull/188))
|
29
|
+
|
30
|
+
## [0.11.2] - 2020-04-23
|
31
|
+
### Fixed
|
32
|
+
- fix signature of `LinuxHost#info_where_owner`
|
33
|
+
|
34
|
+
## [0.11.1] - 2020-03-18
|
35
|
+
### Changed
|
36
|
+
- Only the version changed. Had to republish to rubygems.org
|
37
|
+
|
38
|
+
## [0.11.0] - 2020-03-18
|
39
|
+
### Added
|
40
|
+
- Added directive prefixes to each adapter (e.g. `#QSUB`) ([#161](https://github.com/OSC/ood_core/issues/161))
|
41
|
+
- LHA supports `submit_host` field in native ([#164](https://github.com/OSC/ood_core/issues/164))
|
42
|
+
- Cluster files can be yaml or yml extensions ([#171](https://github.com/OSC/ood_core/issues/171))
|
43
|
+
- Users can add a flag `OOD_JOB_NAME_ILLEGAL_CHARS` to sanitize job names ([#183](https://github.com/OSC/ood_core/issues/183)
|
44
|
+
|
45
|
+
### Changed
|
46
|
+
- Simplified job array parsing ([#144](https://github.com/OSC/ood_core/issues/144))
|
47
|
+
|
48
|
+
### Fixed
|
49
|
+
- Issue where environment variables were not properly exported to the job ([#158](https://github.com/OSC/ood_core/issues/158))
|
50
|
+
- Parsing bad cluster files ([#150](https://github.com/OSC/ood_core/issues/150) and [#178](https://github.com/OSC/ood_core/issues/178))
|
51
|
+
- netcat is no longer a hard dependency. Now lsof, python and bash can be used ([153](https://github.com/OSC/ood_core/issues/153))
|
52
|
+
- GE crash when nil config file was given ([#175](https://github.com/OSC/ood_core/issues/175))
|
53
|
+
- GE sometimes reported incorrect core count ([#168](https://github.com/OSC/ood_core/issues/168))
|
54
|
+
|
55
|
+
|
9
56
|
## [0.10.0] - 2019-11-05
|
10
57
|
### Added
|
11
58
|
- Added an adapter for submitting work on Linux hosted systems without using a scheduler
|
@@ -196,7 +243,13 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
|
|
196
243
|
### Added
|
197
244
|
- Initial release!
|
198
245
|
|
199
|
-
[Unreleased]: https://github.com/OSC/ood_core/compare/v0.
|
246
|
+
[Unreleased]: https://github.com/OSC/ood_core/compare/v0.12.0...HEAD
|
247
|
+
[0.12.0]: https://github.com/OSC/ood_core/compare/v0.11.4...v0.12.0
|
248
|
+
[0.11.4]: https://github.com/OSC/ood_core/compare/v0.11.3...v0.11.4
|
249
|
+
[0.11.3]: https://github.com/OSC/ood_core/compare/v0.11.2...v0.11.3
|
250
|
+
[0.11.2]: https://github.com/OSC/ood_core/compare/v0.11.1...v0.11.2
|
251
|
+
[0.11.1]: https://github.com/OSC/ood_core/compare/v0.11.0...v0.11.1
|
252
|
+
[0.11.0]: https://github.com/OSC/ood_core/compare/v0.10.0...v0.11.0
|
200
253
|
[0.10.0]: https://github.com/OSC/ood_core/compare/v0.9.3...v0.10.0
|
201
254
|
[0.9.3]: https://github.com/OSC/ood_core/compare/v0.9.2...v0.9.3
|
202
255
|
[0.9.2]: https://github.com/OSC/ood_core/compare/v0.9.1...v0.9.2
|
data/README.md
CHANGED
@@ -4,12 +4,13 @@
|
|
4
4
|

|
5
5
|

|
6
6
|
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
7
|
+
- Website: http://openondemand.org/
|
8
|
+
- Website repo with JOSS publication: https://github.com/OSC/Open-OnDemand
|
9
|
+
- Documentation: https://osc.github.io/ood-documentation/master/
|
10
|
+
- Main code repo: https://github.com/OSC/ondemand
|
11
|
+
- Core library repo: https://github.com/OSC/ood_core
|
11
12
|
|
12
|
-
|
13
|
+
OnDemand core library with adapters for each batch scheduler.
|
13
14
|
|
14
15
|
## Installation
|
15
16
|
|
data/lib/ood_core.rb
CHANGED
data/lib/ood_core/batch_connect/template.rb
CHANGED
@@ -117,11 +117,47 @@ module OodCore
|
|
117
117
|
}
|
118
118
|
export -f random_number
|
119
119
|
|
120
|
+
port_used_python() {
|
121
|
+
python -c "import socket; socket.socket().connect(('$1',$2))" >/dev/null 2>&1
|
122
|
+
}
|
123
|
+
|
124
|
+
port_used_python3() {
|
125
|
+
python3 -c "import socket; socket.socket().connect(('$1',$2))" >/dev/null 2>&1
|
126
|
+
}
|
127
|
+
|
128
|
+
port_used_nc(){
|
129
|
+
nc -w 2 "$1" "$2" < /dev/null > /dev/null 2>&1
|
130
|
+
}
|
131
|
+
|
132
|
+
port_used_lsof(){
|
133
|
+
lsof -i :"$2" >/dev/null 2>&1
|
134
|
+
}
|
135
|
+
|
136
|
+
port_used_bash(){
|
137
|
+
local bash_supported=$(strings /bin/bash 2>/dev/null | grep tcp)
|
138
|
+
if [ "$bash_supported" == "/dev/tcp/*/*" ]; then
|
139
|
+
(: < /dev/tcp/$1/$2) >/dev/null 2>&1
|
140
|
+
else
|
141
|
+
return 127
|
142
|
+
fi
|
143
|
+
}
|
144
|
+
|
120
145
|
# Check if port $1 is in use
|
121
146
|
port_used () {
|
122
147
|
local port="${1#*:}"
|
123
148
|
local host=$((expr "${1}" : '\\(.*\\):' || echo "localhost") | awk 'END{print $NF}')
|
124
|
-
|
149
|
+
local port_strategies=(port_used_nc port_used_lsof port_used_bash port_used_python port_used_python3)
|
150
|
+
|
151
|
+
for strategy in ${port_strategies[@]};
|
152
|
+
do
|
153
|
+
$strategy $host $port
|
154
|
+
status=$?
|
155
|
+
if [[ "$status" == "0" ]] || [[ "$status" == "1" ]]; then
|
156
|
+
return $status
|
157
|
+
fi
|
158
|
+
done
|
159
|
+
|
160
|
+
return 127
|
125
161
|
}
|
126
162
|
export -f port_used
|
127
163
|
|
@@ -143,8 +179,14 @@ module OodCore
|
|
143
179
|
local port="${1}"
|
144
180
|
local time="${2:-30}"
|
145
181
|
for ((i=1; i<=time*2; i++)); do
|
146
|
-
|
182
|
+
port_used "${port}"
|
183
|
+
port_status=$?
|
184
|
+
if [ "$port_status" == "0" ]; then
|
147
185
|
return 0
|
186
|
+
elif [ "$port_status" == "127" ]; then
|
187
|
+
echo "commands to find port were either not found or inaccessible."
|
188
|
+
echo "command options are lsof, nc, bash's /dev/tcp, or python (or python3) with socket lib."
|
189
|
+
return 127
|
148
190
|
fi
|
149
191
|
sleep 0.5
|
150
192
|
done
|
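The `port_used` change above tries each probe in turn and treats only exit status 0 (port in use) or 1 (port free) as a final answer; any other status, such as 127 for a missing tool, falls through to the next probe. A minimal Ruby sketch of that control flow follows. The lambdas are hypothetical stand-ins for the shell functions (`port_used_nc`, `port_used_lsof`, and so on) and do not open any sockets.

```ruby
# Hypothetical stand-ins for the shell probes: each returns an exit-status-like
# integer (0 = port in use, 1 = port free, anything else = probe unavailable).
def port_used(host, port, strategies)
  strategies.each do |strategy|
    status = strategy.call(host, port)
    # Only 0 and 1 are decisive; other codes mean "try the next probe".
    return status if status == 0 || status == 1
  end
  127 # no probe could give an answer
end

missing_tool = ->(_host, _port) { 127 }
reports_free = ->(_host, _port) { 1 }
reports_used = ->(_host, _port) { 0 }

# The first decisive probe wins, so the later "in use" probe is never consulted.
puts port_used("localhost", 8080, [missing_tool, reports_free, reports_used])
# With no usable probe, the caller sees 127 and can print a diagnostic.
puts port_used("localhost", 8080, [missing_tool, missing_tool])
```

Ordering the probes from cheapest to most heavyweight, as the shell array does with `nc` first and the Python variants last, keeps the common case fast while still leaving a fallback.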
data/lib/ood_core/cluster.rb
CHANGED
@@ -28,6 +28,10 @@ module OodCore
|
|
28
28
|
# @return [Hash] the acls configuration
|
29
29
|
attr_reader :acls_config
|
30
30
|
|
31
|
+
# The errors encountered with configuring this cluster
|
32
|
+
# @return Array<String> the errors
|
33
|
+
attr_reader :errors
|
34
|
+
|
31
35
|
# @param cluster [#to_h] the cluster object
|
32
36
|
# @option cluster [#to_sym] :id The cluster id
|
33
37
|
# @option cluster [#to_h] :metadata ({}) The cluster's metadata
|
@@ -39,6 +43,8 @@ module OodCore
|
|
39
43
|
# against
|
40
44
|
# @option cluster [#to_h] :batch_connect ({}) Configuration for batch
|
41
45
|
# connect templates
|
46
|
+
# @option cluster [#to_a] :errors ([]) List of configuration errors
|
47
|
+
#
|
42
48
|
def initialize(cluster)
|
43
49
|
c = cluster.to_h.symbolize_keys
|
44
50
|
|
@@ -52,6 +58,9 @@ module OodCore
|
|
52
58
|
@custom_config = c.fetch(:custom, {}) .to_h.symbolize_keys
|
53
59
|
@acls_config = c.fetch(:acls, []) .map(&:to_h)
|
54
60
|
@batch_connect_config = c.fetch(:batch_connect, {}).to_h.symbolize_keys
|
61
|
+
|
62
|
+
# side affects from object creation and validation
|
63
|
+
@errors = c.fetch(:errors, []) .to_a
|
55
64
|
end
|
56
65
|
|
57
66
|
# Metadata that provides extra information about this cluster
|
@@ -159,6 +168,12 @@ module OodCore
|
|
159
168
|
}
|
160
169
|
end
|
161
170
|
|
171
|
+
# This cluster is always valid
|
172
|
+
# @return true
|
173
|
+
def valid?
|
174
|
+
return true
|
175
|
+
end
|
176
|
+
|
162
177
|
private
|
163
178
|
# Build acl adapter objects from array
|
164
179
|
def build_acls(ary)
|
data/lib/ood_core/clusters.rb
CHANGED
@@ -21,16 +21,30 @@ module OodCore
|
|
21
21
|
if config.file?
|
22
22
|
if config.readable?
|
23
23
|
CONFIG_VERSION.any? do |version|
|
24
|
-
|
25
|
-
|
24
|
+
begin
|
25
|
+
YAML.safe_load(config.read)&.fetch(version, {}).each do |k, v|
|
26
|
+
clusters << Cluster.new(send("parse_#{version}", id: k, cluster: v))
|
27
|
+
end
|
28
|
+
rescue Psych::SyntaxError => e
|
29
|
+
clusters << InvalidCluster.new(
|
30
|
+
id: config.basename(config.extname).to_s,
|
31
|
+
errors: [ e.message.to_s ]
|
32
|
+
)
|
26
33
|
end
|
27
34
|
end
|
28
35
|
end
|
29
36
|
elsif config.directory?
|
30
|
-
Pathname.glob(config.join("*.yml")).select(&:file?).select(&:readable?).each do |p|
|
37
|
+
Pathname.glob([config.join("*.yml"), config.join("*.yaml")]).select(&:file?).select(&:readable?).each do |p|
|
31
38
|
CONFIG_VERSION.any? do |version|
|
32
|
-
|
33
|
-
|
39
|
+
begin
|
40
|
+
if cluster = YAML.safe_load(p.read)&.fetch(version, nil)
|
41
|
+
clusters << Cluster.new(send("parse_#{version}", id: p.basename(p.extname()).to_s, cluster: cluster))
|
42
|
+
end
|
43
|
+
rescue Psych::SyntaxError => e
|
44
|
+
clusters << InvalidCluster.new(
|
45
|
+
id: p.basename(p.extname).to_s,
|
46
|
+
errors: [ e.message.to_s ]
|
47
|
+
)
|
34
48
|
end
|
35
49
|
end
|
36
50
|
end
|
data/lib/ood_core/invalid_cluster.rb
ADDED
@@ -0,0 +1,37 @@
|
|
1
|
+
module OodCore
|
2
|
+
# A special case of an OodCore::Cluster where something went awry in the
|
3
|
+
# creation and it's invalid for some reason. Users should only be able
|
4
|
+
# to rely on id and metadata.error_msg. All *allow? related functions
|
5
|
+
# false, meaning nothing is allowed.
|
6
|
+
class InvalidCluster < Cluster
|
7
|
+
# Jobs are not allowed
|
8
|
+
# @return false
|
9
|
+
def login_allow?
|
10
|
+
false
|
11
|
+
end
|
12
|
+
|
13
|
+
# Jobs are not allowed
|
14
|
+
# @return false
|
15
|
+
def job_allow?
|
16
|
+
false
|
17
|
+
end
|
18
|
+
|
19
|
+
# Custom features are not allowed
|
20
|
+
# @return false
|
21
|
+
def custom_allow?(_)
|
22
|
+
false
|
23
|
+
end
|
24
|
+
|
25
|
+
# This cluster is not allowed to be used
|
26
|
+
# @return false
|
27
|
+
def allow?
|
28
|
+
false
|
29
|
+
end
|
30
|
+
|
31
|
+
# This cluster is never valid
|
32
|
+
# @return false
|
33
|
+
def valid?
|
34
|
+
return false
|
35
|
+
end
|
36
|
+
end
|
37
|
+
end
|
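Taken together, the `clusters.rb` and `invalid_cluster.rb` changes mean a cluster file with a YAML syntax error no longer aborts loading; it produces a placeholder object whose `valid?` is false and whose `errors` carries the parser message. The sketch below illustrates that pattern with hypothetical, stripped-down stand-in classes (`SketchCluster`, `SketchInvalidCluster`) and a hypothetical `load_cluster` helper; the real `OodCore::Cluster` takes a hash and holds far more configuration.

```ruby
require "yaml"

# Hypothetical stand-ins for OodCore::Cluster and OodCore::InvalidCluster.
class SketchCluster
  attr_reader :id, :errors

  def initialize(id:, errors: [])
    @id = id.to_s
    @errors = errors.to_a
  end

  # A normally constructed cluster is always considered valid.
  def valid?
    true
  end
end

class SketchInvalidCluster < SketchCluster
  # A cluster built from a broken config is never valid.
  def valid?
    false
  end
end

# Parse one cluster config; a YAML syntax error yields an invalid placeholder
# instead of raising out of the loader.
def load_cluster(id, yaml_text)
  YAML.safe_load(yaml_text)
  SketchCluster.new(id: id)
rescue Psych::SyntaxError => e
  SketchInvalidCluster.new(id: id, errors: [e.message.to_s])
end

good = load_cluster("cluster_a", "v2:\n  metadata:\n    title: Cluster A\n")
bad  = load_cluster("cluster_b", "v2: [unclosed")

puts good.valid?
puts bad.valid?
puts bad.errors.length
```

Callers can then filter on `valid?` or surface `errors` to an administrator rather than losing every cluster because one file is malformed.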
data/lib/ood_core/job/adapter.rb
CHANGED
@@ -36,7 +36,7 @@ module OodCore
|
|
36
36
|
# Retrieve info for all jobs from the resource manager
|
37
37
|
# @abstract Subclass is expected to implement {#info_all}
|
38
38
|
# @raise [NotImplementedError] if subclass did not define {#info_all}
|
39
|
-
# @param attrs [Array<symbol>] defaults to nil (and all attrs are provided)
|
39
|
+
# @param attrs [Array<symbol>] defaults to nil (and all attrs are provided)
|
40
40
|
# This array specifies only attrs you want, in addition to id and status.
|
41
41
|
# If an array, the Info object that is returned to you is not guarenteed
|
42
42
|
# to have a value for any attr besides the ones specified and id and status.
|
@@ -51,7 +51,7 @@ module OodCore
|
|
51
51
|
# Retrieve info for all jobs for a given owner or owners from the
|
52
52
|
# resource manager
|
53
53
|
# @param owner [#to_s, Array<#to_s>] the owner(s) of the jobs
|
54
|
-
# @param attrs [Array<symbol>] defaults to nil (and all attrs are provided)
|
54
|
+
# @param attrs [Array<symbol>] defaults to nil (and all attrs are provided)
|
55
55
|
# This array specifies only attrs you want, in addition to id and status.
|
56
56
|
# If an array, the Info object that is returned to you is not guarenteed
|
57
57
|
# to have a value for any attr besides the ones specified and id and status.
|
@@ -69,7 +69,7 @@ module OodCore
|
|
69
69
|
end
|
70
70
|
|
71
71
|
# Iterate over each job Info object
|
72
|
-
# @param attrs [Array<symbol>] defaults to nil (and all attrs are provided)
|
72
|
+
# @param attrs [Array<symbol>] defaults to nil (and all attrs are provided)
|
73
73
|
# This array specifies only attrs you want, in addition to id and status.
|
74
74
|
# If an array, the Info object that is returned to you is not guarenteed
|
75
75
|
# to have a value for any attr besides the ones specified and id and status.
|
@@ -88,7 +88,7 @@ module OodCore
|
|
88
88
|
|
89
89
|
# Iterate over each job Info object
|
90
90
|
# @param owner [#to_s, Array<#to_s>] the owner(s) of the jobs
|
91
|
-
# @param attrs [Array<symbol>] defaults to nil (and all attrs are provided)
|
91
|
+
# @param attrs [Array<symbol>] defaults to nil (and all attrs are provided)
|
92
92
|
# This array specifies only attrs you want, in addition to id and status.
|
93
93
|
# If an array, the Info object that is returned to you is not guarenteed
|
94
94
|
# to have a value for any attr besides the ones specified and id and status.
|
@@ -157,6 +157,37 @@ module OodCore
|
|
157
157
|
def delete(id)
|
158
158
|
raise NotImplementedError, "subclass did not define #delete"
|
159
159
|
end
|
160
|
+
|
161
|
+
# Return the scheduler-specific directive prefix
|
162
|
+
#
|
163
|
+
# Examples of directive prefixes include #QSUB, #BSUB and allow placing what would
|
164
|
+
# otherwise be command line options inside the job launch script.
|
165
|
+
#
|
166
|
+
# The method should return nil if the adapter does not support prefixes
|
167
|
+
#
|
168
|
+
# @abstract Subclass is expected to implement {#directive_prefix}
|
169
|
+
# @raise [NotImplementedError] if subclass did not defined {#directive_prefix}
|
170
|
+
# @return [String]
|
171
|
+
def directive_prefix
|
172
|
+
raise NotImplementedError, "subclass did not define #directive_prefix"
|
173
|
+
end
|
174
|
+
|
175
|
+
# Replace illegal chars in job name with a dash
|
176
|
+
#
|
177
|
+
# @return [String] job name with dashes replacing illegal chars
|
178
|
+
def sanitize_job_name(job_name)
|
179
|
+
# escape ^ and omit -
|
180
|
+
chars = job_name_illegal_chars.to_s.gsub("^", "\\^").gsub("-", "")
|
181
|
+
job_name.tr(chars, "-")
|
182
|
+
end
|
183
|
+
|
184
|
+
# Illegal chars that should not be used in a job name
|
185
|
+
# A dash is assumed to be legal in job names in all batch schedulers
|
186
|
+
#
|
187
|
+
# @return [String] string of chars
|
188
|
+
def job_name_illegal_chars
|
189
|
+
ENV["OOD_JOB_NAME_ILLEGAL_CHARS"].to_s
|
190
|
+
end
|
160
191
|
end
|
161
192
|
end
|
162
193
|
end
|
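The `sanitize_job_name` method added above relies on `String#tr`, which replaces every character in a set with the last character of the replacement string. The sketch below repeats the same two steps but takes the illegal characters as an argument, which is an assumption made so the example runs without setting `OOD_JOB_NAME_ILLEGAL_CHARS`.

```ruby
# Same two steps as the diff, but the illegal characters are passed in
# (an assumption for this sketch) instead of being read from the environment.
def sanitize_job_name(job_name, illegal_chars)
  # Escape a caret so String#tr does not treat it as set negation, and drop
  # any dash: it is assumed legal and would otherwise act as a range operator.
  chars = illegal_chars.to_s.gsub("^", "\\^").gsub("-", "")
  job_name.tr(chars, "-")
end

# Slash and space are both replaced with a dash.
puts sanitize_job_name("sys/dashboard job", "/ ")
# "a-c" is not read as the range a..c: only "a" and "c" are replaced.
puts sanitize_job_name("abc", "a-c")
# An empty set leaves the name untouched.
puts sanitize_job_name("unchanged", "")
```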
data/lib/ood_core/job/adapters/drmaa.rb
CHANGED
@@ -13,7 +13,7 @@
|
|
13
13
|
# The contents of this file are subject to the Sun Industry Standards
|
14
14
|
# Source License Version 1.2 (the "License"); You may not use this file
|
15
15
|
# except in compliance with the License. You may obtain a copy of the
|
16
|
-
# License at http://
|
16
|
+
# License at http://gridscheduler.sourceforge.net/Gridengine_SISSL_license.html
|
17
17
|
#
|
18
18
|
# Software provided under this License is provided on an "AS IS" basis,
|
19
19
|
# WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING,
|
data/lib/ood_core/job/adapters/helper.rb
CHANGED
@@ -12,7 +12,26 @@ module OodCore
|
|
12
12
|
def self.bin_path(cmd, bin_default, bin_overrides)
|
13
13
|
bin_overrides.fetch(cmd.to_s) { Pathname.new(bin_default.to_s).join(cmd.to_s).to_s }
|
14
14
|
end
|
15
|
+
|
16
|
+
# Gets a command that submits command on another host via ssh
|
17
|
+
# @param submit_host [String] where to submit the command
|
18
|
+
# @param cmd [String] the desired command to execute on another host
|
19
|
+
# @param cmd_args [Array] arguments to the command specified above
|
20
|
+
# @param strict_host_checking [Bool] whether to use strict_host_checking
|
21
|
+
# @param env [Hash] env variables to be set w/ssh
|
22
|
+
#
|
23
|
+
# @return cmd [String] command wrapped in ssh if submit_host is present
|
24
|
+
# @return args [Array] command arguments including ssh_flags and original command
|
25
|
+
def self.ssh_wrap(submit_host, cmd, cmd_args, strict_host_checking = true, env = {})
|
26
|
+
return cmd, cmd_args if submit_host.to_s.empty?
|
27
|
+
|
28
|
+
check_host = strict_host_checking ? "yes" : "no"
|
29
|
+
args = ['-o', 'BatchMode=yes', '-o', 'UserKnownHostsFile=/dev/null', '-o', "StrictHostKeyChecking=#{check_host}", "#{submit_host}"]
|
30
|
+
env.each{|key, value| args.push("export #{key}=#{value};")}
|
31
|
+
|
32
|
+
return 'ssh', args + [cmd] + cmd_args
|
33
|
+
end
|
15
34
|
end
|
16
35
|
end
|
17
36
|
end
|
18
|
-
end
|
37
|
+
end
|
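To see what `ssh_wrap` returns, the helper from the hunk above can be copied into a standalone module and called directly. The module name `SshWrapSketch`, the host name, and the `FOO` variable are hypothetical and exist only for this sketch; the method body is taken from the diff.

```ruby
# Copy of the helper from the diff, placed in a hypothetical module so the
# sketch runs without ood_core installed.
module SshWrapSketch
  def self.ssh_wrap(submit_host, cmd, cmd_args, strict_host_checking = true, env = {})
    # Without a submit host the command is returned untouched.
    return cmd, cmd_args if submit_host.to_s.empty?

    check_host = strict_host_checking ? "yes" : "no"
    args = ['-o', 'BatchMode=yes', '-o', 'UserKnownHostsFile=/dev/null', '-o', "StrictHostKeyChecking=#{check_host}", "#{submit_host}"]
    # Each env pair becomes an "export KEY=VALUE;" token ahead of the command.
    env.each { |key, value| args.push("export #{key}=#{value};") }

    return 'ssh', args + [cmd] + cmd_args
  end
end

# No submit host: the original command and arguments come back unchanged.
p SshWrapSketch.ssh_wrap(nil, "sbatch", ["--parsable"])

# With a submit host, the command is prefixed with ssh and its options.
cmd, args = SshWrapSketch.ssh_wrap("login01.example.edu", "sbatch", ["--parsable"], false, { "FOO" => "bar" })
puts cmd
p args.last(4)
```

Because the exported values are interpolated as-is, a value containing spaces or shell metacharacters would likely need quoting by the caller; the sketch does not attempt that.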
data/lib/ood_core/job/adapters/linux_host.rb
CHANGED
@@ -106,7 +106,7 @@ module OodCore
|
|
106
106
|
# @param owner [#to_s, Array<#to_s>] the owner(s) of the jobs
|
107
107
|
# @raise [JobAdapterError] if something goes wrong getting job info
|
108
108
|
# @return [Array<Info>] information describing submitted jobs
|
109
|
-
def info_where_owner(
|
109
|
+
def info_where_owner(_, attrs: nil)
|
110
110
|
info_all
|
111
111
|
end
|
112
112
|
|
@@ -201,6 +201,10 @@ module OodCore
|
|
201
201
|
raise JobAdapterError, e.message
|
202
202
|
end
|
203
203
|
|
204
|
+
def directive_prefix
|
205
|
+
nil
|
206
|
+
end
|
207
|
+
|
204
208
|
private
|
205
209
|
|
206
210
|
def host_permitted?(destination_host)
|
data/lib/ood_core/job/adapters/linux_host/launcher.rb
CHANGED
@@ -11,7 +11,7 @@ require 'time'
|
|
11
11
|
class OodCore::Job::Adapters::LinuxHost::Launcher
|
12
12
|
attr_reader :contain, :debug, :site_timeout, :session_name_label, :singularity_bin,
|
13
13
|
:site_singularity_bindpath, :default_singularity_image, :ssh_hosts,
|
14
|
-
:strict_host_checking, :
|
14
|
+
:strict_host_checking, :tmux_bin, :username
|
15
15
|
# The root exception class that all LinuxHost adapter-specific exceptions inherit
|
16
16
|
# from
|
17
17
|
class Error < StandardError; end
|
@@ -57,7 +57,7 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
|
|
57
57
|
# @param hostname [#to_s] The hostname to submit the work to
|
58
58
|
# @param script [OodCore::Job::Script] The script object defining the work
|
59
59
|
def start_remote_session(script)
|
60
|
-
cmd = ssh_cmd(submit_host)
|
60
|
+
cmd = ssh_cmd(submit_host(script), ['/usr/bin/env', 'bash'])
|
61
61
|
|
62
62
|
session_name = unique_session_name
|
63
63
|
output = call(*cmd, stdin: wrapped_script(script, session_name))
|
@@ -67,13 +67,13 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
|
|
67
67
|
end
|
68
68
|
|
69
69
|
def stop_remote_session(session_name, hostname)
|
70
|
-
cmd = ssh_cmd(hostname)
|
70
|
+
cmd = ssh_cmd(hostname, ['/usr/bin/env', 'bash'])
|
71
71
|
|
72
72
|
kill_cmd = <<~SCRIPT
|
73
73
|
# Get the tmux pane PID for the target session
|
74
74
|
pane_pid=$(tmux list-panes -aF '\#{session_name} \#{pane_pid}' | grep '#{session_name}' | cut -f 2 -d ' ')
|
75
75
|
# Find the Singularity sinit PID child of the pane process
|
76
|
-
pane_sinit_pid=$(pstree -p "$pane_pid" | grep -o 'sinit([[:digit:]]*' | grep -o '[[:digit:]]*')
|
76
|
+
pane_sinit_pid=$(pstree -p -l "$pane_pid" | grep -o 'sinit([[:digit:]]*' | grep -o '[[:digit:]]*')
|
77
77
|
# Kill sinit which stops both Singularity-based processes and the tmux session
|
78
78
|
kill "$pane_sinit_pid"
|
79
79
|
SCRIPT
|
@@ -98,6 +98,14 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
|
|
98
98
|
}
|
99
99
|
end
|
100
100
|
|
101
|
+
def submit_host(script = nil)
|
102
|
+
if script && script.native && script.native['submit_host_override']
|
103
|
+
script.native['submit_host_override']
|
104
|
+
else
|
105
|
+
@submit_host
|
106
|
+
end
|
107
|
+
end
|
108
|
+
|
101
109
|
private
|
102
110
|
|
103
111
|
# Call a forked Slurm command for a given cluster
|
@@ -108,19 +116,23 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
|
|
108
116
|
s.success? ? o : raise(Error, e)
|
109
117
|
end
|
110
118
|
|
111
|
-
# The
|
119
|
+
# The full command to ssh into the destination host and execute the command.
|
120
|
+
# SSH options include:
|
112
121
|
# -t Force pseudo-terminal allocation (required to allow tmux to run)
|
113
122
|
# -o BatchMode=yes (set mode to be non-interactive)
|
114
123
|
# if ! strict_host_checking
|
115
124
|
# -o UserKnownHostsFile=/dev/null (do not update the user's known hosts file)
|
116
125
|
# -o StrictHostKeyChecking=no (do no check the user's known hosts file)
|
117
|
-
|
126
|
+
#
|
127
|
+
# @param destination_host [#to_s] the destination host you wish to ssh into
|
128
|
+
# @param cmd [Array<#to_s>] the command to be executed on the destination host
|
129
|
+
def ssh_cmd(destination_host, cmd)
|
118
130
|
if strict_host_checking
|
119
131
|
[
|
120
132
|
'ssh', '-t',
|
121
133
|
'-o', 'BatchMode=yes',
|
122
134
|
"#{username}@#{destination_host}"
|
123
|
-
]
|
135
|
+
].concat(cmd)
|
124
136
|
else
|
125
137
|
[
|
126
138
|
'ssh', '-t',
|
@@ -128,7 +140,7 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
|
|
128
140
|
'-o', 'UserKnownHostsFile=/dev/null',
|
129
141
|
'-o', 'StrictHostKeyChecking=no',
|
130
142
|
"#{username}@#{destination_host}"
|
131
|
-
]
|
143
|
+
].concat(cmd)
|
132
144
|
end
|
133
145
|
end
|
134
146
|
|
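The launcher changes above do two things: `ssh_cmd` now appends the remote command to the ssh argument list, and `submit_host` lets a job's native hash override the configured host through `submit_host_override`. The following is a reduced sketch of both behaviours. `LauncherSketch`, its constructor, and the host and user names are hypothetical, and `submit_host` here takes the native hash directly rather than a script object.

```ruby
# Hypothetical, reduced stand-in for the launcher: only the pieces needed to
# show how the destination host and remote command are assembled.
class LauncherSketch
  def initialize(submit_host:, username:, strict_host_checking: true)
    @submit_host = submit_host
    @username = username
    @strict_host_checking = strict_host_checking
  end

  # A per-job 'submit_host_override' in the native hash wins over the default.
  def submit_host(native = nil)
    (native && native['submit_host_override']) || @submit_host
  end

  # Build the ssh argument list and append the remote command to it.
  def ssh_cmd(destination_host, cmd)
    opts = ['-o', 'BatchMode=yes']
    unless @strict_host_checking
      opts += ['-o', 'UserKnownHostsFile=/dev/null', '-o', 'StrictHostKeyChecking=no']
    end
    ['ssh', '-t', *opts, "#{@username}@#{destination_host}"].concat(cmd)
  end
end

launcher = LauncherSketch.new(submit_host: "desktop01", username: "alice")

# Default host, running whichever bash is first on the remote PATH.
p launcher.ssh_cmd(launcher.submit_host, ['/usr/bin/env', 'bash'])

# Per-job override of the destination host.
p launcher.ssh_cmd(launcher.submit_host('submit_host_override' => 'desktop02'), ['tmux', 'list-panes'])
```

Passing the remote command as an array keeps each argument separate on the local side, which is the same shape the diff uses for `['/usr/bin/env', 'bash']` and the `tmux list-panes` call.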
@@ -162,6 +174,7 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
|
|
162
174
|
'session_name' => session_name,
|
163
175
|
'singularity_bin' => singularity_bin,
|
164
176
|
'singularity_image' => singularity_image(script.native),
|
177
|
+
'ssh_hosts' => ssh_hosts,
|
165
178
|
'tmux_bin' => tmux_bin,
|
166
179
|
}.each{
|
167
180
|
|key, value| bnd.local_variable_set(key, value)
|
@@ -237,7 +250,7 @@ class OodCore::Job::Adapters::LinuxHost::Launcher
|
|
237
250
|
['#{session_name}', '#{session_created}', '#{pane_pid}'].join(UNIT_SEPARATOR)
|
238
251
|
)
|
239
252
|
keys = [:session_name, :session_created, :session_pid]
|
240
|
-
cmd = ssh_cmd(destination_host
|
253
|
+
cmd = ssh_cmd(destination_host, ['tmux', 'list-panes', '-aF', format_str])
|
241
254
|
|
242
255
|
call(*cmd).split(
|
243
256
|
"\n"
|