clusterfuck 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/.document +5 -0
- data/.gitignore +5 -0
- data/LICENSE +20 -0
- data/README.rdoc +49 -0
- data/Rakefile +60 -0
- data/VERSION +1 -0
- data/bin/clusterfuck +20 -0
- data/lib/clusterfuck.rb +218 -0
- data/test/clusterfuck_test.rb +7 -0
- data/test/test_helper.rb +9 -0
- metadata +66 -0
data/.document
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,20 @@
+Copyright (c) 2009 Trevor Fountain
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc
ADDED
@@ -0,0 +1,49 @@
+= Clusterfuck
+==== A Subversive Distributed-Systems Tool
+
+Clusterfuck is a tool for automating the process of SSH-ing into remote machines and kickstarting a large number
+of jobs. It's probably best explained by an example, so here's what I use it for:
+
+As part of my research I need to compute the distance between each pair of objects in a set of about 70,000 items.
+Computing the distance between each pair takes a few seconds; running the entire job on a single machine generally takes over a day.
+However, as a member of the University I have an SSH login that works on quite a few machines, so I found myself breaking the job up into smaller, quicker chunks and running each chunk on a different machine.
+Clusterfuck was born out of my frustration with that method -- "surely," I said to myself, "this can be automated."
+
+If you have a lot of jobs to run and access to multiple machines on which to run them, Clusterfuck is for you!
+
+== Usage
+To use Clusterfuck you'll first need to create a configuration file (a "clusterfile"). An example clusterfile might look something like this:
+
+  Clusterfuck::Task.new do |task|
+    task.hosts = %w{clark asimov}
+    task.jobs = (0..3).map { |x| Clusterfuck::Job.new("host#{x}","sleep 0.5 && hostname") }
+    task.temp = "fragments"
+    task.username = "SSHUSERNAME"
+    task.password = "SSHPASSWORD"
+    task.debug = true
+  end
+
+This creates a new clusterfuck task and distributes the jobs across two hosts, +clark+ and +asimov+.
+The jobs to be run in this case are pretty trivial; each one SSHes into a machine, sleeps for a little bit, then prints the hostname.
+Whatever each job prints to stdout is saved in +task+.+temp+ (under the current working directory); running
+this clusterfile will create 4 files in <code>./fragments/</code>: host0.[hostname], host1.[hostname], host2.[hostname], and host3.[hostname] (where [hostname] is the name of the machine on which the job was run).
++task+.+username+ and +task+.+password+ are the SSH credentials used to log into each machine -- currently, Clusterfuck
+can only use one global set of credentials. There's no technical reason for this, other than the fact that I don't
+really need machine-specific logins, so support for them will probably appear in a future release.
++task+.+verbose+ turns on verbose output (messages to stdout each time a job is started, skipped, or cancelled).
+
+Once you have a clusterfile you can kick off your jobs by running the command +clusterfuck+ in the same directory.
+
+== Note on Patches/Pull Requests
+
+* Fork the project.
+* Add something cool or fix a nefarious bug. Documentation wins extra love.
+* Add tests for it. I'd really like this, but since I haven't written any tests myself yet I can't really blame you if you skip it...
+* Commit, but do not mess with the rakefile, version, or history.
+  (If you want to have your own version that's ok -- but
+  bump the version in a separate commit that I can ignore when I pull.)
+* Send me a pull request.
+
+== Copyright
+
+Copyright (c) 2009 Trevor Fountain. See LICENSE for details.
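The example above builds explicit Clusterfuck::Job objects. Per the option documentation in lib/clusterfuck.rb, +jobs+ may also be a plain array of command strings, with a short name derived automatically from each command. A sketch of such a clusterfile follows -- the hostnames, credentials, and the <code>./distance</code> command are illustrative placeholders, not part of the gem:

```ruby
# Hypothetical clusterfile: hostnames, credentials, and the ./distance
# command are placeholders for your own setup.
Clusterfuck::Task.new do |task|
  task.hosts    = %w{clark asimov}
  # Plain strings work too; Clusterfuck wraps each one in a Job and
  # derives a short name from the command's first characters.
  task.jobs     = (0..9).map { |x| "./distance --chunk #{x}" }
  task.temp     = false          # discard stdout; each job writes its own output
  task.username = "SSHUSERNAME"
  task.password = "SSHPASSWORD"
  task.debug    = true           # dry run: print the allocation, run nothing
end
```

With +debug+ set, this only prints which host would receive each job, which is a cheap way to sanity-check a clusterfile before a real run.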
data/Rakefile
ADDED
@@ -0,0 +1,60 @@
+require 'rubygems'
+require 'rake'
+
+begin
+  require 'jeweler'
+  Jeweler::Tasks.new do |gem|
+    gem.name = "clusterfuck"
+    gem.summary = %Q{Run jobs across multiple machines via ssh}
+    gem.description = %Q{Automate the execution of jobs across multiple machines with SSH. Ideal for systems with shared filesystems.}
+    gem.email = "doches@gmail.com"
+    gem.homepage = "http://github.com/doches/clusterfuck"
+    gem.authors = ["Trevor Fountain"]
+    # gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
+  end
+  Jeweler::GemcutterTasks.new
+rescue LoadError
+  puts "Jeweler (or a dependency) not available. Install it with: sudo gem install jeweler"
+end
+
+require 'rake/testtask'
+Rake::TestTask.new(:test) do |test|
+  test.libs << 'lib' << 'test'
+  test.pattern = 'test/**/*_test.rb'
+  test.verbose = true
+end
+
+begin
+  require 'rcov/rcovtask'
+  Rcov::RcovTask.new do |test|
+    test.libs << 'test'
+    test.pattern = 'test/**/*_test.rb'
+    test.verbose = true
+  end
+rescue LoadError
+  task :rcov do
+    abort "RCov is not available. In order to run rcov, you must: sudo gem install spicycode-rcov"
+  end
+end
+
+task :test => :check_dependencies
+
+task :default => :test
+
+gem 'rdoc'
+require 'rdoc'
+require 'rake/rdoctask'
+Rake::RDocTask.new do |rdoc|
+  if File.exist?('VERSION')
+    version = File.read('VERSION')
+  else
+    version = ""
+  end
+
+  rdoc.rdoc_dir = 'rdoc'
+  rdoc.title = "Clusterfuck #{version}"
+  rdoc.rdoc_files.include('README*')
+  rdoc.rdoc_files.include('lib/*.rb')
+  rdoc.main = "README.rdoc"
+  rdoc.options += ["-SHN","-f","darkfish"]
+end
data/VERSION
ADDED
@@ -0,0 +1 @@
+0.1.0
data/bin/clusterfuck
ADDED
@@ -0,0 +1,20 @@
+#!/usr/bin/env ruby
+
+require 'clusterfuck'
+if ARGV[0]
+  # Use the specified clusterfile
+  load ARGV[0]
+else
+  # Search the current directory for a clusterfile
+  found = false
+  Dir.foreach(".") do |file|
+    if file.downcase == "clusterfile"
+      load file
+      found = true
+      break
+    end
+  end
+  if not found
+    STDERR.puts "No clusterfile found!"
+  end
+end
data/lib/clusterfuck.rb
ADDED
@@ -0,0 +1,218 @@
+require 'socket'
+require 'net/ssh'
+
+# Clusterfuck is an ugly, dirty hack to run a large number of jobs on multiple machines.
+# If you can break your task up into a series of small, independent jobs, clusterfuck
+# can automate the process of distributing jobs across machines.
+module Clusterfuck
+  # Print a message when a job is cancelled due to too many failures
+  VERBOSE_CANCEL = 0
+  # Print a message when a job is cancelled AND at each failure
+  VERBOSE_FAIL = 1
+  # Print a message for cancellations and failures, AND each time a job is started.
+  VERBOSE_ALL = 2
+
+  # The flag used to prefix dry run (debugging) messages.
+  DEBUG_WARN = "[DRY-RUN]"
+  # The interval to sleep instead of running jobs when performing a dry run (in seconds)
+  DEBUG_INTERVAL = [0.2,1.0]
+
+  # A Configuration holds the various pieces of information Clusterfuck needs
+  # to represent a task.
+  #
+  # You probably won't need to instantiate a Configuration directly; one is created
+  # when you create a new Task, and passed to the block it takes as a parameter. See Task
+  # for more information.
+  #
+  # Possible configuration options include:
+  # [timeout] Number of seconds to wait before an SSH connection 'times out' (DEFAULT: 2)
+  # [max_fail] Max number of times a failing job will be re-attempted on a new machine (DEFAULT: 3)
+  # [hosts] Array of hostnames (or IP addresses) as Strings to use as nodes
+  # [jobs] Array of Job objects, one per job, which will be allocated to the +hosts+. If you're lazy,
+  #        you can also just use an array of strings (where each string is the command to run) -- a short
+  #        name for each will be produced using the first 8 chars from the command.
+  # [verbose] Level of message reporting. One of +VERBOSE_CANCEL+, +VERBOSE_FAIL+, or +VERBOSE_ALL+
+  #           (DEFAULT: +VERBOSE_CANCEL+)
+  # [username] The SSH username to use to connect
+  # [password] The SSH password to use to connect
+  # [show_report] Show a report after all jobs are complete that gives statistics for each machine.
+  # [debug] Do a 'dry run' -- allocate jobs to machines and display the result but DO NOT actually
+  #         connect to any machines or run any jobs. Useful for testing your clusterfile before
+  #         kicking off a major run.
+  # [temp] Directory in which to capture stdout from each job. Setting this to +false+
+  #        will cause clusterfuck to ignore job output, leaving it up to you to capture the results
+  #        of each job. (DEFAULT: ./fragments)
+  class Configuration
+    # Holds the user-specified options. Again, you probably don't want to access this directly -- use the
+    # getter/setter syntax instead.
+    attr_reader :options
+
+    # Create a new Configuration object with default options.
+    def initialize
+      @options = {
+        "timeout" => 2,
+        "max_fail" => 3,
+        "verbose" => VERBOSE_CANCEL,
+        "show_report" => true,
+        "temp" => "./fragments",
+      }
+    end
+
+    # You can get/set options as if they were attributes, i.e. +config.foo = "bar"+ will set the option +foo+ to "bar".
+    def method_missing(key,args=nil)
+      if args.nil?
+        return @options[key.to_s]
+      else
+        key = key.to_s.gsub("=","")
+        @options[key] = args
+      end
+    end
+
+    # Get a pretty-printed version of the currently set options
+    def to_s
+      @options.map { |pair| "#{pair[0]} = \"#{pair[1]}\"" }.join(", ")
+    end
+
+    # Convert an array of string commands to Job objects if necessary
+    def jobify!
+      @options["jobs"].map! do |job|
+        if not job.is_a?(Job) # Ah-ha, make this string into a job
+          short = job.downcase.gsub(/[^a-z]/,"")
+          short = job[0..7] if short.size > 8
+          Job.new(short,job)
+        else # Don't change anything...
+          job
+        end
+      end
+    end
+  end
+
+  # The primary means of interacting with Clusterfuck. Create a new
+  # Task, passing in a block that takes a Configuration object as a parameter (rake-style).
+  # The constructor returns after all jobs have been completed.
+  class Task
+    # See Configuration for a list of recognized configuration options.
+    def initialize(&custom)
+      # Run configuration options specified in the clusterfile
+      config = Configuration.new
+      custom.call(config)
+      config.jobify!
+
+      # Make the output fragment directory
+      `mkdir #{config.temp}` if config.temp and not File.exists?(config.temp)
+
+      # Run all jobs
+      machines = config.hosts.map { |name| Machine.new(name,config) }
+      machines.each { |machine| machine.run }
+
+      # Wait for jobs to terminate
+      machines.each do |machine|
+        begin
+          machine.thread.join
+        rescue Timeout::Error
+          STDERR.puts machine.to_s
+        end
+      end
+
+      # Print a report, if requested
+      if config.show_report
+        puts " Machine\t| STARTED\t| COMPLETE\t| FAILED\t|"
+        machines.each { |machine| puts machine.report }
+      end
+    end
+  end
+
+  # Represents a single machine (node) in our ad hoc cluster
+  class Machine
+    # The hostname of this machine
+    attr_accessor :host
+    # The global config options specified when the task was created
+    attr_accessor :config
+    # The thread representing this machine's ssh process
+    attr_reader :thread
+    # The number of jobs this machine has completed
+    attr_reader :jobs_completed
+    # The number of jobs this machine has attempted
+    attr_reader :jobs_attempted
+    # Was this machine dropped from the host list (too many failed jobs)?
+    attr_reader :dropped
+
+    # Create a new machine with the specified +host+ and +config+
+    def initialize(host,config)
+      self.host = host
+      self.config = config
+
+      @thread = nil
+      @jobs_completed = 0
+      @jobs_attempted = 0
+      @dropped = false
+    end
+
+    # Open an SSH connection to this machine and process jobs until the global jobs queue is empty
+    def run
+      @thread = Thread.new do
+        while config.jobs.size > 0
+          job = config.jobs.shift
+          if config.debug
+            puts "#{DEBUG_WARN} #{self.host} starting job '#{job.short_name}'"
+            puts "#{DEBUG_WARN} #{job.command}"
+            delay = rand*(DEBUG_INTERVAL[1]-DEBUG_INTERVAL[0])+DEBUG_INTERVAL[0]
+            @jobs_attempted += 1
+            sleep(delay)
+            @jobs_completed += 1
+          else
+            begin
+              @jobs_attempted += 1
+              Net::SSH.start(self.host,config.username,:password => config.password,:timeout => config.timeout) do |ssh|
+                puts "Starting job #{job.short_name} on #{self.host}" if config.verbose >= VERBOSE_ALL
+                if config.temp
+                  ssh.exec(job.command + " > #{Dir.getwd}/#{config.temp}/#{job.short_name}.#{self.host}")
+                else
+                  ssh.exec(job.command)
+                end
+                @jobs_completed += 1
+              end
+            rescue Timeout::Error
+              puts "#{job.short_name} FAILED on #{self.host}, dropping it from the hostlist" if config.verbose >= VERBOSE_FAIL
+              # Requeue the job (up to max_fail attempts) so another machine can pick it up
+              if job.failed < config.max_fail
+                job.failed += 1
+                config.jobs.push job
+              else
+                puts "CANCELLING #{job.short_name}, too many failures (#{job.failed})" if config.verbose >= VERBOSE_CANCEL
+              end
+              @dropped = true
+              break
+            end
+          end
+        end
+      end
+    end
+
+    # Get a one-line summary of this machine's performance
+    def report
+      tab = "\t"
+      if self.host.size > 7
+        tab = ""
+      end
+      "#{self.host}#{tab}\t| #{@jobs_attempted}\t\t| #{@jobs_completed}\t\t| #{@dropped ? 'YES' : 'no'}\t\t|"
+    end
+  end
+
+  # Represents an individual job to be run
+  class Job
+    # The short name of this job, used to name the temporary file it produces
+    attr_accessor :short_name
+    # The actual command to run to execute this job.
+    attr_accessor :command
+    # The number of times this job has been unsuccessfully attempted.
+    attr_accessor :failed
+
+    # Create a new job with the specified short name and command
+    def initialize(short_name,command)
+      self.short_name = short_name
+      self.command = command
+
+      self.failed = 0
+    end
+  end
+end
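Configuration's get/set-anything behavior comes entirely from method_missing: bare calls read from the options hash, calls ending in "=" write to it. A minimal standalone sketch of that pattern (class and option names here are illustrative, not the gem's API):

```ruby
# Minimal sketch of the method_missing-backed option store used by
# Clusterfuck::Configuration. OptionStore is a hypothetical stand-in.
class OptionStore
  attr_reader :options

  def initialize(defaults = {})
    @options = defaults
  end

  # `store.foo` arrives as method_missing(:foo) with no arguments (a read);
  # `store.foo = 1` arrives as method_missing(:foo=, 1) (a write).
  def method_missing(key, *args)
    if args.empty?
      @options[key.to_s]
    else
      @options[key.to_s.sub("=", "")] = args.first
    end
  end
end

store = OptionStore.new("timeout" => 2)
store.max_fail = 5
puts store.timeout   # prints 2
puts store.max_fail  # prints 5
```

Note that the sketch strips the trailing "=" with a non-destructive sub, sidestepping the gsub! nil-return pitfall, and accepts any argument (including nil) as a write, which the args=nil signature in the gem cannot distinguish from a read.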
data/test/test_helper.rb
ADDED
metadata
ADDED
@@ -0,0 +1,66 @@
+--- !ruby/object:Gem::Specification
+name: clusterfuck
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- Trevor Fountain
+autorequire:
+bindir: bin
+cert_chain: []
+
+date: 2009-10-20 00:00:00 +01:00
+default_executable: clusterfuck
+dependencies: []
+
+description: Automate the execution of jobs across multiple machines with SSH. Ideal for systems with shared filesystems.
+email: doches@gmail.com
+executables:
+- clusterfuck
+extensions: []
+
+extra_rdoc_files:
+- LICENSE
+- README.rdoc
+files:
+- .document
+- .gitignore
+- LICENSE
+- README.rdoc
+- Rakefile
+- VERSION
+- bin/clusterfuck
+- lib/clusterfuck.rb
+- test/clusterfuck_test.rb
+- test/test_helper.rb
+has_rdoc: true
+homepage: http://github.com/doches/clusterfuck
+licenses: []
+
+post_install_message:
+rdoc_options:
+- --charset=UTF-8
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+  version:
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+  version:
+requirements: []
+
+rubyforge_project:
+rubygems_version: 1.3.5
+signing_key:
+specification_version: 3
+summary: Run jobs across multiple machines via ssh
+test_files:
+- test/test_helper.rb
+- test/clusterfuck_test.rb