clusterfuck 0.1.0

data/.document ADDED
@@ -0,0 +1,5 @@
+ README.rdoc
+ lib/**/*.rb
+ bin/*
+ features/**/*.feature
+ LICENSE
data/.gitignore ADDED
@@ -0,0 +1,5 @@
+ *.sw?
+ .DS_Store
+ coverage
+ rdoc
+ pkg
data/LICENSE ADDED
@@ -0,0 +1,20 @@
+ Copyright (c) 2009 Trevor Fountain
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc ADDED
@@ -0,0 +1,49 @@
+ = Clusterfuck
+ ==== A Subversive Distributed-Systems Tool
+
+ Clusterfuck is a tool for automating the process of SSH-ing into remote machines and kickstarting a large number
+ of jobs. It's probably best explained by an example, so here's what I use it for:
+
+ As part of my research I need to compute the distance between each pair of objects in a set of about 70,000 items.
+ Computing the distance between each pair takes a few seconds; running the entire job on a single machine generally takes over a day.
+ However, as a member of the University I have an ssh login that works on quite a few machines, so I found myself breaking the job up into smaller, quicker chunks and running each chunk on a different machine.
+ Clusterfuck was born out of my frustration with that method -- "surely," I said to myself, "this can be automated."
+
+ If you have a lot of jobs to run and access to multiple machines on which to run them, Clusterfuck is for you!
+
+ == Usage
+ To use Clusterfuck you'll first need to create a configuration file (a "clusterfile"). An example clusterfile might look something like this:
+
+   Clusterfuck::Task.new do |task|
+     task.hosts = %w{clark asimov}
+     task.jobs = (0..3).map { |x| Clusterfuck::Job.new("host#{x}","sleep 0.5 && hostname") }
+     task.temp = "fragments"
+     task.username = "SSHUSERNAME"
+     task.password = "SSHPASSWORD"
+     task.debug = true
+   end
+
+ This creates a new clusterfuck task and distributes the jobs across two hosts, +clark+ and +asimov+.
+ The jobs to be run in this case are pretty trivial; we basically ssh into each machine, sleep for a little bit, then get the hostname.
+ Whatever each job prints to stdout is saved in +task+.+temp+ (under the current working directory); running
+ this clusterfile will create 4 files in <code>./fragments/</code>: host0.[hostname], host1.[hostname], host2.[hostname], and host3.[hostname] (where [hostname] is the name of the machine on which the job was run).
+ +task+.+username+ and +task+.+password+ are the SSH credentials used to log into each machine -- currently, Clusterfuck
+ can only use one global set of credentials. There's no technical reason for this limitation; I just don't
+ need machine-specific logins yet, so support for them will probably appear in a future release.
+ +task+.+verbose+ turns on verbose output (messages to stdout each time a job is started, skipped, or canceled); +task+.+debug+ performs a dry run, allocating jobs to hosts without actually connecting to anything.
+
+ Once you have a clusterfile you can kick off your jobs by running the command +clusterfuck+ in the same directory.
+
+ == Note on Patches/Pull Requests
+
+ * Fork the project.
+ * Add something cool or fix a nefarious bug. Documentation wins extra love.
+ * Add tests for it. I'd really like this, but since I haven't written any tests myself yet I can't really blame you if you skip it...
+ * Commit, but do not mess with the rakefile, version, or history.
+   (If you want to have your own version, that's ok -- but
+   bump the version in a separate commit that I can ignore when I pull.)
+ * Send me a pull request.
+
+ == Copyright
+
+ Copyright (c) 2009 Trevor Fountain. See LICENSE for details.
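The pair-distance story in the README above maps naturally onto a larger clusterfile. The sketch below is illustrative only and is not part of the gem: the compute_distances.rb script, the host names, and the credentials are hypothetical stand-ins.

  # Hypothetical clusterfile: split a pairwise-distance computation into 20
  # chunks and spread them across four machines over SSH.
  Clusterfuck::Task.new do |task|
    task.hosts    = %w{clark asimov heinlein gibson}
    task.jobs     = (0..19).map do |chunk|
      Clusterfuck::Job.new("dist#{chunk}",
                           "ruby compute_distances.rb --chunk #{chunk} --of 20")
    end
    task.temp     = "fragments"    # stdout of each chunk lands in ./fragments/
    task.username = "SSHUSERNAME"
    task.password = "SSHPASSWORD"
    task.debug    = true           # dry run first; set to false to really submit
  end

With +debug+ left on, running +clusterfuck+ in the same directory only prints the planned allocation of chunks to hosts and opens no SSH connections.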
data/Rakefile ADDED
@@ -0,0 +1,60 @@
+ require 'rubygems'
+ require 'rake'
+
+ begin
+   require 'jeweler'
+   Jeweler::Tasks.new do |gem|
+     gem.name = "clusterfuck"
+     gem.summary = %Q{Run jobs across multiple machines via ssh}
+     gem.description = %Q{Automate the execution of jobs across multiple machines with SSH. Ideal for systems with shared filesystems.}
+     gem.email = "doches@gmail.com"
+     gem.homepage = "http://github.com/doches/clusterfuck"
+     gem.authors = ["Trevor Fountain"]
+     # gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
+   end
+   Jeweler::GemcutterTasks.new
+ rescue LoadError
+   puts "Jeweler (or a dependency) not available. Install it with: sudo gem install jeweler"
+ end
+
+ require 'rake/testtask'
+ Rake::TestTask.new(:test) do |test|
+   test.libs << 'lib' << 'test'
+   test.pattern = 'test/**/*_test.rb'
+   test.verbose = true
+ end
+
+ begin
+   require 'rcov/rcovtask'
+   Rcov::RcovTask.new do |test|
+     test.libs << 'test'
+     test.pattern = 'test/**/*_test.rb'
+     test.verbose = true
+   end
+ rescue LoadError
+   task :rcov do
+     abort "RCov is not available. In order to run rcov, you must: sudo gem install spicycode-rcov"
+   end
+ end
+
+ task :test => :check_dependencies
+
+ task :default => :test
+
+ gem 'rdoc'
+ require 'rdoc'
+ require 'rake/rdoctask'
+ Rake::RDocTask.new do |rdoc|
+   if File.exist?('VERSION')
+     version = File.read('VERSION')
+   else
+     version = ""
+   end
+
+   rdoc.rdoc_dir = 'rdoc'
+   rdoc.title = "Clusterfuck #{version}"
+   rdoc.rdoc_files.include('README*')
+   rdoc.rdoc_files.include('lib/*.rb')
+   rdoc.main = "README.rdoc"
+   rdoc.options += ["-SHN","-f","darkfish"]
+ end
data/VERSION ADDED
@@ -0,0 +1 @@
+ 0.1.0
data/bin/clusterfuck ADDED
@@ -0,0 +1,20 @@
+ #!/usr/bin/env ruby
+
+ require 'clusterfuck'
+ if ARGV[0]
+   # Use the specified clusterfile
+   load ARGV[0]
+ else
+   # Search the current directory for a clusterfile
+   found = false
+   Dir.foreach(".") do |file|
+     if file.downcase == "clusterfile"
+       load file
+       found = true
+       break
+     end
+   end
+   if not found
+     STDERR.puts "No clusterfile found!"
+   end
+ end
data/lib/clusterfuck.rb ADDED
@@ -0,0 +1,218 @@
+ require 'socket'
+ require 'net/ssh'
+
+ # Clusterfuck is an ugly, dirty hack to run a large number of jobs on multiple machines.
+ # If you can break your task up into a series of small, independent jobs, clusterfuck
+ # can automate the process of distributing jobs across machines.
+ module Clusterfuck
+   # Print a message when a job is cancelled due to too many failures
+   VERBOSE_CANCEL = 0
+   # Print a message when a job is cancelled AND at each failure
+   VERBOSE_FAIL = 1
+   # Print a message for cancellations and failures, AND each time a job is started.
+   VERBOSE_ALL = 2
+
+   # The flag used to prefix dry run (debugging) messages.
+   DEBUG_WARN = "[DRY-RUN]"
+   # The interval to sleep instead of running jobs when performing a dry run (in seconds)
+   DEBUG_INTERVAL = [0.2,1.0]
+
+   # A configuration holds the various pieces of information Clusterfuck needs
+   # to represent a task.
+   #
+   # You probably won't need to instantiate a Configuration directly; one is created
+   # when you create a new Task, and passed to the block it takes as a parameter. See Task
+   # for more information.
+   #
+   # Possible configuration options include:
+   # [timeout] Number of seconds to wait before an SSH connection times out (DEFAULT: 2)
+   # [max_fail] Max number of times a failing job will be re-attempted on a new machine (DEFAULT: 3)
+   # [hosts] Array of hostnames (or IP addresses) as Strings to use as nodes
+   # [jobs] Array of Job objects, one per job, which will be allocated to the +hosts+. If you're lazy,
+   #        you can also just use an array of strings (where each string is the command to run) -- a short
+   #        name for each will be produced using the first 8 chars of the command.
+   # [verbose] Level of message reporting. One of +VERBOSE_CANCEL+, +VERBOSE_FAIL+, or +VERBOSE_ALL+
+   #           (DEFAULT: +VERBOSE_CANCEL+)
+   # [username] The SSH username to use to connect
+   # [password] The SSH password to use to connect
+   # [show_report] Show a report after all jobs are complete that gives statistics for each machine.
+   # [debug] Do a 'dry run' -- allocate jobs to machines and display the result, but DO NOT actually
+   #         connect to any machines or run any jobs. Useful for testing your clusterfile before
+   #         kicking off a major run.
+   # [temp] Directory in which to capture stdout from each job. Setting this to +false+
+   #        will cause clusterfuck to ignore job output, leaving it up to you to capture the results
+   #        of each job. (DEFAULT: ./fragments)
+   class Configuration
+     # Holds the user-specified options. Again, you probably don't want to access this directly -- use the
+     # getter/setter syntax instead.
+     attr_reader :options
+
+     # Create a new Configuration object with default options.
+     def initialize
+       @options = {
+         "timeout" => 2,
+         "max_fail" => 3,
+         "verbose" => VERBOSE_CANCEL,
+         "show_report" => true,
+         "temp" => "./fragments",
+       }
+     end
+
+     # You can get/set options as if they were attributes, i.e. +config.foo = "bar"+ will set the option +foo+ to "bar".
+     def method_missing(key,args=nil)
+       if args.nil?
+         return @options[key.to_s]
+       else
+         key = key.to_s.gsub!("=","")
+         @options[key] = args
+       end
+     end
+
+     # Get a pretty-printed version of the currently set options
+     def to_s
+       @options.map { |pair| "#{pair[0]} = \"#{pair[1]}\"" }.join(", ")
+     end
+
+     # Convert an array of string commands into Job objects if necessary
+     def jobify!
+       @options["jobs"].map! do |job|
+         if not job.is_a?(Job) # Ah-ha, make this string into a job
+           short = job.downcase.gsub(/[^a-z]/,"")
+           short = job[0..7] if short.size > 8
+           Job.new(short,job)
+         else # Don't change anything...
+           job
+         end
+       end
+     end
+   end
+
+   # The primary means of interacting with Clusterfuck. Create a new
+   # Task, passing in a block that takes a Configuration object as a parameter (rake-style).
+   # The constructor returns after all jobs have been completed.
+   class Task
+     # See Configuration for a list of recognized configuration options.
+     def initialize(&custom)
+       # Run configuration options specified in the clusterfile
+       config = Configuration.new
+       custom.call(config)
+       config.jobify!
+
+       # Make the output fragment directory
+       `mkdir #{config.temp}` if config.temp and not File.exists?(config.temp)
+
+       # Run all jobs
+       machines = config.hosts.map { |name| Machine.new(name,config) }
+       machines.each { |machine| machine.run }
+
+       # Wait for jobs to terminate
+       machines.each do |machine|
+         begin
+           machine.thread.join
+         rescue Timeout::Error
+           STDERR.puts machine.to_s
+         end
+       end
+
+       # Print a report, if requested
+       if config.show_report
+         puts " Machine\t| STARTED\t| COMPLETE\t| FAILED\t|"
+         machines.each { |machine| puts machine.report }
+       end
+     end
+   end
+
+   # Represents a single machine (node) in our ad hoc cluster
+   class Machine
+     # The hostname of this machine
+     attr_accessor :host
+     # The global config options specified when the task was created
+     attr_accessor :config
+     # The thread representing this machine's ssh process
+     attr_reader :thread
+     # The number of jobs this machine has completed
+     attr_reader :jobs_completed
+     # The number of jobs this machine has attempted
+     attr_reader :jobs_attempted
+     # Was this machine dropped from the host list (too many failed jobs)?
+     attr_reader :dropped
+
+     # Create a new machine with the specified +host+ and +config+
+     def initialize(host,config)
+       self.host = host
+       self.config = config
+
+       @thread = nil
+       @jobs_completed = 0
+       @jobs_attempted = 0
+       @dropped = false
+     end
+
+     # Open an SSH connection to this machine and process jobs until the global jobs queue is empty
+     def run
+       @thread = Thread.new do
+         while config.jobs.size > 0
+           job = config.jobs.shift
+           if config.debug
+             puts "#{DEBUG_WARN} #{self.host} starting job '#{job.short_name}'"
+             puts "#{DEBUG_WARN} #{job.command}"
+             delay = rand*(DEBUG_INTERVAL[1]-DEBUG_INTERVAL[0])+DEBUG_INTERVAL[0]
+             @jobs_attempted += 1
+             sleep(delay)
+             @jobs_completed += 1
+           else
+             begin
+               @jobs_attempted += 1
+               Net::SSH.start(self.host,config.username,:password => config.password,:timeout => config.timeout) do |ssh|
+                 puts "Starting job #{job.short_name} on #{self.host}" if config.verbose >= VERBOSE_ALL
+                 if config.temp
+                   ssh.exec(job.command + " > #{Dir.getwd}/#{config.temp}/#{job.short_name}.#{self.host}")
+                 else
+                   ssh.exec(job.command)
+                 end
+                 @jobs_completed += 1
+               end
+             rescue Timeout::Error
+               puts "#{job.short_name} FAILED on #{self.host}, dropping it from the hostlist" if config.verbose >= VERBOSE_FAIL
+               if job.failed < config.max_fail # re-queue the job for another machine
+                 config.jobs.push job
+                 job.failed += 1
+               else
+                 puts "CANCELLING #{job.short_name}, too many failures (#{job.failed})" if config.verbose >= VERBOSE_CANCEL
+               end
+               @dropped = true
+               break
+             end
+           end
+         end
+       end
+     end
+
+     # Get a one-line summary of this machine's performance
+     def report
+       tab = "\t"
+       if self.host.size > 7
+         tab = ""
+       end
+       "#{self.host}#{tab}\t| #{@jobs_attempted}\t\t| #{@jobs_completed}\t\t| #{@dropped ? 'YES' : 'no'}\t\t|"
+     end
+   end
+
+   # Represents an individual job to be run
+   class Job
+     # The short name of this job, used to name the temporary file it produces
+     attr_accessor :short_name
+     # The actual command to run to execute this job.
+     attr_accessor :command
+     # The number of times this job has been unsuccessfully attempted.
+     attr_accessor :failed
+
+     # Create a new job with the specified short name and command
+     def initialize(short_name,command)
+       self.short_name = short_name
+       self.command = command
+
+       self.failed = 0
+     end
+   end
+ end
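As the Configuration comments above note, +jobs+ can also be given as a plain array of command strings; +jobify!+ then wraps each string in a Job, deriving a short name from the command text. A minimal sketch of that shortcut (hosts, commands, and credentials are illustrative, not part of the gem):

  # Minimal clusterfile relying on jobify!: plain strings become Job objects.
  Clusterfuck::Task.new do |task|
    task.hosts    = %w{clark asimov}
    task.jobs     = ["uname -a", "uptime", "df -h"]  # strings, converted by jobify!
    task.temp     = false                            # don't capture stdout into ./fragments
    task.username = "SSHUSERNAME"
    task.password = "SSHPASSWORD"
    task.verbose  = Clusterfuck::VERBOSE_ALL         # report every job start
  end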
data/test/clusterfuck_test.rb ADDED
@@ -0,0 +1,7 @@
+ require 'test_helper'
+
+ class ClusterfuckTest < Test::Unit::TestCase
+   def test_something_for_real
+     flunk "hey buddy, you should probably rename this file and start testing for real"
+   end
+ end
data/test/test_helper.rb ADDED
@@ -0,0 +1,9 @@
+ require 'rubygems'
+ require 'test/unit'
+
+ $LOAD_PATH.unshift(File.dirname(__FILE__))
+ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
+ require 'clusterfuck'
+
+ class Test::Unit::TestCase
+ end
metadata ADDED
@@ -0,0 +1,66 @@
+ --- !ruby/object:Gem::Specification
+ name: clusterfuck
+ version: !ruby/object:Gem::Version
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - Trevor Fountain
+ autorequire:
+ bindir: bin
+ cert_chain: []
+
+ date: 2009-10-20 00:00:00 +01:00
+ default_executable: clusterfuck
+ dependencies: []
+
+ description: Automate the execution of jobs across multiple machines with SSH. Ideal for systems with shared filesystems.
+ email: doches@gmail.com
+ executables:
+ - clusterfuck
+ extensions: []
+
+ extra_rdoc_files:
+ - LICENSE
+ - README.rdoc
+ files:
+ - .document
+ - .gitignore
+ - LICENSE
+ - README.rdoc
+ - Rakefile
+ - VERSION
+ - bin/clusterfuck
+ - lib/clusterfuck.rb
+ - test/clusterfuck_test.rb
+ - test/test_helper.rb
+ has_rdoc: true
+ homepage: http://github.com/doches/clusterfuck
+ licenses: []
+
+ post_install_message:
+ rdoc_options:
+ - --charset=UTF-8
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+   version:
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+   version:
+ requirements: []
+
+ rubyforge_project:
+ rubygems_version: 1.3.5
+ signing_key:
+ specification_version: 3
+ summary: Run jobs across multiple machines via ssh
+ test_files:
+ - test/test_helper.rb
+ - test/clusterfuck_test.rb