clusterfuck 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/.document ADDED
@@ -0,0 +1,5 @@
+ README.rdoc
+ lib/**/*.rb
+ bin/*
+ features/**/*.feature
+ LICENSE
data/.gitignore ADDED
@@ -0,0 +1,5 @@
+ *.sw?
+ .DS_Store
+ coverage
+ rdoc
+ pkg
data/LICENSE ADDED
@@ -0,0 +1,20 @@
+ Copyright (c) 2009 Trevor Fountain
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc ADDED
@@ -0,0 +1,49 @@
+ = Clusterfuck
+ ==== A Subversive Distributed-Systems Tool
+
+ Clusterfuck is a tool for automating the process of SSH-ing into remote machines and kickstarting a large number
+ of jobs. It's probably best explained by an example, so here's what I use it for:
+
+ As part of my research I need to compute the distance between each pair of objects in a set of about 70,000 items.
+ Computing the distance between each pair takes a few seconds; running the entire job on a single machine generally takes over a day.
+ However, as a member of the University I have an SSH login that works on quite a few machines, so I found myself breaking the job up into smaller, quicker chunks and running each chunk on a different machine.
+ Clusterfuck was born out of my frustration with that method -- "surely," I said to myself, "this can be automated."
+
+ If you have a lot of jobs to run and access to multiple machines on which to run them, Clusterfuck is for you!
+
+ == Usage
+ To use Clusterfuck you'll first need to create a configuration file (a "clusterfile"). An example clusterfile might look something like this:
+
+   Clusterfuck::Task.new do |task|
+     task.hosts = %w{clark asimov}
+     task.jobs = (0..3).map { |x| Clusterfuck::Job.new("host#{x}","sleep 0.5 && hostname") }
+     task.temp = "fragments"
+     task.username = "SSHUSERNAME"
+     task.password = "SSHPASSWORD"
+     task.debug = true
+   end
+
+ This creates a new clusterfuck task and distributes the jobs across two hosts, +clark+ and +asimov+.
+ The jobs to be run in this case are pretty trivial; we basically ssh into each machine, sleep for a little bit, then get the hostname.
+ Whatever each job prints to stdout is saved in +task+.+temp+ (under the current working directory); running
+ this clusterfile will create 4 files in <code>./fragments/</code>: host0.[hostname], host1.[hostname], host2.[hostname], and host3.[hostname] (where [hostname] is the name of the machine on which the job was run).
+ +task+.+username+ and +task+.+password+ are the SSH credentials used to log in to each machine -- currently, Clusterfuck
+ can only use one global set of credentials. There's no technical reason for this limitation, other than the fact that I don't
+ really need machine-specific logins myself, so support for them will probably appear in a future release.
+ +task+.+verbose+ turns on verbose output (messages to stdout each time a job is started, skipped, or canceled).
+
+ Once you have a clusterfile you can kick off your jobs by running the command +clusterfuck+ in the same directory.
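As an aside, the job list in the clusterfile above is built with plain Ruby. A minimal standalone sketch (no gem required) of that generation step, showing how string interpolation gives each job the unique short name that later becomes the stem of its output fragment's filename:

```ruby
# Plain-Ruby sketch of the job list from the example clusterfile:
# each pair is [short_name, command]; "host#{x}" interpolates the index,
# so the four jobs get distinct names host0..host3.
jobs = (0..3).map { |x| ["host#{x}", "sleep 0.5 && hostname"] }

jobs.each do |short_name, command|
  puts "#{short_name}: #{command}"
end
```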
+
+ == Note on Patches/Pull Requests
+
+ * Fork the project.
+ * Add something cool or fix a nefarious bug. Documentation wins extra love.
+ * Add tests for it. I'd really like this, but since I haven't written any tests myself yet I can't really blame you if you skip it...
+ * Commit, but do not mess with rakefile, version, or history.
+   (if you want to have your own version that's ok -- but
+   bump the version in a separate commit that I can ignore when I pull)
+ * Send me a pull request.
+
+ == Copyright
+
+ Copyright (c) 2009 Trevor Fountain. See LICENSE for details.
data/Rakefile ADDED
@@ -0,0 +1,60 @@
+ require 'rubygems'
+ require 'rake'
+
+ begin
+   require 'jeweler'
+   Jeweler::Tasks.new do |gem|
+     gem.name = "clusterfuck"
+     gem.summary = %Q{Run jobs across multiple machines via ssh}
+     gem.description = %Q{Automate the execution of jobs across multiple machines with SSH. Ideal for systems with shared filesystems.}
+     gem.email = "doches@gmail.com"
+     gem.homepage = "http://github.com/doches/clusterfuck"
+     gem.authors = ["Trevor Fountain"]
+     # gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
+   end
+   Jeweler::GemcutterTasks.new
+ rescue LoadError
+   puts "Jeweler (or a dependency) not available. Install it with: sudo gem install jeweler"
+ end
+
+ require 'rake/testtask'
+ Rake::TestTask.new(:test) do |test|
+   test.libs << 'lib' << 'test'
+   test.pattern = 'test/**/*_test.rb'
+   test.verbose = true
+ end
+
+ begin
+   require 'rcov/rcovtask'
+   Rcov::RcovTask.new do |test|
+     test.libs << 'test'
+     test.pattern = 'test/**/*_test.rb'
+     test.verbose = true
+   end
+ rescue LoadError
+   task :rcov do
+     abort "RCov is not available. In order to run rcov, you must: sudo gem install spicycode-rcov"
+   end
+ end
+
+ task :test => :check_dependencies
+
+ task :default => :test
+
+ gem 'rdoc'
+ require 'rdoc'
+ require 'rake/rdoctask'
+ Rake::RDocTask.new do |rdoc|
+   if File.exist?('VERSION')
+     version = File.read('VERSION')
+   else
+     version = ""
+   end
+
+   rdoc.rdoc_dir = 'rdoc'
+   rdoc.title = "Clusterfuck #{version}"
+   rdoc.rdoc_files.include('README*')
+   rdoc.rdoc_files.include('lib/*.rb')
+   rdoc.main = "README.rdoc"
+   rdoc.options += ["-SHN","-f","darkfish"]
+ end
data/VERSION ADDED
@@ -0,0 +1 @@
+ 0.1.0
data/bin/clusterfuck ADDED
@@ -0,0 +1,21 @@
+ #!/usr/bin/env ruby
+
+ require 'clusterfuck'
+ if ARGV[0]
+   # Use the specified clusterfile
+   load ARGV[0]
+ else
+   # Search the current directory for a clusterfile
+   found = false
+   Dir.foreach(".") do |file|
+     if file.downcase == "clusterfile"
+       load file
+       found = true
+       break
+     end
+   end
+   unless found
+     STDERR.puts "No clusterfile found!"
+     exit 1
+   end
+ end
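The executable's fallback lookup is just a case-insensitive directory scan. A self-contained sketch of that step (using a throwaway temp directory rather than the gem itself):

```ruby
require "tmpdir"

# Sketch of the lookup the executable performs when given no argument:
# scan a directory for an entry named "clusterfile" in any letter case.
def find_clusterfile(dir)
  Dir.foreach(dir).find { |file| file.downcase == "clusterfile" }
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "Clusterfile"), "")
  puts find_clusterfile(dir)  # prints "Clusterfile"
end
```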
data/lib/clusterfuck.rb ADDED
@@ -0,0 +1,218 @@
+ require 'socket'
+ require 'net/ssh'
+
+ # Clusterfuck is an ugly, dirty hack to run a large number of jobs on multiple machines.
+ # If you can break your task up into a series of small, independent jobs, clusterfuck
+ # can automate the process of distributing jobs across machines.
+ module Clusterfuck
+   # Print a message when a job is cancelled due to too many failures
+   VERBOSE_CANCEL = 0
+   # Print a message when a job is cancelled AND at each failure
+   VERBOSE_FAIL = 1
+   # Print a message for cancellations and failures, AND each time a job is started.
+   VERBOSE_ALL = 2
+
+   # The flag used to prefix dry-run (debugging) messages.
+   DEBUG_WARN = "[DRY-RUN]"
+   # The range of intervals to sleep instead of running jobs when performing a dry run (in seconds)
+   DEBUG_INTERVAL = [0.2,1.0]
+
+   # A Configuration holds the various pieces of information Clusterfuck needs
+   # to represent a task.
+   #
+   # You probably won't need to instantiate a Configuration directly; one is created
+   # when you create a new Task, and passed to the block it takes as a parameter. See Task
+   # for more information.
+   #
+   # Possible configuration options include:
+   # [timeout] Number of seconds to wait before an SSH connection times out (DEFAULT: 2)
+   # [max_fail] Maximum number of times a failing job will be re-attempted on a new machine (DEFAULT: 3)
+   # [hosts] Array of hostnames (or IP addresses) as Strings to use as nodes
+   # [jobs] Array of Job objects, one per job, which will be allocated to the +hosts+. If you're lazy,
+   #        you can also just use an array of strings (where each string is the command to run) -- a short
+   #        name for each will be derived from the command (lowercased, letters only, at most 8 characters).
+   # [verbose] Level of message reporting. One of +VERBOSE_CANCEL+, +VERBOSE_FAIL+, or +VERBOSE_ALL+
+   #           (DEFAULT: +VERBOSE_CANCEL+)
+   # [username] The SSH username used to connect
+   # [password] The SSH password used to connect
+   # [show_report] Show a report after all jobs are complete that gives statistics for each machine.
+   # [debug] Do a 'dry run' -- allocate jobs to machines and display the result, but DO NOT actually
+   #         connect to any machines or run any jobs. Useful for testing your clusterfile before
+   #         kicking off a major run.
+   # [temp] Directory in which to capture stdout from each job. Setting this to +false+
+   #        will cause clusterfuck to ignore job output, leaving it up to you to capture the results
+   #        of each job. (DEFAULT: ./fragments)
+   class Configuration
+     # Holds the user-specified options. Again, you probably don't want to access this directly -- use the
+     # getter/setter syntax instead.
+     attr_reader :options
+
+     # Create a new Configuration object with default options.
+     def initialize
+       @options = {
+         "timeout" => 2,
+         "max_fail" => 3,
+         "verbose" => VERBOSE_CANCEL,
+         "show_report" => true,
+         "temp" => "./fragments",
+       }
+     end
+
+     # You can get/set options as if they were attributes, i.e. +config.foo = "bar"+ will set the option +foo+ to "bar".
+     def method_missing(key,args=nil)
+       if args.nil?
+         return @options[key.to_s]
+       else
+         key = key.to_s.gsub("=","")
+         @options[key] = args
+       end
+     end
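The getter/setter trick above can be sketched in isolation. This is a simplified stand-in (a hypothetical `TinyConfig`, not the gem's class) showing how `method_missing` turns attribute-style access into hash reads and writes:

```ruby
# A self-contained sketch of the Configuration getter/setter trick:
# method_missing routes config.foo = "bar" to options["foo"] = "bar",
# and config.foo back to a hash lookup.
class TinyConfig
  def initialize
    @options = {}
  end

  def method_missing(key, *args)
    name = key.to_s
    if name.end_with?("=")
      @options[name.chomp("=")] = args.first  # setter: strip the trailing "="
    else
      @options[name]                          # getter: nil if never set
    end
  end

  # Keep respond_to? consistent with the dynamic accessors above.
  def respond_to_missing?(_key, _include_private = false)
    true
  end
end

config = TinyConfig.new
config.timeout = 5
puts config.timeout  # prints 5
```

Using a non-bang `gsub`-style strip (here `chomp("=")`) matters: `gsub!` returns `nil` when nothing is replaced, which would silently corrupt the key.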
+
+     # Get a pretty-printed version of the currently set options
+     def to_s
+       @options.map { |pair| "#{pair[0]} = \"#{pair[1]}\"" }.join(", ")
+     end
+
+     # Convert an array of string commands to Job objects if necessary
+     def jobify!
+       @options["jobs"].map! do |job|
+         if not job.is_a?(Job) # Ah-ha, make this string into a job
+           short = job.downcase.gsub(/[^a-z]/,"")
+           short = short[0..7] if short.size > 8
+           Job.new(short,job)
+         else # Don't change anything...
+           job
+         end
+       end
+     end
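The short-name rule in +jobify!+ can be checked in isolation. A standalone sketch of just that derivation (lowercase, strip non-letters, truncate to 8 characters):

```ruby
# Sketch of the short-name rule jobify! applies to bare string jobs.
def short_name(command)
  short = command.downcase.gsub(/[^a-z]/, "")  # keep letters only
  short = short[0..7] if short.size > 8        # truncate to 8 chars
  short
end

puts short_name("sleep 0.5 && hostname")  # prints "sleephos"
```

Note the truncation must apply to the stripped string, not the raw command; truncating the raw command would leave spaces and punctuation in what becomes an output filename.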
+   end
+
+   # The primary means of interacting with Clusterfuck. Create a new
+   # Task, passing in a block that takes a Configuration object as a parameter (rake-style).
+   # The constructor returns after all jobs have been completed.
+   class Task
+     # See Configuration for a list of recognized configuration options.
+     def initialize(&custom)
+       # Run configuration options specified in the clusterfile
+       config = Configuration.new
+       custom.call(config)
+       config.jobify!
+
+       # Make the output fragment directory
+       `mkdir #{config.temp}` if config.temp and not File.exist?(config.temp)
+
+       # Run all jobs
+       machines = config.hosts.map { |name| Machine.new(name,config) }
+       machines.each { |machine| machine.run }
+
+       # Wait for jobs to terminate
+       machines.each do |machine|
+         begin
+           machine.thread.join
+         rescue Timeout::Error
+           STDERR.puts machine.to_s
+         end
+       end
+
+       # Print a report, if requested
+       if config.show_report
+         puts " Machine\t| STARTED\t| COMPLETE\t| FAILED\t|"
+         machines.each { |machine| puts machine.report }
+       end
+     end
+   end
+
+   # Represents a single machine (node) in our ad hoc cluster
+   class Machine
+     # The hostname of this machine
+     attr_accessor :host
+     # The global config options specified when the task was created
+     attr_accessor :config
+     # The thread representing this machine's ssh process
+     attr_reader :thread
+     # The number of jobs this machine has completed
+     attr_reader :jobs_completed
+     # The number of jobs this machine has attempted
+     attr_reader :jobs_attempted
+     # Was this machine dropped from the host list (too many failed jobs)?
+     attr_reader :dropped
+
+     # Create a new machine with the specified +host+ and +config+
+     def initialize(host,config)
+       self.host = host
+       self.config = config
+
+       @thread = nil
+       @jobs_completed = 0
+       @jobs_attempted = 0
+       @dropped = false
+     end
+
+     # Open an SSH connection to this machine and process jobs until the global jobs queue is empty
+     def run
+       @thread = Thread.new do
+         while config.jobs.size > 0
+           job = config.jobs.shift
+           if config.debug
+             puts "#{DEBUG_WARN} #{self.host} starting job '#{job.short_name}'"
+             puts "#{DEBUG_WARN} #{job.command}"
+             delay = rand*(DEBUG_INTERVAL[1]-DEBUG_INTERVAL[0])+DEBUG_INTERVAL[0]
+             @jobs_attempted += 1
+             sleep(delay)
+             @jobs_completed += 1
+           else
+             begin
+               @jobs_attempted += 1
+               Net::SSH.start(self.host,config.username,:password => config.password,:timeout => config.timeout) do |ssh|
+                 puts "Starting job #{job.short_name} on #{self.host}" if config.verbose >= VERBOSE_ALL
+                 if config.temp
+                   ssh.exec(job.command + " > #{Dir.getwd}/#{config.temp}/#{job.short_name}.#{self.host}")
+                 else
+                   ssh.exec(job.command)
+                 end
+                 @jobs_completed += 1
+               end
+             rescue Timeout::Error
+               puts "#{job.short_name} FAILED on #{self.host}, dropping it from the hostlist" if config.verbose >= VERBOSE_FAIL
+               if job.failed < config.max_fail
+                 job.failed += 1
+                 config.jobs.push job
+               else
+                 puts "CANCELLING #{job.short_name}, too many failures (#{job.failed})" if config.verbose >= VERBOSE_CANCEL
+               end
+               @dropped = true
+               break
+             end
+           end
+         end
+       end
+     end
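The dispatch model in +run+ is worker threads draining a shared job queue. A self-contained sketch of that loop, with a `Thread::Queue` standing in for the plain Array the gem shifts across threads (an Array is not a thread-safe hand-off in general, so the queue makes the sketch explicitly safe):

```ruby
# Minimal sketch of Machine#run's dispatch loop: one thread per "host",
# each pulling the next job off a shared queue until it is empty.
jobs = Queue.new
8.times { |i| jobs << "job#{i}" }

completed = Queue.new
threads = %w[clark asimov].map do |host|
  Thread.new do
    loop do
      job = begin
        jobs.pop(true)  # non-blocking pop; raises ThreadError when empty
      rescue ThreadError
        break           # queue drained: this "host" is done
      end
      completed << [host, job]
    end
  end
end
threads.each(&:join)

puts completed.size  # prints 8
```

A faster host simply loops back sooner and claims more jobs, which is how the per-machine STARTED/COMPLETE counts in the report end up uneven.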
+
+     # Get a one-line summary of this machine's performance
+     def report
+       tab = "\t"
+       if self.host.size > 7
+         tab = ""
+       end
+       "#{self.host}#{tab}\t| #{@jobs_attempted}\t\t| #{@jobs_completed}\t\t| #{@dropped ? 'YES' : 'no'}\t\t|"
+     end
+   end
+
+   # Represents an individual job to be run
+   class Job
+     # The short name of this job, used to name the temporary file it produces
+     attr_accessor :short_name
+     # The actual command to run to execute this job.
+     attr_accessor :command
+     # The number of times this job has been unsuccessfully attempted.
+     attr_accessor :failed
+
+     # Create a new job with the specified short name and command
+     def initialize(short_name,command)
+       self.short_name = short_name
+       self.command = command
+
+       self.failed = 0
+     end
+   end
+ end
data/test/clusterfuck_test.rb ADDED
@@ -0,0 +1,7 @@
+ require 'test_helper'
+
+ class ClusterfuckTest < Test::Unit::TestCase
+   def test_something_for_real
+     flunk "hey buddy, you should probably rename this file and start testing for real"
+   end
+ end
data/test/test_helper.rb ADDED
@@ -0,0 +1,9 @@
+ require 'rubygems'
+ require 'test/unit'
+
+ $LOAD_PATH.unshift(File.dirname(__FILE__))
+ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
+ require 'clusterfuck'
+
+ class Test::Unit::TestCase
+ end
metadata ADDED
@@ -0,0 +1,66 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: clusterfuck
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Trevor Fountain
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+
12
+ date: 2009-10-20 00:00:00 +01:00
13
+ default_executable: clusterfuck
14
+ dependencies: []
15
+
16
+ description: Automate the execution of jobs across multiple machines with SSH. Ideal for systems with shared filesystems.
17
+ email: doches@gmail.com
18
+ executables:
19
+ - clusterfuck
20
+ extensions: []
21
+
22
+ extra_rdoc_files:
23
+ - LICENSE
24
+ - README.rdoc
25
+ files:
26
+ - .document
27
+ - .gitignore
28
+ - LICENSE
29
+ - README.rdoc
30
+ - Rakefile
31
+ - VERSION
32
+ - bin/clusterfuck
33
+ - lib/clusterfuck.rb
34
+ - test/clusterfuck_test.rb
35
+ - test/test_helper.rb
36
+ has_rdoc: true
37
+ homepage: http://github.com/doches/clusterfuck
38
+ licenses: []
39
+
40
+ post_install_message:
41
+ rdoc_options:
42
+ - --charset=UTF-8
43
+ require_paths:
44
+ - lib
45
+ required_ruby_version: !ruby/object:Gem::Requirement
46
+ requirements:
47
+ - - ">="
48
+ - !ruby/object:Gem::Version
49
+ version: "0"
50
+ version:
51
+ required_rubygems_version: !ruby/object:Gem::Requirement
52
+ requirements:
53
+ - - ">="
54
+ - !ruby/object:Gem::Version
55
+ version: "0"
56
+ version:
57
+ requirements: []
58
+
59
+ rubyforge_project:
60
+ rubygems_version: 1.3.5
61
+ signing_key:
62
+ specification_version: 3
63
+ summary: Run jobs across multiple machines via ssh
64
+ test_files:
65
+ - test/test_helper.rb
66
+ - test/clusterfuck_test.rb