auto-consul 0.0.1

data/Gemfile ADDED
@@ -0,0 +1,12 @@
+ source 'https://rubygems.org'
+
+ gem 'aws-sdk'
+
+ group :development do
+   gem 'rake', '~> 10.1.0'
+   gem 'pry', '~> 0.9.0'
+ end
+
+ group :test do
+   gem 'rspec'
+ end
data/Gemfile.lock ADDED
@@ -0,0 +1,36 @@
+ GEM
+   remote: https://rubygems.org/
+   specs:
+     aws-sdk (1.40.0)
+       json (~> 1.4)
+       nokogiri (>= 1.4.4)
+     coderay (1.1.0)
+     diff-lcs (1.2.5)
+     json (1.8.1)
+     method_source (0.8.2)
+     mini_portile (0.5.3)
+     nokogiri (1.6.1)
+       mini_portile (~> 0.5.0)
+     pry (0.9.12.6)
+       coderay (~> 1.0)
+       method_source (~> 0.8)
+       slop (~> 3.4)
+     rake (10.1.1)
+     rspec (2.14.1)
+       rspec-core (~> 2.14.0)
+       rspec-expectations (~> 2.14.0)
+       rspec-mocks (~> 2.14.0)
+     rspec-core (2.14.8)
+     rspec-expectations (2.14.5)
+       diff-lcs (>= 1.1.3, < 2.0)
+     rspec-mocks (2.14.6)
+     slop (3.5.0)
+
+ PLATFORMS
+   ruby
+
+ DEPENDENCIES
+   aws-sdk
+   pry (~> 0.9.0)
+   rake (~> 10.1.0)
+   rspec
data/LICENSE ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2014 Ethan Rowe
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,132 @@
+ auto-consul
+ ===========
+
+ Ruby gem for bootstrapping consul cluster members
+
+ # Example usage
+
+ Assume two vagrant boxes, each with consul and auto-consul installed.
+
+ Export your AWS keys into the environment in each:
+
+ ```
+ export AWS_ACCESS_KEY_ID=...
+ export AWS_SECRET_ACCESS_KEY=...
+ ```
+
+ This will allow the AWS SDK to pick them up.
+
+ The server, screen A:
+
+     auto-consul -r s3://my-bucket/consul/test-cluster \
+                 -a 192.168.50.100 \
+                 -n server1 \
+                 run
+
+ Then, server screen B:
+
+     while true; do
+       auto-consul -r s3://my-bucket/consul/test-cluster \
+                   -a 192.168.50.100 \
+                   -n server1 \
+                   heartbeat
+       sleep 60
+     done
+
+ The first launches the agent; the second checks its run status and
+ issues a heartbeat to the specified S3 bucket.
+
+ Because this is the first server, there will be no heartbeats in the
+ bucket (assuming a fresh bucket/key combination). Therefore, the agent
+ will be launched in server mode, along with the bootstrap option to
+ initialize the raft cluster for state management.
+
+ Look in the S3 bucket above, under "servers", and you should see
+ a timestamped entry like "20140516092731-server1". This is produced
+ by the "heartbeat" command and allows new agents to discover active
+ members of the cluster for joining.
+
+ Having seen the server heartbeat, go to the agent vagrant box, and
+ do something similar. Screen A:
+
+     auto-consul -r s3://my-bucket/consul/test-cluster \
+                 -a 192.168.50.101 \
+                 -n agent1 \
+                 run
+
+ In this case, the agent will discover the server via its heartbeat. It
+ will know that we have enough servers (it defaults to only wanting one;
+ that's fine for dev/testing but not good for availability) and thus
+ simply join as a normal agent.
+
+ Screen B:
+
+     while true; do
+       auto-consul -r s3://my-bucket/consul/test-cluster \
+                   -a 192.168.50.101 \
+                   -n agent1 \
+                   heartbeat
+       sleep 60
+     done
+
+ This generates heartbeats like the server did, but while the server
+ sends heartbeats both to "servers" and "agents" in the bucket, the
+ normal agent sends heartbeats only to "agents".
+
+ # Mode determination
+
+ Given a desired number of servers (defaulting to 1) and a registry
+ (for now, an S3-style URL), the basic algorithm is:
+
+ - Are there enough servers?
+   - Yes: Be an agent. Done.
+   - No: are there no servers?
+     - Yes: Be a server with bootstrap mode. Done.
+     - No: Be a server without bootstrap mode, joining with others. Done.
+
+ There is very obviously a race condition in the determination of node
+ mode: two nodes that check the registry at the same time can both
+ conclude that they should be servers. In practice, it should be easy
+ enough to coordinate things such that the race doesn't cause problems.
+ Longer term, we'll need to revise the mode determination logic to use a
+ backend supporting optimistic locking or some such. (A compare-and-swap
+ pattern would work fine; consul itself would allow for this given one
+ existing server.)
+
+ ## Heartbeats and membership
+
+ The heartbeats give us a rough indication of cluster membership. The
+ tool uses an expiry time (in seconds) to determine which heartbeats are
+ still active, and will purge any expired heartbeats from the registry
+ whenever it encounters them.
+
+ Each heartbeat tells us:
+ - The node's name within the consul cluster
+ - The timestamp of the heartbeat (the freshness)
+ - The IP at which the node can be reached for cluster join operations.
+
+ For now, it is necessary to run the heartbeat utility in parallel to the
+ run utility. In subsequent work we may want to have these things coordinated
+ by one daemon, but given the experimental nature of this project it's not
+ worth caring about just yet.
+
+ The heartbeat asks consul for its status and from that determines if it
+ is running as a server or regular agent (or if it is running at all). If
+ consul is not running at all, no heartbeat will be emitted.
+
+ The default expiry is 120 seconds. It is recommended that heartbeats fire
+ at half that duration (60 seconds).
+
+ # Cluster join
+
+ After the node mode is determined, it's necessary (except in the case of
+ a bootstrap-mode server) to join a cluster by contacting an extant member.
+
+ This is the primary purpose of the heartbeat registry; a server-mode node
+ will find the IP (from the active heartbeats) of a *server*, and use that
+ IP to join the cluster. An agent-mode node will find the IP of an *agent*
+ for the join operation.
+
+ In a production-ready tool, we would have a monitor on the registry and
+ keep trying new hosts until a join succeeds. But in this experimental
+ phase, it just picks the first member in the relevant list and uses that.
+ If that member is actually down, then the join simply won't work.
+
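The decision list in the README's "Mode determination" section can be sketched as a small pure function. This is only an illustration: `choose_mode` and its symbols are hypothetical names, not part of the gem, which splits the decision between `Cluster#set_mode!` and `Runner.run_server!`.

```ruby
# Hypothetical helper mirroring the README's decision list; not part of the gem.
# active_servers:  number of unexpired server heartbeats in the registry
# desired_servers: how many servers the cluster should have (default 1)
def choose_mode(active_servers, desired_servers = 1)
  # Enough servers already: run as a plain agent.
  return :agent if active_servers >= desired_servers

  # Too few servers: bootstrap if there are none, otherwise join the others.
  active_servers.zero? ? :server_bootstrap : :server_join
end

puts choose_mode(0)     # fresh registry, no heartbeats yet
puts choose_mode(1)     # one server present, one desired
puts choose_mode(1, 3)  # one server present, three desired
```

Because the count is read and acted on in separate steps, two nodes calling this at the same moment would get the same answer, which is the race the README describes.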
data/bin/auto-consul ADDED
@@ -0,0 +1,110 @@
+ #!/usr/bin/env ruby
+
+ # vim: set filetype=ruby;
+
+ require 'auto-consul'
+ require 'optparse'
+ require 'socket'
+ require 'ostruct'
+
+ # Derive from StandardError so that a bare `rescue` can also catch it.
+ class UnknownCommandException < StandardError
+ end
+
+ class Command < OpenStruct
+   def local
+     @local ||= AutoConsul::Local.bind_to_path(data)
+   end
+
+   def cluster
+     @cluster ||= AutoConsul::Cluster.new registry
+   end
+
+   def state
+     @state ||= AutoConsul::RunState::CLIProvider.new
+   end
+
+   def do_set_mode
+     cluster.set_mode! local, expiry, servers
+   end
+
+   def do_run
+     if local.mode.nil?
+       do_set_mode
+     end
+     do_direct_run
+   end
+
+   def do_direct_run
+     runner = nil
+     runner = :run_agent! if local.agent?
+     runner = :run_server! if local.server?
+     # AutoConsul::Runner defines no no-op method, so do nothing when no mode is set.
+     return if runner.nil?
+     runner = AutoConsul::Runner.method(runner)
+     runner.call node, addr, expiry, local, cluster
+   end
+
+   def do_heartbeat
+     if state.running?
+       cluster.servers.heartbeat! node, addr, expiry if state.server?
+       cluster.agents.heartbeat! node, addr, expiry if state.agent?
+     end
+   end
+
+   def execute cmd
+     command = "do_#{cmd}".to_sym
+     if respond_to? command
+       send command
+     else
+       raise UnknownCommandException.new("Unknown command: #{cmd}")
+     end
+   end
+ end
+
+ runner = Command.new(:data => '/tmp/consul/state',
+                      :dc => 'dc1',
+                      :expiry => 120,
+                      :servers => 1,
+                      :node => Socket.gethostname.split('.', 2)[0])
+
+ parser = OptionParser.new do |opts|
+   opts.banner = "Usage: auto-consul [options] COMMAND"
+
+   opts.on("-r", "--registry URL", String, "The cluster registry URL") do |u|
+     runner.registry = u
+   end
+
+   opts.on("-d", "--data-dir PATH", String, "The path where local state will be preserved.") do |d|
+     # Command#local reads the `data` field, so store the path there.
+     runner.data = d
+   end
+
+   opts.on("-a", "--address IPADDR", String, "The IP address to bind to and announce for cluster communication.") do |a|
+     runner.addr = a
+   end
+
+   opts.on("-n", "--node NAME", String, "The unique name by which the node identifies itself within the cluster.") do |n|
+     runner.node = n
+   end
+
+   opts.on("-e", "--expiry SECONDS", Integer, "The expiration time (in seconds) for registry heartbeats") do |e|
+     runner.expiry = e.to_i
+   end
+
+   opts.on("-s", "--servers NUMBER", Integer, "The desired number of consul servers.") do |s|
+     runner.servers = s.to_i
+   end
+
+   opts.on_tail('-h', '--help', "Show this help message.") do
+     puts opts
+     exit
+   end
+ end
+
+ parser.parse!
+
+ begin
+   runner.execute(ARGV.shift)
+ rescue UnknownCommandException => e
+   puts e.message
+   puts parser
+   exit 2
+ end
+
@@ -0,0 +1,53 @@
+ require 'uri'
+
+ module AutoConsul
+   class Cluster
+     def self.get_provider_for_uri uri_string
+       uri = URI(uri_string)
+       Registry.supported_schemes[uri.scheme.downcase].new uri
+     end
+
+     attr_reader :uri_string
+
+     def initialize uri
+       @uri_string = uri
+     end
+
+     def servers
+       @servers ||= self.class.get_provider_for_uri File.join(uri_string, 'servers')
+     end
+
+     def agents
+       @agents ||= self.class.get_provider_for_uri File.join(uri_string, 'agents')
+     end
+
+     def set_mode! local_state, expiry, desired_servers=1
+       if servers.members(expiry).size < desired_servers
+         local_state.set_server!
+       else
+         local_state.set_agent!
+       end
+     end
+
+     module Registry
+       def self.supported_schemes
+         constants.inject({}) do |a, const|
+           if const.to_s =~ /^(.+?)Provider$/
+             a[$1.downcase] = const_get(const)
+           end
+           a
+         end
+       end
+
+       class Provider
+         attr_reader :uri
+
+         def initialize uri
+           @uri = uri
+         end
+       end
+     end
+   end
+ end
+
+ require_relative 'providers/s3'
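The scheme lookup in `Registry.supported_schemes` relies on a naming convention: any constant named `<Something>Provider` serves the lowercased scheme `<something>`. The following is a minimal sketch of that convention, assuming a stand-in namespace `DemoRegistry` with an empty `S3Provider`; both are hypothetical and make no AWS calls.

```ruby
require 'uri'

# Hypothetical stand-in namespace; the real providers live under
# AutoConsul::Cluster::Registry.
module DemoRegistry
  class Provider
    attr_reader :uri

    def initialize(uri)
      @uri = uri
    end
  end

  # Empty placeholder: only its name matters for the lookup.
  class S3Provider < Provider; end

  # Map each "FooProvider" constant to the lowercase scheme "foo".
  # The bare "Provider" base class does not match, because the pattern
  # requires at least one character before "Provider".
  def self.supported_schemes
    constants.each_with_object({}) do |const, schemes|
      schemes[$1.downcase] = const_get(const) if const.to_s =~ /^(.+?)Provider$/
    end
  end
end

uri = URI('s3://my-bucket/consul/test-cluster/servers')
provider = DemoRegistry.supported_schemes[uri.scheme.downcase].new(uri)

puts provider.class            # provider class selected by the "s3" scheme
puts provider.uri.host         # what S3Provider#bucket_name would return
puts provider.uri.path[1..-1]  # what S3Provider#key_prefix would return
```

An unsupported scheme yields `nil` from the hash, so the `.new` call would raise `NoMethodError`; a caller wanting a friendlier error could check for that first.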
@@ -0,0 +1,78 @@
+ require 'fileutils'
+
+ module AutoConsul
+   module Local
+     class FileSystemState
+       def initialize path
+         unless File.directory? path
+           FileUtils.mkdir_p path
+         end
+
+         @path = path
+       end
+
+       def path
+         @path
+       end
+
+       def mode_path
+         File.join(path, 'mode')
+       end
+
+       def set_server!
+         set_mode 'server'
+       end
+
+       def set_agent!
+         set_mode 'agent'
+       end
+
+       def set_mode mode
+         File.open(mode_path, 'w') do |f|
+           f.write mode
+         end
+       end
+
+       VALID_MODES = {
+         'agent' => 'agent',
+         'server' => 'server',
+       }
+
+       def self.determine_mode mode_file
+         if File.file? mode_file
+           value = File.open(mode_file, 'r') {|f| f.read }
+           VALID_MODES[value]
+         else
+           nil
+         end
+       end
+
+       def mode
+         if @mode.nil?
+           @mode = self.class.determine_mode mode_path
+         end
+         @mode
+       end
+
+       def server?
+         mode == 'server'
+       end
+
+       def agent?
+         mode == 'agent'
+       end
+
+       def data_path
+         if not (m = mode).nil?
+           File.join(path, m)
+         else
+           nil
+         end
+       end
+     end
+
+     def self.bind_to_path path
+       FileSystemState.new path
+     end
+   end
+ end
@@ -0,0 +1,108 @@
+ require 'aws-sdk'
+ # Time.strptime (used in read_stamp) is defined by the 'time' standard library.
+ require 'time'
+
+ module AutoConsul::Cluster::Registry
+   class S3Provider < Provider
+     class S3Member
+       attr_reader :s3_object, :identifier, :time
+
+       def initialize s3obj
+         @s3_object = s3obj
+         @time, @identifier = S3Provider.from_key_base(File.basename(s3obj.key))
+         @data_read = false
+       end
+
+       def data
+         if not @data_read
+           @data = s3_object.read
+           @data_read = true
+         end
+         @data
+       end
+     end
+
+     def s3
+       @s3 ||= self.class.get_s3
+     end
+
+     def self.get_s3
+       AWS::S3.new
+     end
+
+     def bucket_name
+       uri.host
+     end
+
+     def bucket
+       @bucket ||= s3.buckets[bucket_name]
+     end
+
+     def key_prefix
+       uri.path[1..-1]
+     end
+
+     def now
+       Time.now
+     end
+
+     KEY_TIMESTAMP_FORMAT = '%Y%m%d%H%M%S'
+
+     def self.write_stamp time
+       time.dup.utc.strftime KEY_TIMESTAMP_FORMAT
+     end
+
+     def self.read_stamp stamp
+       t = Time.strptime(stamp, KEY_TIMESTAMP_FORMAT)
+       Time.utc(t.year, t.month, t.day, t.hour, t.min, t.sec, 0)
+     end
+
+     def self.to_key_base time, identifier
+       "#{write_stamp time}-#{identifier.to_s}"
+     end
+
+     def self.from_key_base key_base
+       stamp, identifier = key_base.split('-', 2)
+       [read_stamp(stamp), identifier]
+     end
+
+     def write_key time, identity
+       File.join(key_prefix, self.class.to_key_base(time, identity))
+     end
+
+     def heartbeat! identity, data, expiry=nil
+       result = bucket.objects[write_key now, identity].write data
+       purge!(expiry) unless expiry.nil?
+       result
+     end
+
+     def purge! expiry
+       min_key = File.join(key_prefix, "#{self.class.write_stamp(Time.now - expiry + 1)}-")
+       bucket.objects.with_prefix(key_prefix).delete_if do |s3obj|
+         s3obj.key < min_key
+       end
+     end
+
+     def members expiry
+       deletes, actives = [], {}
+       # The expiry gives an exclusive boundary, not an inclusive,
+       # so the minimal allowable key must begin one second after the
+       # specified expiry (given a resolution of seconds).
+       min_time = Time.now.utc - expiry + 1
+       min_key = File.join(key_prefix, min_time.strftime('%Y%m%d%H%M%S-'))
+       bucket.objects.with_prefix(key_prefix).each do |obj|
+         if obj.key < min_key
+           deletes << obj
+         else
+           o = S3Member.new(obj)
+           actives[o.identifier] = o
+         end
+       end
+       deletes! deletes
+       actives.values.sort_by {|m| [m.time, m.identifier]}
+     end
+
+     def deletes! deletes
+       bucket.objects.delete deletes if deletes.size > 0
+     end
+   end
+ end
+
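The expiry check in `purge!` and `members` works on key strings alone: the timestamp has a fixed width, so comparing keys lexicographically orders them chronologically. Here is a minimal sketch of that idea without S3. `key_for` and `active_keys` are hypothetical helpers, the key prefix is omitted, and the clock is passed in so the result is reproducible.

```ruby
KEY_FORMAT = '%Y%m%d%H%M%S' # same pattern as KEY_TIMESTAMP_FORMAT in the provider

# Build "<UTC stamp>-<node>" in the same shape as S3Provider.to_key_base.
def key_for(time, node)
  "#{time.dup.utc.strftime(KEY_FORMAT)}-#{node}"
end

# Keep only keys stamped less than `expiry` seconds before `now`.
# The boundary is exclusive, hence the "+ 1" (one-second resolution).
def active_keys(keys, now, expiry)
  min_key = (now.dup.utc - expiry + 1).strftime("#{KEY_FORMAT}-")
  keys.reject { |key| key < min_key }
end

now = Time.utc(2014, 5, 16, 9, 30, 0)
keys = [
  key_for(now - 200, 'server1'), # well past the 120 second expiry
  key_for(now - 120, 'server2'), # exactly at the boundary, so expired
  key_for(now - 119, 'server3'), # one second inside the window
  key_for(now, 'server1')        # fresh heartbeat
]

puts active_keys(keys, now, 120)
```

With the default expiry of 120 seconds and a heartbeat every 60 seconds, a healthy node always has at least one key inside the window.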
@@ -0,0 +1,61 @@
+ module AutoConsul::RunState
+   class CLIProvider
+     AGENT_MASK = 0b1
+     SERVER_MASK = 0b10
+
+     def check_run_state
+       result = 0
+       r, w = IO.pipe
+       begin
+         ok = system("consul info", :out => w)
+         # Close the write end so the reader sees EOF.
+         w.close
+         result = flags_from_output(r) if ok
+       ensure
+         # Release both descriptors even when consul is missing or fails.
+         w.close unless w.closed?
+         r.close
+       end
+       result
+     end
+
+     def flags_from_output stream
+       consul_block = false
+       result = 0
+       stream.each do |line|
+         if line =~ /^consul:\s/
+           consul_block = true
+           result = AGENT_MASK
+           break
+         end
+       end
+
+       if consul_block
+         stream.each do |line|
+           # Exit condition from consul block
+           break if line !~ /^\s+/
+
+           if line =~ /^\s+server\s+=\s+true/
+             result |= SERVER_MASK
+             break
+           end
+         end
+       end
+       result
+     end
+
+     def run_state
+       if @run_state.nil?
+         @run_state = check_run_state
+       end
+       @run_state
+     end
+
+     def agent?
+       (run_state & AGENT_MASK) > 0
+     end
+
+     def server?
+       (run_state & SERVER_MASK) > 0
+     end
+
+     def running?
+       run_state > 0
+     end
+   end
+ end
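To make the flag logic easier to follow, here is a minimal sketch of the same parsing applied to plain strings instead of a pipe. The sample text only approximates the shape of `consul info` output (section headers followed by indented `key = value` lines) and is an assumption, as are the names `flags_from_text`, `SERVER_INFO` and `AGENT_INFO`.

```ruby
AGENT  = 0b01 # consul answered, so at least an agent is running
SERVER = 0b10 # the consul block reports "server = true"

# Scan for the "consul:" section and look for "server = true" inside it.
def flags_from_text(text)
  in_consul = false
  flags = 0
  text.each_line do |line|
    if in_consul
      break unless line =~ /^\s+/ # an unindented line ends the block
      flags |= SERVER if line =~ /^\s+server\s+=\s+true/
    elsif line =~ /^consul:\s/
      in_consul = true
      flags = AGENT
    end
  end
  flags
end

# Assumed sample output; real output contains many more fields.
SERVER_INFO = <<~OUT
  agent:
      check_monitors = 0
  consul:
      bootstrap = true
      server = true
  raft:
      state = Leader
OUT

AGENT_INFO = <<~OUT
  agent:
      check_monitors = 0
  consul:
      server = false
OUT

puts flags_from_text(SERVER_INFO) # agent and server bits set
puts flags_from_text(AGENT_INFO)  # agent bit only
puts flags_from_text('')          # nothing running
```

This mirrors why a server heartbeats into both "servers" and "agents": both bits are set for it, while a normal agent sets only the agent bit.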
@@ -0,0 +1,5 @@
+ module AutoConsul::RunState
+ end
+
+ require_relative 'run_state/cli'
+
@@ -0,0 +1,56 @@
+ module AutoConsul
+   module Runner
+     SLEEP_INTERVAL = 2
+     RETRIES = 5
+
+     def self.launch_and_join(agent_args, remote_ip=nil)
+       pid = spawn(*(['consul', 'agent'] + agent_args))
+
+       # We really need to check that it is running, but later.
+       return nil unless verify_running(pid)
+
+       if not remote_ip.nil?
+         join remote_ip
+       end
+
+       pid
+     end
+
+     def self.verify_running pid
+       RETRIES.times do |i|
+         sleep SLEEP_INTERVAL + (SLEEP_INTERVAL * i)
+         return true if system('consul', 'info')
+       end
+       return false
+     end
+
+     def self.join remote_ip
+       system('consul', 'join', remote_ip)
+     end
+
+     def self.pick_joining_host hosts
+       # Let's randomize this later.
+       hosts[0].data
+     end
+
+     def self.run_agent! identity, bind_ip, expiry, local_state, registry
+       remote_ip = pick_joining_host(registry.agents.members(expiry))
+       pid = launch_and_join(['-bind', bind_ip,
+                              '-data-dir', local_state.data_path,
+                              '-node', identity], remote_ip)
+       # launch_and_join returns nil when the agent never came up.
+       Process.wait pid unless pid.nil?
+     end
+
+     def self.run_server! identity, bind_ip, expiry, local_state, registry
+       members = registry.servers.members(expiry)
+       remote_ip = members.size > 0 ? pick_joining_host(members) : nil
+
+       args = ['-bind', bind_ip, '-data-dir', local_state.data_path, '-node', identity, '-server']
+       args << '-bootstrap' if members.size < 1
+
+       pid = launch_and_join(args, remote_ip)
+
+       Process.wait pid unless pid.nil?
+     end
+   end
+ end
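The retry loop in `verify_running` sleeps a little longer before each attempt. As a quick worked example, the schedule implied by `SLEEP_INTERVAL` and `RETRIES` can be computed without actually sleeping; the `delays` variable is only illustrative.

```ruby
SLEEP_INTERVAL = 2 # seconds, as in AutoConsul::Runner
RETRIES = 5

# Same expression as the argument to `sleep` in verify_running.
delays = Array.new(RETRIES) { |i| SLEEP_INTERVAL + (SLEEP_INTERVAL * i) }

puts delays.inspect # seconds slept before each `consul info` attempt
puts delays.sum     # worst-case seconds slept before giving up
```

So a node whose agent never starts gives up after about half a minute of sleeping (2 + 4 + 6 + 8 + 10 seconds), plus the time taken by the `consul info` calls themselves.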
@@ -0,0 +1,8 @@
+ module AutoConsul
+ end
+
+ require 'auto-consul/local'
+ require 'auto-consul/cluster'
+ require 'auto-consul/run_state'
+ require 'auto-consul/runner'
+