hastur 1.2.8

@@ -0,0 +1,9 @@
1
+ *.swp
2
+ *.gem
3
+ .bundle
4
+ Gemfile.lock
5
+ pkg/*
6
+ doc/*
7
+ yardoc/*
8
+ coverage
9
+ .yardoc/*
@@ -0,0 +1 @@
1
+ --no-private
data/Gemfile ADDED
@@ -0,0 +1,3 @@
1
+ source "http://rubygems.org"
2
+
3
+ gemspec
data/README ADDED
@@ -0,0 +1,120 @@
1
+ What Is It?
2
+ -----------
3
+
4
+ Hastur is a monitoring system written by Ooyala. It uses Cassandra
5
+ for time series storage, resulting in remarkable power, flexibility
6
+ and scalability.
7
+
8
+ Hastur works hard to make it easy to add your data and easy to get it
9
+ back at full resolution. For instance, it makes it easy to query in
10
+ big batches from a REST server, build a dashboard of metrics, show
11
+ errors in production or email you when an error rate gets too high.
12
+
13
+ This gem helps you get your data into Hastur. See the "hastur-server"
14
+ gem for the back end, and for how to get your data back out.
15
+
16
+ How Do I Use It?
17
+ ----------------
18
+
19
+ Install this gem ("gem install hastur") or add it to your app's
20
+ Gemfile and run "bundle".
21
+
22
+ Add Hastur calls to your application, such as:
23
+
24
+ Hastur.counter "my.thing.to.count" # Add 1 to my.thing.to.count
25
+ Hastur.gauge "other.thing.foo_latency", 371.1 # Record a latency of 371.1
26
+
27
+ You can find extensive per-method documentation in the source code, or
28
+ see "Is It Documented?" below for friendly HTML documentation.
29
+
30
+ This is enough to instrument your application code, but you'll need to
31
+ install a local daemon and have a back-end collector for it to talk
32
+ to. See the hastur-server gem for specifics.
33
+
34
+ Hastur allows you to send data at regular intervals using Hastur.every,
35
+ which will call a block from a background thread:
36
+
37
+ @total = 0
38
+ Hastur.every(:minute) { Hastur.gauge("total.counting.so.far", @total) }
39
+ loop { sleep 1; @total += 1 } # Count one per second, send it once per minute
40
+
41
+ The YARD documentation (see below) has far more specifics.
42
+
43
+ Is It Documented?
44
+ -----------------
45
+
46
+ We use YARD. "gem install yard redcarpet", then type "yardoc" from
47
+ this source directory. This will generate documentation -- point a
48
+ browser at "doc/index.html" for the top-level view.
49
+
50
+ Mechanism
51
+ ---------
52
+
53
+ Your messages are automatically timestamped in microseconds, labeled
54
+ and converted to a JSON structure for transport and storage.
55
+
56
+ Hastur sends the JSON over a local UDP socket to a local "Hastur
57
+ Agent", a daemon that forwards your data to the shared Hastur servers.
58
+ That means that your application will never slow down for Hastur --
59
+ failed sends become no-ops. Note that local UDP won't randomly drop
60
+ packets like internet UDP, though you can lose them if there's no
61
+ Hastur Agent running.
62
+
63
+ The Hastur Agent forwards the messages to Hastur Routers over ZeroMQ
64
+ (see "http://0mq.org"). The routers send it to the sinks, which
65
+ preprocess your data, index it and write it to Cassandra. They also
66
+ forward to the syndicators for the streaming interface (e.g. to email
67
+ you if there's a problem).
68
+
69
+ Cassandra is a highly scalable clustered key-value store inspired
70
+ somewhat by Amazon Dynamo. It's a lot of the "secret sauce" that
71
+ makes Hastur interesting.
72
+
73
+ Hints and Tips
74
+ --------------
75
+
76
+ 1. You can retrieve messages with the same name prefix all together from
77
+ the REST API (for instance: "my.thing.*"). It's usually a good idea
78
+ to give metrics the same prefix if you will retrieve them at the same
79
+ time. This prefix syntax is very efficient for Cassandra. That's why
80
+ we made it easy to use.
81
+
82
+ 2. Every call allows you to pass labels - a one-level string-to-string
83
+ hash of tags about what that call means and what data goes with it.
84
+ For instance, you might call:
85
+
86
+ Hastur.gauge "my.thing.total_latency", 317.4, :now, :units => "usec"
87
+
88
+ Eventually you'll be able to query messages by label through the REST
89
+ interface, but for now that's inconvenient. However, it's easy to
90
+ subscribe to labels in the streaming interface. So labels are a
91
+ powerful way to mark data as being interesting to alert you about.
92
+
93
+ For example:
94
+
95
+ Hastur.gauge "my.thing.total_latency", 317.4, :now, :severity => "omg!"
96
+
97
+ It's easy to subscribe to any latency with a severity label in the
98
+ streaming interface, which would let you estimate fairly well how bad
99
+ the overall latency is. See the hastur-server gem for details of the
100
+ trigger interface.
101
+
102
+ 3. You can group multiple messages together by giving them the same
103
+ timestamp. For instance:
104
+
105
+ ts = Hastur.timestamp
106
+ Hastur.gauge "my.thing.latency1", val1, ts
107
+ Hastur.gauge "my.thing.latency2", val2, ts
108
+ Hastur.counter "my.thing.counter371", 1, ts
109
+
110
+ This makes it easy to query all events with exactly that timestamp
111
+ and the same prefix ("my.thing.*"), and otherwise to make sure they're
112
+ exactly the same.
113
+
114
+ Do *not* give multiple messages the same name *and* the same
115
+ timestamp. Hastur will only store a single event with the same name
116
+ and timestamp from the same node. If you give several of them the
117
+ same name and timestamp, you'll lose all but one.
118
+
119
+ Keep in mind that timestamps are in microseconds -- you're not limited
120
+ to one event with the same name per second.
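The Mechanism section of the README can be illustrated with a minimal sketch. It is not the gem's implementation: `usec_timestamp`, `build_gauge` and `send_message` are hypothetical names, the standard `json` library stands in for MultiJson, and the message fields mirror the ones built by the `Hastur` module later in this diff. The default port 8125 is the one returned by `udp_port` there.

```ruby
require "json"
require "socket"

# Hypothetical helper, not part of the hastur gem:
# microseconds since the Unix epoch for a Ruby Time.
def usec_timestamp(time = Time.now)
  time.to_i * 1_000_000 + time.usec
end

# Hypothetical helper: build a gauge-style message as a plain Hash.
def build_gauge(name, value, timestamp = usec_timestamp, labels = {})
  {
    :type      => :gauge,
    :name      => name,
    :value     => value,
    :timestamp => timestamp,
    :labels    => { :pid => Process.pid }.merge(labels),
  }
end

# Fire-and-forget send. Socket errors are swallowed, so a failed
# send becomes a no-op instead of slowing the application down.
def send_message(message, port = 8125, host = "127.0.0.1")
  socket = UDPSocket.new
  socket.send(JSON.generate(message), 0, host, port)
  true
rescue SystemCallError, SocketError
  false
ensure
  socket.close if socket
end

# Two messages grouped by sharing one timestamp, as in tip 3 above.
ts = usec_timestamp
send_message(build_gauge("my.thing.latency1", 12.5, ts))
send_message(build_gauge("my.thing.latency2", 7.25, ts, :units => "usec"))
```

UDP to localhost needs no handshake, so the send returns immediately whether or not an agent is listening. That is why a missing Hastur Agent loses data rather than blocking the caller.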
@@ -0,0 +1,27 @@
1
+ require "bundler/gem_tasks"
2
+ require "rake/testtask"
3
+
4
+ namespace "test" do
5
+ desc "Run all unit tests"
6
+ Rake::TestTask.new(:units) do |t|
7
+ t.libs += ["test"]
8
+ t.test_files = Dir["test/*_test.rb"]
9
+ t.verbose = true
10
+ end
11
+ end
12
+
13
+ inclusion_tests = Dir["test/inclusion/*_test.rb"]
14
+
15
+ inclusion_tests.each do |test_filename|
16
+ test_name = test_filename.split("/")[-1].sub(/_test\.rb$/, "").gsub("_", " ")
17
+
18
+ desc "Hastur #{test_name} inclusion test"
19
+ task "test:inclusion:#{test_name}" do
20
+ system("ruby", "-I.", test_filename)
21
+ raise "Test #{test_name} failed!" unless $?.success?
22
+ end
23
+
24
+ task "test:inclusions" => "test:inclusion:#{test_name}"
25
+ end
26
+
27
+ task "test" => [ "test:units", "test:inclusions" ]
@@ -0,0 +1,127 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "hastur"
4
+ require "chronic"
5
+ require "trollop"
6
+
7
+ opts = Trollop::options do
8
+ banner <<EOS
9
+ hastur is a command-line program to send Hastur metrics.
10
+
11
+ Usage:
12
+ hastur [options] <type> [<name> [<value>]]
13
+
14
+ Examples:
15
+ hastur counter things.to.do 4 --label app=MyApp type=todo
16
+ hastur heartbeat slow.cron.job
17
+ hastur mark script.ran.after.failed.job --label env=development activity=debugging
18
+ hastur gauge old.gauge 37.1 --time "3 months ago Saturday at 5pm"
19
+
20
+ Options:
21
+ EOS
22
+ opt :time, "Timestamp to send", :type => String
23
+ opt :label, "Labels to send", :type => :strings, :multi => true
24
+ opt :print, "Print the call args", :type => :boolean, :default => false
25
+ end
26
+
27
+ Trollop::die "you must give a type!" if ARGV.empty?
28
+ Type = ARGV.shift.downcase
29
+
30
+ # Args:
31
+ # - mark: name, value, timestamp, labels
32
+ # - counter: name, increment, timestamp, labels
33
+ # - gauge: name, value, timestamp, labels
34
+ # - event: name, subject, body, attn, timestamp, labels
35
+ # - heartbeat: name, value, timeout, timestamp, labels
36
+
37
+ TYPES = {
38
+ "mark" => {
39
+ :name => true,
40
+ :value => :maybe,
41
+ },
42
+ "gauge" => {
43
+ :name => true,
44
+ :value => true,
45
+ },
46
+ "counter" => {
47
+ :name => true,
48
+ :value => :maybe,
49
+ },
50
+ "heartbeat" => {
51
+ :name => :maybe,
52
+ :value => :maybe,
53
+ :timeout => :maybe,
54
+ }
55
+ }
56
+
57
+ Trollop::die "Type must be one of: #{TYPES.keys.join(', ')}" unless TYPES[Type]
58
+
59
+ #
60
+ # This method tries to evaluate a string as Ruby and, if it can't,
61
+ # dies saying so.
62
+ #
63
+ # @param [String] value_string The code to evaluate
64
+ # @param [String] description What the value will be used as
65
+ # @return The result of the eval
66
+ # @raise TrollopException Your string didn't parse or raised an exception
67
+ #
68
+ def try_eval(value_string, description)
69
+ begin
70
+ value = eval(value_string)
71
+ rescue Exception
72
+ Trollop::die "Your #{description} (#{value_string}) didn't run as Ruby: #{$!.message}"
73
+ end
74
+ value
75
+ end
76
+
77
+ #
78
+ # Try to get an argument by name if this message type supports it.
79
+ #
80
+ def try_get_arg(args, arg_name, message_type)
81
+ # Doesn't allow this arg type? Return quietly.
82
+ return unless TYPES[Type][arg_name]
83
+
84
+ if ARGV.size > 0
85
+ # If the arg is here and TYPES[Type][arg_name] is true or maybe, use it.
86
+ if block_given?
87
+ args << yield(ARGV.shift, arg_name, message_type)
88
+ else
89
+ args << try_eval(ARGV.shift, arg_name)
90
+ end
91
+ elsif TYPES[Type][arg_name] == :maybe
92
+ args << nil
93
+ else
94
+ Trollop::die "You must give a #{arg_name} for a metric of type #{Type}"
95
+ end
96
+ end
97
+
98
+ ##############################
99
+ # Build the argument list
100
+ ##############################
101
+ args = [ Type ]
102
+
103
+ try_get_arg(args, :name, Type) { |arg, _, _| arg.to_s }
104
+ try_get_arg(args, :value, Type)
105
+ # TODO(noah): add timeout for heartbeat
106
+
107
+ # Time is next to last
108
+ time = Time.now
109
+ if opts[:time]
110
+ time = Chronic.parse opts[:time]
111
+ end
112
+ args << time
113
+
114
+ # Labels is last
115
+ labels = {}
116
+ if opts[:label]
117
+ opts[:label].flatten.each do |item|
118
+ name, value = item.split("=", 2)
119
+ labels[name] = try_eval(value, "label value")
120
+ end
121
+ end
122
+
123
+ args << labels
124
+
125
+ puts "Hastur.send *#{args.inspect}" if opts[:print]
126
+
127
+ Hastur.send(*args)
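The script above passes each label value through `eval`, so a bare word such as `MyApp` is looked up as a Ruby constant. As an alternative, here is a minimal sketch that treats label values as plain strings. `parse_labels` is a hypothetical helper, not part of `bin/hastur`, and it assumes every item has the form `name=value`.

```ruby
# Hypothetical helper: turn ["app=MyApp", "type=todo"] into
# { "app" => "MyApp", "type" => "todo" } without calling eval.
def parse_labels(items)
  items.flatten.each_with_object({}) do |item, labels|
    # Limit of 2 keeps any further "=" characters inside the value.
    name, value = item.split("=", 2)
    if value.nil? || name.empty?
      raise ArgumentError, "label #{item.inspect} is not in name=value form"
    end
    labels[name] = value
  end
end

labels = parse_labels([["app=MyApp", "type=todo"], ["query=a=b"]])
puts labels.inspect
```

Whether string-only values are acceptable depends on how the labels are consumed downstream. The README describes labels as a string-to-string hash, which is what this sketch produces.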
@@ -0,0 +1,10 @@
1
+ #!/bin/bash
2
+
3
+ : ${REPO_ROOT:="$WORKSPACE"}
4
+ source $HOME/.rvm/scripts/rvm
5
+
6
+ cd $REPO_ROOT/hastur
7
+ rvm --create use 1.9.3@hastur
8
+ gem install --no-rdoc --no-ri bundler
9
+ bundle install
10
+ COVERAGE=true bundle exec rake --trace test:units
@@ -0,0 +1,4 @@
1
+ source :rubygems
2
+
3
+ gem "goliath"
4
+ gem "em-http-request"
@@ -0,0 +1 @@
1
+ Here is a short example of how to log GET requests using Hastur in a transparent proxy.
@@ -0,0 +1,93 @@
1
+ require "rubygems"
2
+ require "goliath"
3
+ require "em-synchrony/em-http"
4
+ require "time"
5
+
6
+ $LOAD_PATH << File.join(File.dirname(__FILE__), "../lib")
7
+ require "hastur/eventmachine"
8
+
9
+ class GetProxy < Goliath::API
10
+ use Goliath::Rack::Params
11
+
12
+ attr_reader :backend
13
+ def initialize
14
+ ::ARGV.each_with_index do |arg,idx|
15
+ if arg == "--backend"
16
+ @backend = ::ARGV[idx + 1]
17
+ ::ARGV.slice! idx, 2
18
+ break
19
+ end
20
+ end
21
+
22
+ unless @backend
23
+ raise "Initialization error: could not determine backend server, try --backend <url>"
24
+ end
25
+
26
+ super
27
+ end
28
+
29
+ def response(env)
30
+ url = "#{@backend}#{env['REQUEST_PATH']}"
31
+ start = Hastur.timestamp
32
+ http = EM::HttpRequest.new(url).get :query => params
33
+ done = Hastur.timestamp
34
+
35
+ uri = URI.parse url
36
+
37
+ # Hastur was designed to be queried ground-up using labels. Liberal use
38
+ # of labels is recommended. We add labels as we need them.
39
+ labels = { :scheme => uri.scheme,
40
+ :host => uri.host,
41
+ :port => uri.port
42
+ }
43
+
44
+ case http.response_header.status
45
+ when 300..307
46
+ # Marks are interesting, but non-critical points. Value defaults to nil,
47
+ # timestamp defaults to 'now'.
48
+ Hastur.mark(
49
+ "test.proxy.3xx", # name
50
+ nil, # value
51
+ start, # timestamp
52
+ :status => :moved # label
53
+ )
54
+ labels[:status] = "3xx"
55
+ when 400..417
56
+ # Log is used for low priority data that will be buffered and batched by
57
+ # Hastur. Severity is optional and irrelevant to delivery.
58
+ Hastur.log(
59
+ "test.proxy.4xx", # subject
60
+ { # data
61
+ :path => uri.path,
62
+ :query => uri.query,
63
+ },
64
+ start # timestamp
65
+ )
66
+ labels[:status] = "4xx"
67
+ when 500..505
68
+ # Event is serious business. Hastur will punish the little elves cranking
69
+ # in its bowels mercilessly to get this out and about ASAP.
70
+ Hastur.event(
71
+ "test.proxy.5xx", # name
72
+ "Internal Server Error", # subject
73
+ nil, # body
74
+ ["devnull@ooyala.com"], # attn
75
+ start, # timestamp
76
+ :path => uri.path, # labels
77
+ :query => uri.query # labels
78
+ )
79
+ labels[:status] = "5xx"
80
+ end
81
+
82
+ # Gauges are used to track values.
83
+ Hastur.gauge(
84
+ # Use . to separate namespaces in Hastur.
85
+ "test.proxy.latencies.ttr", # name
86
+ done.to_f - start.to_f, # value
87
+ start, # timestamp
88
+ labels # labels
89
+ )
90
+
91
+ [http.response_header.status, http.response_header, http.response]
92
+ end
93
+ end
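The status handling in the proxy depends on Ruby matching an integer against a Range in a `case` statement. A Range literal needs two dots. A Float such as `500.505` only matches a value equal to that exact number, so no real status code would ever match it. Here is a minimal sketch of the classification on its own, using a hypothetical `status_label` helper that is not part of the example:

```ruby
# Hypothetical helper: map an HTTP status code to the coarse
# :status label value used for the latency gauge in the proxy.
def status_label(status)
  case status
  when 300..307 then "3xx"
  when 400..417 then "4xx"
  when 500..505 then "5xx" # two dots build a Range; 500.505 would be a Float
  end
end

[200, 302, 404, 503].each do |code|
  puts "#{code} -> #{status_label(code).inspect}"
end
```

Codes outside the listed ranges return nil here, matching the proxy, which leaves the `:status` label unset for them.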
@@ -0,0 +1,28 @@
1
+ # -*- encoding: utf-8 -*-
2
+ $:.push File.expand_path("../lib", __FILE__)
3
+ require "hastur/version"
4
+
5
+ Gem::Specification.new do |s|
6
+ s.name = "hastur"
7
+ s.version = Hastur::VERSION
8
+ s.platform = Gem::Platform::RUBY
9
+ s.authors = ["Viet Nguyen"]
10
+ s.email = ["viet@ooyala.com"]
11
+ s.homepage = "http://www.ooyala.com"
12
+ s.description = "Hastur API client gem"
13
+ s.summary = "A gem used to communicate with the Hastur Client through UDP."
14
+ s.rubyforge_project = "hastur"
15
+
16
+ s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
17
+
18
+ s.add_development_dependency "yard"
19
+ s.add_development_dependency "mocha"
20
+ s.add_development_dependency "minitest"
21
+ s.add_development_dependency "simplecov" if RUBY_VERSION[/^1.9/]
22
+ s.add_development_dependency "rake"
23
+ s.add_runtime_dependency "multi_json", "~>1.3.2"
24
+ s.add_runtime_dependency "chronic"
25
+
26
+ s.files = `git ls-files`.split("\n")
27
+ s.require_paths = ["lib"]
28
+ end
@@ -0,0 +1,2 @@
1
+ require "hastur/api"
2
+ Hastur.start
@@ -0,0 +1,732 @@
1
+ require "multi_json"
2
+ require "socket"
3
+ require "date"
4
+ require "thread"
5
+
6
+ require "hastur/version"
7
+
8
+ #
9
+ # Hastur API gem that allows services/apps to easily publish
10
+ # correct Hastur-commands to their local machine's UDP sockets.
11
+ # Bare minimum for all JSON packets is to have :type key/values to
12
+ # map to a hastur message type, which the router uses for sink delivery.
13
+ #
14
+ module Hastur
15
+ extend self
16
+
17
+ # TODO(noah): Change all instance variables to use Hastur.variable
18
+ # and add attr_reader/attr_accessor for them if appropriate.
19
+ # Right now you could use a mix of Hastur.variable and including
20
+ # the Hastur module and get two full sets of Hastur stuff.
21
+ # This will only matter if people include Hastur directly,
22
+ # which we haven't documented as possible.
23
+
24
+ class << self
25
+ attr_accessor :mutex
26
+ end
27
+
28
+ Hastur.mutex ||= Mutex.new
29
+
30
+ SECS_2100 = 4102444800
31
+ MILLI_SECS_2100 = 4102444800000
32
+ MICRO_SECS_2100 = 4102444800000000
33
+ NANO_SECS_2100 = 4102444800000000000
34
+
35
+ SECS_1971 = 31536000
36
+ MILLI_SECS_1971 = 31536000000
37
+ MICRO_SECS_1971 = 31536000000000
38
+ NANO_SECS_1971 = 31536000000000000
39
+
40
+ PLUGIN_INTERVALS = [ :five_minutes, :thirty_minutes, :hourly, :daily, :monthly ]
41
+
42
+ #
43
+ # Prevents starting a background thread under any circumstances.
44
+ #
45
+ def no_background_thread!
46
+ @prevent_background_thread = true
47
+ end
48
+
49
+ START_OPTS = [
50
+ :background_thread
51
+ ]
52
+
53
+ #
54
+ # Start Hastur's background thread and/or do process registration
55
+ # or neither, according to what options are set.
56
+ #
57
+ # @param [Hash] opts The options for features
58
+ # @option opts [boolean] :background_thread Whether to start a background thread
59
+ #
60
+ def start(opts = {})
61
+ bad_keys = opts.keys - START_OPTS
62
+ raise "Unknown options to Hastur.start: #{bad_keys.join(", ")}!" unless bad_keys.empty?
63
+
64
+ unless @prevent_background_thread ||
65
+ (opts.has_key?(:background_thread) && !opts[:background_thread])
66
+ start_background_thread
67
+ end
68
+
69
+ @process_registration_done = true
70
+ register_process Hastur.app_name, {}
71
+ end
72
+
73
+ #
74
+ # Starts a background thread that will execute blocks of code every so often.
75
+ #
76
+ def start_background_thread
77
+ if @prevent_background_thread
78
+ raise "You can't start a background thread! Somebody called .no_background_thread! already."
79
+ end
80
+
81
+ return if @bg_thread
82
+
83
+ @intervals = [:five_secs, :minute, :hour, :day]
84
+ @interval_values = [5, 60, 60*60, 60*60*24 ]
85
+ __reset_bg_thread__
86
+ end
87
+
88
+ #
89
+ # This should ordinarily only be for testing. It kills the
90
+ # background thread so that automatic heartbeats and .every() blocks
91
+ # don't happen. If you restart the background thread, all your
92
+ # .every() blocks go away, but the process heartbeat is restarted.
93
+ #
94
+ def kill_background_thread
95
+ __kill_bg_thread__
96
+ end
97
+
98
+ #
99
+ # Returns whether the background thread is currently running.
100
+ # @todo Debug this.
101
+ #
102
+ def background_thread?
103
+ @bg_thread && @bg_thread.alive?
104
+ end
105
+
106
+ #
107
+ # Best effort to make all timestamps be Hastur timestamps, 64 bit
108
+ # numbers that represent the total number of microseconds since Jan
109
+ # 1, 1970 at midnight UTC. Accepts second, millisecond, microsecond or
110
+ # nanosecond timestamps and Ruby times. You can also give :now or nil for Time.now.
111
+ #
112
+ # @param timestamp The timestamp as a Fixnum, Float or Time. Defaults to Time.now.
113
+ # @return [Fixnum] Number of microseconds since Jan 1, 1970 midnight UTC
114
+ # @raise RuntimeError Unable to validate timestamp format
115
+ #
116
+ def epoch_usec(timestamp=Time.now)
117
+ timestamp = Time.now if timestamp.nil? || timestamp == :now
118
+
119
+ case timestamp
120
+ when Time
121
+ (timestamp.to_f*1000000).to_i
122
+ when DateTime
123
+ # Ruby 1.8.7 doesn't have a DateTime#to_time or DateTime#to_f method.
124
+ # For right now, declare failure.
125
+ raise "Ruby DateTime objects are not yet supported!"
126
+ when SECS_1971..SECS_2100
127
+ timestamp * 1000000
128
+ when MILLI_SECS_1971..MILLI_SECS_2100
129
+ timestamp * 1000
130
+ when MICRO_SECS_1971..MICRO_SECS_2100
131
+ timestamp
132
+ when NANO_SECS_1971..NANO_SECS_2100
133
+ timestamp / 1000
134
+ else
135
+ raise "Unable to validate timestamp: #{timestamp}"
136
+ end
137
+ end
138
+
139
+ alias :timestamp :epoch_usec
140
+
141
+ #
142
+ # Attempts to determine the application name, or uses an
143
+ # application-provided one, if set. In order, Hastur checks:
144
+ #
145
+ # * User-provided app name via Hastur.app_name=
146
+ # * HASTUR_APP_NAME environment variable
147
+ # * ::HASTUR_APP_NAME Ruby constant
148
+ # * Ecology.application, if set
149
+ # * File.basename($0)
150
+ #
151
+ # @return [String] The application name, or best guess at same
152
+ #
153
+ def app_name
154
+ return @app_name if @app_name
155
+
156
+ return @app_name = ENV['HASTUR_APP_NAME'] if ENV['HASTUR_APP_NAME']
157
+
158
+ top_level = ::HASTUR_APP_NAME rescue nil
159
+ return @app_name = top_level if top_level
160
+
161
+ eco = Ecology rescue nil
162
+ return @app_name = Ecology.application if eco
163
+
164
+ @app_name = File.basename $0
165
+ end
166
+ alias application app_name
167
+
168
+ #
169
+ # Set the application name that Hastur registers as.
170
+ #
171
+ # @param [String] new_name The new application name.
172
+ #
173
+ def app_name=(new_name)
174
+ old_name = @app_name
175
+
176
+ @app_name = new_name
177
+
178
+ if @process_registration_done
179
+ err_str = "You changed the application name from #{old_name} to " +
180
+ "#{new_name} after the process was registered!"
181
+ STDERR.puts err_str
182
+ Hastur.log err_str
183
+ end
184
+ end
185
+ alias application= app_name=
186
+
187
+ #
188
+ # Add default labels which will be sent back with every Hastur
189
+ # message sent by this process. The labels will be sent back with
190
+ # the same constant value each time that is specified in the labels
191
+ # hash.
192
+ #
193
+ # This is a useful way to send back information that won't change
194
+ # during the run, or that will change only occasionally like
195
+ # resource usage, server information, deploy environment, etc. The
196
+ # same kind of information can be sent back using info_process(), so
197
+ # consider which way makes more sense for your case.
198
+ #
199
+ # @param [Hash] new_default_labels A hash of new labels to send.
200
+ #
201
+ def add_default_labels(new_default_labels)
202
+ @default_labels ||= {}
203
+
204
+ @default_labels.merge! new_default_labels
205
+ end
206
+
207
+ #
208
+ # Remove default labels which will be sent back with every Hastur
209
+ # message sent by this process. This cannot remove the three
210
+ # automatic defaults (application, pid, tid). Keys that have not
211
+ # been added cannot be removed, and so will be silently ignored (no
212
+ # exception will be raised).
213
+ #
214
+ # @param [Array<String> or multiple strings] default_label_keys Keys to stop sending
215
+ #
216
+ def remove_default_label_names(*default_label_keys)
217
+ keys_to_remove = default_label_keys.flatten
218
+
219
+ keys_to_remove.each { |key| @default_labels.delete(key) }
220
+ end
221
+
222
+ #
223
+ # Reset the default labels which will be sent back with every Hastur
224
+ # message sent by this process. After this, only the automatic
225
+ # default labels (process ID, thread ID, application name) will be
226
+ # sent, plus of course the ones specified for the specific Hastur
227
+ # message call.
228
+ #
229
+ def reset_default_labels
230
+ @default_labels = {}
231
+ end
232
+
233
+ #
234
+ # Reset Hastur module for tests. This removes all settings and
235
+ # kills the background thread, resetting Hastur to its initial
236
+ # pre-start condition.
237
+ #
238
+ def reset
239
+ __kill_bg_thread__
240
+ @app_name = nil
241
+ @prevent_background_thread = nil
242
+ @process_registration_done = nil
243
+ @udp_port = nil
244
+ @__delivery_method__ = nil
245
+ @scheduled_blocks = nil
246
+ @last_time = nil
247
+ @intervals = nil
248
+ @interval_values = nil
249
+ @default_labels = nil
250
+ @message_name_prefix = nil
251
+ end
252
+
253
+ #
254
+ # Set a message-name prefix for all message types that have names.
255
+ # It will be prepended automatically for those message types' names.
256
+ # A nil value will be treated as the empty string. Plugin names
257
+ # don't count as message names for these purposes, and will not be
258
+ # prefixed.
259
+ #
260
+ def message_name_prefix=(value)
261
+ @message_name_prefix = value
262
+ end
263
+
264
+ def message_name_prefix
265
+ @message_name_prefix || ""
266
+ end
267
+
268
+ protected
269
+
270
+ #
271
+ # Sends a compound data structure to Hastur. This is protected and for
272
+ # internal use only at the moment and is used for system statistics
273
+ # that are automatically collected by Hastur Agent.
274
+ #
275
+ # @param [String] name The compound message name
276
+ # @param [Hash,Array] value compound value
277
+ # @param timestamp The timestamp as a Fixnum, Float, Time or :now
278
+ # @param [Hash] labels Any additional data labels to send
279
+ #
280
+ def compound(name, value=[], timestamp=:now, labels={})
281
+ send_to_udp :type => :compound,
282
+ :name => message_name_prefix + (name || ""),
283
+ :value => value,
284
+ :timestamp => epoch_usec(timestamp),
285
+ :labels => default_labels.merge(labels)
286
+ end
287
+
288
+ #
289
+ # Returns the default labels for any UDP message that ships.
290
+ #
291
+ def default_labels
292
+ pid = Process.pid
293
+ thread = Thread.current
294
+ unless thread[:tid]
295
+ thread[:tid] = thread_id(thread)
296
+ end
297
+
298
+ {
299
+ :pid => pid,
300
+ :tid => thread[:tid],
301
+ :app => app_name,
302
+ }
303
+ end
304
+
305
+ #
306
+ # This is a convenience function because the Ruby
307
+ # thread API has no accessor for the thread ID,
308
+ # but includes it in "to_s" (buh?)
309
+ #
310
+ def thread_id(thread)
311
+ return 0 if thread == Thread.main
312
+
313
+ str = thread.to_s
314
+
315
+ match = nil
316
+ match = str.match(/(0x[0-9a-fA-F]+)/)
317
+ return nil unless match
318
+ match[1].hex
319
+ end
320
+
321
+ #
322
+ # Get the UDP port.
323
+ #
324
+ # @return The UDP port. Defaults to 8125.
325
+ #
326
+ def udp_port
327
+ @udp_port || 8125
328
+ end
329
+
330
+ private
331
+
332
+ #
333
+ # Sends a message unmolested to the HASTUR_UDP_PORT on 127.0.0.1
334
+ #
335
+ # @param m The message to send
336
+ # @todo Combine this with __send_to_udp__
337
+ #
338
+ def send_to_udp(m)
339
+ if @__delivery_method__
340
+ @__delivery_method__.call(m)
341
+ else
342
+ __send_to_udp__(m)
343
+ end
344
+ end
345
+
346
+ def __send_to_udp__(m)
347
+ begin
348
+ u = ::UDPSocket.new
349
+ mj = MultiJson.dump m
350
+ u.send mj, 0, "127.0.0.1", udp_port
351
+ rescue Errno::EMSGSIZE => e
352
+ return if @no_recurse
353
+ @no_recurse = true
354
+ err = "Message too long to send via Hastur UDP Socket. " +
355
+ "Backtrace: #{e.backtrace.inspect} " + "(Truncated) Message: #{mj}"
356
+ Hastur.log err
357
+ @no_recurse = false
358
+ rescue Exception => e
359
+ return if @no_recurse
360
+ @no_recurse = true
361
+ err = "Exception sending via Hastur UDP Socket. " + "Exception: #{e.message} " +
362
+ "Backtrace: #{e.backtrace.inspect} " + "(Truncated) Message: #{mj}"
363
+ Hastur.log err
364
+ @no_recurse = false
365
+
366
+ end
367
+ end
368
+
369
+ #
370
+ # Kills the background thread if it's running.
371
+ #
372
+ def __kill_bg_thread__
373
+ if @bg_thread
374
+ @bg_thread.kill
375
+ @bg_thread = nil
376
+ end
377
+ end
378
+
379
+ #
380
+ # Resets Hastur's background thread, removing all scheduled
381
+ # callbacks and resetting the times for all intervals. This is TEST
382
+ # MODE ONLY and will do TERRIBLE THINGS IF CALLED IN PRODUCTION.
383
+ #
384
+ def __reset_bg_thread__
385
+ if @prevent_background_thread
386
+ raise "You can't start a background thread! Somebody called .no_background_thread! already."
387
+ end
388
+
389
+ __kill_bg_thread__
390
+
391
+ @last_time ||= Hash.new
392
+
393
+ Hastur.mutex.synchronize do
394
+ @scheduled_blocks ||= Hash.new
395
+
396
+ # initialize all of the scheduling hashes
397
+ @intervals.each do |interval|
398
+ @last_time[interval] = Time.at(0)
399
+ @scheduled_blocks[interval] = []
400
+ end
401
+ end
402
+
403
+ # add a heartbeat background job
404
+ every :minute do
405
+ heartbeat("process_heartbeat")
406
+ end
407
+
408
+ # define a thread that will schedule and execute all of the background jobs.
409
+ # it is not very accurate on the scheduling, but should not be a problem
410
+ @bg_thread = Thread.new do
411
+ begin
412
+ loop do
413
+ # for each of the interval buckets
414
+ curr_time = Time.now
415
+
416
+ @intervals.each_with_index do |interval, idx|
417
+ to_call = []
418
+
419
+ # Don't need to dup this because we never change the old
420
+ # array, only reassign a new one.
421
+ Hastur.mutex.synchronize { to_call = @scheduled_blocks[interval] }
422
+
423
+ # execute the scheduled items if time is up
424
+ if curr_time - @last_time[ interval ] >= @interval_values[idx]
425
+ @last_time[interval] = curr_time
426
+ to_call.each(&:call)
427
+ end
428
+ end
429
+
430
+ # TODO(noah): increase this time?
431
+ sleep 1 # rest
432
+ end
433
+ rescue Exception => e
434
+ STDERR.puts e.inspect
435
+ end
436
+ end
437
+ end
438
+
439
+ public
440
+
441
+ #
442
+ # Set delivery method to the given proc/block. The block is saved
443
+ # and called with each message to be sent. If no block is given or
444
+ # if this method is not called, the delivery method defaults to
445
+ # sending over the configured UDP port.
446
+ #
447
+ def deliver_with(&block)
448
+ @__delivery_method__ = block
449
+ end
450
+
451
+ #
452
+ # Set the UDP port. Defaults to 8125
453
+ #
454
+ # @param [Fixnum] new_port The new port number.
455
+ #
456
+ def udp_port=(new_port)
457
+ @udp_port = new_port
458
+ end
459
+
460
+ #
461
+ # Sends a 'mark' stat to Hastur. A mark gives the time that
462
+ # an interesting event occurred even with no value attached.
463
+ # You can also use a mark to send back string-valued stats
464
+ # that might otherwise be gauges -- "Green", "Yellow",
465
+ # "Red" or similar.
466
+ #
467
+ # It is different from a Hastur event because it happens at
468
+ # stat priority -- it can be batched or slightly delayed,
469
+ # and doesn't have an end-to-end acknowledgement included.
470
+ #
471
+ # @param [String] name The mark name
472
+ # @param [String] value An optional string value
473
+ # @param timestamp The timestamp as a Fixnum, Float, Time or :now
474
+ # @param [Hash] labels Any additional data labels to send
475
+ #
476
+ def mark(name, value = nil, timestamp=:now, labels={})
477
+ send_to_udp :type => :mark,
478
+ :name => message_name_prefix + (name || ""),
479
+ :value => value,
480
+ :timestamp => epoch_usec(timestamp),
481
+ :labels => default_labels.merge(labels)
482
+ end
483
+
484
+ #
485
+ # Sends a 'counter' stat to Hastur. Counters are linear,
486
+ # and are sent as deltas (differences). Sending a
487
+ # value of 1 adds 1 to the counter.
488
+ #
489
+ # @param [String] name The counter name
490
+ # @param [Fixnum] value Amount to increment the counter by
491
+ # @param timestamp The timestamp as a Fixnum, Float, Time or :now
492
+ # @param [Hash] labels Any additional data labels to send
493
+ #
494
+ def counter(name, value=1, timestamp=:now, labels={})
495
+ send_to_udp :type => :counter,
496
+ :name => message_name_prefix + (name || ""),
497
+ :value => value,
498
+ :timestamp => epoch_usec(timestamp),
499
+ :labels => default_labels.merge(labels)
500
+ end
501
+
502
+ #
503
+ # Sends a 'gauge' stat to Hastur. A gauge's value may or may
504
+ # not be on a linear scale. It is sent as an exact value, not
505
+ # a difference.
506
+ #
507
+ # @param [String] name The gauge name
508
+ # @param value The value of the gauge as a Fixnum or Float
509
+ # @param timestamp The timestamp as a Fixnum, Float, Time or :now
510
+ # @param [Hash] labels Any additional data labels to send
511
+ #
512
+ def gauge(name, value, timestamp=:now, labels={})
513
+ send_to_udp :type => :gauge,
514
+ :name => message_name_prefix + (name || ""),
515
+ :value => value,
516
+ :timestamp => epoch_usec(timestamp),
517
+ :labels => default_labels.merge(labels)
518
+ end
519
+
520
+ #
521
+ # Sends an event to Hastur. An event is high-priority and never buffered,
522
+ # and will be sent preferentially to stats or heartbeats. It includes
523
+ # an end-to-end acknowledgement to ensure arrival, but is expensive
524
+ # to store, send and query.
525
+ #
526
+ # 'Attn' is a mechanism to describe the system or component in which the
527
+ # event occurs and who would care about it. Obvious values to include in the
528
+ # array include user logins, email addresses, team names, and server, library
529
+ # or component names. This allows making searches like "what events should I
530
+ # worry about?" or "what events have recently occurred on the Rails server?"
531
+ #
532
+ # @param [String] name The name of the event (ex: "bad.log.line")
533
+ # @param [String] subject The subject or message for this specific event (ex "Got bad log line: @#$#@garbage@#$#@")
534
+ # @param [String] body An optional body with details of the event. A stack trace or email body would go here.
535
+ # @param [Array] attn The relevant components or teams for this event. Web hooks or email addresses would go here.
536
+ # @param timestamp The timestamp as a Fixnum, Float, Time or :now
537
+ # @param [Hash] labels Any additional data labels to send
538
+ #
539
+ def event(name, subject=nil, body=nil, attn=[], timestamp=:now, labels={})
540
+ send_to_udp :type => :event,
541
+ :name => message_name_prefix + (name || ""),
542
+ :subject => subject.to_s[0...3_072],
543
+ :body => body.to_s[0...3_072],
544
+ :attn => [ attn ].flatten,
545
+ :timestamp => epoch_usec(timestamp),
546
+ :labels => default_labels.merge(labels)
547
+ end
548
+
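The two normalizations that event applies, truncating subject and body and flattening attn, can be tried in isolation. The helper names below (truncate_field, normalize_attn) are hypothetical and exist only for this sketch; the expressions inside them are the ones used in the method above.

```ruby
# Same slice as the event method: keeps at most `limit` characters
# (String#[] with a range counts characters, not bytes).
def truncate_field(text, limit = 3_072)
  text.to_s[0...limit]
end

# Same expression as the event method: accepts a single value or an
# array (even a nested one) and always returns a flat array.
def normalize_attn(attn)
  [ attn ].flatten
end

puts truncate_field("x" * 5_000).length
p truncate_field(nil)
p normalize_attn("ops@example.com")
p normalize_attn([ "team-a", [ "team-b" ] ])
```

One consequence worth knowing: a nil subject or body is sent as an empty string, and passing nil for attn yields an array containing nil rather than an empty array.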
549
+ #
550
+ # Sends a log line to Hastur. A log line is of relatively low
551
+ # priority, comparable to stats, and is allowed to be buffered or
552
+ # batched while higher-priority data is sent first.
553
+ #
554
+ # Severity can be included in the data field with the tag
555
+ # "severity" if desired.
556
+ #
557
+ # @param [String] subject The subject or message for this specific log (ex: "Got bad input: @#$#@garbage@#$#@")
558
+ # @param [Hash] data Additional JSON-able data to be sent
559
+ # @param timestamp The timestamp as a Fixnum, Float, Time or :now
560
+ # @param [Hash] labels Any additional data labels to send
561
+ #
562
+ def log(subject=nil, data={}, timestamp=:now, labels={})
563
+ send_to_udp :type => :log,
564
+ :subject => subject.to_s[0...7_168],
565
+ :data => data,
566
+ :timestamp => epoch_usec(timestamp),
567
+ :labels => default_labels.merge(labels)
568
+ end
569
+
570
+ #
571
+ # Sends a process registration to Hastur. This indicates that the
572
+ # process is currently running, and that heartbeats should be sent
573
+ # for some time afterward.
574
+ #
575
+ # @param [String] name The name of the application or best guess (accepted, but not referenced in the message built below)
576
+ # @param [Hash] data The additional data to include with the registration
577
+ # @param timestamp The timestamp as a Fixnum, Float, Time or :now
578
+ # @param [Hash] labels Any additional data labels to send
579
+ #
580
+ def register_process(name = app_name, data = {}, timestamp = :now, labels = {})
581
+ send_to_udp :type => :reg_process,
582
+ :data => { "language" => "ruby", "version" => Hastur::VERSION }.merge(data),
583
+ :timestamp => epoch_usec(timestamp),
584
+ :labels => default_labels.merge(labels)
585
+ end
586
+
587
+ #
588
+ # Sends freeform process information to Hastur. This can be
589
+ # supplemental information about resources like memory, loaded gems,
590
+ # Ruby version, files open and whatnot. It can be additional
591
+ # configuration or deployment information like environment
592
+ # (dev/staging/prod), software or component version, etc. It can be
593
+ # information about the application as deployed, as run, or as it is
594
+ # currently running.
595
+ #
596
+ # The default labels contain application name and process ID to
597
+ # match this information with the process registration and similar
598
+ # details.
599
+ #
600
+ # Any number of these can be sent as information changes or is
601
+ # superseded. However, if information changes constantly or needs
602
+ # to be graphed or alerted on, send that separately as a metric or
603
+ # event. Info_process messages are freeform and not readily
604
+ # separable or graphable.
605
+ #
606
+ # @param [String] tag The tag or title of this chunk of process info
607
+ # @param [Hash] data The detailed data being sent
608
+ # @param timestamp The timestamp as a Fixnum, Float, Time or :now
609
+ # @param [Hash] labels Any additional data labels to send
610
+ #
611
+ def info_process(tag, data = {}, timestamp = :now, labels = {})
612
+ send_to_udp :type => :info_process,
613
+ :tag => tag,
614
+ :data => data,
615
+ :timestamp => epoch_usec(timestamp),
616
+ :labels => default_labels.merge(labels)
617
+ end
618
+
619
+ #
620
+ # This sends back freeform data about the agent or host that Hastur
621
+ # is running on. Sample uses include what libraries or packages are
622
+ # installed and available, or the total installed memory.
623
+ #
624
+ # Any number of these can be sent as information changes or is
625
+ # superseded. However, if information changes constantly or needs
626
+ # to be graphed or alerted on, send that separately as a metric or
627
+ # event. Info_agent messages are freeform and not readily separable
628
+ # or graphable.
629
+ #
630
+ # @param [String] tag The tag or title of this chunk of agent info
631
+ # @param [Hash] data The detailed data being sent
632
+ # @param timestamp The timestamp as a Fixnum, Float, Time or :now
633
+ # @param [Hash] labels Any additional data labels to send
634
+ #
635
+ def info_agent(tag, data = {}, timestamp = :now, labels = {})
636
+ send_to_udp :type => :info_agent,
637
+ :tag => tag,
638
+ :data => data,
639
+ :timestamp => epoch_usec(timestamp),
640
+ :labels => default_labels.merge(labels)
641
+ end
642
+
643
+ #
644
+ # Sends a plugin registration to Hastur. A plugin is a program on the host machine which
645
+ # can be run to determine status of the machine, an application or anything else interesting.
646
+ #
647
+ # This registration tells Hastur to begin scheduling runs
648
+ # of the plugin and report back on the resulting status codes or crashes.
649
+ #
650
+ # @param [String] name The name of the plugin, and of the heartbeat sent back
651
+ # @param [String] plugin_path The path on the local file system to this plugin executable
652
+ # @param [Array] plugin_args The array of arguments to pass to the plugin executable
653
+ # @param [Symbol] plugin_interval The interval to run the plugin. The scheduling will be slightly approximate. One of: PLUGIN_INTERVALS
654
+ # @param timestamp The timestamp as a Fixnum, Float, Time or :now
655
+ # @param [Hash] labels Any additional data labels to send
656
+ #
657
+ def register_plugin(name, plugin_path, plugin_args, plugin_interval, timestamp=:now, labels={})
658
+ unless PLUGIN_INTERVALS.include?(plugin_interval)
659
+ raise "Interval must be one of: #{PLUGIN_INTERVALS.join(', ')}"
660
+ end
661
+ send_to_udp :type => :reg_pluginv1,
662
+ :plugin_path => plugin_path,
663
+ :plugin_args => plugin_args,
664
+ :interval => plugin_interval,
665
+ :plugin => name,
666
+ :timestamp => epoch_usec(timestamp),
667
+ :labels => default_labels.merge(labels)
668
+ end
669
+
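The interval check at the top of register_plugin can be sketched on its own. The values in SKETCH_INTERVALS are placeholders chosen for illustration; the real PLUGIN_INTERVALS constant is defined elsewhere in the gem and may differ. check_interval is likewise a hypothetical helper.

```ruby
# Placeholder values; the gem's PLUGIN_INTERVALS may contain others.
SKETCH_INTERVALS = [ :five_minutes, :hourly, :daily ]

# Mirrors the guard in register_plugin: anything not in the list
# raises a RuntimeError that names the accepted values.
def check_interval(interval)
  unless SKETCH_INTERVALS.include?(interval)
    raise "Interval must be one of: #{SKETCH_INTERVALS.join(', ')}"
  end
  interval
end

check_interval(:hourly)          # passes
begin
  check_interval("hourly")       # a String is not the Symbol :hourly
rescue RuntimeError => e
  puts e.message
end
```

Since the comparison uses Array#include?, the interval has to be a Symbol that matches exactly; a String with the same spelling is rejected.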
670
+ #
671
+ # Sends a heartbeat to Hastur. A heartbeat is a periodic
672
+ # message which indicates that a host, application or
673
+ # service is currently running. It is higher priority
674
+ # than a statistic and should not be batched, but is
675
+ # lower priority than an event and does not include an
676
+ # end-to-end acknowledgement.
677
+ #
678
+ # Plugin results are sent as a heartbeat with the
679
+ # plugin's name as the heartbeat name.
680
+ #
681
+ # @param [String] name The name of the heartbeat.
682
+ # @param value The value of the heartbeat as a Fixnum or Float
683
+ # @param [Float] timeout The maximum time, in seconds, to expect before the next heartbeat. If nil, a missing heartbeat is not a concern. (Not referenced in the message built below.)
684
+ # @param timestamp The timestamp as a Fixnum, Float, Time or :now
685
+ # @param [Hash] labels Any additional data labels to send
686
+ #
687
+ def heartbeat(name="application.heartbeat", value=nil, timeout = nil, timestamp=:now, labels={})
688
+ send_to_udp :name => message_name_prefix + (name || ""),
689
+ :type => :hb_process,
690
+ :value => value,
691
+ :timestamp => epoch_usec(timestamp),
692
+ :labels => default_labels.merge(labels)
693
+ end
694
+
695
+ #
696
+ # Run the block and report its runtime back to Hastur as a gauge.
697
+ #
698
+ # @param [String] name The name of the gauge.
699
+ # @example
700
+ # Hastur.time("foo.bar") { fib 10 }
701
+ # Hastur.time "foo.bar", Time.now, :from => "over there" do fib(100) end
702
+ #
703
+ def time(name, timestamp=nil, labels={})
704
+ started = Time.now
705
+ ret = yield
706
+ ended = Time.now
707
+ gauge name, ended - started, timestamp || started, labels
708
+ ret
709
+ end
710
+
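The behaviour of time can be checked without a running agent by swapping in a recorder for gauge. TimerSketch is a hypothetical class written for this sketch; its time method copies the body above, and its gauge only stores what it is given instead of sending UDP.

```ruby
class TimerSketch
  attr_reader :sent

  def initialize
    @sent = []
  end

  # Stand-in for Hastur.gauge: records the call instead of sending it.
  def gauge(name, value, timestamp = :now, labels = {})
    @sent << { :name => name, :value => value,
               :timestamp => timestamp, :labels => labels }
  end

  # Same logic as the time method above.
  def time(name, timestamp = nil, labels = {})
    started = Time.now
    ret = yield
    ended = Time.now
    gauge name, ended - started, timestamp || started, labels
    ret
  end
end

sketch = TimerSketch.new
result = sketch.time("foo.bar") { (1..10).reduce(:+) }
p result
p sketch.sent.first[:value].class
```

The block's return value is passed through unchanged, the gauge value is the elapsed time in seconds as a Float, and the timestamp defaults to the moment the block started. If the block raises, the exception propagates and no gauge is recorded.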
711
+ #
712
+ # Runs a block of code once every interval.
713
+ # Use this method to report statistics at a fixed time interval.
714
+ #
715
+ # @param [Symbol] interval How often to run. One of [:five_secs, :minute, :hour, :day]
716
+ # @yield [] A block which will send Hastur messages, called periodically
717
+ #
718
+ def every(interval, &block)
719
+ if @prevent_background_thread
720
+ log("You called .every(), but background threads are specifically prevented.")
721
+ end
722
+
723
+ unless @intervals.include?(interval)
724
+ raise "Interval must be one of these: #{@intervals}, you gave #{interval.inspect}"
725
+ end
726
+
727
+ # Don't add to existing array. += will create a new array. Then
728
+ # when we save a reference to the old array and iterate through
729
+ # it, it won't change midway.
730
+ Hastur.mutex.synchronize { @scheduled_blocks[interval] += [ block ] }
731
+ end
732
+ end
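The comment in every about += creating a new array can be demonstrated with a small stand-alone class. SchedulerSketch is hypothetical and keeps only the scheduling table; the gem's background thread and its set of intervals are not modelled here.

```ruby
class SchedulerSketch
  def initialize
    @mutex = Mutex.new
    @scheduled_blocks = { :minute => [], :hour => [] }
  end

  # Returns the current array for an interval, as a reader thread
  # would before iterating over it.
  def blocks_for(interval)
    @mutex.synchronize { @scheduled_blocks[interval] }
  end

  # Same idea as every: += builds a new array and rebinds the hash
  # entry, so an array already handed out is never modified.
  def add(interval, &block)
    @mutex.synchronize { @scheduled_blocks[interval] += [ block ] }
  end
end

scheduler = SchedulerSketch.new
snapshot = scheduler.blocks_for(:minute)
scheduler.add(:minute) { :tick }
puts snapshot.length
puts scheduler.blocks_for(:minute).length
```

Had add used << instead, the snapshot and the stored array would be the same object, and a thread iterating over the snapshot could see it grow mid-iteration.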