logstash_writer 0.0.1

@@ -0,0 +1,197 @@
1
+ The `LogstashWriter` is an opinionated, reliable, and standards-observant
2
+ implementation of a means of getting events to a logstash cluster.
3
+
4
+
5
+ # Installation
6
+
7
+ It's a gem:
8
+
9
+ gem install logstash_writer
10
+
11
+ There's also the wonders of [the Gemfile](http://bundler.io):
12
+
13
+ gem 'logstash_writer'
14
+
15
+ If you're the sturdy type that likes to run from git:
16
+
17
+ rake install
18
+
19
+ Or, if you've eschewed the convenience of Rubygems entirely, then you
20
+ presumably know what to do already.
21
+
22
+
23
+ # Logstash Configuration
24
+
25
+ In order for logstash to receive the events being written, it must have a
26
+ `json_lines` TCP input configured. Something like this will do the trick:
27
+
28
+ input {
29
+ tcp {
30
+ id => "json_lines"
31
+ port => 5151
32
+ codec => "json_lines"
33
+ }
34
+ }
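To make the `json_lines` framing concrete: each event travels as a single JSON object followed by a newline. The following is a minimal sketch, assuming a logstash TCP input like the one above is listening; `frame_event` and `send_framed_event` are hypothetical helper names and are not part of this gem.

```ruby
require 'json'
require 'socket'

# Hypothetical helper: serialise one event the way a `json_lines` input
# reads it, i.e. one JSON object terminated by a newline. JSON escapes
# newlines inside string values, so the terminator is the only raw one.
def frame_event(event)
  JSON.generate(event) + "\n"
end

# Hypothetical helper: open a TCP connection, write one framed event,
# and close the connection again when the block returns.
def send_framed_event(host, port, event)
  TCPSocket.open(host, port) do |sock|
    sock.write(frame_event(event))
  end
end
```

Calling `send_framed_event("localhost", 5151, message: "hello")` against a running input should be a quick way to check that the port and codec are set up as expected, independently of the writer itself.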
35
+
36
+ We'd really like to support the more featureful lumberjack (or, these days,
37
+ "beats") protocol, but [Elastic refuses to document it
38
+ properly](https://github.com/elastic/libbeat/issues/279), so until such time
39
+ as that is fixed, we are stuck with the `json_lines` approach.
40
+
41
+
42
+ # Usage
43
+
44
+ An instance of `LogstashWriter` needs to be given the location of a server
45
+ (or servers) to send the events to. This can be any of:
46
+
47
+ # An IPv4 address and port
48
+ lw = LogstashWriter.new(server_name: "192.0.2.42:5151")
49
+
50
+ # An IPv6 address and port
51
+ lw = LogstashWriter.new(server_name: "[2001:db8::42]:5151")
52
+ # ... or without the brackets, if you like to live dangerously:
53
+ lw = LogstashWriter.new(server_name: "2001:db8::42:5151")
54
+
55
+ # A hostname that resolves to one or more A/AAAA addresses, and port
56
+ lw = LogstashWriter.new(server_name: "logstash:5151")
57
+
58
+ # A DNS name that resolves to one or more SRV records (which
59
+ # specify the port as part of the record)
60
+ lw = LogstashWriter.new(server_name: "_logstash._tcp")
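As a rough illustration of how these forms can be told apart without touching the DNS, here is a sketch modelled on the checks in the library source further down. The name `classify_server_name` and the symbols it returns are made up for this example.

```ruby
require 'ipaddr'

# Hypothetical helper: report which documented form a server_name takes.
# No DNS lookups are performed; only the shape of the string is examined.
def classify_server_name(server_name)
  match = server_name.match(/\A(.*):(\d+)\z/)
  unless match
    # A colon without a trailing numeric port is still treated as
    # hostname:port; no colon at all means an SRV name.
    return server_name.include?(":") ? :hostname_and_port : :srv_name
  end

  begin
    # Raises (a subclass of) ArgumentError if the left-hand side is not
    # an address literal.
    IPAddr.new(match[1])
    :address_literal
  rescue ArgumentError
    :hostname_and_port
  end
end
```

Note that the unbracketed IPv6 form only works because the last colon-separated group is taken as the port, which is why the brackets are the safer spelling.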
61
+
62
+ Once you have your `LogstashWriter` instance, you can start firing
63
+ events:
64
+
65
+ lw.send_event(any: "hash", you: "like")
66
+
67
+ However, they won't actually be sent to the logstash server until you start
68
+ the background worker thread:
69
+
70
+ lw.run
71
+
72
+ When it comes time to shut down, you can do so gracefully, like this:
73
+
74
+ lw.stop
75
+
76
+ This will wait for all events in the queue to drain to the logstash server
77
+ before returning.
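The queue-then-drain behaviour can be sketched in isolation. The class below is a hypothetical stand-in, not the gem's implementation: it "delivers" items by appending them to an array instead of writing to a socket, but it shows the same shape of a worker thread that keeps going after a stop request until the queue is empty.

```ruby
# Hypothetical stand-in for the writer's queue and worker thread.
class DrainingWorker
  attr_reader :delivered

  def initialize
    @queue = []
    @delivered = []
    @mutex = Mutex.new
    @cv = ConditionVariable.new
    @terminate = false
  end

  # Queue an item; nothing is delivered until #run has been called.
  def push(item)
    @mutex.synchronize do
      @queue << item
      @cv.signal
    end
  end

  # Start the background worker (at most once).
  def run
    @thread ||= Thread.new { work_loop }
  end

  # Request termination, then block until the worker has drained the queue.
  def stop
    @mutex.synchronize do
      @terminate = true
      @cv.signal
    end
    @thread&.join
    @thread = nil
  end

  private

  def work_loop
    loop do
      item = nil
      @mutex.synchronize do
        @cv.wait(@mutex) while @queue.empty? && !@terminate
        # Only reachable with an empty queue once stop was requested.
        return if @queue.empty?
        item = @queue.shift
      end
      @delivered << item # stands in for the write to logstash
    end
  end
end
```

Items pushed before `run` simply wait in the queue, and `stop` returns only after every queued item has been handed over.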
78
+
79
+ In the event that a logstash server is unavailable at the time your events
80
+ are sent, events will be queued until a server is contactable. However,
81
+ because memory is a finite resource, the backlog is limited to 1,000 events
82
+ by default. If you want a larger (or smaller) limit, tell the writer when
83
+ you create it:
84
+
85
+ lw = LogstashWriter.new(server_name: "...", backlog: 1_000_000)
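When the backlog is full, the oldest events are discarded first. A minimal sketch of that policy on a plain array, assuming nothing beyond what the source shows (`enqueue_with_backlog` is a hypothetical name):

```ruby
# Hypothetical helper: append an event, then trim the queue back to
# `limit` entries by discarding from the front (the oldest events).
# Returns how many events were dropped by this call.
def enqueue_with_backlog(queue, event, limit)
  queue << event
  dropped = 0
  while queue.length > limit
    queue.shift
    dropped += 1
  end
  dropped
end
```

With a limit of 3, pushing the events 1 through 5 leaves 3, 4 and 5 in the queue; events 1 and 2 are the ones lost.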
86
+
87
+ If you want to know what your writer is doing, give it a logger:
88
+
89
+ lw = LogstashWriter.new(server_name: "...", logger: Logger.new("/dev/stderr"))
90
+
91
+
92
+ ## Prometheus Metrics
93
+
94
+ If you're instrumentally inclined, you can get Prometheus metrics
95
+ out of the writer by passing a client registry (which you'll presumably know
96
+ what to do with if you're into that sort of thing):
97
+
98
+ reg = Prometheus::Client::Registry.new
99
+ lw = LogstashWriter.new(server_name: "...", metrics_registry: reg)
100
+
101
+ The metrics that are exposed are:
102
+
103
+ * **`logstash_writer_events_received_total`** -- the number of events that
104
+ have been submitted for writing by calling `#send_event`.
105
+
106
+ * **`logstash_writer_events_written_total`** -- the number of events that
107
+ have been submitted to the logstash server, labelled by `server` (the
108
+ `address:port` pair for the server that each event was submitted to).
109
+
110
+ * **`logstash_writer_events_dropped_total`** -- the number of events
111
+ that were dropped due to the backlog buffer filling up. An increase
112
+ in this value over time indicates that your logstash servers are either
113
+ unreliable, or unable to cope with peak event ingestion loads.
114
+
115
+ * **`logstash_writer_queue_size`** -- the number of events currently in
116
+ the backlog queue awaiting transmission. In *theory*, this value should
117
+ always be `received - (sent + dropped)`, but this gauge is maintained
118
+ separately as a cross-check in case of bugs.
119
+
120
+ * **`logstash_writer_last_sent_event_timestamp`** -- the UTC timestamp,
121
+ represented as the number of (fractional) seconds since the Unix epoch, at
122
+ which the most recent event sent to a logstash server was originally
123
+ submitted via `#send_event`. This might require some unpacking.
124
+
125
+ If everything is going along swimmingly, there are no queued events, and
126
+ events submitted are immediately forwarded to logstash, this gauge will
127
+ be whenever the last event was sent. No big problem. However, in the
128
+ event of problems, this timestamp can tell you several things.
129
+
130
+ Firstly, if there are queued events, you can tell how far behind in real
131
+ time your logstash event history is, by calculating `NOW() -
132
+ logstash_writer_last_sent_event_timestamp`. Thus, if you're not finding
133
+ events in your Kibana dashboard you were expecting to see, you can tell
134
+ that there's a clog in the pipes by looking at this.
135
+
136
+ Alternately, if the queue is empty, but this timestamp is perhaps older
137
+ than you'd expect, then you know the problem is "upstream" of
138
+ `LogstashWriter`. If your code isn't calling `#send_event`, then this
139
+ timestamp won't be progressing, and you can go look for a deadlock or
140
+ something in your code, and don't need to check if logstash is misbehaving
141
+ (again).
142
+
143
+ * **`logstash_writer_connected_to_server`** -- this flag timeseries (can be
144
+ either `1` or `0`) is simply a way for you to quickly determine whether
145
+ the writer has a server to talk to, if it wants one. That is, this time
146
+ series will only be `0` if there's an event to write but no logstash
147
+ server can be found to write it to.
148
+
149
+ * **`logstash_writer_connect_exceptions_total`** -- a count of exceptions
150
+ raised whilst attempting to connect to a logstash server, labelled by the
151
+ exception class and the server to which the connection was attempted.
152
+
153
+ * **`logstash_writer_write_exceptions_total`** -- a count of exceptions
154
+ raised whilst attempting to write data to a connected logstash server,
155
+ labelled by the exception class and the server to which the write was
156
+ directed.
157
+
158
+ * **`logstash_writer_write_loop_exceptions_total`** -- a count of exceptions
159
+ raised in the "write loop", which is the main infinite loop executed by
160
+ the background worker thread. Exceptions which occur here are...
161
+ concerning, because whilst exceptions are expected while connecting and
162
+ writing to logstash servers, the write loop *itself* shouldn't normally
163
+ be flinging exceptions around.
164
+
165
+ * **`logstash_writer_write_loop_ok`** -- a flag (can be either `1` or `0`)
166
+ indicating whether the write loop is dead or not. This is, essentially,
167
+ the `up` series for the logstash writer; if this is `0`, nothing useful is
168
+ happening in the logstash writer.
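The two calculations mentioned in the list above are simple enough to write down. This sketch works on plain numbers, however you obtained them from your metrics store; the helper names are hypothetical.

```ruby
# Hypothetical helper: how far behind delivery is, in seconds, given the
# value of the last_sent_event_timestamp gauge.
def delivery_lag(last_sent_event_timestamp, now = Time.now.to_f)
  now - last_sent_event_timestamp
end

# Hypothetical helper: the queue size implied by the three counters,
# for comparison against the separately maintained queue_size gauge.
def expected_queue_size(received, sent, dropped)
  received - (sent + dropped)
end
```

One thing to keep in mind: the source initialises the timestamp gauge to `0`, so until the first event has been delivered the computed lag is the entire time since the Unix epoch rather than anything meaningful.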
169
+
170
+
171
+ # Contributing
172
+
173
+ Patches can be sent as [a GitHub pull
174
+ request](https://github.com/discourse/logstash-writer). This project is
175
+ intended to be a safe, welcoming space for collaboration, and contributors
176
+ are expected to adhere to the [Contributor Covenant code of
177
+ conduct](CODE_OF_CONDUCT.md).
178
+
179
+
180
+ # Licence
181
+
182
+ Unless otherwise stated, everything in this repo is covered by the following
183
+ copyright notice:
184
+
185
+ Copyright (C) 2015 Civilized Discourse Construction Kit, Inc.
186
+
187
+ This program is free software: you can redistribute it and/or modify it
188
+ under the terms of the GNU General Public License version 3, as
189
+ published by the Free Software Foundation.
190
+
191
+ This program is distributed in the hope that it will be useful,
192
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
193
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
194
+ GNU General Public License for more details.
195
+
196
+ You should have received a copy of the GNU General Public License
197
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
@@ -0,0 +1,498 @@
1
+ require 'ipaddr'
2
+ require 'json'
+ require 'logger'
3
+ require 'resolv'
4
+ require 'socket'
5
+ require 'prometheus/client'
6
+
7
+ # Write messages to a logstash server.
8
+ #
9
+ # Flings events, represented as JSON objects, to logstash using the
10
+ # `json_lines` codec (over TCP). Doesn't do any munging or modification of
11
+ # the event data given to it, other than adding `@timestamp` and `_id`
12
+ # fields if they do not already exist.
13
+ #
14
+ # We support highly-available logstash installations by means of multiple
15
+ # address records, or via SRV records. See the docs for .new for details
16
+ # as to the valid formats for the server.
17
+ #
18
+ class LogstashWriter
19
+ # How long, in seconds, to pause the first time an error is encountered.
20
+ # Each successive error will cause a longer wait, so as to prevent
21
+ # thundering herds.
22
+ INITIAL_RETRY_WAIT = 0.5
23
+
24
+ # Create a new logstash writer.
25
+ #
26
+ # Once the object is created, you're ready to give it messages by
27
+ # calling #send_event. No messages will actually be *delivered* to
28
+ # logstash, though, until you call #run.
29
+ #
30
+ # If multiple addresses are returned from an A/AAAA resolution, or
31
+ # multiple SRV records, then the records will all be tried in random
32
+ # order (for A/AAAA records) or in line with the standard rules for
33
+ # weight and priority (for SRV records).
34
+ #
35
+ # @param server_name [String] details for connecting to the logstash
36
+ # server(s). This can be:
37
+ #
38
+ # * `<IPv4 address>:<port>` -- a literal IPv4 address, and mandatory
39
+ # port.
40
+ #
41
+ # * `[<IPv6 address>]:<port>` -- a literal IPv6 address, and mandatory
42
+ # port. Enclosing the address in square brackets isn't required, but
43
+ # it's a serving suggestion to make it a little easier to discern
44
+ # address from port. Forgetting to include the port will end in
45
+ # confusion.
46
+ #
47
+ # * `<hostname>:<port>` -- the given hostname will be resolved for
48
+ # A/AAAA records, and all returned addresses will be tried in random
49
+ # order until one is found that accepts a connection.
50
+ #
51
+ # * `<dnsname>` -- the given dnsname will be resolved for SRV records,
52
+ # and the returned target hostnames and ports will be tried in the
53
+ # RFC2782-approved manner according to priority and weight until one
54
+ # is found which accepts a connection.
55
+ #
56
+ # @param logger [Logger] something to which we can write log entries
57
+ # for debugging and error-reporting purposes.
58
+ #
59
+ # @param backlog [Integer] a non-negative integer specifying the maximum
60
+ # number of events that should be queued during periods when the
61
+ # logstash server is unavailable. If the limit is exceeded, the oldest
62
+ # (= first event to be queued) will be dropped.
63
+ #
64
+ # @param metrics_registry [Prometheus::Client::Registry] where to register
65
+ # the metrics instrumenting the operation of the writer instance.
66
+ #
67
+ # @param metrics_prefix [#to_s] what to prefix all of the metrics used to
68
+ # instrument the operation of the writer instance. If you instantiate
69
+ # multiple LogstashWriter instances with the same `metrics_registry`, this
70
+ # parameter *must* be different for each of them, or you will get some
71
+ # inscrutable exception raised from the registry.
72
+ #
73
+ def initialize(server_name:, logger: Logger.new("/dev/null"), backlog: 1_000, metrics_registry: Prometheus::Client::Registry.new, metrics_prefix: :logstash_writer)
74
+ @server_name, @logger, @backlog = server_name, logger, backlog
75
+
76
+ @metrics = {
77
+ received: metrics_registry.counter(:"#{metrics_prefix}_events_received_total", "The number of logstash events which have been submitted for delivery"),
78
+ sent: metrics_registry.counter(:"#{metrics_prefix}_events_written_total", "The number of logstash events which have been delivered to the logstash server"),
79
+ queue_size: metrics_registry.gauge(:"#{metrics_prefix}_queue_size", "The number of events currently in the queue to be sent"),
80
+ dropped: metrics_registry.counter(:"#{metrics_prefix}_events_dropped_total", "The number of events which have been dropped from the queue"),
81
+
82
+ lag: metrics_registry.gauge(:"#{metrics_prefix}_last_sent_event_timestamp", "When the last event successfully sent to logstash was originally received"),
83
+
84
+ connected: metrics_registry.gauge(:"#{metrics_prefix}_connected_to_server", "Boolean flag indicating whether we are currently connected to a logstash server"),
85
+ connect_exception: metrics_registry.counter(:"#{metrics_prefix}_connect_exceptions_total", "The number of exceptions that have occurred whilst attempting to connect to a logstash server"),
86
+ write_exception: metrics_registry.counter(:"#{metrics_prefix}_write_exceptions_total", "The number of exceptions that have occurred whilst attempting to write an event to a logstash server"),
87
+
88
+ write_loop_exception: metrics_registry.counter(:"#{metrics_prefix}_write_loop_exceptions_total", "The number of exceptions that have occurred in the writing loop"),
89
+ write_loop_ok: metrics_registry.gauge(:"#{metrics_prefix}_write_loop_ok", "Boolean flag indicating whether the writing loop is currently operating correctly, or is in a post-apocalyptic hellscape of never-ending exceptions"),
90
+ }
91
+
92
+ @metrics[:lag].set({}, 0)
93
+ @metrics[:queue_size].set({}, 0)
94
+
95
+ @queue = []
96
+ @queue_mutex = Mutex.new
97
+ @queue_cv = ConditionVariable.new
98
+
99
+ @socket_mutex = Mutex.new
100
+ @worker_mutex = Mutex.new
101
+ end
102
+
103
+ # Add an event to the queue, to be sent to logstash. Actual event
104
+ # delivery will happen in a worker thread that is started with
105
+ # #run. If the event does not have a `@timestamp` or `_id` element, they
106
+ # will be added with appropriate values.
107
+ #
108
+ # @param e [Hash] the event data to be sent.
109
+ #
110
+ # @return [NilClass]
111
+ #
112
+ def send_event(e)
113
+ unless e.is_a?(Hash)
114
+ raise ArgumentError, "Event must be a hash"
115
+ end
116
+
117
+ unless e.has_key?(:@timestamp) || e.has_key?("@timestamp")
118
+ e[:@timestamp] = Time.now.utc.strftime("%FT%TZ")
119
+ end
120
+
121
+ unless e.has_key?(:_id) || e.has_key?("_id")
122
+ # This is the quickest way I've found to get a long, random string.
123
+ # We don't need any sort of cryptographic or unpredictability
124
+ # guarantees for what we're doing here, so SecureRandom is unnecessary
125
+ # overhead.
126
+ e[:_id] = rand(0x1000_0000_0000_0000_0000_0000_0000_0000).to_s(36)
127
+ end
128
+
129
+ @queue_mutex.synchronize do
130
+ @queue << { content: e, arrival_timestamp: Time.now }
131
+ while @queue.length > @backlog
132
+ @queue.shift
133
+ stat_dropped
134
+ end
135
+ @queue_cv.signal
136
+
137
+ stat_received
138
+ end
139
+
140
+ nil
141
+ end
142
+
143
+ # Start sending events.
144
+ #
145
+ # This method will return almost immediately, and actual event
146
+ # transmission will commence in a separate thread.
147
+ #
148
+ # @return [NilClass]
149
+ #
150
+ def run
151
+ @worker_mutex.synchronize do
152
+ if @worker_thread.nil?
153
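+ m, cv, started = Mutex.new, ConditionVariable.new, false
154
+
155
+ @worker_thread = Thread.new { m.synchronize { started = true; cv.signal }; write_loop }
156
+
157
+ # Don't return until the thread has *actually* started; the flag guards against the signal firing before we wait
158
+ m.synchronize { cv.wait(m) until started }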
159
+ end
160
+ end
161
+
162
+ nil
163
+ end
164
+
165
+ # Stop the worker thread.
166
+ #
167
+ # Politely ask the worker thread to please finish up once it's
168
+ # finished sending all messages that have been queued. This will
169
+ # return once the worker thread has finished.
170
+ #
171
+ # @return [NilClass]
172
+ #
173
+ def stop
174
+ @worker_mutex.synchronize do
175
+ if @worker_thread
176
+ @terminate = true
177
+ @queue_cv.signal
178
+ begin
179
+ @worker_thread.join
180
+ rescue Exception => ex
181
+ @logger.error("LogstashWriter") { (["Worker thread terminated with exception: #{ex.message} (#{ex.class})"] + ex.backtrace).join("\n ") }
182
+ end
183
+ @worker_thread = nil
184
+ @socket_mutex.synchronize { (@current_socket.close; @current_socket = nil) if @current_socket }
185
+ end
186
+ end
187
+
188
+ nil
189
+ end
190
+
191
+ # Disconnect from the currently-active server.
192
+ #
193
+ # In certain circumstances, you may wish to force the writer to stop
194
+ # sending messages to the currently-connected logstash server, and
195
+ # re-resolve the `server_name` to get a new address to talk to.
196
+ # Calling this method will cause that to happen.
197
+ #
198
+ # @return [NilClass]
199
+ #
200
+ def force_disconnect!
201
+ @socket_mutex.synchronize do
202
+ return if @current_socket.nil?
203
+
204
+ @logger.info("LogstashWriter") { "Forced disconnect from #{describe_peer(@current_socket) }" }
205
+ @current_socket.close if @current_socket
206
+ @current_socket = nil
207
+ end
208
+
209
+ nil
210
+ end
211
+
212
+ private
213
+
214
+ # The main "worker" method for getting events out of the queue and
215
+ # firing them at logstash.
216
+ #
217
+ def write_loop
218
+ error_wait = INITIAL_RETRY_WAIT
219
+
220
+ catch :terminate do
221
+ loop do
222
+ event = nil
223
+
224
+ begin
225
+ @queue_mutex.synchronize do
226
+ while @queue.empty? && !@terminate
227
+ @queue_cv.wait(@queue_mutex)
228
+ end
229
+
230
+ if @queue.empty? && @terminate
231
+ @terminate = false
232
+ throw :terminate
233
+ end
234
+
235
+ event = @queue.shift
236
+ end
237
+
238
+ current_socket do |s|
239
+ s.puts event[:content].to_json
240
+ stat_sent(describe_peer(s), event[:arrival_timestamp])
241
+ @metrics[:write_loop_ok].set({}, 1)
242
+ error_wait = INITIAL_RETRY_WAIT
243
+ end
244
+ rescue StandardError => ex
245
+ @logger.error("LogstashWriter") { (["Exception in write_loop: #{ex.message} (#{ex.class})"] + ex.backtrace).join("\n ") }
246
+ @queue_mutex.synchronize { @queue.unshift(event) if event }
247
+ @metrics[:write_loop_exception].increment(class: ex.class.to_s)
248
+ @metrics[:write_loop_ok].set({}, 0)
249
+ sleep error_wait
250
+ # Increase the error wait timeout for next time, up to a maximum
251
+ # interval of about 60 seconds
252
+ error_wait *= 1.1
253
+ error_wait = 60 if error_wait > 60
254
+ error_wait += rand / 0.5
255
+ end
256
+ end
257
+ end
258
+ end
259
+
260
+ # Yield a TCPSocket connected to the server we currently believe to be
261
+ # accepting log entries, so that something can send log entries to it.
262
+ #
263
+ # The yielding allows us to centralise all error detection and handling
264
+ # within this one method, and retry sending just by calling `yield` again
265
+ # when we've connected to another server.
266
+ #
267
+ def current_socket
268
+ # This could all be handled more cleanly with recursion, but I don't
269
+ # want to fill the stack if we have to retry a lot of times. Also
270
+ # can't just use `retry` because not all of the "go around again"
271
+ # conditions are due to exceptions.
272
+ done = false
273
+
274
+ until done
275
+ @socket_mutex.synchronize do
276
+ if @current_socket
277
+ begin
278
+ @logger.debug("LogstashWriter") { "Using current server #{describe_peer(@current_socket)}" }
279
+ yield @current_socket
280
+ @metrics[:connected].set({}, 1)
281
+ done = true
282
+ rescue SystemCallError => ex
283
+ # Something went wrong during the send; disconnect from this
284
+ # server and recycle
285
+ @metrics[:write_exception].increment(server: describe_peer(@current_socket), class: ex.class.to_s)
286
+ @logger.info("LogstashWriter") { "Error while writing to current server: #{ex.message} (#{ex.class})" }
287
+ @current_socket.close
288
+ @current_socket = nil
289
+ @metrics[:connected].set({}, 0)
290
+
291
+ sleep INITIAL_RETRY_WAIT
292
+ end
293
+ else
294
+ retry_delay = INITIAL_RETRY_WAIT * 10
295
+ candidates = resolve_server_name
296
+ @logger.debug("LogstashWriter") { "Server candidates: #{candidates.inspect}" }
297
+
298
+ if candidates.empty?
299
+ # A useful error message will (should?) have been logged by something
300
+ # down in the bowels of resolve_server_name, so all we have to do
301
+ # is wait a little while, then let the loop retry.
302
+ sleep INITIAL_RETRY_WAIT * 10
303
+ else
304
+ begin
305
+ next_server = candidates.shift
306
+
307
+ if next_server
308
+ @logger.debug("LogstashWriter") { "Trying to connect to #{next_server.to_s}" }
309
+ @current_socket = next_server.socket
310
+ else
311
+ @logger.debug("LogstashWriter") { "Could not connect to any server; pausing before trying again" }
312
+ @current_socket = nil
313
+ sleep retry_delay
314
+
315
+ # Calculate a longer retry delay next time we fail to connect
316
+ # to every server in the list, up to a maximum of (roughly) 60
317
+ # seconds.
318
+ retry_delay *= 1.5
319
+ retry_delay = 60 if retry_delay > 60
320
+ # A bit of randomness to prevent the thundering herd never goes
321
+ # amiss
322
+ retry_delay += rand
323
+ end
324
+ rescue SystemCallError => ex
325
+ # Connection failed for any number of reasons; try the next one in the list
326
+ @metrics[:connect_exception].increment(server: next_server.to_s, class: ex.class.to_s)
327
+ @logger.error("LogstashWriter") { "Failed to connect to #{next_server.to_s}: #{ex.message} (#{ex.class})" }
328
+ sleep INITIAL_RETRY_WAIT
329
+ retry
330
+ end
331
+ end
332
+ end
333
+ end
334
+ end
335
+ end
336
+
337
+ # Generate a human-readable description of the remote end of the given
338
+ # socket.
339
+ #
340
+ def describe_peer(s)
341
+ pa = s.peeraddr
342
+ if pa[0] == "AF_INET6"
343
+ "[#{pa[3]}]:#{pa[1]}"
344
+ else
345
+ "#{pa[3]}:#{pa[1]}"
346
+ end
347
+ end
348
+
349
+ # Turn the server_name given in the constructor into a list of Target
350
+ # objects, suitable for iterating through to find someone to talk to.
351
+ #
352
+ def resolve_server_name
353
+ return [static_target] if static_target
354
+
355
+ # The IPv6 literal case should have been taken care of by
356
+ # static_target, so the only two cases we have to deal with
357
+ # here are specified-port (assume A/AAAA) or no port (assume SRV).
358
+ if @server_name =~ /:/
359
+ host, port = @server_name.split(":", 2)
360
+ targets_from_address_record(host, port)
361
+ else
362
+ targets_from_srv_record(@server_name)
363
+ end
364
+ end
365
+
366
+ # Figure out whether the server spec we were given looks like an address:port
367
+ # combo (in which case return a memoised target), else return `nil` to let
368
+ # the DNS take over.
369
+ def static_target
370
+ # It is valid to memoize this because address literals don't change
371
+ # their resolution over time.
372
+ @static_target ||= begin
373
+ if @server_name =~ /\A(.*):(\d+)\z/
374
+ begin
375
+ IPAddr.new($1)
376
+ rescue ArgumentError
377
+ # Whatever is on the LHS isn't a recognisable address literal;
378
+ # assume hostname
379
+ nil
380
+ else
381
+ Target.new($1, $2.to_i)
382
+ end
383
+ end
384
+ end
385
+ end
386
+
387
+ # Resolve hostname as A/AAAA, and generate randomly-sorted list of Target
388
+ # records from the list of addresses resolved.
389
+ #
390
+ def targets_from_address_record(hostname, port)
391
+ addrs = Resolv::DNS.new.getaddresses(hostname)
392
+ if addrs.empty?
393
+ @logger.warn("LogstashWriter") { "No addresses resolved for server_name #{hostname.inspect}" }
394
+ end
395
+ addrs.sort_by { rand }.map { |a| Target.new(a.to_s, port.to_i) }
396
+ end
397
+
398
+ # Resolve the given hostname as a SRV record, and generate a list of
399
+ # Target records from the resources returned. The list will be arranged
400
+ # in line with the RFC2782-specified algorithm, respecting the weight and
401
+ # priority of the records.
402
+ #
403
+ def targets_from_srv_record(hostname)
404
+ [].tap do |list|
405
+ left = Resolv::DNS.new.getresources(@server_name, Resolv::DNS::Resource::IN::SRV)
406
+ if left.empty?
407
+ @logger.warn("LogstashWriter") { "No SRV records found for server_name #{@server_name.inspect}" }
408
+ end
409
+
410
+ # Let the soft-SRV shuffle... BEGIN!
411
+ until left.empty?
412
+ prio = left.map { |rr| rr.priority }.uniq.min
413
+ candidates = left.select { |rr| rr.priority == prio }
414
+ left -= candidates
415
+ candidates.sort_by! { |rr| [rr.weight, rr.target.to_s] }
416
+ until candidates.empty?
417
+ selector = rand(candidates.inject(1) { |n, rr| n + rr.weight })
418
+ chosen = candidates.inject(0) do |n, rr|
419
+ break rr if n + rr.weight >= selector
420
+ n + rr.weight
421
+ end
422
+ candidates.delete(chosen)
423
+ list << Target.new(chosen.target.to_s, chosen.port)
424
+ end
425
+ end
426
+ end
427
+ end
428
+
429
+ def stat_received
430
+ @metrics[:received].increment({})
431
+ @metrics[:queue_size].increment({})
432
+ end
433
+
434
+ def stat_sent(peer, arrived_time)
435
+ @metrics[:sent].increment(server: peer)
436
+ @metrics[:queue_size].decrement({})
437
+ @metrics[:lag].set({}, arrived_time.to_f)
438
+ end
439
+
440
+ def stat_dropped
441
+ @metrics[:queue_size].decrement({})
442
+ @metrics[:dropped].increment({})
443
+ end
444
+
445
+ # An individual target for logstash messages
446
+ #
447
+ # Takes a host and port, gives back a socket to send data down.
448
+ #
449
+ class Target
450
+ # Create a new target.
451
+ #
452
+ # @param addr [String] an IP address or hostname to which to connect.
453
+ #
454
+ # @param port [Integer] the TCP port number, in the range 1-65535.
455
+ #
456
+ # @raise [ArgumentError] if `addr` is not a valid-looking IP address or
457
+ # hostname, or if the port number is not in the valid range.
458
+ #
459
+ def initialize(addr, port)
460
+ #:nocov:
461
+ unless addr.is_a? String
462
+ raise ArgumentError, "addr #{addr.inspect} is not a string"
463
+ end
464
+
465
+ unless port.is_a? Integer
466
+ raise ArgumentError, "port #{port.inspect} is not an integer"
467
+ end
468
+
469
+ unless (1..65535).include?(port)
470
+ raise ArgumentError, "invalid port number #{port.inspect} (must be in range 1-65535)"
471
+ end
472
+ #:nocov:
473
+
474
+ @addr, @port = addr, port
475
+ end
476
+
477
+ # Create a connection.
478
+ #
479
+ # @return [IO] a socket to the target.
480
+ #
481
+ # @raise [SystemCallError] if connection cannot be established
482
+ # for any reason.
483
+ #
484
+ def socket
485
+ TCPSocket.new(@addr, @port)
486
+ end
487
+
488
+ # Simple string representation of the target.
489
+ #
490
+ # @return [String]
491
+ #
492
+ def to_s
493
+ "#{@addr}:#{@port}"
494
+ end
495
+ end
496
+
497
+ private_constant :Target
498
+ end
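For readers who want to experiment with the SRV ordering outside the class, here is a standalone sketch of priority-then-weight ordering in the spirit of RFC 2782. It is an illustration under stated assumptions rather than a copy of the method above: `SrvRecord` is a hypothetical struct standing in for the resolver's resource records, and the `rng:` parameter exists only so the randomness can be seeded.

```ruby
# Hypothetical stand-in for a resolved SRV resource record.
SrvRecord = Struct.new(:priority, :weight, :target, :port)

# Order records by ascending priority; within one priority, repeatedly
# pick a record at random with probability proportional to its weight.
def order_srv_records(records, rng: Random.new)
  ordered = []
  groups = records.group_by(&:priority).sort_by { |prio, _| prio }
  groups.each do |_prio, group|
    # Weight-0 records go first so they keep a small chance of selection.
    remaining = group.sort_by(&:weight)
    until remaining.empty?
      total = remaining.sum(&:weight)
      pick = rng.rand(0..total)
      running = 0
      # First record whose running weight sum reaches the random pick.
      chosen = remaining.find do |rr|
        running += rr.weight
        running >= pick
      end
      remaining.delete_at(remaining.index(chosen))
      ordered << chosen
    end
  end
  ordered
end
```

Every record appears exactly once in the result, lower priority values always come first, and heavier records tend to land earlier within their priority group.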