franz 1.6.3 → 1.6.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: e5b279a825ee6227a22713a782d777b1612dcce3
-   data.tar.gz: 6fe951e9e90f98987606487bb96de1ad250bc57d
+   metadata.gz: fd7ded091e8791fdf52b423e2a66895e48dfd0f3
+   data.tar.gz: 5aca71dcd6052cbb2a147884b86de8141c505664
  SHA512:
-   metadata.gz: 26b11bff12d3cae5b4429d570589b071323f5a22adb5ff965d73e764bbeedc021aa4ce4b48e0e9ecc75449e9c14bb936da71c7605f844b2b5b07f31d6dd8b701
-   data.tar.gz: 14fde0811e334aae73983373d3ea08470ab86ef9c32d9986665e7e61d4331768cf663e31a07571f615141c96987028b8809aa1c763f9643c34aacfa0e61e28fe
+   metadata.gz: 4f51b1de2cac2128ee0bac0cc728ba08657c773a52965083e6c88d2dca9e5ebffac09276484c863c1d1bfdd993982b605ea9135d252dd17be648038df75e7ebe
+   data.tar.gz: 7ef909db8aca7ce60a1fde825d86f8113bbd1d3fdcf7a3454ace72fed0baadff0522b7637aab72e658eb7ac00a1cc03f25a3a3c8006df43fbdf2fdd1a1317ce8
data/History.md ADDED
@@ -0,0 +1,287 @@
+ # Franz
+
+ Hi there, your old pal Sean Clemmer here. Imma talk quite a lot, so we might as
+ well get acquainted. I work on the Operations team at Blue Jeans Network, an
+ enterprise videoconferencing provider based in Mountain View, CA. Our team
+ provides infrastructure, tools, and support to the Engineering team, including,
+ most importantly for our purposes, log storage and processing.
+
+
+ ## A lil History
+
+ Before the latest rearchitecture, logs at Blue Jeans were basically `rsync`ed
+ from every host to a central log server. Once on the box, a few processes came
+ afterwards to compress the files and scan for meeting identifiers. Our reporting
+ tools queried the log server with a meeting ID, and it replied with a list of
+ files and their locations.
+
+ Compression saved a lot of space, and the search index was fairly small, since
+ we only needed to store a map of meeting IDs to file paths. If you wanted to
+ search the text, you needed to log into the log server itself and `grep`.
+
+ And all of that worked for everyone up to a point. At a certain number of files,
+ `grep` just wasn't fast enough, and worse, it was stealing resources necessary
+ for processing logs. At a certain volume, we just couldn't scan the logs fast
+ enough. Our scripts were getting harder to maintain, and we were looking for
+ answers sooner rather than later.
+
+
+ ### Exploring our options
+
+ We did a fair amount of research and fiddling before deciding anything. We
+ looked especially hard at the Elasticsearch-Logstash-Kibana (ELK) stack,
+ Graylog2, and rearchitecting our scripts as a distributed system. In the end,
+ we decided we weren't smart enough and there wasn't enough time to design our
+ own system from the ground up. We also found Graylog2 to be a bit immature and
+ lacking in features compared to the ELK stack.
+
+ We appreciated that the ELK stack had a lot of community and corporate
+ support, it was easy to get started, and everything was fairly well-documented.
+ Elasticsearch in particular seemed like a well-architected and professional
+ software project. Logstash had an active development community, and its author
+ Jordan Sissel soon joined Elasticsearch, Inc. as an employee.
+
+ There was a lot of hype around ELK, and I thought we could make it work.
+
+
+ ### Our requirements
+
+ If you'll recall, the old system was really geared to look up log files based
+ on a special meeting ID. Quick, but no real search.
+
+ To emulate this with Elasticsearch we might store the whole file in a document
+ along with the meeting ID, which would make queries straightforward. A more
+ common approach in the ELK community is to store individual lines of a log file
+ in Elasticsearch documents. I figured we could get the file back by asking the
+ Elasticsearch cluster for an aggregation of documents corresponding to a given
+ file path on a given host.
+
+ To prove it out, I slapped a couple scripts together, although the initial
+ implementation actually used *facets* and not *aggregations*. If we could get
+ the file back, that was everything we needed; we got advanced query support with
+ Elasticsearch and visualization with Kibana for free. Logstash was making it
+ easy for me to play around. Fun, even.
+
+
+ ### Moving forward with ELK
+
+ I got to reading about the pipelines, and I realized pretty quick we weren't
+ just gonna be able to hook Logstash right into Elasticsearch. You should put
+ a kind of buffer in between, and both RabbitMQ and Redis were popular at the
+ time. While I was developing our solution, alternative "forwarders" like
+ Lumberjack were just being introduced. After evaluating our options, I decided
+ on RabbitMQ based on team experience and the native Logstash support.
+
+ So the initial pipeline looked like this:
+
+     Logstash -> RabbitMQ -> Logstash -> Elasticsearch <- Kibana
+
+ The first Logstash stage picked up logs with the `file` input and shipped them
+ out with the `rabbitmq` output. These log *events* sat in RabbitMQ until a
+ second, dedicated Logstash agent came along to parse them and shove them into
+ Elasticsearch for long-term storage and search. I slapped Kibana on top to
+ provide our users a usable window into the cluster.
+
+ And it all *kinda* worked. It wasn't very fast, and outages were fairly common,
+ but all the pieces were on the board. Over a few weeks I tuned and expanded the
+ RabbitMQ and Elasticsearch clusters, but still we were missing chunks of files,
+ missing whole files, and Logstash would die regularly with all kinds of strange
+ issues: encoding issues, buffer issues, timeout issues, heap size issues.
+
+
+ ### Fighting with Logstash
+
+ Surely we weren't the only people running into issues with this very popular
+ piece of open source software? Logstash has a GitHub project, a JIRA account,
+ and an active mailing list. I scoured issues, source code, pull requests, and
+ e-mail archives. These seemed like huge bugs:
+
+ 1. *Multiline:* Both the multiline codec and filter had their issues. The codec
+    is generally preferred, but even still you might miss the last line of your
+    file, because Logstash does not implement *multiline flush*. Logstash will
+    buffer the last line of a file indefinitely, thinking you may come back and
+    write to the log.
+ 2. *File handling:* Because Logstash keeps files open indefinitely, it can soak
+    up file handles after running a while. We had tens of thousands of log files
+    on some hosts, and Logstash just wasn't having it.
+ 3. *Reconstruction:* Despite my initial proofs, we were having a lot of trouble
+    reconstructing log files from individual events. Lines were often missing,
+    truncated, or shuffled.
+
+ The multiline issue was actually fixed by the community, so I forked Logstash,
+ applied some patches, and did a little hacking to get it working right.
+
+ Fixing the second issue required delving deep into the depths of Logstash, the
+ `file` input, and Jordan Sissel's FileWatch project. FileWatch provides most of
+ the implementation for the `file` input, but it was riddled with bugs. I forked
+ the project and went through a major refactor to simplify and sanitize the code.
+ Eventually I was able to make it so Logstash would relinquish a file handle
+ some short interval after reading the file had ceased.
+
+ The third issue was rather more difficult, with subtle bugs at play. Rather than
+ relying on the `@timestamp` field, which we found did not have enough
+ resolution, I added a new field called `@seq`, just a simple counter, which
+ enabled us to put the events back in order. Still we were missing chunks, and
+ some lines appeared to be interleaved. Just weird stuff.
+
+ After hacking Logstash half to death we decided the first stage of the pipeline
+ would have to change. We'd still use Logstash to move events from RabbitMQ into
+ Elasticsearch, but we couldn't trust it to collect files.
+
+
+ ### And so Franz was born
+
+ I researched Logstash alternatives, but there weren't many at the time. Fluentd
+ looked promising, but early testing revealed the multiline facility wasn't quite
+ there yet. Lumberjack was just gaining some popularity, but it was still too
+ immature. In the end, I decided I had a pretty good handle on our requirements
+ and I would take a stab at implementing a solution.
+
+ It would be risky, but Logstash and the community just weren't moving fast
+ enough for our needs. Engineering was justly upset with our logging "solution",
+ and I was pretty frantic after weeks of hacking and debugging. How hard could
+ it really be to tail a file and send the lines out to a queue?
+
+ After a few prototypes and a couple false starts, we had our boy Franz.
+
+
+
+ ## Design and Implementation
+
+ From 10,000 feet, Franz and Logstash are pretty similar; you can imagine Franz is
+ basically a Logstash agent configured with a `file` input and `rabbitmq` output.
+ Franz accepts a single configuration file that tells the process which files to
+ tail, how to handle them, and where to send the output. Besides solving the
+ three issues we discussed earlier, Franz provides a kind of `json` codec and
+ `drop` filter (in Logstash parlance).
+
+ I decided early on to implement Franz in Ruby, like Logstash. Unlike Logstash,
+ which is typically executed by JRuby, I decided to stick with Matz's Ruby for
+ Franz in order to obtain a lower resource footprint at the expense of true
+ concurrency (MRI has a GIL).
+
+ Implementation-wise, Franz bears little resemblance to Logstash. Logstash has
+ a clever system which "compiles" the inputs, filters, and outputs into a single
+ block of code. Franz is a fairly straightforward Ruby program with only a handful
+ of classes and a simple execution path.
+
+
+ ### The Twelve-Factor App
+
+ I was heavily influenced by [the Twelve-Factor App](http://12factor.net):
+
+ 1. *Codebase:* Franz is contained in a single repo on [GitHub](https://github.com/sczizzo/franz).
+ 2. *Dependencies:* Franz provides a `Gemfile` to isolate dependencies.
+ 3. *Config:* Franz separates code from configuration (no env vars, though).
+ 4. *Backing Services:* Franz is agnostic to the connected RabbitMQ server.
+ 5. *Build, release, run:* Franz is versioned and released as a [RubyGem](https://rubygems.org/gems/franz).
+ 6. *Processes:* Franz provides mostly-stateless share-nothing executions.
+ 7. *Port binding:* Franz isn't a Web service, so no worries here!
+ 8. *Concurrency:* Franz is a single process and plays nice with Upstart.
+ 9. *Disposability:* Franz uses a crash-only architecture, discussed below.
+ 10. *Dev/prod parity:* We run the same configuration in every environment.
+ 11. *Logs:* Franz provides structured logs which can be routed to file.
+ 12. *Admin processes:* Franz is simple enough this isn't necessary.
+
+
+ ### Crash-Only Architecture
+
+ Logstash assumes you might want to stop the process and restart it later, having
+ the new instance pick up where the last left off. To support this, Logstash (or
+ really, FileWatch) keeps a small "checkpoint" file, which is written whenever
+ Logstash is shut down.
+
+ Franz takes this one step further and implements a ["crash-only" design](http://lwn.net/Articles/191059).
+ The basic idea here is that the application does not distinguish between a crash
+ and a restart. In practical terms, Franz simply writes a checkpoint at regular
+ intervals; when asked to shut down, it aborts immediately.
+
+ Franz checkpoints are simple, too. Each is just a `Hash` from log file paths to
+ the current `cursor` (byte offset) and `seq` (sequence number):
+
+     {
+       "/path/to/my.log": {
+         "cursor": 1234,
+         "seq": 99
+       }
+     }
+
+ The checkpoint file contains the `Marshal` representation of this `Hash`.
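+
+ Writing and recovering one is about as simple as it sounds. A sketch of the
+ idea, not the exact Franz code (the asterisk in the configured checkpoint path
+ really is replaced with a Unix timestamp, per the Readme):
+
+     # Periodically dump the state; a fresh timestamped path means a
+     # partial write can never clobber the last good checkpoint
+     state = { '/path/to/my.log' => { cursor: 1234, seq: 99 } }
+     path  = '/etc/franz/franz.%d.db' % Time.now.to_i
+     File.open(path, 'w') { |f| f.write Marshal.dump(state) }
+
+     # After a crash (or a restart, same thing), load the newest
+     # checkpoint and resume from those cursors
+     newest = Dir['/etc/franz/franz.*.db'].max_by { |p| File.mtime p }
+     state  = Marshal.load File.read(newest)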
210
+
211
+
212
+ ### Sash
213
+
214
+ The `Sash` is a data structure I discovered during the development of Franz,
215
+ which came out of the implementation of multiline flush. Here's a taste:
216
+
217
+ s = Sash.new # => #<Sash...>
218
+ s.keys # => []
219
+ s.insert :key, :value # => :value
220
+ s.get :key # => [ :value ]
221
+ s.insert :key, :crazy # => :crazy
222
+ s.mtime :key # => 2014-02-18 21:24:30 -0800
223
+ s.flush :key # => [ :value, :crazy ]
224
+
225
+ For multilining, what you do is create a `Sash` keyed by each path, and insert
226
+ each line in the appropriate key as they come in from upstream. Before you
227
+ insert it, though, you check if the line in question matches the multiline
228
+ pattern for the key: If so, you flush the `Sash` key and write the result out as
229
+ an event. Now the `Sash` key will buffer the next event.
230
+
231
+ In fact, a `Sash` key will only ever contain lines for at most one event, and
232
+ the `mtime` method allows us to know how recently that key was modified. To
233
+ implement multiline flush correctly, we periodically check the `Sash` for old
234
+ keys and flush them according to configuration. `Sash` methods are threadsafe,
235
+ so we can do this on the side without interrupting the main thread.
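+
+ Putting that together, the heart of the multiline logic is only a few lines.
+ A sketch against the `Sash` API above (`emit` is a hypothetical stand-in; the
+ real `Agg` adds locking and buffer limits):
+
+     # A line arrives for a path: a match on the multiline pattern means
+     # a fresh event is starting, so flush whatever was buffered
+     def capture path, line, pattern
+       if line =~ pattern
+         buffered = @sash.flush(path)
+         emit path, buffered.join("\n") unless buffered.empty?
+       end
+       @sash.insert path, line
+     end
+
+     # On the side, flush any key that has gone quiet for too long
+     def flush_stale interval
+       @sash.keys.each do |path|
+         next if Time.now - @sash.mtime(path) < interval
+         buffered = @sash.flush(path)
+         emit path, buffered.join("\n") unless buffered.empty?
+       end
+     end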
+
+
+ ### Slog
+
+ [Slog](https://github.com/sczizzo/slog) was also factored out of Franz, but
+ it's heavily inspired by Logstash. By default, output is pretty and colored (I
+ swear):
+
+     Slog.new.info 'example'
+     #
+     # {
+     #   "level": "info",
+     #   "@timestamp": "2014-12-25T06:22:43.459-08:00",
+     #   "message": "example"
+     # }
+     #
+
+ `Slog` works perfectly with Logstash or Franz when configured to treat the log
+ file as JSON; they'll add the other fields necessary for reconstruction.
+
+ More than anything, structured logging has changed how I approach logs. Instead
+ of writing *everything* to file, Franz strives to log useful events that contain
+ metadata. Instead of every request, an occasional digest. Instead of a paragraph
+ of text, a simple summary. Franz uses different log levels appropriately,
+ allowing end users to control verbosity.
+
+
+ ### Execution Path
+
+ Franz is implemented as a series of stages connected via bounded queues:
+
+     Input -> Discover -> Watch -> Tail -> Agg -> Output
+
+ Each of these stages is a class under the `Franz` namespace, and each runs up
+ to a couple of `Thread`s, typically a worker and maybe a helper (e.g. multiline
+ flush). Communicating via `SizedQueue`s helps ensure correctness and constrain
+ memory usage under high load (see the sketch after this list).
+
+ 0. `Input`: Actually wires together `Discover`, `Watch`, `Tail`, and `Agg`.
+ 1. `Discover`: Performs half of file existence detection by expanding globs and
+    keeping track of files known to Franz.
+ 2. `Watch`: Works in tandem with `Discover` to maintain a list of known files and
+    their status. Events are generated when a file is created, destroyed, or
+    modified (including appended, truncated, and replaced).
+ 3. `Tail`: Receives low-level file events from a `Watch` and handles the actual
+    reading of files, providing a stream of lines.
+ 4. `Agg`: Mostly aggregates `Tail` events by applying the multiline filter, but it
+    also applies the `host` and `type` fields. Basically, it does all the post
+    processing after we've retrieved a line from a file.
+ 5. `Output`: RabbitMQ output for Franz, based on the really-very-good [Bunny](https://github.com/ruby-amqp/bunny)
+    client. You must declare an `x-consistent-hash` exchange, as we generate a
+    random `Integer` for routing. Such is life.
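+
+ The plumbing between stages is nothing fancy. A sketch of the pattern with
+ hypothetical names (`read_line`, `aggregate`, and `publish` stand in for the
+ real stage internals):
+
+     require 'thread'
+
+     # A bounded queue blocks the producer when a consumer falls behind,
+     # so a slow RabbitMQ can't balloon memory in the tailing stages
+     lines  = SizedQueue.new 4096  # Tail -> Agg
+     events = SizedQueue.new 4096  # Agg -> Output
+
+     tail = Thread.new do
+       loop { lines.push read_line }          # blocks when Agg is busy
+     end
+
+     agg = Thread.new do
+       loop { events.push aggregate(lines.shift) }
+     end
+
+     out = Thread.new do
+       loop { publish events.shift }          # drains to RabbitMQ
+     end
+
+     [tail, agg, out].each(&:join)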
data/Rakefile CHANGED
@@ -21,7 +21,6 @@ end
 
  require 'rubygems/tasks'
  Gem::Tasks.new({
-   push: false,
    sign: {}
  }) do |tasks|
    tasks.console.command = 'pry'
@@ -30,23 +29,4 @@ Gem::Tasks::Sign::Checksum.new sha2: true
 
 
  require 'rake/version_task'
- Rake::VersionTask.new
-
-
- desc "Upload build artifacts to WOPR"
- task :upload => :build do
-   pkg_name = 'franz-%s.gem' % File.read('VERSION').strip
-   pkg_path = File.join 'pkg', pkg_name
-
-   require 'net/ftp'
-   ftp = Net::FTP.new
-   ftp.connect '10.4.4.15', 8080
-   ftp.login
-   ftp.passive
-   begin
-     ftp.put pkg_path
-     ftp.sendcmd("SITE CHMOD 0664 #{pkg_name}")
-   ensure
-     ftp.close
-   end
- end
+ Rake::VersionTask.new
data/Readme.md CHANGED
@@ -1,4 +1,4 @@
- # Franz
+ # Franz ![Version](https://img.shields.io/gem/v/franz.svg?style=flat-square)
 
  Franz ships line-oriented log files to [RabbitMQ](http://www.rabbitmq.com/).
  Think barebones [logstash](http://logstash.net/) in pure Ruby with more modest
@@ -48,3 +48,106 @@ Okay one last feature: Every log event is assigned a sequential identifier
  according to its path (and implicitly, host) in the `@seq` field. This is useful
  if you expect your packets to get criss-crossed and you want to reconstruct the
  events in order without relying on timestamps, which you shouldn't.
+
+
+ ## Usage, Configuration &c.
+
+ ### Installation
+
+ You can build a gem from this repository, or use RubyGems:
+
+     $ gem install franz
+
+ ### Usage
+
+ Just call for help!
+
+     $ franz --help
+
+     [figlet-style "franz" banner, v1.6.0]
+
+     Aggregate log file events and send them elsewhere
+
+     Usage: franz [<options>]
+
+     Options:
+       --config, -c <s>: Configuration file to use (default: config.json)
+       --debug, -d:      Enable debugging output
+       --trace, -t:      Enable trace output
+       --log, -l <s>:    Log to file, not STDOUT
+       --version, -v:    Print version and exit
+       --help, -h:       Show this message
+
+ ### Configuration
+
+ It's kinda like a JSON version of the Logstash config language:
+
+     {
+       // The asterisk will be replaced with a Unix timestamp
+       "checkpoint": "/etc/franz/franz.*.db",
+
+       // All input configs are files by convention
+       "input": {
+         "configs": [
+
+           // Only "type" and "includes" are required
+           {
+             "type": "example",                          // A nice name
+             "includes": [ "/path/to/your.*.log" ],      // File path globs
+             "excludes": [ "your.bad.*.log" ],           // Basename globs
+             "multiline": "(?i-mx:^[a-z]{3} +\\d{1,2})", // Stringified RegExp
+             "drop": "(?i-mx:^\\d)",                     // Same story.
+             "json?": false                              // JSON-formatted?
+           }
+         ]
+       },
+
+       // Only RabbitMQ is supported at the moment
+       "output": {
+         "rabbitmq": {
+
+           // Must be a consistently-hashed exchange
+           "exchange": {
+             "name": "logs"
+           },
+
+           // See Bunny docs for connection configuration
+           "connection": {
+             "host": "localhost",
+             "vhost": "/logs",
+             "user": "logs",
+             "pass": "logs"
+           }
+         }
+       }
+     }
+
+ ### Operation
+
+ At Blue Jeans, we deploy Franz with Upstart. Here's a minimal config:
+
+     #!upstart
+     description "franz"
+
+     console log
+
+     start on startup
+     stop on shutdown
+     respawn
+
+     exec franz
+
+ We actually use the [`bjn_franz` cookbook](https://github.com/sczizzo/bjn-franz-cookbook)
+ for Chef.
data/VERSION CHANGED
@@ -1 +1 @@
- 1.6.3
+ 1.6.4
data/bin/franz CHANGED
@@ -63,13 +63,18 @@ logger.info \
    event: 'boot',
    version: Franz::VERSION
 
+ statz = Franz::Stats.new \
+   interval: (config[:output][:stats_interval] || 300),
+   logger: logger
+
  # Now we'll connect to our output, RabbitMQ. This creates a new thread in the
  # background, which will consume the events generated by our input on io
  Franz::Output.new \
    input: io,
    output: config[:output][:rabbitmq],
    logger: logger,
-   tags: config[:output][:tags]
+   tags: config[:output][:tags],
+   statz: statz
 
  # Franz has only one kind of input, plain text files.
  Franz::Input.new \
@@ -77,7 +82,8 @@ Franz::Input.new \
    output: io,
    logger: logger,
    checkpoint: config[:checkpoint],
-   checkpoint_interval: config[:checkpoint_interval]
+   checkpoint_interval: config[:checkpoint_interval],
+   statz: statz
 
  # Ensure memory doesn't grow too large (> 1GB by default)
  def mem_kb ; `ps -o rss= -p #{$$}`.strip.to_i ; end
@@ -85,9 +91,11 @@ def mem_kb ; `ps -o rss= -p #{$$}`.strip.to_i ; end
  mem_limit = config[:memory_limit] || 1_000_000
  mem_sleep = config[:memory_limit_interval] || 60
 
+ statz.create :mem_kb
  loop do
    sleep mem_sleep
    mem_used = mem_kb
+   statz.set :mem_kb, mem_used
    if mem_used > mem_limit
      logger.fatal \
        event: 'killed',
data/lib/franz/agg.rb CHANGED
@@ -4,6 +4,7 @@ require 'socket'
  require 'pathname'
 
  require_relative 'sash'
+ require_relative 'stats'
  require_relative 'input_config'
 
  module Franz
@@ -40,6 +41,9 @@ module Franz
      @buffer = Franz::Sash.new
      @stop = false
 
+     @statz = opts[:statz] || Franz::Stats.new
+     @statz.create :num_lines, 0
+
      @t1 = Thread.new do
        until @stop
          flush
@@ -127,6 +131,7 @@ module Franz
          raw: event
      multiline = @ic.config(event[:path])[:multiline] rescue nil
      if multiline.nil?
+       @statz.inc :num_lines
        enqueue event[:path], event[:line] unless event[:line].empty?
      else
        lock[event[:path]].synchronize do
@@ -140,7 +145,9 @@ module Franz
          end
          if event[:line] =~ multiline
            buffered = buffer.flush(event[:path])
-           lines = buffered.map { |e| e[:line] }.join("\n")
+           lines = buffered.map { |e| e[:line] }
+           @statz.inc :num_lines, lines.length
+           lines = lines.join("\n")
            enqueue event[:path], lines unless lines.empty?
          end
          buffer.insert event[:path], event
@@ -157,7 +164,9 @@ module Franz
        lock[path].synchronize do
          if force || started - buffer.mtime(path) >= flush_interval
            buffered = buffer.remove(path)
-           lines = buffered.map { |e| e[:line] }.join("\n")
+           lines = buffered.map { |e| e[:line] }
+           @statz.inc :num_lines, lines.length
+           lines = lines.join("\n")
            enqueue path, lines unless lines.empty?
          end
        end
data/lib/franz/discover.rb CHANGED
@@ -2,6 +2,8 @@ require 'set'
  require 'logger'
  require 'shellwords'
 
+ require_relative 'stats'
+
 
  # Discover performs half of file existence detection by expanding globs and
  # keeping track of files known to Franz. Discover requires a deletions Queue to
@@ -35,6 +37,9 @@ class Franz::Discover
      config
    end
 
+   @statz = opts[:statz] || Franz::Stats.new
+   @statz.create :num_discovered, 0
+   @statz.create :num_deleted, 0
 
    @stop = false
 
@@ -43,6 +48,7 @@ class Franz::Discover
        until deletions.empty?
          d = deletions.pop
          @known.delete d
+         @statz.inc :num_deleted
          log.debug \
            event: 'discover deleted',
            path: d
@@ -51,6 +57,7 @@ class Franz::Discover
        discover.each do |discovery|
          discoveries.push discovery
          @known.add discovery
+         @statz.inc :num_discovered
          log.debug \
            event: 'discover discovered',
            path: discovery
data/lib/franz/input.rb CHANGED
@@ -7,6 +7,8 @@ require_relative 'agg'
  require_relative 'tail'
  require_relative 'watch'
  require_relative 'discover'
+ require_relative 'stats'
+
 
  module Franz
 
@@ -46,6 +48,7 @@ module Franz
      }.deep_merge!(opts)
 
      @logger = opts[:logger]
+     @statz = opts[:statz] || Franz::Stats.new
 
      @checkpoint_interval = opts[:checkpoint_interval]
      @checkpoint_path = opts[:checkpoint].sub('*', '%d')
@@ -91,8 +94,9 @@ module Franz
        deletions: deletions,
        discover_interval: opts[:input][:discover_interval],
        ignore_before: opts[:input][:ignore_before],
-       logger: opts[:logger],
-       known: known
+       logger: @logger,
+       known: known,
+       statz: @statz
 
      @watch = Franz::Watch.new \
        input_config: ic,
@@ -101,9 +105,10 @@ module Franz
        watch_events: watch_events,
        watch_interval: opts[:input][:watch_interval],
        play_catchup?: opts[:input][:play_catchup?],
-       logger: opts[:logger],
+       logger: @logger,
        stats: stats,
-       cursors: cursors
+       cursors: cursors,
+       statz: @statz
 
      @tail = Franz::Tail.new \
        input_config: ic,
@@ -112,8 +117,9 @@ module Franz
        block_size: opts[:input][:block_size],
        line_limit: opts[:input][:line_limit],
        read_limit: opts[:input][:read_limit],
-       logger: opts[:logger],
-       cursors: cursors
+       logger: @logger,
+       cursors: cursors,
+       statz: @statz
 
      @agg = Franz::Agg.new \
        input_config: ic,
@@ -121,8 +127,9 @@ module Franz
        agg_events: opts[:output],
        flush_interval: opts[:input][:flush_interval],
        buffer_limit: opts[:input][:buffer_limit],
-       logger: opts[:logger],
-       seqs: seqs
+       logger: @logger,
+       seqs: seqs,
+       statz: @statz
 
      @stop = false
      @t = Thread.new do
data/lib/franz/output.rb CHANGED
@@ -3,6 +3,9 @@ require 'json'
  require 'bunny'
  require 'deep_merge'
 
+ require_relative 'stats'
+
+
  module Franz
 
  # RabbitMQ output for Franz. You must declare an x-consistent-hash type
@@ -32,6 +35,9 @@ module Franz
        }
      }.deep_merge!(opts)
 
+     @statz = opts[:statz] || Franz::Stats.new
+     @statz.create :num_output, 0
+
      @logger = opts[:logger]
 
      rabbit = Bunny.new opts[:output][:connection].merge({
@@ -84,6 +90,8 @@ module Franz
            JSON::generate(event),
            routing_key: rand.rand(10_000),
            persistent: false
+
+         @statz.inc :num_output
        end
      end
 
data/lib/franz/stats.rb ADDED
@@ -0,0 +1,97 @@
+ require 'thread'
+ require 'logger'
+
+
+ module Franz
+   class Stats
+
+     def initialize opts={}
+       @logger = opts[:logger] || Logger.new(STDOUT)
+       @interval = opts[:interval] || 300
+       @stats = Hash.new
+       @lock = Mutex.new
+       @t = Thread.new do
+         loop do
+           sleep @interval
+           report
+           reset
+         end
+       end
+     end
+
+
+     def stop
+       return state if @stop
+       @stop = true
+       @t.stop
+       log.info event: 'stats stopped'
+       return nil
+     end
+
+
+     def create name, default=nil
+       with_lock do
+         stats[name] = Hash.new { |h,k| h[k] = default }
+       end
+     end
+
+     def delete name
+       with_lock do
+         stats.delete name
+       end
+     end
+
+     def inc name, by=1
+       with_lock do
+         stats[name][:val] += by
+       end
+     end
+
+     def dec name, by=1
+       with_lock do
+         stats[name][:val] -= by
+       end
+     end
+
+     def set name, to
+       with_lock do
+         stats[name][:val] = to
+       end
+     end
+
+     def get name
+       with_lock do
+         stats[name][:val]
+       end
+     end
+
+     private
+     attr_reader :stats
+
+     def log ; @logger end
+
+     def with_lock &block
+       @lock.synchronize do
+         yield
+       end
+     end
+
+     def report
+       ready_stats = with_lock do
+         stats.map { |k,vhash| [ k, vhash[:val] ] }
+       end
+       log.fatal \
+         event: 'stats',
+         interval: @interval,
+         stats: Hash[ready_stats]
+     end
+
+     def reset
+       with_lock do
+         stats.keys.each do |k|
+           stats[k].delete :val
+         end
+       end
+     end
+   end
+ end
data/lib/franz/tail.rb CHANGED
@@ -3,6 +3,9 @@ require 'logger'
 
  require 'eventmachine'
 
+ require_relative 'stats'
+
+
  module Franz
 
  # Tail receives low-level file events from a Watch and handles the actual
@@ -34,6 +37,11 @@ module Franz
      @buffer = Hash.new { |h, k| h[k] = BufferedTokenizer.new("\n", @line_limit) }
      @stop = false
 
+     @statz = opts[:statz] || Franz::Stats.new
+     @statz.create :num_reads, 0
+     @statz.create :num_rotates, 0
+     @statz.create :num_deletes, 0
+
      @tail_thread = Thread.new do
        handle(watch_events.shift) until @stop
      end
@@ -177,9 +185,11 @@ module Franz
      case event[:name]
 
      when :deleted
+       @statz.inc :num_deletes
        close path
 
      when :replaced, :truncated
+       @statz.inc :num_rotates
        close path
        read path, size
 
@@ -187,6 +197,7 @@ module Franz
        # Ignore read requests after a nil read. We'll wait for the next
        # event that tells us to close the file. Fingers crossed...
        unless @nil_read[path]
+         @statz.inc :num_reads
          read path, size
 
        else # following a nil read
data/lib/franz/watch.rb CHANGED
@@ -2,6 +2,9 @@ require 'set'
 
  require 'logger'
 
+ require_relative 'stats'
+
+
  module Franz
 
  # Watch works in tandem with Discover to maintain a list of known files and
@@ -31,6 +34,9 @@ module Franz
      @stats = opts[:stats] || Hash.new
      @logger = opts[:logger] || Logger.new(STDOUT)
 
+     @statz = opts[:statz] || Franz::Stats.new
+     @statz.create :num_watched
+
      # Make sure we're up-to-date by rewinding our old stats to our cursors
      if @play_catchup
        log.debug event: 'play catchup'
@@ -94,9 +100,8 @@ module Franz
    end
 
    def watch
-     log.debug \
-       event: 'watch',
-       stats_size: stats.keys.length
+     log.debug event: 'watch'
+     @statz.set :num_watched, stats.keys.length
      deleted = []
 
      stats.keys.each do |path|
data/lib/franz.rb CHANGED
@@ -6,4 +6,5 @@ require_relative 'franz/input_config'
  require_relative 'franz/metadata'
  require_relative 'franz/output'
  require_relative 'franz/tail'
- require_relative 'franz/watch'
+ require_relative 'franz/watch'
+ require_relative 'franz/stats'
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: franz
  version: !ruby/object:Gem::Version
-   version: 1.6.3
+   version: 1.6.4
  platform: ruby
  authors:
  - Sean Clemmer
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-12-26 00:00:00.000000000 Z
+ date: 2015-01-25 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: slog
@@ -103,6 +103,7 @@ extra_rdoc_files: []
  files:
  - ".gitignore"
  - Gemfile
+ - History.md
  - LICENSE
  - Rakefile
  - Readme.md
@@ -118,6 +119,7 @@ files:
  - lib/franz/metadata.rb
  - lib/franz/output.rb
  - lib/franz/sash.rb
+ - lib/franz/stats.rb
  - lib/franz/tail.rb
  - lib/franz/watch.rb
  - test/test_franz_agg.rb