franz 1.6.3 → 1.6.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: e5b279a825ee6227a22713a782d777b1612dcce3
4
- data.tar.gz: 6fe951e9e90f98987606487bb96de1ad250bc57d
3
+ metadata.gz: fd7ded091e8791fdf52b423e2a66895e48dfd0f3
4
+ data.tar.gz: 5aca71dcd6052cbb2a147884b86de8141c505664
5
5
  SHA512:
6
- metadata.gz: 26b11bff12d3cae5b4429d570589b071323f5a22adb5ff965d73e764bbeedc021aa4ce4b48e0e9ecc75449e9c14bb936da71c7605f844b2b5b07f31d6dd8b701
7
- data.tar.gz: 14fde0811e334aae73983373d3ea08470ab86ef9c32d9986665e7e61d4331768cf663e31a07571f615141c96987028b8809aa1c763f9643c34aacfa0e61e28fe
6
+ metadata.gz: 4f51b1de2cac2128ee0bac0cc728ba08657c773a52965083e6c88d2dca9e5ebffac09276484c863c1d1bfdd993982b605ea9135d252dd17be648038df75e7ebe
7
+ data.tar.gz: 7ef909db8aca7ce60a1fde825d86f8113bbd1d3fdcf7a3454ace72fed0baadff0522b7637aab72e658eb7ac00a1cc03f25a3a3c8006df43fbdf2fdd1a1317ce8
data/History.md ADDED
@@ -0,0 +1,287 @@
1
+ # Franz
2
+
3
+ Hi there, your old pal Sean Clemmer here. Imma talk quite a lot, so we might as
4
+ well get acquainted. I work on the Operations team at Blue Jeans Network, an
5
+ enterprise videoconferencing provider based in Mountain View, CA. Our team
6
+ provides infrastructure, tools, and support to the Engineering team, including,
7
+ most importantly for our purposes, log storage and processing.
8
+
9
+
10
+ ## A lil History
11
+
12
+ Before the latest rearchitecture, logs at Blue Jeans were basically `rsync`ed
13
+ from every host to a central log server. Once on the box, a few processes came
14
+ afterwards to compress the files and scan for meeting identifiers. Our reporting
15
+ tools queried the log server with a meeting ID, and it replied with a list of
16
+ files and their location.
17
+
18
+ Compression saved a lot of space, and the search index was fairly small, since
19
+ we only needed to store a map of meeting IDs to file paths. If you wanted to
20
+ search the text, you needed to log into the log server itself and `grep`.
21
+
22
+ And all of that worked for everyone up to a point. At a certain number of files,
23
+ `grep` just wasn't fast enough, and worse, it was stealing resources necessary
24
+ for processing logs. At a certain volume, we just couldn't scan the logs fast
25
+ enough. Our scripts were getting harder to maintain, and we were looking for
26
+ answers sooner rather than later.
27
+
28
+
29
+ ### Exploring our options
30
+
31
+ We did a fair amount of research and fiddling before deciding anything. We
32
+ looked especially hard at the Elasticsearch-Logstash-Kibana (ELK) stack,
33
+ Graylog2, and rearchitecting our scripts as a distributed system. In the end,
34
+ we decided we weren't smart enough and there wasn't enough time to design our
35
+ own system from the ground up. We also found Graylog2 to be a bit immature and
36
+ lacking in features compared to the ELK stack.
37
+
38
+ Ultimately, we appreciated that the ELK stack had a lot of community and corporate
39
+ support, it was easy to get started, and everything was fairly well-documented.
40
+ Elasticsearch in particular seemed like a well-architected and professional
41
+ software project. Logstash had an active development community, and the author
42
+ Jordan Sissel soon joined Elasticsearch, Inc. as an employee.
43
+
44
+ There was a lot of hype around ELK, and I thought we could make it work.
45
+
46
+
47
+ ### Our requirements
48
+
49
+ If you'll recall, the old system was really geared to look up log files based
50
+ on a special meeting ID. Quick, but no real search.
51
+
52
+ To emulate this with Elasticsearch we might store the whole file in a document
53
+ along with the meeting ID, which would make queries straightforward. A more
54
+ common approach in the ELK community is to store individual lines of a log file
55
+ in Elasticsearch documents. I figured we could get the file back by asking the
56
+ Elasticsearch cluster for an aggregation of documents corresponding to a given
57
+ file path on a given host.
58
+
59
+ To prove it out I slapped a couple scripts together, although the initial
60
+ implementation actually used *facets* and not *aggregations*. If we could get
61
+ the file back, that was everything we needed; we got advanced query support with
62
+ Elasticsearch and visualization with Kibana for free. Logstash was making it
63
+ easy for me to play around. Fun, even.
64
+
65
+
66
+ ### Moving forward with ELK
67
+
68
+ I got to reading about the pipelines, and I realized pretty quick we weren't
69
+ just gonna be able to hook Logstash right into Elasticsearch. You should put
70
+ a kind of buffer in between, and both RabbitMQ and Redis were popular at the
71
+ time. While I was developing our solution, alternative "forwarders" like
72
+ Lumberjack were just being introduced. After evaluating our options, I decided
73
+ on RabbitMQ based on team experience and the native Logstash support.
74
+
75
+ So the initial pipeline looked like this:
76
+
77
+ Logstash -> RabbitMQ -> Logstash -> Elasticsearch <- Kibana
78
+
79
+ The first Logstash stage picked up logs with the `file` input and shipped them
80
+ out with the `rabbitmq` output. These log *events* sat in RabbitMQ until a
81
+ second, dedicated Logstash agent came along to parse them and shove them into
82
+ Elasticsearch for long-term storage and search. I slapped Kibana on top to
83
+ provide our users a usable window into the cluster.
84
+
85
+ And it all *kinda* worked. It wasn't very fast, and outages were fairly common,
86
+ but all the pieces were on the board. Over a few weeks I tuned and expanded the
87
+ RabbitMQ and Elasticsearch clusters, but still we were missing chunks of files,
88
+ missing whole files, and Logstash would die regularly with all kinds of strange
89
+ issues. Encoding issues, buffer issues, timeout issues, heap size issues.
90
+
91
+
92
+ ### Fighting with Logstash
93
+
94
+ Surely we weren't the only people running into issues with this very popular
95
+ piece of open source software? Logstash has a GitHub project, a JIRA account,
96
+ and an active mailing list. I scoured issues, source code, pull requests, and
97
+ e-mail archives. These seemed like huge bugs:
98
+
99
+ 1. *Multiline:* Both the multiline codec and filter had their issues. The codec
100
+ is generally preferred, but even still you might miss the last line of your
101
+ file, because Logstash does not implement *multiline flush*. Logstash will
102
+ buffer the last line of a file indefinitely, thinking you may come back and
103
+ write to the log.
104
+ 2. *File handling:* Because Logstash keeps files open indefinitely, it can soak
105
+ up file handles after running a while. We had tens of thousands of log files
106
+ on some hosts, and Logstash just wasn't having it.
107
+ 3. *Reconstruction:* Despite my initial proofs, we were having a lot of trouble
108
+ reconstructing log files from individual events. Lines were often missing,
109
+ truncated, and shuffled.
110
+
111
+ The multiline issue was actually fixed by the community, so I forked Logstash,
112
+ applied some patches, and did a little hacking to get it working right.
113
+
114
+ Fixing the second issue required delving deep into the depths of Logstash, the
115
+ `file` input, and Jordan Sissel's FileWatch project. FileWatch provides most of
116
+ the implementation for the `file` input, but it was riddled with bugs. I forked
117
+ the project and went through a major refactor to simplify and sanitize the code.
118
+ Eventually I was able to make it so Logstash would relinquish a file handle
119
+ some short interval after reading the file had ceased.
120
+
121
+ The third issue was rather more difficult. Subtle bugs at play. Rather than
122
+ relying on the `@timestamp` field, which we found did not have enough
123
+ resolution, I added a new field called `@seq`, just a simple counter, which
124
+ enabled us to put the events back in order. Still we were missing chunks, and
125
+ some lines appeared to be interleaved. Just weird stuff.
126
+
127
+ After hacking Logstash half to death we decided the first stage of the pipeline
128
+ would have to change. We'd still use Logstash to move events from RabbitMQ into
129
+ Elasticsearch, but we couldn't trust it to collect files.
130
+
131
+
132
+ ### And so Franz was born
133
+
134
+ I researched Logstash alternatives, but there weren't many at the time. Fluentd
135
+ looked promising, but early testing revealed the multiline facility wasn't quite
136
+ there yet. Lumberjack was just gaining some popularity, but it was still too
137
+ immature. In the end, I decided I had a pretty good handle on our requirements
138
+ and I would take a stab at implementing a solution.
139
+
140
+ It would be risky, but Logstash and the community just weren't moving fast
141
+ enough for our needs. Engineering was justly upset with our logging "solution",
142
+ and I was pretty frantic after weeks of hacking and debugging. How hard could
143
+ it really be to tail a file and send the lines out to a queue?
144
+
145
+ After a few prototypes and a couple false starts, we had our boy Franz.
146
+
147
+
148
+
149
+ ## Design and Implementation
150
+
151
+ From 10,000 feet Franz and Logstash are pretty similar; you can imagine Franz is
152
+ basically a Logstash agent configured with a `file` input and `rabbitmq` output.
153
+ Franz accepts a single configuration file that tells the process which files to
154
+ tail, how to handle them, and where to send the output. Besides solving the
155
+ three issues we discussed earlier, Franz provides a kind of `json` codec and
156
+ `drop` filter (in Logstash parlance).
157
+
158
+ I decided early on to implement Franz in Ruby, like Logstash. Unlike Logstash,
159
+ which is typically executed by JRuby, I stuck with Matz's Ruby for
160
+ Franz in order to obtain a lower resource footprint at the expense of true
161
+ concurrency (MRI has a GIL).
162
+
163
+ Implementation-wise, Franz bears little resemblance to Logstash. Logstash has
164
+ a clever system which "compiles" the inputs, filters, and outputs into a single
165
+ block of code. Franz is a fairly straightforward Ruby program with only a handful
166
+ of classes and a simple execution path.
167
+
168
+
169
+ ### The Twelve-Factor App
170
+
171
+ I was heavily influenced by [the Twelve-Factor App](http://12factor.net):
172
+
173
+ 1. *Codebase:* Franz is contained in a single repo on [GitHub](https://github.com/sczizzo/franz).
174
+ 2. *Dependencies:* Franz provides a `Gemfile` to isolate dependencies.
175
+ 3. *Config:* Franz separates code from configuration (no env vars, though).
176
+ 4. *Backing Services:* Franz is agnostic to the connected RabbitMQ server.
177
+ 5. *Build, release, run:* Franz is versioned and released as a [RubyGem](https://rubygems.org/gems/franz).
178
+ 6. *Processes:* Franz provides mostly-stateless share-nothing executions.
179
+ 7. *Port binding:* Franz isn't a Web service, so no worries here!
180
+ 8. *Concurrency:* Franz is a single process and plays nice with Upstart.
181
+ 9. *Disposability:* Franz uses a crash-only architecture, discussed below.
182
+ 10. *Dev/prod parity:* We run the same configuration in every environment.
183
+ 11. *Logs:* Franz provides structured logs which can be routed to file.
184
+ 12. *Admin processes:* Franz is simple enough this isn't necessary.
185
+
186
+
187
+ ### Crash-Only Architecture
188
+
189
+ Logstash assumes you might want to stop the process and restart it later, having
190
+ the new instance pick up where the last left off. To support this, Logstash (or
191
+ really, FileWatch) keeps a small "checkpoint" file, which is written whenever
192
+ Logstash is shut down.
193
+
194
+ Franz takes this one step further and implements a ["crash-only" design](http://lwn.net/Articles/191059).
195
+ The basic idea here is that the application does not distinguish between a crash and
196
+ a restart. In practical terms, Franz simply writes a checkpoint at regular
197
+ intervals; when asked to shut down, it aborts immediately.
198
+
199
+ Franz checkpoints are simple, too: just a `Hash` from log file paths to
200
+ the current `cursor` (byte offset) and `seq` (sequence number):
201
+
202
+ {
203
+ "/path/to/my.log": {
204
+ "cursor": 1234,
205
+ "seq": 99
206
+ }
207
+ }
208
+
209
+ The checkpoint file contains the `Marshal` representation of this `Hash`.
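+
+ To make that concrete, here's a minimal sketch of the crash-only checkpoint
+ loop, assuming hypothetical names like `state`, `checkpoint_interval`,
+ `checkpoint_path`, and `checkpoint_glob` (Franz's actual internals differ):
+
+     # Hypothetical sketch: checkpoint on a timer, never on shutdown
+     Thread.new do
+       loop do
+         sleep checkpoint_interval
+         path = checkpoint_path % Time.now.to_i  # e.g. /etc/franz/franz.1419500000.db
+         File.write path, Marshal.dump(state)    # state is the Hash shown above
+       end
+     end
+
+     # On boot, recover from the newest checkpoint, if any
+     latest = Dir.glob(checkpoint_glob).max_by { |f| File.mtime f }
+     state  = latest ? Marshal.load(File.read(latest)) : {}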
210
+
211
+
212
+ ### Sash
213
+
214
+ The `Sash` is a data structure I discovered during the development of Franz,
215
+ which came out of the implementation of multiline flush. Here's a taste:
216
+
217
+ s = Sash.new # => #<Sash...>
218
+ s.keys # => []
219
+ s.insert :key, :value # => :value
220
+ s.get :key # => [ :value ]
221
+ s.insert :key, :crazy # => :crazy
222
+ s.mtime :key # => 2014-02-18 21:24:30 -0800
223
+ s.flush :key # => [ :value, :crazy ]
224
+
225
+ For multilining, what you do is create a `Sash` keyed by each path, and insert
226
+ each line into the appropriate key as it comes in from upstream. Before you
227
+ insert it, though, you check if the line in question matches the multiline
228
+ pattern for the key: If so, you flush the `Sash` key and write the result out as
229
+ an event. Now the `Sash` key will buffer the next event.
230
+
231
+ In fact, a `Sash` key will only ever contain lines for at most one event, and
232
+ the `mtime` method allows us to know how recently that key was modified. To
233
+ implement multiline flush correctly, we periodically check the `Sash` for old
234
+ keys and flush them according to configuration. `Sash` methods are threadsafe,
235
+ so we can do this on the side without interrupting the main thread.
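+
+ As a rough illustration (not Franz's exact code), the per-line handling and the
+ periodic flush might look something like this, with `emit`, `multiline_pattern`,
+ and `flush_interval` standing in as placeholders:
+
+     # A line matching the multiline pattern starts a new event, so whatever
+     # is buffered under this path is complete: flush and emit it
+     if line =~ multiline_pattern
+       buffered = sash.flush(path)
+       emit path, buffered.join("\n") unless buffered.empty?
+     end
+     sash.insert path, line
+
+     # Meanwhile, a helper thread flushes keys that have gone quiet
+     sash.keys.each do |key|
+       next unless Time.now - sash.mtime(key) >= flush_interval
+       stale = sash.flush(key)
+       emit key, stale.join("\n") unless stale.empty?
+     end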
236
+
237
+
238
+ ### Slog
239
+
240
+ [Slog](https://github.com/sczizzo/slog) was also factored out of Franz, but
241
+ it's heavily inspired by Logstash. By default, output is pretty and colored (I
242
+ swear):
243
+
244
+ Slog.new.info 'example'
245
+ #
246
+ # {
247
+ # "level": "info",
248
+ # "@timestamp": "2014-12-25T06:22:43.459-08:00",
249
+ # "message": "example"
250
+ # }
251
+ #
252
+
253
+ `Slog` works perfectly with Logstash or Franz when configured to treat the log
254
+ file as JSON; they'll add the other fields necessary for reconstruction.
255
+
256
+ More than anything, structured logging has changed how I approach logs. Instead
257
+ of writing *everything* to file, Franz strives to log useful events that contain
258
+ metadata. Instead of every request, an occasional digest. Instead of a paragraph
259
+ of text, a simple summary. Franz uses different log levels appropriately,
260
+ allowing end users to control verbosity.
261
+
262
+
263
+ ### Execution Path
264
+
265
+ Franz is implemented as a series of stages connected via bounded queues:
266
+
267
+ Input -> Discover -> Watch -> Tail -> Agg -> Output
268
+
269
+ Each of these stages is a class under the `Franz` namespace, and they run up
270
+ to a couple `Thread`s, typically a worker and maybe a helper (e.g. multiline
271
+ flush). Communicating via `SizedQueue`s helps ensure correctness and constrain
272
+ memory usage under high load (a minimal sketch of this wiring follows the stage list below).
273
+
274
+ 0. `Input`: Actually wires together `Discover`, `Watch`, `Tail`, and `Agg`.
275
+ 1. `Discover`: Performs half of file existence detection by expanding globs and
276
+ keeping track of files known to Franz.
277
+ 2. `Watch`: Works in tandem with `Discover` to maintain a list of known files and
278
+ their status. Events are generated when a file is created, destroyed, or
279
+ modified (including appended, truncated, and replaced).
280
+ 3. `Tail`: Receives low-level file events from a `Watch` and handles the actual
281
+ reading of files, providing a stream of lines.
282
+ 4. `Agg`: Mostly aggregates `Tail` events by applying the multiline filter, but it
283
+ also applies the `host` and `type` fields. Basically, it does all the post
284
+ processing after we've retrieved a line from a file.
285
+ 5. `Output`: RabbitMQ output for Franz, based on the really-very-good [Bunny](https://github.com/ruby-amqp/bunny)
286
+ client. You must declare an `x-consistent-hash` exchange, as we generate a
287
+ random `Integer` for routing. Such is life.
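+
+ For a taste of how those stages hang together, here's a toy two-stage version
+ of the pipeline; `read_next_line` and `decorate` are placeholders, and the real
+ classes are considerably more involved:
+
+     require 'thread'
+
+     lines  = SizedQueue.new 1000  # bounded, so a slow consumer applies back-pressure
+     events = SizedQueue.new 1000
+
+     tailer = Thread.new do
+       loop { lines.push read_next_line }          # Tail: raw lines from files
+     end
+
+     agg = Thread.new do
+       loop { events.push decorate(lines.shift) }  # Agg: add host, type, @seq, etc.
+     end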
data/Rakefile CHANGED
@@ -21,7 +21,6 @@ end
21
21
 
22
22
  require 'rubygems/tasks'
23
23
  Gem::Tasks.new({
24
- push: false,
25
24
  sign: {}
26
25
  }) do |tasks|
27
26
  tasks.console.command = 'pry'
@@ -30,23 +29,4 @@ Gem::Tasks::Sign::Checksum.new sha2: true
30
29
 
31
30
 
32
31
  require 'rake/version_task'
33
- Rake::VersionTask.new
34
-
35
-
36
- desc "Upload build artifacts to WOPR"
37
- task :upload => :build do
38
- pkg_name = 'franz-%s.gem' % File.read('VERSION').strip
39
- pkg_path = File.join 'pkg', pkg_name
40
-
41
- require 'net/ftp'
42
- ftp = Net::FTP.new
43
- ftp.connect '10.4.4.15', 8080
44
- ftp.login
45
- ftp.passive
46
- begin
47
- ftp.put pkg_path
48
- ftp.sendcmd("SITE CHMOD 0664 #{pkg_name}")
49
- ensure
50
- ftp.close
51
- end
52
- end
32
+ Rake::VersionTask.new
data/Readme.md CHANGED
@@ -1,4 +1,4 @@
1
- # Franz
1
+ # Franz ![Version](https://img.shields.io/gem/v/franz.svg?style=flat-square)
2
2
 
3
3
  Franz ships line-oriented log files to [RabbitMQ](http://www.rabbitmq.com/).
4
4
  Think barebones [logstash](http://logstash.net/) in pure Ruby with more modest
@@ -48,3 +48,106 @@ Okay one last feature: Every log event is assigned a sequential identifier
48
48
  according to its path (and implicitly, host) in the `@seq` field. This is useful
49
49
  if you expect your packets to get criss-crossed and you want to reconstruct the
50
50
  events in order without relying on timestamps, which you shouldn't.
51
+
52
+
53
+ ## Usage, Configuration &c.
54
+
55
+ ### Installation
56
+
57
+ You can build a gem from this repository, or use RubyGems:
58
+
59
+ $ gem install franz
60
+
61
+ ### Usage
62
+
63
+ Just call for help!
64
+
65
+ $ franz --help
66
+
67
+ .--.,
68
+ ,--.' \ __ ,-. ,---, ,----,
69
+ | | /\/,' ,'/ /| ,-+-. / | .' .`|
70
+ : : : ' | |' | ,--.--. ,--.'|' | .' .' .'
71
+ : | |-,| | ,'/ \ | | ,"' |,---, ' ./
72
+ | : :/|' : / .--. .-. | | | / | |; | .' /
73
+ | | .'| | ' \__\/: . . | | | | |`---' / ;--,
74
+ ' : ' ; : | ," .--.; | | | | |/ / / / .`|
75
+ | | | | , ; / / ,. | | | |--' ./__; .'
76
+ | : \ ---' ; : .' \| |/ ; | .'
77
+ | |,' | , .-./'---' `---'
78
+ `--' `--`---' v1.6.0
79
+
80
+
81
+ Aggregate log file events and send them elsewhere
82
+
83
+ Usage: franz [<options>]
84
+
85
+ Options:
86
+ --config, -c <s>: Configuration file to use (default: config.json)
87
+ --debug, -d: Enable debugging output
88
+ --trace, -t: Enable trace output
89
+ --log, -l <s>: Log to file, not STDOUT
90
+ --version, -v: Print version and exit
91
+ --help, -h: Show this message
92
+
93
+ ### Configuration
94
+
95
+ It's kinda like a JSON version of the Logstash config language:
96
+
97
+ {
98
+ // The asterisk will be replaced with a Unix timestamp
99
+ "checkpoint": "/etc/franz/franz.*.db",
100
+
101
+ // All input configs are files by convention
102
+ "input": {
103
+ "configs": [
104
+
105
+ // Only "type" and "includes" are required
106
+ {
107
+ "type": "example", // A nice name
108
+ "includes": [ "/path/to/your.*.log" ], // File path globs
109
+ "excludes": [ "your.bad.*.log" ], // Basename globs
110
+ "multiline": "(?i-mx:^[a-z]{3} +\\d{1,2})", // Stringified RegExp
111
+ "drop": "(?i-mx:^\\d)", // Same story.
112
+ "json?": false // JSON-formatted?
113
+ }
114
+ ]
115
+ },
116
+
117
+ // Only RabbitMQ is supported at the moment
118
+ "output": {
119
+ "rabbitmq": {
120
+
121
+ // Must be a consistently-hashed exchange
122
+ "exchange": {
123
+ "name": "logs"
124
+ },
125
+
126
+ // See Bunny docs for connection configuration
127
+ "connection": {
128
+ "host": "localhost",
129
+ "vhost": "/logs",
130
+ "user": "logs",
131
+ "pass": "logs"
132
+ }
133
+ }
134
+ }
135
+ }
136
+
137
+ ### Operation
138
+
139
+ At Blue Jeans, we deploy Franz with Upstart. Here's a minimal config:
140
+
141
+ #!upstart
142
+ description "franz"
143
+
144
+ console log
145
+
146
+ start on startup
147
+ stop on shutdown
148
+ respawn
149
+
150
+ exec franz
151
+
152
+ We actually use the [`bjn_franz` cookbook](https://github.com/sczizzo/bjn-franz-cookbook)
153
+ for Chef.
data/VERSION CHANGED
@@ -1 +1 @@
1
- 1.6.3
1
+ 1.6.4
data/bin/franz CHANGED
@@ -63,13 +63,18 @@ logger.info \
63
63
  event: 'boot',
64
64
  version: Franz::VERSION
65
65
 
66
+ statz = Franz::Stats.new \
67
+ interval: (config[:output][:stats_interval] || 300),
68
+ logger: logger
69
+
66
70
  # Now we'll connect to our output, RabbitMQ. This creates a new thread in the
67
71
  # background, which will consume the events generated by our input on io
68
72
  Franz::Output.new \
69
73
  input: io,
70
74
  output: config[:output][:rabbitmq],
71
75
  logger: logger,
72
- tags: config[:output][:tags]
76
+ tags: config[:output][:tags],
77
+ statz: statz
73
78
 
74
79
  # Franz has only one kind of input, plain text files.
75
80
  Franz::Input.new \
@@ -77,7 +82,8 @@ Franz::Input.new \
77
82
  output: io,
78
83
  logger: logger,
79
84
  checkpoint: config[:checkpoint],
80
- checkpoint_interval: config[:checkpoint_interval]
85
+ checkpoint_interval: config[:checkpoint_interval],
86
+ statz: statz
81
87
 
82
88
  # Ensure memory doesn't grow too large (> 1GB by default)
83
89
  def mem_kb ; `ps -o rss= -p #{$$}`.strip.to_i ; end
@@ -85,9 +91,11 @@ def mem_kb ; `ps -o rss= -p #{$$}`.strip.to_i ; end
85
91
  mem_limit = config[:memory_limit] || 1_000_000
86
92
  mem_sleep = config[:memory_limit_interval] || 60
87
93
 
94
+ statz.create :mem_kb
88
95
  loop do
89
96
  sleep mem_sleep
90
97
  mem_used = mem_kb
98
+ statz.set :mem_kb, mem_used
91
99
  if mem_used > mem_limit
92
100
  logger.fatal \
93
101
  event: 'killed',
data/lib/franz/agg.rb CHANGED
@@ -4,6 +4,7 @@ require 'socket'
4
4
  require 'pathname'
5
5
 
6
6
  require_relative 'sash'
7
+ require_relative 'stats'
7
8
  require_relative 'input_config'
8
9
 
9
10
  module Franz
@@ -40,6 +41,9 @@ module Franz
40
41
  @buffer = Franz::Sash.new
41
42
  @stop = false
42
43
 
44
+ @statz = opts[:statz] || Franz::Stats.new
45
+ @statz.create :num_lines, 0
46
+
43
47
  @t1 = Thread.new do
44
48
  until @stop
45
49
  flush
@@ -127,6 +131,7 @@ module Franz
127
131
  raw: event
128
132
  multiline = @ic.config(event[:path])[:multiline] rescue nil
129
133
  if multiline.nil?
134
+ @statz.inc :num_lines
130
135
  enqueue event[:path], event[:line] unless event[:line].empty?
131
136
  else
132
137
  lock[event[:path]].synchronize do
@@ -140,7 +145,9 @@ module Franz
140
145
  end
141
146
  if event[:line] =~ multiline
142
147
  buffered = buffer.flush(event[:path])
143
- lines = buffered.map { |e| e[:line] }.join("\n")
148
+ lines = buffered.map { |e| e[:line] }
149
+ @statz.inc :num_lines, lines.length
150
+ lines = lines.join("\n")
144
151
  enqueue event[:path], lines unless lines.empty?
145
152
  end
146
153
  buffer.insert event[:path], event
@@ -157,7 +164,9 @@ module Franz
157
164
  lock[path].synchronize do
158
165
  if force || started - buffer.mtime(path) >= flush_interval
159
166
  buffered = buffer.remove(path)
160
- lines = buffered.map { |e| e[:line] }.join("\n")
167
+ lines = buffered.map { |e| e[:line] }
168
+ @statz.inc :num_lines, lines.length
169
+ lines = lines.join("\n")
161
170
  enqueue path, lines unless lines.empty?
162
171
  end
163
172
  end
@@ -2,6 +2,8 @@ require 'set'
2
2
  require 'logger'
3
3
  require 'shellwords'
4
4
 
5
+ require_relative 'stats'
6
+
5
7
 
6
8
  # Discover performs half of file existence detection by expanding globs and
7
9
  # keeping track of files known to Franz. Discover requires a deletions Queue to
@@ -35,6 +37,9 @@ class Franz::Discover
35
37
  config
36
38
  end
37
39
 
40
+ @statz = opts[:statz] || Franz::Stats.new
41
+ @statz.create :num_discovered, 0
42
+ @statz.create :num_deleted, 0
38
43
 
39
44
  @stop = false
40
45
 
@@ -43,6 +48,7 @@ class Franz::Discover
43
48
  until deletions.empty?
44
49
  d = deletions.pop
45
50
  @known.delete d
51
+ @statz.inc :num_deleted
46
52
  log.debug \
47
53
  event: 'discover deleted',
48
54
  path: d
@@ -51,6 +57,7 @@ class Franz::Discover
51
57
  discover.each do |discovery|
52
58
  discoveries.push discovery
53
59
  @known.add discovery
60
+ @statz.inc :num_discovered
54
61
  log.debug \
55
62
  event: 'discover discovered',
56
63
  path: discovery
data/lib/franz/input.rb CHANGED
@@ -7,6 +7,8 @@ require_relative 'agg'
7
7
  require_relative 'tail'
8
8
  require_relative 'watch'
9
9
  require_relative 'discover'
10
+ require_relative 'stats'
11
+
10
12
 
11
13
  module Franz
12
14
 
@@ -46,6 +48,7 @@ module Franz
46
48
  }.deep_merge!(opts)
47
49
 
48
50
  @logger = opts[:logger]
51
+ @statz = opts[:statz] || Franz::Stats.new
49
52
 
50
53
  @checkpoint_interval = opts[:checkpoint_interval]
51
54
  @checkpoint_path = opts[:checkpoint].sub('*', '%d')
@@ -91,8 +94,9 @@ module Franz
91
94
  deletions: deletions,
92
95
  discover_interval: opts[:input][:discover_interval],
93
96
  ignore_before: opts[:input][:ignore_before],
94
- logger: opts[:logger],
95
- known: known
97
+ logger: @logger,
98
+ known: known,
99
+ statz: @statz
96
100
 
97
101
  @watch = Franz::Watch.new \
98
102
  input_config: ic,
@@ -101,9 +105,10 @@ module Franz
101
105
  watch_events: watch_events,
102
106
  watch_interval: opts[:input][:watch_interval],
103
107
  play_catchup?: opts[:input][:play_catchup?],
104
- logger: opts[:logger],
108
+ logger: @logger,
105
109
  stats: stats,
106
- cursors: cursors
110
+ cursors: cursors,
111
+ statz: @statz
107
112
 
108
113
  @tail = Franz::Tail.new \
109
114
  input_config: ic,
@@ -112,8 +117,9 @@ module Franz
112
117
  block_size: opts[:input][:block_size],
113
118
  line_limit: opts[:input][:line_limit],
114
119
  read_limit: opts[:input][:read_limit],
115
- logger: opts[:logger],
116
- cursors: cursors
120
+ logger: @logger,
121
+ cursors: cursors,
122
+ statz: @statz
117
123
 
118
124
  @agg = Franz::Agg.new \
119
125
  input_config: ic,
@@ -121,8 +127,9 @@ module Franz
121
127
  agg_events: opts[:output],
122
128
  flush_interval: opts[:input][:flush_interval],
123
129
  buffer_limit: opts[:input][:buffer_limit],
124
- logger: opts[:logger],
125
- seqs: seqs
130
+ logger: @logger,
131
+ seqs: seqs,
132
+ statz: @statz
126
133
 
127
134
  @stop = false
128
135
  @t = Thread.new do
data/lib/franz/output.rb CHANGED
@@ -3,6 +3,9 @@ require 'json'
3
3
  require 'bunny'
4
4
  require 'deep_merge'
5
5
 
6
+ require_relative 'stats'
7
+
8
+
6
9
  module Franz
7
10
 
8
11
  # RabbitMQ output for Franz. You must declare an x-consistent-hash type
@@ -32,6 +35,9 @@ module Franz
32
35
  }
33
36
  }.deep_merge!(opts)
34
37
 
38
+ @statz = opts[:statz] || Franz::Stats.new
39
+ @statz.create :num_output, 0
40
+
35
41
  @logger = opts[:logger]
36
42
 
37
43
  rabbit = Bunny.new opts[:output][:connection].merge({
@@ -84,6 +90,8 @@ module Franz
84
90
  JSON::generate(event),
85
91
  routing_key: rand.rand(10_000),
86
92
  persistent: false
93
+
94
+ @statz.inc :num_output
87
95
  end
88
96
  end
89
97
 
@@ -0,0 +1,97 @@
1
+ require 'thread'
2
+ require 'logger'
3
+
4
+
5
+ module Franz
6
+ class Stats
7
+
8
+ def initialize opts={}
9
+ @logger = opts[:logger] || Logger.new(STDOUT)
10
+ @interval = opts[:interval] || 300
11
+ @stats = Hash.new
12
+ @lock = Mutex.new
13
+ @t = Thread.new do
14
+ loop do
15
+ sleep @interval
16
+ report
17
+ reset
18
+ end
19
+ end
20
+ end
21
+
22
+
23
+ def stop
24
+ return state if @stop
25
+ @stop = true
26
+ @t.stop
27
+ log.info event: 'stats stopped'
28
+ return nil
29
+ end
30
+
31
+
32
+ def create name, default=nil
33
+ with_lock do
34
+ stats[name] = Hash.new { |h,k| h[k] = default }
35
+ end
36
+ end
37
+
38
+ def delete name
39
+ with_lock do
40
+ stats.delete name
41
+ end
42
+ end
43
+
44
+ def inc name, by=1
45
+ with_lock do
46
+ stats[name][:val] += by
47
+ end
48
+ end
49
+
50
+ def dec name, by=1
51
+ with_lock do
52
+ stats[name][:val] -= by
53
+ end
54
+ end
55
+
56
+ def set name, to
57
+ with_lock do
58
+ stats[name][:val] = to
59
+ end
60
+ end
61
+
62
+ def get name
63
+ with_lock do
64
+ stats[name][:val]
65
+ end
66
+ end
67
+
68
+ private
69
+ attr_reader :stats
70
+
71
+ def log ; @logger end
72
+
73
+ def with_lock &block
74
+ @lock.synchronize do
75
+ yield
76
+ end
77
+ end
78
+
79
+ def report
80
+ ready_stats = with_lock do
81
+ stats.map { |k,vhash| [ k, vhash[:val] ] }
82
+ end
83
+ log.fatal \
84
+ event: 'stats',
85
+ interval: @interval,
86
+ stats: Hash[ready_stats]
87
+ end
88
+
89
+ def reset
90
+ with_lock do
91
+ stats.keys.each do |k|
92
+ stats[k].delete :val
93
+ end
94
+ end
95
+ end
96
+ end
97
+ end
data/lib/franz/tail.rb CHANGED
@@ -3,6 +3,9 @@ require 'logger'
3
3
 
4
4
  require 'eventmachine'
5
5
 
6
+ require_relative 'stats'
7
+
8
+
6
9
  module Franz
7
10
 
8
11
  # Tail receives low-level file events from a Watch and handles the actual
@@ -34,6 +37,11 @@ module Franz
34
37
  @buffer = Hash.new { |h, k| h[k] = BufferedTokenizer.new("\n", @line_limit) }
35
38
  @stop = false
36
39
 
40
+ @statz = opts[:statz] || Franz::Stats.new
41
+ @statz.create :num_reads, 0
42
+ @statz.create :num_rotates, 0
43
+ @statz.create :num_deletes, 0
44
+
37
45
  @tail_thread = Thread.new do
38
46
  handle(watch_events.shift) until @stop
39
47
  end
@@ -177,9 +185,11 @@ module Franz
177
185
  case event[:name]
178
186
 
179
187
  when :deleted
188
+ @statz.inc :num_deletes
180
189
  close path
181
190
 
182
191
  when :replaced, :truncated
192
+ @statz.inc :num_rotates
183
193
  close path
184
194
  read path, size
185
195
 
@@ -187,6 +197,7 @@ module Franz
187
197
  # Ignore read requests after a nil read. We'll wait for the next
188
198
  # event that tells us to close the file. Fingers crossed...
189
199
  unless @nil_read[path]
200
+ @statz.inc :num_reads
190
201
  read path, size
191
202
 
192
203
  else # following a nil read
data/lib/franz/watch.rb CHANGED
@@ -2,6 +2,9 @@ require 'set'
2
2
 
3
3
  require 'logger'
4
4
 
5
+ require_relative 'stats'
6
+
7
+
5
8
  module Franz
6
9
 
7
10
  # Watch works in tandem with Discover to maintain a list of known files and
@@ -31,6 +34,9 @@ module Franz
31
34
  @stats = opts[:stats] || Hash.new
32
35
  @logger = opts[:logger] || Logger.new(STDOUT)
33
36
 
37
+ @statz = opts[:statz] || Franz::Stats.new
38
+ @statz.create :num_watched
39
+
34
40
  # Make sure we're up-to-date by rewinding our old stats to our cursors
35
41
  if @play_catchup
36
42
  log.debug event: 'play catchup'
@@ -94,9 +100,8 @@ module Franz
94
100
  end
95
101
 
96
102
  def watch
97
- log.debug \
98
- event: 'watch',
99
- stats_size: stats.keys.length
103
+ log.debug event: 'watch'
104
+ @statz.set :num_watched, stats.keys.length
100
105
  deleted = []
101
106
 
102
107
  stats.keys.each do |path|
data/lib/franz.rb CHANGED
@@ -6,4 +6,5 @@ require_relative 'franz/input_config'
6
6
  require_relative 'franz/metadata'
7
7
  require_relative 'franz/output'
8
8
  require_relative 'franz/tail'
9
- require_relative 'franz/watch'
9
+ require_relative 'franz/watch'
10
+ require_relative 'franz/stats'
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: franz
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.6.3
4
+ version: 1.6.4
5
5
  platform: ruby
6
6
  authors:
7
7
  - Sean Clemmer
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2014-12-26 00:00:00.000000000 Z
11
+ date: 2015-01-25 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: slog
@@ -103,6 +103,7 @@ extra_rdoc_files: []
103
103
  files:
104
104
  - ".gitignore"
105
105
  - Gemfile
106
+ - History.md
106
107
  - LICENSE
107
108
  - Rakefile
108
109
  - Readme.md
@@ -118,6 +119,7 @@ files:
118
119
  - lib/franz/metadata.rb
119
120
  - lib/franz/output.rb
120
121
  - lib/franz/sash.rb
122
+ - lib/franz/stats.rb
121
123
  - lib/franz/tail.rb
122
124
  - lib/franz/watch.rb
123
125
  - test/test_franz_agg.rb