franz 1.6.3 → 1.6.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/History.md +287 -0
- data/Rakefile +1 -21
- data/Readme.md +104 -1
- data/VERSION +1 -1
- data/bin/franz +10 -2
- data/lib/franz/agg.rb +11 -2
- data/lib/franz/discover.rb +7 -0
- data/lib/franz/input.rb +15 -8
- data/lib/franz/output.rb +8 -0
- data/lib/franz/stats.rb +97 -0
- data/lib/franz/tail.rb +11 -0
- data/lib/franz/watch.rb +8 -3
- data/lib/franz.rb +2 -1
- metadata +4 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: fd7ded091e8791fdf52b423e2a66895e48dfd0f3
+  data.tar.gz: 5aca71dcd6052cbb2a147884b86de8141c505664
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 4f51b1de2cac2128ee0bac0cc728ba08657c773a52965083e6c88d2dca9e5ebffac09276484c863c1d1bfdd993982b605ea9135d252dd17be648038df75e7ebe
+  data.tar.gz: 7ef909db8aca7ce60a1fde825d86f8113bbd1d3fdcf7a3454ace72fed0baadff0522b7637aab72e658eb7ac00a1cc03f25a3a3c8006df43fbdf2fdd1a1317ce8
data/History.md
ADDED
@@ -0,0 +1,287 @@
# Franz

Hi there, your old pal Sean Clemmer here. Imma talk quite a lot, so we might as
well get acquainted. I work on the Operations team at Blue Jeans Network, an
enterprise videoconferencing provider based in Mountain View, CA. Our team
provides infrastructure, tools, and support to the Engineering team, including,
most importantly for our purposes, log storage and processing.


## A lil History

Before the latest rearchitecture, logs at Blue Jeans were basically `rsync`ed
from every host to a central log server. Once on the box, a few processes came
afterwards to compress the files and scan for meeting identifiers. Our reporting
tools queried the log server with a meeting ID, and it replied with a list of
files and their locations.

Compression saved a lot of space, and the search index was fairly small, since
we only needed to store a map of meeting IDs to file paths. If you wanted to
search the text, you needed to log into the log server itself and `grep`.

And all of that worked for everyone, up to a point. At a certain number of
files, `grep` just wasn't fast enough, and worse, it was stealing resources
necessary for processing logs. At a certain volume, we just couldn't scan the
logs fast enough. Our scripts were getting harder to maintain, and we were
looking for answers sooner rather than later.


### Exploring our options

We did a fair amount of research and fiddling before deciding anything. We
looked especially hard at the Elasticsearch-Logstash-Kibana (ELK) stack,
Graylog2, and rearchitecting our scripts as a distributed system. In the end,
we decided we weren't smart enough and there wasn't enough time to design our
own system from the ground up. We also found Graylog2 to be a bit immature and
lacking in features compared to the ELK stack.

In the end, we appreciated that the ELK stack had a lot of community and
corporate support, it was easy to get started with, and everything was fairly
well-documented. Elasticsearch in particular seemed like a well-architected and
professional software project. Logstash had an active development community,
and its author Jordan Sissel soon joined Elasticsearch, Inc. as an employee.

There was a lot of hype around ELK, and I thought we could make it work.


### Our requirements

If you'll recall, the old system was really geared to look up log files based
on a special meeting ID. Quick, but no real search.

To emulate this with Elasticsearch we might store the whole file in a document
along with the meeting ID, which would make queries straightforward. A more
common approach in the ELK community is to store individual lines of a log file
in Elasticsearch documents. I figured we could get the file back by asking the
Elasticsearch cluster for an aggregation of documents corresponding to a given
file path on a given host.

To prove it, I slapped a couple of scripts together, although the initial
implementation actually used *facets* and not *aggregations*. If we could get
the file back, that was everything we needed; we got advanced query support with
Elasticsearch and visualization with Kibana for free. Logstash was making it
easy for me to play around. Fun, even.


### Moving forward with ELK

I got to reading about the pipelines, and I realized pretty quick we weren't
just gonna be able to hook Logstash right into Elasticsearch. You should put
a kind of buffer in between, and both RabbitMQ and Redis were popular at the
time. While I was developing our solution, alternative "forwarders" like
Lumberjack were just being introduced. After evaluating our options, I decided
on RabbitMQ based on team experience and the native Logstash support.

So the initial pipeline looked like this:

    Logstash -> RabbitMQ -> Logstash -> Elasticsearch <- Kibana

The first Logstash stage picked up logs with the `file` input and shipped them
out with the `rabbitmq` output. These log *events* sat in RabbitMQ until a
second, dedicated Logstash agent came along to parse them and shove them into
Elasticsearch for long-term storage and search. I slapped Kibana on top to
provide our users a usable window into the cluster.

And it all *kinda* worked. It wasn't very fast, and outages were fairly common,
but all the pieces were on the board. Over a few weeks I tuned and expanded the
RabbitMQ and Elasticsearch clusters, but still we were missing chunks of files,
missing whole files, and Logstash would die regularly with all kinds of strange
issues. Encoding issues, buffer issues, timeout issues, heap size issues.


### Fighting with Logstash

Surely we weren't the only people running into issues with this very popular
piece of open source software? Logstash has a GitHub project, a JIRA account,
and an active mailing list. I scoured issues, source code, pull requests, and
e-mail archives. These seemed like huge bugs:

1. *Multiline:* Both the multiline codec and filter had their issues. The codec
   is generally preferred, but even still you might miss the last line of your
   file, because Logstash does not implement *multiline flush*. Logstash will
   buffer the last line of a file indefinitely, thinking you may come back and
   write to the log.
2. *File handling:* Because Logstash keeps files open indefinitely, it can soak
   up file handles after running a while. We had tens of thousands of log files
   on some hosts, and Logstash just wasn't having it.
3. *Reconstruction:* Despite my initial proofs, we were having a lot of trouble
   reconstructing log files from individual events. Lines were often missing,
   truncated, and shuffled.

The multiline issue was actually fixed by the community, so I forked Logstash,
applied some patches, and did a little hacking to get it working right.

Fixing the second issue required delving deep into the depths of Logstash, the
`file` input, and Jordan Sissel's FileWatch project. FileWatch provides most of
the implementation for the `file` input, but it was riddled with bugs. I forked
the project and went through a major refactor to simplify and sanitize the code.
Eventually I was able to make it so Logstash would relinquish a file handle
some short interval after reading the file had ceased.

The third issue was rather more difficult, with subtle bugs at play. Rather
than relying on the `@timestamp` field, which we found did not have enough
resolution, I added a new field called `@seq`, just a simple counter, which
enabled us to put the events back in order. Still we were missing chunks, and
some lines appeared to be interleaved. Just weird stuff.
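The `@seq` trick is simple enough to sketch. The event hashes below are hypothetical stand-ins, not Franz's actual internals; the point is that once every event for a given host and path carries a monotonic counter, file order falls out of a plain sort:

```ruby
# Events arrive from the queue in arbitrary order; @seq restores file order.
events = [
  { 'message' => 'line three', '@seq' => 3 },
  { 'message' => 'line one',   '@seq' => 1 },
  { 'message' => 'line two',   '@seq' => 2 },
]

# Reorder by sequence number, then stitch the lines back together.
original = events.sort_by { |e| e['@seq'] }
                 .map    { |e| e['message'] }
                 .join("\n")

puts original
```

A timestamp-based sort breaks down as soon as two lines land in the same clock tick; a counter never ties.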

After hacking Logstash half to death we decided the first stage of the pipeline
would have to change. We'd still use Logstash to move events from RabbitMQ into
Elasticsearch, but we couldn't trust it to collect files.


### And so Franz was born

I researched Logstash alternatives, but there weren't many at the time. Fluentd
looked promising, but early testing revealed the multiline facility wasn't quite
there yet. Lumberjack was just gaining some popularity, but it was still too
immature. In the end, I decided I had a pretty good handle on our requirements
and I would take a stab at implementing a solution.

It would be risky, but Logstash and the community just weren't moving fast
enough for our needs. Engineering was justly upset with our logging "solution",
and I was pretty frantic after weeks of hacking and debugging. How hard could
it really be to tail a file and send the lines out to a queue?

After a few prototypes and a couple false starts, we had our boy Franz.



## Design and Implementation

From 10,000 feet Franz and Logstash are pretty similar; you can imagine Franz is
basically a Logstash agent configured with a `file` input and `rabbitmq` output.
Franz accepts a single configuration file that tells the process which files to
tail, how to handle them, and where to send the output. Besides solving the
three issues we discussed earlier, Franz provides a kind of `json` codec and
`drop` filter (in Logstash parlance).

I decided early on to implement Franz in Ruby, like Logstash. Unlike Logstash,
which is typically executed by JRuby, I decided to stick with Matz's Ruby for
Franz in order to obtain a lower resource footprint at the expense of true
concurrency (MRI has a GIL).

Implementation-wise, Franz bears little resemblance to Logstash. Logstash has
a clever system which "compiles" the inputs, filters, and outputs into a single
block of code. Franz is a fairly straightforward Ruby program with only a
handful of classes and a simple execution path.


### The Twelve-Factor App

I was heavily influenced by [the Twelve-Factor App](http://12factor.net):

1. *Codebase:* Franz is contained in a single repo on [GitHub](https://github.com/sczizzo/franz).
2. *Dependencies:* Franz provides a `Gemfile` to isolate dependencies.
3. *Config:* Franz separates code from configuration (no env vars, though).
4. *Backing Services:* Franz is agnostic to the connected RabbitMQ server.
5. *Build, release, run:* Franz is versioned and released as a [RubyGem](https://rubygems.org/gems/franz).
6. *Processes:* Franz provides mostly-stateless share-nothing executions.
7. *Port binding:* Franz isn't a Web service, so no worries here!
8. *Concurrency:* Franz is a single process and plays nice with Upstart.
9. *Disposability:* Franz uses a crash-only architecture, discussed below.
10. *Dev/prod parity:* We run the same configuration in every environment.
11. *Logs:* Franz provides structured logs which can be routed to file.
12. *Admin processes:* Franz is simple enough this isn't necessary.


### Crash-Only Architecture

Logstash assumes you might want to stop the process and restart it later, having
the new instance pick up where the last left off. To support this, Logstash (or
really, FileWatch) keeps a small "checkpoint" file, which is written whenever
Logstash is shut down.

Franz takes this one step further and implements a ["crash-only" design](http://lwn.net/Articles/191059).
The basic idea here is that the application does not distinguish between a crash
and a restart. In practical terms, Franz simply writes a checkpoint at regular
intervals; when asked to shut down, it aborts immediately.

Franz checkpoints are simple, too. It's just a `Hash` from log file paths to
the current `cursor` (byte offset) and `seq` (sequence number):

    {
      "/path/to/my.log": {
        "cursor": 1234,
        "seq": 99
      }
    }

The checkpoint file contains the `Marshal` representation of this `Hash`.
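A minimal sketch of that scheme, assuming nothing about Franz's real checkpoint code beyond the `Hash`-plus-`Marshal` shape just described (the path and values are the illustrative ones from above):

```ruby
require 'tempfile'

# Crash-only checkpointing: dump the whole state Hash at a regular interval
# and never write anything special on shutdown. Booting (after a crash or a
# normal restart -- same thing) just loads the last complete checkpoint.
state = {
  '/path/to/my.log' => { cursor: 1234, seq: 99 }
}

checkpoint = Tempfile.new('franz-checkpoint')
checkpoint.binmode                       # Marshal output is binary
checkpoint.write Marshal.dump(state)
checkpoint.flush

# Recovery: read the bytes back and rebuild the Hash.
recovered = Marshal.load File.binread(checkpoint.path)
puts recovered['/path/to/my.log'][:cursor]  # 1234
```

The worst case is losing up to one interval's worth of progress, which for log shipping just means re-reading (and re-sending) a few recent lines.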


### Sash

The `Sash` is a data structure I discovered during the development of Franz,
which came out of the implementation of multiline flush. Here's a taste:

    s = Sash.new           # => #<Sash...>
    s.keys                 # => []
    s.insert :key, :value  # => :value
    s.get :key             # => [ :value ]
    s.insert :key, :crazy  # => :crazy
    s.mtime :key           # => 2014-02-18 21:24:30 -0800
    s.flush :key           # => [ :value, :crazy ]

For multilining, what you do is create a `Sash` keyed by each path, and insert
each line in the appropriate key as they come in from upstream. Before you
insert a line, though, you check whether it matches the multiline pattern for
the key: if so, you flush the `Sash` key and write the result out as an event.
Now the `Sash` key will buffer the next event.

In fact, a `Sash` key will only ever contain lines for at most one event, and
the `mtime` method allows us to know how recently that key was modified. To
implement multiline flush correctly, we periodically check the `Sash` for old
keys and flush them according to configuration. `Sash` methods are threadsafe,
so we can do this on the side without interrupting the main thread.
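Roughly, that buffering logic looks like this. A plain `Hash` of `Array`s stands in for the real `Sash`, and the path and pattern are illustrative:

```ruby
# A line matching the multiline pattern starts a new logical event, so it
# first flushes whatever was buffered for that path.
multiline = /^\w{3} +\d{1,2}/   # e.g. syslog-style "Jan  7 ..." prefixes
path      = '/var/log/my.log'
buffer    = Hash.new { |h, k| h[k] = [] }
events    = []

lines = [
  'Jan  7 first event',
  '  continuation one',
  '  continuation two',
  'Jan  8 second event',
]

lines.each do |line|
  if line =~ multiline && !buffer[path].empty?
    events << buffer[path].join("\n")   # flush the previous event
    buffer[path] = []
  end
  buffer[path] << line
end

# A periodic "multiline flush" does the same thing for keys that have been
# idle too long -- otherwise the last event would sit buffered forever:
events << buffer[path].join("\n") unless buffer[path].empty?

puts events.length  # 2
```

The idle-key sweep is exactly the *multiline flush* that Logstash was missing: without it, `'Jan  8 second event'` would never leave the buffer.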


### Slog

[Slog](https://github.com/sczizzo/slog) was also factored out of Franz, but
it's heavily inspired by Logstash. By default, output is pretty and colored (I
swear):

    Slog.new.info 'example'
    #
    # {
    #   "level": "info",
    #   "@timestamp": "2014-12-25T06:22:43.459-08:00",
    #   "message": "example"
    # }
    #

`Slog` works perfectly with Logstash or Franz when configured to treat the log
file as JSON; they'll add the other fields necessary for reconstruction.

More than anything, structured logging has changed how I approach logs. Instead
of writing *everything* to file, Franz strives to log useful events that contain
metadata. Instead of every request, an occasional digest. Instead of a paragraph
of text, a simple summary. Franz uses different log levels appropriately,
allowing end users to control verbosity.


### Execution Path

Franz is implemented as a series of stages connected via bounded queues:

    Input -> Discover -> Watch -> Tail -> Agg -> Output

Each of these stages is a class under the `Franz` namespace, and each runs up
to a couple of `Thread`s, typically a worker and maybe a helper (e.g. multiline
flush). Communicating via `SizedQueue`s helps ensure correctness and constrain
memory usage under high load.

0. `Input`: Actually wires together `Discover`, `Watch`, `Tail`, and `Agg`.
1. `Discover`: Performs half of file existence detection by expanding globs
   and keeping track of files known to Franz.
2. `Watch`: Works in tandem with `Discover` to maintain a list of known files
   and their status. Events are generated when a file is created, destroyed,
   or modified (including appended, truncated, and replaced).
3. `Tail`: Receives low-level file events from a `Watch` and handles the
   actual reading of files, providing a stream of lines.
4. `Agg`: Mostly aggregates `Tail` events by applying the multiline filter,
   but it also applies the `host` and `type` fields. Basically, it does all
   the post-processing after we've retrieved a line from a file.
5. `Output`: RabbitMQ output for Franz, based on the really-very-good [Bunny](https://github.com/ruby-amqp/bunny)
   client. You must declare an `x-consistent-hash` exchange, as we generate a
   random `Integer` for routing. Such is life.
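The queue-per-stage arrangement can be sketched with two toy threads (illustrative stage names and payloads, not Franz's actual classes). The point is that `SizedQueue#push` blocks once the queue holds `max` items, so a slow downstream stage back-pressures a fast upstream one instead of letting memory grow:

```ruby
# Bounded hand-off between pipeline stages.
q_lines  = SizedQueue.new 10   # "Tail" -> "Agg"
q_events = SizedQueue.new 10   # "Agg"  -> "Output"

tail = Thread.new do
  3.times { |i| q_lines.push "line #{i}" }
  q_lines.push :eof
end

agg = Thread.new do
  while (line = q_lines.shift) != :eof
    # Decorate each line, as Agg decorates with host/type fields.
    q_events.push({ message: line, host: 'example.local' })
  end
  q_events.push :eof
end

# The main thread plays "Output": drain the last queue.
output = []
while (event = q_events.shift) != :eof
  output << event
end
[tail, agg].each(&:join)

puts output.length  # 3
```

With unbounded `Queue`s, a stalled RabbitMQ connection would let lines pile up in memory without limit; the sized queues cap that at `max` items per stage.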
data/Rakefile
CHANGED
@@ -21,7 +21,6 @@ end
 
 require 'rubygems/tasks'
 Gem::Tasks.new({
-  push: false,
   sign: {}
 }) do |tasks|
   tasks.console.command = 'pry'
@@ -30,23 +29,4 @@ Gem::Tasks::Sign::Checksum.new sha2: true
 
 
 require 'rake/version_task'
-Rake::VersionTask.new
-
-
-desc "Upload build artifacts to WOPR"
-task :upload => :build do
-  pkg_name = 'franz-%s.gem' % File.read('VERSION').strip
-  pkg_path = File.join 'pkg', pkg_name
-
-  require 'net/ftp'
-  ftp = Net::FTP.new
-  ftp.connect '10.4.4.15', 8080
-  ftp.login
-  ftp.passive
-  begin
-    ftp.put pkg_path
-    ftp.sendcmd("SITE CHMOD 0664 #{pkg_name}")
-  ensure
-    ftp.close
-  end
-end
+Rake::VersionTask.new
data/Readme.md
CHANGED
@@ -1,4 +1,4 @@
-# Franz
+# Franz 
 
 Franz ships line-oriented log files to [RabbitMQ](http://www.rabbitmq.com/).
 Think barebones [logstash](http://logstash.net/) in pure Ruby with more modest
@@ -48,3 +48,106 @@ Okay one last feature: Every log event is assigned a sequential identifier
 according to its path (and implicitly, host) in the `@seq` field. This is useful
 if you expect your packets to get criss-crossed and you want to reconstruct the
 events in order without relying on timestamps, which you shouldn't.
+
+
+## Usage, Configuration &c.
+
+### Installation
+
+You can build a gem from this repository, or use RubyGems:
+
+    $ gem install franz
+
+### Usage
+
+Just call for help!
+
+    $ franz --help
+
+    .--.,
+    ,--.' \ __ ,-. ,---, ,----,
+    | | /\/,' ,'/ /| ,-+-. / | .' .`|
+    : : : ' | |' | ,--.--. ,--.'|' | .' .' .'
+    : | |-,| | ,'/ \ | | ,"' |,---, ' ./
+    | : :/|' : / .--. .-. | | | / | |; | .' /
+    | | .'| | ' \__\/: . . | | | | |`---' / ;--,
+    ' : ' ; : | ," .--.; | | | | |/ / / / .`|
+    | | | | , ; / / ,. | | | |--' ./__; .'
+    | : \ ---' ; : .' \| |/ ; | .'
+    | |,' | , .-./'---' `---'
+    `--' `--`---' v1.6.0
+
+
+    Aggregate log file events and send them elsewhere
+
+    Usage: franz [<options>]
+
+    Options:
+      --config, -c <s>:   Configuration file to use (default: config.json)
+           --debug, -d:   Enable debugging output
+           --trace, -t:   Enable trace output
+          --log, -l <s>:   Log to file, not STDOUT
+         --version, -v:   Print version and exit
+            --help, -h:   Show this message
+
+### Configuration
+
+It's kinda like a JSON version of the Logstash config language:
+
+    {
+      // The asterisk will be replaced with a Unix timestamp
+      "checkpoint": "/etc/franz/franz.*.db",
+
+      // All input configs are files by convention
+      "input": {
+        "configs": [
+
+          // Only "type" and "includes" are required
+          {
+            "type": "example",                           // A nice name
+            "includes": [ "/path/to/your.*.log" ],       // File path globs
+            "excludes": [ "your.bad.*.log" ],            // Basename globs
+            "multiline": "(?i-mx:^[a-z]{3} +\\d{1,2})",  // Stringified RegExp
+            "drop": "(?i-mx:^\\d)",                      // Same story
+            "json?": false                               // JSON-formatted?
+          }
+        ]
+      },
+
+      // Only RabbitMQ is supported at the moment
+      "output": {
+        "rabbitmq": {
+
+          // Must be a consistently-hashed exchange
+          "exchange": {
+            "name": "logs"
+          },
+
+          // See Bunny docs for connection configuration
+          "connection": {
+            "host": "localhost",
+            "vhost": "/logs",
+            "user": "logs",
+            "pass": "logs"
+          }
+        }
+      }
+    }
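Those stringified `multiline` and `drop` values are what Ruby's `Regexp#to_s` emits, so a config loader can round-trip them through `Regexp.new`. A quick demonstration with the sample pattern from the config above:

```ruby
# Regexp#to_s emits the "(?i-mx:...)" form, embedding the flags;
# Regexp.new reads that form back into an equivalent pattern.
pattern = /^[a-z]{3} +\d{1,2}/i

stringified = pattern.to_s
puts stringified                      # (?i-mx:^[a-z]{3} +\d{1,2})

# What a Franz-style config loader can do with the JSON string:
reloaded = Regexp.new(stringified)
puts reloaded =~ 'Jan  7 07:07:07'    # 0 -- matches, case-insensitively
```

This is why the JSON carries `(?i-mx:...)` wrappers rather than bare pattern text: the flags travel with the pattern.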
+
+### Operation
+
+At Blue Jeans, we deploy Franz with Upstart. Here's a minimal config:
+
+    #!upstart
+    description "franz"
+
+    console log
+
+    start on startup
+    stop on shutdown
+    respawn
+
+    exec franz
+
+We actually use the [`bjn_franz` cookbook](https://github.com/sczizzo/bjn-franz-cookbook)
+for Chef.
data/VERSION
CHANGED
@@ -1 +1 @@
-1.6.3
+1.6.4
data/bin/franz
CHANGED
@@ -63,13 +63,18 @@ logger.info \
   event: 'boot',
   version: Franz::VERSION
 
+statz = Franz::Stats.new \
+  interval: (config[:output][:stats_interval] || 300),
+  logger: logger
+
 # Now we'll connect to our output, RabbitMQ. This creates a new thread in the
 # background, which will consume the events generated by our input on io
 Franz::Output.new \
   input: io,
   output: config[:output][:rabbitmq],
   logger: logger,
-  tags: config[:output][:tags]
+  tags: config[:output][:tags],
+  statz: statz
 
 # Franz has only one kind of input, plain text files.
 Franz::Input.new \
@@ -77,7 +82,8 @@ Franz::Input.new \
   output: io,
   logger: logger,
   checkpoint: config[:checkpoint],
-  checkpoint_interval: config[:checkpoint_interval]
+  checkpoint_interval: config[:checkpoint_interval],
+  statz: statz
 
 # Ensure memory doesn't grow too large (> 1GB by default)
 def mem_kb ; `ps -o rss= -p #{$$}`.strip.to_i ; end
@@ -85,9 +91,11 @@ def mem_kb ; `ps -o rss= -p #{$$}`.strip.to_i ; end
 mem_limit = config[:memory_limit] || 1_000_000
 mem_sleep = config[:memory_limit_interval] || 60
 
+statz.create :mem_kb
 loop do
   sleep mem_sleep
   mem_used = mem_kb
+  statz.set :mem_kb, mem_used
   if mem_used > mem_limit
     logger.fatal \
       event: 'killed',
data/lib/franz/agg.rb
CHANGED
@@ -4,6 +4,7 @@ require 'socket'
 require 'pathname'
 
 require_relative 'sash'
+require_relative 'stats'
 require_relative 'input_config'
 
 module Franz
@@ -40,6 +41,9 @@ module Franz
     @buffer = Franz::Sash.new
     @stop = false
 
+    @statz = opts[:statz] || Franz::Stats.new
+    @statz.create :num_lines, 0
+
     @t1 = Thread.new do
       until @stop
         flush
@@ -127,6 +131,7 @@ module Franz
         raw: event
       multiline = @ic.config(event[:path])[:multiline] rescue nil
       if multiline.nil?
+        @statz.inc :num_lines
         enqueue event[:path], event[:line] unless event[:line].empty?
       else
         lock[event[:path]].synchronize do
@@ -140,7 +145,9 @@ module Franz
           end
           if event[:line] =~ multiline
             buffered = buffer.flush(event[:path])
-            lines = buffered.map { |e| e[:line] }
+            lines = buffered.map { |e| e[:line] }
+            @statz.inc :num_lines, lines.length
+            lines = lines.join("\n")
             enqueue event[:path], lines unless lines.empty?
           end
           buffer.insert event[:path], event
@@ -157,7 +164,9 @@ module Franz
       lock[path].synchronize do
         if force || started - buffer.mtime(path) >= flush_interval
           buffered = buffer.remove(path)
-          lines = buffered.map { |e| e[:line] }
+          lines = buffered.map { |e| e[:line] }
+          @statz.inc :num_lines, lines.length
+          lines = lines.join("\n")
           enqueue path, lines unless lines.empty?
         end
       end
data/lib/franz/discover.rb
CHANGED
@@ -2,6 +2,8 @@ require 'set'
 require 'logger'
 require 'shellwords'
 
+require_relative 'stats'
+
 
 # Discover performs half of file existence detection by expanding globs and
 # keeping track of files known to Franz. Discover requires a deletions Queue to
@@ -35,6 +37,9 @@ class Franz::Discover
       config
     end
 
+    @statz = opts[:statz] || Franz::Stats.new
+    @statz.create :num_discovered, 0
+    @statz.create :num_deleted, 0
 
     @stop = false
 
@@ -43,6 +48,7 @@ class Franz::Discover
         until deletions.empty?
           d = deletions.pop
           @known.delete d
+          @statz.inc :num_deleted
           log.debug \
             event: 'discover deleted',
             path: d
@@ -51,6 +57,7 @@ class Franz::Discover
         discover.each do |discovery|
           discoveries.push discovery
           @known.add discovery
+          @statz.inc :num_discovered
           log.debug \
             event: 'discover discovered',
             path: discovery
data/lib/franz/input.rb
CHANGED
@@ -7,6 +7,8 @@ require_relative 'agg'
 require_relative 'tail'
 require_relative 'watch'
 require_relative 'discover'
+require_relative 'stats'
+
 
 module Franz
 
@@ -46,6 +48,7 @@ module Franz
       }.deep_merge!(opts)
 
       @logger = opts[:logger]
+      @statz = opts[:statz] || Franz::Stats.new
 
       @checkpoint_interval = opts[:checkpoint_interval]
       @checkpoint_path = opts[:checkpoint].sub('*', '%d')
@@ -91,8 +94,9 @@ module Franz
         deletions: deletions,
         discover_interval: opts[:input][:discover_interval],
         ignore_before: opts[:input][:ignore_before],
-        logger:
-        known: known
+        logger: @logger,
+        known: known,
+        statz: @statz
 
       @watch = Franz::Watch.new \
         input_config: ic,
@@ -101,9 +105,10 @@ module Franz
         watch_events: watch_events,
         watch_interval: opts[:input][:watch_interval],
         play_catchup?: opts[:input][:play_catchup?],
-        logger:
+        logger: @logger,
         stats: stats,
-        cursors: cursors
+        cursors: cursors,
+        statz: @statz
 
       @tail = Franz::Tail.new \
         input_config: ic,
@@ -112,8 +117,9 @@ module Franz
         block_size: opts[:input][:block_size],
         line_limit: opts[:input][:line_limit],
         read_limit: opts[:input][:read_limit],
-        logger:
-        cursors: cursors
+        logger: @logger,
+        cursors: cursors,
+        statz: @statz
 
       @agg = Franz::Agg.new \
         input_config: ic,
@@ -121,8 +127,9 @@ module Franz
         agg_events: opts[:output],
         flush_interval: opts[:input][:flush_interval],
         buffer_limit: opts[:input][:buffer_limit],
-        logger:
-        seqs: seqs
+        logger: @logger,
+        seqs: seqs,
+        statz: @statz
 
       @stop = false
       @t = Thread.new do
data/lib/franz/output.rb
CHANGED
@@ -3,6 +3,9 @@ require 'json'
 require 'bunny'
 require 'deep_merge'
 
+require_relative 'stats'
+
+
 module Franz
 
   # RabbitMQ output for Franz. You must declare an x-consistent-hash type
@@ -32,6 +35,9 @@ module Franz
         }
       }.deep_merge!(opts)
 
+      @statz = opts[:statz] || Franz::Stats.new
+      @statz.create :num_output, 0
+
       @logger = opts[:logger]
 
       rabbit = Bunny.new opts[:output][:connection].merge({
@@ -84,6 +90,8 @@ module Franz
             JSON::generate(event),
             routing_key: rand.rand(10_000),
             persistent: false
+
+          @statz.inc :num_output
         end
       end
 
data/lib/franz/stats.rb
ADDED
@@ -0,0 +1,97 @@
require 'thread'
require 'logger'


module Franz
  class Stats

    def initialize opts={}
      @logger = opts[:logger] || Logger.new(STDOUT)
      @interval = opts[:interval] || 300
      @stats = Hash.new
      @lock = Mutex.new
      @t = Thread.new do
        loop do
          sleep @interval
          report
          reset
        end
      end
    end


    def stop
      return state if @stop
      @stop = true
      @t.stop
      log.info event: 'stats stopped'
      return nil
    end


    def create name, default=nil
      with_lock do
        stats[name] = Hash.new { |h,k| h[k] = default }
      end
    end

    def delete name
      with_lock do
        stats.delete name
      end
    end

    def inc name, by=1
      with_lock do
        stats[name][:val] += by
      end
    end

    def dec name, by=1
      with_lock do
        stats[name][:val] -= by
      end
    end

    def set name, to
      with_lock do
        stats[name][:val] = to
      end
    end

    def get name
      with_lock do
        stats[name][:val]
      end
    end

    private
    attr_reader :stats

    def log ; @logger end

    def with_lock &block
      @lock.synchronize do
        yield
      end
    end

    def report
      ready_stats = with_lock do
        stats.map { |k,vhash| [ k, vhash[:val] ] }
      end
      log.fatal \
        event: 'stats',
        interval: @interval,
        stats: Hash[ready_stats]
    end

    def reset
      with_lock do
        stats.keys.each do |k|
          stats[k].delete :val
        end
      end
    end
  end
end
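The interplay of `create`, `inc`, and `reset` above can be demonstrated with a stripped-down stand-in (a hypothetical simplification, not the class itself). The key idea is that `create` registers a `Hash` whose default block re-materializes the default value, so after `reset` deletes `:val` the counter silently restarts from its default:

```ruby
# Minimal counter registry in the style of Franz::Stats: named values
# guarded by a Mutex, cleared after each reporting interval.
stats = Hash.new
lock  = Mutex.new

create = ->(name, default = nil) { lock.synchronize { stats[name] = Hash.new { |h, k| h[k] = default } } }
inc    = ->(name, by = 1)        { lock.synchronize { stats[name][:val] += by } }
get    = ->(name)                { lock.synchronize { stats[name][:val] } }
reset  = ->()                    { lock.synchronize { stats.each_key { |k| stats[k].delete :val } } }

create.(:num_lines, 0)
inc.(:num_lines, 5)
puts get.(:num_lines)  # 5

reset.()
puts get.(:num_lines)  # 0 -- the default block re-created the value
```

This is why the diffs above always pass an explicit default (`@statz.create :num_lines, 0`): without it the first post-reset `inc` would try `nil + 1`.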
data/lib/franz/tail.rb
CHANGED
@@ -3,6 +3,9 @@ require 'logger'
 
 require 'eventmachine'
 
+require_relative 'stats'
+
+
 module Franz
 
   # Tail receives low-level file events from a Watch and handles the actual
@@ -34,6 +37,11 @@ module Franz
       @buffer = Hash.new { |h, k| h[k] = BufferedTokenizer.new("\n", @line_limit) }
       @stop = false
 
+      @statz = opts[:statz] || Franz::Stats.new
+      @statz.create :num_reads, 0
+      @statz.create :num_rotates, 0
+      @statz.create :num_deletes, 0
+
       @tail_thread = Thread.new do
         handle(watch_events.shift) until @stop
       end
@@ -177,9 +185,11 @@ module Franz
       case event[:name]
 
       when :deleted
+        @statz.inc :num_deletes
         close path
 
       when :replaced, :truncated
+        @statz.inc :num_rotates
         close path
         read path, size
 
@@ -187,6 +197,7 @@ module Franz
         # Ignore read requests after a nil read. We'll wait for the next
         # event that tells us to close the file. Fingers crossed...
         unless @nil_read[path]
+          @statz.inc :num_reads
           read path, size
 
         else # following a nil read
data/lib/franz/watch.rb
CHANGED
@@ -2,6 +2,9 @@ require 'set'
 
 require 'logger'
 
+require_relative 'stats'
+
+
 module Franz
 
   # Watch works in tandem with Discover to maintain a list of known files and
@@ -31,6 +34,9 @@ module Franz
       @stats = opts[:stats] || Hash.new
       @logger = opts[:logger] || Logger.new(STDOUT)
 
+      @statz = opts[:statz] || Franz::Stats.new
+      @statz.create :num_watched
+
       # Make sure we're up-to-date by rewinding our old stats to our cursors
       if @play_catchup
         log.debug event: 'play catchup'
@@ -94,9 +100,8 @@ module Franz
     end
 
     def watch
-      log.debug
-
-        stats_size: stats.keys.length
+      log.debug event: 'watch'
+      @statz.set :num_watched, stats.keys.length
       deleted = []
 
       stats.keys.each do |path|
data/lib/franz.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: franz
 version: !ruby/object:Gem::Version
-  version: 1.6.3
+  version: 1.6.4
 platform: ruby
 authors:
 - Sean Clemmer
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2015-01-25 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: slog
@@ -103,6 +103,7 @@ extra_rdoc_files: []
 files:
 - ".gitignore"
 - Gemfile
+- History.md
 - LICENSE
 - Rakefile
 - Readme.md
@@ -118,6 +119,7 @@ files:
 - lib/franz/metadata.rb
 - lib/franz/output.rb
 - lib/franz/sash.rb
+- lib/franz/stats.rb
 - lib/franz/tail.rb
 - lib/franz/watch.rb
 - test/test_franz_agg.rb