log2json 0.1.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/Gemfile ADDED
@@ -0,0 +1,2 @@
+ source "https://rubygems.org"
+ gemspec
data/Gemfile.lock ADDED
@@ -0,0 +1,24 @@
+ PATH
+   remote: .
+   specs:
+     log2json (0.1.0)
+       jls-grok (~> 0.10.10)
+       persistent_http (~> 1.0.5)
+       redis (~> 3.0.2)
+
+ GEM
+   remote: https://rubygems.org/
+   specs:
+     cabin (0.5.0)
+     gene_pool (1.3.0)
+     jls-grok (0.10.10)
+       cabin (~> 0.5.0)
+     persistent_http (1.0.5)
+       gene_pool (>= 1.3)
+     redis (3.0.3)
+
+ PLATFORMS
+   ruby
+
+ DEPENDENCIES
+   log2json!
data/README ADDED
@@ -0,0 +1,66 @@
+ Log2json lets you read, filter, and send logs as JSON objects via Unix pipes.
+ It is inspired by Logstash and is meant to be compatible with it at the JSON
+ event/record level, so that it can easily work with Kibana.
+
+ Reading logs is done via a shell script (e.g., `tail`) running in its own
+ process. You then configure (see the `syslog2json` or the `nginxlog2json`
+ script for examples) and run your filters in Ruby using the `Log2Json` module
+ and its contained helper classes.
+
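As a minimal sketch of such a filter setup (condensed from the bundled
`bin/syslog2json` script shown later in this diff; adapt the filter classes to
your own logs), the Ruby side looks roughly like this:

    #!/usr/bin/env ruby
    require 'log2json'
    require 'log2json/filters/syslog'

    # Map each log type to the list of filters that should process it.
    FILTERS = Hash.new { |hash, key| hash[key] = [] }
    [
      ::Log2Json::Filters::SyslogFilter.new('SysLogFilter'),
    ].each { |filter| FILTERS[filter.type] << filter }

    ENV['type'] = 'syslog'   # treat all input lines as syslog records
    ::Log2Json.main(FILTERS) # read stdin, filter, write JSON records to stdout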
+ `Log2Json` reads logs from stdin (one log record per line), parses the log
+ lines into JSON records, and then serializes and writes the records to stdout,
+ which can then be piped to another process for further processing or for
+ shipping elsewhere.
+
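Each record is a Logstash-style JSON event. Purely as an illustration (the
exact fields depend on the filters you configure; this one assumes the nginx
error-log filter from `bin/nginxlog2json`), a serialized record might look
roughly like:

    {"@timestamp":"2013-05-01T12:34:56+00:00",
     "@type":"nginx-error",
     "@tags":["nginx","http"],
     "@fields":{"level":"error","pid":"1234","tid":"0","message":"..."}}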
+ Currently, Log2json ships with a `tail-log` script that can be run as the input
+ process. It is the same as using the Linux `tail` utility with the `-v -F`
+ options, except that it also tracks the positions (as the number of lines read
+ from the beginning of each file) in a few files on the file system, so that if
+ the input process is interrupted, the next run can continue reading from where
+ it left off, provided the same files are followed again. This feature is
+ similar to the sincedb feature of Logstash's file input.
+
+ Note: If you don't need the tracking feature (i.e., you are fine with always
+ tailing from the end of the file with `-v -F -n0`), then you can just use the
+ `tail` utility that comes with your Linux distribution (or, more specifically,
+ the `tail` from GNU coreutils). Other versions of the `tail` utility may also
+ work, but are not tested. The input protocol expected by Log2json is very
+ simple and is documented in the source code.
+
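Roughly, the expected input is what `tail -v -F` produces: a header line naming
the file currently being read, followed by that file's log lines, e.g.:

    ==> /var/log/syslog <==
    <one syslog line>
    <another syslog line>

The patched `tail` bundled with the gem additionally appends an event marker,
"[new_file]" or "[truncated]", to the header whenever it detects a new or
truncated file (see the comments in `bin/tail-log.sh`).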
+ ** The `tail-log` script uses a patched version of `tail` from the GNU coreutils
+    package. A binary of the `tail` utility compiled for Ubuntu 12.04 LTS is
+    included with the Log2json gem. If the binary doesn't work for your
+    distribution, then you'll need to get GNU coreutils-8.13, apply the patch (it
+    can be found in the src/ directory of the installed gem), and then replace
+    the bin/tail binary in the directory of the installed gem with your version
+    of the binary. **
+
+ P.S. If you know of a way to configure and compile ONLY the tail program in
+ coreutils, please let me know! The reason I'm not building tail after gem
+ installation is that it takes too long to configure && make, because that
+ actually builds every utility in coreutils.
+
+
+ For shipping logs to Redis, there's the `lines2redis` script that can be used as
+ the output process in the pipe. For shipping logs from Redis to Elasticsearch,
+ Log2json provides a `redis2es` script.
+
+ Finally, here's an example of Log2json in action:
+
+ From a client machine:
+
+     tail-log /var/log/{sys,mail}log /var/log/{kern,auth}.log | syslog2json |
+       redis_queue=jsonlogs \
+       flush_size=20 \
+       flush_interval=30 \
+       lines2redis host.to.redis.server 6379 0  # use redis DB 0
+
+
+ On the Redis server:
+
+     redis_queue=jsonlogs redis2es host.to.es.server
+
+
+
+
+
data/bin/lines2redis ADDED
@@ -0,0 +1,73 @@
+ #!/usr/bin/env ruby
+ #
+ # A simple script that reads lines from STDIN and dumps them to a Redis list.
+
+ require 'thread'
+ require 'redis'
+ require 'logger'
+
+ @log = Logger.new(STDOUT)
+
+ def const(name, default)
+   name = name.to_s.downcase
+   val = ENV["lines2redis_#{name}"] || ENV[name]
+   val = val.to_i() if !val.nil? && default.is_a?(Fixnum)
+   Object.const_set(name.upcase, val || default)
+ end
+
+ const(:REDIS_QUEUE, 'jsonlogs')
+ const(:FLUSH_SIZE, 100)
+ const(:FLUSH_INTERVAL, 30) # seconds
+
+ config = {}
+ [:host, :port, :db].each_with_index do |s, i|
+   config[s] = ARGV[i] if ARGV[i]
+ end
+ @redis = Redis.new(config)
+ ARGV.clear()
+
+ @lock = Mutex.new
+ @queue = []
+
+ def main
+   Thread.new do
+     loop do
+       sleep(FLUSH_INTERVAL)
+       @lock.synchronize do
+         flush()
+       end
+     end
+   end
+   while line = gets()
+     line.chomp!
+     @lock.synchronize do
+       @queue << line
+       flush() if @queue.size >= FLUSH_SIZE
+     end
+   end
+   flush()
+ end
+
+ def flush
+   return if @queue.empty?
+   if @queue.size >= FLUSH_SIZE*2
+     @log.warn('Aborting, dumping queued log messages to stdout!')
+     @queue.each { |msg| puts msg }
+     raise "Queue has grown too big (size=#{@queue.size})!"
+   end
+   begin
+     @redis.rpush(REDIS_QUEUE, @queue)
+     @queue.clear()
+   rescue Redis::BaseConnectionError
+     @log.error($!)
+   end
+ end
+
+ begin
+   main()
+ ensure
+   @log.warn("Terminating! Flushing the queue...")
+   flush()
+ end
+
+
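For reference, `lines2redis` takes the Redis host, port, and DB number as
positional arguments, and its tuning knobs come from the environment (either
the plain lowercase name or the same name prefixed with `lines2redis_`, which
takes precedence). A sketch of an invocation, with a placeholder producer and
host name:

    some-json-log-producer |
      flush_size=50 flush_interval=10 \
      lines2redis redis.example.com 6379 0

Records accumulate in the Redis list named by `redis_queue` (`jsonlogs` by
default) until a `redis2es` consumer drains them.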
data/bin/nginxlog2json ADDED
@@ -0,0 +1,58 @@
+ #!/usr/bin/env ruby
+
+ require 'date'
+ require 'log2json'
+ require 'log2json/filters/nginx_access'
+ # Require your log2json filter gems here if needed
+
+ # FILTERS will be { type1=>[ filterX, filterY, ...], type2=>[...], ... }
+ FILTERS = Hash.new { |hash, key| hash[key] = [] }
+
+ # This method will be used later by the GrokFilter to process the JSON log records.
+ def nginx_error_log_proc(record)
+   return nil if record.nil? # return nil if the record doesn't match our regexes
+   fields = record['@fields']
+   record['@timestamp'] = DateTime.strptime(fields['datetime'], '%Y/%m/%d %T')
+   fields.delete('datetime')
+   record['@tags'] << 'nginx' << 'http'
+   record
+ end
+
+ # Configure log filters
+ [
+   # You can subclass the GrokFilter and use it here, like this NginxAccessLogFilter
+   ::Log2Json::Filters::NginxAccessLogFilter.new('NginxAccessLogFilter'),
+
+   # Or you can configure the GrokFilter directly here, like this:
+   ::Log2Json::Filters::GrokFilter.new(
+     'nginx-error',          # type
+     'NginxErrorLogFilter',  # name
+
+     # list of Grok regexes
+     ['%{DATESTAMP:datetime} \[(?<level>[^\]]+)\] %{NUMBER:pid}#%{NUMBER:tid}: %{GREEDYDATA:message}'],
+
+     &method(:nginx_error_log_proc)
+   ),
+
+   # You can add more filters if needed
+
+ ].each { |filter| FILTERS[filter.type] << filter }
+
+ # Set up the file-path-to-type map
+ SPITTER = ::Log2Json::Spitter.new(STDIN,
+   ENV['type'] || {
+     %r</access\.log$> => 'nginx-access',
+     %r</error\.log$> => 'nginx-error',
+     nil => 'unknown' # the default type to apply when there's no match.
+   },
+   # So, e.g., if a log record comes from /var/log/nginx/access.log, then it will be marked with type nginx-access
+   # and all filters of that type will process such a log record.
+
+   # Give users the ability to set tags and fields via ENV vars that will apply to ALL log records.
+   :TAGS => ENV['tags'] || '',
+   :FIELDS => ENV['fields'] || '',
+ )
+
+
+ # Start processing log lines
+ ::Log2Json.main(FILTERS, :spitter => SPITTER)
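An illustrative invocation (host names are placeholders), mirroring the README
example but with nginx logs as input; the path-to-type map above decides which
filters see each record:

    tail-log /var/log/nginx/access.log /var/log/nginx/error.log |
      nginxlog2json |
      lines2redis host.to.redis.server 6379 0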
data/bin/redis2es ADDED
@@ -0,0 +1,146 @@
+ #!/usr/bin/env ruby
+
+
+ require 'logger'
+ require 'date'
+ require 'net/http'
+ require 'json'
+ require 'redis'
+ require 'persistent_http' # 1.0.5
+ # depends on gene_pool 1.3.0
+
+ def show_usage_and_exit(status=1)
+   puts "Usage: #{$0} <elasticsearch_host> [port]"
+   exit status
+ end
+
+ ES_HOST = ARGV[0] || show_usage_and_exit
+ ES_PORT = ARGV[1] || 9200
+
+ def const(name, default)
+   name = name.to_s.downcase
+   val = ENV["redis2es_#{name}"] || ENV[name]
+   val = val.to_i() if !val.nil? && default.is_a?(Fixnum)
+   Object.const_set(name.upcase, val || default)
+ end
+
+ # These constants can be overridden via environment variables in lower case, or
+ # prefixed with redis2es_ (the latter has higher precedence).
+ # E.g., flush_size=100 or redis2es_flush_size=100
+
+ const(:REDIS_HOST, 'localhost')
+ const(:REDIS_PORT, 6379)
+
+ # name of the redis list that queues the incoming log messages.
+ const(:REDIS_QUEUE, 'jsonlogs')
+
+ # the encoding assumed for the log records.
+ const(:LOG_ENCODING, 'UTF-8')
+
+ # name of the ES index for the logs; used as the format string for DateTime#strftime.
+ const(:LOG_INDEX_NAME, 'log2json-%Y.%m.%d')
+
+ # max number of log records allowed in the queue.
+ const(:FLUSH_SIZE, 200)
+
+ # flush the queue roughly every FLUSH_TIMEOUT seconds.
+ # This value must be >= 2 and it must be a multiple of 2.
+ const(:FLUSH_TIMEOUT, 60)
+ if FLUSH_TIMEOUT < 2 or FLUSH_TIMEOUT % 2 != 0
+   STDERR.write("Invalid FLUSH_TIMEOUT=#{FLUSH_TIMEOUT}\n")
+   exit 1
+ end
+
+ LOG = Logger.new(STDOUT)
+ HTTP_LOG = Logger.new(STDOUT)
+ HTTP_LOG.level = Logger::WARN
+
+ @@http = PersistentHTTP.new(
+   :name => 'redis2es_http_client',
+   :logger => HTTP_LOG,
+
+   # this script is the only consumer of the pool and it uses only one connection at a time.
+   :pool_size => 1,
+   # Note: if the ES server can handle the load, we might be able to run multiple instances
+   # of this script to process the queue and send logs to ES with multiple connections.
+
+   # we'll retry posting to ES, since having duplicate data in ES is better than not having the data.
+   :force_retry => true,
+   :url => "http://#{ES_HOST}:#{ES_PORT}"
+ )
+
+ @queue = []
+ @redis = Redis.new(host: REDIS_HOST, port: REDIS_PORT)
+
+ def flush_queue
+   if not @queue.empty?
+     req = Net::HTTP::Post.new('/_bulk')
+     req.body = @queue.join("\n") + "\n"
+     response = nil
+     begin
+       response = @@http.request(req)
+     ensure
+       if response.nil? or response.code != '200'
+         LOG.error(response.body) if not response.nil?
+         #FIXME: might be a good idea to push the undelivered log records to another queue in redis.
+         LOG.warn("Failed sending bulk request (#{@queue.size} records) to ES! Logging the request body instead.")
+         LOG.info("Failed request body:\n" + req.body)
+       end
+     end
+     @queue.clear()
+   end
+ end
+
+ # Determines the name of the index in Elasticsearch from the given log record's timestamp.
+ def es_index(tstamp)
+   begin
+     t = DateTime.parse(tstamp)
+   rescue ArgumentError
+     LOG.warn("Failed parsing timestamp: #{tstamp}")
+     t = DateTime.now
+   end
+   t.strftime(LOG_INDEX_NAME)
+ end
+
+ def enqueue(logstr)
+   #FIXME: might be safer to do a transcoding with replacements for invalid or undefined characters.
+   log = JSON.load(logstr.force_encoding(LOG_ENCODING))
+
+   # add a header for each entry according to http://www.elasticsearch.org/guide/reference/api/bulk/
+   @queue << {"index" => {"_index" => es_index(log["@timestamp"]), "_type" => log["@type"]}}.to_json
+   @queue << log.to_json
+ end
+
+ def main
+   time_start = Time.now
+   loop do
+     # wait for input from the redis queue
+     ret = @redis.blpop(REDIS_QUEUE, timeout: FLUSH_TIMEOUT/2)
+     enqueue(ret[1]) if ret != nil
+
+     # try to queue up to FLUSH_SIZE
+     while @queue.size < FLUSH_SIZE do
+       # Logstash's redis input actually uses a Lua script to do the lpop in one request,
+       # but let's keep it simple and stupid first here.
+       body = @redis.lpop(REDIS_QUEUE)
+       break if body.nil?
+       enqueue(body)
+     end
+
+     # flush when the queue is full (>= guards against overshooting FLUSH_SIZE) or when time is up.
+     if @queue.size >= FLUSH_SIZE or (Time.now - time_start) >= FLUSH_TIMEOUT
+       time_start = Time.now # reset timer upon a flush or timeout
+       flush_queue()
+     end
+
+   end # loop
+ end
+
+ begin
+   main()
+ ensure
+   LOG.warn("Terminating! Flushing the queue (size=#{@queue.size})...")
+   flush_queue()
+ end
+
+
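For each record pulled off the Redis list, `enqueue` pushes the two lines that
Elasticsearch's bulk API expects: an action line naming the target index and
type, then the record itself. A flushed request body therefore looks roughly
like this (values are illustrative):

    POST /_bulk
    {"index":{"_index":"log2json-2013.05.01","_type":"syslog"}}
    {"@timestamp":"2013-05-01T12:34:56+00:00","@type":"syslog","@tags":[],"@fields":{}}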
data/bin/syslog2json ADDED
@@ -0,0 +1,23 @@
+ #!/usr/bin/env ruby
+
+ require 'log2json'
+ require 'log2json/filters/syslog'
+ # Require your log2json filter gems here if needed
+
+ # Configure log filters
+ # FILTERS will be { type1=>[ filterX, filterY, ...], type2=>[...], ... }
+ FILTERS = Hash.new { |hash, key| hash[key] = [] }
+
+ [
+   # As a demo, we set up the built-in syslog filter here.
+   ::Log2Json::Filters::SyslogFilter.new('SysLogFilter'),
+
+   # You can add more filters if needed
+
+ ].each { |filter| FILTERS[filter.type] << filter }
+
+ # Assume the type of the input logs
+ ENV['type'] = 'syslog'
+
+ # Start processing log lines
+ ::Log2Json.main(FILTERS)
data/bin/tail ADDED
Binary file
data/bin/tail-log ADDED
@@ -0,0 +1,7 @@
+ #!/usr/bin/env ruby
+ #
+ # Wrapper for running the tail-log.sh shell script.
+ require 'log2json'
+ loc = Log2Json.method(:main).source_location[0]
+ loc = File.expand_path(File.join(loc, '..', '..', 'bin', 'tail-log.sh'))
+ exec(loc, *ARGV)
data/bin/tail-log.sh ADDED
@@ -0,0 +1,67 @@
+ #!/bin/bash
+ #
+ set -e
+
+ # Find out the absolute path to the tail utility.
+ # This is a patched version of the tail utility from GNU coreutils-8.13, compiled for Ubuntu 12.04 LTS.
+ # The difference is that if a header will be shown (i.e., with -v or when multiple files are specified),
+ # it will also print "==> file.name <== [event]" to stdout whenever a file truncation or a new file is
+ # detected. [event] will be one of "[new_file]" or "[truncated]".
+ TAIL=$(
+ ruby -- - <<'EOF'
+ require 'log2json'
+ loc = Log2Json.method(:main).source_location[0]
+ puts File.expand_path(File.join(loc, '..', '..', 'bin', 'tail'))
+ EOF
+ )
+
+ # Turn each path argument into an absolute path.
+ OIFS=$IFS
+ IFS="
+ "
+ set -- $(ruby -e "ARGV.each {|p| puts File.absolute_path(p)}" "$@")
+ IFS=$OIFS
+
+ # This is where we store the files that track the positions of the
+ # files we are tailing.
+ SINCEDB_DIR=${SINCEDB_DIR:-~/.tail-log}
+ mkdir -p "$SINCEDB_DIR" || true
+
+
+ # Helper to build the arguments to tail.
+ # Specifically, we expect the use of GNU tail as found in GNU coreutils.
+ # It allows us to follow (with -F) files across rotations or truncations.
+ # It also lets us start tailing from the n-th line of a file.
+ build_tail_args() {
+   local i=${#TAIL_ARGS[*]}
+   local fpath t line sincedb_path
+   for fpath in "$@"
+   do
+     sincedb_path=$SINCEDB_DIR/$fpath.since
+     if [ -r "$sincedb_path" ]; then
+       read line < "$sincedb_path"
+       t=($line)
+       # if the inode number is unchanged and the current file size is not smaller,
+       # then we start tailing from 1 + the line number recorded in the sincedb.
+       if [[ ${t[0]} == $(stat -c "%i" "$fpath") && ${t[1]} -le $(stat -c "%s" "$fpath") ]]; then
+         TAIL_ARGS[$((i++))]="-n+$((t[2] + 1))"
+         # tail -n+N means start tailing from the N-th line of the file,
+         # and we're even allowed to specify different -n+N for different files!
+         TAIL_ARGS[$((i++))]=$fpath
+         continue
+       fi
+     fi
+     TAIL_ARGS[$((i++))]="-n+$(($(wc -l "$fpath" | cut -d' ' -f1) + 1))"
+     # Note: we can't just ask tail to seek to the end here (i.e., with -n0), since
+     # then we'd lose track of the line count.
+     # Note: if fpath doesn't exist yet, then the above evaluates to "-n+1", which
+     # is fine.
+     TAIL_ARGS[$((i++))]=$fpath
+   done
+ }
+
+ TAIL_ARGS=(-v -F)
+ build_tail_args "$@"
+
+ $TAIL "${TAIL_ARGS[@]}" | track-tails "$SINCEDB_DIR" "${TAIL_ARGS[@]}"
+
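For reference, each per-file state file lives at "$SINCEDB_DIR/<absolute file
path>.since" and, judging from the checks in build_tail_args above, holds a
single line of the form "<inode> <file size in bytes> <lines read>". An
illustrative entry:

    $ cat ~/.tail-log/var/log/syslog.since
    1048601 73256 912

Given that entry, and an unchanged inode and a size no smaller than recorded,
the script would pass "-n+913 /var/log/syslog" to tail, resuming right after
the last line previously read.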