log2json 0.1.5

data/Gemfile ADDED
@@ -0,0 +1,2 @@
+ source "https://rubygems.org"
+ gemspec
data/Gemfile.lock ADDED
@@ -0,0 +1,24 @@
+ PATH
+   remote: .
+   specs:
+     log2json (0.1.0)
+       jls-grok (~> 0.10.10)
+       persistent_http (~> 1.0.5)
+       redis (~> 3.0.2)
+
+ GEM
+   remote: https://rubygems.org/
+   specs:
+     cabin (0.5.0)
+     gene_pool (1.3.0)
+     jls-grok (0.10.10)
+       cabin (~> 0.5.0)
+     persistent_http (1.0.5)
+       gene_pool (>= 1.3)
+     redis (3.0.3)
+
+ PLATFORMS
+   ruby
+
+ DEPENDENCIES
+   log2json!
data/README ADDED
@@ -0,0 +1,66 @@
+ Log2json lets you read, filter and send logs as JSON objects via Unix pipes.
+ It is inspired by Logstash, and is meant to be compatible with it at the JSON
+ event/record level so that it can easily work with Kibana.
+
+ Reading logs is done via a shell script (e.g., `tail`) running in its own process.
+ You then configure (see the `syslog2json` or the `nginxlog2json` scripts for
+ examples) and run your filters in Ruby using the `Log2Json` module and its
+ helper classes.
+
+ `Log2Json` reads logs from stdin (one log record per line), parses the log
+ lines into JSON records, and then serializes and writes the records to stdout,
+ which can then be piped to another process for further processing or for
+ shipping them somewhere else.
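+
+ For illustration only, a serialized record looks roughly like the following
+ (wrapped here for readability; on stdout it is a single line). The exact
+ fields depend on your filters, but the scripts in bin/ use `@timestamp`,
+ `@type`, `@tags` and `@fields`:
+
+     {"@timestamp":"2013-04-01T12:34:56+00:00","@type":"nginx-error",
+      "@tags":["nginx","http"],"@fields":{"level":"error","pid":"1234"}}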
+
+ Currently, Log2json ships with a `tail-log` script that can be run as the input
+ process. It's the same as using the Linux `tail` utility with the `-v -F`
+ options, except that it also tracks the positions (as the number of lines read
+ from the beginning of each file) in a few files on the file system, so that if
+ the input process is interrupted, it can continue reading from where it left
+ off the next time the same files are followed. This feature is similar to the
+ sincedb feature of Logstash's file input.
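+
+ The position files live under `~/.tail-log` by default (see SINCEDB_DIR in
+ `bin/tail-log.sh`). Judging from how `tail-log.sh` reads them, each one holds
+ the tracked file's inode number, its size, and the number of lines read so
+ far, e.g. (illustrative values):
+
+     $ cat ~/.tail-log/var/log/syslog.since
+     393219 104857 2048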
+
+ Note: If you don't need the tracking feature (i.e., you are fine with always
+ tailing from the end of the file with `-v -F -n0`), then you can just use the
+ `tail` utility that comes with your Linux distribution (or, more specifically,
+ the `tail` from GNU coreutils). Other versions of the `tail` utility may also
+ work, but they have not been tested. The input protocol expected by Log2json
+ is very simple and is documented in the source code.
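+
+ Roughly, the expected input is just what `tail -v` produces: a header line of
+ the form `==> /path/to/file <==` whenever the source file changes, followed by
+ the raw log lines for that file (the patched `tail` described below may also
+ append a `[new_file]` or `[truncated]` marker to the header). For example:
+
+     ==> /var/log/syslog <==
+     Apr  1 12:34:56 myhost sshd[1234]: Accepted publickey for deploy
+     ==> /var/log/auth.log <== [new_file]
+     Apr  1 12:35:00 myhost sudo: pam_unix(sudo:session): session opened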
+
+ ** The `tail-log` script uses a patched version of `tail` from the GNU coreutils
+ package. A binary of the `tail` utility compiled for Ubuntu 12.04 LTS is
+ included with the Log2json gem. If the binary doesn't work for your
+ distribution, then you'll need to get GNU coreutils-8.13, apply the patch (it
+ can be found in the src/ directory of the installed gem), and then replace
+ the bin/tail binary in the directory of the installed gem with your version
+ of the binary. **
+
+ P.S. If you know of a way to configure and compile ONLY the tail program in
+ coreutils, please let me know! The reason I'm not building tail after gem
+ installation is that it takes too long to configure && make, because that
+ actually builds every utility in coreutils.
+
+
+ For shipping logs to Redis, there's the `lines2redis` script that can be used as
+ the output process in the pipe. For shipping logs from Redis to ElasticSearch,
+ Log2json provides a `redis2es` script.
+
+ Finally, here's an example of Log2json in action:
+
+ From a client machine:
+
+     tail-log /var/log/{sys,mail}log /var/log/{kern,auth}.log | syslog2json |
+       redis_queue=jsonlogs \
+       flush_size=20 \
+       flush_interval=30 \
+       lines2redis host.to.redis.server 6379 0  # use redis DB 0
+
+
+ On the Redis server:
+
+     redis_queue=jsonlogs redis2es host.to.es.server
+
data/bin/lines2redis ADDED
@@ -0,0 +1,73 @@
+ #!/usr/bin/env ruby
+ #
+ # A simple script that reads lines from STDIN and dumps them to a Redis list.
+
+ require 'thread'
+ require 'redis'
+ require 'logger'
+
+ @log = Logger.new(STDOUT)
+
+ def const(name, default)
+   name = name.to_s.downcase
+   val = ENV["lines2redis_#{name}"] || ENV[name]
+   val = val.to_i() if !val.nil? && default.is_a?(Fixnum)
+   Object.const_set(name.upcase, val || default)
+ end
+
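+ # These constants can be overridden via environment variables, either bare or
+ # prefixed with "lines2redis_" (the prefixed form takes precedence), e.g. a
+ # hypothetical invocation: flush_size=50 flush_interval=10 lines2redis localhost 6379 0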
+ const(:REDIS_QUEUE, 'jsonlogs')
+ const(:FLUSH_SIZE, 100)
+ const(:FLUSH_INTERVAL, 30) # seconds
+
+ config = {}
+ [:host, :port, :db].each_with_index do |s, i|
+   config[s] = ARGV[i] if ARGV[i]
+ end
+ @redis = Redis.new(config)
+ ARGV.clear()
+
+ @lock = Mutex.new
+ @queue = []
+
+ def main
+   Thread.new do
+     loop do
+       sleep(FLUSH_INTERVAL)
+       @lock.synchronize do
+         flush()
+       end
+     end
+   end
+   while line = gets()
+     line.chomp!
+     @lock.synchronize do
+       @queue << line
+       flush() if @queue.size >= FLUSH_SIZE
+     end
+   end
+   flush()
+ end
+
+ def flush
+   return if @queue.empty?
+   if @queue.size >= FLUSH_SIZE*2
+     @log.warn('Aborting, dumping queued log messages to stdout!')
+     @queue.each { |msg| puts msg }
+     raise "Queue has grown too big(size=#{@queue.size})!"
+   end
+   begin
+     @redis.rpush(REDIS_QUEUE, @queue)
+     @queue.clear()
+   rescue Redis::BaseConnectionError
+     @log.error($!)
+   end
+ end
+
+ begin
+   main()
+ ensure
+   @log.warn("Terminating! Flushing the queue...")
+   flush()
+ end
data/bin/nginxlog2json ADDED
@@ -0,0 +1,58 @@
+ #!/usr/bin/env ruby
+
+ require 'date'
+ require 'log2json'
+ require 'log2json/filters/nginx_access'
+ # Require your log2json filter gems here if needed
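+
+ # Hypothetical usage, mirroring the syslog pipeline shown in the README
+ # (host names and paths are placeholders):
+ #
+ #   tail-log /var/log/nginx/access.log /var/log/nginx/error.log | nginxlog2json |
+ #     lines2redis host.to.redis.server 6379 0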
+
+ # FILTERS will be { type1 => [filterX, filterY, ...], type2 => [...], ... }
+ FILTERS = Hash.new { |hash, key| hash[key] = [] }
+
+ # This method will be used later by the GrokFilter to process the JSON log records.
+ def nginx_error_log_proc(record)
+   return nil if record.nil? # return nil if the record doesn't match our regexes
+   fields = record['@fields']
+   record['@timestamp'] = DateTime.strptime(fields['datetime'], '%Y/%m/%d %T')
+   fields.delete('datetime')
+   record['@tags'] << 'nginx' << 'http'
+   record
+ end
+
+ # Configure log filters
+ [
+   # You can subclass the GrokFilter and use it here, like this NginxAccessLogFilter:
+   ::Log2Json::Filters::NginxAccessLogFilter.new('NginxAccessLogFilter'),
+
+   # Or you can configure the GrokFilter directly, like this:
+   ::Log2Json::Filters::GrokFilter.new(
+     'nginx-error',          # type
+     'NginxErrorLogFilter',  # name
+
+     # list of Grok regexes
+     ['%{DATESTAMP:datetime} \[(?<level>[^\]]+)\] %{NUMBER:pid}#%{NUMBER:tid}: %{GREEDYDATA:message}'],
+
+     &method(:nginx_error_log_proc)
+   ),
+
+   # You can add more filters if needed
+
+ ].each { |filter| FILTERS[filter.type] << filter }
+
+ # Set up the file path-to-type map
+ SPITTER = ::Log2Json::Spitter.new(STDIN,
+   ENV['type'] || {
+     %r</access\.log$> => 'nginx-access',
+     %r</error\.log$> => 'nginx-error',
+     nil => 'unknown' # set up a default type to apply when there are no matches.
+   },
+   # So, e.g., if a log record comes from /var/log/nginx/access.log, then it will be marked
+   # with type nginx-access, and all filters of that type will process such a log record.
+
+   # Give users the ability to set tags and fields via ENV vars that will apply to ALL log records.
+   :TAGS => ENV['tags'] || '',
+   :FIELDS => ENV['fields'] || '',
+ )
+
+ # Start processing log lines
+ ::Log2Json.main(FILTERS, :spitter => SPITTER)
data/bin/redis2es ADDED
@@ -0,0 +1,146 @@
+ #!/usr/bin/env ruby
+
+
+ require 'logger'
+ require 'date'
+ require 'net/http'
+ require 'json'
+ require 'redis'
+ require 'persistent_http' # 1.0.5
+ # depends on gene_pool 1.3.0
+
+ def show_usage_and_exit(status=1)
+   puts "Usage: #{$0} <elasticsearch_host> [port]"
+   exit status
+ end
+
+ ES_HOST = ARGV[0] || show_usage_and_exit
+ ES_PORT = ARGV[1] || 9200
+
+ def const(name, default)
+   name = name.to_s.downcase
+   val = ENV["redis2es_#{name}"] || ENV[name]
+   val = val.to_i() if !val.nil? && default.is_a?(Fixnum)
+   Object.const_set(name.upcase, val || default)
+ end
+
+ # These constants can be overridden via environment variables in lower case, or
+ # prefixed with redis2es_ (the latter has higher precedence).
+ # E.g., flush_size=100 or redis2es_flush_size=100
+
+ const(:REDIS_HOST, 'localhost')
+ const(:REDIS_PORT, 6379)
+
+ # name of the redis list that queues the incoming log messages.
+ const(:REDIS_QUEUE, 'jsonlogs')
+
+ # the encoding assumed for the log records.
+ const(:LOG_ENCODING, 'UTF-8')
+
+ # name of the ES index for the logs. It will be passed to DateTime#strftime.
+ const(:LOG_INDEX_NAME, 'log2json-%Y.%m.%d')
+
+ # max number of log records allowed in the queue.
+ const(:FLUSH_SIZE, 200)
+
+ # flush the queue roughly every FLUSH_TIMEOUT seconds.
+ # This value must be >= 2 and it must be a multiple of 2.
+ const(:FLUSH_TIMEOUT, 60)
+ if FLUSH_TIMEOUT < 2 or FLUSH_TIMEOUT % 2 != 0
+   STDERR.write("Invalid FLUSH_TIMEOUT=#{FLUSH_TIMEOUT}\n")
+   exit 1
+ end
+
+ LOG = Logger.new(STDOUT)
+ HTTP_LOG = Logger.new(STDOUT)
+ HTTP_LOG.level = Logger::WARN
+
+ @@http = PersistentHTTP.new(
+   :name => 'redis2es_http_client',
+   :logger => HTTP_LOG,
+
+   # this script is the only consumer of the pool and it uses only one connection at a time.
+   :pool_size => 1,
+   # Note: if the ES server can handle the load, we might be able to run multiple instances
+   # of this script to process the queue and send logs to ES over multiple connections.
+
+   # we'll retry posting to ES, since having duplicate data in ES is better than not having it at all.
+   :force_retry => true,
+   :url => "http://#{ES_HOST}:#{ES_PORT}"
+ )
+
+ @queue = []
+ @redis = Redis.new(host: REDIS_HOST, port: REDIS_PORT)
+
+ def flush_queue
+   if not @queue.empty?
+     req = Net::HTTP::Post.new('/_bulk')
+     req.body = @queue.join("\n") + "\n"
+     response = nil
+     begin
+       response = @@http.request(req)
+     ensure
+       if response.nil? or response.code != '200'
+         LOG.error(response.body) if not response.nil?
+         #FIXME: might be a good idea to push the undelivered log records to another queue in redis.
+         LOG.warn("Failed sending bulk request(#{@queue.size} records) to ES! Logging the request body instead.")
+         LOG.info("Failed request body:\n" + req.body)
+       end
+     end
+     @queue.clear()
+   end
+ end
+
+ # Determines the name of the index in ElasticSearch from the given log record's timestamp.
+ def es_index(tstamp)
+   begin
+     t = DateTime.parse(tstamp)
+   rescue ArgumentError
+     LOG.warn("Failed parsing timestamp: #{tstamp}")
+     t = DateTime.now
+   end
+   t.strftime(LOG_INDEX_NAME)
+ end
+
+ def enqueue(logstr)
+   #FIXME: might be safer to do a transcoding with replacements for invalid or undefined characters.
+   log = JSON.load(logstr.force_encoding(LOG_ENCODING))
+
+   # add a header for each entry according to http://www.elasticsearch.org/guide/reference/api/bulk/
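+   # For a single record, the two lines pushed below end up in the bulk request
+   # body roughly like this (illustrative values):
+   #
+   #   {"index":{"_index":"log2json-2013.04.01","_type":"syslog"}}
+   #   {"@timestamp":"2013-04-01T12:34:56+00:00","@type":"syslog", ...}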
+   @queue << {"index" => {"_index" => es_index(log["@timestamp"]), "_type" => log["@type"]}}.to_json
+   @queue << log.to_json
+ end
+
+ def main
+   time_start = Time.now
+   loop do
+     # wait for input from the redis queue
+     ret = @redis.blpop(REDIS_QUEUE, timeout: FLUSH_TIMEOUT/2)
+     enqueue(ret[1]) if ret != nil
+
+     # try to queue up to FLUSH_SIZE
+     while @queue.size < FLUSH_SIZE do
+       # Logstash's redis input actually uses a Lua script to do the lpop in one request,
+       # but let's keep it simple and stupid here for now.
+       body = @redis.lpop(REDIS_QUEUE)
+       break if body.nil?
+       enqueue(body)
+     end
+
+     # flush when the queue is full or when time is up.
+     if @queue.size == FLUSH_SIZE or (Time.now - time_start) >= FLUSH_TIMEOUT
+       time_start = Time.now # reset the timer upon a flush or timeout
+       flush_queue()
+     end
+
+   end # loop
+ end
+
+ begin
+   main()
+ ensure
+   LOG.warn("Terminating! Flushing the queue(size=#{@queue.size})...")
+   flush_queue()
+ end
data/bin/syslog2json ADDED
@@ -0,0 +1,23 @@
+ #!/usr/bin/env ruby
+
+ require 'log2json'
+ require 'log2json/filters/syslog'
+ # Require your log2json filter gems here if needed
+
+ # Configure log filters
+ # FILTERS will be { type1 => [filterX, filterY, ...], type2 => [...], ... }
+ FILTERS = Hash.new { |hash, key| hash[key] = [] }
+
+ [
+   # As a demo, we set up the built-in syslog filter here.
+   ::Log2Json::Filters::SyslogFilter.new('SysLogFilter'),
+
+   # You can add more filters if needed
+
+ ].each { |filter| FILTERS[filter.type] << filter }
+
+ # Assume the type of the input logs
+ ENV['type'] = 'syslog'
+
+ # Start processing log lines
+ ::Log2Json.main(FILTERS)
data/bin/tail ADDED
Binary file
data/bin/tail-log ADDED
@@ -0,0 +1,7 @@
+ #!/usr/bin/env ruby
+ #
+ # Wrapper for running the tail-log.sh shell script.
+ require 'log2json'
+ loc = Log2Json.method(:main).source_location[0]
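+ # source_location[0] is the path of the Ruby file that defines Log2Json.main
+ # (presumably lib/log2json.rb in the installed gem), so joining it with '..',
+ # '..' and 'bin' resolves to the gem's bin/ directory.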
+ loc = File.expand_path(File.join(loc, '..', '..', 'bin', 'tail-log.sh'))
+ exec(loc, *ARGV)
data/bin/tail-log.sh ADDED
@@ -0,0 +1,67 @@
+ #!/bin/bash
+ #
+ set -e
+
+ # Find out the absolute path to the tail utility.
+ # This is a patched version of the tail utility in GNU coreutils-8.13 compiled for Ubuntu 12.04 LTS.
+ # The difference is that if a header will be shown (i.e., with -v or when multiple files are specified),
+ # it will also print "==> file.name <== [event]" to stdout whenever a file truncation or a new file is
+ # detected. [event] will be one of "[new_file]" or "[truncated]".
+ TAIL=$(
+ ruby -- - <<'EOF'
+ require 'log2json'
+ loc = Log2Json.method(:main).source_location[0]
+ puts File.expand_path(File.join(loc, '..', '..', 'bin', 'tail'))
+ EOF
+ )
+
+ # Turn each path argument into an absolute path.
+ OIFS=$IFS
+ IFS="
+ "
+ set -- $(ruby -e "ARGV.each {|p| puts File.absolute_path(p)}" "$@")
+ IFS=$OIFS
+
+ # This is where we store the files that track the positions of the
+ # files we are tailing.
+ SINCEDB_DIR=${SINCEDB_DIR:-~/.tail-log}
+ mkdir -p "$SINCEDB_DIR" || true
+
+
+ # Helper to build the arguments to tail.
+ # Specifically, we expect the use of GNU tail as found in GNU coreutils.
+ # It allows us to follow (with -F) files across rotations or truncations.
+ # It also lets us start tailing from the n-th line of a file.
+ build_tail_args() {
+     local i=${#TAIL_ARGS[*]}
+     local fpath t line sincedb_path
+     for fpath in "$@"
+     do
+         sincedb_path=$SINCEDB_DIR/$fpath.since
+         if [ -r "$sincedb_path" ]; then
+             read line < "$sincedb_path"
+             t=($line)
+             # If the inode number is unchanged and the current file size is not smaller,
+             # then we start tailing from 1 + the line number recorded in the sincedb.
+             if [[ ${t[0]} == $(stat -c "%i" "$fpath") && ${t[1]} -le $(stat -c "%s" "$fpath") ]]; then
+                 TAIL_ARGS[$((i++))]="-n+$((t[2] + 1))"
+                 # tail -n+N means start tailing from the N-th line of the file,
+                 # and we're even allowed to specify different -n+N for different files!
+                 TAIL_ARGS[$((i++))]=$fpath
+                 continue
+             fi
+         fi
+         TAIL_ARGS[$((i++))]="-n+$(($(wc -l "$fpath" | cut -d' ' -f1) + 1))"
+         # Note: we can't just ask tail to seek to the end here (i.e., with -n0), since
+         # then we'd lose track of the line count.
+         # Note: if fpath doesn't exist yet, then the above evaluates to "-n+1", which
+         # is fine.
+         TAIL_ARGS[$((i++))]=$fpath
+     done
+ }
+
+ TAIL_ARGS=(-v -F)
+ build_tail_args "$@"
+
+ $TAIL "${TAIL_ARGS[@]}" | track-tails "$SINCEDB_DIR" "${TAIL_ARGS[@]}"