logstash-output-webhdfs 0.0.1 → 0.0.2
- checksums.yaml +4 -4
- data/README.md +31 -76
- data/lib/logstash/outputs/webhdfs.rb +17 -10
- data/logstash-output-webhdfs.gemspec +2 -2
- data/spec/outputs/webhdfs_spec.rb +97 -16
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b62674a3d35ba63e84ddb865b846c9b89c1bd08a
+  data.tar.gz: 3e35d2d01038c0c8e57f798b13aa31f9e46614d7
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 4e1b78ca8e557ac1b76a4d4aaa79b137092f440ee5e183e2347be1a13b9c36dbb852349aa4e3be59641f89e4cc2167918704e75a383fc7de294fbe0ccf874412
+  data.tar.gz: 2db171e7d65de252e3eedf7b301a20e234a88b53b0c62c7b5bf80d25fbc44d951ea123274256dd312fd9710cd80c686d1ea4cf87206a184aa042905ea264581b
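(Editor's note: the checksums above are the standard rubygems integrity digests of the gem's two inner archives. As an illustration only, not tied to this gem's actual files, both digest types come from Ruby's stdlib:)

```ruby
# Sketch: how SHA1/SHA512 digests like the entries in checksums.yaml are
# computed. The string "abc" stands in for the archive bytes.
require 'digest'

sha1   = Digest::SHA1.hexdigest("abc")
sha512 = Digest::SHA512.hexdigest("abc")

puts sha1           # a9993e364706816aba3e25717850c26c9cd0d89d
puts sha512.length  # 128 hex characters, matching the SHA512 entries above
```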
data/README.md
CHANGED
@@ -1,86 +1,41 @@
-
+logstash-webhdfs
+================
 
-
+A logstash plugin to store events via webhdfs.
 
-
-
-## Documentation
-
-Logstash provides infrastructure to automatically generate documentation for this plugin. We use the asciidoc format to write documentation so any comments in the source code will be first converted into asciidoc and then into html. All plugin documentation are placed under one [central location](http://www.elasticsearch.org/guide/en/logstash/current/).
-
-- For formatting code or config example, you can use the asciidoc `[source,ruby]` directive
-- For more asciidoc formatting tips, see the excellent reference here https://github.com/elasticsearch/docs#asciidoc-guide
-
-## Need Help?
-
-Need help? Try #logstash on freenode IRC or the logstash-users@googlegroups.com mailing list.
-
-## Developing
-
-### 1. Plugin Developement and Testing
-
-#### Code
-- To get started, you'll need JRuby with the Bundler gem installed.
-
-- Create a new plugin or clone and existing from the GitHub [logstash-plugins](https://github.com/logstash-plugins) organization. We also provide [example plugins](https://github.com/logstash-plugins?query=example).
-
-- Install dependencies
-```sh
-bundle install
-```
-
-#### Test
-
-- Update your dependencies
+Tested with v1.3.3, v1.4.0 and 1.5.0.
 
-
-bundle install
-```
-
-- Run tests
-
-```sh
-bundle exec rspec
-```
-
-### 2. Running your unpublished Plugin in Logstash
-
-#### 2.1 Run in a local Logstash clone
-
-- Edit Logstash `Gemfile` and add the local plugin path, for example:
-```ruby
-gem "logstash-filter-awesome", :path => "/your/local/logstash-filter-awesome"
-```
-- Install plugin
-```sh
-bin/plugin install --no-verify
-```
-- Run Logstash with your plugin
-```sh
-bin/logstash -e 'filter {awesome {}}'
-```
-At this point any modifications to the plugin code will be applied to this local Logstash setup. After modifying the plugin, simply rerun Logstash.
+It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way.
 
-
+This plugin only has a mandatory dependency on the webhdfs gem from Kazuki Ohta and TAGOMORI Satoshi (@see: https://github.com/kzk/webhdfs). Optional dependencies are zlib and snappy gem.
 
-
+No jars from hadoop are needed, thus reducing configuration and compatibility problems.
 
-
-
-gem build logstash-filter-awesome.gemspec
+## Installation
+Change into your logstash install directory and execute:
 ```
-
-```sh
-bin/plugin install /your/local/plugin/logstash-filter-awesome.gem
+bin/plugin install logstash-output-webhdfs
 ```
-- Start Logstash and proceed to test the plugin
-
-## Contributing
 
-
-
-Programming is not a required skill. Whatever you've seen about open source and maintainers or community members saying "send patches or die" - you will not see that here.
-
-It is more important to the community that you are able to contribute.
+## Documentation
 
-
+Example configuration:
+
+    output {
+        webhdfs {
+            workers => 2
+            server => "your.nameno.de:14000"
+            user => "flume"
+            path => "/user/flume/logstash/dt=%{+Y}-%{+M}-%{+d}/logstash-%{+H}.log"
+            flush_size => 500
+            compression => "snappy"
+            idle_flush_time => 10
+            retry_interval => 0.5
+        }
+    }
+
+For a complete list of options, see config section in source code.
+
+This plugin has dependencies on:
+* webhdfs module @<https://github.com/kzk/webhdfs>
+* snappy module @<https://github.com/miyucy/snappy>
data/lib/logstash/outputs/webhdfs.rb
CHANGED
@@ -7,11 +7,18 @@ require "stud/buffer"
 # restapi.
 #
 # This plugin only has a mandatory dependency on the webhdfs gem from
-#
+# Kazuki Ohta and TAGOMORI Satoshi (@see: https://github.com/kzk/webhdfs).
 # Optional dependencies are zlib and snappy gem.
 # No jars from hadoop are needed, thus reducing configuration and compatibility
 # problems.
 #
+# If you get an error like:
+#
+# Max write retries reached. Exception: initialize: name or service not known {:level=>:error}
+#
+# make sure, that the hostname of your namenode is resolvable on the host running logstash. When creating/appending
+# to a file, webhdfs somtime sends a 307 TEMPORARY_REDIRECT with the HOSTNAME of the machine its running on.
+#
 # USAGE:
 # This is an example of logstash config:
 #
@@ -95,8 +102,8 @@ class LogStash::Outputs::WebHdfs < LogStash::Outputs::Base
   # How many times should we retry.
   config :retry_times, :validate => :number, :default => 5
 
-  # Compress output. One of [
-  config :
+  # Compress output. One of ['none', 'snappy', 'gzip']
+  config :compression, :validate => ["none", "snappy", "gzip"], :default => "none"
 
   # Set snappy chunksize. Only neccessary for stream format. Defaults to 32k. Max is 65536
   # @see http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt
@@ -116,13 +123,13 @@ class LogStash::Outputs::WebHdfs < LogStash::Outputs::Base
     rescue LoadError
       @logger.error("Module webhdfs could not be loaded.")
     end
-    if @
+    if @compression == "gzip"
       begin
         require "zlib"
       rescue LoadError
         @logger.error("Gzip compression selected but zlib module could not be loaded.")
       end
-    elsif @
+    elsif @compression == "snappy"
      begin
        require "snappy"
      rescue LoadError
@@ -185,15 +192,15 @@ class LogStash::Outputs::WebHdfs < LogStash::Outputs::Base
       output_files[path] << event_as_string
     end
     output_files.each do |path, output|
-      if @
+      if @compression == "gzip"
         path += ".gz"
         output = compress_gzip(output)
-      elsif @
+      elsif @compression == "snappy"
         path += ".snappy"
         if @snappy_format == "file"
           output = compress_snappy_file(output)
         elsif
-
+          output = compress_snappy_stream(output)
         end
       end
       write_tries = 0
@@ -234,7 +241,7 @@ class LogStash::Outputs::WebHdfs < LogStash::Outputs::Base
     data= data.encode(Encoding::ASCII_8BIT, "binary", :undef => :replace)
     buffer = StringIO.new('', 'w')
     buffer.set_encoding Encoding::ASCII_8BIT unless RUBY_VERSION =~ /^1\.8/
-    compressed = Snappy.deflate(
+    compressed = Snappy.deflate(data)
     buffer << [compressed.size, compressed].pack("Na*")
     buffer.string
   end
@@ -261,7 +268,7 @@ class LogStash::Outputs::WebHdfs < LogStash::Outputs::Base
       @client.append(path, data)
     rescue WebHDFS::FileNotFoundError
       # Add snappy header if format is "file".
-      if @snappy_format == "file"
+      if @compression == "snappy" and @snappy_format == "file"
         @client.create(path, get_snappy_header! + data)
       elsif
         @client.create(path, data)
data/logstash-output-webhdfs.gemspec
CHANGED
@@ -1,7 +1,7 @@
 Gem::Specification.new do |s|
 
   s.name = 'logstash-output-webhdfs'
-  s.version = '0.0.1'
+  s.version = '0.0.2'
   s.licenses = ['Apache License (2.0)']
   s.summary = "Plugin to write events to hdfs via webhdfs."
   s.description = "This gem is a logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/plugin install gemname. This gem is not a stand-alone program"
@@ -24,4 +24,4 @@ Gem::Specification.new do |s|
   s.add_runtime_dependency 'webhdfs'
   s.add_runtime_dependency 'snappy'
   s.add_development_dependency 'logstash-devutils'
-end
+end
data/spec/outputs/webhdfs_spec.rb
CHANGED
@@ -1,36 +1,45 @@
 # encoding: utf-8
 require 'logstash/devutils/rspec/spec_helper'
 require 'logstash/outputs/webhdfs'
+require 'webhdfs'
+require 'json'
 
 describe 'outputs/webhdfs' do
 
-
-
-
-
-
+  webhdfs_server = 'localhost'
+  webhdfs_port = 50070
+  webhdfs_user = 'hadoop'
+  path_to_testlog = '/user/hadoop/test.log'
+  current_logfile_name = '/user/hadoop/test.log'
+  current_config = ""
+
+  event = LogStash::Event.new(
+      'message' => 'Hello world!',
+      'source' => 'out of the blue',
+      'type' => 'generator',
+      'host' => 'localhost',
       '@timestamp' => LogStash::Timestamp.now)
-
+
+  default_config = { 'server' => webhdfs_server + ':' + webhdfs_port.to_s,
+                     'user' => webhdfs_user,
+                     'path' => path_to_testlog,
+                     'compression' => 'none' }
+
+  client = WebHDFS::Client.new(webhdfs_server, webhdfs_port, webhdfs_user)
 
   context 'when initializing' do
 
     it 'should fail to register without required values' do
-      begin
-        LogStash::Plugin.lookup("output", "webhdfs").new()
-      rescue LogStash::ConfigurationError
-        configuration_error = true
-      end
-      insist { configuration_error } == true
+      expect { LogStash::Plugin.lookup("output", "webhdfs").new() }.to raise_error(error=LogStash::ConfigurationError)
     end
 
-
-
-    it 'should register' do
+    it 'should register with default values' do
+      subject = LogStash::Plugin.lookup("output", "webhdfs").new(default_config)
       expect { subject.register }.to_not raise_error
     end
 
     it 'should have default config values' do
+      subject = LogStash::Plugin.lookup("output", "webhdfs").new(default_config)
       insist { subject.idle_flush_time } == 1
       insist { subject.flush_size } == 500
       insist { subject.open_timeout } == 30
@@ -43,6 +52,78 @@ describe 'outputs/webhdfs' do
       insist { subject.snappy_format } == 'stream'
       insist { subject.remove_at_timestamp } == true
     end
+  end
+
+  context 'when writing messages' do
+
+    before :each do
+      current_logfile_name = path_to_testlog
+      current_config = default_config.clone
+    end
+
+    it 'should match the event data' do
+      subject = LogStash::Plugin.lookup("output", "webhdfs").new(current_config)
+      subject.register
+      subject.receive(event)
+      subject.teardown
+      insist { client.read(current_logfile_name).strip } == event.to_json
+    end
+
+    it 'should match the configured pattern' do
+      current_config['message_format'] = '%{message} came %{source}.'
+      subject = LogStash::Plugin.lookup("output", "webhdfs").new(current_config)
+      subject.register
+      subject.receive(event)
+      subject.teardown
+      insist { client.read(current_logfile_name).strip } == 'Hello world! came out of the blue.'
+    end
+
+    # Hive does not like a leading "@", but we need @timestamp for path calculation.
+    it 'should remove the @timestamp field if configured' do
+      current_config['remove_at_timestamp'] = true
+      current_config['message_format'] = '%{@timestamp} should be missing.'
+      subject = LogStash::Plugin.lookup("output", "webhdfs").new(current_config)
+      subject.register
+      subject.receive(event)
+      subject.teardown
+      insist { client.read(current_logfile_name).strip } == '%{@timestamp} should be missing.'
+    end
+
+    it 'should flush after configured idle time' do
+      current_config['idle_flush_time'] = 2
+      subject = LogStash::Plugin.lookup("output", "webhdfs").new(current_config)
+      subject.register
+      subject.receive(event)
+      expect { client.read(current_logfile_name) }.to raise_error(error=WebHDFS::FileNotFoundError)
+      sleep 3
+      insist { client.read(current_logfile_name).strip } == event.to_json
+    end
+
+    it 'should write some messages uncompressed' do
+      subject = LogStash::Plugin.lookup("output", "webhdfs").new(current_config)
+      subject.register
+      for _ in 0..499
+        subject.receive(event)
+      end
+      subject.teardown
+      insist { client.read(current_logfile_name).lines.count } == 500
+    end
+
+    it 'should write some messages gzip compressed' do
+      current_logfile_name = current_logfile_name + ".gz"
+      current_config['compression'] = 'gzip'
+      subject = LogStash::Plugin.lookup("output", "webhdfs").new(current_config)
+      subject.register
+      for _ in 0..499
+        subject.receive(event)
+      end
+      subject.teardown
+      insist { Zlib::Inflate.new(window_bits=47).inflate(client.read(current_logfile_name)).lines.count } == 500
+    end
+
+    after :each do
+      client.delete(current_logfile_name)
+    end
 
 end
 
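(Editor's note: the magic number 47 in the gzip test above is zlib's window_bits encoding: 15, the maximum window size, plus 32, which tells the inflater to auto-detect a gzip or zlib header. A small sketch of the same round trip outside RSpec; `Zlib.gzip` requires Ruby 2.5+:)

```ruby
require 'zlib'

# Compress two lines with a gzip wrapper, then inflate with
# window_bits = 47 (15 + 32 = auto-detect gzip/zlib header),
# the same value the spec passes to Zlib::Inflate.new.
payload  = Zlib.gzip("line one\nline two\n")
inflated = Zlib::Inflate.new(47).inflate(payload)

puts inflated.lines.count  # 2
```

Without the +32 flag, `Zlib::Inflate` expects a raw zlib stream and would reject the gzip header that the plugin's `compress_gzip` output carries.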
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: logstash-output-webhdfs
 version: !ruby/object:Gem::Version
-  version: 0.0.1
+  version: 0.0.2
 platform: ruby
 authors:
 - Björn Puttmann, loshkovskyi
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-05-
+date: 2015-05-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: logstash-core