logstash-output-webhdfs 0.0.1 → 0.0.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 9f37f22c7256c852597fc24ff4578a11503f68b7
- data.tar.gz: 968727c576295ddb49f09f8230418e14828da865
+ metadata.gz: b62674a3d35ba63e84ddb865b846c9b89c1bd08a
+ data.tar.gz: 3e35d2d01038c0c8e57f798b13aa31f9e46614d7
  SHA512:
- metadata.gz: 7d9f851d5b01629602616fdf58ad009bf09e3a8bf5befe8d0313e61eddaf10ce9b5ae2b196f38e2119f2f6098387547eda3b6f9f6af96af7685b45369efadb89
- data.tar.gz: ee608a1e5f9876d0aa0df0d4bbc11192d62668ac97ca91b8a17434fc744730301b69472b020d0085efc3dfcded5bff68facc4c7b2c525e7ea6cefd89017fea6c
+ metadata.gz: 4e1b78ca8e557ac1b76a4d4aaa79b137092f440ee5e183e2347be1a13b9c36dbb852349aa4e3be59641f89e4cc2167918704e75a383fc7de294fbe0ccf874412
+ data.tar.gz: 2db171e7d65de252e3eedf7b301a20e234a88b53b0c62c7b5bf80d25fbc44d951ea123274256dd312fd9710cd80c686d1ea4cf87206a184aa042905ea264581b
data/README.md CHANGED
@@ -1,86 +1,41 @@
- # Logstash Plugin
+ logstash-webhdfs
+ ================

- This is a plugin for [Logstash](https://github.com/elasticsearch/logstash).
+ A logstash plugin to store events via webhdfs.

- It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way.
-
- ## Documentation
-
- Logstash provides infrastructure to automatically generate documentation for this plugin. We use the asciidoc format to write documentation so any comments in the source code will be first converted into asciidoc and then into html. All plugin documentation are placed under one [central location](http://www.elasticsearch.org/guide/en/logstash/current/).
-
- - For formatting code or config example, you can use the asciidoc `[source,ruby]` directive
- - For more asciidoc formatting tips, see the excellent reference here https://github.com/elasticsearch/docs#asciidoc-guide
-
- ## Need Help?
-
- Need help? Try #logstash on freenode IRC or the logstash-users@googlegroups.com mailing list.
-
- ## Developing
-
- ### 1. Plugin Developement and Testing
-
- #### Code
- - To get started, you'll need JRuby with the Bundler gem installed.
-
- - Create a new plugin or clone and existing from the GitHub [logstash-plugins](https://github.com/logstash-plugins) organization. We also provide [example plugins](https://github.com/logstash-plugins?query=example).
-
- - Install dependencies
- ```sh
- bundle install
- ```
-
- #### Test
-
- - Update your dependencies
+ Tested with v1.3.3, v1.4.0 and 1.5.0.

- ```sh
- bundle install
- ```
-
- - Run tests
-
- ```sh
- bundle exec rspec
- ```
-
- ### 2. Running your unpublished Plugin in Logstash
-
- #### 2.1 Run in a local Logstash clone
-
- - Edit Logstash `Gemfile` and add the local plugin path, for example:
- ```ruby
- gem "logstash-filter-awesome", :path => "/your/local/logstash-filter-awesome"
- ```
- - Install plugin
- ```sh
- bin/plugin install --no-verify
- ```
- - Run Logstash with your plugin
- ```sh
- bin/logstash -e 'filter {awesome {}}'
- ```
- At this point any modifications to the plugin code will be applied to this local Logstash setup. After modifying the plugin, simply rerun Logstash.
+ It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way.

- #### 2.2 Run in an installed Logstash
+ This plugin only has a mandatory dependency on the webhdfs gem from Kazuki Ohta and TAGOMORI Satoshi (@see: https://github.com/kzk/webhdfs). Optional dependencies are zlib and snappy gem.

- You can use the same **2.1** method to run your plugin in an installed Logstash by editing its `Gemfile` and pointing the `:path` to your local plugin development directory or you can build the gem and install it using:
+ No jars from hadoop are needed, thus reducing configuration and compatibility problems.

- - Build your plugin gem
- ```sh
- gem build logstash-filter-awesome.gemspec
+ ## Installation
+ Change into your logstash install directory and execute:
  ```
- - Install the plugin from the Logstash home
- ```sh
- bin/plugin install /your/local/plugin/logstash-filter-awesome.gem
+ bin/plugin install logstash-output-webhdfs
  ```
- - Start Logstash and proceed to test the plugin
-
- ## Contributing

- All contributions are welcome: ideas, patches, documentation, bug reports, complaints, and even something you drew up on a napkin.
-
- Programming is not a required skill. Whatever you've seen about open source and maintainers or community members saying "send patches or die" - you will not see that here.
-
- It is more important to the community that you are able to contribute.
+ ## Documentation

- For more information about contributing, see the [CONTRIBUTING](https://github.com/elasticsearch/logstash/blob/master/CONTRIBUTING.md) file.
+ Example configuration:
+
+ output {
+ webhdfs {
+ workers => 2
+ server => "your.nameno.de:14000"
+ user => "flume"
+ path => "/user/flume/logstash/dt=%{+Y}-%{+M}-%{+d}/logstash-%{+H}.log"
+ flush_size => 500
+ compression => "snappy"
+ idle_flush_time => 10
+ retry_interval => 0.5
+ }
+ }
+
+ For a complete list of options, see config section in source code.
+
+ This plugin has dependencies on:
+ * webhdfs module @<https://github.com/kzk/webhdfs>
+ * snappy module @<https://github.com/miyucy/snappy>
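A minimal end-to-end pipeline for smoke-testing an install, assuming a pseudo-distributed HDFS on the same machine (hostname, user and path are placeholders; the generator input ships with logstash):

    input {
      generator {
        count => 1
        message => "Hello world!"
      }
    }
    output {
      webhdfs {
        server => "localhost:50070"
        user => "hadoop"
        path => "/user/hadoop/logstash-test.log"
      }
    }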
lib/logstash/outputs/webhdfs.rb CHANGED
@@ -7,11 +7,18 @@ require "stud/buffer"
  # restapi.
  #
  # This plugin only has a mandatory dependency on the webhdfs gem from
- # Tagamori Satoshi (@see: https://github.com/kzk/webhdfs).
+ # Kazuki Ohta and TAGOMORI Satoshi (@see: https://github.com/kzk/webhdfs).
  # Optional dependencies are zlib and snappy gem.
  # No jars from hadoop are needed, thus reducing configuration and compatibility
  # problems.
  #
+ # If you get an error like:
+ #
+ # Max write retries reached. Exception: initialize: name or service not known {:level=>:error}
+ #
+ # make sure that the hostname of your namenode is resolvable on the host running logstash. When creating/appending
+ # to a file, webhdfs sometimes sends a 307 TEMPORARY_REDIRECT with the HOSTNAME of the machine it is running on.
+ #
  # USAGE:
  # This is an example of logstash config:
  #
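The resolvability requirement above can be checked directly on the logstash host. A minimal sketch using Ruby's resolv standard library (the hostnames are placeholders; include the namenode plus any hosts a redirect may name):

```ruby
require "resolv"

# Namenode plus any hosts a 307 redirect might point at - placeholders.
%w[your.nameno.de datanode01.example.com].each do |host|
  begin
    puts "#{host} -> #{Resolv.getaddress(host)}"
  rescue Resolv::ResolvError
    puts "#{host} does NOT resolve - writes will fail once the retries are exhausted"
  end
end
```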
@@ -95,8 +102,8 @@ class LogStash::Outputs::WebHdfs < LogStash::Outputs::Base
  # How many times should we retry.
  config :retry_times, :validate => :number, :default => 5

- # Compress output. One of [false, 'snappy', 'gzip']
- config :compress, :validate => ["false", "snappy", "gzip"], :default => "false"
+ # Compress output. One of ['none', 'snappy', 'gzip']
+ config :compression, :validate => ["none", "snappy", "gzip"], :default => "none"

  # Set snappy chunksize. Only neccessary for stream format. Defaults to 32k. Max is 65536
  # @see http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt
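Note the rename in this hunk: configurations written for 0.0.1 must change the option name from `compress` to `compression`, and the "off" value from `"false"` to `"none"`:

    # 0.0.1
    compress => "snappy"
    # 0.0.2
    compression => "snappy"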
@@ -116,13 +123,13 @@ class LogStash::Outputs::WebHdfs < LogStash::Outputs::Base
  rescue LoadError
  @logger.error("Module webhdfs could not be loaded.")
  end
- if @compress == "gzip"
+ if @compression == "gzip"
  begin
  require "zlib"
  rescue LoadError
  @logger.error("Gzip compression selected but zlib module could not be loaded.")
  end
- elsif @compress == "snappy"
+ elsif @compression == "snappy"
  begin
  require "snappy"
  rescue LoadError
@@ -185,15 +192,15 @@ class LogStash::Outputs::WebHdfs < LogStash::Outputs::Base
  output_files[path] << event_as_string
  end
  output_files.each do |path, output|
- if @compress == "gzip"
+ if @compression == "gzip"
  path += ".gz"
  output = compress_gzip(output)
- elsif @compress == "snappy"
+ elsif @compression == "snappy"
  path += ".snappy"
  if @snappy_format == "file"
  output = compress_snappy_file(output)
  elsif
- output = compress_snappy_stream(output)
+ output = compress_snappy_stream(output)
  end
  end
  write_tries = 0
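One quirk in this hunk: the bare `elsif` before the stream-compression line has no explicit condition, so Ruby parses the assignment on the next line as the condition itself, leaving the branch body empty; the assignment still runs, which is why this works, though `else` would state the intent more plainly. A minimal illustration of the parse:

```ruby
x = 1
if false
  :unreached
elsif
  x = 2  # parsed as the elsif condition; it runs and is truthy, the branch body is empty
end
x  # => 2
```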
@@ -234,7 +241,7 @@ class LogStash::Outputs::WebHdfs < LogStash::Outputs::Base
  data= data.encode(Encoding::ASCII_8BIT, "binary", :undef => :replace)
  buffer = StringIO.new('', 'w')
  buffer.set_encoding Encoding::ASCII_8BIT unless RUBY_VERSION =~ /^1\.8/
- compressed = Snappy.deflate(chunk)
+ compressed = Snappy.deflate(data)
  buffer << [compressed.size, compressed].pack("Na*")
  buffer.string
  end
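This hunk fixes what looks like a copy-paste bug: `chunk` does not appear anywhere in the surrounding scope, so the method now deflates `data`, the string actually written into the buffer. The framing is a 4-byte big-endian length prefix followed by the raw compressed block (`"N"` packs a 32-bit big-endian unsigned integer, `"a*"` appends raw bytes). A self-contained round-trip sketch with the snappy gem:

```ruby
require "snappy"

data = "some log lines\n" * 100
compressed = Snappy.deflate(data)
frame = [compressed.size, compressed].pack("Na*")

# Reading it back: length prefix first, then exactly that many bytes.
size = frame[0, 4].unpack("N").first
Snappy.inflate(frame[4, size]) == data  # => true
```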
@@ -261,7 +268,7 @@ class LogStash::Outputs::WebHdfs < LogStash::Outputs::Base
  @client.append(path, data)
  rescue WebHDFS::FileNotFoundError
  # Add snappy header if format is "file".
- if @snappy_format == "file"
+ if @compression == "snappy" and @snappy_format == "file"
  @client.create(path, get_snappy_header! + data)
  elsif
  @client.create(path, data)
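The added guard ensures the snappy file header is only prepended when snappy compression is actually selected, not whenever `snappy_format` happens to be `"file"`. The underlying create-or-append pattern against the webhdfs gem looks roughly like this sketch (host, port, user and path are placeholders):

```ruby
require "webhdfs"

client = WebHDFS::Client.new("your.nameno.de", 50070, "flume")
begin
  # Append if the file already exists ...
  client.append("/user/flume/test.log", "a log line\n")
rescue WebHDFS::FileNotFoundError
  # ... otherwise create it on the first write.
  client.create("/user/flume/test.log", "a log line\n")
end
```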
logstash-output-webhdfs.gemspec CHANGED
@@ -1,7 +1,7 @@
  Gem::Specification.new do |s|

  s.name = 'logstash-output-webhdfs'
- s.version = '0.0.1'
+ s.version = '0.0.2'
  s.licenses = ['Apache License (2.0)']
  s.summary = "Plugin to write events to hdfs via webhdfs."
  s.description = "This gem is a logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/plugin install gemname. This gem is not a stand-alone program"
@@ -24,4 +24,4 @@ Gem::Specification.new do |s|
  s.add_runtime_dependency 'webhdfs'
  s.add_runtime_dependency 'snappy'
  s.add_development_dependency 'logstash-devutils'
- end
+ end
spec/outputs/webhdfs_spec.rb CHANGED
@@ -1,36 +1,45 @@
  # encoding: utf-8
  require 'logstash/devutils/rspec/spec_helper'
  require 'logstash/outputs/webhdfs'
+ require 'webhdfs'
+ require 'json'

  describe 'outputs/webhdfs' do

- let(:event) do
- LogStash::Event.new(
- 'message' => 'fanastic log entry',
- 'source' => 'someapp',
- 'type' => 'nginx',
+ webhdfs_server = 'localhost'
+ webhdfs_port = 50070
+ webhdfs_user = 'hadoop'
+ path_to_testlog = '/user/hadoop/test.log'
+ current_logfile_name = '/user/hadoop/test.log'
+ current_config = ""
+
+ event = LogStash::Event.new(
+ 'message' => 'Hello world!',
+ 'source' => 'out of the blue',
+ 'type' => 'generator',
+ 'host' => 'localhost',
  '@timestamp' => LogStash::Timestamp.now)
- end
+
+ default_config = { 'server' => webhdfs_server + ':' + webhdfs_port.to_s,
+ 'user' => webhdfs_user,
+ 'path' => path_to_testlog,
+ 'compression' => 'none' }
+
+ client = WebHDFS::Client.new(webhdfs_server, webhdfs_port, webhdfs_user)

  context 'when initializing' do

  it 'should fail to register without required values' do
- configuration_error = false
- begin
- LogStash::Plugin.lookup("output", "webhdfs").new()
- rescue LogStash::ConfigurationError
- configuration_error = true
- end
- insist { configuration_error } == true
+ expect { LogStash::Plugin.lookup("output", "webhdfs").new() }.to raise_error(error=LogStash::ConfigurationError)
  end

- subject = LogStash::Plugin.lookup("output", "webhdfs").new('server' => '127.0.0.1:50070', 'path' => '/path/to/webhdfs.file')
-
- it 'should register' do
+ it 'should register with default values' do
+ subject = LogStash::Plugin.lookup("output", "webhdfs").new(default_config)
  expect { subject.register }.to_not raise_error
  end

  it 'should have default config values' do
+ subject = LogStash::Plugin.lookup("output", "webhdfs").new(default_config)
  insist { subject.idle_flush_time } == 1
  insist { subject.flush_size } == 500
  insist { subject.open_timeout } == 30
@@ -43,6 +52,78 @@ describe 'outputs/webhdfs' do
  insist { subject.snappy_format } == 'stream'
  insist { subject.remove_at_timestamp } == true
  end
+ end
+
+ context 'when writing messages' do
+
+ before :each do
+ current_logfile_name = path_to_testlog
+ current_config = default_config.clone
+ end
+
+ it 'should match the event data' do
+ subject = LogStash::Plugin.lookup("output", "webhdfs").new(current_config)
+ subject.register
+ subject.receive(event)
+ subject.teardown
+ insist { client.read(current_logfile_name).strip } == event.to_json
+ end
+
+ it 'should match the configured pattern' do
+ current_config['message_format'] = '%{message} came %{source}.'
+ subject = LogStash::Plugin.lookup("output", "webhdfs").new(current_config)
+ subject.register
+ subject.receive(event)
+ subject.teardown
+ insist { client.read(current_logfile_name).strip } == 'Hello world! came out of the blue.'
+ end
+
+ # Hive does not like a leading "@", but we need @timestamp for path calculation.
+ it 'should remove the @timestamp field if configured' do
+ current_config['remove_at_timestamp'] = true
+ current_config['message_format'] = '%{@timestamp} should be missing.'
+ subject = LogStash::Plugin.lookup("output", "webhdfs").new(current_config)
+ subject.register
+ subject.receive(event)
+ subject.teardown
+ insist { client.read(current_logfile_name).strip } == '%{@timestamp} should be missing.'
+ end
+
+ it 'should flush after configured idle time' do
+ current_config['idle_flush_time'] = 2
+ subject = LogStash::Plugin.lookup("output", "webhdfs").new(current_config)
+ subject.register
+ subject.receive(event)
+ expect { client.read(current_logfile_name) }.to raise_error(error=WebHDFS::FileNotFoundError)
+ sleep 3
+ insist { client.read(current_logfile_name).strip } == event.to_json
+ end
+
+ it 'should write some messages uncompressed' do
+ subject = LogStash::Plugin.lookup("output", "webhdfs").new(current_config)
+ subject.register
+ for _ in 0..499
+ subject.receive(event)
+ end
+ subject.teardown
+ insist { client.read(current_logfile_name).lines.count } == 500
+ end
+
+ it 'should write some messages gzip compressed' do
+ current_logfile_name = current_logfile_name + ".gz"
+ current_config['compression'] = 'gzip'
+ subject = LogStash::Plugin.lookup("output", "webhdfs").new(current_config)
+ subject.register
+ for _ in 0..499
+ subject.receive(event)
+ end
+ subject.teardown
+ insist { Zlib::Inflate.new(window_bits=47).inflate(client.read(current_logfile_name)).lines.count } == 500
+ end
+
+ after :each do
+ client.delete(current_logfile_name)
+ end

  end
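A note on `Zlib::Inflate.new(window_bits=47)` in the gzip test: 47 is 15 (the maximum window size) plus 32, which tells zlib to auto-detect a gzip or zlib header; the `window_bits=` part is just a throwaway local assignment. A self-contained sketch:

```ruby
require "zlib"

# 31 = 15 + 16 writes a gzip wrapper; 47 = 15 + 32 auto-detects it on inflate.
gzipped = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, 31).deflate("Hello world!", Zlib::FINISH)
Zlib::Inflate.new(47).inflate(gzipped)  # => "Hello world!"
```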
 
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: logstash-output-webhdfs
  version: !ruby/object:Gem::Version
- version: 0.0.1
+ version: 0.0.2
  platform: ruby
  authors:
  - Björn Puttmann, loshkovskyi
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2015-05-19 00:00:00.000000000 Z
+ date: 2015-05-22 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: logstash-core