logstash-output-webhdfs 0.0.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: 9f37f22c7256c852597fc24ff4578a11503f68b7
+   data.tar.gz: 968727c576295ddb49f09f8230418e14828da865
+ SHA512:
+   metadata.gz: 7d9f851d5b01629602616fdf58ad009bf09e3a8bf5befe8d0313e61eddaf10ce9b5ae2b196f38e2119f2f6098387547eda3b6f9f6af96af7685b45369efadb89
+   data.tar.gz: ee608a1e5f9876d0aa0df0d4bbc11192d62668ac97ca91b8a17434fc744730301b69472b020d0085efc3dfcded5bff68facc4c7b2c525e7ea6cefd89017fea6c
data/.gitignore ADDED
@@ -0,0 +1,19 @@
+ *.gem
+ *.rbc
+ .bundle
+ .config
+ coverage
+ InstalledFiles
+ lib/bundler/man
+ pkg
+ rdoc
+ spec/reports
+ test/tmp
+ test/version_tmp
+ tmp
+ .idea/
+ # YARD artifacts
+ .yardoc
+ _yardoc
+ doc/
+ *.conf
data/Gemfile ADDED
@@ -0,0 +1,2 @@
+ source 'https://rubygems.org'
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,13 @@
+ # Copyright 2014-2015 dbap GmbH. <http://www.dbap.de>
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #   http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
data/README.md ADDED
@@ -0,0 +1,86 @@
+ # Logstash Plugin
+
+ This is a plugin for [Logstash](https://github.com/elasticsearch/logstash).
+
+ It is fully free and fully open source. The license is Apache 2.0, meaning you are free to use it however you want.
+
+ ## Documentation
+
+ Logstash provides infrastructure to automatically generate documentation for this plugin. We use the asciidoc format to write documentation, so any comments in the source code will first be converted into asciidoc and then into html. All plugin documentation is placed under one [central location](http://www.elasticsearch.org/guide/en/logstash/current/).
+
+ - For formatting code or config examples, you can use the asciidoc `[source,ruby]` directive
+ - For more asciidoc formatting tips, see the excellent reference at https://github.com/elasticsearch/docs#asciidoc-guide
+
+ ## Need Help?
+
+ Need help? Try #logstash on freenode IRC or the logstash-users@googlegroups.com mailing list.
+
+ ## Developing
+
+ ### 1. Plugin Development and Testing
+
+ #### Code
+ - To get started, you'll need JRuby with the Bundler gem installed.
+
+ - Create a new plugin or clone an existing one from the GitHub [logstash-plugins](https://github.com/logstash-plugins) organization. We also provide [example plugins](https://github.com/logstash-plugins?query=example).
+
+ - Install dependencies
+ ```sh
+ bundle install
+ ```
+
+ #### Test
+
+ - Update your dependencies
+
+ ```sh
+ bundle install
+ ```
+
+ - Run tests
+
+ ```sh
+ bundle exec rspec
+ ```
+
+ ### 2. Running your unpublished Plugin in Logstash
+
+ #### 2.1 Run in a local Logstash clone
+
+ - Edit Logstash `Gemfile` and add the local plugin path, for example:
+ ```ruby
+ gem "logstash-filter-awesome", :path => "/your/local/logstash-filter-awesome"
+ ```
+ - Install plugin
+ ```sh
+ bin/plugin install --no-verify
+ ```
+ - Run Logstash with your plugin
+ ```sh
+ bin/logstash -e 'filter {awesome {}}'
+ ```
+ At this point any modifications to the plugin code will be applied to this local Logstash setup. After modifying the plugin, simply rerun Logstash.
+
+ #### 2.2 Run in an installed Logstash
+
+ You can use the same **2.1** method to run your plugin in an installed Logstash by editing its `Gemfile` and pointing the `:path` to your local plugin development directory, or you can build the gem and install it using:
+
+ - Build your plugin gem
+ ```sh
+ gem build logstash-filter-awesome.gemspec
+ ```
+ - Install the plugin from the Logstash home
+ ```sh
+ bin/plugin install /your/local/plugin/logstash-filter-awesome.gem
+ ```
+ - Start Logstash and proceed to test the plugin
+
+ ## Contributing
+
+ All contributions are welcome: ideas, patches, documentation, bug reports, complaints, and even something you drew up on a napkin.
+
+ Programming is not a required skill. Whatever you've seen about open source and maintainers or community members saying "send patches or die" - you will not see that here.
+
+ It is more important to the community that you are able to contribute.
+
+ For more information about contributing, see the [CONTRIBUTING](https://github.com/elasticsearch/logstash/blob/master/CONTRIBUTING.md) file.
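The instructions above are the generic plugin-development boilerplate; for this webhdfs output specifically, the options documented in `lib/logstash/outputs/webhdfs.rb` combine into a minimal pipeline sketch like the following (only `server` and `path` are required; the host, path, and user values here are placeholders):

```ruby
output {
  webhdfs {
    server => "127.0.0.1:50070"                                     # namenode host:port (required)
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"  # supports field and joda date interpolation (required)
    user => "hue"                                                   # webhdfs username (optional)
    flush_size => 500                                               # buffer this many events before writing (optional)
    idle_flush_time => 1                                            # ...or flush after this many seconds (optional)
    compress => "gzip"                                              # one of false, "gzip", "snappy" (optional)
  }
}
```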
data/Rakefile ADDED
@@ -0,0 +1,7 @@
+ @files=[]
+
+ task :default do
+   system("rake -T")
+ end
+
+ require "logstash/devutils/rake"
data/lib/logstash/outputs/webhdfs.rb ADDED
@@ -0,0 +1,275 @@
+ # encoding: utf-8
+ require "logstash/namespace"
+ require "logstash/outputs/base"
+ require "stud/buffer"
+
+ # Summary: Plugin to send logstash events to files in HDFS via the webhdfs
+ # REST API.
+ #
+ # This plugin only has a mandatory dependency on the webhdfs gem from
+ # Tagamori Satoshi (@see: https://github.com/kzk/webhdfs).
+ # Optional dependencies are the zlib and snappy gems.
+ # No jars from hadoop are needed, thus reducing configuration and compatibility
+ # problems.
+ #
+ # USAGE:
+ # This is an example of logstash config:
+ #
+ # [source,ruby]
+ # ----------------------------------
+ # webhdfs {
+ #   server => "127.0.0.1:50070"                                     # (required)
+ #   path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"  # (required)
+ #   user => "hue"                                                   # (optional)
+ #   message_format => "%{@source_host}"                             # (optional)
+ #   idle_flush_time => 10                                           # (optional)
+ #   flush_size => 50                                                # (optional)
+ #   open_timeout => 15                                              # (optional)
+ #   read_timeout => 15                                              # (optional)
+ #   use_httpfs => true                                              # (optional)
+ #   retry_interval => 1                                             # (optional)
+ #   retry_times => 3                                                # (optional)
+ #   compress => "snappy"                                            # (optional)
+ #   remove_at_timestamp => false                                    # (optional)
+ # }
+ # ----------------------------------
+ #
+ # Author: Bjoern Puttmann <b.puttmann@dbap.de> - dbap GmbH, Münster, Germany.
+
+ class LogStash::Outputs::WebHdfs < LogStash::Outputs::Base
+   include Stud::Buffer
+
+   config_name "webhdfs"
+   milestone 1
+
+   if RUBY_VERSION[0..2] == '1.8'
+     MAGIC = "\x82SNAPPY\x0"
+   else
+     MAGIC = "\x82SNAPPY\x0".force_encoding Encoding::ASCII_8BIT
+   end
+   DEFAULT_VERSION = 1
+   MINIMUM_COMPATIBLE_VERSION = 1
+
+   # The server name and port for webhdfs/httpfs connections.
+   config :server, :validate => :string, :required => true
+
+   # The username for webhdfs.
+   config :user, :validate => :string, :required => false
+
+   # The path to the file to write to. Event fields can be used here,
+   # as well as date fields in the joda time format, e.g.:
+   # ....
+   # "/user/logstash/dt=%{+YYYY-MM-dd}/%{@source_host}-%{+HH}.log"
+   # ....
+   config :path, :validate => :string, :required => true
+
+   # The format to use when writing events to the file. This value
+   # supports any string and can include %{name} and other dynamic
+   # strings.
+   #
+   # If this setting is omitted, the full json representation of the
+   # event will be written as a single line.
+   config :message_format, :validate => :string
+
+   # Send data to webhdfs in intervals of x seconds.
+   config :idle_flush_time, :validate => :number, :default => 1
+
+   # Send data to webhdfs if the event count exceeds this value, even if idle_flush_time is not reached.
+   config :flush_size, :validate => :number, :default => 500
+
+   # WebHdfs open timeout, default 30s (in ruby net/http).
+   config :open_timeout, :validate => :number, :default => 30
+
+   # The WebHdfs read timeout, default 30s (in ruby net/http).
+   config :read_timeout, :validate => :number, :default => 30
+
+   # Use httpfs mode if set to true, else webhdfs.
+   config :use_httpfs, :validate => :boolean, :default => false
+
+   # Retry some known webhdfs errors. These may be caused by race conditions when appending to the same file, etc.
+   config :retry_known_errors, :validate => :boolean, :default => true
+
+   # How long should we wait between retries.
+   config :retry_interval, :validate => :number, :default => 0.5
+
+   # How many times should we retry.
+   config :retry_times, :validate => :number, :default => 5
+
+   # Compress output. One of [false, 'snappy', 'gzip'].
+   config :compress, :validate => ["false", "snappy", "gzip"], :default => "false"
+
+   # Set snappy chunksize. Only necessary for stream format. Defaults to 32k. Max is 65536.
+   # @see http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt
+   config :snappy_bufsize, :validate => :number, :default => 32768
+
+   # Set snappy format. One of "stream", "file". Set to stream to be hive compatible.
+   config :snappy_format, :validate => ["stream", "file"], :default => "stream"
+
+   # Remove the @timestamp field. Hive does not like a leading "@", but we need @timestamp for path calculation.
+   config :remove_at_timestamp, :validate => :boolean, :default => true
+
+   public
+
+   def register
+     begin
+       require 'webhdfs'
+     rescue LoadError
+       @logger.error("Module webhdfs could not be loaded.")
+     end
+     if @compress == "gzip"
+       begin
+         require "zlib"
+       rescue LoadError
+         @logger.error("Gzip compression selected but zlib module could not be loaded.")
+       end
+     elsif @compress == "snappy"
+       begin
+         require "snappy"
+       rescue LoadError
+         @logger.error("Snappy compression selected but snappy module could not be loaded.")
+       end
+     end
+     @files = {}
+     @host, @port = @server.split(':')
+     @client = prepare_client(@host, @port, @user)
+     # Test client connection.
+     begin
+       @client.list('/')
+     rescue => e
+       @logger.error("Webhdfs check request failed. (namenode: #{@client.host}:#{@client.port}, Exception: #{e.message})")
+     end
+     buffer_initialize(
+       :max_items => @flush_size,
+       :max_interval => @idle_flush_time,
+       :logger => @logger
+     )
+   end # def register
+
+   public
+   def receive(event)
+     return unless output?(event)
+     buffer_receive(event)
+   end # def receive
+
+   def prepare_client(host, port, username)
+     client = WebHDFS::Client.new(host, port, username)
+     if @use_httpfs
+       client.httpfs_mode = true
+     end
+     client.open_timeout = @open_timeout
+     client.read_timeout = @read_timeout
+     if @retry_known_errors
+       client.retry_known_errors = true
+       client.retry_interval = @retry_interval if @retry_interval
+       client.retry_times = @retry_times if @retry_times
+     end
+     client
+   end
+
+   def flush(events=nil, teardown=false)
+     return if not events
+     # Avoid creating a new string for newline every time.
+     newline = "\n".freeze
+     output_files = Hash.new { |hash, key| hash[key] = "" }
+     events.collect do |event|
+       path = event.sprintf(@path)
+       if @remove_at_timestamp
+         event.remove("@timestamp")
+       end
+       if @message_format
+         event_as_string = event.sprintf(@message_format)
+       else
+         event_as_string = event.to_json
+       end
+       event_as_string += newline unless event_as_string.end_with? newline
+       output_files[path] << event_as_string
+     end
+     output_files.each do |path, output|
+       if @compress == "gzip"
+         path += ".gz"
+         output = compress_gzip(output)
+       elsif @compress == "snappy"
+         path += ".snappy"
+         if @snappy_format == "file"
+           output = compress_snappy_file(output)
+         else
+           output = compress_snappy_stream(output)
+         end
+       end
+       write_tries = 0
+       while write_tries < @retry_times do
+         begin
+           write_data(path, output)
+           break
+         rescue => e
+           write_tries += 1
+           # Retry max_retry times. This can solve problems like leases being held by another process. Sadly this is no
+           # KNOWN_ERROR in ruby's webhdfs client.
+           if write_tries < @retry_times
+             @logger.warn("Retrying webhdfs write. Maybe you should increase retry_interval or reduce the number of workers.")
+             sleep(@retry_interval * write_tries)
+             next
+           else
+             # Issue error after max retries.
+             @logger.error("Max write retries reached. Exception: #{e.message}")
+           end
+         end
+       end
+     end
+   end
+
+   def compress_gzip(data)
+     buffer = StringIO.new('', 'w')
+     compressor = Zlib::GzipWriter.new(buffer)
+     begin
+       compressor.write data
+     ensure
+       compressor.close()
+     end
+     buffer.string
+   end
+
+   def compress_snappy_file(data)
+     # Encode data to ASCII_8BIT (binary).
+     data = data.encode(Encoding::ASCII_8BIT, "binary", :undef => :replace)
+     buffer = StringIO.new('', 'w')
+     buffer.set_encoding Encoding::ASCII_8BIT unless RUBY_VERSION =~ /^1\.8/
+     compressed = Snappy.deflate(data)
+     buffer << [compressed.size, compressed].pack("Na*")
+     buffer.string
+   end
+
+   def compress_snappy_stream(data)
+     # Encode data to ASCII_8BIT (binary).
+     data = data.encode(Encoding::ASCII_8BIT, "binary", :undef => :replace)
+     buffer = StringIO.new
+     buffer.set_encoding Encoding::ASCII_8BIT unless RUBY_VERSION =~ /^1\.8/
+     chunks = data.scan(/.{1,#{@snappy_bufsize}}/m)
+     chunks.each do |chunk|
+       compressed = Snappy.deflate(chunk)
+       buffer << [chunk.size, compressed.size, compressed].pack("NNa*")
+     end
+     return buffer.string
+   end
+
+   def get_snappy_header!
+     [MAGIC, DEFAULT_VERSION, MINIMUM_COMPATIBLE_VERSION].pack("a8NN")
+   end
+
+   def write_data(path, data)
+     begin
+       @client.append(path, data)
+     rescue WebHDFS::FileNotFoundError
+       # File does not exist yet, so create it. Add snappy header if format is "file".
+       if @snappy_format == "file"
+         @client.create(path, get_snappy_header! + data)
+       else
+         @client.create(path, data)
+       end
+     end
+   end
+
+   def teardown
+     buffer_flush(:final => true)
+   end # def teardown
+ end
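The `pack("NNa*")` framing in `compress_snappy_stream` above writes each chunk as a 4-byte big-endian uncompressed length, a 4-byte big-endian compressed length, and then the compressed bytes. The framing can be round-tripped with a standard-library-only sketch; `Zlib::Deflate` stands in for `Snappy.deflate` here, so this illustrates the frame layout only, not hive-compatible snappy output:

```ruby
require 'zlib'
require 'stringio'

# Frame data the way compress_snappy_stream does: split into bufsize-byte
# chunks, compress each, and emit <raw size><compressed size><payload>.
def frame_chunks(data, bufsize)
  buffer = StringIO.new
  buffer.set_encoding(Encoding::ASCII_8BIT)
  data.scan(/.{1,#{bufsize}}/m).each do |chunk|
    compressed = Zlib::Deflate.deflate(chunk)  # stand-in for Snappy.deflate
    buffer << [chunk.bytesize, compressed.bytesize, compressed].pack("NNa*")
  end
  buffer.string
end

# Inverse: walk the frames, reading both length headers per chunk and
# re-inflating each compressed payload.
def unframe_chunks(framed)
  out = "".b
  io = StringIO.new(framed)
  until io.eof?
    _raw_size, comp_size = io.read(8).unpack("NN")
    out << Zlib::Inflate.inflate(io.read(comp_size))
  end
  out
end
```

Because every chunk carries its own length headers, a reader can decode or skip chunks independently without scanning the payload bytes.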
data/logstash-output-webhdfs.gemspec ADDED
@@ -0,0 +1,27 @@
+ Gem::Specification.new do |s|
+
+   s.name = 'logstash-output-webhdfs'
+   s.version = '0.0.1'
+   s.licenses = ['Apache License (2.0)']
+   s.summary = "Plugin to write events to hdfs via webhdfs."
+   s.description = "This gem is a logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/plugin install gemname. This gem is not a stand-alone program"
+   s.authors = ["Björn Puttmann, loshkovskyi"]
+   s.email = 'b.puttmann@dbap.de'
+   s.homepage = "http://www.dbap.de"
+   s.require_paths = ["lib"]
+
+   # Files
+   s.files = `git ls-files`.split($\) + ::Dir.glob('vendor/*')
+
+   # Tests
+   s.test_files = s.files.grep(%r{^(test|spec|features)/})
+
+   # Special flag to let us know this is actually a logstash plugin
+   s.metadata = { "logstash_plugin" => "true", "logstash_group" => "output" }
+
+   # Gem dependencies
+   s.add_runtime_dependency 'logstash-core', '>= 1.4.0', '< 2.0.0'
+   s.add_runtime_dependency 'webhdfs'
+   s.add_runtime_dependency 'snappy'
+   s.add_development_dependency 'logstash-devutils'
+ end
data/spec/outputs/webhdfs_spec.rb ADDED
@@ -0,0 +1,49 @@
+ # encoding: utf-8
+ require 'logstash/devutils/rspec/spec_helper'
+ require 'logstash/outputs/webhdfs'
+
+ describe 'outputs/webhdfs' do
+
+   let(:event) do
+     LogStash::Event.new(
+       'message' => 'fantastic log entry',
+       'source' => 'someapp',
+       'type' => 'nginx',
+       '@timestamp' => LogStash::Timestamp.now)
+   end
+
+   context 'when initializing' do
+
+     it 'should fail to register without required values' do
+       configuration_error = false
+       begin
+         LogStash::Plugin.lookup("output", "webhdfs").new()
+       rescue LogStash::ConfigurationError
+         configuration_error = true
+       end
+       insist { configuration_error } == true
+     end
+
+     subject = LogStash::Plugin.lookup("output", "webhdfs").new('server' => '127.0.0.1:50070', 'path' => '/path/to/webhdfs.file')
+
+     it 'should register' do
+       expect { subject.register }.to_not raise_error
+     end
+
+     it 'should have default config values' do
+       insist { subject.idle_flush_time } == 1
+       insist { subject.flush_size } == 500
+       insist { subject.open_timeout } == 30
+       insist { subject.read_timeout } == 30
+       insist { subject.use_httpfs } == false
+       insist { subject.retry_known_errors } == true
+       insist { subject.retry_interval } == 0.5
+       insist { subject.retry_times } == 5
+       insist { subject.snappy_bufsize } == 32768
+       insist { subject.snappy_format } == 'stream'
+       insist { subject.remove_at_timestamp } == true
+     end
+
+   end
+
+ end
metadata ADDED
@@ -0,0 +1,116 @@
+ --- !ruby/object:Gem::Specification
+ name: logstash-output-webhdfs
+ version: !ruby/object:Gem::Version
+   version: 0.0.1
+ platform: ruby
+ authors:
+ - Björn Puttmann, loshkovskyi
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2015-05-19 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: logstash-core
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: 1.4.0
+     - - <
+       - !ruby/object:Gem::Version
+         version: 2.0.0
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: 1.4.0
+     - - <
+       - !ruby/object:Gem::Version
+         version: 2.0.0
+   prerelease: false
+   type: :runtime
+ - !ruby/object:Gem::Dependency
+   name: webhdfs
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   prerelease: false
+   type: :runtime
+ - !ruby/object:Gem::Dependency
+   name: snappy
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   prerelease: false
+   type: :runtime
+ - !ruby/object:Gem::Dependency
+   name: logstash-devutils
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   prerelease: false
+   type: :development
+ description: This gem is a logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/plugin install gemname. This gem is not a stand-alone program
+ email: b.puttmann@dbap.de
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - .gitignore
+ - Gemfile
+ - LICENSE
+ - README.md
+ - Rakefile
+ - lib/logstash/outputs/webhdfs.rb
+ - logstash-output-webhdfs.gemspec
+ - spec/outputs/webhdfs_spec.rb
+ homepage: http://www.dbap.de
+ licenses:
+ - Apache License (2.0)
+ metadata:
+   logstash_plugin: 'true'
+   logstash_group: output
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 2.2.4
+ signing_key:
+ specification_version: 4
+ summary: Plugin to write events to hdfs via webhdfs.
+ test_files:
+ - spec/outputs/webhdfs_spec.rb