logstash-input-azureblob 0.9.12-java

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 5947577417b1f859db0712b7c414a536f4456a359d222d5069cc400d8a4ceb50
4
+ data.tar.gz: 2c57b9f7ec19871095b19f0eb7fa07f05e3d7b67386b7815ee583331d50a10f6
5
+ SHA512:
6
+ metadata.gz: dd9c54213183b732055ccf15470b41e0428933f942ac23911abefa9a535453e7b01721a922ad5a3677d90a581ddd4603628fdfc5655682a66fe6fa9045cdf737
7
+ data.tar.gz: 11ea4a6e8d69e1640bcbc078c7d01bc93b2f9a5cf45d30c6a2d12357f8cdb30dbbe482ad8a44a08eb3bbd6ac8808766b5c9b5c1ae8fcbf5321c23092103d68e0
@@ -0,0 +1,7 @@
1
+ ## 2016.08.17
2
+ * Added a new configuration parameter for custom endpoint.
3
+
4
+ ## 2016.05.05
5
+ * Made the plugin respect the Logstash shutdown signal.
6
+ * Updated the *logstash-core* runtime dependency requirement to '~> 2.0'.
7
+ * Updated the *logstash-devutils* development dependency requirement to '>= 0.0.16'.
data/Gemfile ADDED
@@ -0,0 +1,2 @@
1
+ source 'https://rubygems.org'
2
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,17 @@
1
+
2
+ Copyright (c) Microsoft. All rights reserved.
3
+ Microsoft would like to thank its contributors, a list
4
+ of whom are at http://aka.ms/entlib-contributors
5
+
6
+ Licensed under the Apache License, Version 2.0 (the "License"); you
7
+ may not use this file except in compliance with the License. You may
8
+ obtain a copy of the License at
9
+
10
+ http://www.apache.org/licenses/LICENSE-2.0
11
+
12
+ Unless required by applicable law or agreed to in writing, software
13
+ distributed under the License is distributed on an "AS IS" BASIS,
14
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
15
+ implied. See the License for the specific language governing permissions
16
+ and limitations under the License.
17
+
@@ -0,0 +1,253 @@
1
+ # Logstash input plugin for Azure Storage Blobs
2
+
3
+ ## Summary
4
+ This plugin reads and parses data from Azure Storage Blobs.
5
+
6
+ ## Installation
7
+ You can install this plugin using the Logstash `plugin` command (or `logstash-plugin` for newer versions of Logstash):
8
+ ```sh
9
+ logstash-plugin install logstash-input-azureblob
10
+ ```
11
+ For more information, see Logstash reference [Working with plugins](https://www.elastic.co/guide/en/logstash/current/working-with-plugins.html).
12
+
13
+ ## Configuration
14
+ ### Required Parameters
15
+ __*storage_account_name*__
16
+
17
+ The storage account name.
18
+
19
+ __*storage_access_key*__
20
+
21
+ The access key to the storage account.
22
+
23
+ __*container*__
24
+
25
+ The blob container name.
26
+
27
+ ### Optional Parameters
28
+ __*endpoint*__
29
+
30
+ Specifies the endpoint of Azure Service Management. The default value is `core.windows.net`.
31
+
32
+ __*registry_path*__
33
+
34
+ Specifies the file path for the registry file to record offsets and coordinate between multiple clients. The default value is `data/registry`.
35
+
36
+ Override this value if a file already exists at the path `data/registry` in the Azure blob container.
37
+
38
+ __*interval*__
39
+
40
+ Sets how many seconds to wait between checks for new logs. The default is `30` seconds.
41
+
42
+ __*registry_create_policy*__
43
+
44
+ Specifies how to initially set the offset for existing blob files.
45
+
46
+ This option only applies to registry creation.
47
+
48
+ Valid values include:
49
+
50
+ - resume
51
+ - start_over
52
+
53
+ The default, `resume`, means that when the registry is initially created, all existing blob content is assumed to have been consumed, and the plugin will only pick up new content added to the blobs.
54
+
55
+ When set to `start_over`, none of the blobs are assumed to have been consumed, and all blob files are read from the beginning.
56
+
57
+ Offsets are picked up from the registry file whenever it exists.
58
+
59
+ __*file_head_bytes*__
60
+
61
+ Specifies the size, in bytes, of the file header that does not repeat over records. Usually these are JSON opening tags. The default value is `0`.
62
+
63
+ __*file_tail_bytes*__
64
+
65
+ Specifies the size, in bytes, of the file tail that does not repeat over records. Usually these are JSON closing tags. The default value is `0`.
66
+
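+ For illustration, a hypothetical configuration that sets several of the optional parameters described above might look like the following; the values shown are assumptions for the sketch, not recommended defaults:
+
+ ```yaml
+ input {
+ azureblob
+ {
+ storage_account_name => "mystorageaccount"
+ storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
+ container => "mycontainer"
+ endpoint => "core.windows.net"
+ registry_path => "data/my-registry"
+ interval => 60
+ registry_create_policy => "start_over"
+ }
+ }
+ ```
+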
67
+ ### Advanced tweaking parameters
68
+
69
+ Keep these parameters at their defaults under normal circumstances. Tweak them when dealing with large-scale Azure blobs and logs.
70
+
71
+ __*blob_list_page_size*__
72
+
73
+ Specifies the page size for returned blob items. A value that is too large can exhaust the heap, while a value that is too small leads to too many requests. The default of `100` works well with a heap size of 1 GB.
74
+
75
+ __*file_chunk_size_bytes*__
76
+
77
+ Specifies the buffer size used to download the blob content. This is also the maximum buffer size passed to a codec, except for the JSON codec. The JSON codec only receives valid JSON, which may span multiple chunks. Any malformed JSON content is skipped.
78
+
79
+ The default value is `4194304` (4 MB).
80
+
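+ A minimal sketch of how these two settings might be tweaked inside the `azureblob` block; the numbers are illustrative assumptions, not tuned recommendations:
+
+ ```yaml
+ azureblob
+ {
+ # ... required settings (storage_account_name, storage_access_key, container) ...
+ blob_list_page_size => 50
+ file_chunk_size_bytes => 1048576
+ }
+ ```
+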
81
+ ### Examples
82
+
83
+ * Bare-bone settings:
84
+
85
+ ```yaml
86
+ input
87
+ {
88
+ azureblob
89
+ {
90
+ storage_account_name => "mystorageaccount"
91
+ storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
92
+ container => "mycontainer"
93
+ }
94
+ }
95
+ ```
96
+
97
+ * Example for Wad-IIS
98
+
99
+ ```yaml
100
+ input {
101
+ azureblob
102
+ {
103
+ storage_account_name => 'mystorageaccount'
104
+ storage_access_key => 'VGhpcyBpcyBhIGZha2Uga2V5Lg=='
105
+ container => 'wad-iis-logfiles'
106
+ codec => line
107
+ }
108
+ }
109
+ filter {
110
+ ## Ignore the comments that IIS will add to the start of the W3C logs
111
+ #
112
+ if [message] =~ "^#" {
113
+ drop {}
114
+ }
115
+
116
+ grok {
117
+ # https://grokdebug.herokuapp.com/
118
+ match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{WORD:sitename} %{WORD:computername} %{IP:server_ip} %{WORD:method} %{URIPATH:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:protocolVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:requestHost} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:bytesSent} %{NUMBER:bytesReceived} %{NUMBER:timetaken}"]
119
+ }
120
+
121
+ ## Set the Event Timestamp from the log
122
+ #
123
+ date {
124
+ match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
125
+ timezone => "Etc/UTC"
126
+ }
127
+
128
+ ## If the log record has a value for 'bytesSent', then add a new field
129
+ # to the event that converts it to kilobytes
130
+ #
131
+ if [bytesSent] {
132
+ ruby {
133
+ code => "event.set('kilobytesSent', event.get('bytesSent').to_i / 1024.0)"
134
+ }
135
+ }
136
+
137
+ ## Do the same conversion for the bytes received value
138
+ #
139
+ if [bytesReceived] {
140
+ ruby {
141
+ code => "event.set('kilobytesReceived', event.get('bytesReceived').to_i / 1024.0 )"
142
+ }
143
+ }
144
+
145
+ ## Perform some mutations on the records to prep them for Elastic
146
+ #
147
+ mutate {
148
+ ## Convert some fields from strings to integers
149
+ #
150
+ convert => ["bytesSent", "integer"]
151
+ convert => ["bytesReceived", "integer"]
152
+ convert => ["timetaken", "integer"]
153
+
154
+ ## Create a new field for the reverse DNS lookup below
155
+ #
156
+ add_field => { "clientHostname" => "%{clientIP}" }
157
+
158
+ ## Finally remove the original log_timestamp field since the event will
159
+ # have the proper date on it
160
+ #
161
+ remove_field => [ "log_timestamp"]
162
+ }
163
+
164
+ ## Do a reverse lookup on the client IP to get their hostname.
165
+ #
166
+ dns {
167
+ ## Now that we've copied the clientIP into a new field we can
168
+ # simply replace it here using a reverse lookup
169
+ #
170
+ action => "replace"
171
+ reverse => ["clientHostname"]
172
+ }
173
+
174
+ ## Parse out the user agent
175
+ #
176
+ useragent {
177
+ source => "userAgent"
178
+ prefix => "browser"
179
+ }
180
+ }
181
+ output {
182
+ file {
183
+ path => '/var/tmp/logstash-file-output'
184
+ codec => rubydebug
185
+ }
186
+ stdout {
187
+ codec => rubydebug
188
+ }
189
+ }
190
+ ```
191
+
192
+ * NSG Logs
193
+
194
+ ```yaml
195
+ input {
196
+ azureblob
197
+ {
198
+ storage_account_name => "mystorageaccount"
199
+ storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
200
+ container => "insights-logs-networksecuritygroupflowevent"
201
+ codec => "json"
202
+ # Refer to https://docs.microsoft.com/en-us/azure/network-watcher/network-watcher-read-nsg-flow-logs
203
+ # Typical numbers could be 21/9 or 12/2 depending on the NSG log file type
204
+ file_head_bytes => 21
205
+ file_tail_bytes => 9
206
+ # Enable / tweak these settings when event is too big for codec to handle.
207
+ # break_json_down_policy => "with_head_tail"
208
+ # break_json_batch_count => 2
209
+ }
210
+ }
211
+
212
+ filter {
213
+ split { field => "[records]" }
214
+ split { field => "[records][properties][flows]"}
215
+ split { field => "[records][properties][flows][flows]"}
216
+ split { field => "[records][properties][flows][flows][flowTuples]"}
217
+
218
+ mutate{
219
+ split => { "[records][resourceId]" => "/"}
220
+ add_field => {"Subscription" => "%{[records][resourceId][2]}"
221
+ "ResourceGroup" => "%{[records][resourceId][4]}"
222
+ "NetworkSecurityGroup" => "%{[records][resourceId][8]}"}
223
+ convert => {"Subscription" => "string"}
224
+ convert => {"ResourceGroup" => "string"}
225
+ convert => {"NetworkSecurityGroup" => "string"}
226
+ split => { "[records][properties][flows][flows][flowTuples]" => ","}
227
+ add_field => {
228
+ "unixtimestamp" => "%{[records][properties][flows][flows][flowTuples][0]}"
229
+ "srcIp" => "%{[records][properties][flows][flows][flowTuples][1]}"
230
+ "destIp" => "%{[records][properties][flows][flows][flowTuples][2]}"
231
+ "srcPort" => "%{[records][properties][flows][flows][flowTuples][3]}"
232
+ "destPort" => "%{[records][properties][flows][flows][flowTuples][4]}"
233
+ "protocol" => "%{[records][properties][flows][flows][flowTuples][5]}"
234
+ "trafficflow" => "%{[records][properties][flows][flows][flowTuples][6]}"
235
+ "traffic" => "%{[records][properties][flows][flows][flowTuples][7]}"
236
+ }
237
+ convert => {"unixtimestamp" => "integer"}
238
+ convert => {"srcPort" => "integer"}
239
+ convert => {"destPort" => "integer"}
240
+ }
241
+
242
+ date{
243
+ match => ["unixtimestamp" , "UNIX"]
244
+ }
245
+ }
246
+
247
+ output {
248
+ stdout { codec => rubydebug }
249
+ }
250
+ ```
251
+
252
+ ## More information
253
+ The source code of this plugin is hosted in the GitHub repo [Microsoft Azure Diagnostics with ELK](https://github.com/Azure/azure-diagnostics-tools). We welcome your feedback and contributions to the project.
@@ -0,0 +1,202 @@
1
+ # encoding: utf-8
2
+
3
+ require Dir[ File.dirname(__FILE__) + "/../../*_jars.rb" ].first
4
+
5
+ # Interface for a class that reads strings of arbitrary length from the end of a container
6
+ class LinearReader
7
+ # returns [content, are_more_bytes_available]
8
+ # content is a string
9
+ # are_more_bytes_available is a boolean stating if the container has more bytes to read
10
+ def read()
11
+ raise 'not implemented'
12
+ end
13
+ end
14
+
15
+ class JsonParser
16
+ def initialize(logger, linear_reader)
17
+ @logger = logger
18
+ @linear_reader = linear_reader
19
+ @stream_base_offset = 0
20
+
21
+ @stream_reader = StreamReader.new(@logger,@linear_reader)
22
+ @parser_factory = javax::json::Json.createParserFactory(nil)
23
+ @parser = @parser_factory.createParser(@stream_reader)
24
+ end
25
+
26
+ def parse(on_json_cbk, on_skip_malformed_cbk)
27
+ completed = false
28
+ while !completed
29
+ completed, start_index, end_index = parse_single_object(on_json_cbk)
30
+ if !completed
31
+
32
+ # If the current position in the stream is not well-formed JSON, then
33
+ # we can skip all future chars until we find a '{' so we don't have to recreate the parser for each char
34
+ json_candidate_start_index = @stream_reader.find('{', end_index)
35
+ json_candidate_start_index = @stream_reader.get_cached_stream_length - 1 if json_candidate_start_index.nil?
36
+ @logger.debug("JsonParser::parse Skipping Malformed JSON (start: #{start_index} end: #{end_index} candidate: #{json_candidate_start_index - 1}). Resetting the parser")
37
+ end_index = json_candidate_start_index - 1
38
+
39
+ on_skip_malformed_cbk.call(@stream_reader.get_stream_buffer(start_index, end_index))
40
+ @stream_reader.drop_stream(end_index + 1)
41
+ @stream_reader.reset_cached_stream_index(0)
42
+
43
+ @stream_base_offset = 0
44
+ @parser.close()
45
+ if @stream_reader.get_cached_stream_length <= 1
46
+ on_skip_malformed_cbk.call(@stream_reader.get_stream_buffer(0, -1))
47
+ return
48
+ end
49
+ @parser = @parser_factory.createParser(@stream_reader)
50
+ end
51
+ end
52
+ end
53
+
54
+ private
55
+ def parse_single_object(on_json_cbk)
56
+ depth = 0
57
+ stream_start_offset = 0
58
+ stream_end_offset = 0
59
+ while @parser.hasNext
60
+ event = @parser.next
61
+
62
+ if event == javax::json::stream::JsonParser::Event::START_OBJECT
63
+ depth = depth + 1
64
+ elsif event == javax::json::stream::JsonParser::Event::END_OBJECT
65
+ depth = depth - 1 # can't be negative because the parser handles the format correctness
66
+
67
+ if depth == 0
68
+ stream_end_offset = @parser.getLocation().getStreamOffset() - 1
69
+ @logger.debug("JsonParser::parse_single_object Json object found stream_start_offset: #{stream_start_offset} stream_end_offset: #{stream_end_offset}")
70
+
71
+ on_json_cbk.call(@stream_reader.get_stream_buffer(stream_start_offset - @stream_base_offset, stream_end_offset - @stream_base_offset))
72
+ stream_start_offset = stream_end_offset + 1
73
+
74
+ #Drop parsed bytes
75
+ @stream_reader.drop_stream(stream_end_offset - @stream_base_offset)
76
+ @stream_base_offset = stream_end_offset
77
+ end
78
+
79
+ end
80
+ end
81
+ return true
82
+ rescue javax::json::stream::JsonParsingException => e
83
+ return false, stream_start_offset - @stream_base_offset,
84
+ @parser.getLocation().getStreamOffset() - 1 - @stream_base_offset
85
+ rescue javax::json::JsonException, java::util::NoSuchElementException => e
86
+ @logger.debug("JsonParser::parse_single_object Exception stream_start_offset: #{stream_start_offset} stream_end_offset: #{stream_end_offset}")
87
+ raise e
88
+ end
89
+ end # class JsonParser
90
+
91
+ class StreamReader < java::io::Reader
92
+ def initialize(logger, reader)
93
+ super()
94
+ @logger = logger
95
+ @reader = reader
96
+
97
+ @stream_buffer = ""
98
+ @is_full_stream_read = false
99
+ @index = 0
100
+ @stream_buffer_length = 0
101
+ end
102
+
103
+ def markSupported
104
+ return false
105
+ end
106
+
107
+ def close
108
+ end
109
+
110
+ def get_cached_stream_length
111
+ return @stream_buffer_length
112
+ end
113
+
114
+ def get_cached_stream_index
115
+ return @index
116
+ end
117
+
118
+ def get_stream_buffer(start_index, end_index)
119
+ return @stream_buffer[start_index..end_index]
120
+ end
121
+
122
+ def find(substring, offset)
123
+ return @stream_buffer.index(substring, offset)
124
+ end
125
+
126
+ def drop_stream(until_offset)
127
+ @logger.debug("StreamReader::drop_stream until_offset:#{until_offset} index: #{@index}")
128
+ if @index < until_offset
129
+ return
130
+ end
131
+ @stream_buffer = @stream_buffer[until_offset..-1]
132
+ @index = @index - until_offset
133
+ @stream_buffer_length = @stream_buffer_length - until_offset
134
+ end
135
+
136
+ def reset_cached_stream_index(new_offset)
137
+ @logger.debug("StreamReader::reset_cached_stream_index new_offset:#{new_offset} index: #{@index}")
138
+ if new_offset < 0
139
+ return
140
+ end
141
+ @index = new_offset
142
+ end
143
+
144
+ # offset refers to the offset in the output buffer; see http://docs.oracle.com/javase/7/docs/api/java/io/Reader.html#read(char[],%20int,%20int)
145
+ def read(buf, offset, len)
146
+ @logger.debug("StreamReader::read #{offset} #{len} | stream index: #{@index} stream length: #{@stream_buffer_length}")
147
+ are_all_bytes_available = true
148
+ if @index + len - offset > @stream_buffer_length
149
+ are_all_bytes_available = fill_stream_buffer(@index + len - offset - @stream_buffer_length)
150
+ end
151
+
152
+ if (@stream_buffer_length - @index) < len
153
+ len = @stream_buffer_length - @index
154
+ @logger.debug("StreamReader::read #{offset} Actual length: #{len}")
155
+ end
156
+
157
+ if len > 0
158
+ #TODO: optimize this
159
+ jv_string = @stream_buffer[@index..@index+len-1].to_java
160
+ jv_bytes_array = jv_string.toCharArray()
161
+ java::lang::System.arraycopy(jv_bytes_array, 0, buf, offset, len)
162
+
163
+ @index = @index + len
164
+ end
165
+
166
+ if !are_all_bytes_available && len == 0
167
+ @logger.debug("StreamReader::read end of stream")
168
+ return -1
169
+ else
170
+ return len
171
+ end
172
+
173
+ rescue java::lang::IndexOutOfBoundsException => e
174
+ @logger.debug("StreamReader::read IndexOutOfBoundsException")
175
+ raise e
176
+ rescue java::lang::ArrayStoreException => e
177
+ @logger.debug("StreamReader::read ArrayStoreException")
178
+ raise e
179
+ rescue java::lang::NullPointerException => e
180
+ @logger.debug("StreamReader::read NullPointerException")
181
+ raise e
182
+ end
183
+
184
+ private
185
+ def fill_stream_buffer(len)
186
+ @logger.debug("StreamReader::fill_stream_buffer #{len}")
187
+ bytes_read = 0
188
+ while bytes_read < len
189
+ content, are_more_bytes_available = @reader.read
190
+ if !content.nil? && content.length > 0
191
+ @stream_buffer << content
192
+ @stream_buffer_length = @stream_buffer_length + content.length
193
+ bytes_read = bytes_read + content.length
194
+ end
195
+ if !are_more_bytes_available
196
+ return false
197
+ end
198
+ end
199
+ return true
200
+ end
201
+
202
+ end # class StreamReader
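+
+ # --- Usage sketch (not part of the original source) ---
+ # JsonParser pulls chunks from any LinearReader and fires a callback per
+ # complete JSON object; malformed content goes to the skip callback.
+ # A hypothetical in-memory LinearReader and driver could look like this
+ # (assumes `logger` is any object responding to #debug):
+ #
+ #   class StringChunkReader < LinearReader
+ #     def initialize(content, chunk_size = 1024)
+ #       @content = content
+ #       @chunk_size = chunk_size
+ #       @offset = 0
+ #     end
+ #
+ #     # returns [content, are_more_bytes_available] per the LinearReader contract
+ #     def read()
+ #       chunk = @content[@offset, @chunk_size] || ""
+ #       @offset += chunk.length
+ #       return chunk, @offset < @content.length
+ #     end
+ #   end
+ #
+ #   reader = StringChunkReader.new('{"a":1}{"b":2}')
+ #   parser = JsonParser.new(logger, reader)
+ #   parser.parse(
+ #     lambda { |json| puts "json: #{json}" },
+ #     lambda { |skipped| puts "skipped: #{skipped}" })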