logstash-input-azureblob 0.9.12-java

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: 5947577417b1f859db0712b7c414a536f4456a359d222d5069cc400d8a4ceb50
+   data.tar.gz: 2c57b9f7ec19871095b19f0eb7fa07f05e3d7b67386b7815ee583331d50a10f6
+ SHA512:
+   metadata.gz: dd9c54213183b732055ccf15470b41e0428933f942ac23911abefa9a535453e7b01721a922ad5a3677d90a581ddd4603628fdfc5655682a66fe6fa9045cdf737
+   data.tar.gz: 11ea4a6e8d69e1640bcbc078c7d01bc93b2f9a5cf45d30c6a2d12357f8cdb30dbbe482ad8a44a08eb3bbd6ac8808766b5c9b5c1ae8fcbf5321c23092103d68e0
@@ -0,0 +1,7 @@
+ ## 2016.08.17
+ * Added a new configuration parameter for custom endpoint.
+
+ ## 2016.05.05
+ * Made the plugin respect the Logstash shutdown signal.
+ * Updated the *logstash-core* runtime dependency requirement to '~> 2.0'.
+ * Updated the *logstash-devutils* development dependency requirement to '>= 0.0.16'.
data/Gemfile ADDED
@@ -0,0 +1,2 @@
+ source 'https://rubygems.org'
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,17 @@
+
+ Copyright (c) Microsoft. All rights reserved.
+ Microsoft would like to thank its contributors, a list
+ of whom are at http://aka.ms/entlib-contributors
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you
+ may not use this file except in compliance with the License. You may
+ obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied. See the License for the specific language governing permissions
+ and limitations under the License.
+
@@ -0,0 +1,253 @@
+ # Logstash input plugin for Azure Storage Blobs
+
+ ## Summary
+ This plugin reads and parses data from Azure Storage Blobs.
+
+ ## Installation
+ You can install this plugin using the Logstash "plugin" or "logstash-plugin" (for newer versions of Logstash) command:
+ ```sh
+ logstash-plugin install logstash-input-azureblob
+ ```
+ For more information, see Logstash reference [Working with plugins](https://www.elastic.co/guide/en/logstash/current/working-with-plugins.html).
+
+ ## Configuration
+ ### Required Parameters
+ __*storage_account_name*__
+
+ The storage account name.
+
+ __*storage_access_key*__
+
+ The access key to the storage account.
+
+ __*container*__
+
+ The blob container name.
+
+ ### Optional Parameters
+ __*endpoint*__
+
+ Specifies the endpoint of Azure Service Management. The default value is `core.windows.net`.
+
+ __*registry_path*__
+
+ Specifies the file path for the registry file that records offsets and coordinates between multiple clients. The default value is `data/registry`.
+
+ Override this value when a file already exists at the `data/registry` path in the Azure blob container.
+
+ __*interval*__
+
+ Sets how many seconds to idle before checking for new logs. The default, `30`, means idle for `30` seconds.
+
+ __*registry_create_policy*__
+
+ Specifies how to initially set the offset for existing blob files.
+
+ This option only applies to registry creation.
+
+ Valid values include:
+
+ - resume
+ - start_over
+
+ The default, `resume`, means that when the registry is initially created, the plugin assumes all blobs have already been consumed and will only pick up new content added to the blobs.
+
+ When set to `start_over`, the plugin assumes none of the blobs have been consumed and will read all blob files from the beginning.
+
+ Offsets will be picked up from the registry file whenever it exists.
+
+ __*file_head_bytes*__
+
+ Specifies the header of the file, in bytes, that does not repeat over records. Usually, these are JSON opening tags. The default value is `0`.
+
+ __*file_tail_bytes*__
+
+ Specifies the tail of the file, in bytes, that does not repeat over records. Usually, these are JSON closing tags. The default value is `0`.
+
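+ For illustration only (the account name, key, and container below are the same placeholder values used in the examples further down, and the registry path is a hypothetical one), a configuration that sets these optional parameters explicitly might look like the following sketch:
+
+ ```
+ input {
+   azureblob {
+     storage_account_name => "mystorageaccount"
+     storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
+     container => "mycontainer"
+     endpoint => "core.windows.net"            # the documented default
+     registry_path => "logstash/registry"      # hypothetical path, useful when data/registry is already taken
+     interval => 60                            # check for new logs every 60 seconds instead of the default 30
+     registry_create_policy => "start_over"    # read all existing blobs from the beginning on first run
+   }
+ }
+ ```
+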
+ ### Advanced tweaking parameters
+
+ Keep these parameters at their defaults under normal circumstances. Tweak them when dealing with large-scale Azure blobs and logs.
+
+ __*blob_list_page_size*__
+
+ Specifies the page size for returned blob items. Too large a number will exhaust the heap; too small a number leads to too many requests. The default of `100` works well with a heap size of 1 GB.
+
+ __*file_chunk_size_bytes*__
+
+ Specifies the buffer size used to download the blob content. This is also the maximum buffer size that will be passed to a codec, except for JSON. The JSON codec will only receive valid JSON, which might span multiple chunks. Any malformed JSON content will be skipped.
+
+ The default value is `4194304` (4 MB).
+
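+ As a rough sketch (the numbers below are illustrative, not recommendations), both parameters can be lowered together when running with a small heap:
+
+ ```
+ input {
+   azureblob {
+     storage_account_name => "mystorageaccount"
+     storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
+     container => "mycontainer"
+     blob_list_page_size => 50           # fewer blob items per listing page than the default of 100
+     file_chunk_size_bytes => 1048576    # 1 MB download buffer instead of the default 4 MB
+   }
+ }
+ ```
+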
+ ### Examples
+
+ * Bare-bone settings:
+
+ ```
+ input
+ {
+   azureblob
+   {
+     storage_account_name => "mystorageaccount"
+     storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
+     container => "mycontainer"
+   }
+ }
+ ```
+
+ * Example for Wad-IIS
+
+ ```
+ input {
+   azureblob
+   {
+     storage_account_name => 'mystorageaccount'
+     storage_access_key => 'VGhpcyBpcyBhIGZha2Uga2V5Lg=='
+     container => 'wad-iis-logfiles'
+     codec => line
+   }
+ }
+ filter {
+   ## Ignore the comments that IIS will add to the start of the W3C logs
+   #
+   if [message] =~ "^#" {
+     drop {}
+   }
+
+   grok {
+     # https://grokdebug.herokuapp.com/
+     match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{WORD:sitename} %{WORD:computername} %{IP:server_ip} %{WORD:method} %{URIPATH:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:protocolVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:requestHost} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:bytesSent} %{NUMBER:bytesReceived} %{NUMBER:timetaken}"]
+   }
+
+   ## Set the Event Timestamp from the log
+   #
+   date {
+     match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
+     timezone => "Etc/UTC"
+   }
+
+   ## If the log record has a value for 'bytesSent', then add a new field
+   # to the event that converts it to kilobytes
+   #
+   if [bytesSent] {
+     ruby {
+       code => "event.set('kilobytesSent', event.get('bytesSent').to_i / 1024.0)"
+     }
+   }
+
+   ## Do the same conversion for the bytes received value
+   #
+   if [bytesReceived] {
+     ruby {
+       code => "event.set('kilobytesReceived', event.get('bytesReceived').to_i / 1024.0 )"
+     }
+   }
+
+   ## Perform some mutations on the records to prep them for Elastic
+   #
+   mutate {
+     ## Convert some fields from strings to integers
+     #
+     convert => ["bytesSent", "integer"]
+     convert => ["bytesReceived", "integer"]
+     convert => ["timetaken", "integer"]
+
+     ## Create a new field for the reverse DNS lookup below
+     #
+     add_field => { "clientHostname" => "%{clientIP}" }
+
+     ## Finally remove the original log_timestamp field since the event will
+     # have the proper date on it
+     #
+     remove_field => [ "log_timestamp"]
+   }
+
+   ## Do a reverse lookup on the client IP to get their hostname.
+   #
+   dns {
+     ## Now that we've copied the clientIP into a new field we can
+     # simply replace it here using a reverse lookup
+     #
+     action => "replace"
+     reverse => ["clientHostname"]
+   }
+
+   ## Parse out the user agent
+   #
+   useragent {
+     source => "userAgent"
+     prefix => "browser"
+   }
+ }
+ output {
+   file {
+     path => '/var/tmp/logstash-file-output'
+     codec => rubydebug
+   }
+   stdout {
+     codec => rubydebug
+   }
+ }
+ ```
+
+ * NSG Logs
+
+ ```
+ input {
+   azureblob
+   {
+     storage_account_name => "mystorageaccount"
+     storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
+     container => "insights-logs-networksecuritygroupflowevent"
+     codec => "json"
+     # Refer https://docs.microsoft.com/en-us/azure/network-watcher/network-watcher-read-nsg-flow-logs
+     # Typical numbers could be 21/9 or 12/2, depending on the NSG log file type
+     file_head_bytes => 21
+     file_tail_bytes => 9
+     # Enable / tweak these settings when the event is too big for the codec to handle.
+     # break_json_down_policy => "with_head_tail"
+     # break_json_batch_count => 2
+   }
+ }
+
+ filter {
+   split { field => "[records]" }
+   split { field => "[records][properties][flows]"}
+   split { field => "[records][properties][flows][flows]"}
+   split { field => "[records][properties][flows][flows][flowTuples]"}
+
+   mutate {
+     split => { "[records][resourceId]" => "/"}
+     add_field => {"Subscription" => "%{[records][resourceId][2]}"
+                   "ResourceGroup" => "%{[records][resourceId][4]}"
+                   "NetworkSecurityGroup" => "%{[records][resourceId][8]}"}
+     convert => {"Subscription" => "string"}
+     convert => {"ResourceGroup" => "string"}
+     convert => {"NetworkSecurityGroup" => "string"}
+     split => { "[records][properties][flows][flows][flowTuples]" => ","}
+     add_field => {
+       "unixtimestamp" => "%{[records][properties][flows][flows][flowTuples][0]}"
+       "srcIp" => "%{[records][properties][flows][flows][flowTuples][1]}"
+       "destIp" => "%{[records][properties][flows][flows][flowTuples][2]}"
+       "srcPort" => "%{[records][properties][flows][flows][flowTuples][3]}"
+       "destPort" => "%{[records][properties][flows][flows][flowTuples][4]}"
+       "protocol" => "%{[records][properties][flows][flows][flowTuples][5]}"
+       "trafficflow" => "%{[records][properties][flows][flows][flowTuples][6]}"
+       "traffic" => "%{[records][properties][flows][flows][flowTuples][7]}"
+     }
+     convert => {"unixtimestamp" => "integer"}
+     convert => {"srcPort" => "integer"}
+     convert => {"destPort" => "integer"}
+   }
+
+   date {
+     match => ["unixtimestamp" , "UNIX"]
+   }
+ }
+
+ output {
+   stdout { codec => rubydebug }
+ }
+ ```
+
+ ## More information
+ The source code of this plugin is hosted in the GitHub repository [Microsoft Azure Diagnostics with ELK](https://github.com/Azure/azure-diagnostics-tools). We welcome you to provide feedback and/or contribute to the project.
@@ -0,0 +1,202 @@
+ # encoding: utf-8
+
+ require Dir[ File.dirname(__FILE__) + "/../../*_jars.rb" ].first
+
+ # Interface for a class that reads strings of arbitrary length from the end of a container
+ class LinearReader
+   # returns [content, are_more_bytes_available]
+   # content is a string
+   # are_more_bytes_available is a boolean stating if the container has more bytes to read
+   def read()
+     raise 'not implemented'
+   end
+ end
+
+ class JsonParser
+   def initialize(logger, linear_reader)
+     @logger = logger
+     @linear_reader = linear_reader
+     @stream_base_offset = 0
+
+     @stream_reader = StreamReader.new(@logger, @linear_reader)
+     @parser_factory = javax::json::Json.createParserFactory(nil)
+     @parser = @parser_factory.createParser(@stream_reader)
+   end
+
+   def parse(on_json_cbk, on_skip_malformed_cbk)
+     completed = false
+     while !completed
+       completed, start_index, end_index = parse_single_object(on_json_cbk)
+       if !completed
+
+         # if current position in the stream is not a well formed JSON then
+         # I can skip all future chars until I find a '{' so I won't have to create the parser for each char
+         json_candidate_start_index = @stream_reader.find('{', end_index)
+         json_candidate_start_index = @stream_reader.get_cached_stream_length - 1 if json_candidate_start_index.nil?
+         @logger.debug("JsonParser::parse Skipping Malformed JSON (start: #{start_index} end: #{end_index} candidate: #{json_candidate_start_index - 1}). Resetting the parser")
+         end_index = json_candidate_start_index - 1
+
+         on_skip_malformed_cbk.call(@stream_reader.get_stream_buffer(start_index, end_index))
+         @stream_reader.drop_stream(end_index + 1)
+         @stream_reader.reset_cached_stream_index(0)
+
+         @stream_base_offset = 0
+         @parser.close()
+         if @stream_reader.get_cached_stream_length <= 1
+           on_skip_malformed_cbk.call(@stream_reader.get_stream_buffer(0, -1))
+           return
+         end
+         @parser = @parser_factory.createParser(@stream_reader)
+       end
+     end
+   end
+
+   private
+   def parse_single_object(on_json_cbk)
+     depth = 0
+     stream_start_offset = 0
+     stream_end_offset = 0
+     while @parser.hasNext
+       event = @parser.next
+
+       if event == javax::json::stream::JsonParser::Event::START_OBJECT
+         depth = depth + 1
+       elsif event == javax::json::stream::JsonParser::Event::END_OBJECT
+         depth = depth - 1 # can't be negative because the parser handles the format correctness
+
+         if depth == 0
+           stream_end_offset = @parser.getLocation().getStreamOffset() - 1
+           @logger.debug("JsonParser::parse_single_object Json object found stream_start_offset: #{stream_start_offset} stream_end_offset: #{stream_end_offset}")
+
+           on_json_cbk.call(@stream_reader.get_stream_buffer(stream_start_offset - @stream_base_offset, stream_end_offset - @stream_base_offset))
+           stream_start_offset = stream_end_offset + 1
+
+           # Drop parsed bytes
+           @stream_reader.drop_stream(stream_end_offset - @stream_base_offset)
+           @stream_base_offset = stream_end_offset
+         end
+
+       end
+     end
+     return true
+   rescue javax::json::stream::JsonParsingException => e
+     return false, stream_start_offset - @stream_base_offset,
+            @parser.getLocation().getStreamOffset() - 1 - @stream_base_offset
+   rescue javax::json::JsonException, java::util::NoSuchElementException => e
+     @logger.debug("JsonParser::parse_single_object Exception stream_start_offset: #{stream_start_offset} stream_end_offset: #{stream_end_offset}")
+     raise e
+   end
+ end # class JsonParser
+
+ class StreamReader < java::io::Reader
+   def initialize(logger, reader)
+     super()
+     @logger = logger
+     @reader = reader
+
+     @stream_buffer = ""
+     @is_full_stream_read = false
+     @index = 0
+     @stream_buffer_length = 0
+   end
+
+   def markSupported
+     return false
+   end
+
+   def close
+   end
+
+   def get_cached_stream_length
+     return @stream_buffer_length
+   end
+
+   def get_cached_stream_index
+     return @index
+   end
+
+   def get_stream_buffer(start_index, end_index)
+     return @stream_buffer[start_index..end_index]
+   end
+
+   def find(substring, offset)
+     return @stream_buffer.index(substring, offset)
+   end
+
+   def drop_stream(until_offset)
+     @logger.debug("StreamReader::drop_stream until_offset:#{until_offset} index: #{@index}")
+     if @index < until_offset
+       return
+     end
+     @stream_buffer = @stream_buffer[until_offset..-1]
+     @index = @index - until_offset
+     @stream_buffer_length = @stream_buffer_length - until_offset
+   end
+
+   def reset_cached_stream_index(new_offset)
+     @logger.debug("StreamReader::reset_cached_stream_index new_offset:#{new_offset} index: #{@index}")
+     if new_offset < 0
+       return
+     end
+     @index = new_offset
+   end
+
+   # offset refers to the offset in the output buffer: http://docs.oracle.com/javase/7/docs/api/java/io/Reader.html#read(char[],%20int,%20int)
+   def read(buf, offset, len)
+     @logger.debug("StreamReader::read #{offset} #{len} | stream index: #{@index} stream length: #{@stream_buffer_length}")
+     are_all_bytes_available = true
+     if @index + len - offset > @stream_buffer_length
+       are_all_bytes_available = fill_stream_buffer(@index + len - offset - @stream_buffer_length)
+     end
+
+     if (@stream_buffer_length - @index) < len
+       len = @stream_buffer_length - @index
+       @logger.debug("StreamReader::read #{offset} Actual length: #{len}")
+     end
+
+     if len > 0
+       # TODO: optimize this
+       jv_string = @stream_buffer[@index..@index+len-1].to_java
+       jv_bytes_array = jv_string.toCharArray()
+       java::lang::System.arraycopy(jv_bytes_array, 0, buf, offset, len)
+
+       @index = @index + len
+     end
+
+     if !are_all_bytes_available && len == 0
+       @logger.debug("StreamReader::read end of stream")
+       return -1
+     else
+       return len
+     end
+
+   rescue java::lang::IndexOutOfBoundsException => e
+     @logger.debug("StreamReader::read IndexOutOfBoundsException")
+     raise e
+   rescue java::lang::ArrayStoreException => e
+     @logger.debug("StreamReader::read ArrayStoreException")
+     raise e
+   rescue java::lang::NullPointerException => e
+     @logger.debug("StreamReader::read NullPointerException")
+     raise e
+   end
+
+   private
+   def fill_stream_buffer(len)
+     @logger.debug("StreamReader::fill_stream_buffer #{len}")
+     bytes_read = 0
+     while bytes_read < len
+       content, are_more_bytes_available = @reader.read
+       if !content.nil? && content.length > 0
+         @stream_buffer << content
+         @stream_buffer_length = @stream_buffer_length + content.length
+         bytes_read = bytes_read + content.length
+       end
+       if !are_more_bytes_available
+         return false
+       end
+     end
+     return true
+   end
+
+ end # class StreamReader