logstash-input-azureblob-offline 0.9.13.1-java

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: 5a371bf57592f1bf54a8c5f0ba5bf2a2eea9ac951ad7c56fa0b886b31fe1b384
+   data.tar.gz: 5035df72c4a90a2ad2c48b58496bbc27d827f8e18a56ff97a3891eb081d79a56
+ SHA512:
+   metadata.gz: 867d291446ffc9155651568bad9db5cfffa81465a925381beca487ce02dd60b81c71f52a15d694d9c8b1aaf4987123ebd5f275c35b991cf5be9d33e63dd7b345
+   data.tar.gz: cc329c6aa43a729b010dc1bf1be4da761e78103a0cdaa45db88d6fdba0a00cd8ba830a56d87a2e897f5cdafb9088e49f96b829266f257e335389ce27ada11b5f
data/CHANGELOG.md ADDED
@@ -0,0 +1,7 @@
+ ## 2016.08.17
+ * Added a new configuration parameter for custom endpoint.
+
+ ## 2016.05.05
+ * Made the plugin respect the Logstash shutdown signal.
+ * Updated the *logstash-core* runtime dependency requirement to '~> 2.0'.
+ * Updated the *logstash-devutils* development dependency requirement to '>= 0.0.16'.
data/Gemfile ADDED
@@ -0,0 +1,2 @@
+ source 'https://rubygems.org'
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,17 @@
+
+ Copyright (c) Microsoft. All rights reserved.
+ Microsoft would like to thank its contributors, a list
+ of whom are at http://aka.ms/entlib-contributors
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you
+ may not use this file except in compliance with the License. You may
+ obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied. See the License for the specific language governing permissions
+ and limitations under the License.
+
data/README.md ADDED
@@ -0,0 +1,266 @@
+ # Logstash input plugin for Azure Storage Blobs
+
+ ## Summary
+ This plugin reads and parses data from Azure Storage Blobs.
+
+ ## Installation
+ You can install this plugin using the Logstash "plugin" command (or "logstash-plugin" for newer versions of Logstash):
+ ```sh
+ logstash-plugin install logstash-input-azureblob
+ ```
+ For more information, see the Logstash reference [Working with plugins](https://www.elastic.co/guide/en/logstash/current/working-with-plugins.html).
+
+ ## Configuration
+ ### Required Parameters
+ __*storage_account_name*__
+
+ The storage account name.
+
+ __*storage_access_key*__
+
+ The access key to the storage account.
+
+ __*container*__
+
+ The blob container name.
+
+ ### Optional Parameters
+ __*path_filters*__
+
+ The path(s) to the file(s) to use as an input. By default the plugin watches every file in the storage container. You can use filename patterns here, such as `logs/*.log`. If you use a pattern like `logs/**/*.log`, a recursive search of `logs` will be done for all `*.log` files.
+
+ Do not include a leading `/`, as Azure paths look like this: `path/to/blob/file.txt`
+
+ You may also configure multiple paths. See an example on the [Logstash configuration page](http://www.elastic.co/guide/en/logstash/current/configuration-file-structure.html#array).
+
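+ For example, an illustrative snippet (the account, key, container, and patterns are placeholder values) that watches two sets of blobs:
+
+ ```yaml
+ input {
+     azureblob {
+         storage_account_name => "mystorageaccount"
+         storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
+         container => "mycontainer"
+         path_filters => ["iis/**/*.log", "app/*.txt"]
+     }
+ }
+ ```
+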
+ __*endpoint*__
+
+ Specifies the endpoint of Azure Service Management. The default value is `core.windows.net`.
+
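+ For example, an illustrative snippet (placeholders as above) that points the plugin at the Azure China storage endpoint; verify the correct suffix for your target cloud:
+
+ ```yaml
+ input {
+     azureblob {
+         storage_account_name => "mystorageaccount"
+         storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
+         container => "mycontainer"
+         endpoint => "core.chinacloudapi.cn"
+     }
+ }
+ ```
+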
+ __*registry_path*__
+
+ Specifies the file path for the registry file that records offsets and coordinates between multiple clients. The default value is `data/registry`.
+
+ Override this value if a file already exists at the `data/registry` path in the Azure blob container.
+
+ __*interval*__
+
+ Sets how many seconds to idle before checking for new logs. The default is `30` seconds.
+
+ __*registry_create_policy*__
+
+ Specifies how offsets are initially set for existing blob files.
+
+ This option only applies when the registry is first created.
+
+ Valid values include:
+
+ - resume
+ - start_over
+
+ The default, `resume`, assumes that all existing blob content has already been consumed when the registry is created, so only new content in the blobs is picked up.
+
+ When set to `start_over`, none of the blobs are assumed to be consumed, and all blob files are read from the beginning.
+
+ Offsets are picked up from the registry file whenever it exists.
+
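+ For example, an illustrative snippet (placeholders as above; the registry path is an arbitrary choice) that re-reads all existing blobs from the beginning and keeps its registry at a custom path:
+
+ ```yaml
+ input {
+     azureblob {
+         storage_account_name => "mystorageaccount"
+         storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
+         container => "mycontainer"
+         registry_path => "data/my-registry"
+         registry_create_policy => "start_over"
+         interval => 60
+     }
+ }
+ ```
+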
+ __*file_head_bytes*__
+
+ Specifies the size, in bytes, of the file header that does not repeat over records. Usually this is a JSON opening tag. The default value is `0`.
+
+ __*file_tail_bytes*__
+
+ Specifies the size, in bytes, of the file tail that does not repeat over records. Usually this is a JSON closing tag. The default value is `0`.
+
+ __*azure_blob_file_path_field*__
+
+ Specifies whether to output the full path of the blob within the Azure Blob container as a field on each event.
+
+ __*azure_blob_file_path_field_name*__
+
+ Defines the name of the field emitted on each event when `azure_blob_file_path_field` is true.
+
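+ For example, an illustrative snippet (placeholders as above; the field name `blob_path` is an arbitrary choice) that tags each event with the originating blob path:
+
+ ```yaml
+ input {
+     azureblob {
+         storage_account_name => "mystorageaccount"
+         storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
+         container => "mycontainer"
+         azure_blob_file_path_field => true
+         azure_blob_file_path_field_name => "blob_path"
+     }
+ }
+ ```
+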
+ ### Advanced tweaking parameters
+
+ Keep these parameters at their defaults under normal circumstances. Tweak them when dealing with large-scale Azure blobs and logs.
+
+ __*blob_list_page_size*__
+
+ Specifies the page size for returned blob items. Too large a value can exhaust the heap; too small a value leads to too many requests. The default of `100` works well for a heap size of 1 GB.
+
+ __*file_chunk_size_bytes*__
+
+ Specifies the buffer size used to download the blob content. This is also the maximum buffer size passed to a codec, except for the JSON codec: it only receives valid JSON, which may span multiple chunks, and any malformed JSON content is skipped.
+
+ The default value is `4194304` (4 MB).
+
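+ For example, an illustrative snippet (placeholders as above; the numbers are illustrative, not recommendations) for a larger heap, where bigger pages and chunks reduce request overhead:
+
+ ```yaml
+ input {
+     azureblob {
+         storage_account_name => "mystorageaccount"
+         storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
+         container => "mycontainer"
+         blob_list_page_size => 500
+         file_chunk_size_bytes => 8388608   # 8 MB
+     }
+ }
+ ```
+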
+ ### Examples
+
+ * Bare-bone settings:
+
+ ```yaml
+ input
+ {
+     azureblob
+     {
+         storage_account_name => "mystorageaccount"
+         storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
+         container => "mycontainer"
+     }
+ }
+ ```
+
+ * Example for Wad-IIS
+
+ ```yaml
+ input {
+     azureblob
+     {
+         storage_account_name => 'mystorageaccount'
+         storage_access_key => 'VGhpcyBpcyBhIGZha2Uga2V5Lg=='
+         container => 'wad-iis-logfiles'
+         codec => line
+     }
+ }
+ filter {
+     ## Ignore the comments that IIS will add to the start of the W3C logs
+     #
+     if [message] =~ "^#" {
+         drop {}
+     }
+
+     grok {
+         # https://grokdebug.herokuapp.com/
+         match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{WORD:sitename} %{WORD:computername} %{IP:server_ip} %{WORD:method} %{URIPATH:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:protocolVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:requestHost} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:bytesSent} %{NUMBER:bytesReceived} %{NUMBER:timetaken}"]
+     }
+
+     ## Set the Event Timestamp from the log
+     #
+     date {
+         match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
+         timezone => "Etc/UTC"
+     }
+
+     ## If the log record has a value for 'bytesSent', then add a new field
+     #  to the event that converts it to kilobytes
+     #
+     if [bytesSent] {
+         ruby {
+             code => "event.set('kilobytesSent', event.get('bytesSent').to_i / 1024.0)"
+         }
+     }
+
+     ## Do the same conversion for the bytes received value
+     #
+     if [bytesReceived] {
+         ruby {
+             code => "event.set('kilobytesReceived', event.get('bytesReceived').to_i / 1024.0 )"
+         }
+     }
+
+     ## Perform some mutations on the records to prep them for Elastic
+     #
+     mutate {
+         ## Convert some fields from strings to integers
+         #
+         convert => ["bytesSent", "integer"]
+         convert => ["bytesReceived", "integer"]
+         convert => ["timetaken", "integer"]
+
+         ## Create a new field for the reverse DNS lookup below
+         #
+         add_field => { "clientHostname" => "%{clientIP}" }
+
+         ## Finally remove the original log_timestamp field since the event will
+         #  have the proper date on it
+         #
+         remove_field => [ "log_timestamp"]
+     }
+
+     ## Do a reverse lookup on the client IP to get their hostname.
+     #
+     dns {
+         ## Now that we've copied the clientIP into a new field we can
+         #  simply replace it here using a reverse lookup
+         #
+         action => "replace"
+         reverse => ["clientHostname"]
+     }
+
+     ## Parse out the user agent
+     #
+     useragent {
+         source => "userAgent"
+         prefix => "browser"
+     }
+ }
+ output {
+     file {
+         path => '/var/tmp/logstash-file-output'
+         codec => rubydebug
+     }
+     stdout {
+         codec => rubydebug
+     }
+ }
+ ```
+
+ * NSG Logs
+
+ ```yaml
+ input {
+     azureblob
+     {
+         storage_account_name => "mystorageaccount"
+         storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
+         container => "insights-logs-networksecuritygroupflowevent"
+         codec => "json"
+         # Refer to https://docs.microsoft.com/en-us/azure/network-watcher/network-watcher-read-nsg-flow-logs
+         # Typical values are 21/9 or 12/2 depending on the NSG log file type
+         file_head_bytes => 21
+         file_tail_bytes => 9
+     }
+ }
+
+ filter {
+     split { field => "[records]" }
+     split { field => "[records][properties][flows]" }
+     split { field => "[records][properties][flows][flows]" }
+     split { field => "[records][properties][flows][flows][flowTuples]" }
+
+     mutate {
+         split => { "[records][resourceId]" => "/" }
+         add_field => { "Subscription" => "%{[records][resourceId][2]}"
+                        "ResourceGroup" => "%{[records][resourceId][4]}"
+                        "NetworkSecurityGroup" => "%{[records][resourceId][8]}" }
+         convert => { "Subscription" => "string" }
+         convert => { "ResourceGroup" => "string" }
+         convert => { "NetworkSecurityGroup" => "string" }
+         split => { "[records][properties][flows][flows][flowTuples]" => "," }
+         add_field => {
+             "unixtimestamp" => "%{[records][properties][flows][flows][flowTuples][0]}"
+             "srcIp" => "%{[records][properties][flows][flows][flowTuples][1]}"
+             "destIp" => "%{[records][properties][flows][flows][flowTuples][2]}"
+             "srcPort" => "%{[records][properties][flows][flows][flowTuples][3]}"
+             "destPort" => "%{[records][properties][flows][flows][flowTuples][4]}"
+             "protocol" => "%{[records][properties][flows][flows][flowTuples][5]}"
+             "trafficflow" => "%{[records][properties][flows][flows][flowTuples][6]}"
+             "traffic" => "%{[records][properties][flows][flows][flowTuples][7]}"
+         }
+         convert => { "unixtimestamp" => "integer" }
+         convert => { "srcPort" => "integer" }
+         convert => { "destPort" => "integer" }
+     }
+
+     date {
+         match => ["unixtimestamp", "UNIX"]
+     }
+ }
+
+ output {
+     stdout { codec => rubydebug }
+ }
+ ```
+
+ ## More information
+ The source code of this plugin is hosted in the GitHub repo [Microsoft Azure Diagnostics with ELK](https://github.com/Azure/azure-diagnostics-tools). We welcome feedback and contributions to the project.
@@ -0,0 +1,202 @@
+ # encoding: utf-8
+
+ require Dir[ File.dirname(__FILE__) + "/../../*_jars.rb" ].first
+
+ # Interface for a class that reads strings of arbitrary length from the end of a container
+ class LinearReader
+   # returns [content, are_more_bytes_available]
+   #   content is a string
+   #   are_more_bytes_available is a boolean stating if the container has more bytes to read
+   def read()
+     raise 'not implemented'
+   end
+ end
+
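+ # Illustrative only: a minimal LinearReader over an in-memory string. This class is not
+ # part of the original plugin; it is a sketch of how the interface above could be
+ # satisfied, e.g. to feed the JsonParser below in a test. The class name, constructor
+ # arguments, and chunk size are assumptions.
+ class StringLinearReader < LinearReader
+   def initialize(content, chunk_size = 1024)
+     @content = content
+     @chunk_size = chunk_size
+     @offset = 0
+   end
+
+   # Returns the next chunk and whether more bytes remain, per the LinearReader contract.
+   def read()
+     chunk = @content[@offset, @chunk_size] || ""
+     @offset += chunk.length
+     return chunk, @offset < @content.length
+   end
+ end
+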
+ class JsonParser
+   def initialize(logger, linear_reader)
+     @logger = logger
+     @linear_reader = linear_reader
+     @stream_base_offset = 0
+
+     @stream_reader = StreamReader.new(@logger, @linear_reader)
+     @parser_factory = javax::json::Json.createParserFactory(nil)
+     @parser = @parser_factory.createParser(@stream_reader)
+   end
+
+   def parse(on_json_cbk, on_skip_malformed_cbk)
+     completed = false
+     while !completed
+       completed, start_index, end_index = parse_single_object(on_json_cbk)
+       if !completed
+
+         # If the current position in the stream is not well-formed JSON, skip ahead to
+         # the next '{' so the parser does not have to be recreated for each character.
+         json_candidate_start_index = @stream_reader.find('{', end_index)
+         json_candidate_start_index = @stream_reader.get_cached_stream_length - 1 if json_candidate_start_index.nil?
+         @logger.debug("JsonParser::parse Skipping Malformed JSON (start: #{start_index} end: #{end_index} candidate: #{json_candidate_start_index - 1}). Resetting the parser")
+         end_index = json_candidate_start_index - 1
+
+         on_skip_malformed_cbk.call(@stream_reader.get_stream_buffer(start_index, end_index))
+         @stream_reader.drop_stream(end_index + 1)
+         @stream_reader.reset_cached_stream_index(0)
+
+         @stream_base_offset = 0
+         @parser.close()
+         if @stream_reader.get_cached_stream_length <= 1
+           on_skip_malformed_cbk.call(@stream_reader.get_stream_buffer(0, -1))
+           return
+         end
+         @parser = @parser_factory.createParser(@stream_reader)
+       end
+     end
+   end
+
+   private
+   def parse_single_object(on_json_cbk)
+     depth = 0
+     stream_start_offset = 0
+     stream_end_offset = 0
+     while @parser.hasNext
+       event = @parser.next
+
+       if event == javax::json::stream::JsonParser::Event::START_OBJECT
+         depth = depth + 1
+       elsif event == javax::json::stream::JsonParser::Event::END_OBJECT
+         depth = depth - 1 # can't be negative because the parser handles the format correctness
+
+         if depth == 0
+           stream_end_offset = @parser.getLocation().getStreamOffset() - 1
+           @logger.debug("JsonParser::parse_single_object Json object found stream_start_offset: #{stream_start_offset} stream_end_offset: #{stream_end_offset}")
+
+           on_json_cbk.call(@stream_reader.get_stream_buffer(stream_start_offset - @stream_base_offset, stream_end_offset - @stream_base_offset))
+           stream_start_offset = stream_end_offset + 1
+
+           # Drop parsed bytes
+           @stream_reader.drop_stream(stream_end_offset - @stream_base_offset)
+           @stream_base_offset = stream_end_offset
+         end
+
+       end
+     end
+     return true
+   rescue javax::json::stream::JsonParsingException => e
+     return false, stream_start_offset - @stream_base_offset,
+            @parser.getLocation().getStreamOffset() - 1 - @stream_base_offset
+   rescue javax::json::JsonException, java::util::NoSuchElementException => e
+     @logger.debug("JsonParser::parse_single_object Exception stream_start_offset: #{stream_start_offset} stream_end_offset: #{stream_end_offset}")
+     raise e
+   end
+ end # class JsonParser
+
+ class StreamReader < java::io::Reader
+   def initialize(logger, reader)
+     super()
+     @logger = logger
+     @reader = reader
+
+     @stream_buffer = ""
+     @is_full_stream_read = false
+     @index = 0
+     @stream_buffer_length = 0
+   end
+
+   def markSupported
+     return false
+   end
+
+   def close
+   end
+
+   def get_cached_stream_length
+     return @stream_buffer_length
+   end
+
+   def get_cached_stream_index
+     return @index
+   end
+
+   def get_stream_buffer(start_index, end_index)
+     return @stream_buffer[start_index..end_index]
+   end
+
+   def find(substring, offset)
+     return @stream_buffer.index(substring, offset)
+   end
+
+   def drop_stream(until_offset)
+     @logger.debug("StreamReader::drop_stream until_offset:#{until_offset} index: #{@index}")
+     if @index < until_offset
+       return
+     end
+     @stream_buffer = @stream_buffer[until_offset..-1]
+     @index = @index - until_offset
+     @stream_buffer_length = @stream_buffer_length - until_offset
+   end
+
+   def reset_cached_stream_index(new_offset)
+     @logger.debug("StreamReader::reset_cached_stream_index new_offset:#{new_offset} index: #{@index}")
+     if new_offset < 0
+       return
+     end
+     @index = new_offset
+   end
+
+   # offset refers to the offset in the output buffer; see
+   # http://docs.oracle.com/javase/7/docs/api/java/io/Reader.html#read(char[],%20int,%20int)
+   def read(buf, offset, len)
+     @logger.debug("StreamReader::read #{offset} #{len} | stream index: #{@index} stream length: #{@stream_buffer_length}")
+     are_all_bytes_available = true
+     if @index + len - offset > @stream_buffer_length
+       are_all_bytes_available = fill_stream_buffer(@index + len - offset - @stream_buffer_length)
+     end
+
+     if (@stream_buffer_length - @index) < len
+       len = @stream_buffer_length - @index
+       @logger.debug("StreamReader::read #{offset} Actual length: #{len}")
+     end
+
+     if len > 0
+       # TODO: optimize this
+       jv_string = @stream_buffer[@index..@index+len-1].to_java
+       jv_bytes_array = jv_string.toCharArray()
+       java::lang::System.arraycopy(jv_bytes_array, 0, buf, offset, len)
+
+       @index = @index + len
+     end
+
+     if !are_all_bytes_available && len == 0
+       @logger.debug("StreamReader::read end of stream")
+       return -1
+     else
+       return len
+     end
+
+   rescue java::lang::IndexOutOfBoundsException => e
+     @logger.debug("StreamReader::read IndexOutOfBoundsException")
+     raise e
+   rescue java::lang::ArrayStoreException => e
+     @logger.debug("StreamReader::read ArrayStoreException")
+     raise e
+   rescue java::lang::NullPointerException => e
+     @logger.debug("StreamReader::read NullPointerException")
+     raise e
+   end
+
+   private
+   def fill_stream_buffer(len)
+     @logger.debug("StreamReader::fill_stream_buffer #{len}")
+     bytes_read = 0
+     while bytes_read < len
+       content, are_more_bytes_available = @reader.read
+       if !content.nil? && content.length > 0
+         @stream_buffer << content
+         @stream_buffer_length = @stream_buffer_length + content.length
+         bytes_read = bytes_read + content.length
+       end
+       if !are_more_bytes_available
+         return false
+       end
+     end
+     return true
+   end
+
+ end # class StreamReader