logstash-input-azureblob-json-head-tail 0.9.9
- checksums.yaml +7 -0
- data/CHANGELOG.md +7 -0
- data/Gemfile +2 -0
- data/LICENSE +17 -0
- data/README.md +243 -0
- data/lib/logstash/inputs/azureblob.rb +383 -0
- data/logstash-input-azureblob.gemspec +26 -0
- data/spec/inputs/azureblob_spec.rb +1 -0
- metadata +131 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA1:
  metadata.gz: 36d2a3fcea0fdf5f3af0b3175acab68a94901b7b
  data.tar.gz: 85b71255879d3b0b15c6277b4aa9530406982255
SHA512:
  metadata.gz: c7e8986e035347e2a482877ae18683fbd1001cc7c0b7bdbf67077090a718d81be5da4a05fa1ba6f388874bc28bb9183927b22faaab8840db6bc69e29c282c607
  data.tar.gz: 6b99f505724ed327e11eb38c5b971d970691ff9903668482936ee01ab3a89920174218e2b0432e09d56fb9dac0e4f2e79ce466488252ee47fab1586d437bafe7
data/CHANGELOG.md
ADDED
@@ -0,0 +1,7 @@
## 2016.08.17
* Added a new configuration parameter for a custom endpoint.

## 2016.05.05
* Made the plugin respect the Logstash shutdown signal.
* Updated the *logstash-core* runtime dependency requirement to '~> 2.0'.
* Updated the *logstash-devutils* development dependency requirement to '>= 0.0.16'.
data/Gemfile
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,17 @@
Copyright (c) Microsoft. All rights reserved.
Microsoft would like to thank its contributors, a list
of whom are at http://aka.ms/entlib-contributors

Licensed under the Apache License, Version 2.0 (the "License"); you
may not use this file except in compliance with the License. You may
obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied. See the License for the specific language governing permissions
and limitations under the License.
data/README.md
ADDED
@@ -0,0 +1,243 @@
# Logstash input plugin for Azure Storage Blobs

## Summary
This plugin reads and parses data from Azure Storage Blobs.

## Installation
You can install this plugin using the Logstash "plugin" or "logstash-plugin" (for newer versions of Logstash) command:
```sh
logstash-plugin install logstash-input-azureblob
```
For more information, see the Logstash reference [Working with plugins](https://www.elastic.co/guide/en/logstash/current/working-with-plugins.html).

## Configuration
### Required Parameters
__*storage_account_name*__

The storage account name.

__*storage_access_key*__

The access key to the storage account.

__*container*__

The blob container name.

### Optional Parameters
__*endpoint*__

Specifies the Azure storage endpoint suffix. The default value is `core.windows.net`, which targets the public Azure cloud.

__*registry_path*__

Specifies the file path for the registry file, which records offsets and coordinates multiple clients. The default value is `data/registry`.

Override this value if a file already exists at the path `data/registry` in the Azure blob container.

__*interval*__

Sets how many seconds to idle before checking for new logs. The default, `30`, means idle for 30 seconds.

__*registry_create_policy*__

Specifies how offsets are initialized for existing blob files.

This option only applies when the registry is first created.

Valid values include:

- resume
- start_over

The default, `resume`, means that when the registry is initially created, all blobs are assumed to have been consumed already, and only content added to the blobs afterwards is picked up.

When set to `start_over`, none of the blobs are assumed to have been consumed, and all blob files are read from the beginning.

Offsets are picked up from the registry file whenever it exists.
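For reference, each entry in the registry file is a JSON-serialized `RegistryItem` (see `as_json` in the plugin source further down this page). A minimal sketch of one entry, with a hypothetical blob name, etag, and offset:

```ruby
require 'json'

# One registry entry, shaped like RegistryItem#as_json in
# lib/logstash/inputs/azureblob.rb. All values are hypothetical.
entry = {
  file_path: 'wad-iis-logfiles/u_ex160817.log', # blob name, also used as the key
  etag: '0x8D4E9A5B1C2D3E4',                    # etag from the blob properties
  reader: nil,                                  # uuid of the instance reading it, or nil
  offset: 1024,                                 # bytes consumed so far
  gen: 0                                        # generation counter used for scheduling
}
puts JSON.generate({ entry[:file_path] => entry })
```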
__*file_head_bytes*__

Specifies the size, in bytes, of the file header that does not repeat over records. Usually, these are JSON opening tags. The default value is `0`.

__*file_tail_bytes*__

Specifies the size, in bytes, of the file tail that does not repeat over records. Usually, these are JSON closing tags. The default value is `0`.

__*record_preprocess_reg_exp*__

Specifies a regular expression used to process the content before pushing the event. The matched content will be removed. For example, `^\s*,` removes a leading `,` from the content. The regular expression uses multiline mode.
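To see how `file_head_bytes`, `file_tail_bytes`, and `record_preprocess_reg_exp` work together, here is a minimal Ruby sketch of the reassembly the plugin performs (it mirrors the `process` method in the source further down; the sample blob content is hypothetical):

```ruby
header_len = 12 # file_head_bytes: length of '{"records":['
tail_len   = 2  # file_tail_bytes: length of ']}'

# The previous pass consumed the whole blob, so the recorded offset is its length.
old_blob = '{"records":[{"a":1}]}'
offset   = old_blob.length # => 21

# The blob has since grown by one record.
blob = '{"records":[{"a":1},{"b":2}]}'

header = blob[0, header_len]           # the non-repeating head, read once
chunk  = blob[(offset - tail_len)..-1] # re-read from just before the old tail

# '^\s*,' strips the record separator left at the front of the chunk.
chunk = chunk.sub(Regexp.new('^\s*,', Regexp::MULTILINE), '')

puts "#{header}#{chunk}" # => '{"records":[{"b":2}]}', parseable JSON again
```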
__*blob_list_page_size*__

Specifies the page size for returned blob items. Too large a value can exhaust the heap; too small a value leads to too many requests. The default, `100`, works well with a heap size of 1 GB.
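For reference, a condensed sketch of the paged listing loop that this setting controls (it mirrors `list_all_blobs` in the source further down; `@azure_blob`, `@container`, and `@blob_list_page_size` come from the plugin instance):

```ruby
require 'set'

blobs = Set.new
continuation_token = nil
loop do
  # Each call returns at most @blob_list_page_size entries.
  entries = @azure_blob.list_blobs(@container, :marker => continuation_token,
                                   :max_results => @blob_list_page_size)
  entries.each { |entry| blobs << entry }
  # An empty continuation token means the last page has been fetched.
  continuation_token = entries.continuation_token
  break if continuation_token.empty?
end
```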
### Examples

* Bare-bones settings:

```yaml
input {
  azureblob {
    storage_account_name => "mystorageaccount"
    storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
    container => "mycontainer"
  }
}
```

* Example for Wad-IIS

```yaml
input {
  azureblob {
    storage_account_name => 'mystorageaccount'
    storage_access_key => 'VGhpcyBpcyBhIGZha2Uga2V5Lg=='
    container => 'wad-iis-logfiles'
    codec => line
  }
}
filter {
  ## Ignore the comments that IIS will add to the start of the W3C logs
  #
  if [message] =~ "^#" {
    drop {}
  }

  grok {
    # https://grokdebug.herokuapp.com/
    match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{WORD:sitename} %{WORD:computername} %{IP:server_ip} %{WORD:method} %{URIPATH:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:protocolVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:requestHost} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:bytesSent} %{NUMBER:bytesReceived} %{NUMBER:timetaken}"]
  }

  ## Set the Event Timestamp from the log
  #
  date {
    match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
    timezone => "Etc/UTC"
  }

  ## If the log record has a value for 'bytesSent', then add a new field
  # to the event that converts it to kilobytes
  #
  if [bytesSent] {
    ruby {
      code => "event['kilobytesSent'] = event['bytesSent'].to_i / 1024.0"
    }
  }

  ## Do the same conversion for the bytes received value
  #
  if [bytesReceived] {
    ruby {
      code => "event['kilobytesReceived'] = event['bytesReceived'].to_i / 1024.0"
    }
  }

  ## Perform some mutations on the records to prep them for Elastic
  #
  mutate {
    ## Convert some fields from strings to integers
    #
    convert => ["bytesSent", "integer"]
    convert => ["bytesReceived", "integer"]
    convert => ["timetaken", "integer"]

    ## Create a new field for the reverse DNS lookup below
    #
    add_field => { "clientHostname" => "%{clientIP}" }

    ## Finally remove the original log_timestamp field since the event will
    # have the proper date on it
    #
    remove_field => [ "log_timestamp"]
  }

  ## Do a reverse lookup on the client IP to get their hostname.
  #
  dns {
    ## Now that we've copied the clientIP into a new field we can
    # simply replace it here using a reverse lookup
    #
    action => "replace"
    reverse => ["clientHostname"]
  }

  ## Parse out the user agent
  #
  useragent {
    source => "userAgent"
    prefix => "browser"
  }
}
output {
  file {
    path => '/var/tmp/logstash-file-output'
    codec => rubydebug
  }
  stdout {
    codec => rubydebug
  }
}
```

* NSG Logs

```yaml
input {
  azureblob {
    storage_account_name => "mystorageaccount"
    storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
    container => "insights-logs-networksecuritygroupflowevent"
    codec => "json"
    file_head_bytes => 21
    file_tail_bytes => 9
    record_preprocess_reg_exp => "^\s*,"
  }
}

filter {
  split { field => "[records]" }
  split { field => "[records][properties][flows]" }
  split { field => "[records][properties][flows][flows]" }
  split { field => "[records][properties][flows][flows][flowTuples]" }

  mutate {
    split => { "[records][resourceId]" => "/" }
    add_field => { "Subscription" => "%{[records][resourceId][2]}"
                   "ResourceGroup" => "%{[records][resourceId][4]}"
                   "NetworkSecurityGroup" => "%{[records][resourceId][8]}" }
    convert => { "Subscription" => "string" }
    convert => { "ResourceGroup" => "string" }
    convert => { "NetworkSecurityGroup" => "string" }
    split => { "[records][properties][flows][flows][flowTuples]" => "," }
    add_field => {
      "unixtimestamp" => "%{[records][properties][flows][flows][flowTuples][0]}"
      "srcIp" => "%{[records][properties][flows][flows][flowTuples][1]}"
      "destIp" => "%{[records][properties][flows][flows][flowTuples][2]}"
      "srcPort" => "%{[records][properties][flows][flows][flowTuples][3]}"
      "destPort" => "%{[records][properties][flows][flows][flowTuples][4]}"
      "protocol" => "%{[records][properties][flows][flows][flowTuples][5]}"
      "trafficflow" => "%{[records][properties][flows][flows][flowTuples][6]}"
      "traffic" => "%{[records][properties][flows][flows][flowTuples][7]}"
    }
    convert => { "unixtimestamp" => "integer" }
    convert => { "srcPort" => "integer" }
    convert => { "destPort" => "integer" }
  }

  date {
    match => ["unixtimestamp", "UNIX"]
  }
}

output {
  stdout { codec => rubydebug }
}
```

## More information
The source code of this plugin is hosted in the GitHub repo [Microsoft Azure Diagnostics with ELK](https://github.com/Azure/azure-diagnostics-tools). We welcome you to provide feedback and/or contribute to the project.
data/lib/logstash/inputs/azureblob.rb
ADDED
@@ -0,0 +1,383 @@
# encoding: utf-8
require "logstash/inputs/base"
require "logstash/namespace"

# Azure Storage SDK for Ruby
require "azure/storage"
require 'json' # for registry content
require "securerandom" # for generating uuid.

# Registry item to coordinate between multiple clients
class LogStash::Inputs::RegistryItem
  attr_accessor :file_path, :etag, :offset, :reader, :gen
  # Allow json serialization.
  def as_json(options={})
    {
      file_path: @file_path,
      etag: @etag,
      reader: @reader,
      offset: @offset,
      gen: @gen
    }
  end # as_json

  def to_json(*options)
    as_json(*options).to_json(*options)
  end # to_json

  def initialize(file_path, etag, reader, offset = 0, gen = 0)
    @file_path = file_path
    @etag = etag
    @reader = reader
    @offset = offset
    @gen = gen
  end # initialize
end # class RegistryItem


# Logstash input plugin for Azure Blobs
#
# This logstash plugin gathers data from Microsoft Azure Blobs
class LogStash::Inputs::LogstashInputAzureblob < LogStash::Inputs::Base
  config_name "azureblob"

  # If undefined, Logstash will complain, even if codec is unused.
  default :codec, "json_lines"

  # Set the account name for the azure storage account.
  config :storage_account_name, :validate => :string

  # Set the key to access the storage account.
  config :storage_access_key, :validate => :string

  # Set the container of the blobs.
  config :container, :validate => :string

  # Set the endpoint for the blobs.
  #
  # The default, `core.windows.net`, targets the public Azure.
  config :endpoint, :validate => :string, :default => 'core.windows.net'

  # Set the value of using backup mode.
  config :backupmode, :validate => :boolean, :default => false, :deprecated => true, :obsolete => 'This option is obsolete and its setting will be ignored.'

  # Set the value for the registry file.
  #
  # The default, `data/registry`, is used to coordinate readings for various instances of the clients.
  config :registry_path, :validate => :string, :default => 'data/registry'

  # Set how many seconds to keep idle before checking for new logs.
  #
  # The default, `30`, means trigger a reading for the log every 30 seconds after entering idle.
  config :interval, :validate => :number, :default => 30

  # Set the registry create mode.
  #
  # The default, `resume`, means when the registry is initially created, it assumes all logs have been handled.
  # When set to `start_over`, it will read all log files from the beginning.
  config :registry_create_policy, :validate => :string, :default => 'resume'

  # Sets the header of the file that does not repeat over records. Usually, these are json opening tags.
  config :file_head_bytes, :validate => :number, :default => 0

  # Sets the tail of the file that does not repeat over records. Usually, these are json closing tags.
  config :file_tail_bytes, :validate => :number, :default => 0

  # Sets the regular expression to process content before pushing the event.
  config :record_preprocess_reg_exp, :validate => :string

  # Sets the page-size for returned blob items. Too big a number will hit heap overflow; too small a number leads to too many requests.
  #
  # The default, `100`, is good for the default heap size of 1G.
  config :blob_list_page_size, :validate => :number, :default => 100

  # Constant of max integer
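  # Note: [42].pack('i').size is the native int width in bytes (typically 4),
  # so this evaluates to 2**62 - 1 on common builds, effectively the largest
  # Fixnum on 64-bit Ruby; raise_gen halves a gen counter that reaches it.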
  MAX = 2**([42].pack('i').size * 16 - 2) - 1
  public
  def register
    # this is the reader # for this specific instance.
    @reader = SecureRandom.uuid
    @registry_locker = "#{@registry_path}.lock"

    # Setup a specific instance of an Azure::Storage::Client
    client = Azure::Storage::Client.create(:storage_account_name => @storage_account_name, :storage_access_key => @storage_access_key, :storage_blob_host => "https://#{@storage_account_name}.blob.#{@endpoint}")
    # Get an azure storage blob service object from a specific instance of an Azure::Storage::Client
    @azure_blob = client.blob_client
    # Add retry filter to the service object
    @azure_blob.with_filter(Azure::Storage::Core::Filter::ExponentialRetryPolicyFilter.new)
  end # def register

  def run(queue)
    # we can abort the loop if stop? becomes true
    while !stop?
      process(queue)
      Stud.stoppable_sleep(@interval) { stop? }
    end # loop
  end # def run

  def stop
    cleanup_registry
  end # def stop

  # Start processing the next item.
  def process(queue)
    begin
      blob, start_index, gen = register_for_read

      if (!blob.nil?)
        begin
          blob_name = blob.name
          # Work-around: After being returned by get_blob, the etag will contain quotes.
          new_etag = blob.properties[:etag]
          # ~ Work-around

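          # Fetch the non-repeating file header once with a ranged get_blob
          # (:end_range) so it can be prepended to every chunk decoded below.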
          blob, header = @azure_blob.get_blob(@container, blob_name, {:end_range => @file_head_bytes}) if header.nil? unless @file_head_bytes.nil? or @file_head_bytes <= 0

          if start_index == 0
            # Skip the header since it is already read.
            start_index = start_index + @file_head_bytes
          else
            # Adjust the offset when it is other than the first time, then read till the end of the file, including the tail.
            start_index = start_index - @file_tail_bytes
            start_index = 0 if start_index < 0
          end

          blob, content = @azure_blob.get_blob(@container, blob_name, {:start_range => start_index})

          # content will be used to calculate the new offset. Create a new variable for processed content.
          processed_content = content
          if (!@record_preprocess_reg_exp.nil?)
            reg_exp = Regexp.new(@record_preprocess_reg_exp, Regexp::MULTILINE)
            processed_content = content.sub(reg_exp, '')
          end

          # Putting header and content and tail together before pushing into the event queue
          processed_content = "#{header}#{processed_content}" unless header.nil? || header.length == 0

          @codec.decode(processed_content) do |event|
            decorate(event)
            queue << event
          end # decode
        ensure
          # Make sure the reader is removed from the registry even when there's an exception.
          new_offset = start_index
          new_offset = new_offset + content.length unless content.nil?
          new_registry_item = LogStash::Inputs::RegistryItem.new(blob_name, new_etag, nil, new_offset, gen)
          update_registry(new_registry_item)
        end # begin
      end # if
    rescue StandardError => e
      @logger.error("Oh My, An error occurred. \nError:#{e}:\nTrace:\n#{e.backtrace}", :exception => e)
    end # begin
  end # process

  # Deserialize registry hash from json string.
  def deserialize_registry_hash(json_string)
    result = Hash.new
    temp_hash = JSON.parse(json_string)
    temp_hash.values.each { |kvp|
      result[kvp['file_path']] = LogStash::Inputs::RegistryItem.new(kvp['file_path'], kvp['etag'], kvp['reader'], kvp['offset'], kvp['gen'])
    }
    return result
  end # deserialize_registry_hash

  # List all the blobs in the given container.
  def list_all_blobs
    blobs = Set.new []
    continuation_token = NIL
    @blob_list_page_size = 100 if @blob_list_page_size <= 0
    loop do
      # Need to limit the number of returned entries to avoid an out of memory exception.
      entries = @azure_blob.list_blobs(@container, { :timeout => 10, :marker => continuation_token, :max_results => @blob_list_page_size })
      entries.each do |entry|
        blobs << entry
      end # each
      continuation_token = entries.continuation_token
      break if continuation_token.empty?
    end # loop
    return blobs
  end # def list_blobs

  # Raise generation for blob in registry
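  # The gen counter is a fairness mechanism: register_for_read picks the
  # candidate with the lowest gen, and this method bumps the picked blob's gen
  # and renormalizes so the minimum stays at 0; every blob is eventually read.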
  def raise_gen(registry_hash, file_path)
    begin
      target_item = registry_hash[file_path]
      begin
        target_item.gen += 1
        # Protect gen from overflow.
        target_item.gen = target_item.gen / 2 if target_item.gen == MAX
      rescue StandardError => e
        @logger.error("Fail to get the next generation for target item #{target_item}.", :exception => e)
        target_item.gen = 0
      end

      min_gen_item = registry_hash.values.min_by { |x| x.gen }
      while min_gen_item.gen > 0
        registry_hash.values.each { |value|
          value.gen -= 1
        }
        min_gen_item = registry_hash.values.min_by { |x| x.gen }
      end
    end
  end # raise_gen

  # Acquire a lease on a blob item with retries.
  #
  # By default, it will retry 30 times with 1 second interval.
  def acquire_lease(blob_name, retry_times = 30, interval_sec = 1)
    lease = nil;
    retried = 0;
    while lease.nil? do
      begin
        lease = @azure_blob.acquire_blob_lease(@container, blob_name, {:timeout => 10})
      rescue StandardError => e
        if (e.type == 'LeaseAlreadyPresent')
          if (retried > retry_times)
            raise
          end
          retried += 1
          sleep interval_sec
        end
      end
    end # while
    return lease
  end # acquire_lease

  # Return the next blob for reading as well as the start index.
  def register_for_read
    begin
      all_blobs = list_all_blobs
      registry = all_blobs.find { |item| item.name.downcase == @registry_path }
      registry_locker = all_blobs.find { |item| item.name.downcase == @registry_locker }

      candidate_blobs = all_blobs.select { |item| (item.name.downcase != @registry_path) && (item.name.downcase != @registry_locker) }

      start_index = 0
      gen = 0
      lease = nil

      # Put the lease on the locker file rather than the registry file to allow updates of the registry, as a workaround for Azure Storage Ruby SDK issue #16.
      # Workaround: https://github.com/Azure/azure-storage-ruby/issues/16
      registry_locker = @azure_blob.create_block_blob(@container, @registry_locker, @reader) if registry_locker.nil?
      lease = acquire_lease(@registry_locker)
      # ~ Workaround

      if (registry.nil?)
        registry_hash = create_registry(candidate_blobs)
      else
        registry_hash = load_registry
      end # if

      picked_blobs = Set.new []
      # Pick up the next candidate
      picked_blob = nil
      candidate_blobs.each { |candidate_blob|
        registry_item = registry_hash[candidate_blob.name]

        # Append items that don't exist in the hash table
        if registry_item.nil?
          registry_item = LogStash::Inputs::RegistryItem.new(candidate_blob.name, candidate_blob.properties[:etag], nil, 0, 0)
          registry_hash[candidate_blob.name] = registry_item
        end # if

        if ((registry_item.offset < candidate_blob.properties[:content_length]) && (registry_item.reader.nil? || registry_item.reader == @reader))
          picked_blobs << candidate_blob
        end
      }

      picked_blob = picked_blobs.min_by { |b| registry_hash[b.name].gen }
      if !picked_blob.nil?
        registry_item = registry_hash[picked_blob.name]
        registry_item.reader = @reader
        registry_hash[picked_blob.name] = registry_item
        start_index = registry_item.offset
        raise_gen(registry_hash, picked_blob.name)
        gen = registry_item.gen
      end # if

      # Save the change to the registry
      save_registry(registry_hash)

      @azure_blob.release_blob_lease(@container, @registry_locker, lease)
      lease = nil

      return picked_blob, start_index, gen
    rescue StandardError => e
      @logger.error("Oh My, An error occurred. #{e}:\n#{e.backtrace}", :exception => e)
      return nil, nil, nil
    ensure
      @azure_blob.release_blob_lease(@container, @registry_locker, lease) unless lease.nil?
      lease = nil
    end # rescue
  end # register_for_read

  # Update the registry
  def update_registry(registry_item)
    begin
      lease = nil
      lease = acquire_lease(@registry_locker)
      registry_hash = load_registry
      registry_hash[registry_item.file_path] = registry_item
      save_registry(registry_hash)
      @azure_blob.release_blob_lease(@container, @registry_locker, lease)
      lease = nil
    rescue StandardError => e
      @logger.error("Oh My, An error occurred. #{e}:\n#{e.backtrace}", :exception => e)
    ensure
      @azure_blob.release_blob_lease(@container, @registry_locker, lease) unless lease.nil?
      lease = nil
    end # rescue
  end # def update_registry

  # Clean up the registry.
  def cleanup_registry
    begin
      lease = nil
      lease = acquire_lease(@registry_locker)
      registry_hash = load_registry
      registry_hash.each { |key, registry_item|
        registry_item.reader = nil if registry_item.reader == @reader
      }
      save_registry(registry_hash)
      @azure_blob.release_blob_lease(@container, @registry_locker, lease)
      lease = nil
    rescue StandardError => e
      @logger.error("Oh My, An error occurred. #{e}:\n#{e.backtrace}", :exception => e)
    ensure
      @azure_blob.release_blob_lease(@container, @registry_locker, lease) unless lease.nil?
      lease = nil
    end # rescue
  end # def cleanup_registry

  # Create a registry file to coordinate between multiple azure blob inputs.
  def create_registry(blob_items)
    registry_hash = Hash.new

    blob_items.each do |blob_item|
      initial_offset = 0
      initial_offset = blob_item.properties[:content_length] if @registry_create_policy == 'resume'
      registry_item = LogStash::Inputs::RegistryItem.new(blob_item.name, blob_item.properties[:etag], nil, initial_offset, 0)
      registry_hash[blob_item.name] = registry_item
    end # each
    save_registry(registry_hash)
    return registry_hash
  end # create_registry

  # Load the content of the registry into the registry hash and return it.
  def load_registry
    # Get content
    registry_blob, registry_blob_body = @azure_blob.get_blob(@container, @registry_path)
    registry_hash = deserialize_registry_hash(registry_blob_body)
    return registry_hash
  end # def load_registry

  # Serialize the registry hash and save it.
  def save_registry(registry_hash)
    # Serialize hash to json
    registry_hash_json = JSON.generate(registry_hash)

    # Upload registry to blob
    @azure_blob.create_block_blob(@container, @registry_path, registry_hash_json)
  end # def save_registry
end # class LogStash::Inputs::LogstashInputAzureblob
data/logstash-input-azureblob.gemspec
ADDED
@@ -0,0 +1,26 @@
Gem::Specification.new do |s|
  s.name = 'logstash-input-azureblob-json-head-tail'
  s.version = '0.9.9'
  s.licenses = ['Apache License (2.0)']
  s.summary = 'This plugin collects Microsoft Azure Diagnostics data from Azure Storage Blobs.'
  s.description = 'This gem is a Logstash plugin. It reads and parses data from Azure Storage Blobs.'
  s.homepage = 'https://github.com/Azure/azure-diagnostics-tools'
  s.authors = ['Microsoft Corporation']
  s.email = 'azdiag@microsoft.com'
  s.require_paths = ['lib']

  # Files
  s.files = Dir['lib/**/*','spec/**/*','vendor/**/*','*.gemspec','*.md','Gemfile','LICENSE']
  # Tests
  s.test_files = s.files.grep(%r{^(test|spec|features)/})

  # Special flag to let us know this is actually a logstash plugin
  s.metadata = { "logstash_plugin" => "true", "logstash_group" => "input" }

  # Gem dependencies
  s.add_runtime_dependency "logstash-core-plugin-api", '>= 1.60', '<= 2.99'
  s.add_runtime_dependency 'logstash-codec-json_lines'
  s.add_runtime_dependency 'stud', '>= 0.0.22'
  s.add_runtime_dependency 'azure-storage', '~> 0.12.3.preview'
  s.add_development_dependency 'logstash-devutils'
end
data/spec/inputs/azureblob_spec.rb
ADDED
@@ -0,0 +1 @@
require "logstash/devutils/rspec/spec_helper"
metadata
ADDED
@@ -0,0 +1,131 @@
--- !ruby/object:Gem::Specification
name: logstash-input-azureblob-json-head-tail
version: !ruby/object:Gem::Version
  version: 0.9.9
platform: ruby
authors:
- Microsoft Corporation
autorequire:
bindir: bin
cert_chain: []
date: 2017-08-23 00:00:00.000000000 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: logstash-core-plugin-api
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - '>='
      - !ruby/object:Gem::Version
        version: '1.60'
    - - <=
      - !ruby/object:Gem::Version
        version: '2.99'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - '>='
      - !ruby/object:Gem::Version
        version: '1.60'
    - - <=
      - !ruby/object:Gem::Version
        version: '2.99'
- !ruby/object:Gem::Dependency
  name: logstash-codec-json_lines
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - '>='
      - !ruby/object:Gem::Version
        version: '0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - '>='
      - !ruby/object:Gem::Version
        version: '0'
- !ruby/object:Gem::Dependency
  name: stud
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - '>='
      - !ruby/object:Gem::Version
        version: 0.0.22
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - '>='
      - !ruby/object:Gem::Version
        version: 0.0.22
- !ruby/object:Gem::Dependency
  name: azure-storage
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: 0.12.3.preview
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: 0.12.3.preview
- !ruby/object:Gem::Dependency
  name: logstash-devutils
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - '>='
      - !ruby/object:Gem::Version
        version: '0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - '>='
      - !ruby/object:Gem::Version
        version: '0'
description: This gem is a Logstash plugin. It reads and parses data from Azure Storage
  Blobs.
email: azdiag@microsoft.com
executables: []
extensions: []
extra_rdoc_files: []
files:
- lib/logstash/inputs/azureblob.rb
- spec/inputs/azureblob_spec.rb
- logstash-input-azureblob.gemspec
- CHANGELOG.md
- README.md
- Gemfile
- LICENSE
homepage: https://github.com/Azure/azure-diagnostics-tools
licenses:
- Apache License (2.0)
metadata:
  logstash_plugin: 'true'
  logstash_group: input
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - '>='
    - !ruby/object:Gem::Version
      version: '0'
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - '>='
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubyforge_project:
rubygems_version: 2.0.14.1
signing_key:
specification_version: 4
summary: This plugin collects Microsoft Azure Diagnostics data from Azure Storage
  Blobs.
test_files:
- spec/inputs/azureblob_spec.rb