logstash-input-azureblob-json-head-tail 0.9.9
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/CHANGELOG.md +7 -0
- data/Gemfile +2 -0
- data/LICENSE +17 -0
- data/README.md +243 -0
- data/lib/logstash/inputs/azureblob.rb +383 -0
- data/logstash-input-azureblob.gemspec +26 -0
- data/spec/inputs/azureblob_spec.rb +1 -0
- metadata +131 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA1:
  metadata.gz: 36d2a3fcea0fdf5f3af0b3175acab68a94901b7b
  data.tar.gz: 85b71255879d3b0b15c6277b4aa9530406982255
SHA512:
  metadata.gz: c7e8986e035347e2a482877ae18683fbd1001cc7c0b7bdbf67077090a718d81be5da4a05fa1ba6f388874bc28bb9183927b22faaab8840db6bc69e29c282c607
  data.tar.gz: 6b99f505724ed327e11eb38c5b971d970691ff9903668482936ee01ab3a89920174218e2b0432e09d56fb9dac0e4f2e79ce466488252ee47fab1586d437bafe7
data/CHANGELOG.md
ADDED
@@ -0,0 +1,7 @@
## 2016.08.17
* Added a new configuration parameter for a custom endpoint.

## 2016.05.05
* Made the plugin respect the Logstash shutdown signal.
* Updated the *logstash-core* runtime dependency requirement to '~> 2.0'.
* Updated the *logstash-devutils* development dependency requirement to '>= 0.0.16'.
data/Gemfile
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,17 @@

Copyright (c) Microsoft. All rights reserved.
Microsoft would like to thank its contributors, a list
of whom are at http://aka.ms/entlib-contributors

Licensed under the Apache License, Version 2.0 (the "License"); you
may not use this file except in compliance with the License. You may
obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied. See the License for the specific language governing permissions
and limitations under the License.
data/README.md
ADDED
@@ -0,0 +1,243 @@
# Logstash input plugin for Azure Storage Blobs

## Summary
This plugin reads and parses data from Azure Storage Blobs.

## Installation
You can install this plugin using the Logstash "plugin" or "logstash-plugin" (for newer versions of Logstash) command:
```sh
logstash-plugin install logstash-input-azureblob
```
For more information, see the Logstash reference [Working with plugins](https://www.elastic.co/guide/en/logstash/current/working-with-plugins.html).

## Configuration
### Required Parameters
__*storage_account_name*__

The storage account name.

__*storage_access_key*__

The access key to the storage account.

__*container*__

The blob container name.

### Optional Parameters
__*endpoint*__

Specifies the endpoint suffix for the storage service. The default value, `core.windows.net`, targets the Azure public cloud.

__*registry_path*__

Specifies the path of the registry file, which records offsets and coordinates multiple clients. The default value is `data/registry`.

Override this value when a file already exists at the path `data/registry` in the Azure blob container.

__*interval*__

Sets how many seconds to idle before checking for new logs. The default, `30`, means idling for 30 seconds.

__*registry_create_policy*__

Specifies how offsets are initially set for existing blob files.

This option applies only when the registry is first created.

Valid values include:

- resume
- start_over

The default, `resume`, assumes that when the registry is initially created, all existing blob content has already been consumed; only new content appended to the blobs will be picked up.

When set to `start_over`, none of the blob content is assumed to be consumed, and all blob files are read from the beginning.

Whenever a registry file exists, offsets are picked up from it.

__*file_head_bytes*__

Specifies the number of bytes at the head of the file that do not repeat over records. Usually these are JSON opening tags. The default value is `0`.

__*file_tail_bytes*__

Specifies the number of bytes at the tail of the file that do not repeat over records. Usually these are JSON closing tags. The default value is `0`.

__*record_preprocess_reg_exp*__

Specifies a regular expression applied to the content before the event is pushed; matched text is removed. For example, `^\s*,` removes a leading `,` from the content. The regular expression uses multiline mode.

__*blob_list_page_size*__

Specifies the page size for returned blob items. Too large a value can exhaust the heap; too small a value leads to too many requests. The default, `100`, works well with a 1 GB heap.
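To see how `file_head_bytes` and `record_preprocess_reg_exp` fit together, here is a minimal standalone Ruby sketch; the blob content and values are made-up assumptions for illustration, not plugin internals:

```ruby
# Standalone illustration (not plugin code): how a saved file header plus a
# regex-cleaned chunk become a parseable document again.
require 'json'

header = '{"records":['                       # the first file_head_bytes bytes
chunk  = ',{"time":"2017-08-23T00:00:00Z"}]}' # content read from the stored offset

# record_preprocess_reg_exp => "^\s*," strips the leading comma (multiline mode).
cleaned = chunk.sub(Regexp.new('^\s*,', Regexp::MULTILINE), '')

puts JSON.parse("#{header}#{cleaned}")
# => {"records"=>[{"time"=>"2017-08-23T00:00:00Z"}]}
```
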
### Examples

* Bare-bone settings:

```yaml
input
{
    azureblob
    {
        storage_account_name => "mystorageaccount"
        storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
        container => "mycontainer"
    }
}
```

* Example for Wad-IIS

```yaml
input {
    azureblob
    {
        storage_account_name => 'mystorageaccount'
        storage_access_key => 'VGhpcyBpcyBhIGZha2Uga2V5Lg=='
        container => 'wad-iis-logfiles'
        codec => line
    }
}
filter {
  ## Ignore the comments that IIS will add to the start of the W3C logs
  #
  if [message] =~ "^#" {
    drop {}
  }

  grok {
    # https://grokdebug.herokuapp.com/
    match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{WORD:sitename} %{WORD:computername} %{IP:server_ip} %{WORD:method} %{URIPATH:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:protocolVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:requestHost} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:bytesSent} %{NUMBER:bytesReceived} %{NUMBER:timetaken}"]
  }

  ## Set the Event Timestamp from the log
  #
  date {
    match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
    timezone => "Etc/UTC"
  }

  ## If the log record has a value for 'bytesSent', then add a new field
  #  to the event that converts it to kilobytes
  #
  if [bytesSent] {
    ruby {
      code => "event['kilobytesSent'] = event['bytesSent'].to_i / 1024.0"
    }
  }

  ## Do the same conversion for the bytes received value
  #
  if [bytesReceived] {
    ruby {
      code => "event['kilobytesReceived'] = event['bytesReceived'].to_i / 1024.0"
    }
  }

  ## Perform some mutations on the records to prep them for Elastic
  #
  mutate {
    ## Convert some fields from strings to integers
    #
    convert => ["bytesSent", "integer"]
    convert => ["bytesReceived", "integer"]
    convert => ["timetaken", "integer"]

    ## Create a new field for the reverse DNS lookup below
    #
    add_field => { "clientHostname" => "%{clientIP}" }

    ## Finally remove the original log_timestamp field since the event will
    #  have the proper date on it
    #
    remove_field => [ "log_timestamp"]
  }

  ## Do a reverse lookup on the client IP to get their hostname.
  #
  dns {
    ## Now that we've copied the clientIP into a new field we can
    #  simply replace it here using a reverse lookup
    #
    action => "replace"
    reverse => ["clientHostname"]
  }

  ## Parse out the user agent (grok captured the field as 'userAgent')
  #
  useragent {
    source => "userAgent"
    prefix => "browser"
  }
}
output {
  file {
    path => '/var/tmp/logstash-file-output'
    codec => rubydebug
  }
  stdout {
    codec => rubydebug
  }
}
```

* NSG Logs

```yaml
input {
    azureblob
    {
        storage_account_name => "mystorageaccount"
        storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
        container => "insights-logs-networksecuritygroupflowevent"
        codec => "json"
        file_head_bytes => 21
        file_tail_bytes => 9
        record_preprocess_reg_exp => "^\s*,"
    }
}

filter {
  split { field => "[records]" }
  split { field => "[records][properties][flows]"}
  split { field => "[records][properties][flows][flows]"}
  split { field => "[records][properties][flows][flows][flowTuples]"}

  mutate {
    split => { "[records][resourceId]" => "/"}
    add_field => {"Subscription" => "%{[records][resourceId][2]}"
                  "ResourceGroup" => "%{[records][resourceId][4]}"
                  "NetworkSecurityGroup" => "%{[records][resourceId][8]}"}
    convert => {"Subscription" => "string"}
    convert => {"ResourceGroup" => "string"}
    convert => {"NetworkSecurityGroup" => "string"}
    split => { "[records][properties][flows][flows][flowTuples]" => ","}
    add_field => {
      "unixtimestamp" => "%{[records][properties][flows][flows][flowTuples][0]}"
      "srcIp" => "%{[records][properties][flows][flows][flowTuples][1]}"
      "destIp" => "%{[records][properties][flows][flows][flowTuples][2]}"
      "srcPort" => "%{[records][properties][flows][flows][flowTuples][3]}"
      "destPort" => "%{[records][properties][flows][flows][flowTuples][4]}"
      "protocol" => "%{[records][properties][flows][flows][flowTuples][5]}"
      "trafficflow" => "%{[records][properties][flows][flows][flowTuples][6]}"
      "traffic" => "%{[records][properties][flows][flows][flowTuples][7]}"
    }
    convert => {"unixtimestamp" => "integer"}
    convert => {"srcPort" => "integer"}
    convert => {"destPort" => "integer"}
  }

  date {
    match => ["unixtimestamp" , "UNIX"]
  }
}

output {
  stdout { codec => rubydebug }
}
```

## More information
The source code of this plugin is hosted in the GitHub repo [Microsoft Azure Diagnostics with ELK](https://github.com/Azure/azure-diagnostics-tools). We welcome you to provide feedback and/or contribute to the project.
data/lib/logstash/inputs/azureblob.rb
ADDED
@@ -0,0 +1,383 @@
# encoding: utf-8
require "logstash/inputs/base"
require "logstash/namespace"

# Azure Storage SDK for Ruby
require "azure/storage"
require 'json' # for registry content
require "securerandom" # for generating uuid.
require "set" # for the blob listing below.
require "stud/interval" # for Stud.stoppable_sleep in the run loop.

# Registry item to coordinate between multiple clients
class LogStash::Inputs::RegistryItem
  attr_accessor :file_path, :etag, :offset, :reader, :gen
  # Allow json serialization.
  def as_json(options={})
    {
      file_path: @file_path,
      etag: @etag,
      reader: @reader,
      offset: @offset,
      gen: @gen
    }
  end # as_json

  def to_json(*options)
    as_json(*options).to_json(*options)
  end # to_json

  def initialize(file_path, etag, reader, offset = 0, gen = 0)
    @file_path = file_path
    @etag = etag
    @reader = reader
    @offset = offset
    @gen = gen
  end # initialize
end # class RegistryItem
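
# For reference, a serialized registry entry produced by as_json above has this
# shape (the blob name and etag here are made-up examples):
#   {"logs/a.log":{"file_path":"logs/a.log","etag":"0x8D4...","reader":null,"offset":1024,"gen":0}}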

# Logstash input plugin for Azure Blobs
#
# This logstash plugin gathers data from Microsoft Azure Blobs
class LogStash::Inputs::LogstashInputAzureblob < LogStash::Inputs::Base
  config_name "azureblob"

  # If undefined, Logstash will complain, even if codec is unused.
  default :codec, "json_lines"

  # Set the account name for the azure storage account.
  config :storage_account_name, :validate => :string

  # Set the key to access the storage account.
  config :storage_access_key, :validate => :string

  # Set the container of the blobs.
  config :container, :validate => :string

  # Set the endpoint for the blobs.
  #
  # The default, `core.windows.net`, targets the public Azure cloud.
  config :endpoint, :validate => :string, :default => 'core.windows.net'

  # Set the value of using backup mode.
  config :backupmode, :validate => :boolean, :default => false, :deprecated => true, :obsolete => 'This option is obsoleted and the settings will be ignored.'

  # Set the value for the registry file.
  #
  # The default, `data/registry`, is used to coordinate readings for various instances of the clients.
  config :registry_path, :validate => :string, :default => 'data/registry'

  # Set how many seconds to keep idle before checking for new logs.
  #
  # The default, `30`, means trigger a reading for the log every 30 seconds after entering idle.
  config :interval, :validate => :number, :default => 30

  # Set the registry create mode.
  #
  # The default, `resume`, means when the registry is initially created, it assumes all logs have been handled.
  # When set to `start_over`, it will read all log files from the beginning.
  config :registry_create_policy, :validate => :string, :default => 'resume'

  # Sets the header of the file that does not repeat over records. Usually, these are json opening tags.
  config :file_head_bytes, :validate => :number, :default => 0

  # Sets the tail of the file that does not repeat over records. Usually, these are json closing tags.
  config :file_tail_bytes, :validate => :number, :default => 0

  # Sets the regular expression to process content before pushing the event.
  config :record_preprocess_reg_exp, :validate => :string

  # Sets the page size for returned blob items. Too large a number may exhaust the heap; too small a number leads to too many requests.
  #
  # The default, `100`, is good for the default heap size of 1G.
  config :blob_list_page_size, :validate => :number, :default => 100

  # Constant of max integer
  MAX = 2**([42].pack('i').size * 16 - 2) - 1
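  # On common platforms [42].pack('i').size is 4, so MAX evaluates to 2**62 - 1;
  # any sufficiently large bound works here, since raise_gen rebases generation
  # numbers toward zero before they can approach it.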

  public
  def register
    # this is the reader # for this specific instance.
    @reader = SecureRandom.uuid
    @registry_locker = "#{@registry_path}.lock"

    # Setup a specific instance of an Azure::Storage::Client
    client = Azure::Storage::Client.create(:storage_account_name => @storage_account_name, :storage_access_key => @storage_access_key, :storage_blob_host => "https://#{@storage_account_name}.blob.#{@endpoint}")
    # Get an azure storage blob service object from a specific instance of an Azure::Storage::Client
    @azure_blob = client.blob_client
    # Add retry filter to the service object
    @azure_blob.with_filter(Azure::Storage::Core::Filter::ExponentialRetryPolicyFilter.new)
  end # def register

  def run(queue)
    # we can abort the loop if stop? becomes true
    while !stop?
      process(queue)
      Stud.stoppable_sleep(@interval) { stop? }
    end # loop
  end # def run

  def stop
    cleanup_registry
  end # def stop

  # Start processing the next item.
  def process(queue)
    begin
      blob, start_index, gen = register_for_read

      if(!blob.nil?)
        begin
          blob_name = blob.name
          # Work-around: After being returned by get_blob, the etag will contain quotes.
          new_etag = blob.properties[:etag]
          # ~ Work-around

          blob, header = @azure_blob.get_blob(@container, blob_name, {:end_range => @file_head_bytes}) if header.nil? unless @file_head_bytes.nil? or @file_head_bytes <= 0

          if start_index == 0
            # Skip the header since it is already read.
            start_index = start_index + @file_head_bytes
          else
            # Adjust the offset when it is not the first read, then read till the end of the file, including the tail.
            start_index = start_index - @file_tail_bytes
            start_index = 0 if start_index < 0
          end

          blob, content = @azure_blob.get_blob(@container, blob_name, {:start_range => start_index} )

          # content will be used to calculate the new offset. Create a new variable for processed content.
          processed_content = content
          if(!@record_preprocess_reg_exp.nil?)
            reg_exp = Regexp.new(@record_preprocess_reg_exp, Regexp::MULTILINE)
            processed_content = content.sub(reg_exp, '')
          end

          # Put the header and the content together before pushing into the event queue.
          processed_content = "#{header}#{processed_content}" unless header.nil? || header.length == 0

          @codec.decode(processed_content) do |event|
            decorate(event)
            queue << event
          end # decode
        ensure
          # Make sure the reader is removed from the registry even when there's an exception.
          new_offset = start_index
          new_offset = new_offset + content.length unless content.nil?
          new_registry_item = LogStash::Inputs::RegistryItem.new(blob_name, new_etag, nil, new_offset, gen)
          update_registry(new_registry_item)
        end # begin
      end # if
    rescue StandardError => e
      @logger.error("Oh My, An error occurred. \nError:#{e}:\nTrace:\n#{e.backtrace}", :exception => e)
    end # begin
  end # process
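
  # In short, each process cycle: registers for the next readable blob and its
  # offset, fetches the non-repeating file header separately, reads the content
  # from the offset, strips record_preprocess_reg_exp matches, prepends the
  # header, decodes via the codec, and finally writes the new offset back to
  # the registry (in the ensure block) so other readers can pick up from there.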

  # Deserialize registry hash from json string.
  def deserialize_registry_hash (json_string)
    result = Hash.new
    temp_hash = JSON.parse(json_string)
    temp_hash.values.each { |kvp|
      result[kvp['file_path']] = LogStash::Inputs::RegistryItem.new(kvp['file_path'], kvp['etag'], kvp['reader'], kvp['offset'], kvp['gen'])
    }
    return result
  end #deserialize_registry_hash

  # List all the blobs in the given container.
  def list_all_blobs
    blobs = Set.new []
    continuation_token = nil
    @blob_list_page_size = 100 if @blob_list_page_size <= 0
    loop do
      # Limit the number of returned entries to avoid an out-of-memory exception.
      entries = @azure_blob.list_blobs(@container, { :timeout => 10, :marker => continuation_token, :max_results => @blob_list_page_size })
      entries.each do |entry|
        blobs << entry
      end # each
      continuation_token = entries.continuation_token
      break if continuation_token.empty?
    end # loop
    return blobs
  end # def list_all_blobs

  # Raise the generation for a blob in the registry.
  def raise_gen(registry_hash, file_path)
    begin
      target_item = registry_hash[file_path]
      begin
        target_item.gen += 1
        # Protect gen from overflow.
        target_item.gen = target_item.gen / 2 if target_item.gen == MAX
      rescue StandardError => e
        @logger.error("Fail to get the next generation for target item #{target_item}.", :exception => e)
        target_item.gen = 0
      end

      min_gen_item = registry_hash.values.min_by { |x| x.gen }
      while min_gen_item.gen > 0
        registry_hash.values.each { |value|
          value.gen -= 1
        }
        min_gen_item = registry_hash.values.min_by { |x| x.gen }
      end
    end
  end # raise_gen
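
  # Note: gen acts as a least-recently-read counter. register_for_read below
  # picks the candidate blob with the lowest gen, so raising the gen of a
  # just-picked blob rotates reads fairly across blobs.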

  # Acquire a lease on a blob item with retries.
  #
  # By default, it will retry 30 times with 1 second interval.
  def acquire_lease(blob_name, retry_times = 30, interval_sec = 1)
    lease = nil
    retried = 0
    while lease.nil? do
      begin
        lease = @azure_blob.acquire_blob_lease(@container, blob_name, {:timeout => 10})
      rescue StandardError => e
        if(e.type == 'LeaseAlreadyPresent')
          if (retried > retry_times)
            raise
          end
          retried += 1
          sleep interval_sec
        end
      end
    end #while
    return lease
  end # acquire_lease

  # Return the next blob for reading as well as the start index.
  def register_for_read
    begin
      all_blobs = list_all_blobs
      registry = all_blobs.find { |item| item.name.downcase == @registry_path }
      registry_locker = all_blobs.find { |item| item.name.downcase == @registry_locker }

      candidate_blobs = all_blobs.select { |item| (item.name.downcase != @registry_path) && ( item.name.downcase != @registry_locker ) }

      start_index = 0
      gen = 0
      lease = nil

      # Put the lease on the locker file rather than on the registry file itself, so the registry can still be updated while the lock is held.
      # Workaround for Azure Storage Ruby SDK issue #16: https://github.com/Azure/azure-storage-ruby/issues/16
      registry_locker = @azure_blob.create_block_blob(@container, @registry_locker, @reader) if registry_locker.nil?
      lease = acquire_lease(@registry_locker)
      # ~ Workaround

      if(registry.nil?)
        registry_hash = create_registry(candidate_blobs)
      else
        registry_hash = load_registry
      end #if

      picked_blobs = Set.new []
      # Pick up the next candidate
      picked_blob = nil
      candidate_blobs.each { |candidate_blob|
        registry_item = registry_hash[candidate_blob.name]

        # Append items that don't exist in the hash table yet.
        if registry_item.nil?
          registry_item = LogStash::Inputs::RegistryItem.new(candidate_blob.name, candidate_blob.properties[:etag], nil, 0, 0)
          registry_hash[candidate_blob.name] = registry_item
        end # if

        if ((registry_item.offset < candidate_blob.properties[:content_length]) && (registry_item.reader.nil? || registry_item.reader == @reader))
          picked_blobs << candidate_blob
        end
      }

      picked_blob = picked_blobs.min_by { |b| registry_hash[b.name].gen }
      if !picked_blob.nil?
        registry_item = registry_hash[picked_blob.name]
        registry_item.reader = @reader
        registry_hash[picked_blob.name] = registry_item
        start_index = registry_item.offset
        raise_gen(registry_hash, picked_blob.name)
        gen = registry_item.gen
      end #if

      # Save the change to the registry.
      save_registry(registry_hash)

      @azure_blob.release_blob_lease(@container, @registry_locker, lease)
      lease = nil

      return picked_blob, start_index, gen
    rescue StandardError => e
      @logger.error("Oh My, An error occurred. #{e}:\n#{e.backtrace}", :exception => e)
      return nil, nil, nil
    ensure
      @azure_blob.release_blob_lease(@container, @registry_locker, lease) unless lease.nil?
      lease = nil
    end # rescue
  end #register_for_read

  # Update the registry.
  def update_registry (registry_item)
    begin
      lease = nil
      lease = acquire_lease(@registry_locker)
      registry_hash = load_registry
      registry_hash[registry_item.file_path] = registry_item
      save_registry(registry_hash)
      @azure_blob.release_blob_lease(@container, @registry_locker, lease)
      lease = nil
    rescue StandardError => e
      @logger.error("Oh My, An error occurred. #{e}:\n#{e.backtrace}", :exception => e)
    ensure
      @azure_blob.release_blob_lease(@container, @registry_locker, lease) unless lease.nil?
      lease = nil
    end #rescue
  end # def update_registry

  # Clean up the registry.
  def cleanup_registry
    begin
      lease = nil
      lease = acquire_lease(@registry_locker)
      registry_hash = load_registry
      registry_hash.each { | key, registry_item|
        registry_item.reader = nil if registry_item.reader == @reader
      }
      save_registry(registry_hash)
      @azure_blob.release_blob_lease(@container, @registry_locker, lease)
      lease = nil
    rescue StandardError => e
      @logger.error("Oh My, An error occurred. #{e}:\n#{e.backtrace}", :exception => e)
    ensure
      @azure_blob.release_blob_lease(@container, @registry_locker, lease) unless lease.nil?
      lease = nil
    end #rescue
  end # def cleanup_registry

  # Create a registry file to coordinate between multiple azure blob inputs.
  def create_registry (blob_items)
    registry_hash = Hash.new

    blob_items.each do |blob_item|
      initial_offset = 0
      initial_offset = blob_item.properties[:content_length] if @registry_create_policy == 'resume'
      registry_item = LogStash::Inputs::RegistryItem.new(blob_item.name, blob_item.properties[:etag], nil, initial_offset, 0)
      registry_hash[blob_item.name] = registry_item
    end # each
    save_registry(registry_hash)
    return registry_hash
  end # create_registry

  # Load the content of the registry into the registry hash and return it.
  def load_registry
    # Get content
    registry_blob, registry_blob_body = @azure_blob.get_blob(@container, @registry_path)
    registry_hash = deserialize_registry_hash(registry_blob_body)
    return registry_hash
  end # def load_registry

  # Serialize the registry hash and save it.
  def save_registry(registry_hash)
    # Serialize hash to json
    registry_hash_json = JSON.generate(registry_hash)

    # Upload registry to blob
    @azure_blob.create_block_blob(@container, @registry_path, registry_hash_json)
  end # def save_registry
end # class LogStash::Inputs::LogstashInputAzureblob
data/logstash-input-azureblob.gemspec
ADDED
@@ -0,0 +1,26 @@
Gem::Specification.new do |s|
  s.name = 'logstash-input-azureblob-json-head-tail'
  s.version = '0.9.9'
  s.licenses = ['Apache License (2.0)']
  s.summary = 'This plugin collects Microsoft Azure Diagnostics data from Azure Storage Blobs.'
  s.description = 'This gem is a Logstash plugin. It reads and parses data from Azure Storage Blobs.'
  s.homepage = 'https://github.com/Azure/azure-diagnostics-tools'
  s.authors = ['Microsoft Corporation']
  s.email = 'azdiag@microsoft.com'
  s.require_paths = ['lib']

  # Files
  s.files = Dir['lib/**/*','spec/**/*','vendor/**/*','*.gemspec','*.md','Gemfile','LICENSE']
  # Tests
  s.test_files = s.files.grep(%r{^(test|spec|features)/})

  # Special flag to let us know this is actually a logstash plugin
  s.metadata = { "logstash_plugin" => "true", "logstash_group" => "input" }

  # Gem dependencies
  s.add_runtime_dependency "logstash-core-plugin-api", '>= 1.60', '<= 2.99'
  s.add_runtime_dependency 'logstash-codec-json_lines'
  s.add_runtime_dependency 'stud', '>= 0.0.22'
  s.add_runtime_dependency 'azure-storage', '~> 0.12.3.preview'
  s.add_development_dependency 'logstash-devutils'
end
data/spec/inputs/azureblob_spec.rb
ADDED
@@ -0,0 +1 @@
require "logstash/devutils/rspec/spec_helper"
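The released spec file is only the stub above. For local experimentation, a minimal smoke test might look like the following sketch; the credential values are fakes and the test itself is an illustrative assumption, not part of the gem:

```ruby
# Illustrative sketch only; not shipped with the gem.
require "logstash/devutils/rspec/spec_helper"
require "logstash/inputs/azureblob"

describe LogStash::Inputs::LogstashInputAzureblob do
  it "registers with minimal settings" do
    plugin = LogStash::Plugin.lookup("input", "azureblob").new(
      "storage_account_name" => "mystorageaccount",             # fake
      "storage_access_key"   => "VGhpcyBpcyBhIGZha2Uga2V5Lg==", # fake
      "container"            => "mycontainer"
    )
    expect { plugin.register }.not_to raise_error
  end
end
```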
metadata
ADDED
@@ -0,0 +1,131 @@
--- !ruby/object:Gem::Specification
name: logstash-input-azureblob-json-head-tail
version: !ruby/object:Gem::Version
  version: 0.9.9
platform: ruby
authors:
- Microsoft Corporation
autorequire:
bindir: bin
cert_chain: []
date: 2017-08-23 00:00:00.000000000 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: logstash-core-plugin-api
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - '>='
      - !ruby/object:Gem::Version
        version: '1.60'
    - - <=
      - !ruby/object:Gem::Version
        version: '2.99'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - '>='
      - !ruby/object:Gem::Version
        version: '1.60'
    - - <=
      - !ruby/object:Gem::Version
        version: '2.99'
- !ruby/object:Gem::Dependency
  name: logstash-codec-json_lines
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - '>='
      - !ruby/object:Gem::Version
        version: '0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - '>='
      - !ruby/object:Gem::Version
        version: '0'
- !ruby/object:Gem::Dependency
  name: stud
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - '>='
      - !ruby/object:Gem::Version
        version: 0.0.22
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - '>='
      - !ruby/object:Gem::Version
        version: 0.0.22
- !ruby/object:Gem::Dependency
  name: azure-storage
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: 0.12.3.preview
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: 0.12.3.preview
- !ruby/object:Gem::Dependency
  name: logstash-devutils
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - '>='
      - !ruby/object:Gem::Version
        version: '0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - '>='
      - !ruby/object:Gem::Version
        version: '0'
description: This gem is a Logstash plugin. It reads and parses data from Azure Storage
  Blobs.
email: azdiag@microsoft.com
executables: []
extensions: []
extra_rdoc_files: []
files:
- lib/logstash/inputs/azureblob.rb
- spec/inputs/azureblob_spec.rb
- logstash-input-azureblob.gemspec
- CHANGELOG.md
- README.md
- Gemfile
- LICENSE
homepage: https://github.com/Azure/azure-diagnostics-tools
licenses:
- Apache License (2.0)
metadata:
  logstash_plugin: 'true'
  logstash_group: input
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - '>='
    - !ruby/object:Gem::Version
      version: '0'
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - '>='
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubyforge_project:
rubygems_version: 2.0.14.1
signing_key:
specification_version: 4
summary: This plugin collects Microsoft Azure Diagnostics data from Azure Storage
  Blobs.
test_files:
- spec/inputs/azureblob_spec.rb