logstash-input-azure_blob_storage 0.10.0 → 0.10.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +2 -1
- data/README.md +54 -4
- data/lib/logstash/inputs/azure_blob_storage.rb +28 -16
- data/logstash-input-azure_blob_storage.gemspec +13 -2
- data/spec/inputs/azure_blob_storage_spec.rb +7 -0
- metadata +13 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: db216440cf4319f70a5fbdb001a53da72826afdcbce43c04ada28b63c5d9e1f8
+  data.tar.gz: e2e519090c0d67b6b65f4570c34be4f0b02592864459f261c7be613486f2941b
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 341abd35cf3b732c1a0bada111cbac79cafa92fb48fd14ad24e0121245c67f1819c1d6e5e4647dda645773231da6aa2a61b7e1aaee4027fa97fe9e857a8a334f
+  data.tar.gz: 93542f740dda404889f623c9a9d460a1ac6f7181ddd6974de8d775e4e29585b8fccd43b9bd05a880250a4da2709121bf5f018f15a8ce49a9d758b1c645482cbf
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -10,10 +10,13 @@ All plugin documentation are placed under one [central location](http://www.elas
 
 ## Need Help?
 
-Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum. For real problems or feature requests, raise a github issue. Pull requests will ionly be merged after discussion through an issue.
+Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum. For real problems or feature requests, raise a github issue at [github.com/janmg/logstash-input-azure_blob_storage](https://github.com/janmg/logstash-input-azure_blob_storage). Pull requests will only be merged after discussion through an issue.
 
 ## Purpose
-This plugin can read from Azure Storage Blobs,
+This plugin can read from Azure Storage Blobs, for instance diagnostics logs for NSG flow logs or access logs from App Services.
+[Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/)
+
+After every interval the plugin writes a registry to the storage account to record how many bytes of each blob (file) have been read and processed. After all files are processed and at least one interval has passed, a new file list is generated and a worklist is constructed for processing. When a file has been processed before, it is read partially, from the saved offset up to the file size at the time of the listing. If the codec is JSON, partial files get the header and tail added; both can be configured. If logtype is nsgflowlog, the plugin splits the log into individual tuple events. The logtype wadiis may in the future be used to apply grok formats and split files into log lines. Any other format is fed into the queue as one event per file or partial file; it is then up to the filter to split and mutate the file format, using source => message in the filter {} block.
 
 ## Installation
 This plugin can be installed through logstash-plugin
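
To make the head/tail handling described in the Purpose paragraph above concrete, here is a tiny Ruby sketch (hypothetical values, not plugin code): a partial read only contains records, so the learned JSON header and tail are glued back on before parsing.

```
require 'json'

# Illustration only: values a partial nsgflowlog read might produce.
head    = '{"records":['
tail    = ']}'
partial = '{"time":"2019-03-22T10:00:00Z","category":"NetworkSecurityGroupFlowEvent"}'

# The plugin wraps partial content with the learned head and tail so the
# result is valid JSON again.
event_json = JSON.parse(head + partial + tail)
puts event_json["records"].length   # => 1
```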
@@ -49,6 +52,9 @@ curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/j
 
 
 ## Configuration Examples
+The minimum configuration required as input is storageaccount, access_key and container.
+
+For nsgflowlogs, a simple configuration looks like this
 ```
 input {
     azure_blob_storage {
@@ -79,7 +85,7 @@ output {
 }
 ```
 
-
+It's possible to specify optional parameters to override the defaults. The iplookup, use_redis and iplist parameters are used to add information about the source and destination IP addresses. Redis can be used for caching the lookup results, and iplist configures an array of IP addresses that don't require a lookup.
 ```
 input {
     azure_blob_storage {
@@ -90,7 +96,7 @@ input {
         logtype => "nsgflowlog"
         prefix => "resourceId=/"
         registry_create_policy => "resume"
-        interval =>
+        interval => 300
         iplookup => "http://10.0.0.5:6081/ripe.php?ip="
         use_redis => true
         iplist => [
@@ -100,3 +106,47 @@ input {
 }
 }
 ```
+
+For WAD IIS and App Services, the HTTP access logs can be retrieved from a storage account as line-based events and parsed through grok. The date stamp can also be parsed with %{TIMESTAMP_ISO8601:log_timestamp}. For WAD IIS logfiles the container is wad-iis-logfiles. In the future the plugin may do the grokking itself.
+```
+input {
+    azure_blob_storage {
+        storageaccount => "yourstorageaccountname"
+        access_key => "Ba5e64c0d3=="
+        container => "access-logs"
+        interval => 300
+        codec => line
+    }
+}
+
+filter {
+  if [message] =~ "^#" {
+    drop {}
+  }
+
+  mutate {
+    strip => "message"
+  }
+
+  grok {
+    match => ['message', '(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}\d+) %{NOTSPACE:instanceId} %{WORD:httpMethod} %{URIPATH:requestUri} %{NOTSPACE:requestQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:host} %{NUMBER:httpStatus} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:sentBytes:int} %{NUMBER:receivedBytes:int} %{NUMBER:timeTaken:int}']
+  }
+
+  date {
+    match => [ "timestamp", "YYYY-MM-dd HH:mm:ss" ]
+    target => "@timestamp"
+  }
+
+  mutate {
+    remove_field => ["log_timestamp"]
+    remove_field => ["message"]
+    remove_field => ["win32response"]
+    remove_field => ["subresponse"]
+    remove_field => ["username"]
+    remove_field => ["clientPort"]
+    remove_field => ["port"]
+    remove_field => ["timestamp"]
+  }
+}
+```
+
data/lib/logstash/inputs/azure_blob_storage.rb
CHANGED
@@ -7,10 +7,10 @@ require 'azure/storage/blob'
 #require 'date'
 #require 'json'
 #require 'thread'
-#require
+#require 'redis'
 #require 'net/http'
 
-# This is a logstash input plugin for files in Azure Blob Storage. There is a storage explorer in the portal and an application with the same name https://storageexplorer.com. A storage account has by default a globally unique name, {storageaccount}.blob.core.windows.net which is a CNAME to Azures blob servers blob.*.store.core.windows.net. A storageaccount has an container and those have a directory and blobs (like files)
+# This is a logstash input plugin for files in Azure Blob Storage. There is a storage explorer in the portal and an application with the same name https://storageexplorer.com. A storage account has by default a globally unique name, {storageaccount}.blob.core.windows.net which is a CNAME to Azure's blob servers blob.*.store.core.windows.net. A storage account has a container and those have a directory and blobs (like files). Blobs have one or more blocks. After writing the blocks, they can be committed. Some Azure diagnostics can send events to an EventHub that can be parsed with the plugin logstash-input-azure_event_hubs, but for events that are only stored in a storage account, use this plugin. The original logstash-input-azureblob from azure-diagnostics-tools is great for low volumes, but it suffers from an outdated client, slow reads, lease locking issues and json parse errors.
 # https://azure.microsoft.com/en-us/services/storage/blobs/
 class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
   config_name "azure_blob_storage"
@@ -23,16 +23,19 @@ class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
 
   # The storage account is accessed through Azure::Storage::Blob::BlobService, it needs either a sas_token, connection string or a storageaccount/access_key pair.
   # https://github.com/Azure/azure-storage-ruby/blob/master/blob/lib/azure/storage/blob/blob_service.rb#L42
-  config :connection_string, :validate => :password
+  config :connection_string, :validate => :password, :required => false
 
   # The storage account name for the azure storage account.
-  config :storageaccount, :validate => :string
+  config :storageaccount, :validate => :string, :required => false
+
+  # DNS Suffix other than blob.core.windows.net
+  config :dns_suffix, :validate => :string, :required => false, :default => 'core.windows.net'
 
   # The (primary or secondary) Access Key for the storage account. The key can be found in the portal.azure.com or through the azure api StorageAccounts/ListKeys. For example the PowerShell command Get-AzStorageAccountKey.
-  config :access_key, :validate => :password
+  config :access_key, :validate => :password, :required => false
 
   # SAS is the Shared Access Signature, that provides restricted access rights. If the sas_token is absent, the access_key is used instead.
-  config :sas_token, :validate => :password
+  config :sas_token, :validate => :password, :required => false
 
   # The container of the blobs.
   config :container, :validate => :string, :default => 'insights-logs-networksecuritygroupflowevent'
@@ -91,21 +94,22 @@ class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
 
   # Optional to enrich NSGFLOWLOGS with netname and subnet, the iplookup value points to a webservice that provides the information in JSON format like this.
   # {"ip":"8.8.8.8","netname":"Google","subnet":"8.8.8.0\/24","hostname":"google-public-dns-a.google.com"}
-
-
-  # Optional Redis IP cache
-  config :use_redis, :validate => :boolean, :required => false, :default => false
-
+  # In the query parameter the <ip> tag will be replaced by the IP address to look up, other parameters are optional and depend on your lookup service.
+  config :iplookup, :validate => :string, :required => false, :default => 'http://127.0.0.1/ripe.php?ip=<ip>&TOKEN=token'
 
   # Optional array of JSON objects that don't require a lookup
   config :iplist, :validate => :array, :required => false, :default => ['{"ip":"10.0.0.4","netname":"Application Gateway","subnet":"10.0.0.0\/24","hostname":"appgw"}']
 
+  # Optional Redis IP cache
+  config :use_redis, :validate => :boolean, :required => false, :default => false
+
 
 
 public
 def register
     @pipe_id = Thread.current[:name].split("[").last.split("]").first
     @logger.info("=== "+config_name+"/"+@pipe_id+"/"+@id[0,6]+" ===")
+    @logger.info("Contact me at jan@janmg.com, if something in this plugin doesn't work")
     # TODO: consider multiple readers, so add pipeline @id or use logstash-to-logstash communication?
     # TODO: Implement retry ... Error: Connection refused - Failed to open TCP connection to
 
@@ -118,11 +122,12 @@ def register
     # 2. connection_string
     # 3. storageaccount / access_key
 
-
+    unless connection_string.nil?
+        conn = connection_string.value
+    end
     unless sas_token.nil?
-        # TODO: Fix SAS Tokens
         unless sas_token.value.start_with?('?')
-            conn = "BlobEndpoint=https://#{storageaccount}
+            conn = "BlobEndpoint=https://#{storageaccount}.#{dns_suffix};SharedAccessSignature=#{sas_token.value}"
         else
             conn = sas_token.value
         end
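
As a side note to this hunk, a minimal standalone sketch of the connection priority it implements (sas_token over connection_string over storageaccount/access_key). The method name build_connection and the plain-string parameters are hypothetical; only the BlobEndpoint format comes from the added line above.

```
# Hypothetical helper, not part of the plugin: resolves the auth options in the
# order the register method above uses them.
def build_connection(sas_token: nil, connection_string: nil,
                     storageaccount: nil, dns_suffix: 'core.windows.net')
  conn = connection_string unless connection_string.nil?
  unless sas_token.nil?
    conn = if sas_token.start_with?('?')
             # a full query-string token is usable as the connection string as-is
             sas_token
           else
             # a bare token is wrapped into a BlobEndpoint connection string
             "BlobEndpoint=https://#{storageaccount}.#{dns_suffix};SharedAccessSignature=#{sas_token}"
           end
  end
  # nil means: fall back to storageaccount/access_key in BlobService.create
  conn
end
```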
@@ -132,6 +137,7 @@ def register
     else
         @blob_client = Azure::Storage::Blob::BlobService.create(
             storage_account_name: storageaccount,
+            storage_dns_suffix: dns_suffix,
             storage_access_key: access_key.value,
         )
     end
@@ -200,7 +206,7 @@ def run(queue)
 
     # Worklist is the subset of files where the already read offset is smaller than the file size
     worklist = filelist.select {|name,file| file[:offset] < file[:length]}
-    @logger.
+    @logger.debug(@pipe_id+" worklist contains #{worklist.size} blobs to process")
     # This would be ideal for threading since it's IO intensive, would be nice with a ruby native ThreadPool
     worklist.each do |name, file|
         res = resource(name)
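
To illustrate the worklist selection above, a small self-contained sketch of the registry idea (the blob names and sizes are made up): only blobs whose saved offset is smaller than their length at listing time are read again, partially.

```
# Illustration only: registry entries track bytes already processed (offset)
# and the blob size at listing time (length).
registry = {
  "resourceId=/NSG-A/y=2019/m=03/d=22/h=10/PT1H.json" => { offset: 2048, length: 4096 },
  "resourceId=/NSG-A/y=2019/m=03/d=22/h=09/PT1H.json" => { offset: 1024, length: 1024 }
}

# Same selection as the plugin: only blobs with unread bytes make the worklist.
worklist = registry.select { |_name, file| file[:offset] < file[:length] }

worklist.each do |name, file|
  # a partial read would fetch bytes file[:offset] .. file[:length]-1 of this blob
  puts "#{name}: #{file[:length] - file[:offset]} bytes left to process"
end
```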
@@ -255,6 +261,8 @@ def stop
 end
 
 
+
+private
 def full_read(filename)
     return @blob_client.get_blob(container, filename)[1]
 end
@@ -371,6 +379,7 @@ end
 def learn_encapsulation
     # From one file, read first block and last block to learn head and tail
     blob = @blob_client.list_blobs(container, { maxresults: 1, prefix: @prefix }).first
+    return if blob.nil?
     blocks = @blob_client.list_blob_blocks(container, blob.name)[:committed]
     @logger.info(@pipe_id+" using #{blob.name} to learn the json header and tail")
     @head = @blob_client.get_blob(container, blob.name, start_range: 0, end_range: blocks.first.size-1)[1]
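
A hedged sketch of what the head/tail learning above amounts to, using the same azure-storage-ruby calls: read the first and last committed block of one blob. The tail range calculation is an assumption (the hunk only shows the head), and blob_client, container and prefix are assumed to be set up.

```
# Sketch only: learn the JSON header and tail that wrap the records of a blob,
# so they can be re-applied to partially read files.
blob = blob_client.list_blobs(container, { maxresults: 1, prefix: prefix }).first
unless blob.nil?
  blocks = blob_client.list_blob_blocks(container, blob.name)[:committed]
  # first committed block holds the JSON header
  head = blob_client.get_blob(container, blob.name,
                              start_range: 0,
                              end_range: blocks.first.size - 1)[1]
  # assumed (not shown in the hunk): the tail starts where the last block begins
  tail_start = blocks[0..-2].map(&:size).sum
  tail = blob_client.get_blob(container, blob.name,
                              start_range: tail_start,
                              end_range: tail_start + blocks.last.size - 1)[1]
end
```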
@@ -397,6 +406,7 @@ end
 
 
 # Optional lookup for netname and hostname for the srcip and dstip returned in a Hash
+# TODO: split out to own class
 def addip(srcip, dstip)
     #TODO: return anonymous merge
     srcjson = JSON.parse(lookup(srcip))
@@ -409,8 +419,10 @@ def lookup(ip)
     unless @red.nil?
         res = @red.get(ip)
     end
+    uri = URI.parse(iplookup.sub('<ip>',ip))
+    res = Net::HTTP.get(uri)
     if res.nil?
-        res = Net::HTTP.get(
+        res = Net::HTTP.get(uri)
         unless @red.nil?
             @red.set(ip, res)
             @red.expire(ip,604800)
data/logstash-input-azure_blob_storage.gemspec
CHANGED
@@ -1,9 +1,20 @@
 Gem::Specification.new do |s|
   s.name = 'logstash-input-azure_blob_storage'
-  s.version = '0.10.
+  s.version = '0.10.1'
   s.licenses = ['Apache-2.0']
   s.summary = 'This logstash plugin reads and parses data from Azure Storage Blobs.'
-  s.description =
+  s.description = <<-EOF
+This gem is a Logstash plugin. It reads and parses data from Azure Storage Blobs. The azure_blob_storage is a reimplementation to replace azureblob from azure-diagnostics-tools/Logstash. It can deal with larger volumes and partial file reads and eliminates the delay when rebuilding the registry.
+
+The logstash pipeline configuration would look like this
+    input {
+        azure_blob_storage {
+            storageaccount => "yourstorageaccountname"
+            access_key => "Ba5e64c0d3=="
+            container => "insights-logs-networksecuritygroupflowevent"
+        }
+    }
+EOF
   s.homepage = 'https://github.com/janmg/logstash-input-azure_blob_storage'
   s.authors = ['Jan Geertsma']
   s.email = 'jan@janmg.com'
data/spec/inputs/azure_blob_storage_spec.rb
CHANGED
@@ -8,4 +8,11 @@ describe LogStash::Inputs::AzureBlobStorage do
   let(:config) { { "interval" => 100 } }
 end
 
+def test_helper_methodes
+    assert_equal('b', AzureBlobStorage.val('a=b'))
+    assert_equal('whatever', AzureBlobStorage.strip_comma(',whatever'))
+    assert_equal('whatever', AzureBlobStorage.strip_comma('whatever,'))
+    assert_equal('whatever', AzureBlobStorage.strip_comma(',whatever,'))
+    assert_equal('whatever', AzureBlobStorage.strip_comma('whatever'))
+end
 end
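
The new asserts only pin down behaviour: val returns the value part of a key=value pair and strip_comma trims leading and trailing commas. The method names come from the spec above, but the bodies below are illustrative guesses, not the gem's code.

```
# Illustrative only: bodies that would satisfy the asserts above.
module AzureBlobStorageHelpers
  # val('a=b') => 'b'
  def self.val(pair)
    pair.split('=').last
  end

  # strip_comma(',whatever,') => 'whatever'
  def self.strip_comma(str)
    str.sub(/^,/, '').sub(/,$/, '')
  end
end

AzureBlobStorageHelpers.val('a=b')                 # => "b"
AzureBlobStorageHelpers.strip_comma(',whatever,')  # => "whatever"
```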
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: logstash-input-azure_blob_storage
 version: !ruby/object:Gem::Version
-  version: 0.10.
+  version: 0.10.1
 platform: ruby
 authors:
 - Jan Geertsma
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2019-
+date: 2019-03-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
@@ -86,10 +86,17 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: 1.0.0
-description:
-
-
-
+description: |2
+  This gem is a Logstash plugin. It reads and parses data from Azure Storage Blobs. The azure_blob_storage is a reimplementation to replace azureblob from azure-diagnostics-tools/Logstash. It can deal with larger volumes and partial file reads and eliminates the delay when rebuilding the registry.
+
+  The logstash pipeline configuration would look like this
+      input {
+          azure_blob_storage {
+              storageaccount => "yourstorageaccountname"
+              access_key => "Ba5e64c0d3=="
+              container => "insights-logs-networksecuritygroupflowevent"
+          }
+      }
 email: jan@janmg.com
 executables: []
 extensions: []