logstash-input-azure_blob_storage 0.10.0 → 0.10.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 47775d226b17cd57d8ce290569e35cf893713f42757bd953fd46b93055733527
- data.tar.gz: de73405430b71405ecc0a4873a72feefabdb7dd7f410734dd550928623a39c53
+ metadata.gz: db216440cf4319f70a5fbdb001a53da72826afdcbce43c04ada28b63c5d9e1f8
+ data.tar.gz: e2e519090c0d67b6b65f4570c34be4f0b02592864459f261c7be613486f2941b
  SHA512:
- metadata.gz: a028a0df1310312d9a1826de016407693ca14002c316d08267de3be51abc94aaa9db997e3fb19677cd23db551d029d7e1acffc65068eb0b2ca4a2a6d408dbbe7
- data.tar.gz: b54aa9e59046793f26bfaa5fcd2795c809f2fdd535449f00a8ab6b7392107f8c9a2d3f937a3d24d1c0bcb2890d60493baeb360b29c1188666781ecc9e4e7b014
+ metadata.gz: 341abd35cf3b732c1a0bada111cbac79cafa92fb48fd14ad24e0121245c67f1819c1d6e5e4647dda645773231da6aa2a61b7e1aaee4027fa97fe9e857a8a334f
+ data.tar.gz: 93542f740dda404889f623c9a9d460a1ac6f7181ddd6974de8d775e4e29585b8fccd43b9bd05a880250a4da2709121bf5f018f15a8ce49a9d758b1c645482cbf
data/CHANGELOG.md CHANGED
@@ -1,2 +1,3 @@
- ## 0.1.0
+ ## 0.10.0
  - Plugin created with the logstash plugin generator
+ - Reimplemented logstash-input-azureblob with incompatible config and data/registry
data/README.md CHANGED
@@ -10,10 +10,13 @@ All plugin documentation are placed under one [central location](http://www.elas
 
  ## Need Help?
 
- Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum. For real problems or feature requests, raise a github issue. Pull requests will ionly be merged after discussion through an issue.
+ Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum. For real problems or feature requests, raise a github issue at [github.com/janmg/logstash-input-azure_blob_storage](https://github.com/janmg/logstash-input-azure_blob_storage). Pull requests will only be merged after discussion through an issue.
 
  ## Purpose
- This plugin can read from Azure Storage Blobs, after every interval it will write a registry to the storageaccount to save the information of how many bytes per blob (file) are read and processed. After all files are processed and at least one interval has passed a new file list is generated and a worklist is constructed that will be processed. When a file has already been processed before, partial files are read from the offset to the filesize at the time of the file listing. If the codec is JSON partial files will be have the header and tail will be added. They can be configured. If logtype is nsgflowlog, the plugin will process the splitting into individual tuple events. The logtype wadiis may in the future be used to process the grok formats to split into log lines. Any other format is fed into the queue as one event per file or partial file. It's then up to the filter to split and mutate the file format. use source => message in the filter {} block.
+ This plugin can read from Azure Storage Blobs, for instance diagnostics logs such as NSG flow logs or access logs from App Services.
+ [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/)
+
+ After every interval the plugin writes a registry to the storageaccount to keep track of how many bytes per blob (file) have been read and processed. After all files are processed and at least one interval has passed, a new file list is generated and a worklist is constructed for processing. When a file has been processed before, only the part from the stored offset up to the filesize at the time of the file listing is read. If the codec is JSON, partial files get the header and tail added; both can be configured. If logtype is nsgflowlog, the plugin splits the events into individual tuple events. The logtype wadiis may in the future be used to process the grok formats to split the file into log lines. Any other format is fed into the queue as one event per file or partial file; it is then up to the filter to split and mutate the file format, using source => message in the filter {} block.
 
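As a minimal illustration of that last remark, a filter block that parses each file-sized event from its message field as JSON could look like this (a sketch, not taken from this README; it assumes the blob content is JSON):

```
filter {
  json {
    source => "message"
  }
}
```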
  ## Installation
  This plugin can be installed through logstash-plugin
@@ -49,6 +52,9 @@ curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/j
 
 
  ## Configuration Examples
+ The minimum configuration required for the input is storageaccount, access_key and container.
+
+ For nsgflowlogs, a simple configuration looks like this:
  ```
  input {
  azure_blob_storage {
@@ -79,7 +85,7 @@ output {
  }
  ```
 
- You can include additional options to tweak the operations
+ It's possible to specify optional parameters to override the defaults. The iplookup, use_redis and iplist parameters add information about the source and destination IP addresses. Redis can be used for caching the lookup results, and iplist configures an array of IP addresses that don't need a lookup.
  ```
  input {
  azure_blob_storage {
@@ -90,7 +96,7 @@ input {
  logtype => "nsgflowlog"
  prefix => "resourceId=/"
  registry_create_policy => "resume"
- interval => 60
+ interval => 300
  iplookup => "http://10.0.0.5:6081/ripe.php?ip="
  use_redis => true
  iplist => [
@@ -100,3 +106,47 @@ input {
  }
  }
  ```
+
+ For WAD IIS and App Services, the HTTP access logs can be retrieved from a storage account as line-based events and parsed through GROK. The date stamp can also be parsed with %{TIMESTAMP_ISO8601:log_timestamp}. For WAD IIS logfiles the container is wad-iis-logfiles. In the future, grokking may already happen inside the plugin.
+ ```
+ input {
+ azure_blob_storage {
+ storageaccount => "yourstorageaccountname"
+ access_key => "Ba5e64c0d3=="
+ container => "access-logs"
+ interval => 300
+ codec => line
+ }
+ }
+
+ filter {
+ if [message] =~ "^#" {
+ drop {}
+ }
+
+ mutate {
+ strip => "message"
+ }
+
+ grok {
+ match => ['message', '(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}\d+) %{NOTSPACE:instanceId} %{WORD:httpMethod} %{URIPATH:requestUri} %{NOTSPACE:requestQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:host} %{NUMBER:httpStatus} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:sentBytes:int} %{NUMBER:receivedBytes:int} %{NUMBER:timeTaken:int}']
+ }
+
+ date {
+ match => [ "timestamp", "YYYY-MM-dd HH:mm:ss" ]
+ target => "@timestamp"
+ }
+
+ mutate {
+ remove_field => ["log_timestamp"]
+ remove_field => ["message"]
+ remove_field => ["win32response"]
+ remove_field => ["subresponse"]
+ remove_field => ["username"]
+ remove_field => ["clientPort"]
+ remove_field => ["port"]
+ remove_field => ["timestamp"]
+ }
+ }
+ ```
+
@@ -7,10 +7,10 @@ require 'azure/storage/blob'
  #require 'date'
  #require 'json'
  #require 'thread'
- #require "redis"
+ #require 'redis'
  #require 'net/http'
 
- # This is a logstash input plugin for files in Azure Blob Storage. There is a storage explorer in the portal and an application with the same name https://storageexplorer.com. A storage account has by default a globally unique name, {storageaccount}.blob.core.windows.net which is a CNAME to Azures blob servers blob.*.store.core.windows.net. A storageaccount has an container and those have a directory and blobs (like files) and blobs are constructed of or more blocks. Some Azure diagnostics can send events to an EventHub that can be parse through the plugin logstash-input-azure_event_hubs, but for the events that are only stored in an storage account, use this plugin. The original logstash-input-azureblob from azure-diagnostics-tools is great for low volumes, but it suffers from outdated client, slow reads, lease locking issues and json parse errors.
+ # This is a logstash input plugin for files in Azure Blob Storage. There is a storage explorer in the portal and an application with the same name https://storageexplorer.com. A storage account has by default a globally unique name, {storageaccount}.blob.core.windows.net which is a CNAME to Azure's blob servers blob.*.store.core.windows.net. A storageaccount has a container, containers hold directories and blobs (like files), and blobs consist of one or more blocks. After writing the blocks, they can be committed. Some Azure diagnostics can send events to an EventHub that can be parsed through the plugin logstash-input-azure_event_hubs, but for the events that are only stored in a storage account, use this plugin. The original logstash-input-azureblob from azure-diagnostics-tools is great for low volumes, but it suffers from an outdated client, slow reads, lease locking issues and json parse errors.
  # https://azure.microsoft.com/en-us/services/storage/blobs/
  class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
  config_name "azure_blob_storage"
@@ -23,16 +23,19 @@ class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
 
  # The storage account is accessed through Azure::Storage::Blob::BlobService, it needs either a sas_token, connection string or a storageaccount/access_key pair.
  # https://github.com/Azure/azure-storage-ruby/blob/master/blob/lib/azure/storage/blob/blob_service.rb#L42
- config :connection_string, :validate => :password
+ config :connection_string, :validate => :password, :required => false
 
  # The storage account name for the azure storage account.
- config :storageaccount, :validate => :string
+ config :storageaccount, :validate => :string, :required => false
+
+ # DNS suffix other than blob.core.windows.net
+ config :dns_suffix, :validate => :string, :required => false, :default => 'core.windows.net'
 
  # The (primary or secondary) Access Key for the storage account. The key can be found in the portal.azure.com or through the azure api StorageAccounts/ListKeys. For example the PowerShell command Get-AzStorageAccountKey.
- config :access_key, :validate => :password
+ config :access_key, :validate => :password, :required => false
 
  # SAS is the Shared Access Signature, that provides restricted access rights. If the sas_token is absent, the access_key is used instead.
- config :sas_token, :validate => :password
+ config :sas_token, :validate => :password, :required => false
 
  # The container of the blobs.
  config :container, :validate => :string, :default => 'insights-logs-networksecuritygroupflowevent'
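To illustrate the new dns_suffix option (a hedged sketch; the account name, key and suffix are placeholder values), an input for a sovereign cloud such as Azure China could be configured like this:

```
input {
  azure_blob_storage {
    storageaccount => "yourstorageaccountname"
    access_key => "Ba5e64c0d3=="
    container => "insights-logs-networksecuritygroupflowevent"
    # endpoints outside the public cloud use a different suffix than core.windows.net
    dns_suffix => "core.chinacloudapi.cn"
  }
}
```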
@@ -91,21 +94,22 @@ class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
 
  # Optional, to enrich NSGFLOWLOGS with netname and subnet; the iplookup value points to a webservice that provides the information in JSON format like this.
  # {"ip":"8.8.8.8","netname":"Google","subnet":"8.8.8.0\/24","hostname":"google-public-dns-a.google.com"}
- config :iplookup, :validate => :string, :required => false, :default => 'http://127.0.0.1/ripe.php?ip='
-
- # Optional Redis IP cache
- config :use_redis, :validate => :boolean, :required => false, :default => false
-
+ # In the query parameter, the <ip> tag will be replaced by the IP address to look up; other parameters are optional and depend on your lookup service.
+ config :iplookup, :validate => :string, :required => false, :default => 'http://127.0.0.1/ripe.php?ip=<ip>&TOKEN=token'
 
  # Optional array of JSON objects that don't require a lookup
  config :iplist, :validate => :array, :required => false, :default => ['{"ip":"10.0.0.4","netname":"Application Gateway","subnet":"10.0.0.0\/24","hostname":"appgw"}']
 
+ # Optional Redis IP cache
+ config :use_redis, :validate => :boolean, :required => false, :default => false
+
 
 
  public
  def register
  @pipe_id = Thread.current[:name].split("[").last.split("]").first
  @logger.info("=== "+config_name+"/"+@pipe_id+"/"+@id[0,6]+" ===")
+ @logger.info("Contact me at jan@janmg.com, if something in this plugin doesn't work")
  # TODO: consider multiple readers, so add pipeline @id or use logstash-to-logstash communication?
  # TODO: Implement retry ... Error: Connection refused - Failed to open TCP connection to
 
@@ -118,11 +122,12 @@ def register
  # 2. connection_string
  # 3. storageaccount / access_key
 
- conn = connection_string
+ unless connection_string.nil?
+ conn = connection_string.value
+ end
  unless sas_token.nil?
- # TODO: Fix SAS Tokens
  unless sas_token.value.start_with?('?')
- conn = "BlobEndpoint=https://#{storageaccount}.blob.core.windows.net;SharedAccessSignature=#{sas_token.value}"
+ conn = "BlobEndpoint=https://#{storageaccount}.#{dns_suffix};SharedAccessSignature=#{sas_token.value}"
  else
  conn = sas_token.value
  end
@@ -132,6 +137,7 @@ def register
  else
  @blob_client = Azure::Storage::Blob::BlobService.create(
  storage_account_name: storageaccount,
+ storage_dns_suffix: dns_suffix,
  storage_access_key: access_key.value,
  )
  end
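For completeness, authenticating with a SAS token instead of an access_key could be configured as follows (a sketch; the token and account name are placeholders). Because the token here does not start with '?', the plugin builds the BlobEndpoint from storageaccount and dns_suffix as shown above:

```
input {
  azure_blob_storage {
    storageaccount => "yourstorageaccountname"
    sas_token => "sv=2018-03-28&ss=b&srt=sco&sp=rl&sig=..."
    container => "insights-logs-networksecuritygroupflowevent"
  }
}
```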
@@ -200,7 +206,7 @@ def run(queue)
 
  # Worklist is the subset of files where the already read offset is smaller than the file size
  worklist = filelist.select {|name,file| file[:offset] < file[:length]}
- @logger.info(@pipe_id+" worklist contains #{worklist.size} blobs to process")
+ @logger.debug(@pipe_id+" worklist contains #{worklist.size} blobs to process")
  # This would be ideal for threading since it's IO intensive, would be nice with a ruby native ThreadPool
  worklist.each do |name, file|
  res = resource(name)
@@ -255,6 +261,8 @@ def stop
  end
 
 
+
+ private
  def full_read(filename)
  return @blob_client.get_blob(container, filename)[1]
  end
@@ -371,6 +379,7 @@ end
  def learn_encapsulation
  # From one file, read first block and last block to learn head and tail
  blob = @blob_client.list_blobs(container, { maxresults: 1, prefix: @prefix }).first
+ return if blob.nil?
  blocks = @blob_client.list_blob_blocks(container, blob.name)[:committed]
  @logger.info(@pipe_id+" using #{blob.name} to learn the json header and tail")
  @head = @blob_client.get_blob(container, blob.name, start_range: 0, end_range: blocks.first.size-1)[1]
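For context on why the head and tail are learned here: a partial read of a blob can later be wrapped in them so the JSON codec still sees a complete document. A conceptual sketch with hypothetical variable names, not code from this file:

```ruby
# hypothetical reassembly of a partial blob read into parseable JSON
partial = @blob_client.get_blob(container, name, start_range: offset, end_range: length - 1)[1]
json_payload = @head + partial + @tail
```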
@@ -397,6 +406,7 @@ end
 
 
  # Optional lookup for netname and hostname for the srcip and dstip returned in a Hash
+ # TODO: split out to own class
  def addip(srcip, dstip)
  #TODO: return anonymous merge
  srcjson = JSON.parse(lookup(srcip))
@@ -409,8 +419,10 @@ def lookup(ip)
  unless @red.nil?
  res = @red.get(ip)
  end
+ uri = URI.parse(iplookup.sub('<ip>',ip))
+ res = Net::HTTP.get(uri)
  if res.nil?
- res = Net::HTTP.get(URI(iplookup + ip))
+ res = Net::HTTP.get(uri)
  unless @red.nil?
  @red.set(ip, res)
  @red.expire(ip,604800)
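Standalone, the substitution and lookup in this method behave roughly like the following sketch (the URL is the placeholder default and the response fields follow the JSON example given with the iplookup option):

```ruby
require 'json'
require 'net/http'

iplookup = 'http://127.0.0.1/ripe.php?ip=<ip>&TOKEN=token'
ip = '8.8.8.8'

# the <ip> tag in the template is replaced by the address to look up
uri = URI.parse(iplookup.sub('<ip>', ip))
info = JSON.parse(Net::HTTP.get(uri))
# e.g. {"ip"=>"8.8.8.8", "netname"=>"Google", "subnet"=>"8.8.8.0/24", "hostname"=>"google-public-dns-a.google.com"}
```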
@@ -1,9 +1,20 @@
  Gem::Specification.new do |s|
  s.name = 'logstash-input-azure_blob_storage'
- s.version = '0.10.0'
+ s.version = '0.10.1'
  s.licenses = ['Apache-2.0']
  s.summary = 'This logstash plugin reads and parses data from Azure Storage Blobs.'
- s.description = 'This gem is a Logstash plugin. It reads and parses data from Azure Storage Blobs. The azure_blob_storage is a rewrite to replace azureblob from azure-diagnostics-tools/Logstash. It can deal with larger volumes and partial file reads and eliminating a delay when rebuilding the registry'
+ s.description = <<-EOF
+ This gem is a Logstash plugin. It reads and parses data from Azure Storage Blobs. The azure_blob_storage is a reimplementation to replace azureblob from azure-diagnostics-tools/Logstash. It can deal with larger volumes and partial file reads, and eliminates the delay when rebuilding the registry.
+
+ The logstash pipeline configuration would look like this
+ input {
+ azure_blob_storage {
+ storageaccount => "yourstorageaccountname"
+ access_key => "Ba5e64c0d3=="
+ container => "insights-logs-networksecuritygroupflowevent"
+ }
+ }
+ EOF
  s.homepage = 'https://github.com/janmg/logstash-input-azure_blob_storage'
  s.authors = ['Jan Geertsma']
  s.email = 'jan@janmg.com'
@@ -8,4 +8,11 @@ describe LogStash::Inputs::AzureBlobStorage do
  let(:config) { { "interval" => 100 } }
  end
 
+ def test_helper_methods
+ assert_equal('b', AzureBlobStorage.val('a=b'))
+ assert_equal('whatever', AzureBlobStorage.strip_comma(',whatever'))
+ assert_equal('whatever', AzureBlobStorage.strip_comma('whatever,'))
+ assert_equal('whatever', AzureBlobStorage.strip_comma(',whatever,'))
+ assert_equal('whatever', AzureBlobStorage.strip_comma('whatever'))
+ end
  end
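The val and strip_comma helpers exercised above are not shown in this diff; hypothetical implementations consistent with these assertions could look like:

```ruby
# hypothetical helpers matching the assertions above
def self.val(str)
  # return the value part of a 'key=value' pair
  str.split('=')[1]
end

def self.strip_comma(str)
  # drop a leading and/or trailing comma
  str.sub(/^,/, '').sub(/,$/, '')
end
```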
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: logstash-input-azure_blob_storage
  version: !ruby/object:Gem::Version
- version: 0.10.0
+ version: 0.10.1
  platform: ruby
  authors:
  - Jan Geertsma
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2019-02-27 00:00:00.000000000 Z
+ date: 2019-03-22 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  requirement: !ruby/object:Gem::Requirement
@@ -86,10 +86,17 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: 1.0.0
- description: This gem is a Logstash plugin. It reads and parses data from Azure Storage
- Blobs. The azure_blob_storage is a rewrite to replace azureblob from azure-diagnostics-tools/Logstash.
- It can deal with larger volumes and partial file reads and eliminating a delay when
- rebuilding the registry
+ description: |2
+ This gem is a Logstash plugin. It reads and parses data from Azure Storage Blobs. The azure_blob_storage is a reimplementation to replace azureblob from azure-diagnostics-tools/Logstash. It can deal with larger volumes and partial file reads, and eliminates the delay when rebuilding the registry.
+
+ The logstash pipeline configuration would look like this
+ input {
+ azure_blob_storage {
+ storageaccount => "yourstorageaccountname"
+ access_key => "Ba5e64c0d3=="
+ container => "insights-logs-networksecuritygroupflowevent"
+ }
+ }
  email: jan@janmg.com
  executables: []
  extensions: []