logstash-input-azure_blob_storage 0.10.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 47775d226b17cd57d8ce290569e35cf893713f42757bd953fd46b93055733527
4
+ data.tar.gz: de73405430b71405ecc0a4873a72feefabdb7dd7f410734dd550928623a39c53
5
+ SHA512:
6
+ metadata.gz: a028a0df1310312d9a1826de016407693ca14002c316d08267de3be51abc94aaa9db997e3fb19677cd23db551d029d7e1acffc65068eb0b2ca4a2a6d408dbbe7
7
+ data.tar.gz: b54aa9e59046793f26bfaa5fcd2795c809f2fdd535449f00a8ab6b7392107f8c9a2d3f937a3d24d1c0bcb2890d60493baeb360b29c1188666781ecc9e4e7b014
@@ -0,0 +1,2 @@
1
+ ## 0.1.0
2
+ - Plugin created with the logstash plugin generator
@@ -0,0 +1,10 @@
1
+ The following is a list of people who have contributed ideas, code, bug
2
+ reports, or in general have helped logstash along its way.
3
+
4
+ Contributors:
5
+ * Jan Geertsma - jan@janmg.com
6
+
7
+ Note: If you've sent us patches, bug reports, or otherwise contributed to
8
+ Logstash, and you aren't on the list above and want to be, please let us know
9
+ and we'll make sure you're here. Contributions from folks like you are what make
10
+ open source awesome.
@@ -0,0 +1,2 @@
1
+ # logstash-input-azure_blob_storage
2
+ Example input plugin. This should help bootstrap your effort to write your own input plugin!
data/Gemfile ADDED
@@ -0,0 +1,3 @@
1
+ source 'https://rubygems.org'
2
+ gemspec
3
+
data/LICENSE ADDED
@@ -0,0 +1,11 @@
1
+ Licensed under the Apache License, Version 2.0 (the "License");
2
+ you may not use this file except in compliance with the License.
3
+ You may obtain a copy of the License at
4
+
5
+ http://www.apache.org/licenses/LICENSE-2.0
6
+
7
+ Unless required by applicable law or agreed to in writing, software
8
+ distributed under the License is distributed on an "AS IS" BASIS,
9
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
10
+ See the License for the specific language governing permissions and
11
+ limitations under the License.
@@ -0,0 +1,102 @@
1
+ # Logstash Plugin
2
+
3
+ This is a plugin for [Logstash](https://github.com/elastic/logstash).
4
+
5
+ It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way.
6
+
7
+ ## Documentation
8
+
9
+ All plugin documentation is placed under one [central location](http://www.elastic.co/guide/en/logstash/current/).
10
+
11
+ ## Need Help?
12
+
13
+ Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum. For real problems or feature requests, raise a github issue. Pull requests will only be merged after discussion through an issue.
14
+
15
+ ## Purpose
16
+ This plugin can read from Azure Storage Blobs. After every interval it writes a registry to the storage account, recording how many bytes of each blob (file) have been read and processed. After all files are processed and at least one interval has passed, a new file list is generated and a worklist is constructed and processed (see the sketch below). When a file has already been processed before, the partial file is read from the stored offset up to the file size at the time of the file listing. If the codec is JSON, the configured header and tail are added back to partial files so they parse as valid JSON. If logtype is nsgflowlog, the plugin splits the log into individual tuple events. The logtype wadiis may in the future be used to apply grok patterns to split files into log lines. Any other format is fed into the queue as one event per file or partial file; it is then up to the filter to split and mutate the file format, using source => "message" in the filter {} block.
17
+
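+ The worklist is simply the subset of listed blobs whose stored offset is smaller than their current size. A minimal Ruby sketch of that selection, with made-up blob names, mirroring the select used in the plugin source:
+ ```
+ # offsets come from the registry, lengths from the latest blob listing
+ filelist = {
+   "blob-already-done.json" => { :offset => 100, :length => 100 },
+   "blob-appended.json"     => { :offset => 100, :length => 180 },
+   "blob-new.json"          => { :offset => 0,   :length => 50 }
+ }
+ # only blobs with unread bytes end up in the worklist
+ worklist = filelist.select { |name, file| file[:offset] < file[:length] }
+ # => "blob-appended.json" is read from byte 100, "blob-new.json" is read in full
+ ```
+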
18
+ ## Installation
19
+ This plugin can be installed through logstash-plugin
20
+ ```
21
+ logstash-plugin install logstash-input-azure_blob_storage
22
+ ```
23
+
24
+ ## Enabling NSG Flowlogs
25
+ 1. Enable Network Watcher in your regions
26
+ 2. Create Storage account per region
27
+ v1 or v2 are both fine
28
+ Any resource group works fine, NetworkWatcherRG would be the best
29
+ 3. Enable in Network Watcher for every NSG the NSG Flow logs
30
+ the list_blobs call has a limit of 5000 files; with one file per hour per NSG, make sure the retention time is set so that all files can be seen. For 180 NSGs with 1 day retention that is 4320 files; more retention leads to delays in processing. So either use multiple storage accounts with multiple pipelines, or use the same storage account with a prefix to separate them.
31
+ 4. In the storage account there will be a container with a resourceId path like:
32
+ {storageaccount}.blob.core.windows.net/insights-logs-networksecuritygroupflowevent/resourceId=/SUBSCRIPTIONS/{UUID}/RESOURCEGROUPS/{RG}/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/{NSG}/y=2019/m=02/d=12/h=07/m=00/macAddress={MAC}/PT1H.json
33
+ 5. Get credentials of the storageaccount
34
+ - SAS token (shared access signature) starts with a '?'
35
+ - connection string ... one string with all the connection details
36
+ - Access key (key1 or key2)
37
+
38
+ ## Troubleshooting
39
+
40
+ The default loglevel can be changed in the global logstash.yml through the `log.level` setting. On the info level, the plugin saves offsets to the registry every interval and logs statistics of processed events; for each pipeline it prints the first 6 characters of the plugin ID. On the debug level it also shows the number of events per (partial) file that is read.
41
+ ```
42
+ log.level
43
+ ```
44
+ The log level of the plugin itself can be set to DEBUG through the logging API:
45
+
46
+ ```
47
+ curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/json' -d'{"logger.logstash.inputs.azureblobstorage" : "DEBUG"}'
48
+ ```
49
+
50
+
51
+ ## Configuration Examples
52
+ ```
53
+ input {
54
+ azure_blob_storage {
55
+ storageaccount => "yourstorageaccountname"
56
+ access_key => "Ba5e64c0d3=="
57
+ container => "insights-logs-networksecuritygroupflowevent"
58
+ }
59
+ }
60
+
61
+ filter {
62
+ json {
63
+ source => "message"
64
+ }
65
+ mutate {
66
+ add_field => { "environment" => "test-env" }
67
+ remove_field => [ "message" ]
68
+ }
69
+ date {
70
+ match => ["unixtimestamp", "UNIX"]
71
+ }
72
+ }
73
+
74
+ output {
75
+ elasticsearch {
76
+ hosts => "elasticsearch"
77
+ index => "nsg-flow-logs-%{+xxxx.ww}"
78
+ }
79
+ }
80
+ ```
81
+
82
+ You can include additional options to tweak the operations
83
+ ```
84
+ input {
85
+ azure_blob_storage {
86
+ storageaccount => "yourstorageaccountname"
87
+ access_key => "Ba5e64c0d3=="
88
+ container => "insights-logs-networksecuritygroupflowevent"
89
+ codec => "json"
90
+ logtype => "nsgflowlog"
91
+ prefix => "resourceId=/"
92
+ registry_create_policy => "resume"
93
+ interval => 60
94
+ iplookup => "http://10.0.0.5:6081/ripe.php?ip="
95
+ use_redis => true
96
+ iplist => [
97
+ "{'ip':'10.0.0.4','netname':'Application Gateway','subnet':'10.0.0.0\/24','hostname':'appgw'}",
98
+ "{'ip':'36.156.24.96',netname':'China Mobile','subnet':'36.156.0.0\/16','hostname':'bigbadwolf'}"
99
+ ]
100
+ }
101
+ }
102
+ ```
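+
+ To check the storage account credentials, container and prefix outside of Logstash, a quick sketch with the same azure-storage-blob gem the plugin uses (account name, key and container below are placeholders) could be:
+ ```
+ require 'azure/storage/blob'
+
+ client = Azure::Storage::Blob::BlobService.create(
+     storage_account_name: 'yourstorageaccountname',
+     storage_access_key: 'Ba5e64c0d3==')
+ blobs = client.list_blobs('insights-logs-networksecuritygroupflowevent', prefix: 'resourceId=/')
+ blobs.each { |blob| puts "#{blob.name} #{blob.properties[:content_length]}" }
+ ```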
@@ -0,0 +1,422 @@
1
+ # encoding: utf-8
2
+ require "logstash/inputs/base"
3
+ require "stud/interval"
4
+ require 'azure/storage/blob'
5
+ #require 'securerandom'
6
+ #require 'rbconfig'
7
+ #require 'date'
8
+ #require 'json'
9
+ #require 'thread'
10
+ #require "redis"
11
+ #require 'net/http'
12
+
13
+ # This is a logstash input plugin for files in Azure Blob Storage. There is a storage explorer in the portal and an application with the same name https://storageexplorer.com. A storage account has by default a globally unique name, {storageaccount}.blob.core.windows.net, which is a CNAME to Azure's blob servers blob.*.store.core.windows.net. A storage account has containers, which hold blobs (like files), and blobs are constructed of one or more blocks. Some Azure diagnostics can send events to an Event Hub that can be parsed through the plugin logstash-input-azure_event_hubs, but for events that are only stored in a storage account, use this plugin. The original logstash-input-azureblob from azure-diagnostics-tools is great for low volumes, but it suffers from an outdated client, slow reads, lease locking issues and json parse errors.
14
+ # https://azure.microsoft.com/en-us/services/storage/blobs/
15
+ class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
16
+ config_name "azure_blob_storage"
17
+
18
+ # If undefined, Logstash will complain, even if codec is unused. The codec for nsgflowlog has to be JSON and for WADIIS and APPSERVICE it has to be plain.
19
+ default :codec, "json"
20
+
21
+ # logtype can be nsgflowlog, wadiis, appservice or raw. The default is raw, where files are read and added as one event. If the file grows, the next interval the file is read from the offset, so that the delta is sent as another event. In raw mode, further processing has to be done in the filter block. If the logtype is specified, this plugin will split and mutate and add individual events to the queue.
22
+ config :logtype, :validate => ['nsgflowlog','wadiis','appservice','raw'], :default => 'raw'
23
+
24
+ # The storage account is accessed through Azure::Storage::Blob::BlobService, it needs either a sas_token, connection string or a storageaccount/access_key pair.
25
+ # https://github.com/Azure/azure-storage-ruby/blob/master/blob/lib/azure/storage/blob/blob_service.rb#L42
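+ # For example (hypothetical account name and key): DefaultEndpointsProtocol=https;AccountName=examplestorage;AccountKey=Ba5e64c0d3==;EndpointSuffix=core.windows.net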
26
+ config :connection_string, :validate => :password
27
+
28
+ # The storage account name for the azure storage account.
29
+ config :storageaccount, :validate => :string
30
+
31
+ # The (primary or secondary) Access Key for the storage account. The key can be found in portal.azure.com or through the azure api StorageAccounts/ListKeys. For example the PowerShell command Get-AzStorageAccountKey.
32
+ config :access_key, :validate => :password
33
+
34
+ # SAS is the Shared Access Signature, that provides restricted access rights. If the sas_token is absent, the access_key is used instead.
35
+ config :sas_token, :validate => :password
36
+
37
+ # The container of the blobs.
38
+ config :container, :validate => :string, :default => 'insights-logs-networksecuritygroupflowevent'
39
+
40
+ # The registry file keeps track of the files that have been processed and until which offset in bytes. It is similar in function to the sincedb of the logstash file input.
+ #
+ # The default, `data/registry.dat`, contains a Ruby Marshal serialized Hash with, per filename, the offset read so far and the file length at the time of the last file listing.
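+ # An entry in the deserialized registry hash looks like this (hypothetical blob name and values):
+ #   "resourceId=/SUBSCRIPTIONS/{UUID}/.../PT1H.json" => { :offset => 2588, :length => 4902 }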
43
+ config :registry_path, :validate => :string, :required => false, :default => 'data/registry.dat'
44
+
45
+ # The default, `resume`, will load the registry offsets and will start processing files from the offsets.
46
+ # When set to `start_over`, all log files are processed from begining.
47
+ # when set to `start_fresh`, it will read log files that are created or appended since this start of the pipeline.
48
+ config :registry_create_policy, :validate => ['resume','start_over','start_fresh'], :required => false, :default => 'resume'
49
+
50
+ # The registry keeps track of the files that were already processed. The interval is used to save the registry regularly when new events have been processed. It is also used to wait before listing the files again and subtracting the already processed files to determine the worklist.
51
+ #
52
+ # waiting time in seconds until processing the next batch. NSGFLOWLOGS append a block per minute, so use multiples of 60 seconds, 300 for 5 minutes, 600 for 10 minutes. The registry is also saved after every interval.
53
+ # Partial reading starts from the offset and reads until the end, so the starting tag is prepended
54
+ #
55
+ # A00000000000000000000000000000000 12 {"records":[
56
+ # D672f4bbd95a04209b00dc05d899e3cce 2576 json objects for 1st minute
57
+ # D7fe0d4f275a84c32982795b0e5c7d3a1 2312 json objects for 2nd minute
58
+ # Z00000000000000000000000000000000 2 ]}
59
+ config :interval, :validate => :number, :default => 60
60
+
61
+ # WAD IIS Grok Pattern
62
+ #config :grokpattern, :validate => :string, :required => false, :default => '%{TIMESTAMP_ISO8601:log_timestamp} %{NOTSPACE:instanceId} %{NOTSPACE:instanceId2} %{IPORHOST:ServerIP} %{WORD:httpMethod} %{URIPATH:requestUri} %{NOTSPACE:requestQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:httpVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:host} %{NUMBER:httpStatus} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:sentBytes:int} %{NUMBER:receivedBytes:int} %{NUMBER:timeTaken:int}'
63
+
64
+ # The string that starts the JSON. Only needed when the codec is JSON. When partial files are read, the result will not be valid JSON unless the head and tail are put back. The file_head and file_tail are learned at startup by reading the first file in the blob list and taking its first and last block; this works for blobs that are appended like nsgflowlogs. Setting this option overrides the learning. If learning fails and the option is not set, the default is the 'records' wrapper used by nsgflowlogs.
65
+ config :file_head, :validate => :string, :required => false, :default => '{"records":['
66
+ # The string that ends the JSON
67
+ config :file_tail, :validate => :string, :required => false, :default => ']}'
68
+
69
+ # The path(s) to the file(s) to use as an input. By default it will
70
+ # watch every file in the storage container.
71
+ # You can use filename patterns here, such as `logs/*.log`.
72
+ # If you use a pattern like `logs/**/*.log`, a recursive search
73
+ # of `logs` will be done for all `*.log` files.
74
+ # Do not include a leading `/`, as Azure paths look like this:
75
+ # `path/to/blob/file.txt`
76
+ #
77
+ # You may also configure multiple paths. See an example
78
+ # on the <<array,Logstash configuration page>>.
79
+ # For NSGFLOWLOGS a path starts with "resourceId=/", but this would only be needed to exclude other files that may be written in the same container.
80
+ config :prefix, :validate => :string, :required => false
81
91
+
92
+ # Optionally enrich NSGFLOWLOGS with netname and subnet. The iplookup value points to a webservice that provides the information in JSON format like this:
93
+ # {"ip":"8.8.8.8","netname":"Google","subnet":"8.8.8.0\/24","hostname":"google-public-dns-a.google.com"}
94
+ config :iplookup, :validate => :string, :required => false, :default => 'http://127.0.0.1/ripe.php?ip='
95
+
96
+ # Optional Redis IP cache
97
+ config :use_redis, :validate => :boolean, :required => false, :default => false
98
+
99
+
100
+ # Optional array of JSON objects that don't require a lookup
101
+ config :iplist, :validate => :array, :required => false, :default => ['{"ip":"10.0.0.4","netname":"Application Gateway","subnet":"10.0.0.0\/24","hostname":"appgw"}']
102
+
103
+
104
+
105
+ public
106
+ def register
107
+ @pipe_id = Thread.current[:name].split("[").last.split("]").first
108
+ @logger.info("=== "+config_name+"/"+@pipe_id+"/"+@id[0,6]+" ===")
109
+ # TODO: consider multiple readers, so add pipeline @id or use logstash-to-logstash communication?
110
+ # TODO: Implement retry ... Error: Connection refused - Failed to open TCP connection to
111
+
112
+ # counter for all processed events since the start of this pipeline
113
+ @processed = 0
114
+ @regsaved = @processed
115
+
116
+ # Try in this order to access the storageaccount
117
+ # 1. storageaccount / sas_token
118
+ # 2. connection_string
119
+ # 3. storageaccount / access_key
120
+
121
+ conn = connection_string
122
+ unless sas_token.nil?
123
+ # TODO: Fix SAS Tokens
124
+ unless sas_token.value.start_with?('?')
125
+ conn = "BlobEndpoint=https://#{storageaccount}.blob.core.windows.net;SharedAccessSignature=#{sas_token.value}"
126
+ else
127
+ conn = sas_token.value
128
+ end
129
+ end
130
+ unless conn.nil?
131
+ @blob_client = Azure::Storage::Blob::BlobService.create_from_connection_string(conn)
132
+ else
133
+ @blob_client = Azure::Storage::Blob::BlobService.create(
134
+ storage_account_name: storageaccount,
135
+ storage_access_key: access_key.value,
136
+ )
137
+ end
138
+
139
+ # redis is optional to cache ip's from the optional iplookup
140
+ # iplookups are optional and so is the dependency for caching through redis
141
+ if use_redis && !iplookup.nil?
142
+ begin
143
+ require 'redis'
144
+ rescue LoadError
145
+ require 'rubygems/dependency_installer'
146
+ installer = Gem::DependencyInstaller.new
147
+ installer.install 'redis'
148
+ Gem.refresh
149
+ Gem::Specification.find_by_name('redis').activate
150
+ require 'redis'
151
+ ensure
152
+ @red = Redis.new
153
+ end
154
+ end
155
+
156
+ @registry = Hash.new
157
+ unless registry_create_policy == "start_over"
158
+ begin
159
+ @registry = Marshal.load(@blob_client.get_blob(container, registry_path)[1])
160
+ #[0] headers [1] responsebody
161
+ rescue
162
+ @registry.clear
163
+ end
164
+ end
165
+ # read filelist and set offsets to file length to mark all the old files as done
166
+ if registry_create_policy == "start_fresh"
167
+ @registry.each do |name, file|
168
+ @registry.store(name, { :offset => file[:length], :length => file[:length] })
169
+ end
170
+ end
171
+
172
+ @is_json = (defined?(LogStash::Codecs::JSON) == 'constant') && (@codec.is_a? LogStash::Codecs::JSON)
173
+ @head = ''
174
+ @tail = ''
175
+ # if codec=json, sniff one file's blocks A and Z to learn file_head and file_tail
176
+ if @is_json
177
+ learn_encapsulation
178
+ if file_head
179
+ @head = file_head
180
+ end
181
+ if file_tail
182
+ @tail = file_tail
183
+ end
184
+ end
185
+ end # def register
186
+
187
+ def run(queue)
188
+ filelist = Hash.new
189
+
190
+ # we can abort the loop if stop? becomes true
191
+ while !stop?
192
+ chrono = Time.now.to_i
193
+ # load the registry, compare its offsets to the file list, set offset to 0 for new files, process the whole list and if finished within the interval wait for the next loop,
194
+ # TODO: sort by timestamp
195
+ #filelist.sort_by(|k,v|resource(k)[:date])
196
+
197
+ filelist = list_blobs()
198
+ save_registry(filelist)
199
+ @registry = filelist
200
+
201
+ # Worklist is the subset of files where the already read offset is smaller than the file size
202
+ worklist = filelist.select {|name,file| file[:offset] < file[:length]}
203
+ @logger.info(@pipe_id+" worklist contains #{worklist.size} blobs to process")
204
+ # This would be ideal for threading since it's IO intensive, would be nice with a ruby native ThreadPool
205
+ worklist.each do |name, file|
206
+ res = resource(name)
207
+ if file[:offset] == 0
208
+ chunk = full_read(name)
209
+ # this may read more than originally listed
210
+ file[:length]=chunk.size
211
+ else
212
+ chunk = partial_read_json(name, file[:offset], file[:length])
213
+ @logger.debug(@pipe_id+" partial file #{res[:nsg]} [#{res[:date]}]")
214
+ end
215
+ if logtype == "nsgflowlog" && @is_json
216
+ begin
217
+ @processed += nsgflowlog(queue, JSON.parse(chunk))
218
+ rescue JSON::ParserError
219
+ @logger.error(@pipe_id+" parse error on #{res[:nsg]} [#{res[:date]}] offset: #{file[:offset]} length: #{file[:length]}")
220
+ end
221
+ # TODO Convert this to line based grokking.
222
+ elsif logtype == "wadiis" && !@is_json
223
+ @processed += wadiislog(queue, name)
224
+ else
225
+ @codec.decode(chunk) do |event|
226
+ decorate(event)
227
+ queue << event
228
+ end
229
+ @processed += 1
230
+ end
231
+ @logger.debug(@pipe_id+" Processed #{res[:nsg]} [#{res[:date]}] #{@processed} events")
232
+ @registry.store(name, { :offset => file[:length], :length => file[:length] })
233
+ # if stop? good moment to stop what we're doing
234
+ if stop?
235
+ return
236
+ end
237
+ # save the registry regularly
238
+ now = Time.now.to_i
239
+ if ((now - chrono) > interval)
240
+ save_registry(@registry)
241
+ chrono = now
242
+ end
243
+ end
244
+ # Save the registry and sleep until the remaining polling interval is over
245
+ save_registry(@registry)
246
+ sleeptime = interval - (Time.now.to_i - chrono)
247
+ Stud.stoppable_sleep(sleeptime) { stop? }
248
+ end
249
+
250
+ # event = LogStash::Event.new("message" => @message, "host" => @host)
251
+ end # def run
252
+
253
+ def stop
254
+ save_registry(@registry)
255
+ end
256
+
257
+
258
+ def full_read(filename)
259
+ return @blob_client.get_blob(container, filename)[1]
260
+ end
261
+
262
+ def partial_read_json(filename, offset, length)
263
+ content = @blob_client.get_blob(container, filename, start_range: offset-@tail.length, end_range: length-1)[1]
264
+ if content.end_with?(@tail)
265
+ # the tail is part of the last block, so included in the total length of the get_blob
266
+ return @head + strip_comma(content)
267
+ else
268
+ # when the file has grown between list_blobs and the time of partial reading, the tail will be wrong
269
+ return @head + strip_comma(content[0...-@tail.length]) + @tail
270
+ end
271
+ end
272
+
273
+ def strip_comma(str)
274
+ # when skipping over the first blocks the json will start with a comma that needs to be stripped. there should not be a trailing comma, but it gets stripped too
275
+ if str.start_with?(',')
276
+ str[0] = ''
277
+ end
278
+ str.nil? ? nil : str.chomp(",")
279
+ end
280
+
281
+
282
+
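+ # NSG flow tuples are comma separated strings; the indices used below map to:
+ #   0..7  : unixtimestamp, src_ip, dst_ip, src_port, dst_port, protocol (T/U), direction (I/O), decision (A/D)
+ #   8..12 : (version 2 only) flowstate, src_pack, src_bytes, dst_pack, dst_bytes
+ # Hypothetical example tuple: "1549411677,10.0.0.4,36.156.24.96,44931,443,T,O,A,B,5,1337,6,6273"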
283
+ def nsgflowlog(queue, json)
284
+ count=0
285
+ json["records"].each do |record|
286
+ res = resource(record["resourceId"])
287
+ resource = { :subscription => res[:subscription], :resourcegroup => res[:resourcegroup], :nsg => res[:nsg] }
288
+ @logger.trace(resource.to_s)
289
+ record["properties"]["flows"].each do |flows|
290
+ rule = resource.merge ({ :rule => flows["rule"]})
291
+ flows["flows"].each do |flowx|
292
+ flowx["flowTuples"].each do |tup|
293
+ tups = tup.split(',')
294
+ ev = rule.merge({:unixtimestamp => tups[0], :src_ip => tups[1], :dst_ip => tups[2], :src_port => tups[3], :dst_port => tups[4], :protocol => tups[5], :direction => tups[6], :decision => tups[7]})
295
+ if (record["properties"]["Version"]==2)
296
+ ev.merge!( {:flowstate => tups[8], :src_pack => tups[9], :src_bytes => tups[10], :dst_pack => tups[11], :dst_bytes => tups[12]} )
297
+ end
298
+ unless iplookup.nil?
299
+ ev.merge!(addip(tups[1], tups[2]))
300
+ end
301
+ @logger.trace(ev.to_s)
302
+ event = LogStash::Event.new('message' => ev.to_json)
303
+ decorate(event)
304
+ queue << event
305
+ count+=1
306
+ end
307
+ end
308
+ end
309
+ end
310
+ return count
311
+ end
312
+
313
+ # WAD IIS logs: read the blob and emit one event per log line, skipping the '#' header lines
+ def wadiislog(queue, filename)
+ count=0
+ full_read(filename).each_line do |line|
+ unless line.start_with?('#')
+ event = LogStash::Event.new('message' => line.chomp)
+ decorate(event)
+ queue << event
+ count+=1
+ end
+ end
+ return count
+ # date {
+ # match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
+ # target => "@timestamp"
+ # remove_field => ["log_timestamp"]
+ # }
+ end
328
+
329
+ # list all blobs in the blobstore, set the offsets from the registry and return the filelist
330
+ def list_blobs()
331
+ files = Hash.new
332
+ nextMarker = nil
333
+ loop do
334
+ blobs = @blob_client.list_blobs(@container, { marker: nextMarker, prefix: @prefix })
335
+ blobs.each do |blob|
336
+ # exclude the registry itself
337
+ unless blob.name == @registry_path
338
+ offset = 0
339
+ length = blob.properties[:content_length].to_i
340
+ off = @registry[blob.name]
341
+ unless off.nil?
342
+ @logger.debug(@pipe_id+" seen #{blob.name} which is #{length} with offset #{offset}")
343
+ offset = off[:offset]
344
+ end
345
+ files.store(blob.name, { :offset => offset, :length => length })
346
+ end
347
+ end
348
+ nextMarker = blobs.continuation_token
349
+ break unless nextMarker && !nextMarker.empty?
350
+ end
351
+ @logger.debug(@pipe_id+" list_blobs found #{files.size} blobs")
352
+ return files
353
+ end
354
+
355
+ # If events have been processed since the last registry save, start a thread to update the registry file.
356
+ def save_registry(filelist)
357
+ # TODO because of threading, processed values and regsaved are not thread safe, they can change as instance variable @!
358
+ unless @processed == @regsaved
359
+ @regsaved = @processed
360
+ @logger.info(@pipe_id+" processed #{@processed} events, saving #{filelist.size} blobs and offsets to registry #{registry_path}")
361
+ Thread.new {
362
+ begin
363
+ @blob_client.create_block_blob(container, registry_path, Marshal.dump(filelist))
364
+ rescue
365
+ @logger.error(@pipe_id+" Oh my, registry write failed, do you have write access?")
366
+ end
367
+ }
368
+ end
369
+ end
370
+
371
+ def learn_encapsulation
372
+ # From one file, read first block and last block to learn head and tail
373
+ blob = @blob_client.list_blobs(container, { maxresults: 1, prefix: @prefix }).first
374
+ blocks = @blob_client.list_blob_blocks(container, blob.name)[:committed]
375
+ @logger.info(@pipe_id+" using #{blob.name} to learn the json header and tail")
376
+ @head = @blob_client.get_blob(container, blob.name, start_range: 0, end_range: blocks.first.size-1)[1]
377
+ @logger.info(@pipe_id+" learned header: #{@head}")
378
+ length = blob.properties[:content_length].to_i
379
+ offset = length - blocks.last.size
380
+ @tail = @blob_client.get_blob(container, blob.name, start_range: offset, end_range: length-1)[1]
381
+ @logger.info(@pipe_id+" learned tail: #{@tail}")
382
+ end
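+ # For nsgflowlogs the learned head and tail typically match the file_head and file_tail defaults: '{"records":[' and ']}'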
383
+
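+ # Example with the path format from the README (placeholder values):
+ #   resource("resourceId=/SUBSCRIPTIONS/{UUID}/RESOURCEGROUPS/{RG}/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/{NSG}/y=2019/m=02/d=12/h=07/m=00/macAddress={MAC}/PT1H.json")
+ #   => {:subscription=>"{UUID}", :resourcegroup=>"{RG}", :nsg=>"{NSG}", :date=>"2019/02/12-07:00"}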
384
+ def resource(str)
385
+ temp = str.split('/')
386
+ date = '---'
387
+ unless temp[9].nil?
388
+ date = val(temp[9])+'/'+val(temp[10])+'/'+val(temp[11])+'-'+val(temp[12])+':00'
389
+ end
390
+ return {:subscription=> temp[2], :resourcegroup=>temp[4], :nsg=>temp[8], :date=>date}
391
+ end
392
+
393
+ def val(str)
394
+ return str.split('=')[1]
395
+ end
396
+
397
+
398
+
399
+ # Optional lookup for netname and hostname for the srcip and dstip returned in a Hash
400
+ def addip(srcip, dstip)
401
+ #TODO: return anonymous merge
402
+ srcjson = JSON.parse(lookup(srcip))
403
+ dstjson = JSON.parse(lookup(dstip))
404
+ return {:srcnet=>srcjson["netname"],:srchost=>srcjson["hostname"],:dstnet=>dstjson["netname"],:dsthost=>dstjson["hostname"]}
405
+ end
406
+
407
+ def lookup(ip)
408
+ # TODO if ip in iplist return config
409
+ unless @red.nil?
410
+ res = @red.get(ip)
411
+ end
412
+ if res.nil?
413
+ res = Net::HTTP.get(URI(iplookup + ip))
414
+ unless @red.nil?
415
+ @red.set(ip, res)
416
+ @red.expire(ip,604800)
417
+ end
418
+ end
419
+ return res
420
+ end
421
+
422
+ end # class LogStash::Inputs::AzureBlobStorage
@@ -0,0 +1,26 @@
1
+ Gem::Specification.new do |s|
2
+ s.name = 'logstash-input-azure_blob_storage'
3
+ s.version = '0.10.0'
4
+ s.licenses = ['Apache-2.0']
5
+ s.summary = 'This logstash plugin reads and parses data from Azure Storage Blobs.'
6
+ s.description = 'This gem is a Logstash plugin. It reads and parses data from Azure Storage Blobs. The azure_blob_storage is a rewrite to replace azureblob from azure-diagnostics-tools/Logstash. It can deal with larger volumes and partial file reads, and eliminates the delay when rebuilding the registry'
7
+ s.homepage = 'https://github.com/janmg/logstash-input-azure_blob_storage'
8
+ s.authors = ['Jan Geertsma']
9
+ s.email = 'jan@janmg.com'
10
+ s.require_paths = ['lib']
11
+
12
+ # Files
13
+ s.files = Dir['lib/**/*','spec/**/*','vendor/**/*','*.gemspec','*.md','CONTRIBUTORS','Gemfile','LICENSE','NOTICE.TXT']
14
+ # Tests
15
+ s.test_files = s.files.grep(%r{^(test|spec|features)/})
16
+
17
+ # Special flag to let us know this is actually a logstash plugin
18
+ s.metadata = { "logstash_plugin" => "true", "logstash_group" => "input" }
19
+
20
+ # Gem dependencies
21
+ s.add_runtime_dependency "logstash-core-plugin-api", "~> 2.0"
22
+ s.add_runtime_dependency 'logstash-codec-plain', '~> 3.0'
23
+ s.add_runtime_dependency 'stud', '~> 0.0.22'
24
+ s.add_runtime_dependency 'azure-storage-blob', '~> 1.0'
25
+ s.add_development_dependency 'logstash-devutils', '~> 1.0', '>= 1.0.0'
26
+ end
@@ -0,0 +1,11 @@
1
+ # encoding: utf-8
2
+ require "logstash/devutils/rspec/spec_helper"
3
+ require "logstash/inputs/azure_blob_storage"
4
+
5
+ describe LogStash::Inputs::AzureBlobStorage do
6
+
7
+ it_behaves_like "an interruptible input plugin" do
8
+ let(:config) { { "interval" => 100 } }
9
+ end
10
+
11
+ end
metadata ADDED
@@ -0,0 +1,134 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: logstash-input-azure_blob_storage
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.10.0
5
+ platform: ruby
6
+ authors:
7
+ - Jan Geertsma
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2019-02-27 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ requirement: !ruby/object:Gem::Requirement
15
+ requirements:
16
+ - - "~>"
17
+ - !ruby/object:Gem::Version
18
+ version: '2.0'
19
+ name: logstash-core-plugin-api
20
+ prerelease: false
21
+ type: :runtime
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '2.0'
27
+ - !ruby/object:Gem::Dependency
28
+ requirement: !ruby/object:Gem::Requirement
29
+ requirements:
30
+ - - "~>"
31
+ - !ruby/object:Gem::Version
32
+ version: '3.0'
33
+ name: logstash-codec-plain
34
+ prerelease: false
35
+ type: :runtime
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '3.0'
41
+ - !ruby/object:Gem::Dependency
42
+ requirement: !ruby/object:Gem::Requirement
43
+ requirements:
44
+ - - "~>"
45
+ - !ruby/object:Gem::Version
46
+ version: 0.0.22
47
+ name: stud
48
+ prerelease: false
49
+ type: :runtime
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: 0.0.22
55
+ - !ruby/object:Gem::Dependency
56
+ requirement: !ruby/object:Gem::Requirement
57
+ requirements:
58
+ - - "~>"
59
+ - !ruby/object:Gem::Version
60
+ version: '1.0'
61
+ name: azure-storage-blob
62
+ prerelease: false
63
+ type: :runtime
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: '1.0'
69
+ - !ruby/object:Gem::Dependency
70
+ requirement: !ruby/object:Gem::Requirement
71
+ requirements:
72
+ - - "~>"
73
+ - !ruby/object:Gem::Version
74
+ version: '1.0'
75
+ - - ">="
76
+ - !ruby/object:Gem::Version
77
+ version: 1.0.0
78
+ name: logstash-devutils
79
+ prerelease: false
80
+ type: :development
81
+ version_requirements: !ruby/object:Gem::Requirement
82
+ requirements:
83
+ - - "~>"
84
+ - !ruby/object:Gem::Version
85
+ version: '1.0'
86
+ - - ">="
87
+ - !ruby/object:Gem::Version
88
+ version: 1.0.0
89
+ description: This gem is a Logstash plugin. It reads and parses data from Azure Storage
90
+ Blobs. The azure_blob_storage is a rewrite to replace azureblob from azure-diagnostics-tools/Logstash.
91
+ It can deal with larger volumes and partial file reads, and eliminates the delay
+ when rebuilding the registry
93
+ email: jan@janmg.com
94
+ executables: []
95
+ extensions: []
96
+ extra_rdoc_files: []
97
+ files:
98
+ - CHANGELOG.md
99
+ - CONTRIBUTORS
100
+ - DEVELOPER.md
101
+ - Gemfile
102
+ - LICENSE
103
+ - README.md
104
+ - lib/logstash/inputs/azure_blob_storage.rb
105
+ - logstash-input-azure_blob_storage.gemspec
106
+ - spec/inputs/azure_blob_storage_spec.rb
107
+ homepage: https://github.com/janmg/logstash-input-azure_blob_storage
108
+ licenses:
109
+ - Apache-2.0
110
+ metadata:
111
+ logstash_plugin: 'true'
112
+ logstash_group: input
113
+ post_install_message:
114
+ rdoc_options: []
115
+ require_paths:
116
+ - lib
117
+ required_ruby_version: !ruby/object:Gem::Requirement
118
+ requirements:
119
+ - - ">="
120
+ - !ruby/object:Gem::Version
121
+ version: '0'
122
+ required_rubygems_version: !ruby/object:Gem::Requirement
123
+ requirements:
124
+ - - ">="
125
+ - !ruby/object:Gem::Version
126
+ version: '0'
127
+ requirements: []
128
+ rubyforge_project:
129
+ rubygems_version: 2.6.13
130
+ signing_key:
131
+ specification_version: 4
132
+ summary: This logstash plugin reads and parses data from Azure Storage Blobs.
133
+ test_files:
134
+ - spec/inputs/azure_blob_storage_spec.rb