logstash-input-azure_blob_storage 0.12.6 → 0.12.7
- checksums.yaml +4 -4
- data/CHANGELOG.md +8 -3
- data/README.md +24 -7
- data/lib/logstash/inputs/azure_blob_storage.rb +108 -24
- data/logstash-input-azure_blob_storage.gemspec +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6bc1a46c4c6ae533e05c83f0e7cb90715cad7390a5cedb9b6e023c46f2e620d1
+  data.tar.gz: 520d7b5131a6b00b6de066a12cd93a99082c7af0bb7184df9f2bc9c8ca64babd
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3c069008cfef9b08c4b9793b24538c9c8bdc217b64285626d3c9564a57584b237bfef90f4382e4b68366c2555b1b9a6e91d897951bbcc336b355eaefb310ce00
+  data.tar.gz: ccb7ba1d556cec586872ebe1c94237b3223f484902218d3bff899993467b741519c521b9c08075f98328536cf31274cd2aa386f64458097b025bbef2841c486d
data/CHANGELOG.md
CHANGED
@@ -1,7 +1,12 @@
-## PROBABLY 0.12.4 is the most stable version until I sort out when and why JSON Parse errors happen
-Join the discussion if you have something to share!
-https://github.com/janmg/logstash-input-azure_blob_storage/issues/34
 
+## 0.12.7
+- rewrote partial_read, the occasional JSON parse errors should now be fixed by reading only committed blocks
+  (this may also have been related to a second partial read, where the offset wasn't updated correctly?)
+- used the new header and tail block names, so the header and footer should be learned automatically again
+- added addall to the configuration options to add system, mac, category, time and operation to the output
+- added an optional environment configuration option
+- removed the date field, which was always set to ---
+- made a start on event rewriting for ECS compatibility
 
 ## 0.12.6
 - Fixed the 0.12.5 exception handling, it actually caused a warning to become a fatal pipeline crashing error
data/README.md
CHANGED
@@ -42,9 +42,11 @@ input {
 ## Additional Configuration
 The registry keeps track of files in the storage account, their size and how many bytes have been processed. Files can grow and the added part will be processed as a partial file. The registry is saved to disk every interval.
 
+The interval also defines when a new round of listing files and processing data should happen. The NSG flow logs are written every minute into a new block of the hourly blob. This data can be read partially, because the plugin knows the JSON head and tail; it removes the leading comma and fixes the JSON before parsing the new events, as illustrated in the sketch below.
+
 The registry_create_policy determines at the start of the pipeline whether processing should resume from the last known unprocessed file, start_fresh and ignore old files so that only events arriving after the start of the pipeline are processed, or start_over to process all the files and ignore the registry.
 
-interval defines the minimum time the registry should be saved to the registry file
+interval defines the minimum time between saves of the registry to the registry file, by default 'data/registry.dat' in the storage account, but it can also be kept on the server running Logstash by setting registry_local_path. The registry is also kept in memory; the registry file is only needed in case the pipeline dies unexpectedly. During a normal shutdown the registry is saved as well.
 
 When registry_local_path is set to a directory, the registry is saved on the logstash server in that directory. The filename is the pipe.id
 
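To make the head/tail mechanism concrete, here is a minimal plain-Ruby sketch; the head and tail strings and the sample block are assumptions chosen to match the NSG flow log layout described above, not code taken from the plugin:

```ruby
require 'json'

# Assumed NSG flow log framing: the hourly blob starts with this head block
# and ends with this tail block; every minute a new ",{...}" block is added.
head = '{"records":['
tail = ']}'

# Hypothetical newly committed block, as read from the middle of the blob.
# It starts with the comma that separates it from the previous minute's block.
new_block = ',{"time":"2023-04-02T10:01:00Z","category":"NetworkSecurityGroupFlowEvent"}'

# Strip the leading comma and re-wrap the fragment with head and tail,
# so the fragment parses as a complete JSON document on its own.
fragment = new_block.sub(/\A,/, '')
json = JSON.parse(head + fragment + tail)
puts json["records"].length   # => 1
```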
@@ -66,13 +68,15 @@ The pipeline can be started in several ways.
 ```
 - As managed pipeline from Kibana
 
-Logstash itself (so not specific to this plugin) has a feature where multiple instances can run on the same system. The default TCP port is 9600, but if it's already in use it will use 9601 (and up). To update a config file on a running instance on the commandline you can add the argument --config.reload.automatic and if you modify the files that are in the pipeline.yml you can send a SIGHUP channel to reload the pipelines where the config was changed.
+Logstash itself (so not specific to this plugin) has a feature where multiple instances can run on the same system. The default TCP port is 9600, but if it's already in use it will use 9601 (and up); this is probably no longer true from v8. To update a config file on a running instance you can add the command-line argument --config.reload.automatic, and if you modify files that are referenced in pipelines.yml you can send a SIGHUP signal to reload the pipelines whose config changed.
 [https://www.elastic.co/guide/en/logstash/current/reloading-config.html](https://www.elastic.co/guide/en/logstash/current/reloading-config.html)
 
 ## Internal Working
 When the plugin is started, it will read all the filenames and sizes in the blob store, excluding the directories of files that are excluded by the "path_filters". After every interval it will write a registry to the storage account to save the information of how many bytes per blob (file) have been read and processed. After all files are processed and at least one interval has passed, a new file list is generated and a worklist is constructed that will be processed; a simplified sketch of this cycle is shown below. When a file has already been processed before, partial files are read from the offset to the filesize at the time of the file listing. If the codec is JSON, the header and tail will be added to partial files; they can be configured. If logtype is nsgflowlog, the plugin will split the records into individual tuple events. The logtype wadiis may in the future be used to process the grok formats to split into log lines. Any other format is fed into the queue as one event per file or partial file. It's then up to the filter to split and mutate the file format.
 
-By default the root of the json message is named "message"
+By default the root of the json message is named "message"; you can modify the content in the filter block.
+
+Additional fields can be enabled with addfilename and addall; ecs_compatibility is not yet supported.
 
 The configurations and the rest of the code are in [lib/logstash/inputs](https://github.com/janmg/logstash-input-azure_blob_storage/tree/master/lib/logstash/inputs) and [azure_blob_storage.rb](https://github.com/janmg/logstash-input-azure_blob_storage/blob/master/lib/logstash/inputs/azure_blob_storage.rb#L10)
 
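A simplified sketch of the listing/worklist cycle described above; the registry and worklist structures here are illustrative stand-ins, not the plugin's actual data structures:

```ruby
# Hypothetical shapes: the registry maps blob name => bytes already processed,
# the blob list maps blob name => current length as reported by the listing.
registry  = { "nsg-A/PT1H.json" => 40_000 }
blob_list = { "nsg-A/PT1H.json" => 55_000,   # grew since the last interval
              "nsg-A/PT2H.json" => 10_000 }  # new blob

# Keep only blobs that are new or have grown since the last listing;
# each entry carries the offset to resume from and the length to read up to.
worklist = blob_list.filter_map do |name, length|
  offset = registry.fetch(name, 0)
  { name: name, offset: offset, length: length } if length > offset
end

worklist.each do |work|
  # A full read starts at 0; a partial read starts at the registered offset.
  kind = work[:offset].zero? ? "full" : "partial"
  puts "#{kind} read of #{work[:name]} from #{work[:offset]} to #{work[:length]}"
end
```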
@@ -130,7 +134,7 @@ filter {
 }
 
 output {
-    stdout { }
+    stdout { codec => rubydebug }
 }
 
 output {
@@ -139,24 +143,37 @@ output {
     index => "nsg-flow-logs-%{+xxxx.ww}"
   }
 }
+
+output {
+  file {
+    path => "/tmp/abuse.txt"
+    codec => line { format => "%{decision} %{flowstate} %{src_ip} %{dst_port}" }
+  }
+}
+
 ```
 A more elaborate input configuration example
 ```
 input {
   azure_blob_storage {
     codec => "json"
-    storageaccount => "yourstorageaccountname"
-    access_key => "Ba5e64c0d3=="
+    # storageaccount => "yourstorageaccountname"
+    # access_key => "Ba5e64c0d3=="
+    connection_string => "DefaultEndpointsProtocol=https;AccountName=yourstorageaccountname;AccountKey=Ba5e64c0d3==;EndpointSuffix=core.windows.net"
     container => "insights-logs-networksecuritygroupflowevent"
     logtype => "nsgflowlog"
     prefix => "resourceId=/"
     path_filters => ['**/*.json']
     addfilename => true
+    addall => true
+    environment => "dev-env"
     registry_create_policy => "resume"
     registry_local_path => "/usr/share/logstash/plugin"
     interval => 300
     debug_timer => true
-    debug_until =>
+    debug_until => 1000
   }
 }
 
data/lib/logstash/inputs/azure_blob_storage.rb
CHANGED
@@ -17,10 +17,12 @@ require 'json'
 # D672f4bbd95a04209b00dc05d899e3cce 2576 json objects for 1st minute
 # D7fe0d4f275a84c32982795b0e5c7d3a1 2312 json objects for 2nd minute
 # Z00000000000000000000000000000000 2 ]}
-
+#
+# The azure-storage-ruby client connects to the storageaccount and the files are read through get_blob. For partial reads the start and end range options are used.
+# https://github.com/Azure/azure-storage-ruby/blob/master/blob/lib/azure/storage/blob/blob.rb#L89
+#
 # A storage account has by default a globally unique name, {storageaccount}.blob.core.windows.net, which is a CNAME to Azure's blob servers blob.*.store.core.windows.net. A storage account has a container and those have a directory and blobs (like files). Blobs have one or more blocks. After writing the blocks, they can be committed. Some Azure diagnostics can send events to an Event Hub that can be parsed by the plugin logstash-input-azure_event_hubs, but for the events that are only stored in a storage account, use this plugin. The original logstash-input-azureblob from azure-diagnostics-tools is great for low volumes, but it suffers from an outdated client, slow reads, lease locking issues and json parse errors.
 
-
 class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
   config_name "azure_blob_storage"
 
@@ -74,6 +76,12 @@ class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
   # add the filename as a field into the events
   config :addfilename, :validate => :boolean, :default => false, :required => false
 
+  # add the environment as a field into the events
+  config :environment, :validate => :string, :required => false
+
+  # add all resource details (system, mac, category, time, operation) into the events
+  config :addall, :validate => :boolean, :default => false, :required => false
+
   # debug_until will, at the creation of the pipeline and up to a maximum number of processed messages, show 3 types of log printouts including processed filenames. After that number of events, the plugin will stop logging the events and continue silently. This is a lightweight alternative to switching the loglevel from info to debug or even trace to see what the plugin is doing and how fast at the start of the pipeline. A good value would be approximately 3x the amount of events per file, for instance 6000 events.
   config :debug_until, :validate => :number, :default => 0, :required => false
 
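The config declarations above use Logstash's plugin config DSL, which exposes each declared option as an instance variable on the plugin instance; that is why the event-building code further down checks @environment and @addall. A stripped-down, plain-Ruby mimic of that pattern (FakePlugin is purely illustrative, not the real LogStash::Config::Mixin):

```ruby
# Plain-Ruby mimic of the config-to-instance-variable pattern.
class FakePlugin
  def initialize(params)
    params.each { |name, value| instance_variable_set("@#{name}", value) }
  end

  # Mirrors the enrichment done per event: add environment when set,
  # and merge the extra resource details when addall is enabled.
  def enrich(ev, extras)
    ev.merge!(environment: @environment) unless @environment.nil?
    ev.merge!(extras) if @addall
    ev
  end
end

plugin = FakePlugin.new("environment" => "dev-env", "addall" => true)
puts plugin.enrich({ src_ip: "10.0.0.1" }, { mac: "00155D000000" }).inspect
# => {:src_ip=>"10.0.0.1", :environment=>"dev-env", :mac=>"00155D000000"}
```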
@@ -260,9 +268,8 @@ public
             delta_size = 0
           end
         else
-          chunk =
-          delta_size = chunk.size
-          @logger.debug("partial file #{name} from #{file[:offset]} to #{file[:length]}")
+          chunk = partial_read(name, file[:offset])
+          delta_size = chunk.size - @head.length - 1
         end
 
         if logtype == "nsgflowlog" && @is_json
@@ -272,14 +279,13 @@ public
           begin
             fingjson = JSON.parse(chunk)
             @processed += nsgflowlog(queue, fingjson, name)
-            @logger.debug("Processed #{res[:nsg]}
+            @logger.debug("Processed #{res[:nsg]} #{@processed} events")
           rescue JSON::ParserError => e
-            @logger.error("parse error #{e.message} on #{res[:nsg]}
+            @logger.error("parse error #{e.message} on #{res[:nsg]} offset: #{file[:offset]} length: #{file[:length]}")
             if (@debug_until > @processed) then @logger.info("#{chunk}") end
           end
         end
         # TODO: Convert this to line based grokking.
-        # TODO: ECS Compliance?
         elsif logtype == "wadiis" && !@is_json
           @processed += wadiislog(queue, name)
         else
@@ -398,14 +404,35 @@ private
     return chuck
   end
 
-  def
-
-
-
-
+  def partial_read(blobname, offset)
+    # 1. read the committed blocks and calculate the committed length
+    # 2. calculate the range to read from the offset
+    # 3. strip the leading comma
+    # if json, strip the comma and fix the head and tail
+    size = 0
+    blocks = @blob_client.list_blob_blocks(container, blobname)
+    blocks[:committed].each do |block|
+      size += block.size
+    end
+    # read the new blob blocks from the offset to the last committed size.
+    # if it is json, fix the head and tail
+    # the committed block at the end is the tail, so it must be subtracted from the read, then the comma stripped and the tail added.
+    # but why did I need a -1 for the length?? probably because the offset starts at 0 and ends at size-1
+
+    # should it first check committed, read and then check committed again? no, only read the committed size
+    # should read the full content and then subtract the json tail
+
+    if @is_json
+      content = @blob_client.get_blob(container, blobname, start_range: offset-1, end_range: size-1)[1]
+      if content.end_with?(@tail)
+        return @head + strip_comma(content)
+      else
+        @logger.info("Fixed a tail! probably new committed blocks started appearing!")
+        # subtract the length of the tail and add the tail back, because the file grew. size was calculated at the block boundary, so replacing the last bytes with the tail should fix the problem
+        return @head + strip_comma(content[0...-@tail.length]) + @tail
+      end
     else
-
-      return @head + strip_comma(content[0...-@tail.length]) + @tail
+      content = @blob_client.get_blob(container, blobname, start_range: offset, end_range: size-1)[1]
     end
   end
 
@@ -422,8 +449,9 @@ private
     count=0
     begin
       json["records"].each do |record|
-
-        resource = { :subscription => res[:subscription], :resourcegroup => res[:resourcegroup], :nsg => res[:nsg] }
+        resource = resource(record["resourceId"])
+        # resource = { :subscription => res[:subscription], :resourcegroup => res[:resourcegroup], :nsg => res[:nsg] }
+        extras = { :time => record["time"], :system => record["systemId"], :mac => record["macAddress"], :category => record["category"], :operation => record["operationName"] }
         @logger.trace(resource.to_s)
         record["properties"]["flows"].each do |flows|
           rule = resource.merge ({ :rule => flows["rule"]})
@@ -442,7 +470,18 @@ private
             if @addfilename
               ev.merge!( {:filename => name } )
             end
+            unless @environment.nil?
+              ev.merge!( {:environment => environment } )
+            end
+            if @addall
+              ev.merge!( extras )
+            end
+
+            # Add event to logstash queue
             event = LogStash::Event.new('message' => ev.to_json)
+            #if @ecs_compatibility != "disabled"
+            #  event = ecs(event)
+            #end
             decorate(event)
             queue << event
             count+=1
@@ -563,11 +602,11 @@ private
       unless blob.name == registry_path
         begin
           blocks = @blob_client.list_blob_blocks(container, blob.name)[:committed]
-          if blocks.first.name
+          if ['A00000000000000000000000000000000','QTAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw'].include?(blocks.first.name)
             @logger.debug("using #{blob.name}/#{blocks.first.name} to learn the json header")
             @head = @blob_client.get_blob(container, blob.name, start_range: 0, end_range: blocks.first.size-1)[1]
           end
-          if blocks.last.name
+          if ['Z00000000000000000000000000000000','WjAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw'].include?(blocks.last.name)
             @logger.debug("using #{blob.name}/#{blocks.last.name} to learn the json footer")
             length = blob.properties[:content_length].to_i
             offset = length - blocks.last.size
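The two names checked for the first and last block appear to be the plain and Base64-encoded forms of the same block IDs, presumably depending on how the block ID is reported; a quick check:

```ruby
require 'base64'

# The plain head and tail block names encode to the second form in each list.
puts Base64.strict_encode64('A00000000000000000000000000000000')
# => QTAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw
puts Base64.strict_encode64('Z00000000000000000000000000000000')
# => WjAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw
```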
@@ -586,15 +625,60 @@ private
 
   def resource(str)
     temp = str.split('/')
-    date = '---'
-    unless temp[9].nil?
-
-    end
-    return {:subscription=> temp[2], :resourcegroup=>temp[4], :nsg=>temp[8]
+    #date = '---'
+    #unless temp[9].nil?
+    #  date = val(temp[9])+'/'+val(temp[10])+'/'+val(temp[11])+'-'+val(temp[12])+':00'
+    #end
+    return {:subscription=> temp[2], :resourcegroup=>temp[4], :nsg=>temp[8]}
   end
 
   def val(str)
     return str.split('=')[1]
   end
 
+=begin
+  def ecs(old)
+    # https://www.elastic.co/guide/en/ecs/current/ecs-field-reference.html
+    ecs = LogStash::Event.new()
+    ecs.set("ecs.version", "1.0.0")
+    ecs.set("@timestamp", old.timestamp)
+    ecs.set("cloud.provider", "azure")
+    ecs.set("cloud.account.id", old.get("[subscription]"))
+    ecs.set("cloud.project.id", old.get("[environment]"))
+    ecs.set("file.name", old.get("[filename]"))
+    ecs.set("event.category", "network")
+    if old.get("[decision]") == "D"
+      ecs.set("event.type", "denied")
+    else
+      ecs.set("event.type", "allowed")
+    end
+    ecs.set("event.action", "")
+    ecs.set("rule.ruleset", old.get("[nsg]"))
+    ecs.set("rule.name", old.get("[rule]"))
+    ecs.set("trace.id", old.get("[protocol]")+"/"+old.get("[src_ip]")+":"+old.get("[src_port]")+"-"+old.get("[dst_ip]")+":"+old.get("[dst_port]"))
+    # requires logic to match sockets and flip src/dst for outgoing.
+    ecs.set("host.mac", old.get("[mac]"))
+    ecs.set("source.ip", old.get("[src_ip]"))
+    ecs.set("source.port", old.get("[src_port]"))
+    ecs.set("source.bytes", old.get("[srcbytes]"))
+    ecs.set("source.packets", old.get("[src_pack]"))
+    ecs.set("destination.ip", old.get("[dst_ip]"))
+    ecs.set("destination.port", old.get("[dst_port]"))
+    ecs.set("destination.bytes", old.get("[dst_bytes]"))
+    ecs.set("destination.packets", old.get("[dst_packets]"))
+    if old.get("[protocol]") == "U"
+      ecs.set("network.transport", "udp")
+    else
+      ecs.set("network.transport", "tcp")
+    end
+    if old.get("[decision]") == "I"
+      ecs.set("network.direction", "incoming")
+    else
+      ecs.set("network.direction", "outgoing")
+    end
+    ecs.set("network.bytes", old.get("[src_bytes]")+old.get("[dst_bytes]"))
+    ecs.set("network.packets", old.get("[src_packets]")+old.get("[dst_packets]"))
+    return ecs
+  end
+=end
 end # class LogStash::Inputs::AzureBlobStorage
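For reference, a standalone copy of the resource() logic with a made-up resourceId, showing which path segments end up in the returned hash:

```ruby
# Standalone copy of the resource() split, applied to an illustrative
# NSG flow log resourceId (the real value comes from record["resourceId"]).
def resource(str)
  temp = str.split('/')
  { :subscription => temp[2], :resourcegroup => temp[4], :nsg => temp[8] }
end

resource_id = '/SUBSCRIPTIONS/0000-EXAMPLE/RESOURCEGROUPS/MY-RG/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/MY-NSG'
puts resource(resource_id).inspect
# => {:subscription=>"0000-EXAMPLE", :resourcegroup=>"MY-RG", :nsg=>"MY-NSG"}
```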
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: logstash-input-azure_blob_storage
 version: !ruby/object:Gem::Version
-  version: 0.12.
+  version: 0.12.7
 platform: ruby
 authors:
 - Jan Geertsma
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2023-
+date: 2023-04-02 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement