logstash-input-azure_blob_storage 0.12.6 → 0.12.7

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: b50189c380606c6fdb8b7f7216fe20d15c0d410f1c1f6670211baf25baa567ca
- data.tar.gz: 189c80c15720ec9a85b8bb223a5ae7e4666fd0ebd6a96946f201bee96cf3dafc
+ metadata.gz: 6bc1a46c4c6ae533e05c83f0e7cb90715cad7390a5cedb9b6e023c46f2e620d1
+ data.tar.gz: 520d7b5131a6b00b6de066a12cd93a99082c7af0bb7184df9f2bc9c8ca64babd
  SHA512:
- metadata.gz: 599ca22fd813634d3ffd5fbbef0361605fd7611ea4050bc85e30c06fe97dbfe6dcd879ee092573e8a94229435d25c7cef71255bc72f33ea3d4813de987600e4c
- data.tar.gz: 53cc0e73c25323ba891e90a820c679071516187d641ed2c5dd5810a5bbb9654c2cf67c6239d400b58d8786c4cc4737aaa54b0fc1f145b4136ebf1f6b0203a00d
+ metadata.gz: 3c069008cfef9b08c4b9793b24538c9c8bdc217b64285626d3c9564a57584b237bfef90f4382e4b68366c2555b1b9a6e91d897951bbcc336b355eaefb310ce00
+ data.tar.gz: ccb7ba1d556cec586872ebe1c94237b3223f484902218d3bff899993467b741519c521b9c08075f98328536cf31274cd2aa386f64458097b025bbef2841c486d
data/CHANGELOG.md CHANGED
@@ -1,7 +1,12 @@
- ## PROBABLY 0.12.4 is the most stable version until I sort out when and why JSON Parse errors happen
- Join the discussion if you have something to share!
- https://github.com/janmg/logstash-input-azure_blob_storage/issues/34
 
+ ## 0.12.7
+ - rewrote partial_read; the occasional JSON parse errors should now be fixed by reading only committed blocks.
+ (This may also have been related to reading a second partial_read, where the offset wasn't updated correctly?)
+ - used the new header and tail block names, so the header and footer should be learned automatically again
+ - added addall to the configuration to add system, mac, category, time and operation to the output
+ - added optional environment configuration option
+ - removed the date field, which was always set to ---
+ - made a start on event rewriting to make it ECS compatible
 
  ## 0.12.6
  - Fixed the 0.12.5 exception handling, it actually caused a warning to become a fatal pipeline crashing error
data/README.md CHANGED
@@ -42,9 +42,11 @@ input {
  ## Additional Configuration
  The registry keeps track of files in the storage account, their size and how many bytes have been processed. Files can grow and the added part will be processed as a partial file. The registry is saved to disk every interval.
 
+ The interval also defines when a new round of listing files and processing data happens. The NSG flow logs are written every minute into a new block of the hourly blob. This data can be read partially, because the plugin knows the JSON head and tail; it removes the leading comma and fixes the JSON before parsing new events.
+
  The registry_create_policy determines at the start of the pipeline if processing should resume from the last known unprocessed file, or start_fresh, ignoring old files and only processing new events that came after the start of the pipeline, or start_over to process all the files, ignoring the registry.
 
- interval defines the minimum time the registry should be saved to the registry file (by default to 'data/registry.dat'), this is only needed in case the pipeline dies unexpectedly. During a normal shutdown the registry is also saved.
+ interval defines the minimum time between saves of the registry to the registry file. By default this is 'data/registry.dat' in the storage account, but it can also be kept on the server running Logstash by setting registry_local_path. The registry is also kept in memory; the registry file is only needed in case the pipeline dies unexpectedly. During a normal shutdown the registry is also saved.
 
  When registry_local_path is set to a directory, the registry is saved on the logstash server in that directory. The filename is the pipe.id
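As a rough illustration of that head/tail repair, here is a minimal sketch with made-up header, footer and block content; it is not the plugin's exact code, only the idea of stripping the leading comma and re-wrapping the partial read in valid JSON:

```ruby
require 'json'

head  = '{"records":['                               # example header learned from the first block
tail  = ']}'                                         # example footer learned from the last block
chunk = ',{"time":"2023-04-02T10:01:00Z"}' + tail    # hypothetical newly committed block, read from the offset

# strip the comma that separated the new block from the previous one,
# then wrap it in the learned header (the footer is already at the end)
fixed = head + chunk.sub(/\A,/, '')
puts JSON.parse(fixed)["records"].length             # => 1
```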
 
@@ -66,13 +68,15 @@ The pipeline can be started in several ways.
  ```
  - As managed pipeline from Kibana
 
- Logstash itself (so not specific to this plugin) has a feature where multiple instances can run on the same system. The default TCP port is 9600, but if it's already in use it will use 9601 (and up). To update a config file on a running instance on the commandline you can add the argument --config.reload.automatic and if you modify the files that are in the pipeline.yml you can send a SIGHUP channel to reload the pipelines where the config was changed.
+ Logstash itself (so not specific to this plugin) has a feature where multiple instances can run on the same system. The default TCP port is 9600, but if it's already in use it will use 9601 (and up); this is probably no longer true from v8 onwards. To update a config file on a running instance from the command line you can add the argument --config.reload.automatic, and if you modify the files that are in pipelines.yml you can send a SIGHUP signal to reload the pipelines whose config was changed.
  [https://www.elastic.co/guide/en/logstash/current/reloading-config.html](https://www.elastic.co/guide/en/logstash/current/reloading-config.html)
 
  ## Internal Working
  When the plugin is started, it will read all the filenames and sizes in the blob store, excluding the directories and files that are excluded by the "path_filters". After every interval it will write a registry to the storageaccount to save the information of how many bytes per blob (file) are read and processed. After all files are processed and at least one interval has passed, a new file list is generated and a worklist is constructed that will be processed. When a file has already been processed before, partial files are read from the offset to the filesize at the time of the file listing. If the codec is JSON, partial files will have the header and tail added; these can be configured. If logtype is nsgflowlog, the plugin will handle the splitting into individual tuple events. The logtype wadiis may in the future be used to process the grok formats to split into log lines. Any other format is fed into the queue as one event per file or partial file. It's then up to the filter to split and mutate the file format.
 
- By default the root of the json message is named "message" so you can modify the content in the filter block
+ By default the root of the json message is named "message"; you can modify the content in the filter block
+
+ Additional fields can be enabled with addfilename and addall; ecs_compatibility is not yet supported.
 
  The configurations and the rest of the code are in [https://github.com/janmg/logstash-input-azure_blob_storage/tree/master/lib/logstash/inputs](lib/logstash/inputs) [https://github.com/janmg/logstash-input-azure_blob_storage/blob/master/lib/logstash/inputs/azure_blob_storage.rb#L10](azure_blob_storage.rb)
 
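To make the registry bookkeeping described under Internal Working a bit more concrete, here is a rough sketch of the per-blob state the plugin tracks. The :offset and :length keys appear in the plugin code; the blob name and byte counts are invented:

```ruby
# Per-blob registry entries: how large the blob was at listing time and how
# many bytes have already been processed (invented example values).
registry = {
  "resourceId=/EXAMPLE/y=2023/m=04/d=02/h=10/m=00/PT1H.json" => {
    length: 524_288,   # blob size at the time of the last listing
    offset: 393_216    # bytes already processed; a grown blob is read from here on
  }
}

# Only blobs that grew since the last pass end up on the worklist.
worklist = registry.select { |_name, file| file[:length] > file[:offset] }
puts worklist.keys
```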
@@ -130,7 +134,7 @@ filter {
  }
 
  output {
- stdout { }
+ stdout { codec => rubydebug }
  }
 
  output {
@@ -139,24 +143,37 @@ output {
  index => "nsg-flow-logs-%{+xxxx.ww}"
  }
  }
+
+ output {
+ file {
+ path => "/tmp/abuse.txt"
+ codec => line { format => "%{decision} %{flowstate} %{src_ip} %{dst_port}" }
+ }
+ }
+
  ```
  A more elaborate input configuration example
  ```
  input {
  azure_blob_storage {
  codec => "json"
- storageaccount => "yourstorageaccountname"
- access_key => "Ba5e64c0d3=="
+ # storageaccount => "yourstorageaccountname"
+ # access_key => "Ba5e64c0d3=="
+ connection_string => "DefaultEndpointsProtocol=https;AccountName=yourstorageaccountname;AccountKey=Ba5e64c0d3==;EndpointSuffix=core.windows.net"
  container => "insights-logs-networksecuritygroupflowevent"
  logtype => "nsgflowlog"
  prefix => "resourceId=/"
  path_filters => ['**/*.json']
  addfilename => true
+ addall => true
+ environment => "dev-env"
  registry_create_policy => "resume"
  registry_local_path => "/usr/share/logstash/plugin"
  interval => 300
  debug_timer => true
- debug_until => 100
+ debug_until => 1000
  }
  }
 
@@ -17,10 +17,12 @@ require 'json'
  # D672f4bbd95a04209b00dc05d899e3cce 2576 json objects for 1st minute
  # D7fe0d4f275a84c32982795b0e5c7d3a1 2312 json objects for 2nd minute
  # Z00000000000000000000000000000000 2 ]}
-
+ #
+ # The azure-storage-ruby gem connects to the storageaccount and the files are read through get_blob. For partial reads the start and end range options are used.
+ # https://github.com/Azure/azure-storage-ruby/blob/master/blob/lib/azure/storage/blob/blob.rb#L89
+ #
  # A storage account has by default a globally unique name, {storageaccount}.blob.core.windows.net, which is a CNAME to Azure's blob servers blob.*.store.core.windows.net. A storageaccount has a container and those have a directory and blobs (like files). Blobs have one or more blocks. After writing the blocks, they can be committed. Some Azure diagnostics can send events to an EventHub that can be parsed with the plugin logstash-input-azure_event_hubs, but for the events that are only stored in a storage account, use this plugin. The original logstash-input-azureblob from azure-diagnostics-tools is great for low volumes, but it suffers from an outdated client, slow reads, lease locking issues and json parse errors.
 
-
  class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
  config_name "azure_blob_storage"
 
@@ -74,6 +76,12 @@ class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
  # add the filename as a field into the events
  config :addfilename, :validate => :boolean, :default => false, :required => false
 
+ # add the environment as a field into the events
+ config :environment, :validate => :string, :required => false
+
+ # add all resource details
+ config :addall, :validate => :boolean, :default => false, :required => false
+
  # debug_until will, at the creation of the pipeline, show 3 types of log printouts (including processed filenames) for a maximum amount of processed messages. After that number of events, the plugin will stop logging the events and continue silently. This is a lightweight alternative to switching the loglevel from info to debug or even trace to see what the plugin is doing and how fast at the start of the plugin. A good value would be approximately 3x the amount of events per file. For instance 6000 events.
  config :debug_until, :validate => :number, :default => 0, :required => false
 
@@ -260,9 +268,8 @@ public
  delta_size = 0
  end
  else
- chunk = partial_read_json(name, file[:offset], file[:length])
- delta_size = chunk.size
- @logger.debug("partial file #{name} from #{file[:offset]} to #{file[:length]}")
+ chunk = partial_read(name, file[:offset])
+ delta_size = chunk.size - @head.length - 1
  end
 
  if logtype == "nsgflowlog" && @is_json
@@ -272,14 +279,13 @@ public
  begin
  fingjson = JSON.parse(chunk)
  @processed += nsgflowlog(queue, fingjson, name)
- @logger.debug("Processed #{res[:nsg]} [#{res[:date]}] #{@processed} events")
+ @logger.debug("Processed #{res[:nsg]} #{@processed} events")
  rescue JSON::ParserError => e
- @logger.error("parse error #{e.message} on #{res[:nsg]} [#{res[:date]}] offset: #{file[:offset]} length: #{file[:length]}")
+ @logger.error("parse error #{e.message} on #{res[:nsg]} offset: #{file[:offset]} length: #{file[:length]}")
  if (@debug_until > @processed) then @logger.info("#{chunk}") end
  end
  end
  # TODO: Convert this to line based grokking.
- # TODO: ECS Compliance?
  elsif logtype == "wadiis" && !@is_json
  @processed += wadiislog(queue, name)
  else
@@ -398,14 +404,35 @@ private
  return chuck
  end
 
- def partial_read_json(filename, offset, length)
- content = @blob_client.get_blob(container, filename, start_range: offset-@tail.length, end_range: length-1)[1]
- if content.end_with?(@tail)
- # the tail is part of the last block, so included in the total length of the get_blob
- return @head + strip_comma(content)
+ def partial_read(blobname, offset)
+ # 1. read committed blocks, calculate length
+ # 2. calculate the offset to read
+ # 3. strip comma
+ # if json, strip comma and fix head and tail
+ size = 0
+ blocks = @blob_client.list_blob_blocks(container, blobname)
+ blocks[:committed].each do |block|
+ size += block.size
+ end
+ # read the new blob blocks from the offset to the last committed size.
+ # if it is json, fix the head and tail
+ # crap, the committed block at the end is the tail, so it must be subtracted from the read, then comma stripped and the tail added.
+ # but why did I need a -1 for the length?? probably the offset starts at 0 and ends at size-1
+
+ # should first check commit, read and then check committed again? no, only read the committed size
+ # should read the full content and then subtract the json tail
+
+ if @is_json
+ content = @blob_client.get_blob(container, blobname, start_range: offset-1, end_range: size-1)[1]
+ if content.end_with?(@tail)
+ return @head + strip_comma(content)
+ else
+ @logger.info("Fixed a tail! probably new committed blocks started appearing!")
+ # subtract the length of the tail and add the tail, because the file grew. size was calculated at the block boundary, so replacing the last bytes with the tail should fix the problem
+ return @head + strip_comma(content[0...-@tail.length]) + @tail
+ end
  else
- # when the file has grown between list_blobs and the time of partial reading, the tail will be wrong
- return @head + strip_comma(content[0...-@tail.length]) + @tail
+ content = @blob_client.get_blob(container, blobname, start_range: offset, end_range: size-1)[1]
  end
  end
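partial_read builds on azure-storage-ruby's get_blob with the start_range/end_range options. A minimal standalone sketch of such a ranged read; the storage account, key, container and blob name below are placeholders, not values from this plugin:

```ruby
require 'azure/storage/blob'

# placeholder credentials and names, not real ones
client = Azure::Storage::Blob::BlobService.create(
  storage_account_name: 'yourstorageaccountname',
  storage_access_key:   'Ba5e64c0d3=='
)

# start_range and end_range are inclusive byte offsets, so this reads 1024 bytes
blob, content = client.get_blob('insights-logs-networksecuritygroupflowevent',
                                'resourceId=/EXAMPLE/PT1H.json',
                                start_range: 1024, end_range: 2047)
puts content.bytesize
```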
 
@@ -422,8 +449,9 @@ private
  count=0
  begin
  json["records"].each do |record|
- res = resource(record["resourceId"])
- resource = { :subscription => res[:subscription], :resourcegroup => res[:resourcegroup], :nsg => res[:nsg] }
+ resource = resource(record["resourceId"])
+ # resource = { :subscription => res[:subscription], :resourcegroup => res[:resourcegroup], :nsg => res[:nsg] }
+ extras = { :time => record["time"], :system => record["systemId"], :mac => record["macAddress"], :category => record["category"], :operation => record["operationName"] }
  @logger.trace(resource.to_s)
  record["properties"]["flows"].each do |flows|
  rule = resource.merge ({ :rule => flows["rule"]})
@@ -442,7 +470,18 @@ private
  if @addfilename
  ev.merge!( {:filename => name } )
  end
+ unless @environment.nil?
+ ev.merge!( {:environment => environment } )
+ end
+ if @addall
+ ev.merge!( extras )
+ end
+
+ # Add event to logstash queue
  event = LogStash::Event.new('message' => ev.to_json)
+ #if @ecs_compatibility != "disabled"
+ # event = ecs(event)
+ #end
  decorate(event)
  queue << event
  count+=1
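A rough, self-contained illustration of what those merges produce for one tuple; every value below is invented, the real event is built from the NSG flow log record:

```ruby
require 'json'

# invented tuple and extras, mirroring the merges above
ev     = { :nsg => 'example-nsg', :rule => 'DefaultRule_AllowVnetInBound',
           :src_ip => '10.0.0.4', :dst_port => '443', :decision => 'A' }
extras = { :time => '2023-04-02T10:01:00Z', :system => 'f0e1d2c3',
           :mac => '000D3AF87A33', :category => 'NetworkSecurityGroupFlowEvent',
           :operation => 'NetworkSecurityGroupFlowEvents' }

ev.merge!( { :filename => 'resourceId=/EXAMPLE/PT1H.json' } )   # with addfilename => true
ev.merge!( { :environment => 'dev-env' } )                      # with environment => "dev-env"
ev.merge!( extras )                                             # with addall => true
puts ev.to_json   # this JSON string becomes the 'message' field of the Logstash event
```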
@@ -563,11 +602,11 @@ private
  unless blob.name == registry_path
  begin
  blocks = @blob_client.list_blob_blocks(container, blob.name)[:committed]
- if blocks.first.name.start_with?('A00')
+ if ['A00000000000000000000000000000000','QTAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw'].include?(blocks.first.name)
  @logger.debug("using #{blob.name}/#{blocks.first.name} to learn the json header")
  @head = @blob_client.get_blob(container, blob.name, start_range: 0, end_range: blocks.first.size-1)[1]
  end
- if blocks.last.name.start_with?('Z00')
+ if ['Z00000000000000000000000000000000','WjAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw'].include?(blocks.last.name)
  @logger.debug("using #{blob.name}/#{blocks.last.name} to learn the json footer")
  length = blob.properties[:content_length].to_i
  offset = length - blocks.last.size
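The two accepted names per block above are the plain block id and its Base64 encoding, presumably because some client versions return the block name already encoded. This is quick to verify:

```ruby
require 'base64'

head_block = 'A' + '0' * 32   # the plain header block id used by NSG flow logs
tail_block = 'Z' + '0' * 32   # the plain footer block id

puts Base64.strict_encode64(head_block)  # => "QTAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw"
puts Base64.strict_encode64(tail_block)  # => "WjAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw"
```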
@@ -586,15 +625,60 @@ private
 
  def resource(str)
  temp = str.split('/')
- date = '---'
- unless temp[9].nil?
- date = val(temp[9])+'/'+val(temp[10])+'/'+val(temp[11])+'-'+val(temp[12])+':00'
- end
- return {:subscription=> temp[2], :resourcegroup=>temp[4], :nsg=>temp[8], :date=>date}
+ #date = '---'
+ #unless temp[9].nil?
+ # date = val(temp[9])+'/'+val(temp[10])+'/'+val(temp[11])+'-'+val(temp[12])+':00'
+ #end
+ return {:subscription=> temp[2], :resourcegroup=>temp[4], :nsg=>temp[8]}
  end
 
  def val(str)
  return str.split('=')[1]
  end
 
+ =begin
+ def ecs(old)
+ # https://www.elastic.co/guide/en/ecs/current/ecs-field-reference.html
+ ecs = LogStash::Event.new()
+ ecs.set("ecs.version", "1.0.0")
+ ecs.set("@timestamp", old.timestamp)
+ ecs.set("cloud.provider", "azure")
+ ecs.set("cloud.account.id", old.get("[subscription]"))
+ ecs.set("cloud.project.id", old.get("[environment]"))
+ ecs.set("file.name", old.get("[filename]"))
+ ecs.set("event.category", "network")
+ if old.get("[decision]") == "D"
+ ecs.set("event.type", "denied")
+ else
+ ecs.set("event.type", "allowed")
+ end
+ ecs.set("event.action", "")
+ ecs.set("rule.ruleset", old.get("[nsg]"))
+ ecs.set("rule.name", old.get("[rule]"))
+ ecs.set("trace.id", old.get("[protocol]")+"/"+old.get("[src_ip]")+":"+old.get("[src_port]")+"-"+old.get("[dst_ip]")+":"+old.get("[dst_port]"))
+ # requires logic to match sockets and flip src/dst for outgoing.
+ ecs.set("host.mac", old.get("[mac]"))
+ ecs.set("source.ip", old.get("[src_ip]"))
+ ecs.set("source.port", old.get("[src_port]"))
+ ecs.set("source.bytes", old.get("[srcbytes]"))
+ ecs.set("source.packets", old.get("[src_pack]"))
+ ecs.set("destination.ip", old.get("[dst_ip]"))
+ ecs.set("destination.port", old.get("[dst_port]"))
+ ecs.set("destination.bytes", old.get("[dst_bytes]"))
+ ecs.set("destination.packets", old.get("[dst_packets]"))
+ if old.get("[protocol]") == "U"
+ ecs.set("network.transport", "udp")
+ else
+ ecs.set("network.transport", "tcp")
+ end
+ if old.get("[decision]") == "I"
+ ecs.set("network.direction", "incoming")
+ else
+ ecs.set("network.direction", "outgoing")
+ end
+ ecs.set("network.bytes", old.get("[src_bytes]")+old.get("[dst_bytes]"))
+ ecs.set("network.packets", old.get("[src_packets]")+old.get("[dst_packets]"))
+ return ecs
+ end
+ =end
  end # class LogStash::Inputs::AzureBlobStorage
@@ -1,6 +1,6 @@
  Gem::Specification.new do |s|
  s.name = 'logstash-input-azure_blob_storage'
- s.version = '0.12.6'
+ s.version = '0.12.7'
  s.licenses = ['Apache-2.0']
  s.summary = 'This logstash plugin reads and parses data from Azure Storage Blobs.'
  s.description = <<-EOF
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: logstash-input-azure_blob_storage
  version: !ruby/object:Gem::Version
- version: 0.12.6
+ version: 0.12.7
  platform: ruby
  authors:
  - Jan Geertsma
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2023-03-17 00:00:00.000000000 Z
+ date: 2023-04-02 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  requirement: !ruby/object:Gem::Requirement