logstash-input-azure_blob_storage 0.12.5 → 0.12.7

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 00e66bdb4eda73c6d9a4219034d6a33bbf4d8a1c8206f7f2e6ec39b414dd9d63
- data.tar.gz: 953fa1cc28b60e5a44d7575ead5428e6fd62c309ad1c9630a2f2cd73dac1ffc3
+ metadata.gz: 6bc1a46c4c6ae533e05c83f0e7cb90715cad7390a5cedb9b6e023c46f2e620d1
+ data.tar.gz: 520d7b5131a6b00b6de066a12cd93a99082c7af0bb7184df9f2bc9c8ca64babd
  SHA512:
- metadata.gz: a19ff34ae098f9bf115789b43781c4073268934c34fd69a2ae119ea844deffcd30d853aefc29e00fe4495858cbf336ec1c6dc0f2113c26239a47fcacfb73bb87
- data.tar.gz: 93ff0a91bfc54f8b159c80c9e0156064b3805166c0d83ae51fae09a8944e6f6b852ed72d269d5b1dd6e8561007a80042583fe2f7d7be20b09122c414c13ff94b
+ metadata.gz: 3c069008cfef9b08c4b9793b24538c9c8bdc217b64285626d3c9564a57584b237bfef90f4382e4b68366c2555b1b9a6e91d897951bbcc336b355eaefb310ce00
+ data.tar.gz: ccb7ba1d556cec586872ebe1c94237b3223f484902218d3bff899993467b741519c521b9c08075f98328536cf31274cd2aa386f64458097b025bbef2841c486d
data/CHANGELOG.md CHANGED
@@ -1,3 +1,21 @@
+
+ ## 0.12.7
+ - rewrote partial_read, the occasional json parse errors should now be fixed by reading only committed blocks.
+   (This may also have been related to reading a second partial_read, where the offset wasn't updated correctly?)
+ - used the new header and tail block names, so the header and footer should be learned automatically again
+ - added addall to the configuration to add system, mac, category, time and operation to the output
+ - added optional environment configuration option
+ - removed the date field, which was always set to ---
+ - made a start on event rewriting for ECS compatibility
+
+ ## 0.12.6
+ - Fixed the 0.12.5 exception handling, it actually caused a warning to become a fatal pipeline crashing error
+ - The chunk that failed to process is printed in debug mode; for testing use debug_until => 10000
+ - Now check if the registry entry exists before loading the offsets, to avoid caught: undefined method `[]' for nil:NilClass
+
+ ## 0.12.5
+ - Added exception message on json parse errors
+
  ## 0.12.4
  - Connection Cache reset removed, since agents are cached per host
  - Explicit handling of json_lines and respecting line boundaries (thanks nttoshev)
data/README.md CHANGED
@@ -42,9 +42,11 @@ input {
  ## Additional Configuration
  The registry keeps track of files in the storage account, their size and how many bytes have been processed. Files can grow and the added part will be processed as a partial file. The registry is saved to disk every interval.
 
+ The interval also defines when a new round of listing files and processing data should happen. The NSG flow logs are written every minute into a new block of the hourly blob. This data can be read partially, because the plugin knows the JSON head and tail; it removes the leading comma and fixes the JSON before parsing new events.
+
  The registry_create_policy determines at the start of the pipeline if processing should resume from the last known unprocessed file, or start_fresh, ignoring old files and only processing new events that came after the start of the pipeline. Or start_over to process all the files, ignoring the registry.
 
- interval defines the minimum time the registry should be saved to the registry file (by default to 'data/registry.dat'), this is only needed in case the pipeline dies unexpectedly. During a normal shutdown the registry is also saved.
+ interval defines the minimum time the registry should be saved to the registry file. By default it is saved to 'data/registry.dat' in the storageaccount, but it can also be kept on the server running logstash by setting registry_local_path. The registry is also kept in memory; the registry file is only needed in case the pipeline dies unexpectedly. During a normal shutdown the registry is also saved.
 
  When registry_local_path is set to a directory, the registry is saved on the logstash server in that directory. The filename is the pipe.id
 
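The head and tail repair described above can be sketched in a few lines of Ruby. This is a minimal illustration with an invented head, tail and chunk and a hypothetical fix_partial_json helper, not the plugin's exact partial_read implementation.

```ruby
# Sketch of repairing a partially read NSG flow log blob.
# The JSON head ('{"records":[') and tail (']}') are assumed to have been
# learned earlier; names and strings here are illustrative only.
def fix_partial_json(chunk, head, tail)
  chunk = chunk.strip
  chunk = chunk[1..-1] if chunk.start_with?(',')    # drop the leading comma of the appended block
  chunk = chunk + tail unless chunk.end_with?(tail) # re-add the tail if the blob grew after listing
  head + chunk
end

head = '{"records":['
tail = ']}'
partial = ',{"time":"2023-04-02T10:01:00Z","category":"NetworkSecurityGroupFlowEvent"}]}'
puts fix_partial_json(partial, head, tail)
# => {"records":[{"time":"2023-04-02T10:01:00Z","category":"NetworkSecurityGroupFlowEvent"}]}
```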
@@ -66,13 +68,15 @@ The pipeline can be started in several ways.
  ```
  - As managed pipeline from Kibana
 
- Logstash itself (so not specific to this plugin) has a feature where multiple instances can run on the same system. The default TCP port is 9600, but if it's already in use it will use 9601 (and up). To update a config file on a running instance on the commandline you can add the argument --config.reload.automatic and if you modify the files that are in the pipeline.yml you can send a SIGHUP channel to reload the pipelines where the config was changed.
+ Logstash itself (so not specific to this plugin) has a feature where multiple instances can run on the same system. The default TCP port is 9600, but if it's already in use it will use 9601 (and up); this is probably no longer true from v8. To update a config file on a running instance from the commandline you can add the argument --config.reload.automatic, and if you modify the files that are in pipelines.yml you can send a SIGHUP signal to reload the pipelines where the config was changed.
  [https://www.elastic.co/guide/en/logstash/current/reloading-config.html](https://www.elastic.co/guide/en/logstash/current/reloading-config.html)
 
  ## Internal Working
  When the plugin is started, it will read all the filenames and sizes in the blob store, excluding the directories of files that are excluded by the "path_filters". After every interval it will write a registry to the storageaccount to save the information of how many bytes per blob (file) are read and processed. After all files are processed and at least one interval has passed, a new file list is generated and a worklist is constructed that will be processed. When a file has already been processed before, partial files are read from the offset to the filesize at the time of the file listing. If the codec is JSON, partial files will have the header and tail added; they can be configured. If logtype is nsgflowlog, the plugin will split the records into individual tuple events. The logtype wadiis may in the future be used to process the grok formats to split into log lines. Any other format is fed into the queue as one event per file or partial file. It's then up to the filter to split and mutate the file format.
 
- By default the root of the json message is named "message" so you can modify the content in the filter block
+ By default the root of the json message is named "message", so you can modify the content in the filter block.
+
+ Additional fields can be enabled with addfilename and addall; ecs_compatibility is not yet supported.
 
  The configurations and the rest of the code are in [https://github.com/janmg/logstash-input-azure_blob_storage/tree/master/lib/logstash/inputs](lib/logstash/inputs) [https://github.com/janmg/logstash-input-azure_blob_storage/blob/master/lib/logstash/inputs/azure_blob_storage.rb#L10](azure_blob_storage.rb)
 
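For logtype nsgflowlog, the splitting into individual tuple events mentioned above works on Azure's comma separated flow tuples. A minimal Ruby sketch, assuming the documented version 2 tuple order; the split_tuple helper and the exact key names are illustrative and may differ from the fields the plugin actually emits:

```ruby
# Sketch: split one NSG flow log version 2 tuple into a hash.
# Tuple order follows Azure's documented format; key names are illustrative.
def split_tuple(tuple)
  time, src_ip, dst_ip, src_port, dst_port, protocol, direction,
    decision, flowstate, src_pack, src_bytes, dst_pack, dst_bytes = tuple.split(',')
  { :unixtimestamp => time, :src_ip => src_ip, :dst_ip => dst_ip,
    :src_port => src_port, :dst_port => dst_port, :protocol => protocol,
    :direction => direction, :decision => decision, :flowstate => flowstate,
    :src_pack => src_pack, :srcbytes => src_bytes,
    :dst_packets => dst_pack, :dst_bytes => dst_bytes }
end

puts split_tuple("1680429600,10.0.0.4,10.0.0.5,44325,443,T,O,A,E,25,3012,30,21511").inspect
```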
@@ -130,7 +134,7 @@ filter {
  }
 
  output {
- stdout { }
+ stdout { codec => rubydebug }
  }
 
  output {
@@ -139,24 +143,37 @@ output {
  index => "nsg-flow-logs-%{+xxxx.ww}"
  }
  }
+
+ output {
+ file {
+ path => "/tmp/abuse.txt"
+ codec => line { format => "%{decision} %{flowstate} %{src_ip} %{dst_port}"}
+ }
+ }
+
  ```
  A more elaborate input configuration example
  ```
  input {
  azure_blob_storage {
  codec => "json"
- storageaccount => "yourstorageaccountname"
- access_key => "Ba5e64c0d3=="
+ # storageaccount => "yourstorageaccountname"
+ # access_key => "Ba5e64c0d3=="
+ connection_string => "DefaultEndpointsProtocol=https;AccountName=yourstorageaccountname;AccountKey=Ba5e64c0d3==;EndpointSuffix=core.windows.net"
  container => "insights-logs-networksecuritygroupflowevent"
  logtype => "nsgflowlog"
  prefix => "resourceId=/"
  path_filters => ['**/*.json']
  addfilename => true
+ addall => true
+ environment => "dev-env"
  registry_create_policy => "resume"
  registry_local_path => "/usr/share/logstash/plugin"
  interval => 300
  debug_timer => true
- debug_until => 100
+ debug_until => 1000
  }
  }
 
data/lib/logstash/inputs/azure_blob_storage.rb CHANGED
@@ -17,10 +17,12 @@ require 'json'
  # D672f4bbd95a04209b00dc05d899e3cce 2576 json objects for 1st minute
  # D7fe0d4f275a84c32982795b0e5c7d3a1 2312 json objects for 2nd minute
  # Z00000000000000000000000000000000 2 ]}
-
+ #
+ # The azure-storage-ruby gem connects to the storageaccount and the files are read through get_blob. For a partial read the options with a start and end range are used.
+ # https://github.com/Azure/azure-storage-ruby/blob/master/blob/lib/azure/storage/blob/blob.rb#L89
+ #
  # A storage account has by default a globally unique name, {storageaccount}.blob.core.windows.net which is a CNAME to Azures blob servers blob.*.store.core.windows.net. A storageaccount has a container and those have a directory and blobs (like files). Blobs have one or more blocks. After writing the blocks, they can be committed. Some Azure diagnostics can send events to an EventHub that can be parsed with the plugin logstash-input-azure_event_hubs, but for the events that are only stored in a storage account, use this plugin. The original logstash-input-azureblob from azure-diagnostics-tools is great for low volumes, but it suffers from an outdated client, slow reads, lease locking issues and json parse errors.
 
-
  class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
  config_name "azure_blob_storage"
 
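The range read referred to in the comment above can be sketched as follows. This is a minimal sketch with placeholder account, container and blob names, using the azure-storage-blob calls that appear elsewhere in this plugin (list_blob_blocks, get_blob with start_range/end_range):

```ruby
require 'azure/storage/blob'

# Placeholder credentials and names; not a real account.
client = Azure::Storage::Blob::BlobService.create(
  storage_account_name: 'yourstorageaccountname',
  storage_access_key:   'Ba5e64c0d3==')
container = 'insights-logs-networksecuritygroupflowevent'
blobname  = 'resourceId=/SUBSCRIPTIONS/0A1B2C3D-0000-0000-0000-000000000000/.../PT1H.json'

offset = 5000                                 # bytes already processed (from the registry)
blocks = client.list_blob_blocks(container, blobname)
size   = blocks[:committed].sum(&:size)       # only count committed blocks

# get_blob returns [blob, content]; start_range/end_range select a byte range
blob, content = client.get_blob(container, blobname, start_range: offset, end_range: size - 1)
puts "read #{content.bytesize} new bytes"
```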
@@ -74,6 +76,12 @@ class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
  # add the filename as a field into the events
  config :addfilename, :validate => :boolean, :default => false, :required => false
 
+ # add an environment field to the events
+ config :environment, :validate => :string, :required => false
+
+ # add all resource details (system, mac, category, time, operation) to the events
+ config :addall, :validate => :boolean, :default => false, :required => false
+
  # debug_until will, at the creation of the pipeline and for a maximum amount of processed messages, show 3 types of log printouts including processed filenames. After that number of events, the plugin will stop logging the events and continue silently. This is a lightweight alternative to switching the loglevel from info to debug or even trace to see what the plugin is doing and how fast at the start of the plugin. A good value would be approximately 3x the amount of events per file. For instance 6000 events.
  config :debug_until, :validate => :number, :default => 0, :required => false
 
@@ -205,10 +213,12 @@ public
  filelist = list_blobs(false)
  filelist.each do |name, file|
  off = 0
- begin
+ if @registry.key?(name) then
+ begin
  off = @registry[name][:offset]
- rescue Exception => e
+ rescue Exception => e
  @logger.error("caught: #{e.message} while reading #{name}")
+ end
  end
  @registry.store(name, { :offset => off, :length => file[:length] })
  if (@debug_until > @processed) then @logger.info("2: adding offsets: #{name} #{off} #{file[:length]}") end
@@ -258,9 +268,8 @@ public
  delta_size = 0
  end
  else
- chunk = partial_read_json(name, file[:offset], file[:length])
- delta_size = chunk.size
- @logger.debug("partial file #{name} from #{file[:offset]} to #{file[:length]}")
+ chunk = partial_read(name, file[:offset])
+ delta_size = chunk.size - @head.length - 1
  end
 
  if logtype == "nsgflowlog" && @is_json
@@ -270,14 +279,13 @@ public
  begin
  fingjson = JSON.parse(chunk)
  @processed += nsgflowlog(queue, fingjson, name)
- @logger.debug("Processed #{res[:nsg]} [#{res[:date]}] #{@processed} events")
- rescue JSON::ParserError
- @logger.error("parse error #{e.message} on #{res[:nsg]} [#{res[:date]}] offset: #{file[:offset]} length: #{file[:length]}")
- @logger.debug("#{chunk}")
+ @logger.debug("Processed #{res[:nsg]} #{@processed} events")
+ rescue JSON::ParserError => e
+ @logger.error("parse error #{e.message} on #{res[:nsg]} offset: #{file[:offset]} length: #{file[:length]}")
+ if (@debug_until > @processed) then @logger.info("#{chunk}") end
  end
  end
  # TODO: Convert this to line based grokking.
- # TODO: ECS Compliance?
  elsif logtype == "wadiis" && !@is_json
  @processed += wadiislog(queue, name)
  else
@@ -396,14 +404,35 @@ private
  return chuck
  end
 
- def partial_read_json(filename, offset, length)
- content = @blob_client.get_blob(container, filename, start_range: offset-@tail.length, end_range: length-1)[1]
- if content.end_with?(@tail)
- # the tail is part of the last block, so included in the total length of the get_blob
- return @head + strip_comma(content)
+ def partial_read(blobname, offset)
+ # 1. read the committed blocks and calculate the committed length
+ # 2. calculate the range to read
+ # 3. if the codec is json, strip the leading comma and fix the head and tail
+ size = 0
+ blocks = @blob_client.list_blob_blocks(container, blobname)
+ blocks[:committed].each do |block|
+ size += block.size
+ end
+ # read the new blob blocks from the offset to the last committed size.
+ # if it is json, fix the head and tail
+ # the committed block at the end is the tail, so it must be subtracted from the read, then the comma stripped and the tail added.
+ # the -1 is probably needed because the offset starts at 0 and the range ends at size-1
+
+ # should it first check committed blocks, read and then check committed blocks again? no, only the committed size is read
+ # alternatively the full content could be read and then the json tail subtracted
+
+ if @is_json
+ content = @blob_client.get_blob(container, blobname, start_range: offset-1, end_range: size-1)[1]
+ if content.end_with?(@tail)
+ return @head + strip_comma(content)
+ else
+ @logger.info("Fixed a tail! probably new committed blocks started appearing!")
+ # subtract the length of the tail and add the tail, because the file grew. size was calculated at the block boundary, so replacing the last bytes with the tail should fix the problem
+ return @head + strip_comma(content[0...-@tail.length]) + @tail
+ end
  else
- # when the file has grown between list_blobs and the time of partial reading, the tail will be wrong
- return @head + strip_comma(content[0...-@tail.length]) + @tail
+ content = @blob_client.get_blob(container, blobname, start_range: offset, end_range: size-1)[1]
  end
  end
 
@@ -420,8 +449,9 @@ private
  count=0
  begin
  json["records"].each do |record|
- res = resource(record["resourceId"])
- resource = { :subscription => res[:subscription], :resourcegroup => res[:resourcegroup], :nsg => res[:nsg] }
+ resource = resource(record["resourceId"])
+ # resource = { :subscription => res[:subscription], :resourcegroup => res[:resourcegroup], :nsg => res[:nsg] }
+ extras = { :time => record["time"], :system => record["systemId"], :mac => record["macAddress"], :category => record["category"], :operation => record["operationName"] }
  @logger.trace(resource.to_s)
  record["properties"]["flows"].each do |flows|
  rule = resource.merge ({ :rule => flows["rule"]})
@@ -440,7 +470,18 @@ private
  if @addfilename
  ev.merge!( {:filename => name } )
  end
+ unless @environment.nil?
+ ev.merge!( {:environment => environment } )
+ end
+ if @addall
+ ev.merge!( extras )
+ end
+
+ # Add event to logstash queue
  event = LogStash::Event.new('message' => ev.to_json)
+ #if @ecs_compatibility != "disabled"
+ # event = ecs(event)
+ #end
  decorate(event)
  queue << event
  count+=1
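For reference, a hedged sketch of the event hash that nsgflowlog builds per tuple when addfilename, addall and environment are enabled; the keys shown are a subset and all values are invented:

```ruby
# Sketch of the hash that ends up in LogStash::Event.new('message' => ev.to_json).
# Values are invented; the tuple keys depend on the flow log version.
ev = {
  :subscription  => '0a1b2c3d-0000-0000-0000-000000000000',  # from resource()
  :resourcegroup => 'my-rg',
  :nsg           => 'my-nsg',
  :rule          => 'DefaultRule_DenyAllInBound',
  :src_ip        => '10.0.0.4',
  :dst_port      => '443',
  :decision      => 'D',
  :flowstate     => 'E',
  :filename      => 'resourceId=/SUBSCRIPTIONS/0A1B2C3D-0000-0000-0000-000000000000/.../PT1H.json',
  # merged from extras when addall => true:
  :time      => '2023-04-02T10:01:00Z',
  :system    => 'a7e3d2c1-0000-0000-0000-000000000000',
  :mac       => '000D3AF87856',
  :category  => 'NetworkSecurityGroupFlowEvent',
  :operation => 'NetworkSecurityGroupFlowEvents',
  :environment => 'dev-env'                                  # when environment is set
}
puts ev.to_json
```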
@@ -561,11 +602,11 @@ private
  unless blob.name == registry_path
  begin
  blocks = @blob_client.list_blob_blocks(container, blob.name)[:committed]
- if blocks.first.name.start_with?('A00')
+ if ['A00000000000000000000000000000000','QTAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw'].include?(blocks.first.name)
  @logger.debug("using #{blob.name}/#{blocks.first.name} to learn the json header")
  @head = @blob_client.get_blob(container, blob.name, start_range: 0, end_range: blocks.first.size-1)[1]
  end
- if blocks.last.name.start_with?('Z00')
+ if ['Z00000000000000000000000000000000','WjAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw'].include?(blocks.last.name)
  @logger.debug("using #{blob.name}/#{blocks.last.name} to learn the json footer")
  length = blob.properties[:content_length].to_i
  offset = length - blocks.last.size
@@ -573,7 +614,7 @@
  @logger.debug("learned tail: #{@tail}")
  end
  rescue Exception => e
- @logger.info("learn json one of the attempts failed #{e.message}")
+ @logger.info("learn json one of the attempts failed")
  end
  end
  end
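The second entry in each accepted list is simply the Base64 encoding of the first block id; a quick Ruby check confirms the equivalence:

```ruby
require 'base64'

# The header block id 'A' + 32 zeros encodes to the second accepted name,
# and the footer block id 'Z' + 32 zeros likewise.
puts Base64.strict_encode64('A00000000000000000000000000000000')
# => QTAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw
puts Base64.strict_encode64('Z00000000000000000000000000000000')
# => WjAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw
```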
@@ -584,15 +625,60 @@ private
 
  def resource(str)
  temp = str.split('/')
- date = '---'
- unless temp[9].nil?
- date = val(temp[9])+'/'+val(temp[10])+'/'+val(temp[11])+'-'+val(temp[12])+':00'
- end
- return {:subscription=> temp[2], :resourcegroup=>temp[4], :nsg=>temp[8], :date=>date}
+ #date = '---'
+ #unless temp[9].nil?
+ # date = val(temp[9])+'/'+val(temp[10])+'/'+val(temp[11])+'-'+val(temp[12])+':00'
+ #end
+ return {:subscription=> temp[2], :resourcegroup=>temp[4], :nsg=>temp[8]}
  end
 
  def val(str)
  return str.split('=')[1]
  end
 
+ =begin
+ def ecs(old)
+ # https://www.elastic.co/guide/en/ecs/current/ecs-field-reference.html
+ ecs = LogStash::Event.new()
+ ecs.set("ecs.version", "1.0.0")
+ ecs.set("@timestamp", old.timestamp)
+ ecs.set("cloud.provider", "azure")
+ ecs.set("cloud.account.id", old.get("[subscription]"))
+ ecs.set("cloud.project.id", old.get("[environment]"))
+ ecs.set("file.name", old.get("[filename]"))
+ ecs.set("event.category", "network")
+ if old.get("[decision]") == "D"
+ ecs.set("event.type", "denied")
+ else
+ ecs.set("event.type", "allowed")
+ end
+ ecs.set("event.action", "")
+ ecs.set("rule.ruleset", old.get("[nsg]"))
+ ecs.set("rule.name", old.get("[rule]"))
+ ecs.set("trace.id", old.get("[protocol]")+"/"+old.get("[src_ip]")+":"+old.get("[src_port]")+"-"+old.get("[dst_ip]")+":"+old.get("[dst_port]"))
+ # requires logic to match sockets and flip src/dst for outgoing.
+ ecs.set("host.mac", old.get("[mac]"))
+ ecs.set("source.ip", old.get("[src_ip]"))
+ ecs.set("source.port", old.get("[src_port]"))
+ ecs.set("source.bytes", old.get("[srcbytes]"))
+ ecs.set("source.packets", old.get("[src_pack]"))
+ ecs.set("destination.ip", old.get("[dst_ip]"))
+ ecs.set("destination.port", old.get("[dst_port]"))
+ ecs.set("destination.bytes", old.get("[dst_bytes]"))
+ ecs.set("destination.packets", old.get("[dst_packets]"))
+ if old.get("[protocol]") == "U"
+ ecs.set("network.transport", "udp")
+ else
+ ecs.set("network.transport", "tcp")
+ end
+ if old.get("[decision]") == "I"
+ ecs.set("network.direction", "incoming")
+ else
+ ecs.set("network.direction", "outgoing")
+ end
+ ecs.set("network.bytes", old.get("[src_bytes]")+old.get("[dst_bytes]"))
+ ecs.set("network.packets", old.get("[src_packets]")+old.get("[dst_packets]"))
+ return ecs
+ end
+ =end
  end # class LogStash::Inputs::AzureBlobStorage
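As a worked example of the resource() split above, positions 2, 4 and 8 of the slash-separated resourceId carry the subscription, resource group and NSG name; the resourceId value here is invented:

```ruby
# Worked example of the resource() split; the resourceId is invented.
resource_id = '/SUBSCRIPTIONS/0A1B2C3D-0000-0000-0000-000000000000/RESOURCEGROUPS/MY-RG/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/MY-NSG'
temp = resource_id.split('/')
puts({ :subscription => temp[2], :resourcegroup => temp[4], :nsg => temp[8] }.inspect)
# => {:subscription=>"0A1B2C3D-0000-0000-0000-000000000000", :resourcegroup=>"MY-RG", :nsg=>"MY-NSG"}
```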
data/logstash-input-azure_blob_storage.gemspec CHANGED
@@ -1,6 +1,6 @@
  Gem::Specification.new do |s|
  s.name = 'logstash-input-azure_blob_storage'
- s.version = '0.12.5'
+ s.version = '0.12.7'
  s.licenses = ['Apache-2.0']
  s.summary = 'This logstash plugin reads and parses data from Azure Storage Blobs.'
  s.description = <<-EOF
@@ -23,7 +23,6 @@ EOF
  s.add_runtime_dependency 'logstash-core-plugin-api', '~> 2.0'
  s.add_runtime_dependency 'stud', '~> 0.0.23'
  s.add_runtime_dependency 'azure-storage-blob', '~> 2', '>= 2.0.3'
-
- s.add_development_dependency 'logstash-devutils'
- s.add_development_dependency 'rubocop'
+ s.add_development_dependency 'logstash-devutils', '~> 2.4'
+ s.add_development_dependency 'rubocop', '~> 1.48'
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: logstash-input-azure_blob_storage
  version: !ruby/object:Gem::Version
- version: 0.12.5
+ version: 0.12.7
  platform: ruby
  authors:
  - Jan Geertsma
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2023-03-08 00:00:00.000000000 Z
+ date: 2023-04-02 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  requirement: !ruby/object:Gem::Requirement
@@ -61,31 +61,31 @@ dependencies:
  - !ruby/object:Gem::Dependency
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - "~>"
  - !ruby/object:Gem::Version
- version: '0'
+ version: '2.4'
  name: logstash-devutils
  prerelease: false
  type: :development
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - "~>"
  - !ruby/object:Gem::Version
- version: '0'
+ version: '2.4'
  - !ruby/object:Gem::Dependency
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - "~>"
  - !ruby/object:Gem::Version
- version: '0'
+ version: '1.48'
  name: rubocop
  prerelease: false
  type: :development
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - "~>"
  - !ruby/object:Gem::Version
- version: '0'
+ version: '1.48'
  description: " This gem is a Logstash plugin. It reads and parses data from Azure\
  \ Storage Blobs. The azure_blob_storage is a reimplementation to replace azureblob\
  \ from azure-diagnostics-tools/Logstash. It can deal with larger volumes and partial\