logstash-input-azure_blob_storage 0.10.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 47775d226b17cd57d8ce290569e35cf893713f42757bd953fd46b93055733527
4
+ data.tar.gz: de73405430b71405ecc0a4873a72feefabdb7dd7f410734dd550928623a39c53
5
+ SHA512:
6
+ metadata.gz: a028a0df1310312d9a1826de016407693ca14002c316d08267de3be51abc94aaa9db997e3fb19677cd23db551d029d7e1acffc65068eb0b2ca4a2a6d408dbbe7
7
+ data.tar.gz: b54aa9e59046793f26bfaa5fcd2795c809f2fdd535449f00a8ab6b7392107f8c9a2d3f937a3d24d1c0bcb2890d60493baeb360b29c1188666781ecc9e4e7b014
@@ -0,0 +1,2 @@
1
+ ## 0.1.0
2
+ - Plugin created with the logstash plugin generator
@@ -0,0 +1,10 @@
1
+ The following is a list of people who have contributed ideas, code, bug
2
+ reports, or in general have helped logstash along its way.
3
+
4
+ Contributors:
5
+ * Jan Geertsma - jan@janmg.com
6
+
7
+ Note: If you've sent us patches, bug reports, or otherwise contributed to
8
+ Logstash, and you aren't on the list above and want to be, please let us know
9
+ and we'll make sure you're here. Contributions from folks like you are what make
10
+ open source awesome.
@@ -0,0 +1,2 @@
1
+ # logstash-input-azure_blob_storage
2
+ Example input plugin. This should help bootstrap your effort to write your own input plugin!
data/Gemfile ADDED
@@ -0,0 +1,3 @@
1
+ source 'https://rubygems.org'
2
+ gemspec
3
+
data/LICENSE ADDED
@@ -0,0 +1,11 @@
1
+ Licensed under the Apache License, Version 2.0 (the "License");
2
+ you may not use this file except in compliance with the License.
3
+ You may obtain a copy of the License at
4
+
5
+ http://www.apache.org/licenses/LICENSE-2.0
6
+
7
+ Unless required by applicable law or agreed to in writing, software
8
+ distributed under the License is distributed on an "AS IS" BASIS,
9
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
10
+ See the License for the specific language governing permissions and
11
+ limitations under the License.
@@ -0,0 +1,102 @@
1
+ # Logstash Plugin
2
+
3
+ This is a plugin for [Logstash](https://github.com/elastic/logstash).
4
+
5
+ It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way.
6
+
7
+ ## Documentation
8
+
9
+ All plugin documentation is placed under one [central location](http://www.elastic.co/guide/en/logstash/current/).
10
+
11
+ ## Need Help?
12
+
13
+ Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum. For real problems or feature requests, raise a github issue. Pull requests will only be merged after discussion through an issue.
14
+
15
+ ## Purpose
16
+ This plugin can read from Azure Storage Blobs. After every interval it writes a registry to the storage account, recording how many bytes of each blob (file) have been read and processed. After all files are processed and at least one interval has passed, a new file list is generated and a worklist is constructed and processed (see the sketch below). When a file has already been processed before, the partial file is read from the stored offset up to the file size at the time of the file listing. If the codec is JSON, the configured header and tail are added back to partial files so they parse as valid JSON. If logtype is nsgflowlog, the plugin splits the log into individual tuple events. The logtype wadiis may in the future be used to apply grok patterns to split files into log lines. Any other format is fed into the queue as one event per file or partial file; it is then up to the filter to split and mutate the file format, using source => "message" in the filter {} block.
17
+
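+ The worklist is simply the subset of listed blobs whose stored offset is smaller than their current size. A minimal Ruby sketch of that selection, with made-up blob names, mirroring the select used in the plugin source:
+ ```
+ # offsets come from the registry, lengths from the latest blob listing
+ filelist = {
+   "blob-already-done.json" => { :offset => 100, :length => 100 },
+   "blob-appended.json"     => { :offset => 100, :length => 180 },
+   "blob-new.json"          => { :offset => 0,   :length => 50 }
+ }
+ # only blobs with unread bytes end up in the worklist
+ worklist = filelist.select { |name, file| file[:offset] < file[:length] }
+ # => "blob-appended.json" is read from byte 100, "blob-new.json" is read in full
+ ```
+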
18
+ ## Installation
19
+ This plugin can be installed through logstash-plugin
20
+ ```
21
+ logstash-plugin install logstash-input-azure_blob_storage
22
+ ```
23
+
24
+ ## Enabling NSG Flowlogs
25
+ 1. Enable Network Watcher in your regions
26
+ 2. Create Storage account per region
27
+ v1 or v2 are both fine
28
+ Any resource group works fine, NetworkWatcherRG would be the best
29
+ 3. Enable in Network Watcher for every NSG the NSG Flow logs
30
+ the list_blobs call has a limit of 5000 files; with one file per hour per NSG, make sure the retention time is set so that all files can be seen. For 180 NSGs with 1 day retention that is 4320 files; more retention leads to delays in processing. So either use multiple storage accounts with multiple pipelines, or use the same storage account with a prefix to separate them.
31
+ 4. In the storage account there will be a container with a resourceId path like:
32
+ {storageaccount}.blob.core.windows.net/insights-logs-networksecuritygroupflowevent/resourceId=/SUBSCRIPTIONS/{UUID}/RESOURCEGROUPS/{RG}/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/{NSG}/y=2019/m=02/d=12/h=07/m=00/macAddress={MAC}/PT1H.json
33
+ 5. Get credentials of the storageaccount
34
+ - SAS token (shared access signature) starts with a '?'
35
+ - connection string ... one string with all the connection details
36
+ - Access key (key1 or key2)
37
+
38
+ ## Troubleshooting
39
+
40
+ The default loglevel can be changed in the global logstash.yml through the `log.level` setting. On the info level, the plugin saves offsets to the registry every interval and logs statistics of processed events; for each pipeline it prints the first 6 characters of the plugin ID. On the debug level it also shows the number of events per (partial) file that is read.
41
+ ```
42
+ log.level
43
+ ```
44
+ The log level of the plugin itself can be set to DEBUG through the logging API:
45
+
46
+ ```
47
+ curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/json' -d'{"logger.logstash.inputs.azureblobstorage" : "DEBUG"}'
48
+ ```
49
+
50
+
51
+ ## Configuration Examples
52
+ ```
53
+ input {
54
+ azure_blob_storage {
55
+ storageaccount => "yourstorageaccountname"
56
+ access_key => "Ba5e64c0d3=="
57
+ container => "insights-logs-networksecuritygroupflowevent"
58
+ }
59
+ }
60
+
61
+ filter {
62
+ json {
63
+ source => "message"
64
+ }
65
+ mutate {
66
+ add_field => { "environment" => "test-env" }
67
+ remove_field => [ "message" ]
68
+ }
69
+ date {
70
+ match => ["unixtimestamp", "UNIX"]
71
+ }
72
+ }
73
+
74
+ output {
75
+ elasticsearch {
76
+ hosts => "elasticsearch"
77
+ index => "nsg-flow-logs-%{+xxxx.ww}"
78
+ }
79
+ }
80
+ ```
81
+
82
+ You can include additional options to tweak the operations
83
+ ```
84
+ input {
85
+ azure_blob_storage {
86
+ storageaccount => "yourstorageaccountname"
87
+ access_key => "Ba5e64c0d3=="
88
+ container => "insights-logs-networksecuritygroupflowevent"
89
+ codec => "json"
90
+ logtype => "nsgflowlog"
91
+ prefix => "resourceId=/"
92
+ registry_create_policy => "resume"
93
+ interval => 60
94
+ iplookup => "http://10.0.0.5:6081/ripe.php?ip="
95
+ use_redis => true
96
+ iplist => [
97
+ "{'ip':'10.0.0.4','netname':'Application Gateway','subnet':'10.0.0.0\/24','hostname':'appgw'}",
98
+ "{'ip':'36.156.24.96',netname':'China Mobile','subnet':'36.156.0.0\/16','hostname':'bigbadwolf'}"
99
+ ]
100
+ }
101
+ }
102
+ ```
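+
+ To check the storage account credentials, container and prefix outside of Logstash, a quick sketch with the same azure-storage-blob gem the plugin uses (account name, key and container below are placeholders) could be:
+ ```
+ require 'azure/storage/blob'
+
+ client = Azure::Storage::Blob::BlobService.create(
+     storage_account_name: 'yourstorageaccountname',
+     storage_access_key: 'Ba5e64c0d3==')
+ blobs = client.list_blobs('insights-logs-networksecuritygroupflowevent', prefix: 'resourceId=/')
+ blobs.each { |blob| puts "#{blob.name} #{blob.properties[:content_length]}" }
+ ```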
@@ -0,0 +1,422 @@
1
+ # encoding: utf-8
2
+ require "logstash/inputs/base"
3
+ require "stud/interval"
4
+ require 'azure/storage/blob'
5
+ #require 'securerandom'
6
+ #require 'rbconfig'
7
+ #require 'date'
8
+ #require 'json'
9
+ #require 'thread'
10
+ #require "redis"
11
+ #require 'net/http'
12
+
13
+ # This is a logstash input plugin for files in Azure Blob Storage. There is a storage explorer in the portal and an application with the same name https://storageexplorer.com. A storage account has by default a globally unique name, {storageaccount}.blob.core.windows.net, which is a CNAME to Azure's blob servers blob.*.store.core.windows.net. A storage account has containers, which hold blobs (like files), and blobs are constructed of one or more blocks. Some Azure diagnostics can send events to an Event Hub that can be parsed through the plugin logstash-input-azure_event_hubs, but for events that are only stored in a storage account, use this plugin. The original logstash-input-azureblob from azure-diagnostics-tools is great for low volumes, but it suffers from an outdated client, slow reads, lease locking issues and json parse errors.
14
+ # https://azure.microsoft.com/en-us/services/storage/blobs/
15
+ class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
16
+ config_name "azure_blob_storage"
17
+
18
+ # If undefined, Logstash will complain, even if codec is unused. The codec for nsgflowlog has to be JSON and for WADIIS and APPSERVICE it has to be plain.
19
+ default :codec, "json"
20
+
21
+ # logtype can be nsgflowlog, wadiis, appservice or raw. The default is raw, where files are read and added as one event. If the file grows, the next interval the file is read from the offset, so that the delta is sent as another event. In raw mode, further processing has to be done in the filter block. If the logtype is specified, this plugin will split and mutate and add individual events to the queue.
22
+ config :logtype, :validate => ['nsgflowlog','wadiis','appservice','raw'], :default => 'raw'
23
+
24
+ # The storage account is accessed through Azure::Storage::Blob::BlobService, it needs either a sas_token, connection string or a storageaccount/access_key pair.
25
+ # https://github.com/Azure/azure-storage-ruby/blob/master/blob/lib/azure/storage/blob/blob_service.rb#L42
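+ # For example (hypothetical account name and key): DefaultEndpointsProtocol=https;AccountName=examplestorage;AccountKey=Ba5e64c0d3==;EndpointSuffix=core.windows.net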
26
+ config :connection_string, :validate => :password
27
+
28
+ # The storage account name for the azure storage account.
29
+ config :storageaccount, :validate => :string
30
+
31
+ # The (primary or secondary) Access Key for the storage account. The key can be found in portal.azure.com or through the azure api StorageAccounts/ListKeys. For example the PowerShell command Get-AzStorageAccountKey.
32
+ config :access_key, :validate => :password
33
+
34
+ # SAS is the Shared Access Signature, that provides restricted access rights. If the sas_token is absent, the access_key is used instead.
35
+ config :sas_token, :validate => :password
36
+
37
+ # The container of the blobs.
38
+ config :container, :validate => :string, :default => 'insights-logs-networksecuritygroupflowevent'
39
+
40
+ # The registry file keeps track of the files that have been processed and until which offset in bytes. It is similar in function to the sincedb of the logstash file input.
+ #
+ # The default, `data/registry.dat`, contains a Ruby Marshal serialized Hash with, per filename, the offset read so far and the file length at the time of the last file listing.
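+ # An entry in the deserialized registry hash looks like this (hypothetical blob name and values):
+ #   "resourceId=/SUBSCRIPTIONS/{UUID}/.../PT1H.json" => { :offset => 2588, :length => 4902 }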
43
+ config :registry_path, :validate => :string, :required => false, :default => 'data/registry.dat'
44
+
45
+ # The default, `resume`, will load the registry offsets and will start processing files from the offsets.
46
+ # When set to `start_over`, all log files are processed from begining.
47
+ # when set to `start_fresh`, it will read log files that are created or appended since this start of the pipeline.
48
+ config :registry_create_policy, :validate => ['resume','start_over','start_fresh'], :required => false, :default => 'resume'
49
+
50
+ # The registry keeps track of the files that were already processed. The interval is used to save the registry regularly when new events have been processed. It is also used to wait before listing the files again and subtracting the already processed files to determine the worklist.
51
+ #
52
+ # waiting time in seconds until processing the next batch. NSGFLOWLOGS append a block per minute, so use multiples of 60 seconds, 300 for 5 minutes, 600 for 10 minutes. The registry is also saved after every interval.
53
+ # Partial reading starts from the offset and reads until the end, so the starting tag is prepended
54
+ #
55
+ # A00000000000000000000000000000000 12 {"records":[
56
+ # D672f4bbd95a04209b00dc05d899e3cce 2576 json objects for 1st minute
57
+ # D7fe0d4f275a84c32982795b0e5c7d3a1 2312 json objects for 2nd minute
58
+ # Z00000000000000000000000000000000 2 ]}
59
+ config :interval, :validate => :number, :default => 60
60
+
61
+ # WAD IIS Grok Pattern
62
+ #config :grokpattern, :validate => :string, :required => false, :default => '%{TIMESTAMP_ISO8601:log_timestamp} %{NOTSPACE:instanceId} %{NOTSPACE:instanceId2} %{IPORHOST:ServerIP} %{WORD:httpMethod} %{URIPATH:requestUri} %{NOTSPACE:requestQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:httpVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:host} %{NUMBER:httpStatus} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:sentBytes:int} %{NUMBER:receivedBytes:int} %{NUMBER:timeTaken:int}'
63
+
64
+ # The string that starts the JSON. Only needed when the codec is JSON. When partial files are read, the result will not be valid JSON unless the head and tail are put back. The file_head and file_tail are learned at startup by reading the first file in the blob list and taking its first and last block; this works for blobs that are appended like nsgflowlogs. Setting this option overrides the learning. If learning fails and the option is not set, the default is the 'records' wrapper used by nsgflowlogs.
65
+ config :file_head, :validate => :string, :required => false, :default => '{"records":['
66
+ # The string that ends the JSON
67
+ config :file_tail, :validate => :string, :required => false, :default => ']}'
68
+
69
+ # The path(s) to the file(s) to use as an input. By default it will
70
+ # watch every file in the storage container.
71
+ # You can use filename patterns here, such as `logs/*.log`.
72
+ # If you use a pattern like `logs/**/*.log`, a recursive search
73
+ # of `logs` will be done for all `*.log` files.
74
+ # Do not include a leading `/`, as Azure paths look like this:
75
+ # `path/to/blob/file.txt`
76
+ #
77
+ # You may also configure multiple paths. See an example
78
+ # on the <<array,Logstash configuration page>>.
79
+ # For NSGFLOWLOGS a path starts with "resourceId=/", but this would only be needed to exclude other files that may be written in the same container.
80
+ config :prefix, :validate => :string, :required => false
81
91
+
92
+ # Optionally enrich NSGFLOWLOGS with netname and subnet. The iplookup value points to a webservice that provides the information in JSON format like this:
93
+ # {"ip":"8.8.8.8","netname":"Google","subnet":"8.8.8.0\/24","hostname":"google-public-dns-a.google.com"}
94
+ config :iplookup, :validate => :string, :required => false, :default => 'http://127.0.0.1/ripe.php?ip='
95
+
96
+ # Optional Redis IP cache
97
+ config :use_redis, :validate => :boolean, :required => false, :default => false
98
+
99
+
100
+ # Optional array of JSON objects that don't require a lookup
101
+ config :iplist, :validate => :array, :required => false, :default => ['{"ip":"10.0.0.4","netname":"Application Gateway","subnet":"10.0.0.0\/24","hostname":"appgw"}']
102
+
103
+
104
+
105
+ public
106
+ def register
107
+ @pipe_id = Thread.current[:name].split("[").last.split("]").first
108
+ @logger.info("=== "+config_name+"/"+@pipe_id+"/"+@id[0,6]+" ===")
109
+ # TODO: consider multiple readers, so add pipeline @id or use logstash-to-logstash communication?
110
+ # TODO: Implement retry ... Error: Connection refused - Failed to open TCP connection to
111
+
112
+ # counter for all processed events since the start of this pipeline
113
+ @processed = 0
114
+ @regsaved = @processed
115
+
116
+ # Try in this order to access the storageaccount
117
+ # 1. storageaccount / sas_token
118
+ # 2. connection_string
119
+ # 3. storageaccount / access_key
120
+
121
+ conn = connection_string
122
+ unless sas_token.nil?
123
+ # TODO: Fix SAS Tokens
124
+ unless sas_token.value.start_with?('?')
125
+ conn = "BlobEndpoint=https://#{storageaccount}.blob.core.windows.net;SharedAccessSignature=#{sas_token.value}"
126
+ else
127
+ conn = sas_token.value
128
+ end
129
+ end
130
+ unless conn.nil?
131
+ @blob_client = Azure::Storage::Blob::BlobService.create_from_connection_string(conn)
132
+ else
133
+ @blob_client = Azure::Storage::Blob::BlobService.create(
134
+ storage_account_name: storageaccount,
135
+ storage_access_key: access_key.value,
136
+ )
137
+ end
138
+
139
+ # redis is optional to cache ip's from the optional iplookup
140
+ # iplookups are optional and so is the dependency for caching through redis
141
+ if use_redis && !iplookup.nil?
142
+ begin
143
+ require 'redis'
144
+ rescue LoadError
145
+ require 'rubygems/dependency_installer'
146
+ installer = Gem::DependencyInstaller.new
147
+ installer.install 'redis'
148
+ Gem.refresh
149
+ Gem::Specification.find_by_name('redis').activate
150
+ require 'redis'
151
+ ensure
152
+ @red = Redis.new
153
+ end
154
+ end
155
+
156
+ @registry = Hash.new
157
+ unless registry_create_policy == "start_over"
158
+ begin
159
+ @registry = Marshal.load(@blob_client.get_blob(container, registry_path)[1])
160
+ #[0] headers [1] responsebody
161
+ rescue
162
+ @registry.clear
163
+ end
164
+ end
165
+ # read filelist and set offsets to file length to mark all the old files as done
166
+ if registry_create_policy == "start_fresh"
167
+ @registry.each do |name, file|
168
+ @registry.store(name, { :offset => file[:length], :length => file[:length] })
169
+ end
170
+ end
171
+
172
+ @is_json = (defined?(LogStash::Codecs::JSON) == 'constant') && (@codec.is_a? LogStash::Codecs::JSON)
173
+ @head = ''
174
+ @tail = ''
175
+ # if codec=json, sniff one file's blocks A and Z to learn file_head and file_tail
176
+ if @is_json
177
+ learn_encapsulation
178
+ if file_head
179
+ @head = file_head
180
+ end
181
+ if file_tail
182
+ @tail = file_tail
183
+ end
184
+ end
185
+ end # def register
186
+
187
+ def run(queue)
188
+ filelist = Hash.new
189
+
190
+ # we can abort the loop if stop? becomes true
191
+ while !stop?
192
+ chrono = Time.now.to_i
193
+ # load the registry, compare its offsets to the file list, set offset to 0 for new files, process the whole list and if finished within the interval wait for the next loop,
194
+ # TODO: sort by timestamp
195
+ #filelist.sort_by(|k,v|resource(k)[:date])
196
+
197
+ filelist = list_blobs()
198
+ save_registry(filelist)
199
+ @registry = filelist
200
+
201
+ # Worklist is the subset of files where the already read offset is smaller than the file size
202
+ worklist = filelist.select {|name,file| file[:offset] < file[:length]}
203
+ @logger.info(@pipe_id+" worklist contains #{worklist.size} blobs to process")
204
+ # This would be ideal for threading since it's IO intensive, would be nice with a ruby native ThreadPool
205
+ worklist.each do |name, file|
206
+ res = resource(name)
207
+ if file[:offset] == 0
208
+ chunk = full_read(name)
209
+ # this may read more than originally listed
210
+ file[:length]=chunk.size
211
+ else
212
+ chunk = partial_read_json(name, file[:offset], file[:length])
213
+ @logger.debug(@pipe_id+" partial file #{res[:nsg]} [#{res[:date]}]")
214
+ end
215
+ if logtype == "nsgflowlog" && @is_json
216
+ begin
217
+ @processed += nsgflowlog(queue, JSON.parse(chunk))
218
+ rescue JSON::ParserError
219
+ @logger.error(@pipe_id+" parse error on #{res[:nsg]} [#{res[:date]}] offset: #{file[:offset]} length: #{file[:length]}")
220
+ end
221
+ # TODO Convert this to line based grokking.
222
+ elsif logtype == "wadiis" && !@is_json
223
+ @processed += wadiislog(queue, name)
224
+ else
225
+ @codec.decode(chunk) do |event|
226
+ decorate(event)
227
+ queue << event
228
+ end
229
+ @processed += 1
230
+ end
231
+ @logger.debug(@pipe_id+" Processed #{res[:nsg]} [#{res[:date]}] #{@processed} events")
232
+ @registry.store(name, { :offset => file[:length], :length => file[:length] })
233
+ # if stop? good moment to stop what we're doing
234
+ if stop?
235
+ return
236
+ end
237
+ # save the registry regularly
238
+ now = Time.now.to_i
239
+ if ((now - chrono) > interval)
240
+ save_registry(@registry)
241
+ chrono = now
242
+ end
243
+ end
244
+ # Save the registry and sleep until the remaining polling interval is over
245
+ save_registry(@registry)
246
+ sleeptime = interval - (Time.now.to_i - chrono)
247
+ Stud.stoppable_sleep(sleeptime) { stop? }
248
+ end
249
+
250
+ # event = LogStash::Event.new("message" => @message, "host" => @host)
251
+ end # def run
252
+
253
+ def stop
254
+ save_registry(@registry)
255
+ end
256
+
257
+
258
+ def full_read(filename)
259
+ return @blob_client.get_blob(container, filename)[1]
260
+ end
261
+
262
+ def partial_read_json(filename, offset, length)
263
+ content = @blob_client.get_blob(container, filename, start_range: offset-@tail.length, end_range: length-1)[1]
264
+ if content.end_with?(@tail)
265
+ # the tail is part of the last block, so included in the total length of the get_blob
266
+ return @head + strip_comma(content)
267
+ else
268
+ # when the file has grown between list_blobs and the time of partial reading, the tail will be wrong
269
+ return @head + strip_comma(content[0...-@tail.length]) + @tail
270
+ end
271
+ end
272
+
273
+ def strip_comma(str)
274
+ # when skipping over the first blocks the json will start with a comma that needs to be stripped. there should not be a trailing comma, but it gets stripped too
275
+ if str.start_with?(',')
276
+ str[0] = ''
277
+ end
278
+ str.nil? ? nil : str.chomp(",")
279
+ end
280
+
281
+
282
+
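+ # NSG flow tuples are comma separated strings; the indices used below map to:
+ #   0..7  : unixtimestamp, src_ip, dst_ip, src_port, dst_port, protocol (T/U), direction (I/O), decision (A/D)
+ #   8..12 : (version 2 only) flowstate, src_pack, src_bytes, dst_pack, dst_bytes
+ # Hypothetical example tuple: "1549411677,10.0.0.4,36.156.24.96,44931,443,T,O,A,B,5,1337,6,6273"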
283
+ def nsgflowlog(queue, json)
284
+ count=0
285
+ json["records"].each do |record|
286
+ res = resource(record["resourceId"])
287
+ resource = { :subscription => res[:subscription], :resourcegroup => res[:resourcegroup], :nsg => res[:nsg] }
288
+ @logger.trace(resource.to_s)
289
+ record["properties"]["flows"].each do |flows|
290
+ rule = resource.merge ({ :rule => flows["rule"]})
291
+ flows["flows"].each do |flowx|
292
+ flowx["flowTuples"].each do |tup|
293
+ tups = tup.split(',')
294
+ ev = rule.merge({:unixtimestamp => tups[0], :src_ip => tups[1], :dst_ip => tups[2], :src_port => tups[3], :dst_port => tups[4], :protocol => tups[5], :direction => tups[6], :decision => tups[7]})
295
+ if (record["properties"]["Version"]==2)
296
+ ev.merge!( {:flowstate => tups[8], :src_pack => tups[9], :src_bytes => tups[10], :dst_pack => tups[11], :dst_bytes => tups[12]} )
297
+ end
298
+ unless iplookup.nil?
299
+ ev.merge!(addip(tups[1], tups[2]))
300
+ end
301
+ @logger.trace(ev.to_s)
302
+ event = LogStash::Event.new('message' => ev.to_json)
303
+ decorate(event)
304
+ queue << event
305
+ count+=1
306
+ end
307
+ end
308
+ end
309
+ end
310
+ return count
311
+ end
312
+
313
+ # WAD IIS logs: read the blob and emit one event per log line, skipping the '#' header lines
+ def wadiislog(queue, filename)
+ count=0
+ full_read(filename).each_line do |line|
+ unless line.start_with?('#')
+ event = LogStash::Event.new('message' => line.chomp)
+ decorate(event)
+ queue << event
+ count+=1
+ end
+ end
+ return count
+ # date {
+ # match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
+ # target => "@timestamp"
+ # remove_field => ["log_timestamp"]
+ # }
+ end
328
+
329
+ # list all blobs in the blobstore, set the offsets from the registry and return the filelist
330
+ def list_blobs()
331
+ files = Hash.new
332
+ nextMarker = nil
333
+ loop do
334
+ blobs = @blob_client.list_blobs(@container, { marker: nextMarker, prefix: @prefix })
335
+ blobs.each do |blob|
336
+ # exclude the registry itself
337
+ unless blob.name == @registry_path
338
+ offset = 0
339
+ length = blob.properties[:content_length].to_i
340
+ off = @registry[blob.name]
341
+ unless off.nil?
342
+ @logger.debug(@pipe_id+" seen #{blob.name} which is #{length} with offset #{offset}")
343
+ offset = off[:offset]
344
+ end
345
+ files.store(blob.name, { :offset => offset, :length => length })
346
+ end
347
+ end
348
+ nextMarker = blobs.continuation_token
349
+ break unless nextMarker && !nextMarker.empty?
350
+ end
351
+ @logger.debug(@pipe_id+" list_blobs found #{files.size} blobs")
352
+ return files
353
+ end
354
+
355
+ # If events have been processed since the last registry save, start a thread to update the registry file.
356
+ def save_registry(filelist)
357
+ # TODO because of threading, processed values and regsaved are not thread safe, they can change as instance variable @!
358
+ unless @processed == @regsaved
359
+ @regsaved = @processed
360
+ @logger.info(@pipe_id+" processed #{@processed} events, saving #{filelist.size} blobs and offsets to registry #{registry_path}")
361
+ Thread.new {
362
+ begin
363
+ @blob_client.create_block_blob(container, registry_path, Marshal.dump(filelist))
364
+ rescue
365
+ @logger.error(@pipe_id+" Oh my, registry write failed, do you have write access?")
366
+ end
367
+ }
368
+ end
369
+ end
370
+
371
+ def learn_encapsulation
372
+ # From one file, read first block and last block to learn head and tail
373
+ blob = @blob_client.list_blobs(container, { maxresults: 1, prefix: @prefix }).first
374
+ blocks = @blob_client.list_blob_blocks(container, blob.name)[:committed]
375
+ @logger.info(@pipe_id+" using #{blob.name} to learn the json header and tail")
376
+ @head = @blob_client.get_blob(container, blob.name, start_range: 0, end_range: blocks.first.size-1)[1]
377
+ @logger.info(@pipe_id+" learned header: #{@head}")
378
+ length = blob.properties[:content_length].to_i
379
+ offset = length - blocks.last.size
380
+ @tail = @blob_client.get_blob(container, blob.name, start_range: offset, end_range: length-1)[1]
381
+ @logger.info(@pipe_id+" learned tail: #{@tail}")
382
+ end
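+ # For nsgflowlogs the learned head and tail typically match the file_head and file_tail defaults: '{"records":[' and ']}'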
383
+
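+ # Example with the path format from the README (placeholder values):
+ #   resource("resourceId=/SUBSCRIPTIONS/{UUID}/RESOURCEGROUPS/{RG}/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/{NSG}/y=2019/m=02/d=12/h=07/m=00/macAddress={MAC}/PT1H.json")
+ #   => {:subscription=>"{UUID}", :resourcegroup=>"{RG}", :nsg=>"{NSG}", :date=>"2019/02/12-07:00"}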
384
+ def resource(str)
385
+ temp = str.split('/')
386
+ date = '---'
387
+ unless temp[9].nil?
388
+ date = val(temp[9])+'/'+val(temp[10])+'/'+val(temp[11])+'-'+val(temp[12])+':00'
389
+ end
390
+ return {:subscription=> temp[2], :resourcegroup=>temp[4], :nsg=>temp[8], :date=>date}
391
+ end
392
+
393
+ def val(str)
394
+ return str.split('=')[1]
395
+ end
396
+
397
+
398
+
399
+ # Optional lookup for netname and hostname for the srcip and dstip returned in a Hash
400
+ def addip(srcip, dstip)
401
+ #TODO: return anonymous merge
402
+ srcjson = JSON.parse(lookup(srcip))
403
+ dstjson = JSON.parse(lookup(dstip))
404
+ return {:srcnet=>srcjson["netname"],:srchost=>srcjson["hostname"],:dstnet=>dstjson["netname"],:dsthost=>dstjson["hostname"]}
405
+ end
406
+
407
+ def lookup(ip)
408
+ # TODO if ip in iplist return config
409
+ unless @red.nil?
410
+ res = @red.get(ip)
411
+ end
412
+ if res.nil?
413
+ res = Net::HTTP.get(URI(iplookup + ip))
414
+ unless @red.nil?
415
+ @red.set(ip, res)
416
+ @red.expire(ip,604800)
417
+ end
418
+ end
419
+ return res
420
+ end
421
+
422
+ end # class LogStash::Inputs::AzureBlobStorage
@@ -0,0 +1,26 @@
1
+ Gem::Specification.new do |s|
2
+ s.name = 'logstash-input-azure_blob_storage'
3
+ s.version = '0.10.0'
4
+ s.licenses = ['Apache-2.0']
5
+ s.summary = 'This logstash plugin reads and parses data from Azure Storage Blobs.'
6
+ s.description = 'This gem is a Logstash plugin. It reads and parses data from Azure Storage Blobs. The azure_blob_storage is a rewrite to replace azureblob from azure-diagnostics-tools/Logstash. It can deal with larger volumes and partial file reads, and eliminates the delay when rebuilding the registry'
7
+ s.homepage = 'https://github.com/janmg/logstash-input-azure_blob_storage'
8
+ s.authors = ['Jan Geertsma']
9
+ s.email = 'jan@janmg.com'
10
+ s.require_paths = ['lib']
11
+
12
+ # Files
13
+ s.files = Dir['lib/**/*','spec/**/*','vendor/**/*','*.gemspec','*.md','CONTRIBUTORS','Gemfile','LICENSE','NOTICE.TXT']
14
+ # Tests
15
+ s.test_files = s.files.grep(%r{^(test|spec|features)/})
16
+
17
+ # Special flag to let us know this is actually a logstash plugin
18
+ s.metadata = { "logstash_plugin" => "true", "logstash_group" => "input" }
19
+
20
+ # Gem dependencies
21
+ s.add_runtime_dependency "logstash-core-plugin-api", "~> 2.0"
22
+ s.add_runtime_dependency 'logstash-codec-plain', '~> 3.0'
23
+ s.add_runtime_dependency 'stud', '~> 0.0.22'
24
+ s.add_runtime_dependency 'azure-storage-blob', '~> 1.0'
25
+ s.add_development_dependency 'logstash-devutils', '~> 1.0', '>= 1.0.0'
26
+ end
@@ -0,0 +1,11 @@
1
+ # encoding: utf-8
2
+ require "logstash/devutils/rspec/spec_helper"
3
+ require "logstash/inputs/azure_blob_storage"
4
+
5
+ describe LogStash::Inputs::AzureBlobStorage do
6
+
7
+ it_behaves_like "an interruptible input plugin" do
8
+ let(:config) { { "interval" => 100 } }
9
+ end
10
+
11
+ end
metadata ADDED
@@ -0,0 +1,134 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: logstash-input-azure_blob_storage
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.10.0
5
+ platform: ruby
6
+ authors:
7
+ - Jan Geertsma
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2019-02-27 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ requirement: !ruby/object:Gem::Requirement
15
+ requirements:
16
+ - - "~>"
17
+ - !ruby/object:Gem::Version
18
+ version: '2.0'
19
+ name: logstash-core-plugin-api
20
+ prerelease: false
21
+ type: :runtime
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '2.0'
27
+ - !ruby/object:Gem::Dependency
28
+ requirement: !ruby/object:Gem::Requirement
29
+ requirements:
30
+ - - "~>"
31
+ - !ruby/object:Gem::Version
32
+ version: '3.0'
33
+ name: logstash-codec-plain
34
+ prerelease: false
35
+ type: :runtime
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '3.0'
41
+ - !ruby/object:Gem::Dependency
42
+ requirement: !ruby/object:Gem::Requirement
43
+ requirements:
44
+ - - "~>"
45
+ - !ruby/object:Gem::Version
46
+ version: 0.0.22
47
+ name: stud
48
+ prerelease: false
49
+ type: :runtime
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: 0.0.22
55
+ - !ruby/object:Gem::Dependency
56
+ requirement: !ruby/object:Gem::Requirement
57
+ requirements:
58
+ - - "~>"
59
+ - !ruby/object:Gem::Version
60
+ version: '1.0'
61
+ name: azure-storage-blob
62
+ prerelease: false
63
+ type: :runtime
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: '1.0'
69
+ - !ruby/object:Gem::Dependency
70
+ requirement: !ruby/object:Gem::Requirement
71
+ requirements:
72
+ - - "~>"
73
+ - !ruby/object:Gem::Version
74
+ version: '1.0'
75
+ - - ">="
76
+ - !ruby/object:Gem::Version
77
+ version: 1.0.0
78
+ name: logstash-devutils
79
+ prerelease: false
80
+ type: :development
81
+ version_requirements: !ruby/object:Gem::Requirement
82
+ requirements:
83
+ - - "~>"
84
+ - !ruby/object:Gem::Version
85
+ version: '1.0'
86
+ - - ">="
87
+ - !ruby/object:Gem::Version
88
+ version: 1.0.0
89
+ description: This gem is a Logstash plugin. It reads and parses data from Azure Storage
90
+ Blobs. The azure_blob_storage is a rewrite to replace azureblob from azure-diagnostics-tools/Logstash.
91
+ It can deal with larger volumes and partial file reads, and eliminates the delay
+ when rebuilding the registry
93
+ email: jan@janmg.com
94
+ executables: []
95
+ extensions: []
96
+ extra_rdoc_files: []
97
+ files:
98
+ - CHANGELOG.md
99
+ - CONTRIBUTORS
100
+ - DEVELOPER.md
101
+ - Gemfile
102
+ - LICENSE
103
+ - README.md
104
+ - lib/logstash/inputs/azure_blob_storage.rb
105
+ - logstash-input-azure_blob_storage.gemspec
106
+ - spec/inputs/azure_blob_storage_spec.rb
107
+ homepage: https://github.com/janmg/logstash-input-azure_blob_storage
108
+ licenses:
109
+ - Apache-2.0
110
+ metadata:
111
+ logstash_plugin: 'true'
112
+ logstash_group: input
113
+ post_install_message:
114
+ rdoc_options: []
115
+ require_paths:
116
+ - lib
117
+ required_ruby_version: !ruby/object:Gem::Requirement
118
+ requirements:
119
+ - - ">="
120
+ - !ruby/object:Gem::Version
121
+ version: '0'
122
+ required_rubygems_version: !ruby/object:Gem::Requirement
123
+ requirements:
124
+ - - ">="
125
+ - !ruby/object:Gem::Version
126
+ version: '0'
127
+ requirements: []
128
+ rubyforge_project:
129
+ rubygems_version: 2.6.13
130
+ signing_key:
131
+ specification_version: 4
132
+ summary: This logstash plugin reads and parses data from Azure Storage Blobs.
133
+ test_files:
134
+ - spec/inputs/azure_blob_storage_spec.rb