logstash-input-azureblob 0.9.12-java

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: 5947577417b1f859db0712b7c414a536f4456a359d222d5069cc400d8a4ceb50
+   data.tar.gz: 2c57b9f7ec19871095b19f0eb7fa07f05e3d7b67386b7815ee583331d50a10f6
+ SHA512:
+   metadata.gz: dd9c54213183b732055ccf15470b41e0428933f942ac23911abefa9a535453e7b01721a922ad5a3677d90a581ddd4603628fdfc5655682a66fe6fa9045cdf737
+   data.tar.gz: 11ea4a6e8d69e1640bcbc078c7d01bc93b2f9a5cf45d30c6a2d12357f8cdb30dbbe482ad8a44a08eb3bbd6ac8808766b5c9b5c1ae8fcbf5321c23092103d68e0
@@ -0,0 +1,7 @@
+ ## 2016.08.17
+ * Added a new configuration parameter for custom endpoint.
+
+ ## 2016.05.05
+ * Made the plugin respect the Logstash shutdown signal.
+ * Updated the *logstash-core* runtime dependency requirement to '~> 2.0'.
+ * Updated the *logstash-devutils* development dependency requirement to '>= 0.0.16'.
data/Gemfile ADDED
@@ -0,0 +1,2 @@
+ source 'https://rubygems.org'
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,17 @@
+
+ Copyright (c) Microsoft. All rights reserved.
+ Microsoft would like to thank its contributors, a list
+ of whom are at http://aka.ms/entlib-contributors
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you
+ may not use this file except in compliance with the License. You may
+ obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied. See the License for the specific language governing permissions
+ and limitations under the License.
+
@@ -0,0 +1,253 @@
+ # Logstash input plugin for Azure Storage Blobs
+
+ ## Summary
+ This plugin reads and parses data from Azure Storage Blobs.
+
+ ## Installation
+ You can install this plugin using the Logstash "plugin" or "logstash-plugin" (for newer versions of Logstash) command:
+ ```sh
+ logstash-plugin install logstash-input-azureblob
+ ```
+ For more information, see Logstash reference [Working with plugins](https://www.elastic.co/guide/en/logstash/current/working-with-plugins.html).
+
+ ## Configuration
+ ### Required Parameters
+ __*storage_account_name*__
+
+ The storage account name.
+
+ __*storage_access_key*__
+
+ The access key to the storage account.
+
+ __*container*__
+
+ The blob container name.
+
+ ### Optional Parameters
+ __*endpoint*__
+
+ Specifies the endpoint of Azure Service Management. The default value is `core.windows.net`.
+
+ __*registry_path*__
+
+ Specifies the file path for the registry file that records offsets and coordinates between multiple clients. The default value is `data/registry`.
+
+ Override this value when a file already exists at the `data/registry` path in the Azure blob container.
+
+ __*interval*__
+
+ Sets how many seconds to idle before checking for new logs. The default, `30`, means idle for `30` seconds.
+
+ __*registry_create_policy*__
+
+ Specifies how to initially set the offset for existing blob files.
+
+ This option only applies to registry creation.
+
+ Valid values include:
+
+ - resume
+ - start_over
+
+ The default, `resume`, means that when the registry is initially created, the plugin assumes all blobs have already been consumed and will only pick up new content added to the blobs.
+
+ When set to `start_over`, the plugin assumes none of the blobs have been consumed and will read all blob files from the beginning.
+
+ Offsets will be picked up from the registry file whenever it exists.
+
+ __*file_head_bytes*__
+
+ Specifies the header of the file, in bytes, that does not repeat over records. Usually, these are JSON opening tags. The default value is `0`.
+
+ __*file_tail_bytes*__
+
+ Specifies the tail of the file, in bytes, that does not repeat over records. Usually, these are JSON closing tags. The default value is `0`.
+
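+ For illustration only (the account name, key, and container below are the same placeholder values used in the examples further down, and the registry path is a hypothetical one), a configuration that sets these optional parameters explicitly might look like the following sketch:
+
+ ```
+ input {
+   azureblob {
+     storage_account_name => "mystorageaccount"
+     storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
+     container => "mycontainer"
+     endpoint => "core.windows.net"            # the documented default
+     registry_path => "logstash/registry"      # hypothetical path, useful when data/registry is already taken
+     interval => 60                            # check for new logs every 60 seconds instead of the default 30
+     registry_create_policy => "start_over"    # read all existing blobs from the beginning on first run
+   }
+ }
+ ```
+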
+ ### Advanced tweaking parameters
+
+ Keep these parameters at their defaults under normal circumstances. Tweak them when dealing with large-scale Azure blobs and logs.
+
+ __*blob_list_page_size*__
+
+ Specifies the page size for returned blob items. Too large a number will exhaust the heap; too small a number leads to too many requests. The default of `100` works well with a heap size of 1 GB.
+
+ __*file_chunk_size_bytes*__
+
+ Specifies the buffer size used to download the blob content. This is also the maximum buffer size that will be passed to a codec, except for JSON. The JSON codec will only receive valid JSON, which might span multiple chunks. Any malformed JSON content will be skipped.
+
+ The default value is `4194304` (4 MB).
+
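+ As a rough sketch (the numbers below are illustrative, not recommendations), both parameters can be lowered together when running with a small heap:
+
+ ```
+ input {
+   azureblob {
+     storage_account_name => "mystorageaccount"
+     storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
+     container => "mycontainer"
+     blob_list_page_size => 50           # fewer blob items per listing page than the default of 100
+     file_chunk_size_bytes => 1048576    # 1 MB download buffer instead of the default 4 MB
+   }
+ }
+ ```
+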
+ ### Examples
+
+ * Bare-bone settings:
+
+ ```
+ input
+ {
+   azureblob
+   {
+     storage_account_name => "mystorageaccount"
+     storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
+     container => "mycontainer"
+   }
+ }
+ ```
+
+ * Example for Wad-IIS
+
+ ```
+ input {
+   azureblob
+   {
+     storage_account_name => 'mystorageaccount'
+     storage_access_key => 'VGhpcyBpcyBhIGZha2Uga2V5Lg=='
+     container => 'wad-iis-logfiles'
+     codec => line
+   }
+ }
+ filter {
+   ## Ignore the comments that IIS will add to the start of the W3C logs
+   #
+   if [message] =~ "^#" {
+     drop {}
+   }
+
+   grok {
+     # https://grokdebug.herokuapp.com/
+     match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{WORD:sitename} %{WORD:computername} %{IP:server_ip} %{WORD:method} %{URIPATH:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:protocolVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:requestHost} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:bytesSent} %{NUMBER:bytesReceived} %{NUMBER:timetaken}"]
+   }
+
+   ## Set the Event Timestamp from the log
+   #
+   date {
+     match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
+     timezone => "Etc/UTC"
+   }
+
+   ## If the log record has a value for 'bytesSent', then add a new field
+   # to the event that converts it to kilobytes
+   #
+   if [bytesSent] {
+     ruby {
+       code => "event.set('kilobytesSent', event.get('bytesSent').to_i / 1024.0)"
+     }
+   }
+
+   ## Do the same conversion for the bytes received value
+   #
+   if [bytesReceived] {
+     ruby {
+       code => "event.set('kilobytesReceived', event.get('bytesReceived').to_i / 1024.0 )"
+     }
+   }
+
+   ## Perform some mutations on the records to prep them for Elastic
+   #
+   mutate {
+     ## Convert some fields from strings to integers
+     #
+     convert => ["bytesSent", "integer"]
+     convert => ["bytesReceived", "integer"]
+     convert => ["timetaken", "integer"]
+
+     ## Create a new field for the reverse DNS lookup below
+     #
+     add_field => { "clientHostname" => "%{clientIP}" }
+
+     ## Finally remove the original log_timestamp field since the event will
+     # have the proper date on it
+     #
+     remove_field => [ "log_timestamp"]
+   }
+
+   ## Do a reverse lookup on the client IP to get their hostname.
+   #
+   dns {
+     ## Now that we've copied the clientIP into a new field we can
+     # simply replace it here using a reverse lookup
+     #
+     action => "replace"
+     reverse => ["clientHostname"]
+   }
+
+   ## Parse out the user agent
+   #
+   useragent {
+     source => "userAgent"
+     prefix => "browser"
+   }
+ }
+ output {
+   file {
+     path => '/var/tmp/logstash-file-output'
+     codec => rubydebug
+   }
+   stdout {
+     codec => rubydebug
+   }
+ }
+ ```
+
+ * NSG Logs
+
+ ```
+ input {
+   azureblob
+   {
+     storage_account_name => "mystorageaccount"
+     storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
+     container => "insights-logs-networksecuritygroupflowevent"
+     codec => "json"
+     # Refer https://docs.microsoft.com/en-us/azure/network-watcher/network-watcher-read-nsg-flow-logs
+     # Typical numbers could be 21/9 or 12/2, depending on the NSG log file type
+     file_head_bytes => 21
+     file_tail_bytes => 9
+     # Enable / tweak these settings when the event is too big for the codec to handle.
+     # break_json_down_policy => "with_head_tail"
+     # break_json_batch_count => 2
+   }
+ }
+
+ filter {
+   split { field => "[records]" }
+   split { field => "[records][properties][flows]"}
+   split { field => "[records][properties][flows][flows]"}
+   split { field => "[records][properties][flows][flows][flowTuples]"}
+
+   mutate {
+     split => { "[records][resourceId]" => "/"}
+     add_field => {"Subscription" => "%{[records][resourceId][2]}"
+                   "ResourceGroup" => "%{[records][resourceId][4]}"
+                   "NetworkSecurityGroup" => "%{[records][resourceId][8]}"}
+     convert => {"Subscription" => "string"}
+     convert => {"ResourceGroup" => "string"}
+     convert => {"NetworkSecurityGroup" => "string"}
+     split => { "[records][properties][flows][flows][flowTuples]" => ","}
+     add_field => {
+       "unixtimestamp" => "%{[records][properties][flows][flows][flowTuples][0]}"
+       "srcIp" => "%{[records][properties][flows][flows][flowTuples][1]}"
+       "destIp" => "%{[records][properties][flows][flows][flowTuples][2]}"
+       "srcPort" => "%{[records][properties][flows][flows][flowTuples][3]}"
+       "destPort" => "%{[records][properties][flows][flows][flowTuples][4]}"
+       "protocol" => "%{[records][properties][flows][flows][flowTuples][5]}"
+       "trafficflow" => "%{[records][properties][flows][flows][flowTuples][6]}"
+       "traffic" => "%{[records][properties][flows][flows][flowTuples][7]}"
+     }
+     convert => {"unixtimestamp" => "integer"}
+     convert => {"srcPort" => "integer"}
+     convert => {"destPort" => "integer"}
+   }
+
+   date {
+     match => ["unixtimestamp" , "UNIX"]
+   }
+ }
+
+ output {
+   stdout { codec => rubydebug }
+ }
+ ```
+
+ ## More information
+ The source code of this plugin is hosted in the GitHub repository [Microsoft Azure Diagnostics with ELK](https://github.com/Azure/azure-diagnostics-tools). We welcome you to provide feedback and/or contribute to the project.
@@ -0,0 +1,202 @@
+ # encoding: utf-8
+
+ require Dir[ File.dirname(__FILE__) + "/../../*_jars.rb" ].first
+
+ # Interface for a class that reads strings of arbitrary length from the end of a container
+ class LinearReader
+   # returns [content, are_more_bytes_available]
+   # content is a string
+   # are_more_bytes_available is a boolean stating if the container has more bytes to read
+   def read()
+     raise 'not implemented'
+   end
+ end
+
+ class JsonParser
+   def initialize(logger, linear_reader)
+     @logger = logger
+     @linear_reader = linear_reader
+     @stream_base_offset = 0
+
+     @stream_reader = StreamReader.new(@logger, @linear_reader)
+     @parser_factory = javax::json::Json.createParserFactory(nil)
+     @parser = @parser_factory.createParser(@stream_reader)
+   end
+
+   def parse(on_json_cbk, on_skip_malformed_cbk)
+     completed = false
+     while !completed
+       completed, start_index, end_index = parse_single_object(on_json_cbk)
+       if !completed
+
+         # if current position in the stream is not a well formed JSON then
+         # I can skip all future chars until I find a '{' so I won't have to create the parser for each char
+         json_candidate_start_index = @stream_reader.find('{', end_index)
+         json_candidate_start_index = @stream_reader.get_cached_stream_length - 1 if json_candidate_start_index.nil?
+         @logger.debug("JsonParser::parse Skipping Malformed JSON (start: #{start_index} end: #{end_index} candidate: #{json_candidate_start_index - 1}). Resetting the parser")
+         end_index = json_candidate_start_index - 1
+
+         on_skip_malformed_cbk.call(@stream_reader.get_stream_buffer(start_index, end_index))
+         @stream_reader.drop_stream(end_index + 1)
+         @stream_reader.reset_cached_stream_index(0)
+
+         @stream_base_offset = 0
+         @parser.close()
+         if @stream_reader.get_cached_stream_length <= 1
+           on_skip_malformed_cbk.call(@stream_reader.get_stream_buffer(0, -1))
+           return
+         end
+         @parser = @parser_factory.createParser(@stream_reader)
+       end
+     end
+   end
+
+   private
+   def parse_single_object(on_json_cbk)
+     depth = 0
+     stream_start_offset = 0
+     stream_end_offset = 0
+     while @parser.hasNext
+       event = @parser.next
+
+       if event == javax::json::stream::JsonParser::Event::START_OBJECT
+         depth = depth + 1
+       elsif event == javax::json::stream::JsonParser::Event::END_OBJECT
+         depth = depth - 1 # can't be negative because the parser handles the format correctness
+
+         if depth == 0
+           stream_end_offset = @parser.getLocation().getStreamOffset() - 1
+           @logger.debug("JsonParser::parse_single_object Json object found stream_start_offset: #{stream_start_offset} stream_end_offset: #{stream_end_offset}")
+
+           on_json_cbk.call(@stream_reader.get_stream_buffer(stream_start_offset - @stream_base_offset, stream_end_offset - @stream_base_offset))
+           stream_start_offset = stream_end_offset + 1
+
+           # Drop parsed bytes
+           @stream_reader.drop_stream(stream_end_offset - @stream_base_offset)
+           @stream_base_offset = stream_end_offset
+         end
+
+       end
+     end
+     return true
+   rescue javax::json::stream::JsonParsingException => e
+     return false, stream_start_offset - @stream_base_offset,
+            @parser.getLocation().getStreamOffset() - 1 - @stream_base_offset
+   rescue javax::json::JsonException, java::util::NoSuchElementException => e
+     @logger.debug("JsonParser::parse_single_object Exception stream_start_offset: #{stream_start_offset} stream_end_offset: #{stream_end_offset}")
+     raise e
+   end
+ end # class JsonParser
+
+ class StreamReader < java::io::Reader
+   def initialize(logger, reader)
+     super()
+     @logger = logger
+     @reader = reader
+
+     @stream_buffer = ""
+     @is_full_stream_read = false
+     @index = 0
+     @stream_buffer_length = 0
+   end
+
+   def markSupported
+     return false
+   end
+
+   def close
+   end
+
+   def get_cached_stream_length
+     return @stream_buffer_length
+   end
+
+   def get_cached_stream_index
+     return @index
+   end
+
+   def get_stream_buffer(start_index, end_index)
+     return @stream_buffer[start_index..end_index]
+   end
+
+   def find(substring, offset)
+     return @stream_buffer.index(substring, offset)
+   end
+
+   def drop_stream(until_offset)
+     @logger.debug("StreamReader::drop_stream until_offset:#{until_offset} index: #{@index}")
+     if @index < until_offset
+       return
+     end
+     @stream_buffer = @stream_buffer[until_offset..-1]
+     @index = @index - until_offset
+     @stream_buffer_length = @stream_buffer_length - until_offset
+   end
+
+   def reset_cached_stream_index(new_offset)
+     @logger.debug("StreamReader::reset_cached_stream_index new_offset:#{new_offset} index: #{@index}")
+     if new_offset < 0
+       return
+     end
+     @index = new_offset
+   end
+
+   # offset refers to the offset in the output buffer: http://docs.oracle.com/javase/7/docs/api/java/io/Reader.html#read(char[],%20int,%20int)
+   def read(buf, offset, len)
+     @logger.debug("StreamReader::read #{offset} #{len} | stream index: #{@index} stream length: #{@stream_buffer_length}")
+     are_all_bytes_available = true
+     if @index + len - offset > @stream_buffer_length
+       are_all_bytes_available = fill_stream_buffer(@index + len - offset - @stream_buffer_length)
+     end
+
+     if (@stream_buffer_length - @index) < len
+       len = @stream_buffer_length - @index
+       @logger.debug("StreamReader::read #{offset} Actual length: #{len}")
+     end
+
+     if len > 0
+       # TODO: optimize this
+       jv_string = @stream_buffer[@index..@index+len-1].to_java
+       jv_bytes_array = jv_string.toCharArray()
+       java::lang::System.arraycopy(jv_bytes_array, 0, buf, offset, len)
+
+       @index = @index + len
+     end
+
+     if !are_all_bytes_available && len == 0
+       @logger.debug("StreamReader::read end of stream")
+       return -1
+     else
+       return len
+     end
+
+   rescue java::lang::IndexOutOfBoundsException => e
+     @logger.debug("StreamReader::read IndexOutOfBoundsException")
+     raise e
+   rescue java::lang::ArrayStoreException => e
+     @logger.debug("StreamReader::read ArrayStoreException")
+     raise e
+   rescue java::lang::NullPointerException => e
+     @logger.debug("StreamReader::read NullPointerException")
+     raise e
+   end
+
+   private
+   def fill_stream_buffer(len)
+     @logger.debug("StreamReader::fill_stream_buffer #{len}")
+     bytes_read = 0
+     while bytes_read < len
+       content, are_more_bytes_available = @reader.read
+       if !content.nil? && content.length > 0
+         @stream_buffer << content
+         @stream_buffer_length = @stream_buffer_length + content.length
+         bytes_read = bytes_read + content.length
+       end
+       if !are_more_bytes_available
+         return false
+       end
+     end
+     return true
+   end
+
+ end # class StreamReader