logstash-filter-augment 0.1.0 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +88 -10
- data/lib/logstash/filters/augment.rb +9 -2
- data/logstash-filter-augment.gemspec +1 -1
- data/spec/filters/augment_spec.rb +19 -0
- data/spec/fixtures/test-with-tabs.txt +2 -0
- metadata +4 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 2a72a388e9e7ae66e9f689bd614d6917d2796c78
+  data.tar.gz: bcdfce52b299a4187fe846e7d8cd2e385b0ead28
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 355a1d212bccd9c4af829d45c6e1e73c0395ccf33de9dbb4a20f7f5697d9e2aaf4b816dc1d0580e82cc4637710deddb7cb7acb8db7413a948c80fb8f27e0a017
+  data.tar.gz: d8c07568b6b23f336065d0a1860f4d6fc83b59bb1c0ce4c17749cc82a22eb601d88301dad8279eeaea8c06f8ca7ad26fcdf4f4fd30d0a51880cd3195b8648241
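The checksums above are computed over the two archives inside the published .gem rather than over the .gem file itself. A minimal Ruby sketch to reproduce them locally (the filename assumes a prior `gem fetch logstash-filter-augment -v 0.2.0`):

```ruby
# A .gem is itself a tar archive whose members include metadata.gz and
# data.tar.gz -- the two entries checksums.yaml records digests for.
require 'digest'
require 'rubygems/package'

File.open('logstash-filter-augment-0.2.0.gem', 'rb') do |io|
  Gem::Package::TarReader.new(io).each do |entry|
    next unless %w[metadata.gz data.tar.gz].include?(entry.full_name)
    body = entry.read
    puts "#{entry.full_name} SHA1:   #{Digest::SHA1.hexdigest(body)}"
    puts "#{entry.full_name} SHA512: #{Digest::SHA512.hexdigest(body)}"
  end
end
```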
data/README.md
CHANGED
@@ -8,11 +8,46 @@ It can be used to augment events in logstash from config, CSV file, JSON file, o
 
 ## Documentation
 
-
+logstash-filter-augment is a logstash plugin for augmenting events with data from a config file or external file (in CSV, JSON, or YAML format). The filter takes a `field` parameter that specifies what is being looked up. Based on configuration, it will find the object that is referred to and add the fields of that object to your event.
+
+In the case of a CSV file, you'll want to specify the `csv_key` to tell it which field of the file is the key (it defaults to the first column of the CSV if you don't specify one). If your CSV file doesn't contain a header row, you'll need to set `csv_header` to an array of the column names. If you do have a header, you can still specify `csv_header`, but be sure to also set `csv_first_line => ignore`.
+
+In the case of JSON, you can provide a simple dictionary that maps the keys to the objects:
+```json
+{
+  "200": { "color": "green", "message": "ok" }
+}
+```
+or in array format:
+```json
+[
+  {"code": 200, "color": "green", "message": "ok"}
+]
+```
+but then you'll have to provide a `json_key => "code"` parameter in your config file to let it know which field you want to use for lookups.
+
+YAML works the same as JSON -- you can specify either a dictionary or an array:
+```yaml
+200:
+  color: green
+  message: ok
+404:
+  color: red
+  message: not found
+```
+or
+```yaml
+- code: 200
+  color: green
+  message: ok
+- code: 404
+  color: red
+  message: not found
+```
+but again, you'll need to specify `yaml_key => "code"`.
+
+Finally, you can configure logstash-filter-augment statically with a dictionary:
 ```ruby
-filter {
-  augment {
-    field => "status"
     dictionary => {
       "200" => {
         "color" => "green"
@@ -23,17 +58,60 @@ filter {
       "message" => "Missing"
     }
   }
-
+  default => {
     "color" => "orange"
     "message" => "not found"
   }
 }
-}
 ```
-
-
-
-
+If you choose this route, be careful to quote your keys or you could end up with weird logstash errors.
+### config parameters
+| parameter | required (default) | Description |
+| --------- | :---: | --- |
+| field | Yes | the field of the event to look up in the dictionary |
+| dictionary_path | Yes, if `dictionary` isn't provided | The list of files to load |
+| dictionary_type | No (auto) | The type of the files provided on dictionary_path. Allowed values are `auto`, `csv`, `json`, `yaml`, and `yml` |
+| dictionary | Yes, if `dictionary_path` isn't provided | A dictionary to use. See the example above |
+| csv_header | No | The header fields of the CSV file |
+| csv_first_line | No (auto) | What to do with the first line of the file. Valid values are `ignore`, `header`, `data`, and `auto`; `auto` treats the first line as `data` if csv_header is set, or as `header` if it isn't |
+| csv_key | No | Which field of the CSV file is the key. Defaults to the first column of the file if not set |
+| csv_remove_key | No (true) | Remove the key from the object. You might want to set this to false if you don't have a `default` set, so that you know which records were matched |
+| csv_col_sep | No (,) | The column separator for CSV files. If you need to use tabs, you have to embed a real tab in the quotes |
+| csv_quote_char | No (") | The quote character for CSV files |
+| json_key | Yes, if array | The field of the JSON objects to use as a key for the dictionary |
+| json_remove_key | No | Similar to csv_remove_key |
+| yaml_key | Yes, if array | The field of the YAML objects to use as a key for the dictionary |
+| yaml_remove_key | No | Similar to csv_remove_key |
+| augment_fields | No (all fields) | The fields to copy from the object to the target. If this is specified, only these fields will be copied |
+| ignore_fields | No | If this list is specified and `augment_fields` isn't, these fields will not be copied |
+| default | No | A dictionary of fields to add to the target if the key isn't in the data |
+| target | No ("") | Where to write the fields. If this is left as the default "", it targets the event itself; otherwise you can specify a valid event selector, e.g. [user][location] would set user.location.{fields from object} |
+| refresh_interval | No (60) | The number of seconds between checks to see whether the file has been modified. Set to -1 to disable checking, or to 0 to check on every event (not recommended) |
+
+## Use Cases
+### Geocoding by key
+If you have a field that can be used to look up a location and you have a location file, you could configure it this way:
+```ruby
+augment {
+  field => "store"
+  target => "[location]"
+  dictionary_path => "geocode.csv"
+  csv_header => ["id","lat","lon"]
+  csv_key => "id"
+  csv_first_line => "data"
+}
+```
+and then be sure that your mapping / mapping template changes "location" into a geo_point.
+### Attach multiple pieces of user data based on user key
+```ruby
+augment {
+  field => "username"
+  dictionary_path => ["users1.csv", "users2.csv"]
+  csv_header => ["username","fullName","address1","address2","city","state","zipcode"]
+  csv_key => "username"
+  csv_first_line => "ignore"
+}
+```
 ## Developing
 
 ### 1. Plugin Development and Testing
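The options documented in the new README compose into a complete pipeline. A hedged sketch putting several of them together (the `status.yaml` path and the `[http]` target are invented for illustration, not shipped with the package):

```ruby
# Hypothetical pipeline: look up the HTTP status code in a YAML array
# dictionary (so yaml_key is required) and write the matched fields
# under [http] instead of onto the event root.
input { stdin { codec => "json" } }

filter {
  augment {
    field           => "status"        # event field whose value is the lookup key
    dictionary_path => "status.yaml"   # invented path; a YAML array like the README example
    dictionary_type => "yaml"
    yaml_key        => "code"          # required because the YAML file is an array
    target          => "[http]"        # matched fields become http.color, http.message
    default         => { "color" => "orange" "message" => "unknown" }
  }
}

output { stdout { codec => "rubydebug" } }
```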
data/lib/logstash/filters/augment.rb
CHANGED
@@ -59,7 +59,7 @@ class LogStash::Filters::Augment < LogStash::Filters::Base
   # - 'ignore' skips it (csv_header must be set)
   # - 'header' reads it and populates csv_header with it (csv_header must not be set)
   # - 'data' reads it as data (csv_header must be set)
-  # - 'auto' treats the first line as data if csv_header is set or header if
+  # - 'auto' treats the first line as `data` if csv_header is set or `header` if it isn't
   config :csv_first_line, :validate => ["data","header","ignore","auto"], :default=>"auto"
   # the csv_key determines which field of the csv file is the dictionary key
   # if this is not set, it will default to first column of the csv file
@@ -70,6 +70,10 @@ class LogStash::Filters::Augment < LogStash::Filters::Base
   # is false then the event will have a status=200. If csv_remove_key is true, then the event won't have
   # a status unless it already existed in the event.
   config :csv_remove_key, :validate => :boolean, :default => true
+  # the column separator for a CSV file
+  config :csv_col_sep, :validate => :string, :default => ","
+  # the quote character for a CSV file
+  config :csv_quote_char, :validate => :string, :default => '"'
   # if the json file provided is an array, this specifies which field of the
   # array of objects is the key value
   config :json_key, :validate => :string
@@ -265,7 +269,7 @@ private
       raise LogStash::ConfigurationError, "The csv_first_line is set to 'ignore' but csv_header is not set"
     end
   end
-  csv_lines = CSV.read(filename);
+  csv_lines = CSV.read(filename,{ :col_sep => @csv_col_sep, :quote_char => @csv_quote_char });
   if @csv_first_line == 'header'
     @csv_header = csv_lines.shift
   elsif @csv_first_line == 'ignore'
@@ -316,6 +320,9 @@ private
   if ! @dictionaries
     return
   end
+  if @refresh_interval < 0 && @dictionary_mtime # don't refresh if we aren't supposed to
+    return
+  end
   if (@next_refresh && @next_refresh + @refresh_interval < Time.now)
     return
   end
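The two new settings are handed straight through to Ruby's `CSV.read`, and the new guard means `refresh_interval => -1` skips any re-read once the dictionary is loaded. A minimal filter sketch for a tab-separated file (the `codes.tsv` filename is illustrative; as the README notes, the separator must be a real tab embedded in the quotes):

```ruby
filter {
  augment {
    field            => "status"
    dictionary_path  => "codes.tsv"    # illustrative tab-separated file
    dictionary_type  => "csv"
    csv_header       => ["status","color","message"]
    csv_first_line   => "data"
    csv_col_sep      => "	"            # a literal tab character between the quotes
    csv_quote_char   => "'"
    refresh_interval => -1             # with the new guard, the file is read only once
  }
}
```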
data/logstash-filter-augment.gemspec
CHANGED
@@ -1,6 +1,6 @@
 Gem::Specification.new do |s|
   s.name = 'logstash-filter-augment'
-  s.version = '0.1.0'
+  s.version = '0.2.0'
   s.licenses = ['Apache License (2.0)']
   s.summary = 'A logstash plugin to augment your events from data in files'
   s.description = 'A logstash plugin that can merge data from CSV, YAML, and JSON files with events.'
data/spec/filters/augment_spec.rb
CHANGED
@@ -69,6 +69,25 @@ describe LogStash::Filters::Augment do
     expect { subject }.to raise_exception LogStash::ConfigurationError
   end
 end
+describe "csv file with options set" do
+  filename = File.join(File.dirname(__FILE__), "..", "fixtures", "test-with-tabs.txt")
+  config <<-CONFIG
+    filter {
+      augment {
+        field => "status"
+        dictionary_path => '#{filename}'
+        dictionary_type => "csv"
+        csv_first_line => "data"
+        csv_header => ["status","color","message"]
+        csv_col_sep => "	"
+      }
+    }
+  CONFIG
+  sample("status" => "200") do
+    insist { subject.get("color") } == "green"
+    insist { subject.get("message") } == "ok"
+  end
+end
 describe "simple csv file with header ignored" do
   filename = File.join(File.dirname(__FILE__), "..", "fixtures", "test-with-headers.csv")
   config <<-CONFIG
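The new fixture `spec/fixtures/test-with-tabs.txt` (+2 lines) isn't shown in this diff. From the spec above it must contain tab-separated status/color/message rows; a plausible reconstruction (assumed, not taken from the package — the test only exercises the `200` row):

```
200	green	ok
404	red	not found
```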
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: logstash-filter-augment
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.2.0
 platform: ruby
 authors:
 - Adam Caldwell
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-
+date: 2017-02-13 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
@@ -56,6 +56,7 @@ files:
 - spec/fixtures/json-array.json
 - spec/fixtures/json-hash.json
 - spec/fixtures/test-with-headers.csv
+- spec/fixtures/test-with-tabs.txt
 - spec/fixtures/test-without-headers.csv
 - spec/fixtures/yaml-array.yaml
 - spec/fixtures/yaml-object.yaml
@@ -92,6 +93,7 @@ test_files:
 - spec/fixtures/json-array.json
 - spec/fixtures/json-hash.json
 - spec/fixtures/test-with-headers.csv
+- spec/fixtures/test-with-tabs.txt
 - spec/fixtures/test-without-headers.csv
 - spec/fixtures/yaml-array.yaml
 - spec/fixtures/yaml-object.yaml