fluent-plugin-elasticsearch 2.0.0 → 2.0.1.rc.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: be60a48d14569fe431d01e4791dad39dca520057
4
- data.tar.gz: 5a58c1c390eedcf2c7a45a511af2d5a99c5c44b4
3
+ metadata.gz: a1eace20ded89e27d5d2ff1b27756dec9d616694
4
+ data.tar.gz: 662270c8258077cfd76d824c9015fce4c441bde7
5
5
  SHA512:
6
- metadata.gz: bd02e104796941648c9dee0dcda38f29426a2ad7d8971003a834f1ad304d27f3048c68adbb0d166655580828ef4142a2727b346d578eb3bf9e743a2377fa55e3
7
- data.tar.gz: 52168dd54eedf52993ae2a21265ae5b3310404b778e4a4da2aad463d3b37fdc921d7d6176a0f4d47311705cc24d1e16b9d9381c771e77fad1e48bc4cad4668f6
6
+ metadata.gz: 947cbfbe4cd83dc106befe56408cd8ad954c385bc98742381ed1310e7e641f75c319348098b211848daaeafca72eada16d863132bf85bc4a25902f4bd8ad01c4
7
+ data.tar.gz: 52a61b0d82ca4c9cc2be40792424a846cf96d521adb184ee6da030ad19949dba0a69e876d6ae82c571b56011fd8317c8cf3a7cc83356aadee7bf5d461771d8dd
data/History.md CHANGED
@@ -4,6 +4,9 @@
4
4
  - Log ES response errors (#230)
5
5
  - Use latest elasticsearch-ruby (#240)
6
6
 
7
+ ### 2.0.1.rc.1
8
+ - Add generating hash id mechanism to avoid records duplication (#318)
9
+
7
10
  ### 2.0.0
8
11
  - Release for Fluentd v0.14 stable.
9
12
 
data/README.md CHANGED
@@ -53,6 +53,7 @@ Current maintainers: @cosmo0920
53
53
  + [Proxy Support](#proxy-support)
54
54
  + [Buffer options](#buffer-options)
55
55
  + [Hash flattening](#hash-flattening)
56
+ + [Generate Hash ID](#generate-hash-id)
56
57
  + [Not seeing a config you need?](#not-seeing-a-config-you-need)
57
58
  + [Dynamic configuration](#dynamic-configuration)
58
59
  + [Placeholders](#placeholders)
@@ -347,7 +348,7 @@ reload_on_failure true # defaults to false
347
348
  You can set in the elasticsearch-transport how often dead connections from the elasticsearch-transport's pool will be resurrected.
348
349
 
349
350
  ```
350
- resurrect_after 5 # defaults to 60s
351
+ resurrect_after 5s # defaults to 60s
351
352
  ```
352
353
 
353
354
  ### include_tag_key, tag_key
@@ -388,6 +389,29 @@ This following record `{"name": "Johnny", "request_id": "87d89af7daffad6"}` will
388
389
  { "name": "Johnny", "request_id": "87d89af7daffad6" }
389
390
  ```
390
391
 
392
+ Fluentd re-emits events that failed to be indexed/ingested in Elasticsearch with a new and unique `_id` value. This means that congested Elasticsearch clusters that reject events (due to command queue overflow, for example) will cause Fluentd to re-emit the event with a new `_id`. However, Elasticsearch may actually process both (or more) attempts (with some delay) and create duplicate events in the index (since each has a unique `_id` value). One possible workaround is to use the [fluent-plugin-genhashvalue](https://github.com/mtakemi/fluent-plugin-genhashvalue) plugin to generate a unique `_hash` key in the record of each event; this `_hash` key can then be used as the `id_key` to prevent Elasticsearch from creating duplicate events.
393
+
394
+ ```
395
+ id_key _hash
396
+ ```
397
+
398
+ Example configuration for [fluent-plugin-genhashvalue](https://github.com/mtakemi/fluent-plugin-genhashvalue) (review the documentation of the plugin for more details)
399
+ ```
400
+ <filter logs.**>
401
+ @type genhashvalue
402
+ keys sessionid,requestid
403
+ hash_type md5 # md5/sha1/sha256/sha512
404
+ base64_enc true
405
+ base91_enc false
406
+ set_key _hash
407
+ separator _
408
+ inc_time_as_key true
409
+ inc_tag_as_key true
410
+ </filter>
411
+ ```
412
+
413
+ :warning: To avoid hash collisions and losing data, carefully consider which keys in the event record should be used to calculate the hash.
414
+
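The README text added in this hunk relies on the ID being derived from the record's own fields, so that a re-sent event maps to the same `_id`. The following is only a rough sketch of that idea, not the implementation of fluent-plugin-genhashvalue: `content_hash` is a hypothetical helper, and the field names and separator are taken from the example configuration above.

```ruby
require 'digest'
require 'base64'

# Hypothetical helper (an assumption for illustration, not part of any plugin):
# joins the chosen field values with a separator, hashes the result with MD5,
# and Base64-encodes the raw digest.
def content_hash(record, keys, separator: '_')
  material = keys.map { |k| record.fetch(k).to_s }.join(separator)
  Base64.strict_encode64(Digest::MD5.digest(material))
end

keys = %w[sessionid requestid]

first  = content_hash({ 'sessionid' => 'abc', 'requestid' => '42' }, keys)
resent = content_hash({ 'sessionid' => 'abc', 'requestid' => '42', 'extra' => 1 }, keys)
other  = content_hash({ 'sessionid' => 'abc', 'requestid' => '43' }, keys)

puts first == resent # the same key values always produce the same ID
puts first == other  # different key values produce a different ID

# A poor choice of keys or separator can make distinct events collide:
# both records below join to the string "a_b_c".
left  = content_hash({ 'sessionid' => 'a_b', 'requestid' => 'c' }, keys)
right = content_hash({ 'sessionid' => 'a', 'requestid' => 'b_c' }, keys)
puts left == right
```

The last comparison is the kind of collision the warning refers to: two different events end up with one ID, and the later one would overwrite the earlier one.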
391
415
  ### parent_key
392
416
 
393
417
  ```
@@ -500,7 +524,7 @@ Starting with version 0.8.0, this gem uses excon, which supports proxy with envi
500
524
 
501
525
  ```
502
526
  buffer_type memory
503
- flush_interval 60
527
+ flush_interval 60s
504
528
  retry_limit 17
505
529
  retry_wait 1.0
506
530
  num_threads 1
@@ -529,6 +553,19 @@ This will produce elasticsearch output that looks like this:
529
553
 
530
554
  Note that the flattener does not deal with arrays at this time.
531
555
 
556
+ ### Generate Hash ID
557
+
558
+ By default, the fluentd elasticsearch plugin does not emit records with an `_id` field, leaving it to Elasticsearch to generate a unique `_id` as the record is indexed. When an Elasticsearch cluster is congested and begins to take longer to respond than the configured `request_timeout`, the fluentd elasticsearch plugin will re-send the same bulk request. Since Elasticsearch can't tell it's actually the same request, all documents in the request are indexed again, resulting in duplicate data. In certain scenarios, this can result in essentially an infinite loop generating multiple copies of the same data.
559
+
560
+ Using an _id generated by the fluentd elasticsearch plugin will communicate to Elasticsearch the uniqueness of the requests so that duplicates will be rejected or simply replace the existing records.
561
+ Here is a sample config:
562
+
563
+ ```
564
+ <hash>
565
+ hash_id_key _id # storing generated hash id key
566
+ </hash>
567
+ ```
568
+
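To make the duplication scenario in this README section concrete, here is a toy simulation. It is only a sketch under stated assumptions: `ToyIndex` is a hypothetical in-memory stand-in that stores documents by ID, not the Elasticsearch API, and the "bulk request" is just an array that is sent twice to mimic a retry after a timeout.

```ruby
require 'securerandom'

# Hypothetical in-memory stand-in for an index (not the Elasticsearch client).
# Documents are stored by ID; when no ID is supplied, a fresh one is generated,
# which is the default behaviour the README paragraph describes.
class ToyIndex
  attr_reader :docs

  def initialize
    @docs = {}
  end

  def index(doc, id: nil)
    @docs[id || SecureRandom.uuid] = doc
  end
end

bulk = [{ 'msg' => 'a' }, { 'msg' => 'b' }]

# Without client-side IDs, re-sending the same bulk request stores every document twice.
auto = ToyIndex.new
2.times { bulk.each { |doc| auto.index(doc) } }

# With IDs fixed before the first attempt, the retry replaces the existing documents.
ids = bulk.map { SecureRandom.uuid }
fixed = ToyIndex.new
2.times { bulk.each_with_index { |doc, i| fixed.index(doc, id: ids[i]) } }

puts auto.docs.size  # 4
puts fixed.docs.size # 2
```

The second case only deduplicates because each ID is assigned once and reused on the retry. An ID regenerated on every attempt would behave like the first case.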
532
569
  ### Not seeing a config you need?
533
570
 
534
571
  We try to keep the scope of this plugin small and not add too many configuration options. If you think an option would be useful to others, feel free to open an issue or contribute a Pull Request.
@@ -3,7 +3,7 @@ $:.push File.expand_path('../lib', __FILE__)
3
3
 
4
4
  Gem::Specification.new do |s|
5
5
  s.name = 'fluent-plugin-elasticsearch'
6
- s.version = '2.0.0'
6
+ s.version = '2.0.1.rc.1'
7
7
  s.authors = ['diogo', 'pitr']
8
8
  s.email = ['pitr.vern@gmail.com', 'me@diogoterror.com']
9
9
  s.description = %q{ElasticSearch output plugin for Fluent event collector}
@@ -0,0 +1,23 @@
1
+ require 'securerandom'
2
+ require 'base64'
3
+
4
+ module Fluent
5
+ module Plugin
6
+ module GenerateHashIdSupport
7
+ def self.included(klass)
8
+ klass.instance_eval {
9
+ config_section :hash, param_name: :hash_config, required: false, multi: false do
10
+ config_param :hash_id_key, :string, default: '_id'
11
+ end
12
+ }
13
+ end
14
+
15
+ def generate_hash_id_key(record)
16
+ s = ""
17
+ s += Base64.strict_encode64(SecureRandom.uuid)
18
+ record[@hash_config.hash_id_key] = s
19
+ record
20
+ end
21
+ end
22
+ end
23
+ end
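The helper added in this file stores a Base64-encoded `SecureRandom.uuid` under the configured key. The standalone sketch below reproduces just that step outside Fluentd. `assign_random_id` is a hypothetical name, and the sketch returns a copy rather than mutating the record as the diff does. It shows that the value is random per call rather than derived from the record contents.

```ruby
require 'securerandom'
require 'base64'

# Hypothetical stand-in for the step performed by generate_hash_id_key:
# store a Base64-encoded random UUID under the given key.
# Unlike the code in the diff, this returns a new hash instead of mutating the input.
def assign_random_id(record, id_key = '_id')
  record.merge(id_key => Base64.strict_encode64(SecureRandom.uuid))
end

record = { 'message' => 'hello' }
a = assign_random_id(record)
b = assign_random_id(record)

puts a['_id'].length     # a 36-character UUID encodes to 48 Base64 characters
puts a['_id'] == b['_id'] # each call draws a new UUID, so the IDs differ
```

Because the ID does not depend on the record's fields, it can only prevent duplicates if the same generated value accompanies the record on every send attempt.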
@@ -11,6 +11,7 @@ end
11
11
 
12
12
  require 'fluent/plugin/output'
13
13
  require_relative 'elasticsearch_index_template'
14
+ require_relative 'generate_hash_id_support'
14
15
 
15
16
  module Fluent::Plugin
16
17
  class ElasticsearchOutput < Output
@@ -79,6 +80,7 @@ module Fluent::Plugin
79
80
  end
80
81
 
81
82
  include Fluent::ElasticsearchIndexTemplate
83
+ include Fluent::Plugin::GenerateHashIdSupport
82
84
 
83
85
  def initialize
84
86
  super
@@ -340,6 +342,10 @@ module Fluent::Plugin
340
342
  record = flatten_record(record)
341
343
  end
342
344
 
345
+ if @hash_config
346
+ record = generate_hash_id_key(record)
347
+ end
348
+
343
349
  dt = nil
344
350
  if @logstash_format || @include_timestamp
345
351
  if record.has_key?(TIMESTAMP_FIELD)
@@ -13,6 +13,8 @@ module Fluent::Plugin
13
13
  DYNAMIC_PARAM_NAMES = %W[hosts host port include_timestamp logstash_format logstash_prefix logstash_dateformat time_key utc_index index_name tag_key type_name id_key parent_key routing_key write_operation]
14
14
  DYNAMIC_PARAM_SYMBOLS = DYNAMIC_PARAM_NAMES.map { |n| "@#{n}".to_sym }
15
15
 
16
+ include Fluent::Plugin::GenerateHashIdSupport
17
+
16
18
  attr_reader :dynamic_config
17
19
 
18
20
  def configure(conf)
@@ -130,6 +132,10 @@ module Fluent::Plugin
130
132
  chunk.msgpack_each do |time, record|
131
133
  next unless record.is_a? Hash
132
134
 
135
+ if @hash_config
136
+ record = generate_hash_id_key(record)
137
+ end
138
+
133
139
  begin
134
140
  # evaluate all configurations here
135
141
  DYNAMIC_PARAM_SYMBOLS.each_with_index { |var, i|
@@ -1,10 +1,12 @@
1
1
  require 'helper'
2
2
  require 'date'
3
+ require 'fluent/test/helpers'
3
4
  require 'fluent/test/driver/output'
4
5
  require 'flexmock/test_unit'
5
6
 
6
7
  class ElasticsearchOutput < Test::Unit::TestCase
7
8
  include FlexMock::TestCase
9
+ include Fluent::Test::Helpers
8
10
 
9
11
  attr_accessor :index_cmds, :index_command_counts
10
12
 
@@ -438,6 +440,34 @@ class ElasticsearchOutput < Test::Unit::TestCase
438
440
  assert_equal('myindex', index_cmds.first['index']['_index'])
439
441
  end
440
442
 
443
+ class AdditionalHashIdMechanismTest < self
444
+ data("default" => {"hash_id_key" => '_id'},
445
+ "custom hash_id_key" => {"hash_id_key" => '_hash_id'},
446
+ )
447
+ def test_writes_with_genrate_hash(data)
448
+ driver.configure(Fluent::Config::Element.new(
449
+ 'ROOT', '', {
450
+ '@type' => 'elasticsearch',
451
+ 'id_key' => data["hash_id_key"],
452
+ }, [
453
+ Fluent::Config::Element.new('hash', '', {
454
+ 'keys' => ['request_id'],
455
+ 'hash_id_key' => data["hash_id_key"],
456
+ }, [])
457
+ ]
458
+ ))
459
+ stub_elastic_ping
460
+ stub_elastic
461
+ flexmock(SecureRandom).should_receive(:uuid)
462
+ .and_return("13a0c028-bf7c-4ae2-ad03-ec09a40006df")
463
+ time = event_time("2017-10-15 15:00:23.34567890 UTC")
464
+ driver.run(default_tag: 'test') do
465
+ driver.feed(time, sample_record.merge('request_id' => 'elastic'))
466
+ end
467
+ assert_equal(Base64.strict_encode64(SecureRandom.uuid), index_cmds[1]["#{data["hash_id_key"]}"])
468
+ end
469
+ end
470
+
441
471
  class IndexNamePlaceholdersTest < self
442
472
  def test_writes_to_speficied_index_with_tag_placeholder
443
473
  driver.configure("index_name myindex.${tag}\n")
@@ -1,10 +1,12 @@
1
1
  require 'helper'
2
2
  require 'date'
3
+ require 'fluent/test/helpers'
3
4
  require 'fluent/test/driver/output'
4
5
  require 'flexmock/test_unit'
5
6
 
6
7
  class ElasticsearchOutputDynamic < Test::Unit::TestCase
7
8
  include FlexMock::TestCase
9
+ include Fluent::Test::Helpers
8
10
 
9
11
  attr_accessor :index_cmds, :index_command_counts
10
12
 
@@ -316,6 +318,35 @@ class ElasticsearchOutputDynamic < Test::Unit::TestCase
316
318
  assert_equal(2000, total)
317
319
  end
318
320
 
321
+ class AdditionalHashIdMechanismTest < self
322
+ data("default" => {"hash_id_key" => '_id'},
323
+ "custom hash_id_key" => {"hash_id_key" => '_hash_id'},
324
+ )
325
+ def test_writes_with_genrate_hash(data)
326
+ driver.configure(Fluent::Config::Element.new(
327
+ 'ROOT', '', {
328
+ '@type' => 'elasticsearch',
329
+ 'id_key' => data["hash_id_key"],
330
+ }, [
331
+ Fluent::Config::Element.new('hash', '', {
332
+ 'keys' => ['request_id'],
333
+ 'hash_id_key' => data["hash_id_key"],
334
+ }, [])
335
+ ]
336
+ ))
337
+ stub_elastic_ping
338
+ stub_elastic
339
+ stub_elastic
340
+ flexmock(SecureRandom).should_receive(:uuid)
341
+ .and_return("82120f33-897a-4d9d-b3d5-14afd18fb412")
342
+ time = event_time("2017-10-15 15:00:23.34567890 UTC")
343
+ driver.run(default_tag: 'test') do
344
+ driver.feed(time, sample_record.merge('request_id' => 'elastic'))
345
+ end
346
+ assert_equal(Base64.strict_encode64(SecureRandom.uuid), index_cmds[1]["#{data["hash_id_key"]}"])
347
+ end
348
+ end
349
+
319
350
  def test_makes_bulk_request
320
351
  stub_elastic_ping
321
352
  stub_elastic
@@ -734,7 +765,7 @@ class ElasticsearchOutputDynamic < Test::Unit::TestCase
734
765
  stub_request(:post, "http://localhost:9200/_bulk").with do |req|
735
766
  raise ZeroDivisionError, "any not host_unreachable_exceptions exception"
736
767
  end
737
-
768
+
738
769
  driver.configure("reconnect_on_error false\n")
739
770
 
740
771
  assert_raise(ZeroDivisionError) {
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: fluent-plugin-elasticsearch
3
3
  version: !ruby/object:Gem::Version
4
- version: 2.0.0
4
+ version: 2.0.1.rc.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - diogo
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2017-11-06 00:00:00.000000000 Z
12
+ date: 2017-11-17 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: fluentd
@@ -144,6 +144,7 @@ files:
144
144
  - Rakefile
145
145
  - fluent-plugin-elasticsearch.gemspec
146
146
  - lib/fluent/plugin/elasticsearch_index_template.rb
147
+ - lib/fluent/plugin/generate_hash_id_support.rb
147
148
  - lib/fluent/plugin/out_elasticsearch.rb
148
149
  - lib/fluent/plugin/out_elasticsearch_dynamic.rb
149
150
  - test/helper.rb
@@ -165,9 +166,9 @@ required_ruby_version: !ruby/object:Gem::Requirement
165
166
  version: '2.0'
166
167
  required_rubygems_version: !ruby/object:Gem::Requirement
167
168
  requirements:
168
- - - ">="
169
+ - - ">"
169
170
  - !ruby/object:Gem::Version
170
- version: '0'
171
+ version: 1.3.1
171
172
  requirements: []
172
173
  rubyforge_project:
173
174
  rubygems_version: 2.6.13