fluent-plugin-elasticsearch 2.0.0 → 2.0.1.rc.1
- checksums.yaml +4 -4
- data/History.md +3 -0
- data/README.md +39 -2
- data/fluent-plugin-elasticsearch.gemspec +1 -1
- data/lib/fluent/plugin/generate_hash_id_support.rb +23 -0
- data/lib/fluent/plugin/out_elasticsearch.rb +6 -0
- data/lib/fluent/plugin/out_elasticsearch_dynamic.rb +6 -0
- data/test/plugin/test_out_elasticsearch.rb +30 -0
- data/test/plugin/test_out_elasticsearch_dynamic.rb +32 -1
- metadata +5 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a1eace20ded89e27d5d2ff1b27756dec9d616694
+  data.tar.gz: 662270c8258077cfd76d824c9015fce4c441bde7
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 947cbfbe4cd83dc106befe56408cd8ad954c385bc98742381ed1310e7e641f75c319348098b211848daaeafca72eada16d863132bf85bc4a25902f4bd8ad01c4
+  data.tar.gz: 52a61b0d82ca4c9cc2be40792424a846cf96d521adb184ee6da030ad19949dba0a69e876d6ae82c571b56011fd8317c8cf3a7cc83356aadee7bf5d461771d8dd
data/History.md
CHANGED
data/README.md
CHANGED
@@ -53,6 +53,7 @@ Current maintainers: @cosmo0920
  + [Proxy Support](#proxy-support)
  + [Buffer options](#buffer-options)
  + [Hash flattening](#hash-flattening)
+ + [Generate Hash ID](#generate-hash-id)
  + [Not seeing a config you need?](#not-seeing-a-config-you-need)
  + [Dynamic configuration](#dynamic-configuration)
  + [Placeholders](#placeholders)
@@ -347,7 +348,7 @@ reload_on_failure true # defaults to false
 You can set in the elasticsearch-transport how often dead connections from the elasticsearch-transport's pool will be resurrected.
 
 ```
-resurrect_after
+resurrect_after 5s # defaults to 60s
 ```
 
 ### include_tag_key, tag_key
@@ -388,6 +389,29 @@ This following record `{"name": "Johnny", "request_id": "87d89af7daffad6"}` will
 { "name": "Johnny", "request_id": "87d89af7daffad6" }
 ```
 
+Fluentd re-emits events that failed to be indexed/ingested in Elasticsearch with a new and unique `_id` value. This means that a congested Elasticsearch cluster that rejects events (due to command queue overflow, for example) will cause Fluentd to re-emit the event with a new `_id`; Elasticsearch may actually process both (or more) attempts (with some delay) and create duplicate events in the index, since each has a unique `_id` value. One possible workaround is to use the [fluent-plugin-genhashvalue](https://github.com/mtakemi/fluent-plugin-genhashvalue) plugin to generate a unique `_hash` key in the record of each event; this `_hash` field can then be used as the `id_key` to prevent Elasticsearch from creating duplicate events.
+
+```
+id_key _hash
+```
+
+Example configuration for [fluent-plugin-genhashvalue](https://github.com/mtakemi/fluent-plugin-genhashvalue) (review the documentation of the plugin for more details):
+```
+<filter logs.**>
+  @type genhashvalue
+  keys sessionid,requestid
+  hash_type md5    # md5/sha1/sha256/sha512
+  base64_enc true
+  base91_enc false
+  set_key _hash
+  separator _
+  inc_time_as_key true
+  inc_tag_as_key true
+</filter>
+```
+
+:warning: To avoid hash collisions and losing data, careful consideration is required when choosing the keys in the event record that are used to calculate the hash.
+
 ### parent_key
 
 ```
@@ -500,7 +524,7 @@ Starting with version 0.8.0, this gem uses excon, which supports proxy with environment variables.
 
 ```
 buffer_type memory
-flush_interval
+flush_interval 60s
 retry_limit 17
 retry_wait 1.0
 num_threads 1
@@ -529,6 +553,19 @@ This will produce elasticsearch output that looks like this:
 
 Note that the flattener does not deal with arrays at this time.
 
+### Generate Hash ID
+
+By default, the fluentd elasticsearch plugin does not emit records with an `_id` field, leaving it to Elasticsearch to generate a unique `_id` as the record is indexed. When an Elasticsearch cluster is congested and begins to take longer to respond than the configured `request_timeout`, the fluentd elasticsearch plugin will re-send the same bulk request. Since Elasticsearch can't tell it's actually the same request, all documents in the request are indexed again, resulting in duplicate data. In certain scenarios, this can result in essentially an infinite loop generating multiple copies of the same data.
+
+Using an `_id` generated by the fluentd elasticsearch plugin communicates the uniqueness of the requests to Elasticsearch, so that duplicates will be rejected or will simply replace the existing records.
+Here is a sample config:
+
+```
+<hash>
+  hash_id_key _id    # stores the generated hash id under this key
+</hash>
+```
+
 ### Not seeing a config you need?
 
 We try to keep the scope of this plugin small and not add too many configuration options. If you think an option would be useful to others, feel free to open an issue or contribute a Pull Request.
data/fluent-plugin-elasticsearch.gemspec
CHANGED
@@ -3,7 +3,7 @@ $:.push File.expand_path('../lib', __FILE__)
 
 Gem::Specification.new do |s|
   s.name = 'fluent-plugin-elasticsearch'
-  s.version = '2.0.0'
+  s.version = '2.0.1.rc.1'
   s.authors = ['diogo', 'pitr']
   s.email = ['pitr.vern@gmail.com', 'me@diogoterror.com']
   s.description = %q{ElasticSearch output plugin for Fluent event collector}
data/lib/fluent/plugin/generate_hash_id_support.rb
ADDED
@@ -0,0 +1,23 @@
+require 'securerandom'
+require 'base64'
+
+module Fluent
+  module Plugin
+    module GenerateHashIdSupport
+      def self.included(klass)
+        klass.instance_eval {
+          config_section :hash, param_name: :hash_config, required: false, multi: false do
+            config_param :hash_id_key, :string, default: '_id'
+          end
+        }
+      end
+
+      def generate_hash_id_key(record)
+        s = ""
+        s += Base64.strict_encode64(SecureRandom.uuid)
+        record[@hash_config.hash_id_key] = s
+        record
+      end
+    end
+  end
+end
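To make the behaviour of the new mixin concrete, here is a small standalone sketch. The `HashConfig` Struct is only a stand-in for the parsed `<hash>` section that Fluentd injects as `@hash_config`; it is not part of the plugin.

```
require 'securerandom'
require 'base64'

# Stand-in for the parsed <hash> config section (illustration only).
HashConfig = Struct.new(:hash_id_key)

hash_config = HashConfig.new('_id')
record      = { 'message' => 'access log line', 'request_id' => 'elastic' }

# Same effect as generate_hash_id_key above: a random UUID, Base64-encoded,
# is written into the record under the configured key.
record[hash_config.hash_id_key] = Base64.strict_encode64(SecureRandom.uuid)

puts record['_id'] # e.g. "MTNhMGMwMjgtYmY3Yy00YWUyLWFkMDMtZWMwOWE0MDAwNmRm"
```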
data/lib/fluent/plugin/out_elasticsearch.rb
CHANGED
@@ -11,6 +11,7 @@ end
 
 require 'fluent/plugin/output'
 require_relative 'elasticsearch_index_template'
+require_relative 'generate_hash_id_support'
 
 module Fluent::Plugin
   class ElasticsearchOutput < Output
@@ -79,6 +80,7 @@ module Fluent::Plugin
     end
 
     include Fluent::ElasticsearchIndexTemplate
+    include Fluent::Plugin::GenerateHashIdSupport
 
     def initialize
       super
@@ -340,6 +342,10 @@ module Fluent::Plugin
           record = flatten_record(record)
         end
 
+        if @hash_config
+          record = generate_hash_id_key(record)
+        end
+
        dt = nil
        if @logstash_format || @include_timestamp
          if record.has_key?(TIMESTAMP_FIELD)
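For orientation (the values below are made up): when the `<hash>` section is configured and `id_key` points at the same field, each entry of the bulk request carries the generated Base64-encoded UUID as its document `_id`, so a re-sent bulk request replaces documents instead of duplicating them. With the default `index_name`/`type_name` of `fluentd`, one bulk entry might look like:

```
{ "index" : { "_index" : "fluentd", "_type" : "fluentd", "_id" : "MTNhMGMwMjgtYmY3Yy00YWUyLWFkMDMtZWMwOWE0MDAwNmRm" } }
{ "message" : "access log line", "request_id" : "elastic", "_id" : "MTNhMGMwMjgtYmY3Yy00YWUyLWFkMDMtZWMwOWE0MDAwNmRm" }
```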
data/lib/fluent/plugin/out_elasticsearch_dynamic.rb
CHANGED
@@ -13,6 +13,8 @@ module Fluent::Plugin
     DYNAMIC_PARAM_NAMES = %W[hosts host port include_timestamp logstash_format logstash_prefix logstash_dateformat time_key utc_index index_name tag_key type_name id_key parent_key routing_key write_operation]
     DYNAMIC_PARAM_SYMBOLS = DYNAMIC_PARAM_NAMES.map { |n| "@#{n}".to_sym }
 
+    include Fluent::Plugin::GenerateHashIdSupport
+
     attr_reader :dynamic_config
 
     def configure(conf)
@@ -130,6 +132,10 @@ module Fluent::Plugin
       chunk.msgpack_each do |time, record|
         next unless record.is_a? Hash
 
+        if @hash_config
+          record = generate_hash_id_key(record)
+        end
+
         begin
           # evaluate all configurations here
           DYNAMIC_PARAM_SYMBOLS.each_with_index { |var, i|
data/test/plugin/test_out_elasticsearch.rb
CHANGED
@@ -1,10 +1,12 @@
 require 'helper'
 require 'date'
+require 'fluent/test/helpers'
 require 'fluent/test/driver/output'
 require 'flexmock/test_unit'
 
 class ElasticsearchOutput < Test::Unit::TestCase
   include FlexMock::TestCase
+  include Fluent::Test::Helpers
 
   attr_accessor :index_cmds, :index_command_counts
 
@@ -438,6 +440,34 @@ class ElasticsearchOutput < Test::Unit::TestCase
     assert_equal('myindex', index_cmds.first['index']['_index'])
   end
 
+  class AdditionalHashIdMechanismTest < self
+    data("default" => {"hash_id_key" => '_id'},
+         "custom hash_id_key" => {"hash_id_key" => '_hash_id'},
+        )
+    def test_writes_with_genrate_hash(data)
+      driver.configure(Fluent::Config::Element.new(
+        'ROOT', '', {
+          '@type' => 'elasticsearch',
+          'id_key' => data["hash_id_key"],
+        }, [
+          Fluent::Config::Element.new('hash', '', {
+            'keys' => ['request_id'],
+            'hash_id_key' => data["hash_id_key"],
+          }, [])
+        ]
+      ))
+      stub_elastic_ping
+      stub_elastic
+      flexmock(SecureRandom).should_receive(:uuid)
+        .and_return("13a0c028-bf7c-4ae2-ad03-ec09a40006df")
+      time = event_time("2017-10-15 15:00:23.34567890 UTC")
+      driver.run(default_tag: 'test') do
+        driver.feed(time, sample_record.merge('request_id' => 'elastic'))
+      end
+      assert_equal(Base64.strict_encode64(SecureRandom.uuid), index_cmds[1]["#{data["hash_id_key"]}"])
+    end
+  end
+
   class IndexNamePlaceholdersTest < self
     def test_writes_to_speficied_index_with_tag_placeholder
       driver.configure("index_name myindex.${tag}\n")
data/test/plugin/test_out_elasticsearch_dynamic.rb
CHANGED
@@ -1,10 +1,12 @@
 require 'helper'
 require 'date'
+require 'fluent/test/helpers'
 require 'fluent/test/driver/output'
 require 'flexmock/test_unit'
 
 class ElasticsearchOutputDynamic < Test::Unit::TestCase
   include FlexMock::TestCase
+  include Fluent::Test::Helpers
 
   attr_accessor :index_cmds, :index_command_counts
 
@@ -316,6 +318,35 @@ class ElasticsearchOutputDynamic < Test::Unit::TestCase
     assert_equal(2000, total)
   end
 
+  class AdditionalHashIdMechanismTest < self
+    data("default" => {"hash_id_key" => '_id'},
+         "custom hash_id_key" => {"hash_id_key" => '_hash_id'},
+        )
+    def test_writes_with_genrate_hash(data)
+      driver.configure(Fluent::Config::Element.new(
+        'ROOT', '', {
+          '@type' => 'elasticsearch',
+          'id_key' => data["hash_id_key"],
+        }, [
+          Fluent::Config::Element.new('hash', '', {
+            'keys' => ['request_id'],
+            'hash_id_key' => data["hash_id_key"],
+          }, [])
+        ]
+      ))
+      stub_elastic_ping
+      stub_elastic
+      stub_elastic
+      flexmock(SecureRandom).should_receive(:uuid)
+        .and_return("82120f33-897a-4d9d-b3d5-14afd18fb412")
+      time = event_time("2017-10-15 15:00:23.34567890 UTC")
+      driver.run(default_tag: 'test') do
+        driver.feed(time, sample_record.merge('request_id' => 'elastic'))
+      end
+      assert_equal(Base64.strict_encode64(SecureRandom.uuid), index_cmds[1]["#{data["hash_id_key"]}"])
+    end
+  end
+
   def test_makes_bulk_request
     stub_elastic_ping
     stub_elastic
@@ -734,7 +765,7 @@ class ElasticsearchOutputDynamic < Test::Unit::TestCase
     stub_request(:post, "http://localhost:9200/_bulk").with do |req|
       raise ZeroDivisionError, "any not host_unreachable_exceptions exception"
     end
-
+
     driver.configure("reconnect_on_error false\n")
 
     assert_raise(ZeroDivisionError) {
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: fluent-plugin-elasticsearch
 version: !ruby/object:Gem::Version
-  version: 2.0.0
+  version: 2.0.1.rc.1
 platform: ruby
 authors:
 - diogo
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-11-
+date: 2017-11-17 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: fluentd
@@ -144,6 +144,7 @@ files:
 - Rakefile
 - fluent-plugin-elasticsearch.gemspec
 - lib/fluent/plugin/elasticsearch_index_template.rb
+- lib/fluent/plugin/generate_hash_id_support.rb
 - lib/fluent/plugin/out_elasticsearch.rb
 - lib/fluent/plugin/out_elasticsearch_dynamic.rb
 - test/helper.rb
@@ -165,9 +166,9 @@ required_ruby_version: !ruby/object:Gem::Requirement
     version: '2.0'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
-  - - "
+  - - ">"
   - !ruby/object:Gem::Version
-    version:
+    version: 1.3.1
 requirements: []
 rubyforge_project:
 rubygems_version: 2.6.13