fluent-plugin-elasticsearch 5.0.3 → 5.0.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 6eb418d889b91bf79c37c1cd72789981eb4133bcfc34e21b26e79bb919462272
-  data.tar.gz: 168ddc77fb73216da63f1ce65ed673cf12ab13d6db549fc3e2da81956c938990
+  metadata.gz: 7993521144deb5ebe2665d231cdeda9833cb4b3ba87909a2e21b96f8a43f61cc
+  data.tar.gz: 25e0341fe8d2b131350fb521c520fa31314cdcf478914aaad2d4a3bdbf5a3954
 SHA512:
-  metadata.gz: 5c4a2a8f63b25ea8785e0d4ef291043a93a86fea8f42638caaf46dd3efe29fd0a12919dd339d8a28c97aef65494ba054a3ff3312a122ab2cd89cb54cb069055c
-  data.tar.gz: 6b13851ce29b2a6f2083ebc57ae733a9ca00d79a9fd09ac4c7630d99a9c6d2da302c48f2e0cd901810e0a851911e14f8355fc336e21afd97d3f33fe205b0c78a
+  metadata.gz: db5b6b3f9f4fc1e1d6a9c438e371aa73f962798f8a39519a01bfdeb38bfcb7a8b8e6b4efe20202108d518c33dd0701c29c20c7d4b7c0bb01b16f5907f34dea5d
+  data.tar.gz: 8c69a02d9cda795457f104177773939eb53e652a74f4ebeab4d8dc2b6f943222f9ca967eeca93b0ca283f8be4fa5d381b1300f762d6b5d7e77a07bebb4194524
data/History.md CHANGED
@@ -1,6 +1,12 @@
 ## Changelog [[tags]](https://github.com/uken/fluent-plugin-elasticsearch/tags)
 
 ### [Unreleased]
+
+### 5.0.4
+ - test: out_elasticsearch: Remove a needless headers from affinity stub (#888)
+ - Target Index Affinity (#883)
+
+### 5.0.3
 - Fix use_legacy_template documentation (#880)
 - Add FAQ for dynamic index/template (#878)
 - Handle IPv6 address string on host and hosts parameters (#877)
data/README.md CHANGED
@@ -38,6 +38,7 @@ Current maintainers: @cosmo0920
  + [suppress_type_name](#suppress_type_name)
  + [target_index_key](#target_index_key)
  + [target_type_key](#target_type_key)
+ + [target_index_affinity](#target_index_affinity)
  + [template_name](#template_name)
  + [template_file](#template_file)
  + [template_overwrite](#template_overwrite)
@@ -454,6 +455,75 @@ and this record will be written to the specified index (`logstash-2014.12.19`) r
 
 Similar to `target_index_key` config, find the type name to write to in the record under this key (or nested record). If key not found in record - fallback to `type_name` (default "fluentd").
 
+### target_index_affinity
+
+Enable the plugin to dynamically select a logstash time-based target index in update/upsert operations based on already-indexed data rather than the current time of indexing.
+
+```
+target_index_affinity true # defaults to false
+```
+
+By default the plugin writes data to the logstash-format index based on the current time. For example, with a daily index, data arriving after midnight is written to the newly created index. This is normally fine when data comes from a single source and is not updated after indexing.
+
+But consider a use case where data is also updated after indexing, `id_key` is used to identify each document uniquely, and the logstash format is wanted for easy data management and retention. Updates are done right after indexing to complete the document (not all data is available from a single source), and no further updates happen later. In this case a problem occurs at index rotation time, when writes with the same `id_key` value may go to two different indices.
+
+With this setting the plugin searches existing data using Elasticsearch's [ids query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ids-query.html) with the `id_key` value (over the `logstash_prefix` and `logstash_prefix_separator` index pattern, e.g. `logstash-*`). The index of the found data is used for the update/upsert. When no data is found, data is written to the current logstash index as usual.
+
+This setting requires the following other settings:
+```
+logstash_format true
+id_key myId # Some field on your data to identify the data uniquely
+write_operation upsert # upsert or update
+```
+
+Suppose you have the following situation, where two different `<match>` sections consume data from two different Kafka topics independently but close in time to each other (order not known):
+
+```
+<match data1>
+  @type elasticsearch
+  ...
+  id_key myId
+  write_operation upsert
+  logstash_format true
+  logstash_dateformat %Y.%m.%d
+  logstash_prefix myindexprefix
+  target_index_affinity true
+  ...
+</match>
+
+<match data2>
+  @type elasticsearch
+  ...
+  id_key myId
+  write_operation upsert
+  logstash_format true
+  logstash_dateformat %Y.%m.%d
+  logstash_prefix myindexprefix
+  target_index_affinity true
+  ...
+</match>
+```
+
+If your first (data1) input is:
+```
+{
+  "myId": "myuniqueId1",
+  "datafield1": "some value"
+}
+```
+
+and your second (data2) input is:
+```
+{
+  "myId": "myuniqueId1",
+  "datafield99": "some important data from other source tightly related to id myuniqueId1 and wanted to be in same document."
+}
+```
+
+Today's date is 10.05.2021, so when both data1 and data2 are consumed during the day, data is written to index `myindexprefix-2021.05.10`.
+But suppose we are close to index rotation: data1 is consumed and indexed at `2021-05-10T23:59:55.59707672Z`, and data2
+is consumed a bit later at `2021-05-11T00:00:58.222079Z`, i.e. the logstash index has rotated and normally data2 would have been written
+to index `myindexprefix-2021.05.11`. With `target_index_affinity` set to true, data2 is instead written to index `myindexprefix-2021.05.10`,
+into the same document as data1, and a duplicated document is avoided.
+
 ### template_name
 
 The name of the template to define. If a template by the name given is already present, it will be left unchanged, unless [template_overwrite](#template_overwrite) is set, in which case the template will be updated.
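The affinity lookup described in the `target_index_affinity` section above amounts to a single ids query over the `logstash-*` pattern. As a standalone sketch (plain stdlib `json` here; the plugin itself uses Yajl), the request body it builds looks like this:

```ruby
require 'json'
require 'set'

# Sketch of the request body target_index_affinity sends to Elasticsearch:
# an ids query, no _source (only _id/_index are needed), sorted by _index
# descending so the oldest index for a duplicated _id is seen last.
def affinity_query(ids)
  {
    'query'   => { 'ids' => { 'values' => ids.to_a } },
    '_source' => false,
    'sort'    => [{ '_index' => { 'order' => 'desc' } }]
  }
end

ids = Set.new(%w[myuniqueId1])
puts JSON.dump(affinity_query(ids))
# => {"query":{"ids":{"values":["myuniqueId1"]}},"_source":false,"sort":[{"_index":{"order":"desc"}}]}
```

The `Set` mirrors how the plugin deduplicates `id_key` values before querying.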
@@ -3,7 +3,7 @@ $:.push File.expand_path('../lib', __FILE__)
 
 Gem::Specification.new do |s|
   s.name = 'fluent-plugin-elasticsearch'
-  s.version = '5.0.3'
+  s.version = '5.0.4'
   s.authors = ['diogo', 'pitr', 'Hiroshi Hatake']
   s.email = ['pitr.vern@gmail.com', 'me@diogoterror.com', 'cosmo0920.wp@gmail.com']
   s.description = %q{Elasticsearch output plugin for Fluent event collector}
@@ -43,13 +43,14 @@ class Fluent::Plugin::ElasticsearchErrorHandler
     stats = Hash.new(0)
     meta = {}
     header = {}
+    affinity_target_indices = @plugin.get_affinity_target_indices(chunk)
     chunk.msgpack_each do |time, rawrecord|
       bulk_message = ''
       next unless rawrecord.is_a? Hash
       begin
         # we need a deep copy for process_message to alter
         processrecord = Marshal.load(Marshal.dump(rawrecord))
-        meta, header, record = @plugin.process_message(tag, meta, header, time, processrecord, extracted_values)
+        meta, header, record = @plugin.process_message(tag, meta, header, time, processrecord, affinity_target_indices, extracted_values)
         next unless @plugin.append_record_to_messages(@plugin.write_operation, meta, header, record, bulk_message)
       rescue => e
         stats[:bad_chunk_record] += 1
@@ -2,6 +2,7 @@
 require 'date'
 require 'excon'
 require 'elasticsearch'
+require 'set'
 begin
   require 'elasticsearch/xpack'
 rescue LoadError
@@ -175,6 +176,7 @@ EOC
     config_param :truncate_caches_interval, :time, :default => nil
     config_param :use_legacy_template, :bool, :default => true
     config_param :catch_transport_exception_on_retry, :bool, :default => true
+    config_param :target_index_affinity, :bool, :default => false
 
     config_section :metadata, param_name: :metainfo, multi: false do
       config_param :include_chunk_id, :bool, :default => false
@@ -834,13 +836,14 @@ EOC
         extract_placeholders(@host, chunk)
       end
 
+      affinity_target_indices = get_affinity_target_indices(chunk)
      chunk.msgpack_each do |time, record|
        next unless record.is_a? Hash
 
        record = inject_chunk_id_to_record_if_needed(record, chunk_id)
 
        begin
-         meta, header, record = process_message(tag, meta, header, time, record, extracted_values)
+         meta, header, record = process_message(tag, meta, header, time, record, affinity_target_indices, extracted_values)
          info = if @include_index_in_url
                   RequestInfo.new(host, meta.delete("_index".freeze), meta["_index".freeze], meta.delete("_alias".freeze))
                 else
@@ -877,6 +880,42 @@ EOC
       end
     end
 
+    def target_index_affinity_enabled?()
+      @target_index_affinity && @logstash_format && @id_key && (@write_operation == UPDATE_OP || @write_operation == UPSERT_OP)
+    end
+
+    def get_affinity_target_indices(chunk)
+      indices = Hash.new
+      if target_index_affinity_enabled?()
+        id_key_accessor = record_accessor_create(@id_key)
+        ids = Set.new
+        chunk.msgpack_each do |time, record|
+          next unless record.is_a? Hash
+          ids << id_key_accessor.call(record)
+        end
+        log.debug("Find affinity target_indices by querying on ES (write_operation #{@write_operation}) for ids: #{ids.to_a}")
+        options = {
+          :index => "#{logstash_prefix}#{@logstash_prefix_separator}*",
+        }
+        query = {
+          'query' => { 'ids' => { 'values' => ids.to_a } },
+          '_source' => false,
+          'sort' => [
+            {"_index" => {"order" => "desc"}}
+          ]
+        }
+        result = client.search(options.merge(:body => Yajl.dump(query)))
+        # There should be just one hit per _id, but if there are multiple, only the oldest index is stored in the map
+        result['hits']['hits'].each do |hit|
+          indices[hit["_id"]] = hit["_index"]
+          log.debug("target_index for id: #{hit["_id"]} from es: #{hit["_index"]}")
+        end
+      end
+      indices
+    end
+
     def split_request?(bulk_message, info)
       # For safety.
     end
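Because hits come back sorted by `_index` descending, the plain hash assignment in `get_affinity_target_indices` means that when the same `_id` appears in two indices, the last (oldest-index) hit wins. A minimal standalone sketch of just that reduction:

```ruby
# Sketch of reducing search hits to the { _id => _index } affinity map,
# mirroring the loop in get_affinity_target_indices: with hits sorted by
# _index descending, a duplicated _id ends up mapped to its oldest index.
def indices_from_hits(hits)
  map = {}
  hits.each { |hit| map[hit['_id']] = hit['_index'] }  # last write wins
  map
end

hits = [
  { '_id' => 'id1', '_index' => 'logstash-2021.04.29' },
  { '_id' => 'id1', '_index' => 'logstash-2021.04.28' }  # older index sorts last
]
p indices_from_hits(hits)  # maps "id1" to the oldest index, logstash-2021.04.28
```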
@@ -889,7 +928,7 @@ EOC
       false
     end
 
-    def process_message(tag, meta, header, time, record, extracted_values)
+    def process_message(tag, meta, header, time, record, affinity_target_indices, extracted_values)
       logstash_prefix, logstash_dateformat, index_name, type_name, _template_name, _customize_template, _deflector_alias, application_name, pipeline, _ilm_policy_id = extracted_values
 
       if @flatten_hashes
930
969
  record[@tag_key] = tag
931
970
  end
932
971
 
972
+ # If affinity target indices map has value for this particular id, use it as target_index
973
+ if !affinity_target_indices.empty?
974
+ id_accessor = record_accessor_create(@id_key)
975
+ id_value = id_accessor.call(record)
976
+ if affinity_target_indices.key?(id_value)
977
+ target_index = affinity_target_indices[id_value]
978
+ end
979
+ end
980
+
933
981
  target_type_parent, target_type_child_key = @target_type_key ? get_parent_of(record, @target_type_key) : nil
934
982
  if target_type_parent && target_type_parent[target_type_child_key]
935
983
  target_type = target_type_parent.delete(target_type_child_key)
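The override in `process_message` is essentially a map lookup keyed on the record's id: when the affinity map has an entry, it replaces the time-derived index. A hedged sketch (flat key access here for brevity; the plugin uses a record accessor that also handles nested keys):

```ruby
# Sketch of the affinity override in process_message: current_index is the
# index derived from event time and logstash_dateformat; an affinity map
# entry for the record's id wins when present.
def resolve_target_index(record, id_key, current_index, affinity_target_indices)
  id_value = record[id_key]  # the plugin uses record_accessor_create(@id_key)
  affinity_target_indices.fetch(id_value, current_index)
end

map = { 'myuniqueId1' => 'myindexprefix-2021.05.10' }
rec = { 'myId' => 'myuniqueId1', 'datafield99' => 'late update' }
puts resolve_target_index(rec, 'myId', 'myindexprefix-2021.05.11', map)
# => myindexprefix-2021.05.10
```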
@@ -27,10 +27,15 @@ class TestElasticsearchErrorHandler < Test::Unit::TestCase
     @error_events << {:tag => tag, :time=>time, :record=>record, :error=>e}
   end
 
-  def process_message(tag, meta, header, time, record, extracted_values)
+  def process_message(tag, meta, header, time, record, affinity_target_indices, extracted_values)
     return [meta, header, record]
   end
 
+  def get_affinity_target_indices(chunk)
+    indices = Hash.new
+    indices
+  end
+
   def append_record_to_messages(op, meta, header, record, msgs)
     if record.has_key?('raise') && record['raise']
       raise Exception('process_message')
@@ -10,7 +10,7 @@ class ElasticsearchOutputTest < Test::Unit::TestCase
   include FlexMock::TestCase
   include Fluent::Test::Helpers
 
-  attr_accessor :index_cmds, :index_command_counts
+  attr_accessor :index_cmds, :index_command_counts, :index_cmds_all_requests
 
   def setup
     Fluent::Test.setup
@@ -70,6 +70,14 @@ class ElasticsearchOutputTest < Test::Unit::TestCase
     end
   end
 
+  def stub_elastic_all_requests(url="http://localhost:9200/_bulk")
+    @index_cmds_all_requests = Array.new
+    stub_request(:post, url).with do |req|
+      @index_cmds = req.body.split("\n").map {|r| JSON.parse(r) }
+      @index_cmds_all_requests << @index_cmds
+    end
+  end
+
   def stub_elastic_unavailable(url="http://localhost:9200/_bulk")
     stub_request(:post, url).to_return(:status => [503, "Service Unavailable"])
   end
@@ -4094,6 +4102,185 @@ class ElasticsearchOutputTest < Test::Unit::TestCase
     assert_equal(pipeline, index_cmds.first['index']['pipeline'])
   end
 
+  def stub_elastic_affinity_target_index_search_with_body(url="http://localhost:9200/logstash-*/_search", ids, return_body_str)
+    # Note: the ids used in the query are a unique list of ids
+    stub_request(:post, url)
+      .with(
+        body: "{\"query\":{\"ids\":{\"values\":#{ids.uniq.to_json}}},\"_source\":false,\"sort\":[{\"_index\":{\"order\":\"desc\"}}]}",
+      )
+      .to_return(lambda do |req|
+        { :status => 200,
+          :headers => { 'Content-Type' => 'json' },
+          :body => return_body_str
+        }
+      end)
+  end
+
+  def stub_elastic_affinity_target_index_search(url="http://localhost:9200/logstash-*/_search", ids, indices)
+    # Example ids and indices arrays.
+    #  [ "3408a2c8eecd4fbfb82e45012b54fa82", "2816fc6ef4524b3f8f7e869002005433"]
+    #  [ "logstash-2021.04.28", "logstash-2021.04.29"]
+    body = %({
+      "took" : 31,
+      "timed_out" : false,
+      "_shards" : {
+        "total" : 52,
+        "successful" : 52,
+        "skipped" : 48,
+        "failed" : 0
+      },
+      "hits" : {
+        "total" : {
+          "value" : 356,
+          "relation" : "eq"
+        },
+        "max_score" : null,
+        "hits" : [
+          {
+            "_index" : "#{indices[0]}",
+            "_type" : "_doc",
+            "_id" : "#{ids[0]}",
+            "_score" : null,
+            "sort" : [
+              "#{indices[0]}"
+            ]
+          },
+          {
+            "_index" : "#{indices[1]}",
+            "_type" : "_doc",
+            "_id" : "#{ids[1]}",
+            "_score" : null,
+            "sort" : [
+              "#{indices[1]}"
+            ]
+          }
+        ]
+      }
+    })
+    stub_elastic_affinity_target_index_search_with_body(ids, body)
+  end
+
+  def stub_elastic_affinity_target_index_search_return_empty(url="http://localhost:9200/logstash-*/_search", ids)
+    empty_body = %({
+      "took" : 5,
+      "timed_out" : false,
+      "_shards" : {
+        "total" : 54,
+        "successful" : 54,
+        "skipped" : 53,
+        "failed" : 0
+      },
+      "hits" : {
+        "total" : {
+          "value" : 0,
+          "relation" : "eq"
+        },
+        "max_score" : null,
+        "hits" : [ ]
+      }
+    })
+    stub_elastic_affinity_target_index_search_with_body(ids, empty_body)
+  end
+
+  def test_writes_to_affinity_target_index
+    driver.configure("target_index_affinity true
+                      logstash_format true
+                      id_key my_id
+                      write_operation update")
+
+    my_id_value = "3408a2c8eecd4fbfb82e45012b54fa82"
+    ids = [my_id_value]
+    indices = ["logstash-2021.04.28"]
+    stub_elastic
+    stub_elastic_affinity_target_index_search(ids, indices)
+    driver.run(default_tag: 'test') do
+      driver.feed(sample_record('my_id' => my_id_value))
+    end
+    assert_equal('logstash-2021.04.28', index_cmds.first['update']['_index'])
+  end
+
+  def test_writes_to_affinity_target_index_write_operation_upsert
+    driver.configure("target_index_affinity true
+                      logstash_format true
+                      id_key my_id
+                      write_operation upsert")
+
+    my_id_value = "3408a2c8eecd4fbfb82e45012b54fa82"
+    ids = [my_id_value]
+    indices = ["logstash-2021.04.28"]
+    stub_elastic
+    stub_elastic_affinity_target_index_search(ids, indices)
+    driver.run(default_tag: 'test') do
+      driver.feed(sample_record('my_id' => my_id_value))
+    end
+    assert_equal('logstash-2021.04.28', index_cmds.first['update']['_index'])
+  end
+
+  def test_writes_to_affinity_target_index_index_not_exists_yet
+    driver.configure("target_index_affinity true
+                      logstash_format true
+                      id_key my_id
+                      write_operation update")
+
+    my_id_value = "3408a2c8eecd4fbfb82e45012b54fa82"
+    ids = [my_id_value]
+    stub_elastic
+    stub_elastic_affinity_target_index_search_return_empty(ids)
+    time = Time.parse Date.today.iso8601
+    driver.run(default_tag: 'test') do
+      driver.feed(time.to_i, sample_record('my_id' => my_id_value))
+    end
+    assert_equal("logstash-#{time.utc.strftime("%Y.%m.%d")}", index_cmds.first['update']['_index'])
+  end
+
+  def test_writes_to_affinity_target_index_multiple_indices
+    driver.configure("target_index_affinity true
+                      logstash_format true
+                      id_key my_id
+                      write_operation update")
+
+    my_id_value = "2816fc6ef4524b3f8f7e869002005433"
+    my_id_value2 = "3408a2c8eecd4fbfb82e45012b54fa82"
+    ids = [my_id_value, my_id_value2]
+    indices = ["logstash-2021.04.29", "logstash-2021.04.28"]
+    stub_elastic_all_requests
+    stub_elastic_affinity_target_index_search(ids, indices)
+    driver.run(default_tag: 'test') do
+      driver.feed(sample_record('my_id' => my_id_value))
+      driver.feed(sample_record('my_id' => my_id_value2))
+    end
+    assert_equal(2, index_cmds_all_requests.count)
+    assert_equal('logstash-2021.04.29', index_cmds_all_requests[0].first['update']['_index'])
+    assert_equal(my_id_value, index_cmds_all_requests[0].first['update']['_id'])
+    assert_equal('logstash-2021.04.28', index_cmds_all_requests[1].first['update']['_index'])
+    assert_equal(my_id_value2, index_cmds_all_requests[1].first['update']['_id'])
+  end
+
+  def test_writes_to_affinity_target_index_same_id_dublicated_write_to_oldest_index
+    driver.configure("target_index_affinity true
+                      logstash_format true
+                      id_key my_id
+                      write_operation update")
+
+    my_id_value = "2816fc6ef4524b3f8f7e869002005433"
+    # It may happen that the same id has been inserted into two indices while data was inserted during the rollover period
+    ids = [my_id_value, my_id_value]
+    # Simulate the sorting used here: the search sorts indices in DESC order to pick only the oldest index per single _id
+    indices = ["logstash-2021.04.29", "logstash-2021.04.28"]
+
+    stub_elastic_all_requests
+    stub_elastic_affinity_target_index_search(ids, indices)
+    driver.run(default_tag: 'test') do
+      driver.feed(sample_record('my_id' => my_id_value))
+      driver.feed(sample_record('my_id' => my_id_value))
+    end
+    assert_equal('logstash-2021.04.28', index_cmds.first['update']['_index'])
+
+    assert_equal(1, index_cmds_all_requests.count)
+    assert_equal('logstash-2021.04.28', index_cmds_all_requests[0].first['update']['_index'])
+    assert_equal(my_id_value, index_cmds_all_requests[0].first['update']['_id'])
+  end
+
   class PipelinePlaceholdersTest < self
     def test_writes_to_default_index_with_pipeline_tag_placeholder
       pipeline = "fluentd-${tag}"
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: fluent-plugin-elasticsearch
 version: !ruby/object:Gem::Version
-  version: 5.0.3
+  version: 5.0.4
 platform: ruby
 authors:
 - diogo
@@ -10,7 +10,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2021-04-20 00:00:00.000000000 Z
+date: 2021-06-07 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: fluentd