logstash-input-elasticsearch 4.21.2 → 4.22.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 70af2192f555f8afff4ef2f96072f2b215a2039207dfa12a9449f507f7b13f7b
- data.tar.gz: 73621246eccfd1fbb385be5e9ca5ef9a071cdb64008cb539a1e80a08c7a0ed34
+ metadata.gz: 4b69fd432de3b9ad93091c9e1ada1469965f039f302ec8e4fa68c51f30954445
+ data.tar.gz: 20f9f41949e985e1eca21d8370e704260c780c88659536b496d9c405adcf4838
  SHA512:
- metadata.gz: bbc5c842d77204339e0bb64174f98ffb8bb1728957a1f64d1f83e1f5bad27ad76fc24f44b23a64d23247b26a806cfee7cbd52a16ea34e5490f1355bcdbb98303
- data.tar.gz: 7b258f80ca64e5dd16593a65d7326a5f3695f840cbf32fdeac9363a6a19d4747de9135065a7b940602cd77f43a02910b74d667761184ccb846a864e128334a20
+ metadata.gz: 0b22f9d333bd434006a11e3d8ca15a082bc57365bda305897776b625c9d19b1059533f3b1af97b958e11966bb2f3951f35c928345b0fd03a41e299588088061e
+ data.tar.gz: 4b3cac772e4ea9d4586923dee83e1e626efffacbfbda02348dc2e09cf704c04011a1cfec5146ce490997c05bdaa62f9206ab2c2de3d3224e8dbe7d0f649c35b2
data/CHANGELOG.md CHANGED
@@ -1,3 +1,6 @@
+ ## 4.22.0
+ - Add "cursor"-like index tracking [#205](https://github.com/logstash-plugins/logstash-input-elasticsearch/pull/205)
+
  ## 4.21.2
  - Add elastic-transport client support used in elasticsearch-ruby 8.x [#225](https://github.com/logstash-plugins/logstash-input-elasticsearch/pull/225)
 
data/docs/index.asciidoc CHANGED
@@ -48,7 +48,7 @@ This would create an Elasticsearch query with the following format:
  "sort": [ "_doc" ]
  }'
 
-
+ [id="plugins-{type}s-{plugin}-scheduling"]
  ==== Scheduling
 
  Input from this plugin can be scheduled to run periodically according to a specific
@@ -103,6 +103,133 @@ Common causes are:
  - When the hit result contains top-level fields that are {logstash-ref}/processing.html#reserved-fields[reserved in Logstash] but do not have the expected shape. Use the <<plugins-{type}s-{plugin}-target>> directive to avoid conflicts with the top-level namespace.
  - When <<plugins-{type}s-{plugin}-docinfo>> is enabled and the docinfo fields cannot be merged into the hit result. Combine <<plugins-{type}s-{plugin}-target>> and <<plugins-{type}s-{plugin}-docinfo_target>> to avoid conflict.
 
+ [id="plugins-{type}s-{plugin}-cursor"]
+ ==== Tracking a field's value across runs
+
+ .Technical Preview: Tracking a field's value
+ ****
+ The feature that allows tracking a field's value across runs is in _Technical Preview_.
+ Configuration options and implementation details are subject to change in minor releases without being preceded by deprecation warnings.
+ ****
+
+ Some use cases require tracking the value of a particular field between runs.
+ Examples include:
+
+ * avoiding the need to re-process the entire result set of a long query after an unplanned restart
+ * grabbing only new data from an index instead of processing the entire set on each run.
+
+ The Elasticsearch input plugin provides the <<plugins-{type}s-{plugin}-tracking_field>> and <<plugins-{type}s-{plugin}-tracking_field_seed>> options.
+ When <<plugins-{type}s-{plugin}-tracking_field>> is set, the plugin records the value of that field for the last document retrieved in a run into a file.
+ (The file location is controlled by <<plugins-{type}s-{plugin}-last_run_metadata_path>>.)
+
+ You can then inject this value into the query using the placeholder `:last_value`.
+ The value is injected into the query before execution and updated after the query completes if new data was found.
+
+ This feature works best when:
+
+ * the query sorts by the tracking field,
+ * the timestamp field is added by {es}, and
+ * the field type has enough resolution so that two events are unlikely to have the same value.
+
+ Consider using a tracking field whose type is https://www.elastic.co/guide/en/elasticsearch/reference/current/date_nanos.html[date nanoseconds].
+ If the tracking field is of this data type, you can use an extra placeholder called `:present` to inject the nanosecond-based value of "now-30s".
+ This placeholder is useful as the right-hand side of a range filter, allowing the collection of
+ new data while leaving partially-searchable bulk request data to the next scheduled job.
+
+ [id="plugins-{type}s-{plugin}-tracking-sample"]
141
+ ===== Sample configuration: Track field value across runs
142
+
143
+ This section contains a series of steps to help you set up the "tailing" of data being written to a set of indices, using a date nanosecond field added by an Elasticsearch ingest pipeline and the `tracking_field` capability of this plugin.
144
+
145
+ . Create ingest pipeline that adds Elasticsearch's `_ingest.timestamp` field to the documents as `event.ingested`:
146
+ +
147
+ [source, json]
148
+ PUT _ingest/pipeline/my-pipeline
149
+ {
150
+ "processors": [
151
+ {
152
+ "script": {
153
+ "lang": "painless",
154
+ "source": "ctx.putIfAbsent(\"event\", [:]); ctx.event.ingested = metadata().now.format(DateTimeFormatter.ISO_INSTANT);"
155
+ }
156
+ }
157
+ ]
158
+ }
159
+
160
+ [start=2]
+ . Create an index mapping where the tracking field is of date nanosecond type and invokes the defined pipeline:
+ +
+ [source, json]
+ PUT /_template/my_template
+ {
+   "index_patterns": ["test-*"],
+   "settings": {
+     "index.default_pipeline": "my-pipeline"
+   },
+   "mappings": {
+     "properties": {
+       "event": {
+         "properties": {
+           "ingested": {
+             "type": "date_nanos",
+             "format": "strict_date_optional_time_nanos"
+           }
+         }
+       }
+     }
+   }
+ }
+
+ [start=3]
+ . Define a query that looks at all data of the indices, sorted by the tracking field, and with a range filter from the last value seen until the present:
+ +
+ [source,json]
+ {
+   "query": {
+     "range": {
+       "event.ingested": {
+         "gt": ":last_value",
+         "lt": ":present"
+       }
+     }
+   },
+   "sort": [
+     {
+       "event.ingested": {
+         "order": "asc",
+         "format": "strict_date_optional_time_nanos",
+         "numeric_type": "date_nanos"
+       }
+     }
+   ]
+ }
+
+ [start=4]
+ . Configure the Elasticsearch input to query the indices with the query defined above, every minute, and track the `event.ingested` field:
+ +
+ [source, ruby]
+ input {
+   elasticsearch {
+     id => tail_test_index
+     hosts => [ 'https://..']
+     api_key => '....'
+     index => 'test-*'
+     query => '{ "query": { "range": { "event.ingested": { "gt": ":last_value", "lt": ":present"}}}, "sort": [ { "event.ingested": {"order": "asc", "format": "strict_date_optional_time_nanos", "numeric_type" : "date_nanos" } } ] }'
+     tracking_field => "[event][ingested]"
+     slices => 5 # optional use of slices to speed data processing, should be equal to or less than the number of primary shards
+     schedule => '* * * * *' # every minute
+     schedule_overlap => false # don't accumulate jobs if one takes longer than 1 minute
+   }
+ }
+
+ With this sample setup, new documents are indexed into a `test-*` index.
+ The next scheduled run:
+
+ * selects all new documents since the last observed value of the tracking field,
+ * uses {ref}/point-in-time-api.html#point-in-time-api[Point in time (PIT)] + {ref}/paginate-search-results.html#search-after[Search after] to paginate through all the data, and
+ * updates the value of the field at the end of the pagination.
+
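The two placeholders are plain text substitutions applied to the query string just before each run. A minimal sketch of the mechanics (plain Ruby, not the plugin's code; the plugin performs this in `CursorTracker#inject_cursor`, shown later in this diff):

[source,ruby]
# Illustrative only: how ":last_value" and ":present" get swapped into the query text.
last_value = "2025-01-01T00:00:00.000000000Z"                        # value persisted from the previous run
present    = (Time.now.utc - 30).strftime("%Y-%m-%dT%H:%M:%S.%9NZ")  # ":present" is roughly "now - 30s"

template = '{ "range": { "event.ingested": { "gt": ":last_value", "lt": ":present" } } }'
puts template.gsub(":last_value", last_value).gsub(":present", present)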
  [id="plugins-{type}s-{plugin}-options"]
  ==== Elasticsearch Input configuration options
 
@@ -123,12 +250,14 @@ This plugin supports the following configuration options plus the <<plugins-{typ
  | <<plugins-{type}s-{plugin}-ecs_compatibility>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-hosts>> |<<array,array>>|No
  | <<plugins-{type}s-{plugin}-index>> |<<string,string>>|No
+ | <<plugins-{type}s-{plugin}-last_run_metadata_path>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-password>> |<<password,password>>|No
  | <<plugins-{type}s-{plugin}-proxy>> |<<uri,uri>>|No
  | <<plugins-{type}s-{plugin}-query>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-response_type>> |<<string,string>>, one of `["hits","aggregations"]`|No
  | <<plugins-{type}s-{plugin}-request_timeout_seconds>> | <<number,number>>|No
  | <<plugins-{type}s-{plugin}-schedule>> |<<string,string>>|No
+ | <<plugins-{type}s-{plugin}-schedule_overlap>> |<<boolean,boolean>>|No
  | <<plugins-{type}s-{plugin}-scroll>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-search_api>> |<<string,string>>, one of `["auto", "search_after", "scroll"]`|No
  | <<plugins-{type}s-{plugin}-size>> |<<number,number>>|No
@@ -148,6 +277,8 @@ This plugin supports the following configuration options plus the <<plugins-{typ
  | <<plugins-{type}s-{plugin}-ssl_verification_mode>> |<<string,string>>, one of `["full", "none"]`|No
  | <<plugins-{type}s-{plugin}-socket_timeout_seconds>> | <<number,number>>|No
  | <<plugins-{type}s-{plugin}-target>> | {logstash-ref}/field-references-deepdive.html[field reference] | No
+ | <<plugins-{type}s-{plugin}-tracking_field>> |<<string,string>>|No
+ | <<plugins-{type}s-{plugin}-tracking_field_seed>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-retries>> | <<number,number>>|No
  | <<plugins-{type}s-{plugin}-user>> |<<string,string>>|No
  |=======================================================================
@@ -327,6 +458,17 @@ Check out {ref}/api-conventions.html#api-multi-index[Multi Indices
  documentation] in the Elasticsearch documentation for info on
  referencing multiple indices.
 
+ [id="plugins-{type}s-{plugin}-last_run_metadata_path"]
+ ===== `last_run_metadata_path`
+
+ * Value type is <<string,string>>
+ * There is no default value for this setting.
+
+ The path to the file where the last observed value of the tracking field is stored.
+ By default this file is stored as `<path.data>/plugins/inputs/elasticsearch/<pipeline_id>/last_run_value`.
+
+ This setting should point to a file, not a directory, and Logstash must have read+write access to this file.
+
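For example, to pin the cursor file to an explicit location (the path below is illustrative, not a default):

[source,ruby]
input {
  elasticsearch {
    # other options (hosts, query, schedule, ...) as in the sample configuration above
    tracking_field         => "[event][ingested]"
    last_run_metadata_path => "/usr/share/logstash/data/es_cursor/last_run_value" # illustrative path
  }
}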
  [id="plugins-{type}s-{plugin}-password"]
  ===== `password`
 
@@ -407,6 +549,19 @@ for example: "* * * * *" (execute query every minute, on the minute)
  There is no schedule by default. If no schedule is given, then the statement is run
  exactly once.
 
+ [id="plugins-{type}s-{plugin}-schedule_overlap"]
+ ===== `schedule_overlap`
+
+ * Value type is <<boolean,boolean>>
+ * Default value is `true`
+
+ Whether to allow queuing of a scheduled run while a previous run is still in progress.
+ Allowing overlap ensures that a new run starts immediately after the previous one finishes when there is a lot of work to do,
+ but because the queue is unbounded it can lead to an out-of-memory error over long periods if the queue keeps growing.
+
+ When in doubt, set `schedule_overlap` to false (it may become the default value in the future).
+
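For example, a conservative scheduled setup might look like this (a sketch; the cron expression is arbitrary):

[source,ruby]
input {
  elasticsearch {
    # hosts, index, query, and other options omitted
    schedule         => "*/5 * * * *" # every five minutes
    schedule_overlap => false         # don't queue a new run while the previous one is still in progress
  }
}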
 
  [id="plugins-{type}s-{plugin}-scroll"]
  ===== `scroll`
@@ -617,6 +772,28 @@ When the `target` is set to a field reference, the `_source` of the hit is place
  This option can be useful to avoid populating unknown fields when a downstream schema such as ECS is enforced.
  It is also possible to target an entry in the event's metadata, which will be available during event processing but not exported to your outputs (e.g., `target \=> "[@metadata][_source]"`).
 
+ [id="plugins-{type}s-{plugin}-tracking_field"]
+ ===== `tracking_field`
+
+ * Value type is <<string,string>>
+ * There is no default value for this setting.
+
+ The field from the last event of a previous run whose value will be used as the cursor for the following run.
+ The value of this field is injected into each query if the query uses the placeholder `:last_value`.
+ For the first query after a pipeline is started, the value used is either read from the <<plugins-{type}s-{plugin}-last_run_metadata_path>> file,
+ or taken from the <<plugins-{type}s-{plugin}-tracking_field_seed>> setting.
+
+ Note: The tracking value is updated after each page is read and at the end of each Point in Time. In case of a crash, the last saved value will be used, so some duplication of data can occur. For this reason, using unique document IDs for each event is recommended in the downstream destination.
+
+ [id="plugins-{type}s-{plugin}-tracking_field_seed"]
+ ===== `tracking_field_seed`
+
+ * Value type is <<string,string>>
+ * Default value is `"1970-01-01T00:00:00.000000000Z"`
+
+ The starting value for the <<plugins-{type}s-{plugin}-tracking_field>> if the <<plugins-{type}s-{plugin}-last_run_metadata_path>> file does not exist yet.
+ This setting defaults to the nanosecond-precision ISO8601 representation of the epoch, "1970-01-01T00:00:00.000000000Z", since nanosecond-precision timestamps are the most reliable data format to use for this feature.
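Because a crash can replay a few already-processed documents (see the note on <<plugins-{type}s-{plugin}-tracking_field>> above), a common companion setup is to write with deterministic document IDs downstream. A minimal sketch, assuming `docinfo` is enabled on the input so the source document's ID is available:

[source,ruby]
input {
  elasticsearch {
    # tracking options as described above, plus:
    docinfo        => true
    docinfo_target => "[@metadata][doc]"
  }
}
output {
  elasticsearch {
    # reusing the source ID makes replayed documents overwrite instead of duplicate
    document_id => "%{[@metadata][doc][_id]}"
  }
}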
 
  [id="plugins-{type}s-{plugin}-user"]
  ===== `user`
@@ -12,14 +12,9 @@ module LogStash
  @client = client
  @plugin_params = plugin.params
 
+ @index = @plugin_params["index"]
  @size = @plugin_params["size"]
- @query = @plugin_params["query"]
  @retries = @plugin_params["retries"]
- @agg_options = {
- :index => @plugin_params["index"],
- :size => 0
- }.merge(:body => @query)
-
  @plugin = plugin
  end
 
@@ -33,10 +28,18 @@ module LogStash
  false
  end
 
- def do_run(output_queue)
+ def aggregation_options(query_object)
+ {
+ :index => @index,
+ :size => 0,
+ :body => query_object
+ }
+ end
+
+ def do_run(output_queue, query_object)
  logger.info("Aggregation starting")
  r = retryable(AGGREGATION_JOB) do
- @client.search(@agg_options)
+ @client.search(aggregation_options(query_object))
  end
  @plugin.push_hit(r, output_queue, 'aggregations') if r
  end
@@ -0,0 +1,58 @@
+ require 'fileutils'
+
+ module LogStash; module Inputs; class Elasticsearch
+   class CursorTracker
+     include LogStash::Util::Loggable
+
+     attr_reader :last_value
+
+     def initialize(last_run_metadata_path:, tracking_field:, tracking_field_seed:)
+       @last_run_metadata_path = last_run_metadata_path
+       @last_value_hashmap = Java::java.util.concurrent.ConcurrentHashMap.new
+       @last_value = IO.read(@last_run_metadata_path) rescue nil || tracking_field_seed
+       @tracking_field = tracking_field
+       logger.info "Starting value for cursor field \"#{@tracking_field}\": #{@last_value}"
+       @mutex = Mutex.new
+     end
+
+     def checkpoint_cursor(intermediate: true)
+       @mutex.synchronize do
+         if intermediate
+           # in intermediate checkpoints pick the smallest
+           converge_last_value {|v1, v2| v1 < v2 ? v1 : v2}
+         else
+           # in the last search of a PIT choose the largest
+           converge_last_value {|v1, v2| v1 > v2 ? v1 : v2}
+           @last_value_hashmap.clear
+         end
+         IO.write(@last_run_metadata_path, @last_value)
+       end
+     end
+
+     def converge_last_value(&block)
+       return if @last_value_hashmap.empty?
+       new_last_value = @last_value_hashmap.reduceValues(1000, &block)
+       logger.debug? && logger.debug("converge_last_value: got #{@last_value_hashmap.values.inspect}. won: #{new_last_value}")
+       return if new_last_value == @last_value
+       @last_value = new_last_value
+       logger.info "New cursor value for field \"#{@tracking_field}\" is: #{new_last_value}"
+     end
+
+     def record_last_value(event)
+       value = event.get(@tracking_field)
+       logger.trace? && logger.trace("storing last_value if #{@tracking_field} for #{Thread.current.object_id}: #{value}")
+       @last_value_hashmap.put(Thread.current.object_id, value)
+     end
+
+     def inject_cursor(query_json)
+       # ":present" means "now - 30s" to avoid grabbing partially visible data in the PIT
+       result = query_json.gsub(":last_value", @last_value.to_s).gsub(":present", now_minus_30s)
+       logger.debug("inject_cursor: injected values for ':last_value' and ':present'", :query => result)
+       result
+     end
+
+     def now_minus_30s
+       Java::java.time.Instant.now.minusSeconds(30).to_s
+     end
+   end
+ end; end; end
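Condensed view of how the tracker is driven during one run (a sketch assembled from the class above and the plugin code later in this diff; `raw_query_json` and `event` stand in for the plugin's real variables):

[source,ruby]
tracker = LogStash::Inputs::Elasticsearch::CursorTracker.new(
  last_run_metadata_path: "/tmp/last_run_value",           # illustrative path
  tracking_field:         "[event][ingested]",
  tracking_field_seed:    "1970-01-01T00:00:00.000000000Z")

query = tracker.inject_cursor(raw_query_json)   # substitutes :last_value and :present
# for every hit pushed to the queue:
tracker.record_last_value(event)                # records this thread's latest tracking-field value
# after each page of results:
tracker.checkpoint_cursor(intermediate: true)   # persists the smallest in-flight value
# once the PIT is fully consumed:
tracker.checkpoint_cursor(intermediate: false)  # persists the largest value and clears the map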
@@ -21,9 +21,10 @@ module LogStash
  @pipeline_id = plugin.pipeline_id
  end
 
- def do_run(output_queue)
- return retryable_search(output_queue) if @slices.nil? || @slices <= 1
+ def do_run(output_queue, query)
+ @query = query
 
+ return retryable_search(output_queue) if @slices.nil? || @slices <= 1
  retryable_slice_search(output_queue)
  end
 
@@ -122,6 +123,13 @@ module LogStash
  PIT_JOB = "create point in time (PIT)"
  SEARCH_AFTER_JOB = "search_after paginated search"
 
+ attr_accessor :cursor_tracker
+
+ def do_run(output_queue, query)
+ super(output_queue, query)
+ @cursor_tracker.checkpoint_cursor(intermediate: false) if @cursor_tracker
+ end
+
  def pit?(id)
  !!id&.is_a?(String)
  end
@@ -192,6 +200,8 @@ module LogStash
  end
  end
 
+ @cursor_tracker.checkpoint_cursor(intermediate: true) if @cursor_tracker
+
  logger.info("Query completed", log_details)
  end
 
@@ -73,6 +73,7 @@ class LogStash::Inputs::Elasticsearch < LogStash::Inputs::Base
 
  require 'logstash/inputs/elasticsearch/paginated_search'
  require 'logstash/inputs/elasticsearch/aggregation'
+ require 'logstash/inputs/elasticsearch/cursor_tracker'
 
  include LogStash::PluginMixins::ECSCompatibilitySupport(:disabled, :v1, :v8 => :v1)
  include LogStash::PluginMixins::ECSCompatibilitySupport::TargetCheck
@@ -124,6 +125,20 @@ class LogStash::Inputs::Elasticsearch < LogStash::Inputs::Base
  # by this pipeline input.
  config :slices, :validate => :number
 
+ # Enable tracking the value of a given field to be used as a cursor
+ # Main concerns:
+ #   * using anything other than _event.timestamp easily leads to data loss
+ #   * the first "synchronization" run can take a long time
+ config :tracking_field, :validate => :string
+
+ # Define the initial seed value of the tracking_field
+ config :tracking_field_seed, :validate => :string, :default => "1970-01-01T00:00:00.000000000Z"
+
+ # The location where the tracking field value will be stored
+ # The value is persisted after each scheduled run (and not per result)
+ # If it's not set it defaults to '${path.data}/plugins/inputs/elasticsearch/<pipeline_id>/last_run_value'
+ config :last_run_metadata_path, :validate => :string
+
  # If set, include Elasticsearch document information such as index, type, and
  # the id in the event.
  #
@@ -262,6 +277,10 @@ class LogStash::Inputs::Elasticsearch < LogStash::Inputs::Base
  # exactly once.
  config :schedule, :validate => :string
 
+ # Allow scheduled runs to overlap (enabled by default). Setting to false will
+ # only start a new scheduled run after the previous one completes.
+ config :schedule_overlap, :validate => :boolean
+
  # If set, the _source of each hit will be added nested under the target instead of at the top-level
  config :target, :validate => :field_reference
 
@@ -335,16 +354,30 @@ class LogStash::Inputs::Elasticsearch < LogStash::Inputs::Base
 
  setup_query_executor
 
+ setup_cursor_tracker
+
  @client
  end
 
  def run(output_queue)
  if @schedule
- scheduler.cron(@schedule) { @query_executor.do_run(output_queue) }
+ scheduler.cron(@schedule, :overlap => @schedule_overlap) do
+ @query_executor.do_run(output_queue, get_query_object())
+ end
  scheduler.join
  else
- @query_executor.do_run(output_queue)
+ @query_executor.do_run(output_queue, get_query_object())
+ end
+ end
+
+ def get_query_object
+ if @cursor_tracker
+ query = @cursor_tracker.inject_cursor(@query)
+ @logger.debug("new query is #{query}")
+ else
+ query = @query
  end
+ LogStash::Json.load(query)
  end
 
  ##
@@ -354,6 +387,11 @@ class LogStash::Inputs::Elasticsearch < LogStash::Inputs::Base
  event = event_from_hit(hit, root_field)
  decorate(event)
  output_queue << event
+ record_last_value(event)
+ end
+
+ def record_last_value(event)
+ @cursor_tracker.record_last_value(event) if @tracking_field
  end
 
  def event_from_hit(hit, root_field)
@@ -676,6 +714,28 @@ class LogStash::Inputs::Elasticsearch < LogStash::Inputs::Base
  end
  end
 
+ def setup_cursor_tracker
+ return unless @tracking_field
+ return unless @query_executor.is_a?(LogStash::Inputs::Elasticsearch::SearchAfter)
+
+ if @resolved_search_api != "search_after" || @response_type != "hits"
+ raise ConfigurationError.new("The `tracking_field` feature can only be used with `search_after` non-aggregation queries")
+ end
+
+ @cursor_tracker = CursorTracker.new(last_run_metadata_path: last_run_metadata_path,
+ tracking_field: @tracking_field,
+ tracking_field_seed: @tracking_field_seed)
+ @query_executor.cursor_tracker = @cursor_tracker
+ end
+
+ def last_run_metadata_path
+ return @last_run_metadata_path if @last_run_metadata_path
+
+ last_run_metadata_path = ::File.join(LogStash::SETTINGS.get_value("path.data"), "plugins", "inputs", "elasticsearch", pipeline_id, "last_run_value")
+ FileUtils.mkdir_p ::File.dirname(last_run_metadata_path)
+ last_run_metadata_path
+ end
+
  def get_transport_client_class
  # LS-core includes `elasticsearch` gem. The gem is composed of two separate gems: `elasticsearch-api` and `elasticsearch-transport`
  # And now `elasticsearch-transport` is old, instead we have `elastic-transport`.
@@ -1,7 +1,7 @@
  Gem::Specification.new do |s|
 
  s.name = 'logstash-input-elasticsearch'
- s.version = '4.21.2'
+ s.version = '4.22.0'
  s.licenses = ['Apache License (2.0)']
  s.summary = "Reads query results from an Elasticsearch cluster"
  s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program"
@@ -0,0 +1,72 @@
+ # encoding: utf-8
+ require "logstash/devutils/rspec/spec_helper"
+ require "logstash/devutils/rspec/shared_examples"
+ require "logstash/inputs/elasticsearch"
+ require "logstash/inputs/elasticsearch/cursor_tracker"
+
+ describe LogStash::Inputs::Elasticsearch::CursorTracker do
+
+   let(:last_run_metadata_path) { Tempfile.new('cursor_tracker_testing').path }
+   let(:tracking_field_seed) { "1980-01-01T23:59:59.999999999Z" }
+   let(:options) do
+     {
+       :last_run_metadata_path => last_run_metadata_path,
+       :tracking_field => "my_field",
+       :tracking_field_seed => tracking_field_seed
+     }
+   end
+
+   subject { described_class.new(**options) }
+
+   it "creating a class works" do
+     expect(subject).to be_a described_class
+   end
+
+   describe "checkpoint_cursor" do
+     before(:each) do
+       subject.checkpoint_cursor(intermediate: false) # store seed value
+       [
+         Thread.new(subject) {|subject| subject.record_last_value(LogStash::Event.new("my_field" => "2025-01-03T23:59:59.999999999Z")) },
+         Thread.new(subject) {|subject| subject.record_last_value(LogStash::Event.new("my_field" => "2025-01-01T23:59:59.999999999Z")) },
+         Thread.new(subject) {|subject| subject.record_last_value(LogStash::Event.new("my_field" => "2025-01-02T23:59:59.999999999Z")) },
+       ].each(&:join)
+     end
+     context "when doing intermediate checkpoint" do
+       it "persists the smallest value" do
+         subject.checkpoint_cursor(intermediate: true)
+         expect(IO.read(last_run_metadata_path)).to eq("2025-01-01T23:59:59.999999999Z")
+       end
+     end
+     context "when doing non-intermediate checkpoint" do
+       it "persists the largest value" do
+         subject.checkpoint_cursor(intermediate: false)
+         expect(IO.read(last_run_metadata_path)).to eq("2025-01-03T23:59:59.999999999Z")
+       end
+     end
+   end
+
+   describe "inject_cursor" do
+     let(:new_value) { "2025-01-03T23:59:59.999999999Z" }
+     let(:fake_now) { "2026-09-19T23:59:59.999999999Z" }
+
+     let(:query) do
+       %q[
+         { "query": { "range": { "event.ingested": { "gt": :last_value, "lt": :present}}}, "sort": [ { "event.ingested": {"order": "asc", "format": "strict_date_optional_time_nanos", "numeric_type" : "date_nanos" } } ] }
+       ]
+     end
+
+     before(:each) do
+       subject.record_last_value(LogStash::Event.new("my_field" => new_value))
+       subject.checkpoint_cursor(intermediate: false)
+       allow(subject).to receive(:now_minus_30s).and_return(fake_now)
+     end
+
+     it "injects the value of the cursor into json query if it contains :last_value" do
+       expect(subject.inject_cursor(query)).to match(/#{new_value}/)
+     end
+
+     it "injects current time into json query if it contains :present" do
+       expect(subject.inject_cursor(query)).to match(/#{fake_now}/)
+     end
+   end
+ end
@@ -1152,7 +1152,7 @@ describe LogStash::Inputs::Elasticsearch, :ecs_compatibility_support do
 
  context "when there's an exception" do
  before(:each) do
- allow(client).to receive(:search).and_raise RuntimeError
+ allow(client).to receive(:search).and_raise RuntimeError.new("test exception")
  end
  it 'produces no events' do
  plugin.run queue
@@ -1297,6 +1297,10 @@ describe LogStash::Inputs::Elasticsearch, :ecs_compatibility_support do
 
  let(:mock_queue) { double('queue', :<< => nil) }
 
+ before(:each) do
+ plugin.send(:setup_cursor_tracker)
+ end
+
  it 'pushes a generated event to the queue' do
  plugin.send(:push_hit, hit, mock_queue)
  expect(mock_queue).to have_received(:<<) do |event|
@@ -76,6 +76,14 @@ describe LogStash::Inputs::Elasticsearch do
  shared_examples 'secured_elasticsearch' do
  it_behaves_like 'an elasticsearch index plugin'
 
+ let(:unauth_exception_class) do
+ begin
+ Elasticsearch::Transport::Transport::Errors::Unauthorized
+ rescue
+ Elastic::Transport::Transport::Errors::Unauthorized
+ end
+ end
+
  context "incorrect auth credentials" do
 
  let(:config) do
@@ -85,7 +93,7 @@ describe LogStash::Inputs::Elasticsearch do
  let(:queue) { [] }
 
  it "fails to run the plugin" do
- expect { plugin.register }.to raise_error Elasticsearch::Transport::Transport::Errors::Unauthorized
+ expect { plugin.register }.to raise_error unauth_exception_class
  end
  end
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: logstash-input-elasticsearch
  version: !ruby/object:Gem::Version
- version: 4.21.2
+ version: 4.22.0
  platform: ruby
  authors:
  - Elastic
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2025-03-17 00:00:00.000000000 Z
+ date: 2025-04-07 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  requirement: !ruby/object:Gem::Requirement
@@ -278,6 +278,7 @@ files:
  - lib/logstash/helpers/loggable_try.rb
  - lib/logstash/inputs/elasticsearch.rb
  - lib/logstash/inputs/elasticsearch/aggregation.rb
+ - lib/logstash/inputs/elasticsearch/cursor_tracker.rb
  - lib/logstash/inputs/elasticsearch/paginated_search.rb
  - lib/logstash/inputs/elasticsearch/patches/_elasticsearch_transport_connections_selector.rb
  - lib/logstash/inputs/elasticsearch/patches/_elasticsearch_transport_http_manticore.rb
@@ -291,6 +292,7 @@ files:
  - spec/fixtures/test_certs/es.crt
  - spec/fixtures/test_certs/es.key
  - spec/fixtures/test_certs/renew.sh
+ - spec/inputs/cursor_tracker_spec.rb
  - spec/inputs/elasticsearch_spec.rb
  - spec/inputs/elasticsearch_ssl_spec.rb
  - spec/inputs/integration/elasticsearch_spec.rb
@@ -330,6 +332,7 @@ test_files:
  - spec/fixtures/test_certs/es.crt
  - spec/fixtures/test_certs/es.key
  - spec/fixtures/test_certs/renew.sh
+ - spec/inputs/cursor_tracker_spec.rb
  - spec/inputs/elasticsearch_spec.rb
  - spec/inputs/elasticsearch_ssl_spec.rb
  - spec/inputs/integration/elasticsearch_spec.rb