logstash-integration-kafka 10.0.0-java → 10.4.0-java

@@ -23,7 +23,7 @@ include::{include_path}/plugin_header.asciidoc[]
23
23
 
24
24
  Write events to a Kafka topic.
25
25
 
26
- This plugin uses Kafka Client 2.1.0. For broker compatibility, see the official https://cwiki.apache.org/confluence/display/KAFKA/Compatibility+Matrix[Kafka compatibility reference]. If the linked compatibility wiki is not up-to-date, please contact Kafka support/community to confirm compatibility.
26
+ This plugin uses Kafka Client 2.3.0. For broker compatibility, see the official https://cwiki.apache.org/confluence/display/KAFKA/Compatibility+Matrix[Kafka compatibility reference]. If the linked compatibility wiki is not up-to-date, please contact Kafka support/community to confirm compatibility.
27
27
 
28
28
  If you require features not yet available in this plugin (including client version upgrades), please file an issue with details about what you need.
29
29
 
@@ -47,15 +47,19 @@ If you want the full content of your events to be sent as json, you should set t
47
47
  }
48
48
  }
49
49
 
50
- For more information see http://kafka.apache.org/documentation.html#theproducer
50
+ For more information see https://kafka.apache.org/24/documentation.html#theproducer
51
51
 
52
- Kafka producer configuration: http://kafka.apache.org/documentation.html#newproducerconfigs
52
+ Kafka producer configuration: https://kafka.apache.org/24/documentation.html#producerconfigs
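Below is a minimal, hypothetical smoke test of this output driven directly from JRuby. It assumes a broker listening on `localhost:9092` and a Logstash environment with this plugin installed; the option values and the event payload are illustrative only.

[source,ruby]
----
# Hypothetical smoke test: exercise the output outside of a pipeline.
# Assumes a local broker on localhost:9092 and that this plugin is installed.
require "logstash/outputs/kafka"
require "logstash/event"

output = LogStash::Outputs::Kafka.new(
  "topic_id"          => "logstash",        # target topic
  "bootstrap_servers" => "localhost:9092",  # used only to bootstrap metadata
  "codec"             => "json"             # send the full event as JSON
)
output.register
output.multi_receive([LogStash::Event.new("message" => "hello kafka")])
----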
53
53
 
54
54
  [id="plugins-{type}s-{plugin}-options"]
55
55
  ==== Kafka Output Configuration Options
56
56
 
57
57
  This plugin supports the following configuration options plus the <<plugins-{type}s-{plugin}-common-options>> described later.
58
58
 
59
+ NOTE: Some of these options map to a Kafka option. Defaults usually reflect the Kafka default setting,
60
+ and might change if Kafka's producer defaults change.
61
+ See the https://kafka.apache.org/24/documentation[Kafka documentation] for more details.
62
+
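As an illustration of that mapping, here is a hedged sketch of how option values are handed to the Kafka client. It mirrors the `create_producer` code further down in this diff; the two options shown are only examples.

[source,ruby]
----
# Hedged sketch (JRuby with the bundled kafka-clients jar on the load path):
# plugin options are converted to strings and copied into java.util.Properties
# under the Kafka client's own config keys, so defaults follow the bundled client.
require 'logstash-integration-kafka_jars.rb'

retry_backoff_ms = 100      # plugin option value (matches Kafka's default)
batch_size       = 16_384   # plugin option value (matches Kafka's default)

kafka = org.apache.kafka.clients.producer.ProducerConfig
props = java.util.Properties.new
props.put(kafka::RETRY_BACKOFF_MS_CONFIG, retry_backoff_ms.to_s)
props.put(kafka::BATCH_SIZE_CONFIG, batch_size.to_s)
----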
59
63
  [cols="<,<,<",options="header",]
60
64
  |=======================================================================
61
65
  |Setting |Input type|Required
@@ -63,6 +67,7 @@ This plugin supports the following configuration options plus the <
63
67
  | <<plugins-{type}s-{plugin}-batch_size>> |<<number,number>>|No
64
68
  | <<plugins-{type}s-{plugin}-bootstrap_servers>> |<<string,string>>|No
65
69
  | <<plugins-{type}s-{plugin}-buffer_memory>> |<<number,number>>|No
70
+ | <<plugins-{type}s-{plugin}-client_dns_lookup>> |<<string,string>>|No
66
71
  | <<plugins-{type}s-{plugin}-client_id>> |<<string,string>>|No
67
72
  | <<plugins-{type}s-{plugin}-compression_type>> |<<string,string>>, one of `["none", "gzip", "snappy", "lz4"]`|No
68
73
  | <<plugins-{type}s-{plugin}-jaas_path>> |a valid filesystem path|No
@@ -73,9 +78,10 @@ This plugin supports the following configuration options plus the <
73
78
  | <<plugins-{type}s-{plugin}-message_key>> |<<string,string>>|No
74
79
  | <<plugins-{type}s-{plugin}-metadata_fetch_timeout_ms>> |<<number,number>>|No
75
80
  | <<plugins-{type}s-{plugin}-metadata_max_age_ms>> |<<number,number>>|No
81
+ | <<plugins-{type}s-{plugin}-partitioner>> |<<string,string>>|No
76
82
  | <<plugins-{type}s-{plugin}-receive_buffer_bytes>> |<<number,number>>|No
77
83
  | <<plugins-{type}s-{plugin}-reconnect_backoff_ms>> |<<number,number>>|No
78
- | <<plugins-{type}s-{plugin}-request_timeout_ms>> |<<string,string>>|No
84
+ | <<plugins-{type}s-{plugin}-request_timeout_ms>> |<<number,number>>|No
79
85
  | <<plugins-{type}s-{plugin}-retries>> |<<number,number>>|No
80
86
  | <<plugins-{type}s-{plugin}-retry_backoff_ms>> |<<number,number>>|No
81
87
  | <<plugins-{type}s-{plugin}-sasl_jaas_config>> |<<string,string>>|No
@@ -118,7 +124,7 @@ acks=all, This means the leader will wait for the full set of in-sync replicas t
118
124
  ===== `batch_size`
119
125
 
120
126
  * Value type is <<number,number>>
121
- * Default value is `16384`
127
+ * Default value is `16384`.
122
128
 
123
129
  The producer will attempt to batch records together into fewer requests whenever multiple
124
130
  records are being sent to the same partition. This helps performance on both the client
@@ -140,10 +146,21 @@ subset of brokers.
140
146
  ===== `buffer_memory`
141
147
 
142
148
  * Value type is <<number,number>>
143
- * Default value is `33554432`
149
+ * Default value is `33554432` (32MB).
144
150
 
145
151
  The total bytes of memory the producer can use to buffer records waiting to be sent to the server.
146
152
 
153
+ [id="plugins-{type}s-{plugin}-client_dns_lookup"]
154
+ ===== `client_dns_lookup`
155
+
156
+ * Value type is <<string,string>>
157
+ * Default value is `"default"`
158
+
159
+ How DNS lookups should be done. If set to `use_all_dns_ips`, when the lookup returns multiple
160
+ IP addresses for a hostname, each of them is tried in turn before the connection
161
+ fails. If the value is `resolve_canonical_bootstrap_servers_only`, each entry will be
162
+ resolved and expanded into a list of canonical names.
163
+
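A hedged illustration of when `use_all_dns_ips` matters, assuming a bootstrap hostname (hypothetical here) that resolves to several addresses:

[source,ruby]
----
# Hedged illustration only; `kafka.example.internal` is a hypothetical hostname.
require 'resolv'

ips = Resolv.getaddresses('kafka.example.internal')
# e.g. ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
#
# With client_dns_lookup => "default" the client connects to the first address
# only; with "use_all_dns_ips" it tries each returned address before giving up.
puts ips.inspect
----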
147
164
  [id="plugins-{type}s-{plugin}-client_id"]
148
165
  ===== `client_id`
149
166
 
@@ -220,7 +237,7 @@ to allow other records to be sent so that the sends can be batched together.
220
237
  ===== `max_request_size`
221
238
 
222
239
  * Value type is <<number,number>>
223
- * Default value is `1048576`
240
+ * Default value is `1048576` (1MB).
224
241
 
225
242
  The maximum size of a request
226
243
 
@@ -230,29 +247,44 @@ The maximum size of a request
230
247
  * Value type is <<string,string>>
231
248
  * There is no default value for this setting.
232
249
 
233
- The key for the message
250
+ The key for the message.
234
251
 
235
252
  [id="plugins-{type}s-{plugin}-metadata_fetch_timeout_ms"]
236
253
  ===== `metadata_fetch_timeout_ms`
237
254
 
238
255
  * Value type is <<number,number>>
239
- * Default value is `60000`
256
+ * Default value is `60000` milliseconds (60 seconds).
240
257
 
241
- the timeout setting for initial metadata request to fetch topic metadata.
258
+ The timeout setting for the initial metadata request to fetch topic metadata.
242
259
 
243
260
  [id="plugins-{type}s-{plugin}-metadata_max_age_ms"]
244
261
  ===== `metadata_max_age_ms`
245
262
 
246
263
  * Value type is <<number,number>>
247
- * Default value is `300000`
264
+ * Default value is `300000` milliseconds (5 minutes).
265
+
266
+ The max time in milliseconds before a metadata refresh is forced.
267
+
268
+ [id="plugins-{type}s-{plugin}-partitioner"]
269
+ ===== `partitioner`
248
270
 
249
- the max time in milliseconds before a metadata refresh is forced.
271
+ * Value type is <<string,string>>
272
+ * There is no default value for this setting.
273
+
274
+ The default behavior is to hash the `message_key` of an event to get the partition.
275
+ When no message key is present, the plugin picks a partition in a round-robin fashion.
276
+
277
+ Available options for choosing a partitioning strategy are as follows:
278
+
279
+ * `default` uses the default partitioner as described above
280
+ * `round_robin` distributes writes to all partitions equally, regardless of `message_key`
281
+ * `uniform_sticky` sticks to a partition for the duration of a batch, then randomly picks a new one
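For reference, a sketch of how those names resolve to Kafka 2.4 partitioner classes; it mirrors the `partitioner_class` helper added to the output later in this diff, so treat it as illustrative rather than normative.

[source,ruby]
----
# Mirrors the plugin's partitioner_class helper (shown later in this diff):
# short names map to Kafka 2.4 partitioner implementations, and any value
# containing a '.' is passed through as a fully qualified custom class.
def resolve_partitioner(name)
  case name
  when 'round_robin'    then 'org.apache.kafka.clients.producer.RoundRobinPartitioner'
  when 'uniform_sticky' then 'org.apache.kafka.clients.producer.UniformStickyPartitioner'
  when 'default'        then 'org.apache.kafka.clients.producer.internals.DefaultPartitioner'
  else
    raise ArgumentError, "unsupported partitioner: #{name.inspect}" unless name.include?('.')
    name # assume a fully qualified class name
  end
end

resolve_partitioner('round_robin')
# => "org.apache.kafka.clients.producer.RoundRobinPartitioner"
----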
250
282
 
251
283
  [id="plugins-{type}s-{plugin}-receive_buffer_bytes"]
252
284
  ===== `receive_buffer_bytes`
253
285
 
254
286
  * Value type is <<number,number>>
255
- * Default value is `32768`
287
+ * Default value is `32768` (32KB).
256
288
 
257
289
  The size of the TCP receive buffer to use when reading data
258
290
 
@@ -260,15 +292,15 @@ The size of the TCP receive buffer to use when reading data
260
292
  ===== `reconnect_backoff_ms`
261
293
 
262
294
  * Value type is <<number,number>>
263
- * Default value is `10`
295
+ * Default value is `50`.
264
296
 
265
297
  The amount of time to wait before attempting to reconnect to a given host when a connection fails.
266
298
 
267
299
  [id="plugins-{type}s-{plugin}-request_timeout_ms"]
268
300
  ===== `request_timeout_ms`
269
301
 
270
- * Value type is <<string,string>>
271
- * There is no default value for this setting.
302
+ * Value type is <<number,number>>
303
+ * Default value is `40000` milliseconds (40 seconds).
272
304
 
273
305
  The configuration controls the maximum amount of time the client will wait
274
306
  for the response of a request. If the response is not received before the timeout
@@ -295,7 +327,7 @@ A value less than zero is a configuration error.
295
327
  ===== `retry_backoff_ms`
296
328
 
297
329
  * Value type is <<number,number>>
298
- * Default value is `100`
330
+ * Default value is `100` milliseconds.
299
331
 
300
332
  The amount of time to wait before attempting to retry a failed produce request to a given topic partition.
301
333
 
@@ -348,7 +380,7 @@ Security protocol to use, which can be either of PLAINTEXT,SSL,SASL_PLAINTEXT,SA
348
380
  ===== `send_buffer_bytes`
349
381
 
350
382
  * Value type is <<number,number>>
351
- * Default value is `131072`
383
+ * Default value is `131072` (128KB).
352
384
 
353
385
  The size of the TCP send buffer to use when sending data.
354
386
 
@@ -1,8 +1,8 @@
1
1
  # AUTOGENERATED BY THE GRADLE SCRIPT. DO NOT EDIT.
2
2
 
3
3
  require 'jar_dependencies'
4
- require_jar('org.apache.kafka', 'kafka-clients', '2.3.0')
5
- require_jar('com.github.luben', 'zstd-jni', '1.4.2-1')
6
- require_jar('org.slf4j', 'slf4j-api', '1.7.26')
4
+ require_jar('org.apache.kafka', 'kafka-clients', '2.4.1')
5
+ require_jar('com.github.luben', 'zstd-jni', '1.4.3-1')
6
+ require_jar('org.slf4j', 'slf4j-api', '1.7.28')
7
7
  require_jar('org.lz4', 'lz4-java', '1.6.0')
8
8
  require_jar('org.xerial.snappy', 'snappy-java', '1.1.7.3')
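A hedged way to confirm at runtime which client the gem actually loaded (assumes a JRuby session with the gem installed; `AppInfoParser` is the Kafka client's own version holder):

[source,ruby]
----
# Hedged sketch: print the bundled Kafka client version from JRuby.
require 'logstash-integration-kafka_jars.rb'

java_import 'org.apache.kafka.common.utils.AppInfoParser'
puts AppInfoParser.getVersion   # expected to print "2.4.1" for this release
----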
@@ -53,7 +53,7 @@ class LogStash::Inputs::Kafka < LogStash::Inputs::Base
53
53
  default :codec, 'plain'
54
54
 
55
55
  # The frequency in milliseconds that the consumer offsets are committed to Kafka.
56
- config :auto_commit_interval_ms, :validate => :string, :default => "5000"
56
+ config :auto_commit_interval_ms, :validate => :number, :default => 5000 # Kafka default
57
57
  # What to do when there is no initial offset in Kafka or if an offset is out of range:
58
58
  #
59
59
  # * earliest: automatically reset the offset to the earliest offset
@@ -70,35 +70,40 @@ class LogStash::Inputs::Kafka < LogStash::Inputs::Base
70
70
  # Automatically check the CRC32 of the records consumed. This ensures no on-the-wire or on-disk
71
71
  # corruption to the messages occurred. This check adds some overhead, so it may be
72
72
  # disabled in cases seeking extreme performance.
73
- config :check_crcs, :validate => :string
73
+ config :check_crcs, :validate => :boolean, :default => true
74
+ # How DNS lookups should be done. If set to `use_all_dns_ips`, when the lookup returns multiple
75
+ # IP addresses for a hostname, each of them is tried in turn before the connection
76
+ # fails. If the value is `resolve_canonical_bootstrap_servers_only`, each entry will be
77
+ # resolved and expanded into a list of canonical names.
78
+ config :client_dns_lookup, :validate => ["default", "use_all_dns_ips", "resolve_canonical_bootstrap_servers_only"], :default => "default"
74
79
  # The id string to pass to the server when making requests. The purpose of this
75
80
  # is to be able to track the source of requests beyond just ip/port by allowing
76
81
  # a logical application name to be included.
77
82
  config :client_id, :validate => :string, :default => "logstash"
78
83
  # Close idle connections after the number of milliseconds specified by this config.
79
- config :connections_max_idle_ms, :validate => :string
84
+ config :connections_max_idle_ms, :validate => :number, :default => 540_000 # (9m) Kafka default
80
85
  # Ideally you should have as many threads as the number of partitions for a perfect
81
86
  # balance — more threads than partitions means that some threads will be idle
82
87
  config :consumer_threads, :validate => :number, :default => 1
83
88
  # If true, periodically commit to Kafka the offsets of messages already returned by the consumer.
84
89
  # This committed offset will be used when the process fails as the position from
85
90
  # which the consumption will begin.
86
- config :enable_auto_commit, :validate => :string, :default => "true"
91
+ config :enable_auto_commit, :validate => :boolean, :default => true
87
92
  # Whether records from internal topics (such as offsets) should be exposed to the consumer.
88
93
  # If set to true the only way to receive records from an internal topic is subscribing to it.
89
94
  config :exclude_internal_topics, :validate => :string
90
95
  # The maximum amount of data the server should return for a fetch request. This is not an
91
96
  # absolute maximum, if the first message in the first non-empty partition of the fetch is larger
92
97
  # than this value, the message will still be returned to ensure that the consumer can make progress.
93
- config :fetch_max_bytes, :validate => :string
98
+ config :fetch_max_bytes, :validate => :number, :default => 52_428_800 # (50MB) Kafka default
94
99
  # The maximum amount of time the server will block before answering the fetch request if
95
100
  # there isn't sufficient data to immediately satisfy `fetch_min_bytes`. This
96
101
  # should be less than or equal to the timeout used in `poll_timeout_ms`
97
- config :fetch_max_wait_ms, :validate => :string
102
+ config :fetch_max_wait_ms, :validate => :number, :default => 500 # Kafka default
98
103
  # The minimum amount of data the server should return for a fetch request. If insufficient
99
104
  # data is available the request will wait for that much data to accumulate
100
105
  # before answering the request.
101
- config :fetch_min_bytes, :validate => :string
106
+ config :fetch_min_bytes, :validate => :number
102
107
  # The identifier of the group this consumer belongs to. Consumer group is a single logical subscriber
103
108
  # that happens to be made up of multiple processors. Messages in a topic will be distributed to all
104
109
  # Logstash instances with the same `group_id`
@@ -108,48 +113,55 @@ class LogStash::Inputs::Kafka < LogStash::Inputs::Base
108
113
  # consumers join or leave the group. The value must be set lower than
109
114
  # `session.timeout.ms`, but typically should be set no higher than 1/3 of that value.
110
115
  # It can be adjusted even lower to control the expected time for normal rebalances.
111
- config :heartbeat_interval_ms, :validate => :string
116
+ config :heartbeat_interval_ms, :validate => :number, :default => 3000 # Kafka default
117
+ # Controls how to read messages written transactionally. If set to read_committed, consumer.poll()
118
+ # will only return transactional messages which have been committed. If set to read_uncommitted
119
+ # (the default), consumer.poll() will return all messages, even transactional messages which have
120
+ # been aborted. Non-transactional messages will be returned unconditionally in either mode.
121
+ config :isolation_level, :validate => ["read_uncommitted", "read_committed"], :default => "read_uncommitted" # Kafka default
112
122
  # Java Class used to deserialize the record's key
113
123
  config :key_deserializer_class, :validate => :string, :default => "org.apache.kafka.common.serialization.StringDeserializer"
114
124
  # The maximum delay between invocations of poll() when using consumer group management. This places
115
125
  # an upper bound on the amount of time that the consumer can be idle before fetching more records.
116
126
  # If poll() is not called before expiration of this timeout, then the consumer is considered failed and
117
127
  # the group will rebalance in order to reassign the partitions to another member.
118
- # The value of the configuration `request_timeout_ms` must always be larger than max_poll_interval_ms
119
- config :max_poll_interval_ms, :validate => :string
128
+ config :max_poll_interval_ms, :validate => :number, :default => 300_000 # (5m) Kafka default
120
129
  # The maximum amount of data per-partition the server will return. The maximum total memory used for a
121
130
  # request will be <code>#partitions * max.partition.fetch.bytes</code>. This size must be at least
122
131
  # as large as the maximum message size the server allows or else it is possible for the producer to
123
132
  # send messages larger than the consumer can fetch. If that happens, the consumer can get stuck trying
124
133
  # to fetch a large message on a certain partition.
125
- config :max_partition_fetch_bytes, :validate => :string
134
+ config :max_partition_fetch_bytes, :validate => :number, :default => 1_048_576 # (1MB) Kafka default
126
135
  # The maximum number of records returned in a single call to poll().
127
- config :max_poll_records, :validate => :string
136
+ config :max_poll_records, :validate => :number, :default => 500 # Kafka default
128
137
  # The period of time in milliseconds after which we force a refresh of metadata even if
129
138
  # we haven't seen any partition leadership changes to proactively discover any new brokers or partitions
130
- config :metadata_max_age_ms, :validate => :string
131
- # The class name of the partition assignment strategy that the client will use to distribute
132
- # partition ownership amongst consumer instances
139
+ config :metadata_max_age_ms, :validate => :number, :default => 300_000 # (5m) Kafka default
140
+ # The name of the partition assignment strategy that the client uses to distribute
141
+ # partition ownership amongst consumer instances; supported options are `range`,
142
+ # `round_robin`, `sticky` and `cooperative_sticky`
143
+ # (for backwards compatibility setting the class name directly is supported).
133
144
  config :partition_assignment_strategy, :validate => :string
134
145
  # The size of the TCP receive buffer (SO_RCVBUF) to use when reading data.
135
- config :receive_buffer_bytes, :validate => :string
136
- # The amount of time to wait before attempting to reconnect to a given host.
146
+ # If the value is `-1`, the OS default will be used.
147
+ config :receive_buffer_bytes, :validate => :number, :default => 32_768 # (32KB) Kafka default
148
+ # The base amount of time to wait before attempting to reconnect to a given host.
137
149
  # This avoids repeatedly connecting to a host in a tight loop.
138
- # This backoff applies to all requests sent by the consumer to the broker.
139
- config :reconnect_backoff_ms, :validate => :string
140
- # The configuration controls the maximum amount of time the client will wait
141
- # for the response of a request. If the response is not received before the timeout
142
- # elapses the client will resend the request if necessary or fail the request if
143
- # retries are exhausted.
144
- config :request_timeout_ms, :validate => :string
150
+ # This backoff applies to all connection attempts by the client to a broker.
151
+ config :reconnect_backoff_ms, :validate => :number, :default => 50 # Kafka default
152
+ # The configuration controls the maximum amount of time the client will wait for the response of a request.
153
+ # If the response is not received before the timeout elapses the client will resend the request if necessary
154
+ # or fail the request if retries are exhausted.
155
+ config :request_timeout_ms, :validate => :number, :default => 40_000 # Kafka default
145
156
  # The amount of time to wait before attempting to retry a failed fetch request
146
157
  # to a given topic partition. This avoids repeated fetching-and-failing in a tight loop.
147
- config :retry_backoff_ms, :validate => :string
148
- # The size of the TCP send buffer (SO_SNDBUF) to use when sending data
149
- config :send_buffer_bytes, :validate => :string
158
+ config :retry_backoff_ms, :validate => :number, :default => 100 # Kafka default
159
+ # The size of the TCP send buffer (SO_SNDBUF) to use when sending data.
160
+ # If the value is -1, the OS default will be used.
161
+ config :send_buffer_bytes, :validate => :number, :default => 131_072 # (128KB) Kafka default
150
162
  # The timeout after which, if the `poll_timeout_ms` is not invoked, the consumer is marked dead
151
163
  # and a rebalance operation is triggered for the group identified by `group_id`
152
- config :session_timeout_ms, :validate => :string
164
+ config :session_timeout_ms, :validate => :number, :default => 10_000 # (10s) Kafka default
153
165
  # Java Class used to deserialize the record's value
154
166
  config :value_deserializer_class, :validate => :string, :default => "org.apache.kafka.common.serialization.StringDeserializer"
155
167
  # A list of topics to subscribe to, defaults to ["logstash"].
@@ -159,6 +171,11 @@ class LogStash::Inputs::Kafka < LogStash::Inputs::Base
159
171
  config :topics_pattern, :validate => :string
160
172
  # Time kafka consumer will wait to receive new messages from topics
161
173
  config :poll_timeout_ms, :validate => :number, :default => 100
174
+ # The rack id string to pass to the server when making requests. This is used
175
+ # as a selector for a rack, region, or datacenter. Corresponds to the broker.rack parameter
176
+ # in the broker configuration.
177
+ # Only has an effect with Kafka 2.4+ brokers that have the broker.rack setting configured. Ignored otherwise.
178
+ config :client_rack, :validate => :string
162
179
  # The truststore type.
163
180
  config :ssl_truststore_type, :validate => :string
164
181
  # The JKS truststore path to validate the Kafka broker's certificate.
@@ -269,9 +286,7 @@ class LogStash::Inputs::Kafka < LogStash::Inputs::Base
269
286
  end
270
287
  end
271
288
  # Manual offset commit
272
- if @enable_auto_commit == "false"
273
- consumer.commitSync
274
- end
289
+ consumer.commitSync if @enable_auto_commit.eql?(false)
275
290
  end
276
291
  rescue org.apache.kafka.common.errors.WakeupException => e
277
292
  raise e if !stop?
@@ -287,32 +302,35 @@ class LogStash::Inputs::Kafka < LogStash::Inputs::Base
287
302
  props = java.util.Properties.new
288
303
  kafka = org.apache.kafka.clients.consumer.ConsumerConfig
289
304
 
290
- props.put(kafka::AUTO_COMMIT_INTERVAL_MS_CONFIG, auto_commit_interval_ms)
305
+ props.put(kafka::AUTO_COMMIT_INTERVAL_MS_CONFIG, auto_commit_interval_ms.to_s) unless auto_commit_interval_ms.nil?
291
306
  props.put(kafka::AUTO_OFFSET_RESET_CONFIG, auto_offset_reset) unless auto_offset_reset.nil?
292
307
  props.put(kafka::BOOTSTRAP_SERVERS_CONFIG, bootstrap_servers)
293
- props.put(kafka::CHECK_CRCS_CONFIG, check_crcs) unless check_crcs.nil?
308
+ props.put(kafka::CHECK_CRCS_CONFIG, check_crcs.to_s) unless check_crcs.nil?
309
+ props.put(kafka::CLIENT_DNS_LOOKUP_CONFIG, client_dns_lookup)
294
310
  props.put(kafka::CLIENT_ID_CONFIG, client_id)
295
- props.put(kafka::CONNECTIONS_MAX_IDLE_MS_CONFIG, connections_max_idle_ms) unless connections_max_idle_ms.nil?
296
- props.put(kafka::ENABLE_AUTO_COMMIT_CONFIG, enable_auto_commit)
311
+ props.put(kafka::CONNECTIONS_MAX_IDLE_MS_CONFIG, connections_max_idle_ms.to_s) unless connections_max_idle_ms.nil?
312
+ props.put(kafka::ENABLE_AUTO_COMMIT_CONFIG, enable_auto_commit.to_s)
297
313
  props.put(kafka::EXCLUDE_INTERNAL_TOPICS_CONFIG, exclude_internal_topics) unless exclude_internal_topics.nil?
298
- props.put(kafka::FETCH_MAX_BYTES_CONFIG, fetch_max_bytes) unless fetch_max_bytes.nil?
299
- props.put(kafka::FETCH_MAX_WAIT_MS_CONFIG, fetch_max_wait_ms) unless fetch_max_wait_ms.nil?
300
- props.put(kafka::FETCH_MIN_BYTES_CONFIG, fetch_min_bytes) unless fetch_min_bytes.nil?
314
+ props.put(kafka::FETCH_MAX_BYTES_CONFIG, fetch_max_bytes.to_s) unless fetch_max_bytes.nil?
315
+ props.put(kafka::FETCH_MAX_WAIT_MS_CONFIG, fetch_max_wait_ms.to_s) unless fetch_max_wait_ms.nil?
316
+ props.put(kafka::FETCH_MIN_BYTES_CONFIG, fetch_min_bytes.to_s) unless fetch_min_bytes.nil?
301
317
  props.put(kafka::GROUP_ID_CONFIG, group_id)
302
- props.put(kafka::HEARTBEAT_INTERVAL_MS_CONFIG, heartbeat_interval_ms) unless heartbeat_interval_ms.nil?
318
+ props.put(kafka::HEARTBEAT_INTERVAL_MS_CONFIG, heartbeat_interval_ms.to_s) unless heartbeat_interval_ms.nil?
319
+ props.put(kafka::ISOLATION_LEVEL_CONFIG, isolation_level)
303
320
  props.put(kafka::KEY_DESERIALIZER_CLASS_CONFIG, key_deserializer_class)
304
- props.put(kafka::MAX_PARTITION_FETCH_BYTES_CONFIG, max_partition_fetch_bytes) unless max_partition_fetch_bytes.nil?
305
- props.put(kafka::MAX_POLL_RECORDS_CONFIG, max_poll_records) unless max_poll_records.nil?
306
- props.put(kafka::MAX_POLL_INTERVAL_MS_CONFIG, max_poll_interval_ms) unless max_poll_interval_ms.nil?
307
- props.put(kafka::METADATA_MAX_AGE_CONFIG, metadata_max_age_ms) unless metadata_max_age_ms.nil?
308
- props.put(kafka::PARTITION_ASSIGNMENT_STRATEGY_CONFIG, partition_assignment_strategy) unless partition_assignment_strategy.nil?
309
- props.put(kafka::RECEIVE_BUFFER_CONFIG, receive_buffer_bytes) unless receive_buffer_bytes.nil?
310
- props.put(kafka::RECONNECT_BACKOFF_MS_CONFIG, reconnect_backoff_ms) unless reconnect_backoff_ms.nil?
311
- props.put(kafka::REQUEST_TIMEOUT_MS_CONFIG, request_timeout_ms) unless request_timeout_ms.nil?
312
- props.put(kafka::RETRY_BACKOFF_MS_CONFIG, retry_backoff_ms) unless retry_backoff_ms.nil?
313
- props.put(kafka::SEND_BUFFER_CONFIG, send_buffer_bytes) unless send_buffer_bytes.nil?
314
- props.put(kafka::SESSION_TIMEOUT_MS_CONFIG, session_timeout_ms) unless session_timeout_ms.nil?
321
+ props.put(kafka::MAX_PARTITION_FETCH_BYTES_CONFIG, max_partition_fetch_bytes.to_s) unless max_partition_fetch_bytes.nil?
322
+ props.put(kafka::MAX_POLL_RECORDS_CONFIG, max_poll_records.to_s) unless max_poll_records.nil?
323
+ props.put(kafka::MAX_POLL_INTERVAL_MS_CONFIG, max_poll_interval_ms.to_s) unless max_poll_interval_ms.nil?
324
+ props.put(kafka::METADATA_MAX_AGE_CONFIG, metadata_max_age_ms.to_s) unless metadata_max_age_ms.nil?
325
+ props.put(kafka::PARTITION_ASSIGNMENT_STRATEGY_CONFIG, partition_assignment_strategy_class) unless partition_assignment_strategy.nil?
326
+ props.put(kafka::RECEIVE_BUFFER_CONFIG, receive_buffer_bytes.to_s) unless receive_buffer_bytes.nil?
327
+ props.put(kafka::RECONNECT_BACKOFF_MS_CONFIG, reconnect_backoff_ms.to_s) unless reconnect_backoff_ms.nil?
328
+ props.put(kafka::REQUEST_TIMEOUT_MS_CONFIG, request_timeout_ms.to_s) unless request_timeout_ms.nil?
329
+ props.put(kafka::RETRY_BACKOFF_MS_CONFIG, retry_backoff_ms.to_s) unless retry_backoff_ms.nil?
330
+ props.put(kafka::SEND_BUFFER_CONFIG, send_buffer_bytes.to_s) unless send_buffer_bytes.nil?
331
+ props.put(kafka::SESSION_TIMEOUT_MS_CONFIG, session_timeout_ms.to_s) unless session_timeout_ms.nil?
315
332
  props.put(kafka::VALUE_DESERIALIZER_CLASS_CONFIG, value_deserializer_class)
333
+ props.put(kafka::CLIENT_RACK_CONFIG, client_rack) unless client_rack.nil?
316
334
 
317
335
  props.put("security.protocol", security_protocol) unless security_protocol.nil?
318
336
 
@@ -334,6 +352,24 @@ class LogStash::Inputs::Kafka < LogStash::Inputs::Base
334
352
  end
335
353
  end
336
354
 
355
+ def partition_assignment_strategy_class
356
+ case partition_assignment_strategy
357
+ when 'range'
358
+ 'org.apache.kafka.clients.consumer.RangeAssignor'
359
+ when 'round_robin'
360
+ 'org.apache.kafka.clients.consumer.RoundRobinAssignor'
361
+ when 'sticky'
362
+ 'org.apache.kafka.clients.consumer.StickyAssignor'
363
+ when 'cooperative_sticky'
364
+ 'org.apache.kafka.clients.consumer.CooperativeStickyAssignor'
365
+ else
366
+ unless partition_assignment_strategy.index('.')
367
+ raise LogStash::ConfigurationError, "unsupported partition_assignment_strategy: #{partition_assignment_strategy.inspect}"
368
+ end
369
+ partition_assignment_strategy # assume a fully qualified class-name
370
+ end
371
+ end
372
+
337
373
  def set_trustore_keystore_config(props)
338
374
  props.put("ssl.truststore.type", ssl_truststore_type) unless ssl_truststore_type.nil?
339
375
  props.put("ssl.truststore.location", ssl_truststore_location) unless ssl_truststore_location.nil?
@@ -348,15 +384,15 @@ class LogStash::Inputs::Kafka < LogStash::Inputs::Base
348
384
  end
349
385
 
350
386
  def set_sasl_config(props)
351
- java.lang.System.setProperty("java.security.auth.login.config",jaas_path) unless jaas_path.nil?
352
- java.lang.System.setProperty("java.security.krb5.conf",kerberos_config) unless kerberos_config.nil?
387
+ java.lang.System.setProperty("java.security.auth.login.config", jaas_path) unless jaas_path.nil?
388
+ java.lang.System.setProperty("java.security.krb5.conf", kerberos_config) unless kerberos_config.nil?
353
389
 
354
- props.put("sasl.mechanism",sasl_mechanism)
390
+ props.put("sasl.mechanism", sasl_mechanism)
355
391
  if sasl_mechanism == "GSSAPI" && sasl_kerberos_service_name.nil?
356
392
  raise LogStash::ConfigurationError, "sasl_kerberos_service_name must be specified when SASL mechanism is GSSAPI"
357
393
  end
358
394
 
359
- props.put("sasl.kerberos.service.name",sasl_kerberos_service_name) unless sasl_kerberos_service_name.nil?
395
+ props.put("sasl.kerberos.service.name", sasl_kerberos_service_name) unless sasl_kerberos_service_name.nil?
360
396
  props.put("sasl.jaas.config", sasl_jaas_config) unless sasl_jaas_config.nil?
361
397
  end
362
398
  end #class LogStash::Inputs::Kafka
@@ -3,8 +3,6 @@ require 'logstash/outputs/base'
3
3
  require 'java'
4
4
  require 'logstash-integration-kafka_jars.rb'
5
5
 
6
- java_import org.apache.kafka.clients.producer.ProducerRecord
7
-
8
6
  # Write events to a Kafka topic. This uses the Kafka Producer API to write messages to a topic on
9
7
  # the broker.
10
8
  #
@@ -49,6 +47,9 @@ java_import org.apache.kafka.clients.producer.ProducerRecord
49
47
  #
50
48
  # Kafka producer configuration: http://kafka.apache.org/documentation.html#newproducerconfigs
51
49
  class LogStash::Outputs::Kafka < LogStash::Outputs::Base
50
+
51
+ java_import org.apache.kafka.clients.producer.ProducerRecord
52
+
52
53
  declare_threadsafe!
53
54
 
54
55
  config_name 'kafka'
@@ -66,7 +67,7 @@ class LogStash::Outputs::Kafka < LogStash::Outputs::Base
66
67
  # The producer will attempt to batch records together into fewer requests whenever multiple
67
68
  # records are being sent to the same partition. This helps performance on both the client
68
69
  # and the server. This configuration controls the default batch size in bytes.
69
- config :batch_size, :validate => :number, :default => 16384
70
+ config :batch_size, :validate => :number, :default => 16_384 # Kafka default
70
71
  # This is for bootstrapping and the producer will only use it for getting metadata (topics,
71
72
  # partitions and replicas). The socket connections for sending the actual data will be
72
73
  # established based on the broker information returned in the metadata. The format is
@@ -74,10 +75,15 @@ class LogStash::Outputs::Kafka < LogStash::Outputs::Base
74
75
  # subset of brokers.
75
76
  config :bootstrap_servers, :validate => :string, :default => 'localhost:9092'
76
77
  # The total bytes of memory the producer can use to buffer records waiting to be sent to the server.
77
- config :buffer_memory, :validate => :number, :default => 33554432
78
+ config :buffer_memory, :validate => :number, :default => 33_554_432 # (32M) Kafka default
78
79
  # The compression type for all data generated by the producer.
79
80
  # The default is none (i.e. no compression). Valid values are none, gzip, or snappy.
80
81
  config :compression_type, :validate => ["none", "gzip", "snappy", "lz4"], :default => "none"
82
+ # How DNS lookups should be done. If set to `use_all_dns_ips`, when the lookup returns multiple
83
+ # IP addresses for a hostname, each of them is tried in turn before the connection
84
+ # fails. If the value is `resolve_canonical_bootstrap_servers_only`, each entry will be
85
+ # resolved and expanded into a list of canonical names.
86
+ config :client_dns_lookup, :validate => ["default", "use_all_dns_ips", "resolve_canonical_bootstrap_servers_only"], :default => "default"
81
87
  # The id string to pass to the server when making requests.
82
88
  # The purpose of this is to be able to track the source of requests beyond just
83
89
  # ip/port by allowing a logical application name to be included with the request
@@ -91,24 +97,26 @@ class LogStash::Outputs::Kafka < LogStash::Outputs::Base
91
97
  # This setting accomplishes this by adding a small amount of artificial delay—that is,
92
98
  # rather than immediately sending out a record the producer will wait for up to the given delay
93
99
  # to allow other records to be sent so that the sends can be batched together.
94
- config :linger_ms, :validate => :number, :default => 0
100
+ config :linger_ms, :validate => :number, :default => 0 # Kafka default
95
101
  # The maximum size of a request
96
- config :max_request_size, :validate => :number, :default => 1048576
102
+ config :max_request_size, :validate => :number, :default => 1_048_576 # (1MB) Kafka default
97
103
  # The key for the message
98
104
  config :message_key, :validate => :string
99
105
  # the timeout setting for initial metadata request to fetch topic metadata.
100
- config :metadata_fetch_timeout_ms, :validate => :number, :default => 60000
106
+ config :metadata_fetch_timeout_ms, :validate => :number, :default => 60_000
101
107
  # the max time in milliseconds before a metadata refresh is forced.
102
- config :metadata_max_age_ms, :validate => :number, :default => 300000
108
+ config :metadata_max_age_ms, :validate => :number, :default => 300_000 # (5m) Kafka default
109
+ # Partitioner to use - can be `default`, `uniform_sticky`, `round_robin` or a fully qualified class name of a custom partitioner.
110
+ config :partitioner, :validate => :string
103
111
  # The size of the TCP receive buffer to use when reading data
104
- config :receive_buffer_bytes, :validate => :number, :default => 32768
112
+ config :receive_buffer_bytes, :validate => :number, :default => 32_768 # (32KB) Kafka default
105
113
  # The amount of time to wait before attempting to reconnect to a given host when a connection fails.
106
- config :reconnect_backoff_ms, :validate => :number, :default => 10
114
+ config :reconnect_backoff_ms, :validate => :number, :default => 50 # Kafka default
107
115
  # The configuration controls the maximum amount of time the client will wait
108
116
  # for the response of a request. If the response is not received before the timeout
109
117
  # elapses the client will resend the request if necessary or fail the request if
110
118
  # retries are exhausted.
111
- config :request_timeout_ms, :validate => :string
119
+ config :request_timeout_ms, :validate => :number, :default => 40_000 # (40s) Kafka default
112
120
  # The default retry behavior is to retry until successful. To prevent data loss,
113
121
  # the use of this setting is discouraged.
114
122
  #
@@ -119,9 +127,9 @@ class LogStash::Outputs::Kafka < LogStash::Outputs::Base
119
127
  # A value less than zero is a configuration error.
120
128
  config :retries, :validate => :number
121
129
  # The amount of time to wait before attempting to retry a failed produce request to a given topic partition.
122
- config :retry_backoff_ms, :validate => :number, :default => 100
130
+ config :retry_backoff_ms, :validate => :number, :default => 100 # Kafka default
123
131
  # The size of the TCP send buffer to use when sending data.
124
- config :send_buffer_bytes, :validate => :number, :default => 131072
132
+ config :send_buffer_bytes, :validate => :number, :default => 131_072 # (128KB) Kafka default
125
133
  # The truststore type.
126
134
  config :ssl_truststore_type, :validate => :string
127
135
  # The JKS truststore path to validate the Kafka broker's certificate.
@@ -183,7 +191,7 @@ class LogStash::Outputs::Kafka < LogStash::Outputs::Base
183
191
  raise ConfigurationError, "A negative retry count (#{@retries}) is not valid. Must be a value >= 0"
184
192
  end
185
193
 
186
- @logger.warn("Kafka output is configured with finite retry. This instructs Logstash to LOSE DATA after a set number of send attempts fails. If you do not want to lose data if Kafka is down, then you must remove the retry setting.", :retries => @retries)
194
+ logger.warn("Kafka output is configured with finite retry. This instructs Logstash to LOSE DATA after a set number of send attempts fails. If you do not want to lose data if Kafka is down, then you must remove the retry setting.", :retries => @retries)
187
195
  end
188
196
 
189
197
 
@@ -201,8 +209,6 @@ class LogStash::Outputs::Kafka < LogStash::Outputs::Base
201
209
  end
202
210
  end
203
211
 
204
- # def register
205
-
206
212
  def prepare(record)
207
213
  # This output is threadsafe, so we need to keep a batch per thread.
208
214
  @thread_batch_map[Thread.current].add(record)
@@ -268,7 +274,7 @@ class LogStash::Outputs::Kafka < LogStash::Outputs::Base
268
274
  result = future.get()
269
275
  rescue => e
270
276
  # TODO(sissel): Add metric to count failures, possibly by exception type.
271
- logger.warn("KafkaProducer.send() failed: #{e}", :exception => e)
277
+ logger.warn("producer send failed", :exception => e.class, :message => e.message)
272
278
  failures << batch[i]
273
279
  end
274
280
  end
@@ -302,10 +308,9 @@ class LogStash::Outputs::Kafka < LogStash::Outputs::Base
302
308
  end
303
309
  prepare(record)
304
310
  rescue LogStash::ShutdownSignal
305
- @logger.debug('Kafka producer got shutdown signal')
311
+ logger.debug('producer received shutdown signal')
306
312
  rescue => e
307
- @logger.warn('kafka producer threw exception, restarting',
308
- :exception => e)
313
+ logger.warn('producer threw exception, restarting', :exception => e.class, :message => e.message)
309
314
  end
310
315
 
311
316
  def create_producer
@@ -318,14 +323,19 @@ class LogStash::Outputs::Kafka < LogStash::Outputs::Base
318
323
  props.put(kafka::BOOTSTRAP_SERVERS_CONFIG, bootstrap_servers)
319
324
  props.put(kafka::BUFFER_MEMORY_CONFIG, buffer_memory.to_s)
320
325
  props.put(kafka::COMPRESSION_TYPE_CONFIG, compression_type)
326
+ props.put(kafka::CLIENT_DNS_LOOKUP_CONFIG, client_dns_lookup)
321
327
  props.put(kafka::CLIENT_ID_CONFIG, client_id) unless client_id.nil?
322
328
  props.put(kafka::KEY_SERIALIZER_CLASS_CONFIG, key_serializer)
323
329
  props.put(kafka::LINGER_MS_CONFIG, linger_ms.to_s)
324
330
  props.put(kafka::MAX_REQUEST_SIZE_CONFIG, max_request_size.to_s)
325
- props.put(kafka::METADATA_MAX_AGE_CONFIG, metadata_max_age_ms) unless metadata_max_age_ms.nil?
331
+ props.put(kafka::METADATA_MAX_AGE_CONFIG, metadata_max_age_ms.to_s) unless metadata_max_age_ms.nil?
332
+ unless partitioner.nil?
333
+ props.put(kafka::PARTITIONER_CLASS_CONFIG, partitioner = partitioner_class)
334
+ logger.debug('producer configured using partitioner', :partitioner_class => partitioner)
335
+ end
326
336
  props.put(kafka::RECEIVE_BUFFER_CONFIG, receive_buffer_bytes.to_s) unless receive_buffer_bytes.nil?
327
- props.put(kafka::RECONNECT_BACKOFF_MS_CONFIG, reconnect_backoff_ms) unless reconnect_backoff_ms.nil?
328
- props.put(kafka::REQUEST_TIMEOUT_MS_CONFIG, request_timeout_ms) unless request_timeout_ms.nil?
337
+ props.put(kafka::RECONNECT_BACKOFF_MS_CONFIG, reconnect_backoff_ms.to_s) unless reconnect_backoff_ms.nil?
338
+ props.put(kafka::REQUEST_TIMEOUT_MS_CONFIG, request_timeout_ms.to_s) unless request_timeout_ms.nil?
329
339
  props.put(kafka::RETRIES_CONFIG, retries.to_s) unless retries.nil?
330
340
  props.put(kafka::RETRY_BACKOFF_MS_CONFIG, retry_backoff_ms.to_s)
331
341
  props.put(kafka::SEND_BUFFER_CONFIG, send_buffer_bytes.to_s)
@@ -342,7 +352,6 @@ class LogStash::Outputs::Kafka < LogStash::Outputs::Base
342
352
  set_sasl_config(props)
343
353
  end
344
354
 
345
-
346
355
  org.apache.kafka.clients.producer.KafkaProducer.new(props)
347
356
  rescue => e
348
357
  logger.error("Unable to create Kafka producer from given configuration",
@@ -352,13 +361,31 @@ class LogStash::Outputs::Kafka < LogStash::Outputs::Base
352
361
  end
353
362
  end
354
363
 
364
+ def partitioner_class
365
+ case partitioner
366
+ when 'round_robin'
367
+ 'org.apache.kafka.clients.producer.RoundRobinPartitioner'
368
+ when 'uniform_sticky'
369
+ 'org.apache.kafka.clients.producer.UniformStickyPartitioner'
370
+ when 'default'
371
+ 'org.apache.kafka.clients.producer.internals.DefaultPartitioner'
372
+ else
373
+ unless partitioner.index('.')
374
+ raise LogStash::ConfigurationError, "unsupported partitioner: #{partitioner.inspect}"
375
+ end
376
+ partitioner # assume a fully qualified class-name
377
+ end
378
+ end
379
+
355
380
  def set_trustore_keystore_config(props)
356
- if ssl_truststore_location.nil?
357
- raise LogStash::ConfigurationError, "ssl_truststore_location must be set when SSL is enabled"
381
+ unless ssl_endpoint_identification_algorithm.to_s.strip.empty?
382
+ if ssl_truststore_location.nil?
383
+ raise LogStash::ConfigurationError, "ssl_truststore_location must be set when SSL is enabled"
384
+ end
385
+ props.put("ssl.truststore.type", ssl_truststore_type) unless ssl_truststore_type.nil?
386
+ props.put("ssl.truststore.location", ssl_truststore_location)
387
+ props.put("ssl.truststore.password", ssl_truststore_password.value) unless ssl_truststore_password.nil?
358
388
  end
359
- props.put("ssl.truststore.type", ssl_truststore_type) unless ssl_truststore_type.nil?
360
- props.put("ssl.truststore.location", ssl_truststore_location)
361
- props.put("ssl.truststore.password", ssl_truststore_password.value) unless ssl_truststore_password.nil?
362
389
 
363
390
  # Client auth stuff
364
391
  props.put("ssl.keystore.type", ssl_keystore_type) unless ssl_keystore_type.nil?
@@ -369,15 +396,15 @@ class LogStash::Outputs::Kafka < LogStash::Outputs::Base
369
396
  end
370
397
 
371
398
  def set_sasl_config(props)
372
- java.lang.System.setProperty("java.security.auth.login.config",jaas_path) unless jaas_path.nil?
373
- java.lang.System.setProperty("java.security.krb5.conf",kerberos_config) unless kerberos_config.nil?
399
+ java.lang.System.setProperty("java.security.auth.login.config", jaas_path) unless jaas_path.nil?
400
+ java.lang.System.setProperty("java.security.krb5.conf", kerberos_config) unless kerberos_config.nil?
374
401
 
375
402
  props.put("sasl.mechanism",sasl_mechanism)
376
403
  if sasl_mechanism == "GSSAPI" && sasl_kerberos_service_name.nil?
377
404
  raise LogStash::ConfigurationError, "sasl_kerberos_service_name must be specified when SASL mechanism is GSSAPI"
378
405
  end
379
406
 
380
- props.put("sasl.kerberos.service.name",sasl_kerberos_service_name) unless sasl_kerberos_service_name.nil?
407
+ props.put("sasl.kerberos.service.name", sasl_kerberos_service_name) unless sasl_kerberos_service_name.nil?
381
408
  props.put("sasl.jaas.config", sasl_jaas_config) unless sasl_jaas_config.nil?
382
409
  end
383
410