ruby-kafka 0.1.3 → 0.1.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: a2754b8ff5d028b2883c5657b91d1ab13881fa27
- data.tar.gz: 71ddafeb0cabe9f03ec326037b6391070372befa
+ metadata.gz: 19434a542527d7e7ccc4b2c5d768f42a6f6e6837
+ data.tar.gz: 23392b9a3567d5ef80acf3a31ed6f7d97e4c0648
  SHA512:
- metadata.gz: 09421c5b62a1c90e6be2ca25f2b1823d92384e7176d48252c787b8c4de73feac02f7362ac36f98e9da8b1a44968591302e4ee9020f4af5ef02baa8f852f89b7d
- data.tar.gz: 361c720fd7bc8af0e162c463c0f24a31a5d92b942e23e6b1eeb17bb1843516ffb0b7a3778b43ee411ff492ea69274ba2e7624ab5d99631089f3da333a85fa5ed
+ metadata.gz: 6acc0c5c6b58a6cd6e44573d26d86e1c1eb6d1222cb37f81d820514cc8bb0cfd9d8821e05aff8e46b0172247cc4803dcffbffe32f7cde32274cdea4f1fd3941e
+ data.tar.gz: f4d57945d640265d13c75efaa91c3f3ba182b5d33f0c33adadbea8d6f41a9b0d50d215a6841e3dc2f9431585a05193879e8185044ad4c35b015ac3ee8e05c71c
data/README.md CHANGED
@@ -2,7 +2,7 @@
 
  [![Circle CI](https://circleci.com/gh/zendesk/ruby-kafka.svg?style=shield)](https://circleci.com/gh/zendesk/ruby-kafka/tree/master)
 
- A Ruby client library for the Kafka distributed log system. The focus of this library will be operational simplicity, with good logging and metrics that can make debugging issues easier.
+ A Ruby client library for [Apache Kafka](http://kafka.apache.org/), a distributed log and message bus. The focus of this library will be operational simplicity, with good logging and metrics that can make debugging issues easier.
 
  Currently, only the Producer API has been implemented, but a fully-fledged Consumer implementation compatible with Kafka 0.9 is on the roadmap.
 
@@ -24,6 +24,10 @@ Or install it yourself as:
 
  ## Usage
 
+ Please see the [documentation site](http://www.rubydoc.info/gems/ruby-kafka) for detailed documentation on the latest release.
+
+ An example of a fairly simple Kafka producer:
+
  ```ruby
  require "kafka"
 
@@ -64,6 +68,16 @@ producer.send_messages
 
  Read the docs for [Kafka::Producer](http://www.rubydoc.info/gems/ruby-kafka/Kafka/Producer) for more details.
 
+ ### Buffering and Error Handling
+
+ The producer is designed for resilience in the face of temporary network errors, Kafka broker failovers, and other issues that prevent the client from writing messages to the destination topics. It does this by employing local, in-memory buffers. Only when messages are acknowledged by a Kafka broker will they be removed from the buffer.
+
+ Typically, you'd configure the producer to retry failed attempts at sending messages, but sometimes all retries are exhausted. In that case, `Kafka::FailedToSendMessages` is raised from `Kafka::Producer#send_messages`. If you wish to have your application be resilient to this happening (e.g. if you're logging to Kafka from a web application) you can rescue this exception. The failed messages are still retained in the buffer, so a subsequent call to `#send_messages` will still attempt to send them.
+
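+ For example, you could rescue the exception and carry on, leaving delivery of the still-buffered messages to a later call to `#send_messages`. A minimal sketch, using the `producer` from the example above:
+
+ ```ruby
+ begin
+   producer.send_messages
+ rescue Kafka::FailedToSendMessages
+   # All retries were exhausted, but the messages are still held in the
+   # producer's local buffer; a later call to #send_messages will try to
+   # deliver them again.
+ end
+ ```
+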
+ Note that there's a maximum buffer size; pass in a different value for `max_buffer_size` when calling `#get_producer` in order to configure this.
+
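+ For example, assuming the `kafka` client from the example above (the buffer size here is an arbitrary illustration):
+
+ ```ruby
+ # Allow up to 10,000 messages to accumulate in the local buffer before
+ # they have been acknowledged by Kafka.
+ producer = kafka.get_producer(max_buffer_size: 10_000)
+ ```
+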
+ A final note on buffers: local buffers give resilience against broker and network failures, and allow higher throughput due to message batching, but they also trade off consistency guarantees for higher availability and resilience. If your local process dies while messages are buffered, those messages will be lost. If you require high levels of consistency, you should call `#send_messages` immediately after `#produce`.
+
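+ A sketch of that pattern (the message value and topic name here are just examples):
+
+ ```ruby
+ # Deliver immediately instead of waiting for a batch to accumulate,
+ # trading throughput for a smaller window of potential data loss.
+ producer.produce("important event", topic: "critical-events")
+ producer.send_messages
+ ```
+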
  ### Understanding Timeouts
 
  It's important to understand how timeouts work if you have a latency sensitive application. This library allows configuring timeouts on different levels:
@@ -94,7 +108,7 @@ After checking out the repo, run `bin/setup` to install dependencies. Then, run
 
  The current stable release is v0.1. This release is running in production at Zendesk, but it's still not recommended that you use it when data loss is unacceptable. It will take a little while until all edge cases have been uncovered and handled.
 
- The API may still be changed in v0.2.
+ The API may still be changed in v0.2.
 
  ### v0.2: Stable Producer API
 
@@ -114,7 +128,7 @@ There are a few existing Kafka clients in Ruby:
  * [Hermann](https://github.com/reiseburo/hermann) wraps the C library [librdkafka](https://github.com/edenhill/librdkafka) and seems to be very efficient, but its API and mode of operation is too intrusive for our needs.
  * [jruby-kafka](https://github.com/joekiller/jruby-kafka) is a great option if you're running on JRuby.
 
- We needed a robust client that could be used from our existing Ruby apps, allowed our Ops to monitor operation, and provided flexible error handling. There didn't exist such a client, hence this project.
+ We needed a robust client that could be used from our existing Ruby apps, allowed our Ops to monitor operation, and provided flexible error handling. There didn't exist such a client, hence this project.
 
  ## Contributing
 
data/Rakefile CHANGED
@@ -3,4 +3,4 @@ require "rspec/core/rake_task"
 
  RSpec::Core::RakeTask.new(:spec)
 
- task :default => :spec
+ task default: :spec
examples/simple-consumer.rb ADDED
@@ -0,0 +1,48 @@
+ # Consumes lines from a Kafka partition and writes them to STDOUT.
+ #
+ # You need to define the environment variable KAFKA_BROKERS for this
+ # to work, e.g.
+ #
+ #     export KAFKA_BROKERS=localhost:9092
+ #
+
+ $LOAD_PATH.unshift(File.expand_path("../../lib", __FILE__))
+
+ require "kafka"
+
+ # We don't want log output to clutter the console. Replace `StringIO.new`
+ # with e.g. `$stderr` if you want to see what's happening under the hood.
+ logger = Logger.new(StringIO.new)
+
+ brokers = ENV.fetch("KAFKA_BROKERS").split(",")
+
+ # Make sure to create this topic in your Kafka cluster or configure the
+ # cluster to auto-create topics.
+ topic = "text"
+
+ kafka = Kafka.new(
+   seed_brokers: brokers,
+   client_id: "simple-consumer",
+   socket_timeout: 20,
+   logger: logger,
+ )
+
+ begin
+   offset = :latest
+   partition = 0
+
+   loop do
+     messages = kafka.fetch_messages(
+       topic: topic,
+       partition: partition,
+       offset: offset
+     )
+
+     messages.each do |message|
+       puts message.value
+       offset = message.offset + 1
+     end
+   end
+ ensure
+   kafka.close
+ end
@@ -1,4 +1,10 @@
  # Reads lines from STDIN, writing them to Kafka.
+ #
+ # You need to define the environment variable KAFKA_BROKERS for this
+ # to work, e.g.
+ #
+ # export KAFKA_BROKERS=localhost:9092
+ #
 
  $LOAD_PATH.unshift(File.expand_path("../../lib", __FILE__))
 
@@ -9,7 +15,7 @@ brokers = ENV.fetch("KAFKA_BROKERS").split(",")
 
  # Make sure to create this topic in your Kafka cluster or configure the
  # cluster to auto-create topics.
- topic = "random-messages"
+ topic = "text"
 
  kafka = Kafka.new(
    seed_brokers: brokers,
@@ -8,7 +8,7 @@ module Kafka
    # Kafka protocol specification.
    #
    # See https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
-   class ProtocolError < StandardError
+   class ProtocolError < Error
    end
 
    # This indicates that a message's contents do not match its CRC.
@@ -15,14 +15,20 @@ module Kafka
        @logger = logger
      end
 
+     # @return [String]
      def to_s
        "#{@connection} (node_id=#{@node_id.inspect})"
      end
 
+     # @return [nil]
      def disconnect
        @connection.close
      end
 
+     # Fetches cluster metadata from the broker.
+     #
+     # @param (see Kafka::Protocol::TopicMetadataRequest#initialize)
+     # @return [Kafka::Protocol::MetadataResponse]
      def fetch_metadata(**options)
        request = Protocol::TopicMetadataRequest.new(**options)
        response_class = Protocol::MetadataResponse
@@ -30,6 +36,32 @@ module Kafka
        @connection.send_request(request, response_class)
      end
 
+     # Fetches messages from a specified topic and partition.
+     #
+     # @param (see Kafka::Protocol::FetchRequest#initialize)
+     # @return [Kafka::Protocol::FetchResponse]
+     def fetch_messages(**options)
+       request = Protocol::FetchRequest.new(**options)
+       response_class = Protocol::FetchResponse
+
+       @connection.send_request(request, response_class)
+     end
+
+     # Lists the offset of the specified topics and partitions.
+     #
+     # @param (see Kafka::Protocol::ListOffsetRequest#initialize)
+     # @return [Kafka::Protocol::ListOffsetResponse]
+     def list_offsets(**options)
+       request = Protocol::ListOffsetRequest.new(**options)
+       response_class = Protocol::ListOffsetResponse
+
+       @connection.send_request(request, response_class)
+     end
+
+     # Produces a set of messages to the broker.
+     #
+     # @param (see Kafka::Protocol::ProduceRequest#initialize)
+     # @return [Kafka::Protocol::ProduceResponse]
      def produce(**options)
        request = Protocol::ProduceRequest.new(**options)
        response_class = request.requires_acks? ? Protocol::ProduceResponse : nil
@@ -1,124 +1,38 @@
  require "kafka/broker"
 
  module Kafka
-
-   # A broker pool represents the set of brokers in a cluster. It needs to be initialized
-   # with a non-empty list of seed brokers. The first seed broker that the pool can connect
-   # to will be asked for the cluster metadata, which allows the pool to map topic
-   # partitions to the current leader for those partitions.
    class BrokerPool
-
-     # Initializes a broker pool with a set of seed brokers.
-     #
-     # The pool will try to fetch cluster metadata from one of the brokers.
-     #
-     # @param seed_brokers [Array<String>]
-     # @param client_id [String]
-     # @param logger [Logger]
-     # @param connect_timeout [Integer, nil] see {Connection#initialize}.
-     # @param socket_timeout [Integer, nil] see {Connection#initialize}.
-     def initialize(seed_brokers:, client_id:, logger:, connect_timeout: nil, socket_timeout: nil)
+     def initialize(client_id:, connect_timeout: nil, socket_timeout: nil, logger:)
        @client_id = client_id
-       @logger = logger
        @connect_timeout = connect_timeout
        @socket_timeout = socket_timeout
+       @logger = logger
        @brokers = {}
-       @seed_brokers = seed_brokers
-       @cluster_info = nil
      end
 
-     def mark_as_stale!
-       @cluster_info = nil
-     end
+     def connect(host, port, node_id: nil)
+       return @brokers.fetch(node_id) if @brokers.key?(node_id)
 
-     # Finds the broker acting as the leader of the given topic and partition.
-     #
-     # @param topic [String]
-     # @param partition [Integer]
-     # @return [Broker] the broker that's currently leader.
-     def get_leader(topic, partition)
-       get_broker(get_leader_id(topic, partition))
-     end
+       broker = Broker.connect(
+         host: host,
+         port: port,
+         node_id: node_id,
+         client_id: @client_id,
+         connect_timeout: @connect_timeout,
+         socket_timeout: @socket_timeout,
+         logger: @logger,
+       )
 
-     def partitions_for(topic)
-       cluster_info.partitions_for(topic)
-     end
+       @brokers[node_id] = broker unless node_id.nil?
 
-     def topics
-       cluster_info.topics.map(&:topic_name)
+       broker
      end
 
-     def shutdown
+     def close
        @brokers.each do |id, broker|
          @logger.info "Disconnecting broker #{id}"
          broker.disconnect
        end
      end
-
-     private
-
-     def get_leader_id(topic, partition)
-       cluster_info.find_leader_id(topic, partition)
-     end
-
-     def get_broker(broker_id)
-       @brokers[broker_id] ||= connect_to_broker(broker_id)
-     end
-
-     def cluster_info
-       @cluster_info ||= fetch_cluster_info
-     end
-
-     # Fetches the cluster metadata.
-     #
-     # This is used to update the partition leadership information, among other things.
-     # The methods will go through each node listed in `seed_brokers`, connecting to the
-     # first one that is available. This node will be queried for the cluster metadata.
-     #
-     # @raise [ConnectionError] if none of the nodes in `seed_brokers` are available.
-     # @return [Protocol::MetadataResponse] the cluster metadata.
-     def fetch_cluster_info
-       @seed_brokers.each do |node|
-         @logger.info "Trying to initialize broker pool from node #{node}"
-
-         begin
-           host, port = node.split(":", 2)
-
-           broker = Broker.connect(
-             host: host,
-             port: port.to_i,
-             client_id: @client_id,
-             socket_timeout: @socket_timeout,
-             logger: @logger,
-           )
-
-           cluster_info = broker.fetch_metadata
-
-           @logger.info "Initialized broker pool with brokers: #{cluster_info.brokers.inspect}"
-
-           return cluster_info
-         rescue Error => e
-           @logger.error "Failed to fetch metadata from #{node}: #{e}"
-         ensure
-           broker.disconnect unless broker.nil?
-         end
-       end
-
-       raise ConnectionError, "Could not connect to any of the seed brokers: #{@seed_brokers.inspect}"
-     end
-
-     def connect_to_broker(broker_id)
-       broker_info = cluster_info.find_broker(broker_id)
-
-       Broker.connect(
-         host: broker_info.host,
-         port: broker_info.port,
-         node_id: broker_info.node_id,
-         client_id: @client_id,
-         connect_timeout: @connect_timeout,
-         socket_timeout: @socket_timeout,
-         logger: @logger,
-       )
-     end
    end
  end
@@ -1,9 +1,12 @@
- require "kafka/broker_pool"
+ require "kafka/cluster"
  require "kafka/producer"
+ require "kafka/fetched_message"
+ require "kafka/fetch_operation"
 
  module Kafka
    class Client
      DEFAULT_CLIENT_ID = "ruby-kafka"
+     DEFAULT_LOGGER = Logger.new("/dev/null")
 
      # Initializes a new Kafka client.
      #
@@ -21,15 +24,20 @@ module Kafka
      # connections. See {BrokerPool#initialize}.
      #
      # @return [Client]
-     def initialize(seed_brokers:, client_id: DEFAULT_CLIENT_ID, logger:, connect_timeout: nil, socket_timeout: nil)
+     def initialize(seed_brokers:, client_id: DEFAULT_CLIENT_ID, logger: DEFAULT_LOGGER, connect_timeout: nil, socket_timeout: nil)
        @logger = logger
 
-       @broker_pool = BrokerPool.new(
-         seed_brokers: seed_brokers,
+       broker_pool = BrokerPool.new(
          client_id: client_id,
-         logger: logger,
          connect_timeout: connect_timeout,
          socket_timeout: socket_timeout,
+         logger: logger,
+       )
+
+       @cluster = Cluster.new(
+         seed_brokers: seed_brokers,
+         broker_pool: broker_pool,
+         logger: logger,
        )
      end
 
@@ -38,20 +46,94 @@ module Kafka
      # `options` are passed to {Producer#initialize}.
      #
      # @see Producer#initialize
-     # @return [Producer] the Kafka producer.
+     # @return [Kafka::Producer] the Kafka producer.
      def get_producer(**options)
-       Producer.new(broker_pool: @broker_pool, logger: @logger, **options)
+       Producer.new(cluster: @cluster, logger: @logger, **options)
+     end
+
+     # Fetches a batch of messages from a single partition. Note that it's possible
+     # to get back empty batches.
+     #
+     # The starting point for the fetch can be configured with the `:offset` argument.
+     # If you pass a number, the fetch will start at that offset. However, there are
+     # two special Symbol values that can be passed instead:
+     #
+     # * `:earliest` — the first offset in the partition.
+     # * `:latest` — the next offset that will be written to, effectively making the
+     #   call block until there is a new message in the partition.
+     #
+     # The Kafka protocol specifies the numeric values of these two options: -2 and -1,
+     # respectively. You can also pass in these numbers directly.
+     #
+     # ## Example
+     #
+     # When enumerating the messages in a partition, you typically fetch batches
+     # sequentially.
+     #
+     #     offset = :earliest
+     #
+     #     loop do
+     #       messages = kafka.fetch_messages(
+     #         topic: "my-topic",
+     #         partition: 42,
+     #         offset: offset,
+     #       )
+     #
+     #       messages.each do |message|
+     #         puts message.offset, message.key, message.value
+     #
+     #         # Set the next offset that should be read to be the subsequent
+     #         # offset.
+     #         offset = message.offset + 1
+     #       end
+     #     end
+     #
+     # See a working example in `examples/simple-consumer.rb`.
+     #
+     # @note This API is still alpha level. Don't try to use it in production.
+     #
+     # @param topic [String] the topic that messages should be fetched from.
+     #
+     # @param partition [Integer] the partition that messages should be fetched from.
+     #
+     # @param offset [Integer, Symbol] the offset to start reading from. Default is
+     #   the latest offset.
+     #
+     # @param max_wait_time [Integer] the maximum amount of time to wait before
+     #   the server responds, in seconds.
+     #
+     # @param min_bytes [Integer] the minimum number of bytes to wait for. If set to
+     #   zero, the broker will respond immediately, but the response may be empty.
+     #   The default is 1 byte, which means that the broker will respond as soon as
+     #   a message is written to the partition.
+     #
+     # @param max_bytes [Integer] the maximum number of bytes to include in the
+     #   response message set. Default is 1 MB. You need to set this higher if you
+     #   expect messages to be larger than this.
+     #
+     # @return [Array<Kafka::FetchedMessage>] the messages returned from the broker.
+     def fetch_messages(topic:, partition:, offset: :latest, max_wait_time: 5, min_bytes: 1, max_bytes: 1048576)
+       operation = FetchOperation.new(
+         cluster: @cluster,
+         logger: @logger,
+         min_bytes: min_bytes,
+         max_wait_time: max_wait_time,
+       )
+
+       operation.fetch_from_partition(topic, partition, offset: offset, max_bytes: max_bytes)
+
+       operation.execute
      end
 
      # Lists all topics in the cluster.
      #
      # @return [Array<String>] the list of topic names.
      def topics
-       @broker_pool.topics
+       @cluster.topics
      end
 
      def close
-       @broker_pool.shutdown
+       @cluster.disconnect
      end
    end
  end