ruby-kafka-ec2 0.1.7 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 66c71213189c16f43593597889adfb2ef3d0f4757cbf6ae7bb600310a2f88855
- data.tar.gz: 71a0256485b92b88ed891e76bd3c3cbd85420b70332835cd33d4aeaf748e207d
+ metadata.gz: e7732059807b7aad8dfe8df2758fa0dc9ad8f8063adaf9cf71b615c8384e74aa
+ data.tar.gz: feb725eb274ff28e6b3e5827f02d9c1b406c127f7109c6ec6a4001054665b125
  SHA512:
- metadata.gz: c88ff1e2fe4ebd92fe6b9a13a87fd5e9582c09228ae17b29cdbd4c0186f83e22d6c96bb6f193fbd5ae2ed624979fe3c33a80c6db88e72a0800b2f878ffbfc7b1
- data.tar.gz: 78cd8b945be174b64cdd261686683521160ed7c2c7334c6586726d1803abea4e0aad96d8470304ef324fe7b04281b7048d1591794771178c2bf71d0597f07ca7
+ metadata.gz: eade4b284de35a438d52b4f18928cb9e287fe65064c1a7baec50545627bd7996835150326e88499b2ee13d4499dd1aa1093471fdd829b8b2a545eecee1799a5b
+ data.tar.gz: fed5703514978a1986a720678d3590c29fcc9359802fe9b98a3a6632f7dcd28cf0f43151f19a8c5a1e1d9a609667a13598b3780589ff57480d928955f6219f8f
data/README.md CHANGED
@@ -24,9 +24,9 @@ Or install it yourself as:

  ### Kafka::EC2::MixedInstanceAssignmentStrategy

- `Kafka::EC2::MixedInstanceAssignmentStrategy` is an assignor for auto-scaling groups with mixed instance policies. The throughputs of consumers usually depend on instance families and availability zones. For example, if your application writes data to a database, the throughputs of consumers running on the same availability zone as the writer DB instance is higher.
+ `Kafka::EC2::MixedInstanceAssignmentStrategy` is an assignor for auto-scaling groups with mixed instance policies. The throughputs of consumers usually depend on instance families and availability zones. For example, if your application writes data to a database, the throughputs of consumers running on the same availability zone as that of the writer DB instance is higher.

- To assign more partitions to consumers with high throughputs, you have to define `Kafka::EC2::MixedInstanceAssignmentStrategyFactory` first like below:
+ To assign more partitions to consumers with high throughputs, you have to initialize `Kafka::EC2::MixedInstanceAssignmentStrategy` first like below:

  ```ruby
  require "aws-sdk-rds"
@@ -34,7 +34,7 @@ require "kafka"
  require "kafka/ec2"

  rds = Aws::RDS::Client.new(region: "ap-northeast-1")
- assignment_strategy_factory = Kafka::EC2::MixedInstanceAssignmentStrategyFactory.new(
+ assignment_strategy = Kafka::EC2::MixedInstanceAssignmentStrategy.new(
  instance_family_weights: {
  "r4" => 1.00,
  "r5" => 1.20,
@@ -68,19 +68,17 @@ assignment_strategy_factory = Kafka::EC2::MixedInstanceAssignmentStrategyFactory

  In the preceding example, consumers running on c5 instances will have 1.5x as many partitions compared to consumers running on r4 instances. In a similar way, if the writer DB instance is in ap-northeast-1a, consumers in ap-northeast-1a will have 4x as many partitions compared to consumers in ap-northeast-1c.

- You can use `Kafka::EC2::MixedInstanceAssignmentStrategy` by specifying the factory to `Kafka::EC2.with_assignment_strategy_factory` and creating a consumer in the block:
+ You can use `Kafka::EC2::MixedInstanceAssignmentStrategy` by specifying it to `Kafka#consumer`:


  ```ruby
- consumer = Kafka::EC2.with_assignment_strategy_factory(assignment_strategy_factory) do
- kafka.consumer(group_id: ENV["KAFKA_CONSUMER_GROUP_ID"])
- end
+ consumer = kafka.consumer(group_id: ENV["KAFKA_CONSUMER_GROUP_ID"], assignment_strategy: assignment_strategy)
  ```

  You can also specify weights for each combination of availability zones and instance families:

  ```ruby
- assignment_strategy_factory = Kafka::EC2::MixedInstanceAssignmentStrategyFactory.new(
+ assignment_strategy = Kafka::EC2::MixedInstanceAssignmentStrategy.new(
  weights: ->() {
  db_cluster = rds.describe_db_clusters(filters: [
  { name: "db-cluster-id", values: [ENV["RDS_CLUSTER"]] },
@@ -121,7 +119,7 @@ assignment_strategy_factory = Kafka::EC2::MixedInstanceAssignmentStrategyFactory
  The strategy also has the option `partition_weights`. This is useful when the topic has some skewed partitions. Suppose the partition with ID 0 of the topic "foo" receives twice as many records as other partitions. To reduce the number of partitions assigned to the consumer that consumes the partition with ID 0, specify `partition_weights` like below:

  ```ruby
- assignment_strategy_factory = Kafka::EC2::MixedInstanceAssignmentStrategyFactory.new(
+ assignment_strategy = Kafka::EC2::MixedInstanceAssignmentStrategy.new(
  partition_weights: {
  "foo" => {
  0 => 2,
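The README's "1.5x" and "4x" claims can be checked with a small standalone sketch. It assumes a consumer's capacity is its instance-family weight multiplied by its availability-zone weight, each defaulting to 1, which matches the fallback visible in `calculate_capacity`. The `"c5" => 1.5` weight is inferred from the "1.5x" sentence, the availability-zone weights are invented to reproduce the "4x" sentence, and the helper names `capacity` and `ideal_shares` are hypothetical. Deriving the family from the text before the "." in the instance type is also an assumption.

```ruby
# Hypothetical weights: "c5" => 1.5 is inferred from the README's "1.5x" claim,
# and the availability-zone weights are made up to reproduce its "4x" claim.
INSTANCE_FAMILY_WEIGHTS = { "r4" => 1.0, "r5" => 1.2, "c5" => 1.5 }.freeze
AVAILABILITY_ZONE_WEIGHTS = { "ap-northeast-1a" => 1.0, "ap-northeast-1c" => 0.25 }.freeze

# Capacity of one consumer: family weight times zone weight, each defaulting to 1.
def capacity(instance_type, az)
  family = instance_type.split(".").first # assumption: family precedes the "."
  (INSTANCE_FAMILY_WEIGHTS.fetch(family, 1) * AVAILABILITY_ZONE_WEIGHTS.fetch(az, 1)).to_f
end

# Ideal (fractional) number of partitions per consumer, proportional to capacity.
def ideal_shares(consumers, partition_count)
  capacities = consumers.transform_values { |(type, az)| capacity(type, az) }
  total = capacities.values.sum
  capacities.transform_values { |c| partition_count * c / total }
end

consumers = {
  "member-r4" => ["r4.large", "ap-northeast-1a"],
  "member-c5" => ["c5.large", "ap-northeast-1a"],
}
shares = ideal_shares(consumers, 10)
puts shares
```

The real strategy hands out whole partitions, so actual counts only approximate these fractional shares.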
data/lib/kafka/ec2/mixed_instance_assignment_strategy.rb CHANGED
@@ -6,10 +6,8 @@ require "kafka/protocol/member_assignment"
  module Kafka
  class EC2
  class MixedInstanceAssignmentStrategy
- # metadata is a byte sequence created by Kafka::Protocol::ConsumerGroupProtocol.encode
- attr_accessor :member_id_to_metadata
+ DELIMITER = ","

- # @param cluster [Kafka::Cluster]
  # @param instance_family_weights [Hash{String => Numeric}, Proc] a hash whose the key
  # is the instance family and whose value is the weight. If the object is a proc,
  # it must returns such a hash and the proc is called every time the method "assign"
@@ -23,22 +21,35 @@ module Kafka
  # instance_family_weights or availability_zone_weights. If the object is a proc,
  # it must returns such a hash and the proc is called every time the method "assign"
  # is called.
- def initialize(cluster:, instance_family_weights: {}, availability_zone_weights: {}, weights: {}, partition_weights: {})
- @cluster = cluster
+ def initialize(instance_family_weights: {}, availability_zone_weights: {}, weights: {}, partition_weights: {})
  @instance_family_weights = instance_family_weights
  @availability_zone_weights = availability_zone_weights
  @weights = weights
  @partition_weights = partition_weights
  end

+ def protocol_name
+ "mixedinstance"
+ end
+
+ def user_data
+ Net::HTTP.start("169.254.169.254", 80) do |http|
+ [
+ http.get("/latest/meta-data/instance-id").body,
+ http.get("/latest/meta-data/instance-type").body,
+ http.get("/latest/meta-data/placement/availability-zone").body,
+ ].join(DELIMITER)
+ end
+ end
+
  # Assign the topic partitions to the group members.
  #
  # @param members [Array<String>] member ids
  # @param topics [Array<String>] topics
  # @return [Hash{String => Protocol::MemberAssignment}] a hash mapping member
  # ids to assignments.
- def assign(members:, topics:)
- group_assignment = {}
+ def call(cluster:, members:, partitions:)
+ member_id_to_partitions = Hash.new { |h, k| h[k] = [] }
  instance_id_to_capacity = Hash.new(0)
  instance_id_to_member_ids = Hash.new { |h, k| h[k] = [] }
  total_capacity = 0
@@ -47,10 +58,8 @@ module Kafka
  instance_family_to_capacity = @instance_family_weights.is_a?(Proc) ? @instance_family_weights.call() : @instance_family_weights
  az_to_capacity = @availability_zone_weights.is_a?(Proc) ? @availability_zone_weights.call() : @availability_zone_weights
  weights = @weights.is_a?(Proc) ? @weights.call() : @weights
- members.each do |member_id|
- group_assignment[member_id] = Protocol::MemberAssignment.new
-
- instance_id, instance_type, az = member_id_to_metadata[member_id].split(",")
+ members.each do |member_id, metadata|
+ instance_id, instance_type, az = metadata.user_data.split(DELIMITER)
  instance_id_to_member_ids[instance_id] << member_id
  member_id_to_instance_id[member_id] = instance_id
  capacity = calculate_capacity(instance_type, az, instance_family_to_capacity, az_to_capacity, weights)
@@ -58,17 +67,8 @@ module Kafka
  total_capacity += capacity
  end

- topic_partitions = topics.flat_map do |topic|
- begin
- partitions = @cluster.partitions_for(topic).map(&:partition_id)
- rescue UnknownTopicOrPartition
- raise UnknownTopicOrPartition, "unknown topic #{topic}"
- end
- Array.new(partitions.count) { topic }.zip(partitions)
- end
-
- partition_weights = build_partition_weights(topics)
- partition_weight_per_capacity = topic_partitions.sum { |topic, partition| partition_weights.dig(topic, partition) } / total_capacity
+ partition_weights = build_partition_weights(partitions)
+ partition_weight_per_capacity = partitions.sum { |partition| partition_weights.dig(partition.topic, partition.partition_id) } / total_capacity

  last_index = 0
  member_id_to_acceptable_partition_weight = {}
@@ -77,12 +77,12 @@ module Kafka
  member_ids = instance_id_to_member_ids[instance_id]
  member_ids.each do |member_id|
  acceptable_partition_weight = capacity * partition_weight_per_capacity / member_ids.size
- while last_index < topic_partitions.size
- topic, partition = topic_partitions[last_index]
- partition_weight = partition_weights.dig(topic, partition)
+ while last_index < partitions.size
+ partition = partitions[last_index]
+ partition_weight = partition_weights.dig(partition.topic, partition.partition_id)
  break if acceptable_partition_weight - partition_weight < 0

- group_assignment[member_id].assign(topic, [partition])
+ member_id_to_partitions[member_id] << partition
  acceptable_partition_weight -= partition_weight

  last_index += 1
@@ -93,7 +93,7 @@ module Kafka
  end
  end

- while last_index < topic_partitions.size
+ while last_index < partitions.size
  max_acceptable_partition_weight = member_id_to_acceptable_partition_weight.values.max
  member_ids = member_id_to_acceptable_partition_weight.select { |_, w| w == max_acceptable_partition_weight }.keys
  if member_ids.size == 1
@@ -101,17 +101,17 @@ module Kafka
  else
  member_id = member_ids.max_by { |id| instance_id_to_total_acceptable_partition_weight[member_id_to_instance_id[id]] }
  end
- topic, partition = topic_partitions[last_index]
- group_assignment[member_id].assign(topic, [partition])
+ partition = partitions[last_index]
+ member_id_to_partitions[member_id] << partition

- partition_weight = partition_weights.dig(topic, partition)
+ partition_weight = partition_weights.dig(partition.topic, partition.partition_id)
  member_id_to_acceptable_partition_weight[member_id] -= partition_weight
  instance_id_to_total_acceptable_partition_weight[member_id_to_instance_id[member_id]] -= partition_weight

  last_index += 1
  end

- group_assignment
+ member_id_to_partitions
  rescue Kafka::LeaderNotAvailable
  sleep 1
  retry
@@ -126,12 +126,12 @@ module Kafka
  (capacity || instance_family_to_capacity.fetch(instance_family, 1) * az_to_capacity.fetch(az, 1)).to_f
  end

- def build_partition_weights(topics)
+ def build_partition_weights(partitions)
  # Duplicate the weights to not destruct @partition_weights or the return value of @partition_weights
- weights = (@partition_weights.is_a?(Proc) ? @partition_weights.call() : @partition_weights).dup
- topics.each do |t|
- weights[t] = weights[t].dup || {}
- weights[t].default = 1
+ weights = (@partition_weights.is_a?(Proc) ? @partition_weights.call : @partition_weights).dup
+ partitions.map(&:topic).uniq.each do |topic|
+ weights[topic] = weights[topic].dup || {}
+ weights[topic].default = 1
  end

  weights
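The defaulting in `build_partition_weights` can be tried in isolation. The sketch below is a simplified stand-in rather than the gem's method: the helper name `partition_weights_for` is hypothetical, and it takes topic names directly instead of partition objects. It shows that partitions without a configured weight resolve to 1 through `Hash#default`, and that the caller's hash is left untouched.

```ruby
# Hypothetical helper mirroring build_partition_weights: copy the configured
# weights and give every topic a per-partition default weight of 1.
def partition_weights_for(configured, topics)
  weights = configured.dup
  topics.uniq.each do |topic|
    # nil.dup returns nil, so topics without configured weights fall back to {}.
    weights[topic] = weights[topic].dup || {}
    weights[topic].default = 1
  end
  weights
end

configured = { "foo" => { 0 => 2 } }
weights = partition_weights_for(configured, ["foo", "bar", "foo"])

# Total weight of partitions 0..3 of "foo": one skewed partition plus three defaults.
foo_total = (0..3).sum { |partition_id| weights.dig("foo", partition_id) }
puts foo_total
puts weights.dig("bar", 3)
```

Because the inner hashes are duplicated before `default` is set, the configured hash keeps its original `nil` default.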
data/lib/kafka/ec2/version.rb CHANGED
@@ -1,5 +1,5 @@
  module Kafka
  class EC2
- VERSION = "0.1.7"
+ VERSION = "0.2.0"
  end
  end
data/lib/kafka/ec2.rb CHANGED
@@ -1,23 +1,7 @@
- require "kafka/ec2/ext/consumer_group"
- require "kafka/ec2/ext/protocol/join_group_request"
- require "kafka/ec2/mixed_instance_assignment_strategy_factory"
+ require "kafka/ec2/mixed_instance_assignment_strategy"
  require "kafka/ec2/version"

  module Kafka
  class EC2
- class << self
- attr_reader :assignment_strategy_factory
-
- def with_assignment_strategy_factory(factory)
- @assignment_strategy_factory = factory
- yield
- ensure
- @assignment_strategy_factory = nil
- end
-
- def assignment_strategy_classes
- @assignment_strategy_classes ||= {}
- end
- end
  end
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: ruby-kafka-ec2
  version: !ruby/object:Gem::Version
- version: 0.1.7
+ version: 0.2.0
  platform: ruby
  authors:
  - abicky
- autorequire:
+ autorequire:
  bindir: exe
  cert_chain: []
- date: 2021-03-16 00:00:00.000000000 Z
+ date: 2022-03-29 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: ruby-kafka
@@ -89,10 +89,7 @@ files:
  - bin/console
  - bin/setup
  - lib/kafka/ec2.rb
- - lib/kafka/ec2/ext/consumer_group.rb
- - lib/kafka/ec2/ext/protocol/join_group_request.rb
  - lib/kafka/ec2/mixed_instance_assignment_strategy.rb
- - lib/kafka/ec2/mixed_instance_assignment_strategy_factory.rb
  - lib/kafka/ec2/version.rb
  - ruby-kafka-ec2.gemspec
  homepage: https://github.com/abicky/ruby-kafka-ec2
@@ -101,7 +98,7 @@ licenses:
  metadata:
  homepage_uri: https://github.com/abicky/ruby-kafka-ec2
  source_code_uri: https://github.com/abicky/ruby-kafka-ec2
- post_install_message:
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -116,8 +113,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.1.4
- signing_key:
+ rubygems_version: 3.2.22
+ signing_key:
  specification_version: 4
  summary: An extension of ruby-kafka for EC2
  test_files: []
data/lib/kafka/ec2/ext/consumer_group.rb DELETED
@@ -1,33 +0,0 @@
- # frozen_string_literal: true
-
- require "kafka/consumer_group"
- require "kafka/ec2/mixed_instance_assignment_strategy"
-
- module Kafka
- class EC2
- module Ext
- module ConsumerGroup
- def initialize(*args, **kwargs)
- super
- if Kafka::EC2.assignment_strategy_factory
- @assignment_strategy = Kafka::EC2.assignment_strategy_factory.create(cluster: @cluster)
- end
- Kafka::EC2.assignment_strategy_classes[@group_id] = @assignment_strategy.class
- end
-
- def join_group
- super
- if Kafka::EC2.assignment_strategy_classes[@group_id] == Kafka::EC2::MixedInstanceAssignmentStrategy
- @assignment_strategy.member_id_to_metadata = @members
- end
- end
- end
- end
- end
- end
-
- module Kafka
- class ConsumerGroup
- prepend Kafka::EC2::Ext::ConsumerGroup
- end
- end
data/lib/kafka/ec2/ext/protocol/join_group_request.rb DELETED
@@ -1,39 +0,0 @@
- # frozen_string_literal: true
-
- require "net/http"
-
- require "kafka/protocol/consumer_group_protocol"
- require "kafka/protocol/join_group_request"
-
- module Kafka
- class EC2
- module Ext
- module Protocol
- module JoinGroupRequest
- def initialize(*args, topics: [], **kwargs)
- super
- if Kafka::EC2.assignment_strategy_classes[@group_id] == Kafka::EC2::MixedInstanceAssignmentStrategy
- user_data = Net::HTTP.start("169.254.169.254", 80) do |http|
- instance_id = http.get("/latest/meta-data/instance-id").body
- instance_type = http.get("/latest/meta-data/instance-type").body
- az = http.get("/latest/meta-data/placement/availability-zone").body
- "|#{instance_id},#{instance_type},#{az}"
- end
- @group_protocols = {
- "mixedinstance" => Kafka::Protocol::ConsumerGroupProtocol.new(topics: topics, user_data: user_data),
- }
- end
- end
- end
- end
- end
- end
- end
-
- module Kafka
- module Protocol
- class JoinGroupRequest
- prepend Kafka::EC2::Ext::Protocol::JoinGroupRequest
- end
- end
- end
data/lib/kafka/ec2/mixed_instance_assignment_strategy_factory.rb DELETED
@@ -1,30 +0,0 @@
- # frozen_string_literal: true
-
- require "kafka/ec2/mixed_instance_assignment_strategy"
-
- module Kafka
- class EC2
- class MixedInstanceAssignmentStrategyFactory
- # @param instance_family_weights [Hash, Proc]
- # @param availability_zone_weights [Hash, Proc]
- # @param weights [Hash, Proc]
- # @see Kafka::EC2::MixedInstanceAssignmentStrategy#initialize
- def initialize(instance_family_weights: {}, availability_zone_weights: {}, weights: {}, partition_weights: {})
- @instance_family_weights = instance_family_weights
- @availability_zone_weights = availability_zone_weights
- @weights = weights
- @partition_weights = partition_weights
- end
-
- def create(cluster:)
- Kafka::EC2::MixedInstanceAssignmentStrategy.new(
- cluster: cluster,
- instance_family_weights: @instance_family_weights,
- availability_zone_weights: @availability_zone_weights,
- weights: @weights,
- partition_weights: @partition_weights,
- )
- end
- end
- end
- end
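Taken together, the removals above replace the factory and the two monkey patches with a plain object exposing `protocol_name`, `user_data`, and `call(cluster:, members:, partitions:)`. The standalone sketch below illustrates that shape only. `ExampleStrategy`, `MemberMetadata`, and `Partition` are hypothetical stand-ins and not classes from ruby-kafka or this gem. The instance details are passed in as arguments so the code runs off EC2, whereas the gem reads them from the instance metadata endpoint. Round-robin assignment is used purely to keep the example short.

```ruby
# Hypothetical stand-ins for the objects the strategy would receive.
MemberMetadata = Struct.new(:user_data)
Partition = Struct.new(:topic, :partition_id)

# Illustrative strategy showing the interface shape; not the gem's class.
class ExampleStrategy
  DELIMITER = ","

  def initialize(instance_id:, instance_type:, availability_zone:)
    @fields = [instance_id, instance_type, availability_zone]
  end

  def protocol_name
    "example"
  end

  # Encoded by each member and read back in `call` through the member metadata.
  def user_data
    @fields.join(DELIMITER)
  end

  # Inverse of user_data: split the string back into named fields.
  def self.decode(user_data)
    %i[instance_id instance_type availability_zone].zip(user_data.split(DELIMITER)).to_h
  end

  # Returns a hash mapping member ids to arrays of partitions (round-robin here).
  def call(cluster:, members:, partitions:)
    member_ids = members.keys.sort
    assignment = Hash.new { |h, k| h[k] = [] }
    partitions.each_with_index do |partition, index|
      assignment[member_ids[index % member_ids.size]] << partition
    end
    assignment
  end
end

strategy = ExampleStrategy.new(
  instance_id: "i-0123", instance_type: "r5.large", availability_zone: "ap-northeast-1a"
)
members = {
  "member-a" => MemberMetadata.new(strategy.user_data),
  "member-b" => MemberMetadata.new("i-0456,c5.large,ap-northeast-1c"),
}
partitions = Array.new(4) { |id| Partition.new("foo", id) }
assignment = strategy.call(cluster: nil, members: members, partitions: partitions)
decoded = ExampleStrategy.decode(members["member-b"].user_data)
puts decoded
```

Since the member metadata now travels with each member, the strategy no longer needs the `member_id_to_metadata` accessor that the removed `ConsumerGroup` patch used to populate.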