ruby-kafka-ec2 0.1.5 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 8dde731c3652090bf18202d68b916cbdcff9ed09673bd84d5f20470a37c63373
4
- data.tar.gz: d1a95de4724b3b5f85230c55a70469cc5e6c1e6008423b83f74c415bf2c9d289
3
+ metadata.gz: e7732059807b7aad8dfe8df2758fa0dc9ad8f8063adaf9cf71b615c8384e74aa
4
+ data.tar.gz: feb725eb274ff28e6b3e5827f02d9c1b406c127f7109c6ec6a4001054665b125
5
5
  SHA512:
6
- metadata.gz: f37b8fa41b773933aac85f170884adb75fc0e446faf9fb92c109aa039f5a869874194dbdf3a9099899e273ef8543f75c5f7aca0fd99cff1845bc43ac081bde50
7
- data.tar.gz: 78bc5df7157441563d73e19f35804069ddbb2c1863bccfe2711594c27a6caed78a5209fdd2ca7f55a3cf2302bed326ec47302720bf4253a9f8b4df950e5a0d0f
6
+ metadata.gz: eade4b284de35a438d52b4f18928cb9e287fe65064c1a7baec50545627bd7996835150326e88499b2ee13d4499dd1aa1093471fdd829b8b2a545eecee1799a5b
7
+ data.tar.gz: fed5703514978a1986a720678d3590c29fcc9359802fe9b98a3a6632f7dcd28cf0f43151f19a8c5a1e1d9a609667a13598b3780589ff57480d928955f6219f8f
data/README.md CHANGED
@@ -24,9 +24,9 @@ Or install it yourself as:
24
24
 
25
25
  ### Kafka::EC2::MixedInstanceAssignmentStrategy
26
26
 
27
- `Kafka::EC2::MixedInstanceAssignmentStrategy` is an assignor for auto-scaling groups with mixed instance policies. The throughputs of consumers usually depend on instance families and availability zones. For example, if your application writes data to a database, the throughputs of consumers running on the same availability zone as the writer DB instance is higher.
27
+ `Kafka::EC2::MixedInstanceAssignmentStrategy` is an assignor for auto-scaling groups with mixed instance policies. The throughputs of consumers usually depend on instance families and availability zones. For example, if your application writes data to a database, the throughputs of consumers running on the same availability zone as that of the writer DB instance is higher.
28
28
 
29
- To assign more partitions to consumers with high throughputs, you have to define `Kafka::EC2::MixedInstanceAssignmentStrategyFactory` first like below:
29
+ To assign more partitions to consumers with high throughputs, you have to initialize `Kafka::EC2::MixedInstanceAssignmentStrategy` first like below:
30
30
 
31
31
  ```ruby
32
32
  require "aws-sdk-rds"
@@ -34,7 +34,7 @@ require "kafka"
34
34
  require "kafka/ec2"
35
35
 
36
36
  rds = Aws::RDS::Client.new(region: "ap-northeast-1")
37
- assignment_strategy_factory = Kafka::EC2::MixedInstanceAssignmentStrategyFactory.new(
37
+ assignment_strategy = Kafka::EC2::MixedInstanceAssignmentStrategy.new(
38
38
  instance_family_weights: {
39
39
  "r4" => 1.00,
40
40
  "r5" => 1.20,
@@ -68,19 +68,17 @@ assignment_strategy_factory = Kafka::EC2::MixedInstanceAssignmentStrategyFactory
68
68
 
69
69
  In the preceding example, consumers running on c5 instances will have 1.5x as many partitions compared to consumers running on r4 instances. In a similar way, if the writer DB instance is in ap-northeast-1a, consumers in ap-northeast-1a will have 4x as many partitions compared to consumers in ap-northeast-1c.
70
70
 
71
- You can use `Kafka::EC2::MixedInstanceAssignmentStrategy` by specifying the factory to `Kafka::EC2.with_assignment_strategy_factory` and creating a consumer in the block:
71
+ You can use `Kafka::EC2::MixedInstanceAssignmentStrategy` by specifying it to `Kafka#consumer`:
72
72
 
73
73
 
74
74
  ```ruby
75
- consumer = Kafka::EC2.with_assignment_strategy_factory(assignment_strategy_factory) do
76
- kafka.consumer(group_id: ENV["KAFKA_CONSUMER_GROUP_ID"])
77
- end
75
+ consumer = kafka.consumer(group_id: ENV["KAFKA_CONSUMER_GROUP_ID"], assignment_strategy: assignment_strategy)
78
76
  ```
79
77
 
80
78
  You can also specify weights for each combination of availability zones and instance families:
81
79
 
82
80
  ```ruby
83
- assignment_strategy_factory = Kafka::EC2::MixedInstanceAssignmentStrategyFactory.new(
81
+ assignment_strategy = Kafka::EC2::MixedInstanceAssignmentStrategy.new(
84
82
  weights: ->() {
85
83
  db_cluster = rds.describe_db_clusters(filters: [
86
84
  { name: "db-cluster-id", values: [ENV["RDS_CLUSTER"]] },
@@ -121,7 +119,7 @@ assignment_strategy_factory = Kafka::EC2::MixedInstanceAssignmentStrategyFactory
121
119
  The strategy also has the option `partition_weights`. This is useful when the topic has some skewed partitions. Suppose the partition with ID 0 of the topic "foo" receives twice as many records as other partitions. To reduce the number of partitions assigned to the consumer that consumes the partition with ID 0, specify `partition_weights` like below:
122
120
 
123
121
  ```ruby
124
- assignment_strategy_factory = Kafka::EC2::MixedInstanceAssignmentStrategyFactory.new(
122
+ assignment_strategy = Kafka::EC2::MixedInstanceAssignmentStrategy.new(
125
123
  partition_weights: {
126
124
  "foo" => {
127
125
  0 => 2,
@@ -6,10 +6,8 @@ require "kafka/protocol/member_assignment"
6
6
  module Kafka
7
7
  class EC2
8
8
  class MixedInstanceAssignmentStrategy
9
- # metadata is a byte sequence created by Kafka::Protocol::ConsumerGroupProtocol.encode
10
- attr_accessor :member_id_to_metadata
9
+ DELIMITER = ","
11
10
 
12
- # @param cluster [Kafka::Cluster]
13
11
  # @param instance_family_weights [Hash{String => Numeric}, Proc] a hash whose the key
14
12
  # is the instance family and whose value is the weight. If the object is a proc,
15
13
  # it must returns such a hash and the proc is called every time the method "assign"
@@ -23,82 +21,97 @@ module Kafka
23
21
  # instance_family_weights or availability_zone_weights. If the object is a proc,
24
22
  # it must returns such a hash and the proc is called every time the method "assign"
25
23
  # is called.
26
- def initialize(cluster:, instance_family_weights: {}, availability_zone_weights: {}, weights: {}, partition_weights: {})
27
- @cluster = cluster
24
+ def initialize(instance_family_weights: {}, availability_zone_weights: {}, weights: {}, partition_weights: {})
28
25
  @instance_family_weights = instance_family_weights
29
26
  @availability_zone_weights = availability_zone_weights
30
27
  @weights = weights
31
28
  @partition_weights = partition_weights
32
29
  end
33
30
 
31
+ def protocol_name
32
+ "mixedinstance"
33
+ end
34
+
35
+ def user_data
36
+ Net::HTTP.start("169.254.169.254", 80) do |http|
37
+ [
38
+ http.get("/latest/meta-data/instance-id").body,
39
+ http.get("/latest/meta-data/instance-type").body,
40
+ http.get("/latest/meta-data/placement/availability-zone").body,
41
+ ].join(DELIMITER)
42
+ end
43
+ end
44
+
34
45
  # Assign the topic partitions to the group members.
35
46
  #
36
47
  # @param members [Array<String>] member ids
37
48
  # @param topics [Array<String>] topics
38
49
  # @return [Hash{String => Protocol::MemberAssignment}] a hash mapping member
39
50
  # ids to assignments.
40
- def assign(members:, topics:)
41
- group_assignment = {}
51
+ def call(cluster:, members:, partitions:)
52
+ member_id_to_partitions = Hash.new { |h, k| h[k] = [] }
42
53
  instance_id_to_capacity = Hash.new(0)
43
54
  instance_id_to_member_ids = Hash.new { |h, k| h[k] = [] }
44
55
  total_capacity = 0
56
+ member_id_to_instance_id = {}
45
57
 
46
58
  instance_family_to_capacity = @instance_family_weights.is_a?(Proc) ? @instance_family_weights.call() : @instance_family_weights
47
59
  az_to_capacity = @availability_zone_weights.is_a?(Proc) ? @availability_zone_weights.call() : @availability_zone_weights
48
60
  weights = @weights.is_a?(Proc) ? @weights.call() : @weights
49
- members.each do |member_id|
50
- group_assignment[member_id] = Protocol::MemberAssignment.new
51
-
52
- instance_id, instance_type, az = member_id_to_metadata[member_id].split(",")
61
+ members.each do |member_id, metadata|
62
+ instance_id, instance_type, az = metadata.user_data.split(DELIMITER)
53
63
  instance_id_to_member_ids[instance_id] << member_id
64
+ member_id_to_instance_id[member_id] = instance_id
54
65
  capacity = calculate_capacity(instance_type, az, instance_family_to_capacity, az_to_capacity, weights)
55
66
  instance_id_to_capacity[instance_id] += capacity
56
67
  total_capacity += capacity
57
68
  end
58
69
 
59
- topic_partitions = topics.flat_map do |topic|
60
- begin
61
- partitions = @cluster.partitions_for(topic).map(&:partition_id)
62
- rescue UnknownTopicOrPartition
63
- raise UnknownTopicOrPartition, "unknown topic #{topic}"
64
- end
65
- Array.new(partitions.count) { topic }.zip(partitions)
66
- end
67
-
68
- partition_weights = build_partition_weights(topics)
69
- partition_weight_per_capacity = topic_partitions.sum { |topic, partition| partition_weights.dig(topic, partition) } / total_capacity
70
+ partition_weights = build_partition_weights(partitions)
71
+ partition_weight_per_capacity = partitions.sum { |partition| partition_weights.dig(partition.topic, partition.partition_id) } / total_capacity
70
72
 
71
73
  last_index = 0
72
74
  member_id_to_acceptable_partition_weight = {}
75
+ instance_id_to_total_acceptable_partition_weight = Hash.new(0)
73
76
  instance_id_to_capacity.each do |instance_id, capacity|
74
77
  member_ids = instance_id_to_member_ids[instance_id]
75
78
  member_ids.each do |member_id|
76
79
  acceptable_partition_weight = capacity * partition_weight_per_capacity / member_ids.size
77
- loop do
78
- topic, partition = topic_partitions[last_index]
79
- partition_weight = partition_weights.dig(topic, partition)
80
- if last_index == topic_partitions.size || acceptable_partition_weight - partition_weight < 0
81
- member_id_to_acceptable_partition_weight[member_id] = acceptable_partition_weight
82
- break
83
- end
84
-
85
- group_assignment[member_id].assign(topic, [partition])
86
- last_index += 1
80
+ while last_index < partitions.size
81
+ partition = partitions[last_index]
82
+ partition_weight = partition_weights.dig(partition.topic, partition.partition_id)
83
+ break if acceptable_partition_weight - partition_weight < 0
84
+
85
+ member_id_to_partitions[member_id] << partition
87
86
  acceptable_partition_weight -= partition_weight
87
+
88
+ last_index += 1
88
89
  end
90
+
91
+ member_id_to_acceptable_partition_weight[member_id] = acceptable_partition_weight
92
+ instance_id_to_total_acceptable_partition_weight[instance_id] += acceptable_partition_weight
89
93
  end
90
94
  end
91
95
 
92
- if last_index < topic_partitions.size
93
- member_id_to_acceptable_partition_weight.sort_by { |_, remaining| -remaining }.each do |member_id, _|
94
- topic, partition = topic_partitions[last_index]
95
- group_assignment[member_id].assign(topic, [partition])
96
- last_index += 1
97
- break if last_index == topic_partitions.size
96
+ while last_index < partitions.size
97
+ max_acceptable_partition_weight = member_id_to_acceptable_partition_weight.values.max
98
+ member_ids = member_id_to_acceptable_partition_weight.select { |_, w| w == max_acceptable_partition_weight }.keys
99
+ if member_ids.size == 1
100
+ member_id = member_ids.first
101
+ else
102
+ member_id = member_ids.max_by { |id| instance_id_to_total_acceptable_partition_weight[member_id_to_instance_id[id]] }
98
103
  end
104
+ partition = partitions[last_index]
105
+ member_id_to_partitions[member_id] << partition
106
+
107
+ partition_weight = partition_weights.dig(partition.topic, partition.partition_id)
108
+ member_id_to_acceptable_partition_weight[member_id] -= partition_weight
109
+ instance_id_to_total_acceptable_partition_weight[member_id_to_instance_id[member_id]] -= partition_weight
110
+
111
+ last_index += 1
99
112
  end
100
113
 
101
- group_assignment
114
+ member_id_to_partitions
102
115
  rescue Kafka::LeaderNotAvailable
103
116
  sleep 1
104
117
  retry
@@ -113,12 +126,12 @@ module Kafka
113
126
  (capacity || instance_family_to_capacity.fetch(instance_family, 1) * az_to_capacity.fetch(az, 1)).to_f
114
127
  end
115
128
 
116
- def build_partition_weights(topics)
129
+ def build_partition_weights(partitions)
117
130
  # Duplicate the weights to not destruct @partition_weights or the return value of @partition_weights
118
- weights = (@partition_weights.is_a?(Proc) ? @partition_weights.call() : @partition_weights).dup
119
- topics.each do |t|
120
- weights[t] = weights[t].dup || {}
121
- weights[t].default = 1
131
+ weights = (@partition_weights.is_a?(Proc) ? @partition_weights.call : @partition_weights).dup
132
+ partitions.map(&:topic).uniq.each do |topic|
133
+ weights[topic] = weights[topic].dup || {}
134
+ weights[topic].default = 1
122
135
  end
123
136
 
124
137
  weights
@@ -1,5 +1,5 @@
1
1
  module Kafka
2
2
  class EC2
3
- VERSION = "0.1.5"
3
+ VERSION = "0.2.0"
4
4
  end
5
5
  end
data/lib/kafka/ec2.rb CHANGED
@@ -1,23 +1,7 @@
1
- require "kafka/ec2/ext/consumer_group"
2
- require "kafka/ec2/ext/protocol/join_group_request"
3
- require "kafka/ec2/mixed_instance_assignment_strategy_factory"
1
+ require "kafka/ec2/mixed_instance_assignment_strategy"
4
2
  require "kafka/ec2/version"
5
3
 
6
4
  module Kafka
7
5
  class EC2
8
- class << self
9
- attr_reader :assignment_strategy_factory
10
-
11
- def with_assignment_strategy_factory(factory)
12
- @assignment_strategy_factory = factory
13
- yield
14
- ensure
15
- @assignment_strategy_factory = nil
16
- end
17
-
18
- def assignment_strategy_classes
19
- @assignment_strategy_classes ||= {}
20
- end
21
- end
22
6
  end
23
7
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: ruby-kafka-ec2
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.5
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - abicky
8
- autorequire:
8
+ autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2020-10-08 00:00:00.000000000 Z
11
+ date: 2022-03-29 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: ruby-kafka
@@ -89,10 +89,7 @@ files:
89
89
  - bin/console
90
90
  - bin/setup
91
91
  - lib/kafka/ec2.rb
92
- - lib/kafka/ec2/ext/consumer_group.rb
93
- - lib/kafka/ec2/ext/protocol/join_group_request.rb
94
92
  - lib/kafka/ec2/mixed_instance_assignment_strategy.rb
95
- - lib/kafka/ec2/mixed_instance_assignment_strategy_factory.rb
96
93
  - lib/kafka/ec2/version.rb
97
94
  - ruby-kafka-ec2.gemspec
98
95
  homepage: https://github.com/abicky/ruby-kafka-ec2
@@ -101,7 +98,7 @@ licenses:
101
98
  metadata:
102
99
  homepage_uri: https://github.com/abicky/ruby-kafka-ec2
103
100
  source_code_uri: https://github.com/abicky/ruby-kafka-ec2
104
- post_install_message:
101
+ post_install_message:
105
102
  rdoc_options: []
106
103
  require_paths:
107
104
  - lib
@@ -116,8 +113,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
116
113
  - !ruby/object:Gem::Version
117
114
  version: '0'
118
115
  requirements: []
119
- rubygems_version: 3.0.3
120
- signing_key:
116
+ rubygems_version: 3.2.22
117
+ signing_key:
121
118
  specification_version: 4
122
119
  summary: An extension of ruby-kafka for EC2
123
120
  test_files: []
@@ -1,33 +0,0 @@
1
- # frozen_string_literal: true
2
-
3
- require "kafka/consumer_group"
4
- require "kafka/ec2/mixed_instance_assignment_strategy"
5
-
6
- module Kafka
7
- class EC2
8
- module Ext
9
- module ConsumerGroup
10
- def initialize(*args, **kwargs)
11
- super
12
- if Kafka::EC2.assignment_strategy_factory
13
- @assignment_strategy = Kafka::EC2.assignment_strategy_factory.create(cluster: @cluster)
14
- end
15
- Kafka::EC2.assignment_strategy_classes[@group_id] = @assignment_strategy.class
16
- end
17
-
18
- def join_group
19
- super
20
- if Kafka::EC2.assignment_strategy_classes[@group_id] == Kafka::EC2::MixedInstanceAssignmentStrategy
21
- @assignment_strategy.member_id_to_metadata = @members
22
- end
23
- end
24
- end
25
- end
26
- end
27
- end
28
-
29
- module Kafka
30
- class ConsumerGroup
31
- prepend Kafka::EC2::Ext::ConsumerGroup
32
- end
33
- end
@@ -1,39 +0,0 @@
1
- # frozen_string_literal: true
2
-
3
- require "net/http"
4
-
5
- require "kafka/protocol/consumer_group_protocol"
6
- require "kafka/protocol/join_group_request"
7
-
8
- module Kafka
9
- class EC2
10
- module Ext
11
- module Protocol
12
- module JoinGroupRequest
13
- def initialize(*args, topics: [], **kwargs)
14
- super
15
- if Kafka::EC2.assignment_strategy_classes[@group_id] == Kafka::EC2::MixedInstanceAssignmentStrategy
16
- user_data = Net::HTTP.start("169.254.169.254", 80) do |http|
17
- instance_id = http.get("/latest/meta-data/instance-id").body
18
- instance_type = http.get("/latest/meta-data/instance-type").body
19
- az = http.get("/latest/meta-data/placement/availability-zone").body
20
- "|#{instance_id},#{instance_type},#{az}"
21
- end
22
- @group_protocols = {
23
- "mixedinstance" => Kafka::Protocol::ConsumerGroupProtocol.new(topics: topics, user_data: user_data),
24
- }
25
- end
26
- end
27
- end
28
- end
29
- end
30
- end
31
- end
32
-
33
- module Kafka
34
- module Protocol
35
- class JoinGroupRequest
36
- prepend Kafka::EC2::Ext::Protocol::JoinGroupRequest
37
- end
38
- end
39
- end
@@ -1,30 +0,0 @@
1
- # frozen_string_literal: true
2
-
3
- require "kafka/ec2/mixed_instance_assignment_strategy"
4
-
5
- module Kafka
6
- class EC2
7
- class MixedInstanceAssignmentStrategyFactory
8
- # @param instance_family_weights [Hash, Proc]
9
- # @param availability_zone_weights [Hash, Proc]
10
- # @param weights [Hash, Proc]
11
- # @see Kafka::EC2::MixedInstanceAssignmentStrategy#initialize
12
- def initialize(instance_family_weights: {}, availability_zone_weights: {}, weights: {}, partition_weights: {})
13
- @instance_family_weights = instance_family_weights
14
- @availability_zone_weights = availability_zone_weights
15
- @weights = weights
16
- @partition_weights = partition_weights
17
- end
18
-
19
- def create(cluster:)
20
- Kafka::EC2::MixedInstanceAssignmentStrategy.new(
21
- cluster: cluster,
22
- instance_family_weights: @instance_family_weights,
23
- availability_zone_weights: @availability_zone_weights,
24
- weights: @weights,
25
- partition_weights: @partition_weights,
26
- )
27
- end
28
- end
29
- end
30
- end