racecar 2.0.0 → 2.10.0.beta2
- checksums.yaml +4 -4
- data/.github/dependabot.yml +17 -0
- data/.github/workflows/ci.yml +46 -0
- data/.github/workflows/publish.yml +12 -0
- data/.gitignore +1 -2
- data/CHANGELOG.md +83 -1
- data/Dockerfile +9 -0
- data/Gemfile +6 -0
- data/Gemfile.lock +72 -0
- data/README.md +303 -82
- data/Rakefile +5 -0
- data/docker-compose.yml +65 -0
- data/examples/batch_consumer.rb +4 -2
- data/examples/cat_consumer.rb +2 -0
- data/examples/producing_consumer.rb +2 -0
- data/exe/racecar +37 -14
- data/extra/datadog-dashboard.json +1 -0
- data/lib/ensure_hash_compact.rb +2 -0
- data/lib/generators/racecar/consumer_generator.rb +2 -0
- data/lib/generators/racecar/install_generator.rb +2 -0
- data/lib/racecar/cli.rb +26 -21
- data/lib/racecar/config.rb +80 -4
- data/lib/racecar/consumer.rb +51 -6
- data/lib/racecar/consumer_set.rb +113 -44
- data/lib/racecar/ctl.rb +31 -3
- data/lib/racecar/daemon.rb +4 -2
- data/lib/racecar/datadog.rb +83 -3
- data/lib/racecar/delivery_callback.rb +27 -0
- data/lib/racecar/erroneous_state_error.rb +34 -0
- data/lib/racecar/heroku.rb +49 -0
- data/lib/racecar/instrumenter.rb +4 -7
- data/lib/racecar/liveness_probe.rb +78 -0
- data/lib/racecar/message.rb +6 -1
- data/lib/racecar/message_delivery_error.rb +112 -0
- data/lib/racecar/null_instrumenter.rb +2 -0
- data/lib/racecar/parallel_runner.rb +110 -0
- data/lib/racecar/pause.rb +8 -4
- data/lib/racecar/producer.rb +139 -0
- data/lib/racecar/rails_config_file_loader.rb +7 -1
- data/lib/racecar/rebalance_listener.rb +58 -0
- data/lib/racecar/runner.rb +79 -37
- data/lib/racecar/version.rb +3 -1
- data/lib/racecar.rb +36 -8
- data/racecar.gemspec +7 -4
- metadata +47 -25
- data/.github/workflows/rspec.yml +0 -24
data/README.md
CHANGED
````diff
@@ -10,22 +10,22 @@ The framework is based on [rdkafka-ruby](https://github.com/appsignal/rdkafka-ruby)
 
 1. [Installation](#installation)
 2. [Usage](#usage)
-
-
-
-
-
-
-
-
-
-
+    1. [Creating consumers](#creating-consumers)
+    2. [Running consumers](#running-consumers)
+    3. [Producing messages](#producing-messages)
+    4. [Configuration](#configuration)
+    5. [Testing consumers](#testing-consumers)
+    6. [Deploying consumers](#deploying-consumers)
+    7. [Handling errors](#handling-errors)
+    8. [Logging](#logging)
+    9. [Operations](#operations)
+    10. [Upgrading from v1 to v2](#upgrading-from-v1-to-v2)
+    11. [Compression](#compression)
 3. [Development](#development)
 4. [Contributing](#contributing)
 5. [Support and Discussion](#support-and-discussion)
 6. [Copyright and license](#copyright-and-license)
 
-
 ## Installation
 
 Add this line to your application's Gemfile:
````
````diff
@@ -50,9 +50,7 @@ This will add a config file in `config/racecar.yml`.
 
 ## Usage
 
-Racecar is built for simplicity of development and operation.
-
-First, a short introduction to the Kafka consumer concept as well as some basic background on Kafka.
+Racecar is built for simplicity of development and operation. First, a short introduction to the Kafka consumer concept as well as some basic background on Kafka.
 
 Kafka stores messages in so-called _partitions_ which are grouped into _topics_. Within a partition, each message gets a unique offset.
 
````
````diff
@@ -79,12 +77,38 @@ In order to create your own consumer, run the Rails generator `racecar:consumer`
 
     $ bundle exec rails generate racecar:consumer TapDance
 
-This will create a file at `app/consumers/tap_dance_consumer.rb` which you can modify to your liking. Add one or more calls to
+This will create a file at `app/consumers/tap_dance_consumer.rb` which you can modify to your liking. Add one or more calls to `subscribes_to` in order to have the consumer subscribe to Kafka topics.
 
 Now run your consumer with `bundle exec racecar TapDanceConsumer`.
 
 Note: if you're not using Rails, you'll have to add the file yourself. No-one will judge you for copy-pasting it.
 
+#### Running consumers in parallel (experimental)
+
+Warning - limited battle testing in production environments; use at your own risk!
+
+If you want to process different partitions in parallel, and don't want to deploy a number of instances matching the total partitions of the topic, you can specify the number of workers to spin up - that number of processes will be forked, and each will register its own consumer in the group. Some things to note:
+
+- This would make no difference on a single-partition topic - only one consumer would ever be assigned a partition. A couple of example configurations to process all partitions in parallel (we'll assume a 15-partition topic):
+  - Parallel workers set to 3, 5 separate instances / replicas running in your container orchestrator
+  - Parallel workers set to 5, 3 separate instances / replicas running in your container orchestrator
+- Since we're forking new processes, the memory demands are a little higher
+  - From some initial testing, running 5 parallel workers requires no more than double the memory of running a Racecar consumer without parallelism.
+
+The number of parallel workers is configured per consumer class; you may only want to take advantage of this for busier consumers:
+
+```ruby
+class ParallelProcessingConsumer < Racecar::Consumer
+  subscribes_to "some-topic"
+
+  self.parallel_workers = 5
+
+  def process(message)
+    ...
+  end
+end
+```
+
 #### Initializing consumers
 
 You can optionally add an `initialize` method if you need to do any set-up work before processing messages, e.g.
````
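The worker/replica arithmetic above can be sketched in plain Ruby (numbers taken from the 15-partition example; this is an illustration of the sizing rule, not Racecar code):

```ruby
# Each replica forks `parallel_workers` processes, and each forked process
# registers its own consumer in the group. Kafka assigns every partition to
# exactly one consumer, so consumers beyond the partition count sit idle.
def total_consumers(replicas:, parallel_workers:)
  replicas * parallel_workers
end

partitions = 15

# Both example configurations from the README cover all 15 partitions:
[{ replicas: 5, parallel_workers: 3 },
 { replicas: 3, parallel_workers: 5 }].each do |config|
  consumers = total_consumers(**config)
  idle = [consumers - partitions, 0].max
  puts "#{config} => #{consumers} consumers, #{idle} idle"
end
```

With either layout the group size exactly matches the partition count, so no forked worker is wasted.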
````diff
@@ -160,9 +184,9 @@ message.headers #=> { "Header-A" => 42, ... }
 
 In order to avoid your consumer being kicked out of its group during long-running message processing operations, you'll need to let Kafka regularly know that the consumer is still healthy. There are two mechanisms in place to ensure that:
 
-
+_Heartbeats:_ They are automatically sent in the background and ensure the broker can still talk to the consumer. This will detect network splits, ungraceful shutdowns, etc.
 
-
+_Message Fetch Interval:_ Kafka expects the consumer to query for new messages within this time limit. This will detect situations with slow IO or the consumer being stuck in an infinite loop without making actual progress. This limit applies to a whole batch if you do batch processing. Use `max_poll_interval` to increase the default 5 minute timeout, or reduce batching with `fetch_messages`.
 
 #### Tearing down resources when stopping
 
````
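To reason about the fetch-interval limit described above: when batch processing, the whole batch must be handled within `max_poll_interval`, which gives each message a rough time budget. A plain-Ruby sketch with illustrative numbers (the 300s default comes from the README; the batch size is an assumption):

```ruby
# Defaults assumed: max_poll_interval = 300 seconds (5 minutes).
max_poll_interval = 300.0 # seconds
fetch_messages    = 1000  # messages fetched (and processed) per batch -- illustrative

# The whole batch must be processed before the next poll, so each message
# gets on average this much time before the consumer risks being kicked out:
per_message_budget = max_poll_interval / fetch_messages
puts "budget per message: #{per_message_budget}s"
```

If processing is slower than that budget, either raise `max_poll_interval` or fetch fewer messages per batch via `fetch_messages`.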
````diff
@@ -222,11 +246,76 @@ The `deliver!` method can be used to block until the broker received all queued
 
 You can set message headers by passing a `headers:` option with a Hash of headers.
 
+### Standalone Producer
+
+Racecar provides a standalone producer to publish messages to Kafka directly from your Rails application:
+
+```ruby
+# app/controllers/comments_controller.rb
+class CommentsController < ApplicationController
+  def create
+    @comment = Comment.create!(params)
+
+    # This will publish a JSON representation of the comment to the `comments` topic
+    # in Kafka. Make sure to create the topic first, or this may fail.
+    Racecar.produce_sync(value: @comment.to_json, topic: "comments")
+  end
+end
+```
+
+The above example will block the server process until the message has been delivered. If you want deliveries to happen in the background in order to free up your server processes more quickly, call `#produce_async` instead:
+
+```ruby
+# app/controllers/comments_controller.rb
+class CommentsController < ApplicationController
+  def show
+    @comment = Comment.find(params[:id])
+
+    event = {
+      name: "comment_viewed",
+      data: {
+        comment_id: @comment.id,
+        user_id: current_user.id
+      }
+    }
+
+    # By delivering messages asynchronously you free up your server processes faster.
+    Racecar.produce_async(value: event.to_json, topic: "activity")
+  end
+end
+```
+
+In addition to improving response time, delivering messages asynchronously also protects your application against Kafka availability issues -- if messages cannot be delivered, they'll be buffered for later and retried automatically.
+
+A third method is to produce messages first (without delivering the messages to Kafka yet), and deliver them synchronously later:
+
+```ruby
+# app/controllers/comments_controller.rb
+class CommentsController < ApplicationController
+  def create
+    @comment = Comment.create!(params)
+
+    event = {
+      name: "comment_created",
+      data: {
+        comment_id: @comment.id,
+        user_id: current_user.id
+      }
+    }
+
+    # This will queue the two messages in the internal buffer and block the
+    # server process until they are delivered.
+    Racecar.wait_for_delivery do
+      Racecar.produce_async(value: @comment.to_json, topic: "comments")
+      Racecar.produce_async(value: event.to_json, topic: "activity")
+    end
+  end
+end
+```
+
 ### Configuration
 
 Racecar provides a flexible way to configure your consumer in a way that feels at home in a Rails application. If you haven't already, run `bundle exec rails generate racecar:install` in order to generate a config file. You'll get a separate section for each Rails environment, with the common configuration values in a shared `common` section.
 
-**Note:** many of these configuration keys correspond directly to similarly named concepts in [ruby
+**Note:** many of these configuration keys correspond directly to similarly named concepts in [rdkafka-ruby](https://github.com/appsignal/rdkafka-ruby); for more details on low-level operations, read that project's documentation.
 
 It's also possible to configure Racecar using environment variables. For any given configuration key, there should be a corresponding environment variable with the prefix `RACECAR_`, in upper case. For instance, in order to configure the client id, set `RACECAR_CLIENT_ID=some-id` in the process in which the Racecar consumer is launched. You can set `brokers` by passing a comma-separated list, e.g. `RACECAR_BROKERS=kafka1:9092,kafka2:9092,kafka3:9092`.
 
````
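The environment-variable convention above (prefix with `RACECAR_`, upper-case the key, comma-separate list values such as `brokers`) can be sketched like this; it illustrates the naming rule only and is not Racecar's actual config loader:

```ruby
# Map a Racecar config key to the environment variable the README describes.
def env_var_for(config_key)
  "RACECAR_#{config_key.to_s.upcase}"
end

# List-valued settings like `brokers` are passed as a comma-separated string.
def parse_brokers(raw)
  raw.split(",")
end

puts env_var_for(:client_id)            # => "RACECAR_CLIENT_ID"
puts parse_brokers("kafka1:9092,kafka2:9092,kafka3:9092").inspect
```

So `RACECAR_OFFSET_COMMIT_INTERVAL=5` sets `offset_commit_interval`, and `RACECAR_BROKERS=kafka1:9092,kafka2:9092` yields a two-element broker list.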
````diff
@@ -241,87 +330,96 @@ end
 
 #### Basic configuration
 
-
-
-
-
+- `brokers` – A list of Kafka brokers in the cluster that you're consuming from. Defaults to `localhost` on port 9092, the default Kafka port.
+- `client_id` – A string used to identify the client in logs and metrics.
+- `group_id` – The group id to use for a given group of consumers. Note that this _must_ be different for each consumer class. If left blank a group id is generated based on the consumer class name such that (for example) a consumer with the class name `BaconConsumer` would default to a group id of `bacon-consumer`.
+- `group_id_prefix` – A prefix used when generating consumer group names. For instance, if you set the prefix to be `kevin.` and your consumer class is named `BaconConsumer`, the resulting consumer group will be named `kevin.bacon-consumer`.
+
+#### Batches
+
+- `fetch_messages` - The number of messages to fetch in a single batch. This can be set on a per-consumer basis.
 
 #### Logging
 
-
-
+- `logfile` – A filename that log messages should be written to. Default is `nil`, which means logs will be written to standard output.
+- `log_level` – The log level for the Racecar logs, one of `debug`, `info`, `warn`, or `error`. Default is `info`.
 
 #### Consumer checkpointing
 
 The consumers will checkpoint their positions from time to time in order to be able to recover from failures. This is called _committing offsets_, since it's done by tracking the offset reached in each partition being processed, and committing those offset numbers to the Kafka offset storage API. If you can tolerate more double-processing after a failure, you can increase the interval between commits in order to improve performance. You can also do the opposite if you prefer less chance of double-processing.
 
-
+- `offset_commit_interval` – How often to save the consumer's position in Kafka. Default is every 10 seconds.
 
 #### Timeouts & intervals
 
 All timeouts are defined in number of seconds.
 
-
-
-
-
-
-
-
+- `session_timeout` – The idle timeout after which a consumer is kicked out of the group. Consumers must send heartbeats with at least this frequency.
+- `heartbeat_interval` – How often to send a heartbeat message to Kafka.
+- `max_poll_interval` – The maximum time between two message fetches before the consumer is kicked out of the group. Put differently, your (batch) processing must finish earlier than this.
+- `pause_timeout` – How long to pause a partition for if the consumer raises an exception while processing a message. Default is to pause for 10 seconds. Set this to `0` in order to disable automatic pausing of partitions or to `-1` to pause indefinitely.
+- `pause_with_exponential_backoff` – Set to `true` if you want to double the `pause_timeout` on each consecutive failure of a particular partition.
+- `socket_timeout` – How long to wait when trying to communicate with a Kafka broker. Default is 30 seconds.
+- `max_wait_time` – How long to allow the Kafka brokers to wait before returning messages. A higher number means larger batches, at the cost of higher latency. Default is 1 second.
+- `message_timeout` – How long to try to deliver a produced message before finally giving up. Default is 5 minutes. Transient errors are automatically retried. If a message delivery fails, the current read message batch is retried.
+- `statistics_interval` – How frequently librdkafka should publish statistics about its consumers and producers; you must also add a `statistics_callback` method to your processor, otherwise the stats are disabled. The default is 1 second, however this can be quite memory hungry, so you may want to tune this and monitor.
 
 #### Memory & network usage
 
 Kafka is _really_ good at throwing data at consumers, so you may want to tune these variables in order to avoid ballooning your process' memory or saturating your network capacity.
 
-Racecar uses ruby-
+Racecar uses [rdkafka-ruby](https://github.com/appsignal/rdkafka-ruby) under the hood, which fetches messages from the Kafka brokers in a background thread. This thread pushes fetch responses, possibly containing messages from many partitions, into a queue that is read by the processing thread (AKA your code). The main way to control the fetcher thread is to control the size of those responses and the size of the queue.
 
-
-
+- `max_bytes` — Maximum amount of data the broker shall return for a Fetch request.
+- `min_message_queue_size` — The minimum number of messages in the local consumer queue.
 
 The memory usage limit is roughly estimated as `max_bytes * min_message_queue_size`, plus whatever your application uses.
 
 #### SSL encryption, authentication & authorization
 
-
-
-
-
-
-
-
-
+- `security_protocol` – Protocol used to communicate with brokers (`:ssl`)
+- `ssl_ca_location` – File or directory path to CA certificate(s) for verifying the broker's key
+- `ssl_crl_location` – Path to CRL for verifying broker's certificate validity
+- `ssl_keystore_location` – Path to client's keystore (PKCS#12) used for authentication
+- `ssl_keystore_password` – Client's keystore (PKCS#12) password
+- `ssl_certificate_location` – Path to the certificate
+- `ssl_key_location` – Path to client's certificate used for authentication
+- `ssl_key_password` – Client's certificate password
 
 #### SASL encryption, authentication & authorization
 
 Racecar has support for using SASL to authenticate clients using either the GSSAPI or PLAIN mechanism either via plaintext or SSL connection.
 
-
-
+- `security_protocol` – Protocol used to communicate with brokers (`:sasl_plaintext`, `:sasl_ssl`)
+- `sasl_mechanism` – SASL mechanism to use for authentication (`GSSAPI`, `PLAIN`, `SCRAM-SHA-256`, `SCRAM-SHA-512`)
 
-
-
-
-
-
-
+- `sasl_kerberos_principal` – This client's Kerberos principal name
+- `sasl_kerberos_kinit_cmd` – Full kerberos kinit command string, `%{config.prop.name}` is replaced by corresponding config object value, `%{broker.name}` returns the broker's hostname
+- `sasl_kerberos_keytab` – Path to Kerberos keytab file. Uses system default if not set
+- `sasl_kerberos_min_time_before_relogin` – Minimum time in milliseconds between key refresh attempts
+- `sasl_username` – SASL username for use with the PLAIN and SCRAM mechanisms
+- `sasl_password` – SASL password for use with the PLAIN and SCRAM mechanisms
 
 #### Producing messages
 
 These settings are related to consumers that _produce messages to Kafka_.
 
-
+- `partitioner` – The strategy used to determine which topic partition a message is written to when Racecar produces a value to Kafka. The value must be one of `consistent`, `consistent_random`, `murmur2`, `murmur2_random`, `fnv1a`, or `fnv1a_random`, either as a Symbol or a String; defaults to `consistent_random`.
+- `producer_compression_codec` – If defined, Racecar will compress messages before writing them to Kafka. The codec needs to be one of `gzip`, `lz4`, or `snappy`, either as a Symbol or a String.
 
 #### Datadog monitoring
 
-Racecar supports
+Racecar supports [Datadog](https://www.datadoghq.com/) monitoring integration. If you're running a normal Datadog agent on your host, you just need to set `datadog_enabled` to `true`, as the rest of the settings come with sane defaults.
+
+- `datadog_enabled` – Whether Datadog monitoring is enabled (defaults to `false`).
+- `datadog_host` – The host running the Datadog agent.
+- `datadog_port` – The port of the Datadog agent.
+- `datadog_namespace` – The namespace to use for Datadog metrics.
+- `datadog_tags` – Tags that should always be set on Datadog metrics.
 
-
-* `datadog_host` – The host running the Datadog agent.
-* `datadog_port` – The port of the Datadog agent.
-* `datadog_namespace` – The namespace to use for Datadog metrics.
-* `datadog_tags` – Tags that should always be set on Datadog metrics.
+Furthermore, there's a [standard Datadog dashboard configuration file](https://raw.githubusercontent.com/zendesk/racecar/master/extra/datadog-dashboard.json) that you can import to get started with a Racecar dashboard for all of your consumers.
 
-#### Consumers Without Rails
+#### Consumers Without Rails
 
 By default, if Rails is detected, it will be automatically started when the consumer is started. There are cases where you might not want or need Rails. You can pass the `--without-rails` option when starting the consumer and Rails won't be started.
 
````
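The group-id defaulting described under `group_id` and `group_id_prefix` can be sketched as follows; this is an approximation of the naming rule from the README, not the gem's exact implementation:

```ruby
# "BaconConsumer" => "bacon-consumer"; an optional prefix such as "kevin."
# is prepended verbatim.
def default_group_id(class_name, prefix: nil)
  base = class_name
    .gsub(/([a-z0-9])([A-Z])/, '\1-\2') # split camel-case words with dashes
    .downcase
  "#{prefix}#{base}"
end

puts default_group_id("BaconConsumer")                   # => "bacon-consumer"
puts default_group_id("BaconConsumer", prefix: "kevin.") # => "kevin.bacon-consumer"
```

Because the default is derived from the class name, two different consumer classes automatically get the distinct group ids the README requires.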
````diff
@@ -359,7 +457,6 @@ describe CreateContactsConsumer do
 end
 ```
 
-
 ### Deploying consumers
 
 If you're already deploying your Rails application using e.g. [Capistrano](http://capistranorb.com/), all you need to do to run your Racecar consumers in production is to have some _process supervisor_ start the processes and manage them for you.
````
````diff
@@ -371,7 +468,7 @@ racecar-process-payments: bundle exec racecar ProcessPaymentsConsumer
 racecar-resize-images: bundle exec racecar ResizeImagesConsumer
 ```
 
-If you've ever used Heroku you'll recognize the format – indeed, deploying to Heroku should just work if you add Racecar invocations to your Procfile
+If you've ever used Heroku you'll recognize the format – indeed, deploying to Heroku should just work if you add Racecar invocations to your Procfile and [enable the Heroku integration](#deploying-to-heroku).
 
 With Foreman, you can easily run these processes locally by executing `foreman run`; in production you'll want to _export_ to another process management format such as Upstart or Runit. [capistrano-foreman](https://github.com/hyperoslo/capistrano-foreman) allows you to do this with Capistrano.
 
````
````diff
@@ -379,6 +476,8 @@ With Foreman, you can easily run these processes locally by executing `foreman run`
 
 If you run your applications in Kubernetes, use the following [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) spec as a starting point:
 
+##### Recreate Strategy
+
 ```yaml
 apiVersion: apps/v1
 kind: Deployment
````
````diff
@@ -386,8 +485,8 @@ metadata:
   name: my-racecar-deployment
   labels:
     app: my-racecar
-spec:
-  replicas:
+spec:
+  replicas: 4 # <-- this is a good value if you have a multiple of 4 partitions
   selector:
     matchLabels:
       app: my-racecar
````
````diff
@@ -399,20 +498,119 @@ spec:
       app: my-racecar
     spec:
       containers:
-
-
-
-
-
-
-
-
+        - name: my-racecar
+          image: my-racecar-image
+          command: ["bundle", "exec", "racecar", "MyConsumer"]
+          env: # <-- you can configure the consumer using environment variables!
+            - name: RACECAR_BROKERS
+              value: kafka1,kafka2,kafka3
+            - name: RACECAR_OFFSET_COMMIT_INTERVAL
+              value: "5"
+```
+
+This configuration uses the recreate strategy which completely terminates all consumers before starting new ones.
+It's simple and easy to understand but can result in significant 'downtime' where no messages are processed.
+
+##### Rolling Updates and 'cooperative-sticky' Assignment
+
+A newer alternative is to use the consumer's "cooperative-sticky" assignment strategy which allows healthy consumers to keep processing their partitions while others are terminated.
+This can be combined with a restricted rolling update to minimize processing downtime.
+
+Add to your Racecar config:
+```ruby
+Racecar.configure do |c|
+  c.partition_assignment_strategy = "cooperative-sticky"
+end
+```
+
+Replace the Kubernetes deployment strategy with:
+```yaml
+strategy:
+  type: RollingUpdate
+  rollingUpdate:
+    maxSurge: 0 # <- Never boot an excess consumer
+    maxUnavailable: 1 # <- The deploy 'rolls' one consumer at a time
+```
+
+These two configurations should be deployed together.
+
+While `maxSurge` should always be 0, `maxUnavailable` can be increased to reduce deployment times in exchange for longer pauses in message processing.
+
+#### Liveness Probe
+
+Racecar comes with a built-in liveness probe, primarily for use with Kubernetes, but useful for any deployment environment where you can periodically run a process to check the health of your consumer.
+
+To use this feature:
+- set the `liveness_probe_enabled` config option to true.
+- configure your Kubernetes deployment to run `$ racecarctl liveness_probe`
+
+When enabled (see config) Racecar will touch the file at `liveness_probe_file_path` each time it finishes polling Kafka and processing the messages in the batch (if any).
+
+The modified time of this file can be observed to determine when the consumer last exhibited 'liveness'.
+
+Running `racecarctl liveness_probe` will return a successful exit status if the last 'liveness' event happened within an acceptable time, `liveness_probe_max_interval`.
+
+`liveness_probe_max_interval` should be long enough to account for both the Kafka polling time of `max_wait_time` and the processing time of a full message batch.
+
+On receiving `SIGTERM`, Racecar will gracefully shut down and delete this file, causing the probe to fail immediately after exit.
+
+You may wish to tolerate more than one failed probe run to accommodate environmental variance and clock changes.
+
+See the [Configuration section](https://github.com/zendesk/racecar#configuration) for the various ways the liveness probe can be configured, environment variables being one option.
+
+Here is an example Kubernetes liveness probe configuration:
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+spec:
+  template:
+    spec:
+      containers:
+        - name: consumer
+
+          args:
+            - racecar
+            - SomeConsumer
+
+          env:
+            - name: RACECAR_LIVENESS_PROBE_ENABLED
+              value: "true"
+
+          livenessProbe:
+            exec:
+              command:
+                - racecarctl
+                - liveness_probe
+
+            # Allow up to 10 consecutive failures before terminating Pod:
+            failureThreshold: 10
+
+            # Wait 30 seconds before starting the probes:
+            initialDelaySeconds: 30
+
+            # Perform the check every 10 seconds:
+            periodSeconds: 10
 ```
 
-
+#### Deploying to Heroku
+
+If you run your applications in Heroku and/or use the Heroku Kafka add-on, your application will be provided with 4 ENV variables that allow connecting to the cluster: `KAFKA_URL`, `KAFKA_TRUSTED_CERT`, `KAFKA_CLIENT_CERT`, and `KAFKA_CLIENT_CERT_KEY`.
 
-
+Racecar has a built-in helper for configuring your application based on these variables – just add `require "racecar/heroku"` and everything should just work.
 
+Please note aliasing the Heroku Kafka add-on will break this integration. If you have a need to do that, please ask on [the discussion board](https://github.com/zendesk/racecar/discussions).
+
+```ruby
+# This takes care of setting up your consumer based on the ENV
+# variables provided by Heroku.
+require "racecar/heroku"
+
+class SomeConsumer < Racecar::Consumer
+  # ...
+end
+```
 
 #### Running consumers in the background
 
````
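The liveness-probe mechanics described above (touch a file after each poll, fail the probe once the file goes stale) can be sketched in plain Ruby. This is an illustration of the staleness check, not the `racecarctl` implementation, and the 5-second interval is arbitrary:

```ruby
require "tempfile"

# True if the file was touched within max_interval seconds -- the decision
# a liveness check has to make from the file's modification time.
def alive?(path, max_interval:, now: Time.now)
  (now - File.mtime(path)) <= max_interval
end

probe_file = Tempfile.new("liveness") # freshly created, so its mtime is "now"

puts alive?(probe_file.path, max_interval: 5)                      # recently touched => true
puts alive?(probe_file.path, max_interval: 5, now: Time.now + 60)  # 60s with no touch => false
```

A real probe command would translate that boolean into its process exit status, which is what Kubernetes' `exec` probe inspects.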
````diff
@@ -430,10 +628,9 @@ Since the process is daemonized, you need to know the process id (PID) in order
 
 Again, the recommended approach is to manage the processes using process managers. Only do this if you have to.
 
-
 ### Handling errors
 
-When processing messages from a Kafka topic, your code may encounter an error and raise an exception. The cause is typically one of two things:
+#### When processing messages from a Kafka topic, your code may encounter an error and raise an exception. The cause is typically one of two things:
 
 1. The message being processed is somehow malformed or doesn't conform with the assumptions made by the processing code.
 2. You're using some external resource such as a database or a network API that is temporarily unavailable.
````
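For the second, transient kind of failure, Racecar pauses the failing partition; with `pause_with_exponential_backoff` enabled, the `pause_timeout` doubles on each consecutive failure, as described in the configuration section. A plain-Ruby sketch of that doubling (illustrative, not the gem's code, and ignoring any upper bound the gem may apply):

```ruby
# With pause_with_exponential_backoff, the pause doubles on each
# consecutive failure of a partition: pause_timeout * 2**(failures - 1).
def pause_duration(pause_timeout, consecutive_failures)
  return 0 if consecutive_failures.zero?
  pause_timeout * 2**(consecutive_failures - 1)
end

(1..4).each do |n|
  puts "failure #{n}: pause #{pause_duration(10, n)}s"
end
# failure 1: 10s, failure 2: 20s, failure 3: 40s, failure 4: 80s
```

With the default 10-second `pause_timeout`, a partition that keeps failing backs off quickly instead of hammering the unavailable resource.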
````diff
@@ -468,6 +665,15 @@ end
 
 It is highly recommended that you set up an error handler. Please note that the `info` object contains different keys and values depending on whether you are using `process` or `process_batch`. See the `instrumentation_payload` object in the `process` and `process_batch` methods in the `Runner` class for the complete list.
 
+#### Errors related to Compression
+
+A sample error might look like this:
+
+```
+E, [2022-10-09T11:28:29.976548 #15] ERROR -- : (try 5/10): Error for topic subscription #<struct Racecar::Consumer::Subscription topic="support.entity_incremental.views.view_ticket_ids", start_from_beginning=false, max_bytes_per_partition=104857, additional_config={}>: Local: Not implemented (not_implemented)
+```
+
+Please see [Compression](#compression).
 
 ### Logging
 
````
@@ -475,35 +681,50 @@ By default, Racecar will log to `STDOUT`. If you're using Rails, your applicatio
|
|
475
681
|
|
476
682
|
In order to make Racecar log its own operations to a log file, set the `logfile` configuration variable or pass `--log filename.log` to the `racecar` command.
|
477
683
|
|
478
|
-
|
479
684
|
### Operations
|
480
685
|
|
481
686
|
In order to gracefully shut down a Racecar consumer process, send it the `SIGTERM` signal. Most process supervisors such as Runit and Kubernetes send this signal when shutting down a process, so using those systems will make things easier.
|
482
687
|
|
483
688
|
In order to introspect the configuration of a consumer process, send it the `SIGUSR1` signal. This will make Racecar print its configuration to the standard error file descriptor associated with the consumer process, so you'll need to know where that is written to.
|
484
689
|
|
485
|
-
### Upgrading from v1 to v2

-In order to safely upgrade from Racecar v1 to v2, you need to completely shut down your consumer group before starting it up again with the v2 Racecar dependency.
+In order to safely upgrade from Racecar v1 to v2, you need to completely shut down your consumer group before starting it up again with the v2 Racecar dependency.

+### Compression
+
+Racecar v2 requires a C compression library (zstd) to compress messages before producing them to a topic. If it is not already installed in your consumer's Docker container, install it with the following command in the consumer's Dockerfile:
+
+```
+apt-get update && apt-get install -y libzstd-dev
+```
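Once the library is present, the producer can be told to use zstd via librdkafka's `compression.codec` setting. A sketch, assuming Racecar's `producer` configuration option accepts `key=value` strings that are passed through to rdkafka:

```ruby
# Hypothetical initializer; `compression.codec` is a librdkafka producer
# setting, and passing it via `config.producer` is an assumption here.
Racecar.configure do |config|
  config.producer = ["compression.codec=zstd"]
end
```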

## Development

After checking out the repo, run `bin/setup` to install dependencies. Then, run `rspec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.

+The integration tests run against a Kafka instance that is not automatically started from within `rspec`. You can set one up using the provided `docker-compose.yml` by running `docker-compose up`.
+
+### Running RSpec within Docker
+
+There can be behavioural inconsistencies between running the specs on your machine and running them in the CI pipeline. For this reason, a Dockerfile is included in the project, based on the CircleCI Ruby 2.7.8 image. It could easily be extended with more Dockerfiles to cover other Ruby versions if desired. To run the specs via Docker:
+
+- Uncomment the `tests` service in the docker-compose.yml
+- Bring up the stack with `docker-compose up -d`
+- Execute the entire suite with `docker-compose run --rm tests bundle exec rspec`
+- Execute a single spec or directory with `docker-compose run --rm tests bundle exec rspec spec/integration/consumer_spec.rb`
+
+Please note: your code directory is mounted as a volume, so you can make code changes without needing to rebuild the image.

## Contributing

Bug reports and pull requests are welcome on [GitHub](https://github.com/zendesk/racecar). Feel free to [join our Slack team](https://ruby-kafka-slack.herokuapp.com/) and ask how best to contribute!

-
## Support and Discussion

-If you've discovered a bug, please file a [Github issue](https://github.com/zendesk/racecar/issues/new), and make sure to include all the relevant information, including the version of Racecar, ruby
-
-If you have other questions, or would like to discuss best practises, how to contribute to the project, or any other ruby-kafka related topic, [join our Slack team](https://ruby-kafka-slack.herokuapp.com/)!
+If you've discovered a bug, please file a [GitHub issue](https://github.com/zendesk/racecar/issues/new), and make sure to include all the relevant information, including the versions of Racecar, rdkafka-ruby, and Kafka that you're using.

+If you have other questions, would like to discuss best practices, or want to contribute to the project, [join our Slack team](https://ruby-kafka-slack.herokuapp.com/)!

## Copyright and license

data/Rakefile CHANGED

data/docker-compose.yml ADDED
@@ -0,0 +1,65 @@
+version: '2.1'
+
+services:
+  zookeeper:
+    image: confluentinc/cp-zookeeper:5.5.1
+    ports:
+      - "2181:2181"
+    environment:
+      ZOOKEEPER_CLIENT_PORT: 2181
+      ZOOKEEPER_TICK_TIME: 2000
+      KAFKA_OPTS: "-Dzookeeper.4lw.commands.whitelist=*"
+    healthcheck:
+      test: echo ruok | nc 127.0.0.1 2181 | grep imok
+
+  broker:
+    image: confluentinc/cp-kafka:5.5.1
+    depends_on:
+      - zookeeper
+    ports:
+      - "29092:29092"
+      - "9092:9092"
+      - "9101:9101"
+    environment:
+      KAFKA_BROKER_ID: 1
+      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
+      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
+      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
+      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
+      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
+      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
+      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
+      KAFKA_JMX_PORT: 9101
+      KAFKA_DELETE_TOPIC_ENABLE: 'true'
+    healthcheck:
+      test: nc -z 127.0.0.1 9092
+
+  wait-for-healthy-services:
+    image: alpine
+    depends_on:
+      broker:
+        condition: service_healthy
+      zookeeper:
+        condition: service_healthy
+
+
+# If you want to run the tests locally with Docker, comment in the tests service.
+# The behaviour, especially of the integration tests, can differ somewhat compared
+# to running it on your machine.
+
+#  tests:
+#    build:
+#      context: .
+#    depends_on:
+#      wait-for-healthy-services:
+#        condition: service_started
+#    environment:
+#      RACECAR_BROKERS: broker:29092
+#      DOCKER_SUDO: 'true'
+#    # When bringing up the stack, we just let the container exit. For running the
+#    # specs, we'll use commands like `docker-compose run tests rspec`
+#    command: ["echo", "ready"]
+#    volumes:
+#      # The line below allows us to run docker commands from the container itself
+#      - "/var/run/docker.sock:/var/run/docker.sock"
+#      - .:/app