dionysus-rb 0.1.0

Files changed (102)
  1. checksums.yaml +7 -0
  2. data/.circleci/config.yml +61 -0
  3. data/.github/workflows/ci.yml +77 -0
  4. data/.gitignore +12 -0
  5. data/.rspec +3 -0
  6. data/.rubocop.yml +175 -0
  7. data/.rubocop_todo.yml +53 -0
  8. data/CHANGELOG.md +227 -0
  9. data/Gemfile +10 -0
  10. data/Gemfile.lock +258 -0
  11. data/LICENSE.txt +21 -0
  12. data/README.md +1206 -0
  13. data/Rakefile +10 -0
  14. data/assets/logo.svg +51 -0
  15. data/bin/console +11 -0
  16. data/bin/karafka_health_check +14 -0
  17. data/bin/outbox_worker_health_check +12 -0
  18. data/bin/setup +8 -0
  19. data/dionysus-rb.gemspec +64 -0
  20. data/docker-compose.yml +44 -0
  21. data/lib/dionysus/checks/health_check.rb +50 -0
  22. data/lib/dionysus/checks.rb +7 -0
  23. data/lib/dionysus/consumer/batch_events_publisher.rb +33 -0
  24. data/lib/dionysus/consumer/config.rb +97 -0
  25. data/lib/dionysus/consumer/deserializer.rb +231 -0
  26. data/lib/dionysus/consumer/dionysus_event.rb +42 -0
  27. data/lib/dionysus/consumer/karafka_consumer_generator.rb +56 -0
  28. data/lib/dionysus/consumer/params_batch_processor.rb +65 -0
  29. data/lib/dionysus/consumer/params_batch_transformations/remove_duplicates_strategy.rb +54 -0
  30. data/lib/dionysus/consumer/params_batch_transformations.rb +4 -0
  31. data/lib/dionysus/consumer/persistor.rb +157 -0
  32. data/lib/dionysus/consumer/registry.rb +84 -0
  33. data/lib/dionysus/consumer/synced_data/assign_columns_from_synced_data.rb +27 -0
  34. data/lib/dionysus/consumer/synced_data/assign_columns_from_synced_data_job.rb +26 -0
  35. data/lib/dionysus/consumer/synced_data.rb +4 -0
  36. data/lib/dionysus/consumer/synchronizable_model.rb +93 -0
  37. data/lib/dionysus/consumer/workers_group.rb +18 -0
  38. data/lib/dionysus/consumer.rb +36 -0
  39. data/lib/dionysus/monitor.rb +48 -0
  40. data/lib/dionysus/producer/base_responder.rb +46 -0
  41. data/lib/dionysus/producer/config.rb +104 -0
  42. data/lib/dionysus/producer/deleted_record_serializer.rb +17 -0
  43. data/lib/dionysus/producer/genesis/performed.rb +11 -0
  44. data/lib/dionysus/producer/genesis/stream_job.rb +13 -0
  45. data/lib/dionysus/producer/genesis/streamer/base_job.rb +44 -0
  46. data/lib/dionysus/producer/genesis/streamer/standard_job.rb +43 -0
  47. data/lib/dionysus/producer/genesis/streamer.rb +40 -0
  48. data/lib/dionysus/producer/genesis.rb +62 -0
  49. data/lib/dionysus/producer/karafka_responder_generator.rb +133 -0
  50. data/lib/dionysus/producer/key.rb +14 -0
  51. data/lib/dionysus/producer/model_serializer.rb +105 -0
  52. data/lib/dionysus/producer/outbox/active_record_publishable.rb +74 -0
  53. data/lib/dionysus/producer/outbox/datadog_latency_reporter.rb +26 -0
  54. data/lib/dionysus/producer/outbox/datadog_latency_reporter_job.rb +11 -0
  55. data/lib/dionysus/producer/outbox/datadog_latency_reporter_scheduler.rb +47 -0
  56. data/lib/dionysus/producer/outbox/datadog_tracer.rb +32 -0
  57. data/lib/dionysus/producer/outbox/duplicates_filter.rb +26 -0
  58. data/lib/dionysus/producer/outbox/event_name.rb +26 -0
  59. data/lib/dionysus/producer/outbox/health_check.rb +48 -0
  60. data/lib/dionysus/producer/outbox/latency_tracker.rb +43 -0
  61. data/lib/dionysus/producer/outbox/model.rb +117 -0
  62. data/lib/dionysus/producer/outbox/producer.rb +26 -0
  63. data/lib/dionysus/producer/outbox/publishable.rb +106 -0
  64. data/lib/dionysus/producer/outbox/publisher.rb +131 -0
  65. data/lib/dionysus/producer/outbox/records_processor.rb +56 -0
  66. data/lib/dionysus/producer/outbox/runner.rb +120 -0
  67. data/lib/dionysus/producer/outbox/tombstone_publisher.rb +22 -0
  68. data/lib/dionysus/producer/outbox.rb +103 -0
  69. data/lib/dionysus/producer/partition_key.rb +42 -0
  70. data/lib/dionysus/producer/registry/validator.rb +32 -0
  71. data/lib/dionysus/producer/registry.rb +165 -0
  72. data/lib/dionysus/producer/serializer.rb +52 -0
  73. data/lib/dionysus/producer/suppressor.rb +18 -0
  74. data/lib/dionysus/producer.rb +121 -0
  75. data/lib/dionysus/railtie.rb +9 -0
  76. data/lib/dionysus/rb/version.rb +5 -0
  77. data/lib/dionysus/rb.rb +8 -0
  78. data/lib/dionysus/support/rspec/outbox_publishable.rb +78 -0
  79. data/lib/dionysus/topic_name.rb +15 -0
  80. data/lib/dionysus/utils/default_message_filter.rb +25 -0
  81. data/lib/dionysus/utils/exponential_backoff.rb +7 -0
  82. data/lib/dionysus/utils/karafka_datadog_listener.rb +20 -0
  83. data/lib/dionysus/utils/karafka_sentry_listener.rb +9 -0
  84. data/lib/dionysus/utils/null_error_handler.rb +6 -0
  85. data/lib/dionysus/utils/null_event_bus.rb +5 -0
  86. data/lib/dionysus/utils/null_hermes_event_producer.rb +5 -0
  87. data/lib/dionysus/utils/null_instrumenter.rb +7 -0
  88. data/lib/dionysus/utils/null_lock_client.rb +13 -0
  89. data/lib/dionysus/utils/null_model_factory.rb +5 -0
  90. data/lib/dionysus/utils/null_mutex_provider.rb +7 -0
  91. data/lib/dionysus/utils/null_retry_provider.rb +7 -0
  92. data/lib/dionysus/utils/null_tracer.rb +5 -0
  93. data/lib/dionysus/utils/null_transaction_provider.rb +15 -0
  94. data/lib/dionysus/utils/sidekiq_batched_job_distributor.rb +24 -0
  95. data/lib/dionysus/utils.rb +6 -0
  96. data/lib/dionysus/version.rb +7 -0
  97. data/lib/dionysus-rb.rb +3 -0
  98. data/lib/dionysus.rb +133 -0
  99. data/lib/tasks/dionysus.rake +18 -0
  100. data/log/development.log +0 -0
  101. data/sig/dionysus/rb.rbs +6 -0
  102. metadata +585 -0
data/README.md ADDED
@@ -0,0 +1,1206 @@
# Dionysus::Rb

![Dionysus](assets/logo.svg)

`Dionysus` - a framework on top of [Karafka](http://github.com/karafka/karafka) for Change Data Capture on the domain model level.

In distributed systems, transferring data between applications is often a challenge. There are multiple ways to do this, especially when using Kafka. There is a good chance that you are familiar with the [Change Data Capture](https://www.confluent.io/learn/change-data-capture/) pattern, often applied to relational databases such as PostgreSQL, which is a way of extracting row-level changes in real time. In that case, CDC focuses on INSERTs, UPDATEs and DELETEs of rows. If you are familiar with logical replication, this concept should ring a bell. When exploring Kafka, you might have heard of [Debezium](https://debezium.io), which makes CDC via Kafka simple.

However, there is one problem with this kind of CDC - it is all about row-level changes. This could work for simple cases, but in more complex domains there is a good chance that a database row is not a great representation of a domain model. This is especially true if you apply the Domain-Driven Design methodology and what you would like to replicate is an Aggregate that could be composed of several rows coming from different tables.

Fortunately, mighty Dionysus himself, powered by wine from [Karafka](https://karafka.io/docs/), has got your back - Dionysus can handle CDC on the domain model level. On the producer side, it will publish `model_created`, `model_updated` and `model_destroyed` events with a snapshot of a given model using custom serializers, also handling dependencies and computed properties (where the value of an attribute depends on a value from another model), with the possibility of using the [transactional outbox pattern](https://karolgalanciak.com/blog/2022/11/12/the-inherent-unreliability-of-after_commit-callback-and-most-service-objects-implementation/) to ensure that everything gets published. On the consumer side, it will make sure that the snapshots of models are persisted and that you can react to all changes not only via ActiveRecord callbacks but also via an event bus. And all of this is achievable merely via a couple of config options and a powerful DSL!

## Installation

Install the gem and add it to the application's Gemfile by executing:

    $ bundle add "dionysus-rb"

If bundler is not being used to manage dependencies, install the gem by executing:

    $ gem install "dionysus-rb"

## Usage

Please read [this article first](https://www.smily.com/engineering/integration-patterns-for-distributed-architecture-how-we-use-kafka-in-smily-and-why) to understand the context in which this gem was built. Also, it has only recently been made public, so some parts of the docs might require clarification. If you find any section like that, don't hesitate to submit an issue.

### TODO - update when the article is published.
Also, [read this article], which is an introduction to the gem.

Any application can be both a consumer and a producer of Karafka events, so let's take a look at how to handle configuration for both scenarios.


### Producer

First, you need to define a file `karafka.rb` with content like this:

``` rb
# frozen_string_literal: true

Dionysus.initialize_application!(
  environment: ENV["RAILS_ENV"],
  seed_brokers: ENV.fetch("DIONYSUS_SEED_BROKER").split(";"),
  client_id: "NAME_OF_THE_APP",
  logger: Rails.logger
)
```

`DIONYSUS_SEED_BROKER` is a string containing all the brokers separated by a *semicolon*, e.g. `localhost:9092`. The protocol should not be included.

This is going to handle the initialization process.

If you are migrating from the gem prior to making `dionysus-rb` public, most likely you will also need to provide `consumer_group_prefix` for backwards compatibility:

``` rb
Dionysus.initialize_application!(
  environment: ENV["RAILS_ENV"],
  seed_brokers: ENV.fetch("DIONYSUS_SEED_BROKER").split(";"),
  client_id: "NAME_OF_THE_APP",
  logger: Rails.logger,
  consumer_group_prefix: "prometheus_consumer_group_for"
)
```

By default, the name of the consumer group will be "NAME_OF_THE_APP_dionysus_consumer_group_for_NAME_OF_THE_APP" where `dionysus_consumer_group_for` is the `consumer_group_prefix`.

And define a `dionysus.rb` initializer with your Kafka topics:

``` rb config/initializers/dionysus.rb
Rails.application.config.to_prepare do
  Karafka::App.setup do |config|
    config.producer = ::WaterDrop::Producer.new do |producer_config|
      producer_config.kafka = {
        'bootstrap.servers': 'localhost:9092', # this needs to be a comma-separated list of brokers
        'request.required.acks': 1,
        "client.id": "id_of_the_producer_goes_here"
      }
      producer_config.id = "id_of_the_producer_goes_here"
      producer_config.deliver = true
    end
  end

  Dionysus::Producer.declare do
    namespace :v3 do # the name of the namespace is supposed to group topics that use the same serializer, think of it as API versioning. The name of the namespace is going to be included in the topics' names, e.g. `v3_accounts`
      serializer YourCustomSerializerClass

      topic :accounts, genesis_replica: true, partition_key: :id do # refer to the Genesis section for more details regarding this option, by default it's false
        publish Account
      end

      topic :rentals, partition_key: :account_id do # partition key as the name of an attribute
        publish Availability
        publish Bathroom
        publish Bedroom
        publish Rental
      end

      bookings_topic_partition_key_resolver = ->(resource) do # a partition key can also be a lambda
        next resource.id.to_s if resource.class.name == "Booking"

        resource.rental_id.to_s if resource.respond_to?(:rental_id)
      end

      topic :bookings, partition_key: bookings_topic_partition_key_resolver do
        publish Booking, with: [BookingsFee, BookingsTax]
      end

      topic :los_records, partition_key: :rental_id do
        publish LosRecord
      end
    end
  end
end
```

There are a couple of important things to understand here:
- A namespace might be used for versioning so that you can have e.g. a `v3` and a `v4` format working at the same time, with consumers consuming from different ones as they need. The namespace is part of the topic name; in the example above the following topics are declared: `v3_accounts`, `v3_rentals`, `v3_bookings`, `v3_los_records`. Most likely you will need to create them manually in the production environment, depending on the Kafka cluster configuration.
- `topic` is a declaration of Kafka topics. To understand more about topics and what some rules of thumb are when designing them, please [read this article](https://www.smily.com/engineering/integration-patterns-for-distributed-architecture-intro-to-kafka).
- Some entities might have attributes depending on other entities (computed properties) or might need to be always published together (much like a Domain-Driven Design Aggregate). For these cases, use the `with` directive, which is an equivalent of sideloading from REST APIs. E.g., a Booking could have a `final_price` attribute that depends on other models, like BookingsFee or BookingsTax, which contribute to that price. Publishing these items separately, e.g. first a BookingsFee and then the Booking with the changed final price, might lead to inconsistency on the consumer side, where the `final_price` value doesn't match the value that would be obtained by summing all elements of the price. That's why all these records need to be published together, and that's what the `with` option is supposed to cover: `publish Booking, with: [BookingsFee, BookingsTax]`. Thanks to that declaration, any update to Booking or any change to its dependencies (BookingsFee, BookingsTax), such as creation/update/deletion, will result in publishing a `booking_updated` event.

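A lambda-based partition key like the `bookings_topic_partition_key_resolver` above can be exercised in isolation. Below is a minimal sketch with hypothetical Struct stand-ins for the real ActiveRecord models; note the `next`, which makes the first branch actually return (on a non-final line of a lambda, a bare expression's value would otherwise be discarded):

```ruby
# Stand-in models: the real ones are ActiveRecord classes.
Booking = Struct.new(:id, :rental_id)
Availability = Struct.new(:rental_id)

bookings_topic_partition_key_resolver = ->(resource) do
  next resource.id.to_s if resource.class.name == "Booking"

  resource.rental_id.to_s if resource.respond_to?(:rental_id)
end

bookings_topic_partition_key_resolver.call(Booking.new(42, 7))  # => "42"
bookings_topic_partition_key_resolver.call(Availability.new(7)) # => "7"
```

Bookings are partitioned by their own `id`, while any other resource on the topic falls back to `rental_id`, so all messages about a given rental land on the same partition.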
#### Serializer

A serializer is a class that needs to implement a `serialize` method with the following signature:

``` rb
class YourCustomSerializerClass
  def self.serialize(record_or_records, dependencies:)
    # do stuff here
  end
end
```

`record_or_records` is either a single record or an array of records, and `dependencies` are what is defined via the `with` option - in most cases this is going to be an empty array; in cases like Bookings in the example above it is going to be an array of dependencies to be sideloaded. The job of the serializer is to figure out how to find the right serializer for a given model (kind of like a factory), how to sideload the dependencies, and to return an array of serialized payloads (it could be a one-element array when passing a single record, but it needs to be an array).

The best way to implement the serialization part would be to create a `YourCustomSerializerClass` class inheriting from `Dionysus::Producer::Serializer`. Then, you would need to implement just a single method, `infer_serializer`:

``` rb
class YourCustomSerializerClass < Dionysus::Producer::Serializer
  def infer_serializer
    somehow_figure_out_the_right_serializer_for_the_model_klass(model_klass)
  end
end
```

The `record` method will be available inside the class, so that's how you can get a serializer for a specific model. And to implement the actual serializer for the model, you can create classes inheriting from `ModelSerializer`:

``` rb
class SomeModelSerializer < Dionysus::Producer::ModelSerializer
  attributes :name, :some_other_attribute

  has_one :account
  has_many :related_records
end
```

The declared attributes/relationships will be delegated to the given record by default, although you can override these methods.

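Overriding works as you'd expect from plain Ruby method dispatch: declare the attribute, then define a method with the same name in the serializer. The sketch below is *not* Dionysus's actual `ModelSerializer` - `TinySerializer`, `Rental` and `RentalSerializer` are hypothetical stand-ins illustrating the delegate-by-default, override-when-needed idea:

```ruby
# Minimal stand-in for a serializer base class: declared attributes are
# delegated to the wrapped record unless the serializer redefines them.
class TinySerializer
  def self.attributes(*names)
    names.each do |name|
      define_method(name) { record.public_send(name) }
    end
    @attribute_names = names
  end

  def self.attribute_names
    @attribute_names
  end

  attr_reader :record

  def initialize(record)
    @record = record
  end

  def as_json
    self.class.attribute_names.to_h { |name| [name, public_send(name)] }
  end
end

Rental = Struct.new(:name, :headline)

class RentalSerializer < TinySerializer
  attributes :name, :headline

  # Override: computed from the record instead of delegated verbatim.
  def headline
    record.headline.to_s.upcase
  end
end

# :name is delegated to the record, :headline goes through the override.
RentalSerializer.new(Rental.new("Villa", "sea view")).as_json
```

The call returns `{ name: "Villa", headline: "SEA VIEW" }`: a later `def headline` simply replaces the delegating method that `attributes` generated.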
To resolve serializers for declared relationships, `YourCustomSerializerClass` will also be used.

When testing serializers, you can just limit the scope of the test to the `as_json` method:

``` rb
SomeModelSerializer.new(record_to_serialize, include: array_of_underscored_dependencies_to_be_sideloaded, context_serializer: YourCustomSerializerClass).as_json
```

You can also try testing using `YourCustomSerializerClass`, so that you can also verify that the `infer_serializer` method works as expected:

``` rb
YourCustomSerializerClass.serialize(record_or_records, dependencies: dependencies)
```

### Config options

#### Bypassing serializers

For a large volume of data, it sometimes doesn't make sense to use serializers to serialize records individually. One example would be deleting records for models that are soft-deletable. If the only thing you expect your consumers to do is to delete records by ID for a given model and the amount of data is huge, you will be better off sending a single event with a lot of IDs instead of sending multiple events for every record individually while performing the full serialization. This is usually combined with the `import` option on the consumer side to make the consuming even more efficient.

Here is a complete example of a use case where serialization is bypassed using the `serialize: false` option, with some extras that will be useful when thinking about the `import` option on the consumer side. Let's consider a hypothetical `Record` model:

``` rb
Dionysus::Producer.responders_for(Record).each do |responder|
  partition_key = account.id.to_s
  key = "RecordsCollection:#{account.id}"
  created_records = Record.for_accounts(account).visible
  canceled_records = Record.for_accounts(account).soft_deleted

  message = [].tap do |current_message|
    current_message << ["record_created", created_records.to_a, {}]
    if canceled_records.any?
      current_message << ["record_destroyed", canceled_records.map { |record| RecordDionysusDTO.new(record) }, { serialize: false }]
    end
  end

  result = responder.call(message, partition_key: partition_key, key: key)
end

class RecordDionysusDTO < SimpleDelegator
  def as_json
    {
      id: id
    }
  end
end
```

That way, the serializer will not load any relationships etc. - it will just serialize IDs for the `record_destroyed` event. As a bonus, it covers the case where it might be useful to not deal with records one by one but with a huge batch at once, and then use something like [activerecord-import](https://github.com/zdennis/activerecord-import) on the consumer side.

#### Responders

Prior to Karafka 2.0, there used to be a concept of Responders that were responsible for publishing messages. This concept was dropped in Karafka 2.0, but a similar one is still used in Dionysus, as its predecessor was built on top of Karafka 1.x.

Responders implement a `call` method that takes `message` as a positional argument as well as a `partition_key` and a message `key`. Most likely you are not going to need this knowledge, but in case you do need something advanced, here is the API to get responders:

- `Dionysus::Producer.responders_for(model_klass)` - get all responders for a given model class, regardless of the topic
- `Dionysus::Producer.responders_for_model_for_topic(model_klass, topic)` - get all responders for a given model class, for a given topic
- `Dionysus::Producer.responders_for_dependency_parent(model_klass)` - get all parent responders for a given model class that is a dependency (when using the `with` directive), regardless of the topic
- `Dionysus::Producer.responders_for_dependency_parent(model_klass, topic)` - get all parent responders for a given model class that is a dependency (when using the `with` directive), for a given topic

#### Instrumentation & Event Bus

Instrumenter - an object for instrumentation expecting the following interface (this is the default class):

``` rb
class Dionysus::Utils::NullInstrumenter
  def self.instrument(name, payload = {})
    yield
  end
end
```

Event Bus - useful if you want to react to some events, with the following interface (this is the default class):

``` rb
class Dionysus::Utils::NullEventBus
  def self.publish(name, payload)
  end
end
```

For the instrumentation, the entire publishing logic is wrapped with the following block: `instrumenter.instrument("Dionysus.respond.#{responder_class_name}")`.

For the event bus, the event is published after getting a success response from Kafka: `event_bus.publish("Dionysus.respond", topic_name: topic_name, message: message, options: final_options)`

You can configure those dependencies in the initializer:

``` rb
Dionysus::Producer.configure do |config|
  config.instrumenter = MyInstrumentation
  config.event_bus = MyEventBusForDionysus
end
```

They are not required though; null-object-pattern-based objects are injected by default.

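Any objects conforming to those two interfaces will do. As a minimal sketch (the `MyInstrumentation` and `MyEventBusForDionysus` classes here are hypothetical, not part of the gem), here is an instrumenter that times each instrumented block and an event bus that collects published events in memory:

```ruby
# Hypothetical instrumenter: records how long each instrumented block took.
class MyInstrumentation
  class << self
    def timings
      @timings ||= {}
    end

    def instrument(name, payload = {})
      started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      yield
    ensure
      timings[name] = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at
    end
  end
end

# Hypothetical event bus: collects published events so other code can react to them.
class MyEventBusForDionysus
  class << self
    def events
      @events ||= []
    end

    def publish(name, payload)
      events << [name, payload]
    end
  end
end

# Simulating what the gem does around publishing:
MyInstrumentation.instrument("Dionysus.respond.SomeResponder") do
  MyEventBusForDionysus.publish("Dionysus.respond", topic_name: "v3_rentals", message: [], options: {})
end

MyEventBusForDionysus.events.size # => 1
```

In a real setup, `publish` would typically forward to your in-app event bus (e.g. to trigger webhooks or metrics) rather than store events in an array.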
#### Sentry and Datadog integration

This is applicable to both consumers and producers. For Sentry and Datadog integration, add these 2 lines to your initializer:

``` rb
Karafka.monitor.subscribe(Dionysus::Utils::KarafkaSentryListener)
Karafka.monitor.subscribe(Dionysus::Utils::KarafkaDatadogListener)
```

Don't put these inside a `Rails.application.config.to_prepare do` block.

#### Transactional Outbox Pattern

A typical problem on the producer's side is the possibility of losing some messages due to the lack of transactional boundaries, as things like publishing events usually happen in an `after_commit` callback.

To prevent that, you can take advantage of the [Transactional Outbox Pattern](https://microservices.io/patterns/data/transactional-outbox.html), which is implemented in this gem.

The idea is simple - store the messages in a temporary table (in the same transaction where creating/updating/deleting a publishable record happens), then publish them in a separate process and mark them as published.

Dionysus also has an extra optimization allowing publishing both from after-commit callbacks (for performance reasons) and, after a certain delay, from a separate worker that reads data from the transactional outbox table. This covers the cases where some records were not published - they will not be lost, just retried later.

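The core of the pattern can be pictured in a few lines of plain Ruby. This is an in-memory stand-in, not the gem's actual worker (the real one batches records, takes locks and handles retries), but it shows the fetch-publish-mark cycle:

```ruby
# In-memory stand-in for the outbox table.
OutboxRecord = Struct.new(:topic, :payload, :published_at)

outbox = [
  OutboxRecord.new("v3_rentals", { "event" => "rental_created", "id" => 1 }, nil),
  OutboxRecord.new("v3_rentals", { "event" => "rental_updated", "id" => 1 }, nil)
]

published_messages = []

# One iteration of the worker: fetch unpublished records in insertion order,
# publish each one, and mark it as published.
outbox.select { |record| record.published_at.nil? }.each do |record|
  published_messages << [record.topic, record.payload] # stand-in for the Kafka producer call
  record.published_at = Time.now
end

published_messages.size                      # => 2
outbox.all? { |record| record.published_at } # => true
```

Because the outbox rows are written in the same database transaction as the domain change, a crash between commit and publish only delays the message until the next worker iteration instead of losing it.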
##### Making models publishable

To make ActiveRecord models publishable, you need to make sure that the `Dionysus::Producer::Outbox::ActiveRecordPublishable` module is included in the model. This should be handled automatically by the gem when a model is declared inside `Dionysus::Producer.declare`.

Thanks to that, an outbox record will be created after each create/update/destroy event.

In some cases, you might want to publish update events even after the record is soft-deleted. To do that, you need to override the `dionysus_publish_updates_after_soft_delete?` method:

```rb
def dionysus_publish_updates_after_soft_delete?
  true
end
```

##### Outbox configuration

``` rb
Dionysus::Producer.configure do |config|
  config.database_connection_provider = ActiveRecord::Base # required
  config.transaction_provider = ActiveRecord::Base # required
  config.outbox_model = DionysusOutbox # required
  config.outbox_publishing_batch_size = 100 # not required, defaults to 100
  config.lock_client = Redlock::Client.new([ENV["REDIS_URL"]]) # required if you want to use more than a single worker/more than a single thread per worker, defaults to Dionysus::Producer::Outbox::NullLockClient. Check its interface and the interface of the `redlock` gem. To cut the long story short: when the lock is acquired, a hash with the structure outlined in Dionysus::Producer::Outbox::NullLockClient should be yielded; if the lock is not acquired, nil should be yielded.
  config.lock_expiry_time = 10_000 # not required, defaults to 10_000, in milliseconds
  config.error_handler = Sentry # not required but highly recommended, defaults to Dionysus::Utils::NullErrorHandler. When using Sentry, you will probably want to exclude SignalException: `config.excluded_exceptions += ["SignalException"]`.
  config.soft_delete_column = :deleted_at # defaults to "canceled_at" when not provided
  config.default_partition_key = :some_id # defaults to :account_id when not provided, you can override it per topic when declaring them with the `partition_key` config option. You can pass either a symbol or a lambda taking the resource as the argument.
  config.outbox_worker_sleep_seconds = 1 # defaults to 0.2 seconds when not provided; it's the time interval between each iteration of the outbox worker, which fetches publishable records, publishes them to Kafka and marks them as finished
  config.transactional_outbox_enabled = false # not required, defaults to `true`. Set it to `false` only if you want to disable creating outbox records (which might be useful for the migration period). If you are not sure whether you need this config setting, then you probably don't.
  config.publish_after_commit = true # not required, defaults to `false`. Check the `Publishing records right after the transaction is committed` section for more details.
  config.outbox_worker_publishing_delay = 5 # not required, defaults to 0; a delay in seconds until an outbox record is considered publishable. Check the `Publishing records right after the transaction is committed` section for more details.
  config.remove_consecutive_duplicates_before_publishing = true # not required, defaults to false. If set to true, consecutive duplicates in the publishable batch will be removed and only one message will be published to a given topic. For example, if for whatever reason there are ten messages in a row for a given topic to publish a `user_updated` event, only the last one will be published. Check `Dionysus::Consumer::ParamsBatchTransformations::RemoveDuplicatesStrategy` for the exact implementation. To verify whether this feature is useful, it's recommended to browse Karafka UI and check whether there are any obvious duplicates happening often in the topics.
  config.observers_inline_maximum_size = 100 # not required, defaults to 1000. This config setting matters in case there is a huge amount of dependent records (observers). If the threshold is exceeded, the observers will be published via the Genesis process so as not to cause issues like blocking the outbox worker.
  config.sidekiq_queue = :default # not required, defaults to :dionysus. The queue will be used for the Genesis process.
end
```

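The consecutive-duplicates removal can be pictured with `Enumerable#slice_when`: group consecutive identical messages into runs, then keep only the last message of each run. This is a rough sketch of the idea, not the gem's actual strategy (which lives in `Dionysus::Consumer::ParamsBatchTransformations::RemoveDuplicatesStrategy` and may differ in detail):

```ruby
messages = [
  ["v3_users", "user_updated", 1],
  ["v3_users", "user_updated", 1],
  ["v3_users", "user_updated", 1],
  ["v3_rentals", "rental_updated", 7],
  ["v3_users", "user_updated", 1]
]

# Slice the batch wherever two neighbouring messages differ,
# then keep the last message of each run of duplicates.
deduplicated = messages.slice_when { |a, b| a != b }.map(&:last)

deduplicated
# => [["v3_users", "user_updated", 1], ["v3_rentals", "rental_updated", 7], ["v3_users", "user_updated", 1]]
```

Note that only *consecutive* duplicates collapse: the final `user_updated` survives because a `rental_updated` sits between it and the earlier run.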
##### DionysusOutbox model

Generate a model for the outbox:

```
rails generate model DionysusOutbox
```

and use the following migration code:

``` rb
create_table(:dionysus_outboxes) do |t|
  t.string "resource_class", null: false
  t.string "resource_id", null: false
  t.string "event_name", null: false
  t.string "topic", null: false
  t.string "partition_key"
  t.datetime "published_at"
  t.datetime "failed_at"
  t.datetime "retry_at"
  t.string "error_class"
  t.string "error_message"
  t.integer "attempts", null: false, default: 0
  t.datetime "created_at", precision: 6, null: false
  t.datetime "updated_at", precision: 6, null: false

  # some of these indexes are not needed, but they are here for convenience when checking stuff in the console or when using tartarus for archiving
  t.index ["topic", "created_at"], name: "index_dionysus_outboxes_publishing_idx", where: "published_at IS NULL"
  t.index ["resource_class", "event_name"], name: "index_dionysus_outboxes_on_resource_class_and_event"
  t.index ["resource_class", "resource_id"], name: "index_dionysus_outboxes_on_resource_class_and_resource_id"
  t.index ["topic"], name: "index_dionysus_outboxes_on_topic"
  t.index ["created_at"], name: "index_dionysus_outboxes_on_created_at"
  t.index ["resource_class", "created_at"], name: "index_dionysus_outboxes_on_resource_class_and_created_at"
  t.index ["resource_class", "published_at"], name: "index_dionysus_outboxes_on_resource_class_and_published_at"
  t.index ["published_at"], name: "index_dionysus_outboxes_on_published_at"
end
```

You also need to include the `Dionysus::Producer::Outbox::Model` module in your model:

``` rb
class DionysusOutbox < ApplicationRecord
  include Dionysus::Producer::Outbox::Model
end
```

For testing publishable models, you can take advantage of the `"Dionysus Transactional Outbox Publishable"` shared behavior. First, you need to require the following file:

``` rb
require "dionysus/support/rspec/outbox_publishable"
```

And then just add `it_behaves_like "Dionysus Transactional Outbox Publishable"` in the models' specs.

##### Running outbox worker

Use the following Rake task:

```
DIONYSUS_THREADS_NUMBER=5 DB_POOL=10 bundle exec rake dionysus:producer
```

If you want to use just a single thread:

```
bundle exec rake dionysus:producer
```

##### Publishing records right after the transaction is committed

When the throughput of outbox records' creation is really high, there is a very good chance that it might take even a few minutes to publish some records from the workers (due to the limited capacity).

In such a case, you might consider publishing records right after the transaction is committed. To do so, you need to:

1. Enable publishing globally:

``` rb
Dionysus::Producer.configure do |config|
  config.publish_after_commit = true
end
```

2. Or enable/disable it per model where `Dionysus::Producer::Outbox::ActiveRecordPublishable` is included:

``` rb
class MyModel < ApplicationRecord
  include Dionysus::Producer::Outbox::ActiveRecordPublishable

  private

  def publish_after_commit?
    true
  end
end
```

To avoid double-publishing or running into conflicts between publishing records right after the transaction (from the `after_commit` callback) and publishing from the outbox worker, it is recommended to add some delay for the outbox records until they are considered publishable:

``` rb
Dionysus::Producer.configure do |config|
  config.outbox_worker_publishing_delay = 5 # in seconds, defaults to 0
end
```

By default, the records will be considered publishable right away. With that config option, it will take 5 seconds after creation until they are considered publishable.

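In other words, with a delay of 5 seconds the worker skips records younger than 5 seconds, giving the `after_commit` path time to publish them first. The selection rule can be sketched as a plain-Ruby filter (an illustration of the mechanism, not the gem's exact query):

```ruby
OutboxRecord = Struct.new(:id, :created_at, :published_at)

publishing_delay = 5 # seconds, mirroring outbox_worker_publishing_delay
now = Time.now

records = [
  OutboxRecord.new(1, now - 60, nil), # old enough and unpublished -> publishable
  OutboxRecord.new(2, now - 2, nil),  # too fresh, the after_commit path may still handle it
  OutboxRecord.new(3, now - 60, now)  # already published -> skipped
]

publishable = records.select do |record|
  record.published_at.nil? && record.created_at <= now - publishing_delay
end

publishable.map(&:id) # => [1]
```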
##### Outbox Publishing Latency Tracking

It's highly recommended to track the latency of publishing outbox records, defined as the difference between the `published_at` and `created_at` timestamps.

``` rb
Dionysus::Producer.configure do |config|
  config.datadog_statsd_client = Datadog::Statsd.new("localhost", 8125, namespace: "application_name.production") # required for latency tracking, defaults to `nil`
  config.high_priority_sidekiq_queue = :critical # not required, defaults to `:dionysus_high_priority`
end
```

You also need to add a job to the sidekiq-cron schedule that will run every 1 minute:

``` rb
Sidekiq.configure_server do |config|
  config.on(:startup) do
    Dionysus::Producer::Outbox::DatadogLatencyReporterScheduler.new.add_to_schedule
  end
end
```

With this setup, you will have the following metrics available in Datadog:

- `"#{namespace}.dionysus.producer.outbox.latency.minimum"`
- `"#{namespace}.dionysus.producer.outbox.latency.maximum"`
- `"#{namespace}.dionysus.producer.outbox.latency.average"`
- `"#{namespace}.dionysus.producer.outbox.latency.highest_since_creation_date"`

460
##### Archiving old outbox records

You will probably want to periodically archive/delete published outbox records. It's recommended to use [tartarus-rb](https://github.com/BookingSync/tartarus-rb) for that.

Here is an example config:

``` rb
tartarus.register do |item|
  item.model = DionysusOutbox
  item.cron = "5 4 * * *"
  item.queue = "default"
  item.archive_items_older_than = -> { 3.days.ago }
  item.timestamp_field = :published_at
  item.archive_with = :delete_all_using_limit_in_batches
end
```

##### Events, hooks and monitors

You can subscribe to certain events that are published by `Dionysus.monitor`. The monitor is based on [`dry-monitor`](https://github.com/dry-rb/dry-monitor).

Available events and their arguments are:

- "outbox_producer.started", no arguments
- "outbox_producer.stopped", no arguments
- "outbox_producer.shutting_down", no arguments
- "outbox_producer.error", arguments: error, error_message
- "outbox_producer.publishing_failed", arguments: outbox_record
- "outbox_producer.published", arguments: outbox_record
- "outbox_producer.processing_topic", arguments: topic
- "outbox_producer.processed_topic", arguments: topic
- "outbox_producer.lock_exists_for_topic", arguments: topic

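A subscription could be sketched like this (a minimal illustration; the handler bodies and logging targets are hypothetical, only the event names and argument keys come from the list above):

``` rb
# Subscribe to outbox producer events, e.g. in an initializer.
Dionysus.monitor.subscribe("outbox_producer.error") do |event|
  Rails.logger.error("[outbox] #{event[:error]}: #{event[:error_message]}")
end

Dionysus.monitor.subscribe("outbox_producer.publishing_failed") do |event|
  Rails.logger.warn("[outbox] failed to publish record #{event[:outbox_record].id}")
end
```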
##### Outbox Worker Health Check

You need to explicitly enable the health check (e.g. in an initializer, but it needs to be outside the `Rails.application.config.to_prepare` block):

``` rb
Dionysus.enable_outbox_worker_healthcheck
```

To perform the actual health check, use `bin/outbox_worker_health_check`. On success, the script exits with status `0`; on failure, it logs the error and exits with status `1`.

```
bundle exec outbox_worker_health_check
```

It works for both readiness and liveness checks.

#### Tombstoning records

The only way to get rid of messages under a given key from Kafka is to tombstone them. Use `Dionysus::Producer::Outbox::TombstonePublisher` to do it:

``` rb
Dionysus::Producer::Outbox::TombstonePublisher.new.publish(resource, responder)
```

Or, if you want a custom `key`/`partition_key`:

``` rb
Dionysus::Producer::Outbox::TombstonePublisher.new.publish(resource, responder, partition_key: partition_key, key: key)
```

#### Genesis

When you add `dionysus-rb` to an existing application, there is a good chance that you will need to stream all of the existing records of the publishable models. Or maybe you changed the schema of the serializer, introducing some new attributes, and you want to re-stream the records. In either case, you need to publish everything from scratch. In other words, perform a Genesis.

The way to handle this is to use the `Dionysus::Producer::Genesis#stream` method, which is going to enqueue some Sidekiq jobs.

This method takes the following keyword arguments:
- `topic` - required, the name of the topic where you want to publish a given model (this is necessary as one model might be published to multiple topics)
- `model` - required, the model class you want to publish
- `from` - not required, to be used together with `to`; establishes a time window defined by the `from` and `to` timestamps to scope the records to the ones that were updated during this time only. Defaults to `nil`. Don't provide any value if you want to publish all records.
- `to` - not required, to be used together with `from`; establishes the same time window as described for `from`. Defaults to `nil`. Don't provide any value if you want to publish all records.
- `number_of_days` - required, this argument defines the timeline for executing all the jobs. If you set it to 7, the jobs publishing records to Kafka will be evenly distributed over 7 days. You can use fractions as well, e.g. 0.5 for half a day (12 hours).
- `streamer_job` - not required, defaults to `Dionysus::Producer::Genesis::Streamer::StandardJob`. In the majority of cases you won't want to change this argument, but sometimes you may need a different strategy for streaming the records. For example, model X might have a relationship to model Y where a single record of X usually contains thousands of records of Y; in such a case, you might intercept model X and provide some custom publishing logic for model Y. Check `Dionysus::Producer::Genesis::Streamer::BaseJob` if you want to apply some customization.

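Putting the arguments together, a Genesis run could look like this (a sketch; the topic and model names are hypothetical):

``` rb
# Re-stream all Rental records to the "v3_rentals" topic,
# spreading the publishing jobs evenly over 7 days.
Dionysus::Producer::Genesis.new.stream(
  topic: "v3_rentals",
  model: Rental,
  number_of_days: 7
)

# Or scope it to records updated within a given time window:
Dionysus::Producer::Genesis.new.stream(
  topic: "v3_rentals",
  model: Rental,
  from: 30.days.ago,
  to: Time.current,
  number_of_days: 1
)
```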
If you need the full mapping of all available topics and models, use `Dionysus::Producer.topics_models_mapping`.

When executing Genesis, a `Dionysus::Producer::Genesis::Performed` event is going to be published via [Hermes](http://github.com/BookingSync/hermes-rb) if the gem is included. If, for whatever reason, you don't want to use Hermes, you can use `Dionysus::Utils::NullHermesEventProducer` (which is the default); the config options are described below.

Config options dedicated to the Genesis feature:

``` rb
Dionysus::Producer.configure do |config|
  config.sidekiq_queue = :messaging # not required, defaults to `:dionysus`. Remember that you need to add this queue to the Sidekiq config file.
  config.publisher_service_name = "my_service" # not required, defaults to `WaterDrop.config.client_id`
  config.genesis_consistency_safety_delay = 120.seconds # not required, defaults to `60.seconds`. This is an extra delay accounting for the time it might take to schedule the jobs, so that the Genesis window needed for the `Dionysus::Producer::Genesis::Performed` event has an accurate timeline.
  config.hermes_event_producer = Dionysus::Utils::NullHermesEventProducer # not required
end
```

However, such a setup might not be ideal. If you have just a single topic where you both publish the current events happening in the application and re-stream all the records, there is a good chance that you will end up with a huge lag on the consumers' side at some point.

The recommended approach is to have two separate topics:

1. A standard one, used for publishing current events - e.g. "v3_rentals". This topic should also have a limited retention configured, e.g. 7 days.
2. A genesis one, used for publishing everything - e.g. "v3_rentals_genesis". You might consider having an infinite retention for this topic.

Thanks to such a separation, there will not be an extra lag on the consumers' side causing delays in processing potentially critical events.

To achieve this, add the `genesis_replica: true` option when declaring a topic on the producer's side:

``` rb
Dionysus::Producer.declare do
  namespace :v3 do
    serializer YourCustomSerializerClass

    topic :accounts, genesis_replica: true do
      publish Account
    end
  end
end
```

When a topic is declared as such, there are 2 possible scenarios of publishing events:
1. When calling `Dionysus::Producer::Genesis#stream` with the *primary* topic as an argument (based on the example above: `v3_accounts`), the events will be published to both the `v3_accounts` and `v3_accounts_genesis` topics.
2. When calling `Dionysus::Producer::Genesis#stream` with the *genesis* topic as an argument (based on the example above: `v3_accounts_genesis`), the events will be published only to the `v3_accounts_genesis` topic.

That implies that the event is always published to the genesis topic; only the primary one can be skipped. **IMPORTANT** This behavior is exactly the same during "standard" publishing, outside Genesis - the event will be published to both the standard and the genesis topic if the topic is declared as a genesis one.

The reason behind this behavior is that the genesis topic cannot have stale data, especially as it's expected to have an infinite retention.

It's a highly opinionated design choice. If you don't want to maintain a separate topic because you don't need infinite storage, you can either set a super-short retention for the genesis replica topic, or enable/disable the feature conditionally, e.g. via an ENV variable:

``` rb
use_genesis_replica_for_accounts_topics = (ENV.fetch("USE_GENESIS_REPLICA_FOR_ACCOUNTS_TOPIC", false).to_s == "true")

topic :accounts, genesis_replica: use_genesis_replica_for_accounts_topics do
  publish Account
end
```

Alternatively, feel free to submit a PR with a cleaner solution.

**Notice for consumers**: if you decide to introduce such a separation, it is recommended to use dedicated consumers just for the genesis topic.

### Observers for dependencies for computed properties

Imagine a case where you have, for example, a Rental model with some config attribute such as `check_in_time`. Such an attribute might not necessarily be directly readable from the `rentals` table as a simple column. The logic might work in such a way that the value from the `rentals` table is returned if present, with a fallback delegating to a related `Account`. That means that `Rental` has a dependency on `Account`, and you probably want to observe `Account` and publish the related rentals when some `default_check_in_time` attribute changes.

To handle it, you need to do 2 things:

1. Add a `changeset` column to the outbox model. If you don't need encryption, just use the `jsonb` type for the column. If you need encryption, use the `text` type.
2. Add a proper topic declaration. For the example described above, it could look like this:

``` rb
topic :rentals do
  publish Rental, observe: [
    {
      model: Account,
      attributes: %i[default_check_in_time],
      association_name: :rentals
    }
  ]
end
```

It's going to work for both to-one and to-many relationships.

To make sure the columns specified in `attributes` actually exist, you can use the following service to validate them:

``` rb
Dionysus::Producer::Registry::Validator.new.validate_columns
```

You can put it in a separate spec to keep things simple or just use the following rake task:

```
bundle exec rake dionysus:validate_columns
```

You can also pass a string of chained methods as `association_name`, for example: `association_name: "other.association.rentals"`. Note that when using strings, the validation of whether a given association exists (which is performed for symbols) is skipped.

#### Encryption of changesets

If you store some sensitive data (e.g. anything in scope of GDPR), it is a good idea to encrypt the `changeset`. The recommended solution is the [crypt_keeper](https://github.com/jmazzi/crypt_keeper) gem. To make outbox records work with encrypted changesets, call the `encrypts_changeset!` class method after declaring the encryption:

``` rb
class DionysusOutboxEncrChangeset < ApplicationRecord
  include Dionysus::Producer::Outbox::Model

  crypt_keeper :changeset, encryptor: :postgres_pgp, key: ENV.fetch("CRYPT_KEEPER_KEY"), encoding: "UTF-8"

  encrypts_changeset!
end
```

### Consumer

First, you need to define a `karafka.rb` file with content like this:

``` rb karafka.rb
# frozen_string_literal: true

Dionysus.initialize_application!(
  environment: ENV["RAILS_ENV"],
  seed_brokers: [ENV.fetch("DIONYSUS_SEED_BROKER")],
  client_id: NAME_OF_THE_APP,
  logger: Rails.logger
)
```

`DIONYSUS_SEED_BROKER` is a string containing all the brokers separated by a semicolon, e.g. `localhost:9092`. The protocol should not be included.

If you are migrating from the gem prior to making `dionysus-rb` public, most likely you will also need to provide `consumer_group_prefix` for backwards compatibility:

``` rb
Dionysus.initialize_application!(
  environment: ENV["RAILS_ENV"],
  seed_brokers: ENV.fetch("DIONYSUS_SEED_BROKER").split(";"),
  client_id: "NAME_OF_THE_APP",
  logger: Rails.logger,
  consumer_group_prefix: "prometheus_consumer_group_for"
)
```

By default, the name of the consumer group will be "NAME_OF_THE_APP_dionysus_consumer_group_for_NAME_OF_THE_APP", where `dionysus_consumer_group_for` is the `consumer_group_prefix`.

And define a `dionysus.rb` initializer:

``` rb config/initializers/dionysus.rb
Rails.application.config.to_prepare do
  Dionysus::Consumer.declare do
    namespace :v3 do
      topic :rentals do
        dead_letter_queue(topic: "dead_messages", max_retries: 2)
      end
    end
  end

  Dionysus::Consumer.configure do |config|
    config.transaction_provider = ActiveRecord::Base # not required, but highly recommended
    config.model_factory = DionysusModelFactory # required
  end

  Dionysus.initialize_application!(
    environment: ENV["RAILS_ENV"],
    seed_brokers: [ENV.fetch("DIONYSUS_SEED_BROKER")],
    client_id: NAME_OF_THE_APP,
    logger: Rails.logger
  )
end
```

Notice that you can provide a block to the `topic` method, which allows you to pass some extra configuration options (the same ones as in Karafka, e.g. a Dead Letter Queue config).

The structure of namespaces/topics must reflect what is configured by the producer! You just don't need to declare specific models - that happens automatically and can be configured with `model_factory`, where you could e.g. return nil for the models that you don't want to be processed.

`model_factory` is an object that returns a model class for a given name (or a proper factory - it does not need to be a model class, but returning an ActiveRecord model class will work and is the simplest way to deal with it; check the specs for more details if you want to decouple it from using model classes directly), e.g.:

``` rb
class DionysusModelFactory
  def self.for_model(model_name)
    model_name.classify.gsub("::", "").constantize rescue nil
  end
end
```

Start the `karafka server`:

```
bundle exec karafka server
```

That will be enough to process `_created`, `_updated`, and `_destroyed` events in a generic way.

So far, Dionysus expects the format to be compliant with [BookingSync API v3](https://developers.bookingsync.com/reference/). It also performs some special mapping (the notation is: attribute from payload -> local attribute):
- id -> synced_id
- created_at -> synced_created_at
- updated_at -> synced_updated_at
- canceled_at -> synced_canceled_at
- relationship_id -> synced_relationship_id
- relationship_type -> synced_relationship_type (for polymorphic associations)

Also, Dionysus checks timestamps (`updated_at` or `created_at` from the payload against the local `synced_updated_at` or `synced_created_at` values). If the remote timestamp is older than the local one, the persistence will not be executed. `synced_updated_at`/`synced_created_at` are configurable (check the config options reference).

#### Consumer Base Class

If you are happy with `Karafka::BaseConsumer` being the base class for all your consumers, you don't need to do anything, as this is the default. If you want to customize it, you have two options:

1. Global config - specify a base class in the Consumer Config in an initializer via the `consumer_base_class` attribute:

``` rb
Dionysus::Consumer::Config.configure do |config|
  config.consumer_base_class = CustomConsumerClassInheritingFromKarafkaBaseConsumer
end
```

2. Specify it per topic via the `consumer_base_class` option, which takes precedence over the global config (so you can use both of these options!):

``` rb
topic :rentals, consumer_base_class: CustomConsumerClassInheritingFromKarafkaBaseConsumer
```

Here is an example:

``` rb
class CustomConsumerClassInheritingFromKarafkaBaseConsumer < Karafka::BaseConsumer
  alias_method :original_on_consume, :on_consume

  def on_consume
    Retryable.perform(times: 3, errors: errors_to_retry, before_retry: BeforeRetry) do
      original_on_consume
    end
  end

  private

  def errors_to_retry
    @errors_to_retry ||= [ActiveRecord::StatementInvalid, PG::ConnectionBad, PG::Error]
  end

  class Retryable
    def self.perform(times:, errors:, before_retry: ->(_error) {})
      executed = 0
      begin
        executed += 1
        yield
      rescue *errors => e
        if executed < times
          before_retry.call(e)
          retry
        else
          raise e
        end
      end
    end
  end

  class BeforeRetry
    def self.call(_error)
      ActiveRecord::Base.clear_active_connections!
    end
  end
end
```

#### Retryable consuming

When consuming events, it might happen that some errors occur (a similar case was already mentioned in the consumer base class section). If you want to retry on errors in some way, you can inject a custom `retry_provider`, which is supposed to be an object implementing a `retry` method that yields a block. You can specify it on the config level:

``` rb
Dionysus::Consumer::Config.configure do |config|
  config.retry_provider = CustomRetryProvider.new
end
```

Here is an example:

``` rb
class CustomRetryProvider
  def retry(&block)
    Retryable.perform(times: 3, errors: errors_to_retry, before_retry: BeforeRetry, &block)
  end

  private

  def errors_to_retry
    @errors_to_retry ||= [ActiveRecord::StatementInvalid, PG::ConnectionBad, PG::Error]
  end

  class Retryable
    def self.perform(times:, errors:, before_retry: ->(_error) {})
      executed = 0
      begin
        executed += 1
        yield
      rescue *errors => e
        if executed < times
          before_retry.call(e)
          retry
        else
          raise e
        end
      end
    end
  end

  class BeforeRetry
    def self.call(_error)
      ActiveRecord::Base.clear_active_connections!
    end
  end
end
```

#### Association/disassociation of relationships

For relationships, especially the sideloaded ones, Dionysus doesn't know if something is a has_many or a has_many :through relationship, so it doesn't automatically perform linking between records. If a Booking is serialized with a BookingsFee, it will create/update the Booking and the BookingsFee as if they were separate events, but it will not magically link them. Most likely, in this scenario, the BookingsFee will be linked to the Booking anyway via foreign keys from synced attributes (BookingsFee will have `synced_booking_id`), but for `has_many :through` relationships it is not going to happen. Dionysus doesn't try to guess and lets you define the way associations should be linked on the consumer side. The models need to implement the following methods:

``` rb
def resolve_to_one_association(name, id_from_remote_payload)
end

def resolve_to_many_association(name, ids_from_remote_payload)
end
```

If you don't have has_one :through relationships, you can leave `resolve_to_one_association` empty. If you don't have has_many :through relationships, you can implement `resolve_to_many_association` in the following way:

``` rb
def resolve_to_many_association(name, ids_from_remote_payload)
  public_send(name).where.not(id: ids_from_remote_payload).destroy_all
end
```

and add it to `ApplicationRecord`.

That way, e.g., BookingsFees that are still locally associated with the Booking but were removed on the producer side (which is why they are no longer in the payload) will be cleaned up.

### Handling deletion/cancelation/restoration (soft-delete)

By default, all records are restored on a create/update event by setting the `soft_deleted_at_timestamp_attribute` (by default, `synced_canceled_at`) to nil.

For soft delete, there are a couple of ways this can work:
- if it's possible to soft delete a record by setting a timestamp, it will be done that way (e.g., by setting `:synced_canceled_at` to the timestamp from the payload)
- if `canceled_at` is not available in the payload, but the model responds to the method configured via `soft_delete_strategy` (by default: `:cancel`), that method will be called
- if there is no other option to soft-delete the record, the `destroy` method will be called
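If you rely on the `soft_delete_strategy` method, a model could implement it like this (a sketch; the model and the column it touches are hypothetical):

``` rb
class Booking < ApplicationRecord
  # Called by the soft-delete flow when the record should be
  # soft-deleted but no timestamp can be assigned directly.
  def cancel
    update!(synced_canceled_at: Time.current)
  end
end
```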

#### Batch import

For a high volume of data, you probably don't want to process records one by one, but batch-import them instead. In that case, you can specify the `import` option for the topic:

``` rb
topic :heavy_records, import: true
```

That way, for the `heavy_record_created` event, persistence will not be executed one by one; instead, the `dionysus_import` method will be called on the object returned by the model factory (here, most likely the HeavyRecord class, or any other model class). The argument of the method is an array of deserialized data access objects responding to the `attributes`, `has_many`, and `has_one` methods. You may want to inspect the payload of each of them, although most likely you will be interested only in `attributes` (unless something is sideloaded), which will contain the payload serialized on the producer side, with some transformations applied on top for the reserved attributes (id, created_at, canceled_at, updated_at).

This also impacts the `heavy_record_destroyed` event. In that case, you need to handle the logic using the `dionysus_destroy` method, which is called exactly in the same way as `dionysus_import`. The recommended way to handle the payload is to extract the IDs, find the corresponding records, and (soft-)delete them using the `update_all` method with a single query.

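A model handling batch import and deletion could be sketched like this (assuming the interface described above; the model, its columns, and the use of `upsert_all` are illustrative choices, not something the gem prescribes):

``` rb
class HeavyRecord < ApplicationRecord
  def self.dionysus_import(deserialized_records)
    rows = deserialized_records.map do |record|
      record.attributes.slice("synced_id", "name", "synced_updated_at")
    end
    # Insert or update everything in one statement.
    upsert_all(rows, unique_by: :synced_id) if rows.any?
  end

  def self.dionysus_destroy(deserialized_records)
    ids = deserialized_records.map { |record| record.attributes.fetch("synced_id") }
    # Soft-delete everything with a single query.
    where(synced_id: ids).update_all(synced_canceled_at: Time.current)
  end
end
```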
#### Batch transformation

You can perform some transformations on a batch of records before the batch is processed. By default, `Dionysus::Consumer::ParamsBatchTransformations::RemoveDuplicatesStrategy` is applied, which removes duplicate `_updated` events from the batch (based on the message `key`) and keeps only the most recent event.

You can disable it by explicitly setting it to `nil`:

``` rb
topic :my_topic, params_batch_transformation: nil
```

It is also a very useful addition when using the `import: true` option. `params_batch` will always contain multiple items that are processed sequentially, one by one. When using `import: true`, that probably doesn't make much sense for performance reasons, and it might be a better idea to merge all the batches into a single one, or into some grouped batches.

You can do that by applying a `params_batch_transformation`, which expects an object with a lambda-like interface responding to a `call` method that takes a single argument, the `params_batch`:

``` rb
topic :heavy_records, import: true, params_batch_transformation: ->(params_batch) { do_some_merging_logic_here }
```

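For instance, a transformation merging all `_created` messages in the batch into a single one could be sketched in plain Ruby like this (the message structure, with an "event" name and a "data" array, is a simplified assumption; adjust it to the actual shape of your params):

``` rb
# Merge every heavy_record_created message into one, so that
# `dionysus_import` receives a single big batch.
merge_created_events = lambda do |params_batch|
  created, rest = params_batch.partition { |params| params["event"] == "heavy_record_created" }
  return params_batch if created.empty?

  merged = {
    "event" => "heavy_record_created",
    "data" => created.flat_map { |params| params["data"] }
  }
  rest + [merged]
end

batch = [
  { "event" => "heavy_record_created", "data" => [{ "id" => 1 }] },
  { "event" => "heavy_record_updated", "data" => [{ "id" => 2 }] },
  { "event" => "heavy_record_created", "data" => [{ "id" => 3 }] }
]
result = merge_created_events.call(batch)
```

The same object could then be passed as the `params_batch_transformation:` option, since it responds to `call`.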
#### Concurrency

If you process records only from a single topic/partition, you will not have any issues with concurrent processing of the same record. However, if you process from multiple partitions where the same record can get published, you might run into some conflicts. In such a case, you might consider using a mutex, like an advisory lock from Postgres. You can use `processing_mutex_provider` and `processing_mutex_method_name` for that:

``` rb
Dionysus::Consumer.configure do |config|
  config.processing_mutex_provider = ActiveRecord::Base # optional
  config.processing_mutex_method_name = :with_advisory_lock # optional, https://github.com/ClosureTree/with_advisory_lock
end
```

Keep in mind that this is going to impact database load.

#### Storing entire payload

It might be the case that when you added a given model to the consuming application, it didn't contain all the attributes that were serialized in the message, and these attributes will be needed in the future. You have two options to handle it:

1. Reset the offset and consume everything from the beginning (absolutely not recommended for a large volume of data)
2. Store all the attributes that were present in the message for a given record so that you can reuse them later

Dionysus forces you to go with the second option and expects that all entities will have a `synced_data` accessor (although the name is configurable) which will store that payload.

If you want to configure the behavior of this attribute (you might, for example, want to store the data in a separate model), just override the attribute:

``` rb
class MyModel < ApplicationRecord
  after_save :persist_synced_data

  attr_accessor :synced_data

  def synced_data=(data)
    @synced_data = data
  end

  private

  def persist_synced_data
    synced_data_entity = SyncedDataEntity.find_or_initialize_by(model_name: self.class.to_s, model_id: id)
    synced_data_entity.synced_data = synced_data
    synced_data_entity.save!
  end
end
```

That way you can store the payloads under a polymorphic SyncedDataEntity model. Or you can avoid storing anything, if that's your choice.

##### Assigning values from `synced_data`

Sometimes you might want to assign a value from `synced_data` to a model's column, e.g. when some column was missing previously.

To do that, you can use `Dionysus::Consumer::SyncedData::AssignColumnsFromSyncedDataJob.enqueue`:

``` rb
Dionysus::Consumer::SyncedData::AssignColumnsFromSyncedDataJob.enqueue(model_class, columns, batch_size:) # batch_size defaults to 1000
```

To make it work, you need to make sure these values are properly set on the config level:

``` rb
Dionysus::Consumer.configure do |config|
  config.resolve_synced_data_hash_proc = ->(record) { record.synced_data_model.synced_data_hash } # optional, defaults to ->(record) { record.public_send(Dionysus::Consumer.configuration.synced_data_attribute).to_h }
  config.sidekiq_queue = :default # optional, defaults to `:dionysus`
end
```

If you store `synced_data` as a `jsonb` attribute on the model level, you don't need to adjust `resolve_synced_data_hash_proc`.

#### Globalize extensions

(This is not related to Dionysus itself, but might be useful.)

If you use the `globalize` gem, there is a chance that you will serialize translatable attributes to the following format:

```rb
{
  "translatable_attribute" => {
    "en" => "English",
    "fr" => "French"
  }
}
```

To handle translated attributes correctly on the consumer side, you might need the following patch for the `globalize` gem. You can put it, e.g., in an initializer:

+
1003
+ ``` rb
1004
+ # frozen_string_literal: true
1005
+
1006
+ module Globalize::ActiveRecord::ClassMethods
1007
+ protected
1008
+
1009
+ def define_translated_attr_writer(name)
1010
+ define_method(:"#{name}=") do |value|
1011
+ if value.is_a?(Hash)
1012
+ send("#{name}_translations").each_key { |locale| value[locale] ||= "" }
1013
+ value.each do |(locale, val)|
1014
+ write_attribute(name, val, locale: locale)
1015
+ end
1016
+ else
1017
+ write_attribute(name, value)
1018
+ end
1019
+ end
1020
+ end
1021
+
1022
+ def define_translations_accessor(name)
1023
+ attribute(name, ::ActiveRecord::Type::Value.new) if Globalize.rails_5?
1024
+ define_translations_reader(name)
1025
+ define_translations_writer(name)
1026
+ define_translation_used_locales(name)
1027
+ end
1028
+
1029
+ def define_translation_used_locales(name)
1030
+ define_method(:"#{name}_used_locales") do
1031
+ send("#{name}_translations").select { |_key, value| value.present? }.keys
1032
+ end
1033
+ end
1034
+ end
1035
+ ```
1036

And the specs (you might need to adjust the `model`):

``` rb
# frozen_string_literal: true

require "rails_helper"

RSpec.describe "Globalize extensions" do
  describe "assigning hash" do
    subject(:model_name_translations) { model.name_translations }

    context "when hash is not empty" do
      let(:assign_name) { model.name = name_translations }
      let(:model) { Record.new }
      let(:name_translations) do
        {
          "en" => "record",
          "fr" => "record in French"
        }
      end

      it "assigns values to a proper locale" do
        assign_name

        expect(model_name_translations).to eq name_translations
      end
    end

    context "when hash is empty and some translations were assigned before" do
      let(:assign_name) { model.name = name_translations }
      let(:model) { Record.new }
      let(:original_translations) do
        {
          "en" => "record",
          "fr" => "record in French"
        }
      end
      let(:name_translations) do
        {}
      end
      let(:expected_result) do
        {
          "en" => "",
          "fr" => ""
        }
      end

      before do
        model.name = original_translations
      end

      it "assigns nullified values for all locales" do
        assign_name

        expect(model_name_translations).to eq expected_result
      end
    end
  end

  describe "used locales" do
    subject(:name_used_locales) { model.name_used_locales }

    let(:assign_name) { model.name = name_translations }
    let(:model) { Record.new }
    let(:name_translations) do
      {
        "en" => "record",
        "fr" => "record in French"
      }
    end

    it "adds a method that extracts used locales" do
      assign_name

      expect(name_used_locales).to match_array %w[en fr]
    end
  end
end
```

+
1119
#### Config options

Full config reference:

``` rb
Dionysus::Consumer.configure do |config|
  config.transaction_provider = ActiveRecord::Base # not required, but highly recommended
  config.model_factory = DionysusModelFactory # required
  config.instrumenter = MyInstrumentation # optional
  config.processing_mutex_provider = ActiveRecord::Base # optional
  config.processing_mutex_method_name = :with_advisory_lock # optional
  config.event_bus = MyEventBusForDionysus # optional
  config.soft_delete_strategy = :cancel # optional, default: :cancel
  config.soft_deleted_at_timestamp_attribute = :synced_canceled_at # optional, default: :synced_canceled_at
  config.synced_created_at_timestamp_attribute = :synced_created_at # optional, default: :synced_created_at
  config.synced_updated_at_timestamp_attribute = :synced_updated_at # optional, default: :synced_updated_at
  config.synced_id_attribute = :synced_id # optional, default: :synced_id
  config.synced_data_attribute = :synced_data # required, default: :synced_data
  config.resolve_synced_data_hash_proc = ->(record) { record.synced_data_model.synced_data_hash } # optional, defaults to ->(record) { record.public_send(Dionysus::Consumer.configuration.synced_data_attribute).to_h }
  config.sidekiq_queue = :default # optional, defaults to `:dionysus`
  config.message_filter = FilterIgnoringLargeMessagesToAvoidOutOfMemoryErrors.new(error_handler: Sentry) # not required, defaults to Dionysus::Utils::DefaultMessageFilter, which doesn't ignore any messages. It can be useful when you want to ignore some messages, e.g. very large ones that would cause an OOM error. Check the implementation of `Dionysus::Utils::DefaultMessageFilter` to understand what arguments are available for setting the condition. `error_handler` needs to implement a Sentry-like interface.

  # if you ever need to provide a mapping:
  config.add_attributes_mapping_for_model("Rental") do
    {
      local_rental_type: :remote_rental_type
    }
  end
end
```
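Since `model_factory` is the one required setting, here is a minimal sketch of what such a factory might look like. The contract shown (a `for_model` method resolving a remote model name to a local class) is an assumption for illustration — check the gem's source for the real interface; `Rental` and `Booking` are placeholder classes:

``` rb
# Hypothetical model factory sketch: maps a resource name coming from a topic
# to a local model class, returning nil for resources this app doesn't handle.
class Rental; end
class Booking; end

class DionysusModelFactory
  MAPPING = {
    "Rental" => Rental,
    "Booking" => Booking
  }.freeze

  def self.for_model(model_name)
    MAPPING[model_name.to_s]
  end
end

DionysusModelFactory.for_model("Rental")  # => Rental
DionysusModelFactory.for_model("Unknown") # => nil
```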

#### Instrumentation & Event Bus

Check the publisher's docs for a reference on instrumentation and the event bus. The only difference lies in the methods that are instrumented and the events that are published.

For the event bus, you may expect the `dionysus.consume` event. It contains the following attributes:
- `topic_name`, e.g. "v3_inbox", "v3_rentals"
- `model_name`, e.g. "Conversation", "Rental"
- `event_name`, e.g. "rental_created", "conversation_updated", "message_destroyed"
- `transformed_data`, the deserialized event payload. Please check out DeserializedRecord in `Dionysus::Consumer::Deserializer`
- `local_changes`, all the changes that took place while handling this event. It is a hash whose keys are two-element arrays (the model/relationship name and its ID from Core) and whose values are the result of calling `ActiveModel#changes` before committing locally. It covers the main resource as well as all included relationships. An example of a possible value:

``` rb
{
  ["Rental", 1] => { "name" => ["old name", "Villa Saganaki"] },
  ["bookings", 101] => { "start_at" => [nil, 1] }
}
```

The event bus is the recommended way to react to consumed events if you want to avoid putting that logic into ActiveRecord callbacks.
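
Reacting to `dionysus.consume` could look like the sketch below. The `SimpleEventBus` class and its subscribe/publish interface are assumptions for illustration — plug in whichever bus you configure via `config.event_bus`:

``` rb
# Hypothetical event bus sketch showing how to handle consumed events
# outside ActiveRecord callbacks, e.g. by inspecting `local_changes`.
class SimpleEventBus
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(event_name, &handler)
    @subscribers[event_name] << handler
  end

  def publish(event_name, payload = {})
    @subscribers[event_name].each { |handler| handler.call(payload) }
  end
end

bus = SimpleEventBus.new
renamed_rentals = []

bus.subscribe("dionysus.consume") do |payload|
  # Collect IDs of rentals whose "name" changed, based on `local_changes`.
  payload[:local_changes].each do |(model, id), changes|
    renamed_rentals << id if model == "Rental" && changes.key?("name")
  end
end

# Simulated payload, mirroring the example above:
bus.publish("dionysus.consume", {
  topic_name: "v3_rentals",
  model_name: "Rental",
  event_name: "rental_updated",
  local_changes: {
    ["Rental", 1] => { "name" => ["old name", "Villa Saganaki"] },
    ["bookings", 101] => { "start_at" => [nil, 1] }
  }
})

renamed_rentals # => [1]
```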

#### Karafka Worker Health check

If you want to perform a Karafka health check (for consumer apps), use `Dionysus::Checks::HealthCheck.check`.

To make it work, you need to assign the health check to `Dionysus`:

``` rb
# in the initializer, after calling `initialize_application!`
Dionysus.health_check = Dionysus::Checks::HealthCheck.new
```

It works for both readiness and liveness probes. However, keep in mind that for liveness checks to work, you need to enable statistics emission in Karafka (by setting `statistics.interval.ms` - [more about it here](https://karafka.io/docs/Monitoring-and-Logging/#naming-considerations-for-custom-events)).

To perform the actual health check, use `bin/karafka_health_check`. On success, the script exits with status `0`; on failure, it logs the error and exits with status `1`.

```
bundle exec karafka_health_check
```
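
The exit-code contract of such a script can be sketched as follows. The stub and the assumption that `check` returns `nil` when healthy and an error message otherwise are for illustration only — the gem ships the real script in `bin/`:

``` rb
# Hypothetical sketch of an exit-code-based health check, with a stub
# standing in for `Dionysus.health_check`.
HealthCheckStub = Struct.new(:error) do
  def check
    error # nil means healthy; a string describes the failure
  end
end

# Returns the exit status a probe would see: 0 = healthy, 1 = unhealthy.
def run_health_check(health_check, out: $stderr)
  error = health_check.check
  if error
    out.puts("[Health check failed] #{error}")
    1
  else
    0
  end
end

run_health_check(HealthCheckStub.new(nil))       # => 0
run_health_check(HealthCheckStub.new("stalled")) # prints the error, => 1
```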

## Development

After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).

## Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/BookingSync/dionysus-rb.

## License

The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).

["NAME_OF_THE_APP"]: