karafka 0.5.0
- checksums.yaml +7 -0
- data/.gitignore +68 -0
- data/.ruby-gemset +1 -0
- data/.ruby-version +1 -0
- data/.travis.yml +6 -0
- data/CHANGELOG.md +202 -0
- data/Gemfile +8 -0
- data/Gemfile.lock +216 -0
- data/MIT-LICENCE +18 -0
- data/README.md +831 -0
- data/Rakefile +17 -0
- data/bin/karafka +7 -0
- data/karafka.gemspec +34 -0
- data/lib/karafka.rb +73 -0
- data/lib/karafka/app.rb +45 -0
- data/lib/karafka/base_controller.rb +162 -0
- data/lib/karafka/base_responder.rb +118 -0
- data/lib/karafka/base_worker.rb +41 -0
- data/lib/karafka/capistrano.rb +2 -0
- data/lib/karafka/capistrano/karafka.cap +84 -0
- data/lib/karafka/cli.rb +52 -0
- data/lib/karafka/cli/base.rb +74 -0
- data/lib/karafka/cli/console.rb +23 -0
- data/lib/karafka/cli/flow.rb +46 -0
- data/lib/karafka/cli/info.rb +26 -0
- data/lib/karafka/cli/install.rb +45 -0
- data/lib/karafka/cli/routes.rb +39 -0
- data/lib/karafka/cli/server.rb +59 -0
- data/lib/karafka/cli/worker.rb +26 -0
- data/lib/karafka/connection/consumer.rb +29 -0
- data/lib/karafka/connection/listener.rb +54 -0
- data/lib/karafka/connection/message.rb +17 -0
- data/lib/karafka/connection/topic_consumer.rb +48 -0
- data/lib/karafka/errors.rb +50 -0
- data/lib/karafka/fetcher.rb +40 -0
- data/lib/karafka/helpers/class_matcher.rb +77 -0
- data/lib/karafka/helpers/multi_delegator.rb +31 -0
- data/lib/karafka/loader.rb +77 -0
- data/lib/karafka/logger.rb +52 -0
- data/lib/karafka/monitor.rb +82 -0
- data/lib/karafka/params/interchanger.rb +33 -0
- data/lib/karafka/params/params.rb +102 -0
- data/lib/karafka/patches/dry/configurable/config.rb +37 -0
- data/lib/karafka/process.rb +61 -0
- data/lib/karafka/responders/builder.rb +33 -0
- data/lib/karafka/responders/topic.rb +43 -0
- data/lib/karafka/responders/usage_validator.rb +59 -0
- data/lib/karafka/routing/builder.rb +89 -0
- data/lib/karafka/routing/route.rb +80 -0
- data/lib/karafka/routing/router.rb +38 -0
- data/lib/karafka/server.rb +53 -0
- data/lib/karafka/setup/config.rb +57 -0
- data/lib/karafka/setup/configurators/base.rb +33 -0
- data/lib/karafka/setup/configurators/celluloid.rb +20 -0
- data/lib/karafka/setup/configurators/sidekiq.rb +34 -0
- data/lib/karafka/setup/configurators/water_drop.rb +19 -0
- data/lib/karafka/setup/configurators/worker_glass.rb +13 -0
- data/lib/karafka/status.rb +23 -0
- data/lib/karafka/templates/app.rb.example +26 -0
- data/lib/karafka/templates/application_controller.rb.example +5 -0
- data/lib/karafka/templates/application_responder.rb.example +9 -0
- data/lib/karafka/templates/application_worker.rb.example +12 -0
- data/lib/karafka/templates/config.ru.example +13 -0
- data/lib/karafka/templates/sidekiq.yml.example +26 -0
- data/lib/karafka/version.rb +6 -0
- data/lib/karafka/workers/builder.rb +49 -0
- data/log/.gitkeep +0 -0
- metadata +267 -0
data/MIT-LICENCE
ADDED
@@ -0,0 +1,18 @@
|
|
1
|
+
Permission is hereby granted, free of charge, to any person obtaining
|
2
|
+
a copy of this software and associated documentation files (the
|
3
|
+
"Software"), to deal in the Software without restriction, including
|
4
|
+
without limitation the rights to use, copy, modify, merge, publish,
|
5
|
+
distribute, sublicense, and/or sell copies of the Software, and to
|
6
|
+
permit persons to whom the Software is furnished to do so, subject to
|
7
|
+
the following conditions:
|
8
|
+
|
9
|
+
The above copyright notice and this permission notice shall be
|
10
|
+
included in all copies or substantial portions of the Software.
|
11
|
+
|
12
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
13
|
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
14
|
+
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
15
|
+
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
16
|
+
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
17
|
+
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
18
|
+
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/README.md
ADDED
@@ -0,0 +1,831 @@
|
|
1
|
+
# Karafka
|
2
|
+
|
3
|
+
[![Build Status](https://travis-ci.org/karafka/karafka.png)](https://travis-ci.org/karafka/karafka)
|
4
|
+
[![Code Climate](https://codeclimate.com/github/karafka/karafka/badges/gpa.svg)](https://codeclimate.com/github/karafka/karafka)
|
5
|
+
[![Join the chat at https://gitter.im/karafka/karafka](https://badges.gitter.im/karafka/karafka.svg)](https://gitter.im/karafka/karafka?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
|
6
|
+
|
7
|
+
A framework that simplifies the development of Apache Kafka-based Ruby applications.
|
8
|
+
|
9
|
+
It allows programmers to use an approach similar to "the Rails way" when working with asynchronous Kafka messages.
|
10
|
+
|
11
|
+
Karafka not only handles incoming messages but also provides tools for building complex data-flow applications that receive and send messages.
|
12
|
+
|
13
|
+
## Table of Contents
|
14
|
+
|
15
|
+
- [Table of Contents](#table-of-contents)
|
16
|
+
- [Support](#support)
|
17
|
+
- [Requirements](#requirements)
|
18
|
+
- [How does it work](#how-does-it-work)
|
19
|
+
- [Installation](#installation)
|
20
|
+
- [Setup](#setup)
|
21
|
+
- [Application](#application)
|
22
|
+
- [Configurators](#configurators)
|
23
|
+
- [Environment variables settings](#environment-variables-settings)
|
24
|
+
- [Kafka brokers auto-discovery](#kafka-brokers-auto-discovery)
|
25
|
+
- [Usage](#usage)
|
26
|
+
- [Karafka CLI](#karafka-cli)
|
27
|
+
- [Routing](#routing)
|
28
|
+
- [Topic](#topic)
|
29
|
+
- [Group](#group)
|
30
|
+
- [Worker](#worker)
|
31
|
+
- [Parser](#parser)
|
32
|
+
- [Interchanger](#interchanger)
|
33
|
+
- [Responder](#responder)
|
34
|
+
- [Receiving messages](#receiving-messages)
|
35
|
+
- [Processing messages directly (without Sidekiq)](#processing-messages-directly-without-sidekiq)
|
36
|
+
- [Sending messages from Karafka](#sending-messages-from-karafka)
|
37
|
+
- [Using responders (recommended)](#using-responders-recommended)
|
38
|
+
- [Using WaterDrop directly](#using-waterdrop-directly)
|
39
|
+
- [Important components](#important-components)
|
40
|
+
- [Controllers](#controllers)
|
41
|
+
- [Controllers callbacks](#controllers-callbacks)
|
42
|
+
- [Responders](#responders)
|
43
|
+
- [Registering topics](#registering-topics)
|
44
|
+
- [Responding on topics](#responding-on-topics)
|
45
|
+
- [Response validation](#response-validation)
|
46
|
+
- [Monitoring and logging](#monitoring-and-logging)
|
47
|
+
- [Example monitor with Errbit/Airbrake support](#example-monitor-with-errbitairbrake-support)
|
48
|
+
- [Example monitor with NewRelic support](#example-monitor-with-newrelic-support)
|
49
|
+
- [Deployment](#deployment)
|
50
|
+
- [Capistrano](#capistrano)
|
51
|
+
- [Docker](#docker)
|
52
|
+
- [Sidekiq Web UI](#sidekiq-web-ui)
|
53
|
+
- [Concurrency](#concurrency)
|
54
|
+
- [Integrating with other frameworks](#integrating-with-other-frameworks)
|
55
|
+
- [Integrating with Ruby on Rails](#integrating-with-ruby-on-rails)
|
56
|
+
- [Integrating with Sinatra](#integrating-with-sinatra)
|
57
|
+
- [Articles and other references](#articles-and-other-references)
|
58
|
+
- [Libraries and components](#libraries-and-components)
|
59
|
+
- [Articles and references](#articles-and-references)
|
60
|
+
- [Note on Patches/Pull Requests](#note-on-patchespull-requests)
|
61
|
+
|
62
|
+
## How does it work
|
63
|
+
|
64
|
+
Karafka provides a higher-level abstraction than raw Kafka Ruby drivers, such as Kafka-Ruby and Poseidon. Instead of focusing on single topic consumption, it gives developers a set of tools dedicated to building multi-topic applications, similar to how Rails applications are built.
|
65
|
+
|
66
|
+
## Support
|
67
|
+
|
68
|
+
If you have any questions about using Karafka, feel free to join our [Gitter](https://gitter.im/karafka/karafka) chat channel.
|
69
|
+
|
70
|
+
## Requirements
|
71
|
+
|
72
|
+
To use the Karafka framework, you need:
|
73
|
+
|
74
|
+
- Zookeeper (required by Kafka)
|
75
|
+
- Kafka (at least 0.9.0)
|
76
|
+
- Ruby (at least 2.3.0)
|
77
|
+
|
78
|
+
## Installation
|
79
|
+
|
80
|
+
Karafka does not have a single installation shell command. To install it, follow these steps:
|
81
|
+
|
82
|
+
Create a directory for your project:
|
83
|
+
|
84
|
+
```bash
|
85
|
+
mkdir app_dir
|
86
|
+
cd app_dir
|
87
|
+
```
|
88
|
+
|
89
|
+
Create a **Gemfile** with Karafka:
|
90
|
+
|
91
|
+
```ruby
|
92
|
+
source 'https://rubygems.org'
|
93
|
+
|
94
|
+
gem 'karafka', github: 'karafka/karafka'
|
95
|
+
```
|
96
|
+
|
97
|
+
Then run the Karafka install CLI task:
|
98
|
+
|
99
|
+
```
|
100
|
+
bundle exec karafka install
|
101
|
+
```
|
102
|
+
|
103
|
+
## Setup
|
104
|
+
|
105
|
+
### Application
|
106
|
+
Karafka has the following configuration options:
|
107
|
+
|
108
|
+
| Option | Required | Value type | Description |
|
109
|
+
|------------------------|----------|-------------------|---------------------------------------------------------------------------------------------|
|
110
|
+
| name | true | String | Application name |
|
111
|
+
| redis | true | Hash | Hash with Redis configuration options |
|
112
|
+
| monitor | false | Object | Monitor instance (defaults to Karafka::Monitor) |
|
113
|
+
| logger | false | Object | Logger instance (defaults to Karafka::Logger) |
|
114
|
+
| kafka.hosts | false | Array<String> | Kafka server hosts. If 1 provided, Karafka will discover cluster structure automatically |
|
115
|
+
|
116
|
+
To apply this configuration, you need to use a *setup* method from the Karafka::App class (app.rb):
|
117
|
+
|
118
|
+
```ruby
|
119
|
+
class App < Karafka::App
|
120
|
+
setup do |config|
|
121
|
+
config.kafka.hosts = %w( 127.0.0.1:9092 )
|
122
|
+
config.redis = {
|
123
|
+
url: 'redis://redis.example.com:7372/1'
|
124
|
+
}
|
125
|
+
config.name = 'my_application'
|
126
|
+
config.logger = MyCustomLogger.new # not required
|
127
|
+
end
|
128
|
+
end
|
129
|
+
```
|
130
|
+
|
131
|
+
Note: You can use any library like [Settingslogic](https://github.com/binarylogic/settingslogic) to handle your application configuration.
|
132
|
+
|
133
|
+
### Configurators
|
134
|
+
|
135
|
+
If you need additional configuration after the application setup is done, add a configurator file to the config directory. Its class needs to inherit from Karafka::Config::Base and implement a setup method; after that, everything happens automatically.
|
136
|
+
|
137
|
+
Example configuration class:
|
138
|
+
|
139
|
+
```ruby
|
140
|
+
class ExampleConfigurator < Base
|
141
|
+
def setup
|
142
|
+
ExampleClass.logger = Karafka.logger
|
143
|
+
ExampleClass.redis = config.redis
|
144
|
+
end
|
145
|
+
end
|
146
|
+
```
|
147
|
+
|
148
|
+
### Environment variables settings
|
149
|
+
|
150
|
+
There are several env settings you can use:
|
151
|
+
|
152
|
+
| ENV name | Default | Description |
|
153
|
+
|-------------------|-----------------|-------------------------------------------------------------------------------|
|
154
|
+
| KARAFKA_ENV | development | In what mode this application should boot (production/development/test/etc) |
|
155
|
+
| KARAFKA_BOOT_FILE | app_root/app.rb | Path to a file that contains Karafka app configuration and booting procedures |
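
As a minimal sketch, the two settings and their documented defaults could be resolved in plain Ruby as shown here. The `karafka_env_settings` helper is hypothetical and is not part of Karafka; it only illustrates the fallback behaviour listed in the table.

```ruby
# Hypothetical helper (not Karafka API): reads both settings and falls back
# to the defaults documented in the table above.
def karafka_env_settings(env = ENV, app_root = Dir.pwd)
  {
    env: env.fetch('KARAFKA_ENV', 'development'),
    boot_file: env.fetch('KARAFKA_BOOT_FILE', File.join(app_root, 'app.rb'))
  }
end

# Passing a plain Hash instead of ENV makes the behaviour easy to try out
karafka_env_settings({ 'KARAFKA_ENV' => 'production' }, '/srv/app')
```

Accepting the environment as an argument is a design choice of this sketch; it keeps the helper easy to exercise without touching the real process environment.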
|
156
|
+
|
157
|
+
### Kafka brokers auto-discovery
|
158
|
+
|
159
|
+
Karafka supports Kafka brokers auto-discovery during startup and on failures. You need to provide at least one Kafka broker, from which the entire Kafka cluster will be discovered. Karafka will refresh the list of available brokers if something goes wrong. This allows it to be aware of changes that happen in the infrastructure (adding and removing nodes).
|
160
|
+
|
161
|
+
## Usage
|
162
|
+
|
163
|
+
### Karafka CLI
|
164
|
+
|
165
|
+
Karafka has a simple CLI built in. It provides the following commands:
|
166
|
+
|
167
|
+
| Command | Description |
|
168
|
+
|----------------|---------------------------------------------------------------------------|
|
169
|
+
| help [COMMAND] | Describe available commands or one specific command |
|
170
|
+
| console | Start the Karafka console (short-cut alias: "c") |
|
171
|
+
| flow | Print application data flow (incoming => outgoing) |
|
172
|
+
| info | Print configuration details and other options of your application |
|
173
|
+
| install | Installs all required things for Karafka application in current directory |
|
174
|
+
| routes | Print out all defined routes in alphabetical order |
|
175
|
+
| server | Start the Karafka server (short-cut alias: "s") |
|
176
|
+
| worker | Start the Karafka Sidekiq worker (short-cut alias: "w") |
|
177
|
+
|
178
|
+
All the commands are executed the same way:
|
179
|
+
|
180
|
+
```
|
181
|
+
bundle exec karafka [COMMAND]
|
182
|
+
```
|
183
|
+
|
184
|
+
If you need more details about any of the CLI commands, you can execute the following command:
|
185
|
+
|
186
|
+
```
|
187
|
+
bundle exec karafka help [COMMAND]
|
188
|
+
```
|
189
|
+
|
190
|
+
### Routing
|
191
|
+
|
192
|
+
The routing engine provides an interface to describe how messages from all the topics should be handled. To start using it, use the *draw* method on routes:
|
193
|
+
|
194
|
+
```ruby
|
195
|
+
App.routes.draw do
|
196
|
+
topic :example do
|
197
|
+
controller ExampleController
|
198
|
+
end
|
199
|
+
end
|
200
|
+
```
|
201
|
+
|
202
|
+
The basic route description requires a *topic* and a *controller* that should handle it (Karafka will create a separate controller instance for each request).
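
To make the shape of the DSL more concrete, here is a toy, plain-Ruby sketch of how a `draw` block with nested `topic` and `controller` calls can be collected into route objects. `ToyRoutes` and `Route` are hypothetical names used only for illustration; Karafka's own routing builder is more involved and may work differently.

```ruby
# Toy routing DSL, for illustration only (not Karafka's implementation)
class ToyRoutes
  Route = Struct.new(:topic, :controller)

  attr_reader :routes

  def initialize
    @routes = []
  end

  # Evaluates the block in the context of this object, so that bare
  # `topic` calls inside the block resolve to the method below
  def draw(&block)
    instance_eval(&block)
    self
  end

  # Builds a single route and requires that the block assigned a controller
  def topic(name, &block)
    @current = Route.new(name.to_s, nil)
    instance_eval(&block)
    raise ArgumentError, "topic #{name} needs a controller" unless @current.controller
    @routes << @current
  end

  def controller(klass)
    @current.controller = klass
  end
end

class ExampleController; end

routes = ToyRoutes.new.draw do
  topic :example do
    controller ExampleController
  end
end

routes.routes.map(&:topic) # => ["example"]
```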
|
203
|
+
|
204
|
+
There are also several other methods available (optional):
|
205
|
+
|
206
|
+
- *group* - symbol/string with a group name. Groups are used to cluster applications
|
207
|
+
- *worker* - Class name - name of a worker class that we want to use to schedule perform code
|
208
|
+
- *parser* - Class name - name of a parser class that we want to use to parse incoming data
|
209
|
+
- *interchanger* - Class name - name of an interchanger class that we want to use to format data that we put/fetch into/from *#perform_async*
|
210
|
+
- *responder* - Class name - name of a responder that we want to use to generate responses to other Kafka topics based on our processed data
|
211
|
+
|
212
|
+
```ruby
|
213
|
+
App.routes.draw do
|
214
|
+
topic :binary_video_details do
|
215
|
+
group :composed_application
|
216
|
+
controller Videos::DetailsController
|
217
|
+
worker Workers::DetailsWorker
|
218
|
+
parser Parsers::BinaryToJson
|
219
|
+
interchanger Interchangers::Binary
|
220
|
+
responder BinaryVideoProcessingResponder
|
221
|
+
end
|
222
|
+
|
223
|
+
topic :new_videos do
|
224
|
+
controller Videos::NewVideosController
|
225
|
+
end
|
226
|
+
end
|
227
|
+
```
|
228
|
+
|
229
|
+
See description below for more details on each of them.
|
230
|
+
|
231
|
+
##### Topic
|
232
|
+
|
233
|
+
- *topic* - symbol/string with a topic that we want to route
|
234
|
+
|
235
|
+
```ruby
|
236
|
+
topic :incoming_messages do
|
237
|
+
# Details about how to handle this topic should go here
|
238
|
+
end
|
239
|
+
```
|
240
|
+
|
241
|
+
Topic is the root point of each route. Keep in mind that:
|
242
|
+
|
243
|
+
- All topic names must be unique in a single Karafka application
|
244
|
+
- Topic names are validated because Kafka does not accept some characters
|
245
|
+
- If you don't specify a group, it will be built based on the topic and application name
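
The last two points can be sketched in plain Ruby. Kafka itself accepts ASCII letters, digits, `.`, `_` and `-` in topic names; whether Karafka checks exactly this set, and the exact format of the generated group name, are assumptions of this sketch. Both helpers are hypothetical.

```ruby
# Illustrative only: the pattern and the naming scheme are assumptions,
# not Karafka's actual implementation.
TOPIC_NAME_PATTERN = /\A[a-zA-Z0-9._-]+\z/

# True when the name contains only characters that Kafka accepts
def valid_topic_name?(name)
  !(name.to_s =~ TOPIC_NAME_PATTERN).nil?
end

# Builds a fallback group name from the application name and the topic
def default_group(app_name, topic)
  "#{app_name}_#{topic}"
end

valid_topic_name?(:incoming_messages) # => true
valid_topic_name?('incoming messages!') # => false
default_group('my_application', :incoming_messages) # => "my_application_incoming_messages"
```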
|
246
|
+
|
247
|
+
##### Group
|
248
|
+
|
249
|
+
- *group* - symbol/string with a group name. Groups are used to cluster applications
|
250
|
+
|
251
|
+
Optionally, you can use the **group** method to define a group for this topic. Use it if you want to build many applications that will share the same Kafka group. Otherwise, the group will be built from the **topic** and the application name. If you only need to load-balance messages between many processes of a single application (and not between different applications), you can leave the group undefined and let the framework define it for you.
|
252
|
+
|
253
|
+
```ruby
|
254
|
+
topic :incoming_messages do
|
255
|
+
group :load_balanced_group
|
256
|
+
controller MessagesController
|
257
|
+
end
|
258
|
+
```
|
259
|
+
|
260
|
+
Note that a single group can be used only in a single topic.
|
261
|
+
|
262
|
+
##### Worker
|
263
|
+
|
264
|
+
- *worker* - Class name - name of a worker class that we want to use to schedule perform code
|
265
|
+
|
266
|
+
Karafka by default will build a worker that will correspond to each of your controllers (so you will have a pair - controller and a worker). All of them will inherit from **ApplicationWorker** and will share all its settings.
|
267
|
+
|
268
|
+
To run Sidekiq, you need a sidekiq.yml file in the *config* folder. An example sidekiq.yml file will be generated at config/sidekiq.yml.example once you run **bundle exec karafka install**.
|
269
|
+
|
270
|
+
However, if you want to use a raw Sidekiq worker (without any additional Karafka magic), or you want to use Sidekiq Pro (or any other queuing engine that has the same API as Sidekiq), you can assign your own custom worker:
|
271
|
+
|
272
|
+
```ruby
|
273
|
+
topic :incoming_messages do
|
274
|
+
controller MessagesController
|
275
|
+
worker MyCustomWorker
|
276
|
+
end
|
277
|
+
```
|
278
|
+
|
279
|
+
Note that even then, you need to specify a controller that will schedule a background task.
|
280
|
+
|
281
|
+
Custom workers need to provide a **#perform_async** method. It needs to accept two arguments:
|
282
|
+
|
283
|
+
- *topic* - the first argument is the topic from which a given message comes
|
284
|
+
- *params* - all the params that came from Kafka + additional metadata. This data format might be changed if you use custom interchangers. Otherwise it will be an instance of Karafka::Params::Params.
|
285
|
+
|
286
|
+
Keep in mind that params might be in one of two states when passed to #perform_async: parsed or unparsed. This means that if you use custom interchangers and/or custom workers, you might want to look into Karafka's sources to see exactly how it works.
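
A minimal sketch of a custom worker that satisfies this contract is shown here. Only the two-argument `perform_async` comes from the description above; it is assumed to be a class-level method, as in Sidekiq. The in-memory `QUEUE` and the `drain` helper are hypothetical stand-ins for a real queuing backend.

```ruby
# Plain-Ruby stand-in for a custom worker (not a real Sidekiq worker)
class MyCustomWorker
  # Hypothetical in-memory queue that replaces a real backend
  QUEUE = []

  # Schedules processing of a single message; returns the queue size
  def self.perform_async(topic, params)
    QUEUE << { topic: topic.to_s, params: params }
    QUEUE.size
  end

  # Hypothetical helper: hands back everything scheduled so far and
  # empties the queue
  def self.drain
    jobs = QUEUE.dup
    QUEUE.clear
    jobs
  end
end

MyCustomWorker.perform_async(:incoming_messages, { 'user_id' => 1 })
MyCustomWorker.drain.first[:topic] # => "incoming_messages"
```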
|
287
|
+
|
288
|
+
##### Parser
|
289
|
+
|
290
|
+
- *parser* - Class name - name of a parser class that we want to use to parse incoming data
|
291
|
+
|
292
|
+
By default, Karafka parses messages with a JSON parser. If you want to change this behaviour, you need to set a custom parser for each route. A parser needs to have a #parse method and raise an error that is a ::Karafka::Errors::ParserError descendant when a problem appears during parsing.
|
293
|
+
|
294
|
+
```ruby
|
295
|
+
class XmlParser
|
296
|
+
class ParserError < ::Karafka::Errors::ParserError; end
|
297
|
+
|
298
|
+
def self.parse(message)
|
299
|
+
Hash.from_xml(message)
|
300
|
+
rescue REXML::ParseException
|
301
|
+
raise ParserError
|
302
|
+
end
|
303
|
+
end
|
304
|
+
|
305
|
+
App.routes.draw do
|
306
|
+
topic :binary_video_details do
|
307
|
+
controller Videos::DetailsController
|
308
|
+
parser XmlParser
|
309
|
+
end
|
310
|
+
end
|
311
|
+
```
|
312
|
+
|
313
|
+
Note that a parsing failure won't stop the application flow. Instead, Karafka will assign the raw message to the :message key of params. That way you can handle the raw message inside the Sidekiq worker (you can implement error detection, etc.; any "heavy" parsing logic can and should be implemented there).
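
The fallback described above can be sketched with the standard JSON library. `parse_or_keep_raw` is a hypothetical helper and not Karafka's internal code; it only shows the idea of keeping the raw payload under the :message key when parsing fails.

```ruby
require 'json'

# Hypothetical helper: returns the parsed payload, or a hash that keeps
# the raw message under the :message key when parsing fails
def parse_or_keep_raw(raw_message)
  JSON.parse(raw_message)
rescue JSON::ParserError
  { message: raw_message }
end

parse_or_keep_raw('{"id":1}') # => {"id"=>1}
parse_or_keep_raw('<xml/>')   # => {:message=>"<xml/>"}
```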
|
314
|
+
|
315
|
+
##### Interchanger
|
316
|
+
|
317
|
+
- *interchanger* - Class name - name of an interchanger class that we want to use to format data that we put/fetch into/from #perform_async.
|
318
|
+
|
319
|
+
Custom interchangers target issues with non-standard (binary, etc.) data that we want to store when we do #perform_async. This data might be corrupted when fetched in a worker (see [this](https://github.com/karafka/karafka/issues/30) issue). With custom interchangers, you can encode/compress data before it is passed to scheduling and decode/decompress it when it gets into the worker.
|
320
|
+
|
321
|
+
**Warning**: if you decide to use slow interchangers, they might significantly slow down Karafka.
|
322
|
+
|
323
|
+
```ruby
|
324
|
+
class Base64Interchanger
|
325
|
+
class << self
|
326
|
+
def load(params)
|
327
|
+
Base64.encode64(Marshal.dump(params))
|
328
|
+
end
|
329
|
+
|
330
|
+
def parse(params)
|
331
|
+
Marshal.load(Base64.decode64(params))
|
332
|
+
end
|
333
|
+
end
|
334
|
+
end
|
335
|
+
|
336
|
+
topic :binary_video_details do
|
337
|
+
controller Videos::DetailsController
|
338
|
+
interchanger Base64Interchanger
|
339
|
+
end
|
340
|
+
```
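
To see why this helps with binary data, here is the same load/parse pair as in the interchanger above, reduced to a standalone module so that the round trip can be run without Karafka. The `RoundTrip` name and the sample payload are hypothetical.

```ruby
require 'base64'

# Standalone copy of the encode/decode pair, for experimentation only
module RoundTrip
  # Serializes any Ruby object into ASCII-only text
  def self.load(params)
    Base64.encode64(Marshal.dump(params))
  end

  # Restores the original object from the encoded text
  def self.parse(encoded)
    Marshal.load(Base64.decode64(encoded))
  end
end

binary = { 'frame' => "\xFF\x00\xAB".b }
encoded = RoundTrip.load(binary)

encoded.ascii_only?                # => true
RoundTrip.parse(encoded) == binary # => true
```

Because `Marshal.load` can instantiate arbitrary objects, this approach is only appropriate for data that your own application produced.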
|
341
|
+
|
342
|
+
##### Responder
|
343
|
+
|
344
|
+
- *responder* - Class name - name of a responder that we want to use to generate responses to other Kafka topics based on our processed data.
|
345
|
+
|
346
|
+
Responders are used to design the response that should be generated and sent to the proper Kafka topics once processing is done. They allow programmers to build not only data-consuming apps, but also apps that consume data and then, based on the business logic output, send the processed data onwards (similarly to how Bash pipelines work).
|
347
|
+
|
348
|
+
```ruby
|
349
|
+
class Responder < ApplicationResponder
|
350
|
+
topic :users_created
|
351
|
+
topic :profiles_created
|
352
|
+
|
353
|
+
def respond(user, profile)
|
354
|
+
respond_to :users_created, user
|
355
|
+
respond_to :profiles_created, profile
|
356
|
+
end
|
357
|
+
end
|
358
|
+
```
|
359
|
+
|
360
|
+
For more details about responders, please go to the [using responders](#using-responders-recommended) section.
|
361
|
+
|
362
|
+
### Receiving messages
|
363
|
+
|
364
|
+
The Karafka framework has a long-running server process that is responsible for receiving messages.
|
365
|
+
|
366
|
+
To start Karafka server process, use the following CLI command:
|
367
|
+
|
368
|
+
```bash
|
369
|
+
bundle exec karafka server
|
370
|
+
```
|
371
|
+
|
372
|
+
Karafka server can be daemonized with the **--daemon** flag:
|
373
|
+
|
374
|
+
```
|
375
|
+
bundle exec karafka server --daemon
|
376
|
+
```
|
377
|
+
|
378
|
+
#### Processing messages directly (without Sidekiq)
|
379
|
+
|
380
|
+
If you don't want to use Sidekiq for processing and you would rather process messages directly in the main Karafka server process, you can do that using the *before_enqueue* callback inside of a controller:
|
381
|
+
|
382
|
+
```ruby
|
383
|
+
class UsersController < ApplicationController
|
384
|
+
before_enqueue :perform_directly
|
385
|
+
|
386
|
+
# Throwing :abort prevents Karafka from scheduling a background #perform task.
|
387
|
+
def perform_directly
|
388
|
+
User.create(params[:user])
|
389
|
+
throw(:abort)
|
390
|
+
end
|
391
|
+
end
|
392
|
+
```
|
393
|
+
|
394
|
+
Note: doing heavy work this way can slow Karafka down significantly.
|
395
|
+
|
396
|
+
### Sending messages from Karafka
|
397
|
+
|
398
|
+
When using Kafka, it's quite common to treat applications as parts of a bigger pipeline (similarly to a Bash pipeline) and forward processing results to other applications. Karafka provides two ways of dealing with that:
|
399
|
+
|
400
|
+
- Using responders
|
401
|
+
- Using WaterDrop directly
|
402
|
+
|
403
|
+
Each of them has its own advantages and disadvantages, and which one is better strongly depends on your application's business logic. The recommended (and way more elegant) way is to use responders.
|
404
|
+
|
405
|
+
#### Using responders (recommended)
|
406
|
+
|
407
|
+
One of the main differences between responding to a Kafka message and responding to an HTTP request is that the response can be sent to many topics (instead of one HTTP response per request) and that the data sent can differ between topics. That's why a simple **respond_to** would not be enough.
|
408
|
+
|
409
|
+
In order to go beyond this limitation, Karafka uses responder objects that are responsible for sending data to other Kafka topics.
|
410
|
+
|
411
|
+
By default, if you name a responder with the same name as a controller, it will be detected automatically:
|
412
|
+
|
413
|
+
```ruby
|
414
|
+
module Users
|
415
|
+
class CreateController < ApplicationController
|
416
|
+
def perform
|
417
|
+
# You can pass as many objects as you want to respond_with, as long as the responder's
|
418
|
+
# #respond method accepts the same number of arguments
|
419
|
+
respond_with User.create(params[:user])
|
420
|
+
end
|
421
|
+
end
|
422
|
+
|
423
|
+
class CreateResponder < ApplicationResponder
|
424
|
+
topic :user_created
|
425
|
+
|
426
|
+
def respond(user)
|
427
|
+
respond_to :user_created, user
|
428
|
+
end
|
429
|
+
end
|
430
|
+
end
|
431
|
+
```
|
432
|
+
|
433
|
+
The appropriate responder is used automatically when you invoke the **respond_with** controller method.
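
One way such name-based detection could work is sketched here in plain Ruby: swap the `Controller` suffix for `Responder` and resolve the resulting constant. The `responder_for` helper and the stub classes are hypothetical; Karafka's own matching logic may differ.

```ruby
# Hypothetical lookup (not Karafka's implementation): derives the responder
# class from the controller class name, or returns nil when none exists
def responder_for(controller_class)
  responder_name = controller_class.name.sub(/Controller\z/, 'Responder')
  Object.const_get(responder_name)
rescue NameError
  nil
end

# Stub classes that only exist to demonstrate the lookup
module Users
  class CreateController; end
  class CreateResponder; end
  class DeleteController; end
end

responder_for(Users::CreateController) # => Users::CreateResponder
responder_for(Users::DeleteController) # => nil
```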
|
434
|
+
|
435
|
+
Why did we separate the response layer from the controller layer? Because when you respond to multiple topics conditionally, that logic can become really complex, and it is much easier to manage and test it in isolation.
|
436
|
+
|
437
|
+
For more details about responders DSL, please visit the [responders](#responders) section.
|
438
|
+
|
439
|
+
#### Using WaterDrop directly
|
440
|
+
|
441
|
+
It is not recommended (as it bypasses responder validations and makes it harder to track data flow), but if you want to send messages outside of Karafka responders, you can use the **waterdrop** gem directly.
|
442
|
+
|
443
|
+
Example usage:
|
444
|
+
|
445
|
+
```ruby
|
446
|
+
message = WaterDrop::Message.new('topic', 'message')
|
447
|
+
message.send!
|
448
|
+
|
449
|
+
message = WaterDrop::Message.new('topic', { user_id: 1 }.to_json)
|
450
|
+
message.send!
|
451
|
+
```
|
452
|
+
|
453
|
+
Please follow [WaterDrop README](https://github.com/karafka/waterdrop/blob/master/README.md) for more details on how to use it.
|
454
|
+
|
455
|
+
|
456
|
+
## Important components
|
457
|
+
|
458
|
+
Apart from the internal implementation, Karafka consists of the following components that programmers will mostly work with:
|
459
|
+
|
460
|
+
- Controllers - objects that are responsible for processing incoming messages (similar to Rails controllers)
|
461
|
+
- Responders - objects that are responsible for sending responses based on the processed data
|
462
|
+
- Workers - objects that execute data processing using Sidekiq backend
|
463
|
+
|
464
|
+
### Controllers
|
465
|
+
|
466
|
+
Controllers should inherit from **ApplicationController** (or any other controller that inherits from **Karafka::BaseController**). If you don't want to use custom workers (and except in some particular cases you don't need to), you need to define a **#perform** method that will execute your business logic code in the background.
|
467
|
+
|
468
|
+
```ruby
|
469
|
+
class UsersController < ApplicationController
|
470
|
+
# Method execution will be enqueued in Sidekiq
|
471
|
+
# Karafka will automatically schedule a proper job and execute this logic in the background
|
472
|
+
def perform
|
473
|
+
User.create(params[:user])
|
474
|
+
end
|
475
|
+
end
|
476
|
+
```
|
477
|
+
|
478
|
+
#### Controllers callbacks
|
479
|
+
|
480
|
+
You can add any number of *before_enqueue* callbacks. Each can be a method or a block.
|
481
|
+
before_enqueue acts in a similar way to Rails before_action so it should perform "lightweight" operations. You have access to params inside. Based on it you can define which data you want to receive and which not.
|
482
|
+
|
483
|
+
**Warning**: keep in mind that all *before_enqueue* blocks/methods are executed right after the incoming message is received, not in Sidekiq. This means that if you perform "heavy duty" operations there, Karafka might slow down significantly.
|
484
|
+
|
485
|
+
If any of the callbacks throws :abort, the *perform* method will not be enqueued to the worker (the execution chain will stop).
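
The abort mechanism relies on Ruby's built-in `catch`/`throw`. The sketch below shows the idea in plain Ruby; `run_callbacks` is a hypothetical helper that only mirrors the behaviour described above and is not Karafka's code.

```ruby
# Runs the callbacks in order. Returns true when all of them finished
# (so #perform could be enqueued) and false when one threw :abort.
def run_callbacks(callbacks, params)
  catch(:abort) do
    callbacks.each { |callback| callback.call(params) }
    return true
  end
  false
end

stamp  = ->(params) { params[:received_time] = Time.now.to_s }
reject = ->(params) { throw(:abort) unless params[:message].to_i > 50 }

run_callbacks([stamp, reject], { message: '75' }) # => true
run_callbacks([stamp, reject], { message: '10' }) # => false
```

Callbacks placed after the one that throws are never executed, which is why the chain is said to stop.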
|
486
|
+
|
487
|
+
Once you run the consumer, messages from the Kafka server will be sent to the proper controller (based on the topic name).
|
488
|
+
|
489
|
+
The following example controller accepts incoming messages from a Kafka topic named :karafka_topic:
|
490
|
+
|
491
|
+
```ruby
|
492
|
+
class TestController < ApplicationController
|
493
|
+
# before_enqueue has access to received params.
|
494
|
+
# You can modify them before they are enqueued to the Sidekiq queue.
|
495
|
+
before_enqueue {
|
496
|
+
params.merge!(received_time: Time.now.to_s)
|
497
|
+
}
|
498
|
+
|
499
|
+
before_enqueue :validate_params
|
500
|
+
|
501
|
+
# Method execution will be enqueued in Sidekiq.
|
502
|
+
def perform
|
503
|
+
Service.new.add_to_queue(params[:message])
|
504
|
+
end
|
505
|
+
|
506
|
+
# Define this method if you want to use Sidekiq reentrancy.
|
507
|
+
# Logic to do if Sidekiq worker fails (because of exception, timeout, etc)
|
508
|
+
def after_failure
|
509
|
+
Service.new.remove_from_queue(params[:message])
|
510
|
+
end
|
511
|
+
|
512
|
+
private
|
513
|
+
|
514
|
+
# Enqueue only messages whose value is above 50 and that were not sent
|
515
|
+
# by the 'sum' method; everything else is aborted.
|
516
|
+
def validate_params
|
517
|
+
throw(:abort) unless params['message'].to_i > 50 && params['method'] != 'sum'
|
518
|
+
end
|
519
|
+
end
|
520
|
+
```
|
521
|
+
|
522
|
+
### Responders
|
523
|
+
|
524
|
+
Responders are used to design and control the response flow that comes from a single controller action. You might be familiar with the #respond_with Rails controller method. In Karafka it is the entry point to a responder's *#respond* method.
|
525
|
+
|
526
|
+
Having a responders layer helps you prevent bugs when you design receive-respond applications that handle multiple incoming and outgoing topics. Responders also provide a security layer that allows you to verify that the flow is as you intended. A responder will raise an exception if you didn't respond to all the topics that you wanted to respond to.
|
527
|
+
|
528
|
+
Here's a simple responder example:
|
529
|
+
|
530
|
+
```ruby
|
531
|
+
class ExampleResponder < ApplicationResponder
|
532
|
+
topic :users_notified
|
533
|
+
|
534
|
+
def respond(user)
|
535
|
+
respond_to :users_notified, user
|
536
|
+
end
|
537
|
+
end
|
538
|
+
```
|
539
|
+
|
540
|
+
Note: You can use responders outside of the controller scope; however, this is not recommended because they then won't be listed when executing the **karafka flow** CLI command.
|
541
|
+
|
542
|
+
#### Registering topics
|
543
|
+
|
544
|
+
In order to keep topics organized, you need to register a topic before you can send data to it. To do that, call the *#topic* method with a topic name and optional settings in the responder class definition:
|
545
|
+
|
546
|
+
```ruby
|
547
|
+
class ExampleResponder < ApplicationResponder
|
548
|
+
topic :regular_topic
|
549
|
+
topic :optional_topic, required: false
|
550
|
+
topic :multiple_use_topic, multiple_usage: true
|
551
|
+
end
|
552
|
+
```
|
553
|
+
|
554
|
+
The *#topic* method accepts the following settings:
|
555
|
+
|
556
|
+
| Option | Type | Default | Description |
|
557
|
+
|----------------|---------|---------|------------------------------------------------------------------------------------------------------------|
|
558
|
+
| required | Boolean | true | Should we raise an error when a topic was not used (if required) |
|
559
|
+
| multiple_usage | Boolean | false | Whether more than one message may be sent to a given topic during a single response flow (an error is raised otherwise) |

#### Responding on topics

When you receive a single HTTP request, you generate a single HTTP response. This logic does not apply to Karafka. You can respond on as many topics as you want (or on none).

To handle responding, you need to define a *#respond* instance method. This method should accept the same number of arguments as are passed into the *#respond_with* method.

To send a message to a given topic, use the *#respond_to* method, which accepts two arguments:

- topic name (Symbol)
- data you want to send (if the data is not a string, the responder will try to call #to_json on it)

```ruby
# respond_with user, profile

class ExampleResponder < ApplicationResponder
  topic :regular_topic
  topic :optional_topic, required: false

  def respond(user, profile)
    respond_to :regular_topic, user

    if user.registered?
      respond_to :optional_topic, profile
    end
  end
end
```
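
The serialization rule for the second argument can be sketched as follows. `serialize_payload` is a hypothetical helper written for this example, not a Karafka method; it only mirrors the "strings pass through, everything else goes through #to_json" behavior described above.

```ruby
require 'json'

# Hypothetical helper mirroring the rule described above; not Karafka's own code.
def serialize_payload(data)
  data.is_a?(String) ? data : data.to_json
end

serialize_payload('already a string')         # sent unchanged
serialize_payload({ id: 1, name: 'Ann' })     # => '{"id":1,"name":"Ann"}'
```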

#### Response validation

To ensure that the data flow is as intended, the responder validates what was sent and where, making sure that:

- Only topics that were registered were used (no typos, etc.)
- Only a single message was sent to a topic that was registered without the **multiple_usage** flag
- Any topic that was registered with the **required** flag (the default behavior) has been used

This is an automatic process and does not require any triggers.
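
The three checks can be illustrated with a small standalone validator. This is only a sketch under stated assumptions: `SketchUsageValidator` and `InvalidUsage` are hypothetical names, and the real validation logic in Karafka may be organized differently.

```ruby
# Hypothetical validator illustrating the three checks; not Karafka's implementation.
class SketchUsageValidator
  class InvalidUsage < StandardError; end

  # @param registered [Hash] topic name => { required:, multiple_usage: }
  # @param used [Array<Symbol>] topic names, one entry per message that was sent
  def initialize(registered, used)
    @registered = registered
    @used = used
  end

  def validate!
    # Count how many messages were sent to each topic
    counts = @used.each_with_object(Hash.new(0)) { |name, acc| acc[name] += 1 }

    counts.each do |name, count|
      settings = @registered[name]
      raise InvalidUsage, "Unregistered topic: #{name}" unless settings
      next if count == 1 || settings[:multiple_usage]

      raise InvalidUsage, "Topic used #{count} times: #{name}"
    end

    @registered.each do |name, settings|
      next unless settings[:required]
      raise InvalidUsage, "Required topic not used: #{name}" unless counts.key?(name)
    end

    true
  end
end

registered = {
  regular_topic: { required: true, multiple_usage: false },
  optional_topic: { required: false, multiple_usage: false }
}

SketchUsageValidator.new(registered, [:regular_topic]).validate! # passes
```

In this sketch, sending to an unregistered topic, sending twice to `:regular_topic`, or skipping `:regular_topic` entirely would each raise `InvalidUsage`.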

## Monitoring and logging

Karafka provides a simple monitor (Karafka::Monitor) with a really small API. You can use it to develop your own monitoring system (using, for example, NewRelic). By default, the only thing hooked up to this monitor is the Karafka logger (Karafka::Logger), which is based on the standard [Ruby logger](http://ruby-doc.org/stdlib-2.2.3/libdoc/logger/rdoc/Logger.html).

To change the monitor or the logger, assign a new one during setup:

```ruby
class App < Karafka::App
  setup do |config|
    # Other setup stuff...
    config.logger = MyCustomLogger.new
    config.monitor = CustomMonitor.new
  end
end
```

Keep in mind that if you replace the monitor with a custom one, you will have to implement logging as well, because the default monitor handles both monitoring and logging.
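
As a rough illustration of what "implementing logging as well" can mean, the sketch below shows a standalone monitor that forwards both kinds of notices to a standard Ruby `Logger`. `SketchLoggingMonitor` is a hypothetical class: its method names follow the examples in this README, but it deliberately does not inherit from Karafka::Monitor, so it is not a drop-in replacement.

```ruby
require 'logger'
require 'stringio'

# Hypothetical standalone monitor; the method names follow the examples in this
# README, but the class does not inherit from Karafka::Monitor.
class SketchLoggingMonitor
  def initialize(logger)
    @logger = logger
  end

  # Logs a regular event together with its options.
  def notice(caller_class, options = {})
    @logger.info("#{caller_class} with #{options}")
  end

  # Logs an error raised inside the given class.
  def notice_error(caller_class, error)
    @logger.error("#{caller_class}: #{error.class}: #{error.message}")
  end
end

# Log into an in-memory buffer so the example has no side effects.
output = StringIO.new
monitor = SketchLoggingMonitor.new(Logger.new(output))
monitor.notice(String, topic: 'videos')
monitor.notice_error(String, ArgumentError.new('bad payload'))
```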

### Example monitor with Errbit/Airbrake support

Here's a simple example of a monitor that reports errors to Airbrake/Errbit:

```ruby
class AppMonitor < Karafka::Monitor
  def notice_error(caller_class, e)
    super
    Airbrake.notify_or_ignore(e)
  end
end
```

### Example monitor with NewRelic support

Here's a simple example of a monitor that reports events and errors to NewRelic. It sends metrics about the number of processed messages per topic and how many of them were scheduled to be performed async.

```ruby
# NewRelic example monitor for Karafka
class AppMonitor < Karafka::Monitor
  # @param [Class] caller class for this notice
  # @param [Hash] hash with options for this notice
  def notice(caller_class, options = {})
    # Use default Karafka monitor logging
    super
    # Handle separately the actions that we want to monitor with NewRelic
    return unless respond_to?(caller_label, true)
    send(caller_label, options[:topic])
  end

  # @param [Class] caller class for this notice error
  # @param e [Exception] error that happened
  def notice_error(caller_class, e)
    super
    NewRelic::Agent.notice_error(e)
  end

  private

  # Log that message for a given topic was consumed
  # @param topic [String] topic name
  def consume(topic)
    record_count metric_key(topic, __method__)
  end

  # Log that message for topic was scheduled to be performed async
  # @param topic [String] topic name
  def perform_async(topic)
    record_count metric_key(topic, __method__)
  end

  # Log that message for topic was performed async
  # @param topic [String] topic name
  def perform(topic)
    record_count metric_key(topic, __method__)
  end

  # @param topic [String] topic name
  # @param action [String] action that we want to log (consume/perform_async/perform)
  # @return [String] a proper metric key for NewRelic
  # @example
  #   metric_key('videos', 'perform_async') #=> 'Custom/videos/perform_async'
  def metric_key(topic, action)
    "Custom/#{topic}/#{action}"
  end

  # Records occurrence of a given event
  # @param [String] key under which we want to log
  def record_count(key)
    NewRelic::Agent.record_metric(key, count: 1)
  end
end
```
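
The dispatch pattern used in `notice` above (check for a private handler with `respond_to?(name, true)`, then call it with `send`) can be tried without NewRelic or Karafka. In the sketch below, `SketchMetricsMonitor` is a hypothetical class, the action label is passed in explicitly instead of being derived from the caller, and an in-memory hash stands in for the metrics backend.

```ruby
# Hypothetical recorder that stands in for a metrics backend such as NewRelic.
class SketchMetricsMonitor
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  # @param label [Symbol] name of the action being reported (passed explicitly here)
  # @param options [Hash] event details; only :topic is used
  def notice(label, options = {})
    # Ignore actions that have no matching (private) handler
    return unless respond_to?(label, true)
    send(label, options[:topic])
  end

  private

  # Counts a consumed message for the given topic
  def consume(topic)
    @counts["Custom/#{topic}/#{__method__}"] += 1
  end

  # Counts a message scheduled for async processing for the given topic
  def perform_async(topic)
    @counts["Custom/#{topic}/#{__method__}"] += 1
  end
end

monitor = SketchMetricsMonitor.new
monitor.notice(:consume, topic: 'videos')
monitor.notice(:consume, topic: 'videos')
monitor.notice(:perform_async, topic: 'videos')
monitor.notice(:unknown_action, topic: 'videos') # no handler, silently ignored
monitor.counts['Custom/videos/consume'] # => 2
```

Note that `respond_to?` with `true` as the second argument also matches private methods inherited from `Kernel`, so this pattern should only be used with labels that come from trusted code.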

## Deployment

Karafka is currently being used in production with the following deployment methods:

- Capistrano
- Docker

Since the only long-running process is the Karafka server, it shouldn't be hard to make it work with other deployment and CD tools.

### Capistrano

Use the built-in Capistrano recipe to start, stop, and restart the Karafka server during deploys.

In your **Capfile** file:

```ruby
require 'karafka/capistrano'
```

Take a look at the [load:defaults task](https://github.com/karafka/karafka/blob/master/lib/karafka/capistrano/karafka.cap) (top of file) for options you can set. For example, to specify a different pidfile than the default:

```ruby
set :karafka_pid, ->{ File.join(shared_path, 'tmp', 'pids', 'karafka0') }
```

### Docker

Karafka can be dockerized like any other Ruby/Rails app. To execute the **karafka server** command in your Docker container, just put this into your Dockerfile:

```dockerfile
ENV KARAFKA_ENV production
CMD bundle exec karafka server
```

## Sidekiq Web UI

Karafka comes with a Sidekiq Web UI application that can display the current state of a Sidekiq installation. If you installed Karafka based on the install instructions, you will have a **config.ru** file that allows you to run a standalone Puma instance with the Sidekiq Web UI.

To be able to use it, add Puma and Sinatra to your Gemfile (Karafka does not depend on either of them):

```ruby
gem 'puma'
gem 'sinatra'
```

Then bundle and run:

```
bundle exec rackup
# Puma starting...
# * Min threads: 0, max threads: 16
# * Environment: development
# * Listening on tcp://localhost:9292
```

You can then navigate to the displayed URL to check your Sidekiq status. The Sidekiq Web UI is password protected by default. To check (or change) your login and password, review the **config.ru** file in your application.

## Concurrency

Karafka uses [Celluloid](https://celluloid.io/) actors to handle listening to incoming connections. Since each topic and group requires a separate connection (which means that there is one connection per controller), the connections are handled concurrently. This means that for each route you will have one additional thread running.
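
The "one additional thread per route" model can be pictured with plain Ruby threads. The sketch below intentionally uses `Thread` instead of Celluloid actors and opens no Kafka connections; the route names are hypothetical and serve only to show the one-listener-per-route shape.

```ruby
# Hypothetical route names; in Karafka each route would map to a topic/group pair.
routes = %w[videos users payments]

# Start one listener thread per route, as described above.
listeners = routes.map do |route|
  Thread.new do
    # A real listener would block on a Kafka connection here.
    "listening on #{route}"
  end
end

# Thread#value joins the thread and returns the result of its block.
results = listeners.map(&:value)
```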

## Integrating with other frameworks

Want to use Karafka with Ruby on Rails or Sinatra? It can be done!

### Integrating with Ruby on Rails

Add Karafka to your Ruby on Rails application Gemfile:

```ruby
gem 'karafka', github: 'karafka/karafka'
```

Copy the **app.rb** file from your Karafka application into your Rails app (if you don't have this file, just create an empty Karafka app and copy it). This file is responsible for booting up the Karafka framework. To make it work with Ruby on Rails, you need to load the whole Rails application in this file. To do so, replace:

```ruby
ENV['RACK_ENV'] ||= 'development'
ENV['KARAFKA_ENV'] = ENV['RACK_ENV']

Bundler.require(:default, ENV['KARAFKA_ENV'])
```

with

```ruby
ENV['RAILS_ENV'] ||= 'development'
ENV['KARAFKA_ENV'] = ENV['RAILS_ENV']

require ::File.expand_path('../config/environment', __FILE__)
Rails.application.eager_load!
```

and you are ready to go!

### Integrating with Sinatra

Sinatra applications differ from one another. There are single-file applications and apps with a Rails-like structure. That's why we cannot provide a single simple tutorial. Here are some guidelines that you should follow in order to integrate Karafka with a Sinatra-based application:

Add Karafka to your Sinatra application Gemfile:

```ruby
gem 'karafka', github: 'karafka/karafka'
```

After that, make sure that your whole application is loaded before setting up and booting Karafka (see the Ruby on Rails integration for more details).

## Articles and other references

### Libraries and components

* [Karafka framework](https://github.com/karafka/karafka)
* [Waterdrop](https://github.com/karafka/waterdrop)
* [Worker Glass](https://github.com/karafka/worker-glass)
* [Envlogic](https://github.com/karafka/envlogic)
* [Apache Kafka](http://kafka.apache.org/)
* [Apache ZooKeeper](https://zookeeper.apache.org/)
* [Ruby-Kafka](https://github.com/zendesk/ruby-kafka)

### Articles and references

* [Karafka – Ruby micro-framework for building Apache Kafka message-based applications](http://dev.mensfeld.pl/2015/08/karafka-ruby-micro-framework-for-building-apache-kafka-message-based-applications/)
* [Benchmarking Karafka – how does it handle multiple TCP connections](http://dev.mensfeld.pl/2015/11/benchmarking-karafka-how-does-it-handle-multiple-tcp-connections/)
* [Karafka – Ruby framework for building Kafka message based applications (presentation)](http://mensfeld.github.io/karafka-framework-introduction/)
* [Karafka example application](https://github.com/karafka/karafka-example-app)
* [Karafka Travis CI](https://travis-ci.org/karafka/karafka)
* [Karafka Code Climate](https://codeclimate.com/github/karafka/karafka)

## Note on Patches/Pull Requests

- Fork the project.
- Make your feature addition or bug fix.
- Add tests for it. This is important so I don't break it in a future version unintentionally.
- Commit; do not mess with the Rakefile, version, or history (if you want to have your own version, that is fine, but bump the version in a separate commit that I can ignore when I pull).
- Send me a pull request. Bonus points for topic branches.

Each pull request must pass our quality requirements. To check that everything is as it should be, we use [PolishGeeks Dev Tools](https://github.com/polishgeeks/polishgeeks-dev-tools), which combine multiple linters and code analyzers. Please run:

```bash
bundle exec rake
```

to check if everything is in order. After that you can submit a pull request.