redstream 0.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.gitignore +14 -0
- data/.travis.yml +10 -0
- data/Gemfile +5 -0
- data/LICENSE.txt +22 -0
- data/README.md +253 -0
- data/Rakefile +9 -0
- data/docker-compose.yml +6 -0
- data/lib/redstream.rb +134 -0
- data/lib/redstream/consumer.rb +115 -0
- data/lib/redstream/delayer.rb +100 -0
- data/lib/redstream/lock.rb +80 -0
- data/lib/redstream/message.rb +52 -0
- data/lib/redstream/model.rb +57 -0
- data/lib/redstream/producer.rb +145 -0
- data/lib/redstream/trimmer.rb +91 -0
- data/lib/redstream/version.rb +5 -0
- data/redstream.gemspec +38 -0
- data/spec/redstream/consumer_spec.rb +90 -0
- data/spec/redstream/delayer_spec.rb +53 -0
- data/spec/redstream/lock_spec.rb +68 -0
- data/spec/redstream/model_spec.rb +57 -0
- data/spec/redstream/producer_spec.rb +79 -0
- data/spec/redstream/trimmer_spec.rb +32 -0
- data/spec/redstream_spec.rb +117 -0
- data/spec/spec_helper.rb +66 -0
- metadata +289 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 0e1ddc2700836c469d1ca61069e3416c21e657e05725b92e75969aa8110768e3
+  data.tar.gz: c8565f3754b3fd4f66823d6d7035e814f73e27abdb15936ecd906f5f07dd8643
+SHA512:
+  metadata.gz: edd496df8d06b98b9318b9796f400e2c0870edfc84c3aa7f9c7946dbe6cf91c5a8c0ab32425d627bc20c585389eab92ed1b290e57e0df856e8995547d8a9b7c6
+  data.tar.gz: 4893d2197f427479e4df0821ca29a23ee98a604fa73680f955da8d2c71cbdb192d006c476dc3bd6c03c719da327c3d9b6f207842082a64133f0fe2383771aef5
data/.gitignore
ADDED
data/.travis.yml
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,22 @@
+Copyright (c) 2014 Benjamin Vetter
+
+MIT License
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,253 @@
+# Redstream
+
+**Using redis streams to keep your primary database in sync with secondary
+datastores (e.g. elasticsearch).**
+
+[](http://travis-ci.org/mrkamel/redstream)
+
+## Installation
+
+First, install redis. Then, add this line to your application's Gemfile:
+
+```ruby
+gem 'redstream'
+```
+
+And then execute:
+
+    $ bundle
+
+Or install it yourself as:
+
+    $ gem install redstream
+
+## Reference Docs
+
+The reference docs can be found at
+[https://www.rubydoc.info/github/mrkamel/redstream/master](https://www.rubydoc.info/github/mrkamel/redstream/master).
+
+## Usage
+
+Include `Redstream::Model` in your model and add a call to
+`redstream_callbacks`.
+
+```ruby
+class MyModel < ActiveRecord::Base
+  include Redstream::Model
+
+  # ...
+
+  redstream_callbacks
+
+  # ...
+end
+```
+
+`redstream_callbacks` adds `after_save`, `after_touch`, `after_destroy` and,
+most importantly, `after_commit` callbacks which write messages containing the
+record id to a redis stream. A background worker can then fetch those messages
+and update secondary datastores.
+
+In a background process, you need to run a `Redstream::Consumer`, a
+`Redstream::Delayer` and a `Redstream::Trimmer`:
+
+```ruby
+Redstream::Consumer.new(stream_name: Product.redstream_name, name: "consumer").run do |messages|
+  # Update secondary datastore
+end
+
+# ...
+
+Redstream::Delayer.new(stream_name: Product.redstream_name, delay: 5.minutes).run
+
+# ...
+
+trimmer = Redstream::Trimmer.new(
+  stream_name: Product.redstream_name,
+  consumer_names: ["indexer", "cacher"],
+  interval: 30
+)
+
+trimmer.run
+```
+
+As all of them are blocking, you should run each of them in its own thread.
+As none of them needs to be stopped gracefully, this can be as simple as:
+
+```ruby
+Thread.new do
+  Redstream::Consumer.new("...").run do |messages|
+    # ...
+  end
+end
+```
+
+More concretely, `after_save`, `after_touch` and `after_destroy` only write
+"delay" messages to an additional redis stream. Delay messages are like any
+other messages, but they get processed by a `Redstream::Delayer`, and the
+`Delayer` will wait for some (configurable) delay/time before processing them.
+As the `Delayer` is necessary to fix inconsistencies, the delay must be at
+least as long as your maximum database transaction time. In contrast,
+`after_commit` writes messages to a redis stream from which the messages can
+be fetched immediately to keep the secondary datastores updated in
+near-realtime. The reasoning behind all this is simple: when using only
+one way to update secondary datastores, namely `after_save` or `after_commit`,
+any errors occurring in between `after_save` and `after_commit` result in
+inconsistencies between your primary and secondary datastore. By using these
+kinds of "delay" messages triggered by `after_save` and fetched after e.g. 5
+minutes, errors occurring in between `after_save` and `after_commit` can be
+fixed when the delay messages get processed.
+
+All messages are fetched in batches, such that e.g. elasticsearch can be
+updated using its bulk API. For instance, depending on which elasticsearch ruby
+client you are using, the reindexing code regarding elasticsearch will look
+similar to:
+
+```ruby
+Thread.new do
+  Redstream::Consumer.new(stream_name: Product.redstream_name, name: "indexer").run do |messages|
+    ids = messages.map { |message| message.payload["id"] }
+
+    ProductIndex.import Product.where(id: ids)
+  end
+end
+
+Thread.new do
+  Redstream::Delayer.new(stream_name: Product.redstream_name, delay: 5.minutes).run
+end
+
+Thread.new do
+  Redstream::Trimmer.new(stream_name: Product.redstream_name, consumer_names: ["indexer"], interval: 30).run
+end
+```
+
+You should run a consumer per `(stream_name, name)` tuple on multiple hosts for
+high availability. They'll use a redis based locking mechanism to ensure that
+only one consumer is consuming messages per tuple while the others are
+hot-standbys, i.e. they'll take over in case the currently active instance
+dies. The same holds for delayers and trimmers.
+
+Please note: if you have multiple kinds of consumers for a single model/topic,
+then you must use distinct names. Assume you have an indexer, which updates a
+search index for a model, and a cacher, which updates a cache store for a model:
+
+```ruby
+Redstream::Consumer.new(stream_name: Product.redstream_name, name: "indexer").run do |messages|
+  # ...
+end
+
+Redstream::Consumer.new(stream_name: Product.redstream_name, name: "cacher").run do |messages|
+  # ...
+end
+```
+
+## Consumer, Delayer, Trimmer, Producer
+
+A `Consumer` fetches messages that have been added to a redis stream via
+`after_commit` or by a `Delayer`, i.e. messages that are available for
+immediate retrieval/reindexing/syncing.
+
+```ruby
+Redstream::Consumer.new(stream_name: Product.redstream_name, name: "indexer").run do |messages|
+  ids = messages.map { |message| message.payload["id"] }
+
+  ProductIndex.import Product.where(id: ids)
+end
+```
+
+A `Delayer` fetches messages that have been added to a second redis stream via
+`after_save`, `after_touch` and `after_destroy` to be retrieved after a certain
+configurable amount of time (usually 5 minutes) to fix inconsistencies. The
+amount of time must be at least as long as your maximum database transaction
+time.
+
+```ruby
+Redstream::Delayer.new(stream_name: Product.redstream_name, delay: 5.minutes).run
+```
+
+A `Trimmer` is responsible for finally removing messages from redis streams.
+Without a `Trimmer`, messages will fill up your redis server and redis will
+eventually crash due to out of memory errors. To be able to trim a stream, you
+must pass an array containing all consumer names reading from the respective
+stream. The `Trimmer` then continuously checks how far each consumer has
+already processed the stream and trims the stream up to the committed minimum.
+If there is nothing to trim, the `Trimmer` sleeps for the specified
+`interval`.
+
+```ruby
+Redstream::Trimmer.new(stream_name: Product.redstream_name, consumer_names: ["indexer"], interval: 30).run
+```
+
+A `Producer` adds messages to the concrete redis streams, and you
+can actually pass a concrete `Producer` instance via `redstream_callbacks`:
+
+```ruby
+class Product < ActiveRecord::Base
+  include Redstream::Model
+
+  # ...
+
+  redstream_callbacks producer: Redstream::Producer.new("...")
+
+  # ...
+end
+```
+
+As you might recognize, `Redstream::Model` is of course only able to send
+messages to redis streams for model lifecycle callbacks. This is, however, not
+the case for `#update_all`:
+
+```ruby
+Product.where(on_stock: true).update_all(featured: true)
+```
+
+To capture those updates as well, you need to change:
+
+```ruby
+Product.where(on_stock: true).update_all(featured: true)
+```
+
+to
+
+```ruby
+RedstreamProducer = Redstream::Producer.new
+
+Product.where(on_stock: true).find_in_batches do |products|
+  RedstreamProducer.bulk products do
+    Product.where(id: products.map(&:id)).update_all(featured: true)
+  end
+end
+```
+
+The `Producer` will write a message for every matched record into the delay
+stream before `update_all` is called and will write another message for every
+record to the main stream after `update_all` is called - just like it is done
+within the model lifecycle callbacks.
+
+The `#bulk` method must ensure that the same set of records is used for the
+delay messages and the instant messages. Thus, you'd better directly pass an
+array of records to `Redstream::Producer#bulk`, as shown above. If you pass
+an `ActiveRecord::Relation`, the `#bulk` method will convert it to an array,
+i.e. load the whole result set into memory.
+
+## Namespacing
+
+In case you are using a shared redis, where multiple applications read/write
+from the same redis server using Redstream, key conflicts could occur.
+To avoid that, you want to use namespacing:
+
+```ruby
+Redstream.namespace = 'my_app'
+```
+
+such that every application will have its own namespaced Redstream keys.
+
+## Contributing
+
+Bug reports and pull requests are welcome on GitHub at https://github.com/mrkamel/redstream
+
+## License
+
+The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
data/Rakefile
ADDED
data/docker-compose.yml
ADDED
data/lib/redstream.rb
ADDED
@@ -0,0 +1,134 @@
+require "active_support/inflector"
+require "connection_pool"
+require "redis"
+require "json"
+require "thread"
+require "set"
+
+require "redstream/version"
+require "redstream/lock"
+require "redstream/message"
+require "redstream/consumer"
+require "redstream/producer"
+require "redstream/delayer"
+require "redstream/model"
+require "redstream/trimmer"
+
+module Redstream
+  # Redstream uses the connection_pool gem to pool redis connections. In case
+  # you have a distributed redis setup (sentinel/cluster) or the default pool
+  # size doesn't match your requirements, then you must specify the connection
+  # pool. A connection pool is necessary, because redstream is using blocking
+  # commands. Please note, redis connections are somewhat cheap, so you'd
+  # better specify the pool size to be large enough instead of running into
+  # bottlenecks.
+  #
+  # @example
+  #   Redstream.connection_pool = ConnectionPool.new(size: 50) do
+  #     Redis.new("...")
+  #   end
+
+  def self.connection_pool=(connection_pool)
+    @connection_pool = connection_pool
+  end
+
+  # Returns the connection pool instance or sets and creates a new connection
+  # pool in case no pool is yet created.
+  #
+  # @return [ConnectionPool] The connection pool
+
+  def self.connection_pool
+    @connection_pool ||= ConnectionPool.new { Redis.new }
+  end
+
+  # You can specify a namespace to use for redis keys. This is useful in case
+  # you are using a shared redis.
+  #
+  # @example
+  #   Redstream.namespace = 'my_app'
+
+  def self.namespace=(namespace)
+    @namespace = namespace
+  end
+
+  # Returns the previously set namespace for redis keys to be used by
+  # Redstream.
+
+  def self.namespace
+    @namespace
+  end
+
+  # Returns the max id of the specified stream, i.e. the id of the
+  # last/newest message added. Returns nil for empty streams.
+  #
+  # @param stream_name [String] The stream name
+  # @return [String, nil] The id of a stream's newest message, or nil
+
+  def self.max_stream_id(stream_name)
+    connection_pool.with do |redis|
+      message = redis.xrevrange(stream_key_name(stream_name), "+", "-", count: 1).first
+
+      return unless message
+
+      message[0]
+    end
+  end
+
+  # Returns the max committed id, i.e. the consumer's offset, for the
+  # specified consumer name.
+  #
+  # @param stream_name [String] the stream name
+  # @param consumer_name [String] the consumer name
+  #
+  # @return [String, nil] The max committed offset, or nil
+
+  def self.max_consumer_id(stream_name:, consumer_name:)
+    connection_pool.with do |redis|
+      redis.get offset_key_name(stream_name: stream_name, consumer_name: consumer_name)
+    end
+  end
+
+  # @api private
+  #
+  # Generates the low level redis stream key name.
+  #
+  # @param stream_name A high level stream name
+  # @return [String] A low level redis stream key name
+
+  def self.stream_key_name(stream_name)
+    "#{base_key_name}:stream:#{stream_name}"
+  end
+
+  # @api private
+  #
+  # Generates the redis key name used for storing a consumer's current
+  # offset, i.e. the maximum id successfully processed.
+  #
+  # @param consumer_name A high level consumer name
+  # @return [String] A redis key name for storing a stream's current offset
+
+  def self.offset_key_name(stream_name:, consumer_name:)
+    "#{base_key_name}:offset:#{stream_name}:#{consumer_name}"
+  end
+
+  # @api private
+  #
+  # Generates the redis key name used for locking.
+  #
+  # @param name A high level name for the lock
+  # @return [String] A redis key name used for locking
+
+  def self.lock_key_name(name)
+    "#{base_key_name}:lock:#{name}"
+  end
+
+  # @api private
+  #
+  # Returns the full namespace prefix for redis keys.
+
+  def self.base_key_name
+    [namespace, "redstream"].compact.join(":")
+  end
+end
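The key naming helpers in `lib/redstream.rb` are plain string logic, so their effect can be illustrated standalone. This is a minimal reproduction for illustration only (the `KeyNames` module name is ours, not part of the gem); it shows how setting a namespace prefixes every key, which is what prevents conflicts on a shared redis:

```ruby
# Standalone sketch mirroring Redstream's documented key scheme.
# Assumption: module/method names here are illustrative, not the gem's API.
module KeyNames
  module_function

  # Shared prefix for all keys; a nil namespace means no prefix.
  def base_key_name(namespace)
    [namespace, "redstream"].compact.join(":")
  end

  # Low level redis stream key for a high level stream name.
  def stream_key_name(stream_name, namespace: nil)
    "#{base_key_name(namespace)}:stream:#{stream_name}"
  end

  # Key storing a consumer's committed offset for a stream.
  def offset_key_name(stream_name:, consumer_name:, namespace: nil)
    "#{base_key_name(namespace)}:offset:#{stream_name}:#{consumer_name}"
  end
end

KeyNames.stream_key_name("products")
# => "redstream:stream:products"
KeyNames.stream_key_name("products", namespace: "my_app")
# => "my_app:redstream:stream:products"
```

With a namespace set, two applications sharing one redis server write to `my_app:redstream:...` and `other_app:redstream:...` respectively, so their streams, offsets and locks never collide.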