atomic_cache 0.1.0.rc1

data/bin/setup ADDED
@@ -0,0 +1,8 @@
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
set -vx

bundle install

# Do any other automated setup that you need to do here
data/docs/ARCH.md ADDED
@@ -0,0 +1,34 @@

## Overview
The problem of handling the scope of timestamps for multiple caches within a single context is more nuanced than it appears at first. The most common context is a model class. That will be used as the example throughout this documentation, but this gem could support other contexts as well.

Any single model class may have multiple caches associated with it, for example, a cache of all active or inactive instances of the model. When any instance of that class changes, which caches become invalidated? A simple solution is to keep one last modified time that is within scope for all instances of the class, where a change to any instance results in a change of the last modified time. Likewise, a change to the last modified time of the model would need to result in an invalidation of all collection caches. Thus, the last modified time is at a broader scope than any individual cache. In addition, what is often viewed as a single key or an individual cache is actually a collection of similar keys oriented around storing one logical value. The reason for this is that the cache client has a fall-through stack where it tries to find the best value; it possibly needs to look into several cache keys before finding the best value, thus it needs to understand the namespace (or collection of sub-keys), not just a single string.

The implementation of this gem handles this by separating management of the last modified time into a "timestamp manager" and encapsulating all the sub-keys for a given cache into a "keyspace". Because the timestamp manager maintains a timestamp whose scope is larger than any single logical value being stored, it stores this time in a parent keyspace. Additional caches for that model are then child keyspaces, which namespace themselves relative to the parent and their specific concern.

To keep things simple, when using a concern, there is a one-to-one correlation between a cache client instance and a timestamp manager. In the common case this removes the need to know about these individual parts and lets users get straight to the tasks of fetching and writing caches. At runtime the cache client only requires the namespace in order to operate, and automatically uses the last modified time from its timestamp manager.

#### Terms
* *Keyspace* - Responsible for knowing the namespace and generating all the sub-keys for a logical cache location
* *TimestampManager* - Responsible for managing and storing the last modified time. Represents a logical scope of cache invalidation.
* *CacheClient* - The distributed lock implementation. Responsible for fetching the best value for a keyspace.
* *StorageAdapter* - Interface to the storage facility

#### Storage Locations
The gem stores data in two locations, a key store and a cache store.

##### Stored in the cache storage:
* cached value

##### Stored in the key storage:
* atomic lock
* last known key
* last modified time

### Keyspace Keys
Example keys assume use of the concern. `id` in this context is whatever is given when `cache_keyspace` is run.

* *last modified time* - `<namespace>:<class name>:<version>:lmt`
* *value* - `<namespace>:<class name>:<version>:<id>:<timestamp>`
* *last known key* - `<namespace>:<class name>:<version>:<id>:lkk`
* *lock* - `<namespace>:<class name>:<version>:<id>:lock`
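The key patterns above can be sketched in plain Ruby. This is an illustrative helper, not the gem's `Keyspace` implementation, and the segment values (`'atom'`, `'foo'`, `'active'`) are hypothetical:

```ruby
# Illustrative sketch of how the keys above are assembled; the gem's
# Keyspace class does this internally. Nil segments (e.g. no version)
# are dropped before joining.
def build_key(namespace, class_name, version, *segments)
  [namespace, class_name, version, *segments].compact.join(':')
end

build_key('atom', 'foo', 5, 'lmt')            # last modified time key
build_key('atom', 'foo', 5, 'active', 'lkk')  # last known key
build_key('atom', 'foo', 5, 'active', 'lock') # lock key
```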
@@ -0,0 +1,45 @@
## Storage Adapter
Any options passed in by the user at fetch time will be passed through to the storage adapter.

```ruby
class StorageAdapter
  # (String, Object, Integer, Hash) -> Boolean
  # ttl is in millis
  # operation must be atomic
  # returns true when the key doesn't exist and was written successfully
  # returns false in all other cases
  def add(key, new_value, ttl, user_options); end

  # (String, Hash) -> String
  # return the `value` at `key`
  def read(key, user_options); end

  # (String, Object, Hash) -> Boolean
  # returns true if it succeeds; false otherwise
  def set(key, new_value, user_options); end

  # (String, Hash) -> Boolean
  # returns true if it succeeds; false otherwise
  def delete(key, user_options); end
end
```
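For illustration, a minimal hash-backed adapter conforming to this interface might look like the following sketch. The class name `HashStorageAdapter` is hypothetical; this is not the gem's bundled implementation, it is single-threaded, and TTL is evaluated lazily on read:

```ruby
# Illustrative in-memory StorageAdapter: entries expire lazily, i.e.
# an expired entry is only removed when a read encounters it.
class HashStorageAdapter
  def initialize
    @store = {}
  end

  # returns true only when the key was absent and has now been written;
  # atomic enough for a single-threaded sketch
  def add(key, new_value, ttl, user_options = {})
    return false if @store.key?(key)
    @store[key] = { value: new_value, expires_at: Time.now + (ttl / 1000.0) }
    true
  end

  def read(key, user_options = {})
    entry = @store[key]
    return nil if entry.nil?
    if entry[:expires_at] && Time.now > entry[:expires_at]
      @store.delete(key) # lazy expiry on read
      return nil
    end
    entry[:value]
  end

  def set(key, new_value, user_options = {})
    @store[key] = { value: new_value, expires_at: nil }
    true
  end

  def delete(key, user_options = {})
    @store.delete(key)
    true
  end
end
```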

## Metrics
```ruby
class Metrics
  # (String, Hash) -> Nil
  def increment(key, options); end

  # (String, Hash, Block) -> Object
  # must return the block's return value so timed fetches can
  # pass the generated value back through
  def time(key, options, &block); end
end
```
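A bare-bones implementation of this interface could look like the sketch below. `CountingMetrics` is a hypothetical stand-in shown only to illustrate the contract; any statsd-style client exposing these two methods will work:

```ruby
# Illustrative Metrics implementation: counts increments, records the
# last duration per key, and passes the timed block's value through.
class CountingMetrics
  attr_reader :counts, :timings

  def initialize
    @counts = Hash.new(0)
    @timings = {}
  end

  # count an occurrence of `key`
  def increment(key, options = {})
    @counts[key] += 1
    nil
  end

  # time the block, record the duration in ms, and return the
  # block's return value (the cache client depends on this)
  def time(key, options = {}, &block)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = block.call
    @timings[key] = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000
    result
  end
end
```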

## Logger
```ruby
class Logger
  # (Object) -> Nil
  def warn(msg); end
  def info(msg); end
  def debug(msg); end
end
```
@@ -0,0 +1,31 @@
## Model Setup
Include the `GlobalLMTCacheConcern`.

```ruby
class Foo < ActiveRecord::Base
  include AtomicCache::GlobalLMTCacheConcern
end
```

### cache_class
By default the cache identifier for a class is set to the name of the class (i.e. `self.to_s`). In some cases it makes sense to set a custom value for the cache identifier. In cases where a custom cache identifier is set, it's important that the identifier remain unique across the project.

```ruby
class SuperDescriptiveDomainModelAbstractFactoryImplManager < ActiveRecord::Base
  include AtomicCache::GlobalLMTCacheConcern
  cache_class('sddmafim')
end
```

#### ★ Best Practice ★
Generally it should only be necessary to explicitly set a `cache_class` in cases where the class name is extremely long and causing the max key length to be hit. In such a case the `cache_class` can be set to an abbreviation of the class name.

### cache_version
In cases where a code change that is incompatible with already-written cached values needs to be deployed, a cache version can be set which further sub-divides the cache namespace, preventing old values from being read. When the version is `nil` (the default), no version is added to the cache key.

```ruby
class Foo < ActiveRecord::Base
  include AtomicCache::GlobalLMTCacheConcern
  cache_version(5)
end
```
@@ -0,0 +1,68 @@
## Gem Installation

You will need to ensure you have the correct deploy credentials.

Add this line to your application's Gemfile:

```ruby
gem 'atomic_cache'
```

And then execute:

    $ bundle

## Project Setup
`AtomicCache::DefaultConfig` is a singleton which allows global configuration.

#### Rails Initializer Example
```ruby
# config/initializers/cache.rb
require 'datadog/statsd'
require 'atomic_cache'

AtomicCache::DefaultConfig.configure do |config|
  config.logger = Rails.logger
  config.metrics = Datadog::Statsd.new('localhost', 8125, namespace: 'cache.atomic')
  config.namespace = 'atom'
end
```

#### Required
* `cache_storage` - Storage adapter for cache (see below)
* `key_storage` - Storage adapter for key manager (see below)

#### Optional
* `default_options` - Default options for every fetch call. See [options](TODO: LINK).
* `logger` - Logger instance. Used for debug and warn logs. Defaults to nil.
* `timestamp_formatter` - Proc to format the last modified time for storage. Defaults to timestamp (`Time.to_i`)
* `metrics` - Metrics instance. Defaults to nil.
* `namespace` - Global namespace that will prefix all cache keys. Defaults to nil.

#### ★ Best Practice ★
Keep the global namespace short. For example, memcached has a limit of 250 characters for key length.

## Storage Adapters

### InstanceMemory & SharedMemory
Both of these storage adapters provide a cache storage implementation that is limited to a single Ruby instance. The difference is that `InstanceMemory` maintains a private store that is only visible when interacting with that instance of the adapter, whereas `SharedMemory` creates a class-scoped store such that all instances of the storage adapter read and write from the same store. `InstanceMemory` is great for integration testing as it isolates visibility of the store; `SharedMemory` is great for local development and for integration testing in cases where multiple components reading and writing needs to be represented.

Neither memory storage implementation should be considered "production ready". Both respect TTL, but only evaluate it on read, meaning that data is only removed from the store when a read is attempted and the TTL has expired.

##### Example
```ruby
AtomicCache::DefaultConfig.configure do |config|
  config.key_storage = AtomicCache::Storage::InstanceMemory.new
end
```
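The scoping difference between the two adapters can be sketched with plain Ruby. This is an illustrative analogy using instance-level vs. class-level state, not the gem's actual implementation:

```ruby
# Analogy only: InstanceStore keeps a per-instance hash (like
# InstanceMemory), SharedStore a class-level hash (like SharedMemory).
class InstanceStore
  def initialize
    @store = {}
  end

  def write(key, value); @store[key] = value; end
  def read(key); @store[key]; end
end

class SharedStore
  @@store = {} # shared by every instance of the class

  def write(key, value); @@store[key] = value; end
  def read(key); @@store[key]; end
end
```

Two `InstanceStore` objects cannot see each other's writes, while writes through any `SharedStore` instance are visible to all of them.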

### Dalli
The `Dalli` storage adapter provides a thin wrapper around the Dalli client.

##### Example
```ruby
dc = Dalli::Client.new('localhost:11211', options)
AtomicCache::DefaultConfig.configure do |config|
  config.key_storage = AtomicCache::Storage::Dalli.new(dc)
end
```
data/docs/USAGE.md ADDED
@@ -0,0 +1,106 @@
## Usage

### Invalidating the Cache on Change
The concern makes the `expire_cache` method available both on the class and on the instance.
```ruby
expire_cache
expire_cache(Time.now - 100) # an optional time can be given
```

### Getting Last Modified Time
The concern makes a `last_modified_time` method available both on the class and on the instance.

### Fetch
The concern makes an `AtomicCache` object available both on the class and on the instance.

```ruby
AtomicCache.fetch(options) do
  # generate block
end
```

In addition to the below options, any other options given (e.g. `expires_in`, `cache_nils`) are passed through to the underlying storage adapter. This allows storage-specific options to be passed through (reference: [Dalli config](https://github.com/petergoldstein/dalli#configuration)).

#### `generate_ttl_ms`
_Defaults to 30 seconds._

When a cache client identifies that a cache is empty and that no other process is actively generating a value, it will establish a lock and attempt to generate the value itself. However, if that process were to die, or the instance it's running on were to go down, not only would it fail to write a new cache value, but the lock it established would remain active, preventing other processes from generating a new value. To prevent this, the lock *always* has a TTL on it, forcing the lock to be automatically removed by the storage mechanism so that it can never become permanent. `generate_ttl_ms` is the duration of that TTL.

The ideal `generate_ttl_ms` is just slightly longer than the average generate block duration. If `generate_ttl_ms` is set too low, the lock might expire before a process has written its new value, and another process will then try to generate an identical value.

If metrics are enabled, the `<namespace>.generate.run` timer can be used to determine the min/max/average generate time for a particular cache and the `generate_ttl_ms` tuned using that.

#### `quick_retry_ms`
_Defaults to `false` (disabled)._

In the case where another process is computing the new cache value, before falling back to the last known value, if `quick_retry_ms` has a value the atomic client will check the new cache once after the given duration (in milliseconds).

The danger with `quick_retry_ms` is that when enabled it applies a delay to all fall-through requests while only benefiting some customers. As the average generate block duration increases, the effectiveness of `quick_retry_ms` decreases, because there is less of a likelihood that a customer will get a fresh value. For example, in the graph below, a cache with an average generate duration of 200ms, configured with a `quick_retry_ms` of 50ms (red), will likely deliver a fresh value to only 25% of customers.

`quick_retry_ms` is most effective for caches that are quick to generate but whose values are slow to change. It is least effective for caches that are slow to generate but whose values are quick to change.

![quick_retry_ms graph](img/quick_retry_ms_graph.png)

#### `max_retries` & `backoff_duration_ms`
_`max_retries` defaults to 5._
_`backoff_duration_ms` defaults to 50ms._

In cases where neither the cached value nor the last known value is available, the client ends up polling for the new value, under the assumption that another process is generating it. It's possible that the other process went down or is for some reason unable to write the new value to the cache. If the client never stopped polling for a value, it would steal process time from other requests. `max_retries` defeats that case by limiting how many times the client can poll before giving up.

The client waits between polls. The duration it waits is `backoff_duration_ms * retry_count + random(0 to 15ms)`. A small random value is added to stagger processes in the case where, after a deploy, many machines come online at close to the same time and all need the same cache.

`backoff_duration_ms` and `max_retries` should both be kept small.

##### Example retry with durations
`max_retries` = 5
`backoff_duration_ms` = 50ms
Assumes the random offset is always 10ms
Total time spent polling: 800ms

* First retry - wait 60ms
* Second retry - wait 110ms
* Third retry - wait 160ms
* Fourth retry - wait 210ms
* Fifth retry - wait 260ms
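The durations in the example above can be reproduced with a quick sketch (plain Ruby, with the random offset fixed at 10ms as the example assumes):

```ruby
# The wait before retry N is backoff_duration_ms * N plus a small
# random stagger offset (fixed at 10ms here for illustration).
def wait_duration_ms(backoff_duration_ms, retry_count, random_offset_ms)
  backoff_duration_ms * retry_count + random_offset_ms
end

waits = (1..5).map { |n| wait_duration_ms(50, n, 10) }
# waits == [60, 110, 160, 210, 260]; waits.sum == 800
```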

## Testing

### Integration Style Tests
`AtomicCache::Storage::InstanceMemory` or `AtomicCache::Storage::SharedMemory` can be used to make testing easier by offering an integration testing approach that allows assertion against what ended up in the cache, instead of which methods on the cache client were called. Both storage adapters expose the following methods.

* `#reset` -- Clears all stored values
* `#store` -- Returns the underlying hash of values stored

All incoming keys are normalized to symbols. All values are stored with a `value`, `ttl`, and `written_at` property.

It's likely preferable to use an environments file to configure the `key_storage` and `cache_storage` to always be an in-memory adapter when running in the test environment, instead of manually configuring the storage adapter per spec.

#### ★ Testing Tip ★
If using `SharedMemory` for integration style tests, a global `before(:each)` can be configured in `spec_helper.rb`.

```ruby
# spec/spec_helper.rb
RSpec.configure do |config|

  # your other config

  config.before(:each) do
    AtomicCache::Storage::SharedMemory.reset
  end
end
```

## Metrics

If a metrics client is configured via the DefaultConfig, the following metrics will be published:

* `<namespace>.read.present` - Number of times a key was fetched and was present in the cache
* `<namespace>.read.not-present` - Number of times a key was fetched and was NOT present in the cache
* `<namespace>.generate.current-thread` - Number of times the value was not present in the cache and the current thread started the task of generating a new value
* `<namespace>.generate.other-thread` - Number of times the value was not present in the cache but another thread was already generating the value
* `<namespace>.empty-cache-retry.present` - Number of times the value was not present, but the client checked again after a short duration and it was present
* `<namespace>.empty-cache-retry.not-present` - Number of times the value was not present, but the client checked again after a short duration and it was NOT present
* `<namespace>.last-known-value.present` - Number of times the value was not present but the last known value was
* `<namespace>.last-known-value.not-present` - Number of times the value was not present and the last known value was not either
* `<namespace>.wait.run` - When neither the value nor the last known value is available, this timer is the duration spent waiting for another thread to generate the value
* `<namespace>.generate.run` - When a new value is being generated, this timer is the duration it takes to generate that new value
Binary file
@@ -0,0 +1,11 @@
# frozen_string_literal: true
require_relative 'atomic_cache/version'

require_relative 'atomic_cache/default_config'
require_relative 'atomic_cache/atomic_cache_client'
require_relative 'atomic_cache/key/last_mod_time_key_manager'
require_relative 'atomic_cache/key/keyspace'
require_relative 'atomic_cache/concerns/global_lmt_cache_concern'
require_relative 'atomic_cache/storage/instance_memory'
require_relative 'atomic_cache/storage/shared_memory'
require_relative 'atomic_cache/storage/dalli'
@@ -0,0 +1,197 @@
# frozen_string_literal: true

require 'active_support/core_ext/object'
require 'active_support/core_ext/hash'

module AtomicCache
  class AtomicCacheClient

    DEFAULT_quick_retry_ms = false
    DEFAULT_MAX_RETRIES = 5
    DEFAULT_GENERATE_TIME_MS = 30000 # 30 seconds
    BACKOFF_DURATION_MS = 50

    # @param storage [Object] Cache storage adapter
    # @param timestamp_manager [Object] Timestamp manager
    # @param default_options [Hash] Default fetch options
    # @param logger [Object] Logger
    # @param metrics [Object] Metrics client
    def initialize(storage: nil, timestamp_manager: nil, default_options: {}, logger: nil, metrics: nil)
      @default_options = (DefaultConfig.instance.default_options&.clone || {}).merge(default_options || {})
      @timestamp_manager = timestamp_manager
      @logger = logger || DefaultConfig.instance.logger
      @metrics = metrics || DefaultConfig.instance.metrics
      @storage = storage || DefaultConfig.instance.cache_storage

      raise ArgumentError.new("`timestamp_manager` required but none given") unless @timestamp_manager.present?
      raise ArgumentError.new("`storage` required but none given") unless @storage.present?
    end


    # Attempts to fetch the given keyspace, using an optional block to generate
    # a new value when the cache is expired
    #
    # @param keyspace [AtomicCache::Keyspace] the keyspace to fetch
    # @option options [Numeric] :generate_ttl_ms (30000) Max generate duration in ms
    # @option options [Numeric] :quick_retry_ms (false) Short duration to check back before using last known value
    # @option options [Numeric] :max_retries (5) Max times to retry in the waiting case
    # @option options [Numeric] :backoff_duration_ms (50) Duration in ms to wait between retries
    # @yield Generates a new value when cache is expired
    def fetch(keyspace, options=nil)
      options ||= {}
      key = @timestamp_manager.current_key(keyspace)
      tags = ["cache_keyspace:#{keyspace.root}"]

      # happy path: see if the value is there in the key we expect
      value = @storage.read(key, options) if key.present?
      if !value.nil?
        metrics(:increment, 'read.present', tags: tags)
        return value
      end

      metrics(:increment, 'read.not-present', tags: tags)
      log(:debug, "Cache key `#{key}` not present.")

      # try to generate a new value if another process isn't already
      if block_given?
        new_value = generate_and_store(keyspace, options, tags, &Proc.new)
        return new_value unless new_value.nil?
      end

      # quick check back to see if the other process has finished
      # or fall back to the last known value
      value = quick_retry(keyspace, options, tags) || last_known_value(keyspace, options, tags)
      return value if value.present?

      # wait for the other process if a last known value isn't there
      if key.present?
        return time('wait.run', tags: tags) do
          wait_for_new_value(key, options, tags)
        end
      end

      # At this point, there's no key, value, last known key, or last known value.
      # A block wasn't given or couldn't create a non-nil value, making it
      # impossible to do anything else, so bail
      if !key.present?
        metrics(:increment, 'no-key.give-up')
        log(:warn, "Giving up fetching cache keyspace for root `#{keyspace.root}`. No key could be generated.")
      end
      nil
    end

    protected

    def generate_and_store(keyspace, options, tags)
      generate_ttl_ms = option(:generate_ttl_ms, options, DEFAULT_GENERATE_TIME_MS).to_f / 1000
      if @timestamp_manager.lock(keyspace, generate_ttl_ms, options)
        lmt = Time.now
        new_value = yield

        if new_value.nil?
          # let another thread try right away
          @timestamp_manager.unlock(keyspace)
          metrics(:increment, 'generate.nil', tags: tags)
          log(:warn, "Generator for #{keyspace.key} returned nil. Aborting new cache value.")
          return nil
        end

        new_key = @timestamp_manager.next_key(keyspace, lmt)
        @timestamp_manager.promote(keyspace, last_known_key: new_key, timestamp: lmt)
        @storage.set(new_key, new_value, options)

        metrics(:increment, 'generate.current-thread', tags: tags)
        log(:debug, "Generating new value for `#{new_key}`")

        return new_value
      end

      metrics(:increment, 'generate.other-thread', tags: tags)
      nil
    end

    def quick_retry(keyspace, options, tags)
      key = @timestamp_manager.current_key(keyspace)
      duration = option(:quick_retry_ms, options, DEFAULT_quick_retry_ms)

      if duration.present? && key.present?
        sleep(duration.to_f / 1000)
        value = @storage.read(key, options)

        if !value.nil?
          metrics(:increment, 'empty-cache-retry.present', tags: tags)
          return value
        end
        metrics(:increment, 'empty-cache-retry.not-present', tags: tags)
      end

      nil
    end

    def last_known_value(keyspace, options, tags)
      lkk = @timestamp_manager.last_known_key(keyspace)

      if lkk.present?
        lkv = @storage.read(lkk, options)
        # even if the last_known_key is present, the value at the
        # last known key may have expired
        if !lkv.nil?
          metrics(:increment, 'last-known-value.present', tags: tags)
          return lkv
        end

        # if the value of the last known key is nil, we can infer that it's
        # most likely expired, thus remove it so other processes don't waste
        # time trying to read it
        @storage.delete(lkk, options)
      end

      metrics(:increment, 'last-known-value.not-present', tags: tags)
      nil
    end

    def wait_for_new_value(key, options, tags)
      max_retries = option(:max_retries, options, DEFAULT_MAX_RETRIES)
      max_retries.times do |attempt|
        metrics_tags = tags.clone.push("attempt:#{attempt}")
        metrics(:increment, 'wait.attempt', tags: metrics_tags)

        # the duration is given a random element in order to stagger retries across many processes
        backoff_duration_ms = BACKOFF_DURATION_MS + rand(15)
        backoff_duration_ms = option(:backoff_duration_ms, options, backoff_duration_ms)
        sleep((backoff_duration_ms.to_f / 1000) * attempt)

        value = @storage.read(key, options)
        if !value.nil?
          metrics(:increment, 'wait.present', tags: metrics_tags)
          return value
        end
      end

      metrics(:increment, 'wait.give-up')
      log(:warn, "Giving up fetching cache key `#{key}`. Exceeded max retries (#{max_retries}).")
      nil
    end

    def option(key, options, default=nil)
      options[key] || @default_options[key] || default
    end

    def log(method, *args)
      @logger.send(method, *args) if @logger.present?
    end

    def metrics(method, *args)
      @metrics.send(method, *args) if @metrics.present?
    end

    def time(*args)
      if @metrics.present?
        @metrics.time(*args, &Proc.new)
      else
        yield
      end
    end
  end

end