josh-rack-cache 0.5.1

Files changed (40)

data/CHANGES +167 -0
data/COPYING +18 -0
data/README +110 -0
data/Rakefile +137 -0
data/TODO +27 -0
data/doc/configuration.markdown +112 -0
data/doc/faq.markdown +141 -0
data/doc/index.markdown +121 -0
data/doc/layout.html.erb +34 -0
data/doc/license.markdown +24 -0
data/doc/rack-cache.css +362 -0
data/doc/server.ru +34 -0
data/doc/storage.markdown +164 -0
data/example/sinatra/app.rb +25 -0
data/example/sinatra/views/index.erb +44 -0
data/lib/rack/cache.rb +45 -0
data/lib/rack/cache/appengine.rb +52 -0
data/lib/rack/cache/cachecontrol.rb +193 -0
data/lib/rack/cache/context.rb +253 -0
data/lib/rack/cache/entitystore.rb +339 -0
data/lib/rack/cache/key.rb +52 -0
data/lib/rack/cache/metastore.rb +407 -0
data/lib/rack/cache/options.rb +150 -0
data/lib/rack/cache/request.rb +33 -0
data/lib/rack/cache/response.rb +267 -0
data/lib/rack/cache/storage.rb +62 -0
data/rack-cache.gemspec +70 -0
data/test/cache_test.rb +38 -0
data/test/cachecontrol_test.rb +139 -0
data/test/context_test.rb +774 -0
data/test/entitystore_test.rb +230 -0
data/test/key_test.rb +50 -0
data/test/metastore_test.rb +302 -0
data/test/options_test.rb +77 -0
data/test/pony.jpg +0 -0
data/test/request_test.rb +19 -0
data/test/response_test.rb +178 -0
data/test/spec_setup.rb +237 -0
data/test/storage_test.rb +94 -0
metadata +118 -0

data/doc/server.ru ADDED

@@ -0,0 +1,34 @@
+# Rackup config that serves the contents of Rack::Cache's
+# doc directory. The documentation is rebuilt on each request.
+# Rewrites URLs like conventional web server configs.
+class Rewriter < Struct.new(:app)
+  def call(env)
+    if env['PATH_INFO'] =~ /\/$/
+      env['PATH_INFO'] += 'index.html'
+    elsif env['PATH_INFO'] !~ /\.\w+$/
+      env['PATH_INFO'] += '.html'
+    end
+    app.call(env)
+  end
+end
+# Rebuilds documentation on each request.
+class DocBuilder < Struct.new(:app)
+  def call(env)
+    if env['PATH_INFO'] !~ /\.(css|js|gif|jpg|png|ico)$/
+      env['rack.errors'] << "*** rebuilding documentation (rake -s doc)\n"
+      system "rake -s doc"
+    end
+    app.call(env)
+  end
+end
+use Rack::CommonLogger
+use DocBuilder
+use Rewriter
+use Rack::Static, :root => File.dirname(__FILE__), :urls => ["/"]
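+# Fallback app: the body is an Array so it responds to #each as the Rack spec requires.
+run(lambda{|env| [404, {'Content-Type' => 'text/html'}, ['<h1>Not Found</h1>']]})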
+# vim: ft=ruby
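
The rewrite rule in `Rewriter` above can be exercised outside of Rack. The following is a minimal sketch, assuming a hypothetical helper named `rewrite_path` that returns the new path instead of mutating `env['PATH_INFO']`; the two regular expressions are copied from the diff.

```ruby
# Hypothetical standalone version of the Rewriter rule from doc/server.ru.
# It returns the rewritten path rather than modifying a Rack env Hash.
def rewrite_path(path)
  if path =~ /\/$/
    path + 'index.html'   # directory-style URL: serve its index page
  elsif path !~ /\.\w+$/
    path + '.html'        # no file extension: assume an HTML page
  else
    path                  # already names a file with an extension
  end
end

puts rewrite_path('/')                # prints "/index.html"
puts rewrite_path('/faq')             # prints "/faq.html"
puts rewrite_path('/rack-cache.css')  # prints "/rack-cache.css"
```

Because `Rewriter` runs before `Rack::Static` in the middleware list, the static file server only ever sees paths that already carry an extension.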

data/doc/storage.markdown ADDED

@@ -0,0 +1,164 @@
+Storage
+=======
+__Rack::Cache__ runs within each of your backend application processes and does not
+rely on a single intermediary process like most types of proxy cache
+implementations. Because of this, the storage subsystem has implications on not
+only where cache data is stored but whether the cache is properly distributed
+between multiple backend processes. It is highly recommended that you read and
+understand the following before choosing a storage implementation.
+Storage Areas
+-------------
+__Rack::Cache__ stores cache entries in two separate configurable storage
+areas: a _MetaStore_ and an _EntityStore_.
+The _MetaStore_ keeps high level information about each cache entry, including
+the request/response headers and other status information. When a request is
+received, the core caching logic uses this meta information to determine whether
+a fresh cache entry exists that can satisfy the request.
+The _EntityStore_ is where the actual response body content is stored. When a
+response is entered into the cache, a SHA1 digest of the response body content
+is calculated and used as a key. The entries stored in the MetaStore reference
+their response bodies using this SHA1 key.
+Separating request/response meta-data from response content has a few important
+advantages:
+  * Different storage types can be used for meta and entity storage. For
+    example, it may be desirable to use memcached to store meta information
+    while using the filesystem for entity storage.
+  * Cache entry meta-data may be retrieved quickly without also retrieving
+    response bodies. This avoids significant overhead when the cache misses
+    or only requires validation.
+  * Multiple different responses may include the same exact response body. In
+    these cases, the actual body content is stored once and referenced from
+    each of the meta store entries.
+You should consider how the meta and entity stores differ when choosing a storage
+implementation. The MetaStore does not require nearly as much memory as the
+EntityStore and is accessed much more frequently. The EntityStore can grow quite
+large and raw performance is less of a concern. Using a memory based storage
+implementation (`heap` or `memcached`) for the MetaStore is strongly advised,
+while a disk based storage implementation (`file`) is often satisfactory for
+the EntityStore and uses much less memory.
+Storage Configuration
+---------------------
+The MetaStore and EntityStore used for a particular request are determined by
+inspecting the `rack-cache.metastore` and `rack-cache.entitystore` Rack env
+variables. The value of each variable is a URI that identifies the storage
+type and location (URI formats are documented in the following section).
+The `heap:/` storage is assumed if either storage type is not explicitly
+provided. This storage type has significant drawbacks for most types of
+deployments so explicit configuration is advised.
+The default metastore and entitystore values can be specified when the
+__Rack::Cache__ object is added to the Rack middleware pipeline as follows:
+    use Rack::Cache,
+      :metastore => 'file:/var/cache/rack/meta',
+      :entitystore => 'file:/var/cache/rack/body'
+Alternatively, the `rack-cache.metastore` and `rack-cache.entitystore`
+variables may be set in the Rack environment by an upstream component.
+Storage Implementations
+-----------------------
+__Rack::Cache__ includes meta and entity storage implementations backed by local
+process memory ("heap storage"), the file system ("disk storage"), and
+memcached. This section includes information on configuring __Rack::Cache__ to
+use a specific storage implementation as well as pros and cons of each.
+### Heap Storage
+Uses local process memory to store cached entries.
+    use Rack::Cache,
+      :metastore   => 'heap:/',
+      :entitystore => 'heap:/'
+The heap storage backend is simple, fast, and mostly useless. All cache
+information is stored in each backend application's local process memory (using
+a normal Hash, in fact), which means that data cached under one backend is
+invisible to all other backends. This leads to low cache hit rates and excessive
+memory use, the magnitude of which is a function of the number of backends in
+use. Further, the heap storage provides no mechanism for purging unused entries
+so memory use is guaranteed to exceed that available, given enough time and
+utilization.
+Use of heap storage is recommended only for testing purposes or for very
+simple/single-backend deployment scenarios where the number of resources served
+is small and well understood.
+### Disk Storage
+Stores cached entries on the filesystem.
+    use Rack::Cache,
+      :metastore   => 'file:/var/cache/rack/meta',
+      :entitystore => 'file:/var/cache/rack/body'
+The URI may specify an absolute, relative, or home-rooted path:
+  * `file:/storage/path` - absolute path to storage directory.
+  * `file:storage/path` - relative path to storage directory, rooted at the
+    process's current working directory (`Dir.pwd`).
+  * `file:~user/storage/path` - path to storage directory, rooted at the
+    specified user's home directory.
+  * `file:~/storage/path` - path to storage directory, rooted at the current
+    user's home directory.
+File system storage is simple, requires no special daemons or libraries, has a
+tiny memory footprint, and allows multiple backends to share a single cache; it
+is one of the slower storage implementations, however. Its use is recommended in
+cases where memory is limited or in environments where more complex storage
+backends (e.g., memcached) are not available. In many cases, it may be
+acceptable (and even optimal) to use file system storage for the entitystore and
+a more performant storage implementation (e.g., memcached) for the metastore.
+__NOTE:__ When both the metastore and entitystore are configured to use file
+system storage, they should be set to different paths to prevent any chance of
+collision.
+### Memcached Storage
+Stores cached entries in a remote [memcached](http://www.danga.com/memcached/)
+instance.
+    use Rack::Cache,
+      :metastore   => 'memcached://localhost:11211/meta',
+      :entitystore => 'memcached://localhost:11211/body'
+The URI must specify the host and port of a remote memcached daemon. The path
+portion is an optional (but recommended) namespace that is prepended to each
+cache key.
+The memcached storage backend requires either the `memcache-client` or the
+`memcached` library. By default, the `memcache-client` library is used;
+require the `memcached` library explicitly to use it instead.
+    gem install memcache-client
+Memcached storage is reasonably fast and allows multiple backends to share a
+single cache. It is also the only storage implementation that allows the cache
+to reside somewhere other than the local machine. The memcached daemon stores
+all data in local process memory so using it for the entitystore can result in
+heavy memory usage. It is by far the best option for the metastore in
+deployments with multiple backend application processes since it allows the
+cache to be properly distributed and provides fast access to the
+meta-information required to perform cache logic. Memcached is considerably more
+complex than the other storage implementations, requiring a separate daemon
+process and extra libraries. Still, its use is recommended in all cases where
+you can get away with it.
+[e]: http://blog.evanweaver.com/files/doc/fauna/memcached/files/README.html
+[f]: http://blog.evanweaver.com/articles/2008/01/21/b-the-fastest-u-can-b-memcached/
+[l]: http://tangent.org/552/libmemcached.html
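
The MetaStore/EntityStore split described in storage.markdown can be illustrated with two plain Hashes. This is a minimal sketch, not the gem's implementation: the class name `TwoTierCache` and its methods are hypothetical, and only the idea of keying bodies by their SHA1 digest is taken from the document.

```ruby
require 'digest/sha1'

# Hypothetical in-memory stand-in for the two storage areas.
class TwoTierCache
  attr_reader :meta_store, :entity_store

  def initialize
    @meta_store   = {}  # cache key   => { :headers => ..., :body_key => digest }
    @entity_store = {}  # SHA1 digest => response body
  end

  # Store the body under its SHA1 digest; the meta entry only references it.
  def store(cache_key, headers, body)
    body_key = Digest::SHA1.hexdigest(body)
    @entity_store[body_key] ||= body
    @meta_store[cache_key] = { :headers => headers, :body_key => body_key }
    body_key
  end

  # Read meta information first, then fetch the body through its digest.
  def lookup(cache_key)
    meta = @meta_store[cache_key]
    return nil unless meta
    [meta[:headers], @entity_store[meta[:body_key]]]
  end
end

cache = TwoTierCache.new
cache.store('/a', { 'Content-Type' => 'text/plain' }, 'hello')
cache.store('/b', { 'Content-Type' => 'text/plain' }, 'hello')

puts cache.meta_store.size    # prints 2
puts cache.entity_store.size  # prints 1
```

Two cache keys with identical bodies produce two meta entries but a single entity entry, which is the deduplication advantage listed in the "Storage Areas" section.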

data/example/sinatra/app.rb ADDED

@@ -0,0 +1,25 @@
+require 'sinatra'
+require 'rack/cache'
+use Rack::Cache do
+  set :verbose, true
+  set :metastore,   'heap:/'
+  set :entitystore, 'heap:/'
+  on :receive do
+    pass! if request.url =~ /favicon/
+  end
+end
+before do
+  last_modified $updated_at ||= Time.now
+end
+get '/' do
+  erb :index
+end
+put '/' do
+  $updated_at = nil
+  redirect '/'
+end

data/example/sinatra/views/index.erb ADDED

@@ -0,0 +1,44 @@
+<html>
+  <head>
+    <title>Sample Rack::Cache Sinatra app</title>
+    <style type="text/css" media="screen">
+      body {
+        font-family: Georgia;
+        font-size: 24px;
+        text-align: center;
+      }
+      #headers {
+        font-size: 16px;
+      }
+      input {
+        font-size: 24px;
+        cursor: pointer;
+      }
+    </style>
+  </head>
+  <body>
+    <h1>Last updated at: <%= $updated_at.strftime('%l:%M:%S%P') %></h1>
+    <p>
+      <form action="/" method="post">
+        <input type="hidden" name="_method" value="PUT">
+        <input type="submit" value="Expire the cache.">
+      </form>
+    </p>
+    <div id="headers">
+      <h3>Headers:</h3>
+      <% response.headers.each do |key, value| %>
+        <p><%= key %>: <%= value %></p>
+      <% end %>
+      <h3>Params:</h3>
+      <% params.each do |key, value| %>
+        <p><%= key %>: <%= value || '(blank)' %></p>
+      <% end %>
+    </div>
+  </body>
+</html>

data/lib/rack/cache.rb ADDED

@@ -0,0 +1,45 @@
+require 'rack'
+# = HTTP Caching For Rack
+#
+# Rack::Cache is suitable as a quick, drop-in component to enable HTTP caching
+# for Rack-enabled applications that produce freshness (+Expires+, +Cache-Control+)
+# and/or validation (+Last-Modified+, +ETag+) information.
+#
+# * Standards-based (RFC 2616 compliance)
+# * Freshness/expiration based caching and validation
+# * Supports HTTP Vary
+# * Portable: 100% Ruby / works with any Rack-enabled framework
+# * Disk, memcached, and heap memory storage backends
+#
+# === Usage
+#
+# Create with default options:
+#   require 'rack/cache'
+#   Rack::Cache.new(app, :verbose => true, :entitystore => 'file:cache')
+#
+# Within a rackup file (or with Rack::Builder):
+#   require 'rack/cache'
+#   use Rack::Cache do
+#     set :verbose, true
+#     set :metastore, 'memcached://localhost:11211/meta'
+#     set :entitystore, 'file:/var/cache/rack'
+#   end
+#   run app
+module Rack::Cache
+  autoload :Request,      'rack/cache/request'
+  autoload :Response,     'rack/cache/response'
+  autoload :Context,      'rack/cache/context'
+  autoload :Storage,      'rack/cache/storage'
+  autoload :CacheControl, 'rack/cache/cachecontrol'
+  # Create a new Rack::Cache middleware component that fetches resources from
+  # the specified backend application. The +options+ Hash can be used to
+  # specify default configuration values (see attributes defined in
+  # Rack::Cache::Options for possible key/values). When a block is given, it
+  # is executed within the context of the newly created Rack::Cache::Context
+  # object.
+  def self.new(backend, options={}, &b)
+    Context.new(backend, options, &b)
+  end
+end
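
`Rack::Cache.new` above is a module-level factory: the module itself is never instantiated, and `new` hands back a `Context`. The sketch below shows that pattern in isolation. The `Demo` module, its `Context` class, and the `set` method are hypothetical stand-ins, and the use of `instance_eval` to run the block is an assumption based on the comment that the block "is executed within the context of" the new object.

```ruby
# Hypothetical stand-in for the Rack::Cache / Rack::Cache::Context pair.
module Demo
  class Context
    attr_reader :backend, :options

    def initialize(backend, options = {}, &block)
      @backend = backend
      @options = options
      # Assumption: the configuration block runs with self set to the Context.
      instance_eval(&block) if block_given?
    end

    # Minimal configuration setter so the block has something to call.
    def set(key, value)
      @options[key] = value
    end
  end

  # Module-level factory: Demo.new returns a Demo::Context.
  def self.new(backend, options = {}, &block)
    Context.new(backend, options, &block)
  end
end

cache = Demo.new(:app, :verbose => false) { set :verbose, true }
puts cache.class              # prints Demo::Context
puts cache.options[:verbose]  # prints true
```

Defining `new` on the module lets callers write `use Rack::Cache` in a rackup file, since `use` only needs an object that responds to `new(app, *args, &block)`.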

data/lib/rack/cache/appengine.rb ADDED

@@ -0,0 +1,52 @@
+require 'base64'
+module Rack::Cache::AppEngine
+  module MC
+    require 'java'
+    import com.google.appengine.api.memcache.Expiration;
+    import com.google.appengine.api.memcache.MemcacheService;
+    import com.google.appengine.api.memcache.MemcacheServiceFactory;
+    import com.google.appengine.api.memcache.Stats;
+    Service = MemcacheServiceFactory.getMemcacheService
+  end unless defined?(Rack::Cache::AppEngine::MC)
+  class MemCache
+    def initialize(options = {})
+      @cache = MC::Service
+      @cache.namespace = options[:namespace] if options[:namespace]
+    end
+    def contains?(key)
+      MC::Service.contains(key)
+    end
+    def get(key)
+      value = MC::Service.get(key)
+      Marshal.load(Base64.decode64(value)) if value
+    end
+    def put(key, value, ttl = nil)
+      expiration = ttl ? MC::Expiration.byDeltaSeconds(ttl) : nil
+      value = Base64.encode64(Marshal.dump(value)).gsub(/\n/, '')
+      MC::Service.put(key, value, expiration)
+    end
+    def namespace
+      MC::Service.getNamespace
+    end
+    def namespace=(value)
+      MC::Service.setNamespace(value.to_s)
+    end
+    def delete(key)
+      MC::Service.delete(key)
+    end
+  end
+end
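
The `get` and `put` methods above serialize values with `Marshal` and wrap the result in single-line Base64 text. The round trip can be tried without App Engine using only the standard library. This is a minimal sketch; the helper names `encode_entry` and `decode_entry` are hypothetical, while the encoding and decoding expressions are the ones used in the diff.

```ruby
require 'base64'

# Hypothetical helpers that isolate the serialization used by MemCache#put/#get.
def encode_entry(value)
  # Marshal to bytes, Base64-encode, then strip the line breaks encode64 inserts.
  Base64.encode64(Marshal.dump(value)).gsub(/\n/, '')
end

def decode_entry(encoded)
  # Mirrors #get: a missing value (nil) decodes to nil.
  Marshal.load(Base64.decode64(encoded)) if encoded
end

entry   = { 'status' => 200, 'headers' => { 'ETag' => '"abc"' } }
encoded = encode_entry(entry)

puts encoded.include?("\n")          # prints false
puts decode_entry(encoded) == entry  # prints true
```

Only pass data from a trusted cache to `Marshal.load`, since it can instantiate arbitrary Ruby objects.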

data/lib/rack/cache/cachecontrol.rb ADDED

@@ -0,0 +1,193 @@
+module Rack
+  module Cache
+    # Parses a Cache-Control header and exposes the directives as a Hash.
+    # Directives that do not have values are set to +true+.
+    class CacheControl < Hash
+      def initialize(value=nil)
+        parse(value)
+      end
+      # Indicates that the response MAY be cached by any cache, even if it
+      # would normally be non-cacheable or cacheable only within a non-
+      # shared cache.
+      #
+      # A response may be considered public without this directive if the
+      # private directive is not set and the request does not include an
+      # Authorization header.
+      def public?
+        self['public']
+      end
+      # Indicates that all or part of the response message is intended for
+      # a single user and MUST NOT be cached by a shared cache. This
+      # allows an origin server to state that the specified parts of the
+      # response are intended for only one user and are not a valid
+      # response for requests by other users. A private (non-shared) cache
+      # MAY cache the response.
+      #
+      # Note: This usage of the word private only controls where the
+      # response may be cached, and cannot ensure the privacy of the
+      # message content.
+      def private?
+        self['private']
+      end
+      # When set in a response, a cache MUST NOT use the response to satisfy a
+      # subsequent request without successful revalidation with the origin
+      # server. This allows an origin server to prevent caching even by caches
+      # that have been configured to return stale responses to client requests.
+      #
+      # Note that this does not necessarily imply that the response may not be
+      # stored by the cache, only that the cache cannot serve it without first
+      # making a conditional GET request with the origin server.
+      #
+      # When set in a request, the server MUST NOT use a cached copy for its
+      # response. This has quite different semantics compared to the no-cache
+      # directive on responses. When the client specifies no-cache, it causes
+      # an end-to-end reload, forcing each cache to update their cached copies.
+      def no_cache?
+        self['no-cache']
+      end
+      # Indicates that the response MUST NOT be stored under any circumstances.
+      #
+      # The purpose of the no-store directive is to prevent the
+      # inadvertent release or retention of sensitive information (for
+      # example, on backup tapes). The no-store directive applies to the
+      # entire message, and MAY be sent either in a response or in a
+      # request. If sent in a request, a cache MUST NOT store any part of
+      # either this request or any response to it. If sent in a response,
+      # a cache MUST NOT store any part of either this response or the
+      # request that elicited it. This directive applies to both non-
+      # shared and shared caches. "MUST NOT store" in this context means
+      # that the cache MUST NOT intentionally store the information in
+      # non-volatile storage, and MUST make a best-effort attempt to
+      # remove the information from volatile storage as promptly as
+      # possible after forwarding it.
+      #
+      # The purpose of this directive is to meet the stated requirements
+      # of certain users and service authors who are concerned about
+      # accidental releases of information via unanticipated accesses to
+      # cache data structures. While the use of this directive might
+      # improve privacy in some cases, we caution that it is NOT in any
+      # way a reliable or sufficient mechanism for ensuring privacy. In
+      # particular, malicious or compromised caches might not recognize or
+      # obey this directive, and communications networks might be
+      # vulnerable to eavesdropping.
+      def no_store?
+        self['no-store']
+      end
+      # The expiration time of an entity MAY be specified by the origin
+      # server using the Expires header (see section 14.21). Alternatively,
+      # it MAY be specified using the max-age directive in a response. When
+      # the max-age cache-control directive is present in a cached response,
+      # the response is stale if its current age is greater than the age
+      # value given (in seconds) at the time of a new request for that
+      # resource. The max-age directive on a response implies that the
+      # response is cacheable (i.e., "public") unless some other, more
+      # restrictive cache directive is also present.
+      #
+      # If a response includes both an Expires header and a max-age
+      # directive, the max-age directive overrides the Expires header, even
+      # if the Expires header is more restrictive. This rule allows an origin
+      # server to provide, for a given response, a longer expiration time to
+      # an HTTP/1.1 (or later) cache than to an HTTP/1.0 cache. This might be
+      # useful if certain HTTP/1.0 caches improperly calculate ages or
+      # expiration times, perhaps due to desynchronized clocks.
+      #
+      # Many HTTP/1.0 cache implementations will treat an Expires value that
+      # is less than or equal to the response Date value as being equivalent
+      # to the Cache-Control response directive "no-cache". If an HTTP/1.1
+      # cache receives such a response, and the response does not include a
+      # Cache-Control header field, it SHOULD consider the response to be
+      # non-cacheable in order to retain compatibility with HTTP/1.0 servers.
+      #
+      # When the max-age directive is included in the request, it indicates
+      # that the client is willing to accept a response whose age is no
+      # greater than the specified time in seconds.
+      def max_age
+        self['max-age'].to_i  if key?('max-age')
+      end
+      # If a response includes an s-maxage directive, then for a shared
+      # cache (but not for a private cache), the maximum age specified by
+      # this directive overrides the maximum age specified by either the
+      # max-age directive or the Expires header. The s-maxage directive
+      # also implies the semantics of the proxy-revalidate directive. i.e.,
+      # that the shared cache must not use the entry after it becomes stale
+      # to respond to a subsequent request without first revalidating it with
+      # the origin server. The s-maxage directive is always ignored by a
+      # private cache.
+      def shared_max_age
+        self['s-maxage'].to_i  if key?('s-maxage')
+      end
+      alias_method :s_maxage, :shared_max_age
+      # Because a cache MAY be configured to ignore a server's specified
+      # expiration time, and because a client request MAY include a max-
+      # stale directive (which has a similar effect), the protocol also
+      # includes a mechanism for the origin server to require revalidation
+      # of a cache entry on any subsequent use. When the must-revalidate
+      # directive is present in a response received by a cache, that cache
+      # MUST NOT use the entry after it becomes stale to respond to a
+      # subsequent request without first revalidating it with the origin
+      # server. (I.e., the cache MUST do an end-to-end revalidation every
+      # time, if, based solely on the origin server's Expires or max-age
+      # value, the cached response is stale.)
+      #
+      # The must-revalidate directive is necessary to support reliable
+      # operation for certain protocol features. In all circumstances an
+      # HTTP/1.1 cache MUST obey the must-revalidate directive; in
+      # particular, if the cache cannot reach the origin server for any
+      # reason, it MUST generate a 504 (Gateway Timeout) response.
+      #
+      # Servers SHOULD send the must-revalidate directive if and only if
+      # failure to revalidate a request on the entity could result in
+      # incorrect operation, such as a silently unexecuted financial
+      # transaction. Recipients MUST NOT take any automated action that
+      # violates this directive, and MUST NOT automatically provide an
+      # unvalidated copy of the entity if revalidation fails.
+      def must_revalidate?
+        self['must-revalidate']
+      end
+      # The proxy-revalidate directive has the same meaning as the must-
+      # revalidate directive, except that it does not apply to non-shared
+      # user agent caches. It can be used on a response to an
+      # authenticated request to permit the user's cache to store and
+      # later return the response without needing to revalidate it (since
+      # it has already been authenticated once by that user), while still
+      # requiring proxies that service many users to revalidate each time
+      # (in order to make sure that each user has been authenticated).
+      # Note that such authenticated responses also need the public cache
+      # control directive in order to allow them to be cached at all.
+      def proxy_revalidate?
+        self['proxy-revalidate']
+      end
+      def to_s
+        bools, vals = [], []
+        each do |key,value|
+          if value == true
+            bools << key
+          elsif value
+            vals << "#{key}=#{value}"
+          end
+        end
+        (bools.sort + vals.sort).join(', ')
+      end
+    private
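+      def parse(value)
+        return  if value.nil? || value.empty?
+        value.delete(' ').split(',').inject(self) do |hash,part|
+          # Use a separate local so the method argument is not reassigned.
+          name, val = part.split('=', 2)
+          # Empty parts (e.g. "public,,private") yield a nil name; skip them.
+          hash[name.downcase] = (val || true) unless name.nil? || name.empty?
+          hash
+        end
+      end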
+    end
+  end
+end
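
The parsing and serialization logic of `CacheControl` can be followed more easily on a plain Hash. The sketch below is not the gem's API: `parse_cache_control` and `format_cache_control` are hypothetical standalone functions that mirror the `parse` and `to_s` methods in the diff, with an added guard for empty parts.

```ruby
# Hypothetical standalone equivalent of CacheControl#parse.
# Directives without a value map to true; names are lowercased.
def parse_cache_control(header)
  directives = {}
  return directives if header.nil? || header.empty?
  header.delete(' ').split(',').each do |part|
    name, value = part.split('=', 2)
    next if name.nil? || name.empty?  # skip empty parts such as "a,,b"
    directives[name.downcase] = value || true
  end
  directives
end

# Hypothetical standalone equivalent of CacheControl#to_s:
# boolean directives first, then name=value pairs, each group sorted.
def format_cache_control(directives)
  bools, vals = [], []
  directives.each do |key, value|
    if value == true
      bools << key
    elsif value
      vals << "#{key}=#{value}"
    end
  end
  (bools.sort + vals.sort).join(', ')
end

parsed = parse_cache_control('max-age=600, Public, must-revalidate')
puts parsed['max-age']             # prints 600
puts parsed['public']              # prints true
puts format_cache_control(parsed)  # prints "must-revalidate, public, max-age=600"
```

Values stay Strings after parsing, which is why accessors such as `max_age` in the class call `to_i` before returning them.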