rack-cache 0.2.0

Files changed (44)
  1. data/CHANGES +27 -0
  2. data/COPYING +18 -0
  3. data/README +96 -0
  4. data/Rakefile +144 -0
  5. data/TODO +40 -0
  6. data/doc/configuration.markdown +224 -0
  7. data/doc/events.dot +27 -0
  8. data/doc/faq.markdown +133 -0
  9. data/doc/index.markdown +113 -0
  10. data/doc/layout.html.erb +33 -0
  11. data/doc/license.markdown +24 -0
  12. data/doc/rack-cache.css +362 -0
  13. data/doc/storage.markdown +162 -0
  14. data/lib/rack/cache.rb +51 -0
  15. data/lib/rack/cache/config.rb +65 -0
  16. data/lib/rack/cache/config/busters.rb +16 -0
  17. data/lib/rack/cache/config/default.rb +134 -0
  18. data/lib/rack/cache/config/no-cache.rb +13 -0
  19. data/lib/rack/cache/context.rb +95 -0
  20. data/lib/rack/cache/core.rb +271 -0
  21. data/lib/rack/cache/entitystore.rb +224 -0
  22. data/lib/rack/cache/headers.rb +237 -0
  23. data/lib/rack/cache/metastore.rb +309 -0
  24. data/lib/rack/cache/options.rb +119 -0
  25. data/lib/rack/cache/request.rb +37 -0
  26. data/lib/rack/cache/response.rb +76 -0
  27. data/lib/rack/cache/storage.rb +50 -0
  28. data/lib/rack/utils/environment_headers.rb +78 -0
  29. data/rack-cache.gemspec +74 -0
  30. data/test/cache_test.rb +35 -0
  31. data/test/config_test.rb +66 -0
  32. data/test/context_test.rb +465 -0
  33. data/test/core_test.rb +84 -0
  34. data/test/entitystore_test.rb +176 -0
  35. data/test/environment_headers_test.rb +71 -0
  36. data/test/headers_test.rb +215 -0
  37. data/test/logging_test.rb +45 -0
  38. data/test/metastore_test.rb +210 -0
  39. data/test/options_test.rb +64 -0
  40. data/test/pony.jpg +0 -0
  41. data/test/response_test.rb +37 -0
  42. data/test/spec_setup.rb +189 -0
  43. data/test/storage_test.rb +94 -0
  44. metadata +120 -0
data/doc/storage.markdown ADDED
@@ -0,0 +1,162 @@
1
+ Storage
2
+ =======
3
+
4
+ __Rack::Cache__ runs within each of your backend application processes and does not
5
+ rely on a single intermediary process like most types of proxy cache
6
+ implementations. Because of this, the storage subsystem has implications on not
7
+ only where cache data is stored but whether the cache is properly distributed
8
+ between multiple backend processes. It is highly recommended that you read and
9
+ understand the following before choosing a storage implementation.
10
+
11
+ Storage Areas
12
+ -------------
13
+
14
+ __Rack::Cache__ stores cache entries in two separate configurable storage
15
+ areas: a _MetaStore_ and an _EntityStore_.
16
+
17
+ The _MetaStore_ keeps high level information about each cache entry, including
18
+ the request/response headers and other status information. When a request is
19
+ received, the core caching logic uses this meta information to determine whether
20
+ a fresh cache entry exists that can satisfy the request.
21
+
22
+ The _EntityStore_ is where the actual response body content is stored. When a
23
+ response is entered into the cache, a SHA1 digest of the response body content
24
+ is calculated and used as a key. The entries stored in the MetaStore reference
25
+ their response bodies using this SHA1 key.
26
+
27
+ Separating request/response meta-data from response content has a few important
28
+ advantages:
29
+
30
+ * Different storage types can be used for meta and entity storage. For
31
+ example, it may be desirable to use memcached to store meta information
32
+ while using the filesystem for entity storage.
33
+
34
+ * Cache entry meta-data may be retrieved quickly without also retrieving
35
+ response bodies. This avoids significant overhead when the cache misses
36
+ or only requires validation.
37
+
38
+ * Multiple different responses may include the same exact response body. In
39
+ these cases, the actual body content is stored once and referenced from
40
+ each of the meta store entries.
41
+
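A minimal sketch of the content-addressed layout described above, using plain Hashes as hypothetical stand-ins for the two stores. The names `cache_response`, `meta_store`, and `entity_store` are illustrative assumptions and are not part of Rack::Cache; only the use of a SHA1 digest of the body as the key comes from the text.

```ruby
require 'digest/sha1'

# Store the body once under its SHA1 digest and record a meta entry
# that references the body by that digest (hypothetical helper).
def cache_response(meta_store, entity_store, url, headers, body)
  key = Digest::SHA1.hexdigest(body)
  entity_store[key] ||= body                        # identical bodies share one entry
  meta_store[url] = { headers: headers, body_key: key }
  key
end

meta_store   = {}  # request URL => headers plus a reference to the body
entity_store = {}  # SHA1 hex digest => response body

css = 'body { margin: 0 }'
cache_response(meta_store, entity_store, '/a.css', { 'Content-Type' => 'text/css' }, css)
cache_response(meta_store, entity_store, '/b.css', { 'Content-Type' => 'text/css' }, css)

puts meta_store.size    # two meta entries
puts entity_store.size  # one shared body
```

Because the lookup path only touches `meta_store`, a miss or a validation never needs to read the body at all, which is the second advantage listed above.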
42
+ You should consider how the meta and entity stores differ when choosing a storage
43
+ implementation. The MetaStore does not require nearly as much memory as the
44
+ EntityStore and is accessed much more frequently. The EntityStore can grow quite
45
+ large and raw performance is less of a concern. Using a memory based storage
46
+ implementation (`heap` or `memcached`) for the MetaStore is strongly advised,
47
+ while a disk based storage implementation (`file`) is often satisfactory for
48
+ the EntityStore and uses much less memory.
49
+
50
+ Storage Configuration
51
+ ---------------------
52
+
53
+ The MetaStore and EntityStore used for a particular request are determined by
54
+ inspecting the `rack-cache.metastore` and `rack-cache.entitystore` Rack env
55
+ variables. The value of each variable is a URI that identifies the storage
56
+ type and location (URI formats are documented in the following section).
57
+
58
+ The `heap:/` storage is assumed if either storage type is not explicitly
59
+ provided. This storage type has significant drawbacks for most types of
60
+ deployments so explicit configuration is advised.
61
+
62
+ The default metastore and entitystore values can be specified when the
63
+ __Rack::Cache__ object is added to the Rack middleware pipeline as follows:
64
+
65
+ use Rack::Cache do
66
+ set :metastore, 'file:/var/cache/rack/meta'
67
+ set :entitystore, 'file:/var/cache/rack/body'
68
+ end
69
+
70
+ Alternatively, the `rack-cache.metastore` and `rack-cache.entitystore`
71
+ variables may be set in the Rack environment by an upstream component.
72
+
73
+ Storage Implementations
74
+ -----------------------
75
+
76
+ __Rack::Cache__ includes meta and entity storage implementations backed by local
77
+ process memory ("heap storage"), the file system ("disk storage"), and
78
+ memcached. This section includes information on configuring __Rack::Cache__ to
79
+ use a specific storage implementation as well as pros and cons of each.
80
+
81
+ ### Heap Storage
82
+
83
+ Uses local process memory to store cached entries.
84
+
85
+ set :metastore, 'heap:/'
86
+ set :entitystore, 'heap:/'
87
+
88
+ The heap storage backend is simple, fast, and mostly useless. All cache
89
+ information is stored in each backend application's local process memory (using
90
+ a normal Hash, in fact), which means that data cached under one backend is
91
+ invisible to all other backends. This leads to low cache hit rates and excessive
92
+ memory use, the magnitude of which is a function of the number of backends in
93
+ use. Further, the heap storage provides no mechanism for purging unused entries
94
+ so memory use is guaranteed to exceed that available, given enough time and
95
+ utilization.
96
+
97
+ Use of heap storage is recommended only for testing purposes or for very
98
+ simple/single-backend deployment scenarios where the number of resources served
99
+ is small and well understood.
100
+
101
+ ### Disk Storage
102
+
103
+ Stores cached entries on the filesystem.
104
+
105
+ set :metastore, 'file:/var/cache/rack/meta'
106
+ set :entitystore, 'file:/var/cache/rack/body'
107
+
108
+ The URI may specify an absolute, relative, or home-rooted path:
109
+
110
+ * `file:/storage/path` - absolute path to storage directory.
111
+ * `file:storage/path` - relative path to storage directory, rooted at the
112
+ process's current working directory (`Dir.pwd`).
113
+ * `file:~user/storage/path` - path to storage directory, rooted at the
114
+ specified user's home directory.
115
+ * `file:~/storage/path` - path to storage directory, rooted at the current
116
+ user's home directory.
117
+
118
+ File system storage is simple, requires no special daemons or libraries, has a
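One way to picture how the path forms above could map to directories is the sketch below. It assumes the part after `file:` is simply handed to Ruby's `File.expand_path`; whether Rack::Cache resolves paths exactly this way is an assumption, and `resolve_file_storage` is a hypothetical helper name.

```ruby
# Strip the "file:" prefix and expand what remains into an absolute path.
# File.expand_path resolves relative paths against Dir.pwd and also
# understands the "~" and "~user" prefixes.
def resolve_file_storage(storage_uri)
  path = storage_uri.sub(/\Afile:/, '')
  raise ArgumentError, "not a file: URI: #{storage_uri}" if path == storage_uri
  File.expand_path(path)
end

puts resolve_file_storage('file:/storage/path')  # absolute path, unchanged
puts resolve_file_storage('file:storage/path')   # rooted at Dir.pwd
```

Relative paths depend on the working directory of each backend process, so an absolute path is the safer choice when several backends are meant to share one cache directory.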
119
+ tiny memory footprint, and allows multiple backends to share a single cache; it
120
+ is one of the slower storage implementations, however. Its use is recommended in
121
+ cases where memory is limited or in environments where more complex storage
122
+ backends (e.g., memcached) are not available. In many cases, it may be
123
+ acceptable (and even optimal) to use file system storage for the entitystore and
124
+ a more performant storage implementation (e.g., memcached) for the metastore.
125
+
126
+ __NOTE:__ When both the metastore and entitystore are configured to use file
127
+ system storage, they should be set to different paths to prevent any chance of
128
+ collision.
129
+
130
+ ### Memcached Storage
131
+
132
+ Stores cached entries in a remote [memcached](http://www.danga.com/memcached/)
133
+ instance.
134
+
135
+ set :metastore, 'memcached://localhost:11211/meta'
136
+ set :entitystore, 'memcached://localhost:11211/body'
137
+
138
+ The URI must specify the host and port of a remote memcached daemon. The path
139
+ portion is an optional (but recommended) namespace that is prepended to each
140
+ cache key.
141
+
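The pieces of a memcached storage URI can be pulled apart with Ruby's standard `URI` library, as in this sketch. The helper name `parse_memcached_uri` and the returned Hash layout are assumptions made for illustration; they do not describe how Rack::Cache parses the URI internally.

```ruby
require 'uri'

# Split a memcached storage URI into host, port, and optional namespace.
def parse_memcached_uri(storage_uri)
  uri = URI.parse(storage_uri)
  unless uri.scheme == 'memcached'
    raise ArgumentError, "not a memcached URI: #{storage_uri}"
  end
  namespace = uri.path.to_s.sub(%r{\A/}, '')  # drop the leading slash
  { host: uri.host, port: uri.port, namespace: namespace.empty? ? nil : namespace }
end

p parse_memcached_uri('memcached://localhost:11211/meta')
p parse_memcached_uri('memcached://localhost:11211')
```

Using distinct namespaces such as `meta` and `body` lets both stores share one memcached daemon without their keys colliding.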
142
+ The memcached storage backend requires [Evan Weaver's memcached client library][e].
143
+ This is a [fast][f] client implementation built on the SWIG/[libmemcached][l] C
144
+ library. The library may be installed from Gem as follows:
145
+
146
+ sudo gem install memcached --no-rdoc --no-ri
147
+
148
+ Memcached storage is reasonably fast and allows multiple backends to share a
149
+ single cache. It is also the only storage implementation that allows the cache
150
+ to reside somewhere other than the local machine. The memcached daemon stores
151
+ all data in local process memory so using it for the entitystore can result in
152
+ heavy memory usage. It is by far the best option for the metastore in
153
+ deployments with multiple backend application processes since it allows the
154
+ cache to be properly distributed and provides fast access to the
155
+ meta-information required to perform cache logic. Memcached is considerably more
156
+ complex than the other storage implementations, requiring a separate daemon
157
+ process and extra libraries. Still, its use is recommended in all cases where
158
+ you can get away with it.
159
+
160
+ [e]: http://blog.evanweaver.com/files/doc/fauna/memcached/files/README.html
161
+ [f]: http://blog.evanweaver.com/articles/2008/01/21/b-the-fastest-u-can-b-memcached/
162
+ [l]: http://tangent.org/552/libmemcached.html
data/lib/rack/cache.rb ADDED
@@ -0,0 +1,51 @@
1
+ require 'fileutils'
2
+ require 'time'
3
+ require 'rack'
4
+
5
+ module Rack #:nodoc:
6
+ end
7
+
8
+ # = HTTP Caching For Rack
9
+ #
10
+ # Rack::Cache is suitable as a quick, drop-in component to enable HTTP caching
11
+ # for Rack-enabled applications that produce freshness (+Expires+, +Cache-Control+)
12
+ # and/or validation (+Last-Modified+, +ETag+) information.
13
+ #
14
+ # * Standards-based (RFC 2616 compliance)
15
+ # * Freshness/expiration based caching and validation
16
+ # * Supports HTTP Vary
17
+ # * Portable: 100% Ruby / works with any Rack-enabled framework
18
+ # * VCL-like configuration language for advanced caching policies
19
+ # * Disk, memcached, and heap memory storage backends
20
+ #
21
+ # === Usage
22
+ #
23
+ # Create with default options:
24
+ # require 'rack/cache'
25
+ # Rack::Cache.new(app, :verbose => true, :entitystore => 'file:cache')
26
+ #
27
+ # Within a rackup file (or with Rack::Builder):
28
+ # require 'rack/cache'
29
+ # use Rack::Cache do
30
+ # set :verbose, true
31
+ # set :metastore, 'memcached://localhost:11211/meta'
32
+ # set :entitystore, 'file:/var/cache/rack'
33
+ # end
34
+ # run app
35
+ #
36
+ module Rack::Cache
37
+ require 'rack/cache/request'
38
+ require 'rack/cache/response'
39
+ require 'rack/cache/context'
40
+ require 'rack/cache/storage'
41
+
42
+ # Create a new Rack::Cache middleware component that fetches resources from
43
+ # the specified backend application. The +options+ Hash can be used to
44
+ # specify default configuration values (see attributes defined in
45
+ # Rack::Cache::Options for possible key/values). When a block is given, it
46
+ # is executed within the context of the newly created Rack::Cache::Context
47
+ # object.
48
+ def self.new(backend, options={}, &b)
49
+ Context.new(backend, options, &b)
50
+ end
51
+ end
data/lib/rack/cache/config.rb ADDED
@@ -0,0 +1,65 @@
1
+ require 'set'
2
+
3
+ module Rack::Cache
4
+ # Provides cache configuration methods. This module is included in the cache
5
+ # context object.
6
+
7
+ module Config
8
+ # Evaluate a block of configuration code within the scope of receiver.
9
+ def configure(&block)
10
+ instance_eval(&block) if block_given?
11
+ end
12
+
13
+ # Import the configuration file specified. This has the same basic semantics
14
+ # as Ruby's built-in +require+ statement but always evaluates the source
15
+ # file within the scope of the receiver. The file may exist anywhere on the
16
+ # $LOAD_PATH.
17
+ def import(file)
18
+ return false if imported_features.include?(file)
19
+ path = add_file_extension(file, 'rb')
20
+ if path = locate_file_on_load_path(path)
21
+ source = File.read(path)
22
+ imported_features.add(file)
23
+ instance_eval source, path, 1
24
+ true
25
+ else
26
+ raise LoadError, 'no such file to load -- %s' % [file]
27
+ end
28
+ end
29
+
30
+ private
31
+ # Load the default configuration and evaluate the block provided within
32
+ # the scope of the receiver.
33
+ def initialize_config(&block)
34
+ import 'rack/cache/config/default'
35
+ configure(&block)
36
+ end
37
+
38
+ # Set of files that have been imported.
39
+ def imported_features
40
+ @imported_features ||= Set.new
41
+ end
42
+
43
+ # Attempt to expand +file+ to a full path by possibly adding an .rb
44
+ # extension and traversing the $LOAD_PATH looking for matches.
45
+ def locate_file_on_load_path(file)
46
+ if file[0,1] == '/'
47
+ file if File.exist?(file)
48
+ else
49
+ $LOAD_PATH.
50
+ map { |base| File.join(base, file) }.
51
+ detect { |p| File.exist?(p) }
52
+ end
53
+ end
54
+
55
+ # Add an extension to the filename provided if the file doesn't
56
+ # already have an extension.
57
+ def add_file_extension(file, extension='rb')
58
+ if file =~ /\.\w+$/
59
+ file
60
+ else
61
+ "#{file}.#{extension}"
62
+ end
63
+ end
64
+ end
65
+ end
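The `configure` method above relies on `instance_eval` to run a block with the receiver as `self`, which is what lets configuration code call `set` without an explicit receiver. The sketch below shows that mechanism in isolation. `TinyContext` and its `settings` Hash are hypothetical stand-ins and do not reproduce Rack::Cache::Context; the `heap:/` defaults follow the storage documentation.

```ruby
# Hypothetical stand-in for a configurable object; not Rack::Cache::Context.
class TinyContext
  attr_reader :settings

  def initialize(&block)
    @settings = { metastore: 'heap:/', entitystore: 'heap:/' }
    configure(&block)
  end

  # Evaluate the block with self set to this object, so bare calls such
  # as `set` resolve to the instance methods defined here.
  def configure(&block)
    instance_eval(&block) if block_given?
  end

  def set(option, value)
    @settings[option] = value
  end
end

def example_context
  TinyContext.new do
    set :metastore, 'file:/var/cache/rack/meta'
  end
end

p example_context.settings
```

A trade-off of this style is that the block no longer sees the caller's `self`, so instance variables and private methods of the surrounding object are not reachable inside it.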
data/lib/rack/cache/config/busters.rb ADDED
@@ -0,0 +1,16 @@
1
+ # Adds a very long max-age response header when the requested url
2
+ # looks like it includes a cache busting timestamp. Cache busting
3
+ # URLs look like this:
4
+ # http://HOST/PATH?DIGITS
5
+ #
6
+ # DIGITS is typically the number of seconds since some epoch but
7
+ # this can theoretically be any set of digits. Example:
8
+ # http://example.com/css/foo.css?7894387283
9
+ #
10
+ on :fetch do
11
+ next if response.freshness_information?
12
+ if request.url =~ /\?\d+$/
13
+ trace 'adding huge max-age to response for cache busting URL'
14
+ response.ttl = 100000000000000
15
+ end
16
+ end
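To see which URLs the pattern in this file treats as cache busting, the regular expression can be tried on its own. The constant and method names below are hypothetical; only the pattern `/\?\d+$/` is taken from the source.

```ruby
# Same pattern as the :fetch handler above: a "?" followed only by digits
# up to the end of the URL.
CACHE_BUSTING_URL = /\?\d+$/

def cache_busting_url?(url)
  !!(url =~ CACHE_BUSTING_URL)
end

p cache_busting_url?('http://example.com/css/foo.css?7894387283')    # query is all digits
p cache_busting_url?('http://example.com/css/foo.css')               # no query string
p cache_busting_url?('http://example.com/css/foo.css?v=7894387283')  # digits preceded by "v="
```

Only the bare `?DIGITS` form matches, so query strings of the `?v=DIGITS` style would need a broader pattern.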
data/lib/rack/cache/config/default.rb ADDED
@@ -0,0 +1,134 @@
1
+ # Called at the beginning of request processing, after the complete
2
+ # request has been fully received. Its purpose is to decide whether or
3
+ # not to serve the request and how to do it.
4
+ #
5
+ # The request should not be modified.
6
+ #
7
+ # Possible transitions from receive:
8
+ #
9
+ # * pass! - pass the request to the backend and the response upstream,
10
+ # bypassing all caching features.
11
+ #
12
+ # * lookup! - attempt to locate the entry in the cache. Control will
13
+ # pass to the +hit+, +miss+, or +fetch+ event based on the result of
14
+ # the cache lookup.
15
+ #
16
+ # * error! - return the error code specified, abandoning the request.
17
+ #
18
+ on :receive do
19
+ pass! unless request.method? 'GET', 'HEAD'
20
+ pass! if request.header? 'Cookie', 'Authorization', 'Expect'
21
+ lookup!
22
+ end
23
+
24
+ # Called upon entering pass mode. The request is sent to the backend,
25
+ # and the backend's response is sent to the client, but is not entered
26
+ # into the cache. The event is triggered immediately after the response
27
+ # is received from the backend but before it has been sent upstream.
28
+ #
29
+ # Possible transitions from pass:
30
+ #
31
+ # * finish! - deliver the response upstream.
32
+ #
33
+ # * error! - return the error code specified, abandoning the request.
34
+ #
35
+ on :pass do
36
+ finish!
37
+ end
38
+
39
+ # Called after a cache lookup when no matching entry is found in the
40
+ # cache. Its purpose is to decide whether or not to attempt to retrieve
41
+ # the response from the backend and in what manner.
42
+ #
43
+ # Possible transitions from miss:
44
+ #
45
+ # * fetch! - retrieve the requested document from the backend with
46
+ # caching features enabled.
47
+ #
48
+ # * pass! - pass the request to the backend and the response upstream,
49
+ # bypassing all caching features.
50
+ #
51
+ # * error! - return the error code specified and abandon request.
52
+ #
53
+ # The default configuration transfers control to the fetch event.
54
+ on :miss do
55
+ fetch!
56
+ end
57
+
58
+ # Called after a cache lookup when the requested document is found in
59
+ # the cache and is fresh.
60
+ #
61
+ # Possible transitions from hit:
62
+ #
63
+ # * deliver! - transfer control to the deliver event, sending the cached
64
+ # response upstream.
65
+ #
66
+ # * pass! - abandon the cache entry and transfer to pass mode. The
67
+ # original request is sent to the backend and the response sent
68
+ # upstream, bypassing all caching features.
69
+ #
70
+ # * error! - return the error code specified and abandon request.
71
+ #
72
+ on :hit do
73
+ deliver!
74
+ end
75
+
76
+ # Called after a document is successfully retrieved from the backend
77
+ # application or after a cache entry is validated with the backend.
78
+ # During validation, the original request is used as a template for a
79
+ # conditional GET request with the backend. The +original_response+
80
+ # object contains the response as received from the backend and +entry+
81
+ # is set to the cached response that triggered validation.
82
+ #
83
+ # Possible transitions from fetch:
84
+ #
85
+ # * store! - store the fetched response in the cache or, when
86
+ # validating, update the cached response with validated results.
87
+ #
88
+ # * deliver! - deliver the response upstream without entering it
89
+ # into the cache.
90
+ #
91
+ # * error! - return the error code specified and abandon request.
92
+ #
93
+ on :fetch do
94
+ store! if response.cacheable?
95
+ deliver!
96
+ end
97
+
98
+ # Called immediately before an entry is written to the underlying
99
+ # cache. The +entry+ object may be modified.
100
+ #
101
+ # Possible transitions from store:
102
+ #
103
+ # * persist! - commit the object to cache and transfer control to
104
+ # the deliver event.
105
+ #
106
+ # * deliver! - transfer control to the deliver event without committing
107
+ # the object to cache.
108
+ #
109
+ # * error! - return the error code specified and abandon request.
110
+ #
111
+ on :store do
112
+ entry.ttl = default_ttl if entry.ttl.nil?
113
+ trace 'store backend response in cache (ttl: %ds)', entry.ttl
114
+ persist!
115
+ end
116
+
117
+ # Called immediately before +response+ is delivered upstream. +response+
118
+ # may be modified at this point but the changes will not affect the
119
+ # cache since the entry has already been persisted.
120
+ #
121
+ # * finish! - complete processing and send the response upstream
122
+ #
123
+ # * error! - return the error code specified and abandon request.
124
+ #
125
+ on :deliver do
126
+ finish!
127
+ end
128
+
129
+ # Called when an error! transition is triggered. The +response+ has the
130
+ # error code, headers, and body that will be delivered to upstream and
131
+ # may be modified if needed.
132
+ on :error do
133
+ finish!
134
+ end
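The handlers above read as though each bang method leaves the handler immediately (otherwise `lookup!` would run even after `pass!`). One way such non-returning transitions could be built is with `catch`/`throw`, as in the simplified sketch below. This is an assumption for illustration, not the gem's core: `TinyMachine`, `on`, and `transition` are hypothetical names, `lookup` and `persist` are modeled as ordinary events, and the pass, error, and validation paths are left out.

```ruby
# Hypothetical miniature of an event loop; names are illustrative only.
class TinyMachine
  def initialize(cached:, cacheable:)
    @handlers = {}
    on(:receive) { transition :lookup }
    on(:lookup)  { transition(cached ? :hit : :miss) }
    on(:miss)    { transition :fetch }
    on(:hit)     { transition :deliver }
    on(:fetch) do
      transition :store if cacheable  # leaves the handler when taken
      transition :deliver
    end
    on(:store)   { transition :persist }
    on(:persist) { transition :deliver }
    on(:deliver) { transition :finish }
  end

  # Run handlers until one transitions to :finish; returns the visited events.
  def run(event = :receive)
    visited = []
    until event == :finish
      visited << event
      event = catch(:transition) do
        @handlers.fetch(event).call
        raise "handler for #{event} did not transition"
      end
    end
    visited << :finish
  end

  private

  def on(event, &handler)
    @handlers[event] = handler
  end

  # Leave the current handler immediately and name the next event.
  def transition(event)
    throw :transition, event
  end
end

p TinyMachine.new(cached: false, cacheable: true).run
p TinyMachine.new(cached: true, cacheable: true).run
```

With `throw`, a handler needs no explicit `return` after a transition, which keeps one-line guards like `store! if response.cacheable?` readable.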
data/lib/rack/cache/config/no-cache.rb ADDED
@@ -0,0 +1,13 @@
1
+ # The default configuration ignores the `Cache-Control: no-cache` directive on
2
+ # requests. Per RFC 2616, the presence of the no-cache directive should cause
3
+ # intermediaries to process requests as if no cached version were available.
4
+ # However, this directive is most often targeted at shared proxy caches, not
5
+ # gateway caches, and so we've chosen to break with the spec in our default
6
+ # configuration.
7
+ #
8
+ # Import 'rack/cache/config/no-cache' to enable standards-based
9
+ # processing.
10
+
11
+ on :receive do
12
+ pass! if request.header['Cache-Control'] =~ /no-cache/
13
+ end
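The check in this handler can be exercised against a plain Rack-style environment Hash. The sketch below assumes the header is read from the standard `HTTP_CACHE_CONTROL` env key; the helper name `no_cache_request?` is hypothetical, and the pattern is the same case-sensitive `/no-cache/` used above.

```ruby
# True when the request's Cache-Control header mentions no-cache.
# A missing header is treated as an empty string.
def no_cache_request?(env)
  !!(env['HTTP_CACHE_CONTROL'].to_s =~ /no-cache/)
end

p no_cache_request?('HTTP_CACHE_CONTROL' => 'no-cache')
p no_cache_request?('HTTP_CACHE_CONTROL' => 'max-age=0, no-cache')
p no_cache_request?('HTTP_CACHE_CONTROL' => 'max-age=60')
p no_cache_request?({})
```

Because browsers typically send `Cache-Control: no-cache` on a forced reload, importing this configuration lets any client bypass the cache for a request, which is worth weighing for a gateway cache.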