rack-cache 0.2.0

Files changed (44)
  1. data/CHANGES +27 -0
  2. data/COPYING +18 -0
  3. data/README +96 -0
  4. data/Rakefile +144 -0
  5. data/TODO +40 -0
  6. data/doc/configuration.markdown +224 -0
  7. data/doc/events.dot +27 -0
  8. data/doc/faq.markdown +133 -0
  9. data/doc/index.markdown +113 -0
  10. data/doc/layout.html.erb +33 -0
  11. data/doc/license.markdown +24 -0
  12. data/doc/rack-cache.css +362 -0
  13. data/doc/storage.markdown +162 -0
  14. data/lib/rack/cache.rb +51 -0
  15. data/lib/rack/cache/config.rb +65 -0
  16. data/lib/rack/cache/config/busters.rb +16 -0
  17. data/lib/rack/cache/config/default.rb +134 -0
  18. data/lib/rack/cache/config/no-cache.rb +13 -0
  19. data/lib/rack/cache/context.rb +95 -0
  20. data/lib/rack/cache/core.rb +271 -0
  21. data/lib/rack/cache/entitystore.rb +224 -0
  22. data/lib/rack/cache/headers.rb +237 -0
  23. data/lib/rack/cache/metastore.rb +309 -0
  24. data/lib/rack/cache/options.rb +119 -0
  25. data/lib/rack/cache/request.rb +37 -0
  26. data/lib/rack/cache/response.rb +76 -0
  27. data/lib/rack/cache/storage.rb +50 -0
  28. data/lib/rack/utils/environment_headers.rb +78 -0
  29. data/rack-cache.gemspec +74 -0
  30. data/test/cache_test.rb +35 -0
  31. data/test/config_test.rb +66 -0
  32. data/test/context_test.rb +465 -0
  33. data/test/core_test.rb +84 -0
  34. data/test/entitystore_test.rb +176 -0
  35. data/test/environment_headers_test.rb +71 -0
  36. data/test/headers_test.rb +215 -0
  37. data/test/logging_test.rb +45 -0
  38. data/test/metastore_test.rb +210 -0
  39. data/test/options_test.rb +64 -0
  40. data/test/pony.jpg +0 -0
  41. data/test/response_test.rb +37 -0
  42. data/test/spec_setup.rb +189 -0
  43. data/test/storage_test.rb +94 -0
  44. metadata +120 -0
@@ -0,0 +1,162 @@
1
+ Storage
2
+ =======
3
+
4
+ __Rack::Cache__ runs within each of your backend application processes and does not
5
+ rely on a single intermediary process like most types of proxy cache
6
+ implementations. Because of this, the storage subsystem affects not
7
+ only where cache data is stored but whether the cache is properly distributed
8
+ between multiple backend processes. It is highly recommended that you read and
9
+ understand the following before choosing a storage implementation.
10
+
11
+ Storage Areas
12
+ -------------
13
+
14
+ __Rack::Cache__ stores cache entries in two separate configurable storage
15
+ areas: a _MetaStore_ and an _EntityStore_.
16
+
17
+ The _MetaStore_ keeps high level information about each cache entry, including
18
+ the request/response headers and other status information. When a request is
19
+ received, the core caching logic uses this meta information to determine whether
20
+ a fresh cache entry exists that can satisfy the request.
21
+
22
+ The _EntityStore_ is where the actual response body content is stored. When a
23
+ response is entered into the cache, a SHA1 digest of the response body content
24
+ is calculated and used as a key. The entries stored in the MetaStore reference
25
+ their response bodies using this SHA1 key.
26
+
27
+ Separating request/response meta-data from response content has a few important
28
+ advantages:
29
+
30
+ * Different storage types can be used for meta and entity storage. For
31
+ example, it may be desirable to use memcached to store meta information
32
+ while using the filesystem for entity storage.
33
+
34
+ * Cache entry meta-data may be retrieved quickly without also retrieving
35
+ response bodies. This avoids significant overhead when the cache misses
36
+ or only requires validation.
37
+
38
+ * Multiple different responses may include the same exact response body. In
39
+ these cases, the actual body content is stored once and referenced from
40
+ each of the meta store entries.
41
+
42
+ You should consider how the meta and entity stores differ when choosing a storage
43
+ implementation. The MetaStore does not require nearly as much memory as the
44
+ EntityStore and is accessed much more frequently. The EntityStore can grow quite
45
+ large and raw performance is less of a concern. Using a memory based storage
46
+ implementation (`heap` or `memcached`) for the MetaStore is strongly advised,
47
+ while a disk based storage implementation (`file`) is often satisfactory for
48
+ the EntityStore and uses much less memory.
49
+
50
+ Storage Configuration
51
+ ---------------------
52
+
53
+ The MetaStore and EntityStore used for a particular request are determined by
54
+ inspecting the `rack-cache.metastore` and `rack-cache.entitystore` Rack env
55
+ variables. The value of these variables is a URI that identifies the storage
56
+ type and location (URI formats are documented in the following section).
57
+
58
+ The `heap:/` storage is assumed if either storage type is not explicitly
59
+ provided. This storage type has significant drawbacks for most types of
60
+ deployments so explicit configuration is advised.
61
+
62
+ The default metastore and entitystore values can be specified when the
63
+ __Rack::Cache__ object is added to the Rack middleware pipeline as follows:
64
+
65
+ use Rack::Cache do
66
+ set :metastore, 'file:/var/cache/rack/meta'
67
+ set :entitystore, 'file:/var/cache/rack/body'
68
+ end
69
+
70
+ Alternatively, the `rack-cache.metastore` and `rack-cache.entitystore`
71
+ variables may be set in the Rack environment by an upstream component.
72
+
73
+ Storage Implementations
74
+ -----------------------
75
+
76
+ __Rack::Cache__ includes meta and entity storage implementations backed by local
77
+ process memory ("heap storage"), the file system ("disk storage"), and
78
+ memcached. This section includes information on configuring __Rack::Cache__ to
79
+ use a specific storage implementation as well as pros and cons of each.
80
+
81
+ ### Heap Storage
82
+
83
+ Uses local process memory to store cached entries.
84
+
85
+ set :metastore, 'heap:/'
86
+ set :entitystore, 'heap:/'
87
+
88
+ The heap storage backend is simple, fast, and mostly useless. All cache
89
+ information is stored in each backend application's local process memory (using
90
+ a normal Hash, in fact), which means that data cached under one backend is
91
+ invisible to all other backends. This leads to low cache hit rates and excessive
92
+ memory use, the magnitude of which is a function of the number of backends in
93
+ use. Further, the heap storage provides no mechanism for purging unused entries
94
+ so memory use is guaranteed to exceed that available, given enough time and
95
+ utilization.
96
+
97
+ Use of heap storage is recommended only for testing purposes or for very
98
+ simple/single-backend deployment scenarios where the number of resources served
99
+ is small and well understood.
100
+
101
+ ### Disk Storage
102
+
103
+ Stores cached entries on the filesystem.
104
+
105
+ set :metastore, 'file:/var/cache/rack/meta'
106
+ set :entitystore, 'file:/var/cache/rack/body'
107
+
108
+ The URI may specify an absolute, relative, or home-rooted path:
109
+
110
+ * `file:/storage/path` - absolute path to storage directory.
111
+ * `file:storage/path` - relative path to storage directory, rooted at the
112
+ process's current working directory (`Dir.pwd`).
113
+ * `file:~user/storage/path` - path to storage directory, rooted at the
114
+ specified user's home directory.
115
+ * `file:~/storage/path` - path to storage directory, rooted at the current
116
+ user's home directory.
117
+
118
+ File system storage is simple, requires no special daemons or libraries, has a
119
+ tiny memory footprint, and allows multiple backends to share a single cache; it
120
+ is one of the slower storage implementations, however. Its use is recommended in
121
+ cases where memory is limited or in environments where more complex storage
122
+ backends (e.g., memcached) are not available. In many cases, it may be
123
+ acceptable (and even optimal) to use file system storage for the entitystore and
124
+ a more performant storage implementation (e.g., memcached) for the metastore.
125
+
126
+ __NOTE:__ When both the metastore and entitystore are configured to use file
127
+ system storage, they should be set to different paths to prevent any chance of
128
+ collision.
129
+
130
+ ### Memcached Storage
131
+
132
+ Stores cached entries in a remote [memcached](http://www.danga.com/memcached/)
133
+ instance.
134
+
135
+ set :metastore, 'memcached://localhost:11211/meta'
136
+ set :entitystore, 'memcached://localhost:11211/body'
137
+
138
+ The URI must specify the host and port of a remote memcached daemon. The path
139
+ portion is an optional (but recommended) namespace that is prepended to each
140
+ cache key.
141
+
142
+ The memcached storage backend requires [Evan Weaver's memcached client library][e].
143
+ This is a [fast][f] client implementation built on the SWIG/[libmemcached][l] C
144
+ library. The library may be installed with RubyGems as follows:
145
+
146
+ sudo gem install memcached --no-rdoc --no-ri
147
+
148
+ Memcached storage is reasonably fast and allows multiple backends to share a
149
+ single cache. It is also the only storage implementation that allows the cache
150
+ to reside somewhere other than the local machine. The memcached daemon stores
151
+ all data in local process memory so using it for the entitystore can result in
152
+ heavy memory usage. It is by far the best option for the metastore in
153
+ deployments with multiple backend application processes since it allows the
154
+ cache to be properly distributed and provides fast access to the
155
+ meta-information required to perform cache logic. Memcached is considerably more
156
+ complex than the other storage implementations, requiring a separate daemon
157
+ process and extra libraries. Still, its use is recommended in all cases where
158
+ you can get away with it.
159
+
160
+ [e]: http://blog.evanweaver.com/files/doc/fauna/memcached/files/README.html
161
+ [f]: http://blog.evanweaver.com/articles/2008/01/21/b-the-fastest-u-can-b-memcached/
162
+ [l]: http://tangent.org/552/libmemcached.html
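The split between the MetaStore and the EntityStore described in storage.markdown can be illustrated with a minimal in-memory sketch. The class names `SketchEntityStore` and `SketchMetaStore` are hypothetical and are not the classes shipped in `entitystore.rb` or `metastore.rb`; the only detail taken from the document is that bodies are keyed by a SHA1 digest and referenced from the meta entries.

```ruby
require 'digest/sha1'

# Hypothetical, simplified stores for illustration only.
class SketchEntityStore
  def initialize
    @bodies = {}
  end

  # Store a body under its SHA1 digest and return that digest as the key.
  # Writing the same body twice keeps a single copy.
  def write(body)
    key = Digest::SHA1.hexdigest(body)
    @bodies[key] ||= body
    key
  end

  def read(key)
    @bodies[key]
  end

  def size
    @bodies.size
  end
end

class SketchMetaStore
  def initialize
    @entries = {}
  end

  # Record headers plus the digest key that points at the stored body.
  def store(url, headers, body_key)
    @entries[url] = { :headers => headers, :body_key => body_key }
  end

  def lookup(url)
    @entries[url]
  end
end

entities = SketchEntityStore.new
meta = SketchMetaStore.new

body = '<h1>Hello</h1>'
meta.store('/a', { 'Content-Type' => 'text/html' }, entities.write(body))
meta.store('/b', { 'Content-Type' => 'text/html' }, entities.write(body))

entities.size                                            # => 1
meta.lookup('/a')[:body_key] == meta.lookup('/b')[:body_key]  # => true
```

Two URLs with identical bodies share one stored entity, and the headers for either can be read without touching the body at all, which is the property the document relies on when it suggests memcached for meta information and the filesystem for bodies.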
data/lib/rack/cache.rb ADDED
@@ -0,0 +1,51 @@
1
+ require 'fileutils'
2
+ require 'time'
3
+ require 'rack'
4
+
5
+ module Rack #:nodoc:
6
+ end
7
+
8
+ # = HTTP Caching For Rack
9
+ #
10
+ # Rack::Cache is suitable as a quick, drop-in component to enable HTTP caching
11
+ # for Rack-enabled applications that produce freshness (+Expires+, +Cache-Control+)
12
+ # and/or validation (+Last-Modified+, +ETag+) information.
13
+ #
14
+ # * Standards-based (RFC 2616 compliance)
15
+ # * Freshness/expiration based caching and validation
16
+ # * Supports HTTP Vary
17
+ # * Portable: 100% Ruby / works with any Rack-enabled framework
18
+ # * VCL-like configuration language for advanced caching policies
19
+ # * Disk, memcached, and heap memory storage backends
20
+ #
21
+ # === Usage
22
+ #
23
+ # Create with default options:
24
+ # require 'rack/cache'
25
+ # Rack::Cache.new(app, :verbose => true, :entitystore => 'file:cache')
26
+ #
27
+ # Within a rackup file (or with Rack::Builder):
28
+ # require 'rack/cache'
29
+ # use Rack::Cache do
30
+ # set :verbose, true
31
+ # set :metastore, 'memcached://localhost:11211/meta'
32
+ # set :entitystore, 'file:/var/cache/rack'
33
+ # end
34
+ # run app
35
+ #
36
+ module Rack::Cache
37
+ require 'rack/cache/request'
38
+ require 'rack/cache/response'
39
+ require 'rack/cache/context'
40
+ require 'rack/cache/storage'
41
+
42
+ # Create a new Rack::Cache middleware component that fetches resources from
43
+ # the specified backend application. The +options+ Hash can be used to
44
+ # specify default configuration values (see attributes defined in
45
+ # Rack::Cache::Options for possible key/values). When a block is given, it
46
+ # is executed within the context of the newly created Rack::Cache::Context
47
+ # object.
48
+ def self.new(backend, options={}, &b)
49
+ Context.new(backend, options, &b)
50
+ end
51
+ end
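The `self.new` defined on the `Rack::Cache` module above is a factory: calling `new` on the module returns a `Context`, and the optional block is evaluated against that object. The sketch below reproduces only that shape with hypothetical names (`SketchCache`, a bare-bones `Context` with a `set` method); it does not load rack or rack-cache, and the real `Context` does far more.

```ruby
module SketchCache
  # Stand-in for Rack::Cache::Context; only what the factory needs.
  class Context
    attr_reader :backend, :options

    def initialize(backend, options = {}, &block)
      @backend = backend
      @options = options.dup
      # Run the configuration block with self set to this context,
      # so bare calls like `set :key, value` resolve to methods below.
      instance_eval(&block) if block_given?
    end

    def set(key, value)
      @options[key] = value
    end

    # Pass every request straight to the backend (no caching here).
    def call(env)
      @backend.call(env)
    end
  end

  # Module-level factory mirroring the shape of Rack::Cache.new.
  def self.new(backend, options = {}, &block)
    Context.new(backend, options, &block)
  end
end

app = lambda { |env| [200, { 'Content-Type' => 'text/plain' }, ['ok']] }

cache = SketchCache.new(app, :verbose => true) do
  set :entitystore, 'file:cache'
end

cache.class                   # => SketchCache::Context
cache.options[:entitystore]   # => "file:cache"
```

Because the module cannot be instantiated itself, callers get the convenience of `Rack::Cache.new(app, ...)` while the class that holds state remains `Context`.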
@@ -0,0 +1,65 @@
1
+ require 'set'
2
+
3
+ module Rack::Cache
4
+ # Provides cache configuration methods. This module is included in the cache
5
+ # context object.
6
+
7
+ module Config
8
+ # Evaluate a block of configuration code within the scope of receiver.
9
+ def configure(&block)
10
+ instance_eval(&block) if block_given?
11
+ end
12
+
13
+ # Import the configuration file specified. This has the same basic semantics
14
+ # as Ruby's built-in +require+ statement but always evaluates the source
15
+ # file within the scope of the receiver. The file may exist anywhere on the
16
+ # $LOAD_PATH.
17
+ def import(file)
18
+ return false if imported_features.include?(file)
19
+ path = add_file_extension(file, 'rb')
20
+ if path = locate_file_on_load_path(path)
21
+ source = File.read(path)
22
+ imported_features.add(file)
23
+ instance_eval source, path, 1
24
+ true
25
+ else
26
+ raise LoadError, 'no such file to load -- %s' % [file]
27
+ end
28
+ end
29
+
30
+ private
31
+ # Load the default configuration and evaluate the block provided within
32
+ # the scope of the receiver.
33
+ def initialize_config(&block)
34
+ import 'rack/cache/config/default'
35
+ configure(&block)
36
+ end
37
+
38
+ # Set of files that have been imported.
39
+ def imported_features
40
+ @imported_features ||= Set.new
41
+ end
42
+
43
+ # Attempt to expand +file+ to a full path by traversing the $LOAD_PATH
44
+ # looking for matches. Absolute paths are checked directly.
45
+ def locate_file_on_load_path(file)
46
+ if file[0,1] == '/'
47
+ file if File.exist?(file)
48
+ else
49
+ $LOAD_PATH.
50
+ map { |base| File.join(base, file) }.
51
+ detect { |p| File.exist?(p) }
52
+ end
53
+ end
54
+
55
+ # Add an extension to the filename provided if the file doesn't
56
+ # already have an extension.
57
+ def add_file_extension(file, extension='rb')
58
+ if file =~ /\.\w+$/
59
+ file
60
+ else
61
+ "#{file}.#{extension}"
62
+ end
63
+ end
64
+ end
65
+ end
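The `import` method above behaves like `require` in that a file is evaluated at most once, but the source runs inside the receiver rather than at top level. A minimal sketch of just those two properties follows. `SketchConfig` and its `set` method are hypothetical, and the `$LOAD_PATH` search and extension handling from `config.rb` are deliberately left out, so the sketch takes a full path.

```ruby
require 'set'
require 'tmpdir'

# Hypothetical receiver; stands in for the cache context.
class SketchConfig
  attr_reader :settings

  def initialize
    @settings = {}
    @imported = Set.new
  end

  def set(key, value)
    @settings[key] = value
  end

  # Evaluate the file in the receiver's scope, at most once per path.
  # Returns true on first import and false when already imported.
  def import(path)
    return false if @imported.include?(path)
    source = File.read(path)
    @imported.add(path)
    # Passing path and line number keeps backtraces pointing at the file.
    instance_eval(source, path, 1)
    true
  end
end

# Write a one-line policy file, then import it twice.
def import_twice
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'policy.rb')
    File.write(path, "set :default_ttl, 60\n")
    config = SketchConfig.new
    [config.import(path), config.import(path), config.settings]
  end
end

import_twice   # => [true, false, {:default_ttl=>60}]
```

The bare `set` call inside the policy file only works because the source is evaluated with `instance_eval`; a plain `require` would look for `set` on the top-level object instead.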
@@ -0,0 +1,16 @@
1
+ # Adds a very long max-age response header when the requested url
2
+ # looks like it includes a cache busting timestamp. Cache busting
3
+ # URLs look like this:
4
+ # http://HOST/PATH?DIGITS
5
+ #
6
+ # DIGITS is typically the number of seconds since some epoch but
7
+ # this can theoretically be any set of digits. Example:
8
+ # http://example.com/css/foo.css?7894387283
9
+ #
10
+ on :fetch do
11
+ next if response.freshness_information?
12
+ if request.url =~ /\?\d+$/
13
+ trace 'adding huge max-age to response for cache busting URL'
14
+ response.ttl = 100000000000000
15
+ end
16
+ end
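The pattern used in `busters.rb` can be exercised on its own to see which URLs it treats as cache busting. The helper name `cache_busting_url?` is hypothetical; the regular expression is copied from the file above.

```ruby
# Hypothetical helper wrapping the pattern from busters.rb: a literal "?"
# followed only by digits up to the end of the string.
def cache_busting_url?(url)
  !(url =~ /\?\d+$/).nil?
end

cache_busting_url?('http://example.com/css/foo.css?7894387283')  # => true
cache_busting_url?('http://example.com/css/foo.css')             # => false
cache_busting_url?('http://example.com/css/foo.css?v=2')         # => false
```

Note that a named parameter such as `?v=2` does not match, so the policy only applies to URLs whose entire query string is a run of digits.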
@@ -0,0 +1,134 @@
1
+ # Called at the beginning of request processing, after the complete
2
+ # request has been fully received. Its purpose is to decide whether or
3
+ # not to serve the request and how to do it.
4
+ #
5
+ # The request should not be modified.
6
+ #
7
+ # Possible transitions from receive:
8
+ #
9
+ # * pass! - pass the request to the backend and the response upstream,
10
+ # bypassing all caching features.
11
+ #
12
+ # * lookup! - attempt to locate the entry in the cache. Control will
13
+ # pass to the +hit+, +miss+, or +fetch+ event based on the result of
14
+ # the cache lookup.
15
+ #
16
+ # * error! - return the error code specified, abandoning the request.
17
+ #
18
+ on :receive do
19
+ pass! unless request.method? 'GET', 'HEAD'
20
+ pass! if request.header? 'Cookie', 'Authorization', 'Expect'
21
+ lookup!
22
+ end
23
+
24
+ # Called upon entering pass mode. The request is sent to the backend,
25
+ # and the backend's response is sent to the client, but is not entered
26
+ # into the cache. The event is triggered immediately after the response
27
+ # is received from the backend but before it has been sent upstream.
28
+ #
29
+ # Possible transitions from pass:
30
+ #
31
+ # * finish! - deliver the response upstream.
32
+ #
33
+ # * error! - return the error code specified, abandoning the request.
34
+ #
35
+ on :pass do
36
+ finish!
37
+ end
38
+
39
+ # Called after a cache lookup when no matching entry is found in the
40
+ # cache. Its purpose is to decide whether or not to attempt to retrieve
41
+ # the response from the backend and in what manner.
42
+ #
43
+ # Possible transitions from miss:
44
+ #
45
+ # * fetch! - retrieve the requested document from the backend with
46
+ # caching features enabled.
47
+ #
48
+ # * pass! - pass the request to the backend and the response upstream,
49
+ # bypassing all caching features.
50
+ #
51
+ # * error! - return the error code specified and abandon request.
52
+ #
53
+ # The default configuration transfers control to the fetch event.
54
+ on :miss do
55
+ fetch!
56
+ end
57
+
58
+ # Called after a cache lookup when the requested document is found in
59
+ # the cache and is fresh.
60
+ #
61
+ # Possible transitions from hit:
62
+ #
63
+ # * deliver! - transfer control to the deliver event, sending the cached
64
+ # response upstream.
65
+ #
66
+ # * pass! - abandon the cache entry and transfer to pass mode. The
67
+ # original request is sent to the backend and the response sent
68
+ # upstream, bypassing all caching features.
69
+ #
70
+ # * error! - return the error code specified and abandon request.
71
+ #
72
+ on :hit do
73
+ deliver!
74
+ end
75
+
76
+ # Called after a document is successfully retrieved from the backend
77
+ # application or after a cache entry is validated with the backend.
78
+ # During validation, the original request is used as a template for a
79
+ # conditional GET request with the backend. The +original_response+
80
+ # object contains the response as received from the backend and +entry+
81
+ # is set to the cached response that triggered validation.
82
+ #
83
+ # Possible transitions from fetch:
84
+ #
85
+ # * store! - store the fetched response in the cache or, when
86
+ # validating, update the cached response with validated results.
87
+ #
88
+ # * deliver! - deliver the response upstream without entering it
89
+ # into the cache.
90
+ #
91
+ # * error! - return the error code specified and abandon request.
92
+ #
93
+ on :fetch do
94
+ store! if response.cacheable?
95
+ deliver!
96
+ end
97
+
98
+ # Called immediately before an entry is written to the underlying
99
+ # cache. The +entry+ object may be modified.
100
+ #
101
+ # Possible transitions from store:
102
+ #
103
+ # * persist! - commit the object to cache and transfer control to
104
+ # the deliver event.
105
+ #
106
+ # * deliver! - transfer control to the deliver event without committing
107
+ # the object to cache.
108
+ #
109
+ # * error! - return the error code specified and abandon request.
110
+ #
111
+ on :store do
112
+ entry.ttl = default_ttl if entry.ttl.nil?
113
+ trace 'store backend response in cache (ttl: %ds)', entry.ttl
114
+ persist!
115
+ end
116
+
117
+ # Called immediately before +response+ is delivered upstream. +response+
118
+ # may be modified at this point but the changes will not affect the
119
+ # cache since the entry has already been persisted.
120
+ #
121
+ # * finish! - complete processing and send the response upstream
122
+ #
123
+ # * error! - return the error code specified and abandon request.
124
+ #
125
+ on :deliver do
126
+ finish!
127
+ end
128
+
129
+ # Called when an error! transition is triggered. The +response+ has the
130
+ # error code, headers, and body that will be delivered to upstream and
131
+ # may be modified if needed.
132
+ on :error do
133
+ finish!
134
+ end
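The `on :event do ... end` handlers and bang transitions in `default.rb` describe a small state machine. The sketch below shows one way such a machine could be built, using `catch`/`throw` so that a transition abandons the rest of the current handler. `SketchMachine` is hypothetical, and two simplifications are assumptions rather than facts about `core.rb`, which is not shown in this excerpt: each transition simply names the next event, and handlers registered later run first so that an added policy can pre-empt the defaults.

```ruby
# Hypothetical event machine; not the implementation in core.rb.
class SketchMachine
  EVENTS = [:receive, :pass, :lookup, :miss, :fetch, :store, :deliver, :finish]

  def initialize(&block)
    @handlers = Hash.new { |hash, event| hash[event] = [] }
    instance_eval(&block) if block_given?
  end

  # Register a handler. Assumption: newest handlers are tried first.
  def on(event, &handler)
    @handlers[event].unshift(handler)
  end

  # Each bang method stops the current handler and names the next event.
  EVENTS.each do |name|
    define_method("#{name}!") { throw :transition, name }
  end

  # Follow transitions from +start+ until :finish; return the path taken.
  def run(start)
    trail = []
    event = start
    until event == :finish
      trail << event
      event = catch(:transition) do
        @handlers[event].each { |handler| instance_eval(&handler) }
        raise "no transition taken from #{event}"
      end
    end
    trail << :finish
  end
end

# Build a machine with simplified defaults plus one overriding policy.
def build_machine(bypass)
  SketchMachine.new do
    on(:receive) { lookup! }
    on(:lookup)  { miss! }      # pretend the cache is always empty
    on(:miss)    { fetch! }
    on(:fetch)   { store! }
    on(:store)   { deliver! }
    on(:deliver) { finish! }
    on(:pass)    { finish! }
    on(:receive) { pass! if bypass }   # added last, so it runs first
  end
end

build_machine(false).run(:receive)
# => [:receive, :lookup, :miss, :fetch, :store, :deliver, :finish]
build_machine(true).run(:receive)
# => [:receive, :pass, :finish]
```

When the overriding `:receive` handler does not trigger a transition, control falls through to the earlier handler, which is how a conditional `pass!` can sit in front of an unconditional `lookup!`.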
@@ -0,0 +1,13 @@
1
+ # The default configuration ignores the `Cache-Control: no-cache` directive on
2
+ # requests. Per RFC 2616, the presence of the no-cache directive should cause
3
+ # intermediaries to process requests as if no cached version were available.
4
+ # However, this directive is most often targeted at shared proxy caches, not
5
+ # gateway caches, and so we've chosen to break with the spec in our default
6
+ # configuration.
7
+ #
8
+ # Import 'rack/cache/config/no-cache' to enable standards-based
9
+ # processing.
10
+
11
+ on :receive do
12
+ pass! if request.header['Cache-Control'] =~ /no-cache/
13
+ end
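The handler in `no-cache.rb` uses a substring match against the `Cache-Control` request header. As an illustration of what that check is looking for, the sketch below parses the header into individual directives and tests for `no-cache` by name. Both helper names are hypothetical, and reading the header from the `HTTP_CACHE_CONTROL` key assumes a standard Rack environment hash rather than the `request.header` accessor used above.

```ruby
# Hypothetical helper: split a Cache-Control value into a Hash of
# directive name => argument (or true when the directive has none).
def cache_control_directives(value)
  value.to_s.split(',').each_with_object({}) do |part, directives|
    name, argument = part.strip.split('=', 2)
    next if name.nil? || name.empty?
    directives[name.downcase] = argument.nil? ? true : argument.delete('"')
  end
end

# Hypothetical helper: true when the request asks to bypass cached copies.
def no_cache_request?(env)
  cache_control_directives(env['HTTP_CACHE_CONTROL']).key?('no-cache')
end

cache_control_directives('no-cache, max-age=0')
# => {"no-cache"=>true, "max-age"=>"0"}
no_cache_request?('HTTP_CACHE_CONTROL' => 'max-age=0')   # => false
```

Matching on the parsed directive name is case-insensitive and avoids false positives from other values that merely contain the text `no-cache`.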