ledgetech / ledge
An RFC compliant and ESI capable HTTP cache for Nginx / OpenResty, backed by Redis
The Age response-header field conveys the sender's estimate of the amount of time since the response (or its revalidation) was generated at the origin server.
This information is used by intermediate proxies to convey an estimate of how old a stored response is:
HTTP/1.1 requires origin servers to send a Date header, if possible, with every response, giving the time at which the response was generated (see section 14.18). We use the term "date_value" to denote the value of the Date header, in a form appropriate for arithmetic operations.
HTTP/1.1 uses the Age response-header to convey the estimated age of the response message when obtained from a cache. The Age field value is the cache's estimate of the amount of time since the response was generated or revalidated by the origin server.
Furthermore, the Age header field is intended to be used by intermediate caches only:
The presence of an Age header field in a response implies that a response is not first-hand.
The presence of the header field Age: 0 therefore means that the received response was sent by an intermediate cache and is zero seconds old; it was probably fetched from the origin server just before being sent to the client.
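The RFC 2616 (section 13.2.3) age calculation can be sketched in plain Lua; the function name and parameters here are illustrative, not Ledge's API:

```lua
-- Sketch of the RFC 2616 section 13.2.3 current_age calculation.
-- age_value: the Age header from upstream (0 if absent)
-- date_value: the origin's Date header, as a Unix timestamp
-- request_time/response_time: when we sent the request / got the response
local function current_age(age_value, date_value, request_time, response_time, now)
  local apparent_age = math.max(0, response_time - date_value)
  local corrected_received_age = math.max(apparent_age, age_value)
  local response_delay = response_time - request_time
  local corrected_initial_age = corrected_received_age + response_delay
  local resident_time = now - response_time
  return corrected_initial_age + resident_time
end
```

The result is what a cache should send in its own Age header when serving the stored response.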
The precedence is broken: if you set a config option to true during init_by_lua, and then attempt to override it to false in a subsequent phase, config_get() falls through to the initial value, because false is assumed to be nil.
2012/06/26 10:46:56 [error] 7428#0: *11891 lua handler aborted: runtime error: /home/jhurst/prj/ledge/lib/ledge/ledge.lua:263: Unexpeted reply from Redis when trying to save
stack traceback:
[C]: in function 'save'
/home/jhurst/prj/ledge/lib/ledge/ledge.lua:263: in function 'fetch'
/home/jhurst/prj/ledge/lib/ledge/ledge.lua:104: in function 'mw'
/home/jhurst/prj/lua-resty-rack/lib/resty/rack.lua:92: in function 'next'
...e/jhurst/prj/lua-resty-rack/lib/resty/rack/read_body.lua:9: in function 'mw'
/home/jhurst/prj/lua-resty-rack/lib/resty/rack.lua:92: in function 'next'
/home/jhurst/prj/lua-resty-rack/lib/resty/rack.lua:79: in function 'run'
This is because a 0 ttl doesn't affect res.cacheable(). Irrespective of Cache-Control headers, if we have no ttl, we can't cache anything. We should still allow for responses with no cache headers, and a config with a max_stale (still on the TODO).
If an item expires, all is safe. But if a force refresh re-caches an item, the hash fields will be merged rather than replaced, resulting in potentially inconsistent behaviour.
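The merge-vs-replace behaviour is easy to see with plain Lua tables standing in for the Redis hash (helper names hypothetical); calling DEL before re-saving restores the replace semantics a force refresh needs:

```lua
-- Simulating Redis HMSET semantics with plain Lua tables.
-- HMSET on an existing hash merges fields; it never removes old ones.
local store = {}

local function hmset(key, fields)
  local hash = store[key] or {}
  for k, v in pairs(fields) do hash[k] = v end
  store[key] = hash
end

local function del(key) store[key] = nil end

hmset("cache:item", { status = 200, ["h:ETag"] = "abc" })

-- A force refresh that just calls HMSET again leaves stale fields behind:
hmset("cache:item", { status = 200 })
assert(store["cache:item"]["h:ETag"] == "abc") -- stale field survived the merge

-- DEL before HMSET gives replace semantics:
del("cache:item")
hmset("cache:item", { status = 200 })
assert(store["cache:item"]["h:ETag"] == nil)
```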
We need to support some way of purging URLs from cache other than sending requests with 'Cache-Control: no-cache'.
Preferably with some way to purge URLs based on a pattern.
e.g. PURGE for www.example.com/foo/bar will also clear the cache for www.example.com/foo/bar/?abc=123
This should probably be done with Redis sets to avoid the potentially huge performance issues of using the KEYS
command in Redis.
The set of related keys should probably be configurable somehow, but maybe just relating query strings is good enough initially?
nginx/openresty will pass requests with a PURGE method through to Lua, and we should be able to use nginx's access control configuration options to secure it, so that doesn't need to be done in Ledge.
e.g.
if ($request_method = 'PURGE') {
    rewrite (.*) /_purge/$1 last;
}

location /_purge {
    internal;
    allow 127.0.0.1;
    deny all;
    content_by_lua ' ... ';
}
edit:
Compatibility with Squid's purge method would be a good thing here I think.
I have a situation where I need subrequests that may be ledge requests as well. Should the options, etc., be done on a per-request ledge table/object? Something like
local ledge = require "ledge.ledge"
local l = ledge.create()
l.set(......)
l.bind(...)
rack.use(l, { proxy_location = "/__ledge/example.com" })
rack.run()
This may also allow "caching" the ledge "objects" in init_by_lua. Perhaps you have named ledge objects, i.e. you pass a key to create and can later fetch the object by key.
Currently Via shows the host, which doesn't really help. It should be ngx.var.hostname instead.
Maintenance mode loads whatever you supply at a given location, but really we should 302 to avoid bots indexing the wrong content.
Currently revalidation is spawned in a thread before sending the response. It then yields on i/o, returning to the parent thread and finishing the response to the client before returning to the background work.
If we have ESI fragments though, the scheduler yields on the ESI i/o, which returns to the background job, which blocks until complete.
For background work to happen reliably, it must be started after ngx.eof().
Deleting a cache entry changes some logic, such as deciding if it's ok to collapse requests, since we don't know if the item was previously cacheable.
We should instead expire the item on purge, as this is more useful.
Currently we can use the X-Cache and X-Cache-State response headers in the nginx access log to get a decent idea of what happened with each request.
However, for requests that are uncacheable (those with Cache-Control: no-cache, for example) those headers aren't sent, and thus we just get an empty field in the log file.
It would also be nice to be able to log more detailed information without filling the response with headers: things like whether this was a collapsed request, or a 'negative hit', etc.
So maybe if we can push a state variable back to nginx and just add that to the log format?
Setting a new cache_key_spec results in strange keys, often with repeated prefixes:
ledge:cache_obj:ledge:cache_obj:ledge:cache_obj:
Not sure why as yet.
Sometimes it is desirable to serve known stale content. We need a configuration mechanism to express how stale things can be, probably as simple as:
ledge:config_set("max_stale", seconds | percentage)
If a number is provided, this will be added to the ttl to create a stale ttl. If a string is provided, in the format \d%, the ttl is extended via the percentage given.
Stale content should mark the response as RESPONSE_STATE_WARM, and add the warning header 110 Response is stale (see #32).
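A minimal sketch of the proposed parsing, assuming stale_ttl is a hypothetical helper and max_stale is either a number of seconds or a string like "50%":

```lua
-- Hypothetical helper: turn a max_stale setting into a stale ttl.
-- A plain number extends the ttl additively; a "\d+%" string extends
-- it proportionally.
local function stale_ttl(ttl, max_stale)
  local pct = type(max_stale) == "string" and max_stale:match("^(%d+)%%$")
  if pct then
    return ttl + math.floor(ttl * tonumber(pct) / 100)
  end
  return ttl + max_stale
end
```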
This seems nicer than sending a post across an internal subrequest.
We need to be able to revalidate/refresh cached responses when they are served stale, without blocking the current request and without delaying further requests on the same connection.
ngx.on_abort only fires if the client actually cancels the request, so this isn't really that helpful:
http://wiki.nginx.org/HttpLuaModule#ngx.on_abort
ngx.thread allows us to trigger a background fetch and re-save without blocking the serving of the current request
http://wiki.nginx.org/HttpLuaModule#ngx.thread.spawn
However, any additional requests on the same connection wait until all threads from the previous request have finished. So that's not great either.
Basic example:
location /foo {
    content_by_lua '
        function bar()
            ngx.sleep(5)
            print("bar")
        end
        print("request start")
        ngx.thread.spawn(bar)
        ngx.say("foo")
        ngx.eof()
        print("request done")
    ';
}
Two requests back to back, with a keepalive connection between them:
https://gist.github.com/417457900b51d7083acf
The 'Cache-Control' header is normally sent with uppercase C's, but the accepts_cache function is looking for 'cache-control'.
Maybe the headers table should be normalised to all-lowercase keys to avoid any issues with case?
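A normalisation pass could be sketched like this (helper name hypothetical); after it, lookups can always use lowercase keys:

```lua
-- Sketch: normalise a headers table to lowercase keys so lookups
-- like headers["cache-control"] work regardless of the sent casing.
local function normalise_headers(headers)
  local out = {}
  for k, v in pairs(headers) do
    out[string.lower(k)] = v
  end
  return out
end
```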
This came up in discussion about Squid the other day but the short version is that Ledge needs to honour the Vary header when generating a cache key.
Vary is set by the origin server and specifies some request headers that should be used to determine if the cached response is valid.
Basically we need to include any request headers listed in the Vary response header in Ledge's cache key.
e.g. Vary: Accept-Encoding means we have a different cache key for requests that specify 'Accept-Encoding: gzip' vs those that don't.
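A sketch of a Vary-aware key (helper name hypothetical, and assuming the request headers table already has lowercase keys):

```lua
-- Sketch: append the values of any request headers named in the Vary
-- response header to the base cache key, so variants cache separately.
local function vary_cache_key(base_key, vary_header, req_headers)
  if not vary_header then return base_key end
  local parts = { base_key }
  for field in vary_header:gmatch("[^,%s]+") do
    parts[#parts + 1] = req_headers[string.lower(field)] or ""
  end
  return table.concat(parts, ":")
end
```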
Just reading the w3 specs on this makes my head hurt, good luck James!
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.44
http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.6
http://mark.koli.ch/2010/09/understanding-the-http-vary-header-and-caching-proxies-squid-etc.html
nginx, luajit, zeromq, zma_lua, nginx_lua, redis, etc. Minimum versions need to be ascertained.
There's a window between failing to obtain the collapse forwarding lock (i.e. another request is currently fetching), and completing the SUBSCRIBE where the PUBLISH could have already happened (which means we never hear about the collapsed response).
This is handled with a transaction, so a key is watched beforehand, and if it changes before SUBSCRIBE then we assume we missed the PUBLISH, and thus we go back to checking the cache.
We need test coverage for this though. Might not be trivial.
Mark Nottingham's suggestions for stale-if-error and stale-while-revalidate seem sensible. Currently on an upstream error we have no choice but to pass this to the client. Serving stale would be nice.
Set-Cookie should not be returned on a cache hit!
Currently doing this in the nginx conf:
ledge.bind("response_ready", function(req, res)
    local state = res.header["X-Cache"]
    if state and state == "HIT" then
        res.header["Set-Cookie"] = nil
    end
end)
This should probably just be a built-in feature of Ledge, though.
It should be harder to accidentally set an unknown parameter, and in some cases to set a parameter with a mistyped or out-of-range value. With defaults currently being nil in some cases, this isn't easy.
Looks like we're assuming ngx.ctx.ledge.event exists.
2012/06/26 10:15:50 [error] 6995#0: *11806 lua handler aborted: runtime error: /home/jhurst/prj/ledge/lib/ledge/ledge.lua:81: attempt to index field 'ledge' (a nil value)
stack traceback:
/home/jhurst/prj/ledge/lib/ledge/ledge.lua: in function 'mw'
/home/jhurst/prj/lua-resty-rack/lib/resty/rack.lua:92: in function 'next'
...e/jhurst/prj/lua-resty-rack/lib/resty/rack/read_body.lua:9: in function 'mw'
/home/jhurst/prj/lua-resty-rack/lib/resty/rack.lua:92: in function 'next'
/home/jhurst/prj/lua-resty-rack/lib/resty/rack.lua:79: in function 'run'
Multiple headers of the same name are represented as a table in ngx_lua.
Ledge currently tries to pass this table into the Redis hmset, which throws an HTTP 500 error.
We need to flatten this table and restore it on cache read.
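A sketch of the flatten/restore round trip (helper names and the newline delimiter are assumptions; raw newlines can't appear inside a header value, so it's a safe separator):

```lua
-- Sketch: flatten a multi-value header (a Lua table in ngx_lua) into a
-- single delimited string for the Redis hash, and split it back on read.
local SEP = "\n" -- assumed-safe separator: header values cannot contain it

local function flatten_header(value)
  if type(value) == "table" then
    return table.concat(value, SEP)
  end
  return value
end

local function restore_header(value)
  if value:find(SEP, 1, true) then
    local out = {}
    for v in value:gmatch("[^" .. SEP .. "]+") do out[#out + 1] = v end
    return out
  end
  return value
end
```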
This function (which determines if the client will accept a cached response) is still just a stub, and must be fleshed out in accordance with the HTTP spec.
This is now redundant; set() should conditionally use ngx.ctx depending on ngx.get_phase().
The line res.header["Pragma"] = nil doesn't appear to work for me. I'm not sure exactly why, but changing it to "" fixed the problem.
Sorry for the lack of details!
This allows the origin to be protected if there's just no way it would cope without cache.
We need to handle the various revalidation / reload controls available correctly. After closing #25 it's become clear that actually greater distinction between a reload and revalidation is required.
There are three broad scenarios to cover. For more details read: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.4
This occurs when the request does not accept a cacheable response under any circumstances, usually with Cache-Control: no-cache. We should always go to the origin, and the cache state should be RELOADED.
The request specifies Cache-Control: max-age=0 and one or more conditional validators, such as If-Modified-Since: {HTTP_Date_Stamp}, If-None-Match: {Etag}, etc. It means that we SHOULD validate against our internal cache. If it is still valid, we can return 304 Not Modified.
The cache state should be either HOT or WARM.
If our copy is not valid against the validator provided, we revalidate upstream (to another cache or the origin). We should use our own validator (e.g. If-Modified-Since based on the Last-Modified in our cache). If we get 304 back from upstream, we can safely return 200 to the client with our copy, since we now know it is considered valid. If not, we can compare the new 200 response against the validator supplied in the request. If the new response meets the client requirements, we can update our cache and return 304 to avoid transfer to the client. Otherwise we return 200 with the new response.
The cache state should be REVALIDATED.
The request specifies Cache-Control: max-age=0 but no validator. This means we must revalidate upstream, using our cached validator (Last-Modified) if present.
The cache state should be REVALIDATED.
These are defined here: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.25
If more than one validator is supplied, they must all be considered. That is, If-None-Match does not override If-Modified-Since when both are present; both conditions must be run. Also, if more than one etag is provided in an If-None-Match conditional, then they must all be considered.
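A sketch of the multi-etag check (helper name hypothetical); the condition matches if any etag in the list equals the cached one, or if * is given:

```lua
-- Sketch: evaluate an If-None-Match header that may list several etags.
-- Returns true when the condition matches (i.e. we may respond 304).
local function if_none_match(header_value, cached_etag)
  if header_value == "*" then return true end
  for etag in header_value:gmatch("[^,%s]+") do
    if etag == cached_etag then return true end
  end
  return false
end
```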
Nginx will deal with setting an appropriate Content-Length header for 1.0 clients. It definitely shouldn't be cached; if it is cached, the output body length will likely be different, resulting in incomplete page loads.
Rather than enabling ESI via an event hook, it may be better to simply have a configuration option.
ledge:config_set("esi_enabled", true | false)
If the option is enabled, we can look for ESI instructions in the representation prior to saving, and store additional flags about the types of ESI instructions included as metadata.
Then during the fast path, having read from cache, we choose to process ESI instructions as a function of the config option being enabled, and ESI metadata being present.
The metadata can be finely grained, one flag for each type of instruction (comment removal, esi:remove, esi:include and any others we support over time). This ensures the fast path only performs the regex required.
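The save-time scan could be sketched like this (flag and helper names hypothetical); each flag maps to one instruction type, so the fast path can skip the regexes for instructions that aren't present:

```lua
-- Sketch: scan a response body once at save time and record which ESI
-- instruction types it contains, stored alongside the cache metadata.
local function esi_flags(body)
  return {
    has_comments = body:find("<!--esi", 1, true) ~= nil,
    has_remove   = body:find("<esi:remove", 1, true) ~= nil,
    has_include  = body:find("<esi:include", 1, true) ~= nil,
  }
end
```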
Unless I'm missing something, there isn't a way to bind to Ledge once the request is received but before Ledge has started processing it; that is, to modify the incoming request before Ledge has checked whether it's cacheable, determined the cache key, etc.
This would be good for things like stripping out no-cache headers from the request for certain IPs or URIs.
It seemed sensible to ensure cache keys are unique for the current request method. In reality though, a POST response with cacheable headers present MAY be used for subsequent GET / HEAD requests at the same URI.
Similarly, if a GET request for a URI results in revalidation, the next HEAD request for this URI should have the latest validation data.
So rather than splitting the cache between request methods, we should have single items per URI, and introduce logic to ensure that, for example, a HEAD request which reaches the origin doesn't override the body for the item (since it will be empty).
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.46
Multiple warning headers MAY be set. This is a list of the currently-defined warn-codes, each with a recommended warn-text in English, and a description of its meaning.
110 Response is stale: MUST be included whenever the returned response is stale.
111 Revalidation failed: MUST be included if a cache returns a stale response because an attempt to revalidate the response failed, due to an inability to reach the server.
112 Disconnected operation: SHOULD be included if the cache is intentionally disconnected from the rest of the network for a period of time.
113 Heuristic expiration: MUST be included if the cache heuristically chose a freshness lifetime greater than 24 hours and the response's age is greater than 24 hours.
199 Miscellaneous warning: The warning text MAY include arbitrary information to be presented to a human user, or logged. A system receiving this warning MUST NOT take any automated action, besides presenting the warning to the user.
214 Transformation applied: MUST be added by an intermediate cache or proxy if it applies any transformation changing the content-coding (as specified in the Content-Encoding header) or media-type (as specified in the Content-Type header) of the response, or the entity-body of the response, unless this Warning code already appears in the response.
299 Miscellaneous persistent warning: The warning text MAY include arbitrary information to be presented to a human user, or logged. A system receiving this warning MUST NOT take any automated action.
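Building the Warning header could be sketched like this (the "ledge" warn-agent is an assumption); multiple warnings join into a comma-separated list per the grammar in section 14.46:

```lua
-- Sketch: compose a Warning response header per RFC 2616 section 14.46.
-- Each warning is 'warn-code warn-agent "warn-text"'.
local function add_warning(headers, code, text)
  local warning = string.format('%d ledge "%s"', code, text)
  if headers["Warning"] then
    headers["Warning"] = headers["Warning"] .. ", " .. warning
  else
    headers["Warning"] = warning
  end
end
```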
The events from Ledge are useful, but it would be nice to have a system where modules can be supplied to Ledge as plugins, which make use of the events as they see fit. Something like:
local plugin = require "my.ledge.plugin"
ledge.use(plugin)
The plugin would call ledge.bind as required to access the events.
This helps distinguish between expired cacheable responses, and those which should simply be proxied and not cached.
Currently any work scheduled can easily get duplicated.
http://host/page is fine, but http://host attempts to redirect to http://hosthttp//host.
Explicitly removing the trailing slash in code works, but that's just masking the issue.
When the origin does not return cache-control, it results in the following error:
2012/06/26 10:25:56 [error] 17254#0: *1 lua handler aborted: runtime error: squiz_edge_installs/ledge/lib/ledge/ledge.lua:200: attempt to index field 'Cache-Control' (a nil value)
By default, we return 200 if a fragment errors. We should instead pass a 5xx error on to the parent resource, unless onerror="continue" is specified.
If continue is specified, the cache lifetime of the parent resource should be explicitly minimised to 0. This ensures that even though the resource is cached for us, browsers and downstream intermediaries will keep coming back for a fresh copy until the error is resolved.
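For reference, the markup in question looks like this (the src URL is illustrative):

```html
<!-- Fragment errors here should bubble up as a 5xx on the parent: -->
<esi:include src="/fragments/news" />

<!-- With onerror="continue", the parent renders without the fragment,
     but its ttl should be forced to 0 until the error clears: -->
<esi:include src="/fragments/news" onerror="continue" />
```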
The following (so-called "hop-by-hop") headers are not cacheable, according to http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.5.1
Basic idea for this is to use SETNX to determine if a request is already going to the origin.
Then redis PUBSUB to know when the first request has finished.
Should we be collapsing all requests to the origin or just those requests that are accepting cache?
If we collapse 'no-cache' requests is it acceptable to return the cached (but just saved) content?
This is going to strip certain headers etc and seems like a bad idea to me.
Items are stored as a Redis hash, along these lines:
["ledge:cache_obj:http:GET:example.com:/test"] = {
status = 200,
h:Content-Type = "text/html",
h:Expires = "Sat, 23 Jun 2012 12:08:23 GMT",
body = [BODY CONTENT RAW],
uri = "http://example.com/test"
}
The hash itself has EXPIRE set to the ttl of the item, based on cache headers etc. The code looks like this:
ngx.ctx.redis:hmset(ngx.ctx.ledge.cache_key,
'body', res.body,
'status', res.status,
'uri', req.uri_full,
unpack(h)
)
ngx.ctx.redis:expire(ngx.ctx.ledge.cache_key, res.ttl())
In terms of control data, we have a sorted set called ledge:uris_by_expiry
which allows us to index cache keys by their expiry timestamp.
COLD and SUBZERO states. I'm thinking that we stop expiring the main cache keys; instead they persist as metadata. The body becomes a separate (volatile) key, referenced by a hash of the content, which will be handy for validation patterns. We could also add an option for large files to be stored on the filesystem, accessed via an ngx.exec() to a location containing try_files.
["ledge:cache:http:GET:example.com:/test"] = {
status = 200,
h:Content-Type = "text/html",
h:Expires = "Sat, 23 Jun 2012 12:08:23 GMT",
body = "d41d8cd98f00b204e9800998ecf8427e",
expires = 1340453303, -- A calculated timestamp
max_stale = 3600, -- Additional "stale" time
}
["ledge:cache_obj:d41d8cd98f00b204e9800998ecf8427e"] = [BODY CONTENT RAW]
The cache logic could be expressed as (pseudo):
local x_cache_state = SUBZERO
if cache_meta.exists() then
    x_cache_state = COLD
    if cache_meta.expires() > time.now() then
        x_cache_state = HOT
    elseif cache_meta.expires() + cache_meta.max_stale() > time.now() then
        x_cache_state = WARM
        -- Attempt background refresh
    else
        fetch_from_origin()
    end
else
    fetch_from_origin()
end
That is, if we have a cache key matching the request, then the X-Cache-State is at least COLD
. Otherwise, it is SUBZERO
. If the expires
key is in the future, X-Cache-State is HOT. Otherwise, if expires + max_stale
is in the future, X-Cache-State is WARM and a background refresh should be attempted (still on the TODO list).
If we have X-Cache-State == HOT
and no cache body item, an error has occurred.
Putting the server into offline mode would involve something like:
-- Get all cache objects
local items = ngx.ctx.redis:keys("ledge:cache_obj:*")
for _, v in ipairs(items) do
    ngx.ctx.redis:persist(v) -- Remove expiry
end
And restoring would look something like:
-- Get all items not yet expired
local items = ngx.ctx.redis:zrevrangebyscore("ledge:uris_by_expiry", "+inf", time.now(), "WITHSCORES")
for i = 1, #items, 2 do
    -- items[i] is the cache key, items[i + 1] its expiry timestamp
    ngx.ctx.redis:expire(items[i], items[i + 1] - time.now()) -- Re-set the expiry
end
-- Get all expired items and clean up
local expired_items = ngx.ctx.redis:zrangebyscore("ledge:uris_by_expiry", 0, time.now())
for _, v in ipairs(expired_items) do
    ngx.ctx.redis:del(v)
end
Though this feature needs more discussion generally, I just want to make sure it's possible.
We have two ways of setting configuration in Ledge. Firstly the table of options via the Rack interface:
local opts = {
    proxy_location = '/__ledge/origin',
    redis = {
        host = 'localhost',
        port = 6379,
    }
}
rack.use(ledge, opts)
And the second via the set() function:
ledge.set("cache_key_spec", {
ngx.var.request_method,
--ngx.var.scheme,
ngx.var.host,
ngx.var.uri,
ngx.md5(ngx.var.args)
})
The reasons are slightly historical; it goes back to before ngx_lua had ngx.ctx. But really, even though ngx.ctx is per-request, it probably makes sense for all config data to be stored there. That is, you could have nginx config which allows you to use two different proxy_locations, based on the URI of the request.
So I think we should move all config to the second method: ledge.set(option, value, filter?). I'm still not 100% on the filter table idea, but leaving that aside for now, it'd be nice to agree on some sensible options for all config parameters, so that the simplest case to load Ledge can be two lines of Lua.
local rack = require "resty.rack"
rack.use(require "ledge.ledge")
Default: /__ledge/origin
Renamed from "proxy_location" to be more generic (you might not be proxying).
Default: 127.0.0.1
Default: 6379
Default: nil
connect() will use TCP by default, unless redis_socket is defined.
Default: nil
ngx_lua defaults to 60s, overridable per worker process using the lua_socket_read_timeout directive. Only set this if you want fine-grained control over Redis timeouts (rather than all cosocket connections).
Default: nil
ngx_lua defaults to 60s, overridable per worker process using the lua_socket_keepalive_timeout directive.
Default: nil
ngx_lua defaults to 30, overridable per worker process using the lua_socket_pool_size directive.
Default:
{
ngx.var.request_method,
ngx.var.scheme,
ngx.var.host,
ngx.var.uri,
ngx.var.args
}
local ledge = require "ledge.ledge"
ledge.set("origin_location", "/__ledge/my_special_location_name")
ledge.set("redis_socket", "/tmp/redis.sock")
ledge.set("redis_keepalive_pool_size", 1000)
ledge.set("cache_key_spec", {
ngx.var.request_method,
ngx.var.host,
ngx.var.uri,
ngx.var.args
})
ledge.bind("origin_fetched", function(req, res)
-- Do some things to the responses
end)
local rack = require "resty.rack"
rack.use(ledge)
Any opinions on this, and the defaults chosen?
lua handler aborted: ledge.lua:340: attempt to index field 'ledge' (a nil value)
How much control do we need over this? The current key is:
set $cache_key ledge:cache_obj:$request_method:$scheme:$host:$uri:$query_hash;
Which I like because we end with keys which can easily be filtered in Redis:
redis 127.0.0.1:6379> keys ledge:cache_obj:*:*:example.com:/about*
1) "ledge:cache_obj:GET:http:example.com:/about:"
2) "ledge:cache_obj:GET:http:example.com:/about/contact-us:"
One side effect of baking the cache key in code is that you can't abstract certain elements to increase cache hit rates (at the expense of collisions).
For example, if you're doing SSL termination at Nginx, your cache entries for HTTP and HTTPS might be identical. There's no point in splitting the cache; just remove $scheme from the cache key and you'll get hits across both.
Some people might even find that useful for $host.
Currently when we revalidate upstream, if there are no client validators we create them from the last cache entry (as per the RFC). However, by leaving things in this state, Nginx can return 304 to a client which has no cache item (i.e. didn't send a validator), yielding an empty body.
Client validators (including the lack of) should be restored after revalidating upstream. Ideally we'd just add those headers in for the upstream request, but OpenResty doesn't currently support this.
It looks like nginx will happily pass any kind of request method through to certain location blocks, including content_by_lua blocks.
Ledge is passing on ngx['HTTP_' .. ngx.req.get_method()] as the method to the origin, which returns nil and defaults to GET.
Vanilla nginx serving files off disk does this:
https://gist.github.com/da66b92c168528a3f5dd
Ledge does this:
https://gist.github.com/0ab5b15320e635b5ad7e
We should probably check that ngx['HTTP_' .. ngx.req.get_method()] returns something valid before proxying.
2012/06/26 16:11:43 [error] 11412#0: *12489 lua handler aborted: runtime error: /home/jhurst/prj/ledge/lib/ledge/ledge.lua:135: attempt to index field 'keepalive' (a nil value)
stack traceback:
/home/jhurst/prj/ledge/lib/ledge/ledge.lua: in function 'redis_close'
/home/jhurst/prj/ledge/lib/ledge/ledge.lua:113: in function 'mw'
Related to #11.