ledgetech / ledge
An RFC compliant and ESI capable HTTP cache for Nginx / OpenResty, backed by Redis
The Age response-header field conveys the sender's estimate of the amount of time since the response (or its revalidation) was generated at the origin server.
This information is used by intermediate proxies to convey an estimate of how old a stored response is:
HTTP/1.1 requires origin servers to send a Date header, if possible, with every response, giving the time at which the response was generated (see section 14.18). We use the term "date_value" to denote the value of the Date header, in a form appropriate for arithmetic operations.
HTTP/1.1 uses the Age response-header to convey the estimated age of the response message when obtained from a cache. The Age field value is the cache's estimate of the amount of time since the response was generated or revalidated by the origin server.
Furthermore, the Age header field is intended to be used by intermediate caches only:
The presence of an Age header field in a response implies that a response is not first-hand.
The presence of the header field Age: 0 therefore means that the received response was sent by an intermediate cache and is zero seconds old; it was probably fetched from the origin server just before being sent to the client.
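The RFC 2616 (section 13.2.3) age calculation can be sketched in plain Lua; the function name and parameters here are illustrative, not Ledge's API:

```lua
-- Sketch of the RFC 2616 section 13.2.3 current_age calculation.
-- age_value: the Age header from upstream (0 if absent)
-- date_value: the origin's Date header, as a Unix timestamp
-- request_time/response_time: when we sent the request / got the response
local function current_age(age_value, date_value, request_time, response_time, now)
  local apparent_age = math.max(0, response_time - date_value)
  local corrected_received_age = math.max(apparent_age, age_value)
  local response_delay = response_time - request_time
  local corrected_initial_age = corrected_received_age + response_delay
  local resident_time = now - response_time
  return corrected_initial_age + resident_time
end
```

The result is what a cache should send in its own Age header when serving the stored response.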
The precedence is broken: if you set a config option to true during init_by_lua, and then attempt to override it to false in a subsequent phase, config_get() falls through to the initial value, because false is assumed to be nil.
2012/06/26 10:46:56 [error] 7428#0: *11891 lua handler aborted: runtime error: /home/jhurst/prj/ledge/lib/ledge/ledge.lua:263: Unexpeted reply from Redis when trying to save
stack traceback:
[C]: in function 'save'
/home/jhurst/prj/ledge/lib/ledge/ledge.lua:263: in function 'fetch'
/home/jhurst/prj/ledge/lib/ledge/ledge.lua:104: in function 'mw'
/home/jhurst/prj/lua-resty-rack/lib/resty/rack.lua:92: in function 'next'
...e/jhurst/prj/lua-resty-rack/lib/resty/rack/read_body.lua:9: in function 'mw'
/home/jhurst/prj/lua-resty-rack/lib/resty/rack.lua:92: in function 'next'
/home/jhurst/prj/lua-resty-rack/lib/resty/rack.lua:79: in function 'run'
This is because a 0 ttl doesn't affect res.cacheable(). Irrespective of Cache-Control headers, if we have no ttl, we can't cache anything. We should still allow for responses with no cache headers, and a config with a max_stale (still on the TODO).
If an item expires, all is safe. But if a force refresh re-caches an item, the hash fields will be merged rather than replaced, resulting in potentially inconsistent behaviour.
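The merge-vs-replace behaviour is easy to see with plain Lua tables standing in for the Redis hash (helper names hypothetical); calling DEL before re-saving restores the replace semantics a force refresh needs:

```lua
-- Simulating Redis HMSET semantics with plain Lua tables.
-- HMSET on an existing hash merges fields; it never removes old ones.
local store = {}

local function hmset(key, fields)
  local hash = store[key] or {}
  for k, v in pairs(fields) do hash[k] = v end
  store[key] = hash
end

local function del(key) store[key] = nil end

hmset("cache:item", { status = 200, ["h:ETag"] = "abc" })

-- A force refresh that just calls HMSET again leaves stale fields behind:
hmset("cache:item", { status = 200 })
assert(store["cache:item"]["h:ETag"] == "abc") -- stale field survived the merge

-- DEL before HMSET gives replace semantics:
del("cache:item")
hmset("cache:item", { status = 200 })
assert(store["cache:item"]["h:ETag"] == nil)
```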
We need to support some way of purging URLs from cache other than sending requests with 'Cache-Control: no-cache'.
Preferably with some way to purge URLs based on a pattern.
e.g. PURGE for www.example.com/foo/bar will also clear the cache for www.example.com/foo/bar/?abc=123
This should probably be done with Redis sets to avoid the potentially huge performance issues of using the KEYS
command in Redis.
The set of related keys should probably be configurable somehow, but maybe just relating query strings is good enough initially?
nginx/openresty will pass requests with a PURGE method through to Lua, and we should be able to use nginx's access control configuration options to secure it, so that doesn't need to be done in Ledge.
e.g.
if ($request_method = 'PURGE') {
    rewrite (.*) /_purge/$1 last;
}

location /_purge {
    internal;
    allow 127.0.0.1;
    deny all;
    content_by_lua ' ... ';
}
edit:
Compatibility with Squid's purge method would be a good thing here I think.
I have a situation where I need subrequests that may be ledge requests as well. Should the options, etc., be done on a per-request ledge table/object? Something like
local ledge = require "ledge.ledge"
local l = ledge.create()
l.set(......)
l.bind(...)
rack.use(l, { proxy_location = "/__ledge/example.com" })
rack.run()
This may also allow "caching" the ledge "objects" in init_by_lua. Perhaps you have named ledge objects, i.e. you pass a key to create and can later fetch the object by key.
Currently Via shows the host, which doesn't really help. It should be ngx.var.hostname instead.
Maintenance mode loads whatever you supply at a given location, but really we should 302 to avoid bots indexing the wrong content.
Currently revalidation is spawned in a thread before sending the response. It then yields on i/o, returning to the parent thread and finishing the response to the client before returning to the background work.
If we have ESI fragments though, the scheduler yields on the ESI i/o, which returns to the background job, which blocks until complete.
For background work to happen reliably, it must be started after ngx.eof().
Deleting a cache entry changes some logic, such as deciding if it's ok to collapse requests, since we don't know if the item was previously cacheable.
We should instead expire the item on purge, as this is more useful.
Currently we can use the X-Cache and X-Cache-State response headers in the nginx access log to get a decent idea of what happened with each request.
However, for requests that are uncacheable (those with Cache-Control: no-cache, for example) those headers aren't sent, and thus we just get an empty field in the log file.
It would also be nice to be able to log more detailed information without filling the response with headers: things like whether this was a collapsed request, or a 'negative hit', etc.
So maybe if we can push a state variable back to nginx and just add that to the log format?
Setting a new cache_key_spec results in strange keys, often with repeated prefixes:
ledge:cache_obj:ledge:cache_obj:ledge:cache_obj:
Not sure why as yet.
Sometimes it is desirable to serve known stale content. We need a configuration mechanism to express how stale things can be, probably as simple as:
ledge:config_set("max_stale", seconds | percentage)
If a number is provided, this will be added to the ttl to create a stale ttl. If a string is provided, in the format \d%, the ttl is extended via the percentage given.
Stale content should mark the response as RESPONSE_STATE_WARM, and add the warning header 110 Response is stale (see #32).
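A minimal sketch of the proposed parsing, assuming stale_ttl is a hypothetical helper and max_stale is either a number of seconds or a string like "50%":

```lua
-- Hypothetical helper: turn a max_stale setting into a stale ttl.
-- A plain number extends the ttl additively; a "\d+%" string extends
-- it proportionally.
local function stale_ttl(ttl, max_stale)
  local pct = type(max_stale) == "string" and max_stale:match("^(%d+)%%$")
  if pct then
    return ttl + math.floor(ttl * tonumber(pct) / 100)
  end
  return ttl + max_stale
end
```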
This seems nicer than sending a post across an internal subrequest.
We need to be able to revalidate/refresh cached responses when they are served stale, without blocking the current request and without delaying further requests on the same connection.
ngx.on_abort only fires if the client actually cancels the request, so this isn't really that helpful:
http://wiki.nginx.org/HttpLuaModule#ngx.on_abort
ngx.thread allows us to trigger a background fetch and re-save without blocking the serving of the current request
http://wiki.nginx.org/HttpLuaModule#ngx.thread.spawn
However, any additional requests on the same connection wait until all threads from the previous request have finished. So that's not great either.
Basic example:
location /foo {
    content_by_lua '
        function bar()
            ngx.sleep(5)
            print("bar")
        end
        print("request start")
        ngx.thread.spawn(bar)
        ngx.say("foo")
        ngx.eof()
        print("request done")
    ';
}
Two requests back to back, with a keepalive connection between them:
https://gist.github.com/417457900b51d7083acf
The 'Cache-Control' header is normally sent with uppercase C's, but the accepts_cache function is looking for 'cache-control'.
Maybe the headers table should be normalised to all-lowercase keys to avoid any issues with case?
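A normalisation pass could be sketched like this (helper name hypothetical); after it, lookups can always use lowercase keys:

```lua
-- Sketch: normalise a headers table to lowercase keys so lookups
-- like headers["cache-control"] work regardless of the sent casing.
local function normalise_headers(headers)
  local out = {}
  for k, v in pairs(headers) do
    out[string.lower(k)] = v
  end
  return out
end
```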
This came up in discussion about Squid the other day but the short version is that Ledge needs to honour the Vary header when generating a cache key.
Vary is set by the origin server and specifies some request headers that should be used to determine if the cached response is valid.
Basically we need to include any request headers listed in the Vary response header in Ledge's cache key.
e.g. Vary: Accept-Encoding means we have a different cache key for requests that specify 'Accept-Encoding: gzip' vs those that don't.
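A sketch of a Vary-aware key (helper name hypothetical, and assuming the request headers table already has lowercase keys):

```lua
-- Sketch: append the values of any request headers named in the Vary
-- response header to the base cache key, so variants cache separately.
local function vary_cache_key(base_key, vary_header, req_headers)
  if not vary_header then return base_key end
  local parts = { base_key }
  for field in vary_header:gmatch("[^,%s]+") do
    parts[#parts + 1] = req_headers[string.lower(field)] or ""
  end
  return table.concat(parts, ":")
end
```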
Just reading the w3 specs on this makes my head hurt, good luck James!
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.44
http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.6
http://mark.koli.ch/2010/09/understanding-the-http-vary-header-and-caching-proxies-squid-etc.html
nginx, luajit, zeromq, zma_lua, nginx_lua, redis, etc. Minimum versions need to be ascertained.
There's a window between failing to obtain the collapse forwarding lock (i.e. another request is currently fetching), and completing the SUBSCRIBE where the PUBLISH could have already happened (which means we never hear about the collapsed response).
This is handled with a transaction, so a key is watched beforehand, and if it changes before SUBSCRIBE then we assume we missed the PUBLISH, and thus we go back to checking the cache.
We need test coverage for this though. Might not be trivial.
Mark Nottingham's suggestions for stale-if-error and stale-while-revalidate seem sensible. Currently on an upstream error we have no choice but to pass this to the client. Serving stale would be nice.
Set-Cookie should not be returned on a cache hit!
Currently doing this in the nginx conf:
ledge.bind("response_ready", function(req, res)
    local state = res.header["X-Cache"]
    if state and state == "HIT" then
        res.header["Set-Cookie"] = nil
    end
end)
This should probably just be a built-in feature of Ledge, though.
It should be harder to accidentally set an unknown parameter, and in some cases to set a parameter with a mistyped or out-of-range value. With defaults currently being nil in some cases, this isn't easy.
Looks like we're assuming ngx.ctx.ledge.event exists.
2012/06/26 10:15:50 [error] 6995#0: *11806 lua handler aborted: runtime error: /home/jhurst/prj/ledge/lib/ledge/ledge.lua:81: attempt to index field 'ledge' (a nil value)
stack traceback:
/home/jhurst/prj/ledge/lib/ledge/ledge.lua: in function 'mw'
/home/jhurst/prj/lua-resty-rack/lib/resty/rack.lua:92: in function 'next'
...e/jhurst/prj/lua-resty-rack/lib/resty/rack/read_body.lua:9: in function 'mw'
/home/jhurst/prj/lua-resty-rack/lib/resty/rack.lua:92: in function 'next'
/home/jhurst/prj/lua-resty-rack/lib/resty/rack.lua:79: in function 'run'
Multiple headers of the same name are represented as a table in ngx_lua.
Ledge currently tries to pass this table into the Redis hmset, which throws an HTTP 500 error.
We need to flatten this table and restore it on cache read.
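A sketch of the flatten/restore round trip (helper names and the newline delimiter are assumptions; raw newlines can't appear inside a header value, so it's a safe separator):

```lua
-- Sketch: flatten a multi-value header (a Lua table in ngx_lua) into a
-- single delimited string for the Redis hash, and split it back on read.
local SEP = "\n" -- assumed-safe separator: header values cannot contain it

local function flatten_header(value)
  if type(value) == "table" then
    return table.concat(value, SEP)
  end
  return value
end

local function restore_header(value)
  if value:find(SEP, 1, true) then
    local out = {}
    for v in value:gmatch("[^" .. SEP .. "]+") do out[#out + 1] = v end
    return out
  end
  return value
end
```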
This function (which determines if the client will accept a cached response) is still just a stub, and must be fleshed out in accordance with the HTTP spec.
This is now redundant; set() should conditionally use ngx.ctx depending on ngx.get_phase().
The line res.header["Pragma"] = nil doesn't appear to work for me. I'm not sure exactly why, but changing it to "" fixed the problem.
Sorry for the lack of details!
This allows the origin to be protected if there's just no way it would cope without cache.
We need to handle the various revalidation / reload controls available correctly. After closing #25 it's become clear that actually greater distinction between a reload and revalidation is required.
There are three broad scenarios to cover. For more details read: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.4
This occurs when the request does not accept a cacheable response under any circumstances, usually with Cache-Control: no-cache. We should always go to the origin, and the cache state should be RELOADED.
The request specifies Cache-Control: max-age=0 and one or more conditional validators, such as If-Modified-Since: {HTTP_Date_Stamp}, If-None-Match: {Etag}, etc. It means that we SHOULD validate against our internal cache. If it is still valid, we can return 304 Not Modified.
The cache state should be either HOT or WARM.
If our copy is not valid against the validator provided, we revalidate upstream (to another cache or the origin). We should use our own validator (e.g. If-Modified-Since based on the Last-Modified in our cache). If we get 304 back from upstream, we can safely return 200 to the client with our copy, since we now know it is considered valid. If not, we can compare the new 200 response against the validator supplied in the request. If the new response meets the client requirements, we can update our cache and return 304 to avoid transfer to the client. Otherwise we return 200 with the new response.
The cache state should be REVALIDATED.
The request specifies Cache-Control: max-age=0 but no validator. This means we must revalidate upstream, using our cached validator (Last-Modified) if present.
The cache state should be REVALIDATED.
These are defined here: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.25
If more than one validator is supplied, they must all be considered. That is, If-None-Match does not override If-Modified-Since when both are present; both conditions must be run. Also, if more than one etag is provided in an If-None-Match conditional, then they must all be considered.
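A sketch of the multi-etag check (helper name hypothetical); the condition matches if any etag in the list equals the cached one, or if * is given:

```lua
-- Sketch: evaluate an If-None-Match header that may list several etags.
-- Returns true when the condition matches (i.e. we may respond 304).
local function if_none_match(header_value, cached_etag)
  if header_value == "*" then return true end
  for etag in header_value:gmatch("[^,%s]+") do
    if etag == cached_etag then return true end
  end
  return false
end
```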
Nginx will deal with setting an appropriate Content-Length header for 1.0 clients. It definitely shouldn't be cached; if it is cached, the output body length will likely be different, resulting in incomplete page loads.
Rather than enabling ESI via an event hook, it may be better to simply have a configuration option.
ledge:config_set("esi_enabled", true | false)
If the option is enabled, we can look for ESI instructions in the representation prior to saving, and store additional flags about the types of ESI instructions included as metadata.
Then during the fast path, having read from cache, we choose to process ESI instructions as a function of the config option being enabled, and ESI metadata being present.
The metadata can be finely grained, one flag for each type of instruction (comment removal, esi:remove, esi:include and any others we support over time). This ensures the fast path only performs the regex required.
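The save-time scan could be sketched like this (flag and helper names hypothetical); each flag maps to one instruction type, so the fast path can skip the regexes for instructions that aren't present:

```lua
-- Sketch: scan a response body once at save time and record which ESI
-- instruction types it contains, stored alongside the cache metadata.
local function esi_flags(body)
  return {
    has_comments = body:find("<!--esi", 1, true) ~= nil,
    has_remove   = body:find("<esi:remove", 1, true) ~= nil,
    has_include  = body:find("<esi:include", 1, true) ~= nil,
  }
end
```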
Unless I'm missing something, there isn't a way to bind to Ledge once the request is received but before Ledge has started processing it; that is, to modify the incoming request before Ledge has checked whether it's cacheable, determined the cache key, etc.
This would be good for things like stripping out no-cache headers from the request for certain IPs or URIs.
It seemed sensible to ensure cache keys are unique for the current request method. In reality though, a POST response with cacheable headers present MAY be used for subsequent GET / HEAD requests at the same URI.
Similarly, if a GET request for a URI results in revalidation, the next HEAD request for this URI should have the latest validation data.
So rather than splitting the cache between request methods, we should have single items per URI, and introduce logic to ensure that, for example, a HEAD request which reaches the origin doesn't override the body for the item (since it will be empty).
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.46
Multiple warning headers MAY be set. This is a list of the currently-defined warn-codes, each with a recommended warn-text in English, and a description of its meaning.
110 Response is stale: MUST be included whenever the returned response is stale.
111 Revalidation failed: MUST be included if a cache returns a stale response because an attempt to revalidate the response failed, due to an inability to reach the server.
112 Disconnected operation: SHOULD be included if the cache is intentionally disconnected from the rest of the network for a period of time.
113 Heuristic expiration: MUST be included if the cache heuristically chose a freshness lifetime greater than 24 hours and the response's age is greater than 24 hours.
199 Miscellaneous warning: The warning text MAY include arbitrary information to be presented to a human user, or logged. A system receiving this warning MUST NOT take any automated action, besides presenting the warning to the user.
214 Transformation applied: MUST be added by an intermediate cache or proxy if it applies any transformation changing the content-coding (as specified in the Content-Encoding header) or media-type (as specified in the Content-Type header) of the response, or the entity-body of the response, unless this Warning code already appears in the response.
299 Miscellaneous persistent warning: The warning text MAY include arbitrary information to be presented to a human user, or logged. A system receiving this warning MUST NOT take any automated action.
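Building the Warning header could be sketched like this (the "ledge" warn-agent is an assumption); multiple warnings join into a comma-separated list per the grammar in section 14.46:

```lua
-- Sketch: compose a Warning response header per RFC 2616 section 14.46.
-- Each warning is 'warn-code warn-agent "warn-text"'.
local function add_warning(headers, code, text)
  local warning = string.format('%d ledge "%s"', code, text)
  if headers["Warning"] then
    headers["Warning"] = headers["Warning"] .. ", " .. warning
  else
    headers["Warning"] = warning
  end
end
```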
The events from Ledge are useful, but it would be nice to have a system where modules can be supplied to Ledge as plugins, which make use of the events as they see fit. Something like:
local plugin = require "my.ledge.plugin"
ledge.use(plugin)
The plugin would call ledge.bind as required to access the events.
This helps distinguish between expired cacheable responses, and those which should simply be proxied and not cached.
Currently any work scheduled can easily get duplicated.
http://host/page is fine, but http://host attempts to redirect to http://hosthttp//host.
Explicitly removing the trailing slash in code works, but that's just masking the issue.
When the origin does not return cache-control, it results in the following error:
2012/06/26 10:25:56 [error] 17254#0: *1 lua handler aborted: runtime error: squiz_edge_installs/ledge/lib/ledge/ledge.lua:200: attempt to index field 'Cache-Control' (a nil value)
By default, we return 200 if a fragment errors. We should instead pass a 5xx error on to the parent resource, unless onerror="continue" is specified.
If continue is specified, the cache lifetime of the parent resource should be explicitly minimised to 0. This ensures that even though the resource is cached for us, browsers and downstream intermediaries will keep coming back for a fresh copy until the error is resolved.
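For reference, the markup in question looks like this (the src URL is illustrative):

```html
<!-- Fragment errors here should bubble up as a 5xx on the parent: -->
<esi:include src="/fragments/news" />

<!-- With onerror="continue", the parent renders without the fragment,
     but its ttl should be forced to 0 until the error clears: -->
<esi:include src="/fragments/news" onerror="continue" />
```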
The following (so-called "hop-by-hop") headers are not cacheable, according to http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.5.1
Basic idea for this is to use SETNX to determine if a request is already going to the origin.
Then redis PUBSUB to know when the first request has finished.
Should we be collapsing all requests to the origin or just those requests that are accepting cache?
If we collapse 'no-cache' requests is it acceptable to return the cached (but just saved) content?
This is going to strip certain headers etc and seems like a bad idea to me.
Items are stored as a Redis hash, along these lines:
["ledge:cache_obj:http:GET:example.com:/test"] = {
status = 200,
h:Content-Type = "text/html",
h:Expires = "Sat, 23 Jun 2012 12:08:23 GMT",
body = [BODY CONTENT RAW],
uri = "http://example.com/test"
}
The hash itself has EXPIRE set to the ttl of the item, based on cache headers etc. The code looks like this:
ngx.ctx.redis:hmset(ngx.ctx.ledge.cache_key,
'body', res.body,
'status', res.status,
'uri', req.uri_full,
unpack(h)
)
ngx.ctx.redis:expire(ngx.ctx.ledge.cache_key, res.ttl())
In terms of control data, we have a sorted set called ledge:uris_by_expiry
which allows us to index cache keys by their expiry timestamp.
COLD and SUBZERO states. I'm thinking that we stop expiring the main cache keys; instead they persist as metadata. The body becomes a separate (volatile) key, referenced by a hash of the content, which will be handy for validation patterns. We could also add an option for large files to be stored on the filesystem, accessed via an ngx.exec() to a location containing try_files.
["ledge:cache:http:GET:example.com:/test"] = {
status = 200,
h:Content-Type = "text/html",
h:Expires = "Sat, 23 Jun 2012 12:08:23 GMT",
body = "d41d8cd98f00b204e9800998ecf8427e",
expires = 1340453303, -- A calculated timestamp
max_stale = 3600, -- Additional "stale" time
}
["ledge:cache_obj:d41d8cd98f00b204e9800998ecf8427e"] = [BODY CONTENT RAW]
The cache logic could be expressed as (pseudo):
local x_cache_state = SUBZERO
if cache_meta.exists() then
    x_cache_state = COLD
    if cache_meta.expires() > time.now() then
        x_cache_state = HOT
    elseif cache_meta.expires() + cache_meta.max_stale() > time.now() then
        x_cache_state = WARM
        -- Attempt background refresh
    else
        fetch_from_origin()
    end
else
    fetch_from_origin()
end
That is, if we have a cache key matching the request, then the X-Cache-State is at least COLD
. Otherwise, it is SUBZERO
. If the expires
key is in the future, X-Cache-State is HOT. Otherwise, if expires + max_stale
is in the future, X-Cache-State is WARM and a background refresh should be attempted (still on the TODO list).
If we have X-Cache-State == HOT
and no cache body item, an error has occurred.
Putting the server into offline mode would involve something like:
-- Get all cache objects
local items = ngx.ctx.redis:keys("ledge:cache_obj:*")
for _, v in ipairs(items) do
    ngx.ctx.redis:persist(v) -- Remove expiry
end
And restoring would look something like:
-- Get all items not yet expired
local items = ngx.ctx.redis:zrevrangebyscore("ledge:uris_by_expiry", "+inf", time.now(), "WITHSCORES")
for i = 1, #items, 2 do
    -- items[i] is the cache key, items[i + 1] its expiry timestamp
    ngx.ctx.redis:expire(items[i], items[i + 1] - time.now()) -- Re-set the expiry
end
-- Get all expired items and clean up
local expired_items = ngx.ctx.redis:zrangebyscore("ledge:uris_by_expiry", 0, time.now())
for _, v in ipairs(expired_items) do
    ngx.ctx.redis:del(v)
end
Though this feature needs more discussion generally, I just want to make sure it's possible.
We have two ways of setting configuration in Ledge. Firstly the table of options via the Rack interface:
local opts = {
    proxy_location = '/__ledge/origin',
    redis = {
        host = 'localhost',
        port = 6379,
    }
}
rack.use(ledge, opts)
And the second via the set() function:
ledge.set("cache_key_spec", {
ngx.var.request_method,
--ngx.var.scheme,
ngx.var.host,
ngx.var.uri,
ngx.md5(ngx.var.args)
})
The reasons are slightly historical; it goes back to before ngx_lua had ngx.ctx. But really, even though ngx.ctx is per-request, it probably makes sense for all config data to be stored there. That is, you could have nginx config which allows you to use two different proxy_locations, based on the URI of the request.
So I think we should move all config to the second method: ledge.set(option, value, filter?). I'm still not 100% on the filter table idea, but leaving that aside for now, it'd be nice to agree on some sensible options for all config parameters, so that the simplest case to load Ledge can be two lines of Lua.
local rack = require "resty.rack"
rack.use(require "ledge.ledge")
Default: /__ledge/origin
Renamed from "proxy_location" to be more generic (you might not be proxying).
Default: 127.0.0.1
Default: 6379
Default: nil
connect() will use TCP by default, unless redis_socket is defined.
Default: nil
ngx_lua defaults to 60s, overridable per worker process using the lua_socket_read_timeout directive. Only set this if you want fine-grained control over Redis timeouts (rather than all cosocket connections).
Default: nil
ngx_lua defaults to 60s, overridable per worker process using the lua_socket_keepalive_timeout directive.
Default: nil
ngx_lua defaults to 30, overridable per worker process using the lua_socket_pool_size directive.
Default:
{
ngx.var.request_method,
ngx.var.scheme,
ngx.var.host,
ngx.var.uri,
ngx.var.args
}
local ledge = require "ledge.ledge"
ledge.set("origin_location", "/__ledge/my_special_location_name")
ledge.set("redis_socket", "/tmp/redis.sock")
ledge.set("redis_keepalive_pool_size", 1000)
ledge.set("cache_key_spec", {
ngx.var.request_method,
ngx.var.host,
ngx.var.uri,
ngx.var.args
})
ledge.bind("origin_fetched", function(req, res)
-- Do some things to the responses
end)
local rack = require "resty.rack"
rack.use(ledge)
Any opinions on this, and the defaults chosen?
lua handler aborted: ledge.lua:340: attempt to index field 'ledge' (a nil value)
How much control do we need over this? The current key is:
set $cache_key ledge:cache_obj:$request_method:$scheme:$host:$uri:$query_hash;
Which I like because we end with keys which can easily be filtered in Redis:
redis 127.0.0.1:6379> keys ledge:cache_obj:*:*:example.com:/about*
1) "ledge:cache_obj:GET:http:example.com:/about:"
2) "ledge:cache_obj:GET:http:example.com:/about/contact-us:"
One side effect of baking the cache key in code is that you can't abstract certain elements to increase cache hit rates (at the expense of collisions).
For example, if you're doing SSL termination at Nginx, your cache entries for HTTP and HTTPS might be identical. There's no point in splitting the cache; just remove $scheme from the cache key and you'll get hits across both.
Some people might even find that useful for $host.
Currently when we revalidate upstream, if there are no client validators we create them from the last cache entry (as per the RFC). However, by leaving things in this state, Nginx can return 304 to a client which has no cache item (i.e. didn't send a validator), yielding an empty body.
Client validators (including the lack of) should be restored after revalidating upstream. Ideally we'd just add those headers in for the upstream request, but OpenResty doesn't currently support this.
It looks like nginx will happily pass any kind of request method through to certain location blocks, including content_by_lua blocks.
Ledge is passing on ngx['HTTP_' .. ngx.req.get_method()] as the method to the origin, which returns nil and defaults to GET.
Vanilla nginx serving files off disk does this:
https://gist.github.com/da66b92c168528a3f5dd
Ledge does this:
https://gist.github.com/0ab5b15320e635b5ad7e
We should probably check that ngx['HTTP_' .. ngx.req.get_method()] returns something valid before proxying.
2012/06/26 16:11:43 [error] 11412#0: *12489 lua handler aborted: runtime error: /home/jhurst/prj/ledge/lib/ledge/ledge.lua:135: attempt to index field 'keepalive' (a nil value)
stack traceback:
/home/jhurst/prj/ledge/lib/ledge/ledge.lua: in function 'redis_close'
/home/jhurst/prj/ledge/lib/ledge/ledge.lua:113: in function 'mw'
Related to #11.