vcl-cache-validation

VCL scripts for Varnish Cache to enforce cache content validation.

Goal

The goal of this project is to provide VCL (Varnish Configuration Language) scripts that help turn Varnish into a reverse-proxy cache which validates its cached content against the backend, the way a client-side cache does.

Motivation

The main motivation for this was to have a caching layer around an internal HTTP service we have at my employer, YouGov. This system uses Shoji for object representation and manipulation. For this type of system, the biggest performance gains come from taking advantage of caching.

We want to achieve this by providing clients with a cache that can be used as a client-focused cache (letting the cache know what's best when it comes to returning content), but that can also be used as a server-side cache.

Backstory

Most clients using this system demand that the data be fresh. When documents are modified using PUT requests, this should invalidate any cached representation of that document. Furthermore, fulfilling GET requests to this system can be expensive, so we should take advantage of caching where possible. If a client has a cached representation of a document and can ask the server to validate it, and the server can indicate whether that representation is still valid without going through the expense of regenerating it, then that is a "good thing".

In our real-world use case, being able to avoid generating responses for some resources, while still being able to validate that they were up to date, is a very desirable goal. To reduce the amount of document generation (by virtue of caching), my suggestion was to put a caching layer in front of the server, rather than relying on every client being capable of caching. This would reduce the amount of work for everyone who wants to use the service, and a cache shared by all clients would mean a larger proportion of cache hits.
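A shared cache of this kind can be sketched as follows. This is a minimal Python model, not the Varnish implementation. The origin is abstracted as a hypothetical callable that takes a path and request headers. Every request is revalidated with `If-None-Match`, so the backend only regenerates a body when the cached copy is out of date.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], bytes]
# Hypothetical origin interface: (path, request headers) -> response.
Origin = Callable[[str, Dict[str, str]], Response]


class ValidatingCache:
    """Shared cache that revalidates every GET with the origin server."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.store: Dict[str, Tuple[str, bytes]] = {}  # path -> (ETag, body)

    def get(self, path: str) -> Response:
        cached = self.store.get(path)
        request_headers: Dict[str, str] = {}
        if cached is not None:
            # Ask the origin to validate our copy instead of regenerating it.
            request_headers["If-None-Match"] = cached[0]
        status, headers, body = self.origin(path, request_headers)
        if status == 304 and cached is not None:
            # Still valid: answer from the cache.
            return 200, {"ETag": cached[0]}, cached[1]
        if status == 200 and "ETag" in headers:
            self.store[path] = (headers["ETag"], body)
        return status, headers, body

    def invalidate(self, path: str) -> None:
        """Drop the cached copy, e.g. when a PUT for this path passes through."""
        self.store.pop(path, None)
```

Revalidating on every request keeps the data fresh, at the cost of one round trip to the origin per request. The saving comes from the origin answering 304 instead of regenerating the document.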

I looked at various systems to achieve this, and decided that Varnish would be a good fit, as it seemed much more configurable than other systems like Nginx. I also read somewhere that Varnish would perform cache validation on each request, so that seemed to be what we were after.

After a lot of experimentation, I realised that Varnish didn't do this at all. In fact, Varnish isn't even a caching proxy as described in RFC2616 - it is more of a basic (yet very configurable and fast) accelerator for content.

So I then spent some time trying to see if I could implement this sort of validation. After a lot of experimentation, I could, even though Varnish didn't make it easy (mainly because certain values or information are unavailable until much later in the request-processing workflow). The results are available in this repository.

Contributors

the-allanc

vcl-cache-validation's Issues

Create test suite.

This will need to be done fairly early in the process to ensure everything works as expected.

Figure out a way around If-Modified-Since headers not being passed to the backend

This has disheartened me a bit. Varnish won't pass validation headers on to the backend server (well, at least not If-Modified-Since). It took a while to figure out, but it explicitly strips out If-Modified-Since headers in cache_esi_deliver.c.

(Update: And If-None-Match headers too. Crap. Can't figure out where the code to do this is though.)
(Further update: Actually, all known validating headers seem to be stripped out.)
(More debugging later: I think it's all defined in http_headers.h. That's a tidy way to do it.)

So Varnish strips out these headers, which means I can't pass them along to the backend server. I think I'll have to find the best way to transmit these headers in the request (maybe just by adding X- prefixes to all of them).
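The renaming workaround can be modelled as follows. In the real setup, the copying would happen in VCL on the Varnish side and be undone in the backend. The Python below only illustrates the logic. The `X-` prefix and the function names `tunnel` and `untunnel` are assumptions made for illustration.

```python
# Validating request headers (the RFC 7232 conditional headers plus If-Range).
VALIDATING_HEADERS = ("If-Modified-Since", "If-None-Match", "If-Match",
                      "If-Unmodified-Since", "If-Range")
PREFIX = "X-"  # hypothetical prefix; any header name Varnish leaves alone would do


def tunnel(headers):
    """Proxy side: copy validating headers under prefixed names.

    Assumes canonical header casing; real HTTP header names are case-insensitive.
    """
    out = dict(headers)
    for name in VALIDATING_HEADERS:
        if name in headers:
            out[PREFIX + name] = headers[name]
    return out


def untunnel(headers):
    """Backend side: drop the prefixed copies and restore the original names."""
    out = {k: v for k, v in headers.items() if not k.startswith(PREFIX + "If-")}
    for name in VALIDATING_HEADERS:
        if PREFIX + name in headers:
            out[name] = headers[PREFIX + name]
    return out
```

Even if the proxy strips the original validating headers after `tunnel`, `untunnel` on the backend recovers the headers the client sent.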

On a side note, there is the experimental-ims branch of Varnish, which provides full support for this sort of thing and is probably a nicer alternative to any scripts that I write here.

However, that branch appears to be at least a year out of date, and it isn't entirely clear whether the API it uses will change. Varnish 4 might be out relatively soon, but it's still not clear whether it will incorporate this support. So I'll persist with this approach for now.

Does hash_always_miss mean we can simplify scripts?

Some of the approaches for dealing with Varnish centre around the fact that updating a cached document is difficult.

However, this page seems to indicate that setting req.hash_always_miss will update the cache. The question is, does this cause the content to be purged before vcl_fetch has a chance to examine the beresp object?

If the content is purged, then we're still in the same boat: we can't test whether a document has been modified without sacrificing the original cached content. But if we can examine the response without fear of losing the original cached document, that would simplify the script somewhat and allow us to drop a number of the hacks I've come up with.
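Suppose the original cached object does survive until the fetched response can be examined. Then the check itself is simple: compare validators on the cached and freshly fetched responses, and replace the cached copy only when they differ. Below is a Python sketch of that decision. It uses a hypothetical `CachedObject` type and a plain dict standing in for the cache. Whether hash_always_miss actually preserves the original object is the open question above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class CachedObject:
    etag: Optional[str]
    last_modified: Optional[str]  # HTTP-date string, compared verbatim here
    body: bytes


def is_modified(cached: CachedObject, fresh: CachedObject) -> bool:
    """Prefer ETags, fall back to Last-Modified, else assume it changed."""
    if cached.etag and fresh.etag:
        return cached.etag != fresh.etag
    if cached.last_modified and fresh.last_modified:
        return cached.last_modified != fresh.last_modified
    return True


def refresh(cache: Dict[str, CachedObject], key: str,
            fresh: CachedObject) -> Tuple[CachedObject, bool]:
    """Swap in the fresh object only if it differs; return (object, modified)."""
    cached = cache.get(key)
    if cached is not None and not is_modified(cached, fresh):
        return cached, False  # keep the original cached copy untouched
    cache[key] = fresh
    return fresh, True
```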

I would have expected to come across this during my research, so I do find it a bit surprising. Anyway, let's check.

Move to "go" from Python

The current prototype web server I have been experimenting with uses CherryPy. Before I begin work on the test suite, I'd like to move to Go, just to give myself a chance to experiment with it. It may work or it may fail, but let's give it a try!
