Richie Request Router, or rrrouter
, is a HTTP reverse proxy.
It accepts incoming HTTP requests, chooses the first matching forwarding rule from an ordered ruleset, and proxies the request, performing minimal header rewrites in the process.
This repository contains a Dockerfile
that can be used to build and run rrrouter, also in production. Note that the image only includes the richie-request-router
binary, not the config utility. To update the config, build rrrouter locally.
In your working copy, you can build the Docker image like this:
docker build -t rrrouter .
Either the MAPPING_FILE
or MAPPING_URL
environment variable is mandatory. A complete invocation example is thus:
docker run \
-e MAPPING_URL=[latest mapping config url] \
--publish 8080:5000 \
rrrouter
An optional MAPPING_CHECK_INTERVAL can be given in the environment and when set to non-zero, will check the file or URL every N seconds for changes.
rrrouter is written in Go. It uses the go module system.
To compile rrrouter, check out the repo and run
go install ./...
rrrouter requires some environment variables to run. To see full list of mandatory and optional variables, run
richie-request-router --help
rrrouter's routing configuration consists of a list of rules, each containing an url path pattern, an optional host, a destination and additional flags.
The configuration can be specified in JSON or YAML and can be fetched from a remote URL (set with the MAPPING_URL
environment variable) or a local file (using MAPPING_FILE
).
{
"rules": [
{
"path": "/config/v1/*",
"host": "app.example.com",
"destination": "https://richie-appconfig.herokuapp.com/v1/$1",
"internal": true
},
{
"path": "/external-customer-1/srv/*",
"host": "app.example.com",
"destination": "https://s.external-customer-1.com/$1"
}
]
}
rules:
- destination: https://richie-appconfig.herokuapp.com/v1/$1
internal: true
path: /config/v1/*
host: app.example.com
- destination: https://s.external-customer-1.com/$1
path: /external-customer-1/srv/*
host: app.example.com
The path pattern can include asterisks (*) as wildcards. The destinations can refer to these wildcards with the $1, $2 (etc) notation.
The first matching rule is selected.
You can check the syntax of a routing configuration with the checkmapping
subcommand:
richie-request-router checkmapping --url=https://example.com/config.json
richie-request-router checkmapping --file=./config.json
A matching route in the configuration for a single query can be searched with the checkquery
subcommand:
richie-request-router checkquery --url=https://example.com/config.json --method=GET --query=http://app.example.com/config/v1/example
richie-request-router checkquery --file=./config.json --method=GET --query=http://app.example.com/config/v1/example
The outputs of checkmapping
and checkquery
contain rule structures for proxy and copy rules. If a rule is present, its representation in output contains flags, matching methods, original URL pattern and destination URL pattern. In the representation, the flags are specified as followed:
-
I - the rule is internal (it has
internal
flag) -
E - the rule is external (it has no
internal
flag) -
P - the rule is a proxy rule
-
C - the rule is a copy rule
-
Example:
Rule (I,P) * "app.example.com/config/v1/*" -> "https://richie-appconfig.herokuapp.com/v1/$1"
The rule is an internal proxy rule, it matches all the HTTP methods and the beginning of the URL is replaced with the first parts of an application's actual Heroku URL.
You can specify that a rule applies to only a subset of HTTP methods. To do this, include a methods
list in a rule.
{
"rules": [
{
"path": "/config/v1/*",
"host": "app.example.com",
"destination": "https://richie-appconfig.herokuapp.com/v1/$1",
"internal": true,
"methods": [
"GET",
"HEAD"
]
}
]
}
You can specify that a request with Accept-Encoding of either br
or gzip
should be fulfilled by rrrouter. For example, if the request specifies Accept-Encoding: br
and the origin response is application/json
with no Content-Encoding
applied, rrrouter will compress it with Brotli and return Content-Encoding: br
.
Additionally, with a client request with Accept-Encoding: br
, origin being Content-Encoding: gzip
the content will be first decompressed and re-compressed with Brotli.
Only plain text types application/json
and text/<anything>
, when Content-Encoding
is missing or identity
will be compressed with the desired Accept-Encoding
.
Currently, if a request does not include either br
or gzip
in the accepted encodings, no encoding or decoding actions are performed on the origin response.
Compression level can be specified with environment variables BROTLI_LEVEL (0-11, defaults to 0) and GZIP_LEVEL (1-9, defaults to 1)
Rules without this field default to not performing any compression:
rules:
- destination: https://richie-appconfig.herokuapp.com/v1/$1
path: /config/v1/*
host: app.example.com
recompression: true
A rule can include retry_rule
, which is another rule to try right after a first try with the parent rule should fail with any server or client error. All fields except cache
are used in the retry_rule
where applicable.
A rule with hostheader: original
will use what the incoming request used. hostheader: destination
will use the rule destination's host and port, which is the default. An override, hostheader: app2.example.com:9000
is used as-is.
rules:
- destination: https://richie-appconfig.herokuapp.com/v1/$1
path: /config/v1/*
host: app.example.com
hostheader: app2.example.com:9000
A rule with
request_headers:
authorization: null
foo: bar
applies all keys and their values to the request headers, prior to sending the request to the origin. A special value, null
, can be used to remove a header.
A rule with
response_headers:
access-control-allow-origin: "*"
applies all keys and their values to the response headers. Whitespace is trimmed from both the key and value.
A rule with
restart_on_redirect: true
will cause the URL in the Location
header to be requested as if the request was made by the client, using the same method and headers. When a cache is used, each redirected URL is cached as a separate entry.
For example, with a set of rules being
rules:
- path: /config/v1/*
destination: https://richie-appconfig.example.com/v1/$1
restart_on_redirect: true
- path: /config/v2/*
host: external-host.example.com
destination: https://external-host.example.com/$1
request_headers:
authorization: null
cache: c1
and a request with an authorization
header to app.example.com/config/v1/*
, where a redirection from https://richie-appconfig.example.com/v1/$1
to https://external-host.example.com/config/v2/*
will result in that URL being requested by rrrouter, with the authorization
header dropped. In this case, an "outer" request with user-specific credentials would not be cached, while the "inner" redirection would be cached for all following requests, as the user-specific credentials are dropped before requesting that resource.
In addition to proxying requests, rrrouter can copy traffic to a host without reporting this to the user.
{
"rules": [
{
"path": "/config/v1/*",
"destination": "https://richie-appconfig-staging.herokuapp.com/v1/$1",
"internal": true,
"type": "copy_traffic"
},
{
"path": "/config/v1/*",
"destination": "https://richie-appconfig.herokuapp.com/v1/$1",
"internal": true
}
]
}
With this configuration, in addition to creating a request to https://richie-appconfig.herokuapp.com/v1/q
, a request to app.example.com/config/v1/q
is also sent to https://richie-appconfig-staging.herokuapp.com/v1/q
. The user does not see this; the response from the non-copy target is returned.
A type
field with value proxy
is equivalent to a missing type
field, and means the default behavior.
A note on memory usage: rrrouter tries to conserve memory, but it has to read the request body to memory most of the time. There's only one situation where it isn't necessary: it's a POST request (so not retryable because it's not idempotent) and there is only one of copy_traffic
and proxy
rules matching it (because you can't stream to two places without at least some copying and syncing, none of which rrrouter does at the moment.) If you use traffic copying or non-POST methods with endpoints that receive large requests, this can lead to significant memory usage.
The mechanism for selecting rules is as follows:
- Rules are checked in the order they are in the file.
- If a rule's
enabled
is false, the rule is skipped. - If a rule has a
methods
list and it does not include the request method, the rule is skipped. - If a rule matches and its type is
copy_traffic
, it's selected as the copy target. This will not be overridden by latercopy_traffic
matches. - If a rule matches and its type is
proxy
, it's selected as the proxy target and the search is terminated.
rrrouter forwards all request headers to the destination server. The exception is the Host header, which is rewritten according to destination specified in the rule. This means that the destination server does not need to know the hostnames used by the upstream clients to access the content. It also means that the destination is expected to present a certificate for its own name, rather than the name the client used to perform the request. The other exception is Range
, if the request matches a caching rule.
Each forwarding destination can be configured as internal or external. Internal destinations are our own microservices. External destinations are other systems, like Amazon S3.
When forwarding requests to internal destinations (see the Routing Configuration section) and the ROUTING_SECRETS
environment variable is set, rrrouter adds three custom request headers to each request. These are:
- Richie-Originating-IP
- Richie-Request-ID
- Richie-Routing-Secret
As per RFC 6648, no โX-โ prefix is used in these names.
Richie-Originating-IP is the IP address (either IPv4 or IPv6) of the connecting client. The value can currently be copied from the CF-Connecting-IP
header added by CloudFlare. The intention is to remove the need to hardcode a CloudFlare-specific header name into downstream microservices implementations while also discouraging the use of the inherently unreliable X-Forwarded-For
header.
Richie-Request-ID is a random UUID assigned by rrrouter for the request. This is intended to allow the correlation of requests within various microservices to a single originating (external) request, for example when analyzing log files.
Richie-Routing-Secret is a shared secret known by the rrrouter and downstream microservices. This allows microservices to deny all requests without a valid secret. This makes the rrrouter into a firewall of sorts; without this mechanism, services hosted on public PaaS providers like Heroku are directly accessible by anyone guessing the application endpoint, which can be trivial. More than one valid secret may exist at any given time, but the rrrouter will only ever send the latest one on the list. This is to allow key rotation without the risk of denied requests. These keys are configured with the ROUTING_SECRETS
environment variable.
When an incoming request contains a valid Richie-Routing-Secret
, it, along with the Richie-Originating-IP
and Richie-Request-ID
headers, are forwarded as-is to the destination. This enables Richie microservices to perform additional requests to other microservices via the rrrouter while retaining the information in the original/external request.
A missing value in one of the other headers is replaced with a new one in incoming requests with a valid Richie-Routing-Secret
header.
If an incoming request has an invalid Richie-Routing-Secret
or it's missing but one of the other headers has a value, the request is denied and status code 407 is returned.
Rrrouter can cache responses if a rule includes a cache: <disk-1>
field indicating the storage to use.
The cache can be configured with a number of disk storage backends:
caches:
- size: 300G
path: /a/path
id: disk-1
When a qualifying request is received and it is to be cached, a cache writer is wrapped alongside the golang http.ResponseWriter
. A file descriptor is opened for the new cache entry and written to first in full, after which the bytes to be written for the client are then read from.
The cache uses a lock for each new entry's key
to minimize the number of requests to origin. A key
consists of the request host, path and qualifying headers. The path on disk is the cache root path joined with the hash of a key
.
Qualifying headers for cache keys include host
, accept-encoding
and origin
headers.
Upon trying to fetch an entry from the cache, should it not exist, the caller will receive either a writer or a channel to wait with, while a parallel request's writer finishes writing to the entry. After a writer is done, a waiting reader's channel will receive and an entry can be tried to be fetched again, as the resource can now be available under the same cache key. This subsequent call can still return zilch, as a response for a writer could have returned a header which disqualifies the resource from being stored in the cache at all.
The cache does not use locks when reading, as the underlying implementation of io.ReadFrom uses the platform's sendfile() to efficiently copy the bytes over to be written to the client. A file descriptor is opened for each reader to support Range
requests.
Requests with Range: bytes=...
are supported, but the request to the origin does not include this header. The origin's resource is requested in full and the range request is satisfied after caching.
The storage tries to limit the used space to size
. Purging is done first with entries for which the last access time is unknown. If all entries have known access times, the earliest entries are selected. This record-keeping, with million entries should fit into ~40MB.
Each cached entry is stored on disk with the qualifying request and full response headers included as extended attributes.
An example:
$ xattr -l /a/path/file
user.rrrouter: {"Host":"data.example.com","Path":"/editions-eu/package.tar","RequestHeader":{"Accept-Encoding":["gzip, deflate"]},"ResponseHeader":{"Accept-Ranges":["bytes"],"Cache-Control":["max-age=86400, s-maxage=86400"],"Content-Length":["10311696"],"Content-Type":["application/octet-stream"],"Date":["Tue, 26 Jan 2021 14:44:12 GMT"],"Etag":["\"fa459fea12476eea0e6b6389d5069630\""],"Last-Modified":["Tue, 19 Jan 2021 21:24:21 GMT"],"Server":["SomeS3/1.0"],"X-Amz-Id-2":["9Q9mgGszhQ5aF7dF5U4bAN8agKer0ugYOEzABEBxXIo8uyYQY8Nik2uBsW4yuyPnBgW38W7XO+l0"],"X-Amz-Request-Id":["F6F19B7D6DB8B77C"]},"Status":200,"Created":1611672253,"Revalidated":0,"Size":10311696}
When client headers include:
authorization
When response headers include: cache-control: no-store cache-control: max-age=0 cache-control: s-maxage=0 cache-control: private
Short of a full cache wipe, the caches can be organized by types of responses served and purged manually.
If a forced refresh of responses is needed, the env var ETAG_SUFFIX
can be set and its value will be suffixed to each
entry's ETag
header when serving responses, or comparing with incoming If-None-Match
headers. This suffix is not
stored to entries on disk and can be changed to have an immediate effect.
Rrrouter includes a richie-edge-cache
response header, with values being miss
, hit
, revalidated
, stale
, pass
or uncacheable
. If a rule should match and the response was not cached, the richie-edge-cache
is omitted.
For best performance it's recommended to set noatime on the mounted storage. As this is the starting point, no atimes are attempted to be read from disk storage. Rrrouter stores access times for new entries in memory and will flush them underneath the configured cache path, up to ATIME_LOG_SIZE_BYTES bytes, ~3M records by default. When the maximum size is met, the log file is truncated by 10% starting from the earliest accesses. ATIME_FLUSH_INTERVAL, 30 seconds by default, specifies how often unique entry accesses--inside this interval--are appended to the log file. This log file is read upon startup and information within used when creating space for new entries, starting with the oldest entries, should there not be enough entries without access times to use for this purpose. Disable with ATIME_DISABLE set to "true".
There's an admin endpoint, /__SYSTEMINFO
, for checking state of the system rrrouter is running on. The endpoint requires that the ADMIN_NAME
and ADMIN_PASS
environment variables are set.
If no caches are configured, the /__RRROUTER/health
endpoint returns HTTP 200 and OK
as the body. If a cache is configured and any of its storages fail a write check, the endpoint returns HTTP 503 with an error description in the body.
Set SENTRY_DSN
in the environment to capture errors.