
crunchy-proxy's Introduction

Installation Guide - Crunchy Proxy


Project

crunchy-proxy is a minimal PostgreSQL-aware proxy used to handle PostgreSQL application requests. The diagram below depicts a PostgreSQL client application connecting to the crunchy-proxy, which appears to the PostgreSQL application as any other PostgreSQL database connection. The crunchy-proxy accepts the inbound client requests, and routes them to an appropriate PostgreSQL cluster member (as described in more detail below).

[Diagram: a PostgreSQL client application connecting through crunchy-proxy to the members of a PostgreSQL cluster]

To the PostgreSQL server, the crunchy-proxy is transparent and appears as any other PostgreSQL client would appear. As described in more detail below, the crunchy-proxy currently provides the following capabilities:

  • ability to route messages based on the SQL command type - writes are sent to a master and reads are sent to replicas in a round-robin fashion

  • configuration via JSON file

  • PostgreSQL healthcheck

  • ability to route inbound client messages based on the health of PostgreSQL

  • REST administrative interface

  • ability to publish healthcheck events to consumers outside of the proxy

  • ability to load balance SQL statements across multiple replica backends

  • connection pooling

  • support for certificate and SSL based authentication

As described under "Approach" below, implementation of the crunchy-proxy requires that (1) a user has access to their application’s SQL and (2) a user has the ability to add basic annotations to their SQL statements.

Approach

One of the key features of the crunchy-proxy is the ability to route different SQL statements to PostgreSQL cluster members based on certain pre-defined rules. This capability enables end-user applications to see a PostgreSQL cluster as a single connection. In order to support this capability, it is necessary that crunchy-proxy have knowledge of the SQL statements that a PostgreSQL application sends to PostgreSQL.

One approach to addressing the requirement that a proxy be "SQL aware" (e.g. pgpool) is to first parse each SQL statement looking for grammar that would imply a write or update and to then send those SQL statements to the cluster master. SQL statements that are determined to be read-only in nature are deemed safe to send to the cluster replica members. Remember, in a PostgreSQL cluster, replica members are only able to process SQL that is read-only in nature.

crunchy-proxy takes a different approach to SQL routing. Instead of attempting to parse the entire SQL grammar of each SQL statement, it looks for a simple annotation, supplied by the end-user within their SQL statement, to determine the routing destination.

For example, the following SQL statement is annotated to indicate it is a read-only SQL statement:

/* read */ select now();

Examples of a write SQL statement include the following:

create table boofar (id int);
drop table boofar;

If a SQL statement does not include an annotation, the statement is deemed a write and is thus sent to the master cluster member.

By parsing only the annotation, crunchy-proxy avoids the complexity associated with determining whether a SQL statement is a write or a read, and thus with deciding to which member (master or replica) of a PostgreSQL cluster to send a SQL statement.
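This annotation-only routing rule can be sketched in Go. The function and return values below are illustrative names chosen for this example, not crunchy-proxy's actual API:

```go
package main

import (
	"fmt"
	"strings"
)

// routeStatement decides which cluster member receives a SQL statement:
// statements annotated with a leading /* read */ comment go to a replica,
// and everything else is deemed a write and goes to the master.
// (Illustrative helper; not crunchy-proxy's actual code.)
func routeStatement(sql string) string {
	if strings.HasPrefix(strings.TrimSpace(sql), "/* read */") {
		return "replica"
	}
	return "master"
}

func main() {
	fmt.Println(routeStatement("/* read */ select now();"))      // replica
	fmt.Println(routeStatement("create table boofar (id int);")) // master
}
```

Note how the default (no annotation) falls through to the master, matching the rule above.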

In taking this approach, crunchy-proxy has assumed (1) a user has access to their application’s SQL and (2) a user has the ability to add the annotation in their SQL statements. If they do, then they can use the crunchy-proxy for SQL routing.

Of course, these assumptions introduce certain limitations on the crunchy-proxy. Nonetheless, it was determined that these assumptions will not unduly limit the usability of the crunchy-proxy and that the resulting limitations are justified by the benefits of (1) reduced complexity in the SQL parsing implementation, (2) increased proxy throughput and (3) improved routing accuracy compared with full SQL parsing.

PostgreSQL Wire Protocol

crunchy-proxy operates at the PostgreSQL wire protocol (network) layer to understand PostgreSQL client authentication requests and SQL statements passed by a client to a PostgreSQL backend.

As crunchy-proxy uses annotations to route messages to the backend, the proxy primarily examines SQL statements for proxy-specific annotations and does very little processing of the messages sent between a client and an actual backend.
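For illustration, here is a minimal sketch of parsing the wire-protocol message that carries a SQL statement, the simple Query ('Q') message: one type byte, a big-endian Int32 length (which counts the length field itself but not the type byte), then the null-terminated query string. The helper names are hypothetical; crunchy-proxy's actual parsing code differs.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// parseQuery extracts the SQL text from a PostgreSQL simple Query ('Q')
// message. Error handling is minimal; this is a sketch only.
func parseQuery(msg []byte) (string, error) {
	if len(msg) < 6 || msg[0] != 'Q' {
		return "", fmt.Errorf("not a Query message")
	}
	length := int(binary.BigEndian.Uint32(msg[1:5]))
	if length+1 > len(msg) || length < 5 {
		return "", fmt.Errorf("truncated message")
	}
	// The query text runs from byte 5 up to the trailing null terminator.
	return string(msg[5:length]), nil
}

// buildQuery constructs a Query message, as a client would on the wire.
func buildQuery(sql string) []byte {
	length := 4 + len(sql) + 1 // length field + query + null terminator
	msg := make([]byte, 5, 1+length)
	msg[0] = 'Q'
	binary.BigEndian.PutUint32(msg[1:5], uint32(length))
	msg = append(msg, sql...)
	msg = append(msg, 0)
	return msg
}

func main() {
	q, err := parseQuery(buildQuery("/* read */ select now();"))
	if err != nil {
		panic(err)
	}
	fmt.Println(q) // /* read */ select now();
}
```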

It's important to note that the proxy does not implement all features of libpq or provide an application interface similar to a JDBC driver or other language driver.

The following resources are useful in understanding the PostgreSQL wire protocol:

In the future, by working at the wire protocol level, crunchy-proxy can implement a variety of features important for high speed proxy handling and for supporting PostgreSQL features.

Execution

The proxy is a Go binary; you execute it as follows:

$> crunchy-proxy start --config=config.yaml

To run the proxy at different logging output levels:

$> crunchy-proxy start --config=config.yaml --log-level=<level>

Where <level> is one of the following:

  • debug

  • info

  • error

  • fatal

Detailed documentation, including the configuration file format and developer information, is found in the User Guide.

For Docker users, you can run the proxy using the run-docker.sh script.

Feedback

If you find a bug, or want to provide feedback on the design and features, feel free to create a GitHub issue.

Copyright © 2017 Crunchy Data Solutions, Inc.

CRUNCHY DATA SOLUTIONS, INC. PROVIDES THIS GUIDE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Crunchy, Crunchy Data Solutions, Inc. and the Crunchy Hippo Logo are trademarks of Crunchy Data Solutions, Inc.

crunchy-proxy's People

Contributors

abrightwell, prlaurence, stephenlacy


crunchy-proxy's Issues

Add support for target_session_attrs

PostgreSQL 10 will include a connection-time parameter as part of libpq to say whether your connection must be routed to a master node. The idea is that the pooler can assume every connection is a read connection unless it signals that it is going to write. Here's the documentation from https://www.postgresql.org/docs/devel/static/libpq-connect.html#libpq-connect-target-session-attrs

target_session_attrs
If this parameter is set to read-write, only a connection in which read-write transactions are accepted by default is considered acceptable. The query show transaction_read_only will be sent upon any successful connection; if it returns on, the connection will be closed. If multiple hosts were specified in the connection string, any remaining servers will be tried just as if the connection attempt had failed. The default value of this parameter, any, regards all connections as acceptable.

There are some samples at http://paquier.xyz/postgresql-2/postgres-10-libpq-read-write/ that demonstrate how this allows libpq itself to search multiple hosts for the master, requiring no failover proxy at all. That will work for some small environments and lets core Postgres say it has a complete fail-over solution for them. There's still a need for larger sites to have a program like crunchy-proxy in the middle instead, where clients only connect to one address with transparent failover behind that, and there are APIs for the failover that integrate into a larger environment.

Regardless, eventually target_session_attrs could (and arguably should) be the standard way to label application connections for routing purposes. That would eliminate the need to tag the query text with comments the way crunchy-proxy does now. The current comment-based approach is still useful today because it's going to take a while before enough software can use this feature; the drivers for JDBC, Ruby, etc. will all need updates before they can handle it.
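The acceptability rule quoted from the docs above (send `show transaction_read_only`; reject the server if it returns `on` when read-write is requested) can be expressed as a small decision function. This helper is hypothetical, written only to illustrate the behavior; it is not part of libpq or crunchy-proxy:

```go
package main

import "fmt"

// sessionAcceptable mirrors the libpq check described above: with
// target_session_attrs=read-write, a server whose
// "show transaction_read_only" result is "on" is rejected; with the
// default value "any", every server is acceptable.
// (Hypothetical illustration of the documented behavior.)
func sessionAcceptable(targetSessionAttrs, transactionReadOnly string) bool {
	if targetSessionAttrs == "read-write" {
		return transactionReadOnly != "on"
	}
	return true // "any": all connections are acceptable
}

func main() {
	fmt.Println(sessionAcceptable("read-write", "on"))  // false: replica rejected
	fmt.Println(sessionAcceptable("read-write", "off")) // true: master accepted
	fmt.Println(sessionAcceptable("any", "on"))         // true: anything goes
}
```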

explore TAP testing

see if the golang tap testing framework can be used for something useful here.

crunchy proxy when the replica is not running at startup

Error message received:

INFO[2017-08-10T15:37:20Z] Admin Server listening on: [::]:10000        
ERRO[2017-08-10T15:37:21Z] healthcheck: query failed: dial tcp 172.30.6.30:5432: getsockopt: no route to host

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0x8b841f]
goroutine 21 [running]:

github.com/crunchydata/crunchy-proxy/proxy.(*Proxy).setupPools(0xc420181260)
	/home/abrightwell/development/go/src/github.com/crunchydata/crunchy-proxy/proxy/proxy.go:81 +0x4cf

github.com/crunchydata/crunchy-proxy/proxy.NewProxy(0x8)
	/home/abrightwell/development/go/src/github.com/crunchydata/crunchy-proxy/proxy/proxy.go:45 +0xc5

github.com/crunchydata/crunchy-proxy/server.(*ProxyServer).Serve(0xc42015bd40, 0xc93140, 0xc42008e0f0, 0x0, 0x0)
	/home/abrightwell/development/go/src/github.com/crunchydata/crunchy-proxy/server/proxy.go:44 +0x10b

created by github.com/crunchydata/crunchy-proxy/server.(*Server).Start
	/home/abrightwell/development/go/src/github.com/crunchydata/crunchy-proxy/server/server.go:61 +0x1db

Config Credentials?

Should each node have its own Credentials/Authentication section? As of right now there is only a global credentials section for all nodes. I'd assume that most environments are probably homogeneous in the type of authentication, but is it safe to assume that all nodes will utilize the same credentials? Assuming that since they are replicas, they are indeed 'identical' in all respects?

not able to connect using jdbc

Hi everyone, this is how I am trying to connect from a Java program:

String url = "jdbc:postgresql://127.0.0.1:9091/io?sslmode=disable&user=*&password=";

public Connection connect() {
    Connection conn = null;
    try {
        Class.forName("org.postgresql.Driver");
        conn = DriverManager.getConnection(url);
        System.out.println("Connected Successfully");
    } catch (Exception e) {
        e.printStackTrace();
    }
    return conn;
}

enable parameters to be passed to adapters

an example of this would be to allow the filename to be passed to the audit adapter as specified in the config.json file; this means adding a new structure to the config structure for the adapters.

Feature: live edit of list of backends

To support an infrastructure where the list of PostgreSQL backends is maintained by Kubernetes/OpenShift, we need to support "hot" changes to the list of backends. That is, we need to be able to add new backends and remove old ones with the proxy running, and only have it drop connections for the affected backends.

There are three primary cases we want to support here. All of these need to work via external commands, not just the healthcheck, due to a need to support a system where there may be multiple proxies in the infra and an external service is controlling the list of nodes:

  1. Pool growth: I increase the number of PostgreSQL replicas from 3 to 7. I should be able to add the additional 4 replicas to proxy without restarting, and have them receive (new) load-balanced connections as they are added.

  2. Node removal: if I am removing nodes, either for pool shrinkage or because replicas have failed, I need to be able to do it without restarting proxy and without waiting for proxy itself to healthcheck out those nodes. Existing connections to other nodes should not be interrupted.

  3. Master Failover: I need to be able to change which node is the master without restarting proxy. Only connections to the old master should be interrupted by this.
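The three cases above could be built around a mutex-guarded backend set. The sketch below shows only the bookkeeping (add, remove, promote); the type and method names are hypothetical, and connection draining for removed backends is out of scope:

```go
package main

import (
	"fmt"
	"sync"
)

// BackendList is a hypothetical sketch of a backend set supporting "hot"
// changes: replicas can be added or removed, and the master switched,
// without a proxy restart.
type BackendList struct {
	mu       sync.RWMutex
	master   string
	replicas map[string]bool
}

func NewBackendList(master string) *BackendList {
	return &BackendList{master: master, replicas: make(map[string]bool)}
}

// AddReplica covers case 1 (pool growth): new replicas join the
// load-balance rotation as they are added.
func (b *BackendList) AddReplica(addr string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.replicas[addr] = true
}

// RemoveReplica covers case 2 (node removal) without waiting for a
// healthcheck; other nodes' connections are untouched.
func (b *BackendList) RemoveReplica(addr string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	delete(b.replicas, addr)
}

// Promote covers case 3 (master failover): only connections to the old
// master would need to be interrupted.
func (b *BackendList) Promote(addr string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	delete(b.replicas, addr)
	b.master = addr
}

func (b *BackendList) Master() string {
	b.mu.RLock()
	defer b.mu.RUnlock()
	return b.master
}

func (b *BackendList) ReplicaCount() int {
	b.mu.RLock()
	defer b.mu.RUnlock()
	return len(b.replicas)
}

func main() {
	bl := NewBackendList("10.0.0.1:5432")
	bl.AddReplica("10.0.0.2:5432")
	bl.Promote("10.0.0.2:5432")
	fmt.Println(bl.Master(), bl.ReplicaCount())
}
```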

Update PDF Docs?

Do we need to update crunchy-proxy/docs/pdf/ to reconcile with revised documentation?

Allowing clients to use different credentials to crunchy-proxy's credentials

I may be misunderstanding this, but I believe that it's not currently possible for the client to use a different set of credentials with crunchy-proxy than the credentials that crunchy-proxy uses to communicate with the backends. If this is the case, then it's not possible to use multiple usernames or databases with a single crunchy-proxy instance.

When attempting to connect through crunchy-proxy using a different (but valid on the backend) set of credentials, FATAL: could not validate user/database is returned. I've traced that to here:

func ValidateClient(message []byte) bool {

Just wondering if this is an intentional design decision. It would be great to be able to provide credentials to crunchy-proxy to perform its healthchecks, but allow the client to supply different arbitrary credentials for their connections, as it would mean that the proxy is truly transparent to the application. Otherwise different applications (or even a single app using multiple databases) would need to hit a different crunchy-proxy for each database.

If all this is correct, then this may be a non-trivial change, as pooling would probably need to be grouped by the username/database used by the client, but it may be worth exploring if it broadens the use case for crunchy-proxy.

Although I've been snagged a bit by this issue, I have to say I'm very impressed with crunchy-proxy, it's very clean. Thanks for creating it!

Load-Balance Authentication

Per the docs:

Each client must authenticate against the master backend before the proxy will process future client requests. crunchy proxy does not include an authentication store itself, but instead relies on the master backend to perform authentication.

There's no particular reason we need to use the master backend for this. In most situations, authentication information will be uniform across replicas. As such, the client could be authenticated against any replica. This would greatly improve the ability of the proxy to handle large numbers of connections.

log message refactor

need cleaner log messages set to the right verbosity and showing the right information to assist debugging.

refactor global Config variable

evaluate and/or implement a global variable for the config; this would be something like the following within config.go in the config package:

var Cfg Config

other packages would import and use it like:

import "github.com/crunchydata/crunchy-proxy/config"

addr, err := net.ResolveTCPAddr("tcp", config.Cfg.IPAddr)

Things to look for:
a) no circular package deps
b) cleaner code should emerge
c) shouldn't break anything :^))

enable logging via the config file

would be nice to be able to enable logging via the config file, this works now from the command line but in a container environment, it would be useful to set this via the config file.
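A minimal sketch of what this could look like, reading a log level from the JSON config with a command-line-compatible default. The `logLevel` field name is a hypothetical illustration, not crunchy-proxy's actual config schema:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// proxyConfig sketches a config file that carries the log level,
// so containers don't need to alter the command line.
// (Hypothetical schema for illustration.)
type proxyConfig struct {
	LogLevel string `json:"logLevel"`
}

// loadLogLevel falls back to "info" when the config omits the field,
// matching the documented default-style behavior of the CLI flag.
func loadLogLevel(raw []byte) (string, error) {
	var cfg proxyConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return "", err
	}
	if cfg.LogLevel == "" {
		return "info", nil
	}
	return cfg.LogLevel, nil
}

func main() {
	lvl, _ := loadLogLevel([]byte(`{"logLevel": "debug"}`))
	fmt.Println(lvl) // debug
}
```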

Add monitoring endpoints

I was looking at this as an option to replace pgbouncer, but one of the main reasons I want to replace pgbouncer is the minimal instrumentation it has.

I'm interested in adding Prometheus client_golang and pprof to this code in order to better monitor it in production.

Does this sound like a good idea?

pre and post query hooks for connections

Create hook points for pre and post queries for connections. This would allow a command/query to be run before and after a connection is utilized. An example might be setting/resetting client/session authorization for a query.
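The hook-point idea could look roughly like the following; the type and hook signatures are hypothetical illustrations of the concept, not a proposed crunchy-proxy API:

```go
package main

import (
	"fmt"
	"strings"
)

// hookedConn sketches pre- and post-query hook points for a connection.
// The log slice stands in for statements actually sent to the backend.
type hookedConn struct {
	pre  []func(query string)
	post []func(query string)
	log  []string
}

func (c *hookedConn) OnPre(f func(string))  { c.pre = append(c.pre, f) }
func (c *hookedConn) OnPost(f func(string)) { c.post = append(c.post, f) }

// Run executes the pre hooks, the query itself (simulated here), then
// the post hooks, e.g. to set and reset session authorization around
// each use of the connection.
func (c *hookedConn) Run(query string) {
	for _, f := range c.pre {
		f(query)
	}
	c.log = append(c.log, query)
	for _, f := range c.post {
		f(query)
	}
}

func main() {
	c := &hookedConn{}
	c.OnPre(func(q string) { c.log = append(c.log, "set session authorization appuser") })
	c.OnPost(func(q string) { c.log = append(c.log, "reset session authorization") })
	c.Run("select now()")
	fmt.Println(strings.Join(c.log, "; "))
}
```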

fix run-cluster.sh permissions

run-cluster.sh assumes a postgres UID on the host; set the permissions instead, which most users will find easier than messing with a new UID on their test box.

healthcheck - check connections

if you have a bad connection in a pool, the healthcheck needs to test for it during its check and replace the connection if possible.

this can be tested by shutting down the replica and restarting it during the healthcheck window.

IP Address vs Host?

We use IPAddr throughout the configuration and implementation. Is it too pedantic to think this would be more correctly called 'host'? I'd imagine it is possible for all the related nodes to be given some form of hostname/domain that they are identified by as opposed to an IP address. Thoughts?

add signal handler

when the proxy runs in a container, it would be good to catch SIGTERM/SIGINT for a clean shutdown.
