
uhaha

High Availability Framework for Happy Data

Uhaha is a framework for building highly available Raft-based data applications in Go. It is essentially an upgrade to the Finn project, with an updated API, better security features (TLS and auth passwords), customizable services, deterministic time, recalculable random numbers, simpler snapshots, a smaller network footprint, and more. Under the hood it utilizes hashicorp/raft, tidwall/redcon, and syndtr/goleveldb.

Features

  • Simple API for quickly creating a custom Raft-based application.
  • Deterministic monotonic time that does not drift and stays in sync with the internet.
  • APIs for building custom services such as HTTP and gRPC. Supports the Redis protocol by default, so most Redis client libraries will work with Uhaha.
  • TLS and Auth password support.
  • Multiple examples to help jumpstart integration, including a Key-value DB, a Timeseries DB, and a Ticket Service.

Example

Below is a simple example of a service for monotonically increasing tickets.

package main

import "github.com/tidwall/uhaha"

type data struct {
	Ticket int64
}

func main() {
	// Set up a uhaha configuration
	var conf uhaha.Config
	
	// Give the application a name. All servers in the cluster should use the
	// same name.
	conf.Name = "ticket"
	
	// Set the initial data. This is the state of the data when the first
	// server in the cluster starts for the first time ever.
	conf.InitialData = new(data)

	// Since we are not holding onto much data we can use the built-in JSON
	// snapshot system. You just need to make sure all the important fields in
	// the data are exportable (capitalized) to JSON. In this case there is
	// only the one field "Ticket".
	conf.UseJSONSnapshots = true
	
	// Add a command that will change the value of a Ticket. 
	conf.AddWriteCommand("ticket", cmdTICKET)

	// Finally, hand off all processing to uhaha.
	uhaha.Main(conf)
}

// TICKET
// help: returns a new ticket that has a value that is at least one greater
// than the previous TICKET call.
func cmdTICKET(m uhaha.Machine, args []string) (interface{}, error) {
	// Get the current data from the machine
	data := m.Data().(*data)

	// Increment the ticket
	data.Ticket++

	// Return the new ticket to caller
	return data.Ticket, nil
}
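
Uhaha also supports read-only commands, which are served without being appended to the Raft log. As a brief sketch (assuming conf.AddReadCommand, which mirrors AddWriteCommand; the command name "current" is made up for illustration), a command that returns the current ticket without incrementing it could look like:

// CURRENT
// help: returns the current ticket value without changing it.
func cmdCURRENT(m uhaha.Machine, args []string) (interface{}, error) {
	// Read the data without mutating it; read commands must not write.
	data := m.Data().(*data)
	return data.Ticket, nil
}

Register it in main() alongside the write command: conf.AddReadCommand("current", cmdCURRENT)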

Building

Using the source file from the examples directory, we'll build an application named "ticket".

go build -o ticket examples/ticket/main.go

Running

It's ideal to have three, five, or seven nodes in your cluster.

Let's create the first node.

./ticket -n 1 -a :11001

This will create a node named 1 and bind it to the address :11001.

Now let's create two more nodes and add them to the cluster.

./ticket -n 2 -a :11002 -j :11001
./ticket -n 3 -a :11003 -j :11001

Now we have a fault-tolerant three node cluster up and running.

Using

You can use any Redis-compatible client, such as the redis-cli, telnet, or netcat.

I'll use the redis-cli in the example below.

Connect to the leader. This will probably be the first node you created.

redis-cli -p 11001

Send the server a TICKET command and receive the first ticket.

> TICKET
"1"

From here on, every TICKET command is guaranteed to generate a value larger than the previous TICKET command.

> TICKET
"2"
> TICKET
"3"
> TICKET
"4"
> TICKET
"5"

Built-in Commands

There are a number of built-in commands for managing and monitoring the cluster.

VERSION                                 # show the application version
MACHINE                                 # show information about the state machine
RAFT LEADER                             # show the address of the current raft leader
RAFT INFO [pattern]                     # show information about the raft server and cluster
RAFT SERVER LIST                        # show all servers in cluster
RAFT SERVER ADD id address              # add a server to cluster
RAFT SERVER REMOVE id                   # remove a server from the cluster
RAFT SNAPSHOT NOW                       # make a snapshot of the data
RAFT SNAPSHOT LIST                      # show a list of all snapshots on server
RAFT SNAPSHOT FILE id                   # show the file path of a snapshot on server
RAFT SNAPSHOT READ id [RANGE start end] # download all or part of a snapshot
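
For example, using redis-cli against the cluster from the earlier walkthrough, you could register a fourth server (started beforehand) and then inspect the membership. The node id and address below are illustrative:

redis-cli -p 11001 RAFT SERVER ADD 4 :11004
redis-cli -p 11001 RAFT SERVER LIST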

There are also a few client commands.

QUIT                                    # close the client connection
PING                                    # ping the server
ECHO [message]                          # echo a message to the server
AUTH password                           # authenticate with a password

Network and security considerations (TLS and Auth password)

By default a single Uhaha instance is bound to the local 127.0.0.1 IP address. Thus nothing outside that machine, including other servers in the cluster or machines on the same local network, will be able to communicate with this instance.

Network security

To open up the service you will need to provide an IP address that can be reached from the outside. For example, let's say you want to set up three servers on a local 10.0.0.0 network.

On server 1:

./ticket -n 1 -a 10.0.0.1:11001

On server 2:

./ticket -n 2 -a 10.0.0.2:11001 -j 10.0.0.1:11001

On server 3:

./ticket -n 3 -a 10.0.0.3:11001 -j 10.0.0.1:11001

Now you have a Raft cluster running on three distinct servers in the same local network. This may be enough for applications that only require a network security policy. Basically any server on the local network can access the cluster.

Auth password

If you want to lock down the cluster further you can provide a secret auth, which is more or less a password that servers and clients need in order to communicate with each other.

./ticket -n 1 -a 10.0.0.1:11001 --auth my-secret

All the servers will need to be started with the same auth.

./ticket -n 2 -a 10.0.0.2:11001 --auth my-secret -j 10.0.0.1:11001
./ticket -n 3 -a 10.0.0.3:11001 --auth my-secret -j 10.0.0.1:11001

The client will also need the same auth to talk with the cluster. Most Redis clients support an auth password, for example:

redis-cli -h 10.0.0.1 -p 11001 -a my-secret

This may be enough if you keep all your machines on the same private network, but don't want every machine or application to have unfettered access to the cluster.

TLS

Finally, you can use TLS, which I recommend along with an auth password.

In this example a custom cert and key are created using the mkcert tool.

mkcert uhaha-example
# produces uhaha-example.pem, uhaha-example-key.pem, and a rootCA.pem

Then create a cluster using the cert and key files, along with an auth.

./ticket -n 1 -a 10.0.0.1:11001 --tls-cert uhaha-example.pem --tls-key uhaha-example-key.pem --auth my-secret
./ticket -n 2 -a 10.0.0.2:11001 --tls-cert uhaha-example.pem --tls-key uhaha-example-key.pem --auth my-secret -j 10.0.0.1:11001
./ticket -n 3 -a 10.0.0.3:11001 --tls-cert uhaha-example.pem --tls-key uhaha-example-key.pem --auth my-secret -j 10.0.0.1:11001

Now you can connect to the server from a client that has the rootCA.pem. You can find the location of your rootCA.pem file by running ls "$(mkcert -CAROOT)/rootCA.pem".

redis-cli -h 10.0.0.1 -p 11001 --tls --cacert rootCA.pem -a my-secret
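
The same connection can be made from Go. Here is a hedged sketch, again using the go-redis client; the file path, address, and auth value are assumptions carried over from the examples above. The ServerName must match the name the certificate was issued for (uhaha-example in this walkthrough).

package main

import (
	"context"
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"

	"github.com/go-redis/redis/v8"
)

func main() {
	// Load the mkcert root CA so the client trusts the server's cert.
	pem, err := os.ReadFile("rootCA.pem")
	if err != nil {
		panic(err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pem) {
		panic("could not parse rootCA.pem")
	}

	rdb := redis.NewClient(&redis.Options{
		Addr:     "10.0.0.1:11001",
		Password: "my-secret", // the same value passed to --auth
		TLSConfig: &tls.Config{
			RootCAs:    pool,
			ServerName: "uhaha-example", // name the cert was issued for
		},
	})
	defer rdb.Close()

	v, err := rdb.Do(context.Background(), "TICKET").Result()
	if err != nil {
		panic(err)
	}
	fmt.Println("ticket:", v)
}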

Command-line options

Below are all of the command line options.

Usage: my-uhaha-app [-n id] [-a addr] [options]

Basic options:
  -v               : display version
  -h               : display help, this screen
  -a addr          : bind to address  (default: 127.0.0.1:11001)
  -n id            : node ID  (default: 1)
  -d dir           : data directory  (default: data)
  -j addr          : leader address of a cluster to join
  -l level         : log level  (default: info) [debug,verb,info,warn,silent]

Security options:
  --tls-cert path  : path to TLS certificate
  --tls-key path   : path to TLS private key
  --auth auth      : cluster authorization, shared by all servers and clients

Networking options:
  --advertise addr : advertise address  (default: network bound address)

Advanced options:
  --nosync         : turn off syncing data to disk after every write. This leads
                     to faster write operations but opens up the chance for data
                     loss due to catastrophic events such as power failure.
  --openreads      : allow followers to process read commands, but with the
                     possibility of returning stale data.
  --localtime      : have the raft machine time synchronized with the local
                     server rather than the public internet. This will run the
                     risk of time shifts when the local server time is
                     drastically changed during live operation.
  --restore path   : restore a raft machine from a snapshot file. This will
                     start a brand new single-node cluster using the snapshot as
                     initial data. The other nodes must be re-joined. This
                     operation is ignored when a data directory already exists.
                     Cannot be used with -j flag.
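
These options can be combined as needed. For instance, a node behind NAT that should also let followers serve (possibly stale) reads might be started like this; the addresses are illustrative:

./ticket -n 1 -a 10.0.0.1:11001 --advertise 203.0.113.10:11001 --openreads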

uhaha's Issues

Leader election issues?

Hi @tidwall 👋

Getting back to playing with your library again and making some updates/fixes to my bitraft, I noticed some strange behaviour that I'm hoping you can help me figure out.

I form a 3-node cluster:

Node 1:

james@Jamess-MacBook-Pro
Mon Jun 07 11:22:17
~/tmp/bitraft
 (upgrade_finn_uhaha) 0
$ ./bitraft -i 1 -p ./data1 -b 0.0.0.0:4920
INFO[0000] 63366:S 07 Jun 2021 11:22:21.566 # starting bitraft version 0.0.1@HEAD
INFO[0000] 63366:S 07 Jun 2021 11:22:21.861 * synchronized time
INFO[0000] 63366:S 07 Jun 2021 11:22:21.861 * server listening at [::]:4920
INFO[0000] 63366:S 07 Jun 2021 11:22:21.861 * server advertising as 0.0.0.0:4920
INFO[0000] 63366:S 07 Jun 2021 11:22:21.935 * initial configuration: index=0 servers=[]
INFO[0000] 63366:S 07 Jun 2021 11:22:21.935 * bootstrapping new cluster
INFO[0000] 63366:F 07 Jun 2021 11:22:21.935 * entering follower state: follower="Node at [::]:4920 [Follower]" leader=
INFO[0001] 63366:F 07 Jun 2021 11:22:23.366 # heartbeat timeout reached, starting election: last-leader=
INFO[0001] 63366:C 07 Jun 2021 11:22:23.366 * entering candidate state: node="Node at [::]:4920 [Candidate]" term=2
INFO[0001] 63366:C 07 Jun 2021 11:22:23.366 * election won: tally=1
INFO[0001] 63366:L 07 Jun 2021 11:22:23.366 * entering leader state: leader="Node at [::]:4920 [Leader]"
INFO[0002] 63366:L 07 Jun 2021 11:22:23.937 * logs loaded: ready for commands
INFO[0044] 63366:L 07 Jun 2021 11:23:06.027 * updating configuration: command=AddStaging server-id=2 server-addr=0.0.0.0:4921 servers="[{Suffrage:Voter ID:1 Address:0.0.0.0:4920} {Suffrage:Voter ID:2 Address:0.0.0.0:4921}]"
INFO[0044] 63366:L 07 Jun 2021 11:23:06.027 * added peer, starting replication: peer=2
INFO[0044] 63366:L 07 Jun 2021 11:23:06.029 # appendEntries rejected, sending older logs: peer="{Voter 2 0.0.0.0:4921}" next=1
INFO[0044] 63366:L 07 Jun 2021 11:23:06.030 * pipelining replication: peer="{Voter 2 0.0.0.0:4921}"
INFO[0106] 63366:L 07 Jun 2021 11:24:07.976 * updating configuration: command=AddStaging server-id=3 server-addr=0.0.0.0:4922 servers="[{Suffrage:Voter ID:1 Address:0.0.0.0:4920} {Suffrage:Voter ID:2 Address:0.0.0.0:4921} {Suffrage:Voter ID:3 Address:0.0.0.0:4922}]"
INFO[0106] 63366:L 07 Jun 2021 11:24:07.976 * added peer, starting replication: peer=3
INFO[0106] 63366:L 07 Jun 2021 11:24:07.978 # appendEntries rejected, sending older logs: peer="{Voter 3 0.0.0.0:4922}" next=1
INFO[0106] 63366:L 07 Jun 2021 11:24:07.982 * pipelining replication: peer="{Voter 3 0.0.0.0:4922}"

INFO[0300] 63366:L 07 Jun 2021 11:27:22.554 * aborting pipeline replication: peer="{Voter 3 0.0.0.0:4922}"
INFO[0300] 63366:L 07 Jun 2021 11:27:22.565 # failed to heartbeat to: peer=0.0.0.0:4922 error=EOF
INFO[0301] 63366:L 07 Jun 2021 11:27:23.055 # failed to contact: server-id=3 time=501.441291ms
INFO[0301] 63366:L 07 Jun 2021 11:27:23.531 # failed to contact: server-id=3 time=977.02433ms
INFO[0302] 63366:L 07 Jun 2021 11:27:23.622 # failed to appendEntries to: peer="{Voter 3 0.0.0.0:4922}" error="dial tcp 0.0.0.0:4922: connect: connection refused"
INFO[0302] 63366:L 07 Jun 2021 11:27:23.682 # failed to heartbeat to: peer=0.0.0.0:4922 error="dial tcp 0.0.0.0:4922: connect: connection refused"
INFO[0302] 63366:L 07 Jun 2021 11:27:24.007 # failed to contact: server-id=3 time=1.453318097s
INFO[0303] 63366:L 07 Jun 2021 11:27:24.633 # failed to appendEntries to: peer="{Voter 3 0.0.0.0:4922}" error="dial tcp 0.0.0.0:4922: connect: connection refused"
INFO[0303] 63366:L 07 Jun 2021 11:27:24.867 # failed to heartbeat to: peer=0.0.0.0:4922 error="dial tcp 0.0.0.0:4922: connect: connection refused"
INFO[0304] 63366:L 07 Jun 2021 11:27:25.644 # failed to appendEntries to: peer="{Voter 3 0.0.0.0:4922}" error="dial tcp 0.0.0.0:4922: connect: connection refused"
INFO[0304] 63366:L 07 Jun 2021 11:27:25.672 * pipelining replication: peer="{Voter 3 0.0.0.0:4922}"
INFO[0304] 63366:L 07 Jun 2021 11:27:26.007 # failed to heartbeat to: peer=0.0.0.0:4922 error="dial tcp 0.0.0.0:4922: connect: connection refused"

Node 2:

james@Jamess-MacBook-Pro
Mon Jun 07 11:22:46
~/tmp/bitraft
 (upgrade_finn_uhaha) 0
$ ./bitraft -i 2 -p ./data2 -b 0.0.0.0:4921 -j 127.0.0.1:4920
INFO[0000] 63379:S 07 Jun 2021 11:23:05.679 # starting bitraft version 0.0.1@HEAD
INFO[0000] 63379:S 07 Jun 2021 11:23:05.956 * synchronized time
INFO[0000] 63379:S 07 Jun 2021 11:23:05.956 * server listening at [::]:4921
INFO[0000] 63379:S 07 Jun 2021 11:23:05.956 * server advertising as 0.0.0.0:4921
INFO[0000] 63379:S 07 Jun 2021 11:23:06.027 * initial configuration: index=0 servers=[]
INFO[0000] 63379:S 07 Jun 2021 11:23:06.027 * joining existing cluster at 127.0.0.1:4920
INFO[0000] 63379:F 07 Jun 2021 11:23:06.027 * entering follower state: follower="Node at [::]:4921 [Follower]" leader=
INFO[0000] 63379:F 07 Jun 2021 11:23:06.029 # failed to get previous log: previous-index=214 last-index=0 error="log not found"

Node 3:

james@Jamess-MacBook-Pro
Mon Jun 07 11:27:22
~/tmp/bitraft
 (upgrade_finn_uhaha) 130
$ ./bitraft -i 3 -p ./data3 -b 0.0.0.0:4922 -j 127.0.0.1:4920
INFO[0000] 63413:S 07 Jun 2021 11:27:25.137 # starting bitraft version 0.0.1@HEAD
INFO[0000] 63413:S 07 Jun 2021 11:27:25.427 * synchronized time
INFO[0000] 63413:S 07 Jun 2021 11:27:25.427 * server listening at [::]:4922
INFO[0000] 63413:S 07 Jun 2021 11:27:25.427 * server advertising as 0.0.0.0:4922
INFO[0000] 63413:S 07 Jun 2021 11:27:25.534 * initial configuration: index=522 servers="[{Suffrage:Voter ID:1 Address:0.0.0.0:4920} {Suffrage:Voter ID:2 Address:0.0.0.0:4921} {Suffrage:Voter ID:3 Address:0.0.0.0:4922}]"
INFO[0000] 63413:S 07 Jun 2021 11:27:25.534 # ignoring join request because server already belongs to a cluster
INFO[0000] 63413:F 07 Jun 2021 11:27:25.534 * entering follower state: follower="Node at [::]:4922 [Follower]" leader=

Then I query the cluster with RAFT LEADER and RAFT SERVER LIST:

$ telnet localhost 4920
Trying ::1...
Connected to localhost.
Escape character is '^]'.
RAFT LEADER
$0

RAFT SERVER LIST
*3
*6
$2
id
$1
1
$7
address
$12
0.0.0.0:4920
$6
leader
$5
false
*6
$2
id
$1
2
$7
address
$12
0.0.0.0:4921
$6
leader
$5
false
*6
$2
id
$1
3
$7
address
$12
0.0.0.0:4922
$6
leader
$5
false

It would appear nobody knows who the leader is? However, writes do work, as do reads, and there are no obvious errors on Node 1 (besides what you see above).

SET foo bar
+OK
GET foo
$3
bar

Support for Redis cluster commands

Hi 👋

This is an excellent project!
I was playing around with the ticket server example and wanted to suggest a minor change.

I recently discovered that redis-cli supports an additional -c flag to configure it in cluster-mode:

-c    Enable cluster mode (follow -ASK and -MOVED redirections).

so I made a small change:

diff --git a/uhaha.go b/uhaha.go
index 4e39375..7a96854 100644
--- a/uhaha.go
+++ b/uhaha.go
@@ -988,7 +988,9 @@ func errRaftConvert(ra *raftWrap, err error) error {
        if err == raft.ErrNotLeader {
                leader := getLeaderBroadcastAddr(ra)
                if leader != "" {
-                       return fmt.Errorf("TRY %s", leader)
+                       parts := strings.Split(leader, "]") // TODO: proper parsing
+                       return fmt.Errorf("MOVED foo 127.0.0.1%s", parts[1])

which shows the following behavior with redis-cli

redis-cli -p 11003 -c
127.0.0.1:11003> RAFT LEADER
"[::]:11003"
127.0.0.1:11003> TICKET
"6"
# trigger a leader election
127.0.0.1:11003> TICKET
-> Redirected to slot [0] located at 127.0.0.1:11002
"9"
127.0.0.1:11002>

which is pretty convenient for testing with redis-cli.

Being a little greedy, I tried to reproduce this by connecting from a Go Redis client. It looks like only clients that implement the Redis cluster protocol support the MOVED command.

rdb := redis.NewClusterClient(&redis.ClusterOptions{
   Addrs: []string{":11001", ":11002", ":11003"},
})

and seeing a somewhat expected:

ERR unknown command 'cluster'

My understanding is that when starting the initial connection, the client asks a redis server node for the slot partitioning, in order to more efficiently route requests directly to the node responsible for that slot range.

I'd be curious to hear your thoughts on supporting the CLUSTER SLOTS command, and assigning all slots to the current leader, with the format described at https://redis.io/commands/cluster-slots

I'm not sure if it's possible to add such a behavior with a dedicated command conf.AddIntermediateCommand("cluster", cmdCLUSTER).

The goal is to have clients connect to a single address and follow redirections from -MOVED … errors. The semantics look similar, even if with uhaha a single node covers all slots.

leveldb store open: resource temporarily unavailable

Using the upgrade_finn_uhaha branch of bitraft, I noticed the following problems that seem to have to do with the leveldb store you're using behind the scenes.

How do I fix/resolve this? Can I control what is being used here?

Running the leader:

james@Jamess-MacBook-Pro
Mon Jun 07 10:26:04
~/tmp/bitraft
 (upgrade_finn_uhaha) 130
$ ./bitraft -d ./data1
INFO[0000] 61466:S 07 Jun 2021 10:26:13.767 # starting bitraft version 0.0.1@HEAD
INFO[0000] 61466:S 07 Jun 2021 10:26:14.056 * synchronized time
INFO[0000] 61466:S 07 Jun 2021 10:26:14.057 * server listening at [::]:4920
INFO[0000] 61466:S 07 Jun 2021 10:26:14.057 * server advertising as :5920
INFO[0000] 61466:S 07 Jun 2021 10:26:14.156 * initial configuration: index=0 servers=[]
INFO[0000] 61466:S 07 Jun 2021 10:26:14.156 * bootstrapping new cluster
INFO[0000] 61466:F 07 Jun 2021 10:26:14.156 * entering follower state: follower="Node at [::]:4920 [Follower]" leader=
INFO[0001] 61466:F 07 Jun 2021 10:26:15.657 # heartbeat timeout reached, starting election: last-leader=
INFO[0001] 61466:C 07 Jun 2021 10:26:15.657 * entering candidate state: node="Node at [::]:4920 [Candidate]" term=2
INFO[0001] 61466:C 07 Jun 2021 10:26:15.728 * election won: tally=1
INFO[0001] 61466:L 07 Jun 2021 10:26:15.728 * entering leader state: leader="Node at [::]:4920 [Leader]"
INFO[0002] 61466:L 07 Jun 2021 10:26:16.223 * logs loaded: ready for commands

Adding another instance:

james@Jamess-MacBook-Pro
Mon Jun 07 10:23:06
~/tmp/bitraft
 (master) 130
$ ./bitraft -b 0.0.0.0:4921 -j 127.0.0.1:4920 -d ./data2
INFO[0000] 61470:S 07 Jun 2021 10:26:28.259 # starting bitraft version 0.0.1@HEAD
INFO[0000] 61470:S 07 Jun 2021 10:26:28.538 * synchronized time
INFO[0000] 61470:S 07 Jun 2021 10:26:28.538 * server listening at [::]:4921
INFO[0000] 61470:S 07 Jun 2021 10:26:28.538 * server advertising as :5920
INFO[0000] 61470:S 07 Jun 2021 10:26:28.538 # leveldb store open: resource temporarily unavailable

upgrading from finn

Hi Josh :D

I'd like to upgrade bitraft from finn. I need some help though... Mostly it would be really nice if you could document a few more internals. The examples look good and the README is great, but some of the API docs are a bit sparse.

For example: I don't know what InitialData should be. The type is interface{} of course, but I get very "urky" when seeing such open types :)

Also, many of the options on the Config struct are undocumented, and I'm not sure what some of them mean.

Thanks!
