
coolbeans's Introduction


Coolbeans

Coolbeans is a distributed replicated work queue service that implements the beanstalkd protocol.

Unlike a message queue, beanstalkd is a work queue that provides primitive operations to work with jobs.

Coolbeans primarily differs from beanstalkd in that it allows the work queue to be replicated across multiple machines. It uses the RAFT consensus algorithm to replicate the job state consistently across machines.

Motivation

Beanstalkd is a feature-rich and easy-to-use queue. It has, however, a few drawbacks: (i) no replication or high availability across machine failures; (ii) no native sharding; (iii) no native support for encryption & authentication between the service & the client.

While the initial setup of beanstalkd is simple, running an HA or sharded production setup is non-trivial. Our premise with Coolbeans is to first provide a replicated beanstalkd queue, and then address the other issues incrementally. Read about our design approach here.

Key features

Releases

  • Static binaries can be downloaded from the releases page.
  • Docker release images can be pulled from here.
  • Docker development images can be pulled from here.

Getting Started

How to contribute

Coolbeans is currently at alpha release quality. The immediate focus is improving quality by testing, testing & more testing.

Here are a few ways you can contribute:


icon by Llisole from the Noun Project

coolbeans's People

Contributors

1xyz · harinik · yaochie


coolbeans's Issues

Speed?

Hello! Thanks for starting and putting forward such a nice project.

I'm currently investing lots of time and effort into beanstalkd, so your "drop-in" replacement is very interesting.

I'm currently wondering about the performance impact of adding the Raft replication algorithm to beanstalkd. I know beanstalkd has been benchmarked to be very quick in the past, but I also know that replication and the extra network round-trips are costly.

Have you tried comparing the two side by side?
Thanks in advance!

Metrics (High level work)

What?

  1. Measure latency and QPS metrics for the gRPC server (node-level and cluster-level metrics)
  2. Measure the same for the proxy server
  3. See if the hashicorp raft library exposes, or needs, any metrics

How to measure?

  1. Figure out a good metrics strategy: which library to use and how to expose the metrics.
    Can we use a metrics library that is independent of the capture system (Prometheus, or anything else)?
  2. The Hashicorp & Google libraries use https://github.com/armon/go-metrics - it might be worthwhile to check what this library actually provides.

Starting a cluster with snapshotted data fails

Problem: when an assigned bootstrap node attempts to bootstrap an existing cluster, it fails with bootstrap only works on new clusters

Repro: steps

  1. I have a three node cluster, but none of the nodes are running.
  2. Ensure that the three node cluster has snapshot data from a previous run. For example: a previous run of make run-cluster generated data in /tmp/bean*
  3. Run make run-cluster. The bootstrap node fails to bootstrap the cluster with bootstrap only works on new clusters

Expected: the bootstrap to start the cluster.

Root cause: Raft performs bootstrap for new clusters only.
Fix detail: Only the bootstrap node can fail with raft.ErrCantBootstrap. Per the BootstrapCluster documentation, we can safely ignore this error and continue to start the gRPC server, allowing other peers to connect.

```go
// called on an un-bootstrapped Raft instance after it has been created. This
// should only be called at the beginning of time for the cluster with an
// identical configuration listing all Voter servers. There is no need to
// bootstrap Nonvoter and Staging servers.
//
// A cluster can only be bootstrapped once from a single participating Voter
// server. Any further attempts to bootstrap will return an error that can be
// safely ignored.
//
// One sane approach is to bootstrap a single server with a configuration
// listing just itself as a Voter, then invoke AddVoter() on it to add other
// servers to the cluster.
func (r *Raft) BootstrapCluster(configuration Configuration) Future {
```

Getting started for multiple nodes does not work as expected

Describe the bug:
The getting started guide for a three-node HA setup does not work as expected. Reported by an external user.

Provide the steps to reproduce the behavior:

  1. I set up a three-node cluster as you have documented. I am assuming this is an HA setup since they all have the same bootstrap node id.
  2. Put three items and remove two.
  3. I then kill the bean0 process and try to list the contents of tube01; it does not work. I also try to put another job, and that fails too. I assumed the cluster would tolerate the failure of one node.

Other:

  • OS: Linux
  • Version or branch-commit-hash [e.g. 1.0, master-abcde]: N/A

beanstalkd-proxy does not handle SIGTERM correctly

Describe the bug:
beanstalkd-proxy hangs and does not handle a SIGTERM properly

Provide the steps to reproduce the behavior:

  • Start the beanstalkd proxy service
  • Allow it to connect to an upstream server
  • Send a kill to the proxy service

The process does not exit.

What is the expected behavior?
Expect the process to gracefully exit


Consider an alternate approach to connection transfer

Problem: Refer 1xyz/coolbeans-backup#24

The current approach is unclean; consider alternatives, such as maintaining multiple TCP connections between the proxy client and the cluster-node server,

or maintaining our own timers for timing out reservations.

CI build fails

See https://github.com/1xyz/coolbeans/runs/6379258692?check_suite_focus=true

The CI build failed with the following error.

Run make build
rm -rf bin/
go clean -cache
gofmt -s -l -w ./cluster/server/serverfakes/fake_jsm_tick.go ./cluster/server/jsm_server.go ./cluster/server/reservations_controller.go ./cluster/server/reservations_controller_test.go ./cluster/server/cluster_server.go ./cluster/server/health_server.go ./cluster/cmd/cmd_cluster.go ./cluster/cmd/cmd_client.go ./cluster/client/cluster_client.go ./main.go ./state/state.go ./state/client_resv.go ./state/jsm.go ./state/jsm_test.go ./state/errors.go ./state/state_string.go ./state/job_heap.go ./state/index.go ./state/job_heap_test.go ./state/client_resv_test.go ./tools/tools.go ./tools/opts.go ./tests/e2e/protocol_test.go ./beanstalkd/proto/conn_state_string.go ./beanstalkd/proto/conn.go ./beanstalkd/proto/conn_test.go ./beanstalkd/proto/conn_state.go ./beanstalkd/proto/tcp_server.go ./beanstalkd/core/cmd_type_string.go ./beanstalkd/core/client_test.go ./beanstalkd/core/parse_test.go ./beanstalkd/core/cmd_data.go ./beanstalkd/core/core.go ./beanstalkd/core/client.go ./beanstalkd/core/cmd_type.go ./beanstalkd/core/parse.go ./beanstalkd/core/cmd_data_test.go ./beanstalkd/core/cmd_proc.go ./beanstalkd/proxy/bool.go ./beanstalkd/proxy/bool_test.go ./beanstalkd/proxy/client.go ./beanstalkd/cmd/cmd_beanstalkd.go ./beanstalkd/cmd/beanstalkd.go ./store/store.go ./store/snapshot.go ./store/snapshot_test.go ./store/client_uri_test.go ./store/client_uri.go ./api/v1/empty.pb.go ./api/v1/cluster.pb.go ./api/v1/job.pb.go ./api/v1/jsm.pb.go ./api/v1/client.pb.go
./tools/tools.go
./tests/e2e/protocol_test.go
go get -u github.com/golang/protobuf/protoc-gen-go
go: downloading github.com/golang/protobuf v1.5.2
go: downloading google.golang.org/protobuf v1.26.0
go: downloading google.golang.org/protobuf v1.28.0
go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead.
go: upgraded github.com/golang/protobuf v1.4.3 => v1.5.2
go: upgraded google.golang.org/protobuf v1.25.0 => v1.28.0
protoc -I api/v1 api/v1/*.proto --go_out=plugins=grpc:api/v1 --go_opt=paths=source_relative
protoc-gen-go: program not found or is not executable
Please specify a program using absolute path or make sure the program is available in your PATH system variable
--go_out: protoc-gen-go: Plugin failed with status code 1.
make: *** [Makefile:81: protoc] Error 1

Expose snapshot creation, and verification

In PR 1xyz/coolbeans-backup#21, we exposed the creation of raft's user-defined snapshot via the cluster RPC.

We need to do a few things to make that snapshot really usable

  1. Provide the facility to stream out bytes of the snapshot via RPC

  2. Provide the ability to verify the snapshot (a tool); sample source code attached

  3. Provide an ability to start another cluster w/ this snapshot (or restore this snapshot)

```go
package main

import (
	"os"
	"time"

	"github.com/1xyz/coolbeans/state"
	"github.com/1xyz/coolbeans/store"
	log "github.com/sirupsen/logrus"
)

func init() {
	log.SetFormatter(&log.TextFormatter{})
	log.SetOutput(os.Stdout)
	log.SetLevel(log.DebugLevel)
}

func main() {
	filename := "/tmp/bean0/snapshots/2-418905-1587171565583/state.bin"
	timeout := 120 * time.Second

	rdr, err := os.Open(filename)
	if err != nil {
		log.Panicf("err = %v", err)
	}

	jsm, err := state.NewJSM()
	if err != nil {
		log.Panicf("err = %v", err)
	}

	err = store.RestoreSnapshotTo(rdr, jsm, timeout)
	if err != nil {
		log.Panicf("err = %v", err)
	}

	ss, err := jsm.Snapshot()
	if err != nil {
		log.Panicf("err = %v", err)
	}

	clients, err := ss.SnapshotClients()
	if err != nil {
		log.Panicf("err = %v", err)
	}

	nc := 0
	for c := range clients {
		nc++
		log.Infof("c = %v. heapIndex=%v", c.CliID, c.HeapIndex)
	}

	log.Infof("nc = %v", nc)

	bucket := make(map[state.JobState]int)
	bucket[state.Reserved] = 0
	bucket[state.Initial] = 0
	bucket[state.Ready] = 0
	bucket[state.Buried] = 0
	bucket[state.Delayed] = 0
	bucket[state.Deleted] = 0
	jobs, err := ss.SnapshotJobs()
	if err != nil {
		log.Panicf("err = %v", err)
	}

	tubeBuckets := make(map[state.TubeName]int)

	nj := 0
	for j := range jobs {
		nj++
		bucket[j.State()]++

		_, ok := tubeBuckets[j.TubeName()]
		if !ok {
			tubeBuckets[j.TubeName()] = 0
		}

		tubeBuckets[j.TubeName()]++
	}

	log.Infof("total of %v jobs encountered", nj)
	log.Infof("buckets %v", bucket)
	log.Infof("tubeBuckets %v", tubeBuckets)
}
```

Improve beanstalkd proxy connection retry strategy

Problem: Refer beanstalkd/proxy/client.go
Here, we set up a streaming connection between the client and the leader server; however, if the connection is lost for some reason, we retry the connection with no delay or exponential backoff. A tight retry loop like this can be problematic; consider adding an exponential backoff strategy.

Add beanstalkd-proxy call retry strategy

If a gRPC call fails due to a connection error, then we may want to retry the call using an exponential backoff strategy. This approach is different from gRPC's standard retry policy, which I think does not handle a connection being reset by a peer.

Pause-tube & list-tubes-xyz beanstalkd commands are not implemented

  • The list-tubes command returns a list of all existing tubes. Its form is:
    list-tubes\r\n

  • The list-tube-used command returns the tube currently being used by the client. Its form is:
    list-tube-used\r\n

  • The list-tubes-watched command returns a list of the tubes currently being watched by the client. Its form is:
    list-tubes-watched\r\n

  • The pause-tube command delays any new job from being reserved for a given time. Its form is:
    pause-tube <tube> <delay>\r\n

Add a CLI support for coolbeans

Problem: Add a coolbeans-specific CLI with commands as follows:

  1. Leave a cluster
  2. Check the readiness of a cluster
coolbeans client leave [node-id] [--node-addr=<addr>]
coolbeans client isReady [--node-addr=<addr>]

Why is this important?

  • leave can be used in a preStopHook to leave the cluster (although this has also been added to SIGTERM handling)
  • isReady can be added to readinessProbe check in K8s

release is not respecting the delay parameter

Problem: when you release a job from the reserved state, the delay parameter is not honored.

Repro:
reserve --tube=bar --release --delay 10
Actual: the delay is 1024 seconds
Expect: the delay to be 10 seconds

Upgrade the protoc version

> git clone https://github.com/1xyz/coolbeans.git
> make build

go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead.
protoc -I api/v1 api/v1/*.proto --go_out=plugins=grpc:api/v1 --go_opt=paths=source_relative
--go_out: protoc-gen-go: plugins are not supported; use 'protoc --go-grpc_out=...' to generate gRPC

See https://grpc.io/docs/languages/go/quickstart/#regenerate-grpc-code for more information.

What is the expected behavior?

  • Expected the build to complete. It looks like protoc-gen-go no longer supports the plugins=grpc option.

Other:

  • OS: [e.g. Linux, Docker, OSX]: OSX
  • Version or branch-commit-hash [e.g. 1.0, master-abcde] master
  • Add any other context about the problem here.

Go version: 1.18.1
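A likely fix, per the gRPC Go quickstart linked in the error output, is to install the two split code generators and drop the deprecated plugins=grpc flag. The output paths below assume the repo's api/v1 layout shown in the Makefile output.

```shell
# Install the two code generators the current protoc toolchain expects.
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest

# Replace the deprecated --go_out=plugins=grpc invocation with:
protoc -I api/v1 api/v1/*.proto \
    --go_out=api/v1 --go_opt=paths=source_relative \
    --go-grpc_out=api/v1 --go-grpc_opt=paths=source_relative
```

Note that the generated gRPC service code moves into separate *_grpc.pb.go files, so the Makefile's protoc target needs updating accordingly.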
