
coolbeans's Introduction


Coolbeans

Coolbeans is a distributed replicated work queue service that implements the beanstalkd protocol.

Unlike a message queue, beanstalkd is a work queue that provides primitive operations to work with jobs.

Coolbeans primarily differs from beanstalkd in that it allows the work queue to be replicated across multiple machines. It uses the RAFT consensus algorithm to replicate the job state consistently across machines.

Motivation

Beanstalkd is a feature-rich and easy-to-use queue. It has, however, a few drawbacks: (i) no replication or high availability across machine failures; (ii) no native sharding; (iii) no native support for encryption & authentication between the service & the client.

While the initial setup of beanstalkd is simple, running an HA or sharded production setup is non-trivial. Our premise with Coolbeans is to first provide a replicated beanstalkd queue, and then address the other issues incrementally. Read about our design approach here.

Key features

Releases

  • Static binaries can be downloaded from the releases page.
  • Docker release images can be pulled from here.
  • Docker development images can be pulled from here.

Getting Started

How to contribute

Coolbeans is currently at alpha release quality. The immediate focus is improving quality by testing, testing & more testing.

Here are a few ways you can contribute:


icon by Llisole from the Noun Project

coolbeans's People

Contributors

1xyz · harinik · yaochie


coolbeans's Issues

Speed?

Hello! Thanks for starting and putting forward such a nice project.

I'm currently investing lots of time and effort into beanstalkd, so your "drop-in" replacement is very interesting.

I'm currently wondering about the performance impact of adding the Raft replication algorithm to beanstalkd. I know beanstalkd has been benchmarked to be very quick in the past, but I also know that replication and the extra network round-trips are costly.

Have you tried comparing the two side by side?
Thanks in advance!

Metrics (High level work)

What?

  1. Measure latency and QPS metrics for the gRPC server (node-level and cluster-level metrics)
  2. Measure the same for the proxy server
  3. See if the hashicorp raft library exposes, or needs, any metrics

How to measure?

  1. Figure out a good metrics strategy: which library to use and how to expose the metrics.
    Can we use a metrics library that is independent of the capture system (Prometheus, or anything else)?
  2. The Hashicorp & Google libraries use https://github.com/armon/go-metrics - it might be worthwhile to check what this library actually provides.

Starting a cluster with snapshotted data fails

Problem: when an assigned bootstrap node attempts to bootstrap an existing cluster, it fails with bootstrap only works on new clusters

Repro: steps

  1. I have a three node cluster, but none of the nodes are running.
  2. Ensure that the three node cluster has snapshot data from a previous run. For example: a previous run of make run-cluster generated data in /tmp/bean*
  3. Run make run-cluster. The bootstrap node fails to bootstrap the cluster with bootstrap only works on new clusters

Expected: the bootstrap to start the cluster.

Root cause: Raft performs bootstrap for new clusters only.
Fix detail: Only the bootstrap node can fail with raft.ErrCantBootstrap. Per the BootstrapCluster documentation, we can safely ignore this error and continue to start the gRPC server, allowing other peers to connect.

```go
// called on an un-bootstrapped Raft instance after it has been created. This
// should only be called at the beginning of time for the cluster with an
// identical configuration listing all Voter servers. There is no need to
// bootstrap Nonvoter and Staging servers.
//
// A cluster can only be bootstrapped once from a single participating Voter
// server. Any further attempts to bootstrap will return an error that can be
// safely ignored.
//
// One sane approach is to bootstrap a single server with a configuration
// listing just itself as a Voter, then invoke AddVoter() on it to add other
// servers to the cluster.
func (r *Raft) BootstrapCluster(configuration Configuration) Future {
```

Getting started for multiple nodes does not work as expected

Describe the bug:
The getting started guide for a three-node HA setup does not work as expected. Reported by an external user.

Provide the steps to reproduce the behavior:

  1. I set up a three-node cluster as you have documented. I am assuming this is an HA setup since they all have the same bootstrap node id.
  2. Put three items and remove two.
  3. I then kill the bean0 process and try to list the contents of tube01; it does not work. I also try to put another job, and that fails too. I assumed the cluster would tolerate the failure of one node.

Other:

  • OS: Linux
  • Version or branch-commit-hash [e.g. 1.0, master-abcde]: N/A

beanstalkd-proxy does not handle SIGTERM correctly

Describe the bug:
beanstalkd-proxy hangs and does not handle a SIGTERM properly

Provide the steps to reproduce the behavior:

  • Start the beanstalkd proxy service
  • Allow it to connect to an upstream server
  • Send a kill to the proxy service

The process does not exit.

What is the expected behavior?
Expect the process to gracefully exit


Consider an alternate approach to connection transfer

Problem: Refer 1xyz/coolbeans-backup#24

The current approach is unclean; consider alternatives, such as maintaining multiple TCP connections between the proxy client and the cluster-node server,

or maintaining our own timers for timing out reservations.

CI build fails

See https://github.com/1xyz/coolbeans/runs/6379258692?check_suite_focus=true

The CI build failed with the following error.

Run make build
rm -rf bin/
go clean -cache
gofmt -s -l -w ./cluster/server/serverfakes/fake_jsm_tick.go ./cluster/server/jsm_server.go ./cluster/server/reservations_controller.go ./cluster/server/reservations_controller_test.go ./cluster/server/cluster_server.go ./cluster/server/health_server.go ./cluster/cmd/cmd_cluster.go ./cluster/cmd/cmd_client.go ./cluster/client/cluster_client.go ./main.go ./state/state.go ./state/client_resv.go ./state/jsm.go ./state/jsm_test.go ./state/errors.go ./state/state_string.go ./state/job_heap.go ./state/index.go ./state/job_heap_test.go ./state/client_resv_test.go ./tools/tools.go ./tools/opts.go ./tests/e2e/protocol_test.go ./beanstalkd/proto/conn_state_string.go ./beanstalkd/proto/conn.go ./beanstalkd/proto/conn_test.go ./beanstalkd/proto/conn_state.go ./beanstalkd/proto/tcp_server.go ./beanstalkd/core/cmd_type_string.go ./beanstalkd/core/client_test.go ./beanstalkd/core/parse_test.go ./beanstalkd/core/cmd_data.go ./beanstalkd/core/core.go ./beanstalkd/core/client.go ./beanstalkd/core/cmd_type.go ./beanstalkd/core/parse.go ./beanstalkd/core/cmd_data_test.go ./beanstalkd/core/cmd_proc.go ./beanstalkd/proxy/bool.go ./beanstalkd/proxy/bool_test.go ./beanstalkd/proxy/client.go ./beanstalkd/cmd/cmd_beanstalkd.go ./beanstalkd/cmd/beanstalkd.go ./store/store.go ./store/snapshot.go ./store/snapshot_test.go ./store/client_uri_test.go ./store/client_uri.go ./api/v1/empty.pb.go ./api/v1/cluster.pb.go ./api/v1/job.pb.go ./api/v1/jsm.pb.go ./api/v1/client.pb.go
./tools/tools.go
./tests/e2e/protocol_test.go
go get -u github.com/golang/protobuf/protoc-gen-go
go: downloading github.com/golang/protobuf v1.5.2
go: downloading google.golang.org/protobuf v1.26.0
go: downloading google.golang.org/protobuf v1.28.0
go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead.
go: upgraded github.com/golang/protobuf v1.4.3 => v1.5.2
go: upgraded google.golang.org/protobuf v1.25.0 => v1.28.0
protoc -I api/v1 api/v1/*.proto --go_out=plugins=grpc:api/v1 --go_opt=paths=source_relative
protoc-gen-go: program not found or is not executable
Please specify a program using absolute path or make sure the program is available in your PATH system variable
--go_out: protoc-gen-go: Plugin failed with status code 1.
make: *** [Makefile:81: protoc] Error 1

Expose snapshot creation, and verification

In PR 1xyz/coolbeans-backup#21, we exposed the creation of raft's user-defined snapshot via the cluster RPC.

We need to do a few things to make that snapshot really usable

  1. Provide the facility to stream out bytes of the snapshot via RPC

  2. Provide the ability to verify the snapshot (a tool); sample source code attached

  3. Provide an ability to start another cluster w/ this snapshot (or restore this snapshot)

```go
package main

import (
	"os"
	"time"

	"github.com/1xyz/coolbeans/state"
	"github.com/1xyz/coolbeans/store"
	log "github.com/sirupsen/logrus"
)

func init() {
	log.SetFormatter(&log.TextFormatter{})
	log.SetOutput(os.Stdout)
	log.SetLevel(log.DebugLevel)
}

func main() {
	filename := "/tmp/bean0/snapshots/2-418905-1587171565583/state.bin"
	timeout := 120 * time.Second

	rdr, err := os.Open(filename)
	if err != nil {
		log.Panicf("err = %v", err)
	}

	jsm, err := state.NewJSM()
	if err != nil {
		log.Panicf("err = %v", err)
	}

	err = store.RestoreSnapshotTo(rdr, jsm, timeout)
	if err != nil {
		log.Panicf("err = %v", err)
	}

	ss, err := jsm.Snapshot()
	if err != nil {
		log.Panicf("err = %v", err)
	}

	clients, err := ss.SnapshotClients()
	if err != nil {
		log.Panicf("err = %v", err)
	}

	nc := 0
	for c := range clients {
		nc++
		log.Infof("c = %v. heapIndex=%v", c.CliID, c.HeapIndex)
	}

	log.Infof("nc = %v", nc)

	bucket := make(map[state.JobState]int)
	bucket[state.Reserved] = 0
	bucket[state.Initial] = 0
	bucket[state.Ready] = 0
	bucket[state.Buried] = 0
	bucket[state.Delayed] = 0
	bucket[state.Deleted] = 0
	jobs, err := ss.SnapshotJobs()
	if err != nil {
		log.Panicf("err = %v", err)
	}

	tubeBuckets := make(map[state.TubeName]int)

	nj := 0
	for j := range jobs {
		nj++
		bucket[j.State()]++

		_, ok := tubeBuckets[j.TubeName()]
		if !ok {
			tubeBuckets[j.TubeName()] = 0
		}

		tubeBuckets[j.TubeName()]++
	}

	log.Infof("total of %v jobs encountered", nj)
	log.Infof("buckets %v", bucket)
	log.Infof("tubeBuckets %v", tubeBuckets)
}
```

Improve beanstalkd proxy connection retry strategy

Problem: Refer beanstalkd/proxy/client.go
Here, we set up a streaming connection between the client and the leader server; however, if the connection is lost for some reason, we retry the connection with no delay or exponential backoff. A tight retry loop like this can be problematic; consider adding an exponential backoff strategy.

Add beanstalkd-proxy call retry strategy

If a gRPC call fails due to a connection error, then we may want to retry the call using an exponential backoff strategy. This approach is different from gRPC's standard retry policy, which I think does not handle a connection being reset by a peer.

Pause-tube & list-tubes-xyz beanstalkd commands are not implemented

  • The list-tubes command returns a list of all existing tubes. Its form is:
    list-tubes\r\n

  • The list-tube-used command returns the tube currently being used by the client. Its form is:
    list-tube-used\r\n

  • The list-tubes-watched command returns a list of the tubes currently being watched by the client. Its form is:
    list-tubes-watched\r\n

  • The pause-tube command delays any new job from being reserved for a given time. Its form is:
    pause-tube <tube> <delay>\r\n

Add a CLI support for coolbeans

Problem: Add a coolbeans-specific CLI with commands as follows:

  1. Leave a cluster
  2. Check the readiness of a cluster
coolbeans client leave [node-id] [--node-addr=<addr>]
coolbeans client isReady [--node-addr=<addr>]

Why is this important?

  • leave can be used in a preStopHook to leave the cluster (although this has also been added to SIGTERM handling)
  • isReady can be added to readinessProbe check in K8s

release is not respecting the delay parameter

Problem: when you release a job from the reserved state, the delay parameter is not honored.

Repro:
reserve --tube=bar --release --delay 10
Actual: the delay is 1024 seconds
Expect: the delay to be 10 seconds

Upgrade the protoc version

> git clone https://github.com/1xyz/coolbeans.git
> make build

go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead.
protoc -I api/v1 api/v1/*.proto --go_out=plugins=grpc:api/v1 --go_opt=paths=source_relative
--go_out: protoc-gen-go: plugins are not supported; use 'protoc --go-grpc_out=...' to generate gRPC

See https://grpc.io/docs/languages/go/quickstart/#regenerate-grpc-code for more information.

What is the expected behavior?

  • Expected the build to complete. It looks like protoc-gen-go no longer supports the plugins=grpc option.

Other:

  • OS: [e.g. Linux, Docker, OSX]: OSX
  • Version or branch-commit-hash [e.g. 1.0, master-abcde] master
  • Add any other context about the problem here.

Go version: 1.18.1
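A likely fix, per the gRPC Go quickstart linked in the error output, is to install the two split code generators and drop the deprecated plugins=grpc flag. The output paths below assume the repo's api/v1 layout shown in the Makefile output.

```shell
# Install the two code generators the current protoc toolchain expects.
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest

# Replace the deprecated --go_out=plugins=grpc invocation with:
protoc -I api/v1 api/v1/*.proto \
    --go_out=api/v1 --go_opt=paths=source_relative \
    --go-grpc_out=api/v1 --go-grpc_opt=paths=source_relative
```

Note that the generated gRPC service code moves into separate *_grpc.pb.go files, so the Makefile's protoc target needs updating accordingly.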
