
Xline

Xline-logo

Discord Shield Apache 2.0 licensed Build Status codecov OpenSSF Best Practices

Welcome to the Xline Project!

Xline is meant to provide high-performance, strongly consistent metadata management for data centers across wide-area networks (WANs).

Scope

At a high level, we expect the scope of Xline to be restricted to the following functionalities:

  • Cloud-native, compatible with the Kubernetes ecosystem.
  • Friendly to geo-distributed deployment.
  • A rich set of key-value (KV) interfaces, fully compatible with etcd's API.
  • High performance and strong consistency in wide-area network environments, using the CURP protocol as the underlying consensus protocol.

cncf-logo

Xline is a sandbox project of the Cloud Native Computing Foundation (CNCF).

If you have any questions or suggestions, or would like to join the Xline discussions, please feel free to join our Discord channel.

Motivation

With the widespread adoption of cloud computing, multi-cloud platforms (multiple public or hybrid clouds) have become the mainstream IT architecture for enterprise customers. However, multi-cloud platform architecture hinders data access between different clouds to some extent.

In addition, data isolation and data fragmentation due to cloud barriers have become obstacles to business growth. The biggest challenge of multi-cloud architectures is how to maintain strong data consistency and ensure high performance in the competitive conditions of multi-data center scenarios. Traditional single data center solutions cannot meet the availability, performance, and consistency requirements of multi-data center scenarios.

This project aims to enable a high-performance multi-cloud metadata management solution for multi-cloud scenarios, which is critical for organizations with geo-distributed and multi-active deployment requirements.

Innovation

Cross-datacenter network latency is the most important factor affecting the performance of geo-distributed systems, especially when a consensus protocol is involved. Consensus protocols are widely used to achieve high availability; etcd, for instance, uses the Raft protocol, which is popular in recently developed systems.

Although Raft is stable and easy to implement, it takes 2 RTTs to complete a consensus request from the view of a client. One RTT takes place between the client and the leader server, and the leader server takes another RTT to broadcast the message to the follower servers. In a geo-distributed environment, an RTT is quite long, varying from tens of milliseconds to hundreds of milliseconds, so 2 RTTs are too long in such cases.

We adopt a new consensus protocol named CURP to resolve the above issue. Please refer to the paper for a detailed description. The main benefit of the protocol is that it saves 1 RTT when contention is not too high. As far as we know, Xline is the first product to use CURP. For a more detailed protocol comparison, please refer to the blog.
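The RTT arithmetic above can be sketched in a few lines; the figures below are illustrative, not measurements:

```rust
/// Back-of-the-envelope client-observed commit latency, using the RTT
/// counts described above. Purely illustrative arithmetic.
fn raft_commit_latency_ms(rtt_ms: u64) -> u64 {
    // client -> leader (1 RTT) + leader -> follower majority (1 RTT)
    2 * rtt_ms
}

fn curp_fast_path_latency_ms(rtt_ms: u64) -> u64 {
    // CURP fast path completes in a single RTT when contention is low
    rtt_ms
}

fn main() {
    // With a 100 ms cross-datacenter RTT, the fast path halves the latency.
    println!("raft: {} ms", raft_commit_latency_ms(100));
    println!("curp: {} ms", curp_fast_path_latency_ms(100));
}
```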

Performance Comparison

We compared Xline with etcd in a simulated multi-cluster environment. The details of the deployment are shown below.

test deployment

We compared the performance under two different workloads: one uses a single key, the other a key space of 100K keys. Here's the test result.

xline key perf

In a geo-distributed multi-cluster environment, Xline clearly outperforms etcd.

Xline client

For more information about the Xline client SDK or the Xline client command-line tool, please refer to the client documentation in the repository.

Quick Start

To get started, check out the document QUICK_START.md for in-depth information and step-by-step instructions.

Contribute Guide

Our project welcomes contributions from any member of our community. To get started contributing, please see our CONTRIBUTING.md.

Code of Conduct

The Xline project adheres to the CNCF Community Code of Conduct. It describes the minimum behavior expected from all contributors.

Roadmap

  • v0.1 ~ v0.2

    • Support all major Etcd APIs
    • Support configuration file
    • Pass validation tests (All the supported Etcd APIs and their validation test results can be viewed in VALIDATION_REPORT.md)
  • v0.3 ~ v0.5

    • Enable persistent storage
    • Enable snapshot
    • Enable cluster membership change
    • Implement a basic Kubernetes operator
  • v0.6 ~ v0.8

    • Export metrics to monitoring and alerting systems
    • Enable SSL/TLS certificates
    • Provide clients implemented in different languages, such as Go and Python (not yet determined). [Note: although Xline is etcd-compatible, we provide an Xline-specific client SDK for better performance. Currently this SDK is available only in Rust, and we plan to extend it to other languages.]
  • v1.0 ~

    • Enable chaos engineering to validate the system's stability
    • Integration with other CNCF components
    • Support Karmada (a Kubernetes management system)

xline's People

Contributors

bsbds, caicancai, chaudharyraman, dependabot[bot], eaimty, gitter-badger, igxnon, kikkon, liangyuanpeng, liubog2008, markcty, markgaox, myrfy001, phoenix500526, pzheng1025, rogercloud, themanforfree


xline's Issues

[Xline] Failed to validate the test case in quick start

I ran the commands in the Contribution Guide step by step and got stuck in the Send Etcd requests section. The error log is shown below.

$ docker exec node4 /bin/sh -c "/usr/local/bin/etcdctl --endpoints=\"http://172.20.0.3:2379\" put A 1"
OK
$ docker exec node4 /bin/sh -c "/usr/local/bin/etcdctl --endpoints=\"http://172.20.0.3:2379\" get A 1"
{"level":"warn","ts":"2022-12-27T09:18:09.831Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0002b4000/172.20.0.3:2379","attempt":0,"error":"rpc error: code = Unknown desc = malformed header: missing HTTP content-type"}
Error: rpc error: code = Unknown desc = malformed header: missing HTTP content-type

The GC process may break the CURP Durability in some extreme cases

Currently, our gc process mainly relies on running gc_task every GC_INTERVAL. In some extreme cases, it risks losing completed-but-not-yet-synced commands.

For instance, a client sends a request to a leader and the leader executes the command on the fast path, but it crashes before syncing the command execution result to the others. For some reason, the cluster cannot agree on who the next leader is. After GC_INTERVAL, the command is removed from the spec_pool by the gc task. Ten seconds or so later, a new leader is finally elected, but the speculatively executed command is gone: the new leader cannot recover it from the spec_pool.

[Xline] Variables can be used directly in the `format!` string

Inline variables in a format string and eliminate these warnings.

cargo clippy
    Checking curp v0.1.0 (/home/jiawei/Xline/curp)
error: variables can be used directly in the `format!` string
   --> curp/src/client.rs:252:58
    |
252 |                     return Err(ProposeError::SyncedError(format!("{:?}", e)));
    |                                                          ^^^^^^^^^^^^^^^^^^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
note: the lint level is defined here
   --> curp/src/lib.rs:44:5
    |
44  |     clippy::pedantic,
    |     ^^^^^^^^^^^^^^^^
    = note: `#[deny(clippy::uninlined_format_args)]` implied by `#[deny(clippy::pedantic)]`
help: change this to
    |
252 -                     return Err(ProposeError::SyncedError(format!("{:?}", e)));
252 +                     return Err(ProposeError::SyncedError(format!("{e:?}")));
    |

error: variables can be used directly in the `format!` string
   --> curp/src/server/mod.rs:167:17
    |
167 |                 format!("0.0.0.0:{}", port)
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
help: change this to
    |
167 -                 format!("0.0.0.0:{}", port)
167 +                 format!("0.0.0.0:{port}")
    |

error: variables can be used directly in the `format!` string
   --> curp/src/server/mod.rs:169:60
    |
169 |                     .map_err(|e| ServerError::ParsingError(format!("{}", e)))?,
    |                                                            ^^^^^^^^^^^^^^^^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
help: change this to
    |
169 -                     .map_err(|e| ServerError::ParsingError(format!("{}", e)))?,
169 +                     .map_err(|e| ServerError::ParsingError(format!("{e}")))?,
    |

error: variables can be used directly in the `format!` string
   --> curp/src/server/mod.rs:403:45
    |
403 |             tonic::Status::invalid_argument(format!("propose cmd decode failed: {}", e))
    |                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
help: change this to
    |
403 -             tonic::Status::invalid_argument(format!("propose cmd decode failed: {}", e))
403 +             tonic::Status::invalid_argument(format!("propose cmd decode failed: {e}"))
    |

error: variables can be used directly in the `format!` string
   --> curp/src/server/mod.rs:483:45
    |
483 |             tonic::Status::invalid_argument(format!("wait_synced id decode failed: {}", e))
    |                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
help: change this to
    |
483 -             tonic::Status::invalid_argument(format!("wait_synced id decode failed: {}", e))
483 +             tonic::Status::invalid_argument(format!("wait_synced id decode failed: {e}"))
    |

error: variables can be used directly in the `format!` string
   --> curp/src/server/mod.rs:584:50
    |
584 |             .map_err(|e| tonic::Status::internal(format!("encode or decode error, {}", e)))?;
    |                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
help: change this to
    |
584 -             .map_err(|e| tonic::Status::internal(format!("encode or decode error, {}", e)))?;
584 +             .map_err(|e| tonic::Status::internal(format!("encode or decode error, {e}")))?;
    |

error: could not compile `curp` due to 6 previous errors

compile error

cargo build --release
Compiling console v0.15.2
error: unnecessary parentheses around match arm expression
--> curp/src/bg_tasks.rs:191:28
|
191 | Ok(req) => (req),
| ^ ^
|
note: the lint level is defined here
--> curp/src/lib.rs:41:5
|
41 | warnings, // treat all wanings as errors
| ^^^^^^^^
= note: #[deny(unused_parens)] implied by #[deny(warnings)]
help: remove these parentheses
|
191 - Ok(req) => (req),
191 + Ok(req) => req,
|

Compiling indicatif v0.17.2
error: could not compile curp due to previous error
warning: build failed, waiting for other jobs to finish...

rustc -V
rustc 1.65.0 (897e37553 2022-11-02)
protoc --version
libprotoc 3.21.11
uname -a
Darwin bcd07439fa51 22.1.0 Darwin Kernel Version 22.1.0: Sun Oct 9 20:15:09 PDT 2022; root:xnu-8792.41.9~2/RELEASE_ARM64_T6000 arm64

[Refactor]: Refactor stream redirect

We need to redirect the lease keep-alive stream to the leader in the lease service. The current implementation doesn't look very good; perhaps we can use the proxy pattern to refactor it?

[Bug]: Cannot modify kv pairs after watch them in etcdctl

Description about the bug

  1. One-line summary
    I cannot modify an existing key after using the etcdctl watch command to watch it.

  2. What did I expect to happen?
    I should have successfully modified an existing key-value pair whether I watch it or not.

  3. How to reproduce this issue?
    step 1. build an environment (run the quick_start.sh script)
    step 2. refer to the Relevant log output down below.

Version

0.1.0 (Default)

Relevant log output

1. Modifying an existing key without watching it: all put ops work properly.
$ docker exec -it node4 bash
root@e97cceaf264a:/# alias etcdctl='/usr/local/bin/etcdctl --endpoints="http://172.20.0.3:2379"'
root@e97cceaf264a:/# etcdctl put hello world1
OK
root@e97cceaf264a:/# etcdctl put hello world2
OK
root@e97cceaf264a:/# etcdctl put hello world3
OK
root@e97cceaf264a:/# etcdctl put hello world4
OK
root@e97cceaf264a:/# etcdctl put hello world5
OK
root@e97cceaf264a:/# etcdctl put hello world6
OK
root@e97cceaf264a:/# etcdctl put hello world7
OK
root@e97cceaf264a:/# etcdctl put hello world8
OK
root@e97cceaf264a:/# etcdctl put hello world9
OK
root@e97cceaf264a:/# etcdctl put hello world10
OK

2. Modifying an existing key after watching it: most ops after the watch op failed.
$ docker exec -it node4 bash
root@2cb3d250295f:/# alias etcdctl='/usr/local/bin/etcdctl --endpoints="http://172.20.0.3:2379"'
root@2cb3d250295f:/# etcdctl put hello world1
OK
root@2cb3d250295f:/# etcdctl put hello world2
OK
root@2cb3d250295f:/# etcdctl put hello world3
OK
root@2cb3d250295f:/# etcdctl put hello world4
OK
root@2cb3d250295f:/# etcdctl put hello world5
OK
root@2cb3d250295f:/# etcdctl watch hello -w=json --rev=1
{"Header":{"revision":6},"Events":[{"kv":{"key":"aGVsbG8=","create_revision":2,"mod_revision":2,"version":1,"value":"d29ybGQx"}},{"kv":{"key":"aGVsbG8=","create_revision":2,"mod_revision":3,"version":2,"value":"d29ybGQy"}},{"kv":{"key":"aGVsbG8=","create_revision":2,"mod_revision":4,"version":3,"value":"d29ybGQz"}},{"kv":{"key":"aGVsbG8=","create_revision":2,"mod_revision":5,"version":4,"value":"d29ybGQ0"}},{"kv":{"key":"aGVsbG8=","create_revision":2,"mod_revision":6,"version":5,"value":"d29ybGQ1"}}],"CompactRevision":0,"Canceled":false,"Created":false}
^C
root@2cb3d250295f:/# etcdctl put hello world6
OK
root@2cb3d250295f:/# etcdctl put hello world7
OK
root@2cb3d250295f:/# etcdctl put hello world8
{"level":"warn","ts":"2023-01-14T03:29:53.423Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000244000/172.20.0.3:2379","attempt":0,"error":"rpc error: code = Unknown desc = malformed header: missing HTTP content-type"}
Error: rpc error: code = Unknown desc = malformed header: missing HTTP content-type
root@2cb3d250295f:/# etcdctl put hello world9
{"level":"warn","ts":"2023-01-14T03:30:01.802Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001f6a80/172.20.0.3:2379","attempt":0,"error":"rpc error: code = Unknown desc = malformed header: missing HTTP content-type"}
Error: rpc error: code = Unknown desc = malformed header: missing HTTP content-type
root@2cb3d250295f:/# etcdctl put hello world10
{"level":"warn","ts":"2023-01-14T03:30:09.523Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000234000/172.20.0.3:2379","attempt":0,"error":"rpc error: code = Unknown desc = malformed header: missing HTTP content-type"}
Error: rpc error: code = Unknown desc = malformed header: missing HTTP content-type
root@2cb3d250295f:/# etcdctl put hello world11
{"level":"warn","ts":"2023-01-14T03:38:58.177Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000452a80/172.20.0.3:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded
root@2cb3d250295f:/# etcdctl put hello world11
{"level":"warn","ts":"2023-01-14T03:39:05.726Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000234000/172.20.0.3:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded
root@2cb3d250295f:/# etcdctl put hello world12
{"level":"warn","ts":"2023-01-14T03:40:00.185Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00024e000/172.20.0.3:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded
root@2cb3d250295f:/# etcdctl put hello1 world1
OK
root@2cb3d250295f:/# etcdctl put hello1 world2
{"level":"warn","ts":"2023-01-14T03:40:22.492Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000236000/172.20.0.3:2379","attempt":0,"error":"rpc error: code = Unknown desc = malformed header: missing HTTP content-type"}
Error: rpc error: code = Unknown desc = malformed header: missing HTTP content-type

Code of Conduct

  • I agree to follow this project's Code of Conduct

make gc interval configurable

Currently, the GC_INTERVAL is still hard-coded in the curp/src/server/gc.rs, like:

/// How often we should do gc
const GC_INTERVAL: Duration = Duration::from_secs(20);

/// Run background GC tasks for Curp server
pub(super) fn run_gc_tasks<C: Command + 'static>(cmd_board: CmdBoardRef<C>, spec: SpecPoolRef<C>) {
    let _spec_pool_gc = tokio::spawn(gc_spec_pool(spec, GC_INTERVAL));
    let _cmd_board_gc = tokio::spawn(gc_cmd_board(cmd_board, GC_INTERVAL));
}

In fact, since the gc interval may need to change in some situations, we should put it into the ServerTimeout in the config.rs file and make it configurable, as the other configuration fields are.
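A minimal sketch of what the configurable field might look like, assuming a `ServerTimeout`-style struct as the issue suggests (the field name is illustrative, not the actual `config.rs`):

```rust
use std::time::Duration;

/// Hypothetical server timeout configuration; field name is illustrative.
#[derive(Debug, Clone)]
pub struct ServerTimeout {
    /// How often the background GC tasks should run
    pub gc_interval: Duration,
}

impl Default for ServerTimeout {
    fn default() -> Self {
        Self {
            // Falls back to the previously hard-coded 20s
            gc_interval: Duration::from_secs(20),
        }
    }
}

fn main() {
    let timeout = ServerTimeout::default();
    // The gc tasks would receive this value instead of the constant.
    println!("gc runs every {:?}", timeout.gc_interval);
}
```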

[Curp] Make everything configurable

Currently, configurations like HEATBEAT_INTERVAL, N_EXECUTE_WORKERS, or ELECTION_TIMEOUT are hardcoded in the code. Make it configurable by developing something like a server builder pattern.

[Curp] Optimize the sync channel blocking mechanism

Currently, the sync of commands that can be executed speculatively will be postponed until the execution is finished. This is to preserve the execution order. For example, if a=1 and a=2 arrive sequentially. Only after a=1 finishes speculative execution will a=1 be synced. Only then will a=2 be synced and executed due to the key-based channel.

The postponement is unnecessary. We can add another mechanism to preserve the execution order. For example, store the execution order in a channel and spawn an execution task.
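A minimal sketch of the proposed mechanism, assuming the execution order is recorded in a channel and drained by a single execution task (all names are illustrative):

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical sketch: the propose side only records the execution order
/// in a channel and returns, so syncing no longer waits for execution;
/// a dedicated execution task preserves the recorded order.
fn drain_in_order(cmds: Vec<String>) -> Vec<String> {
    let (order_tx, order_rx) = mpsc::channel::<String>();

    // "Propose" side: enqueue the order and return immediately.
    for cmd in cmds {
        order_tx.send(cmd).expect("receiver alive");
    }
    drop(order_tx);

    // Single execution task: executes commands in the recorded order.
    let exec = thread::spawn(move || {
        let mut executed = Vec::new();
        while let Ok(cmd) = order_rx.recv() {
            executed.push(cmd);
        }
        executed
    });
    exec.join().expect("execution task panicked")
}

fn main() {
    let out = drain_in_order(vec!["a=1".into(), "a=2".into()]);
    // a=2 is executed after a=1 even though neither blocked syncing.
    println!("{out:?}");
}
```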

Originally posted by @rogercloud in #86 (comment)

[Curp] Protocol and SyncManager should be better organized

Ideally, Protocol in server.rs should accept commands from clients and hand them to SyncManager where we use Raft to sync between different servers. Currently, however, they are interleaved with each other: Protocol also handles some of the RPCs used in Raft while SyncManager will remove executed commands from the speculative commands pool.

[Xline] refactor Range API

Our Range request is slower than etcd's. An etcd range request doesn't go through the consensus protocol: it just asks the leader for a committed index, then waits for the applied index of the current node to catch up with that committed index, and these index requests can be batched. In our current implementation, however, all Range requests go through the curp protocol.
Xline
etcd

[Feature]: Snapshot

When to create

  1. When a node falls too far behind the Leader, the Leader creates a Snapshot and sends it to the corresponding follower.
  2. When a user actively creates a Snapshot.

How to create

Create a Snapshot through the StorageApi's interface, which returns a Snapshot structure. The specific implementation is determined by the underlying storage engine and must implement the AsyncRead and AsyncWrite traits for reading and writing Snapshot-related files.

How to send

Snapshots are sent in two situations:

  1. Leader sends to Follower - this is done through the InstallSnapshot RPC, where the curp client sends the Snapshot to the lagging Follower. The request is a stream that sends the Snapshot in chunks to the Follower.
  2. User actively requests a Snapshot - this is done through the Snapshot RPC of the maintenance service, where the user sends a SnapshotRequest request and the server returns a stream that sends the Snapshot in chunks back to the user.

How to recover

When the Follower receives the InstallSnapshot RPC, it writes the Snapshot to its local storage and then applies the Snapshot to the local storage through the StorageApi's apply_snapshot interface, which is implemented by the underlying storage engine.

The Snapshot file obtained by the user is only used for disaster recovery; the business data is recovered from it through a separate tool.

Proto

message InstallSnapshotRequest {
    uint64 term = 1;
    uint64 leader_id = 2;
    uint64 last_included_index = 3;
    uint64 last_included_term = 4;
    uint64 offset = 5;
    bytes data = 6;
    bool done = 7;
}

message InstallSnapshotResponse {
    uint64 term = 1;
}

service Protocol {
    ...

    rpc InstallSnapshot (stream InstallSnapshotRequest) returns (InstallSnapshotResponse);

}

Abstract

trait StorageApi {
    type Error:  std::error::Error + Send + Sync + 'static;
    
    type Snapshot: Snapshot;

    ...

    fn snapshot(&self) -> Result<Self::Snapshot,Self::Error>;

    fn apply_snapshot(&self, snapshot: Self::Snapshot) -> Result<(),Self::Error>;
}

trait Snapshot: Read + Write {

    fn size(&self) -> u64;
}

Discussion

We will persist the term and other metadata directly. Do we need to create snapshots for them regularly, or do we only need to compact the log regularly?

[Feature]: Improve timeout mechanism in Curp

Description about the issue

Currently, our code uses a real-system-time timeout mechanism, which is straightforward and easy to understand. However, obtaining the system time is not efficient: we need to update last_rpc_time every time we receive a new append_entries, which could lead to performance issues when the server is busy.

Etcd offers a more clever tick-based timeout detection mechanism. A task is spawned to perform a tick operation periodically, where every tick is 1 heartbeat interval. A tick increases the election_timeout and heartbeat_timeout counters by 1, and a broadcast of heartbeats (for the leader) or a new round of election (for a follower) is performed on timeout.

I believe adopting this method will improve curp's performance as well as its readability (since all periodically executed operations will be in one place).

Approach

  • Remove last_rpc_time. Add heartbeat_timeout and election_timeout tick count.
  • Merge timeout logic inside bg_election and bg_heartbeat.
  • Add tick logic and relevant tests.
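The approach above could be sketched roughly as follows; this is a hypothetical tick tracker in the style of etcd's mechanism, and all names and fields are illustrative, not the actual curp code:

```rust
/// Action a tick may trigger.
enum TickAction {
    /// Leader: broadcast heartbeats
    Heartbeat,
    /// Follower: start a new round of election
    Election,
    Nothing,
}

/// Hypothetical tick state; timeouts are measured in ticks, so no system
/// time is read on the hot path.
struct TickState {
    is_leader: bool,
    heartbeat_elapsed: u64,
    election_elapsed: u64,
    heartbeat_timeout: u64,
    election_timeout: u64,
}

impl TickState {
    /// Called once per heartbeat interval by a single spawned task.
    fn tick(&mut self) -> TickAction {
        self.heartbeat_elapsed += 1;
        self.election_elapsed += 1;
        if self.is_leader && self.heartbeat_elapsed >= self.heartbeat_timeout {
            self.heartbeat_elapsed = 0;
            TickAction::Heartbeat
        } else if !self.is_leader && self.election_elapsed >= self.election_timeout {
            self.election_elapsed = 0;
            TickAction::Election
        } else {
            TickAction::Nothing
        }
    }

    /// Replaces the `last_rpc_time` update: a follower just resets its
    /// election counter when append_entries arrives.
    fn on_append_entries(&mut self) {
        self.election_elapsed = 0;
    }
}

fn main() {
    let mut follower = TickState {
        is_leader: false,
        heartbeat_elapsed: 0,
        election_elapsed: 0,
        heartbeat_timeout: 1,
        election_timeout: 2,
    };
    follower.tick();
    // An append_entries from the leader keeps the election from firing.
    follower.on_append_entries();
    assert_eq!(follower.election_elapsed, 0);
}
```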

Code of Conduct

  • I agree to follow this project's Code of Conduct

[Bug]: Watch Concurrent Bug

Description about the bug

When we call watch in kvwatcher.rs, the following things will happen:

  1. get events from the requested revision
  2. add watcher to the map and start listening for events

However, these two steps are not atomic. So if a new event occurs after step 1 but before step 2, we will lose that event.
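One possible fix, sketched under the assumption that both steps can share a single lock (all names are illustrative, not the actual kvwatcher.rs):

```rust
use std::sync::Mutex;

/// Hypothetical watch store: "collect past events from the requested
/// revision" and "register the watcher" happen under one lock, so no
/// event published concurrently can fall between the two steps.
struct WatchStore {
    inner: Mutex<Inner>,
}

struct Inner {
    /// (revision, event payload)
    events: Vec<(u64, String)>,
    /// registered watcher ids
    watchers: Vec<u64>,
}

impl WatchStore {
    fn new(events: Vec<(u64, String)>) -> Self {
        Self { inner: Mutex::new(Inner { events, watchers: Vec::new() }) }
    }

    fn watch(&self, watcher_id: u64, from_rev: u64) -> Vec<String> {
        // Both steps run under the same guard: atomic w.r.t. new events.
        let mut inner = self.inner.lock().expect("lock poisoned");
        let past = inner
            .events
            .iter()
            .filter(|(rev, _)| *rev >= from_rev)
            .map(|(_, payload)| payload.clone())
            .collect::<Vec<_>>();
        inner.watchers.push(watcher_id);
        past
    }
}

fn main() {
    let store = WatchStore::new(vec![(1, "put a".to_owned()), (2, "put b".to_owned())]);
    // Past events from revision 2 onward are returned at registration time.
    println!("{:?}", store.watch(7, 2));
}
```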

Version

0.1.0 (Default)

Relevant log output

Not yet produced.

Code of Conduct

  • I agree to follow this project's Code of Conduct

[Feature]: Abstract a structure to index by both id and key

Description about the feature

  1. Indexing by both id and key seems to be a common need in our project (in lease and watch).
  2. Encapsulate two maps inside the structure and expose the necessary methods externally.
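A minimal sketch of such a structure, assuming two internal maps kept in sync by the wrapper's methods (all names are illustrative):

```rust
use std::collections::HashMap;

/// Hypothetical two-map index: look up an entry either by its unique id
/// or by its key.
struct DualIndex<V> {
    by_id: HashMap<u64, V>,
    id_by_key: HashMap<String, u64>,
}

impl<V> DualIndex<V> {
    fn new() -> Self {
        Self { by_id: HashMap::new(), id_by_key: HashMap::new() }
    }

    /// Both maps are updated in one call, so callers never see them disagree.
    fn insert(&mut self, id: u64, key: String, value: V) {
        self.id_by_key.insert(key, id);
        self.by_id.insert(id, value);
    }

    fn get_by_id(&self, id: u64) -> Option<&V> {
        self.by_id.get(&id)
    }

    fn get_by_key(&self, key: &str) -> Option<&V> {
        self.id_by_key.get(key).and_then(|id| self.by_id.get(id))
    }
}

fn main() {
    let mut idx = DualIndex::new();
    idx.insert(1, "hello".to_owned(), 42);
    // The same entry is reachable by id and by key.
    println!("{:?} {:?}", idx.get_by_id(1), idx.get_by_key("hello"));
}
```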

Code of Conduct

  • I agree to follow this project's Code of Conduct

[Feature]: Replace the reachable with an injected function

Description about the feature

Replace the reachable with an injected function. Currently, we use an atomic bool to control whether the server is reachable. We can extend the logic to a function that could help us test in other ways.

Code of Conduct

  • I agree to follow this project's Code of Conduct

Abstract DB interfaces as a trait.

Right now the DB is implemented as an in-memory hashmap. To be able to substitute different DB backends, we need to abstract the DB interface as a trait and allow other DB backends to implement it.
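A rough sketch of what such a trait might look like, with the current in-memory hashmap expressed as one backend (names are illustrative, not the actual Xline API):

```rust
use std::collections::HashMap;
use std::convert::Infallible;

/// Hypothetical storage abstraction; method and type names are illustrative.
trait StorageEngine {
    type Error;

    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, Self::Error>;
    fn put(&mut self, key: String, value: Vec<u8>) -> Result<(), Self::Error>;
}

/// The current in-memory hashmap, rewritten as one implementation of the
/// trait; a persistent backend would be another implementation.
struct MemoryEngine {
    inner: HashMap<String, Vec<u8>>,
}

impl StorageEngine for MemoryEngine {
    // An in-memory map cannot fail, so the error type is uninhabited.
    type Error = Infallible;

    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, Self::Error> {
        Ok(self.inner.get(key).cloned())
    }

    fn put(&mut self, key: String, value: Vec<u8>) -> Result<(), Self::Error> {
        self.inner.insert(key, value);
        Ok(())
    }
}

fn main() {
    let mut db = MemoryEngine { inner: HashMap::new() };
    db.put("k".to_owned(), b"v".to_vec()).expect("infallible");
    println!("{:?}", db.get("k"));
}
```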

help: unit test error

image

When I ran the test test_kv_put() in xline/tests/kv_test.rs, it failed.

[Feature]: Refactor Curp State

Description about the feature

Currently, the state of our curp server is filled with many things. It is not very maintainable, since we will soon add persistence support to the project, and a big lock around the state is not efficient either. We should refactor curp's state now.

In addition, we need to add an index to each log entry, since it will be needed once snapshots are enabled.

Approach

We can divide the state into following sections:

  1. PersistedState that needs to be persisted on disk: term, voted_for.
  2. VolatileState that is stored only in memory: commitIndex, lastApplied, leader_id, next_index, match_index, and timeout utils.
  3. Logs that need to be persisted on disk.

Other structures that exist in current State are all shared references. They can be moved out into a new structure called Shared.
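The split might look roughly like this; field names follow the issue text but are illustrative, not the actual curp code:

```rust
/// Hypothetical section 1: state that must survive a restart.
#[derive(Default)]
struct PersistedState {
    term: u64,
    voted_for: Option<String>,
}

/// Hypothetical section 2: state kept only in memory.
#[derive(Default)]
struct VolatileState {
    commit_index: usize,
    last_applied: usize,
    leader_id: Option<String>,
    next_index: Vec<usize>,
    match_index: Vec<usize>,
}

/// Hypothetical section 3: log entries also need persisting; each entry
/// carries its index in preparation for snapshot support.
struct LogEntry<C> {
    index: usize,
    cmd: C,
}

fn main() {
    let persisted = PersistedState::default();
    let volatile = VolatileState::default();
    let entry = LogEntry { index: 1, cmd: "put a 1" };
    println!("term={} commit={} first={}", persisted.term, volatile.commit_index, entry.index);
}
```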

Code of Conduct

  • I agree to follow this project's Code of Conduct

[Feature]: Make `ExecuteError` become more generic

Description about the feature

Our ExecuteError currently has only two variants, but in practice other errors may occur in the executor. Curp cannot know which errors will occur in the upper layer, so we need a way to let Curp users define their own error types.

Approach

  1. The easiest way is to let ExecuteError contain only the message but not the type. This is similar to our current implementation, but the upper layer will not be able to determine the type of the error.
  2. Another way is to make the error an associated type of the CommandExecutor trait, but since ExecuteError is used in many places in Curp, this requires complicated changes.
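Approach 2 could be sketched as follows; the trait and type names are illustrative, not the actual Curp definitions:

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical executor trait: the error becomes an associated type,
/// so each Curp user can supply their own error enum.
trait CommandExecutor {
    type Error: Error;

    fn execute(&self, cmd: &str) -> Result<String, Self::Error>;
}

/// A user-defined error type that Curp never needs to know about.
#[derive(Debug)]
struct MyError(String);

impl fmt::Display for MyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "execute failed: {}", self.0)
    }
}

impl Error for MyError {}

struct MyExecutor;

impl CommandExecutor for MyExecutor {
    type Error = MyError;

    fn execute(&self, cmd: &str) -> Result<String, MyError> {
        if cmd.is_empty() {
            Err(MyError("empty command".to_owned()))
        } else {
            Ok(format!("ok: {cmd}"))
        }
    }
}

fn main() {
    let exec = MyExecutor;
    // The upper layer now sees its own typed error, not a bare message.
    println!("{:?}", exec.execute("put a 1"));
    println!("{:?}", exec.execute(""));
}
```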

Code of Conduct

  • I agree to follow this project's Code of Conduct

[Bug]: Add tests to cover bugs fixed in #134

Description about the bug

Heartbeat may wipe newly replicated logs.

Version

0.1.0 (Default)

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

[Refactor] Refactor the shutdown channel

Currently, our curp server aborts all background tasks via a Shutdown instance. When a Protocol instance is dropped, it sends a message to notify a background task to abort the other tasks. Some relevant code is shown below:

/// Shutdown broadcast wrapper
#[derive(Debug)]
pub(crate) struct Shutdown {
    /// `true` if the shutdown signal has been received
    shutdown: bool,
    /// The receive half of the channel used to listen for shutdown.
    notify: broadcast::Receiver<()>,
}

impl<C: 'static + Command> Drop for Protocol<C> {
     #[inline]
     fn drop(&mut self) {
         // TODO: async drop is still not supported by Rust(should wait for bg tasks to be stopped?), or we should create an async `stop` function for Protocol
        let _ = self.stop_tx.send(()).ok();
     }
 }

However, we don't need to record this shutdown bool variable, since this shutdown channel will only be notified once. Thus, we can use an event listener to replace the Shutdown struct.
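A stdlib-only sketch of the event-listener idea, using a Condvar-based one-shot event in place of the Shutdown struct (illustrative, not the actual implementation, which might use an event-listener crate instead):

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical one-shot shutdown event: since the event fires at most
/// once, no per-listener `shutdown: bool` needs to be tracked.
#[derive(Clone)]
struct ShutdownEvent {
    inner: Arc<(Mutex<bool>, Condvar)>,
}

impl ShutdownEvent {
    fn new() -> Self {
        Self { inner: Arc::new((Mutex::new(false), Condvar::new())) }
    }

    /// Fire once; every current and future `wait` call returns.
    fn notify(&self) {
        let (fired, cvar) = &*self.inner;
        *fired.lock().expect("lock poisoned") = true;
        cvar.notify_all();
    }

    /// Block until the event has fired.
    fn wait(&self) {
        let (fired, cvar) = &*self.inner;
        let mut guard = fired.lock().expect("lock poisoned");
        while !*guard {
            guard = cvar.wait(guard).expect("lock poisoned");
        }
    }

    fn fired(&self) -> bool {
        *self.inner.0.lock().expect("lock poisoned")
    }
}

fn main() {
    let shutdown = ShutdownEvent::new();
    let listener = shutdown.clone();
    let bg = thread::spawn(move || listener.wait());
    thread::sleep(Duration::from_millis(10));
    shutdown.notify(); // e.g. triggered from Protocol's drop
    bg.join().expect("background task panicked");
    println!("background task stopped");
}
```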

[Curp] Failed to compile proto because message.proto contains proto3 optional fields

When I executed the cargo build --release as described in the Contribution Guide document, it showed me a proto compile error as follows:

$ cargo build --release
   Compiling curp v0.1.0 (/home/Xline/curp)
error: failed to run custom build command for `curp v0.1.0 (/home/Xline/curp)`

Caused by:
  process didn't exit successfully: `/home/Xline/target/release/build/curp-755359d522eb8104/build-script-build` (exit status: 101)
  --- stderr
  thread 'main' panicked at 'Failed to compile proto, error is Custom { kind: Other, error: "protoc failed: message.proto: This file contains proto3 optional fields, but --experimental_allow_proto3_optional was not set.\n" }', curp/build.rs:4:29
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I will open a PR to solve this problem.

[Feature]: Curp Refactoring

Motivations

Currently, our curp server has the following pain points that might slow down our later developments:

  1. Poorly-structured state: Currently, the state of our curp server is filled with many things. It is not very maintainable since we will soon add persistent support to the project. And a big lock for the state is not efficient either.
  2. Poor ergonomics for developers and reviewers: Anyone who has worked with async Rust has likely struggled with the LockGuard scope issue. Given that a LockGuard must be released at the await point, we must take special care in how we structure the code, which results in opaque and nested code that is difficult both for developers to write and for reviewers to read. Things would go more smoothly if we could divide our code into two halves: the non-async half (updates of curp's state) and the async half (transport, IO).
  3. Duplicated code and long functions: we need to deduplicate and write shorter, more readable code.
  4. Preparation for the upcoming persistence and snapshot support: a storage layer should be abstracted, and an index should be added to each log entry.

Approach

As a solution, I suggest introducing a new abstraction layer called RawCurp. RawCurp can be viewed as a state machine. It is purely non-async. So the receiving and sending of requests and responses will be handled outside of RawCurp.

It looks like this:

pub(super) struct RawCurp<C: Command> {
    /// Curp state
    st: RwLock<State>,
    /// Additional leader state
    lst: RwLock<LeaderState>,
    /// Additional candidate state
    cst: Mutex<CandidateState<C>>,
    /// Curp logs
    log: RwLock<Log<C>>,
    /// Relevant context
    ctx: Context<C>,
}

impl<C: 'static + Command> RawCurp<C> {
    /// Handle `append_entries`
    /// Returns `Ok(term)` on success
    /// Returns `Err((term, hint_next_index_for_leader))` on failure
    pub(super) fn handle_append_entries(
        &self,
        term: u64,
        leader_id: String,
        prev_log_index: usize,
        prev_log_term: u64,
        entries: Vec<LogEntry<C>>,
        leader_commit: usize,
    ) -> Result<u64, (u64, usize)>;

    /// Handle the `append_entries` response
    /// Returns `Ok(append_entries_succeeded)`
    /// Returns `Err(())` if self is no longer the leader
    pub(super) fn handle_append_entries_resp(
        &self,
        follower_id: &ServerId,
        last_sent_index: Option<usize>, // None means the ae is a heartbeat
        term: u64,
        success: bool,
        hint_index: usize,
    ) -> Result<bool, ()>;

    // other handlers ...

    pub(super) fn tick(&self) -> TickAction;
}
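To make the sync/async split concrete, here is a minimal, self-contained sketch (the `MiniCurp` name and simplified term logic are illustrative, not xline's actual code): a purely synchronous handler mutates state behind a lock and returns a plain `Result`, so no lock guard ever needs to live across an await point.

```rust
use std::sync::RwLock;

/// Hypothetical, simplified stand-in for RawCurp: a purely synchronous
/// state machine. Transport and async concerns live entirely in the caller.
struct MiniCurp {
    /// Current term; the guard is released before the handler returns.
    term: RwLock<u64>,
}

impl MiniCurp {
    /// Handle a (simplified) `append_entries`.
    /// Returns Ok(term) on success, Err(current_term) if the leader is stale.
    fn handle_append_entries(&self, leader_term: u64) -> Result<u64, u64> {
        let mut term = self.term.write().unwrap();
        if leader_term < *term {
            return Err(*term); // reject a stale leader
        }
        *term = leader_term; // catch up to the leader's term
        Ok(*term)
    }
}

fn main() {
    let curp = MiniCurp { term: RwLock::new(5) };
    assert_eq!(curp.handle_append_entries(6), Ok(6)); // newer term accepted
    assert_eq!(curp.handle_append_entries(4), Err(6)); // stale term rejected
    println!("ok");
}
```

An async CurpNode wrapper would decode the RPC, call such a handler, and encode the response, with every lock acquired and released inside the synchronous call.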

Before refactoring: Protocol, State, and bg_tasks are tangled together
image

After refactoring:
image

CurpNode will handle all transport-related functionality and error handling, while RawCurp only needs to take care of the internal structures and their consistency.

I plan to deliver this refactor in several PRs.

  1. A refactor of the command workers: currently, bg_tasks starts the execute workers and after-sync workers; these can be encapsulated into cmd workers that only expose a channel through which commands are sent.
  2. RawCurp without removing bg_tasks: I will try my best to preserve the current bg_tasks interface and replace its inner code with RawCurp. This should reduce the burden on reviewers.
  3. Final enhancements: rewrite the tests and remove bg_tasks; refactor LogEntry so that each log entry holds exactly one cmd and contains its index; improve error handling.
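The cmd-worker encapsulation in step 1 can be sketched as follows (a hypothetical illustration, not xline's implementation): the worker exposes only a channel sender, and callers never touch its internals.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical command message; the real cmd type would carry a Command.
enum CmdMsg {
    Execute(String),
    Shutdown,
}

/// Spawn a cmd worker and expose only a channel to it.
/// The join handle returns how many commands were executed.
fn spawn_cmd_worker() -> (mpsc::Sender<CmdMsg>, thread::JoinHandle<usize>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut executed = 0;
        for msg in rx {
            match msg {
                CmdMsg::Execute(cmd) => {
                    println!("executing {cmd}");
                    executed += 1;
                }
                CmdMsg::Shutdown => break,
            }
        }
        executed
    });
    (tx, handle)
}

fn main() {
    let (tx, handle) = spawn_cmd_worker();
    tx.send(CmdMsg::Execute("put key value".into())).unwrap();
    tx.send(CmdMsg::Shutdown).unwrap();
    assert_eq!(handle.join().unwrap(), 1); // exactly one command executed
    println!("ok");
}
```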

P.S. Although I will remove bg_tasks in the end, that doesn't mean there will be no background tasks in CurpNode. There will still be bg_tick and bg_leader_calibrate_follower, but since most of the logic is handled in the curp module, little code will remain there.

About Locks in RawCurp

RawCurp has 4 locks:

//! Lock order should be:
//!     1. self.st
//!     2. self.ctx || self.lst (there is no need to grab both)
//!     3. self.log

pub(super) struct RawCurp<C: Command> {
    /// Curp state
    st: RwLock<State>,
    /// Additional leader state
    lst: RwLock<LeaderState>,
    /// Additional candidate state
    cst: Mutex<CandidateState<C>>,
    /// Curp logs
    log: RwLock<Log<C>>,
    /// Relevant context
    ctx: Context<C>,
}


pub(super) struct State {
    /* persisted state */
    /// Current term
    pub(super) term: u64,
    /// Candidate id that received vote in current term
    pub(super) voted_for: Option<ServerId>,

    /* volatile state */
    /// Role of the server
    pub(super) role: Role,
    /// Cached id of the leader.
    pub(super) leader_id: Option<ServerId>,
}

pub(super) struct CandidateState<C> {
    /// Collected speculative pools, used for recovery
    pub(super) sps: HashMap<ServerId, Vec<Arc<C>>>,
    /// Votes received in the election
    pub(super) votes_received: u64,
}

pub(super) struct LeaderState {
    /// For each server, index of the next log entry to send to that server
    pub(super) next_index: HashMap<ServerId, usize>,
    /// For each server, index of highest log entry known to be replicated on server
    pub(super) match_index: HashMap<ServerId, usize>,
}


pub(super) struct Log<C: Command> {
    /// Log entries, should be persisted
    pub(super) entries: Vec<LogEntry<C>>,
    /// Index of highest log entry known to be committed
    pub(super) commit_index: usize,
    /// Index of highest log entry applied to state machine
    pub(super) last_applied: usize,
}
| Function | st | lst | cst | log | Notes |
| --- | --- | --- | --- | --- | --- |
| tick (leader) | 1 | 0 | 0 | 1 | grab and release the state read lock to check the role; grab the log read lock to generate heartbeats |
| tick (candidate/follower) | 1 | 0 | 1 | 1 | grab and release the state read lock to check the role; grab the candidate-state write lock and the log read lock to become a candidate and generate votes |
| handle_append_entries | 1 | 0 | 0 | 1 | grab an upgradable state read lock to check the term (upgrade to write if the term is updated); while holding the state lock, grab the log write lock to append new entries |
| handle_append_entries_resp | 1 | 1 | 0 | 1 | grab and release the state read lock to check the role; grab and release the leader-state write lock to update the follower's match_index; grab and release the leader-state read lock and the log read lock to check whether commit_index advanced, and if so grab the log write lock to update it |
| handle_vote | 1 | 0 | 0 | 1 | grab the state write lock to vote; grab the log read lock to check whether the candidate's log is up to date |
| handle_vote_resp | 1 | 1 | 1 | 1 | grab the state and candidate-state write locks to handle the vote response; if the election succeeds, release the candidate-state lock, then grab the leader-state write lock (to reset next_index) and the log write lock (to recover commands) |
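The lock-order rule can be illustrated with a minimal, self-contained sketch (the `Node` type and its fields are illustrative, not xline code): as long as every path acquires st before log, no two handlers can deadlock on this pair of locks.

```rust
use std::sync::RwLock;

/// Hypothetical node holding two of the locks from the table above.
struct Node {
    st: RwLock<u64>,       // state: current term
    log: RwLock<Vec<u64>>, // log entries
}

impl Node {
    /// Append an entry, respecting the global lock order: st first, log last.
    fn append(&self, term: u64, entry: u64) -> bool {
        let st = self.st.read().unwrap(); // 1. st first
        if term < *st {
            return false; // stale term: never touch the log
        }
        let mut log = self.log.write().unwrap(); // 3. log last
        log.push(entry);
        true
    }
}

fn main() {
    let n = Node { st: RwLock::new(1), log: RwLock::new(Vec::new()) };
    assert!(n.append(1, 42));  // current term: accepted
    assert!(!n.append(0, 7));  // stale term: rejected
    assert_eq!(*n.log.read().unwrap(), vec![42]);
    println!("ok");
}
```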

Code of Conduct

  • I agree to follow this project's Code of Conduct

[Test] Execute etcd validation tests automatically

Xline is designed to be compatible with the etcd API, including the KV API, watch API, auth API, etc. Currently, xline uses etcdctl to validate these APIs manually. We need to automate these tests to improve testing efficiency.

Executing quick_start.sh results in an error

./quick_start.sh
stopping
"docker stop" requires at least 1 argument.
See 'docker stop --help'.

Usage: docker stop [OPTIONS] CONTAINER [CONTAINER...]

Stop one or more running containers
stopped
container starting
Unable to find image 'gcr.io/etcd-development/etcd:v3.5.5' locally
bca9b7f8162a07cf747235b906ce7c7897cdf61abacad40ebf4bccce5577f4c8
162a701933e3b6f8a3d4debc799bcb173cb9dfc5726cf2a872347fbf52946ffc
c5d27f380aee918358c0c6c29bf5b3a93a46c757cc0e378e3e88dfdbcc6364e0
v3.5.5: Pulling from etcd-development/etcd
fa223d8c149d: Pull complete
a25d580da1fd: Pull complete
b5dd3f73ef5b: Pull complete
2767fb7dc720: Pull complete
4cbd5560d3c8: Pull complete
14776ed116ad: Pull complete
Digest: sha256:89b6debd43502d1088f3e02f39442fd3e951aa52bee846ed601cf4477114b89e
Status: Downloaded newer image for gcr.io/etcd-development/etcd:v3.5.5
b6b53140265b4516b16da915496b7cd05a018920492b4c73aefa5d36f525e370
container started
cluster starting
Error response from daemon: OCI runtime exec failed: exec failed: unable to start container process: exec /usr/local/bin/xline: exec format error: unknown
Error response from daemon: OCI runtime exec failed: exec failed: unable to start container process: exec /usr/local/bin/xline: exec format error: unknown
Error response from daemon: OCI runtime exec failed: exec failed: unable to start container process: exec /usr/local/bin/xline: exec format error: unknown
command is: docker exec -e RUST_LOG=debug -d node2 /usr/local/bin/xline --name node2 --cluster-peers 172.20.0.3:2379 172.20.0.5:2379 --self-ip-port 172.20.0.4:2379 --leader-ip-port 172.20.0.3:2379
command is: docker exec -e RUST_LOG=debug -d node1 /usr/local/bin/xline --name node1 --cluster-peers 172.20.0.4:2379 172.20.0.5:2379 --self-ip-port 172.20.0.3:2379 --leader-ip-port 172.20.0.3:2379 --is-leader
command is: docker exec -e RUST_LOG=debug -d node3 /usr/local/bin/xline --name node3 --cluster-peers 172.20.0.3:2379 172.20.0.4:2379 --self-ip-port 172.20.0.5:2379 --leader-ip-port 172.20.0.3:2379
cluster started

docker -v
Docker version 20.10.17-rd, build c2e4e01
uname -a
Darwin bcd07439fa51 22.1.0 Darwin Kernel Version 22.1.0: Sun Oct 9 20:15:09 PDT 2022; root:xnu-8792.41.9~2/RELEASE_ARM64_T6000 arm64
rustc -V
rustc 1.61.0 (fe5b13d68 2022-05-18)

[Curp] Improve slow path behaviour

If speculative execution is feasible but its result is lost, the proposal will fail because the slow path does not carry the execution result. In this situation, we should include the execution result in the wait-sync request.
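One possible shape for the fix (names are hypothetical, not xline's actual API): let the wait-sync response optionally carry the execution result, so the proposal can still complete when the fast-path result was lost.

```rust
/// Hypothetical wait-sync response: the slow path today returns only the
/// sync index; the proposed change lets it also carry the execution result.
#[derive(Debug, PartialEq)]
enum WaitSyncedResponse {
    /// Current slow path: no execution result.
    IndexOnly { index: u64 },
    /// Proposed: also return the execution result.
    WithResult { index: u64, result: String },
}

/// Build the response, including the result when the server still has it.
fn wait_synced(index: u64, cached_result: Option<String>) -> WaitSyncedResponse {
    match cached_result {
        Some(result) => WaitSyncedResponse::WithResult { index, result },
        None => WaitSyncedResponse::IndexOnly { index },
    }
}

fn main() {
    // When the speculative result survives, the slow path returns it too,
    // so the client no longer depends on the fast-path reply.
    assert_eq!(
        wait_synced(3, Some("ok".into())),
        WaitSyncedResponse::WithResult { index: 3, result: "ok".into() }
    );
    println!("ok");
}
```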

[Curp] Merge heartbeat logic with append entries

Currently, heartbeats and append entries are separate: a heartbeat is triggered every 150 ms. However, when the server is busy, many append-entries requests are sent between two heartbeats, and these alone can keep the follower alive. Therefore, the redundant heartbeats can be optimized away.

Originally posted by @rogercloud in #85 (comment)
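The optimization amounts to tracking when the last append-entries was sent and skipping the heartbeat if one went out within the interval, since any AppendEntries resets the follower's election timer. A minimal sketch, with hypothetical names:

```rust
use std::time::{Duration, Instant};

/// Hypothetical tracker: skip a heartbeat when an append-entries was
/// already sent within the heartbeat interval.
struct HeartbeatTimer {
    interval: Duration,
    last_ae_sent: Instant,
}

impl HeartbeatTimer {
    /// Record that an append-entries (which also acts as a keepalive) was sent.
    fn record_append_entries(&mut self) {
        self.last_ae_sent = Instant::now();
    }

    /// A heartbeat is needed only if nothing kept the follower alive
    /// during the last interval.
    fn heartbeat_needed(&self) -> bool {
        self.last_ae_sent.elapsed() >= self.interval
    }
}

fn main() {
    let mut t = HeartbeatTimer {
        interval: Duration::from_millis(150),
        last_ae_sent: Instant::now(),
    };
    t.record_append_entries();
    // A fresh append-entries suppresses the next heartbeat.
    assert!(!t.heartbeat_needed());
    println!("ok");
}
```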
