
raft's Introduction

This project is unmaintained

The dqlite team is no longer maintaining our raft implementation as an independent project. Instead, the raft source code has been incorporated into canonical/dqlite as a private implementation detail. v0.18.1 is the last release of dqlite's libraft. We regret any inconvenience caused by this change.

If you depend on dqlite but not on raft directly, see canonical/dqlite for up-to-date instructions on how to use dqlite's bundled raft build configuration. If you formerly depended on dqlite's libraft, you should switch to the maintained fork cowsql/raft.

A discussion thread is open on the dqlite repository for any questions about this change.

The remainder of this README is of historical interest only.


Fully asynchronous C implementation of the Raft consensus protocol.

The library has a modular design: its core part implements only the core Raft algorithm logic, in a fully platform-independent way. On top of that, a pluggable interface defines the I/O implementation for networking (send/receive RPC messages) and disk persistence (store log entries and snapshots).

A stock implementation of the I/O interface is provided when building the library with default options. It is based on libuv and should fit the vast majority of use cases. The only catch is that it currently requires Linux, since it uses the Linux AIO API for disk I/O. Patches are welcome to add support for more platforms.

See raft.h for full documentation.

License

This raft C library is released under a slightly modified version of LGPLv3 that includes a copyright exception letting users statically link the library code in their project and release the final work under their own terms. See the full license text.

Features

This implementation includes all the basic features described in the Raft dissertation:

  • Leader election
  • Log replication
  • Log compaction
  • Membership changes

It also includes a few optional enhancements:

  • Optimistic pipelining to reduce log replication latency
  • Writing to leader's disk in parallel
  • Automatic stepping down when the leader loses quorum
  • Leadership transfer extension
  • Pre-vote protocol

Install

If you are on a Debian-based system, you can get the latest development release from dqlite's dev PPA:

sudo add-apt-repository ppa:dqlite/dev
sudo apt-get update
sudo apt-get install libraft-dev

Building

To build libraft from source you'll need:

  • A reasonably recent version of libuv (v1.18.0 or beyond).
  • Optionally, but recommended, a reasonably recent version of liblz4 (v1.7.1 or beyond).

sudo apt-get install libuv1-dev liblz4-dev libtool pkg-config build-essential
autoreconf -i
./configure --enable-example
make

Example

The best way to understand how to use the library is probably to read the code of the example server included in the source code.

You can also see the example server in action by running:

./example/cluster

which spawns a little cluster of 3 servers, runs a sample workload, and randomly stops and restarts a server from time to time.

Quick guide

It is recommended that you read raft.h for documentation details, but here's a quick high-level guide of what you'll need to do (error handling is omitted for brevity).

Create an instance of the stock raft_io interface implementation (or implement your own one if the one that comes with the library really does not fit):

const char *dir = "/your/raft/data";
struct uv_loop_s loop;
struct raft_uv_transport transport;
struct raft_io io;
uv_loop_init(&loop);
raft_uv_tcp_init(&transport, &loop);
raft_uv_init(&io, &loop, dir, &transport);

Define your application Raft FSM, implementing the raft_fsm interface:

struct raft_fsm
{
  void *data;
  int (*apply)(struct raft_fsm *fsm, const struct raft_buffer *buf, void **result);
  int (*snapshot)(struct raft_fsm *fsm, struct raft_buffer *bufs[], unsigned *n_bufs);
  int (*restore)(struct raft_fsm *fsm, struct raft_buffer *buf);
};
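
As an illustration, a minimal sketch of a counter state machine behind an interface of this shape might look like the following. To stay compilable without libraft, it uses local stand-in types (struct buffer, struct fsm) that only mirror the fields shown above; they are assumptions, as are the hypothetical counter_* names. Real code would use struct raft_fsm and struct raft_buffer from raft.h and follow the memory ownership rules documented there.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins mirroring the shape of raft_buffer / raft_fsm (assumption:
 * simplified for illustration, not the real definitions from raft.h). */
struct buffer {
    void *base;
    size_t len;
};

struct fsm {
    void *data;
    int (*apply)(struct fsm *fsm, const struct buffer *buf, void **result);
    int (*snapshot)(struct fsm *fsm, struct buffer *bufs[], unsigned *n_bufs);
    int (*restore)(struct fsm *fsm, struct buffer *buf);
};

/* Hypothetical application state: a single replicated counter. */
struct counter {
    uint64_t value;
};

/* Apply a committed entry whose payload is an 8-byte increment. */
static int counter_apply(struct fsm *fsm, const struct buffer *buf, void **result)
{
    struct counter *c = fsm->data;
    uint64_t delta;
    if (buf->len != sizeof delta) {
        return -1; /* malformed entry */
    }
    memcpy(&delta, buf->base, sizeof delta);
    c->value += delta;
    *result = &c->value;
    return 0;
}

/* Produce a snapshot: one freshly allocated buffer holding the counter. */
static int counter_snapshot(struct fsm *fsm, struct buffer *bufs[], unsigned *n_bufs)
{
    struct counter *c = fsm->data;
    struct buffer *b = malloc(sizeof *b);
    if (b == NULL) {
        return -1;
    }
    b->len = sizeof c->value;
    b->base = malloc(b->len);
    if (b->base == NULL) {
        free(b);
        return -1;
    }
    memcpy(b->base, &c->value, b->len);
    *bufs = b;
    *n_bufs = 1;
    return 0;
}

/* Replace the state with a snapshot. In this sketch the caller keeps
 * ownership of buf. */
static int counter_restore(struct fsm *fsm, struct buffer *buf)
{
    struct counter *c = fsm->data;
    if (buf->len != sizeof c->value) {
        return -1;
    }
    memcpy(&c->value, buf->base, sizeof c->value);
    return 0;
}

/* Wire the callbacks into the interface struct. */
static void counter_fsm_init(struct fsm *fsm, struct counter *c)
{
    c->value = 0;
    fsm->data = c;
    fsm->apply = counter_apply;
    fsm->snapshot = counter_snapshot;
    fsm->restore = counter_restore;
}
```

The three callbacks are the only application-specific code: apply mutates the state for one committed entry, snapshot serializes it, and restore replaces it.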

Pick a unique ID and address for each server and initialize the raft object:

unsigned id = 1;
const char *address = "192.168.1.1:9999";
struct raft raft;
raft_init(&raft, &io, &fsm, id, address);

If it's the first time you start the cluster, create a configuration object containing each server that should be present in the cluster (typically just one, since you can grow your cluster at a later point using raft_add and raft_promote) and bootstrap:

struct raft_configuration configuration;
raft_configuration_init(&configuration);
raft_configuration_add(&configuration, 1, "192.168.1.1:9999", true);
raft_bootstrap(&raft, &configuration);

Start the raft server:

raft_start(&raft);
uv_run(&loop, UV_RUN_DEFAULT);

Asynchronously submit requests to apply new commands to your application FSM:

static void apply_callback(struct raft_apply *req, int status, void *result) {
  /* ... */
}

struct raft_apply req;
struct raft_buffer buf;
buf.len = ...; /* The length of your FSM entry data */
buf.base = ...; /* Your FSM entry data */
raft_apply(&raft, &req, &buf, 1, apply_callback);

To add more servers to the cluster, use the raft_add() and raft_promote() APIs.

Usage Notes

The default libuv-based raft_io implementation compresses raft snapshots using the liblz4 library. Besides saving disk space, lz4-compressed snapshots offer additional data integrity checks in the form of a Content Checksum, which allows raft to detect corruption that occurred during storage. It is therefore recommended not to disable lz4 compression by means of the --disable-lz4 configure flag.

Detailed tracing is enabled when the environment variable LIBRAFT_TRACE is set at startup. Its value must be in the range [0..5] and represents a tracing level, where 0 means no traces are emitted, 5 enables minimum verbosity (FATAL records only), and 1 enables maximum verbosity (all records: DEBUG, INFO, WARN, ERROR, FATAL).
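
The level semantics can be sketched as a threshold check. This is not the library's code: the enum, the function names, the reading of levels 2 to 4 as intermediate thresholds, and the choice to treat invalid values as "off" are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical severity numbering chosen so that level 1 admits
 * everything and level 5 admits FATAL only. */
enum severity { SEV_DEBUG = 1, SEV_INFO, SEV_WARN, SEV_ERROR, SEV_FATAL };

/* Parse a LIBRAFT_TRACE-style value. Returns 0 (tracing off) when the
 * value is missing, non-numeric, or outside [0..5] (assumed behavior). */
static int trace_level_parse(const char *value)
{
    char *end;
    long level;
    if (value == NULL || *value == '\0') {
        return 0;
    }
    level = strtol(value, &end, 10);
    if (*end != '\0' || level < 0 || level > 5) {
        return 0;
    }
    return (int)level;
}

/* A record is emitted when tracing is on and its severity reaches the
 * configured threshold. */
static bool trace_enabled(int level, enum severity sev)
{
    return level != 0 && (int)sev >= level;
}
```

A caller would typically evaluate trace_level_parse(getenv("LIBRAFT_TRACE")) once at startup and consult trace_enabled() before formatting each record.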

Notable users

Credits

Of course the biggest thanks goes to Diego Ongaro :) (the original author of the Raft dissertation).

A lot of ideas and inspiration was taken from other Raft implementations such as:

raft's People

Contributors

andreasstieger, bekicot, bl-ue, cole-miller, ericcurtin, freeekanayaka, ganto, gnustomp, growdu, hermannch, jsoref, just-now, mathieubordere, mwnsiri, norbertheusser, ralight, smithed, stgraber, zouyonghao

raft's Issues

How many rpc messages?

How many RPC messages does the leader node send in this implementation to replicate a message in a 3-node cluster?

I tried checking with Wireshark, but the capture contains too much clutter from the ports raft is listening on. Could you check whether everything is fine? I tested with v0.9.9. I tried v0.9.13 as well, but the API seems to have changed and I did not rewrite my code for it, as it is already a mess.

In the Wireshark capture, many TCP ACKs were exchanged with the other nodes.

I configured the heartbeat to 300ms, but saw more than 100 messages per second.

Thank you.

state callback

Unfortunately, there is no raft state change callback, so user code does not know when the node becomes leader or follower.

Potentially invalid allocation in the server example

In this snippet, the allocation size is taken from dereferencing fsm, which means it is sizeof(struct raft_fsm).

The problem is that this is not the size of the struct we want to allocate; the correct size should be sizeof(struct Fsm). Am I missing something?

raft/example/server.c

Lines 70 to 83 in 4d3de31

static int FsmInit(struct raft_fsm *fsm)
{
    struct Fsm *f = raft_malloc(sizeof *fsm);
    if (f == NULL) {
        return RAFT_NOMEM;
    }
    f->count = 0;
    fsm->version = 1;
    fsm->data = f;
    fsm->apply = FsmApply;
    fsm->snapshot = FsmSnapshot;
    fsm->restore = FsmRestore;
    return 0;
}

Integration tests fail on tmpfs

I'm trying to build a RPM for raft using the Fedora COPR service. Unfortunately the Fedora 31/32 x86_64 builds fail during the test suite (ganto/copr-lxc4#8) with the error:

===================================
   raft 0.9.24: ./test-suite.log
===================================
# TOTAL: 5
# PASS:  4
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
.. contents:: :depth: 2
FAIL: test/integration/uv
=========================
Running test suite with seed 0x06f4451d...
init/dirTooLong                                             [ OK    ] [ 0.00002815 / 0.00001289 CPU ]
init/oom                                                    
  heap-fault-delay=1, heap-fault-repeat=1                   [ ERROR ]
Error: test/integration/test_uv_init.c:156: assertion failed: _rv == 1 (0 == 1)
Error: child killed by signal 6 (Aborted)

I saw in the test comment that this test has some issues with tmpfs. I cannot reproduce the test_uv_init.c assertion errors when building the RPM locally with mock, where I am definitely not using a tmpfs for the build directory. I don't know exactly how the COPR setup works.

  • The same error happens for aarch64:
init/oom                                                    
  heap-fault-delay=1, heap-fault-repeat=1                   [ ERROR ]
Error: test/integration/test_uv_init.c:156: assertion failed: _rv == 1 (0 == 1)
Error: child killed by signal 6 (Aborted)

On the 32bit architectures there are some more issues:

  • armhfp also has this error, plus another:
UvWriterInit/noResources                                    [ ERROR ]
Error: test/lib/aio.c:47: assertion failed: rv == 0 (-1 == 0)
qemu: uncaught target signal 6 (Aborted) - core dumped
Error: child killed by signal 6 (Aborted)
[...]
init/oom                                                    
  heap-fault-delay=1, heap-fault-repeat=1                   [ ERROR ]
Error: test/integration/test_uv_init.c:156: assertion failed: _rv == 1 (0 == 1)
qemu: uncaught target signal 6 (Aborted) - core dumped
Error: child killed by signal 6 (Aborted)
  • The i686 build skips this test (as far as I understand) but then fails with other errors:
tick/request_vote_only_to_voters                            
  n_voting=2                                                [ OK    ] [ 0.00000858 / 0.00000818 CPU ]
raft_transfer/upToDate                                      [ ERROR ]
munmap_chunk(): invalid pointer
Error: child killed by signal 6 (Aborted)
raft_transfer/catchUp                                       [ ERROR ]
munmap_chunk(): invalid pointer
Error: child killed by signal 6 (Aborted)
raft_transfer/expire                                        [ OK    ] [ 0.00048329 / 0.00030064 CPU ]
raft_transfer/unknownServer                                 [ OK    ] [ 0.00000971 / 0.00000914 CPU ]
raft_transfer/twice                                         [ ERROR ]
munmap_chunk(): invalid pointer
Error: child killed by signal 6 (Aborted)
raft_transfer/autoSelect                                    [ ERROR ]
munmap_chunk(): invalid pointer
Error: child killed by signal 6 (Aborted)
raft_transfer/autoSelectUpToDate                            [ ERROR ]
munmap_chunk(): invalid pointer
Error: child killed by signal 6 (Aborted)
raft_transfer/afterDemotion                                 [ ERROR ]
munmap_chunk(): invalid pointer
Error: child killed by signal 6 (Aborted)
  • In my local build on my workstation I also hit the aio.c error for i686, which I didn't spot on COPR:
UvWriterInit/noResources                                    [ ERROR ]
Error: test/lib/aio.c:47: assertion failed: rv == 0 (-1 == 0)
Error: child killed by signal 6 (Aborted)

The test suite seems to pass for all architectures on Fedora 33 and Rawhide. Is there something "wrong" with Fedora, or is this an issue with the test suite?

Transactions are corrupted when server restarts

I'm on v0.10.0 from git.
I'm running a simple cluster of 3 VMs (although I see the same problem when running locally on different ports). I'm using the built-in UV TCP transport.
When I take a node down and bring it back up, the first X bytes of a transaction are cut off. If I examine the open-1 file of the node I took down, it contains the full transaction data, but after I restart the node, the transaction is corrupted.

Any ideas? How would I debug?

Progress of a Rust wrapper

I am currently working on making a good and safe abstraction/wrapper of libraft in Rust that I will probably release under the name canonical-raft.

The state of replicated systems in Rust is worse than in Go, where the most popular library for replicating state over the network is hashicorp/raft. Its strength comes mainly from its ease of use: you only have to implement a single, simple FSM interface and you are done.

I would love to make that just as easy in Rust; it would greatly empower the language!
The only viable Raft library out there is the one from pingcap, and I must say that it is not easy to use at all!

As a second step, I would like to create a pure Rust implementation of a raft_io backend that I could ship with the library. It would be easier to integrate into Rust's async ecosystem than the uv-based one, and it would be more cross-platform too.
Do you think that would be possible?

Lastly, I have seen that the hashicorp/raft library comes with an official MDBStore backend. As I am the author of heed, a safe typed LMDB wrapper in Rust, I will probably build an equivalent store on top of LMDB using heed.
Does that make sense to you?

Thank you for the great work you have done here!
I have been searching for a simple Raft or Paxos library to make our search engine MeiliSearch replicated for a long time now!

Raft snapshots take up a lot of disk space

@freeekanayaka dqlite has users with fairly large DBs (over 1GB) whose snapshots take up a lot of room. We have seen slow storage devices crumble under the I/O load of frequently writing these snapshots when there are many writes to the DB. As a result, writes to the DB sometimes take over 30s to complete.

My proposal is to compress the snapshots (lz4) before writing them to disk. This will not fix a slow storage device, but it should at least relieve much of the pressure on it. The compression can be done in raft itself or in the fsm methods that provide the snapshots to raft. My feeling is that doing it in raft itself is the most convenient.

What are your thoughts on this?

libuv API misuse

Aug 02 08:17:25 soetomodb2 microk8s.daemon-apiserver[15684]: kube-apiserver: src/unix/poll.c:109: uv_poll_stop: Assertion `!uv__is_closing(handle)' failed.
Aug 02 08:17:26 soetomodb2 systemd[1]: snap.microk8s.daemon-apiserver.service: Main process exited, code=killed, status=6/ABRT

Possibly

rv = uv_poll_stop(&w->event_poller);
is hit when the fd is already closed, triggering the assertion in https://github.com/libuv/libuv/blob/04289fa326b790c1a4abb236d1f9d913bacfc8c6/src/unix/poll.c#L113

Also see canonical/microk8s#2487

UvList result can include incomplete Snapshots

UvList, when called in uvSnapshotGetWorkCb, can return incomplete snapshots, because it can run while a snapshot is being written to disk. As a result, an incomplete snapshot may be sent to a follower, which will crash while trying to install it.

Add crc data checksum to uv Snapshot

If I'm not mistaken we calculate crc checksums for the snapshot.meta and segment files, but perform no integrity checking on the snapshot data files. This should be introduced to detect disk corruptions.
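
As a rough sketch of what such a check could look like, the function below computes a CRC-32 over a data buffer, using the common reflected polynomial 0xEDB88320 (the zlib variant). Which CRC variant libraft uses for its other files is not stated in this issue, so the polynomial, the function names, and the idea of storing the expected value alongside the snapshot metadata are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). Pass 0 as the
 * initial crc; the result of one call can be fed into the next to
 * checksum data that arrives in several chunks. */
static uint32_t crc32_update(uint32_t crc, const void *data, size_t len)
{
    const uint8_t *p = data;
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++) {
            /* Shift right; XOR in the polynomial when the low bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Compare the checksum of loaded snapshot data against the expected
 * value recorded when the snapshot was written. */
static bool snapshot_data_ok(const void *data, size_t len, uint32_t expected)
{
    return crc32_update(0, data, len) == expected;
}
```

A table-driven implementation would be faster for large snapshots; the bitwise form is used here only because it is short and easy to verify.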

Node fails to join as stand-by

We have reports in canonical/microk8s#2144 where a node fails to start as a stand-by node with the following error:

Apr 03 02:26:47 microk8s-hades-node-3 microk8s.daemon-kubelite[17016]: Error: start node: raft_start(): io: load closed segment 0000000000001025-0000000000001024: entries batch 1 starting at byte 8: entries count in preamble is zero
Apr 03 02:26:47 microk8s-hades-node-3 microk8s.daemon-kubelite[17016]: F0403 02:26:47.778967   17016 daemon.go:67] API Server exited start node: raft_start(): io: load closed segment 0000000000001025-0000000000001024: entries batch 1 starting at byte 8: entries count in preamble is zero

LXD refuses to start if kernel does not support AIO

Originally reported at https://github.com/lxc/lxd/issues/9189


I had to compile my own kernel for various reasons, and while doing so I decided to harden it a little. Among other things, I disabled AIO because it's typically described as legacy and has had many security issues in the past (besides, io_uring is the new kid on the block).

However, once running on this kernel, 'lxd init --preseed' no longer works because lxd won't start. In the logs the following stack trace is found:

Sep 02 18:46:52 hpv1 lxd.daemon[122708]: panic: runtime error: invalid memory address or nil pointer dereference
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: [signal SIGSEGV: segmentation violation code=0x1 addr=0xd08 pc=0x40d56e]
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: goroutine 1 [running]:
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/canonical/go-dqlite/internal/bindings._Cfunc_GoString(...)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         _cgo_gotypes.go:102
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/canonical/go-dqlite/internal/bindings.NewNode(0x1, 0x183fb84, 0x1, 0xc0001a1050, 0x28, 0x0, 0x0, 0x0)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/canonical/go-dqlite/internal/bindings/server.go:127 +0x136
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/canonical/go-dqlite.New(0x1, 0x183fb84, 0x1, 0xc0001a1050, 0x28, 0xc000a170c0, 0x1, 0x1, 0x8, 0x203000, ...)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/canonical/go-dqlite/node.go:70 +0xc5
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/lxc/lxd/lxd/cluster.(*Gateway).init(0xc0002ea2a0, 0xc000c29100, 0xc000c70b40, 0xc000458640)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/lxc/lxd/lxd/cluster/gateway.go:811 +0x471
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/lxc/lxd/lxd/cluster.NewGateway(0xc000316090, 0xc0002e4000, 0xc0002cac30, 0xc000a175b0, 0x2, 0x2, 0xc00019d350, 0x2, 0x0)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/lxc/lxd/lxd/cluster/gateway.go:66 +0x1d7
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: main.(*Daemon).init(0xc00014e680, 0xc0001af1b8, 0xc00014e680)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/lxc/lxd/lxd/daemon.go:962 +0x12ce
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: main.(*Daemon).Init(0xc00014e680, 0xc0003ec120, 0xc00014e680)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/lxc/lxd/lxd/daemon.go:707 +0x2f
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: main.(*cmdDaemon).Run(0xc0001af110, 0xc000035400, 0xc0002cc9c0, 0x0, 0x4, 0x0, 0x0)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/lxc/lxd/lxd/main_daemon.go:67 +0x36f
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/spf13/cobra.(*Command).execute(0xc000035400, 0xc0001c4010, 0x4, 0x4, 0xc000035400, 0xc0001c4010)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/spf13/cobra/command.go:856 +0x472
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/spf13/cobra.(*Command).ExecuteC(0xc000035400, 0xc00049df58, 0x1, 0x1)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/spf13/cobra/command.go:974 +0x375
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/spf13/cobra.(*Command).Execute(...)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/spf13/cobra/command.go:902
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: main.main()
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/lxc/lxd/lxd/main.go:218 +0x1af7
Sep 02 18:46:53 hpv1 lxd.daemon[122568]: => LXD failed to start

If it is intended for AIO to be required, my suggestion would be to document this (the only reference to AIO I could find is a recommendation for aio-max-nr sysctl) and maybe add some error handling so LXD doesn't just panic.

Required information

  • Distribution: Ubuntu 20.04.1 LTS
  • The output of "lxc info":
# lxc info
config:
  cluster.https_address: 'snip'
  core.https_address: 'snip'
  core.trust_password: true
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses:
  - 'snip'
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
snip
    -----END CERTIFICATE-----
  certificate_fingerprint: 1420snip062
  driver: qemu | lxc
  driver_version: 6.1.0 | 4.0.10
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.4.133-formicidae20210902165706
  lxc_features:
    cgroup2: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "20.04"
  project: default
  server: lxd
  server_clustered: true
  server_name: yes
  server_pid: 12644
  server_version: "4.17"
  storage: zfs
  storage_version: 2.1.0-1
  storage_supported_drivers:
  - name: zfs
    version: 2.1.0-1
    remote: false
  - name: ceph
    version: 15.2.13
    remote: true
  - name: btrfs
    version: 5.4.1
    remote: false
  - name: cephfs
    version: 15.2.13
    remote: true
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.41.0
    remote: false

Committing entries from previous terms

raft/src/replication.c

Lines 1539 to 1574 in 4c71b94

void replicationQuorum(struct raft *r, const raft_index index)
{
    size_t votes = 0;
    size_t i;
    assert(r->state == RAFT_LEADER);
    if (index <= r->commit_index) {
        return;
    }
    /* TODO: fuzzy-test --seed 0x8db5fccc replication/entries/partitioned
     * fails the assertion below. */
    if (logTermOf(&r->log, index) == 0) {
        return;
    }
    // assert(logTermOf(&r->log, index) > 0);
    assert(logTermOf(&r->log, index) <= r->current_term);
    for (i = 0; i < r->configuration.n; i++) {
        struct raft_server *server = &r->configuration.servers[i];
        if (server->role != RAFT_VOTER) {
            continue;
        }
        if (r->leader_state.progress[i].match_index >= index) {
            votes++;
        }
    }
    if (votes > configurationVoterCount(&r->configuration) / 2) {
        r->commit_index = index;
        tracef("new commit index %llu", r->commit_index);
    }
    return;
}

From section 3.6.2:
Raft never commits log entries from previous terms by counting replicas. Only log entries from the leader’s current term are committed by counting replicas.

In the code, I have not found the implementation of the logic described above.
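
For reference, the rule from section 3.6.2 can be sketched as a quorum check with one extra guard on the entry's term. This is a self-contained illustration, not a patch for replication.c: struct peer and advance_commit are hypothetical names, and the peers array is assumed to include the leader itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-server replication state as seen by the leader. */
struct peer {
    bool voter;           /* counts toward the quorum */
    uint64_t match_index; /* highest index known to be replicated there */
};

/* Return the commit index after considering `index`, whose entry was
 * created in term `index_term`. */
static uint64_t advance_commit(uint64_t commit_index,
                               uint64_t index,
                               uint64_t index_term,
                               uint64_t current_term,
                               const struct peer *peers,
                               size_t n)
{
    size_t voters = 0;
    size_t votes = 0;
    size_t i;
    if (index <= commit_index) {
        return commit_index;
    }
    /* Only entries from the leader's current term are committed by
     * counting replicas; older entries become committed indirectly once
     * a later entry from the current term is committed. */
    if (index_term != current_term) {
        return commit_index;
    }
    for (i = 0; i < n; i++) {
        if (!peers[i].voter) {
            continue;
        }
        voters++;
        if (peers[i].match_index >= index) {
            votes++;
        }
    }
    return votes > voters / 2 ? index : commit_index;
}
```

With this guard, an entry from an earlier term that happens to sit on a majority of servers is left uncommitted until an entry from the current term reaches a majority as well.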

Implement prevoting before an actual vote

Let me start by saying that this implementation looks fantastic and I'm really excited to see such a library written in C. What are your opinions on adding a pre-vote phase to the vote? From Ongardie's long-form thesis paper (https://github.com/ongardie/dissertation#readme), the concept of pre-vote is explained:

One downside of Raft’s leader election algorithm is that a server that has been partitioned from the
cluster is likely to cause a disruption when it regains connectivity. When a server is partitioned, it
will not receive heartbeats. It will soon increment its term to start an election, although it won’t
be able to collect enough votes to become leader. When the server regains connectivity sometime
later, its larger term number will propagate to the rest of the cluster (either through the server’s
RequestVote requests or through its AppendEntries response). This will force the cluster leader to
step down, and a new election will have to take place to select a new leader. Fortunately, such events
are likely to be rare, and each will only cause one leader to step down.

I've worked on networks where these situations are not as rare (world-distributed with tighter heartbeats) and it can cause non-negligible leadership churn.

Here's an example of how it works in my toy raft implementation: (https://github.com/matthewaveryusa/raft.ts/blob/master/src/server.ts#L401-L455)

Would you be interested in a pull request with pre-voting?
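
For readers unfamiliar with the idea, here is a minimal sketch of how a voter might answer a pre-vote request, following the dissertation's description: the voter grants it only if the candidate's log is sufficiently up to date and the voter has not heard from a valid leader for at least an election timeout. All names below (`sketch_voter`, `sketch_prevote_request`, `sketch_grant_prevote`) are hypothetical and do not come from this library or from the linked TypeScript implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pre-vote request: the candidate has NOT incremented its own
 * term yet; it only announces the term it would use. */
struct sketch_prevote_request {
    uint64_t term;           /* Candidate's current term + 1 */
    uint64_t last_log_index; /* Index of the candidate's last log entry */
    uint64_t last_log_term;  /* Term of the candidate's last log entry */
};

/* Hypothetical voter state. */
struct sketch_voter {
    uint64_t current_term;
    uint64_t last_log_index;
    uint64_t last_log_term;
    uint64_t ms_since_leader_contact; /* Time since the last valid heartbeat */
    uint64_t election_timeout_ms;
};

/* Decide whether to grant a pre-vote. The voter is taken by const pointer
 * because answering a pre-vote changes neither its term nor its vote. */
static bool sketch_grant_prevote(const struct sketch_voter *v,
                                 const struct sketch_prevote_request *req)
{
    assert(v != NULL && req != NULL);
    if (req->term < v->current_term) {
        return false; /* Stale candidate */
    }
    if (v->ms_since_leader_contact < v->election_timeout_ms) {
        return false; /* A leader is still considered alive */
    }
    /* Usual Raft up-to-date check on the last log entry. */
    if (req->last_log_term != v->last_log_term) {
        return req->last_log_term > v->last_log_term;
    }
    return req->last_log_index >= v->last_log_index;
}
```

A partitioned server that cannot collect a majority of pre-votes never increments its term, so it cannot force the leader to step down when it reconnects.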

Why is the vote reset during pre-vote?

During the pre-vote process, the vote is set to 0. There is also a test case for this scenario (named preVoteWithcandidateCrash). I don't understand why the vote needs to be reset. Could someone explain?

IOBE: no procedure provided to prepare the first (configuration) log entry

To implement a custom I/O backend (struct raft_io), during the bootstrap phase one needs to encode the configuration into libraft's internal format and write it to persistent storage as the first record. Currently, however, no procedure for that is exposed.

It would suffice to just add this into include/raft.h:

RAFT_API int
configurationEncode (const struct raft_configuration *c,
                     struct raft_buffer *buf);

then use it with

  struct raft_entry entry;
  entry.term = 1;
  entry.type = RAFT_CHANGE;
  entry.buf = buffer;  /* encoded configuration */
  entry.batch = NULL;  /* OK? */

and store the entry via the usual means provided by the new IOBE.
There may be a better solution, or I may just be missing something.

LGPL-3.0-or-later or LGPL-3.0-only clarification

Is this intended to be under LGPL-3.0-or-later or LGPL-3.0-only - with the linking exception? I can't find anything that specifies the 'only' or 'or later' option.

Maybe you could use SPDX identifiers in your source code :) which would be one of (depending on answer above):

SPDX-License-Identifier: LGPL-3.0-or-later WITH LGPL-3.0-linking-exception
SPDX-License-Identifier: LGPL-3.0-only WITH LGPL-3.0-linking-exception

Node stays leader after role changed to `RAFT_SPARE`

A node stays leader of a cluster after being demoted to RAFT_SPARE. This makes a dqlite cluster unavailable, because a node only reports who the leader is if it is itself a voter in the current configuration, see https://github.com/canonical/dqlite/blob/9946636b7b6e3d41ad463c6ca11443c6e026ea49/src/gateway.c#L151

The other nodes will still think the spare node is the leader and will refer the client to the spare node, who in turn will report that it doesn't know the leader.

There's a test in raft with RAFT_STANDBY role instead of RAFT_SPARE https://github.com/canonical/raft/blame/master/test/integration/test_assign.c#L435, so this behavior looks intentional.

This looks to be the cause of canonical/dqlite#323.

@freeekanayaka I think the leader should step down once it notices that it's no longer a voter in the newly committed configuration, what do you think?
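
The step-down check proposed above could look roughly like the sketch below. It is only an illustration of the suggestion: the `sketch_role` enum, the `sketch_server` structure and `sketch_leader_should_step_down` are hypothetical names, not the library's API, and the sketch assumes the check runs when a new configuration has been committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical roles, mirroring the three roles mentioned in this issue. */
enum sketch_role { SKETCH_STANDBY, SKETCH_VOTER, SKETCH_SPARE };

/* Hypothetical configuration entry. */
struct sketch_server {
    uint64_t id;
    enum sketch_role role;
};

/* Return true if a leader with the given ID should step down under the
 * given committed configuration, i.e. if it is no longer listed as a voter
 * or is not listed at all. */
static bool sketch_leader_should_step_down(const struct sketch_server *servers,
                                           size_t n,
                                           uint64_t self_id)
{
    size_t i;

    assert(servers != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (servers[i].id == self_id) {
            return servers[i].role != SKETCH_VOTER;
        }
    }
    return true; /* Removed from the configuration entirely */
}
```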

Candidate reverts to follower during raft_transfer with pre_vote on

Below is some debug logging. In short: server 1 wants to transfer leadership to server 2 through raft_transfer. Server 2 receives the TimeoutNow message and becomes a candidate, but does not increase its term because of pre_vote. It then receives a heartbeat from server 1 and converts back to follower, nullifying the attempted leadership transfer.

raft_transfer self:1 to:2        
self:2 recv message:6 from:1 state:1 term:1
recvTimeoutNow:28
convertSetState id:2 new_state:2
electionStart:114
self:3 recv message:1 from:1 state:1 term:1
self:4 recv message:1 from:1 state:1 term:1
self:1 recv message:2 from:3 state:3 term:1
self:1 recv message:2 from:4 state:3 term:1
self:2 recv message:1 from:1 state:2 term:1
convertSetState id:2 new_state:1

Unable to restart raft server

When I manually stop the raft server with an interrupt signal and then restart it, I see the following error:

raft_start(): io: load closed segment 0000000000000002-0000000000000013: entries batch 2 starting at byte 197: entries count 19005183 in preamble is too high.

The base of the raft_buffer passed to raft_apply() is a dynamically allocated block of 157 bytes.

Any help or pointers are appreciated.

AIO events get depleted during the tests

An excerpt from test-suite.log:

uvFileCreate/noResources                                    [ ERROR ]
Error: test/unit/test_uv_file.c:226: assertion failed: rv_ == UV__ERROR (0 == 1)
Error: child killed by signal 6 (Aborted)

Digging into this, we find:

$ cat /proc/sys/fs/aio-nr
768

I don't know what to do next, for example how to "drop" the events.

self elect term problem

According to the Raft paper, when a server becomes a candidate it increments its term.

However, with the self-election mechanism used when there is only one voter (the server itself), raft does not increment its term. This causes term-checking problems when you start a single node and then add more nodes.

See convert.c, line 166.

Is this a problem, or am I just misunderstanding something?

Crashed followers sometimes fail to rejoin

I've experienced this on master but have also run into it in previous versions.

Steps to reproduce:

For convenience, add something like the following in example/server.c's serverTimerCb so that it's obvious when it happens.

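raft_id leader;
const char *addr;
raft_leader(&s->raft, &leader, &addr);
/* Cast explicitly so the format specifiers match the argument types. */
printf("id = %u leader = %llu\n", (unsigned)s->id, (unsigned long long)leader);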

I tried to find the steps that trigger it the quickest, but I have also run into it with the current defaults used in the example. Step 5 isn't strictly necessary to reproduce this.

  1. cd example
  2. Run 3 instances until one is a leader:
    ID=1; mkdir -p r$ID; LD_LIBRARY_PATH=../.libs ./server r$ID $ID for ID=1,2,3
  3. Pick a server X that is currently a follower
  4. Terminate X with ctrl+c or kill -9
  5. Wait (~5 seconds).
  6. Re-execute X
  7. If you see leader != 0, repeat from step 4.

It takes a few tries (usually fewer than 10).

Eventually, X is running again and instead of joining the other two, it seems to go off and start its own elections (from what I saw after enabling tracef in tick.c).

From this point on, X never rejoins the others (regardless of restarting X or wiping X's raft directory before re-executing it) until the current leader is manually restarted.

This also happens in the case where in step 6, you wipe X's raft directory just before re-executing it (which I think also makes it occur even more frequently).

configure: error: example program requires libuv

But I already have libuv installed.

$ apt search libuv1
Sorting... Done
Full Text Search... Done
libuv1/bionic,now 1.18.0-3 amd64 [installed,automatic]
  asynchronous event notification library - runtime library

libuv1-dev/bionic,now 1.18.0-3 amd64 [installed]
  asynchronous event notification library - development files

Environment

$ uname -a
Linux lx 5.3.0-45-generic #37~18.04.1-Ubuntu SMP Fri Mar 27 15:58:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/issue
Ubuntu 18.04.4 LTS \n \l

$ autoreconf --version
autoreconf (GNU Autoconf) 2.69
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+/Autoconf: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>, <http://gnu.org/licenses/exceptions.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by David J. MacKenzie and Akim Demaille.
$ ./configure --version
raft configure 0.9.17
generated by GNU Autoconf 2.69

Copyright (C) 2012 Free Software Foundation, Inc.
This configure script is free software; the Free Software Foundation
gives unlimited permission to copy, distribute and modify it.

And my log

$ autoreconf -i
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, 'ac'.
libtoolize: copying file 'ac/ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
libtoolize: copying file 'm4/libtool.m4'
libtoolize: copying file 'm4/ltoptions.m4'
libtoolize: copying file 'm4/ltsugar.m4'
libtoolize: copying file 'm4/ltversion.m4'
libtoolize: copying file 'm4/lt~obsolete.m4'
configure.ac:9: installing 'ac/compile'
configure.ac:11: installing 'ac/config.guess'
configure.ac:11: installing 'ac/config.sub'
configure.ac:7: installing 'ac/install-sh'
configure.ac:7: installing 'ac/missing'
Makefile.am: installing 'ac/depcomp'
parallel-tests: installing 'ac/test-driver'
$ ./configure --enable-example
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether make supports nested variables... (cached) yes
checking for style of include used by make... GNU
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking dependency style of gcc... gcc3
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking minix/config.h usability... no
checking minix/config.h presence... no
checking for minix/config.h... no
checking whether it is safe to define __EXTENSIONS__... yes
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking how to print strings... printf
checking for a sed that does not truncate output... /bin/sed
checking for fgrep... /bin/grep -F
checking for ld used by gcc... /usr/local/bin/ld
checking if the linker (/usr/local/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking how to convert x86_64-pc-linux-gnu file names to x86_64-pc-linux-gnu format... func_convert_file_noop
checking how to convert x86_64-pc-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /usr/local/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... no
checking how to associate runtime and link libraries... printf %s\n
checking for ar... ar
checking for archiver @FILE support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for a working dd... /bin/dd
checking how to truncate binary pipes... /bin/dd bs=4096 count=1
checking for mt... mt
checking if mt is a manifest tool... no
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... yes
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/local/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
checking for pkg-config... no
checking for UV... no
configure: error: example program requires libuv

send snapshot size

Is a snapshot currently sent in a single RPC request? What happens if the snapshot is larger than the machine's memory?
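
For context, the InstallSnapshot RPC described in the Raft paper carries an offset and a done flag so that a snapshot can be transferred in chunks. The sketch below only shows the chunk arithmetic such a scheme needs; `sketch_chunk` and `sketch_next_chunk` are hypothetical names, and nothing here is a statement about how this library actually sends snapshots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one snapshot chunk to send. */
struct sketch_chunk {
    uint64_t offset; /* Byte offset of the chunk within the snapshot */
    size_t len;      /* Number of bytes in this chunk */
    bool done;       /* True if this is the last chunk */
};

/* Compute the chunk that starts at the given offset, bounded by max_chunk
 * bytes, for a snapshot of snapshot_size bytes. */
static struct sketch_chunk sketch_next_chunk(uint64_t snapshot_size,
                                             uint64_t offset,
                                             size_t max_chunk)
{
    struct sketch_chunk c;
    uint64_t remaining;

    assert(max_chunk > 0);
    assert(offset <= snapshot_size);
    remaining = snapshot_size - offset;
    c.offset = offset;
    c.len = remaining < max_chunk ? (size_t)remaining : max_chunk;
    c.done = (offset + c.len == snapshot_size);
    return c;
}
```

A sender would call this repeatedly, advancing the offset by `len` after each acknowledged chunk, so that only `max_chunk` bytes need to be held in memory at a time.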

ASan reported SEGV

I got a SEGV while running some tests with example/server:

==64489==WARNING: AddressSanitizer failed to allocate 0x393a312e302e302e bytes
AddressSanitizer:DEADLYSIGNAL
=================================================================
==64489==ERROR: AddressSanitizer: SEGV on unknown address 0x00000000000b (pc 0x7ff5b0862764 bp 0x61c000000690 sp 0x7ffeb3b44720 T0)
==64489==The signal is caused by a READ memory access.
==64489==Hint: address points to the zero page.
    #0 0x7ff5b0862763  (/home/zyh/raft/.libs/libraft.so.0+0x1d763)
    #1 0x7ff5b086bd02  (/home/zyh/raft/.libs/libraft.so.0+0x26d02)
    #2 0x7ff5b0cab907  (/usr/lib/x86_64-linux-gnu/libuv.so.1+0x18907)
    #3 0x7ff5b0c9ec34  (/usr/lib/x86_64-linux-gnu/libuv.so.1+0xbc34)
    #4 0x4c3f33  (/home/zyh/raft/example/server+0x4c3f33)
    #5 0x7ff5af66fbf6  (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)
    #6 0x41b959  (/home/zyh/raft/example/server+0x41b959)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/zyh/raft/.libs/libraft.so.0+0x1d763) 
==64489==ABORTING

Offset 0x1d763 corresponds to raft/src/tick.c, line 184 in fff44b8:

if (change != NULL && change->cb != NULL) {

membership rollback issue

assert(entry != NULL);

The assertion may fail if the configuration entry has already been removed. In addition, an uncommitted configuration may be stored in a snapshot, making it impossible to roll back.

another failed uv test

Building 0.9.8, one test still fails:

uvEnsureDir/statError                                       [ ERROR ]
Error: test/unit/test_uv_os.c:64: assertion failed: uvEnsureDir("/proc/1/root", &errmsg) == UV__ERROR (0 == 1)
Error: child killed by signal 6 (Aborted)

My guess is that the tests in that group compare against a locale-dependent error message string:

/* If the directory can't be created, an error is returned. */
TEST(uvEnsureDir, mkdirError, NULL, NULL, 0, NULL)
{
    ENSURE_DIR_ERROR("/foobarbazegg", UV__ERROR, "mkdir: permission denied");
    return MUNIT_OK;
}

/* If the directory can't be probed for existence, an error is returned. */
TEST(uvEnsureDir, statError, NULL, NULL, 0, NULL)
{
    ENSURE_DIR_ERROR("/proc/1/root", UV__ERROR, "stat: permission denied");
    return MUNIT_OK;
}

/* If the given path is not a directory, an error is returned. */
TEST(uvEnsureDir, notDir, NULL, NULL, 0, NULL)
{
    ENSURE_DIR_ERROR("/dev/null", UV__ERROR, "not a directory");
    return MUNIT_OK;
}

question about unique id for log entry!

If I want to assign a unique ID to the FSM entry data passed to raft_apply, may I use logLastIndex to get it?

The ID must be persistent and must never repeat, even after an election or a cluster reboot.
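
One property Raft itself guarantees is that the pair (term, index) identifies at most one log entry, whereas an index alone can be reused by a different uncommitted entry after a leader change. The sketch below shows an identifier built on that pair; `sketch_entry_id` and `sketch_entry_id_cmp` are hypothetical names, and the sketch does not address how an application would learn the index before calling raft_apply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry identifier based on the (term, index) pair. */
struct sketch_entry_id {
    uint64_t term;  /* Term in which the entry was created */
    uint64_t index; /* Position of the entry in the log */
};

/* Order identifiers by term first, then by index. Returns a negative value,
 * zero or a positive value, like strcmp(). */
static int sketch_entry_id_cmp(const struct sketch_entry_id *a,
                               const struct sketch_entry_id *b)
{
    assert(a != NULL && b != NULL);
    if (a->term != b->term) {
        return a->term < b->term ? -1 : 1;
    }
    if (a->index != b->index) {
        return a->index < b->index ? -1 : 1;
    }
    return 0;
}
```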

Add user_data to raft_server and callback for message encryption/decryption

Thanks for this great work.

To enhance security, a 32-byte public key could be added to struct raft_server as user_data, together with an application-provided callback and a number of extra reserved bytes, so that RPC messages can be encrypted and decrypted. For example:

int do_message_pack(int is_encryption, int server_id, const char server_public_key[32], char *raw_ptr, int size) {
    // This function allocates no memory; it reuses the extra reserved bytes in the raw memory block provided by the library.
    return 0;
}

struct raft_configuration configuration;
raft_configuration_init(&configuration);
const int message_reserved_byte = 40; // the raft library provides 40 bytes of extra space for each message
const int public_key_size = 32;
raft_configuration_set_message_callback(&configuration, do_message_pack,  message_reserved_byte, public_key_size);
raft_configuration_add(&configuration, 1, "192.168.1.1:9999", true, "node_1_public_key");

The raft library may not know the message size before decryption, so the application may also need to provide an extra callback that decodes the fixed-size message header. Alternatively, the size could be placed unencrypted at the beginning of the message and later used as associated data for Authenticated Encryption with Associated Data.

I would also like to report that running the example on the latest commit, 1e3f2fd, produces this log:

server: raft/src/replication.c:1466: int replicationApply(struct raft *): Assertion `r->last_applied <= r->commit_index' failed.
3: starting
2: stopping
2: starting
2: stopping
2: starting
1: starting
2: stopping
server: raft/src/log.c:814: void removeSuffix(struct raft_log *, const raft_index, _Bool): Assertion `index <= logLastIndex(l)' failed.
2: starting
3: stopping
3: starting
2: stopping
server: raft/src/log.c:814: void removeSuffix(struct raft_log *, const raft_index, _Bool): Assertion `index <= logLastIndex(l)' failed.

`test/unit/uv` fails

make check fails, and in test-suite.log there is:

==================================
   raft 0.9.3: ./test-suite.log
==================================

# TOTAL: 4
# PASS:  3
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: test/unit/uv
==================

Running test suite with seed 0x3492a69d...
uv_os/join                                                  [ OK    ] [ 0.00001463 / 0.00001452 CPU ]
uv_os/ensure_dir/does_not_exists                            [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_os/ensure_dir/exists                                     [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_os/ensure_dir/error/mkdir                                [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_os/ensure_dir/error/stat                                 [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_os/ensure_dir/error/not_a_dir                            [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_os/sync_dir/error/open                                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_os/open_file/error/open                                  [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_os/probe/tmpfs                                           
  dir-fs=tmpfs                                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_os/probe/zfs                                             
  dir-fs=zfs                                                [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_os/probe/error/no_access                                 [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_os/probe/error/no_space                                  [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_file/create/success                                      
  dir-fs=tmpfs                                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  dir-fs=ext4                                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_file/create/error/no_entry                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_file/create/error/already_exists                         [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_file/create/error/no_space                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_file/create/error/no_resources                           [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_file/create/error/cancel                                 [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_file/write/one                                           
  dir-fs=tmpfs                                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  dir-fs=ext4                                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_file/write/two                                           
  dir-fs=tmpfs                                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  dir-fs=ext4                                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_file/write/twice                                         
  dir-fs=tmpfs                                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  dir-fs=ext4                                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_file/write/vec                                           
  dir-fs=tmpfs                                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  dir-fs=ext4                                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_file/write/vec_twice                                     
  dir-fs=tmpfs                                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  dir-fs=ext4                                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_file/write/concurrent                                    
  dir-fs=tmpfs                                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  dir-fs=ext4                                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_file/write/concurrent_twice                              
  dir-fs=tmpfs                                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  dir-fs=ext4                                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_file/write/error/no_resources                            
  dir-fs=tmpfs                                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_file/write/error/cancel                                  
  dir-fs=tmpfs                                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  dir-fs=ext4                                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_init/ensure_dir/error/cant_create                        [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_init/ensure_dir/error/not_a_dir                          [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_init/ensure_dir/error/no_access                          [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_init/metadata/empty_dir                                  [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_init/metadata/only_1                                     [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_init/metadata/1                                          [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_init/metadata/2                                          [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_init/metadata/short_file                                 [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_init/metadata/same_version                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_init/metadata/error/no_access                            [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_init/metadata/error/bad_format                           [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_init/metadata/error/bad_version                          [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_init/metadata/error/no_space                             [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_list/segments/empty                                      [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_list/snapshots/one                                       [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_list/snapshots/many                                      [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/segments/ignore_unknown                             [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/segments/closed                                     [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/segments/open_empty                                 [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/segments/open_all_zeros                             [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/segments/open_not_all_zeros                         [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/segments/open_truncate                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/segments/open_partial_bach                          [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/segments/open_second                                [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/segments/open_second_all_zeroes                     [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/segments/open                                       [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/snapshot/many                                       [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/snapshot/closed_segment_with_old_entries            [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/snapshot/dangling_open_segment                      [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/snapshot/dangling_open_segment_behind               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/snapshot/valid_closed_segments                      [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/snapshot/noncontigous_closed_segments               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/snapshot/more_recent_closed_segment                 [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/error/short_format                                  [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/error/short_preamble                                [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/error/short_header                                  [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/error/short_data                                    [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/error/corrupt_header                                [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/error/corrupt_data                                  [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/error/closed_bad_index                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/error/closed_empty                                  [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/error/closed_bad_format                             [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/error/open_no_access                                [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/error/open_zero_format                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_load/error/open_bad_format                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_prepare/success/first                                    [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_prepare/success/second                                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_prepare/error/no_resources                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_prepare/error/no_space                                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_prepare/error/oom                                        
  heap-fault-delay=0, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  heap-fault-delay=1, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_prepare/close/noop                                       [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_prepare/close/cancel_requests                            [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_prepare/close/remove_pool                                [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_append/success/first                                     [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_append/success/fit_block                                 [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_append/success/match_block                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_append/success/exceed_block                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_append/success/batch                                     [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_append/success/wait                                      [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_append/success/resize_arena                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_append/success/truncate                                  [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_append/success/truncate_closing                          [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_append/success/counter                                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_append/error/too_big                                     [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_append/error/cancel                                      [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_append/error/write                                       [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_append/error/oom                                         
  heap-fault-delay=0, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  heap-fault-delay=1, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_append/close/during_write                                [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_append/close/current_segment                             [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_finalize/success/first                                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_finalize/success/unused                                  [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_finalize/success/wait                                    [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_finalize/error/oom                                       
  heap-fault-delay=0, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_snapshot/load/success/first                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_snapshot/load/error/no_metadata                          [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_snapshot/load/error/no_header                            [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_snapshot/load/error/format                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_snapshot/load/error/configuration_too_big                [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_snapshot/load/error/no_configuration                     [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_snapshot/load/error/no_data                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_snapshot/load/error/oom                                  
  heap-fault-delay=0, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  heap-fault-delay=1, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  heap-fault-delay=2, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  heap-fault-delay=3, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  heap-fault-delay=4, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_snapshot/put/first                                       [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_snapshot/put/entries_less_than_trailing                  [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_snapshot/put/entries_more_than_trailing                  [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_snapshot/put/after_truncate                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_snapshot/get/first                                       [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_truncate/success/whole_segment                           [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_truncate/success/same_as_last_index                      [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_truncate/success/partial_segment                         [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_truncate/error/oom                                       
  heap-fault-delay=0, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv_tcp_connect/success/first                                [ OK    ] [ 0.00009344 / 0.00009341 CPU ]
uv_tcp_connect/error/refused                                [ OK    ] [ 0.00004604 / 0.00004597 CPU ]
uv_tcp_connect/error/oom                                    
  heap-fault-delay=0, heap-fault-repeat=1                   [ OK    ] [ 0.00000113 / 0.00000090 CPU ]
  heap-fault-delay=1, heap-fault-repeat=1                   [ OK    ] [ 0.00000296 / 0.00000278 CPU ]
uv_tcp_connect/error/oom_async                              
  heap-fault-delay=0, heap-fault-repeat=1                   [ OK    ] [ 0.00010070 / 0.00010061 CPU ]
uv_tcp_connect/close/immediately                            [ OK    ] [ 0.00007095 / 0.00007088 CPU ]
uv_tcp_connect/close/handshake                              [ OK    ] [ 0.00008321 / 0.00008319 CPU ]
uv_tcp_listen/success/first                                 [ OK    ] [ 0.00006693 / 0.00006689 CPU ]
uv_tcp_listen/error/bad_protocol                            [ OK    ] [ 0.00006865 / 0.00006861 CPU ]
uv_tcp_listen/error/abort                                   
  n=8                                                       [ OK    ] [ 0.00012617 / 0.00012588 CPU ]
  n=16                                                      [ OK    ] [ 0.00006342 / 0.00006341 CPU ]
  n=24                                                      [ OK    ] [ 0.00006406 / 0.00006405 CPU ]
  n=32                                                      [ OK    ] [ 0.00006250 / 0.00006248 CPU ]
uv_tcp_listen/error/oom                                     
  heap-fault-delay=0, heap-fault-repeat=1                   [ OK    ] [ 0.00003394 / 0.00003392 CPU ]
  heap-fault-delay=1, heap-fault-repeat=1                   [ OK    ] [ 0.00003342 / 0.00003334 CPU ]
  heap-fault-delay=2, heap-fault-repeat=1                   [ OK    ] [ 0.00005242 / 0.00005239 CPU ]
uv_tcp_listen/close/pending                                 [ OK    ] [ 0.00001620 / 0.00001612 CPU ]
uv_tcp_listen/close/connected                               [ OK    ] [ 0.00003142 / 0.00003132 CPU ]
uv_tcp_listen/close/handshake                               
  n=8                                                       [ OK    ] [ 0.00004364 / 0.00004360 CPU ]
  n=16                                                      [ OK    ] [ 0.00004364 / 0.00004362 CPU ]
  n=24                                                      [ OK    ] [ 0.00004509 / 0.00004506 CPU ]
  n=32                                                      [ OK    ] [ 0.00004579 / 0.00004575 CPU ]
io_uv_recv/success/first                                    [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_recv/success/second                                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_recv/success/request_vote_result                      [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_recv/success/append_entries                           [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_recv/success/heartbeat                                [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_recv/success/append_entries_result                    [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_recv/success/install_snapshot                         [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_recv/error/bad_protocol                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_recv/error/bad_size                                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_recv/error/bad_type                                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_recv/error/oom                                        
  heap-fault-delay=3, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  heap-fault-delay=4, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  heap-fault-delay=5, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  heap-fault-delay=6, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_recv/close/accept                                     [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_recv/close/append_entries                             [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_send/success/first                                    [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_send/success/second                                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_send/success/vote_result                              [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_send/success/append_entries                           [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_send/success/heartbeat                                [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_send/success/append_entries_result                    [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_send/success/install_snapshot                         [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_send/error/connect                                    [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_send/error/bad_address                                [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_send/error/bad_message                                [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_send/error/reconnect                                  [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_send/error/queue                                      [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_send/error/oom                                        
  heap-fault-delay=0, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  heap-fault-delay=1, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  heap-fault-delay=2, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  heap-fault-delay=3, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  heap-fault-delay=4, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
  heap-fault-delay=5, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_send/error/oom_async                                  
  heap-fault-delay=0, heap-fault-repeat=1                   [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_send/close/writing                                    [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
io_uv_send/close/connecting                                 [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv/init/dir_too_long                                        [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv/start/tick                                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv/start/recv                                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv/load/pristine                                            [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv/bootstrap/pristine                                       [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv/set_term/term                                            [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv/set_vote/pristine                                        [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv/append/pristine                                          [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
uv/send/first                                               [ ERROR ]
Error: test/lib/dir.c:97: No such file or directory
Error: child killed by signal 6 (Aborted)
23 of 200 (12%) tests successful, 0 (0%) test skipped.
FAIL test/unit/uv (exit status: 1)

I'm not sure how I'd fix this, except with --disable-uv. libuv in general breaks so many things, because it is itself broken.

ASan reported heap-buffer-overflow

I got heap-buffer-overflow when I ran some tests with example/server

=================================================================
==11599==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000250 at pc 0x7f553b99a8e3 bp 0x7ffcf8dee520 sp 0x7ffcf8dee518
READ of size 8 at 0x602000000250 thread T0
  #0 0x7f553b99a8e2  (/home/zyh/raft/.libs/libraft.so.0+0x808e2) byte.h:133
  #1 0x7f553b99bc9b  (/home/zyh/raft/.libs/libraft.so.0+0x81c9b) uv_encoding.c:390
  #2 0x7f553b99ad61  (/home/zyh/raft/.libs/libraft.so.0+0x80d61) uv_encoding.c:477
  #3 0x7f553b9b438e  (/home/zyh/raft/.libs/libraft.so.0+0x9a38e) uv_recv:260
  #4 0x7f553c4475ce  (/usr/lib/x86_64-linux-gnu/libuv.so.1+0x155ce)
  #5 0x7f553c44833b  (/usr/lib/x86_64-linux-gnu/libuv.so.1+0x1633b)
  #6 0x7f553c44d33f  (/usr/lib/x86_64-linux-gnu/libuv.so.1+0x1b33f)
  #7 0x7f553c43dcc7  (/usr/lib/x86_64-linux-gnu/libuv.so.1+0xbcc7)
  #8 0x4c6542  (/home/zyh/raft/example/server+0x4c6542) example/server.c:473
  #9 0x7f553ab6bbf6  (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)
  #10 0x41bfe9  (/home/zyh/raft/example/server+0x41bfe9)

0x602000000252 is located 0 bytes to the right of 2-byte region [0x602000000250,0x602000000252)
allocated by thread T0 here:
  #0 0x49605d  (/home/zyh/raft/example/server+0x49605d)
  #1 0x7f553b94daf4  (/home/zyh/raft/.libs/libraft.so.0+0x33af4) heap.c:10
  #2 0x7f553b94d4fa  (/home/zyh/raft/.libs/libraft.so.0+0x334fa) heap.c:57
  #3 0x7f553b9b36b5  (/home/zyh/raft/.libs/libraft.so.0+0x996b5) uv_recv.c:144
  #4 0x7f553c4474a7  (/usr/lib/x86_64-linux-gnu/libuv.so.1+0x154a7)

SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/zyh/raft/.libs/libraft.so.0+0x808e2)
Shadow bytes around the buggy address:
0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff8000: fa fa fd fd fa fa 00 fa fa fa 00 07 fa fa fd fd
0x0c047fff8010: fa fa fd fd fa fa fd fd fa fa 00 07 fa fa 00 07
0x0c047fff8020: fa fa 00 07 fa fa 03 fa fa fa fd fd fa fa fd fd
0x0c047fff8030: fa fa 00 07 fa fa fd fd fa fa fd fd fa fa 00 07
=>0x0c047fff8040: fa fa fd fd fa fa 00 07 fa fa[02]fa fa fa fa fa
0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8090: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable:           00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone:       fa
Freed heap region:       fd
Stack left redzone:      f1
Stack mid redzone:       f2
Stack right redzone:     f3
Stack after return:      f5
Stack use after scope:   f8
Global redzone:          f9
Global init order:       f6
Poisoned by user:        f7
Container overflow:      fc
Array cookie:            ac
Intra object redzone:    bb
ASan internal:           fe
Left alloca redzone:     ca
Right alloca redzone:    cb
Shadow gap:              cc
==11599==ABORTING
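
The report above shows a READ of size 8 against a 2-byte region allocated in uv_recv.c:144, which suggests that the decoder tried to read a 64-bit field from a buffer too short to hold one. Below is a minimal sketch of the kind of length check that guards such a read. The names `struct cursor` and `cursor_get64` are hypothetical and not part of the library, and the little-endian layout is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor over a received buffer; not a type from the library. */
struct cursor {
    const uint8_t *p; /* next unread byte */
    size_t left;      /* number of bytes still available */
};

/* Read a 64-bit value (little-endian layout assumed), refusing to read
 * past the end of the buffer. Returns 0 on success, -1 if fewer than
 * 8 bytes remain; on failure the cursor is left unchanged. */
static int cursor_get64(struct cursor *c, uint64_t *out)
{
    uint64_t v = 0;
    size_t i;

    assert(c != NULL && out != NULL);
    if (c->left < sizeof(uint64_t)) {
        return -1; /* short buffer: report an error instead of overreading */
    }
    for (i = 0; i < sizeof(uint64_t); i++) {
        v |= (uint64_t)c->p[i] << (8 * i);
    }
    c->p += sizeof(uint64_t);
    c->left -= sizeof(uint64_t);
    *out = v;
    return 0;
}
```

With a check like this, a 2-byte message body would make the decode step fail with an error code that the receive path can handle, rather than reading 6 bytes beyond the allocation.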

Arch linux build failed

' | ^~~~~~~~~~~~~~~~~~~~~~
CC src/uv_tcp.lo
CC src/uv_tcp_connect.lo
CC src/uv_tcp_listen.lo
CC src/uv_truncate.lo
CC src/fixture.lo
src/fixture.c: In function 'raft_fixture_step':
src/fixture.c:1390:39: warning: 'j' may be used uninitialized in this function [-Wmaybe-uninitialized]
1390 | (tick_time == completion_time && i <= j)) {
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
src/fixture.c:1390:39: warning: 'i' may be used uninitialized in this function [-Wmaybe-uninitialized]
CCLD libraft.la
CCLD example-server
/usr/bin/ld: example/server-server.o: in function `sigintCloseCb':
/home/user/go/deps/raft/example/server.c:124: undefined reference to `uv_timer_stop'
/usr/bin/ld: example/server-server.o: in function `sigintCb':
/home/user/go/deps/raft/example/server.c:134: undefined reference to `uv_signal_stop'
/usr/bin/ld: example/server-server.o: in function `serverClose':
/home/user/go/deps/raft/example/server.c:256: undefined reference to `uv_loop_close'
/usr/bin/ld: example/server-server.o: in function `sigintCloseCb':
/home/user/go/deps/raft/example/server.c:125: undefined reference to `uv_close'
/usr/bin/ld: example/server-server.o: in function `sigintCb':
/home/user/go/deps/raft/example/server.c:135: undefined reference to `uv_close'
/usr/bin/ld: example/server-server.o: in function `serverInit':
/home/user/go/deps/raft/example/server.c:154: undefined reference to `uv_loop_init'
/usr/bin/ld: /home/user/go/deps/raft/example/server.c:161: undefined reference to `uv_signal_init'
/usr/bin/ld: /home/user/go/deps/raft/example/server.c:245: undefined reference to `uv_loop_close'
/usr/bin/ld: /home/user/go/deps/raft/example/server.c:168: undefined reference to `uv_timer_init'
/usr/bin/ld: /home/user/go/deps/raft/example/server.c:241: undefined reference to `uv_close'
/usr/bin/ld: /home/user/go/deps/raft/example/server.c:243: undefined reference to `uv_close'
/usr/bin/ld: example/server-server.o: in function `serverStart':
/home/user/go/deps/raft/example/server.c:346: undefined reference to `uv_close'
/usr/bin/ld: /home/user/go/deps/raft/example/server.c:347: undefined reference to `uv_close'
/usr/bin/ld: /home/user/go/deps/raft/example/server.c:348: undefined reference to `uv_run'
/usr/bin/ld: /home/user/go/deps/raft/example/server.c:322: undefined reference to `uv_signal_start'
/usr/bin/ld: /home/user/go/deps/raft/example/server.c:326: undefined reference to `uv_timer_start'
/usr/bin/ld: /home/user/go/deps/raft/example/server.c:332: undefined reference to `uv_run'
/usr/bin/ld: /home/user/go/deps/raft/example/server.c:340: undefined reference to `uv_timer_stop'
/usr/bin/ld: /home/user/go/deps/raft/example/server.c:342: undefined reference to `uv_signal_stop'
/usr/bin/ld: ./.libs/libraft.so: undefined reference to `uv_is_closing'
/usr/bin/ld: ./.libs/libraft.so: undefined reference to `uv_read_start'
/usr/bin/ld: ./.libs/libraft.so: undefined reference to `uv_write'
/usr/bin/ld: ./.libs/libraft.so: undefined reference to `uv_ip4_addr'
/usr/bin/ld: ./.libs/libraft.so: undefined reference to `uv_poll_init'
/usr/bin/ld: ./.libs/libraft.so: undefined reference to `uv_tcp_connect'
/usr/bin/ld: ./.libs/libraft.so: undefined reference to `uv_strerror'
/usr/bin/ld: ./.libs/libraft.so: undefined reference to `uv_poll_start'
/usr/bin/ld: ./.libs/libraft.so: undefined reference to `uv_now'
/usr/bin/ld: ./.libs/libraft.so: undefined reference to `uv_queue_work'
/usr/bin/ld: ./.libs/libraft.so: undefined reference to `uv_tcp_init'
/usr/bin/ld: ./.libs/libraft.so: undefined reference to `uv_tcp_bind'
/usr/bin/ld: ./.libs/libraft.so: undefined reference to `uv_listen'
/usr/bin/ld: ./.libs/libraft.so: undefined reference to `uv_read_stop'
/usr/bin/ld: ./.libs/libraft.so: undefined reference to `uv_accept'

ASan report bad-free

I got bad-free when I ran some tests with example/server

==24124==WARNING: AddressSanitizer failed to allocate 0x312e302e302e3732 bytes
=================================================================
==24124==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x611000001618 in thread T0
    #0 0x4934fd  (/home/zyh/raft/example/server+0x4934fd)
    #1 0x7fb7421e92dc  (/home/zyh/raft/.libs/libraft.so.0+0xe2dc) src/heap.c:65
    #2 0x7fb742205bba  (/home/zyh/raft/.libs/libraft.so.0+0x2abba) src/uv_encoding.c:371
    #3 0x7fb74220cb5a  (/home/zyh/raft/.libs/libraft.so.0+0x31b5a) src/uv_recv.c:260
    #4 0x7fb742c7e5ce  (/usr/lib/x86_64-linux-gnu/libuv.so.1+0x155ce)
    #5 0x7fb742c7f33b  (/usr/lib/x86_64-linux-gnu/libuv.so.1+0x1633b)
    #6 0x7fb742c8433f  (/usr/lib/x86_64-linux-gnu/libuv.so.1+0x1b33f)
    #7 0x7fb742c74cc7  (/usr/lib/x86_64-linux-gnu/libuv.so.1+0xbcc7)
    #8 0x4c3e4f  (/home/zyh/raft/example/server+0x4c3e4f)
    #9 0x7fb74142abf6  (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)
    #10 0x41b879  (/home/zyh/raft/example/server+0x41b879)

0x611000001618 is located 152 bytes inside of 200-byte region [0x611000001580,0x611000001648)
allocated by thread T0 here:
    #0 0x49377d  (/home/zyh/raft/example/server+0x49377d)
    #1 0x7fb74220c2c9  (/home/zyh/raft/.libs/libraft.so.0+0x312c9) src/uv_recv.c:347

SUMMARY: AddressSanitizer: bad-free (/home/zyh/raft/example/server+0x4934fd)
==24124==ABORTING

example/server failed to start after killing

Hi all,
I manually started 3 nodes of example/server and tried killing and restarting them all together several times with killall.

It then sometimes failed to start, with the following logs:

1: starting
2: raft_start(): io: last entry on disk has index 389, which is behind last snapshot's index 345408
2: stopping
1: raft_start(): io: last entry on disk has index 389, which is behind last snapshot's index 345408
1: stopping

musl build fails

I am packaging this software for VoidLinux. So far it builds for every architecture except *-musl. It fails there because the error.h header (src/replication.c, line 6) is not found in the musl libc implementation.
Removing that line makes the compiler happy, and the build succeeds.

Is this patch good enough?

--- src/replication.c	2019-07-17 15:35:04.329935151 -0700
+++ src/replication.c	2019-07-17 15:44:11.922717037 -0700
@@ -3,7 +3,9 @@
 #include "assert.h"
 #include "configuration.h"
 #include "convert.h"
-#include "error.h"
+#ifdef __GLIBC__
+    #include "error.h"
+#endif
 #include "log.h"
 #include "logging.h"
 #include "membership.h"

Support sending snapshots in patch mode

To avoid sending large snapshots, we can use a tool like bsdiff to generate a patch between snapshots. Each node needs to keep a few older snapshots, identified by their indexes, so that a patch can be generated against them.

Every RPC result message needs to include the node's list of snapshot indexes (an array of numbers).

When the leader sends an install snapshot RPC, it uses the follower's snapshot index list to find a recent snapshot that it also has locally, then makes a bsdiff patch against it. The install snapshot RPC should include the index of the selected local snapshot.

The follower applies the patch to its own local copy of that snapshot, then restores the patched snapshot.

There should also be an option to set how many snapshots to keep.

This should be an optional feature so people can disable it. If it is enabled, they should provide patch/apply_patch callbacks in the FSM (and these callbacks should be executed from a uv worker).
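
To make the base selection step concrete, here is a rough sketch of how a leader could pick the snapshot to diff against, given its own list of retained snapshot indexes and the list reported by a follower. The function name `pick_base_snapshot` is hypothetical, and the sketch assumes 0 is never a valid snapshot index, so it can signal "no common snapshot, send the full one".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the library: return the highest
 * snapshot index present in both lists, or 0 if the lists share none
 * (0 is assumed never to be a valid snapshot index). */
static uint64_t pick_base_snapshot(const uint64_t *local,
                                   size_t n_local,
                                   const uint64_t *remote,
                                   size_t n_remote)
{
    uint64_t best = 0;
    size_t i;
    size_t j;

    assert(local != NULL || n_local == 0);
    assert(remote != NULL || n_remote == 0);

    /* The lists are expected to be short (bounded by the number of
     * snapshots kept), so a quadratic scan is good enough here. */
    for (i = 0; i < n_local; i++) {
        for (j = 0; j < n_remote; j++) {
            if (local[i] == remote[j] && local[i] > best) {
                best = local[i];
            }
        }
    }
    return best;
}
```

For example, with leader snapshots {100, 200, 300} and follower snapshots {50, 200, 100}, the leader would diff its newest snapshot against the one at index 200. A result of 0 means the leader falls back to sending the whole snapshot.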
