mir's Introduction

Mir - The Distributed Protocol Implementation Framework

Mir is a framework for implementing, debugging, and analyzing distributed protocols. It has the form of a library that provides abstractions representing different components of a distributed system and an engine orchestrating their interaction.

Mir aims to be general enough to enable implementing distributed protocols in a way that is agnostic to network transport, storage, and cryptographic primitive implementation. All these (and other) usual components of a distributed protocol implementation are encapsulated in abstractions with well-defined interfaces. While Mir provides some implementations of those abstractions to be used directly "out of the box", the consumer is free to provide their own implementations.

The first intended use of Mir is as a scalable and efficient consensus layer in Filecoin subnets and, potentially, as a Byzantine fault-tolerant ordering service in Hyperledger Fabric. However, Mir hopes to be a building block of a next generation of distributed systems, being used by many applications.

Currently Mir includes an implementation of the Trantor modular state machine replication system. It has also been used to implement and evaluate the Alea-BFT protocol.

Nodes, Modules, and Events

Mir is a framework for implementing distributed protocols (also referred to as distributed algorithms) meant to run on a distributed system. The basic unit of a distributed system is a node. Each node locally executes (its portion of) a protocol, sending and receiving messages to and from other nodes over a communication network.

Mir models a node of such a distributed system and presents the consumer (the programmer using Mir) with a Node abstraction. The main task of a Node is to process events.

A node contains one or multiple Modules that implement the logic for event processing. Each module independently consumes, processes, and outputs events. This approach bears resemblance to the actor model, where events exchanged between modules correspond to messages exchanged between actors.

The Node implements an event loop where all events created by modules are stored in a buffer and distributed to their corresponding target modules for processing. For example, when the networking module receives a protocol message over the network, it generates a MessageReceived event (containing the received message) that the node implementation routes to the protocol module, which processes the message, potentially outputting SendMessage events that the Node implementation routes back to the networking module.
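The routing described above can be sketched in a few lines of Go. This is only an illustration under stated assumptions: the `Event`, `Module`, and `Node` types and the module names below are hypothetical and much simpler than Mir's actual types, and the loop is single-threaded for clarity.

```go
package main

import "fmt"

// Event is a hypothetical, simplified event (not Mir's real event type).
type Event struct {
	Dest    string // name of the module that should process the event
	Type    string // e.g. "MessageReceived" or "SendMessage"
	Payload string
}

// Module consumes one event and returns the events it produces.
type Module interface {
	ApplyEvent(ev Event) []Event
}

// Node buffers events and routes each one to its destination module.
type Node struct {
	modules map[string]Module
	buffer  []Event
}

// Run processes buffered events in FIFO order until the buffer is empty.
func (n *Node) Run() error {
	for len(n.buffer) > 0 {
		ev := n.buffer[0]
		n.buffer = n.buffer[1:]
		m, ok := n.modules[ev.Dest]
		if !ok {
			return fmt.Errorf("no module named %q", ev.Dest)
		}
		// Events output by the module go back into the buffer.
		n.buffer = append(n.buffer, m.ApplyEvent(ev)...)
	}
	return nil
}

// echoProtocol answers every received message with a SendMessage event.
type echoProtocol struct{}

func (echoProtocol) ApplyEvent(ev Event) []Event {
	if ev.Type != "MessageReceived" {
		return nil
	}
	return []Event{{Dest: "net", Type: "SendMessage", Payload: "ack:" + ev.Payload}}
}

// recordingNet stands in for a networking module and records what it is asked to send.
type recordingNet struct{ sent []string }

func (r *recordingNet) ApplyEvent(ev Event) []Event {
	if ev.Type == "SendMessage" {
		r.sent = append(r.sent, ev.Payload)
	}
	return nil
}

func main() {
	network := &recordingNet{}
	node := &Node{
		modules: map[string]Module{"protocol": echoProtocol{}, "net": network},
		buffer:  []Event{{Dest: "protocol", Type: "MessageReceived", Payload: "hello"}},
	}
	if err := node.Run(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(network.sent)
}
```

Because every interaction between modules passes through the buffer, a recording or replaying hook would only need to observe that one place.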

The architecture described above enables a powerful debugging approach. All Events in the event loop can, in debug mode, be recorded, inspected, or even modified and replayed to the Node using a debugging interface.

In practice, when instantiating a Node, the consumer of Mir provides implementations of these modules to Mir. For example, instantiating a node of a state machine replication system might look as follows:

    // Example Node instantiation
    node, err := mir.NewNode(
        /* some more technical arguments ... */
        &modules.Modules{
            // ...
            "app":      NewChatApp(),
            "protocol": TOBProtocol,
            "net":      grpcNetworking,
            "crypto":   ecdsaCrypto,
        },
        eventInterceptor,
        writeAheadLog,
    )

Example Mir Node

Here the consumer provides modules for networking (sending and receiving messages over the network), the protocol logic (using some total-order broadcast protocol), the application (implementing the logic of the replicated app), and cryptography (producing and verifying digital signatures using ECDSA). The eventInterceptor records the events passed between the modules for analysis and debugging purposes. The writeAheadLog is a special module that enables the node to recover from crashes. For more details, see the Documentation.

The programmer working with Mir is free to provide their own implementations of these modules, but Mir already comes bundled with several module implementations out of the box.

Relation to the Mir-BFT protocol

Mir-BFT is a scalable atomic broadcast protocol. The Mir framework initially started as an implementation of that protocol - thus the related name - but has since been made completely independent of Mir-BFT. Even the implementation of the Mir-BFT protocol itself has been abandoned and replaced by its successor, ISS, which is intended to be the first protocol implemented within Mir. However, since Mir is designed to be modular and versatile, ISS is just one (the first) of the protocols implemented in Mir.

Current Status

This library is in development. This document describes what the library should become rather than what it currently is. This document itself is more than likely to still change. You are more than welcome to contribute to accelerating the development of the Mir library as an open-source contributor. Have a look at the Contributions section if you want to help out!

Compiling and running tests

Assuming Go version 1.18 or higher is installed, the tests can be run by executing go test ./... in the root folder of the repository. The dependencies should be downloaded and installed automatically. Some of the dependencies may require gcc to be installed. On Ubuntu Linux, it can be installed by running sudo apt install gcc.

If the sources have been updated, it is possible that some of the generated source files need to be updated as well. More specifically, the Mir library relies on Protocol Buffers and gomock. The protoc compiler and the corresponding Go plugin need to be installed as well as mockgen. On Ubuntu Linux, those can be installed using

sudo snap install --classic protobuf
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install github.com/golang/mock/mockgen@<version>

Make sure (by configuring your shell interpreter) that the directory with Go binaries (usually ~/go/bin by default) is included in the PATH environment variable. On a default Ubuntu Linux system, for example, this can be achieved by running

echo 'PATH=$PATH:~/go/bin' >> ~/.profile

and restarting the terminal.

Once the dependencies are ready, the generated sources can be updated by executing go generate ./... in the root directory.

Documentation

For a description of the design and inner workings of the library, see Mir Library Overview. We also keep a log of Architecture Decision Records (ADRs).

For a small demo application, see /samples/chat-demo

For an automated deployment of Mir on a set of remote machines, see the remote deployment instructions.

Getting started

To get started using (and contributing to) Mir, in addition to this README, we recommend the following:

  1. Watch the first introductory video
  2. Read the Mir Library Overview
  3. Watch the second introductory video. (Very low-level coding, this is not how Mir coding works in real life - it is for developers to understand how Mir internally works. Realistic coding videos will follow soon.)
  4. Check out the chat-demo sample application to learn how to use Mir for state machine replication.
  5. To see an example of using a DSL module (which allows writing pseudocode-like code for the protocol logic), look at the implementation of Byzantine Consistent Broadcast (BCB) being used in the corresponding sample application. The original pseudocode can also be found in these lecture notes (Algorithm 4 (Echo broadcast [Rei94])).
  6. A more complex example of DSL code is the implementation of the SMR availability layer (concretely the multisigcollector).

To learn about the first complex system being built on Mir, have a look at Trantor, a complex SMR system being implemented using Mir.

Contributing

Contributions are more than welcome!

If you want to contribute, have a look at the open issues. If you have any questions (specific or general), do not hesitate to drop an email to the active maintainer(s) or write a message to the developers in the public Slack channel #mir-dev of the Filecoin project.

Active maintainer(s)

License

The Mir library source code is made available under the Apache License, version 2.0 (Apache-2.0), located in the LICENSE file.

Acknowledgments

This project is a continuation of the development started under the name mirbft as a Hyperledger Lab.

mir's People

Contributors

abread, adlrocha, codingjzy, dependabot[bot], dnkolegov, jsoares, matejpavlovic, ranchalp, sergefdrv, xosmig

mir's Issues

Protocol Implementation Guide

Write a document describing how a Protocol module should be implemented.
Ideally, it could even include a tutorial walking a programmer new to Mir through the necessary steps of implementing their first distributed protocol.

Test strategy for Mir

Prepare a test strategy for the Mir framework. The test strategy should outline the testing objectives and the approach to ensuring adequate stability of the Mir framework and the ISS implementation within it, so that it can eventually be used as a reliable consensus mechanism in sub-chains of Filecoin with hierarchical consensus.

The document: https://hackmd.io/EAak8WMCQBaHAN8i94IyBw.

Eudico's State Manager

  • Mempool integration (0.5 weeks)
  • Block assembler (0.5 week)
  • (Re)Configuration
    • Format of configuration data and config transaction flow (1 week)
    • Reconfiguration
      • Instantiating components with new configuration (1 week)
      • Adding / removing nodes (2 weeks)
      • Garbage-collection of old components (1 week)
    • State transfer (1 week)
    • Testing / debugging / buffer (1 week)

Estimated duration: 8 weeks

Mock modules

Standard mocking utils like gomock do not really work well with Mir because they only allow setting expectations on method calls while we need to set expectations on the received events and, instead of specifying return values, we should be able to specify output events.
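One possible shape for such an event-based mock is sketched below. It is an illustration only: `Event`, `MockModule`, `Expect`, and `Done` are hypothetical names, not part of Mir or gomock. Expectations are set on received events, and each expectation carries the output events to return instead of return values.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a hypothetical, comparable event type used only for this sketch.
type Event struct {
	Type    string
	Payload string
}

// expectation pairs an expected input event with the events to output.
type expectation struct {
	in  Event
	out []Event
}

// MockModule checks incoming events against an ordered queue of expectations.
type MockModule struct {
	expected []expectation
}

// Expect registers the next expected event and the output events it triggers.
func (m *MockModule) Expect(in Event, out ...Event) {
	m.expected = append(m.expected, expectation{in: in, out: out})
}

// ApplyEvent returns the configured output if ev is the next expected event,
// and an error otherwise. A mismatching event does not consume the expectation.
func (m *MockModule) ApplyEvent(ev Event) ([]Event, error) {
	if len(m.expected) == 0 {
		return nil, fmt.Errorf("unexpected event %+v", ev)
	}
	next := m.expected[0]
	if next.in != ev {
		return nil, fmt.Errorf("got event %+v, want %+v", ev, next.in)
	}
	m.expected = m.expected[1:]
	return next.out, nil
}

// Done reports an error if some expected events never arrived.
func (m *MockModule) Done() error {
	if len(m.expected) > 0 {
		return errors.New("not all expected events were received")
	}
	return nil
}

func main() {
	m := &MockModule{}
	m.Expect(
		Event{Type: "MessageReceived", Payload: "ping"},
		Event{Type: "SendMessage", Payload: "pong"},
	)
	out, err := m.ApplyEvent(Event{Type: "MessageReceived", Payload: "ping"})
	fmt.Println(out, err, m.Done())
}
```

A strictly ordered queue is the simplest choice here; a real mock might also need unordered or repeated expectations.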

Improve input validation in splitAddrPort

Here and here parsing and input validation are not accurate.
At the very least, it does not handle the case where splitAddrPort(gt.membership[gt.ownId]) is called and gt.ownId is not in the membership.

We should check whether gt.ownId is in the map, or return different error messages.
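A minimal sketch of the stricter validation follows. It assumes the membership is a map from node ID to an address of the form host:port; the function name `ownAddrPort` and the string key type are assumptions made for this example, not the actual Mir code. Each failure case gets its own error message.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// ownAddrPort looks up the node's own address and splits it into host and port.
// It distinguishes a missing membership entry from a malformed address.
func ownAddrPort(membership map[string]string, ownID string) (string, int, error) {
	addr, ok := membership[ownID]
	if !ok {
		return "", 0, fmt.Errorf("own ID %q is not in the membership", ownID)
	}
	host, portStr, err := net.SplitHostPort(addr)
	if err != nil {
		return "", 0, fmt.Errorf("malformed address %q: %w", addr, err)
	}
	port, err := strconv.Atoi(portStr)
	if err != nil || port < 1 || port > 65535 {
		return "", 0, fmt.Errorf("invalid port %q in address %q", portStr, addr)
	}
	return host, port, nil
}

func main() {
	membership := map[string]string{"0": "127.0.0.1:20000"}
	fmt.Println(ownAddrPort(membership, "0"))
	fmt.Println(ownAddrPort(membership, "7"))
}
```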

General approach for protocol message retransmission

Currently, any protocol implementation in Mir needs to manage periodic retransmission of its messages based on an abstract notion of time.
Since the need for periodically retransmitting certain protocol messages occurs at multiple places and in various protocols, it would be useful to create an abstraction for periodic message retransmission that can easily be reused.

Potential challenges:

  • Finding a compromise between encapsulating in a separate abstraction and giving the protocol enough control
  • Stopping retransmission when no longer needed
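One way such an abstraction could look is sketched below, driven by the same abstract ticks the protocol already sees. All names (`Retransmitter`, `Start`, `Stop`, `Tick`) are hypothetical. The protocol keeps control by choosing message IDs and calling `Stop` once retransmission is no longer needed.

```go
package main

import "fmt"

// Retransmitter periodically re-sends registered messages, measured in ticks.
type Retransmitter struct {
	period  int            // ticks between retransmissions
	pending map[string]int // message ID -> ticks since last (re)transmission
	send    func(id string)
}

// NewRetransmitter creates a retransmitter that calls send every period ticks.
func NewRetransmitter(period int, send func(id string)) *Retransmitter {
	return &Retransmitter{period: period, pending: make(map[string]int), send: send}
}

// Start registers a message for periodic retransmission.
// The initial transmission is assumed to be done by the protocol itself.
func (r *Retransmitter) Start(id string) { r.pending[id] = 0 }

// Stop cancels retransmission of a message, e.g. once it is acknowledged.
func (r *Retransmitter) Stop(id string) { delete(r.pending, id) }

// Tick advances the abstract time by one step and re-sends all due messages.
// Map iteration order is unspecified, so due messages are sent in no particular order.
func (r *Retransmitter) Tick() {
	for id := range r.pending {
		r.pending[id]++
		if r.pending[id] >= r.period {
			r.pending[id] = 0
			r.send(id)
		}
	}
}

func main() {
	r := NewRetransmitter(2, func(id string) { fmt.Println("retransmitting", id) })
	r.Start("preprepare-5")
	for i := 0; i < 4; i++ {
		r.Tick()
	}
	r.Stop("preprepare-5")
	r.Tick()
	r.Tick()
}
```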

Running Eudico with Mir triggers a panic: type assertion error

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x30 pc=0x105a4a7c4]

goroutine 15 [running]:
github.com/filecoin-project/mir.(*workItems).AddEvents(0x140020af1c0, 0x140001ee120?)
	/Users/alpha/go/pkg/mod/github.com/filecoin-project/[email protected]/workitems.go:86 +0x154
github.com/filecoin-project/mir.(*Node).processWAL(0x14000e6e400)
	/Users/alpha/go/pkg/mod/github.com/filecoin-project/[email protected]/node.go:214 +0xe0
github.com/filecoin-project/mir.(*Node).Run(0x14000e6e400, {0x107fc4348, 0x14000dc0880}, 0x140010f62a0?)
	/Users/alpha/go/pkg/mod/github.com/filecoin-project/[email protected]/node.go:182 +0x3c
github.com/filecoin-project/lotus/chain/consensus/mir.(*MirAgent).Start.func2()
	/Users/alpha/Projects/eudico/chain/consensus/mir/mir_agent.go:143 +0x6c
created by github.com/filecoin-project/lotus/chain/consensus/mir.(*MirAgent).Start
	/Users/alpha/Projects/eudico/chain/consensus/mir/mir_agent.go:142 +0x1d8

Line 86 performs a type assertion without checking whether the assertion holds.

If the type assertion fails, a run-time panic occurs.

The result of a type assertion is not checked systematically. I believe we MUST enforce this check everywhere in Mir.
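For reference, the checked ("comma, ok") form of a Go type assertion looks as follows. The `walEntry` type and the `asWALEntry` helper are hypothetical stand-ins, not the code at workitems.go:86. Since the trace above reports a nil pointer dereference, the sketch also rejects a typed nil pointer, which passes the assertion but cannot be dereferenced safely.

```go
package main

import (
	"errors"
	"fmt"
)

// walEntry is a hypothetical concrete event type used only for this sketch.
type walEntry struct{ seq int }

// asWALEntry converts ev to *walEntry, returning an error instead of panicking.
func asWALEntry(ev interface{}) (*walEntry, error) {
	entry, ok := ev.(*walEntry) // two-value form: never panics
	if !ok {
		return nil, fmt.Errorf("unexpected event type %T", ev)
	}
	if entry == nil {
		return nil, errors.New("event is a nil *walEntry")
	}
	return entry, nil
}

func main() {
	fmt.Println(asWALEntry(&walEntry{seq: 1}))
	fmt.Println(asWALEntry("not an entry"))
	fmt.Println(asWALEntry((*walEntry)(nil)))
}
```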

Mircat debugging tool

Implement a debugging tool that prints the events that occurred at a mirbft node.

Those events are usually gathered by an event interceptor while the node runs.
Mircat prints a human-readable representation of the events in the order in which they have been intercepted.

As a base for the implementation, the legacy version of mircat can be used and adapted to the new event format.

See also the description of developing Mircat as a stand-alone project.

Race detected during execution of the initial version tests

Steps to reproduce:

  1. git reset 3ba8221c2b44624157cb386a9552be90ea3eace1
  2. go test -race
  3. You will see something like this:
•Creating temp dir: /var/folders/p_/rj3w31dn05942p50hxzy03fc0000gn/T/mir-deployment-test.2218592999
Node 0: running
Connecting to node: 127.0.0.1:20000
Node 0: ReqRec: Listening for request connections on port 20000
Node 0: ISS: PBFT: Proposing. epoch=0 instance=0 sn=0 batchSize=0
Node 0: ISS: Delivering entry. sn=0 nReq=0
Node 0: ISS: PBFT: Proposing. epoch=0 instance=0 sn=1 batchSize=0
Node 0: ISS: Delivering entry. sn=1 nReq=0
Node 0: ISS: PBFT: Proposing. epoch=0 instance=0 sn=2 batchSize=0
Node 0: ISS: Delivering entry. sn=2 nReq=0
Node 0: ISS: PBFT: Proposing. epoch=0 instance=0 sn=3 batchSize=0
Node 0: ISS: Delivering entry. sn=3 nReq=0
Node 0: ISS: PBFT: Proposing. epoch=0 instance=0 sn=4 batchSize=0
Node 0: ISS: Delivering entry. sn=4 nReq=0
Node 0: ISS: PBFT: Proposing. epoch=0 instance=0 sn=5 batchSize=0
Node 0: ISS: Delivering entry. sn=5 nReq=0
Node 0: ISS: PBFT: Proposing. epoch=0 instance=0 sn=6 batchSize=0
Node 0: ISS: Delivering entry. sn=6 nReq=0
Node 0: ISS: PBFT: Proposing. epoch=0 instance=0 sn=7 batchSize=0
Node 0: ISS: Delivering entry. sn=7 nReq=0
Node 0: ISS: PBFT: Proposing. epoch=0 instance=0 sn=8 batchSize=0
Node 0: ISS: Delivering entry. sn=8 nReq=0
Node 0: ISS: PBFT: Proposing. epoch=0 instance=0 sn=9 batchSize=0
Node %!d(types.NodeID=0) (127.0.0.1:20000) connected.
Node 0: ReqRec: Incoming connection from 127.0.0.1:51983
Node 0: ReqRec: Received request clId=1 reqNo=0 authLen=71
Node 0: ReqRec: Received request clId=1 reqNo=1 authLen=72
Node 0: ReqRec: Received request clId=1 reqNo=2 authLen=71
Node 0: ReqRec: Received request clId=1 reqNo=3 authLen=71
Node 0: ReqRec: Received request clId=1 reqNo=4 authLen=70
Node 0: ReqRec: Received request clId=1 reqNo=5 authLen=71
Node 0: ReqRec: Received request clId=1 reqNo=6 authLen=69
Node 0: ReqRec: Received request clId=1 reqNo=7 authLen=71
Node 0: ReqRec: Received request clId=1 reqNo=8 authLen=72
Node 0: ReqRec: Received request clId=1 reqNo=9 authLen=72
Node 0: ISS: Delivering entry. sn=9 nReq=0
Node 0: ISS: PBFT: Proposing. epoch=1 instance=1 sn=10 batchSize=4
Node 0: ISS: PBFT: Proposing. epoch=1 instance=1 sn=11 batchSize=4
Node 0: ISS: New stable checkpoint. epoch=1 sn=10 replacingEpoch=0 replacingSn=0
Node 0: ISS: PBFT: Proposing. epoch=1 instance=1 sn=12 batchSize=2
Node 0: ISS: Delivering entry. sn=10 nReq=4
Processed requests: 1
Processed requests: 2
Processed requests: 3
Processed requests: 4
Node 0: ISS: Delivering entry. sn=11 nReq=4
Processed requests: 5
Processed requests: 6
Processed requests: 7
Processed requests: 8
Node 0: ISS: Delivering entry. sn=12 nReq=2
Processed requests: 9
Processed requests: 10
Node 0: ISS: PBFT: Proposing. epoch=1 instance=1 sn=13 batchSize=0
Node 0: ISS: Delivering entry. sn=13 nReq=0
Node 0: ISS: PBFT: Proposing. epoch=1 instance=1 sn=14 batchSize=0
Node 0: ISS: Delivering entry. sn=14 nReq=0
Node 0: ISS: PBFT: Proposing. epoch=1 instance=1 sn=15 batchSize=0
Node 0: ISS: Delivering entry. sn=15 nReq=0
Node 0: ISS: PBFT: Proposing. epoch=1 instance=1 sn=16 batchSize=0
Node 0: ISS: Delivering entry. sn=16 nReq=0
Node 0: ISS: PBFT: Proposing. epoch=1 instance=1 sn=17 batchSize=0
Node 0: ISS: Delivering entry. sn=17 nReq=0
Node 0: ISS: PBFT: Proposing. epoch=1 instance=1 sn=18 batchSize=0
Node 0: ISS: Delivering entry. sn=18 nReq=0
Node 0: ISS: PBFT: Proposing. epoch=1 instance=1 sn=19 batchSize=0
Node 0: ISS: Delivering entry. sn=19 nReq=0
Node 0: ISS: New stable checkpoint. epoch=2 sn=20 replacingEpoch=1 replacingSn=10
==================
WARNING: DATA RACE
Write at 0x00c000560cd0 by goroutine 7:
  google.golang.org/grpc.(*clientStream).CloseSend()
      /Users/alpha/go/pkg/mod/google.golang.org/[email protected]/stream.go:857 +0x4c
  github.com/filecoin-project/mir/pkg/requestreceiver.(*requestReceiverListenClient).CloseAndRecv()
      /Users/alpha/Projects/mir/pkg/requestreceiver/requestreceiver_grpc.pb.go:61 +0x44
  github.com/filecoin-project/mir/pkg/dummyclient.(*DummyClient).Disconnect()
      /Users/alpha/Projects/mir/pkg/dummyclient/dummyclient.go:158 +0xbc
  github.com/filecoin-project/mir/pkg/deploytest.(*Deployment).Run()
      /Users/alpha/Projects/mir/pkg/deploytest/deployment.go:213 +0x574
  github.com/filecoin-project/mir_test.glob..func1.2()
      /Users/alpha/Projects/mir/mirbft_test.go:87 +0x360
  runtime.call16()
      /opt/homebrew/Cellar/go/1.18/libexec/src/runtime/asm_arm64.s:507 +0x78
  reflect.Value.Call()
      /opt/homebrew/Cellar/go/1.18/libexec/src/reflect/value.go:339 +0x94
  github.com/onsi/ginkgo/extensions/table.TableEntry.generateIt.func1()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/extensions/table/table_entry.go:40 +0x6c
Run returned!
  github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:113 +0xb0
  github.com/onsi/ginkgo/internal/leafnodes.(*runner).run()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:64 +0xf8
  github.com/onsi/ginkgo/internal/leafnodes.(*ItNode).Run()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/it_node.go:26 +0x6c
  github.com/onsi/ginkgo/internal/spec.(*Spec).runSample()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/internal/spec/spec.go:215 +0x248
  github.com/onsi/ginkgo/internal/spec.(*Spec).Run()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/internal/spec/spec.go:138 +0x148
  github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:200 +0x138
  github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:170 +0x1f8
  github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:66 +0x120
  github.com/onsi/ginkgo/internal/suite.(*Suite).Run()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/internal/suite/suite.go:62 +0x514
Node 0: ReqRec: Stopping request receiver.  github.com/onsi/ginkgo.RunSpecsWithCustomReporters()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/ginkgo_dsl.go:221 +0x170
  github.com/onsi/ginkgo.RunSpecs()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/ginkgo_dsl.go:202 +0x60

  github.com/filecoin-project/mir_test.TestMir()
      /Users/alpha/Projects/mir/mirbft_suite_test.go:66 +0x1f0
  testing.tRunner()
      /opt/homebrew/Cellar/go/1.18/libexec/src/testing/testing.go:1439 +0x18c
  testing.(*T).Run.func1()
      /opt/homebrew/Cellar/go/1.18/libexec/src/testing/testing.go:1486 +0x44

Previous read at 0x00c000560cd0 by goroutine 140:
  google.golang.org/grpc.(*clientStream).SendMsg()
      /Users/alpha/go/pkg/mod/google.golang.org/[email protected]/stream.go:776 +0xd8
  github.com/filecoin-project/mir/pkg/requestreceiver.(*requestReceiverListenClient).Send()
      /Users/alpha/Projects/mir/pkg/requestreceiver/requestreceiver_grpc.pb.go:57 +0x58
  github.com/filecoin-project/mir/pkg/dummyclient.(*DummyClient).SubmitRequest()
      /Users/alpha/Projects/mir/pkg/dummyclient/dummyclient.go:131 +0x5d0
  github.com/filecoin-project/mir/pkg/deploytest.submitDummyRequests()
      /Users/alpha/Projects/mir/pkg/deploytest/deployment.go:262 +0xa0
  github.com/filecoin-project/mir/pkg/deploytest.(*Deployment).Run.func2()
      /Users/alpha/Projects/mir/pkg/deploytest/deployment.go:203 +0x64
  github.com/filecoin-project/mir/pkg/deploytest.(*Deployment).Run.func5()
      /Users/alpha/Projects/mir/pkg/deploytest/deployment.go:204 +0x48

Goroutine 7 (running) created at:
  testing.(*T).Run()
      /opt/homebrew/Cellar/go/1.18/libexec/src/testing/testing.go:1486 +0x560
  testing.runTests.func1()
      /opt/homebrew/Cellar/go/1.18/libexec/src/testing/testing.go:1839 +0x94
  testing.tRunner()
      /opt/homebrew/Cellar/go/1.18/libexec/src/testing/testing.go:1439 +0x18c
  testing.runTests()
      /opt/homebrew/Cellar/go/1.18/libexec/src/testing/testing.go:1837 +0x6c8
  testing.(*M).Run()
      /opt/homebrew/Cellar/go/1.18/libexec/src/testing/testing.go:1719 +0x878
  main.main()
      _testmain.go:49 +0x2fc

Goroutine 140 (finished) created at:
  github.com/filecoin-project/mir/pkg/deploytest.(*Deployment).Run()
      /Users/alpha/Projects/mir/pkg/deploytest/deployment.go:201 +0x3c4
  github.com/filecoin-project/mir_test.glob..func1.2()
      /Users/alpha/Projects/mir/mirbft_test.go:87 +0x360
  runtime.call16()
      /opt/homebrew/Cellar/go/1.18/libexec/src/runtime/asm_arm64.s:507 +0x78
  reflect.Value.Call()
      /opt/homebrew/Cellar/go/1.18/libexec/src/reflect/value.go:339 +0x94
  github.com/onsi/ginkgo/extensions/table.TableEntry.generateIt.func1()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/extensions/table/table_entry.go:40 +0x6c
  github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:113 +0xb0
  github.com/onsi/ginkgo/internal/leafnodes.(*runner).run()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:64 +0xf8
  github.com/onsi/ginkgo/internal/leafnodes.(*ItNode).Run()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/it_node.go:26 +0x6c
  github.com/onsi/ginkgo/internal/spec.(*Spec).runSample()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/internal/spec/spec.go:215 +0x248
  github.com/onsi/ginkgo/internal/spec.(*Spec).Run()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/internal/spec/spec.go:138 +0x148
  github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:200 +0x138
  github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:170 +0x1f8
  github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:66 +0x120
  github.com/onsi/ginkgo/internal/suite.(*Suite).Run()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/internal/suite/suite.go:62 +0x514
  github.com/onsi/ginkgo.RunSpecsWithCustomReporters()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/ginkgo_dsl.go:221 +0x170
  github.com/onsi/ginkgo.RunSpecs()
      /Users/alpha/go/pkg/mod/github.com/onsi/[email protected]/ginkgo_dsl.go:202 +0x60
  github.com/filecoin-project/mir_test.TestMir()
      /Users/alpha/Projects/mir/mirbft_suite_test.go:66 +0x1f0
  testing.tRunner()
      /opt/homebrew/Cellar/go/1.18/libexec/src/testing/testing.go:1439 +0x18c
  testing.(*T).Run.func1()
      /opt/homebrew/Cellar/go/1.18/libexec/src/testing/testing.go:1486 +0x44
==================
Node 0: ReqRec: Connection terminated: 127.0.0.1:51983 (EOF)
Node 0: ReqRec: Request receiver stopped.
Could not close connection to node %!d(types.NodeID=0)
Intercepted events written to event log: 420
Node 0: exit with exitErr=stopped at caller request
All go routines shut down
Test finished.

--- FAIL: TestMir (22.75s)
    testing.go:1312: race detected during execution of test
FAIL
exit status 1
FAIL	github.com/filecoin-project/mir	23.507s
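The report shows CloseSend() and SendMsg() touching the same gRPC client stream from two goroutines without synchronization, which gRPC documents as unsafe. One plausible remedy, sketched here with a hypothetical stand-in `stream` type rather than the real gRPC or dummyclient code, is to wait for the sending goroutine to finish before closing the stream.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// stream is a hypothetical stand-in for a client stream that, like a gRPC
// stream, must not see Send and CloseSend called concurrently.
type stream struct {
	sent   int
	closed bool
}

func (s *stream) Send() error {
	if s.closed {
		return errors.New("send on closed stream")
	}
	s.sent++
	return nil
}

func (s *stream) CloseSend() { s.closed = true }

// submitAndClose sends n requests in a separate goroutine and closes the
// stream only after that goroutine has finished.
func submitAndClose(s *stream, n int) error {
	var wg sync.WaitGroup
	var sendErr error
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			if err := s.Send(); err != nil {
				sendErr = err
				return
			}
		}
	}()
	// Wait establishes a happens-before edge between all Send calls
	// (and the write to sendErr) and the CloseSend call below.
	wg.Wait()
	s.CloseSend()
	return sendErr
}

func main() {
	s := &stream{}
	err := submitAndClose(s, 10)
	fmt.Println(s.sent, s.closed, err)
}
```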

Serializing custom types

Each custom type (like NodeID or SeqNr) in the types package could have, in addition to the Pb() method, a Bytes() method that returns the value deterministically serialized into a slice of bytes.
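A minimal sketch of what such methods might look like is given below. The underlying types are assumptions made for the example (`SeqNr` as a 64-bit unsigned integer, `NodeID` as a string) and may differ from the definitions in the types package. A fixed-width big-endian encoding keeps the result independent of the platform.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// SeqNr is assumed here to be a 64-bit unsigned sequence number.
type SeqNr uint64

// Bytes returns the sequence number as 8 big-endian bytes.
func (sn SeqNr) Bytes() []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, uint64(sn))
	return b
}

// NodeID is assumed here to be a string identifier.
type NodeID string

// Bytes returns the raw bytes of the identifier.
func (id NodeID) Bytes() []byte {
	return []byte(id)
}

func main() {
	fmt.Println(SeqNr(258).Bytes())
	fmt.Println(NodeID("0").Bytes())
}
```

When several such values are concatenated into one buffer, variable-length ones like the string-based `NodeID` would additionally need a length prefix to stay unambiguous.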

Unsafe error handling

Simple WAL

The current implementation of the write-ahead log

  1. Is not properly documented
  2. Uses dependencies with vulnerabilities (tidwall/gjson)

Fix this.

Event loop flow control

Implement a mechanism that monitors the state of event buffers in the Node's main event loop and, if the buffers exceed a configured threshold, stop reading events from ActiveModules.
For more details, see discussion #51.
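The idea can be illustrated with a small single-threaded sketch. The `EventBuffer` type and its methods are hypothetical, and the real event loop is concurrent; the point is only that input is read while the buffer is below the threshold and paused otherwise, while processing continues.

```go
package main

import "fmt"

// EventBuffer is a hypothetical stand-in for the node's pending-event buffer.
type EventBuffer struct {
	events    []string
	threshold int // stop accepting input once this many events are buffered
}

// Push appends an event to the buffer.
func (b *EventBuffer) Push(ev string) { b.events = append(b.events, ev) }

// Pop removes and returns the oldest event, if any.
func (b *EventBuffer) Pop() (string, bool) {
	if len(b.events) == 0 {
		return "", false
	}
	ev := b.events[0]
	b.events = b.events[1:]
	return ev, true
}

// AcceptingInput reports whether active modules may inject more events.
func (b *EventBuffer) AcceptingInput() bool { return len(b.events) < b.threshold }

func main() {
	buf := &EventBuffer{threshold: 2}
	inputs := []string{"req1", "req2", "req3"}
	for len(inputs) > 0 || len(buf.events) > 0 {
		// Read new input only while the buffer is below the threshold.
		for len(inputs) > 0 && buf.AcceptingInput() {
			buf.Push(inputs[0])
			inputs = inputs[1:]
		}
		// Processing continues regardless, which drains the buffer.
		if ev, ok := buf.Pop(); ok {
			fmt.Println("processing", ev)
		}
	}
}
```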

Remove availability-related code from ISS

Simplify the implementation of ISS to not use request retransmission (which is still not fully implemented) and instead include all requests directly in the proposed batch.
If used with an external availability layer (as is the intention, at least for now), the batches will be very small anyway and do not justify complicating the protocol by advanced availability mechanisms.

Timer module

Design and implement a Mir-native time module that would replace the explicit handling of Tick events by the protocol.

Make the request receiver an active module

The request receiver was written before the notion of active and passive modules was introduced.
It thus still uses the Node's event injection interface to feed events to the Node. Making it an active module would be more idiomatic.

Mempool compatible with Eudico

Replace the RequestStore abstraction (and potentially the ClientTracker as well) by a properly defined Mempool abstraction.

  • Define the new Mempool's interface (1 week)
    • Querying the mempool for new requests
    • Push-style notifications about new requests in the mempool
  • Create a Eudico-compatible implementation of the new mempool interface (2 weeks)
    • In particular, ensure per-client FIFO ordering of requests

Estimated duration: 3.5 weeks

Draft interface for integration with Eudico

The interface must specify exactly:

  • Guarantees the Eudico mempool has to provide regarding
    • Durability
    • Delivery guarantees
  • Interaction with Eudico's state
    • Applying blocks
    • semantics of snapshots
    • durability of applied blocks
  • Source of system configuration
  • Consensus protocol metadata
    • especially the "merit" assignment required for rewards

Expected duration: 2 weeks

Clean up / unify message and event format

Mir is currently using Protobufs for internal events and messages, while the libp2p transport layer is wrapping the Protobufs in CBOR. The gRPC based transport, however, is using Protobufs directly. Events and messages are also defined in a rather ad-hoc manner. This could use some cleaning up.

Finish consensus layer implementation

  • Message retransmission where necessary (0.5 weeks)
  • Instance-local checkpoints in PBFT (0.5 weeks)
  • Proposal distribution: include in PBFT Preprepare (0.5 weeks)
  • Secure crypto module (0.5 weeks)

Estimated time: 3 weeks

Preliminary performance evaluation

  • Implement (just for Mir):
    • Load generator (2 weeks)
    • Distributed deployment (2 weeks)
  • Evaluate performance (1 week)
  • gRPC vs libp2p performance comparison (optional)

Estimated time: (5 weeks)

TestIntegration/ISS/002 sporadically fails

TestIntegration/ISS/002 sporadically fails, not committing the expected number of requests:

--- FAIL: TestIntegration (56.62s)
    --- FAIL: TestIntegration/ISS (56.61s)
        mir_test.go:111: Created temp dir: /tmp/TestIntegrationISS962582436/001
        mir_test.go:111: Created temp dir: /tmp/TestIntegrationISS962582436/002
        mir_test.go:111: Using deployment dir: /tmp/mirbft-deployment-test
        --- FAIL: TestIntegration/ISS/002 (4.36s)
            mir_test.go:114: 
                	Error Trace:	mir_test.go:201
                	            				mir_test.go:114
                	Error:      	Not equal: 
                	            	expected: 10
                	            	actual  : 0
                	Test:       	TestIntegration/ISS/002
            mir_test.go:114: Test failed. Saving deployment data to: failed-test-data/mirbft-deployment-test
            mir_test.go:117: Test #002 (Submit 10 fake requests with 1 node) failed
--- FAIL: TestIntegration (54.86s)
    --- FAIL: TestIntegration/ISS (54.86s)
        mir_test.go:111: Created temp dir: /tmp/TestIntegrationISS3387325571/001
        mir_test.go:111: Created temp dir: /tmp/TestIntegrationISS3387325571/002
        mir_test.go:111: Using deployment dir: /tmp/mirbft-deployment-test
        --- FAIL: TestIntegration/ISS/002 (4.01s)
            mir_test.go:114: 
                	Error Trace:	mir_test.go:201
                	            				mir_test.go:114
                	Error:      	Not equal: 
                	            	expected: 10
                	            	actual  : 8
                	Test:       	TestIntegration/ISS/002

Implement integration testing infrastructure

The testing infrastructure should support running multiple Mir nodes in a controlled simulation of the environment that the tested components interact with, e.g. communication and local clocks. As described in #72, it should model non-determinism and allow reproducing and analyzing the test results. It should also support measuring memory consumption.

Use make for building

Standardize Makefile targets for

  • testing
  • linting
  • building
  • cleaning
  • generating code

ISS: loosely synchronize agreement with execution

Prevent ISS from advancing to a new epoch until a reasonably recent checkpoint has been established.
Otherwise, if the application execution is too slow, the agreement will "run away", agreeing on new batches faster than the application can execute them, having to remember more and more unexecuted batches.

For example, before starting the next epoch, make sure that the starting checkpoint of the current epoch already has been established (maybe even stabilized). This creates a "buffer" of one epoch for the checkpoint to be created. If the checkpoint is still not produced by the end of the epoch, advancing to the next epoch will need to wait.
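The rule from the example above can be sketched as a small gate. The `epochGate` type is hypothetical and not part of the ISS implementation. It assumes that the starting checkpoint of epoch 0 is the initial state and therefore always exists.

```go
package main

import "fmt"

// epochGate decides whether agreement may move on to the next epoch.
type epochGate struct {
	currentEpoch      uint64
	checkpointedEpoch uint64 // highest epoch whose starting checkpoint exists
}

// CheckpointEstablished records that the starting checkpoint of epoch exists.
func (g *epochGate) CheckpointEstablished(epoch uint64) {
	if epoch > g.checkpointedEpoch {
		g.checkpointedEpoch = epoch
	}
}

// TryAdvance moves to the next epoch only if the starting checkpoint of the
// current epoch has been established. It reports whether it advanced.
func (g *epochGate) TryAdvance() bool {
	if g.checkpointedEpoch < g.currentEpoch {
		return false // agreement must wait for execution to catch up
	}
	g.currentEpoch++
	return true
}

func main() {
	g := &epochGate{}
	fmt.Println(g.TryAdvance()) // epoch 0 -> 1
	fmt.Println(g.TryAdvance()) // blocked until checkpoint 1 exists
	g.CheckpointEstablished(1)
	fmt.Println(g.TryAdvance()) // epoch 1 -> 2
}
```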

Deterministic serialization of chat demo app state

When taking a snapshot of the application state, the serialization must be performed deterministically,
such that all nodes can agree on a snapshot. Currently, a protobuf message is used to serialize the chat demo app state.
The protocol buffer implementation, however, does not guarantee deterministic serialization.

While in practice (in the concrete case of the chat demo app) the state snapshot is always the same, technically this is not guaranteed.

Fix this by providing a deterministic implementation of the Snapshot() method of the chat demo app logic:
https://github.com/filecoin-project/mir/blob/946689a112983e6384d6c697c01de00d0f433d14/samples/chat-demo/app.go#L75
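One straightforward option is a hand-written, length-prefixed encoding, sketched below. The `chatState` type with a slice of messages is an assumption about the shape of the app state, not the actual code in app.go. The same sequence of messages always produces the same bytes.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// chatState is a hypothetical stand-in for the chat demo app state.
type chatState struct {
	messages []string
}

// Snapshot serializes the state deterministically: the number of messages,
// followed by each message as an 8-byte big-endian length and its bytes.
func (s *chatState) Snapshot() []byte {
	var buf bytes.Buffer
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], uint64(len(s.messages)))
	buf.Write(n[:])
	for _, m := range s.messages {
		binary.BigEndian.PutUint64(n[:], uint64(len(m)))
		buf.Write(n[:])
		buf.WriteString(m)
	}
	return buf.Bytes()
}

func main() {
	a := &chatState{messages: []string{"hello", "world"}}
	b := &chatState{messages: []string{"hello", "world"}}
	fmt.Println(bytes.Equal(a.Snapshot(), b.Snapshot()))
}
```

The length prefixes keep the encoding unambiguous, so states such as ["ab", "c"] and ["a", "bc"] produce different snapshots.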
