
typhon's Introduction

Typhon

Typhon stores, orders, and executes transactions on Anoma blockchains. For more information, please see the spec.

typhon's People

Contributors

cwgoes, gavinly, heindel, karbyshev


typhon's Issues

Multi-State Machine Execution on Heterogeneous Paxos Chain

On a Heterogeneous Paxos chain, run a state machine for each "main chain," such that blocks can contain bundles of transactions on multiple chains.
Note that:

  • Not all nodes can / should execute all transactions
  • We will ultimately need some kind of language for communicating these "bundles"
  • We want to make as few assumptions as possible about the state machines from each "main chain."

Message Passing for Heterogeneous Paxos

This issue may well be combined with our existing ledger projects.

Ultimately, our Heterogeneous Paxos implementation will need to send messages (1b, 2a, etc).

  • How do we format these?
  • How do we disseminate them?
    • While Gossip is efficient in terms of per-node bandwidth and connections, just sending UDP packets to all destinations might be faster, although we'd have to be clever about filling in messages that some recipients missed.
  • What, if any, signature aggregation do we use?
    • Trees of signature aggregation have in the past been fruitfully combined with gossip trees.

Message Pipelining

Pipelining, at least as used in HotStuff, is the art of using a single message as part of multiple instances of consensus, to save on message space.

With the naive message design in the tech report, the main content of each message is a list of references to prior messages. This makes it very easy for a message to serve as part of multiple consensus instances. We do have to watch out for a couple of points:

  • Since a recipient may not care about the same set of consensuses as the sender, the recipient may not want to wait to receive all of the referenced messages before processing this one. We may have to flag each reference with relevant information.
  • A sender will now send messages that are "well-formed" on at least one consensus, but not necessarily on all (they aren't really 2A or 1B or whatever), so we must not throw those away.
  • If we introduce more elaborate signature aggregation, we need to not mess up pipelining.
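As a rough illustration of the flagging idea above, each reference could be tagged with the consensus instances for which it is relevant, so a recipient only waits for the references that matter to instances it actually follows. The `Ref`/`Message` names and the tagging scheme are assumptions for illustration, not the actual message format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ref:
    """Reference to a prior message, tagged with the consensus
    instances for which it is relevant (hypothetical design)."""
    msg_id: str
    instances: frozenset

@dataclass
class Message:
    msg_id: str
    refs: tuple  # tuple of Ref

def refs_needed(msg, followed_instances):
    """Return only the referenced message ids a recipient must fetch,
    given the consensus instances it follows."""
    return {r.msg_id for r in msg.refs
            if r.instances & followed_instances}

# One message participating in two consensus instances, A and B:
m = Message("m3", (Ref("m1", frozenset({"A"})),
                   Ref("m2", frozenset({"A", "B"}))))
```

A recipient following only instance B would then fetch `m2` but skip `m1`, which is the per-recipient filtering the first bullet point asks for.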

Optimise the message passing

HPaxos does not use optimised message passing. We should look into optimisations such as aggregating signatures, re-using messages, etc.

BFTree was suggested as the first thing to look into for signature aggregation.

The pipelining issue #18 is also part of this issue.

Permissions for Moving Objects

We have to consider who is allowed to move which objects where. One way to do this is to have "move" simply be a "private method" of any object: the programmer has to specifically program in any methods that the public (all acceptors of the chain this object is located at?) can call to move the object. This could get messy fast.

Another possibility is to have each object specify its required safety and liveness properties, and then check with each move that these are preserved. This too could get messy. We may end up doing Information Flow Control.

Heterogeneous Narwhal English Spec

Task description

We need to formalize our version of Heterogeneous Narwhal in English, in such a way that at least other Anoma developers can tell what is supposed to happen.

A version exists on specs.anoma.net, but it's extremely technical and may not reflect the most recent design choices.

Sub-tasks

Definition of Done

  • We have a human-readable document of the mempool layer architecture on specs.anoma.net

New Proposal Timeouts

As presently designed, Heterogeneous Paxos needs time-outs before proposals are re-sent (to ensure liveness). This is equivalent to time-outs for choosing new leaders in Tendermint. How do we want to choose time-outs?

In "always-committing" chains, we can start exponentially growing time windows at the time of the last commit, and then assume they will eventually be big enough. Unfortunately, in "lazy" chains that only append blocks when there is demand, this means that if there is not demand for a while, there will be HUGE time windows by the time there is a proposal. If a round (or leader) fails, we then have to wait a long time to re-try.

Alternatively, constant time windows have a different problem: if the window is too short, we'll be timing out and re-proposing too fast to get anything done, and if it's too long, we run unnecessarily slowly.

Solutions include:

  • do the whole thing with shared randomness and probabilistic termination (not sure how to do this)
  • figure out some more clever clock sync setup that can reset time windows
  • reach some kind of consensus even when there is no demand to reset time windows
    • hopefully this work can be shared across multiple chains, or it will be unnecessarily burdensome.
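For comparison, the "exponentially growing window" approach can be sketched as a backoff that resets after each commit and is capped; the cap is an assumption added here for illustration (to bound the huge-window problem on lazy chains), not part of the current design:

```python
def timeout(base, round_failures, cap):
    """Proposal re-send window: exponential backoff on consecutive
    failed rounds, reset to `base` after each commit. The `cap`
    bounds the worst case when a chain has been idle for a long
    time. (Illustrative sketch, not the chosen design.)"""
    return min(base * (2 ** round_failures), cap)
```

Note that a cap trades the lazy-chain problem for the constant-window problem: once the cap is hit, it behaves like a (possibly mistuned) constant window.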

Related: Tendermint's discussion on empty blocks: tendermint/tendermint#5911

Related: Jacob's Slack thread: https://heliax.slack.com/archives/C02AHU18CTG/p1637242390019000

Name brainstorming for whole project

Heterogeneous Paxos is the name of the algorithm, and Chimera chains are the name of the particular feature, but we need a name for the whole project, including consensus algorithm design, proof-of-stake integration, Rust implementation, etc. Please comment with nominated name ideas and we can collectively decide in a week or so.

Probability of a bad partition

Stated generally, given N validators, of which f are faulty:

  1. If we assign each validator (uniformly and independently at random) one of K partitions, what are the odds that:
    1. no partition has more than x faulty validators?
    2. every partition has at least y non-faulty validators?
    3. no partition has a ratio of faulty validators / non-faulty validators greater than z?
    4. no partition is empty?
  2. If we choose K (possibly overlapping) subsets of validators (uniformly at random), each of size S, what are the odds that:
    1. no subset has more than x faulty validators?
    2. every subset has at least y non-faulty validators?
    3. no subset has a ratio of faulty validators / non-faulty validators greater than z?
    4. every validator is in a subset?
    5. Is there anything convenient about the special case where K×S=N?
  3. If we (uniformly at random) partition the validators into K (non-overlapping) partitions, each of size S (so K×S=N), what are the odds that:
    1. no partition has more than x faulty validators?
    2. every partition has at least y non-faulty validators?
    3. no partition has a ratio of faulty validators / non-faulty validators greater than z?
    4. what if instead of all being size S, each partition has size S or S+1, and K×S<N< K×(S+1)?

These seem like classic Balls & Bins problems, but I don't know the answers.

These probabilities could be important in calculating the odds that a randomly-selected "shard" of validators is correct.
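Until the closed-form answers are worked out, these odds are easy to estimate numerically. A minimal Monte Carlo sketch for case 1 (uniform, independent assignment), checking conditions 1.1 and 1.4 together; the function and parameter names are illustrative:

```python
import random

def estimate(n, f, k, x, trials=20000, seed=0):
    """Estimate for case 1: assign each of n validators (the first
    f of them faulty) to one of k partitions uniformly at random,
    and measure how often (1.1) no partition holds more than x
    faulty validators and (1.4) no partition is empty."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        faulty = [0] * k
        total = [0] * k
        for v in range(n):
            p = rng.randrange(k)
            total[p] += 1
            if v < f:  # first f validators are the faulty ones
                faulty[p] += 1
        if max(faulty) <= x and min(total) > 0:
            ok += 1
    return ok / trials
```

The same loop extends to conditions 1.2 and 1.3, and cases 2 and 3 only change how the assignment is sampled (fixed-size subsets or a fixed-size partition instead of independent draws).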

Modify Heterogeneous Paxos for Learner Graph Changes

As written, Heterogeneous Paxos assumes a learner graph that does not change over time.
Technically, of course, we can encode all learners from all points in time in one big graph, but I don't know if that's useful.

One possibility is to simply stop running Heterogeneous Paxos Chains when the quorums of any of their "main chains" change:
Users would have to move any interesting state over to main chains, allow the quorum change to take place, and then start up a new Heterogeneous Paxos chain, and move state into it.

Another possibility is some kind of synchronized quorum shift across all chains that share a "main chain."
Whenever a transaction is approved on any chain that changes a main chain's quorums, it doesn't "take effect" for a while, and in that interim, all other chains using those quorums have to update accordingly.

Mempool Communication and Design

Task description

Arguably, we can break down Typhon into layers, which include:

  • Transaction dissemination from Clients to Storage (probably on validators, who may also be orderers)
  • Mempool Organization, wherein organized portions of the transactions are identified for consensus to decide on. (This Issue)
  • Consensus, which chooses an ever-growing sub-dag from the mempool representing "ordered" transactions.
  • Execution, wherein state "after" each transaction is calculated and published.

In order to order transactions that have been stored, we need to organize them into a partially-ordered DAG structure from which Consensus can choose an ever-growing non-conflicting closed sub-DAG. I suspect that something similar to Narwhal can help us here, but we have to adapt it for the Heterogeneous setting.
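The "closed sub-DAG" requirement can be made concrete: a sub-DAG is closed when it contains every transitive ancestor of its vertices, so consensus only needs to pick anchor vertices and take their causal history. A minimal sketch, assuming mempool headers reference their parents by id:

```python
def causal_closure(parents, anchors):
    """Compute the closed sub-DAG (all transitive ancestors) of the
    anchor vertices chosen by consensus. `parents` maps each vertex
    id to the vertex ids it references. (Illustrative sketch of the
    Narwhal-style closure property, not our actual data layout.)"""
    closed, stack = set(), list(anchors)
    while stack:
        v = stack.pop()
        if v not in closed:
            closed.add(v)
            stack.extend(parents.get(v, ()))
    return closed
```

Because the closure is determined entirely by the anchors, consensus can order whole waves of transactions by deciding on a single vertex at a time.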

Ultimately, we want this design formalized in something like TLA+, and integrated with the Heterogeneous Paxos spec. We may even want to prove some safety or liveness (or even fairness) properties along the way.

Sub-tasks (no particular order)

  • Describe where and how Narwhal fits in Anoma's layered architecture (see figure)
  • Analyse Narwhal for the mempool layer (in depth)
  • Study an easy example of TLA+ (perhaps Paxos)
  • Document findings
  • https://github.com/anoma/specs/issues/306
  • #41
  • #42

Definition of Done

  • We have a human-readable document of the mempool layer architecture
  • We have formalized the design in TLA+

Chimera Chain Blog Post

Post to the Anoma blog a description of what we're trying to accomplish with Chimera Chains, and how those are supposed to work.

Prerequisite

  • #48
    • although perhaps this doesn't really have to be completed for the blog post to work

Sub-Tasks

  • establish outline
  • construct helpful diagrams
  • fill in explanatory text
  • internal review
  • publish

Add Layered Description to Typhon Spec

Arguably, we can break down Typhon into layers, which include:

  • Transaction dissemination from Clients to Storage (probably on validators, who may also be orderers)
  • Mempool Organization, wherein organized portions of the transactions are identified for consensus to decide on. (This Issue)
  • Consensus, which chooses an ever-growing sub-dag from the mempool representing "ordered" transactions.
  • Execution, wherein state "after" each transaction is calculated and published.

This should be explained in the spec documentation.

How do we check what type of chimera chains are possible?

If we only allow 2-way chimera chains, then we are limiting scalability. One way to solve this is to allow more than 2-way chimera chains. Since all of these chains need to have pairwise intersections with at least 1 honest node, the question that arises is how to efficiently find combinations of chains that fulfil this condition. This needs to be done once a day (at the beginning of each epoch, when validators change).

To me, the easiest way is a greedy approach, or to have it done offline and allow users to submit good combinations and reward them for it.
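Whatever search strategy we use (greedy or offline submission), verifying a candidate combination is cheap: we just check that every pair of chains shares at least one honest validator. A sketch of that check, with illustrative names; in practice "honest" would be an assumption about quorum overlaps rather than a known set:

```python
from itertools import combinations

def valid_combination(validator_sets, honest):
    """Check the pairwise condition for a k-way chimera chain:
    every pair of participating chains must share at least one
    honest validator."""
    return all(set(a) & set(b) & honest
               for a, b in combinations(validator_sets, 2))
```

With a verifier this cheap, the "submit and reward" scheme only needs on-chain validation of submitted combinations, not an on-chain search.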

Another question: once we already have multiple active chimera chains after genesis, can we add or remove main chains (ways) to the same chimera chain without having to kill the old chain and create a new one at the beginning of each epoch?

Coherent Typhon Formal Spec

After each component has been formalized, combine the mempool, consensus, and execution engine formal specs into a larger formal spec (in TLA+ or similar) for which we can check end-to-end properties. Ideally, this would match the English spec.

Prerequisites

Sub-Tasks

  • combine spec text together (may involve matching up different representations of stuff)
  • prove end-to-end serializability: based on proven properties of consensus and mempool, the execution engine is in fact equivalent to a serial execution
  • prove end-to-end censorship resistance

Sharding schemes over hpaxos

It would be helpful to be able to instantiate a sort of flexible sharding utilising Heterogeneous Paxos, where a validator set controlled by a single logical proof-of-stake system and staking token can be split across "shards" - independent consensus instances which execute in parallel, but where the validator subsets can be guaranteed to have certain amounts of overlap by the assignment logic of the logically centralised proof-of-stake module.

Ideally we would be able to craft an automatic scheduling algorithm so that transactions can use a unified programming model ("as-if-atomic") and the scheduling mechanism can produce an actual execution which is serialisable per a particular ordering but splits actual execution amongst shards efficiently (as efficiently as possible). This intersects nontrivially with Ferveo, as we want to commit to an ordering before knowing transaction contents (making scheduling considerably more difficult), but it should still be possible to run the scheduling algorithm after we know the order and transactions are decrypted - this just fixes a particular order (of serialisability) which actual execution must be semantically equivalent to.

Prior research:

Gossip Layer for Heterogeneous Paxos Transactions

Arguably, we can break down Typhon into layers, which include:

  • Transaction dissemination from Clients to Storage (probably on validators, who may also be orderers) (This Issue)
  • Mempool Organization, wherein organized portions of the transactions are identified for consensus to decide on.
  • Consensus, which chooses an ever-growing sub-dag from the mempool representing "ordered" transactions.
  • Execution, wherein state "after" each transaction is calculated and published.

This will have to incorporate / reference sub-transactions in each of the state machines involved, and efficiently disseminate transaction data to relevant parties. This gossipping should happen before consensus.

Ultimately, we want this design formalized in something like TLA+, and integrated with the Mempool Spec. We may even want to prove some safety or liveness (or even fairness) properties along the way.

Prepare and submit paper on Anoma's P2P architecture

Task description

Prepare and submit paper on Anoma's P2P Overlay architecture to a suitable conference

Steps

  • Select a suitable target conference
  • Modify current paper so that it's appropriate for target conference
  • Review paper internally
  • Update paper based on feedback
  • Submit to target conference

Definition of Done

  • Paper in format of target conference
  • Reviewed by Heliax colleagues
  • Approved by Consumer

Roles

  • Producer: Naqib
  • Consumer: Isaac

Dependencies

Manual Sharding

Using our multi-chain nodes, enable cheap creation of additional chains with the same quorums as a "main chain." This would allow transactions with the same integrity as the main chain to run concurrently (for load balancing). The same programming techniques that we employ for Heterogeneous Paxos could work here.

Multi-Round Heterogeneous Paxos

Extend Heterogeneous Paxos from making a single decision to making multiple decisions, necessary to decide on successive blocks. In its simplest form, this can just run the single-decision mode over and over, but we might want to think about what, if anything, we want to keep from previous rounds. TCP channels seem like a good thing to re-use, for example.

Prerequisites

Sub-Tasks

Primitive Benchmark

Description

We should establish some way to test how "good" (throughput? latency?) Typhon is relative to Tendermint. This means creating some kind of benchmark now that can at least measure Tendermint, and then, as we build prototypes, we should be able to see how well they compare.

Some features we might want include:

  • ability to test gossip/storage vs. consensus/ordering layers separately
  • ability to meaningfully calculate performance for many machines, ideally without actually using many machines.
  • ability to abstract over optimizations that Tendermint's software has made, and we can also make, but haven't yet done in our prototype (e.g. fast marshaling / unmarshaling of messages).

Sub-tasks (in no particular order)

  • find out whether there is some benchmarking tool implemented by others (e.g., Tendermint themselves) that we can use
  • determine suitable performance metrics that will allow comparison
  • determine a concrete set of behaviors that we want to benchmark (filter out unnecessary complexities)
  • determine a set of functional/non-functional requirements for the benchmarking tool
  • design the benchmarking tool
  • implement the benchmarking tool
  • perform a comparison study

Definition of Done

We have a document with a concise and clear comparison study between Tendermint (what we use for Namada) and Typhon (what we use for Anoma).

Handling objects once conditions for H-Paxos are not met anymore

Question:
Another point we need to discuss is what happens to chimera chains once the conditions don't hold anymore. Are the objects stuck there forever? What if no one kills the chimera chain, but the conditions have not been met for a month and hence there is no way to kill the chain?

Reply by Isaac:
Good question.
I suppose that, once the chimera chain has updated its quorums, users can see that its atomicity guarantee is now very bad (e.g., atomicity is not guaranteed even if everyone is operating correctly).
However, users can still commit transactions to the chimera chain. What they lose is the guarantee that, if they commit a "multi-state-machine" block, then if one state machine's transactions are committed, the other state machine's transactions are also committed.
However, if they're only interested in committing transactions to one state machine, that would be ok.
Conveniently, the transactions necessary to move state back to a main chain are only on one state machine (the one corresponding to said main chain).
Therefore, even after atomicity guarantees are lost, we can move state back to main chains, or onto other chains with the same quorums.

Alternatively, if we had some uniform way of "moving state" from the chimera chain back to the main chain, we could automatically move any state back to the main chain as part of any quorum update that would make the chimera chain's atomicity guarantee worthless.

Modify Existing Ledger Code to Support Multiple Chains

Ultimately, we want our existing ledger node / consensus participants to be able to support multiple chains within the same program, rather than as totally separate, unrelated processes.
Specifically, we want to:

  • save on resources whenever possible by combining multiple processes
  • be able to join / leave new chains easily
  • be able to start up new chains easily

Heterogeneous Paxos Stateright Prototype

Building on the English spec and the formal spec, construct a prototype version of Heterogeneous Paxos in the Stateright framework. To begin with, we'll try to do a single instance of consensus with some kind of simple proposal scheme.

This should ultimately be integrated with a Heterogeneous Narwhal Stateright prototype and an execution engine Stateright prototype for a Typhon Stateright prototype.

Sub-Tasks

  • efficient implementation of caught
  • efficient implementation of unburied
  • rust formulation of core protocol within the actor model
  • rudimentary testing: ensure we can achieve consensus in easy cases (can be single machine)
  • model checking: can we rigorously test anything interesting in small cases? (single machine)
  • distributed test: we should be able to run the same stateright implementation across multiple machines

High-level Programming Model for "Moving State" Between Chains

Building on our languages for bounded-latency communication between chains that share quorums and atomic communication within Heterogeneous Paxos chains, implement a higher-level notion of "moving state" from one chain to another. Specifically, we want to be able to program a single object which moves to different chains depending on convenience for multi-chain atomic commits or load balancing.

Note that each move must come with some well-communicated properties concerning the safety and liveness of the destination chain. We may want to tag pieces of state and transactions with minimum safety and liveness properties, so that we can ensure that at all times, they stay on acceptable chains.
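The tagging idea might look like the following sketch, where an object carries minimum safety and liveness thresholds and a move is permitted only if the destination chain's advertised properties meet them. The field names and the numeric 0-to-1 scale are assumptions for illustration; real properties would presumably be derived from the learner graph rather than bare numbers:

```python
def move_allowed(obj_req, chain_props):
    """An object tagged with minimum safety/liveness requirements
    may only move to a chain whose advertised properties meet
    them. (Illustrative sketch; field names are hypothetical.)"""
    return (chain_props["safety"] >= obj_req["safety"]
            and chain_props["liveness"] >= obj_req["liveness"])
```

A check of this shape would run as part of every move, which is what guarantees the object stays on acceptable chains at all times.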

Execution Engine Stateright Prototype

Building on the English spec, construct a prototype version of the Execution Engine in the Stateright framework. This may involve some kind of "dummy" Taiga API. Ideally, this would match up with the formal spec.

This will have to match up with the Heterogeneous Paxos Stateright Prototype and the Heterogeneous Narwhal Stateright Prototype to be useful.

Prerequisites

Sub-Tasks

  • Shards
  • Executors
  • Full-Nodes?
  • Garbage collection messages

Programming API for Atomic Message Passing in Heterogeneous Paxos Chains

Implement a programming model for atomic message passing between state machines running on a Heterogeneous Paxos Chain.
This can be based on IBC, but should provide additional atomicity guarantees.
Note that:

  • We need to communicate to the programmer exactly what the atomicity guarantees are
  • Not all nodes understand all state machines, so one node needs to prove that a message was sent with some specific content, so that the other nodes can progress, having read the message.

How is the next Chimera proposer chosen?

Round-robin and random selection are obvious choices. However, this problem would necessarily be coupled with #22, and perhaps a resolution there would allow a smarter solution to this question.

One would need a procedure select_proposer that has certain properties.
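For instance, the round-robin option could be a deterministic function of the height and round, so that every correct node derives the same proposer with no extra communication. A sketch of such a `select_proposer`, with an illustrative candidate set (e.g. the acceptors in the quorum overlap):

```python
def select_proposer(acceptors, height, round_):
    """Deterministic round-robin over a fixed candidate list:
    sorting makes the result independent of the order in which
    nodes learned the acceptor set. (Illustrative sketch of one
    of the obvious choices, not the chosen design.)"""
    return sorted(acceptors)[(height + round_) % len(acceptors)]
```

The properties we would want from any such procedure include determinism (all correct nodes agree), rotation (a crashed proposer is eventually skipped as the round number grows), and fairness over candidates.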

Incentive scheme for HPaxos

Some funds need to be locked for opening a chimera chain. This is mainly because open chimera chains require updating every time quorums get updated, and that can get expensive if there are many open chimera chains.

We need to figure out economics for:

  • Opening up a chimera chain
  • Killing a chimera chain
  • Rewards and slashing
  • Fees

Who can be a Chimera block proposer?

I talked to Isaac and Chris about this. Isaac's opinion was that one of the block proposers needs to be honest, but it can be anyone from the main chains, and not only the acceptors in the overlaps of the quorums, if we are OK with losing liveness.

Chris and I, however, are not sure about this. Chris brought up that there will be a problem with data availability. I think there is also a problem with delaying consensus on the main chains if we allow this. If there is a fork in the chimera chain, caused by non-overlapping acceptors from the other main chain creating a forking chimera block, only the acceptors in the overlap would know about it, and blocks in A or B can build on these. It will be live (just delayed): Heterogeneous Paxos guarantees that somehow, but it will result in the validators in the quorum overlaps having to vote down blocks that everyone else thinks are ok.

Moreover, if we allow all acceptors to produce blocks networking becomes more critical.
