
typhon's Introduction

Typhon

Typhon stores, orders, and executes transactions on Anoma blockchains. For more information, please see the spec.

typhon's People

Contributors

cwgoes, gavinly, heindel, karbyshev


typhon's Issues

Multi-State Machine Execution on Heterogeneous Paxos Chain

On a Heterogeneous Paxos chain, run a state machine for each "main chain," such that blocks can contain bundles of transactions on multiple chains.
Note that:

  • Not all nodes can / should execute all transactions
  • We will ultimately need some kind of language for communicating these "bundles"
  • We want to make as few assumptions as possible about the state machines from each "main chain."

Message Passing for Heterogeneous Paxos

This issue may well be combined with our existing ledger projects.

Ultimately, our Heterogeneous Paxos implementation will need to send messages (1b, 2a, etc).

  • How do we format these?
  • How do we disseminate them?
    • While Gossip is efficient in terms of per-node bandwidth and connections, just sending UDP packets to all destinations might be faster, although we'd have to be clever about filling in messages that some recipients missed.
  • What, if any, signature aggregation do we use?
    • Trees of signature aggregation have in the past been fruitfully combined with gossip trees.

Message Pipelining

Pipelining, at least as used in HotStuff, is the art of using a single message as part of multiple instances of consensus, to save on message space.

With the naive message design in the tech report, the main content of each message is a list of references to prior messages. This makes it very easy for a message to serve as part of multiple consensus instances. We do have to watch out for a couple of points:

  • Since a recipient may not care about the same set of consensuses as the sender, the recipient may not want to wait to receive all of the referenced messages before processing this one. We may have to flag each reference with relevant information.
  • A sender will now send messages that are "well-formed" on at least one consensus, but not necessarily on all (they aren't really 2A or 1B or whatever), so we must not throw those away.
  • If we introduce more elaborate signature aggregation, we need to not mess up pipelining.
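As a rough illustration of the flagging idea above, each reference could be tagged with the consensus instances for which it is relevant, so a recipient only waits for the references that matter to instances it actually follows. The `Ref`/`Message` names and the tagging scheme are assumptions for illustration, not the actual message format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ref:
    """Reference to a prior message, tagged with the consensus
    instances for which it is relevant (hypothetical design)."""
    msg_id: str
    instances: frozenset

@dataclass
class Message:
    msg_id: str
    refs: tuple  # tuple of Ref

def refs_needed(msg, followed_instances):
    """Return only the referenced message ids a recipient must fetch,
    given the consensus instances it follows."""
    return {r.msg_id for r in msg.refs
            if r.instances & followed_instances}

# One message participating in two consensus instances, A and B:
m = Message("m3", (Ref("m1", frozenset({"A"})),
                   Ref("m2", frozenset({"A", "B"}))))
```

A recipient following only instance B would then fetch `m2` but skip `m1`, which is the per-recipient filtering the first bullet point asks for.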

Optimise the message passing

HPaxos does not use optimised message passing. We should look into optimisations such as aggregating signatures, re-using messages, etc.

BFTree was suggested as the first thing to look into for signature aggregation.

The pipelining issue #18 is also part of this issue.

Permissions for Moving Objects

We have to consider who is allowed to move which objects where. One way to do this is to have "move" simply be a "private method" of any object: the programmer has to specifically program in any methods that the public (all acceptors of the chain this object is located at?) can call to move the object. This could get messy fast.

Another possibility is to have each object specify its required safety and liveness properties, and then check with each move that these are preserved. This too could get messy. We may end up doing Information Flow Control.

Heterogeneous Narwhal English Spec

Task description

We need to formalize our version of Heterogeneous Narwhal in English, in such a way that at least other Anoma developers can tell what is supposed to happen.

A version exists on specs.anoma.net, but it's extremely technical and may not reflect the most recent design choices.

Sub-tasks

Definition of Done

  • We have a human-readable document of the mempool layer architecture on specs.anoma.net

New Proposal Timeouts

As presently designed, Heterogeneous Paxos needs time-outs before proposals are re-sent (to ensure liveness). This is equivalent to time-outs for choosing new leaders in Tendermint. How do we want to choose time-outs?

In "always-committing" chains, we can start exponentially growing time windows at the time of the last commit, and then assume they will eventually be big enough. Unfortunately, in "lazy" chains that only append blocks when there is demand, this means that if there is not demand for a while, there will be HUGE time windows by the time there is a proposal. If a round (or leader) fails, we then have to wait a long time to re-try.

Alternatively, constant time windows have a different problem: if the window is too short, we'll be timing out and re-proposing too fast to get anything done, and if it's too long, we run unnecessarily slowly.

Solutions include:

  • do the whole thing with shared randomness and probabilistic termination (not sure how to do this)
  • figure out some more clever clock sync setup that can reset time windows
  • reach some kind of consensus even when there is no demand to reset time windows
    • hopefully this work can be shared across multiple chains, or it will be unnecessarily burdensome.
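For comparison, the "exponentially growing window" approach can be sketched as a backoff that resets after each commit and is capped; the cap is an assumption added here for illustration (to bound the huge-window problem on lazy chains), not part of the current design:

```python
def timeout(base, round_failures, cap):
    """Proposal re-send window: exponential backoff on consecutive
    failed rounds, reset to `base` after each commit. The `cap`
    bounds the worst case when a chain has been idle for a long
    time. (Illustrative sketch, not the chosen design.)"""
    return min(base * (2 ** round_failures), cap)
```

Note that a cap trades the lazy-chain problem for the constant-window problem: once the cap is hit, it behaves like a (possibly mistuned) constant window.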

Related: Tendermint's discussion on empty blocks: tendermint/tendermint#5911

Related: Jacob's Slack thread: https://heliax.slack.com/archives/C02AHU18CTG/p1637242390019000

Name brainstorming for whole project

Heterogeneous Paxos is the name of the algorithm, and Chimera chains are the name of the particular feature, but we need a name for the whole project, including consensus algorithm design, proof-of-stake integration, Rust implementation, etc. Please comment with nominated name ideas and we can collectively decide in a week or so.

Probability of a bad partition

Stated generally, given N validators, of which f are faulty:

  1. If we assign each validator (uniformly and independently at random) one of K partitions, what are the odds that:
    1. no partition has more than x faulty validators?
    2. every partition has at least y non-faulty validators?
    3. no partition has a ratio of faulty validators / non-faulty validators greater than z?
    4. no partition is empty?
  2. If we choose K (possibly overlapping) subsets of validators (uniformly at random), each of size S, what are the odds that:
    1. no subset has more than x faulty validators?
    2. every subset has at least y non-faulty validators?
    3. no subset has a ratio of faulty validators / non-faulty validators greater than z?
    4. every validator is in a subset?
    5. Is there anything convenient about the special case where K×S=N?
  3. If we (uniformly at random) partition the validators into K (non-overlapping) partitions, each of size S (so K×S=N), what are the odds that:
    1. no partition has more than x faulty validators?
    2. every partition has at least y non-faulty validators?
    3. no partition has a ratio of faulty validators / non-faulty validators greater than z?
    4. what if instead of all being size S, each partition has size S or S+1, and K×S<N< K×(S+1)?

These seem like classic Balls & Bins problems, but I don't know the answers.

These probabilities could be important in calculating the odds that a randomly-selected "shard" of validators is correct.
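Until the closed-form answers are worked out, these odds are easy to estimate numerically. A minimal Monte Carlo sketch for case 1 (uniform, independent assignment), checking conditions 1.1 and 1.4 together; the function and parameter names are illustrative:

```python
import random

def estimate(n, f, k, x, trials=20000, seed=0):
    """Estimate for case 1: assign each of n validators (the first
    f of them faulty) to one of k partitions uniformly at random,
    and measure how often (1.1) no partition holds more than x
    faulty validators and (1.4) no partition is empty."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        faulty = [0] * k
        total = [0] * k
        for v in range(n):
            p = rng.randrange(k)
            total[p] += 1
            if v < f:  # first f validators are the faulty ones
                faulty[p] += 1
        if max(faulty) <= x and min(total) > 0:
            ok += 1
    return ok / trials
```

The same loop extends to conditions 1.2 and 1.3, and cases 2 and 3 only change how the assignment is sampled (fixed-size subsets or a fixed-size partition instead of independent draws).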

Modify Heterogeneous Paxos for Learner Graph Changes

As written, Heterogeneous Paxos assumes a learner graph that does not change over time.
Technically, of course, we can encode all learners from all points in time in one big graph, but I don't know if that's useful.

One possibility is to simply stop running Heterogeneous Paxos Chains when the quorums of any of their "main chains" change:
Users would have to move any interesting state over to main chains, allow the quorum change to take place, and then start up a new Heterogeneous Paxos chain, and move state into it.

Another possibility is some kind of synchronized quorum shift across all chains that share a "main chain."
Whenever a transaction is approved on any chain that changes a main chain's quorums, it doesn't "take effect" for a while, and in that interim, all other chains using those quorums have to update accordingly.

Mempool Communication and Design

Task description

Arguably, we can break down Typhon into layers, which include:

  • Transaction dissemination from Clients to Storage (probably on validators, who may also be orderers)
  • Mempool Organization, wherein organized portions of the transactions are identified for consensus to decide on. (This Issue)
  • Consensus, which chooses an ever-growing sub-dag from the mempool representing "ordered" transactions.
  • Execution, wherein state "after" each transaction is calculated and published.

In order to order transactions that have been stored, we need to organize them into a partially-ordered DAG structure from which Consensus can choose an ever-growing non-conflicting closed sub-DAG. I suspect that something similar to Narwhal can help us here, but we have to adapt it for the Heterogeneous setting.
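The "closed sub-DAG" requirement can be made concrete: a sub-DAG is closed when it contains every transitive ancestor of its vertices, so consensus only needs to pick anchor vertices and take their causal history. A minimal sketch, assuming mempool headers reference their parents by id:

```python
def causal_closure(parents, anchors):
    """Compute the closed sub-DAG (all transitive ancestors) of the
    anchor vertices chosen by consensus. `parents` maps each vertex
    id to the vertex ids it references. (Illustrative sketch of the
    Narwhal-style closure property, not our actual data layout.)"""
    closed, stack = set(), list(anchors)
    while stack:
        v = stack.pop()
        if v not in closed:
            closed.add(v)
            stack.extend(parents.get(v, ()))
    return closed
```

Because the closure is determined entirely by the anchors, consensus can order whole waves of transactions by deciding on a single vertex at a time.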

Ultimately, we want this design formalized in something like TLA+, and integrated with the Heterogeneous Paxos spec. We may even want to prove some safety or liveness (or even fairness) properties along the way.

Sub-tasks (no particular order)

  • Describe where and how Narwhal fits in Anoma's layered architecture (see figure)
  • Analyse Narwhal for the mempool layer (in depth)
  • Study an easy example of TLA+ (perhaps Paxos)
  • Document findings
  • https://github.com/anoma/specs/issues/306
  • #41
  • #42

Definition of Done

  • We have a human-readable document of the mempool layer architecture
  • We have formalized the design in TLA+

Chimera Chain Blog Post

Post to the Anoma blog a description of what we're trying to accomplish with Chimera Chains, and how those are supposed to work.

Prerequisite

  • #48
    • although perhaps this doesn't really have to be completed for the blog post to work

Sub-Tasks

  • establish outline
  • construct helpful diagrams
  • fill in explanatory text
  • internal review
  • publish

Add Layered Description to Typhon Spec

Arguably, we can break down Typhon into layers, which include:

  • Transaction dissemination from Clients to Storage (probably on validators, who may also be orderers)
  • Mempool Organization, wherein organized portions of the transactions are identified for consensus to decide on. (This Issue)
  • Consensus, which chooses an ever-growing sub-dag from the mempool representing "ordered" transactions.
  • Execution, wherein state "after" each transaction is calculated and published.

This should be explained in the spec documentation.

How do we check what type of chimera chains are possible?

If we only allow 2-way chimera chains, then we are limiting scalability. One way to solve this is to allow more than 2-way chimera chains. Since all of these chains need to have pairwise intersections with at least 1 honest node, the question that arises is how to efficiently find combinations of chains that fulfil this condition. This needs to be done once a day (at the beginning of each epoch, when validators change).

To me, the easiest way is a greedy approach, or to have it done offline and allow users to submit good combinations and reward them for it.
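Whatever search strategy we use (greedy or offline submission), verifying a candidate combination is cheap: we just check that every pair of chains shares at least one honest validator. A sketch of that check, with illustrative names; in practice "honest" would be an assumption about quorum overlaps rather than a known set:

```python
from itertools import combinations

def valid_combination(validator_sets, honest):
    """Check the pairwise condition for a k-way chimera chain:
    every pair of participating chains must share at least one
    honest validator."""
    return all(set(a) & set(b) & honest
               for a, b in combinations(validator_sets, 2))
```

With a verifier this cheap, the "submit and reward" scheme only needs on-chain validation of submitted combinations, not an on-chain search.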

Another question: once we already have multiple active chimera chains after genesis, can we add or remove main chains (ways) to the same chimera chain without having to kill the old chain and create a new one at the beginning of each epoch?

Coherent Typhon Formal Spec

After each component has been formalized, combine the mempool, consensus, and execution engine formal specs into a larger formal spec (in TLA+ or similar) for which we can check end-to-end properties. Ideally, this would match the English spec.

Prerequisites

Sub-Tasks

  • combine spec text together (may involve matching up different representations of stuff)
  • prove end-to-end serializability: based on proven properties of consensus and mempool, the execution engine is in fact equivalent to a serial execution
  • prove end-to-end censorship resistance

Sharding schemes over hpaxos

It would be helpful to be able to instantiate a sort of flexible sharding utilising Heterogeneous Paxos, where a validator set controlled by a single logical proof-of-stake system and staking token can be split across "shards" - independent consensus instances which execute in parallel, but where the validator subsets can be guaranteed to have certain amounts of overlap by the assignment logic of the logically centralised proof-of-stake module.

Ideally we would be able to craft an automatic scheduling algorithm so that transactions can use a unified programming model ("as-if-atomic") and the scheduling mechanism can produce an actual execution which is serialisable per a particular ordering but splits actual execution amongst shards efficiently (as efficiently as possible). This intersects nontrivially with Ferveo, as we want to commit to an ordering before knowing transaction contents (making scheduling considerably more difficult), but it should still be possible to run the scheduling algorithm after we know the order and transactions are decrypted - this just fixes a particular order (of serialisability) which actual execution must be semantically equivalent to.

Prior research:

Gossip Layer for Heterogeneous Paxos Transactions

Arguably, we can break down Typhon into layers, which include:

  • Transaction dissemination from Clients to Storage (probably on validators, who may also be orderers) (This Issue)
  • Mempool Organization, wherein organized portions of the transactions are identified for consensus to decide on.
  • Consensus, which chooses an ever-growing sub-dag from the mempool representing "ordered" transactions.
  • Execution, wherein state "after" each transaction is calculated and published.

This will have to incorporate / reference sub-transactions in each of the state machines involved, and efficiently disseminate transaction data to relevant parties. This gossipping should happen before consensus.

Ultimately, we want this design formalized in something like TLA+, and integrated with the Mempool Spec. We may even want to prove some safety or liveness (or even fairness) properties along the way.

Prepare and submit paper on Anoma's P2P architecture

Task description

Prepare and submit paper on Anoma's P2P Overlay architecture to a suitable conference

Steps

  • Select a suitable target conference
  • Modify current paper so that it's appropriate for target conference
  • Review paper internally
  • Update paper based on feedback
  • Submit to target conference

Definition of Done

  • Paper in format of target conference
  • Reviewed by Heliax colleagues
  • Approved by Consumer

Roles

  • Producer: Naqib
  • Consumer: Isaac

Dependencies

Manual Sharding

Using our multi-chain nodes, enable cheap creation of additional chains with the same quorums as a "main chain." This would allow transactions with the same integrity as the main chain to run concurrently (for load balancing). The same programming techniques that we employ for Heterogeneous Paxos could work here.

Multi-Round Heterogeneous Paxos

Extend Heterogeneous Paxos from making a single decision to making multiple decisions, necessary to decide on successive blocks. In its simplest form, this can just run the single-decision mode over and over, but we might want to think about what, if anything, we want to keep from previous rounds. TCP channels seem like a good thing to re-use, for example.

Prerequisites

Sub-Tasks

Primitive Benchmark

Description

We should establish some way to test how "good" (throughput? latency?) Typhon is relative to Tendermint. This means creating some kind of benchmark now that can at least measure Tendermint, and then, as we build prototypes, we should be able to see how well they compare.

Some features we might want include:

  • ability to test gossip/storage vs. consensus/ordering layers separately
  • ability to meaningfully calculate performance for many machines, ideally without actually using many machines.
  • ability to abstract over optimizations that Tendermint's software has made, and we can also make, but haven't yet done in our prototype (e.g. fast marshaling / unmarshaling of messages).

Sub-tasks (in no particular order)

  • find out whether there is some benchmarking tool implemented by others (e.g., Tendermint themselves) that we can use
  • determine suitable performance metrics that will allow comparison
  • determine a concrete set of behaviors that we want to benchmark (filter out unnecessary complexities)
  • determine a set of functional/non-functional requirements for the benchmarking tool
  • design the benchmarking tool
  • implement the benchmarking tool
  • perform a comparison study

Definition of Done

We have a document with a concise and clear comparison study between Tendermint (what we use for Namada) and Typhon (what we use for Anoma).

Handling objects once conditions for H-Paxos are not met anymore

Question:
Another point we need to discuss is what happens to chimera chains once the conditions don't hold anymore. Are the objects stuck there forever? What if no one kills the chimera chain, but the conditions have not been met for a month and hence there is no way to kill the chain?

Reply by Isaac:
Good question.
I suppose that, once the chimera chain has updated its quorums, users can see that its atomicity guarantee is now very bad (e.g., atomicity is not guaranteed even if everyone is operating correctly).
However, users can still commit transactions to the chimera chain. What they lose is the guarantee that, if they commit a "multi-state-machine" block, then if one state machine's transactions are committed, the other state machine's transactions are also committed.
However, if they're only interested in committing transactions to one state machine, that would be ok.
Conveniently, the transactions necessary to move state back to a main chain are only on one state machine (the one corresponding to said main chain).
Therefore, even after atomicity guarantees are lost, we can move state back to main chains, or onto other chains with the same quorums.

Alternatively, if we had some uniform way of "moving state" from the chimera chain back to the main chain, we could automatically move any state back to the main chain as part of any quorum update that would make the chimera chain's atomicity guarantee worthless.

Modify Existing Ledger Code to Support Multiple Chains

Ultimately, we want our existing ledger node / consensus participants to be able to support multiple chains within the same program, rather than as totally separate, unrelated processes.
Specifically, we want to:

  • save on resources whenever possible by combining multiple processes
  • be able to join / leave new chains easily
  • be able to start up new chains easily

Heterogeneous Paxos Stateright Prototype

Building on the English spec and the formal spec, construct a prototype version of Heterogeneous Paxos in the Stateright framework. To begin with, we'll try to do a single instance of consensus with some kind of simple proposal scheme.

This should ultimately be integrated with a Heterogeneous Narwhal Stateright prototype and an execution engine Stateright prototype for a Typhon Stateright prototype.

Sub-Tasks

  • efficient implementation of caught
  • efficient implementation of unburied
  • rust formulation of core protocol within the actor model
  • rudimentary testing: ensure we can achieve consensus in easy cases (can be single machine)
  • model checking: can we rigorously test anything interesting in small cases? (single machine)
  • distributed test: we should be able to run the same stateright implementation across multiple machines

High-level Programming Model for "Moving State" Between Chains

Building on our languages for bounded-latency communication between chains that share quorums and atomic communication within Heterogeneous Paxos chains, implement a higher-level notion of "moving state" from one chain to another. Specifically, we want to be able to program a single object which moves to different chains depending on convenience for multi-chain atomic commits or load balancing.

Note that each move must come with some well-communicated properties concerning the safety and liveness of the destination chain. We may want to tag pieces of state and transactions with minimum safety and liveness properties, so that we can ensure that at all times, they stay on acceptable chains.
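The tagging idea might look like the following sketch, where an object carries minimum safety and liveness thresholds and a move is permitted only if the destination chain's advertised properties meet them. The field names and the numeric 0-to-1 scale are assumptions for illustration; real properties would presumably be derived from the learner graph rather than bare numbers:

```python
def move_allowed(obj_req, chain_props):
    """An object tagged with minimum safety/liveness requirements
    may only move to a chain whose advertised properties meet
    them. (Illustrative sketch; field names are hypothetical.)"""
    return (chain_props["safety"] >= obj_req["safety"]
            and chain_props["liveness"] >= obj_req["liveness"])
```

A check of this shape would run as part of every move, which is what guarantees the object stays on acceptable chains at all times.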

Execution Engine Stateright Prototype

Building on the English spec, construct a prototype version of the Execution Engine in the Stateright framework. This may involve some kind of "dummy" Taiga API. Ideally, this would match up with the formal spec.

This will have to match up with the Heterogeneous Paxos Stateright Prototype and the Heterogeneous Narwhal Stateright Prototype to be useful.

Prerequisites

Sub-Tasks

  • Shards
  • Executors
  • Full-Nodes?
  • Garbage collection messages

Programming API for Atomic Message Passing in Heterogeneous Paxos Chains

Implement a programming model for atomic message passing between state machines running on a Heterogeneous Paxos Chain.
This can be based on IBC, but should provide additional atomicity guarantees.
Note that:

  • We need to communicate to the programmer exactly what the atomicity guarantees are
  • Not all nodes understand all state machines, so one node needs to prove that a message was sent with some specific content, so that the other nodes can progress, having read the message.

How is the next Chimera proposer chosen?

Round-robin and random selection are obvious choices. However, this problem would necessarily be coupled with #22, and perhaps a resolution there would allow a smarter solution to this question.

One would need a procedure select_proposer that has certain properties.
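For instance, the round-robin option could be a deterministic function of the height and round, so that every correct node derives the same proposer with no extra communication. A sketch of such a `select_proposer`, with an illustrative candidate set (e.g. the acceptors in the quorum overlap):

```python
def select_proposer(acceptors, height, round_):
    """Deterministic round-robin over a fixed candidate list:
    sorting makes the result independent of the order in which
    nodes learned the acceptor set. (Illustrative sketch of one
    of the obvious choices, not the chosen design.)"""
    return sorted(acceptors)[(height + round_) % len(acceptors)]
```

The properties we would want from any such procedure include determinism (all correct nodes agree), rotation (a crashed proposer is eventually skipped as the round number grows), and fairness over candidates.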

Incentive scheme for HPaxos

Some funds need to be locked for opening a chimera chain. This is mainly because open chimera chains require updating every time quorums get updated, and that can get expensive if there are many open chimera chains.

We need to figure out economics for:

  • Opening up a chimera chain
  • Killing a chimera chain
  • Rewards and slashing
  • Fees

Who can be a Chimera block proposer?

I talked to Isaac and Chris about this. Isaac's opinion was that one of the block proposers needs to be honest, but it can be anyone from the main chains, and not only the acceptors in the overlaps of the quorums, if we are OK with losing liveness.

Chris and I, however, are not sure about this. Chris brought up that there will be a problem with data availability. I think there is also a problem with delaying consensus on the main chains if we allow this. If there is a fork in the chimera chain, caused by non-overlapping acceptors from the other main chain creating a forking chimera block, only the acceptors in the overlap would know about it, and blocks in A or B can build on these. It will be live (just delayed): Heterogeneous Paxos guarantees that somehow, but it will result in the validators in the quorum overlaps having to vote down blocks that everyone else thinks are ok.

Moreover, if we allow all acceptors to produce blocks networking becomes more critical.
