mempool/p2p: Research implications of peer disconnect based on ResponseCheckTx

Original issue: tendermint/tendermint#9546

As mentioned in tendermint/tendermint#7918 and tendermint/tendermint#2185, it might be of interest to disconnect from a peer when receiving transactions that could never have been valid.

Before implementing this behaviour, we want to examine it in more detail. Namely:

The use case where a node sends transactions that could never have been valid;
Implications for the security and correctness model of Tendermint (how does the application know that it is this particular peer that should be removed, is this scenario not already covered by other layers in Tendermint, etc.);
What changes in other parts of Tendermint the implementation would require;
If the problem and solution still seem valuable, propose an implementation path, taking into account the potential need to refactor and change the p2p layer itself to support this.
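To make the questions above more concrete, here is a minimal sketch of the per-peer bookkeeping such a behaviour might need. Everything in it is an assumption made for illustration: `PeerTxTracker`, the `max_never_valid` threshold, and in particular the `never_valid` flag, which stands for a signal the application would have to provide alongside its CheckTx response code. It is not an existing Tendermint API.

```python
from collections import defaultdict


class PeerTxTracker:
    """Hypothetical per-peer bookkeeping; not part of Tendermint."""

    def __init__(self, max_never_valid=3):
        # Assumed threshold: how many "never valid" txs are tolerated per peer.
        self.max_never_valid = max_never_valid
        self.never_valid_count = defaultdict(int)

    def record_check_tx(self, peer_id, code, never_valid):
        """Record one CheckTx outcome for a tx received from peer_id.

        code:        the application's response code (0 means accepted).
        never_valid: assumed extra signal from the application saying the tx
                     could not have been valid in any state.
        Returns True if the peer should be disconnected.
        """
        if code != 0 and never_valid:
            self.never_valid_count[peer_id] += 1
        return self.never_valid_count[peer_id] >= self.max_never_valid
```

In this sketch, a transaction rejected only because of the current state does not count against the peer, since an honest peer may have relayed it in good faith. Whether the application can reliably tell the two cases apart is exactly one of the open questions listed above.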

Monitoring/logging improvements

Was: tendermint/tendermint#9076

It must be easy to profile and understand "what is my node currently doing?" and "where is my bandwidth going?"

Bandwidth optimization

Inspired by tendermint/tendermint#9880 but expanded here.

High-level tracking issue for general bandwidth optimization efforts.

CometBFT-based networks consume large amounts of bandwidth.
This results in high operational costs for the network operators (typically, the validators).
We need to reduce the bandwidth consumption of the solution while maintaining its core properties, in terms of correctness, progress and fault-tolerance.

Several strategies to reduce the bandwidth usage in CometBFT have been proposed by CometBFT developers and community members.
While some of these ideas are simple to implement, others require much larger efforts, implying impactful modifications on the protocol.
This results in a dilemma when looking for a solution. We also note that there is currently no general strategy to validate a solution, whether through local tests or by collecting information from operators.
A third concern comes from the fact that while there are clear quick wins, it seems also necessary to prepare a long term effort.
The subject of improving bandwidth usage in peer-to-peer systems is vast, with decades of efforts in research and practical solutions.

This issue tracks our efforts in addressing a large spectrum of approaches to reduce bandwidth usage in CometBFT.
We prioritize the work based on its impact on use cases, on the effort needed to implement it, and on how it helps pave the way for future improvements.
The main problem is subdivided as listed next. Note that some of the problems/solutions might overlap.

Preliminary work

Refactor the mempool to prepare for future improvements #1048
- Labels: P:bandwidth-optimization, mempool
Consolidate existing mempool specs and documentation #1076
- Labels: P:bandwidth-optimization, mempool, spec
Write an abstract description of the mempool propagation protocol #1081
- Labels: P:bandwidth-optimization, mempool

Short term goals

Analyze, prototype and experiment easily applicable solutions to reduce bandwidth usage #1058
- Labels: P:bandwidth-optimization, mempool, tracking
Understand use cases blocked by high BW usage and what would take to unblock them #1060
- Progress: 1 of 3 tasks
- Labels: P:bandwidth-optimization, consensus, mempool
Add support for fast prototyping #1059
- Progress: 6 of 12 tasks
- Labels: P:bandwidth-optimization, e2e, tracking
Improve telemetry of bandwidth usage #1061
- Progress: 1 of 5 tasks
- Labels: P:bandwidth-optimization, e2e

Long term goals

Investigate how to optimize transaction propagation #11
- Labels: P:bandwidth-optimization
Support for the CAT (push-pull gossip) mempool #2027
- Labels: enhancement, mempool
Feasibility study to replace the current gossip protocol

Previous efforts

Investigate what kind of data is consuming the most bandwidth
- tendermint/tendermint#9706 (Rendered RFC)
- #26
- #734
  - Changes from tendermint/tendermint#9760, released in v0.34.25, testing for backwards compatibility in tendermint/tendermint#9928

Related, but not covered by this issue.

Investigate protocol change options for more substantial bandwidth optimization
- #13
- #14

References

tendermint/tendermint#9575 (efforts toward bandwidth improvements in Q4-2022)

State sync from local snapshot

As this issue contained tasks that take more than 2 weeks collectively, it has been split into two parts:

#28
#29

Original issue: tendermint/tendermint#9946

Execute on the v0.38 release QA plan and capture the results

Tasks:

Capture v0.37 baseline metrics (Optional, to do only if existing v0.37 metrics can't be reused)
Improve files with instructions as the tests are run
- #878
qa: Capture v0.38 pre-release metrics
- #841
- #842
#843
Validate that there's no meaningful, substantial regression from v0.37 to v0.38
- Fix all major issues found

DoD:

Report is reviewed and merged
No major bugs standing on v0.38.x

Original issue: tendermint/tendermint#9921

State sync from local snapshot: Understand problem, explore solutions

Tasks:

Analyze and understand the problem ADR083 is trying to address
Gauge users' interest in solving this problem (how painful is it not to solve it?)
Explore all possible solutions, among them ADR083 (tendermint/tendermint#9651)
Adapt ADR083 according to the solution decided, finalize and merge it

DoD:
- We have made sure that users want the problem solved
- We have decided on a solution, and implemented it
- The ADR is merged

Original issue: tendermint/tendermint#9947

spec: define syncing protocols

Was tendermint/tendermint#8219

Summary

Our current spec does not mention the syncing protocols that CometBFT uses: block sync and state sync. It would be good to write these up so people can quickly understand how each of them works.

Block sync specification (@jmalicevic)
State sync specification (@cmwaters)

ABCI++ vote extensions & `FinalizeBlock`

This is the continuation of the work started in tendermint/tendermint#9053: backport ABCI++ implementation, which lives in the v0.36.x branch.
Once this tracking issue is completed, the whole ABCI++ interface should be ready for QA process, and then release.

Proposed Path to Done. Summary

The main idea behind this plan is to proceed in the same way as tendermint/tendermint#9053.
The main approach is:

We will "follow" our work on v0.36.x. By "following a PR", I mean looking at that PR (similar work done for v0.36.x) and copying over only what makes sense. "git cherry-pick" turned out to be our best friend for "following" PRs in tendermint/tendermint#9053
We will be considering only commits related to vote extensions and FinalizeBlock
We will be skipping PrepareProposal/ProcessProposal related commits (as they are already in main via tendermint/tendermint#9053)

The work is structured very similarly to #9053. It consists of two main parts (further described below):

(1) Preliminary work
(2) Core feature work

(1) and (2) can mostly proceed in parallel.

protobufs will be managed as in v0.37.x → They will evolve with the code

Proposed Path to Done. Details

(1) Preliminary Work

We could have two threads here, but since all these tasks, even when done serially, are shorter than the critical path in "Core Feature Work", there's no point.

Create feature branch: feature/abci++vef
Restore CI on feature/abci++vef branch
e2e tests: port any improvements not included in the core work section (git diff HEAD..v0.36.x -- test/e2e) (tracking: tendermint/tendermint#9426)

(2) Core feature work

We have three threads that can proceed in parallel:

Vote extensions (*)
FinalizeBlock (*)
Spec and doc work

(*) Only when vote extensions and FinalizeBlock are done, can we proceed with the fourth part:

Final adjustments
- tasks here could be further parallelized

Vote Extensions

Old vote extensions
- Follow tendermint/tendermint#6646, then tendermint/tendermint#6885
- Double-check integration of PrepareProposal -- vote extensions
  - Check if there is something missing from tendermint/tendermint#7821 (or tendermint/tendermint#6915)
tendermint/tendermint#9835
- First cherry-pick tendermint/tendermint#8031 (ignore spec part),
- Main part: follow tendermint/tendermint#8141
tendermint/tendermint#9846
- follow tendermint/tendermint#8402
Vote extension propagation tendermint/tendermint#9852
- follow tendermint/tendermint#8433
tendermint/tendermint#9861
- Add activation logic: follow tendermint/tendermint#8547
tendermint/tendermint#9864
- follow tendermint/tendermint#8587
tendermint/tendermint#9887

`FinalizeBlock`

tendermint/tendermint#9427 [Estimation: 4 days]
- Follow tendermint/tendermint#7798 (it's bulky, and might depend on other infra PRs, if yes cherry-pick them first)
Sync FinalizeBlock with spec
- Description: follow tendermint/tendermint#7983

Final adjustments

Most of these tasks can be done in parallel, but they are quite short.

#10

Spec and doc work

Low risk, can proceed completely in parallel with the rest

#24
#25

Original issue: tendermint/tendermint#9396

Establish the baseline and future requirements for the storage backend

Was tendermint/tendermint#9750

In order to simplify CometBFT's storage mechanism, we need to first establish what typical usage patterns look like. Given that we intend on offloading event indexing, this effort should focus on understanding and gathering data on what the storage workloads look like for specific use cases. This is a tracking issue for everything that needs to be completed in order to end up with a set of requirements for a database that will serve as CometBFT's storage mechanism.

The requirements can only be established if we have a testing setup with which we can measure the current performance and bottlenecks. This setup will also serve as a baseline to measure any improvements done to the storage layer.

Write tests and build a testing environment to test the storage layer
Understand the workload: What data are we storing, in what format
#46
#67

ABCI v3: Enabling new use cases for application developers

High-level tracking issue for multi-year efforts toward enabling new use cases for application developers.

The ABCI model has been quite successful thus far, but suffers from some limitations that we believe could be overcome by ABCI 2.0 (a.k.a. ABCI++).

Original issue: tendermint/tendermint#9886

Improve QA infra to test upgrades on e2e/testnets

Was: tendermint/tendermint#9937

We need to test version upgrading (minor and major) either in our e2e tests, or in our testnet infra (or in both)

DoD:

We are able to easily test upgrading Tendermint between two consecutive minor, or major versions

Related issue: tendermint/tendermint#8653

abci++: Define and implement pruning strategy for stored extended commits

Was tendermint/tendermint#8458

As per RFC 017 (tendermint/tendermint#8317), we are going to be storing all extended commits in the block store, which will contain vote extensions and vote extension signatures. This could take up a substantial amount of disk space for large vote extensions and large validator sets.

We need to define and implement an optimal pruning strategy in order to minimize the operational impact of storing extended commits while still maintaining overall network integrity.
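To get a feel for the disk space involved, here is a rough back-of-the-envelope sketch. It assumes one vote extension and one extension signature per validator per height, a 64-byte signature (the size of an Ed25519 signature), and it ignores the rest of the commit and any encoding overhead. The function names and the numbers used below are illustrative assumptions, not measurements.

```python
def extended_commit_bytes(validators, extension_bytes, signature_bytes=64):
    """Approximate extra bytes stored per height for vote extensions."""
    return validators * (extension_bytes + signature_bytes)


def retained_bytes(heights_retained, validators, extension_bytes):
    """Approximate extra bytes on disk when keeping the last N heights."""
    return heights_retained * extended_commit_bytes(validators, extension_bytes)
```

For example, with 150 validators and 1 KiB extensions, `extended_commit_bytes(150, 1024)` gives 163,200 bytes per height, and keeping 100,000 heights gives roughly 16.3 GB. Under these assumptions the cost grows linearly with the number of retained heights, which is what a pruning strategy would bound.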

QA runner improvements

Overview

This issue outlines a set of improvements that should be taken within the near term and longer term to allow the core team to run QA tests more quickly and with less effort.

During the release of Tendermint version v0.37.x, we executed steps of the new release process outlined in RELEASES.md to ensure there was no clear regression in the quality of the software. The steps performed were quite manual, requiring the operator to run a series of scripts from their local machine to set up the instances, generate the configuration files, start the processes, run the load, and capture the results. After running this process once, we have demonstrated that large-scale testnets on virtual machines are a reasonable way to test Tendermint, and we have learned a lot about how to orchestrate a large Tendermint network. We should improve this process to reduce the amount of effort required to run the QA process and capture the results.

Near term improvements

This section suggests a set of changes that should be implemented within the next 1-1.5 quarters. These changes largely comprise migrating logic from scripts in the tendermint-testnet repository into the Tendermint e2e test runner, which performs a similar set of functionality using docker instances on a local network. The logic, as implemented in the testnet repository, is written as a set of shell scripts and ansible playbooks that are not very portable, are not tolerant to transient failures in the network and the Digital Ocean API, and are difficult to add functionality to because of their already large degree of complexity.

Runner generates the network configuration files

Currently, a bash script and ansible playbook create the set of Tendermint configuration files for the test network and copy them to the testnet machines. This logic can and should be moved to the e2e runner. The runner is already used for most of the testnet configuration generation, with the bash script just updating a few config values and the IP addresses so that they match those from the Digital Ocean infrastructure.

Runner adds the load to the network

The e2e test runner currently generates load for the e2e tests. This logic could be extended to generate transactions for the release testnets.

The release testnets require a transaction data format that is more specific than what the nightly tests currently use. The data format can be ported over to be used by the e2e runner. Additional work will be needed to incorporate the tm-loadtest periodic load generation logic into the runner.

Runner starts and stops the processes

The nightly e2e runner currently starts and stops the Tendermint docker instances during the nightly tests. This logic can be adapted to start and stop the Tendermint process on remote nodes during a large testnet running on many machines.

Runner retrieves the data

Currently, retrieving the Tendermint blockstore and the prometheus data captured during the large scale testnet is a manual process performed with a pair of ansible playbooks. The data is collected by the network operator upon completion of the test. The data is then manually uploaded to Digital Ocean storage.

This procedure can be automated and combined into the runner process. Upon completion of the test, the runner can fetch the blockstore and the prometheus database and automatically upload them to Digital Ocean, either by placing them onto a mounted drive that is intended for reuse, or by uploading them directly to a Digital Ocean 'space'.

Long term improvements

Runner manages the infrastructure

In the long term, the runner should be improved to directly manage the infrastructure running the testnet. This means the runner, running on a single DO instance, should be updated to be able to spawn and destroy all of the necessary droplets.

Managing a fleet of infrastructure is complex, and existing tools and practices like Terraform run from the command line have many advantages. Terraform implements a declarative syntax and idempotent requests for resource creation, and already has built-in definitions for many Digital Ocean resource types.

A future version of the runner should be augmented to perform the role of resource creation and destruction without operator intervention. This would need to be carefully done so as to avoid any possible scenarios where the tool provisions too many resources or fails to destroy resources and leaves them running indefinitely. This is listed as a long term improvement because it is complex and will take more careful consideration.

Runner triggered from a github action upon release

Once the runner is able to provision resources in Digital Ocean, run the entire suite automatically, and retrieve and upload the results, it should be enhanced to be started from a GitHub action when a release is triggered.

Overall TODO

Original issue: tendermint/tendermint#9580

P2P Stability

Was tendermint/tendermint#9055

Tracking issue for Q3 work on stabilizing the P2P layer.

tendermint/tendermint#9123
Stretch goal: go through tendermint/tendermint#5670 to see whether any of the raised issues are applicable to v0.34 and could be addressed in this quarter

Improve experience for integrators

High-level tracking issue for improvements targeting users who want to integrate with CometBFT-based nodes. This issue will be expanded over time.

Integrators currently make use of several surfaces/APIs of a CometBFT-based node in order to provide value-added services on top of those nodes. We want to provide ways to make integrators' lives easier.

This issue does not cover the Go APIs. For improvements targeting the Go APIs, please see #42

Sub-issues:

Original issue tendermint/tendermint#9883

Investigate how to optimize vote propagation

#backlog-priority
Was tendermint/tendermint#9924

In this task we investigate how to optimize the propagation of votes in Tendermint, using as input the results of #9922.

Votes are propagated for the current Height, for the current and previous rounds.
Once a vote has been received by a node, it informs its neighbors that such vote is no longer required.
Once a node is known to have a vote, it need not receive it again.
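The bookkeeping described above can be sketched as follows. This is a simplified illustration and not the consensus reactor's actual data structures: the class `VoteGossipState`, its method names, and the `(height, round, type, validator)` key are all assumptions made for the example.

```python
class VoteGossipState:
    """Hypothetical bookkeeping of which votes each peer is known to have."""

    def __init__(self):
        self.own_votes = {}   # (height, round, type, validator) -> vote
        self.peer_has = {}    # peer_id -> set of vote keys

    def add_peer(self, peer_id):
        self.peer_has.setdefault(peer_id, set())

    def receive_vote(self, from_peer, key, vote):
        """Store a vote; return the peers to notify that we now have it."""
        self.peer_has.setdefault(from_peer, set()).add(key)
        if key in self.own_votes:
            return []         # duplicate: nothing new to announce
        self.own_votes[key] = vote
        return [p for p in sorted(self.peer_has) if p != from_peer]

    def receive_has_vote(self, peer_id, key):
        """A neighbor announced that it has the vote and no longer needs it."""
        self.peer_has.setdefault(peer_id, set()).add(key)

    def votes_to_send(self, peer_id):
        """Votes we hold that the peer is not known to have."""
        known = self.peer_has.get(peer_id, set())
        return sorted(k for k in self.own_votes if k not in known)
```

In this sketch a vote can still be sent to a peer that already holds it if the "has vote" announcement has not arrived yet. That window is one source of redundant traffic that this investigation is concerned with.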

Tasks

Investigate a pull-push approach to vote propagation (on each round, first ask what is needed)
Investigate approaches taken by other projects

DoD:

We identified alternative ways to propagate votes that may be applied to Tendermint and have an initial comparison of the pros and cons of each alternative. This data serves as input to an ADR to address the problem.

References:

Under the Hood of the Ethereum Gossip Protocol

CometBFT specification

Was tendermint/tendermint#9321

CometBFT consists of many protocols and interfaces that need to be very well understood. Currently, not all protocols are clearly specified. This issue is designed to track the current state of the specification of CometBFT protocols and provide a clear overview of the remaining work in that respect.

Definition of done

This issue will be considered done when each sub-issue (corresponding to individual protocols and interfaces) is completed.
For a sub-issue to be considered completed it needs to provide the following:

(1) What properties does the protocol provide (externally visible variables, events, together with safety, liveness, "best effort" properties)
(2) What does the protocol expect from other protocols: shared state, timing, transition properties (when the node transitions from blocksync to consensus it should be at most x heights behind the current height of the chain)
(3) How is the protocol operating (protocol description)
(4) Observations/shortcomings/potential issues (I guess when we do this work, we will find problems that eventually should be addressed. We might note them down here together with the version of the software they apply to)

Satisfaction criteria

For 1) and 2): The protocol team and the engineering team are confident that we have a complete list, and we understand the properties well enough to be sure that they can be formalized.

For 3) The protocol team and the engineering team agree that the description and the code are well aligned, and that the description captures worst-case scenarios / corner cases.

For 4) we should provide some evaluation of the current state. Written constructively so that we can infer future steps. (perhaps describing scenarios where the implemented protocol behaves differently from what people think it might do).

CometBFT protocols

Here we provide a list of protocols that exist in CometBFT along with the existing documentation and its status.

Consensus

Algorithm in arXiv: OK (except separation of proposal and block)
PBTS extension
Port from retired 0.36 to main
Detection of equivocation: missing
#15
Interface between consensus and other CometBFT modules.
Develop onboarding material explaining/answering some crucial parts of the Tendermint consensus algorithm (e.g., innovative Tendermint termination mechanism)
Specification of the WAL and the replay mechanism

Accountability

Detection of misbehavior - Amnesia: missing
Light client detection: OK (perhaps re-organize lightclient)
Evidence handling: missing
Evidence gossiping: missing

Mempool

Good coverage of parameters (in docs).
Specifications missing. (There is an ADR on the priority mempool. tendermint/tendermint#9310 to write the spec for priority mempool.)
- Problem statement
  - #223
- Protocol description
- #612
Interface
- Documentation contains interface to clients - how to post transactions
- Research: practical behavior, performance, (problems felt in mempool are actually problems of peer-to-peer)

Peer exchange (PEX) and p2p:

Existing issue: #19
#20

Block sync:

Existing issue: tendermint/tendermint#8586

State sync

#22

Light client

Documentation adequate and exists here.

CometBFT interfaces

ABCI

ABCI is well specified and understood. Ongoing issue.
Tutorials
- Old ABCI version
- ABCI ++

Peer management

API + network properties are currently not clearly specified. Under this we can cover as well any related information on cryptography, encryption, authentication

Client facing

Mempool (missing for non SDK users)
RPC (missing for non SDK users)
config.toml revisit parameters and find out why they are not consensus params
- Does it depend on the node? (operator knows best)
- Does it need to be protected by agreement?
- Does it have an "agreed" component and an "operator-based" component?

Private key storage and use

Spec of API? Secure?

Persistence, Data Availability

Database
Data availability implications of pruning, statesync, retain_height. What are the network-wide properties that we need?

Core data types

#1014

ABCI 2.0: Docs and tutorials

Tasks:

#544
Revisit the following files
- See if tendermint/tendermint#7660 is still relevant
- getting-started.md [small, 0.5 days]
- indexing-transactions.md [small, 0.5 days]
- abci-cli.md (review only) [small, 0.5 days]
- what-is-tendermint.md (ABCI section) [small, 0.5 days]
- go-built-in.md and go.md (same changes) [medium, 2 days]

Stretch goal:

#2853

DoD:

The doc is updated, and proof-read
No references of BeginBlock, DeliverTx, EndBlock can be found (except if the text is explicitly referring to them as obsolete)
(Stretch) The new tutorial is complete and included in doc

Original issue: tendermint/tendermint#9918

Use case: Consensus engine developers

This is a general tracking issue for efforts to refactor Comet so that it is more amenable to being used by consensus engine developers.

At present, many such developers have to fork Comet to accomplish their aims.

Ideal Go API boundaries for consensus engine developers
#342

Originally tendermint/tendermint#9878

Formal specification of the Gossip layer

This issue tracks the work of documenting and specifying the interaction between the consensus reactor and the P2P Layer, as well as the interactions inside the consensus reactor itself, between the consensus implementation and the consensus gossip layer.

First, this work will produce an initial set of draft specifications for the abstract behavior expected of the gossip layer and of its interactions with Consensus and P2P, both in English and in Quint.
Quint specs will contain tests and will be model checked, once the tooling allows it.

Second, the specs will be refined to produce lower level abstractions, closer to the implementation level.
A primary concern here is to ensure that the abstractions can be efficiently implemented.

Third, the current implementation will be documented in English and abstracted in Quint.
The implementation will be matched to the abstract specs from the previous steps in a high level refinement mapping.
This work will possibly be completed with formal refinement mapping proofs, manual and/or automated.

The following list breaks these tasks down for easier tracking:

Draft specification of the P2P/Gossip/Consensus interaction
- Exploratory study of the interaction between Consensus and Gossip (done in #16)
  - Identify main abstractions to specify.
    - It has been identified that Gossip may be implemented as a 2P-Set CRDT.
    - It has been identified that a "garbage collection" must run in the 2P-Set; the definition will happen later.
- Draft of high level GOSSIP-I, in English (done in #16)
- Draft of high level GOSSIP-I, in Quint (done in #16)
- Draft of high level P2P-I, in English (done in #16)
- Draft of high level P2P-I, in Quint (done in #16)
  - A very simple interface was defined here and further work will be done in collaboration with other team members.
- Quint tests for provided specs
  - P2P-I tests
  - GOSSIP-I tests
  - Quint mocks of GOSSIP
  - Quint mocks of P2P
- Specification of the "garbage collection", GC (#608)
  - English
  - Quint
  - Tests
Further investigate CRDTs, including implementations.
- #610
- #751
Refine specification of the P2P/Gossip/Consensus interaction (#17)
- Refine GOSSIP-I, in English
- Refine GOSSIP-I, in Quint
- Refine P2P-I, in English (collaboration with other spec work)
- Refine P2P-I, in Quint (collaboration with other spec work)
- Update Quint tests.
Document the current implementation State/Gossip/P2P interaction (#18)

DoD:

The current implementation is well understood
- all components of the current implementation are documented
- the interaction between the components of the Consensus reactor is documented.
- the interaction between the consensus reactor and the P2P layer is documented.
There exists a formal specification of the interactions inside the Consensus reactor and between Consensus and P2P
Mismatches between the formal specification and the current implementation are identified.

The output of this task will serve as input for #30

spec/consensus: Draft specification of the P2P/Gossip/Consensus interaction

This issue addressed the first tasks of #15.

The following describes the work done here. This task will be closed and new ones will be opened to deal with the remaining tasks, in small chunks.

The work so far consisted of an exploratory study of the interaction between Consensus and Gossip, which delved into the discrepancy between what the Tendermint algorithm uses to prove progress (defined as Gossip Communication) and what it actually needs. Initially this work explored communication with supersession of messages (which may be dropped), and then eventually consistent state replication, in the form of a 2P-Set CRDT.
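For readers unfamiliar with the data structure, a 2P-Set (two-phase set) can be sketched in a few lines. This is the textbook construction, not the Quint specification produced in this work. Reading "add" as "this message needs to be propagated" and "remove" as "this message has been superseded" is only one possible interpretation, used here for illustration.

```python
class TwoPhaseSet:
    """2P-Set CRDT: an add set plus a remove (tombstone) set."""

    def __init__(self, added=(), removed=()):
        self.added = set(added)
        self.removed = set(removed)

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        # Only items observed as added can be removed;
        # once removed, an item can never be re-added.
        if item in self.added:
            self.removed.add(item)

    def __contains__(self, item):
        return item in self.added and item not in self.removed

    def merge(self, other):
        """Join of two replicas: union of both sets (commutative, idempotent)."""
        return TwoPhaseSet(self.added | other.added, self.removed | other.removed)

    def elements(self):
        return self.added - self.removed
```

Because merging is a plain set union, replicas converge regardless of the order in which they exchange state. The remove set only ever grows, which is presumably why a separate "garbage collection" step is needed.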

The artifacts produced were

the English specification of the GOSSIP-I interface and its corresponding Quint specification
a mock "implementation" of GOSSIP, to test GOSSIP-I using Quint, which achieves eventual consistency instantaneously,
an abstract specification of GOSSIP, to test GOSSIP-I using Quint, which uses P2P-I to send and receive messages,
the English specification of P2P-I and its corresponding Quint specification
a mock "implementation" of P2P, to test P2P-I using Quint, which delivers messages instantaneously,
an abstract specification of P2P, which delivers messages with some delay.
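The actual artifacts are Quint specifications. The Python sketch below only illustrates the difference between the two delivery models mentioned in the list: instantaneous delivery and delivery after some delay. The class `MockP2P`, its tick-based clock, and the fixed `delay` parameter are assumptions made for this example.

```python
import heapq


class MockP2P:
    """Hypothetical in-memory P2P mock with a fixed delivery delay in ticks."""

    def __init__(self, delay=0):
        self.delay = delay
        self.now = 0
        self._seq = 0          # tie-breaker keeps FIFO order within a tick
        self._in_flight = []   # heap of (deliver_at, seq, dest, msg)
        self.inbox = {}        # dest -> list of delivered messages

    def send(self, dest, msg):
        heapq.heappush(self._in_flight, (self.now + self.delay, self._seq, dest, msg))
        self._seq += 1
        self._deliver_due()

    def tick(self, steps=1):
        """Advance the logical clock and deliver whatever is now due."""
        self.now += steps
        self._deliver_due()

    def receive(self, dest):
        """Return and clear the messages delivered to dest so far."""
        return self.inbox.pop(dest, [])

    def _deliver_due(self):
        while self._in_flight and self._in_flight[0][0] <= self.now:
            _, _, dest, msg = heapq.heappop(self._in_flight)
            self.inbox.setdefault(dest, []).append(msg)
```

With `delay=0` a message is available to the receiver as soon as `send` returns, which matches the instantaneous mock. With `delay=2` it only appears after two calls to `tick()`.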

QA: Improvements to the result extraction process

Tasks:

Turn (exploratory) octave scripts into python for automatic plotting
Find a better way to extract Prometheus graphs. By "better", I mean
- a) less manual than Firefox snapshot feature
- b) allowing for customizing things like labels, legend, axes setup, title
#799
#55
#58

DoD:

Existing octave scripts are no longer there (new, exploratory ones accepted)
loadtime outputs sorted results
There is an automatic process to extract Prometheus graphs

Original issue: tendermint/tendermint#9919

v0.38 release plan

Was tendermint/tendermint#9428

The primary focus of the v0.38 release will be rolling out ABCI++ vote extensions and FinalizeBlock functionality.

Deprioritised from this release:

Operators should have more control over bandwidth consumption
- Investigate where Tendermint bandwidth is being consumed (perhaps capture this in an RFC as an articulation of the problem space)
- Find short-term wins/low-hanging fruit where bandwidth consumption can be reduced (ideally in Q4 of 2022)
#54

Evaluate database engines according to requirements and decide which one to optimize

Problem definition

Based on the requirements in #63, we need to evaluate the different database engines against a set of criteria in order to find one that satisfies most of them.

Some preliminary work has been done in RFC 001 and insights from the community can be found in the comments of tendermint/tendermint#6032.

Intern project on understanding pruning/compaction of different databases. The benchmarks also analyzed the impact of key order on the access time. The final presentation of the work is found in DB experiments.pdf

DoD

An RFC containing:
- The different databases evaluated and the reasoning behind their choice.
- An explanation of the methodology of the evaluation.
- Their ranking based on the requirements.
- Recommendation on which database to choose.
Optimize the chosen database based on the requirements established in #67 and feedback from users (#68)

Original issue tendermint/tendermint#9944

Define storage use cases and workloads

Was tendermint/tendermint#9943

Problem definition

We do not have a complete picture on how our users as well as different Tendermint components interact with storage. We therefore need to collect all the relevant use cases for the storage backend:

Who is using Tendermint storage, and how (validators, sentries, full nodes, etc., but with specific emphasis on what people are doing with full nodes)
How are different Tendermint components using the storage.

DoD

A list of different users, their typical usage patterns and issues with regards to Tendermint storage.

Add in-process compaction support to databases

Was tendermint/tendermint#9743

Summary

Experiment with adding in-process compaction, so that nodes don't need to be stopped to perform compaction. This issue originally targeted LevelDB, but we added support for this to all CometBFT database backends that support this feature: RocksDB, PebbleDB and LevelDB.

Problem Definition

Background

One of the most common problems that operators report is that storage growth is unbounded and compaction doesn't work. Some operators stop their node, trigger experimental-compact-goleveldb (#8564) which deletes old data, and then restart their node.

Why do we need this feature?

The experimental-compact-goleveldb command has the disadvantage that the node is stopped while it runs and therefore misses blocks. It typically takes on the order of tens of minutes to finish compaction of a node on a production network, so the number of missed blocks can be significant.

Proposal

We'll go about this incrementally

Tendermint team does initial de-risking and sanity checks to see that in-process compaction can be implemented safely
- Add a new database type that does compaction
We ask an operator to deploy an early experiment replacing one of their full nodes with the patched tendermint version that has in-process compaction
- relayer team tests relaying against that node, monitor general health
We collect advanced metrics on latency in particular, as well as storage growth evolution
- consensus: additional timing metrics #9733

Update the e2e runner to disconnect a node from network in Digital Ocean testnet

Tasks:

Familiarize with tendermint/tendermint#9860
Clone it to Comet (merge to main)
Drive tendermint/tendermint#9860 to completion, including any non-CI testing needed
- #852
Test on DO
- cometbft/qa-infra#17

DoD:

Patch finalized, reviewed, approved, and merged

Original issue: tendermint/tendermint#9950

ABCI 2.0: Final adjustments

Tasks:

DoD:

No errors in CI
No major concerns if testnets were run
Branch feature/abci++vef has been merged back into main

Original issue: tendermint/tendermint#9916

docs/ux: Simplifying storage management

At present, managing Tendermint storage is unnecessarily complex and sometimes confusing. We have many configuration options that impact storage (some of which need to be configured by the application), and the impact of each of those options is not clearly documented. We should investigate options for simplifying that experience for operators.

Improve the documentation around the impact of using different storage-related options
Decide on whether an alternative UX strategy should be implemented for operators
- Option: operators define one overall maximum storage limit for the node, and let the node automatically prune non-critical data to try to keep to that target. Questions remain though around this strategy, such as:
  - Can we separate each type of data stored by a Tendermint node into a different "category", ranked by criticality?
  - What are the different "levels" of criticality that we should support?
  - Should data category criticality be able to be controlled by operators?
  - What does a Tendermint node do when it cannot prevent itself going over the storage limit? (by way of the criticality configuration, if there is a level that can never be pruned)
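
The single-limit option can be sketched as a pruning planner that walks data categories from least to most critical. This is only an illustration under stated assumptions: the `Category` type, its fields, and the category names in the usage note are hypothetical and do not correspond to existing Tendermint configuration or code.

```go
package main

import "sort"

// Category is a hypothetical grouping of data stored by a node.
// Names and criticality levels are illustrative assumptions.
type Category struct {
	Name        string
	Criticality int   // higher means more important; pruned last
	Prunable    bool  // false models a level that can never be pruned
	SizeBytes   int64 // current on-disk size
	MinBytes    int64 // amount that must be retained even when pruning
}

// PlanPruning returns how many bytes to prune from each category so that
// total usage fits within limit, taking from the least critical categories
// first. ok is false when the limit cannot be met, which is the open
// question raised above about going over the storage limit.
func PlanPruning(cats []Category, limit int64) (plan map[string]int64, ok bool) {
	var total int64
	for _, c := range cats {
		total += c.SizeBytes
	}
	excess := total - limit
	plan = make(map[string]int64)
	if excess <= 0 {
		return plan, true
	}
	// Copy before sorting so the caller's slice is left untouched.
	ordered := append([]Category(nil), cats...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Criticality < ordered[j].Criticality
	})
	for _, c := range ordered {
		if excess <= 0 {
			break
		}
		if !c.Prunable {
			continue
		}
		free := c.SizeBytes - c.MinBytes
		if free <= 0 {
			continue
		}
		if free > excess {
			free = excess
		}
		plan[c.Name] = free
		excess -= free
	}
	return plan, excess <= 0
}
```

For example, with a non-prunable "state" category and prunable "blocks" and "tx_index" categories, the planner empties "tx_index" before touching "blocks", and reports failure when the prunable data alone cannot bring usage under the limit.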

Original issue: tendermint/tendermint#9906

v0.37 release plan

Was tendermint/tendermint#9091

This issue currently targets the v0.37 release.

In order to ship tendermint/tendermint#9053, we need to do the following:

Things that still need to be done before we "feature-freeze" v0.37 and cut the v0.37.x branch:

mempool: Deprecate (`v0.37.1`) and/or remove (`v0.38.0`?) the priority mempool

Was: tendermint/tendermint#9388

In light of ABCI++ landing and, as per the previous discussion in tendermint/tendermint#9388, it was decided to first deprecate the priority mempool for v0.37 and then remove it from v0.38, as its functionality can be obtained by implementing an app-side mempool with ABCI++.

ToDo

For v0.37.1:

#259

For v0.38.0:

#260

State sync from local snapshot: Implementation

Tasks:

Implementation (existing implementation of ADR083: tendermint/tendermint#9541)
Introduce necessary changes to documentation and/or spec to keep them up to date
Check if tendermint/tendermint#4642 can be closed once implementation merged

DoD:

Implementation code complete and all PRs merged
Any changes needed to the documentation/spec are merged

Original issue: tendermint/tendermint#9948

modularity: Rolling out new functionality faster, but reliably

High-level, multi-quarter tracking issue for specific work aiming to make the codebase more malleable, while maintaining high standards in terms of QA. This issue will be expanded over time.

The ecosystem benefits from changes to Tendermint Core/CometBFT relatively slowly. This is partially due to the sheer complexity and entanglement of various parts of the codebase as it has evolved somewhat organically over the years.

Paying off technical debt

Paying off technical debt will enable us to move faster.

RPC

rpc: Consider an architectural overhaul #447

Consensus internal refactoring

refactor(consensus): separate sending block parts from "pick part" logic #2663
refactor(consensus): separate sending votes from "pick votes" logic #2659

Persona: Consensus engine developers

Ideal Go API boundaries for consensus engine developers
- ADR for modular transaction hashing #342
- crypto: provide simple way to add new curves #2424

These used to be tracked by #43. Consolidated those issues here:

Double-check whether we need something from (tm)#7768

When we retracted v0.35.x and went back to v0.34.x, we lost the context management that had been put in place over many months.

As (tm)#7768 was fixing some contexts in v0.36.x, the goal here is to try to bring whatever makes sense to main.

Original issue: tendermint/tendermint#9954

Understand and simplify CometBFT database backend

Supersedes tendermint/tendermint#6032, breaking that issue down into more concrete, clearly separated deliverables/sub-tasks.

It's ultimately expensive and painful for the team to have to cater to many different use cases that require different underlying storage engines, and we would rather converge on a single database that meets our core requirements well. We do, however, want to simultaneously provide ways for integrators to access and transform core data into whatever storage system suits their use case.

Problems

Through cometbft-db, CometBFT currently supports multiple database backends. As such:
1. CometBFT does not make extensive use of database-specific optimizations.
2. Storage behavior is not consistent across different databases, potentially resulting in more troubleshooting and bug fixing work for the team (e.g. tendermint/tendermint#8416, #1017).
While we could just decide to only support GoLevelDB (as per tendermint/tendermint#9741), one of the most commonly used underlying databases for CometBFT, it seems to struggle with pruning (see tendermint/tendermint#9743, informalsystems/interchain#1). It's also not clear, given our typical storage workloads for the most common use cases, whether the underlying data structure that LevelDB provides is even suitable for CometBFT. Current problems experienced by operators seem to suggest otherwise.
We currently do not have a very clear set of requirements for an underlying database for CometBFT.

RFC-001 provides some more detail around the problem space.

Work Breakdown

In order to achieve our overall goal of storage simplification, we need to complete the following work.

Original issue: tendermint/tendermint#9749

logging improvements: blocksync module

Tasks:

First: check in v0.35.x/v0.36.x for any existing work on improving logs for blocksync
Full audit of all logs, and their log level
Circulate an example of the improved logs to other members of the team
(stretch) Circulate example among (some of) our users

(this can be done on main directly)

DoD:

Team has taken a look at the example, and approves the improvement
(stretch) Users have taken a look at the example, and approve the improvement

Original issue: tendermint/tendermint#9920

Pinpoint inefficiencies in block, vote and transaction propagation

Was tendermint/tendermint#9922

It has been identified that votes, block parts and transaction propagation in the mempool use more data than what should be needed to reach a decision (tendermint/tendermint#9706)

In this task we determine by which factor these inefficiencies occur for each kind of message (which will let us prioritize the optimization of each of them) and what the sources of the inefficiencies are (which will point us toward the fixes).

Some questions to be answered

do nodes forget having sent messages and send them again?
- they don't forget having sent (except for a case that has already been fixed), but sometimes require getting the message from the other node back, before stopping sending it.
do all nodes needlessly send the same messages to the same nodes?
- given the unstructured nature of the network, yes, the same message is received multiple times from multiple sources.
is the "has votes" message effective?
- yes, as long as it is delivered before the votes themselves, which is not normally the case. This has been addressed in #904.
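
To illustrate why the ordering matters for the "has votes" message, here is a minimal sketch of per-peer bookkeeping. The `PeerVoteView` type and its methods are hypothetical and are not the structures used by the consensus reactor; the sketch only shows that an announcement suppresses a send if it is recorded before the sender picks the next vote.

```go
package main

// PeerVoteView is a hypothetical record of which validators' votes we
// believe a given peer already holds for the current height and round.
type PeerVoteView struct {
	has []bool
}

// NewPeerVoteView creates a view for a validator set of size n.
func NewPeerVoteView(n int) *PeerVoteView {
	return &PeerVoteView{has: make([]bool, n)}
}

// MarkHasVote records that the peer holds the vote of validator i, for
// example after a "has vote" announcement arrives or after we sent the vote.
// Out-of-range indexes are ignored.
func (v *PeerVoteView) MarkHasVote(i int) {
	if i >= 0 && i < len(v.has) {
		v.has[i] = true
	}
}

// NextToSend returns the index of a vote we hold (ours[i] is true) that the
// peer is not known to hold. The boolean is false when nothing is worth
// sending.
func (v *PeerVoteView) NextToSend(ours []bool) (int, bool) {
	for i, mine := range ours {
		if mine && i < len(v.has) && !v.has[i] {
			return i, true
		}
	}
	return -1, false
}
```

If `MarkHasVote(i)` runs only after `NextToSend` has already returned `i`, the vote is sent anyway and the peer receives a duplicate, which matches the observation above.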

Tasks:

Add metrics to track how many times a node receives duplicate votes (#896)
Add metric to track how many times a node receives duplicate block parts (#896)
Add metric to track how many times a node receives duplicate transactions (present in mempool cache) (#637)
Compile the results (see the discussion in #904)
Add logs to identify the sources of duplicate votes (metrics and code analysis were enough)
Add logs to identify the sources of duplicate block parts (metrics and code analysis were enough)
Add logs to identify the sources of duplicate transactions (present in mempool cache) (metrics and code analysis were enough)
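
As a rough sketch of what the duplicate-tracking metrics measure, the following counts first-time and repeated receipts per message kind and derives a redundancy factor (total receipts divided by unique messages). The `DuplicateTracker` type is hypothetical and is not the code added in #896 or #637; it also keeps every hash in memory, which a real implementation would bound (for instance with a cache).

```go
package main

import (
	"crypto/sha256"
	"sync"
)

// DuplicateTracker counts unique and duplicate receipts per message kind
// (for example "vote", "block_part", "tx") and duplicates per sending peer.
type DuplicateTracker struct {
	mu     sync.Mutex
	seen   map[[32]byte]struct{}
	unique map[string]int
	dups   map[string]int
	byPeer map[string]int
}

// NewDuplicateTracker returns an empty tracker.
func NewDuplicateTracker() *DuplicateTracker {
	return &DuplicateTracker{
		seen:   make(map[[32]byte]struct{}),
		unique: make(map[string]int),
		dups:   make(map[string]int),
		byPeer: make(map[string]int),
	}
}

// Observe records a message received from peer and reports whether the same
// payload of the same kind had already been received.
func (d *DuplicateTracker) Observe(kind, peer string, payload []byte) bool {
	key := sha256.Sum256(append([]byte(kind+"\x00"), payload...))
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, ok := d.seen[key]; ok {
		d.dups[kind]++
		d.byPeer[peer]++
		return true
	}
	d.seen[key] = struct{}{}
	d.unique[kind]++
	return false
}

// RedundancyFactor returns total receipts divided by unique messages for a
// kind, or 0 if no message of that kind was observed.
func (d *DuplicateTracker) RedundancyFactor(kind string) float64 {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.unique[kind] == 0 {
		return 0
	}
	return float64(d.unique[kind]+d.dups[kind]) / float64(d.unique[kind])
}

// DuplicatesFrom returns how many duplicates were delivered by peer.
func (d *DuplicateTracker) DuplicatesFrom(peer string) int {
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.byPeer[peer]
}
```

A factor close to 1 for a message kind means little redundancy; comparing the factors across votes, block parts and transactions is one way to prioritize which propagation path to optimize first.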

DoD:

We identified why duplication happens.
This information serves as input to optimizing the message exchange: #30

Establish and implement the relevant metrics to understand storage workloads

Was tendermint/tendermint#9773

We need to identify the set of metrics to understand the storage workloads of CometBFT. The metrics should help us identify:

The access patterns (sequential / random access)
How often the data is read/written
Who reads/writes the data
Is the data accessed by multiple components or just one.
How much of the total height time is spent in storage, on a small network and on a big network? (Is storage a bottleneck?)

Open questions: Do we want information on which CometBFT Blockstore / StateStore method call triggered the access or is read/write/delete count and timing enough?
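
One way to picture the trade-off in the open question is a wrapper that records read/write/delete counts and timing, tagged with the component that owns the handle. This is a sketch under assumptions: the `KV` interface, `Recorder`, and `MemKV` below are hypothetical, are not the cometbft-db API, and are not the draft implementation linked below.

```go
package main

import (
	"sync"
	"time"
)

// KV is a minimal, hypothetical key-value interface used for illustration.
type KV interface {
	Get(key []byte) ([]byte, error)
	Set(key, value []byte) error
	Delete(key []byte) error
}

// OpStats accumulates the count and total latency of one operation type.
type OpStats struct {
	Count int
	Total time.Duration
}

// Recorder stores stats keyed by "caller/op", e.g. "blockstore/get".
type Recorder struct {
	mu    sync.Mutex
	stats map[string]OpStats
}

func NewRecorder() *Recorder { return &Recorder{stats: make(map[string]OpStats)} }

func (r *Recorder) observe(caller, op string, start time.Time) {
	elapsed := time.Since(start)
	r.mu.Lock()
	defer r.mu.Unlock()
	s := r.stats[caller+"/"+op]
	s.Count++
	s.Total += elapsed
	r.stats[caller+"/"+op] = s
}

// Stats returns the accumulated stats for a caller and operation.
func (r *Recorder) Stats(caller, op string) OpStats {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.stats[caller+"/"+op]
}

// Wrap returns a KV that reports every access made on behalf of caller.
func (r *Recorder) Wrap(inner KV, caller string) KV {
	return &instrumented{inner: inner, caller: caller, rec: r}
}

type instrumented struct {
	inner  KV
	caller string
	rec    *Recorder
}

// Arguments of a deferred call are evaluated immediately, so time.Now()
// captures the start of each operation.
func (i *instrumented) Get(key []byte) ([]byte, error) {
	defer i.rec.observe(i.caller, "get", time.Now())
	return i.inner.Get(key)
}

func (i *instrumented) Set(key, value []byte) error {
	defer i.rec.observe(i.caller, "set", time.Now())
	return i.inner.Set(key, value)
}

func (i *instrumented) Delete(key []byte) error {
	defer i.rec.observe(i.caller, "delete", time.Now())
	return i.inner.Delete(key)
}

// MemKV is a trivial in-memory KV used only to exercise the wrapper.
type MemKV struct{ data map[string][]byte }

func NewMemKV() *MemKV { return &MemKV{data: make(map[string][]byte)} }

func (m *MemKV) Get(key []byte) ([]byte, error) { return m.data[string(key)], nil }
func (m *MemKV) Set(key, value []byte) error    { m.data[string(key)] = value; return nil }
func (m *MemKV) Delete(key []byte) error        { delete(m.data, string(key)); return nil }
```

Tagging by component (for example "blockstore" or "statestore") answers "who reads/writes the data" cheaply; attributing accesses to individual method calls would need a finer-grained tag at each call site.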

Draft implementation: tendermint/tendermint#9774

DoD

Set of candidate metrics is established, discussed and well understood.
The identified metrics are added into the Tendermint codebase.
A custom Grafana dashboard is created for easy monitoring. #2448 #2107 #2615 #2616

Final review, testing, and merge of branch `feature/adr075-backport` into `main`

Tasks:

Ask the submitter of tendermint/tendermint#9857 to resubmit to Comet (if it's still private, let's do it ourselves)
Final high-level review of tendermint/tendermint#9857 (all this code has already been reviewed twice)
Run UTs, e2e locally
(Optional) Run a testnet to make sure no major breakage is introduced
Merge tendermint/tendermint#9857 into main
Update documentation, such as state of ADR (if needed), update references to the backport feature branch

DoD:

Branch feature/adr075-backport is merged
No major breakages, any minor issue is solved
Documentation up to date

Original Issue: tendermint/tendermint#9949

Specify the operation of the p2p layer

The goal of this issue is to produce a high-level specification of the p2p communication layer adopted by CometBFT.

The original issue is tendermint/tendermint#9089, which refers to the implementation of the p2p layer on the v0.34.x branch.

Definition of Done

Ultimately what we want here is a clear understanding of how the existing p2p layer works in v0.34 through v0.38+ (it's the same implementation). The target audience here is consensus engine developers, which includes the CometBFT team. The impact we anticipate here is that our team will be able to move faster in debugging and fixing issues in the p2p layer, as well as providing support for future work on refactoring the existing implementation.

This issue can be considered "done" once we have documentation in the spec/p2p directory of the repository that helps consensus engine developers understand the operation of the p2p layer in CometBFT.

Tracked Issues

Update the e2e runner to be able to start and stop CometBFT testnet processes on non-local machines

Tasks:

Familiarize with PR tendermint/tendermint#9801
Clone it to Comet (merge to main)
Drive it to completion, including any non-CI testing needed
- Refactor to enable start and stop (#796)
- Add the DO infra and test
  - #846
  - cometbft/qa-infra#16

DoD:

The Comet PR (cloned from tendermint/tendermint#9801), and subsequent PRs, are approved, tested, and merged

Original issue: tendermint/tendermint#9790

Investigate how to optimize transaction propagation

This task tracks efforts in investigating how to optimize the propagation of transactions, namely the mempool protocol.

The goal is to investigate the literature on the topic and existing solutions adopted by other blockchain products in order to identify alternative propagation approaches that may be applied to CometBFT. Even if an approach cannot be directly applied to CometBFT, it may have aspects that might be considered on future designs of the mempool.

Original issue: tendermint/tendermint#9925

Literature

Transaction propagation can be translated into gossiping pieces of information in an unknown, unstructured, and partially connected network. Some fundamental and general references for gossip to start the investigation:

In addition, we should survey literature on gossip and anti-entropy protocols, with an emphasis on BFT solutions.

Other protocols

This is a (not comprehensive and under construction) list of existing solutions for transaction propagation:

Celestia
- Content Addressable Transaction Pool
- This topic was already discussed, but we need to document the outputs
Narwhal mempool:
Solana:
- gulf-stream
CodedHotstuff
Anoma/Typhon:
- https://arxiv.org/abs/2306.16153

Definition of Done

We have identified alternatives for transaction propagation that may be applied to CometBFT.
We have an initial comparison of pros and cons of each alternative we have identified and investigated.

spec/consensus: Refined specification of the P2P/Gossip/Consensus interaction

In this task we refine the output of #16 to provide lower level specifications, updated GOSSIP-I and P2P-I (in collaboration with other spec work) and data types.

Tasks:

Refine GOSSIP-I, in English
Refine GOSSIP-I, in Quint
Refine P2P-I, in English (collaboration with other spec work)
Refine P2P-I, in Quint (collaboration with other spec work)
Update Quint tests.

DoD:

Lower level specifications are provided in English and Quint. Specifications are tested and possibly formally checked.

Investigate how to optimize block proposal propagation

#backlog-priority
Was tendermint/tendermint#9923

In this task we investigate how to optimize the propagation of proposals, i.e., proposal messages and block part messages.
This task takes as input the results of #9922 and previous discussion, such as #7932.

Tasks

Investigate solutions used by other protocols
- Solana
- Coded Hotstuff
- Other alternatives
  - Would a pull-push work better than the current push only?

DoD:

We identified alternative ways to propagate proposals that may be applied to Tendermint and have an initial comparison of pros and cons of each alternative. This data serves as input to an ADR to address the problem.

References:

Storage optimization

High-level tracking issue for general storage optimization efforts. This issue can be expanded over time.

At present (mid 2023), depending on their configuration, Tendermint-based nodes use large quantities of storage space. This has significant cost implications for operators. We aim to implement strategies to reduce and/or offload certain data stored in order to reduce operators' costs.

The two main problems that are present in the CometBFT storage layer:

We have a very big storage footprint
Querying stored data (whether supporting RPC queries or Comet retrieving consensus data structures) is not optimized and in some cases has proven to be very inefficient

To address these problems, we first need to build understanding of:

Workloads: what we store, how frequently we access it, and what the characteristics of the stored data are (this list will be expanded).
The database backend: database features, design goals and optimization possibilities.

The work to be done can be broken down into the following main subsections:

#48
The end result of this work should be CometBFT optimized for a single storage backend which ultimately results in a significant reduction in both storage access time and on disk storage footprint.

To reach this goal we envision the following steps :

#68
Preliminary investigation to identify the users and workloads of the storage backend (how they query the nodes, what are their common pain points with regards to storage, collection of issues to address).
#63
- #1044
- Understand the workload: What data are we storing, in what format
- #46
- #67
#64
#1039
Add support for users to migrate to the chosen backend

Tune CometBFT to address storage related bottlenecks

Part of this section covers addressing issues found during the benchmarking and investigation process outlined above. Another part addresses concrete issues reported by users. While some of these issues cannot be fully addressed before the analysis above, some optimizations can be performed on CometBFT as it is today. These are marked with *.

#1037 *
The Genesis file can be large and surpass internal DB file size limitations (3GB for RocksDB).
#1040 *
#1041
Reconstruct state using iterators rather than storing it as an entry. ( depends on previous point)
Pruning of blockstore is not reflected in storage used
- informalsystems/interchain#1
- #49
- #169
  It seems that, for users, pruning the indexer is neither as high a priority nor as well understood as reducing the storage footprint and the potential DoS vector that querying the indexer can be.

CometBFT stores and allows querying of data not essential for consensus
We need to identify the functionalities we want to support within Tendermint and offload non-critical data and functionality.

#816
This implementation provides users with an API to implement their own event indexing and to prune the full nodes that currently store events.
Write a data companion based on ADR 101

#50

CometBFT currently maintains its own [WAL](https://github.com/cometbft/cometbft/blob/101bf50e715d6a10c8135392166c35bdae94972e/consensus/wal.go) - is this even necessary, given that the underlying database should actually be taking care of this? It is another source of complexity and potential point of failure in the system that the team has to maintain.

Original issue: tendermint/tendermint#9881

ABCI 2.0: Spec work

Tasks:

Issues identified:

#1326

DoD:

Out of all solutions proposed in RFC017, the one implemented (sub-optimal storage) is now part of the spec
Spec is updated to ABCI 2.0 (very similar to the one in v0.36.x)
Spec is complete and proof-read
Spec methods and data structures match the corresponding protobufs

Original issue: tendermint/tendermint#9917

p2p: Define a specification

This follows from #19.

We want to ensure that our P2P layer implementation is correct. Given that all we have at present (v0.37, v0.34 and earlier versions) is an implementation, without a specification (i.e. a definition of how the P2P layer should work), it makes it hard to understand and test.

To this end, we believe we need to develop (i.e. invent) a specification for the P2P layer, using whatever relevant learnings we've picked up from our prior work in documenting how the v0.34 P2P layer works. This can also be informed by ADR-061 and ADR-062. This need not apply to the entire P2P layer - only those parts that can possibly be specified.

Follow-up work to this will include refactoring of the P2P layer to conform to the specification we develop here.

Define consensus' requirements from the P2P layer
Define the mempool's requirements from the P2P layer

Originally tendermint/tendermint#9573

docs/consensus: Document the current implementation State/Gossip/P2P interaction

Was tendermint/tendermint#9930

Tasks:

an overview of all files that compose the reactor is provided
all message types exchanged by State are described
all message types exchanged by Gossip are described
all structures used to keep track of messages in Gossip are described
the interaction between Gossip and State is described
- A preliminary write-up by @cason exists for Tendermint v0.34.x, but should still apply to v0.37.x

DoD:

The current implementation is well understood
- all components of the current implementation are documented
- the interaction between Gossip and State is documented.

Tracking issue for more aggressive removal of bad peers

Original issue: tendermint/tendermint#9545

Description

This issue aims to cover a wide variety of use cases where Tendermint could remove bad peers more proactively. Peer removal is triggered by information from various reactors and aims to a) increase security and b) boost performance.

There are already a number of issues created around this topic.

Definition of done

Define and specify bad peer behaviour. By bad behaviour we do not necessarily mean malicious behaviour. Rather, peers can be considered bad if they are slow, sending stale or unwanted messages or too frequent requests. Each issue tackling peer removal has to clearly identify what behaviour is considered bad.
Node is disconnected from peers upon detection of bad behaviour.
Changes are backported to 0.34.x
Changes are backported to the latest 0.37 version
QA tests are performed for each backport before release.

Individual issues tackling removal of bad peers

Mempool

Removing transactions failing CheckTx. tendermint/tendermint#6523
Allow application to return a code in CheckTx marking a transaction that never could have been valid, leading to a disconnect from the peer who sent this transaction.
- #66
  - The use case where a node sends transactions that could never have been valid;
  - Implications on the security and correctness model of Tendermint (how does the application know that it is this particular peer that should be removed, is this scenario not covered already by other layers in Tendermint, etc.);
  - What other changes in other parts of Tendermint should the implementation of this require;
  - If solution/problem still seem valuable, propose an implementation path taking into account potential need to refactor and change the p2p layer itself to support this.
- #623
  Related issues from the tendermint repository:
  - tendermint/tendermint#7918
  - tendermint/tendermint#2185
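
The CheckTx-based disconnect idea can be sketched as follows. Everything here is a hypothetical illustration: `CodeNeverValid` is an assumed application-defined code (ABCI only reserves code 0 for success), and `BadPeerPolicy` with its threshold is not an existing CometBFT type. The sketch reflects one possible answer to the attribution question: the application only classifies the transaction, while the node, which knows which peer delivered it, decides about the disconnect.

```go
package main

// Hypothetical response codes. Only CodeOK (0) is fixed by ABCI; the other
// values are assumptions made for this sketch.
const (
	CodeOK         uint32 = 0
	CodeInvalid    uint32 = 1   // invalid now, but may have been valid earlier
	CodeNeverValid uint32 = 100 // could never have been valid
)

// PeerID identifies the peer that delivered a transaction. The empty value
// stands for a transaction submitted locally (for example via RPC).
type PeerID string

// BadPeerPolicy decides when a peer should be disconnected based on the
// CheckTx results of the transactions it sent.
type BadPeerPolicy struct {
	threshold int
	offenses  map[PeerID]int
}

// NewBadPeerPolicy creates a policy that requests a disconnect once a peer
// has sent threshold never-valid transactions. A threshold of 1 models an
// immediate disconnect.
func NewBadPeerPolicy(threshold int) *BadPeerPolicy {
	return &BadPeerPolicy{threshold: threshold, offenses: make(map[PeerID]int)}
}

// OnCheckTx is meant to be called with the sender of a transaction and the
// code returned by the application. It returns true when the sender should
// be disconnected.
func (p *BadPeerPolicy) OnCheckTx(sender PeerID, code uint32) bool {
	if sender == "" || code != CodeNeverValid {
		return false
	}
	p.offenses[sender]++
	return p.offenses[sender] >= p.threshold
}
```

Transactions that are merely invalid at the current state (`CodeInvalid` here) never count against a peer, since an honest peer may have relayed them while they were still valid.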

Consensus

Blocksync

tendermint/tendermint#2896
