Code Monkey home page Code Monkey logo

Comments (39)

chillyvee avatar chillyvee commented on June 1, 2024 2

Seeing the activity here. Need to review notes and will be back soon.

from cometbft.

yihuang avatar yihuang commented on June 1, 2024 2

Any reason for targeting this change to branch v0.34.x? I would say it would be better and safer to target the main branch, then after due checks, backport it to release branches.

target main branch: #733

from cometbft.

chillyvee avatar chillyvee commented on June 1, 2024 1

Happy to collaborate when you are ready. We just updated the code for the latest Juno v13 and have some intermediate feedback based on previous discussions.

from cometbft.

yihuang avatar yihuang commented on June 1, 2024 1

Hi @chillyvee would you kindly reconstruct your original PR and submit to this repo as a draft PR preferably, with a base branch v0.34? We've now in the "understand problem & solution" space (#28) so it would be good to see this implementation.

Thanks in advance!

Love this feature, would like to help if needed, BTW, where can I find the ADR083?

From my understanding, state-sync can be decoupled into several components:

  • snapshot discovery and chunks downloading
  • snapshot chunks verification with light client protocol
  • snapshot restoration
  • node startup with the new state.

Ideally we should have offline commands for each steps to provide maximum flexibility.

from cometbft.

yihuang avatar yihuang commented on June 1, 2024 1

Just did a squash rebase and a few tweaks to fix building, and opened the PR targeting v0.34.x branch: #728

from cometbft.

yihuang avatar yihuang commented on June 1, 2024 1

Seeing the activity here. Need to review notes and will be back soon.

Please feel free to take over the PR when you are back 😄

from cometbft.

yihuang avatar yihuang commented on June 1, 2024 1

But I hope we can merge #803 to unblock app side solutions immediately, then we can take our time to find more ideal design, wdyt?

#803 is ported to sdk directly, it turns out we can hack tendermint state directly, so the whole local snapshot restoration procedure can be done in app side alone.

from cometbft.

yihuang avatar yihuang commented on June 1, 2024 1

maybe we can do this command more elegantly in cometbft, after all, it use many internal APIs to do the job, those are maybe prone to change in the future.

Got it. This is a nice to have, right? Correct me if I'm wrong, it seems like on Comet side there's nothing left to do (except for "nice to have" in the future).

yeah, nice to have.

from cometbft.

adizere avatar adizere commented on June 1, 2024

Thank @chillyvee, appreciate it ! We'll get back to you in a few weeks time after we have explored all the existing material in tendermint/tendermint and have clarity over the solution. This work is part of our Q2 priorities.

from cometbft.

adizere avatar adizere commented on June 1, 2024

Hi @chillyvee would you kindly reconstruct your original PR and submit to this repo as a draft PR preferably, with a base branch v0.34? We've now in the "understand problem & solution" space (#28) so it would be good to see this implementation.

Thanks in advance!

from cometbft.

cason avatar cason commented on June 1, 2024

I think Adi referred to the open PR in the old repository: tendermint/tendermint#9651

This has become a huge PR due to the changes made in the upstream repository. We are only interested on this file: https://github.com/tendermint/tendermint/pull/9651/files#diff-393baba5ea65c4316e47d1a3715a1c4a8e934c1310071f982d2cf5418ef84d80

from cometbft.

yihuang avatar yihuang commented on June 1, 2024

I think Adi referred to the open PR in the old repository: tendermint/tendermint#9651

This has become a huge PR due to the changes made in the upstream repository. We are only interested on this file: https://github.com/tendermint/tendermint/pull/9651/files#diff-393baba5ea65c4316e47d1a3715a1c4a8e934c1310071f982d2cf5418ef84d80

Thanks.

Allow applications to start with a bootstrapped state alongside an empty Tendermint instance using a subsystem of the state sync protocol.

This is exactly what I want.

from cometbft.

adizere avatar adizere commented on June 1, 2024

Thank you @yihuang. I'll be opening an ADR based on the original one soon.

from cometbft.

adizere avatar adizere commented on June 1, 2024

Ported the ADR here: #729. It's not yet clear to me if it's reflective of the solution, please have a look.

from cometbft.

cason avatar cason commented on June 1, 2024

Any reason for targeting this change to branch v0.34.x? I would say it would be better and safer to target the main branch, then after due checks, backport it to release branches.

from cometbft.

chillyvee avatar chillyvee commented on June 1, 2024

Is there a preference for v0.34 or main?

Can probably make a working example for Juno v14 if anyone thinks that is helpful for the "understand" part of the work.

from cometbft.

adizere avatar adizere commented on June 1, 2024

Is there a preference for v0.34 or main?

Can probably make a working example for Juno v14 if anyone thinks that is helpful for the "understand" part of the work.

Excellent idea!

from cometbft.

yihuang avatar yihuang commented on June 1, 2024

FYI, after we identified the root cause of the state sync timeout issue and use the workaround: #742, the state sync UX become good again for us.

from cometbft.

adizere avatar adizere commented on June 1, 2024

Can probably make a working example for Juno v14 if anyone thinks that is helpful for the "understand" part of the work.

Any progress on this?

from cometbft.

chillyvee avatar chillyvee commented on June 1, 2024

@adizere Will try to make this a nice weekend project :)

from cometbft.

adizere avatar adizere commented on June 1, 2024

@adizere Will try to make this a nice weekend project :)

Hey, it's not urgent. I was asking because I was not able to test it. We spoke among the whole team also about the solution, and there's some concerns that the application state might be easily corruptible, or unsafe to transfer across machines. We are leaning towards a solution that reuses the snapshotting mechanism (instead of app.db).

from cometbft.

chillyvee avatar chillyvee commented on June 1, 2024

@adizere Assuming statesync works (which it does), then calling statesync with local files rather than the exact same files downloaded over p2p between machines doesn't seem like a way to make the appstate more corruptible.

All the code does is reuse the existing snapshot restore mechanism and reference local application.db. Then finishes up by triggering comet-bft to restore the remaining non-application state.

Essentially the difference is replacing the statesync file downloads via FTP with a USB thumb drive. It's the same code executing.

Do you know what part of using the existing snapshot restore code/other code is the additional concern coming from? Could take your feedback to make extra checks or confirm the risk.

from cometbft.

yihuang avatar yihuang commented on June 1, 2024

In my test, the current implementation do has some issues with handshake process, if it's too tricky to hook into the handshake process, I'd like to have some low level tools instead, for example let user to download and store the consensus state at an arbitrary height using the light client protocol. So power user can just do the thing in multiple steps manually:

  • download snapshot and restore app state
  • restore cometbft state to the same height (#803)
  • start the node and it'll sync from there.

from cometbft.

cason avatar cason commented on June 1, 2024

@chillyvee Can you take a look on #801?

I think the proposal I am writing there, as ADR 104, is essentially the same as the one you are proposing.

from cometbft.

cason avatar cason commented on June 1, 2024

@yihuang State sync currently does what you are proposing, with the advantage that it makes sure that the application state matches CometBFT's state (i.e., block store and state store). It does not make much sense to me to bootstrap CometBFT's state to a given height, as you proposed in #803, without having an application at the same height.

In #801 we propose a mechanism to allow what you are proposing as the first bullet, i.e., to restore an application snapshot from the local file system, while bootstrapping CometBFT's state accordingly.

from cometbft.

yihuang avatar yihuang commented on June 1, 2024

It does not make much sense to me to bootstrap CometBFT's state to a given height, as you proposed in #803, without having an application at the same height.

the point is split up the state-sync process into multiple commands to maximize flexibility, there'll be a separate command to restore the app state from a local snapshot, it could be implemented in cosmos-sdk.

with the advantage that it makes sure that the application state matches CometBFT's state

after offline restoration, node start up normally, the normal start up procedures will check the consistencies anyway.

I think i'm just trying to do the same thing without intrusive changes in the normal node start up logic, just a few offline commands, but I'm open for more integrated solutions as well, as long as it solve the problem.

from cometbft.

yihuang avatar yihuang commented on June 1, 2024

BTW, the second bullet in your proposal can also be implemented in app side, as demonstrated here

from cometbft.

cason avatar cason commented on June 1, 2024

the point is split up the state-sync process into multiple commands to maximize flexibility

Yes, I agree with this. Maybe not really CLI commands, but to make the steps more clearly separated on the code, although they are already kind of.

More specifically:

  1. This node startup method configures the Light client provider and calls State Sync
  2. State sync main routine receives a candidate snapshot, uses its height to start the light client, then use the light-client obtained state to verify the installed snapshot
  3. Back to the node startup, the state and commit obtained by the light client are used to bootstrap the state store and the block store accordingly

from cometbft.

cason avatar cason commented on June 1, 2024

I think i'm just trying to do the same thing without intrusive changes in the normal node start up logic

The ADR 104 proposal (#801) is not intrusive as well. It has the advantage of also considering the server side, namely, providing a command to allow "exporting" an application snapshot to a "file", that will be use to bootstrap another node.

from cometbft.

cason avatar cason commented on June 1, 2024

BTW, the second bullet in your proposal can also be implemented in app side, as demonstrated here

Yes, this is an alternative for the server side to "export" snapshots.

I am not sure that is the best one, as it requires every snapshot to be saved to disk, even if the snapshots are not shipped out-of-band to fresh clients. Notice that snapshots are already saved somewhere by the application, and this approach will create some data duplication, with the disadvantage that the application knows how and when to prune or delete old snapshots, feature that would not be trivially transferred to this file system "snapshot store".

from cometbft.

yihuang avatar yihuang commented on June 1, 2024

BTW, the second bullet in your proposal can also be implemented in app side, as demonstrated here

Yes, this is an alternative for the server side to "export" snapshots.

I am not sure that is the best one, as it requires every snapshot to be saved to disk, even if the snapshots are not shipped out-of-band to fresh clients. Notice that snapshots are already saved somewhere by the application, and this approach will create some data duplication, with the disadvantage that the application knows how and when to prune or delete old snapshots, feature that would not be trivially transferred to this file system "snapshot store".

in cosmos-sdk, it'll directly restore the state from snapshot without saving them, it only saves the snapshots created by itself, the new flag is to make it save the snapshot for the first restoration. cometbft do save the chunk to file in a temporary directory though, it's immediately removed after restoration finished.
the saved snapshot is to be used by another cli command to restore app state from it, then user use the bootstrap-state cli command to restore cometbft state, then a normal node start up.

from cometbft.

cason avatar cason commented on June 1, 2024

in cosmos-sdk

We have to also consider users that do not adopt the SDK. Of course, once we have a more detailed proposal, we would be happy to discuss specifics with the SDK team.

from cometbft.

cason avatar cason commented on June 1, 2024

Moreover, I would really appreciate your feedback on #801. It is in draft mode, but you can comment or propose changes there.

from cometbft.

yihuang avatar yihuang commented on June 1, 2024

Moreover, I would really appreciate your feedback on #801. It is in draft mode, but you can comment or propose changes there.

have left review comments there, I think we need more descriptions of the new design to decide if it's a better one.

from cometbft.

yihuang avatar yihuang commented on June 1, 2024

But I hope we can merge #803 to unblock app side solutions immediately, then we can take our time to find more ideal design, wdyt?

from cometbft.

adizere avatar adizere commented on June 1, 2024

But I hope we can merge #803 to unblock app side solutions immediately, then we can take our time to find more ideal design, wdyt?

#803 is ported to sdk directly, it turns out we can hack tendermint state directly, so the whole local snapshot restoration procedure can be done in app side alone.

Excellent. Thanks @yihuang for the update!

I don't fully understand the SDK-side solution, but it seems to handle snapshot restoration plus also saving the snapshot (cosmos/cosmos-sdk#16060). What exactly is left on Comet side to do?

from cometbft.

yihuang avatar yihuang commented on June 1, 2024

But I hope we can merge #803 to unblock app side solutions immediately, then we can take our time to find more ideal design, wdyt?

#803 is ported to sdk directly, it turns out we can hack tendermint state directly, so the whole local snapshot restoration procedure can be done in app side alone.

Excellent. Thanks @yihuang for the update!

I don't fully understand the SDK-side solution, but it seems to handle snapshot restoration plus also saving the snapshot (cosmos/cosmos-sdk#16060). What exactly is left on Comet side to do?

maybe we can do this command more elegantly in cometbft, after all, it use many internal APIs to do the job, those are maybe prone to change in the future.

from cometbft.

adizere avatar adizere commented on June 1, 2024

maybe we can do this command more elegantly in cometbft, after all, it use many internal APIs to do the job, those are maybe prone to change in the future.

Got it. This is a nice to have, right? Correct me if I'm wrong, it seems like on Comet side there's nothing left to do (except for "nice to have" in the future).

from cometbft.

adizere avatar adizere commented on June 1, 2024

yeah, nice to have.

Thanks @yihuang. We'll track that work here: #884.

I'm closing the present PR: The original scope and problem we wanted to address here shifted. In the future, we might implement ADR 104 (#801) as a solution for non-SDK users that want support for state sync out of a local snapshot. For the case of SDK users, their needs seem to be addressed.

from cometbft.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.