Comments (7)
Thank you for opening the issue.
I don't think this is the case for v17 only but for all versions using cosmos-sdk v47.
Does the node respond when you query `/status` or `/block`?
from gaia.
https://forum.cosmos.network/t/cosmos-hub-v17-1-chain-halt-post-mortem/13899
During the chain halt the node was still running, peering, and responding to `/status` and `/block`.
Maybe this issue should be raised in the cosmos-sdk repository.
from gaia.
I am pretty sure this is a feature of the upstream SDK, yea. Peer exchange continues despite a non-zero exit code (baseapp recovers from it to keep the instances alive), since it could recover in some exit 1 cases (not this one, but others).
from gaia.
> I am pretty sure this is a feature of the upstream SDK, yea. Peer exchange continues despite a non-zero exit code (baseapp recovers from it to keep the instances alive), since it could recover in some exit 1 cases (not this one, but others).

From a devops standpoint, I believe nodes that halt and cannot progress because of irrecoverable errors should exit with code 1. For example, if you run a container on either Docker or Kubernetes and the main app exits with an error, the platform can automatically attempt a restart and also surface error logs to the Docker/Kubernetes monitoring systems.
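To illustrate the restart behavior described above, here is a minimal sketch of a supervisor loop that restarts a child process only when it exits with a non-zero code, which is roughly what a Docker `on-failure` restart policy does. The helper name `run_with_restarts` and its `max_restarts` parameter are hypothetical and exist only for this illustration. A node that stays alive after a fatal error never reaches this path, because it never exits.

```python
import subprocess
import sys
from typing import List, Sequence


def run_with_restarts(command: Sequence[str], max_restarts: int = 3) -> List[int]:
    """Run `command`, restarting it after every non-zero exit.

    Returns the exit code of each attempt. Hypothetical helper: real
    orchestrators add backoff, logging, and alerting on top of this.
    """
    exit_codes: List[int] = []
    for _ in range(max_restarts + 1):
        code = subprocess.run(list(command)).returncode
        exit_codes.append(code)
        if code == 0:
            break  # clean shutdown, nothing to restart
    return exit_codes


if __name__ == "__main__":
    # Stand-in for a node binary that hits an unrecoverable error and exits 1.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_with_restarts(failing, max_restarts=2))
```

The supervisor only learns about the failure through the exit code, so a process that swallows the error and keeps running looks healthy to it.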
When you keep a node alive after errors that cannot be recovered from, there are a ton of funny edge cases you need to think about.
On a node that is in an error state but still connected to peers and responding to API calls, I have seen `catching_up = false` on status calls. Even if we monitor `latest_block_time`, how do we differentiate between a legitimate chain halt from an upgrade, where `latest_block_time` is stale, and a fatal error that crashes a node and continues to give a stale `latest_block_time`?
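The ambiguity raised above can be made concrete with a small sketch. It assumes the `sync_info` object of a `/status` response has already been parsed into a dict with `latest_block_height`, `latest_block_time`, and `catching_up`. The function names, the 60-second threshold, and the idea of comparing against a second reference node are assumptions for illustration, not something proposed in this thread. With one node, a stale `latest_block_time` gives the same verdict for a chain-wide halt and a local fatal error.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def parse_block_time(value: str) -> datetime:
    """Parse an RFC 3339 timestamp, dropping sub-second precision."""
    match = re.match(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}", value)
    if match is None:
        raise ValueError(f"unrecognized timestamp: {value!r}")
    parsed = datetime.strptime(match.group(0), "%Y-%m-%dT%H:%M:%S")
    # Assumption: the node reports UTC (a trailing "Z").
    return parsed.replace(tzinfo=timezone.utc)


def classify(
    local: dict,
    now: datetime,
    reference: Optional[dict] = None,
    max_age_seconds: float = 60.0,  # hypothetical threshold
) -> str:
    """Classify a node from its sync_info, optionally against a reference node."""
    age = (now - parse_block_time(local["latest_block_time"])).total_seconds()
    if age <= max_age_seconds:
        return "catching_up" if local["catching_up"] else "healthy"
    if reference is None:
        # Halt or local failure: a single node cannot tell them apart.
        return "stale"
    if int(reference["latest_block_height"]) > int(local["latest_block_height"]):
        return "likely_local_failure"  # the reference node moved on without us
    return "likely_chain_halt"  # the reference node is not ahead of us
```

Note that `catching_up` plays no role once the block time is stale, which matches the observation that a broken node can still report `catching_up = false`.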
Furthermore, is there really a point in staying connected to peers during a fatal error that crashes a node?
Surely more work is needed to distinguish errors that should immediately terminate the node, and trigger a cascade of warnings and errors in monitoring systems, from errors that are recoverable.
from gaia.
I agree with the points above.
Can you open this issue on cosmos-sdk? This is out of scope for a chain repository.
Please tag me on the new issue if you need further assistance from us.
from gaia.
How was the fatal halt conveyed here? AppHash or LastResultsHash mismatches? It seems the framework acted as intended.
from gaia.
Discussion continues on cosmos-sdk
Closing.
from gaia.