multiversx / mx-chain-go
⚡ The official implementation of the MultiversX blockchain protocol, written in golang.
Home Page: https://multiversx.com
License: GNU General Public License v3.0
I have 3 nodes running in a VPS (8 cores, 30gb ram and 800gb ssd)
In the last 3 days I saw that at some point some nodes went down. Attached is the fatal errors log file of the last one.
37b12118f12f_fatalErrors-2019-08-08-06-54-57.log
Ask me anything and I will upload what you need
I checked my balance on https://testnet.elrond.com/?fbclid=IwAR0oziqDY3Edk0D18Hfxp5AKTQ0PJBR8ov0iLhgs50M-mRD_bwowNL-RX1U#/address/32ef1f68e26a3036d7550ca81c30031ba2d207b731ab077ac912ff38a708fae8 and I've seen that some ages have negative values.
First-time users installing elrond-go from the instructions in README.md
will not have set their $GOPATH yet. Yet it is referenced in:
https://github.com/ElrondNetwork/elrond-go/blob/master/README.md
Solution could be something like:
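# set $GOPATH if not set and export to ~/.profile along with Go binary path
# note: [[ $GOPATH=="" ]] is a single non-empty word and therefore always true; -z tests for an empty value
if [[ -z "$GOPATH" ]]; then
  export GOPATH="$HOME/go"
fi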
I've checked the log and found these two non repeating errors, on shard 2:
elrond-go/consensus/spos/commonSubround/subroundBlock.go","level":"info","line_number":330,"msg":"canceled round 12613 in subround (BLOCK), wrong nonce in block\n","time":"2019-07-11T07:00:55+03:00"}
elrond-go/process/block/baseProcess.go","level":"info","line_number":115,"msg":"nonce not match: local block nonce is 11945 and node received block with nonce 11947\n","time":"2019-07-11T07:00:55+03:00"}
Full log:
The node consumes too much RAM, and its memory usage keeps increasing steadily.
So far I have had to restart the nodes 2 - 3 times!
Started a fresh observer and left it syncing, but it never finishes because it's syncing to a block that doesn't exist yet.
As can be seen in this first screenshot, it's trying to sync to block 22757 while 214?? is the highest final block synced.
After ctrl+c and restarting ./node it immediately comes back as synchronised. As can be seen in this second screenshot, taken shortly after, it's now synced on block 21496, significantly lower than the block it was previously trying to reach.
It seems it was originally trying to sync to "cross check block height: meta", which is ahead.
Half an hour later the observer is still reporting it's in sync, on block 21746, still lower than what it was trying to sync to. (See third screenshot.)
(Server time is UTC, as noted in the tmux right lower corner.)
ubuntu server 18.04, shard 2, code from branch above, started node with reset storage:
The log contains two sets of errors, the first with "too many open files" stopped after running "ulimit -n 65535" and restarting.
I believe the second error started after this, "time is out" with 104k transactions in the pool
edit:
I've updated the node to commit 8005ec and I get 'time is out' as well, but tx pool is 0 now
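Since the "too many open files" errors stopped once the limit was raised, a node could check the limit itself at startup. The following is only a sketch of that idea, not code from elrond-go: the helper names are hypothetical, treating 65535 as a minimum is an assumption taken from the ulimit value above, and the syscall calls shown are available on Linux.

```go
package main

import (
	"fmt"
	"syscall"
)

// wantedOpenFiles mirrors the "ulimit -n 65535" value from the report above;
// treating it as a required minimum is an assumption of this sketch.
const wantedOpenFiles = 65535

// openFileLimit returns the soft and hard RLIMIT_NOFILE values of this process.
func openFileLimit() (soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	return uint64(lim.Cur), uint64(lim.Max), nil
}

// tooLow reports whether the soft limit is below the assumed minimum.
func tooLow(soft uint64) bool {
	return soft < wantedOpenFiles
}

func main() {
	soft, hard, err := openFileLimit()
	if err != nil {
		fmt.Println("could not read RLIMIT_NOFILE:", err)
		return
	}
	fmt.Printf("open files limit: soft=%d hard=%d\n", soft, hard)
	if tooLow(soft) {
		fmt.Printf("warning: soft limit is below %d, \"too many open files\" errors are possible\n", wantedOpenFiles)
	}
}
```

A warning like this at startup would point operators at the ulimit setting before the errors show up in the log.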
A fork was detected at a certain nonce and hash, and the node started the rollback mechanism to a previous header (by nonce and hash), after which it ended the fork choice. It then requested the same header, with the same hash, as the one where the fork was detected, and went into a loop: rollback, request, fork detected, rollback, request, fork detected, and so on.
Logs: http://s000.tinyupload.com/?file_id=22154264996387362718
I'd like to suggest printing the time a node process has been running in the output; this should make troubleshooting (for others) easier. That way we can tell whether someone successfully restarted their node or not.
Block finality: block N is final only if block N-1, block N-2 … block N-K are signed. The metachain only notarizes final blocks. Currently we have chosen K = 1.
You can read more about the K-finality here
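To make the rule concrete, here is a small sketch that takes the statement exactly as worded above (blocks N-1 through N-K must be signed). The `signed` lookup and the function name are hypothetical, and returning false when fewer than K earlier blocks exist is an assumption of this sketch rather than documented behaviour.

```go
package main

import "fmt"

// isFinal reports whether block n is final under the rule as worded above:
// blocks n-1, n-2, ..., n-k must all be signed.
// signed is a hypothetical lookup of which nonces carry a valid signature.
func isFinal(n, k uint64, signed map[uint64]bool) bool {
	if n < k {
		// Assumption: with fewer than k earlier blocks the check cannot pass.
		return false
	}
	for i := uint64(1); i <= k; i++ {
		if !signed[n-i] {
			return false
		}
	}
	return true
}

func main() {
	signed := map[uint64]bool{10: true, 11: true}
	fmt.Println(isFinal(12, 1, signed)) // checks block 11 only
	fmt.Println(isFinal(12, 3, signed)) // also needs blocks 10 and 9
}
```

With K = 1, as currently chosen, the check reduces to looking at a single neighbouring block.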
When a bootstrap from storage starts, it should feed Elastic again from the first block if the node is launched with a specific flag for this purpose.
Change the log level of the "Insufficient funds" message printed during transaction validation from info to debug.
Join the validator community chat on Riot: https://riot.im/app/#/room/#elrond:matrix.org
Observer connected to shard 2, using the code from branch "elrond-go-issue-248-Wrong-nonce-in-block".
This error is repeating, and it did not happen when I was running the observer on shard 0, same node version:
Total txs in pool: 566
header in metablock is not final
header in metablock is not final
requested header with nonce 20646 from network
received requested header with nonce 20646 from network and probable highest nonce is 49259
I'm using this repo to run elrond-node in docker:
https://github.com/mrz1703/elrond-node
My current test machine is a Raspberry Pi 4 with a 64-bit kernel (aarch64). When launching the first "docker-compose build" I get this error:
../../core/logger/redirectStderrLinux.go:12:9: undefined: syscall.Dup2
As stated on another github issue, dup2 is not supported on armv8:
rfjakob/gocryptfs#121
The file at "master/core/logger/redirectStderrLinux.go" should have a case for the arm64 kernel that uses dup3 instead.
When playing CryptoBubbles.io, I noticed that the start time was delayed for up to 6 seconds, after having waited for 15 seconds after the previous round.
I guess this is due to a higher latency in the CB testnet? I also noticed on the CB testnet Explorer that the most recent transactions were sometimes visible after more than 20 seconds.
Any clues about this?
Edit: Latency now 7 seconds in the CB Testnet Explorer, so today, no extra delays! But... also just a small amount of CB players. ;)
I've connected remotely to someone who was having issues setting up the node on Linux.
I recommended the _golang.sh script from elrond-go-scripts, but there's a problem: it downloads go1.13, which is not compatible with v1.0.15, so the build fails at compile time.
In the end I had to remove go1.13 and install go1.12.9, and the problem went away.
In Windows, if you run the node with the parameter --storage-cleanup, you get the following error:
"error creating trie: error creating accountsTrieStorage: mkdir 8c1f16875103: Access is denied."
if the folder 8c1f16875103 already exists.
Problem:
ubuntu@ubuntu:~/go/src/github.com/ElrondNetwork/elrond-go/cmd/node$ go build github.com/ElrondNetwork/elrond-go/core/logger
../../core/logger/redirectStderrLinux.go:12:9: undefined: syscall.Dup2
Solution: changes in the redirectStderrLinux.go
// +build linux darwin

package logger

import (
	"os"
	"syscall"
)

// redirectStderr redirects the output of the stderr to the file passed in
func redirectStderr(f *os.File) error {
	err := syscall.Dup2(int(f.Fd()), int(os.Stderr.Fd()))
	if err != nil {
		return err
	}

	return nil
}
Fix: new version of redirectStderrLinux.go
// +build linux

package logger

import (
	"os"
	"syscall"
)

// redirectStderr redirects the output of the stderr to the file passed in.
// syscall.Dup3 is Linux-only, so darwin would need its own Dup2-based file.
func redirectStderr(f *os.File) error {
	// dup3 accepts only 0 or O_CLOEXEC as flags; file-open flags such as
	// O_CREATE|O_APPEND|O_WRONLY are rejected with EINVAL.
	err := syscall.Dup3(int(f.Fd()), int(os.Stderr.Fd()), 0)
	if err != nil {
		return err
	}

	return nil
}
pull request will follow shortly
Ulimit is unlimited.
Experiencing time is out, forking and/or no/slow syncing issues this run (via Docker on Ubuntu 18.04 LTS). Shard 4 logs will encompass the new node after the patch fix.
shard1.zip
shard2.zip
shard2b.zip
shard3.zip
shard3b.zip
shard4.zip
shard5.zip
It appears that the metachain misses rounds if there is a high load on the shard chains. This might be caused by the fact that finality shard headers are requested by metachain validators in shards and the responses take too long.
After 24 hours of running, my validator node has died with this message, not contained in the log file:
2019-07-04 22:04:06.355331617 Step 2: signature has been sent
panic: runtime error: slice bounds out of range
goroutine 24130 [running]:
github.com/libp2p/go-msgio.(*reader).ReadMsg(0xc0056c39f0, 0x0, 0x0, 0x0, 0x0, 0x0)
C:/Users/Sorin/go/pkg/mod/github.com/libp2p/[email protected]/msgio.go:216 +0x25f
github.com/libp2p/go-libp2p-secio.(*etmReader).fill(0xc00645cd20, 0x634, 0x0)
C:/Users/Sorin/go/pkg/mod/github.com/libp2p/[email protected]/rw.go:132 +0x3f
github.com/libp2p/go-libp2p-secio.(*etmReader).Read(0xc00645cd20, 0xc005fd9c20, 0xc, 0xc, 0x0, 0x0, 0x0)
C:/Users/Sorin/go/pkg/mod/github.com/libp2p/[email protected]/rw.go:171 +0x264
io.ReadAtLeast(0x2c919c8, 0xc00034d440, 0xc005fd9c20, 0xc, 0xc, 0xc, 0xc, 0x0, 0x0)
D:/Go/src/io/io.go:310 +0x8f
io.ReadFull(...)
D:/Go/src/io/io.go:329
github.com/libp2p/go-yamux.(*Session).recvLoop(0xc004a51860, 0x0, 0x0)
C:/Users/Sorin/go/pkg/mod/github.com/libp2p/[email protected]/session.go:499 +0xdb
github.com/libp2p/go-yamux.(*Session).recv(0xc004a51860)
C:/Users/Sorin/go/pkg/mod/github.com/libp2p/[email protected]/session.go:478 +0x32
created by github.com/libp2p/go-yamux.newSession
C:/Users/Sorin/go/pkg/mod/github.com/libp2p/[email protected]/session.go:128 +0x38f
Full log here:
I'd like to suggest that if NodeDisplayName isn't set, the node gets a random name assigned, rather than everyone showing up as "N/A". (It can be random characters, based on, say, the MAC address or public key, taken from a word list, etc.) This way we can help those who struggle to set it (and/or don't want to interrupt their uptime) find themselves on the validator list by running:
`cat $GOPATH/src/github.com/ElrondNetwork/elrond-go-node-*/config/config.toml | grep NodeDisplayName`
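One possible shape for such a fallback, sketched under the assumption that the name is derived from the node's public key (one of the options mentioned above). The function name, the "node-" prefix and the 8-character length are hypothetical choices for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// displayName returns the configured name, or a fallback derived from the
// node's hex-encoded public key when the name is empty or "N/A".
func displayName(configured, pubKeyHex string) string {
	name := strings.TrimSpace(configured)
	if name != "" && name != "N/A" {
		return name
	}

	const prefixLen = 8 // illustrative length, long enough to tell nodes apart
	if len(pubKeyHex) < prefixLen {
		return "node-" + pubKeyHex
	}
	return "node-" + pubKeyHex[:prefixLen]
}

func main() {
	pubKey := "32854f7742a70f06b9a87ff0b62b1795"
	fmt.Println(displayName("", pubKey))
	fmt.Println(displayName("my-validator", pubKey))
}
```

Deriving the name from the public key keeps it stable across restarts, so operators could look themselves up by the first characters of a key they already know.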
"Time is out" errors keep happening endlessly after upgrading to v1.07. They happen whether or not the db is wiped. Logs:
Hey guys,
My v1.0.16 testnet node was running smoothly until I changed the name of the folder elrond-go to elrond-go-16. Then I ran node.exe again and the firewall (Windows Defender) asked me whether I should give it permission, which I answered "no" this time.
Running node.exe again, I got the Invalid Key error.
I ran node.exe again with the -use-log-view parameter, and the last lines before the app exited were:
No views for current node
No AppStatusHandler used. Started with NilStatusHandler
could not create local data store: file missing [file=MANIFEST-000000]
Then I installed v1.0.17, built it and tried to reproduce this issue. The result is different, as the Invalid Key message appears only for a second, after which the app resumes normal operation, so it seems the issue is partially fixed.
One second later ...
Tried to reproduce it with v1.0.16 built from scratch and I get the same behaviour as in v1.0.17.
The only difference is that I disabled the AntiVirus (Avast).
After re-enabling it and trying to reproduce the issue again, node.exe crashed with the messages:
No AppStatusHandler used. Started with NilStatusHandler
closing the leveldb process loop
closing the timed batch handler
closing the timed batch handler
closing the leveldb process loop
closing the timed batch handler
closing the leveldb process loop
closing the timed batch handler
closing the leveldb process loop
could not create local data store: file missing [file=MANIFEST-000000]
The conclusion is that the issue is in fact caused by the antivirus running node.exe in a sandbox for a few seconds.
Overnight there was an unexpected Internet outage. After the restart, the node sync got stuck on block 7695-7696.
Two nodes run on my machine and both have synchronization problems. One runs on Windows 10, the second in VirtualBox on Windows 7 (I wanted to test it). Each node has its own keys. One node is on shard 4, the second is on "meta".
I decided to set the node up again from the very first step by deleting the "go" folder and creating it again. I re-created the node following the manual on the site and ran it again, but that didn't solve my problem. On the new synchronization I also got stuck on block 7696.
How can I solve the problem? My nickname in Riot: mystic999
logs and stats :
Node.zip
Windows 10 keeps shutting down my nodes
04449deb2b39_fatalErrors-2019-10-10-09-28-27.log
Hi,
On Windows, on a Surface Pro 6 laptop (Core i7 with 16 GB of RAM), I get this error:
Starting as observer node...
Starting in shard: 0
closing the leveldb process loop
closing the timed batch handler
could not create local data store: not supported cache type
Just saw that one of multiple nodes on a VPS was suddenly offline.
So I checked and the node was still running, it just lost connection to the network somehow! Very strange. I stopped and restarted it and now it's synced again and connected to 332 peers and counting.
It was on shard 3. The machine has multiple nodes running on different REST-API ports, this one was on 8080.
Any idea what could've happened?
I cannot synchronize and create blocks
092823.log
https://matrix.org/_matrix/media/r0/download/matrix.org/LAkYDPPMgvVVSAwDjOXEWyXJ
After curling http://localhost:8080/node/stop I get an "ok", which is expected.
However, when I call it again right after, I still get an "ok" instead of the errors specified in api/node/routes.go (lines 110 - 119):
if !ef.IsNodeRunning() {
	c.JSON(http.StatusOK, gin.H{"message": errors.ErrNodeAlreadyStopped.Error()})
	return
}
err := ef.StopNode()
if err != nil && ef.IsNodeRunning() {
	c.JSON(http.StatusInternalServerError, gin.H{"error": fmt.Sprintf("%s: %s", errors.ErrCouldNotStopNode.Error(), err.Error())})
	return
}
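The behaviour the reporter expected can be illustrated with a standalone sketch using only the standard library. This is not the gin handler from api/node/routes.go: the `node` type, `stopHandler`, `callStop` and the error text are hypothetical stand-ins. The point is that the second call should report that the node is already stopped.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// errNodeAlreadyStopped is a hypothetical stand-in for errors.ErrNodeAlreadyStopped.
var errNodeAlreadyStopped = errors.New("node already stopped")

// node is a minimal stand-in that only tracks whether it is running.
type node struct {
	mu      sync.Mutex
	running bool
}

// stop clears the running flag and reports whether the node was running before.
func (n *node) stop() bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	wasRunning := n.running
	n.running = false
	return wasRunning
}

// stopHandler answers "ok" on the first stop and the error text afterwards.
func stopHandler(n *node) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		msg := "ok"
		if !n.stop() {
			msg = errNodeAlreadyStopped.Error()
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(map[string]string{"message": msg})
	}
}

// callStop sends one request to the handler and returns the decoded message.
func callStop(h http.Handler) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/node/stop", nil))
	var body map[string]string
	_ = json.Unmarshal(rec.Body.Bytes(), &body)
	return body["message"]
}

func main() {
	h := stopHandler(&node{running: true})
	fmt.Println(callStop(h)) // first call
	fmt.Println(callStop(h)) // second call
}
```

If the real endpoint keeps answering "ok", one thing to check is whether IsNodeRunning actually turns false after a successful StopNode.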
It would be nice to have such a feature !
After fixing the compilation problem for the arm64 architecture, once ./node is started, the following error occurs:
`ubuntu@ubuntu:~/go/src/github.com/ElrondNetwork/elrond-go/cmd/node$ ./node -use-log-view
Starting node with version undefined/go1.12.9/linux-arm64
Process ID: 741
/home/ubuntu/go/src/github.com/ElrondNetwork/elrond-go/cmd/node/config/config.toml
Initialized with config from: ./config/config.toml
/home/ubuntu/go/src/github.com/ElrondNetwork/elrond-go/cmd/node/config/p2p.toml
Initialized with p2p config from: ./config/p2p.toml
/home/ubuntu/go/src/github.com/ElrondNetwork/elrond-go/cmd/node/config/genesis.json
Initialized with genesis config from: ./config/genesis.json
/home/ubuntu/go/src/github.com/ElrondNetwork/elrond-go/cmd/node/config/nodesSetup.json
Initialized with nodes config from: ./config/nodesSetup.json
NTP average clock offset: 0s
Start time formatted: Fri Sep 13 17:50:00 UTC 2019
Start time in seconds: 1568397000
/home/ubuntu/go/src/github.com/ElrondNetwork/elrond-go/cmd/node/config/initialNodesSk.pem
Starting with public key: 32854f7742a70f06b9a87ff0b62b1795a93418b65af3b2e557d2cd1ed5c0876f6ac341bdf39f1092771a7486ec03d905dd10036a4193a631c6e8a0d8958fab6334a0d6a97e2ffc3de456d8ddc52542445ed75c17944aedf2040ab22b18d8e7eb0cb9030ed38aeca03dc1131680fe0d89e1127c89e39a50167f716ed388dfacd8
Starting as observer node...
Starting in shard: 0
/home/ubuntu/go/src/github.com/ElrondNetwork/elrond-go/cmd/node/config/server.toml
No views for current node
No AppStatusHandler used. Started with NilStatusHandler
creatingShardDataPool from config
invalid public key string`
Raspberry Pi 4 - Ubuntu server 18.04 (arm64) + go1.12.9 linux/arm64
Not sure if this is a Docker or Elrond issue, but when running the node with --use-log-view in a Docker container and attaching to it, the ^P^Q detach command does not work.
After several attempts at running node.exe, I was finally able to connect to the server again. I had just tried the new update.bat script,
but the problem is not in there: when I try to connect manually again, the server.toml issue comes back and prevents the node from connecting to the server. After several tries I finally got connected again.
Then I tried to run the additional node, and the issue came back. This time I copied just one file, server.toml, from cmd/node/config and overwrote the one in the Node1/config/ directory. That solved the issue for now, but I think it will remain until someone fixes it.
I found this issue on Windows (I use Windows 7 SP1); I don't know how it behaves on other operating systems.
Thank you for reading my report
Version 1.0.24 uses a lot more RAM on Ubuntu than the previous version, 1.0.23.
The Rest API port seems to be hardcoded in main.go with the name "rest-api-port", and requires rebuilding the app after it's changed. Please expose it either in config.toml or as an argument when starting the node over CLI.
Thanks
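A rough sketch of the requested CLI option, using Go's standard flag package rather than whatever CLI library the node actually uses. The flag name is the one quoted above and 8080 is the port mentioned in other reports here; the function name and the range check are assumptions for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseRestAPIPort reads the REST API port from CLI arguments, defaulting to 8080.
func parseRestAPIPort(args []string) (int, error) {
	fs := flag.NewFlagSet("node", flag.ContinueOnError)
	port := fs.Int("rest-api-port", 8080, "port on which the REST API listens")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *port < 1 || *port > 65535 {
		return 0, fmt.Errorf("invalid rest-api-port %d", *port)
	}
	return *port, nil
}

func main() {
	port, err := parseRestAPIPort(os.Args[1:])
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	fmt.Printf("REST API would listen on localhost:%d\n", port)
}
```

Started as `./node -rest-api-port 8081`, a second node on the same machine could then use its own port without a rebuild.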
The node just died on Windows, with no error message on screen:
log with details:
After processing about 15m - 20m txs under constant high network load on a dual-core machine with 4 GB of RAM, the node prints "out of memory" on the node console. After that, the instance keeps running but no further execution takes place.