nim-eth's People

Contributors

adamspitz, arnetheduck, cammellos, cheatfate, clyybber, corpetty, cortze, decanus, etan-status, ivansete-status, jangko, jlokier, jtraglia, kaiserd, kdeme, konradstaniec, mjfh, mratsim, narimiran, nolash, onqtam, oskarth, stefantalpalaru, swader, tersec, web3-developer, yglukhov, zah


nim-eth's Issues

Transports not closed on disconnect() or when connection is dropped

As discussed on Gitter, there are some transport leaks currently. Creating this issue so that there is some traceability and so you know I'm looking into it.

Basically, on every disconnect(), unless it is specifically followed by a peer.transport.close() (which never happens?), we leak transports.
The transport is also not closed on a dropped connection (https://github.com/status-im/nim-eth/blob/master/eth/p2p/rlpx.nim#L518).

Transport does get closed when a disconnect message is received: https://github.com/status-im/nim-eth/blob/master/eth/p2p/rlpx.nim#L526
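A minimal sketch of the shape a fix could take, assuming peer.transport exposes the chronos StreamTransport used by rlpx (the wrapper proc is hypothetical):

proc disconnectAndClose(peer: Peer, reason: DisconnectionReason) {.async.} =
  # send the disconnect message first, then always release the transport
  await peer.disconnect(reason)
  if not peer.transport.closed():
    peer.transport.close()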

Either mark actually-used variables as such or remove them

Or otherwise ameliorate these (one option is sketched after the list):

nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_backends_helpers.nim(33, 6) Hint: 'initProtocolStates' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_backends_helpers.nim(41, 6) Hint: 'resolveFuture' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_backends_helpers.nim(46, 6) Hint: 'requestResolver' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_backends_helpers.nim(78, 6) Hint: 'linkSendFailureToReqFuture' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_backends_helpers.nim(93, 6) Hint: 'handshakeImpl' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_protocol_dsl.nim(288, 15) Hint: 'CurrentProtocol' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_protocol_dsl.nim(292, 17) Hint: 'perProtocolMsgId' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_protocol_dsl.nim(299, 16) Hint: 'state' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_protocol_dsl.nim(304, 16) Hint: 'networkState' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_protocol_dsl.nim(593, 20) template/generic instantiation of `makeEth2Request` from here
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_tracing.nim(87, 12) Hint: 'initTracing' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_tracing.nim(89, 12) Hint: 'logSentMsg' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_tracing.nim(90, 12) Hint: 'logReceivedMsg' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_tracing.nim(91, 12) Hint: 'logConnectedPeer' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_tracing.nim(92, 12) Hint: 'logAcceptedPeer' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_tracing.nim(93, 12) Hint: 'logDisconnectedPeer' is declared but not used [XDeclaredButNotUsed]
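For symbols that are only referenced from macro-generated or conditionally compiled code, Nim's {.used.} pragma is the non-destructive option; it marks a declaration as intentionally kept:

proc initTracing() {.used.} =
  # body elided; the pragma alone suppresses XDeclaredButNotUsed
  discard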

Unintuitive error in `p2pProtocol` macro

Embarking on my first nim-devp2p hello-world adventure, I was stumped for a while by the following situation:

import
  eth/p2p, chronos, chronicles, chronos/timer

const
  bzzVersion = 42

p2pProtocol Bzz(version = bzzVersion):

  proc hellooo(peer: Peer, foo: string) =
    echo foo

  onPeerConnected do (peer: Peer):
    warn "conn"
    if peer.supports(Bzz):
      warn "bzz"
      waitFor peer.hellooo("bar")
      waitFor sleepAsync(1000)

Compiling this returned the following error:

/home/lash/src/status/nim-eth/eth/rlp.nim(377, 18) Error: type mismatch: got <Rlp, type string>
but expected one of: 
proc read[T](future: Future[T] | FutureVar[T]): T
  first type mismatch at position: 1
  required type: Future[read.T] or FutureVar[read.T]
  but expression 'rlp' is of type: Rlp
proc read(rstream: AsyncStreamReader; n = 0): Future[seq[byte]]
  first type mismatch at position: 1
  required type: AsyncStreamReader
  but expression 'rlp' is of type: Rlp
proc read(transp: StreamTransport; n = -1): Future[seq[byte]]
  first type mismatch at position: 1
  required type: StreamTransport
  but expression 'rlp' is of type: Rlp

expression: read(rlp, type(result.foo))

It turns out that p2pProtocol is a macro, and it inserts code that refers to the rlp namespace. Adding eth/rlp to the imports solved the error.

The error context was not easy to understand, though, so maybe it could be improved.
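For reference, the import list that compiles:

import
  eth/p2p, eth/rlp,   # eth/rlp is needed because the p2pProtocol macro
                      # generates calls such as read(rlp, ...) into it
  chronos, chronicles, chronos/timer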

Database backends not exception safe

All implementations of TrieDatabaseRef raise exceptions but have exception safety issues - in particular, resources will not be released / aborted / committed correctly, leading to deadlocks / data loss.

Examples where a raise will leak or break - the code in general needs a full review; these are just samples (a safe pattern is sketched after the list):

  • raiseKeyWriteError(key) (txn not released)
  • result.deleteStmt = prepare "DELETE FROM trie_nodes WHERE key = ?;" (previous prepared statement not released)
  • https://github.com/status-im/nim-rocksdb/blob/08fec021c0f28f63d1221d40a655078b5b923d1b/rocksdb.nim#L130 (leaks a partially opened db)
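A sketch of the exception-safe shape these call sites need, with a hypothetical transaction API standing in for the real backends:

proc safePut(db: TrieDatabaseRef, key, value: openArray[byte]) =
  var txn = db.beginTransaction()   # hypothetical helper, for illustration
  defer: txn.release()              # runs even when the write raises
  if not txn.write(key, value):
    raiseKeyWriteError(key)
  txn.commit()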

uninitialised field in const object

The Nim VM fails to zero an object field in a const assignment involving a proc call.

Nim code:

proc initNibbleRange*(bytes: ByteRange): NibblesRange =
  result.bytes = bytes
  result.ibegin = 0
  result.iend = bytes.len * 2

const zeroNibblesRange* = initNibbleRange(zeroBytesRange)

Now this constant is imported and used in another file. This is how the generated C looks for the latter:

NIM_CONST tyObject_NibblesRange_NTijGP6lMbjliuMIEehBLw TM_9aIpn29c4nFUFIFQM9b2UMu5w_4 = {{NIM_NIL, NIM_NIL, ((NI) 0)}, ((NI) 0), ((NI) IL64(279856062208720))};

We suddenly have uninitialised memory in the iend field instead of 0.

This is a hard-to-replicate bug. I tried copying the relevant Nim code into the same file or into separate files (in case it's an import issue) to make a stand-alone test case, without any luck.

The best test case I can come up with needs nim-eth:

import eth/trie/[trie_defs, nibbles]

echo zeroBytesRange.len
echo zeroNibblesRange.len
let nr = initNibbleRange(ByteRange())
echo nr.len

Example output (the non-zero value changes with each compilation):

0
280895550172144
0

The obvious workaround is to replace the const with a let, but that won't solve the underlying VM bug.

duplicate connections when incoming vs outgoing connect

It's possible to have multiple connections with the same Peer:

Easy scenario:

  1. Peer A connects to peer B (outgoing connection from Peer A): connection successful
  2. Peer B connects to peer A (incoming connection to Peer B): the connection is set up, but a disconnect is done due to the duplicate connection, and the peer is removed from peer_pool.connectedNodes.
  3. Peer A remains connected to B (connection from step 1)
  4. Peer A connects again to peer B (outgoing connection from Peer A): the connection is set up, as peer B is no longer in connectedNodes

I think the same is possible with an incoming connection for Peer A first and an outgoing one next. It is just more difficult to achieve, as there is an initial check here: https://github.com/status-im/nim-eth/blob/master/eth/p2p/peer_pool.nim#L62
However, due to the async behaviour of rlpxConnect and rlpxAccept, I think this could still occur the other way around.

To resolve this the following could be done:
Do not let the disconnect call remove the peer from the connectedNodes in the peer pool in case DisconnectionReason is AlreadyConnected.
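A minimal sketch of that option (helper names assumed from the surrounding code):

proc disconnect(peer: Peer, reason: DisconnectionReason) {.async.} =
  await peer.sendDisconnectMsg(reason)   # name assumed
  if reason != AlreadyConnected:
    # the original, still-live connection keeps its entry in the pool
    peer.network.peerPool.removePeer(peer)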

Other options are to check within rlpxAccept and rlpxConnect themselves whether the node is already in connectedNodes or connectingNodes. By the way, in the case of an incoming connection, connectingNodes is never set; it is true, however, that in this case the actual node (pub key) is known much later. Also, even when moving the check to rlpxAccept and rlpxConnect, there is still a chance that the problem will occur (due to the last await), unless the connectedNodes add is moved right after the check.

So I guess the question basically is, in the case of a duplicate connection: do we allow the full connection (including all sub-protocol connection handlers) to happen and then disconnect afterwards (including running all sub-protocol disconnection handlers)? Or do we just stop rlpxAccept or rlpxConnect the moment we know it is a duplicate peer (e.g. before postHelloSteps)? The latter would be the more intrusive change.

Thoughts?

Run discovery DHT behind NAT

This ticket describes issues with the current code when running behind a NAT:

  • node A runs on port 50000 on network X.0
  • node B runs on port 50001 on network X.0
  • node C runs on port 50002 on network Y.0
  • to node C, node A runs on network X.1, behind a NAT

A and B connect to each other on the local network - they will have each other's X.0 addresses in the DHT. Now C connects to A because it had a bootstrap address with X.1:50000, but it is unable to connect to node B, because there is no DHT entry with X.1 - A only sees B as X.0:50001.



                                 FIREWALL
                                 +----+
                                 |    |
                                 |    |       +-----------------+
                                 |    |       |                 |
                                 |    |       |                 |
                                 |    |       |      A          |
                                 |    +-------+                 |
                                 |    |       |                 |
                                 |    |       |                 |
                                 |    |       |     X.0:50000   |
        +-----------+            |    |       +-----------------+
        |           |            |    |                |
        |           |            |    |                |
        |           |            |    |                |
        |   C       +------------+X.1 |                |
        |           |            |    |                |
        |           |            |    |                |
        |           |            |    |                |
        +-----------+            |    |                |
                                 |    |                |
                                 |    |       +-----------------+
                                 |    |       |     X.0:50001   |
                                 |    |       |                 |
                                 |    |       |       B         |
                                 |    +-------+                 |
                                 |    |       |                 |
                                 |    |       |                 |
                                 |    |       |                 |
                                 |    |       |                 |
                                 |    |       +-----------------+
                                 |    |
                                 +----+


Add metrics to Whisper and peer pool

When running a Whisper node it could be useful to have some metrics, such as:

  • number of peers
  • number of envelopes
  • number of dropped envelopes
    • because expired
    • timestamped in the future
    • insufficient PoW
    • otherwise invalid?

Perhaps also metrics to compare all incoming messages versus messages specifically requested through filters.
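A hedged sketch of what this could look like with status-im/nim-metrics (assuming its declareGauge/declareCounter templates):

import metrics

declareGauge whisper_connected_peers, "number of connected peers"
declareCounter whisper_envelopes_received, "number of envelopes received"
declareCounter whisper_envelopes_dropped, "number of dropped envelopes"

# e.g. when dropping an expired envelope:
# whisper_envelopes_dropped.inc()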

Prevent silent failures in RLP serialization

Right now, the default serializer will swallow any type that's passed to RLP - this makes serialization easy to break by simply forgetting an import, as can be seen in status-im/nimbus-eth2#144.

Instead of having an overload that silently swallows mistakes, make enabling the default serializer explicit. Consider both intrusive (pragma) and non-intrusive (for 3rd-party types) options.

Improve handling of dropped connection before end of postHelloSteps

See also #71

Somewhat related to #63

Basically, the check/disconnect could also be moved to dispatchMessages. But then the peer will still briefly be added to the peer_pool.

The disconnect should not be fully done; no removePeer is needed.
In general, removePeer could be moved to dispatchMessages (or out of rlpx).

Limit (incoming) connecting peers.

Currently in our implementation, the maximum-connected-peers limit is only applied when creating new outgoing connections.

The listener for incoming connections remains active. This poses a serious risk and should be handled by either:

  1. Being the very friendly peer: allowing the handshake to start and then sending a disconnect message with TooManyPeers as the DisconnectionReason
  2. Ignoring any incoming connections and just disconnecting them without any message.
  3. Not allowing the incoming connection at all?

I'm guessing that, going from 1. down, the possibility of being seen as a "bad" peer goes up. Practically, however, I'm not sure if any client actually cares.
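A sketch of option 1 (field and helper names assumed):

proc onIncomingPeer(node: EthereumNode, peer: Peer) {.async.} =
  if node.peerPool.connectedNodes.len >= node.peerPool.maxPeers:
    # finish the handshake politely, then say goodbye with a reason
    await peer.disconnect(TooManyPeers, notifyOtherPeer = true)
  else:
    node.peerPool.addPeer(peer)   # hypothetical helper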

"Broken pipe" or "Transport is already closed" error on SendMsg results in crash

At https://github.com/status-im/nim-eth/blob/master/eth/p2p/rlpx.nim#L283

The exception already gets caught there but is re-raised, so a crash can still occur if it is not properly handled higher up.
E.g.

Async traceback:
  nimbus.nim(147)        nimbus
  nimbus.nim(126)        process
  asyncloop.nim(266)     poll
  asyncmacro2.nim(36)    sendMsg_continue
  rlpx.nim(283)          sendMsgIter
  asyncfutures2.nim(377) read
  #[
    nimbus.nim(147)        nimbus
    nimbus.nim(126)        process
    asyncloop.nim(266)     poll
    asyncmacro2.nim(39)    sendMsg_continue
    excpt.nim(138)         sendMsgIter
  ]#
  #[
    nimbus.nim(147)        nimbus
    nimbus.nim(126)        process
    asyncloop.nim(266)     poll
    asyncmacro2.nim(36)    sendMsg_continue
    excpt.nim(138)         sendMsgIter
  ]#
Exception message: (32) Broken pipe
Exception type: [Exception]

We could probably just not re-raise there, but that perhaps has some side-effects (other possible error cases?).
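One possible shape, if the re-raise turns out to be unnecessary (TransportError is chronos' transport exception base; sendMsg as referenced above):

proc trySendMsg(peer: Peer, data: seq[byte]): Future[bool] {.async.} =
  try:
    await peer.sendMsg(data)
    return true
  except TransportError:
    # "Broken pipe" / "Transport is already closed" end up here; the
    # connection is gone anyway, so report failure instead of crashing
    return false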

rlpx.nim Future double completion

@zah - now that better stack traces are available:

DBG 2019-04-17 15:02:32+02:00 GossipSub message not delivered to Peer tid=23109 peer=Node[<hidden ip address>:9100]
Error: unhandled exception: An attempt was made to complete a Future more than once.
  Details: Future ID: 129240
  Creation location:
    ../../nim-eth/eth/p2p/p2p_backends_helpers.nim(65) [unspecified]
  First completion location:
    ../../nim-eth/eth/p2p/rlpx.nim(183) [unspecified]
  Second completion location:
    ../../nim-eth/eth/p2p/rlpx.nim(270) [unspecified]
  Stack trace to moment of creation:
    No stack traceback available
  Stack trace to moment of secondary completion:
    No stack traceback available
[FutureError]

Setting RLPx subprotocol version to 0 crashes

Problem

If I set wakuVersion (or whisperVersion) to 0 like this: https://github.com/oskarth/nim-eth/blob/waku-zero/eth/p2p/rlpx_protocols/waku_protocol.nim#L56-L57

I get the following crash:

[Suite] Waku connections
INF 2019-11-15 17:01:52+08:00 RLPx listener up                           tid=10758 self=enode://23cb2baaab4231bc85271e540c6161549199b78c7f66c2997ca7f0722650a8fcf5e005ff8b34043959bd5e8e0c27a1b868c8a870282a0f8bc18d5c1617084225@127.0.0.1:30304
Traceback (most recent call last)
/home/oskarth/git/nim-eth/tests/p2p/test_waku_connect.nim(26) test_waku_connect
/home/oskarth/.nimble/pkgs/chronos-2.3.2/chronos/asyncloop.nim(933) waitFor
/home/oskarth/.nimble/pkgs/chronos-2.3.2/chronos/asyncloop.nim(274) poll
/home/oskarth/git/nim-eth/eth/p2p/rlpx.nim(1029) rlpxConnect
/home/oskarth/git/nim-eth/eth/p2p/rlpx.nim(899) postHelloSteps
/home/oskarth/git/nim-eth/eth/p2p/p2p_protocol_dsl.nim(305) WakuPeerConnected
/home/oskarth/.choosenim/toolchains/nim-0.20.0/lib/system/gc.nim(238) asgnRef
/home/oskarth/.choosenim/toolchains/nim-0.20.0/lib/system/gc.nim(183) incRef
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
Error: execution of an external program failed: '/home/oskarth/git/nim-eth/tests/p2p/test_waku_connect -vvv'

Solution

I can't find anything in docs/p2p.md or https://github.com/ethereum/devp2p/blob/master/rlpx.md that suggests this should be invalid. Either:

  1. Make version 0 valid, or:

  2. Give a better validation/compile error (see the sketch below), and add to the docs that this is a requirement
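A compile-time guard along these lines would turn the SIGSEGV into a readable error (exact placement inside the DSL is assumed):

static:
  doAssert wakuVersion > 0,
    "p2pProtocol version must be >= 1; version 0 currently crashes at runtime"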

Notes

I didn't debug or try to fix this further, but it seems like a bug in the P2P DSL.

Additionally, and separately, I noticed that docs/p2p.md says subprotocol names should be 3 letters. In devp2p/rlpx.md it just says "short ASCII name" as far as I can tell. I can't tell whether this is a spec constraint and I'm looking in the wrong place, or what's going on there. I didn't look into it much. Some validation there might be useful too.

Observer onPeerDisconnected run when it shouldn't

There is a possibility that the onPeerDisconnected call of a PeerObserver is run when it shouldn't be. That is, when onPeerConnected never got called.

This is because removePeer is run in disconnect, and disconnect can already get called in rlpxConnect or rlpxAccept, e.g. https://github.com/status-im/nim-eth/blob/master/eth/p2p/rlpx.nim#L1347

This issue gets quick-fixed here: 2174a7b

But this is already the second time I've seen a problem caused by removePeer running when it shouldn't, so perhaps an architectural change would be better.

Possibly wrong rlp (de)serialisation of NewBlockAnnounce

The eth_protocol message newBlock is currently not used, but I noticed several different rlp deserialisation errors on incoming messages, so it looks like something is wrong there. To be investigated...

e.g. errors:

DBG 2019-06-14 13:07:29+02:00 Failed rlp.read                            tid=30804 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:470 exception="Int expected, but found a List" msg=NewBlockAnnounce peer=Node[159.100.254.92:30303] rlpData="{\n  byte 88\n  byte 0\n  blob(3) [01D4C0]\n  blob(20) [20E94867794DBA030EE287F1406E100D03C84CD3]\n  \"\"\n  blob(68) [A9059CBB0000000000000000000000007B74C19124A9CA92C6141A2ED5F92130FC2791F200000000000000000000000000000000000000000000007E9A2F86853F440000]\n}"
DBG 2019-06-14 13:07:29+02:00 RlpError, ending dispatchMessages loop     topics="rlpx" tid=30804 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:558 clientId=Geth/v1.8.23-stable-c9427004/linux-amd64/go1.10.4 msg=newBlock peer=Node[159.100.254.92:30303]

DBG 2019-06-14 13:29:37+02:00 Failed rlp.read                            tid=30804 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:470 exception="The RLP contains a larger than expected Int value" msg=NewBlockAnnounce peer=Node[195.201.192.23:30304] rlpData="blob(20) [8E71B195D9CC953F46F41AD013BA1147464B621D]"
DBG 2019-06-14 13:29:37+02:00 RlpError, ending dispatchMessages loop     topics="rlpx" tid=30804 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:558 clientId=Geth/v1.8.27-stable-noku-ae734200/linux-amd64/go1.11.4 msg=newBlock peer=Node[195.201.192.23:30304]

DBG 2019-06-14 13:30:36+02:00 Failed rlp.read                            tid=30804 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:470 exception="Sequence expected, but the source RLP is not a list." msg=NewBlockAnnounce peer=Node[212.32.246.39:30314] rlpData="blob(9) [23BCCD842D5D4D0C2B]"
DBG 2019-06-14 13:30:36+02:00 RlpError, ending dispatchMessages loop     topics="rlpx" tid=30804 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:558 clientId=Geth/v5.4.3/linux/go1.9.7 msg=newBlock peer=Node[212.32.246.39:30314]

DBG 2019-06-14 12:46:09+02:00 Failed rlp.read                            tid=30804 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:470 exception="Fixed-size array expected, but the source RLP contains a blob of different length" msg=NewBlockAnnounce peer=Node[77.223.66.87:30303] rlpData="blob(2) [5208]"
DBG 2019-06-14 12:46:09+02:00 RlpError, ending dispatchMessages loop     topics="rlpx" tid=30804 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:558 clientId=Geth/v1.8.25-stable/linux-amd64/go1.11.5 msg=newBlock peer=Node[77.223.66.87:30303]

DBG 2019-06-14 15:13:52+02:00 Failed rlp.read                            tid=1332 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:469 dataType=NewBlockAnnounce exception="Small number encoded in a non-canonical way" peer=Node[35.158.27.128:30303]
DBG 2019-06-14 15:13:52+02:00 RlpError, ending dispatchMessages loop     topics="rlpx" tid=1332 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:556 clientId=Geth/v1.8.23-stable-c9427004/linux-amd64/go1.10.4 msg=newBlock peer=Node[35.158.27.128:30303]
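For reference, the eth wire spec nests the NewBlock payload as [[header, transactions, ommers], totalDifficulty]; if the Nim type is flat instead, rlp.read will mis-align in exactly the ways the logs above show. A shape matching the spec (a sketch; field names assumed):

import eth/common   # EthBlock, DifficultyInt

type
  NewBlockAnnounce = object
    blk: EthBlock                  # itself the list [header, transactions, ommers]
    totalDifficulty: DifficultyInt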

The non-pruning HexaryTrie is dangerous to use

Transferring the issue description from status-im/nimbus-eth1#229 (comment):


Digging deeper, I finally understand the problem. And we can leave the trie as the last suspect in this case.

The problem involves two or more accounts with the same storageRoot. To simplify the illustration, I will use only two accounts.

During transaction U execution, account A was modified and it has storageRoot=0xABCDEF.
During transaction V execution, account B was modified and also arrived at storageRoot=0xABCDEF.

Later on, during transaction X, either account A or B is modified again and gets storageRoot=0xDEADBEEF.
Here the problem arises: during transaction Y execution, the account still at storageRoot=0xABCDEF will have a corrupted/incomplete trie. The other account that previously had the same storageRoot has already done damage to the trie if pruning is turned on.

Perhaps this is more of a design issue.

@zah, you know the hexary trie better than me - do you have any idea how to approach this situation? Although my little fix works, conceptually it doesn't remove the problem.

eth_protocol handshake with Parity clients always results in a timeout

When monitoring nimbus I often noticed a disconnection during rlpxConnect.
Turns out this is due to a timeout on the eth_protocol handshake.
Further debugging has shown that this occurs almost always with Parity clients.
In fact, so far, I've never seen a successful connection with a Parity client, and I've also tested this locally by starting one myself and trying to connect to it.

Research & implement PoC for Waku/0 or Waku/1 & Whisper/6 compatibility

As Waku is a different protocol from Whisper, nodes using the Waku protocol will not be able to talk directly with nodes still talking the Whisper/6 protocol (unless we do some tricks in the handshake... but that would be hacky).
This will need to be resolved if the Status app needs to be compatible with older versions (apps from before Waku can send messages to apps using the Waku protocol).

The most likely solution here is to have enough full nodes in the network that speak both the Waku and Whisper/6 protocols and have those nodes relay messages between the two.
So the possible solution here would be to implement such a relay/bridging node.
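A rough sketch of the bridging handler (helper names hypothetical; de-duplication by envelope hash prevents relay loops):

proc onWhisperEnvelope(node: EthereumNode, env: Envelope) =
  # called for every valid Whisper/6 envelope; requeue it on the Waku side,
  # with the mirror-image handler doing Waku -> Whisper
  if not node.wakuSeen(env.hash):   # hypothetical de-duplication check
    node.queueWakuEnvelope(env)     # hypothetical helper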

Acceptance:
Working PoC bridge solution between Whisper/6 and Waku/0

Run onPeerDisconnected only for subprotocols that had onPeerConnected run.

Currently, if one of the subprotocol onPeerConnected calls fails (an exception is raised), this will be caught in rlpxConnect/rlpxAccept and the peer will get disconnected (UselessPeer).

Before this happens, though, all other subprotocols' onPeerConnected calls will have been run.
However, the onPeerDisconnected calls are never run, possibly creating problems if they should have cleaned up certain resources/loops.

This could be added easily, but then the onPeerDisconnected call of the failing subprotocol would also be run, potentially causing other problems.

In short, it would be best to run onPeerDisconnected only for those subprotocols that successfully had onPeerConnected run.

This used to be a potential problem for whisper_protocol but got solved like this: #62

But I think it could still be a problem for les_protocol: https://github.com/status-im/nim-eth/blob/master/eth/p2p/rlpx_protocols/les_protocol.nim#L249
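A sketch of the bookkeeping this needs (the connectedProtocols field is assumed, not existing): record each subprotocol whose onPeerConnected completed, then iterate only those on disconnect:

# during connect, after each handler that completes without raising:
peer.connectedProtocols.add(protocol)

# during disconnect:
for protocol in peer.connectedProtocols:
  if protocol.onPeerDisconnected != nil:
    protocol.onPeerDisconnected(peer)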

Might connect to more than allowed number of connections.

Currently the check of whether to try to connect to more peers happens before the call that tries to connect to some number of peers.
A seq of candidate nodes is returned by discovery and then connected to in parallel. This might result in a higher number of peers. I guess this is also why it is called minPeers, so technically there is no problem.

However, there might be use cases where you do not want to go over a maximum defined limit.
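A sketch of a hard cap: clamp the discovery candidates to the remaining budget before connecting in parallel (names assumed):

let room = max(0, node.maxPeers - node.peerPool.connectedNodes.len)
for n in candidates[0 ..< min(room, candidates.len)]:
  asyncCheck node.connectToNode(n)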

Move discovery to a background thread

The discovery interface is pretty simple, as it mostly consists of getters like getRandomPeers and getPeerCount, and with v5 something like getPeersForTopic. These getters look like an easy boundary for a separate thread. Even though we aim for the beacon node to be non-blocking, I still think it's a good idea to make discovery independent, thread-wise, of the rest of the code. After all, it is a library designed to be used not only by nimbus.

Things to keep in mind:

  • The database reference must be thread-deep-copyable. I suppose that's already how it is with rocksdb.
  • All discovery instances should share one global discovery thread, in case we want to run e.g. a simulation in a single process.

Investigate possible abuse of expiry & ttl fields in envelope

Here it is informally explained that in Whisper the origin of a message could be found out by a powerful adversary.
Perhaps the next issue is obvious and known and not a goal of Whisper, but I assumed differently, so I'll post it here anyhow. Also, I've not actually tested it.

A Whisper envelope contains the expiry and ttl fields. Typically in implementations: expiry - ttl = creation timestamp.

Now I question whether a peer cannot simply abuse this timestamp to figure out the origin of the envelope.
A peer's direct connections could have a fairly stable round-trip time (RTT). Just looking at the timestamp could perhaps identify the peer as the sender or not, especially over multiple messages.

This is however somewhat more complicated:

  • First there is the PoW calculation, which will delay the queuing of the message. This delay will also have some randomness. However, the delay will probably not be enough compared to a usual RTT/2, especially if the selected PoW is low (e.g. the very low PoW setting in Status).
  • Secondly, messages are sent in batches, not immediately. This is not a requirement of EIP-627, but it is generally assumed and done in geth and nim-eth. Depending on the send interval (300 ms currently), this will make it more difficult to figure out the origin of a message (1-300 ms can be added as delay).

In the worst case (e.g. messages with less delay added than RTT/2), it is likely that some messages can be linked to a peer you are connected with.
Combine this with, e.g., a node monitoring the messages of the most-used public channels on Status, and you could perhaps link IP addresses to the messages & Status names of a user that this node is connected with.
Of course these are public channels, but I still believe pseudonymity should be a goal even on those channels.

Possible ideas for improving this:

  • add more time randomness by decreasing expiry by a random amount (sketched below).
  • add more time randomness to the batch envelope send.
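A minimal sketch of the first idea (std/random shown for brevity; a cryptographic RNG would be preferable in practice):

import std/random

proc randomizeExpiry(expiry: uint32): uint32 =
  # subtract up to 30 s of slack so that expiry - ttl no longer pinpoints
  # the creation time; the bound is illustrative only
  expiry - uint32(rand(0 .. 30))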

TBI

[P2P - Win 32-bits] LES protocol, flow control overflow on 32-bit

https://ci.appveyor.com/project/nimbus/nim-eth/builds/22193468/job/9gukl8y540xc2whc#L4691

Hint: C:\projects\nim-eth\tests\p2p\les\test_flow_control.exe  [Exec]
[Suite] les flow control
[Suite] running averages
  [OK] consistent costs
  [OK] randomized averages
[Suite] buffer value calculations
test_flow_control.nim(4) test_flow_control
flow_control.nim(421)    tests
flow_control.nim(266)    initFlowControl
system.nim(2830)         sysFatal
    Unhandled exception: over- or underflow [OverflowError]
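The likely mechanism: on a 32-bit target Nim's int is 32 bits, so flow-control arithmetic overflows far sooner than on 64-bit. Widening explicitly sidesteps it (a sketch, not the actual flow_control.nim code):

proc addCost(bufValue, cost: int): int64 =
  # accumulate in int64 so the sum cannot leave the 32-bit int range
  int64(bufValue) + int64(cost)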

Unfortunately before #4 Windows tests were not compiling at all (even in nim-eth-p2p before the merging).

Only start sending messages when connected to > 1 peers?

Perhaps we should add a check for a minimum of x connected peers before we start sending Whisper messages at the specified interval.

If a node is only connected to 1 peer, and that peer is malicious, it could decide not to forward that node's messages.
In this case, the malicious peer could monitor, via all of its connected peers, whether any duplicates of the node's messages arrive. If they do not, it could mean that the messages originate from that node. It could of course also just mean that the TTL is not long enough for them to get forwarded all the way to this peer. However, with a low number of nodes (which Status kind of has), and over several messages, the malicious peer could probably deduce this.

This might be a non-issue, especially if a node is connected to only 1 node for a very limited time (like at startup).

Allow Protocol to have multiple versions

If we had #140 working, we could get along by providing different Protocol names for different versions.

E.g.:

p2pProtocol Wakuv1(version = wakuVersion,
...
p2pProtocol Wakuv2(version = differentWakuVersion,
...

node.addCapability(Wakuv1)
node.addCapability(Wakuv2)

However, it would be a nice-to-have if we could arrange that through the provided version somehow. This is mostly a cosmetic thing though.

A nicer addition would be if we could also tag certain procs of a protocol with certain versions, so that procs that remain the same don't have to be duplicated. A newer version of the protocol could then by default just "inherit" all the procs of the older versions, unless it implements its own.
These are just some quick thoughts right now; this is not a priority, and perhaps it is not worth the effort. One could always just isolate as much of the code as possible and share it.

Implement waku/0 or /1 - addition of Waku mode to waku protocol

Implement a draft of the Waku mode. This could be something similar to #114, but the exact spec still has to be decided/written. Draft spec writing might go in parallel with this implementation as a PoC (testing ground).

A draft version would probably still fall under waku/0.
The finalized spec + implementation could be waku/1 or waku/0. This still has to be decided and is not part of this issue.

Requirements:

  • #123 (or at least the baseline).

Acceptance:
Working PoC of Waku mode.

Ambiguous call error (utils.fromHex) when running tests from Nim 0.20.0

Running

nimble install
nimble test

I expect tests to pass without any additional special action.

The actual result is seen below; it appears to be some ambiguous function calls in nim-0.20.0 and nimcrypto-0.3.9:

> nimble test
  Executing task test in /home/oskarth/git/nim-eth/eth.nimble

Running: tests/test_bloom

[Suite] Simple Bloom Filter tests
  [OK] incl proc
  [OK] int value

Running: tests/test_common

[Suite] rlp encoding
  [OK] receipt roundtrip

Running: tests/keyfile/test_keyfile
/home/oskarth/git/nim-eth/eth/keyfile/keyfile.nim(233, 23) Error: ambiguous call; both utils.fromHex(a: string) [declared in /home/oskarth/.nimble/pkgs/nimcrypto-0.3.9/nimcrypto/utils.nim(158, 6)] and strutils.fromHex(s: string) [declared in /home/oskarth/.choosenim/toolchains/nim-0.20.0/lib/pure/strutils.nim(1049, 6)] match for: (string)
stack trace: (most recent call last)
/tmp/nimblecache/nimscriptapi.nim(164, 16)
/home/oskarth/git/nim-eth/eth_31398.nims(103, 18) testTask
/home/oskarth/git/nim-eth/eth_31398.nims(35, 12) runKeyfileTests
/home/oskarth/git/nim-eth/eth_31398.nims(27, 8) runTest
/home/oskarth/.choosenim/toolchains/nim-0.20.0/lib/system/nimscript.nim(242, 7) exec
/home/oskarth/.choosenim/toolchains/nim-0.20.0/lib/system/nimscript.nim(242, 7) Error: unhandled exception: FAILED: nim c -r -d:release -d:chronicles_log_level=ERROR --verbosity:0 --hints:off --warnings:off tests/keyfile/test_keyfile
     Error: Exception raised during nimble script execution

Workaround

Downgrading to 0.9.16 with choosenim 0.9.16 appears to work.
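Alternatively, the ambiguity can be resolved at the call site by qualifying the symbol with its module:

import nimcrypto/utils

# pick nimcrypto's overload explicitly instead of relying on resolution
let bytes = utils.fromHex("deadbeef")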

Allow for changing between topic-interest and bloom filter by means of a new packet type

When topic-interest is used, we currently do not allow switching back to use of the bloom filter, and vice versa.

See also comment here: https://github.com/status-im/nim-eth/blob/master/eth/p2p/rlpx_protocols/waku_protocol.nim#L368

This issue has been raised, and a new packet type with a StatusOptions parameter will be added to the specification.
As soon as this is done, it needs to be added to waku_protocol here to be compliant.
This might include the removal of some of the current packet types.

See also here for the status-go version: status-im/status-go#1853

Sporadic stall of CI on test_sparse_binary_trie.nim

I've seen the test_sparse_binary_trie test hang sporadically in our CI runs.

Specifically on Travis, macOS AMD64. I can't recall, but it might also occur on Linux.

Running: tests/trie/test_sparse_binary_trie

[Suite] sparse binary trie

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.

Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received

The build has been terminated

Per IP-address peer limiting?

First off:
I'm not sure if the application is the right place to do this. Perhaps this should be arranged system-wide, as this is typically the kind of restriction that can vary per system. However, some advice on it should then be formulated.

The concern is that, with a limit on the number of connected peers (see #107 & #108) combined with certain high timeouts on the handshakes in the rlpx protocol and sub-protocols, one node could easily launch many connects and let them time out, hogging the connections.
This is obviously not completely accurate, as currently the limit is enforced not by the number of open connections but by the number of actually completed handshakes (and thus fully connected peers, which would be more intensive for a node to pull off). Just mentioning this part in case we would change that.

However, there is still a higher limit that remains valid, which is the open-files limit on Unix systems.

The first step should probably be to investigate whether there are actually locations with no timeout at all (other than the TCP timeout): are these OK?
A basic solution could also be to just lower the timeouts by a decent amount. (For Whisper I actually never used the 10-second timeout.)
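A sketch of per-IP accounting at accept time (CountTable is from std/tables; the limit is illustrative):

import std/tables

var connectionsPerIp: CountTable[string]

proc allowIncoming(ip: string, maxPerIp = 3): bool =
  # CountTable returns 0 for missing keys, so no initialization is needed
  if connectionsPerIp[ip] >= maxPerIp:
    false
  else:
    connectionsPerIp.inc(ip)
    true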

Implement waku/0 - current Status whisper fork as is

Implement the waku/0 protocol, a Status fork of Whisper implementing the additions/changes done to it.

This will mostly be adding message packet ids, changes to the light client (not sure?), and notification on send (how does this work now? and why?). Questions should be asked regarding the validity of the changes (is everything still used and needed?).

Mailserver related changes are targeted in separate issue: https://github.com/status-im/nim-eth/issues/102

Requirements:

Acceptance: Implement a Status-go-compatible waku/0.

[RLPx] Test is passing even though there are errors

Is it normal that the following passes even though there are errors?

https://travis-ci.org/status-im/nim-eth/jobs/489580420#L3422-L3431

cc @zah

DBG 2019-02-06 15:54:11+00:00 Received neighbours                        topics="discovery" thread=15118 neighbours="@[Node[127.0.0.1:30303], Node[127.0.0.1:30304], Node[127.0.0.1:30305]]" remote=Node[127.0.0.1:30301]
DBG 2019-02-06 15:54:11+00:00 onPeerConnected Whisper                    thread=15118
DBG 2019-02-06 15:54:11+00:00 onPeerConnected Whisper                    thread=15118
DBG 2019-02-06 15:54:11+00:00 Whisper peer                               thread=15118 peer=Node[127.0.0.1:51484] whisperVersion=6
DBG 2019-02-06 15:54:11+00:00 Sending envelopes                          thread=15118 amount=0
DBG 2019-02-06 15:54:11+00:00 Whisper peer initialized                   thread=15118
DBG 2019-02-06 15:54:11+00:00 Whisper peer                               thread=15118 peer=Node[127.0.0.1:30304] whisperVersion=6
DBG 2019-02-06 15:54:11+00:00 Whisper peer initialized                   thread=15118
DBG 2019-02-06 15:54:11+00:00 Not enough nodes                           topics="discovery" thread=15118 present=3 requested=10
DBG 2019-02-06 15:54:11+00:00 Not enough nodes                           topics="discovery" thread=15118 present=3 requested=10
DBG 2019-02-06 15:54:11+00:00 Received neighbours                        topics="discovery" thread=15118 neighbours="@[Node[127.0.0.1:30303], Node[127.0.0.1:30301], Node[127.0.0.1:30305]]" remote=Node[127.0.0.1:30304]
DBG 2019-02-06 15:54:11+00:00 Received neighbours                        topics="discovery" thread=15118 neighbours="@[Node[127.0.0.1:30303], Node[127.0.0.1:30301], Node[127.0.0.1:30304]]" remote=Node[127.0.0.1:30305]
DBG 2019-02-06 15:54:11+00:00 Received neighbours                        topics="discovery" thread=15118 neighbours="@[Node[127.0.0.1:30303], Node[127.0.0.1:30301], Node[127.0.0.1:30304]]" remote=Node[127.0.0.1:30305]
DBG 2019-02-06 15:54:11+00:00 Received neighbours                        topics="discovery" thread=15118 neighbours="@[Node[127.0.0.1:30303], Node[127.0.0.1:30304], Node[127.0.0.1:30305]]" remote=Node[127.0.0.1:30301]
DBG 2019-02-06 15:54:11+00:00 Received neighbours                        topics="discovery" thread=15118 neighbours="@[Node[127.0.0.1:30303], Node[127.0.0.1:30304], Node[127.0.0.1:30305]]" remote=Node[127.0.0.1:30301]
DBG 2019-02-06 15:54:11+00:00 Received neighbours                        topics="discovery" thread=15118 neighbours="@[Node[127.0.0.1:30301], Node[127.0.0.1:30304], Node[127.0.0.1:30305]]" remote=Node[127.0.0.1:30303]
DBG 2019-02-06 15:54:11+00:00 Sending envelopes                          thread=15118 amount=0
DBG 2019-02-06 15:54:11+00:00 Sending envelopes                          thread=15118 amount=0
DBG 2019-02-06 15:54:11+00:00 Sending envelopes                          thread=15118 amount=0
ERR 2019-02-06 15:54:11+00:00 Light node not allowed to post messages    thread=15118
  [OK] Light node posting
INF 2019-02-06 15:54:11+00:00 RLPx listener up                           thread=15118 self=enode://5115f6a21cd791f92a684991e984afd7dd7634d00ee7e1166483d74814d06fa457e29112f69671454b2d9974d5cecfe34538354d64b08168bc19c75757bf5a92@127.0.0.1:30307
DBG 2019-02-06 15:54:11+00:00 onPeerConnected Whisper                    thread=15118
DBG 2019-02-06 15:54:11+00:00 onPeerConnected Whisper                    thread=15118
DBG 2019-02-06 15:54:11+00:00 Whisper peer                               thread=15118 peer=Node[127.0.0.1:51491] whisperVersion=6
DBG 2019-02-06 15:54:11+00:00 Exception in rlpxAccept                    topics="rlpx" thread=15118 err="Two light nodes connected\nAsync traceback:\n  test_shh_connect.nim(383) test_shh_connect\n  asyncloop.nim(761)        waitFor\n  asyncloop.nim(238)        poll\n  asyncmacro2.nim(36)       WhisperHandshake_continue\n  whisper_protocol.nim(718) WhisperHandshakeIter\n  #[\n    test_shh_connect.nim(383) test_shh_connect\n    asyncloop.nim(761)        waitFor\n    asyncloop.nim(238)        poll\n    asyncmacro2.nim(36)       postHelloSteps_continue\n    rlpx.nim(1243)            postHelloStepsIter\n    asyncfutures2.nim(325)    read\n  ]#\nException message: Two light nodes connected\nException type:" stackTrace="test_shh_connect.nim(383) test_shh_connect\nasyncloop.nim(761)       waitFor\nasyncloop.nim(238)       poll\nasyncmacro2.nim(36)      WhisperHandshake_continue\nwhisper_protocol.nim(718) WhisperHandshakeIter\n[[reraised from:\ntest_shh_connect.nim(383) test_shh_connect\nasyncloop.nim(761)       waitFor\nasyncloop.nim(238)       poll\nasyncmacro2.nim(36)      postHelloSteps_continue\nrlpx.nim(1243)           postHelloStepsIter\nasyncfutures2.nim(325)   read\n]]\n[[reraised from:\ntest_shh_connect.nim(383) test_shh_connect\nasyncloop.nim(761)       waitFor\nasyncloop.nim(238)       poll\nasyncmacro2.nim(36)      rlpxAccept_continue\nrlpx.nim(1413)           rlpxAcceptIter\nasyncfutures2.nim(325)   read\n]]\n"
DBG 2019-02-06 15:54:11+00:00 Whisper peer                               thread=15118 peer=Node[127.0.0.1:30307] whisperVersion=6
ERR 2019-02-06 15:54:11+00:00 dispatchMessages failed                    topics="rlpx" thread=15118 err="Data incomplete!\nAsync traceback:\n  test_shh_connect.nim(383) test_shh_connect\n  asyncloop.nim(761)        waitFor\n  asyncloop.nim(238)        poll\n  asyncmacro2.nim(36)       readExactly_continue\n  stream.nim(1284)          readExactlyIter\n  #[\n    test_shh_connect.nim(383) test_shh_connect\n    asyncloop.nim(761)        waitFor\n    asyncloop.nim(238)        poll\n    asyncmacro2.nim(36)       recvMsg_continue\n    rlpx.nim(405)             recvMsgIter\n    asyncfutures2.nim(325)    read\n  ]#\nException message: Data incomplete!\nException type:"
ERR 2019-02-06 15:54:11+00:00 dispatchMessages failed                    topics="rlpx" thread=15118 err="Data incomplete!\nAsync traceback:\n  test_shh_connect.nim(383) test_shh_connect\n  asyncloop.nim(761)        waitFor\n  asyncloop.nim(238)        poll\n  asyncmacro2.nim(36)       readExactly_continue\n  stream.nim(1284)          readExactlyIter\n  #[\n    test_shh_connect.nim(383) test_shh_connect\n    asyncloop.nim(761)        waitFor\n    asyncloop.nim(238)        poll\n    asyncmacro2.nim(36)       recvMsg_continue\n    rlpx.nim(405)             recvMsgIter\n    asyncfutures2.nim(325)    read\n  ]#\nException message: Data incomplete!\nException type:"
  [OK] Connect two light nodes

[p2p, rlpx] reqId procs incorrect codegen

It seems like reqId procs are wrong. Relevant codegen:

proc getBeaconBlockHeaders_thunk(msgSender: Peer; _1613480: int; data1613482: Rlp) {.
    gcsafe, async.} =
  var rlp = data1613482
  var msg {.noinit.}: getBeaconBlockHeadersObj
  let reqId = read(rlp, int)
  enterList(rlp)
  msg.blockRoot = checkedRlpRead(msgSender, rlp, Eth2Digest)
  msg.slot = checkedRlpRead(msgSender, rlp, uint64)
  msg.maxHeaders = checkedRlpRead(msgSender, rlp, int)
  msg.skipSlots = checkedRlpRead(msgSender, rlp, int)
  await(getBeaconBlockHeaders1613476(msgSender, reqId, msg.blockRoot, msg.slot,
                                     msg.maxHeaders, msg.skipSlots))

proc getBeaconBlockHeaders*(msgRecipient: Peer; blockRoot: Eth2Digest; slot: uint64;
                           maxHeaders: int; skipSlots: int; timeout: int = 10000): Future[
    Option[beaconBlockHeadersObj]] {.gcsafe.} =
  var writer = initRlpWriter()
  const
    perProtocolMsgId = 2
  let perPeerMsgId = perPeerMsgIdImpl(msgRecipient, BeaconSyncProtocol, 2)
  append(writer, perPeerMsgId)
  startList(writer, 4)
  newFuture result
  let reqId = registerRequest(msgRecipient, timeout, result, perPeerMsgId +
      1)
  append(writer, reqId)
  append(writer, blockRoot)
  append(writer, slot)
  append(writer, maxHeaders)
  append(writer, skipSlots)
  let msgBytes = finish(writer)
  linkSendFailureToReqFuture(sendMsg(msgRecipient, msgBytes), result)

Note startList(writer, 4) in the requestor, which then appends 5 params including reqId.
Note enterList(rlp) in the thunk, which happens after reading reqId and is followed by 4 reads.
A fix is on the way.
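In one line, the two consistent layouts would be (sketch):

# option A: reqId lives outside the list, matching the thunk's read order
append(writer, reqId)
startList(writer, 4)

# option B: reqId is part of the list, and the thunk enters the list first
startList(writer, 5)
append(writer, reqId)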

P2P protocol backwards compatibility

Currently the procs of a p2pProtocol allow parsing rlp that contains more fields than there are parameters. This is good, as it allows the EIP-8 test vectors to pass.

However, the other way around does not work. When our protocol adds certain parameters to a proc in a newer version, running against an older version will make it fail.
It would be nice if we could provide a fallback to a lower version.

It could perhaps be done by having some optional parameters in a proc; within that proc it could then be decided how to continue (be compatible or not...).

This would be specifically handy in the handshake for #114.
