nim-eth's People

Contributors

adamspitz, arnetheduck, cammellos, cheatfate, clyybber, corpetty, cortze, decanus, etan-status, ivansete-status, jangko, jlokier, jtraglia, kaiserd, kdeme, konradstaniec, mjfh, mratsim, narimiran, nolash, onqtam, oskarth, stefantalpalaru, swader, tersec, web3-developer, yglukhov, zah


nim-eth's Issues

Transports not closed on disconnect() or when connection is dropped

As discussed on Gitter, there are some transport leaks currently. Creating this issue so that there is some traceability and so you know I'm looking into it.

Basically, on every disconnect(), unless it is specifically followed by a peer.transport.close() (which never happens?), we leak transports.
The transport is also not closed on a dropped connection (https://github.com/status-im/nim-eth/blob/master/eth/p2p/rlpx.nim#L518).

Transport does get closed when a disconnect message is received: https://github.com/status-im/nim-eth/blob/master/eth/p2p/rlpx.nim#L526
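A minimal sketch of the shape a fix could take, assuming peer.transport exposes the chronos StreamTransport used by rlpx (the wrapper proc is hypothetical):

proc disconnectAndClose(peer: Peer, reason: DisconnectionReason) {.async.} =
  # send the disconnect message first, then always release the transport
  await peer.disconnect(reason)
  if not peer.transport.closed():
    peer.transport.close()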

Either mark actually-used variables as such or remove them

Or otherwise ameliorate these (one option is sketched after the list):

nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_backends_helpers.nim(33, 6) Hint: 'initProtocolStates' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_backends_helpers.nim(41, 6) Hint: 'resolveFuture' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_backends_helpers.nim(46, 6) Hint: 'requestResolver' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_backends_helpers.nim(78, 6) Hint: 'linkSendFailureToReqFuture' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_backends_helpers.nim(93, 6) Hint: 'handshakeImpl' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_protocol_dsl.nim(288, 15) Hint: 'CurrentProtocol' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_protocol_dsl.nim(292, 17) Hint: 'perProtocolMsgId' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_protocol_dsl.nim(299, 16) Hint: 'state' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_protocol_dsl.nim(304, 16) Hint: 'networkState' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_protocol_dsl.nim(593, 20) template/generic instantiation of `makeEth2Request` from here
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_tracing.nim(87, 12) Hint: 'initTracing' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_tracing.nim(89, 12) Hint: 'logSentMsg' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_tracing.nim(90, 12) Hint: 'logReceivedMsg' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_tracing.nim(91, 12) Hint: 'logConnectedPeer' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_tracing.nim(92, 12) Hint: 'logAcceptedPeer' is declared but not used [XDeclaredButNotUsed]
nim-beacon-chain/vendor/nim-eth/eth/p2p/p2p_tracing.nim(93, 12) Hint: 'logDisconnectedPeer' is declared but not used [XDeclaredButNotUsed]
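For symbols that are only referenced from macro-generated or conditionally compiled code, Nim's {.used.} pragma is the non-destructive option; it marks a declaration as intentionally kept:

proc initTracing() {.used.} =
  # body elided; the pragma alone suppresses XDeclaredButNotUsed
  discard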

Unintuitive error in `p2pProtocol` macro

Embarking on my first nim-devp2p hello-world adventure, I was stumped for a while by the following situation:

import
  eth/p2p, chronos, chronicles, chronos/timer

const
  bzzVersion = 42

p2pProtocol Bzz(version = bzzVersion):

  proc hellooo(peer: Peer, foo: string) =
    echo foo

  onPeerConnected do (peer: Peer):
    warn "conn"
    if peer.supports(Bzz):
      warn "bzz"
      waitFor peer.hellooo("bar")
      waitFor sleepAsync(1000)

Compiling this returned the following error:

/home/lash/src/status/nim-eth/eth/rlp.nim(377, 18) Error: type mismatch: got <Rlp, type string>
but expected one of: 
proc read[T](future: Future[T] | FutureVar[T]): T
  first type mismatch at position: 1
  required type: Future[read.T] or FutureVar[read.T]
  but expression 'rlp' is of type: Rlp
proc read(rstream: AsyncStreamReader; n = 0): Future[seq[byte]]
  first type mismatch at position: 1
  required type: AsyncStreamReader
  but expression 'rlp' is of type: Rlp
proc read(transp: StreamTransport; n = -1): Future[seq[byte]]
  first type mismatch at position: 1
  required type: StreamTransport
  but expression 'rlp' is of type: Rlp

expression: read(rlp, type(result.foo))

It turns out that p2pProtocol is a macro, and it inserts code that refers to the rlp namespace. Adding eth/rlp to the imports solved the error.

The error context was not easy to understand, though, so maybe it could be improved.
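For reference, the import list that compiles:

import
  eth/p2p, eth/rlp,   # eth/rlp is needed because the p2pProtocol macro
                      # generates calls such as read(rlp, ...) into it
  chronos, chronicles, chronos/timer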

Database backends not exception safe

All implementations of TrieDatabaseRef raise exceptions but have exception safety issues - in particular, resources will not be released / aborted / committed correctly, leading to deadlocks / data loss.

Examples where a raise will leak or break - the code in general needs a full review; these are just samples (a safe pattern is sketched after the list):

  • raiseKeyWriteError(key) (txn not released)
  • result.deleteStmt = prepare "DELETE FROM trie_nodes WHERE key = ?;" (previous prepared statement not released)
  • https://github.com/status-im/nim-rocksdb/blob/08fec021c0f28f63d1221d40a655078b5b923d1b/rocksdb.nim#L130 (leaks a partially opened db)
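A sketch of the exception-safe shape these call sites need, with a hypothetical transaction API standing in for the real backends:

proc safePut(db: TrieDatabaseRef, key, value: openArray[byte]) =
  var txn = db.beginTransaction()   # hypothetical helper, for illustration
  defer: txn.release()              # runs even when the write raises
  if not txn.write(key, value):
    raiseKeyWriteError(key)
  txn.commit()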

uninitialised field in const object

The Nim VM fails to zero an object field in a const assignment involving a proc call.

Nim code:

proc initNibbleRange*(bytes: ByteRange): NibblesRange =
  result.bytes = bytes
  result.ibegin = 0
  result.iend = bytes.len * 2

const zeroNibblesRange* = initNibbleRange(zeroBytesRange)

Now this constant is imported and used in another file. This is how the generated C looks for the latter:

NIM_CONST tyObject_NibblesRange_NTijGP6lMbjliuMIEehBLw TM_9aIpn29c4nFUFIFQM9b2UMu5w_4 = {{NIM_NIL, NIM_NIL, ((NI) 0)}, ((NI) 0), ((NI) IL64(279856062208720))};

We suddenly have uninitialised memory in the iend field instead of 0.

This is a hard-to-replicate bug. I tried copying the relevant Nim code into the same file or into separate files (in case it's an import issue) to make a stand-alone test case, without any luck.

The best test case I can come up with needs nim-eth:

import eth/trie/[trie_defs, nibbles]

echo zeroBytesRange.len
echo zeroNibblesRange.len
let nr = initNibbleRange(ByteRange())
echo nr.len

Example output (the non-zero value changes with each compilation):

0
280895550172144
0

The obvious workaround is to replace the const with a let, but that won't solve the underlying VM bug.

duplicate connections when incoming vs outgoing connect

It's possible to have multiple connections with the same Peer:

Easy scenario:

  1. Peer A connects to peer B (outgoing connection from Peer A): connection successful
  2. Peer B connects to peer A (incoming connection to Peer B): the connection is set up, but a disconnect is done due to the duplicate connection, and the peer is removed from peer_pool.connectedNodes.
  3. Peer A remains connected to B (connection from step 1)
  4. Peer A connects again to peer B (outgoing connection from Peer A): the connection is set up, as peer B is no longer in connectedNodes

I think the same is possible with an incoming connection for Peer A first and an outgoing one next. It is just more difficult to achieve, as there is an initial check here: https://github.com/status-im/nim-eth/blob/master/eth/p2p/peer_pool.nim#L62
However, due to the async behaviour of rlpxConnect and rlpxAccept, I think this could still occur the other way around.

To resolve this the following could be done:
Do not let the disconnect call remove the peer from the connectedNodes in the peer pool in case DisconnectionReason is AlreadyConnected.
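A minimal sketch of that option (helper names assumed from the surrounding code):

proc disconnect(peer: Peer, reason: DisconnectionReason) {.async.} =
  await peer.sendDisconnectMsg(reason)   # name assumed
  if reason != AlreadyConnected:
    # the original, still-live connection keeps its entry in the pool
    peer.network.peerPool.removePeer(peer)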

Other options are to check within rlpxAccept and rlpxConnect themselves whether the node is already in connectedNodes or connectingNodes. By the way, in the case of an incoming connection, connectingNodes is never set; it is true, however, that in this case the actual node (pub key) is known much later. Also, even when moving the check to rlpxAccept and rlpxConnect, there is still a chance that the problem will occur (due to the last await), unless the connectedNodes add is moved right after the check.

So I guess the question basically is, in the case of a duplicate connection: do we allow the full connection (including all sub-protocol connection handlers) to happen and then disconnect afterwards (including running all sub-protocol disconnection handlers)? Or do we just stop rlpxAccept or rlpxConnect the moment we know it is a duplicate peer (e.g. before postHelloSteps)? The latter would be the more intrusive change.

Thoughts?

Run discovery DHT behind NAT

This ticket describes issues with the current code when running behind a NAT:

  • node A runs on port 50000 on network X.0
  • node B runs on port 50001 on network X.0
  • node C runs on port 50002 on network Y.0
  • to node C, node A runs on network X.1, behind a NAT

A and B connect to each other on the local network - they will have each other's X.0 addresses in the DHT. Now C connects to A because it had a bootstrap address with X.1:50000, but it is unable to connect to node B, because there is no DHT entry with X.1 - A only sees B as X.0:50001.



                                 FIREWALL
                                 +----+
                                 |    |
                                 |    |       +-----------------+
                                 |    |       |                 |
                                 |    |       |                 |
                                 |    |       |      A          |
                                 |    +-------+                 |
                                 |    |       |                 |
                                 |    |       |                 |
                                 |    |       |     X.0:50000   |
        +-----------+            |    |       +-----------------+
        |           |            |    |                |
        |           |            |    |                |
        |           |            |    |                |
        |   C       +------------+X.1 |                |
        |           |            |    |                |
        |           |            |    |                |
        |           |            |    |                |
        +-----------+            |    |                |
                                 |    |                |
                                 |    |       +-----------------+
                                 |    |       |     X.0:50001   |
                                 |    |       |                 |
                                 |    |       |       B         |
                                 |    +-------+                 |
                                 |    |       |                 |
                                 |    |       |                 |
                                 |    |       |                 |
                                 |    |       |                 |
                                 |    |       +-----------------+
                                 |    |
                                 +----+


Add metrics to Whisper and peer pool

When running a Whisper node it could be useful to have some metrics, such as:

  • number of peers
  • number of envelopes
  • number of dropped envelopes
    • because expired
    • timestamped in the future
    • insufficient PoW
    • otherwise invalid?

Perhaps also metrics to compare all incoming messages versus messages specifically requested through filters.
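A hedged sketch of what this could look like with status-im/nim-metrics (assuming its declareGauge/declareCounter templates):

import metrics

declareGauge whisper_connected_peers, "number of connected peers"
declareCounter whisper_envelopes_received, "number of envelopes received"
declareCounter whisper_envelopes_dropped, "number of dropped envelopes"

# e.g. when dropping an expired envelope:
# whisper_envelopes_dropped.inc()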

Prevent silent failures in RLP serialization

Right now, the default serializer will swallow any type that's passed to RLP - this makes serialization easy to break by simply forgetting an import, as can be seen in status-im/nimbus-eth2#144.

Instead of having an overload that silently swallows mistakes, make enabling the default serializer explicit. Consider both intrusive (pragma) and non-intrusive (for 3rd-party types) options.

Improve handling of dropped connection before end of postHelloSteps

See also #71

Somewhat related to #63

Basically, the check/disconnect could also be moved to dispatchMessages. But then the peer will still briefly be added to the peer_pool.

The disconnect should not be fully done; no removePeer is needed.
In general, removePeer could be moved to dispatchMessages (or out of rlpx).

Limit (incoming) connecting peers.

Currently in our implementation, the maximum-connected-peers limit is only applied when creating new outgoing connections.

The listener for incoming connections remains active. This poses a serious risk and should be handled by either:

  1. Being the very friendly peer: allowing the handshake to start and then sending a disconnect message with TooManyPeers as the DisconnectionReason
  2. Ignoring any incoming connections and just disconnecting them without any message.
  3. Not allowing the incoming connection at all?

I'm guessing that, going from 1. down, the possibility of being seen as a "bad" peer goes up. Practically, however, I'm not sure if any client actually cares.
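A sketch of option 1 (field and helper names assumed):

proc onIncomingPeer(node: EthereumNode, peer: Peer) {.async.} =
  if node.peerPool.connectedNodes.len >= node.peerPool.maxPeers:
    # finish the handshake politely, then say goodbye with a reason
    await peer.disconnect(TooManyPeers, notifyOtherPeer = true)
  else:
    node.peerPool.addPeer(peer)   # hypothetical helper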

"Broken pipe" or "Transport is already closed" error on SendMsg results in crash

At https://github.com/status-im/nim-eth/blob/master/eth/p2p/rlpx.nim#L283

The exception already gets caught there but is re-raised, so a crash can still occur if it is not properly handled higher up.
E.g.

Async traceback:
  nimbus.nim(147)        nimbus
  nimbus.nim(126)        process
  asyncloop.nim(266)     poll
  asyncmacro2.nim(36)    sendMsg_continue
  rlpx.nim(283)          sendMsgIter
  asyncfutures2.nim(377) read
  #[
    nimbus.nim(147)        nimbus
    nimbus.nim(126)        process
    asyncloop.nim(266)     poll
    asyncmacro2.nim(39)    sendMsg_continue
    excpt.nim(138)         sendMsgIter
  ]#
  #[
    nimbus.nim(147)        nimbus
    nimbus.nim(126)        process
    asyncloop.nim(266)     poll
    asyncmacro2.nim(36)    sendMsg_continue
    excpt.nim(138)         sendMsgIter
  ]#
Exception message: (32) Broken pipe
Exception type: [Exception]

We could probably just not re-raise there, but that perhaps has some side-effects (other possible error cases?).
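One possible shape, if the re-raise turns out to be unnecessary (TransportError is chronos' transport exception base; sendMsg as referenced above):

proc trySendMsg(peer: Peer, data: seq[byte]): Future[bool] {.async.} =
  try:
    await peer.sendMsg(data)
    return true
  except TransportError:
    # "Broken pipe" / "Transport is already closed" end up here; the
    # connection is gone anyway, so report failure instead of crashing
    return false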

rlpx.nim Future double completion

@zah - now that better stack traces are available:

DBG 2019-04-17 15:02:32+02:00 GossipSub message not delivered to Peer tid=23109 peer=Node[<hidden ip address>:9100]
Error: unhandled exception: An attempt was made to complete a Future more than once.
  Details: Future ID: 129240
  Creation location:
    ../../nim-eth/eth/p2p/p2p_backends_helpers.nim(65) [unspecified]
  First completion location:
    ../../nim-eth/eth/p2p/rlpx.nim(183) [unspecified]
  Second completion location:
    ../../nim-eth/eth/p2p/rlpx.nim(270) [unspecified]
  Stack trace to moment of creation:
    No stack traceback available
  Stack trace to moment of secondary completion:
    No stack traceback available
[FutureError]

Setting RLPx subprotocol version to 0 crashes

Problem

If I set wakuVersion (or whisperVersion) to 0 like this: https://github.com/oskarth/nim-eth/blob/waku-zero/eth/p2p/rlpx_protocols/waku_protocol.nim#L56-L57

I get the following crash:

[Suite] Waku connections
INF 2019-11-15 17:01:52+08:00 RLPx listener up                           tid=10758 self=enode://23cb2baaab4231bc85271e540c6161549199b78c7f66c2997ca7f0722650a8fcf5e005ff8b34043959bd5e8e0c27a1b868c8a870282a0f8bc18d5c1617084225@127.0.0.1:30304
Traceback (most recent call last)
/home/oskarth/git/nim-eth/tests/p2p/test_waku_connect.nim(26) test_waku_connect
/home/oskarth/.nimble/pkgs/chronos-2.3.2/chronos/asyncloop.nim(933) waitFor
/home/oskarth/.nimble/pkgs/chronos-2.3.2/chronos/asyncloop.nim(274) poll
/home/oskarth/git/nim-eth/eth/p2p/rlpx.nim(1029) rlpxConnect
/home/oskarth/git/nim-eth/eth/p2p/rlpx.nim(899) postHelloSteps
/home/oskarth/git/nim-eth/eth/p2p/p2p_protocol_dsl.nim(305) WakuPeerConnected
/home/oskarth/.choosenim/toolchains/nim-0.20.0/lib/system/gc.nim(238) asgnRef
/home/oskarth/.choosenim/toolchains/nim-0.20.0/lib/system/gc.nim(183) incRef
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
Error: execution of an external program failed: '/home/oskarth/git/nim-eth/tests/p2p/test_waku_connect -vvv'

Solution

I can't find anything in docs/p2p.md or https://github.com/ethereum/devp2p/blob/master/rlpx.md that suggests this should be invalid. Either:

  1. Make version 0 valid, or:

  2. Give a better validation/compile error (see the sketch below), and add to the docs that this is a requirement
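A compile-time guard along these lines would turn the SIGSEGV into a readable error (exact placement inside the DSL is assumed):

static:
  doAssert wakuVersion > 0,
    "p2pProtocol version must be >= 1; version 0 currently crashes at runtime"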

Notes

I didn't debug or try to fix this further, but it seems like a bug in the P2P DSL.

Additionally, and separately, I noticed that docs/p2p.md says subprotocol names should be 3 letters. In devp2p/rlpx.md it just says "short ASCII name" as far as I can tell. I can't tell whether this is a spec constraint and I'm looking in the wrong place, or what's going on there. I didn't look into it much. Some validation there might be useful too.

Observer onPeerDisconnected run when it shouldn't

There is a possibility that the onPeerDisconnected call of a PeerObserver is run when it shouldn't be. That is, when onPeerConnected never got called.

This is because removePeer is run in disconnect, and disconnect can already get called in rlpxConnect or rlpxAccept, e.g. https://github.com/status-im/nim-eth/blob/master/eth/p2p/rlpx.nim#L1347

This issue gets quick-fixed here: 2174a7b

But this is already the second time I've seen a problem caused by removePeer running when it shouldn't, so perhaps an architectural change would be better.

Possibly wrong rlp (de)serialisation of NewBlockAnnounce

The eth_protocol message newBlock is currently not used, but I noticed several different rlp deserialisation errors on incoming messages, so it looks like something is wrong there. To be investigated...

e.g. errors:

DBG 2019-06-14 13:07:29+02:00 Failed rlp.read                            tid=30804 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:470 exception="Int expected, but found a List" msg=NewBlockAnnounce peer=Node[159.100.254.92:30303] rlpData="{\n  byte 88\n  byte 0\n  blob(3) [01D4C0]\n  blob(20) [20E94867794DBA030EE287F1406E100D03C84CD3]\n  \"\"\n  blob(68) [A9059CBB0000000000000000000000007B74C19124A9CA92C6141A2ED5F92130FC2791F200000000000000000000000000000000000000000000007E9A2F86853F440000]\n}"
DBG 2019-06-14 13:07:29+02:00 RlpError, ending dispatchMessages loop     topics="rlpx" tid=30804 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:558 clientId=Geth/v1.8.23-stable-c9427004/linux-amd64/go1.10.4 msg=newBlock peer=Node[159.100.254.92:30303]

DBG 2019-06-14 13:29:37+02:00 Failed rlp.read                            tid=30804 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:470 exception="The RLP contains a larger than expected Int value" msg=NewBlockAnnounce peer=Node[195.201.192.23:30304] rlpData="blob(20) [8E71B195D9CC953F46F41AD013BA1147464B621D]"
DBG 2019-06-14 13:29:37+02:00 RlpError, ending dispatchMessages loop     topics="rlpx" tid=30804 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:558 clientId=Geth/v1.8.27-stable-noku-ae734200/linux-amd64/go1.11.4 msg=newBlock peer=Node[195.201.192.23:30304]

DBG 2019-06-14 13:30:36+02:00 Failed rlp.read                            tid=30804 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:470 exception="Sequence expected, but the source RLP is not a list." msg=NewBlockAnnounce peer=Node[212.32.246.39:30314] rlpData="blob(9) [23BCCD842D5D4D0C2B]"
DBG 2019-06-14 13:30:36+02:00 RlpError, ending dispatchMessages loop     topics="rlpx" tid=30804 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:558 clientId=Geth/v5.4.3/linux/go1.9.7 msg=newBlock peer=Node[212.32.246.39:30314]

DBG 2019-06-14 12:46:09+02:00 Failed rlp.read                            tid=30804 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:470 exception="Fixed-size array expected, but the source RLP contains a blob of different length" msg=NewBlockAnnounce peer=Node[77.223.66.87:30303] rlpData="blob(2) [5208]"
DBG 2019-06-14 12:46:09+02:00 RlpError, ending dispatchMessages loop     topics="rlpx" tid=30804 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:558 clientId=Geth/v1.8.25-stable/linux-amd64/go1.11.5 msg=newBlock peer=Node[77.223.66.87:30303]

DBG 2019-06-14 15:13:52+02:00 Failed rlp.read                            tid=1332 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:469 dataType=NewBlockAnnounce exception="Small number encoded in a non-canonical way" peer=Node[35.158.27.128:30303]
DBG 2019-06-14 15:13:52+02:00 RlpError, ending dispatchMessages loop     topics="rlpx" tid=1332 file=../../../.nimble/pkgs/eth-1.0.0/eth/p2p/rlpx.nim:556 clientId=Geth/v1.8.23-stable-c9427004/linux-amd64/go1.10.4 msg=newBlock peer=Node[35.158.27.128:30303]
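For reference, the eth wire spec nests the NewBlock payload as [[header, transactions, ommers], totalDifficulty]; if the Nim type is flat instead, rlp.read will mis-align in exactly the ways the logs above show. A shape matching the spec (a sketch; field names assumed):

import eth/common   # EthBlock, DifficultyInt

type
  NewBlockAnnounce = object
    blk: EthBlock                  # itself the list [header, transactions, ommers]
    totalDifficulty: DifficultyInt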

The non-pruning HexaryTrie is dangerous to use

Transferring the issue description from status-im/nimbus-eth1#229 (comment):


Digging deeper, I finally understand the problem. And we can leave the trie as the last suspect in this case.

The problem involves two or more accounts with the same storageRoot. To simplify the illustration, I will use only two accounts.

During transaction U execution, account A was modified and it has storageRoot=0xABCDEF.
During transaction V execution, account B was modified and also arrived at storageRoot=0xABCDEF.

Later on, during transaction X, either account A or B is modified again and gets storageRoot=0xDEADBEEF.
Here the problem arises: during transaction Y execution, the account still at storageRoot=0xABCDEF will have a corrupted/incomplete trie. The other account that previously had the same storageRoot has already done damage to the trie if pruning is turned on.

Perhaps this is more of a design issue.

@zah, you know the hexary trie better than me - do you have any idea how to approach this situation? Although my little fix works, conceptually it doesn't remove the problem.

eth_protocol handshake with Parity clients always results in a timeout

When monitoring nimbus I often noticed a disconnection during rlpxConnect.
Turns out this is due to a timeout on the eth_protocol handshake.
Further debugging has shown that this occurs almost always with Parity clients.
In fact, so far, I've never seen a successful connection with a Parity client, and I've also tested this locally by starting one myself and trying to connect to it.

Research & implement PoC for Waku/0 or Waku/1 & Whisper/6 compatibility

As Waku is a different protocol from Whisper, nodes using the Waku protocol will not be able to talk directly with nodes still talking the Whisper/6 protocol (unless we do some tricks in the handshake... but that would be hacky).
This will need to be resolved if the Status app needs to be compatible with older versions (apps from before Waku can send messages to apps using the Waku protocol).

The most likely solution here is to have enough full nodes in the network that speak both the Waku and Whisper/6 protocols and have those nodes relay messages between the two.
So the possible solution here would be to implement such a relay/bridging node.
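A rough sketch of the bridging handler (helper names hypothetical; de-duplication by envelope hash prevents relay loops):

proc onWhisperEnvelope(node: EthereumNode, env: Envelope) =
  # called for every valid Whisper/6 envelope; requeue it on the Waku side,
  # with the mirror-image handler doing Waku -> Whisper
  if not node.wakuSeen(env.hash):   # hypothetical de-duplication check
    node.queueWakuEnvelope(env)     # hypothetical helper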

Acceptance:
Working PoC bridge solution between Whisper/6 and Waku/0

Run onPeerDisconnected only for subprotocols that had onPeerConnected run.

Currently, if one of the subprotocol onPeerConnected calls fails (an exception is raised), this will be caught in rlpxConnect/rlpxAccept and the peer will get disconnected (UselessPeer).

Before this happens, though, all other subprotocols' onPeerConnected calls will have been run.
However, the onPeerDisconnected calls are never run, possibly creating problems if they should have cleaned up certain resources/loops.

This could be added easily, but then the onPeerDisconnected call of the failing subprotocol would also be run, potentially causing other problems.

In short, it would be best to run onPeerDisconnected only for those subprotocols that successfully had onPeerConnected run.

This used to be a potential problem for whisper_protocol but got solved like this: #62

But I think it could still be a problem for les_protocol: https://github.com/status-im/nim-eth/blob/master/eth/p2p/rlpx_protocols/les_protocol.nim#L249
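A sketch of the bookkeeping this needs (the connectedProtocols field is assumed, not existing): record each subprotocol whose onPeerConnected completed, then iterate only those on disconnect:

# during connect, after each handler that completes without raising:
peer.connectedProtocols.add(protocol)

# during disconnect:
for protocol in peer.connectedProtocols:
  if protocol.onPeerDisconnected != nil:
    protocol.onPeerDisconnected(peer)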

Might connect to more than allowed number of connections.

Currently the check of whether to try to connect to more peers happens before the call that tries to connect to some number of peers.
A seq of candidate nodes is returned by discovery and then connected to in parallel. This might result in a higher number of peers. I guess this is also why it is called minPeers, so technically there is no problem.

However, there might be use cases where you do not want to go over a maximum defined limit.
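A sketch of a hard cap: clamp the discovery candidates to the remaining budget before connecting in parallel (names assumed):

let room = max(0, node.maxPeers - node.peerPool.connectedNodes.len)
for n in candidates[0 ..< min(room, candidates.len)]:
  asyncCheck node.connectToNode(n)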

Move discovery to a background thread

The discovery interface is pretty simple, as it mostly consists of getters like getRandomPeers and getPeerCount, and with v5 something like getPeersForTopic. These getters look like an easy boundary for a separate thread. Even though we aim for the beacon node to be non-blocking, I still think it's a good idea to make discovery independent, thread-wise, of the rest of the code. After all, it is a library designed to be used not only by nimbus.

Things to keep in mind:

  • The database reference must be thread-deep-copyable. I suppose that's already how it is with rocksdb.
  • All discovery instances should share one global discovery thread, in case we want to run e.g. a simulation in a single process.

Investigate possible abuse of expiry & ttl fields in envelope

Here it is informally explained that in Whisper the origin of a message could be found out by a powerful adversary.
Perhaps the next issue is obvious and known and not a goal of Whisper, but I assumed differently, so I'll post it here anyhow. Also, I've not actually tested it.

A Whisper envelope contains the expiry and ttl fields. Typically in implementations: expiry - ttl = creation timestamp.

Now I question whether a peer cannot simply abuse this timestamp to figure out the origin of the envelope.
A peer's direct connections could have a fairly stable round-trip time (RTT). Just looking at the timestamp could perhaps identify the peer as the sender or not, especially over multiple messages.

This is however somewhat more complicated:

  • First there is the PoW calculation, which will delay the queuing of the message. This delay will also have some randomness. However, the delay will probably not be enough compared to a usual RTT/2, especially if the selected PoW is low (e.g. the very low PoW setting in Status).
  • Secondly, messages are sent in batches, not immediately. This is not a requirement of EIP-627, but it is generally assumed and done in geth and nim-eth. Depending on the send interval (300 ms currently), this will make it more difficult to figure out the origin of a message (1-300 ms can be added as delay).

In the worst case (e.g. messages with less delay added than RTT/2), it is likely that some messages can be linked to a peer you are connected with.
Combine this with, e.g., a node monitoring the messages of the most-used public channels on Status, and you could perhaps link IP addresses to the messages & Status names of a user that this node is connected with.
Of course these are public channels, but I still believe pseudonymity should be a goal even on those channels.

Possible ideas for improving this:

  • add more time randomness by decreasing expiry by a random amount (sketched below).
  • add more time randomness to the batch envelope send.
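A minimal sketch of the first idea (std/random shown for brevity; a cryptographic RNG would be preferable in practice):

import std/random

proc randomizeExpiry(expiry: uint32): uint32 =
  # subtract up to 30 s of slack so that expiry - ttl no longer pinpoints
  # the creation time; the bound is illustrative only
  expiry - uint32(rand(0 .. 30))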

TBI

[P2P - Win 32-bits] LES protocol, flow control overflow on 32-bit

https://ci.appveyor.com/project/nimbus/nim-eth/builds/22193468/job/9gukl8y540xc2whc#L4691

Hint: C:\projects\nim-eth\tests\p2p\les\test_flow_control.exe  [Exec]
[Suite] les flow control
[Suite] running averages
  [OK] consistent costs
  [OK] randomized averages
[Suite] buffer value calculations
test_flow_control.nim(4) test_flow_control
flow_control.nim(421)    tests
flow_control.nim(266)    initFlowControl
system.nim(2830)         sysFatal
    Unhandled exception: over- or underflow [OverflowError]
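The likely mechanism: on a 32-bit target Nim's int is 32 bits, so flow-control arithmetic overflows far sooner than on 64-bit. Widening explicitly sidesteps it (a sketch, not the actual flow_control.nim code):

proc addCost(bufValue, cost: int): int64 =
  # accumulate in int64 so the sum cannot leave the 32-bit int range
  int64(bufValue) + int64(cost)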

Unfortunately before #4 Windows tests were not compiling at all (even in nim-eth-p2p before the merging).

Only start sending messages when connected to > 1 peers?

Perhaps we should add a check for a minimum of x connected peers before we start sending Whisper messages at the specified interval.

If a node is only connected to 1 peer, and that peer is malicious, it could decide not to forward that node's messages.
In this case, the malicious peer could monitor, via all of its connected peers, whether any duplicates of the node's messages arrive. If they do not, it could mean that the messages originate from that node. It could of course also just mean that the TTL is not long enough for them to get forwarded all the way to this peer. However, with a low number of nodes (which Status kind of has), and over several messages, the malicious peer could probably deduce this.

This might be a non-issue, especially if a node is connected to only 1 node for a very limited time (like at startup).

Allow Protocol to have multiple versions

If we had #140 working, we could get along by providing different Protocol names for different versions.

E.g.:

p2pProtocol Wakuv1(version = wakuVersion,
...
p2pProtocol Wakuv2(version = differentWakuVersion,
...

node.addCapability(Wakuv1)
node.addCapability(Wakuv2)

However, it would be a nice-to-have if we could arrange that through the provided version somehow. This is mostly a cosmetic thing though.

A nicer addition would be if we could also tag certain procs of a protocol with certain versions, so that procs that remain the same don't have to be duplicated. A newer version of the protocol could then by default just "inherit" all the procs of the older versions, unless it implements its own.
These are just some quick thoughts right now; this is not a priority, and perhaps it is not worth the effort. One could always just isolate as much of the code as possible and share it.

Implement waku/0 or /1 - addition of Waku mode to waku protocol

Implement a draft of the Waku mode. This could be something similar to #114, but the exact spec still has to be decided/written. Draft spec writing might go in parallel with this implementation as a PoC (testing ground).

A draft version would probably still fall under waku/0.
The finalized spec + implementation could be waku/1 or waku/0. This still has to be decided and is not part of this issue.

Requirements:

  • #123 (or at least the baseline).

Acceptance:
Working PoC of Waku mode.

Ambiguous call error (utils.fromHex) when running tests from Nim 0.20.0

Running

nimble install
nimble test

I expect tests to pass without any additional special action.

The actual result is seen below; it appears to be some ambiguous function calls in nim-0.20.0 and nimcrypto-0.3.9:

> nimble test
  Executing task test in /home/oskarth/git/nim-eth/eth.nimble

Running: tests/test_bloom

[Suite] Simple Bloom Filter tests
  [OK] incl proc
  [OK] int value

Running: tests/test_common

[Suite] rlp encoding
  [OK] receipt roundtrip

Running: tests/keyfile/test_keyfile
/home/oskarth/git/nim-eth/eth/keyfile/keyfile.nim(233, 23) Error: ambiguous call; both utils.fromHex(a: string) [declared in /home/oskarth/.nimble/pkgs/nimcrypto-0.3.9/nimcrypto/utils.nim(158, 6)] and strutils.fromHex(s: string) [declared in /home/oskarth/.choosenim/toolchains/nim-0.20.0/lib/pure/strutils.nim(1049, 6)] match for: (string)
stack trace: (most recent call last)
/tmp/nimblecache/nimscriptapi.nim(164, 16)
/home/oskarth/git/nim-eth/eth_31398.nims(103, 18) testTask
/home/oskarth/git/nim-eth/eth_31398.nims(35, 12) runKeyfileTests
/home/oskarth/git/nim-eth/eth_31398.nims(27, 8) runTest
/home/oskarth/.choosenim/toolchains/nim-0.20.0/lib/system/nimscript.nim(242, 7) exec
/home/oskarth/.choosenim/toolchains/nim-0.20.0/lib/system/nimscript.nim(242, 7) Error: unhandled exception: FAILED: nim c -r -d:release -d:chronicles_log_level=ERROR --verbosity:0 --hints:off --warnings:off tests/keyfile/test_keyfile
     Error: Exception raised during nimble script execution

Workaround

Downgrading to 0.9.16 with choosenim 0.9.16 appears to work.
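Alternatively, the ambiguity can be resolved at the call site by qualifying the symbol with its module:

import nimcrypto/utils

# pick nimcrypto's overload explicitly instead of relying on resolution
let bytes = utils.fromHex("deadbeef")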

Allow for changing between topic-interest and bloom filter by means of a new packet type

When topic-interest is used, we currently do not allow switching back to use of the bloom filter, and vice versa.

See also comment here: https://github.com/status-im/nim-eth/blob/master/eth/p2p/rlpx_protocols/waku_protocol.nim#L368

This issue has been raised, and a new packet type with a StatusOptions parameter will be added to the specification.
As soon as this is done, it needs to be added to waku_protocol here to be compliant.
This might include the removal of some of the current packet types.

See also here for the status-go version: status-im/status-go#1853

Sporadic stall of CI on test_sparse_binary_trie.nim

I've seen the test_sparse_binary_trie test hang sporadically in our CI runs.

Specifically on Travis, macOS AMD64. I can't recall, but it might also occur on Linux.

Running: tests/trie/test_sparse_binary_trie

[Suite] sparse binary trie

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.

Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received

The build has been terminated

Per IP-address peer limiting?

First off:
I'm not sure if the application is the right place to do this. Perhaps this should be arranged system-wide, as this is typically the kind of restriction that can vary per system. However, some advice on it should then be formulated.

The concern is that, with a limit on the number of connected peers (see #107 & #108) combined with certain high timeouts on the handshakes in the rlpx protocol and sub-protocols, one node could easily launch many connects and let them time out, hogging the connections.
This is obviously not completely accurate, as currently the limit is enforced not by the number of open connections but by the number of actually completed handshakes (and thus fully connected peers, which would be more intensive for a node to pull off). Just mentioning this part in case we would change that.

However, there is still a higher limit that remains valid, which is the open-files limit on Unix systems.

The first step should probably be to investigate whether there are actually locations with no timeout at all (other than the TCP timeout): are these OK?
A basic solution could also be to just lower the timeouts by a decent amount. (For Whisper I actually never used the 10-second timeout.)
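A sketch of per-IP accounting at accept time (CountTable is from std/tables; the limit is illustrative):

import std/tables

var connectionsPerIp: CountTable[string]

proc allowIncoming(ip: string, maxPerIp = 3): bool =
  # CountTable returns 0 for missing keys, so no initialization is needed
  if connectionsPerIp[ip] >= maxPerIp:
    false
  else:
    connectionsPerIp.inc(ip)
    true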

Implement waku/0 - current Status whisper fork as is

Implement the waku/0 protocol, a Status fork of Whisper implementing the additions/changes done to it.

This will mostly be adding message packet ids, changes to the light client (not sure?), and notification on send (how does this work now? and why?). Questions should be asked regarding the validity of the changes (is everything still used and needed?).

Mailserver related changes are targeted in separate issue: https://github.com/status-im/nim-eth/issues/102

Requirements:

Acceptance: Implement a Status-go-compatible waku/0.

[RLPx] Test is passing even though there are errors

Is it normal that the following passes even though there are errors?

https://travis-ci.org/status-im/nim-eth/jobs/489580420#L3422-L3431

cc @zah

DBG 2019-02-06 15:54:11+00:00 Received neighbours                        topics="discovery" thread=15118 neighbours="@[Node[127.0.0.1:30303], Node[127.0.0.1:30304], Node[127.0.0.1:30305]]" remote=Node[127.0.0.1:30301]
DBG 2019-02-06 15:54:11+00:00 onPeerConnected Whisper                    thread=15118
DBG 2019-02-06 15:54:11+00:00 onPeerConnected Whisper                    thread=15118
DBG 2019-02-06 15:54:11+00:00 Whisper peer                               thread=15118 peer=Node[127.0.0.1:51484] whisperVersion=6
DBG 2019-02-06 15:54:11+00:00 Sending envelopes                          thread=15118 amount=0
DBG 2019-02-06 15:54:11+00:00 Whisper peer initialized                   thread=15118
DBG 2019-02-06 15:54:11+00:00 Whisper peer                               thread=15118 peer=Node[127.0.0.1:30304] whisperVersion=6
DBG 2019-02-06 15:54:11+00:00 Whisper peer initialized                   thread=15118
DBG 2019-02-06 15:54:11+00:00 Not enough nodes                           topics="discovery" thread=15118 present=3 requested=10
DBG 2019-02-06 15:54:11+00:00 Not enough nodes                           topics="discovery" thread=15118 present=3 requested=10
DBG 2019-02-06 15:54:11+00:00 Received neighbours                        topics="discovery" thread=15118 neighbours="@[Node[127.0.0.1:30303], Node[127.0.0.1:30301], Node[127.0.0.1:30305]]" remote=Node[127.0.0.1:30304]
DBG 2019-02-06 15:54:11+00:00 Received neighbours                        topics="discovery" thread=15118 neighbours="@[Node[127.0.0.1:30303], Node[127.0.0.1:30301], Node[127.0.0.1:30304]]" remote=Node[127.0.0.1:30305]
DBG 2019-02-06 15:54:11+00:00 Received neighbours                        topics="discovery" thread=15118 neighbours="@[Node[127.0.0.1:30303], Node[127.0.0.1:30301], Node[127.0.0.1:30304]]" remote=Node[127.0.0.1:30305]
DBG 2019-02-06 15:54:11+00:00 Received neighbours                        topics="discovery" thread=15118 neighbours="@[Node[127.0.0.1:30303], Node[127.0.0.1:30304], Node[127.0.0.1:30305]]" remote=Node[127.0.0.1:30301]
DBG 2019-02-06 15:54:11+00:00 Received neighbours                        topics="discovery" thread=15118 neighbours="@[Node[127.0.0.1:30303], Node[127.0.0.1:30304], Node[127.0.0.1:30305]]" remote=Node[127.0.0.1:30301]
DBG 2019-02-06 15:54:11+00:00 Received neighbours                        topics="discovery" thread=15118 neighbours="@[Node[127.0.0.1:30301], Node[127.0.0.1:30304], Node[127.0.0.1:30305]]" remote=Node[127.0.0.1:30303]
DBG 2019-02-06 15:54:11+00:00 Sending envelopes                          thread=15118 amount=0
DBG 2019-02-06 15:54:11+00:00 Sending envelopes                          thread=15118 amount=0
DBG 2019-02-06 15:54:11+00:00 Sending envelopes                          thread=15118 amount=0
ERR 2019-02-06 15:54:11+00:00 Light node not allowed to post messages    thread=15118
  [OK] Light node posting
INF 2019-02-06 15:54:11+00:00 RLPx listener up                           thread=15118 self=enode://5115f6a21cd791f92a684991e984afd7dd7634d00ee7e1166483d74814d06fa457e29112f69671454b2d9974d5cecfe34538354d64b08168bc19c75757bf5a92@127.0.0.1:30307
DBG 2019-02-06 15:54:11+00:00 onPeerConnected Whisper                    thread=15118
DBG 2019-02-06 15:54:11+00:00 onPeerConnected Whisper                    thread=15118
DBG 2019-02-06 15:54:11+00:00 Whisper peer                               thread=15118 peer=Node[127.0.0.1:51491] whisperVersion=6
DBG 2019-02-06 15:54:11+00:00 Exception in rlpxAccept                    topics="rlpx" thread=15118 err="Two light nodes connected\nAsync traceback:\n  test_shh_connect.nim(383) test_shh_connect\n  asyncloop.nim(761)        waitFor\n  asyncloop.nim(238)        poll\n  asyncmacro2.nim(36)       WhisperHandshake_continue\n  whisper_protocol.nim(718) WhisperHandshakeIter\n  #[\n    test_shh_connect.nim(383) test_shh_connect\n    asyncloop.nim(761)        waitFor\n    asyncloop.nim(238)        poll\n    asyncmacro2.nim(36)       postHelloSteps_continue\n    rlpx.nim(1243)            postHelloStepsIter\n    asyncfutures2.nim(325)    read\n  ]#\nException message: Two light nodes connected\nException type:" stackTrace="test_shh_connect.nim(383) test_shh_connect\nasyncloop.nim(761)       waitFor\nasyncloop.nim(238)       poll\nasyncmacro2.nim(36)      WhisperHandshake_continue\nwhisper_protocol.nim(718) WhisperHandshakeIter\n[[reraised from:\ntest_shh_connect.nim(383) test_shh_connect\nasyncloop.nim(761)       waitFor\nasyncloop.nim(238)       poll\nasyncmacro2.nim(36)      postHelloSteps_continue\nrlpx.nim(1243)           postHelloStepsIter\nasyncfutures2.nim(325)   read\n]]\n[[reraised from:\ntest_shh_connect.nim(383) test_shh_connect\nasyncloop.nim(761)       waitFor\nasyncloop.nim(238)       poll\nasyncmacro2.nim(36)      rlpxAccept_continue\nrlpx.nim(1413)           rlpxAcceptIter\nasyncfutures2.nim(325)   read\n]]\n"
DBG 2019-02-06 15:54:11+00:00 Whisper peer                               thread=15118 peer=Node[127.0.0.1:30307] whisperVersion=6
ERR 2019-02-06 15:54:11+00:00 dispatchMessages failed                    topics="rlpx" thread=15118 err="Data incomplete!\nAsync traceback:\n  test_shh_connect.nim(383) test_shh_connect\n  asyncloop.nim(761)        waitFor\n  asyncloop.nim(238)        poll\n  asyncmacro2.nim(36)       readExactly_continue\n  stream.nim(1284)          readExactlyIter\n  #[\n    test_shh_connect.nim(383) test_shh_connect\n    asyncloop.nim(761)        waitFor\n    asyncloop.nim(238)        poll\n    asyncmacro2.nim(36)       recvMsg_continue\n    rlpx.nim(405)             recvMsgIter\n    asyncfutures2.nim(325)    read\n  ]#\nException message: Data incomplete!\nException type:"
ERR 2019-02-06 15:54:11+00:00 dispatchMessages failed                    topics="rlpx" thread=15118 err="Data incomplete!\nAsync traceback:\n  test_shh_connect.nim(383) test_shh_connect\n  asyncloop.nim(761)        waitFor\n  asyncloop.nim(238)        poll\n  asyncmacro2.nim(36)       readExactly_continue\n  stream.nim(1284)          readExactlyIter\n  #[\n    test_shh_connect.nim(383) test_shh_connect\n    asyncloop.nim(761)        waitFor\n    asyncloop.nim(238)        poll\n    asyncmacro2.nim(36)       recvMsg_continue\n    rlpx.nim(405)             recvMsgIter\n    asyncfutures2.nim(325)    read\n  ]#\nException message: Data incomplete!\nException type:"
  [OK] Connect two light nodes

[p2p, rlpx] reqId procs incorrect codegen

It seems like reqId procs are wrong. Relevant codegen:

proc getBeaconBlockHeaders_thunk(msgSender: Peer; _1613480: int; data1613482: Rlp) {.
    gcsafe, async.} =
  var rlp = data1613482
  var msg {.noinit.}: getBeaconBlockHeadersObj
  let reqId = read(rlp, int)
  enterList(rlp)
  msg.blockRoot = checkedRlpRead(msgSender, rlp, Eth2Digest)
  msg.slot = checkedRlpRead(msgSender, rlp, uint64)
  msg.maxHeaders = checkedRlpRead(msgSender, rlp, int)
  msg.skipSlots = checkedRlpRead(msgSender, rlp, int)
  await(getBeaconBlockHeaders1613476(msgSender, reqId, msg.blockRoot, msg.slot,
                                     msg.maxHeaders, msg.skipSlots))

proc getBeaconBlockHeaders*(msgRecipient: Peer; blockRoot: Eth2Digest; slot: uint64;
                           maxHeaders: int; skipSlots: int; timeout: int = 10000): Future[
    Option[beaconBlockHeadersObj]] {.gcsafe.} =
  var writer = initRlpWriter()
  const
    perProtocolMsgId = 2
  let perPeerMsgId = perPeerMsgIdImpl(msgRecipient, BeaconSyncProtocol, 2)
  append(writer, perPeerMsgId)
  startList(writer, 4)
  newFuture result
  let reqId = registerRequest(msgRecipient, timeout, result, perPeerMsgId +
      1)
  append(writer, reqId)
  append(writer, blockRoot)
  append(writer, slot)
  append(writer, maxHeaders)
  append(writer, skipSlots)
  let msgBytes = finish(writer)
  linkSendFailureToReqFuture(sendMsg(msgRecipient, msgBytes), result)

Note startList(writer, 4) in the requestor, which then appends 5 params including reqId.
Note enterList(rlp) in the thunk, which happens after reading reqId and is followed by 4 reads.
A fix is on the way.
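In one line, the two consistent layouts would be (sketch):

# option A: reqId lives outside the list, matching the thunk's read order
append(writer, reqId)
startList(writer, 4)

# option B: reqId is part of the list, and the thunk enters the list first
startList(writer, 5)
append(writer, reqId)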

P2P protocol backwards compatibility

Currently the procs of a p2pProtocol allow parsing rlp that contains more fields than there are parameters. This is good, as it allows the EIP-8 test vectors to pass.

However, the other way around does not work. When our protocol adds certain parameters to a proc in a newer version, running against an older version will make it fail.
It would be nice if we could provide a fallback to a lower version.

It could perhaps be done by having some optional parameters in a proc; within that proc it could then be decided how to continue (be compatible or not...).

This would be specifically handy in the handshake for #114.
