bmuller / kademlia

A DHT in Python using asyncio

Home Page: http://kademlia.readthedocs.org

License: MIT License

Python 100.00%

kademlia's Introduction

Python Distributed Hash Table

Documentation can be found at kademlia.readthedocs.org.

This library is an asynchronous Python implementation of the Kademlia distributed hash table. It uses the asyncio library in Python 3 to provide asynchronous communication. The nodes communicate using RPC over UDP, meaning that the library is capable of working behind a NAT.

This library aims to be as close to a reference implementation of the Kademlia paper as possible.

Installation

pip install kademlia

Usage

This assumes you have a working familiarity with asyncio.

Assuming you want to connect to an existing network:

import asyncio
from kademlia.network import Server

async def run():
    # Create a node and start listening on port 5678
    node = Server()
    await node.listen(5678)

    # Bootstrap the node by connecting to other known nodes, in this case
    # replace 123.123.123.123 with the IP of another node and optionally
    # give as many ip/port combos as you can for other nodes.
    await node.bootstrap([("123.123.123.123", 5678)])

    # set a value for the key "my-key" on the network
    await node.set("my-key", "my awesome value")

    # get the value associated with "my-key" from the network
    result = await node.get("my-key")
    print(result)

asyncio.run(run())

Initializing a Network

If you're starting a new network from scratch, just omit the node.bootstrap call in the example above. Then, bootstrap other nodes by connecting to the first node you started.

See the examples folder for a first node example that other nodes can bootstrap connect to and some code that gets and sets a key/value.

Logging

This library uses the standard Python logging library. To see debug output printed to STDOUT, for instance, use:

import logging

log = logging.getLogger('kademlia')
log.setLevel(logging.DEBUG)
log.addHandler(logging.StreamHandler())

Running Tests

To run tests:

pip install -r dev-requirements.txt
pytest

Reporting Issues

Please report all issues on GitHub.

Fidelity to Original Paper

The current implementation should be an accurate implementation of all aspects of the paper save one: Section 2.3 requires that the original publisher of a key/value pair republish it every 24 hours. This library does not do this (though you can easily do it manually).
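
Manual republishing could be sketched as a background loop that sets each key again on a fixed interval. The helper name `republish_forever`, its `rounds` parameter, and the 24-hour default are assumptions made for illustration and are not part of the library; `set_value` stands for any coroutine function that takes `(key, value)`, such as `node.set` from the usage example.

```python
import asyncio

DAY_SECONDS = 24 * 60 * 60


async def republish_forever(set_value, items, interval=DAY_SECONDS, rounds=None):
    """Set every key/value pair in `items` again once per `interval` seconds.

    `set_value` is any coroutine function taking (key, value).
    `rounds` limits the number of passes; None means run until cancelled.
    Returns the number of completed passes.
    """
    completed = 0
    while rounds is None or completed < rounds:
        # Copy the pairs so the dict may change while we await.
        for key, value in list(items.items()):
            await set_value(key, value)
        completed += 1
        if rounds is None or completed < rounds:
            await asyncio.sleep(interval)
    return completed
```

In an application this would typically be started with `asyncio.create_task(republish_forever(node.set, my_items))` after bootstrapping, where `my_items` is a dict of the pairs this node originally published.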


kademlia's Issues

Node finding from a server

I am running a kad server on an Ubuntu VM and I am trying to find peers/neighbours to route a string. I get the default answer of " key result :a value".

Please let me know if the issue is with my code. If yes, what could be the reason?

Fails to bootstrap node in example

Describe the bug
I am attempting to run the first_node and set examples in tandem, and it fails to bootstrap the node after 5 seconds.

To Reproduce
Steps to reproduce the behavior:

Run first_node.py and set.py using a batch file similar to the one below:

start python first_node.py
python set.py 192.168.86.38 8468 test_k test_v
pause

After running, the set.py instance outputs the following error, while first_node continues running:

2020-07-18 14:21:01,051 - kademlia.network - INFO - Node 108995474039131676481680986686925546654942811706 listening on 192.168.86.38:8469
2020-07-18 14:21:01,052 - kademlia.network - DEBUG - Refreshing routing table
2020-07-18 14:21:01,052 - kademlia.network - DEBUG - Attempting to bootstrap node with 1 initial contacts
Did not receive reply for msg id b'PtLAsQuhaK4GIujIPLtqplhOv14=' within 5 seconds
2020-07-18 14:21:06,053 - kademlia.crawling - INFO - creating spider with peers: []
2020-07-18 14:21:06,053 - kademlia.crawling - INFO - crawling network with nearest: ()
2020-07-18 14:21:06,053 - kademlia.network - INFO - setting 'test_k' = 'test_v' on network
2020-07-18 14:21:06,053 - kademlia.network - WARNING - There are no known neighbors to set key 8488291004d33c55edef2966ff7bc3867068b968

Expected behavior
The Set node should bootstrap and find the neighbor.

Screenshots

Desktop (please complete the following information):

OS: Windows 10
Browser: Chrome
Version: Latest
Python Version: 3.8.3 64-bit

findNeighbors (routing.py) - Is nsmallest even needed?

Hi,

I am not sure about this, but is nsmallest really needed for this method? As far as I can see, the method selects at most k nodes from the buckets (iteratively widening the range by taking buckets further to the left or right of the specified node). Since nodes holds at most k entries in the end, wouldn't heapq.nsmallest(k, nodes) just return the list as it is, because there are never more than k nodes in the list?
Shouldn't this method instead fill the list with nodes from whole buckets and only select the nsmallest nodes once the count reaches k or more? This would make sure that we really return the k closest nodes to the given node.

As I said, I could be completely wrong. And before I forget: great code. I really appreciate it!
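
The concern can be illustrated in isolation, using plain integers as node IDs and XOR as the distance. This is a sketch under stated assumptions, not the library's routing code: both function names and the candidate list are hypothetical, and the traversal order is chosen so that the closest IDs arrive last.

```python
import heapq


def k_closest(target, candidates, k):
    """Return the k candidate IDs closest to `target` by XOR distance."""
    return heapq.nsmallest(k, candidates, key=lambda node_id: node_id ^ target)


def first_k_then_sort(target, candidates, k):
    """Stop after collecting k candidates, then sort: may miss closer IDs."""
    collected = []
    for node_id in candidates:
        collected.append(node_id)
        if len(collected) == k:
            break
    return heapq.nsmallest(k, collected, key=lambda node_id: node_id ^ target)


target = 0b1000
# Candidates in traversal order; the closest ones happen to come last.
candidates = [0b0001, 0b0010, 0b1001, 0b1010]

print(k_closest(target, candidates, 2))          # [9, 10]
print(first_k_then_sort(target, candidates, 2))  # [1, 2]
```

Whether the real traversal can yield nodes in such an order depends on how the buckets are visited, which this sketch does not model.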

Question about the routing table

PyPI sdist doesn't contain requirements.txt

setup.py opens requirements.txt, which is missing from the sdist on PyPI.

Missing `items` method in `IStorage` interface

The items method of the storage class is called in the welcomeIfNewNode method of the protocol module (line 104), but the IStorage interface does not declare it.

_nodesFound sometimes returns a coroutine, which set_digest tries to use as a list

I've been hunting the cause of this error:

    self.log.info("setting '%s' on %s" % (dkey.hex(), list(map(str, nodes))))
E   TypeError: 'coroutine' object is not iterable

I think I've found it, but what I'm looking at doesn't make much sense.

set_digest calls await spider.find(), calling the result nodes.

This usually works just fine, because nodes ends up being a list, which is used to emit a log on the following line:

self.log.info("setting '%s' on %s" % (dkey.hex(), list(map(str, nodes))))

However, there is at least one execution path where nodes is a coroutine instead of a list.

set_digest calls NodeSpiderCrawl.find, which calls SpiderCrawl._find, which calls NodeSpiderCrawl._nodesFound. You can see from the conclusion of this method that, if we haven't contacted all of self.nearest, then we return the find() method (a coroutine) instead of the list:

https://github.com/bmuller/kademlia/blob/master/kademlia/crawling.py#L142

bootstrap seems to handle this scenario, but set_digest does not.

Am I doing something wrong?

Error when running first_node.py from command line

...$ python3.6 first_node.py 
2019-02-03 21:49:39,816 - kademlia.network - INFO - Node 92115567338729139094396247980566258409520920742 listening on 0.0.0.0:5678
2019-02-03 21:49:39,817 - kademlia.network - DEBUG - Refreshing routing table
Traceback (most recent call last):
  File "first_node.py", line 18, in <module>
    loop.run_until_complete(server.listen(5678))
  File "/usr/lib/python3.6/asyncio/base_events.py", line 452, in run_until_complete
    future = tasks.ensure_future(future, loop=self)
  File "/usr/lib/python3.6/asyncio/tasks.py", line 526, in ensure_future
    raise TypeError('An asyncio.Future, a coroutine or an awaitable is '
TypeError: An asyncio.Future, a coroutine or an awaitable is required

Same issue with the README example (omitting the node.bootstrap line). I can't see what the problem with loop.run_until_complete is. Thank you.

load_state does not await bootstrap

Describe the bug
When trying to load_state, the bootstrap promise is never awaited and therefore the function won't work. https://github.com/bmuller/kademlia/blob/master/kademlia/network.py#L224

This brings up a bit of weirdness for me because I don't find it smart to save state that way. My solution was to save the whole routing table.

Example in README.md is wrong

A loop.run_forever() call is required to bring the Server online. This should be in the first example in the readme.

Possibility of Direct messaging / broadcast?

Hello!
I have a question about using kademlia beyond a simple DHT storage. Indeed, I need both DHT and a messaging overlay, but I'm having trouble finding how to implement messaging over this library. I'm almost sure that there is a way using the RPC over UDP module, but I'm too dumb (ok, I came from the Java world and I previously used the TomP2P library).

Can you give me some hints on how to send messages between nodes?

Thank you for your help!

Unexpected type conversion in buckets

Describe the bug
Integer seems to be converted to float at https://github.com/bmuller/kademlia/blob/master/kademlia/routing.py#L26 - floats are not precise enough to represent the correct number and therefore both buckets will have overlapping start and end range.

To Reproduce
Split a bucket and check datatype.

$ python3
Python 3.7.5 (default, Nov 20 2019, 09:21:52) 
[GCC 9.2.1 20191008] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> (2**160) / 2
7.307508186654515e+47
>>> type((2**160) / 2)
<class 'float'>
>>> (2**160) // 2
730750818665451459101842416358141509827966271488
>>> type((2**160) // 2)
<class 'int'>
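
A bucket split that stays in integer arithmetic could look like the following sketch. The function name `split_range` and the inclusive-range convention are assumptions for illustration, not the library's code; the last lines show why a float midpoint makes the two halves overlap.

```python
def split_range(lower, upper):
    """Split the inclusive ID range [lower, upper] into two adjacent halves."""
    midpoint = (lower + upper) // 2  # floor division keeps this an exact int
    return (lower, midpoint), (midpoint + 1, upper)


left, right = split_range(0, 2**160)
print(left[1] + 1 == right[0])  # True: the halves are adjacent, not overlapping

# With true division the midpoint is a float, and adding 1 is lost to rounding,
# so the second half would start where the first one ends.
float_mid = (0 + 2**160) / 2
print(float_mid + 1 == float_mid)  # True
```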

Supporting multiple clients within the same program?

Do you support launching multiple clients from the same python file on different ports? I am attempting to run a simulation and I would like to know if this is possible.

how to solve the problem about NAT?

Dear author,
I'm a newcomer to the P2P field. I really like your kademlia library, and it helps me learn the principles of P2P networks. But I have a problem:
In a real P2P deployment a node is behind a NAT and I can't get its real IP. I don't know how to use server.bootstrap([bootstrap_node]), since its argument must be the node's IP.

bug report : Remove nodes with same ip/port

bug1:
If I create a node on port 5000 and then run the file again, I get a different node ID on the same port. Does that mean the earlier port and node ID entry are deleted?
Shouldn't an existing node with the same ip/port be deleted?
What I mean is that a new node is created and stored in the table even if it has already been stored before (which happens when the same node connects to the network multiple times, as a different ID is created each time).

Q) A network created over a LAN works, but connecting over a WAN does not. How can I do that?
Q) Is every network IP address a bootstrap node?

Exception when republishing keys

There is an issue when republishing keys. In RefreshTable all the keys older than one hour are republished. The problem is in network.py where the key is set. Since the "key" when republishing is actually a sha1 digest you do not need to create a digest of it again. In addition the self.log.debug() messages will throw an exception because the key is a digest, not a string.

def set(self, key, value):
    """
    Set the given key to the given value in the network.
    """
    self.log.debug("setting '%s' = '%s' on network" % (key, value)) <-- key is a digest OR a string here
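
One way to handle both cases is to normalize the key before hashing and to format it defensively for logging. The sketch below is not the library's code: the helper names `key_digest` and `describe_key` are hypothetical, and it assumes that any `bytes` value of exactly 20 bytes is an already-computed SHA-1 digest, which would be ambiguous if callers store raw 20-byte keys.

```python
import hashlib


def key_digest(key):
    """Return a 20-byte SHA-1 digest for `key`.

    Assumption for this sketch: a `bytes` value of exactly 20 bytes is
    treated as an existing digest and is returned unchanged.
    """
    if isinstance(key, bytes) and len(key) == hashlib.sha1().digest_size:
        return key
    if not isinstance(key, bytes):
        key = str(key).encode("utf8")
    return hashlib.sha1(key).digest()


def describe_key(key):
    """Format a key for logging whether it is a string or a digest."""
    return key.hex() if isinstance(key, bytes) else str(key)
```

With these helpers, republishing can pass a digest straight through without hashing it a second time, and the log call can use `describe_key(key)` for either kind of key.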

K closest neighbours

Hi Bmuller,

According to the paper, find_node should return the closest nodes in its routing table.
I think the code may have a bug: first, it should pop the right buckets in reverse order; moreover, the nodes within a bucket are not sorted by distance.

Correct me if I made a mistake.
Thanks

def next(self):
    """
    Pop an item from the left subtree, then right, then left, etc.
    """
    if len(self.currentNodes) > 0:
        return self.currentNodes.pop()

    if self.left and len(self.leftBuckets) > 0:
        self.currentNodes = self.leftBuckets.pop().getNodes()
        self.left = False
        return self.next()

    if len(self.rightBuckets) > 0:
        self.currentNodes = self.rightBuckets.pop().getNodes()
        self.left = True
        return self.next()

    raise StopIteration

def findNeighbors(self, node, k=None, exclude=None):
    k = k or self.ksize
    nodes = []
    for neighbor in TableTraverser(self, node):
        if neighbor.id != node.id and (exclude is None or not neighbor.sameHomeAs(exclude)):
            heapq.heappush(nodes, (node.distanceTo(neighbor), neighbor))
        if len(nodes) == k:  # should be commented
            break  # should be commented

    return map(operator.itemgetter(1), heapq.nsmallest(k, nodes))

'Server' object has no attribute 'close'

In the asyncio example, a close method is called on a Server object, but Server has no such method:

https://github.com/bmuller/kademlia/blob/python3.5/examples/example.py#L30

Also, does it perhaps make more sense to call server.close() and loop.close() in a finally clause of the try block at the end there?

example error

Hi, author
I get an error when running the examples. I run examples/first_node.py and then launch examples/set.py. My Python version is 3.6. The error occurs at 'loop.run_until_complete(server.bootstrap([bootstrap_node]))'

2018-01-27 15:55:34,424 - kademlia.network - INFO - Node 620640661369763367814907233484537209378776020287 listening on 0.0.0.0:8469
2018-01-27 15:55:34,438 - kademlia.network - DEBUG - Refreshing routing table
2018-01-27 15:55:34,442 - kademlia.network - DEBUG - Attempting to bootstrap node with 1 initial contacts
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1531, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 938, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/zhangken/PycharmProject/kademlia/examples/set.py", line 24, in <module>
    loop.run_until_complete(server.bootstrap([bootstrap_node]))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/base_events.py", line 449, in run_until_complete
    return future.result()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/tasks.py", line 241, in _step
    result = coro.throw(exc)
  File "/Users/zhangken/PycharmProject/kademlia/kademlia/network.py", line 125, in bootstrap
    gathered = await asyncio.gather(*cos)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/tasks.py", line 304, in _wakeup
    future.result()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/tasks.py", line 239, in _step
    result = coro.send(None)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/coroutines.py", line 121, in send
    return self.gen.send(value)
  File "/Users/zhangken/PycharmProject/kademlia/kademlia/network.py", line 133, in bootstrap_node
    return Node(result[1], addr[0], addr[1]) if result[0] else None
TypeError: 'bool' object is not subscriptable

On the first_node.py side there is no error, and it has received the other node's info. The logs are:

2018-01-27 15:55:11,566 - kademlia.network - INFO - Node 66545907984935011663605094311617222781228204585 listening on 0.0.0.0:8468
2018-01-27 15:55:11,569 - kademlia.network - DEBUG - Refreshing routing table
2018-01-27 15:55:34,452 - kademlia.protocol - INFO - never seen 127.0.0.1:8469 before, adding to router

Remove nodes with same ip/port

If a node is added to the heap, shouldn't an existing node with the same ip/port be deleted? Now, when a node reconnects it gets stored again.

Problem adding many keys in one go

I have a problem adding lots of keys. The modified webserver.tac fails after 277 keys on my machine (consistently) with timeout on the rpc call.
Any ideas?

def render_POST(self, request):
    for i in range(10000):
        key = i
        #key = request.path.split('/')[-1]
        value = request.content.getvalue()
        log.msg("Setting %s = %s" % (key, value))
        self.kserver.set(key, value)
    return value

Help

What is the use of standalone server in examples/server.tac?

Listen to a port if change happens (for example if one peer puts some data into the DHT)

I used the code to start the server (the first node, on port 8580) and bootstrapped another node (on port 5678) from it.
Then I put a string into the DHT under a KEY.
Then I retrieved the value for that key.

But I am curious to know how to inform the other nodes that a change has occurred.

Republishing example

To quote the readme: in Section 2.3 there is the requirement that the original publisher of a key/value republish it every 24 hours. This library does not do this (though you can easily do this manually).

Is republishing just calling server.refreshTable()? It seemed like that may be the case, but I didn't know if there was anything else that has to be done.

Also, it isn't clear from the documentation if refreshTable republishes all stored key/values or just key/values originally published by this node. May the latter be assumed?

Python 3.5 Bug: No need to import izip

Firstly, thanks for the wonderful reference implementation!

dvf@ubuntu-us-nyc-01:~ twistd -noy server.tac 
Unhandled Error
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/twisted/application/app.py", line 662, in run
    runApp(config)
  File "/usr/local/lib/python3.5/dist-packages/twisted/scripts/twistd.py", line 25, in runApp
    _SomeApplicationRunner(config).run()
  File "/usr/local/lib/python3.5/dist-packages/twisted/application/app.py", line 380, in run
    self.application = self.createOrGetApplication()
  File "/usr/local/lib/python3.5/dist-packages/twisted/application/app.py", line 445, in createOrGetApplication
    application = getApplication(self.config, passphrase)
--- <exception caught here> ---
  File "/usr/local/lib/python3.5/dist-packages/twisted/application/app.py", line 456, in getApplication
    application = service.loadApplication(filename, style, passphrase)
  File "/usr/local/lib/python3.5/dist-packages/twisted/application/service.py", line 412, in loadApplication
    application = sob.loadValueFromFile(filename, 'application')
  File "/usr/local/lib/python3.5/dist-packages/twisted/persisted/sob.py", line 177, in loadValueFromFile
    eval(codeObj, d, d)
  File "server.tac", line 7, in <module>
    from kademlia.network import Server
  File "/usr/local/lib/python3.5/dist-packages/kademlia/network.py", line 13, in <module>
    from kademlia.storage import ForgetfulStorage
  File "/usr/local/lib/python3.5/dist-packages/kademlia/storage.py", line 2, in <module>
    from itertools import izip
builtins.ImportError: cannot import name 'izip'

Since Twisted supports 3.6, how about:

try:
    from itertools import izip
except ImportError:
    izip = zip  # Python 3: the built-in zip is already lazy
...

README example in Python3.5 references Python2.x code.

The setup.py for this project's python3.5 branch had one reference to rpcudp>=3.0.0 which isn't in pypi, so things explode when you try to install it. I'm not sure if there's anything that can be done about that so long as you don't feel rpcudp is ready for prime time though:

Could not find a version that satisfies the requirement rpcudp>=3.0.0 (from kademlia) (from versions: 0.1, 0.2, 0.3, 0.4, 1.0, 2.0, 2.1)
No matching distribution found for rpcudp>=3.0.0 (from kademlia)

However, kademlia also doesn't cite a dependency on twisted (though the master branch does). This was a surprise when I was trying things out so I thought it worth mentioning as that's definitely a stable package you can reference in setup.py.

IPv4-only node communicating with dual stack node

Hi,
Firstly, I'm using your lib in a study project and I find it truly amazing. Thank you for this.
There is something I couldn't figure out - I'm sorry if it's the wrong place to ask.
I have three nodes, one of which is used by the other two to bootstrap. All nodes are IPv6/IPv4 enabled and listen on 0.0.0.0 and "::", except one which is IPv4-only (this is not the node used for bootstrap).

When the dual-stack node bootstraps against the bootstrap node, it uses its IPv6 address as the source. I'm assuming that its IPv4 address is then not stored in the bootstrap node's neighbor list (maybe I'm wrong).
Then, when the IPv4-only node bootstraps against the bootstrap node, will it be able to set or get data from the dual-stack node? Will the bootstrap node send it the dual-stack node's IPv4 address even though that node used its IPv6 address to bootstrap?

I hope this is not too confusing. It works perfectly when all the nodes are dual-stack capable, but I want to ensure that a node without IPv6 connectivity can fully act like every other one.
Thank you

Connect 2 peers behind the NAT

Hello,

Thanks for this good project. I have been trying to use this project to test the peer to peer discovery. I have the following setup. (2 nodes over NAT and one Non-NATed bootstrap node)

A public ip (vps on DigitalOcean) as a Bootstrap Node with the following code:

import logging
import asyncio

from kademlia.network import Server

handler = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
log = logging.getLogger('kademlia')
log.addHandler(handler)
log.setLevel(logging.DEBUG)

server = Server()
server.listen(8468)

loop = asyncio.get_event_loop()
loop.set_debug(True)

try:
    loop.run_forever()
except KeyboardInterrupt:
    pass
finally:
    server.stop()
    loop.close()

Computer A behind NAT (at office), which bootstraps from the public IP at DigitalOcean with the following code:

import logging
import asyncio

from kademlia.network import Server

handler = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
log = logging.getLogger('kademlia')
log.addHandler(handler)
log.setLevel(logging.DEBUG)

server = Server()
server.listen(8468)

loop = asyncio.get_event_loop()
loop.set_debug(True)
loop.run_until_complete(server.bootstrap([("<public ip from DigitalOcean>", 8468)]))

try:
    loop.run_forever()
except KeyboardInterrupt:
    pass
finally:
    server.stop()
    loop.close()

Computer B behind NAT (at home) with the same source code as Computer A.

Those 2 nodes could connect to the bootstrap node. My problem is that I can't get those 2 nodes to connect to each other through the bootstrap node.

In Computer B - The response was kademlia.protocol - WARNING - no response from <Computer A's NAT's external IP>:46357, removing from router

In Computer A - Vice versa

Since I'm new to "P2P Network" (Kademlia), perhaps I missed or did something wrong. I've tried to search for the answers on the internet for many days, but no luck. There aren't many documents about this problem either.

Could you help guide me how to connect those 2 nodes over NAT? Or source code would be highly appreciated.

Thanks!

Asynchronous Server listening

Hello !

I was wondering why the listen(...) method of the Server class is synchronous. For simpler integration with aiohttp I modified this method a little to make it asynchronous. This is what I did:

import asyncio
import logging

from kademlia.network import Server

log = logging.getLogger(__name__)

class AsyncServer(Server):
    def __init__(self, *args, **kwargs):
        self.loop = kwargs.pop("loop", asyncio.get_event_loop())
        super().__init__(*args, **kwargs)

    async def listen(self, port, interface="0.0.0.0"):  # nosec
        """
        Start listening on the given port.
        Provide interface="::" to accept ipv6 address
        """
        listen = self.loop.create_datagram_endpoint(
            self._create_protocol, local_addr=(interface, port)
        )
        log.info("Node %i listening on %s:%i", self.node.long_id, interface, port)
        self.transport, self.protocol = await listen
        # finally, schedule refreshing table
        self.refresh_table()

As you can see I modified this line https://github.com/bmuller/kademlia/blob/master/kademlia/network.py#L71 to replace loop.run_until_complete(...) by await listen.

I don't fully see the impact of this modification on the overall package, but it seems to work pretty well for now. Is there a particular reason why you used the loop.run_until_complete(...) method to do the listen?

I also changed the __init__(...) method to allow passing a custom loop to the class.

If you find that these modifications make sense, I would be happy to do a PR.

Cheers,
Matthieu.

Consider using github actions as CI

Some problems

The maximum value that C{KBucket}.depth() returns is 20 (the length returned by util.sharedPrefix()), so I think the node IDs are compared as bytes rather than bits. I fixed this in an ugly way, by turning the bytes into an integer, calling bin() and then padding with leading zeros, so I won't send you my code. :D
When a KBucket for the prefix 1111 is full of nodes with the prefix 11111, the KBucket can never split, although it should when a node with the prefix 11110 arrives. So I think the incoming node should help determine whether the KBucket is split. This can be done by passing the incoming node's ID into C{KBucket}.depth().
In the original paper, the authors suggest dropping a node only if it has failed to respond 5 times, since UDP is unreliable. I notice that in your implementation a node is removed as soon as it fails to respond once. In my simulation the network can hardly become stable. May I ask why you do this?
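
Comparing IDs bit by bit, as the first point asks for, does not require building bit strings. A minimal sketch, assuming IDs are handled as fixed-width integers (both function names are hypothetical and not part of the library):

```python
ID_BITS = 160  # Kademlia node IDs are 160-bit values


def shared_prefix_bits(id_a, id_b, width=ID_BITS):
    """Count the leading bits two fixed-width integer IDs have in common."""
    differing = id_a ^ id_b
    # bit_length() is the position of the highest differing bit;
    # every bit above that position is shared.
    return width - differing.bit_length()


def id_from_bytes(raw):
    """Interpret a byte string (for example a SHA-1 digest) as an integer ID."""
    return int.from_bytes(raw, "big")


# The 5-bit prefixes from the second point share their first 4 bits.
print(shared_prefix_bits(0b11110, 0b11111, width=5))  # 4
```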

Is kademlia moving to support asyncio only and not Twisted?

Hey @bmuller. Thanks so much for kademlia.

I just watched your talk, "Dont Overreact", from pygotham last year. Brilliant stuff man.

My question: It sounded pretty certain at the end of the talk that you were only wanting to support asyncio in the future (presumably for this and other projects). Is that decision official?

For my part, I really like Twisted and I suspect that Twisted will experience a substantial resurgence in the next couple of years. I'd love for kademlia to be able to work on either one.

Example of concurrent tasks

Hi !

I'm trying to create an app that communicates through Kademlia. This app has to perform an extensive task when a key is updated on the network, so I think I have to run that task concurrently with the network listening.

I have tried something like the following, but it doesn't seem to work.

async def main(server, blockchain):
    """Main loop."""
    while True:
        blockchain = await update_blockchain(server, blockchain)
        blockchain = await mine(blockchain)
        await update_blockchain(server, blockchain)

port = int(sys.argv[1])  # the port must be an integer

server = Server()
server.listen(port)

loop = asyncio.get_event_loop()

bootstrap_node = [("localhost", 8468)]
loop.run_until_complete(server.bootstrap(bootstrap_node))

# Bootstrap the blockchain
blockchain = loop.run_until_complete(bootstrap_blockchain(server))
print("Blockchain bootstrapped with difficulty " + str(blockchain.difficulty))

print("Start mining...")
loop.create_task(main(server, blockchain))

try:
    loop.run_forever()
except KeyboardInterrupt:
    pass
finally:
    server.stop()
    loop.close()

Note: the update_blockchain(...) coroutine uses server.get(...) and server.set(...) to interact with the network.

I think here the task is not concurrent with the listening because I receive messages like this :

Did not received reply for msg id b'aGaXClwL0Y5BfZ4kzZvZShMx/gs=' within 5 seconds
no response from 127.0.0.1:8470, removing from router
received unknown message b'aGaXClwL0Y5BfZ4kzZvZShMx/gs=' from ('127.0.0.1', 8470); ignoring

I'm sure this question is more related to asyncio than to kademlia, but maybe you have already thought about this kind of problem?

Cheers,
Matthieu.
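
A common asyncio pattern for this situation is to move the blocking step off the event loop with `run_in_executor`, so that network coroutines keep being scheduled. The sketch below uses only the standard library; `heavy_work` and `heartbeat` are hypothetical stand-ins for the mining step and for network activity, and nothing here is specific to kademlia.

```python
import asyncio


def heavy_work(n):
    """Stand-in for a blocking, CPU-bound step such as mining."""
    total = 0
    for i in range(n):
        total += i * i
    return total


async def heartbeat(ticks, interval=0.01):
    """Stand-in for network activity that needs a responsive event loop."""
    count = 0
    for _ in range(ticks):
        await asyncio.sleep(interval)
        count += 1
    return count


async def main():
    loop = asyncio.get_running_loop()
    # Run the blocking function in the default thread pool so the loop
    # can keep servicing other coroutines in the meantime.
    work = loop.run_in_executor(None, heavy_work, 200_000)
    result, ticks = await asyncio.gather(work, heartbeat(3))
    return result, ticks
```

A thread pool keeps the loop responsive but does not give true parallelism for pure-Python CPU work; passing a `concurrent.futures.ProcessPoolExecutor` instead of `None` is the usual alternative when the work function and its arguments can be pickled.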

Can write permissions be set for keys on Kademlia?

Apologies if this is not the right place for the question; I'm a bit of a newbie to kademlia. Is it possible to set write permissions for a key in a shared Kademlia store? I imagine this could be programmed manually in my client code -- e.g. verify private/public key of author/group match before allowing an overwrite -- but does that still leave open the possibility that another client could connect to the network and overwrite it anyway?

Thanks for reading this and thanks for this incredible package!

Single node network

It looks like the server doesn't work with a single node due to line https://github.com/bmuller/kademlia/blob/master/kademlia/network.py#L166

I haven't read the original paper, but is there any reason this has to happen? I was hoping to test locally with one instance in order to verify other (non DHT) parts of the program.

Values in the DHT must be of specific types

Values in the DHT must be hashable because of:

https://github.com/bmuller/kademlia/blob/master/kademlia/crawling.py#L114

Values in the DHT must also be of types: bool, int, float, str, bytes, because of this function kademlia calls
https://github.com/vsergeev/u-msgpack-python/blob/master/umsgpack.py#L473.
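
A caller can work within this restriction by checking the type up front and serializing structured values to a string before storing them. The helpers below are a hypothetical sketch, not part of the library; the allowed types are the ones listed above, and JSON is only one possible encoding.

```python
import json

ALLOWED_TYPES = (bool, int, float, str, bytes)


def is_storable(value):
    """Return True if `value` is exactly one of the primitive types above."""
    return type(value) in ALLOWED_TYPES


def encode_value(value):
    """Pass primitives through; serialize anything else to a JSON string."""
    return value if is_storable(value) else json.dumps(value, sort_keys=True)


print(encode_value({"b": 1, "a": 2}))  # {"a": 2, "b": 1}
```

The reader of a key has to know that it holds JSON and call `json.loads` on the result, since a stored JSON string is indistinguishable from any other string.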

after close my machine nodes not be available in the network i mean nodes not available in kbucket

can anyone help me iam using this repository for to make a private blockchain because kademlia p2p protocol network is very great features so iam using this one , if anyone knows can you send me the detailed info to [email protected]

below i explained what i did sofar and also i attached the screenshots of each file output , i have some questions i given in the end of the page

--> in the firstnode.py file i gave only port is 5000
loop.run_until_complete(server.listen(5000))
if i run that code it is giving nodeid, interface and port and it will keep on running only
because of we used run_forever()
output of firstnode.py file:
"2019-06-14 14:35:52 INFO MainThread modules.network Node
1359752123986314333800961178562141229721419151290 listening on 0.0.0.0:5000"
-->next in the set.py file i gave port is 6000 and here i have to pass the parameters
loop.run_until_complete(server.listen(6000))
bootstrap_node = (sys.argv[1], int(sys.argv[2]))
loop.run_until_complete(server.bootstrap([bootstrap_node]))
result = loop.run_until_complete(server.set(sys.argv[3], sys.argv[4]))
"Usage: python set.py "
bootstrap node: means it is my computer LAN IPaddress i.e "192.168.2.24"
bootstrap port: it is firstnode.py file port i.e 5000 i gave here
key: i.e 1
value: i.e mykey
command : python set.py 192.168.2.24 5000 1 mykey
if i run that command it is giving nodeid,port,interface and also it is gatheresd the peers and n
nearest nodes in the network

sofar what i did it is correct, if any thing wrong can you tell me

that's it sofar it was working but when i will close my machine that created nodes not available in the network because of if you create a new node it will gather the other nodes also if they are online but every time it will create a new nodes added into the router after closing my machine the nodes gone , where can i see my nodes i mean previously created nodes

Questions:

Q1. Is my computer's LAN IP address my bootstrap node, given that I passed it to the set.py file as the bootstrap node parameter?
Q2. When my system shuts down, the nodes I created are removed automatically. I don't know where those nodes are saved in the network (I know that basically the nodes are saved in the k-bucket, but when I close my machine the nodes are gone).
Q3. If I run the same code on a different machine, should both machines gather each other's nodes? When I tried, they did not.
Q4. Do you know how to hardcode the bootstrap IP?
Q5. How do I find the bootstrap node of an existing network?

KBucket splitting is flawed.

Splitting a KBucket uses the node id to decide which bucket each node goes into. However, the id is used as an unbounded value: its binary representation has no fixed length, so the split does not separate nodes the way the protocol intends. Consider this:

n1 = Node(b'ag')
n2 = Node(b'gggggggg')

The binary representation of these nodes, which is used when splitting, is the following

b'ag' -> '1100000010111'
b'gggggggg' -> '11000000010110010101010001001101100010010100101011111'

Therefore, these pieces of data have a depth of 12. This is extremely close together in terms of the protocol. However, the pieces of data are in reality massively different and should be 100% stored in different buckets.

The protocol needs to enforce either left padding or hashes to standardize the size of the node id to maintain correctness.
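
A minimal sketch of the two options named above, hashing and left padding, under the assumption that ids should be 160 bits wide (the SHA-1 digest size). The helper names `to_fixed_id`, `to_bits` and `shared_prefix_len` are illustrative and are not part of the library:

```python
import hashlib

ID_BITS = 160  # assumed id width: the size of a SHA-1 digest


def to_fixed_id(raw: bytes) -> int:
    """Hash arbitrary bytes to an integer id of at most ID_BITS bits."""
    return int.from_bytes(hashlib.sha1(raw).digest(), "big")


def to_bits(node_id: int) -> str:
    """Left-pad the binary form so every id is exactly ID_BITS characters."""
    return format(node_id, "b").zfill(ID_BITS)


def shared_prefix_len(a: str, b: str) -> int:
    """Count the leading bits that two equal-length bit strings share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


bits_a = to_bits(to_fixed_id(b"ag"))
bits_b = to_bits(to_fixed_id(b"gggggggg"))
print(len(bits_a), len(bits_b))           # both strings have the same length
print(shared_prefix_len(bits_a, bits_b))  # prefix comparison is now well defined
```

With every id normalized to the same width, the bit at a given depth means the same thing for every node, so short and long raw ids can no longer appear artificially close.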

Consider adding a second_node.py example

The examples are useful and they all work well, but there is no example of how to run more than one node (a second_node.py or additional_node.py example).

As an alternative, first_node.py could become a generic node which tries to bootstrap if a host and a port are specified.
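
A rough sketch of what such a generic node script could look like. The `Server`, `listen` and `bootstrap` calls follow the usage shown in the README; the flag names, the default port and the function names are assumptions made for illustration:

```python
import argparse
import asyncio


def parse_args(argv=None):
    """Parse command-line options; the flag names are hypothetical."""
    parser = argparse.ArgumentParser(description="Run a Kademlia node")
    parser.add_argument("--port", type=int, default=8468,
                        help="local UDP port to listen on")
    parser.add_argument("--bootstrap-host",
                        help="IP of a node that is already in the network")
    parser.add_argument("--bootstrap-port", type=int,
                        help="port of that node")
    return parser.parse_args(argv)


async def run_node(args):
    # Imported here so the argument parsing can be tried without the library.
    from kademlia.network import Server

    server = Server()
    await server.listen(args.port)

    # Bootstrap only when both a host and a port were given;
    # otherwise this process acts as the first node of a new network.
    if args.bootstrap_host and args.bootstrap_port:
        await server.bootstrap([(args.bootstrap_host, args.bootstrap_port)])

    # Keep the node alive until the process is interrupted.
    await asyncio.Event().wait()


# To start a node: asyncio.run(run_node(parse_args()))
```

Run without bootstrap flags for the first node, then start further nodes with `--bootstrap-host` and `--bootstrap-port` pointing at it.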

Kademlia is dependent on Forgetful Storage

Forgetful Storage should probably be the base class that gets extended, since kademlia depends on the functions in it.
example: https://github.com/bmuller/kademlia/blob/master/kademlia/network.py#L98

Swappable hash algorithm for DHT keys?

What's the feasibility of making the SHA1 algorithm swappable so that users can implement, for example, BLAKE2?

I know that SHA1 is in the "basic kademlia" spec, but it doesn't look to me like using BLAKE2 will break anything, and for our purposes, we'd like to make it more difficult to create collisions.
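
One way this could look, as a sketch only: the digest function is passed in as a parameter, and BLAKE2b is configured with a 20-byte digest so that ids keep the 160-bit width SHA1 produces. `KeyHasher` and the two digest helpers are hypothetical names, not the library's API:

```python
import hashlib
from typing import Callable

HashFn = Callable[[bytes], bytes]


def sha1_digest(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def blake2b_digest(data: bytes) -> bytes:
    # digest_size=20 keeps ids at 160 bits, the same width as SHA1.
    return hashlib.blake2b(data, digest_size=20).digest()


class KeyHasher:
    """Hypothetical helper that maps DHT keys to integer ids."""

    def __init__(self, hash_fn: HashFn = sha1_digest):
        self.hash_fn = hash_fn

    def key_id(self, key) -> int:
        if isinstance(key, str):
            key = key.encode("utf8")
        return int.from_bytes(self.hash_fn(key), "big")


default_hasher = KeyHasher()
blake_hasher = KeyHasher(blake2b_digest)
print(default_hasher.key_id("my-key"))
print(blake_hasher.key_id("my-key"))
```

All nodes in one network would have to agree on the same function, since a key hashed two different ways maps to two different ids.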

Python 3 support would be nice.

It is already the year 2015 after all.

Add interval for refresh_table

The refresh interval is currently fixed at 3600s. I'd like to be able to specify it, for example server.refresh_table(interval=30)
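
A sketch of the idea using plain asyncio, assuming the refresh itself is an async callable. `PeriodicRefresher` and its `interval` parameter are hypothetical and only illustrate how the period could be made configurable:

```python
import asyncio


class PeriodicRefresher:
    """Hypothetical helper: awaits `refresh` every `interval` seconds."""

    def __init__(self, refresh, interval=3600):
        self.refresh = refresh
        self.interval = interval
        self._task = None

    async def _loop(self):
        while True:
            await self.refresh()
            await asyncio.sleep(self.interval)

    def start(self):
        # Must be called while an event loop is running.
        self._task = asyncio.create_task(self._loop())

    def stop(self):
        if self._task is not None:
            self._task.cancel()
            self._task = None
```

A caller would construct it with the desired period, for example `PeriodicRefresher(refresh, interval=30)`, and call `start()` from inside a running coroutine.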

didn't find network

How can I find out the IP address and port of the bootstrap node? I pulled the code and ran your bootstrap node code, which keeps running. When I run the same file again, passing the bootstrap node (that is, the IP and port of the first instance), it finds the node on my machine. But when I run it on a different system it does not work. Can you please clarify?

Problem running multiple nodes

Hello, I get an error when I run multiple nodes.

Node A

2018-11-30 17:38:32,641 - kademlia.network - INFO - Node 410271437375177868241350581552805887836623615468 listening on 0.0.0.0:8468
2018-11-30 17:38:32,642 - kademlia.network - DEBUG - Refreshing routing table
2018-11-30 17:38:40,002 - kademlia.protocol - INFO - never seen 127.0.0.1:8469 before, adding to router
2018-11-30 17:38:40,005 - kademlia.protocol - INFO - finding neighbors of 361734964577977192664527317510177695264778001408 in local table
2018-11-30 17:38:40,009 - kademlia.protocol - INFO - finding neighbors of 361734964577977192664527317510177695264778001408 in local table

Node B

2018-11-30 17:35:51,994 - kademlia.network - INFO - Node 785497310352827902615457672872274918546086031709 listening on 0.0.0.0:1234
2018-11-30 17:35:52,002 - kademlia.network - DEBUG - Refreshing routing table
2018-11-30 17:35:52,002 - kademlia.network - DEBUG - Attempting to bootstrap node with 1 initial contacts
Did not received reply for msg id b'r+uBvXlcuyTQsB1WhRhEX7hzPxg=' within 5 seconds
2018-11-30 17:35:57,006 - kademlia.crawling - INFO - creating spider with peers: []
2018-11-30 17:35:57,006 - kademlia.crawling - INFO - crawling network with nearest: ()
2018-11-30 17:36:26,830 - kademlia.protocol - INFO - never seen 127.0.0.1:8469 before, adding to router
2018-11-30 17:36:26,835 - kademlia.protocol - INFO - finding neighbors of 1009438779640377194790323085010479765906207537747 in local table

This error is raised, and I don't know how to fix it.

Did not received reply for msg id b'r+uBvXlcuyTQsB1WhRhEX7hzPxg=' within 5 seconds

When I run get and set I can connect to A or B, but when I set a value on A, I can't get it from node B. I am confused; can somebody explain this to me?

Question about findNeighbors

It looks like this returns only contacts that the node knows about. The Kademlia spec refers to a "node lookup" that hits its closest nodes with FIND_NODE RPCs to find closer nodes. Is that implemented here?

Ah - I see that's implemented in SpiderCrawler. OK, gotta see how/when that's called. Sorry for the premature question, I'm trying to map the Kademlia spec to various implementations (and yours is at the top of readability and clean, across multiple languages I'm looking at) and the thing that's confusing me at the moment in the spec is when the crawl occurs (always, or just for FindNode, or for a specific other call?) and to what recursion depth FindNode crawls to (this last one is particularly of interest because I could see the crawler never ending.)

KBucket.replacement_nodes is never pruned

The list of replacement nodes is never pruned so it will grow to contain all nodes ever seen.

Should it be capped to some size?
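
A minimal sketch of a capped replacement list, assuming the oldest entry should be evicted once the cap is reached. `ReplacementCache` and `max_size` are hypothetical names, not the library's implementation:

```python
from collections import OrderedDict


class ReplacementCache:
    """Hypothetical bounded store for replacement nodes, oldest first."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._nodes = OrderedDict()

    def add(self, node_id, node):
        # Re-adding a known node moves it to the most-recently-seen end.
        self._nodes.pop(node_id, None)
        self._nodes[node_id] = node
        # Evict the oldest entries once the cap is exceeded.
        while len(self._nodes) > self.max_size:
            self._nodes.popitem(last=False)

    def pop_newest(self):
        """Remove and return the most recently seen node, or None if empty."""
        if not self._nodes:
            return None
        return self._nodes.popitem(last=True)[1]

    def __len__(self):
        return len(self._nodes)


cache = ReplacementCache(max_size=3)
for i in range(10):
    cache.add(i, f"node-{i}")
print(len(cache))  # 3
```

A natural choice for the cap would be the bucket size k, though that is a design decision for the maintainers.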

Question about TableTraverser next method

Hi, in the TableTraverser code below, when popping an item from rightBuckets, shouldn't it pop from the left side of the rightBuckets list? That bucket is nearer, but pop() pops from the right side.

class TableTraverser(object):
    def __init__(self, table, startNode):
        index = table.getBucketFor(startNode)
        table.buckets[index].touchLastUpdated()
        self.currentNodes = table.buckets[index].getNodes()
        self.leftBuckets = table.buckets[:index]
        self.rightBuckets = table.buckets[(index + 1):]
        self.left = True

    def __iter__(self):
        return self

    def __next__(self):
        """
        Pop an item from the left subtree, then right, then left, etc.
        """
        if len(self.currentNodes) > 0:
            return self.currentNodes.pop()

        if self.left and len(self.leftBuckets) > 0:
            self.currentNodes = self.leftBuckets.pop().getNodes()
            self.left = False
            return next(self)

        if len(self.rightBuckets) > 0:
            self.currentNodes = self.rightBuckets.pop().getNodes()      # <----  Here!
            self.left = True
            return next(self)

        raise StopIteration
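
To make the question concrete, here is a small stand-alone sketch that mirrors the alternation above with plain lists in place of real buckets. The `visit_order` function and the bucket labels are illustrative only:

```python
def visit_order(buckets, index, right_pop_front):
    """Return the order in which buckets are visited, starting at `index`."""
    left = list(buckets[:index])
    right = list(buckets[index + 1:])
    order = [buckets[index]]
    take_left = True
    while left or right:
        if take_left and left:
            order.append(left.pop())  # nearest bucket on the left side
            take_left = False
        elif right:
            # pop() takes the farthest right bucket, pop(0) the nearest one.
            order.append(right.pop(0) if right_pop_front else right.pop())
            take_left = True
        else:
            break  # mirrors the StopIteration in the original code
    return order


buckets = ["b0", "b1", "b2", "b3", "b4"]
print(visit_order(buckets, 2, right_pop_front=False))
# ['b2', 'b1', 'b4', 'b0', 'b3']
print(visit_order(buckets, 2, right_pop_front=True))
# ['b2', 'b1', 'b3', 'b0', 'b4']
```

With pop(), the bucket farthest to the right (b4) is visited before its nearer neighbour (b3); with pop(0) the traversal moves outward evenly on both sides.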