dhtnode's People

Contributors

geod24, iain-buclaw-sociomantic, joseph-wakeling-frequenz, joseph-wakeling-sociomantic, leandro-lucarella-sociomantic, mathias-baumann-sociomantic, mathias-lang-sociomantic, matthias-wende-sociomantic, mihails-strasuns-sociomantic, nemanja-boric-sociomantic, stefan-koch-sociomantic

dhtnode's Issues

Packaged dhtnode-d2 is built with coverage

% apt-cache policy dhtnode-d2   
dhtnode-d2:
  Installed: 1.12.8+dirty.20180717073230-xenial
  Candidate: 1.12.8+dirty.20180717073230-xenial
  Version table:
 *** 1.12.8+dirty.20180717073230-xenial 100
         50 https://dl.bintray.com/sociomantic-tsunami/nodes xenial/release amd64 Packages
        100 /var/lib/dpkg/status
% ls -la /srv/dhtnode/dhtnode-3/.*lst
-rw------- 1 dhtnode core 0 Sep  3 09:05 /srv/dhtnode/dhtnode-3/.-submodules-ocean-src-ocean-core-Array.lst
-rw------- 1 dhtnode core 0 Sep  3 09:05 /srv/dhtnode/dhtnode-3/.-submodules-ocean-src-ocean-core-array-Mutation.lst
-rw------- 1 dhtnode core 0 Sep  3 09:05 /srv/dhtnode/dhtnode-3/.-submodules-ocean-src-ocean-core-array-Search.lst
-rw------- 1 dhtnode core 0 Sep  3 09:05 /srv/dhtnode/dhtnode-3/.-submodules-ocean-src-ocean-core-array-Transformation.lst
-rw------- 1 dhtnode core 0 Sep  3 09:05 /srv/dhtnode/dhtnode-3/.-submodules-ocean-src-ocean-core-BitManip.lst
-rw------- 1 dhtnode core 0 Sep  3 09:05 /srv/dhtnode/dhtnode-3/.-submodules-ocean-src-ocean-core-Buffer.lst
-rw------- 1 dhtnode core 0 Sep  3 09:05 /srv/dhtnode/dhtnode-3/.-submodules-ocean-src-ocean-core-buffer-NoIndirections.lst
-rw------- 1 dhtnode core 0 Sep  3 09:05 /srv/dhtnode/dhtnode-3/.-submodules-ocean-src-ocean-core-buffer-Void.lst
-rw------- 1 dhtnode core 0 Sep  3 09:05 /srv/dhtnode/dhtnode-3/.-submodules-ocean-src-ocean-core-buffer-WithIndirections.lst
-rw------- 1 dhtnode core 0 Sep  3 09:05 /srv/dhtnode/dhtnode-3/.-submodules-ocean-src-ocean-core-ByteSwap.lst
-rw------- 1 dhtnode core 0 Sep  3 09:05 /srv/dhtnode/dhtnode-3/.-submodules-ocean-src-ocean-core-ContextUnion.lst
-rw------- 1 dhtnode core 0 Sep  3 09:05 /srv/dhtnode/dhtnode-3/.-submodules-ocean-src-ocean-core-Enforce.lst

It seems that the DHT node is compiled and packaged with -cov, the D compiler's code-coverage flag, which causes the running binary to write the .lst listing files seen above.

Investigate effects of load-factor on loading / iteration / lookup speeds

We're currently using a fixed load-factor, which is probably a bad thing.

Looking at the TC source code:

#define TCMDBMNUM      8                 // number of internal maps
#define TCMDBDEFBNUM   65536             // default bucket number

So it seems that the default number of buckets is 8*65,536 = 524,288. This is pretty small.
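
If the bucket count matters here, TC's API allows choosing it at creation time. A minimal sketch in D, assuming the usual C binding for tcmdbnew2 (which takes the bucket number):

extern (C)
{
    struct TCMDB;
    TCMDB* tcmdbnew2 ( uint bnum );
}

TCMDB* createStorage ( uint expected_records )
{
    // e.g. roughly one bucket per expected record, rather than the
    // default 8 * 65,536
    return tcmdbnew2(expected_records);
}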

Refactor storage methods to avoid double-copy of records?

Calling tcmdbget causes TC to allocate a buffer and copy the record value into it. This value is then copied into the user-provided buffer and the TC buffer freed.

It seems like it might be possible to refactor such methods to directly return the TC buffer and free it when finished (e.g. make the value accessible in the scope of a delegate), avoiding the extra copy. This might be especially relevant during iteration, which we know is very CPU intensive.
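
A minimal sketch of what such a delegate-based getter could look like, assuming plain C bindings for tcmdbget and free (the names and layering here are simplified, not the actual StorageEngine API):

extern (C)
{
    struct TCMDB;
    void* tcmdbget ( TCMDB* mdb, const(void)* kbuf, int ksiz, int* sp );
    void free ( void* ptr );
}

// Exposes the record value to `dg` without copying it into a
// user-provided buffer; the TC buffer is freed once `dg` returns.
void get ( TCMDB* db, in char[] key, scope void delegate ( in void[] value ) dg )
{
    int len;
    if ( auto value = tcmdbget(db, key.ptr, cast(int) key.length, &len) )
    {
        scope (exit) free(value);
        dg(value[0 .. len]);  // the slice is only valid inside the delegate
    }
}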

Investigate performance impact of only triggering Mirror updates if a record value has actually changed

There are sometimes cases where a Put request will overwrite a record with the same value (i.e. unchanged). These cases will currently trigger Mirror and Listen updates unnecessarily, leading to wasted resources in the DHT, network, and mirroring applications.

It would be worth testing the performance impact of checking whether the record value has changed before triggering listeners. (It may turn out that the cost of this check outweighs the savings from skipping redundant updates, but it seems worth looking into quickly.)
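
A rough sketch of the check, using a hypothetical delegate-based storage interface (all names here are illustrative):

interface Storage
{
    void get ( hash_t key, scope void delegate ( in void[] value ) dg );
    void put ( hash_t key, in void[] value );
}

class ChangeCheckedPut
{
    private Storage storage;

    void put ( hash_t key, in void[] value )
    {
        // assume changed unless an identical value is already stored
        bool changed = true;
        this.storage.get(key, (in void[] old) { changed = old != value; });

        this.storage.put(key, value);

        if ( changed )
            this.notifyListeners(key);  // trigger Mirror / Listen updates
    }

    private void notifyListeners ( hash_t key ) { /* hypothetical */ }
}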

Small optimisation: pass binary 64-bit keys to Tokyo Cabinet

The dhtnode currently passes record keys to TC as 16-character char[]s: 128 bits. They could instead be passed as slices over a single ulong: 64 bits.

This would reduce memory usage in the nodes a little and would be a small optimisation on record insertion / lookup (hashing 8 bytes instead of 16).

Note that this doesn't mean the on-disk dump format would need to change as well.
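
A minimal sketch of a put call using the raw hash as the key, assuming the usual C binding for tcmdbput:

extern (C)
{
    struct TCMDB;
    void tcmdbput ( TCMDB* mdb, const(void)* kbuf, int ksiz,
        const(void)* vbuf, int vsiz );
}

void put ( TCMDB* db, hash_t key, in void[] value )
{
    // pass the raw 8 bytes of the hash, rather than rendering it to a
    // 16-character hex string first
    tcmdbput(db, &key, cast(int) hash_t.sizeof, value.ptr,
        cast(int) value.length);
}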

Remove Trusty support

Remove Trusty support from all projects. Trusty is already more than 3 years old and the latest LTS (Xenial) has been out for more than a year, so maintaining only the latest LTS should now be enough.

Optimise storage iteration for the usual case (only one iteration request active)

As each TokyoCabinet instance only has a single iterator, StorageEngine.getNextKey currently always restarts the iteration based on the last key that was returned by the iterator (https://github.com/sociomantic-tsunami/dhtnode/blob/master/src/dhtnode/storage/StorageEngine.d#L616-L617). I think this (restarting an iteration at an arbitrary key) is a relatively expensive operation.

Looking at the node request stats, I see that most of the time only a single GetAll request is active, so this constant restarting of the iteration is probably unnecessary. It only needs to be restarted if the iterator for that instance has been moved by another request.

This could provide a pretty significant CPU performance improvement to GetAll.
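
A rough sketch of the idea, assuming plain C bindings for the TCMDB iterator functions; iterator_user is a hypothetical field tracking which request last advanced the iterator:

extern (C)
{
    struct TCMDB;
    void tcmdbiterinit2 ( TCMDB* mdb, const(void)* kbuf, int ksiz );
    void* tcmdbiternext ( TCMDB* mdb, int* sp );
    void free ( void* ptr );
}

class IterationSketch
{
    private TCMDB* db;
    private Object iterator_user;  // request that last moved the iterator
    private char[] last_key;

    void getNextKey ( Object request, ref char[] key )
    {
        if ( this.iterator_user !is request )
        {
            // another request has moved the shared iterator, so restart
            // it from the last key we returned (the currently
            // unconditional, expensive step)
            tcmdbiterinit2(this.db, this.last_key.ptr,
                cast(int) this.last_key.length);
        }
        this.iterator_user = request;

        int len;
        if ( auto next = tcmdbiternext(this.db, &len) )
        {
            scope (exit) free(next);
            key = (cast(char*) next)[0 .. len].dup;
            this.last_key = key;
        }
    }
}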

Check and update README

It contains some outdated stuff (e.g. "the dht client defined in swarm (swarm.dht.DhtClient)").

Neo handling of redistributions

During a redistribution, the behaviours of the Redistribute, Get, Put, and GetHashRange requests interact to form the behaviour of the system as a whole. Only if each of these requests behaves in strictly defined ways can the system function normally while data redistribution is in progress.

This issue discusses the required behaviour, from the node's point of view.

See also sociomantic-tsunami/dhtproto#22.

Adapt storage engine methods to not convert hashes

The neo-compatible storage engine methods currently:

  • Accept a hash_t.
  • Render it to a char[16].
  • Pass this to the appropriate legacy storage engine method.

The legacy storage engine methods are now being adapted (#37) to pass hash_ts directly to Tokyo Cabinet as keys. This means that the methods can be refactored so that:

  • The neo methods become the "main" ones, dealing purely with hash_ts.
  • The legacy methods convert from a string to a hash_t and call the appropriate neo method (see the sketch below).
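
A sketch of the inverted layering (toHash is a hypothetical conversion helper, and the storage calls themselves are elided):

class StorageSketch
{
    // neo method: the primary implementation, keyed purely on hash_t
    void get ( hash_t key, ref void[] value )
    {
        // ... pass the hash_t directly to Tokyo Cabinet (#37) ...
    }

    // legacy method: converts the 16-character key string and forwards
    void get ( in char[] key_str, ref void[] value )
    {
        this.get(toHash(key_str), value);
    }

    // hypothetical helper: parse a 16-character lower-case hex string
    private static hash_t toHash ( in char[] str )
    {
        hash_t hash;
        foreach ( c; str )
            hash = (hash << 4) | cast(hash_t)(c <= '9' ? c - '0' : c - 'a' + 10);
        return hash;
    }
}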

`getResources` could return an invalid iterator

When stopping and starting the neotest utility in mirror mode (starting and shutting down several instances at random times, while three to four neotest instances were filling the channels), the dhtnode segfaulted.

It seems that the iterator returned here:

https://github.com/sociomantic-tsunami/dhtnode/blob/neo/src/dhtnode/request/neo/Mirror.d#L198

is invalid:

(gdb) p *iterator
$6 = {storage = 0x7ffff2d09dd8, started = 8, 
  key_buffer = 0x1 <error: Cannot access memory at address 0x1>, 
  current_key = 0x8 <error: Cannot access memory at address 0x8>, 
  value_buffer = 0x8 <error: Cannot access memory at address 0x8>}
#0  0x0000000000000000 in ?? ()
#1  0x0000000000671ea7 in dhtnode.request.neo.Mirror.MirrorImpl_v0.startIteration() (this=0x7ffff2a94d78)
    at ./src/dhtnode/request/neo/Mirror.d:199
#2  0x0000000000688814 in dhtproto.node.neo.request.Mirror.MirrorProtocol_v0.PeriodicRefresh.refresh() (this=0x7ffff2a94be0)
    at ./submodules/dhtproto/src/dhtproto/node/neo/request/Mirror.d:979
#3  0x0000000000688766 in dhtproto.node.neo.request.Mirror.MirrorProtocol_v0.PeriodicRefresh.fiberMethod() (this=0x7ffff2a94be0)
    at ./submodules/dhtproto/src/dhtproto/node/neo/request/Mirror.d:964
#4  0x0000000000769d9e in core.thread.Fiber.run() (
    this=0x7ffff2a94d68)
    at /home/jenkins/docker/src/core/thread.d:3200
#5  0x0000000000769cf8 in fiber_entryPoint ()
    at /home/jenkins/docker/src/core/thread.d:2489
#6  0x0000000000000000 in ?? ()

Segfault in YieldedRequestOnConns with Mirror

Steps to reproduce (dhtnode and dhtproto neo-alpha-3 + sociomantic-tsunami/swarm#115):

  1. Start a dhtnode.
  2. Start a client writing to 5 channels (e.g. neotest fill 100000 test test2 test3 test4 test5).
  3. Start a client mirroring the same channels (e.g. neotest multimirror test test2 test3 test4 test5).
  4. Stop and start the mirroring client.
  5. A segfault occurs here:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000714f03 in swarm.neo.connection.YieldedRequestOnConns.YieldedRequestOnConns.handle_(ulong).__dgliteral574(swarm.neo.connection.YieldedRequestOnConns.YieldedRequestOnConns.IYieldedRequestOnConn) (this=0x7ffff7eeaa00, yielded=0x7fffee16f7c8)
    at ./submodules/swarm/src/swarm/neo/connection/YieldedRequestOnConns.d:126
126	            (IYieldedRequestOnConn yielded) {yielded.resume();}
(gdb) bt
#0  0x0000000000714f03 in swarm.neo.connection.YieldedRequestOnConns.YieldedRequestOnConns.handle_(ulong).__dgliteral574(swarm.neo.connection.YieldedRequestOnConns.YieldedRequestOnConns.IYieldedRequestOnConn) (this=0x7ffff7eeaa00, yielded=0x7fffee16f7c8)
    at ./submodules/swarm/src/swarm/neo/connection/YieldedRequestOnConns.d:126
#1  0x000000000071520f in swarm.neo.connection.YieldedRequestOnConns.YieldedRequestOnConns.YieldedQueue.swapAndPop(void(swarm.neo.connection.YieldedRequestOnConns.YieldedRequestOnConns.IYieldedRequestOnConn) delegate).__foreachbody1935(ref swarm.neo.connection.YieldedRequestOnConns.YieldedRequestOnConns.IYieldedRequestOnConn) (this=0x7fffffffde70, __applyArg0=0x7fffffffdd28) at ./submodules/swarm/src/swarm/neo/connection/YieldedRequestOnConns.d:225
#2  0x00000000007153f8 in swarm.neo.util.TreeQueue.TreeQueue!(swarm.neo.connection.YieldedRequestOnConns.YieldedRequestOnConns.IYieldedRequestOnConn).TreeQueue.opApply(int(ref swarm.neo.connection.YieldedRequestOnConns.YieldedRequestOnConns.IYieldedRequestOnConn) delegate).__dgliteral428(ref ulong) (
    this=0x7fffffffde10, value_=0x7fffffffdda0) at ./submodules/swarm/src/swarm/neo/util/TreeQueue.d:99
#3  0x00000000006e1aab in swarm.neo.util.TreeQueue.TreeQueueCore.opApply(int(ref ulong) delegate) (this=0x7ffff7eeaa60, dg=...)
    at ./submodules/swarm/src/swarm/neo/util/TreeQueue.d:462
#4  0x00000000007153be in swarm.neo.util.TreeQueue.TreeQueue!(swarm.neo.connection.YieldedRequestOnConns.YieldedRequestOnConns.IYieldedRequestOnConn).TreeQueue.opApply(int(ref swarm.neo.connection.YieldedRequestOnConns.YieldedRequestOnConns.IYieldedRequestOnConn) delegate) (this=0x7ffff7eeaa60, dg=...)
    at ./submodules/swarm/src/swarm/neo/util/TreeQueue.d:95
#5  0x0000000000715159 in swarm.neo.connection.YieldedRequestOnConns.YieldedRequestOnConns.YieldedQueue.swapAndPop(void(swarm.neo.connection.YieldedRequestOnConns.YieldedRequestOnConns.IYieldedRequestOnConn) delegate) (this=0x7ffff7eeaa60, dg=...)
    at ./submodules/swarm/src/swarm/neo/connection/YieldedRequestOnConns.d:224
#6  0x0000000000714ee2 in swarm.neo.connection.YieldedRequestOnConns.YieldedRequestOnConns.handle_(ulong) (this=0x7ffff7eeaa00, n=1)
    at ./submodules/swarm/src/swarm/neo/connection/YieldedRequestOnConns.d:125
#7  0x00000000006a9939 in ocean.io.select.client.SelectEvent.ISelectEvent.handle(ocean.sys.Epoll.epoll_event_t.Event) (this=0x7ffff7eeaa00, event=1)
    at ./submodules/ocean/src/ocean/io/select/client/SelectEvent.d:154
#8  0x000000000071a0f9 in ocean.io.select.selector.SelectedKeysHandler.SelectedKeysHandler.handleSelectedKey(ocean.sys.Epoll.epoll_event_t, void(Exception) delegate) (this=0x7ffff7ee9c80, unhandled_exception_hook=..., key=...) at ./submodules/ocean/src/ocean/io/select/selector/SelectedKeysHandler.d:156
#9  0x000000000071a08c in ocean.io.select.selector.SelectedKeysHandler.SelectedKeysHandler.opCall(ocean.sys.Epoll.epoll_event_t[], void(Exception) delegate) (
    this=0x7ffff7ee9c80, unhandled_exception_hook=..., selected_set=...) at ./submodules/ocean/src/ocean/io/select/selector/SelectedKeysHandler.d:118
#10 0x000000000071ccc0 in ocean.io.select.EpollSelectDispatcher.EpollSelectDispatcher.select(bool) (this=0x7ffff7ee8d00, exit_asap=false)
    at ./submodules/ocean/src/ocean/io/select/EpollSelectDispatcher.d:795
#11 0x000000000071cb15 in ocean.io.select.EpollSelectDispatcher.EpollSelectDispatcher.eventLoop(bool() delegate, void(Exception) delegate) (
    this=0x7ffff7ee8d00, unhandled_exception_hook=..., select_cycle_hook=...) at ./submodules/ocean/src/ocean/io/select/EpollSelectDispatcher.d:711
#12 0x000000000064d99b in dhtnode.main.DhtNodeServer.run(ocean.text.Arguments.Arguments, ocean.util.config.ConfigParser.ConfigParser) (this=0x7ffff7eeae00, 
    config=0x7ffff7ee8a80, args=0x7ffff7ed3b00) at src/dhtnode/main.d:414
#13 0x000000000069eca9 in ocean.util.app.DaemonApp.DaemonApp.run(char[][]) (this=0x7ffff7eeae00, args=...)
    at ./submodules/ocean/src/ocean/util/app/DaemonApp.d:430
#14 0x00000000007090f8 in ocean.util.app.Application.Application.main(char[][]) (this=0x7ffff7eeae00, args=...)
    at ./submodules/ocean/src/ocean/util/app/Application.d:265
#15 0x000000000064ce38 in D main (cl_args=...) at src/dhtnode/main.d:72

dhtdump one-shot mode doesn't exit immediately

The documentation of the one-shot parameter says:

one-shot mode, perform a single dump immediately then exit

However, what actually happens is that the dump cycle is performed immediately, but dhtdump does not exit.

It is possible that the timer registered with epoll is preventing the shutdown (there's no call to epoll.shutdown).
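
If that's the cause, the fix might look roughly like this; one_shot and the hook name are assumptions about dhtdump's structure, and only the epoll.shutdown call is the substance:

import ocean.io.select.EpollSelectDispatcher;

class DhtDumpSketch
{
    private bool one_shot;
    private EpollSelectDispatcher epoll;

    // hypothetical hook, called once a dump cycle has completed
    private void onDumpCycleComplete ( )
    {
        if ( this.one_shot )
        {
            // unregister all clients (including the timer) so that the
            // event loop exits and the process can shut down
            this.epoll.shutdown();
        }
    }
}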

Add systemd support

Systemd support should be added in 3 stages:

  • Create the unit file (a sketch follows below) and add it to the package (when available)
  • Deploy the unit file (deploying the package when available)
  • Switch the servers running the app to systemd
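
As a starting point for the first stage, a minimal unit file sketch; every path, name, and option here is an assumption, not the deployed configuration:

# hypothetical dhtnode.service -- adjust binary path and user to the deployment
[Unit]
Description=DHT node
After=network.target

[Service]
ExecStart=/usr/sbin/dhtnode
User=dhtnode
Restart=on-failure

[Install]
WantedBy=multi-user.target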

Add test of dump format

A test that dumps a TCM file to disk and then reads it back to check the format; a rough sketch follows the list below.

Complication: a dump file can be written in two ways:

  1. By the dhtnode at shutdown (data provided via a storage engine iterator).
  2. By dhtdump (data provided via a GetAll request).
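
One possible shape for the round-trip test; dumpChannel and loadChannel are hypothetical stand-ins for whichever of the two write paths is under test:

import ocean.core.Test;

// hypothetical helpers wrapping the dump writer(s) and the loader
void dumpChannel ( in char[] path, char[][hash_t] records );
char[][hash_t] loadChannel ( in char[] path );

void testDumpFormat ( )
{
    // a known set of records to round-trip through the dump format
    char[][hash_t] records;
    records[0x0123456789abcdef] = "value".dup;

    dumpChannel("test_channel.tcm", records);
    auto loaded = loadChannel("test_channel.tcm");

    test!("==")(loaded.length, records.length);
    foreach ( key, value; records )
        test!("==")(loaded[key], value);
}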

Client connection can hang after Mirror request has been aborted

Reproduction steps (dhtnode and dhtproto neo-alpha-3):

  1. Start a dhtnode with some data in a channel.
  2. Start a client which does a Mirror request.
  3. During the initial refresh cycle, SIGINT the client.
  4. Upon restarting the client, it hangs after the this.dht.blocking.waitAllHashRangesKnown(); call returns.

Segfault in GC with Mirror

Steps to reproduce (dhtnode and dhtproto neo-alpha-3 + sociomantic-tsunami/swarm#115 + sociomantic-tsunami/swarm#117):

  1. Start a dhtnode with data in 5 channels.
  2. Start a client mirroring the channels (e.g. neotest multimirror test test2 test3 test4 test5).
  3. Stop and start the mirroring client.
  4. A segfault occurs here:
Program received signal SIGSEGV, Segmentation fault.
gc.cdgc.gc.malloc(ulong, uint, ulong*) (pm_bitmask=0xa73230 <internal>, attrs=638, size=140737304053504) at /home/jenkins/docker/src/gc/cdgc/gc.d:1521
1521	/home/jenkins/docker/src/gc/cdgc/gc.d: No such file or directory.
(gdb) bt
#0  gc.cdgc.gc.malloc(ulong, uint, ulong*) (pm_bitmask=0xa73230 <internal>, attrs=638, size=140737304053504) at /home/jenkins/docker/src/gc/cdgc/gc.d:1521
#1  0x0000000000766515 in gc.cdgc.gc.calloc(ulong, uint, ulong*) (pm_bitmask=0x61, attrs=97, size=97) at /home/jenkins/docker/src/gc/cdgc/gc.d:1600
#2  0x00000000007629b2 in gc.cdgc.gc.gc_calloc(ulong, uint, object.PointerMap).locked!(void*, gc.cdgc.gc.gc_calloc(ulong, uint, object.PointerMap).__dgliteral122()).locked().__dgliteral122() (this=0x7fffee1a2a30) at /home/jenkins/docker/src/gc/cdgc/gc.d:2604
#3  0x0000000000762906 in gc.cdgc.gc.gc_calloc(ulong, uint, object.PointerMap).locked!(void*, gc.cdgc.gc.gc_calloc(ulong, uint, object.PointerMap).__dgliteral122()).locked() (this=0x7fffee1a2a30) at /home/jenkins/docker/src/gc/cdgc/gc.d:268
#4  0x0000000000762581 in gc_calloc (size=97, attrs=0, ptrmap=...) at /home/jenkins/docker/src/gc/cdgc/gc.d:2599
#5  0x000000000075f939 in _d_arraysetlengthT (ti=0xa588c0 <TypeInfo_AS5swarm3neo8protocol6socket9uio_const11iovec_const.init$>, newlength=6, p=0x7ffff7eea8f0)
    at /home/jenkins/docker/src/rt/lifetime.d:756
#6  0x00000000006e184d in swarm.neo.protocol.socket.MessageGenerator.IoVecMessage.setup(swarm.neo.protocol.Message.MessageType, void[][], void[][]...) (
    this=0x7ffff7eea8f0, static_fields=..., dynamic_fields=..., type=1 '\001') at ./submodules/swarm/src/swarm/neo/protocol/socket/MessageGenerator.d:62
#7  0x000000000069325b in swarm.neo.protocol.socket.MessageSender.MessageSender.assign(swarm.neo.protocol.Message.MessageType, void[][], void[][]...) (
    this=0x7ffff7eea800, static_fields=..., dynamic_fields=..., type=1 '\001') at ./submodules/swarm/src/swarm/neo/protocol/socket/MessageSender.d:166
#8  0x000000000070eebf in swarm.neo.connection.ConnectionBase.ConnectionBase.SendLoop.sendRequestPayload(ulong).__dgliteral567(void[][]) (
    this=0x7fffee1a2dd0, payload=...) at ./submodules/swarm/src/swarm/neo/connection/ConnectionBase.d:423
#9  0x00000000006e03f4 in swarm.neo.connection.RequestOnConnBase.RequestOnConnBase.getPayloadForSending(void(void[][]) delegate) (this=0x7ffff4fdca00, 
    send=...) at ./submodules/swarm/src/swarm/neo/connection/RequestOnConnBase.d:1734
#10 0x0000000000751eb3 in swarm.neo.node.RequestSet.RequestSet.Request.getPayloadForSending(void(void[][]) delegate) (this=0x7ffff4fdca00, send=...)
    at ./submodules/swarm/src/swarm/neo/node/RequestSet.d:148
#11 0x0000000000727996 in swarm.neo.node.Connection.Connection.getPayloadForSending(ulong, void(void[][]) delegate) (this=0x7ffff4f84f00, send=..., id=2)
    at ./submodules/swarm/src/swarm/neo/node/Connection.d:216
#12 0x000000000070edca in swarm.neo.connection.ConnectionBase.ConnectionBase.SendLoop.sendRequestPayload(ulong) (this=0x7ffff4f84a00, id=2)
    at ./submodules/swarm/src/swarm/neo/connection/ConnectionBase.d:419
#13 0x000000000070ed7e in swarm.neo.connection.ConnectionBase.ConnectionBase.SendLoop.loop().__foreachbody1902(ref ulong) (this=0x7fffee1a2f18, 
    __applyArg0=0x7fffee1a2e70) at ./submodules/swarm/src/swarm/neo/connection/ConnectionBase.d:387
#14 0x00000000006e251b in swarm.neo.util.TreeQueue.TreeQueueCore.opApply(int(ref ulong) delegate) (this=0x7ffff4f84a60, dg=...)
    at ./submodules/swarm/src/swarm/neo/util/TreeQueue.d:462
#15 0x000000000070ecd7 in swarm.neo.connection.ConnectionBase.ConnectionBase.SendLoop.loop() (this=0x7ffff4f84a00)
    at ./submodules/swarm/src/swarm/neo/connection/ConnectionBase.d:386
#16 0x000000000070eaec in swarm.neo.connection.ConnectionBase.ConnectionBase.SendLoop.fiberMethod() (this=0x7ffff4f84a00)
    at ./submodules/swarm/src/swarm/neo/connection/ConnectionBase.d:320
#17 0x000000000076b6ae in core.thread.Fiber.run() (this=0x5ed) at /home/jenkins/docker/src/core/thread.d:3200
#18 0x000000000076b608 in fiber_entryPoint () at /home/jenkins/docker/src/core/thread.d:2489
#19 0x0000000000000000 in ?? ()

Test failure after implementing neo GetChannels

#17 requires implementing the neo GetChannels request. After doing so, I'm seeing odd test failures: the legacy handshake, performed after each test case to remove the test channel, fails.

Though I don't think the problem lies with any test case (or combination thereof) in particular, the following seems to be the minimal set of test cases needed to reproduce the issue:

import dhttest.cases.UnorderedPut;
import dhttest.cases.neo.Basic;
import dhttest.cases.neo.OrderedPut;
import dhttest.cases.neo.Mirror;
import dhttest.cases.neo.GetChannels;
