
Comments (5)

bboreham commented on May 24, 2024

The idea is that gossip messages are not sent very often - Weave Mesh selects log(number of connections) peers to send to - so the scalability should be good.
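As a rough illustration of the fan-out described above, here is a minimal Python sketch, not the library's Go code. The function names are hypothetical, and the base-2 logarithm and the floor of one target are assumptions; the comment only says "log(number of connections)".

```python
import math
import random


def gossip_fanout(num_connections: int) -> int:
    """Number of peers to gossip to for a given connection count.

    Assumption: base-2 logarithm, rounded down, with at least one
    target whenever any connection exists.
    """
    if num_connections < 1:
        return 0
    return max(1, int(math.log2(num_connections)))


def pick_gossip_targets(connections, rng=random):
    """Pick a random subset of connections of size gossip_fanout()."""
    candidates = list(connections)
    return rng.sample(candidates, gossip_fanout(len(candidates)))


if __name__ == "__main__":
    peers = [f"peer-{i}" for i in range(300)]
    print(len(pick_gossip_targets(peers)), "targets out of", len(peers))
```

Under these assumptions a peer with 300 connections would gossip each update to 8 of them rather than all 300.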

However, various other issues in the code mean that messages are sent far more often than this ideal. #101, #106 and #107 are attempts to improve matters, though work is ongoing to understand the full set of causes.

from mesh.

jacksontj commented on May 24, 2024

From my use-case of protokube (kubernetes/kops#7427) I'm seeing ~2 cores of CPU usage and ~3G of RAM usage with a fully connected mesh of ~300 nodes. This seems to highlight some serious scale limitations of weaveworks/mesh -- as that isn't even a very large cluster. More importantly the utilization ramp-up was more-or-less exponential as more nodes were added.

Even after I made a custom build with #107 fixed, CPU usage only dropped to 1.6 cores, which is still far too much (all the CPU time was being spent marshaling/unmarshaling the peer list being gossiped around).

There seem to be quite a few issues. A couple: (1) there is no concept of a "suspect" state; (2) peer messages include the list of all peers it has connected, which scales with cluster size. There are likely more, but TBH I have decided to instead spend my time swapping protokube to a more robust/reliable gossip library.


bboreham commented on May 24, 2024

We're seeing quite positive results from deferring gossip updates - #117 and #118.
Would you like to try those in your build?


bboreham commented on May 24, 2024

> peer messages include the list of all peers it has connected

It's worse than that - the topology message lists all the connections of all peers. In other words in a fully-connected cluster it's O(N^2).
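To make the O(N^2) growth concrete, the sketch below counts connection entries in one topology message for a fully connected cluster. It assumes each of the N peers lists its N-1 connections, so each link appears once per endpoint; the function name is hypothetical.

```python
def topology_entries(num_peers: int) -> int:
    """Connection entries in one topology message, fully connected mesh.

    Assumption: every peer lists each of its (num_peers - 1) connections.
    """
    return num_peers * (num_peers - 1)


if __name__ == "__main__":
    for n in (100, 200, 300, 600):
        print(f"{n} peers -> {topology_entries(n)} connection entries")
```

Doubling the cluster size roughly quadruples the number of entries every message has to carry.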

However, for 300 nodes that might be 8MB per message, which needs something else to get to 1-2GB.

We found that the connections would each read a message and then block on the Peers lock to apply the update, so every blocked connection holds its own copy of the message in memory. With 200 connections at 8MB each, that's 1.6GB.
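The arithmetic behind that figure, as a small sketch. It assumes each blocked connection keeps one full message buffered and uses decimal units (1MB = 10^6 bytes); the names are hypothetical.

```python
MB = 10**6
GB = 10**9


def buffered_bytes(blocked_connections: int, message_bytes: int) -> int:
    """Memory held by connections that have read a message but are
    still waiting on the lock to apply it (one message each)."""
    return blocked_connections * message_bytes


if __name__ == "__main__":
    total = buffered_bytes(200, 8 * MB)
    print(f"{total / GB:.1f} GB")
```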

Changing the "everyone sends everything" behaviour is quite a big change, so ahead of that I felt that just slowing down the initial connections would help (#124). After the initial connection, updates only go to log(N) peers, so we shouldn't get the massive spikes.


bboreham commented on May 24, 2024

I'm going to close this issue now that 0.4 is released. If you want to come back to the discussion, please do.

