
Comments (6)

tqchen avatar tqchen commented on August 24, 2024

It is not necessary to build rabit with MPI support if you want to use rabit as a communication lib. If you build it with MPI support, the communication layer is switched to MPI.

To build with MPI as the backend, simply link against librabit_mpi.a.

  • You can run make mpi to build it.

from rabit.

weijianwen avatar weijianwen commented on August 24, 2024

Interesting. It sounds like rabit is a "message operation" library that supports various backend engines. ZeroMQ works in a similar way.

MPI uses broadcast, gather, and reduce as verbs, so it is a good candidate for a rabit backend engine. Extra benefits of MPI are:

  1. Good integration with high-end network fabrics such as InfiniBand.
  2. Good integration with job scheduling systems, to name a few: SLURM, LSF, OpenLava, SGE. The scheduling system takes care of the MPI jobs for us.

But as mentioned by Tianqi, the tradeoff is: no auto recovery in MPI.
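
For readers unfamiliar with the collective verbs mentioned above, here is a minimal single-process sketch of what broadcast, gather, and reduce compute. It is only an illustration: the names Buffers, Broadcast, Gather, and ReduceSum are hypothetical and are not part of rabit or MPI, and each inner vector merely stands in for the buffer held by one rank.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: ranks[i] is the buffer owned by rank i (hypothetical type,
// not a rabit or MPI structure).
using Buffers = std::vector<std::vector<int>>;

// Broadcast: every rank ends up with a copy of the root's buffer.
void Broadcast(Buffers& ranks, std::size_t root) {
  const std::vector<int> src = ranks.at(root);
  for (auto& buf : ranks) buf = src;
}

// Gather: the result is the concatenation of all buffers in rank order,
// as the root would receive it.
std::vector<int> Gather(const Buffers& ranks) {
  std::vector<int> out;
  for (const auto& buf : ranks) out.insert(out.end(), buf.begin(), buf.end());
  return out;
}

// Reduce with a sum operator: element-wise sum over all ranks' buffers.
std::vector<int> ReduceSum(const Buffers& ranks) {
  std::vector<int> out(ranks.at(0).size(), 0);
  for (const auto& buf : ranks) {
    assert(buf.size() == out.size());  // all ranks must contribute equal lengths
    for (std::size_t i = 0; i < out.size(); ++i) out[i] += buf[i];
  }
  return out;
}
```

With three ranks holding {1, 2}, {3, 4}, and {5, 6}, ReduceSum yields {9, 12} and Gather yields {1, 2, 3, 4, 5, 6}. A real backend performs the same operations across processes over the network.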

Anyway, benchmarks along the lines of "rabit-socket vs. rabit-MPI vs. ZeroMQ" may be interesting. ZeroMQ is performance oriented, so no reliability mechanism is designed into it.

Please correct me if I am wrong.


hjk41 avatar hjk41 commented on August 24, 2024

MPI 2.0 does allow you to dynamically spawn new processes in case you want to
restart a dead one, though I would say it is not as easy as it seems. And
currently rabit-MPI does not use that feature.

ZeroMQ is optimized more for small messages and is not necessarily a good
choice for machine learning workloads, since most messages in ML are large.
ZeroMQ is reliable, in that it automatically re-transmits messages, and it
also has some kind of load-balancing mechanism built in.

Besides communication, rabit provides checkpointing; I think that is the
most important distinction.
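
To make the checkpointing point concrete, here is a minimal single-process sketch of a checkpoint-and-resume loop. It is a toy under stated assumptions, not rabit's implementation: Model, CheckpointStore, and Train are hypothetical names, the store lives in memory, and the "crash" is simulated by returning early.

```cpp
#include <cassert>

// Hypothetical model state: a running sum plus the next iteration to run.
struct Model {
  int next_iter = 0;
  long sum = 0;
};

// Hypothetical in-memory checkpoint store. A fault-tolerant library would
// keep such copies somewhere that survives the failure of a worker.
class CheckpointStore {
 public:
  void Save(const Model& m) {
    saved_ = m;
    has_saved_ = true;
  }
  // Returns the last saved model, or a fresh one if nothing was saved yet.
  Model Load() const { return has_saved_ ? saved_ : Model{}; }

 private:
  Model saved_;
  bool has_saved_ = false;
};

// Resumes from the last checkpoint and runs up to `total` iterations.
// If `fail_at` is reached, the function returns false before running that
// iteration, simulating a crash. Pass -1 for a run without a crash.
bool Train(CheckpointStore& store, int total, int fail_at, Model* out) {
  assert(out != nullptr);
  Model m = store.Load();
  for (int it = m.next_iter; it < total; ++it) {
    if (it == fail_at) return false;  // simulated crash
    m.sum += it;                      // stand-in for one training step
    m.next_iter = it + 1;
    store.Save(m);                    // checkpoint after every iteration
  }
  *out = m;
  return true;
}
```

Calling Train once with a simulated crash and then again on the same store continues from the last saved iteration and produces the same final sum as an uninterrupted run.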

HONG Chuntao
System Research Group
Microsoft Research Asia


weijianwen avatar weijianwen commented on August 24, 2024

Thank you, @hjk41. I think these nice features should be highlighted in the README and tutorials.

For simplicity, I'll try rabit with the default settings first. I will close this issue.


tqchen avatar tqchen commented on August 24, 2024

@weijianwen It would be great if you could open a PR and contribute your understanding to the tutorial. Thanks!


weijianwen avatar weijianwen commented on August 24, 2024

@tqchen Sure, glad to help. I'll send feedback on how to install the dmlc stack on a moderate-sized cluster. As I wasn't involved in the design process, my feedback will reflect what a library user hopes to know at the very beginning. That would be a good starting point for reorganizing the README, tutorials, and other docs.

One more thing: I would appreciate it if someone could merge my PR in dmlc/wormhole. It is a typo fix, not a new feature. As ps-lite replaces ps in wormhole's dependencies, I wonder if we should also replace the reference link to ps in "Depending DMLC Libraries".

dmlc/wormhole#18

Best,

