Comments (6)
It is not necessary to build rabit with MPI support if you want to use rabit as a communication lib. If you build it with MPI support, the communication lib is switched to MPI.
To build with MPI as the backend, simply link against librabit_mpi.a. You can type make mpi to build that.
from rabit.
Interesting. It sounds like rabit is a "message operation" library that supports various backend engines. ZeroMQ works in a similar way.
MPI uses broadcast, collect, and reduce as verbs, so it is a nice candidate for a rabit backend engine. Extra benefits from MPI lie in:
- Good integration with high-end network fabrics such as InfiniBand.
- Good integration with job scheduling systems, to name a few: SLURM, LSF, OpenLava, SGE. The scheduling system will take care of the MPI jobs for us.
But as mentioned by Tianqi, the tradeoff is that there is no auto recovery in MPI.
Anyway, topics and benchmarks entitled "rabit-socket vs. rabit-MPI vs. ZeroMQ" may be interesting. ZeroMQ is performance oriented, so as far as I know no reliability mechanism is designed into it.
Please correct me if I am wrong.
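To make the verbs above concrete, here is a minimal single-process sketch of what broadcast, gather (MPI's name for the "collect" operation mentioned above), reduce, and allreduce do to per-worker values. It is only a simulation over a Python list indexed by rank. The function names are hypothetical and are not the rabit or MPI API.

```python
from functools import reduce as fold
import operator


def broadcast(values, root=0):
    """Every worker ends up holding the root worker's value."""
    return [values[root]] * len(values)


def gather(values, root=0):
    """The root receives the list of all values; other workers receive None."""
    return [list(values) if rank == root else None
            for rank in range(len(values))]


def reduce_to_root(values, op=operator.add, root=0):
    """Combine all values with op; only the root receives the result."""
    combined = fold(op, values)
    return [combined if rank == root else None
            for rank in range(len(values))]


def allreduce(values, op=operator.add):
    """Combine all values with op; every worker receives the result."""
    combined = fold(op, values)
    return [combined] * len(values)


if __name__ == "__main__":
    local = [1, 5, 3, 7]  # one value per simulated worker (rank 0..3)
    print(broadcast(local))
    print(gather(local))
    print(reduce_to_root(local))
    print(allreduce(local, op=max))
```

In a real backend each rank runs in its own process and only sees its own entry. The list here just stands in for "the value held by rank i".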
from rabit.
MPI 2.0 does allow you to dynamically spawn new processes in case you want to restart a dead one, though I would say it is not as easy as it seems. And currently rabit-MPI does not leverage that feature.
ZeroMQ is more optimized for small messages and is not necessarily a good choice for machine learning workloads, since most messages in ML are large. ZeroMQ is reliable, in that it automatically re-transmits messages, and it also has some kind of load balancing mechanism built into it.
Besides communication, Rabit provides checkpointing. I think that is the most important distinction.
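As a rough illustration of why checkpointing matters for recovery, here is a toy in-memory sketch of a load, iterate, checkpoint loop. Everything in it (CheckpointStore, run, run_with_restart, the "total" state) is hypothetical and is not rabit's API. It only shows the general pattern: a restarted worker resumes from the last saved version instead of starting from iteration 0.

```python
import copy


class CheckpointStore:
    """Hypothetical in-memory store holding the latest (version, state) pair."""

    def __init__(self):
        self._saved = None

    def save(self, version, state):
        # Deep-copy so later mutation of state cannot alter the checkpoint.
        self._saved = (version, copy.deepcopy(state))

    def load(self):
        # Version 0 with no state means "nothing checkpointed yet".
        if self._saved is None:
            return 0, None
        version, state = self._saved
        return version, copy.deepcopy(state)


def run(store, num_iters, fail_at=None):
    """Run iterations, checkpointing after each; optionally fail at one."""
    version, state = store.load()
    if state is None:
        state = {"total": 0}  # fresh start
    for it in range(version, num_iters):
        if fail_at is not None and it == fail_at:
            raise RuntimeError("simulated worker failure")
        state["total"] += it          # stand-in for one training step
        store.save(it + 1, state)     # iterations 0..it are now complete
    return state


def run_with_restart(num_iters, fail_at):
    """Simulate a crash followed by a restart that resumes from the store."""
    store = CheckpointStore()
    try:
        return run(store, num_iters, fail_at)
    except RuntimeError:
        return run(store, num_iters)  # restarted worker picks up where it left off


if __name__ == "__main__":
    print(run_with_restart(num_iters=5, fail_at=3))
```

The restarted run repeats only the iteration that failed, so the final state matches an uninterrupted run.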
HONG Chuntao
System Research Group
Microsoft Research Asia
from rabit.
Thank you, @hjk41. I think these nice features should be highlighted in the README and tutorials.
For simplicity, I'll try rabit with the default settings first. This issue will be closed.
from rabit.
@weijianwen It would be great if you could open a PR and contribute your understanding to the tutorial. Thanks!
from rabit.
@tqchen Sure, glad to help. I'll send feedback about how to install the dmlc stack on a moderate-sized cluster. As I wasn't engaged in the design process, my feedback will reflect what a library user hopes to know at the very beginning. That would be a good starting point for reorganizing the README, tutorials and other docs.
One more thing. I would appreciate it if someone could merge my PR in dmlc/wormhole. It is a typo fix, not a new feature. As ps-lite replaces ps in wormhole's dependencies, I wonder if we should also replace the ps reference link in "Depending DMLC Libraries".
Best,
from rabit.