
mpi4torch


mpi4torch is an automatically differentiable wrapper of MPI functions for the pytorch tensor library.

MPI stands for Message Passing Interface and is the de facto standard communication interface on high-performance computing resources. To facilitate the usage of pytorch on these resources, an MPI wrapper that is transparent to pytorch's automatic differentiation (AD) engine is much needed. This library tries to bridge that gap.

Installation

mpi4torch is also hosted on PyPI. However, due to the ABI incompatibility of the different MPI implementations, it is not provided as a binary wheel and needs to be built locally. Hence, you need an appropriate C++ compiler as well as the development files of your MPI library. The latter are usually provided either through the module system of your local cluster (consult your cluster's documentation) or through the package manager of your Linux distribution.

Once the dependencies have been satisfied the installation can be triggered by the usual

    pip install mpi4torch

Usage

It is highly advised to first read the basic usage chapter of the documentation before jumping into action, since pytorch's AD design has some implications for the usage of mpi4torch. In other words, there are some footguns lurking!

You have been warned. But if you insist on an easy usage example, consider the following code snippet, an excerpt from examples/simple_linear_regression.py:

    import torch
    import mpi4torch

    comm = mpi4torch.COMM_WORLD

    # xinput, youtput and some_parametrized_function are defined in the
    # full example script.
    def lossfunction(params):
        # average initial params to bring all ranks on the same page
        params = comm.Allreduce(params, mpi4torch.MPI_SUM) / comm.size

        # compute local loss
        localloss = torch.sum(torch.square(youtput - some_parametrized_function(xinput, params)))

        # sum up the loss among all ranks
        return comm.Allreduce(localloss, mpi4torch.MPI_SUM)

Here we have parallelized a loss function simply by adding two Allreduce calls: one to average the initial parameters across all ranks, and one to sum up the local losses. For a more thorough discussion of the example, see the documentation.
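The arithmetic behind the second Allreduce can be sketched without MPI at all: summing the per-rank partial losses reproduces the loss over the full data set. The following plain-Python sketch (with hypothetical shard data, no mpi4torch required) illustrates this:

```python
# Plain-Python sketch (no MPI needed) of what the loss Allreduce computes:
# summing per-shard partial losses yields the same value as evaluating
# the loss over the full data set on one process.
def local_loss(xs, ys, w):
    # squared-error loss of the model y = w * x on a data shard
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 1.5

# Split the data into two "ranks" and Allreduce (sum) the partial losses.
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]
allreduced = sum(local_loss(sx, sy, w) for sx, sy in shards)

assert allreduced == local_loss(xs, ys, w)
print(allreduced)  # → 7.5
```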

Tests

Running tests is as easy as

    mpirun -np 2 nose2


Contributors

d1saster, lehr-fa


Known Issues

torch is installed twice during `pip install`

torch is specified twice: once as a build dependency and once as an actual install requirement.
The way PEP 517 is currently implemented in pip, the build dependencies are installed into a fresh temporary directory every time one calls pip install. This happens even if torch is already installed in the virtual environment. Possibly a bug in pip?

Developer install does not work

Installing via python setup.py install works.

But installing via pip install -e . does not, even though this is the preferred workflow during development.

mpi4torch may be built against a different pytorch version than present in the venv

Related to #5: pip seems to use its own installation of pytorch to build mpi4torch, regardless of which version is present in the virtual environment.

This has the side effect that installing heat==1.2.0 (which requires torch<=1.11.0) and then installing mpi4torch leads to undefined symbols as soon as one tries to load mpi4torch, since pip pulls a newer version of torch to build it.

Steps to reproduce in a fresh virtual environment:

    pip install torch==1.11.0
    pip install -v mpi4torch  # currently also fetches torch==1.12.1 and uses it for the build
    python -c 'import mpi4torch'

With torch==1.12.1 having been used for the build, the last command fails with:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/xyz/Test/2022-09-03_mpi4torch_bug/venv/lib/python3.9/site-packages/mpi4torch/__init__.py", line 2, in <module>
        from ._mpi import *
    ImportError: /home/xyz/Test/2022-09-03_mpi4torch_bug/venv/lib/python3.9/site-packages/mpi4torch/_mpi.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZN3c1022getCustomClassTypeImplERKSt10type_index

which is expected, since this symbol changed between torch 1.11 and 1.12.
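One stdlib-only way to confirm which torch version is active in a virtual environment before building (an illustrative check, not part of mpi4torch itself):

```python
# Stdlib-only check of which torch version pip has installed in the
# active environment; mpi4torch must be compiled against this exact
# version to avoid undefined-symbol errors at import time.
from importlib import metadata

try:
    torch_version = metadata.version("torch")
except metadata.PackageNotFoundError:
    torch_version = None

print(torch_version or "torch is not installed in this environment")
```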

Compilation: mpi.h not found

OpenMPI installs its mpi.h under /usr/include/mpi/mpi.h, whereas the csrc file includes <mpi.h>.

We should either change the include to <mpi/mpi.h>, or setup.py must determine the MPI include directory. I am not sure, though, whether the different MPI vendors place their headers under the same prefix.
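As a sketch of the second option, setup.py could ask the MPI compiler wrapper for its include directories. The helper below is hypothetical (not mpi4torch's actual build code) and assumes an mpicc wrapper on PATH that understands -show (MPICH) or -showme:compile (Open MPI):

```python
import shutil
import subprocess

def guess_mpi_include_dirs():
    """Best-effort guess of the MPI include directories by asking the
    mpicc compiler wrapper. Returns an empty list if no wrapper is
    found or no -I flags can be extracted. (Hypothetical helper, not
    part of mpi4torch's setup.py.)"""
    mpicc = shutil.which("mpicc")
    if mpicc is None:
        return []
    # MPICH-style wrappers print the compile line for "-show";
    # Open MPI uses "-showme:compile". Try both.
    for flag in ("-show", "-showme:compile"):
        try:
            out = subprocess.run([mpicc, flag], capture_output=True,
                                 text=True, check=True).stdout
        except (OSError, subprocess.CalledProcessError):
            continue
        incdirs = [tok[2:] for tok in out.split() if tok.startswith("-I")]
        if incdirs:
            return incdirs
    return []

print(guess_mpi_include_dirs())
```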
