horovod-speedup-estimator

Python tool to estimate the speedup of PyTorch models distributed with Horovod, as a function of the batch size and the number of MPI processes used.

This is visualized in a 2D plot:

[Figure: example plot of the estimated speedup by batch size and number of MPI processes]

To explain the figure:

  • The black areas are configurations where distribution achieves no speedup (speedup < 1). For this example network, distributing the training with a batch size below 40 over 2 MPI processes is slower than training with the same batch size without distribution, because of the communication overhead that distribution introduces.
  • In this case, 2 to 7 MPI processes fit on a single computation node. From 8 MPI processes onwards more than one node is needed, so the speedup drops because the communication overhead now includes (relatively slow) inter-node communication.
  • As the batch size increases, the computational work per batch increases. Since the communication time does not scale with the batch size (only with the number of parameters in the network), distribution becomes more efficient, which is reflected by the higher speedup towards the right side of the figure.
  • In this example, the speedup shows a sudden discontinuity around a batch size of 50. This is due to external factors that can influence the estimated computation times (such as cache hits/misses, or other processes using the system). Increasing the number of iterations (-it) reduces these anomalies and gives a smoother figure, at the cost of longer simulation times.
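The trade-off described in these points can be illustrated with a simple throughput model. This is a minimal sketch under stated assumptions, not necessarily the formula the tool uses: every process handles one full batch per step, gradient synchronization adds a communication time that is independent of the batch size, and the function name `estimated_speedup` and all timing values are hypothetical.

```python
def estimated_speedup(n_procs, compute_time, comm_time):
    """Throughput speedup of n_procs data-parallel workers over one process.

    Assumed model (illustrative only): each worker spends compute_time
    seconds on its batch, then comm_time seconds synchronizing gradients.
    A single process has no communication cost.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    if compute_time <= 0 or comm_time < 0:
        raise ValueError("compute_time must be > 0 and comm_time >= 0")
    if n_procs == 1:
        return 1.0
    # n_procs batches are finished in (compute_time + comm_time) seconds,
    # versus one batch per compute_time seconds without distribution.
    return n_procs * compute_time / (compute_time + comm_time)


# Made-up timings: compute time grows with the batch size,
# communication time stays fixed for a given number of processes.
SECONDS_PER_SAMPLE = 0.001  # hypothetical
COMM_TIME = 0.05            # hypothetical, for 2 processes on one node

small_batch = estimated_speedup(2, 20 * SECONDS_PER_SAMPLE, COMM_TIME)
large_batch = estimated_speedup(2, 100 * SECONDS_PER_SAMPLE, COMM_TIME)
print(f"batch size 20:  speedup {small_batch:.2f}")
print(f"batch size 100: speedup {large_batch:.2f}")
```

With these made-up numbers the small batch falls below a speedup of 1 (a black area in the figure), while the larger batch amortizes the fixed communication cost and ends up above 1.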

### Dealing with model hyperparameters

The tool currently only works for models whose constructor takes no required parameters, as these would be impractical to specify on the command line. If your model does take required parameters, wrap it in a wrapper class as follows:

import torch


class MyNet(torch.nn.Module):
    def __init__(self, a, b):  # this network requires 2 parameters
        super().__init__()  # initialize torch.nn.Module before setting attributes
        self.a = a
        self.b = b
        ...
    ...


class MyNetWrapper(MyNet):  # wrapper behaves exactly like MyNet
    def __init__(self):  # but requires 0 parameters
        a = ...  # you define your parameters here
        b = ...
        super().__init__(a, b)  # and that's it

You can now use MyNetWrapper without issue. When defining the model parameters in the wrapper, keep in mind that the tool initializes the model only once.
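Before passing a class to the tool, it can help to check that its constructor really takes no required parameters. The helper below is a hypothetical sketch using only the standard library and is not part of the tool. `MyNet` and `MyNetWrapper` are plain stand-in classes here so the example runs without PyTorch, and the values for `a` and `b` are placeholders.

```python
import inspect


def has_no_required_parameters(cls):
    """Return True if cls() can be called without any arguments."""
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    for param in inspect.signature(cls).parameters.values():
        # *args and **kwargs never have to be supplied.
        if param.kind in variadic:
            continue
        # A parameter without a default value is required.
        if param.default is inspect.Parameter.empty:
            return False
    return True


class MyNet:  # stand-in for a torch.nn.Module subclass
    def __init__(self, a, b):
        self.a = a
        self.b = b


class MyNetWrapper(MyNet):
    def __init__(self):
        super().__init__(a=1, b=2)  # placeholder values


print(has_no_required_parameters(MyNet))
print(has_no_required_parameters(MyNetWrapper))
```

The check only inspects the constructor signature, so it does not confirm that the wrapped values are sensible for your model.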

Contributors

matthijsdewit111