Comments (7)

jan-janssen commented on August 22, 2024

When the number of cores is increased further, the number of cores with high CPU load also increases. So it seems the MPI broadcast https://github.com/pyiron/pylammpsmpi/blob/main/pylammpsmpi/mpi/lmpmpi.py#L488 keeps the cores busy while it waits for the socket to receive information https://github.com/pyiron/pympipool/blob/main/pympipool/shared/communication.py#L140

from pympipool.

jan-janssen commented on August 22, 2024

The CPU load is related to the MPI broadcast. Here is a reduced example. Script reply.py:

import sys

from mpi4py import MPI

from pympipool.shared.communication import (
    interface_connect,
    interface_send,
    interface_receive,
)
from pympipool.shared.backend import parse_arguments


def main(argument_lst=None):
    if argument_lst is None:
        argument_lst = sys.argv
    argument_dict = parse_arguments(argument_lst=argument_lst)
    if MPI.COMM_WORLD.rank == 0:
        context, socket = interface_connect(
            host=argument_dict["host"], port=argument_dict["zmqport"]
        )
    else:
        context, socket = None, None

    while True:
        if MPI.COMM_WORLD.rank == 0:
            input_dict = interface_receive(socket=socket)
        else:
            input_dict = None
        input_dict = MPI.COMM_WORLD.bcast(input_dict, root=0)
        if MPI.COMM_WORLD.rank == 0 and input_dict is not None:
            interface_send(socket=socket, result_dict={"result": input_dict})


if __name__ == "__main__":
    main(argument_lst=sys.argv)

Jupyter notebook to control the reply.py script:

import os
from pympipool import interface_bootup, interface_send, interface_receive

interface = interface_bootup(
    command_lst=["python", os.path.join(os.path.abspath("."), "reply.py")],
    cwd=None,
    cores=8,
    gpus_per_core=0,
    oversubscribe=False,
    enable_flux_backend=False,
    enable_slurm_backend=False,
    queue_adapter=None,
    queue_type=None,
    queue_adapter_kwargs=None,
)
interface.send_and_receive_dict(input_dict={"a": 1})

With this code, all 8 cores remain busy. In contrast, when the reply script is replaced with:

import sys

from pympipool.shared.communication import (
    interface_connect,
    interface_send,
    interface_receive,
)
from pympipool.shared.backend import parse_arguments


def main(argument_lst=None):
    if argument_lst is None:
        argument_lst = sys.argv
    argument_dict = parse_arguments(argument_lst=argument_lst)
    context, socket = interface_connect(
        host=argument_dict["host"], port=argument_dict["zmqport"]
    )

    while True:
        input_dict = interface_receive(socket=socket)
        interface_send(socket=socket, result_dict={"result": input_dict})


if __name__ == "__main__":
    main(argument_lst=sys.argv)

and the number of cores is reduced to cores=1 in the Jupyter notebook, everything works fine.

jan-janssen commented on August 22, 2024

@pmrv I moved the issue to pympipool as it is related to the SocketInterface class. The OpenMPI documentation suggests export OMPI_MCA_mpi_yield_when_idle=1, but at least for me this did not work out of the box.
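
When the runtime setting does not help, a common workaround for a spinning broadcast is to have every rank poll a non-blocking barrier and sleep between checks, so the waiting ranks stay mostly idle until the root has data. The following is only a sketch of that idea, not the change made in pympipool. The helper names wait_with_sleep and idle_friendly_bcast and the interval parameter are hypothetical, and comm is assumed to be an mpi4py communicator such as MPI.COMM_WORLD on an MPI library that supports MPI-3 non-blocking collectives.

```python
import time


def wait_with_sleep(request, interval=0.01):
    """Poll a non-blocking MPI request, sleeping between checks.

    `request` is assumed to behave like an mpi4py Request, whose Test()
    method returns True once the operation has completed.
    """
    while not request.Test():
        time.sleep(interval)


def idle_friendly_bcast(comm, obj, root=0, interval=0.01):
    """Broadcast obj from root while keeping the waiting ranks mostly idle.

    Hypothetical helper: `comm` is assumed to be an mpi4py communicator.
    """
    # Every rank enters a non-blocking barrier. The root only reaches this
    # point after it has received its input, so the barrier completes once
    # the payload is ready to be sent.
    request = comm.Ibarrier()
    wait_with_sleep(request, interval=interval)
    # All ranks have arrived, so the blocking broadcast should return quickly.
    return comm.bcast(obj, root=root)
```

In the reply.py loop above, this would replace the direct call, for example input_dict = idle_friendly_bcast(MPI.COMM_WORLD, input_dict, root=0). The trade-off is an added latency of up to one interval per message in exchange for lower CPU load on the waiting ranks.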

jan-janssen commented on August 22, 2024

If you are testing with OpenMPI, you might have to set oversubscribe=True, depending on your configuration.

jan-janssen commented on August 22, 2024

The debugging is simplified by #178

jan-janssen commented on August 22, 2024

Maybe it is related to mpi4py/mpi4py#468

jan-janssen commented on August 22, 2024

This fix was added in #279
