
Comments (3)

davidegraff commented on May 18, 2024

I don’t believe this claim is right. The launcher portion of the batch script works by

  1. starting one task on each node in the allocation
  2. starting, in that task, a ray worker with all the CPUs on that node ($SLURM_CPUS_ON_NODE)

By SLURM’s nature, a job step has exclusive access to all the resources on the nodes for which it was scheduled, meaning that if you start a ray worker within a job step, it will be able to use all the resources of the node on which it’s running. See the --exclusive section of the srun page:

This option applies to job and job step allocations, and has two slightly different meanings for each one. When used to initiate a job, the job allocation cannot share nodes with other running jobs (or just other users with the "=user" option or "=mcs" option). If user/mcs are not specified (i.e. the job allocation can not share nodes with other running jobs), the job is allocated all CPUs and GRES on all nodes in the allocation, but is only allocated as much memory as it requested. This is by design to support gang scheduling, because suspended jobs still reside in memory. To request all the memory on a node, use --mem=0. The default shared/exclusive behavior depends on system configuration and the partition's OverSubscribe option takes precedence over the job's option.
This option can also be used when initiating more than one job step within an existing resource allocation (default), where you want separate processors to be dedicated to each job step. If sufficient processors are not available to initiate the job step, it will be deferred. This can be thought of as providing a mechanism for resource management to the job within its allocation (--exact implied).

The exclusive allocation of CPUs applies to job steps by default, but --exact is NOT the default. In other words, the default behavior is this: job steps will not share CPUs, but job steps will be allocated all CPUs available to the job on all nodes allocated to the steps.

There are multiple ways to start a ray cluster inside a SLURM job, and this is only one of them. If you run this script with varying values of --ntasks-per-node, you will see that your ray cluster possesses more resources and your screen runs faster.
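
The launcher steps described at the top of this comment can be sketched roughly as below. This is a dry run that only prints the commands rather than executing them. The node names, the head address, and the `worker_cmd` helper are hypothetical and are not taken from the actual pyscreener script.

```shell
#!/bin/bash
# Dry-run sketch: print the srun command that would start one ray worker
# per node. Nothing is executed, so this also runs outside a SLURM job.

# worker_cmd NODE HEAD_ADDR  (hypothetical helper)
worker_cmd() {
    # SLURM sets SLURM_CPUS_ON_NODE inside a job; assume 1 when it is unset.
    cpus="${SLURM_CPUS_ON_NODE:-1}"
    echo "srun --nodes=1 --ntasks=1 -w $1 ray start --address=$2 --num-cpus=${cpus} --block"
}

# Hypothetical node names and head address, for illustration only.
for node in node01 node02; do
    worker_cmd "$node" "10.0.0.1:6379"
done
```

In a real job the node list would more likely come from `scontrol show hostnames "$SLURM_JOB_NODELIST"` than from a hard-coded list.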

from pyscreener.

likun1212 commented on May 18, 2024

try this:

#!/bin/bash
#SBATCH -N 1
#SBATCH -p ???
#SBATCH --ntasks-per-node 1
#SBATCH -c 4

echo $SLURM_CPUS_ON_NODE

#################
Say I have 32 CPUs on that node: it will return 4 instead of 32, since I am asking for 4 CPUs.


davidegraff commented on May 18, 2024

You should read up on the SLURM documentation if you’re confused by this. You requested 4, so SLURM gave you 4 even if the node has 32. If you want all 32, then ask for them.
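
As a sketch of "ask for them", the earlier script could request every CPU explicitly. The value 32 is the node size assumed in the comment above, and the partition line is left out so the cluster default applies. Both are assumptions to adjust for a real cluster.

```shell
#!/bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node 1
# Assumed node size: request all 32 CPUs for the single task.
#SBATCH -c 32

# Inside a SLURM job this reports the CPUs allocated on the node;
# outside one the variable is unset, so fall back to a marker value.
cpus="${SLURM_CPUS_ON_NODE:-unset}"
echo "CPUs on node: ${cpus}"
```

Going by the srun text quoted earlier in the thread, submitting the job with `--exclusive` (plus `--mem=0` for all the memory) should be another way to get a whole node without hard-coding the CPU count.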

