Comments (15)

andersy005 commented on July 18, 2024

@apendergrass,

It appears that, for some reason unknown to me, SLURM has been unstable/inconsistent for the last few days. As a result, my jobs sometimes don't make it through the queue, and/or I get errors that look like:

sbatch /glade/scratch/abanihi/tmp30w6t9ah.sh
stdout:

stderr:
sbatch: error: Batch job submission failed: I/O error writing script/environment to file

So, it's likely that the error is due to a glitch in the system, and if you try again at another time, it may work :)

from ncar-python-tutorial.

andersy005 commented on July 18, 2024

CCing @jbaksta, as he may know what is really going on.

apendergrass commented on July 18, 2024

Cool, thanks! I emailed cislhelp too, since the message said to. Perhaps it will work after it has a weekend of rest...

jbaksta commented on July 18, 2024

@apendergrass that should only pop up if your sbatch submission was missing the -t option to specify the walltime.

@andersy005 your issue the other day was part of a collection of other issues, which should be resolved at this point.

apendergrass commented on July 18, 2024

@jbaksta I used the exact same script successfully on Thursday. The script itself makes the call to sbatch, apparently correctly. I think it uses a variable called wallclock in that call.

matt-long commented on July 18, 2024

jlab-dav uses the -t option. I have encountered issues like this previously and I attribute them to system-wide issues with Slurm. jlab-dav is working for me right now.

apendergrass commented on July 18, 2024

@matt-long Interesting. Still giving me the same error.

jbaksta commented on July 18, 2024

@apendergrass since jlab-dav is just a Bash script, please run it as bash -x jlab-dav and post the output, if you don't mind.

matt-long commented on July 18, 2024

It looks like your account is not being set. Try using the -a option, or set the JOB_ACCOUNT environment variable to the project number you want to use.

matt-long commented on July 18, 2024

I just replicated the problem by deleting my JOB_ACCOUNT environment variable.

This code in jlab-dav used to set JOB_ACCOUNT, but no longer seems to work.

if [ -z "${JOB_ACCOUNT}" ]; then
  source /glade/u/apps/ch/opt/usr/bin/getacct.sh
fi

The -t error is a red herring.

apendergrass commented on July 18, 2024

@matt-long yes I just came to the same conclusion. It totally works when I set JOB_ACCOUNT. Seems that's what changed from Thursday to Friday. Thanks for your help everyone!

jbaksta commented on July 18, 2024

@matt-long then there is a bug in the slurm job submission filter that I'll work on fixing.

matt-long commented on July 18, 2024

@jbaksta, thanks!

jbaksta commented on July 18, 2024

Actually, after reproducing this, I'm going to open an issue against jlab-dav, which needs to check for zero-length or null/uninitialized variables, as well as one against the submit filter. The error occurs (arguably as it should) with the following:

sbatch -A -t 5:00 ...

Here the account is taken to be '-t', so no -t option is left for the submission. '-t' could be a completely valid Slurm account name, although not a wise one.
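
The argument swallowing described above can be reproduced without touching Slurm. The following is only a sketch: parse_submit_args is a hypothetical stand-in that uses the shell's getopts to mimic an option parser in which -A and -t each take one argument. It is not sbatch and not the code in jlab-dav, and the account value P00000000 in the comment is a made-up placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for an option parser; it never calls sbatch.
# -A and -t each require one argument, as the account and time options do.
parse_submit_args() {
    local account="" walltime="" opt
    local OPTIND=1
    while getopts ":A:t:" opt; do
        case "$opt" in
            A) account="$OPTARG" ;;
            t) walltime="$OPTARG" ;;
            *) ;;  # ignore anything else in this sketch
        esac
    done
    echo "account='${account}' walltime='${walltime}'"
}

# Simulate the situation in this thread: the account variable is empty.
JOB_ACCOUNT=""

# The unquoted empty expansion vanishes, so the parser sees: -A -t 5:00
# and -A takes the next word, "-t", as its argument.
parse_submit_args -A ${JOB_ACCOUNT} -t 5:00

# With a non-empty value (placeholder), both options are parsed as intended:
# parse_submit_args -A P00000000 -t 5:00
```

With the empty variable, the sketch reports the account as '-t' and an empty walltime, which matches the "missing -t" symptom even though -t was on the command line.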

matt-long commented on July 18, 2024

I just pushed a change to jlab-dav; we were checking for some empty variables, but not JOB_ACCOUNT. This new version traps this error.
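
For readers who want a similar guard in their own submission wrappers, here is a minimal sketch of an empty-variable check. The helper name require_nonempty is hypothetical, and the actual change pushed to jlab-dav may look quite different.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail with a message if a required value is empty.
# $1 is the variable's name (for the message), $2 is its current value.
require_nonempty() {
    local name="$1" value="$2"
    if [ -z "${value}" ]; then
        echo "ERROR: ${name} is empty; export ${name} before submitting" >&2
        return 1
    fi
}

# Possible use before building the sbatch command line:
# require_nonempty JOB_ACCOUNT "${JOB_ACCOUNT:-}" || exit 1
```

Checking before the command line is assembled means the failure names the variable that is actually missing, rather than surfacing later as a misleading complaint about -t.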

Hopefully this script becomes obsolete once JupyterHub works on DAV.
