tum-daml / seml
SEML: Slurm Experiment Management Library
License: Other
Follow-up from #81.
Add a new command that allows one to update pending jobs by overriding their sbatch and config parameters.
Proposed syntax:
seml <collection> update -b 7 -sb mem=25GB -o dataset=imagenet
Considerations:
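As a sketch of what such an update could translate to at the MongoDB level (field paths follow seml's experiment documents; the helper itself is hypothetical, not a proposed API):

```python
def build_update(sbatch_overrides, config_overrides):
    """Translate CLI-style overrides (e.g. -sb mem=25GB -o dataset=imagenet)
    into a MongoDB $set document. Field paths follow seml's experiment
    documents; this helper itself is hypothetical."""
    updates = {}
    for key, value in sbatch_overrides.items():
        updates[f"slurm.sbatch_options.{key}"] = value
    for key, value in config_overrides.items():
        updates[f"config.{key}"] = value
    return {"$set": updates}

# Applying it to all still-pending experiments of batch 7 could then be:
# collection.update_many({"batch_id": 7, "status": "PENDING"},
#                        build_update({"mem": "25GB"}, {"dataset": "imagenet"}))
```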
This is not a bug report but a feature request (or a request for an example, if this feature already exists somehow).

When I use grid for a parameter in the config file, seml creates several experiments for me, one for every possible parameter value of the grid (= experimental condition). Is it somehow possible to specify a unique log directory for every experimental condition? I am asking because I do not want to save results (e.g. TensorFlow event files, PyTorch models, etc.) from all experimental conditions in the same log directory, which is what happens if I only specify output_dir in the seml block of the config file.

One potential solution would be to extend the log directory path in my own Python code and return this path in the dictionary that is returned at the end of an experimental condition's run. This has the downside that if an experiment does not finish as intended, e.g. because Slurm terminates it for running beyond the time limit, the result dictionary is never returned and the path is therefore not saved in MongoDB. Alternatively, is it possible to save output files (TensorFlow event files, PyTorch models) in MongoDB directly? (Although this might also not be a viable solution, since the database grows very large when storing many models.)

In any case, thanks for creating seml - I find it very useful!
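One workaround is to derive a per-condition directory from a hash of the resolved config inside the experiment itself. The sketch below is such a workaround under that assumption, not a seml feature:

```python
import hashlib
import json
import os

def unique_log_dir(base_dir, config):
    """Derive a per-condition log directory from a hash of the resolved
    config, so every grid point writes to its own directory. Workaround
    sketch: call it at the start of the experiment's main function."""
    digest = hashlib.sha1(
        json.dumps(config, sort_keys=True).encode()
    ).hexdigest()[:10]
    log_dir = os.path.join(base_dir, digest)
    os.makedirs(log_dir, exist_ok=True)
    return log_dir
```

Because the directory is created before training starts, its path survives even if Slurm kills the job later.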
seml <> status
should show experiments as pending if their slurm state is CONFIGURING.
The CONFIGURING state is used, for example, by auto-scaling cloud deployments while the instances are being provisioned.
seml marks these experiments as killed.
Looks like get_slurm_jobs() and get_slurm_arrays_tasks() need to include all Slurm job states that could be considered pending:
def get_slurm_arrays_tasks():
    ...
    squeue_out = subprocess.check_output("squeue -a -t pending,running -h -o %i -u `whoami`", shell=True)
    ...

def get_slurm_jobs():
    try:
        squeue_out = subprocess.check_output("squeue -a -t pending,running -h -o %i -u `whoami`", shell=True)
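A broader state filter could be built like this (the extra states are a suggestion, not seml's current code; CONFIGURING covers auto-scaling clusters that are still provisioning nodes):

```python
import getpass

# Slurm states that should still count as active from seml's perspective.
# CONFIGURING (CF) is used while auto-scaling deployments provision nodes.
ACTIVE_STATES = ["pending", "running", "configuring", "requeued"]

def build_squeue_command(states=None, user=None):
    """Build the squeue invocation with the broader state filter (sketch)."""
    states = states or ACTIVE_STATES
    user = user or getpass.getuser()
    return ["squeue", "-a", "-t", ",".join(states), "-h", "-o", "%i", "-u", user]
```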
Experiments are running with seml while other, separate throttled Slurm job arrays are also running.

seml [db_name] status

should return the status, but instead an error is thrown:
Traceback (most recent call last):
File "[...]./local/bin/seml", line 10, in <module>
sys.exit(main())
File "[...].local/lib/python3.7/site-packages/seml/main.py", line 231, in main
f(**args.__dict__)
File "[...]/.local/lib/python3.7/site-packages/seml/manage.py", line 17, in report_status
detect_killed(db_collection_name, print_detected=False)
File "[...]/.local/lib/python3.7/site-packages/seml/manage.py", line 263, in detect_killed
running_jobs = get_slurm_arrays_tasks()
File "[...]/.local/lib/python3.7/site-packages/seml/manage.py", line 317, in get_slurm_arrays_tasks
job_dict[array_id][0].append(range(int(lower), int(upper) + 1))
ValueError: invalid literal for int() with base 10: b'4%1'
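The b'4%1' in the traceback is a job-array token with a %N throttle suffix (as squeue prints for throttled arrays), which int() cannot parse. A sketch of the needed fix is to strip the suffix before converting:

```python
def parse_array_range(token):
    """Parse a squeue array-task token such as '4', '10-71', or '4%1'.
    Throttled job arrays carry a '%N' suffix (as in the b'4%1' from the
    traceback), which must be stripped before calling int()."""
    token = token.split("%")[0]  # drop the job-array throttle suffix
    if "-" in token:
        lower, upper = token.split("-")
    else:
        lower = upper = token
    return range(int(lower), int(upper) + 1)
```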
The LICENSE file should be included in the PyPI package (this is required by the license itself).
If I run python -m build . in a clean checkout of this repo, the LICENSE file is present in the resulting tar.gz, so a rebuild and re-upload to PyPI should suffice.
I stumbled over this while packaging seml for conda-forge: conda-forge/staged-recipes#16946.
BTW, if you want to be a maintainer on the seml feedstock for conda-forge, let me know and I will add you :)
seml seml_example2 start --num-exps 1 --local
Starting local worker thread that will run up to 1 experiment, until no queued experiments remain.
Traceback (most recent call last):
File "/path/bin/seml", line 8, in <module>
sys.exit(main())
File "/path/lib64/python3.6/site-packages/seml/main.py", line 210, in main
f(**args.__dict__)
File "/path/lib64/python3.6/site-packages/seml/start.py", line 492, in start_experiments
output_to_file=output_to_file)
File "/path/lib64/python3.6/site-packages/seml/start.py", line 422, in start_jobs
tq.set_postfix(failed=f"{num_exceptions}/{i_exp} experiments")
AttributeError: 'enumerate' object has no attribute 'set_postfix'
Looks like it's due to the tqdm fallback in seml/start.py:
try:
    from tqdm.autonotebook import tqdm
except ImportError:
    def tqdm(iterable, total=None):
        return iterable
...
tq = tqdm(enumerate(exps_list))
for i_exp, exp in tq:
    if output_to_file:
        output_dir_path = get_output_dir_path(exp)
    else:
        output_dir_path = None
    success = start_local_job(collection, exp, unobserved, post_mortem, output_dir_path)
    if success is False:
        num_exceptions += 1
    tq.set_postfix(failed=f"{num_exceptions}/{i_exp} experiments")
Running pip install tqdm fixes the error.
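Alternatively, the fallback could mimic the parts of the tqdm interface that the loop uses, so the code runs with or without tqdm installed (a sketch, not the actual fix in seml):

```python
try:
    from tqdm.autonotebook import tqdm
except ImportError:
    class tqdm:
        """Minimal stand-in when tqdm is not installed: iterates
        transparently and turns progress-bar calls like set_postfix
        into no-ops."""
        def __init__(self, iterable, total=None):
            self.iterable = iterable
        def __iter__(self):
            return iter(self.iterable)
        def set_postfix(self, **kwargs):
            pass  # nothing to display without a real progress bar
```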
Expected behavior: seml scripts can have an arbitrary name, or a clear warning message is shown if the name is invalid.

Actual behavior: seml fails to add a config file for a script with a particular name - in my case, the script is called tokenize.py.

Scripts cannot be named after packages that seml requires in setup.py (e.g. numpy.py or munch.py), and the same holds for one package not mentioned there (anndata.py).

Steps to reproduce:

mkdir -p test/experiments test/configs test/scripts && cd test
Create scripts/tokenize.py:

cat << EOF > scripts/tokenize.py
from sacred import Experiment
import seml
ex = Experiment()
seml.setup_logger(ex)
@ex.automain
def main(foo: str) -> None:
print(foo)
EOF
Create configs/tokenize.yaml:

cat << EOF > configs/tokenize.yaml
seml:
executable: scripts/tokenize.py
conda_environment: test
name: test
output_dir: experiments
project_root_dir: ..
slurm:
experiments_per_job: 1
sbatch_options_template: CPU
sbatch_options:
mem: 1G
time: 0-00:00
fixed:
foo: bar
EOF
seml ... add configs/tokenize.yaml
WARNING: Current Anaconda environment does not match the experiment's environment ('test').
EXECUTABLE ERROR: Executable /home/michal/test/scripts/tokenize.py was not found in the source code files to upload
Changing the script name to e.g. _tokenize.py and adjusting the path in the config works.
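A check along these lines could detect the collision and produce the warning the expected behavior asks for. The helper is hypothetical, not part of seml; it assumes the failure stems from the script's module name shadowing an importable module (tokenize is in the standard library):

```python
import importlib.util
import os

def shadowed_module(script_path):
    """Return a warning string if the script's basename collides with an
    importable module (stdlib or installed), which can break import-based
    source-file discovery. Hypothetical check, not part of seml."""
    name = os.path.splitext(os.path.basename(script_path))[0]
    spec = importlib.util.find_spec(name)
    if spec is not None and spec.origin != os.path.abspath(script_path):
        return f"'{name}' shadows an existing module; consider renaming the script."
    return None
```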
This RFC proposes to change how we handle sub-configs in order to make them more flexible. Instead of treating a sub-config as a sub-experiment that is simply run as is, this proposal uses sub-configs as definitions that can then be used flexibly in the rest of the config.
# Define sub-configs to use them later
nitrogen:
...
cyclobutadiene:
...
short_training:
...
long_training:
...
model1:
...
model2:
...
# The user has to specify how to use the sub-configs
load:
- model1 # directly specifying a sub-config means always using it (like `fixed`)
- choice:
- nitrogen
- cyclobutadiene
- choice:
- short_training
- long_training
grid:
other_attribute:
type: choice
options:
- True
- False
We currently assume that always exactly one sub-config is chosen. This proposal allows users to define themselves how they want to combine sub-configs. Specifying them all in a grid would replicate the behavior we had previously, so this is a natural extension.
A sub-config can itself contain fixed/grid/random blocks and further sub-configs, just like they do now. The outer set of configs is then combined with the inner set of configs (outer product), just like now. I.e. loading a single sub-config can result in multiple configs, if the sub-config contains a grid.
This also naturally extends to loading separate config files into one config. The other config file can be treated exactly the same way as a subconfig. The user would only have to provide a keyword, specifying that this is now an external file. I.e.
# Define sub-configs
nitrogen:
...
model2:
...
load:
- model2
- file: ~/test/general.yaml # A full config file, with fixed, grid, and random blocks.
- choice:
- nitrogen
- simple_file: ~/test/dataset2.yaml # A simple config file, which only contains fixed parameter values, similar to what Sacred uses.
While defining sub-configs seems pretty clear (that's exactly how we do it currently), specifying their usage remains an open question. Instead of creating a new top-level load block, we could also integrate loading sub-configs into the fixed/grid/random blocks by introducing a special load keyword. However, this is much more difficult, considering the restrictions of YAML (every key has to be unique).
fixed:
model1: load # Always use everything from one sub-config
grid:
dataset: # The name doesn't do anything, but it has to be unique (due to yaml)
type: load # Specifies that we want to load sub-configs
options:
- nitrogen
- cyclobutadiene
- file: ~/test/general.yaml
- simple_file: ~/test/dataset2.yaml
training:
type: load
options:
- short_training # Multiple sub-configs can be combined
- long_training
I prefer the original suggestion. It also seems to align better with user expectations. With this variant, loading a sub-config in the fixed block could lead to multiple different configs, which might be unexpected.
There are two open questions: do we want load to support fixed/grid/random semantics, and how do we best specify importing files in these three cases? -> Resolution: no random; use the keywords fixed and choice.

Support for the following syntax would be great to add multiple configs at once:
seml experiment add *.yaml
Hello,
sorry for the possibly ignorant question:
Is it possible to run the library without MongoDB installed?
Best,
Aaron
Expected behavior: one should be able to set a value to "{}".

Actual behavior: if one tries to set a value to "{}", that entry is completely dropped.
seml:
executable: script.py
name: test
output_dir: ~/slurm-output
conda_environment: seml_test
project_root_dir: .
slurm:
experiments_per_job: 1
sbatch_options:
gres: gpu:0
mem: 1G
cpus-per-task: 1
time: 00-00:01
partition: cpu
qos: cpu
fixed:
test: "{}"
seml seml_test add test.yaml
CONFIG ERROR: No parameters defined under grid, fixed, or random in the config file.
In this case, we get a "no parameters defined" error because the only parameter is dropped. If we had multiple parameters (with values other than "{}"), the config would be valid, but all parameters with the value "{}" would be missing. (They also do not appear in MongoDB.)
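A plausible mechanism for the silent drop, sketched under the assumption that seml flattens nested parameter dicts into dotted keys (this is an illustration of the suspected behavior, not seml's actual code):

```python
def flatten(d, parent=""):
    """Flatten a nested parameter dict into dotted keys (illustrative
    sketch of the suspected mechanism, not seml's actual code)."""
    items = {}
    for key, value in d.items():
        path = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            # An empty dict has no leaves, so it contributes no entries:
            # this is how a parameter with value {} silently disappears.
            items.update(flatten(value, path))
        else:
            items[path] = value
    return items
```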
Hi all,
it seems like there is something strange going on with the status command and pending jobs on our Slurm cluster. I'm not entirely sure why this happens, but experiments seem to be killed silently when seml tries to determine whether they have been killed externally on the Slurm cluster while they are still pending in the Slurm queue. It might be an issue with how Slurm displays the jobs on our cluster, or it might be something else that I'm not seeing.
Observing experiments with seml [db_collection_name] status during execution should show the jobs moving from staged to pending to running to completed for the example experiment.

Instead, running seml [db_collection_name] status kills pending jobs silently. Not running the command lets the jobs run as expected.
The issue likely happens when the status command tries to detect killed experiments in this line:
Line 22 in 7d9352e
Steps to reproduce (with watch seml seml_example status running in a second terminal window):

After seml seml_example add example_config.yaml, the jobs appear in the staged section in the second terminal window.
After seml seml_example start, the jobs appear in squeue, but they also immediately appear in the killed section in the second terminal window.

To get the Slurm jobs to start on our cluster, I had to change the partition to exercise in the example_config.yaml and reduce the maximum Slurm time to 2 hours.
Most logs look something like this:
(seml) [hborras@ceg-octane logs]$ cat example_experiment_68391_10.out
Starting job 68402
SLURM assigned me the node(s): ceg-brook01
WARNING: Experiment with ID 11 does not have status PENDING and will not be run.
Experiments are running under the following process IDs:
With ceg-brook01
being one of our GPU nodes.
The squeue output looks something like this during execution:
[hborras@ceg-octane ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
68391_4 exercise example_ hborras CG 0:05 1 ceg-brook01
68391_6 exercise example_ hborras CG 0:05 1 ceg-brook01
68391_[10-71] exercise example_ hborras PD 0:00 1 (Resources)
68391_9 exercise example_ hborras R 0:00 1 ceg-brook02
68391_7 exercise example_ hborras R 0:01 1 ceg-brook02
68391_8 exercise example_ hborras R 0:01 1 ceg-brook02
68391_5 exercise example_ hborras R 0:05 1 ceg-brook01
(seml) [hborras@ceg-octane examples]$ uname -a
Linux ceg-octane 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Output of conda list:

(seml) [hborras@ceg-octane examples]$ conda list
# packages in environment at /home/hborras/.conda/envs/seml:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
ca-certificates 2021.10.26 h06a4308_2
certifi 2021.10.8 py37h06a4308_2
colorama 0.4.4 pypi_0 pypi
debugpy 1.5.1 pypi_0 pypi
docopt 0.6.2 pypi_0 pypi
gitdb 4.0.9 pypi_0 pypi
gitpython 3.1.27 pypi_0 pypi
importlib-metadata 4.11.2 pypi_0 pypi
jsonpickle 1.5.2 pypi_0 pypi
libedit 3.1.20210910 h7f8727e_0
libffi 3.2.1 hf484d3e_1007
libgcc-ng 9.3.0 h5101ec6_17
libgomp 9.3.0 h5101ec6_17
libstdcxx-ng 9.3.0 hd4cf53a_17
munch 2.5.0 pypi_0 pypi
ncurses 6.3 h7f8727e_2
numpy 1.21.5 pypi_0 pypi
openssl 1.0.2u h7b6447c_0
packaging 21.3 pypi_0 pypi
pandas 1.1.5 pypi_0 pypi
pip 21.2.2 py37h06a4308_0
py-cpuinfo 8.0.0 pypi_0 pypi
pymongo 4.0.1 pypi_0 pypi
pyparsing 3.0.7 pypi_0 pypi
python 3.7.0 h6e4f718_3
python-dateutil 2.8.2 pypi_0 pypi
pytz 2021.3 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
readline 7.0 h7b6447c_5
sacred 0.8.2 pypi_0 pypi
seml 0.3.6 pypi_0 pypi
setuptools 58.0.4 py37h06a4308_0
six 1.16.0 pypi_0 pypi
smmap 5.0.0 pypi_0 pypi
sqlite 3.33.0 h62c20be_0
tk 8.6.11 h1ccaba5_0
tqdm 4.63.0 pypi_0 pypi
typing-extensions 4.1.1 pypi_0 pypi
wheel 0.37.1 pyhd3eb1b0_0
wrapt 1.13.3 pypi_0 pypi
xz 5.2.5 h7b6447c_0
zipp 3.7.0 pypi_0 pypi
zlib 1.2.11 h7f8727e_4
Chaining commands via seml <collection> [commands] may have unintended side effects, especially when cancelling jobs: the subsequent commands may be executed before all jobs have been completely cancelled by Slurm.
Potential solution: have seml <collection> cancel return only after the jobs have actually been cancelled by Slurm.
Cancelling the started experiments raises TypeError: unhashable type: 'list':

python seml/track.py -c examples/example_config.yaml queue
python seml/track.py -c examples/example_config.yaml start
python seml/track.py -c examples/example_config.yaml cancel
Traceback (most recent call last):
File "seml/track.py", line 346, in <module>
f(**args.__dict__)
File "seml/track.py", line 109, in cancel_experiments
slurm_ids = set([e['slurm']['id'] for e in exps if "slurm" in e and ["id"] in e['slurm']])
File "seml/track.py", line 109, in <listcomp>
slurm_ids = set([e['slurm']['id'] for e in exps if "slurm" in e and ["id"] in e['slurm']])
TypeError: unhashable type: 'list'
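The traceback points at the membership test `["id"] in e['slurm']`: it asks whether the list ["id"] is a key of the dict, which requires hashing a list and raises the TypeError. A corrected version of the comprehension, as a sketch:

```python
def collect_slurm_ids(exps):
    """Corrected version of the comprehension from the traceback: the
    original wrote `["id"] in e['slurm']`, which tries to hash the *list*
    ["id"] and raises TypeError; the membership test should use the
    string "id" instead."""
    return {e["slurm"]["id"] for e in exps
            if "slurm" in e and "id" in e["slurm"]}
```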
Output of conda list:

_anaconda_depends 2019.03 py37_0 anaconda
_libgcc_mutex 0.1 main
alabaster 0.7.12 py37_0 anaconda
anaconda custom py37_1
anaconda-client 1.7.2 py37_0 anaconda
anaconda-project 0.8.3 py_0
asn1crypto 1.0.1 py37_0 anaconda
astroid 2.3.1 py37_0 anaconda
astropy 3.2.2 py37h7b6447c_0 anaconda
atomicwrites 1.3.0 py37_1 anaconda
attrs 19.2.0 py_0
babel 2.7.0 py_0
backcall 0.1.0 py37_0 anaconda
backports 1.0 py_2
backports.os 0.1.1 py37_0 anaconda
backports.shutil_get_terminal_size 1.0.0 py37_2 anaconda
beautifulsoup4 4.8.0 py37_0 anaconda
bitarray 1.0.1 py37h7b6447c_0 anaconda
bkcharts 0.2 py37_0 anaconda
blas 1.0 mkl
bleach 3.1.0 py37_0 anaconda
blosc 1.16.3 hd408876_0
bokeh 1.3.4 py37_0 anaconda
boto 2.49.0 py37_0 anaconda
boto3 1.9.251 pypi_0 pypi
botocore 1.12.251 pypi_0 pypi
bottleneck 1.2.1 py37h035aef0_1 anaconda
bzip2 1.0.8 h7b6447c_0
ca-certificates 2019.8.28 0 anaconda
cairo 1.14.12 h8948797_3
certifi 2019.9.11 py37_0 anaconda
cffi 1.12.3 py37h2e261b9_0 anaconda
chardet 3.0.4 py37_1003 anaconda
click 6.7 pypi_0 pypi
cloudpickle 1.2.2 py_0
clyent 1.2.2 py37_1 anaconda
colorama 0.4.1 py37_0 anaconda
conda 4.7.12 py37_0 anaconda
conda-env 2.6.0 1
conda-package-handling 1.6.0 py37h7b6447c_0 anaconda
contextlib2 0.6.0 py_0
cryptography 2.7 py37h1ba5d50_0 anaconda
curl 7.65.3 hbc83047_0
cycler 0.10.0 py37_0 anaconda
cython 0.29.13 py37he6710b0_0 anaconda
cytoolz 0.10.0 py37h7b6447c_0 anaconda
dask 2.5.2 py_0
dask-core 2.5.2 py_0
dbus 1.13.6 h746ee38_0
decorator 4.4.0 py37_1 anaconda
defusedxml 0.6.0 py_0
dill 0.3.1.1 pypi_0 pypi
distributed 2.5.2 py_0
docopt 0.6.2 pypi_0 pypi
docutils 0.15.2 py37_0 anaconda
entrypoints 0.3 py37_0 anaconda
et_xmlfile 1.0.1 py37_0 anaconda
expat 2.2.6 he6710b0_0
fastcache 1.1.0 py37h7b6447c_0 anaconda
filelock 3.0.12 py_0
flask 1.1.1 py_0
flask-cors 3.0.8 py_0
fontconfig 2.13.0 h9420a91_0
freetype 2.9.1 h8a8886c_1
fribidi 1.0.5 h7b6447c_0
fsspec 0.5.2 py_0
get_terminal_size 1.0.0 haa9412d_0
gevent 1.4.0 py37h7b6447c_0 anaconda
gitdb2 2.0.6 pypi_0 pypi
gitpython 3.0.3 pypi_0 pypi
glib 2.56.2 hd408876_0
glob2 0.7 py_0
gmp 6.1.2 h6c8ec71_1
gmpy2 2.0.8 py37h10f8cd9_2 anaconda
graphite2 1.3.13 h23475e2_0
greenlet 0.4.15 py37h7b6447c_0 anaconda
gst-plugins-base 1.14.0 hbbd80ab_1
gstreamer 1.14.0 hb453b48_1
h5py 2.9.0 py37h7918eee_0 anaconda
harfbuzz 1.8.8 hffaf4a1_0
hdf5 1.10.4 hb1b8bf9_0
heapdict 1.0.1 py_0
html5lib 1.0.1 py37_0 anaconda
hub 0.2.0.4 pypi_0 pypi
icu 58.2 h9c2bf20_1
idna 2.8 py37_0 anaconda
imageio 2.6.0 py37_0 anaconda
imagesize 1.1.0 py37_0 anaconda
importlib_metadata 0.23 py37_0 anaconda
intel-openmp 2019.4 243
ipykernel 5.1.2 py37h39e3cac_0 anaconda
ipython 7.8.0 py37h39e3cac_0 anaconda
ipython_genutils 0.2.0 py37_0 anaconda
ipywidgets 7.5.1 py_0
isort 4.3.21 py37_0 anaconda
itsdangerous 1.1.0 py37_0 anaconda
jbig 2.1 hdba287a_0
jdcal 1.4.1 py_0
jedi 0.15.1 py37_0 anaconda
jeepney 0.4.1 py_0
jinja2 2.10.3 py_0
jmespath 0.9.4 pypi_0 pypi
jpeg 9b h024ee3a_2
json5 0.8.5 py_0
jsonpickle 0.9.6 pypi_0 pypi
jsonschema 3.0.2 py37_0 anaconda
jupyter 1.0.0 py37_7 anaconda
jupyter_client 5.3.3 py37_1 anaconda
jupyter_console 6.0.0 py37_0 anaconda
jupyter_core 4.5.0 py_0
jupyterlab 1.1.4 pyhf63ae98_0
jupyterlab_server 1.0.6 py_0
keyring 18.0.0 py37_0 anaconda
kiwisolver 1.1.0 py37he6710b0_0 anaconda
krb5 1.16.1 h173b8e3_7
lazy-object-proxy 1.4.2 py37h7b6447c_0 anaconda
libarchive 3.3.3 h5d8350f_5
libcurl 7.65.3 h20c2e04_0
libedit 3.1.20181209 hc058e9b_0
libffi 3.2.1 hd88cf55_4
libgcc 7.2.0 h69d50b8_2
libgcc-ng 9.1.0 hdf63c60_0
libgfortran 3.0.0 1 https://repo.continuum.io/pkgs/free
libgfortran-ng 7.3.0 hdf63c60_0
libiconv 1.15 h63c8f33_5
liblief 0.9.0 h7725739_2
libpng 1.6.37 hbc83047_0
libsodium 1.0.16 h1bed415_0
libssh2 1.8.2 h1ba5d50_0
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.0.10 h2733197_2
libtool 2.4.6 h7b6447c_5
libuuid 1.0.3 h1bed415_2
libxcb 1.13 h1bed415_1
libxml2 2.9.9 hea5a465_1
libxslt 1.1.33 h7d1a2b0_0
llvmlite 0.29.0 py37hd408876_0 anaconda
locket 0.2.0 py37_1 anaconda
lxml 4.4.1 py37hefd8a0e_0 anaconda
lz4-c 1.8.1.2 h14c3975_0
lzo 2.10 h49e0be7_2
markupsafe 1.1.1 py37h7b6447c_0 anaconda
matplotlib 3.1.1 py37h5429711_0 anaconda
mccabe 0.6.1 py37_1 anaconda
mistune 0.8.4 py37h7b6447c_0 anaconda
mkl 2019.4 243
mkl-service 2.3.0 py37he904b0f_0 anaconda
mkl_fft 1.0.14 py37ha843d7b_0 anaconda
mkl_random 1.1.0 py37hd6b4f25_0 anaconda
mock 3.0.5 py37_0 anaconda
more-itertools 7.2.0 py37_0 anaconda
mpc 1.1.0 h10f8cd9_1
mpfr 4.0.1 hdf1c602_3
mpmath 1.1.0 py37_0 anaconda
msgpack-python 0.6.1 py37hfd86e86_1 anaconda
multipledispatch 0.6.0 py37_0 anaconda
multiprocess 0.70.9 pypi_0 pypi
munch 2.3.2 pypi_0 pypi
nbconvert 5.6.0 py37_1 anaconda
nbformat 4.4.0 py37_0 anaconda
ncurses 6.1 he6710b0_1
networkx 2.3 py_0
nltk 3.4.5 py37_0 anaconda
nose 1.3.7 py37_2 anaconda
notebook 6.0.1 py37_0 anaconda
numba 0.45.1 py37h962f231_0 anaconda
numexpr 2.7.0 py37h9e4a6bb_0 anaconda
numpy 1.17.2 py37haad9e8e_0 anaconda
numpy-base 1.17.2 py37hde5b4d6_0 anaconda
numpydoc 0.9.1 py_0
olefile 0.46 py37_0 anaconda
openpyxl 3.0.0 py_0
openssl 1.1.1 h7b6447c_0 anaconda
packaging 19.2 py_0
pandas 0.25.1 py37he6710b0_0 anaconda
pandoc 2.2.3.2 0
pandocfilters 1.4.2 py37_1 anaconda
pango 1.42.4 h049681c_0
parso 0.5.1 py_0
partd 1.0.0 py_0
patchelf 0.9 he6710b0_3
path.py 12.0.1 py_0
pathlib2 2.3.5 py37_0 anaconda
pathos 0.2.5 pypi_0 pypi
patsy 0.5.1 py37_0 anaconda
pcre 8.43 he6710b0_0
pep8 1.7.1 py37_0 anaconda
pexpect 4.7.0 py37_0 anaconda
pickleshare 0.7.5 py37_0 anaconda
pillow 6.2.0 py37h34e0f95_0 anaconda
pip 19.2.3 py37_0 anaconda
pixman 0.38.0 h7b6447c_0
pkginfo 1.5.0.1 py37_0 anaconda
pluggy 0.13.0 py37_0 anaconda
ply 3.11 py37_0 anaconda
pox 0.2.7 pypi_0 pypi
ppft 1.6.6.1 pypi_0 pypi
prometheus_client 0.7.1 py_0
prompt_toolkit 2.0.10 py_0
psutil 5.6.3 py37h7b6447c_0 anaconda
ptyprocess 0.6.0 py37_0 anaconda
py 1.8.0 py37_0 anaconda
py-cpuinfo 5.0.0 pypi_0 pypi
py-lief 0.9.0 py37h7725739_2 anaconda
pyasn1 0.4.7 py_0
pycodestyle 2.5.0 py37_0 anaconda
pycosat 0.6.3 py37h14c3975_0 anaconda
pycparser 2.19 py37_0 anaconda
pycrypto 2.6.1 py37h14c3975_9 anaconda
pycurl 7.43.0.3 py37h1ba5d50_0 anaconda
pyflakes 2.1.1 py37_0 anaconda
pygments 2.4.2 py_0
pylint 2.4.2 py37_0 anaconda
pymongo 3.9.0 pypi_0 pypi
pyodbc 4.0.27 py37he6710b0_0 anaconda
pyopenssl 19.0.0 py37_0 anaconda
pyparsing 2.4.2 py_0
pyqt 5.9.2 py37h22d08a2_1 anaconda
pyrsistent 0.15.4 py37h7b6447c_0 anaconda
pysocks 1.7.1 py37_0 anaconda
pytables 3.5.2 py37h71ec239_1 anaconda
pytest 5.2.1 py37_0 anaconda
pytest-arraydiff 0.3 py37h39e3cac_0 anaconda
pytest-astropy 0.5.0 py37_0 anaconda
pytest-doctestplus 0.4.0 py_0
pytest-openfiles 0.4.0 py_0
pytest-remotedata 0.3.2 py37_0 anaconda
python 3.7.4 h265db76_1 anaconda
python-dateutil 2.8.0 py37_0 anaconda
python-libarchive-c 2.8 py37_13 anaconda
pytz 2019.3 py_0
pywavelets 1.0.3 py37hdd07704_1 anaconda
pyyaml 5.1.2 py37h7b6447c_0 anaconda
pyzmq 18.1.0 py37he6710b0_0 anaconda
qt 5.9.7 h5867ecd_1
qtawesome 0.6.0 py_0
qtconsole 4.5.5 py_0
qtpy 1.9.0 py_0
readline 7.0 h7b6447c_5
redis 5.0.3 h7b6447c_0
redis-py 3.3.8 py_0
requests 2.22.0 py37_0 anaconda
ripgrep 0.10.0 hc07d326_0
rope 0.14.0 py_0
ruamel_yaml 0.15.46 py37h14c3975_0 anaconda
s3transfer 0.2.1 pypi_0 pypi
sacred 0.8.0 pypi_0 pypi
scikit-image 0.15.0 py37he6710b0_0 anaconda
scikit-learn 0.20.3 py37hd81dba3_0 anaconda
scipy 1.3.1 py37h7c811a0_0 anaconda
seaborn 0.9.0 py37_0 anaconda
secretstorage 3.1.1 py37_0 anaconda
send2trash 1.5.0 py37_0 anaconda
setuptools 41.4.0 py37_0 anaconda
simplegeneric 0.8.1 py37_2 anaconda
singledispatch 3.4.0.3 py37_0 anaconda
sip 4.19.13 py37he6710b0_0 anaconda
six 1.12.0 py37_0 anaconda
smmap2 2.0.5 pypi_0 pypi
snappy 1.1.7 hbae5bb6_3
snowballstemmer 2.0.0 py_0
sortedcollections 1.1.2 py37_0 anaconda
sortedcontainers 2.1.0 py37_0 anaconda
soupsieve 1.9.3 py37_0 anaconda
sphinx 2.2.0 py_0
sphinxcontrib 1.0 py37_1 anaconda
sphinxcontrib-applehelp 1.0.1 py_0
sphinxcontrib-devhelp 1.0.1 py_0
sphinxcontrib-htmlhelp 1.0.2 py_0
sphinxcontrib-jsmath 1.0.1 py_0
sphinxcontrib-qthelp 1.0.2 py_0
sphinxcontrib-serializinghtml 1.1.3 py_0
sphinxcontrib-websupport 1.1.2 py_0
spyder 3.3.6 py37_0 anaconda
spyder-kernels 0.5.2 py37_0 anaconda
sqlalchemy 1.3.9 py37h7b6447c_0 anaconda
sqlite 3.30.0 h7b6447c_0
statsmodels 0.10.1 py37hdd07704_0 anaconda
sympy 1.4 py37_0 anaconda
tbb 2019.4 hfd86e86_0
tblib 1.4.0 py_0
tenacity 5.1.1 pypi_0 pypi
terminado 0.8.2 py37_0 anaconda
testpath 0.4.2 py37_0 anaconda
tk 8.6.8 hbc83047_0
toolz 0.10.0 py_0
tornado 6.0.3 py37h7b6447c_0 anaconda
tqdm 4.36.1 py_0
traitlets 4.3.3 py37_0 anaconda
unicodecsv 0.14.1 py37_0 anaconda
unixodbc 2.3.7 h14c3975_0
urllib3 1.24.2 py37_0 anaconda
wcwidth 0.1.7 py37_0 anaconda
webencodings 0.5.1 py37_1 anaconda
werkzeug 0.16.0 py_0
wheel 0.33.6 py37_0 anaconda
widgetsnbextension 3.5.1 py37_0 anaconda
wrapt 1.11.2 py37h7b6447c_0 anaconda
wurlitzer 1.0.3 py37_0 anaconda
xlrd 1.2.0 py37_0 anaconda
xlsxwriter 1.2.1 py_0
xlwt 1.3.0 py37_0 anaconda
xz 5.2.4 h14c3975_4
yaml 0.1.7 had09818_2
zeromq 4.3.1 he6710b0_3
zict 1.0.0 py_0
zipp 0.6.0 py_0
zlib 1.2.11 h7b6447c_3
zstd 1.3.7 h0b5b093_0
Line 857 in b16c130
Starting job 6372963
SLURM assigned me the node(s): gpu08
[I 16:15:57.661 NotebookApp] Serving notebooks from local directory: /nfs/homedirs/username
[I 16:15:57.661 NotebookApp] Jupyter Notebook 6.3.0 is running at:
[I 16:15:57.661 NotebookApp] http://gpu08.kdd.in.tum.de:8888/?token=dcd458ad326a6bb2ed9c78a15bb472f84f2a5bdb8b941f7c
[I 16:15:57.661 NotebookApp] or http://127.0.0.1:8888/?token=dcd458ad326a6bb2ed9c78a15bb472f84f2a5bdb8b941f7c
[I 16:15:57.661 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:15:57.676 NotebookApp]
To access the notebook, open this file in a browser:
file:///nfs/homedirs/username/.local/share/jupyter/runtime/nbserver-10756-open.html
Or copy and paste one of these URLs:
http://gpu08.kdd.in.tum.de:8888/?token=dcd458ad326a6bb2ed9c78a15bb472f84f2a5bdb8b941f7c
or http://127.0.0.1:8888/?token=dcd458ad326a6bb2ed9c78a15bb472f84f2a5bdb8b941f7c
[I 16:16:44.535 NotebookApp] 302 GET /login?next=%2Ftree%3F (172.24.64.22) 2.210000ms
[I 16:16:49.296 NotebookApp] 302 GET / (172.24.64.22) 0.720000ms
[I 16:17:04.577 NotebookApp] 302 POST /login?next=%2F (172.24.64.22) 1.650000ms
[I 16:17:04.614 NotebookApp] 302 GET / (172.24.64.22) 0.560000ms
[I 16:18:42.661 NotebookApp] 302 GET / (172.24.64.22) 0.700000ms
As you can see from the SLURM output above, the .out file does not contain just one http URL line - it contains four. Therefore, I suggest that the referenced line be fixed or removed unless there is another reason for it.
Starting a job locally should be able to run additional/optional conda activate/deactivate scripts (here installed via conda install gxx_linux-64==7.*).

Instead, it fails (also with the -l option) with the following error message as the first and only log output:

/bin/sh: 5: <pathtocondaenv>/etc/conda/deactivate.d/deactivate-gxx_linux-64.sh: Syntax error: "(" unexpected
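The 'Syntax error: "(" unexpected' is characteristic of /bin/sh (dash on many distros) choking on bash-only syntax such as arrays in the conda hook scripts. A workaround sketch (not seml's implementation) is to source the hooks through bash explicitly:

```python
import subprocess

def source_hook(script_path):
    """Source a conda (de)activation hook through bash explicitly.
    The hooks may use bash-only constructs (e.g. arrays) that /bin/sh
    rejects with 'Syntax error: \"(\" unexpected'. Workaround sketch,
    not seml's implementation."""
    result = subprocess.run(
        ["bash", "-c", f"source '{script_path}'"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```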
Output of conda list:

# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
argon2-cffi 20.1.0 py38h497a2fe_2 conda-forge
ase 3.21.1 pypi_0 pypi
async_generator 1.10 py_0 conda-forge
attrs 20.3.0 pyhd3deb0d_0 conda-forge
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge
binutils_impl_linux-64 2.35.1 h193b22a_2 conda-forge
binutils_linux-64 2.35 h67ddf6f_30 conda-forge
blas 1.0 mkl conda-forge
bleach 3.3.0 pyh44b312d_0 conda-forge
ca-certificates 2020.12.5 ha878542_0 conda-forge
certifi 2020.12.5 py38h578d9bd_1 conda-forge
cffi 1.14.5 py38ha65f79e_0 conda-forge
chardet 4.0.0 pypi_0 pypi
colorama 0.4.4 pypi_0 pypi
cudatoolkit 10.2.89 h8f6ccaa_8 conda-forge
cycler 0.10.0 pypi_0 pypi
dataclasses 0.6 pypi_0 pypi
dbus 1.13.6 h48d8840_2 conda-forge
debugpy 1.2.1 pypi_0 pypi
decorator 4.4.2 pypi_0 pypi
defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge
docopt 0.6.2 pypi_0 pypi
entrypoints 0.3 pyhd8ed1ab_1003 conda-forge
expat 2.3.0 h9c3ff4c_0 conda-forge
filelock 3.0.12 pypi_0 pypi
fontconfig 2.13.1 hba837de_1005 conda-forge
freetype 2.10.4 h0708190_1 conda-forge
future 0.18.2 pypi_0 pypi
gcc_impl_linux-64 7.5.0 habd7529_19 conda-forge
gcc_linux-64 7.5.0 h47867f9_30 conda-forge
gettext 0.19.8.1 h0b5b191_1005 conda-forge
gitdb 4.0.7 pypi_0 pypi
gitpython 3.1.15 pypi_0 pypi
glib 2.68.1 h9c3ff4c_0 conda-forge
glib-tools 2.68.1 h9c3ff4c_0 conda-forge
googledrivedownloader 0.4 pypi_0 pypi
gst-plugins-base 1.18.4 h29181c9_0 conda-forge
gstreamer 1.18.4 h76c114f_0 conda-forge
gxx_impl_linux-64 7.5.0 hd0bb8aa_19 conda-forge
gxx_linux-64 7.5.0 h555fc39_30 conda-forge
h5py 3.2.1 pypi_0 pypi
icu 68.1 h58526e2_0 conda-forge
idna 2.10 pypi_0 pypi
importlib-metadata 4.0.1 py38h578d9bd_0 conda-forge
intel-openmp 2020.2 254
ipykernel 5.5.3 py38hd0cf306_0 conda-forge
ipython 7.22.0 py38hd0cf306_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
ipywidgets 7.6.3 pyhd3deb0d_0 conda-forge
isodate 0.6.0 pypi_0 pypi
jedi 0.18.0 py38h578d9bd_2 conda-forge
jinja2 2.11.3 pyh44b312d_0 conda-forge
joblib 1.0.1 pypi_0 pypi
jpeg 9d h36c2ea0_0 conda-forge
jsonpickle 1.5.2 pypi_0 pypi
jsonschema 3.2.0 pyhd8ed1ab_3 conda-forge
jupyter 1.0.0 py38h578d9bd_6 conda-forge
jupyter_client 6.1.12 pyhd8ed1ab_0 conda-forge
jupyter_console 6.4.0 pyhd8ed1ab_0 conda-forge
jupyter_core 4.7.1 py38h578d9bd_0 conda-forge
jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge
jupyterlab_widgets 1.0.0 pyhd8ed1ab_1 conda-forge
kernel-headers_linux-64 2.6.32 h77966d4_13 conda-forge
kernels 1.0.0 dev_0 <develop>
kiwisolver 1.3.1 pypi_0 pypi
krb5 1.17.2 h926e7f8_0 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.35.1 hea4e1c9_2 conda-forge
libblas 3.8.0 21_mkl conda-forge
libcblas 3.8.0 21_mkl conda-forge
libclang 11.1.0 default_ha53f305_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libevent 2.1.10 hcdb4288_3 conda-forge
libffi 3.3 h58526e2_2 conda-forge
libgcc-devel_linux-64 7.5.0 hda03d7c_19 conda-forge
libgcc-ng 9.3.0 h2828fa1_19 conda-forge
libgfortran-ng 9.3.0 hff62375_19 conda-forge
libgfortran5 9.3.0 hff62375_19 conda-forge
libglib 2.68.1 h3e27bee_0 conda-forge
libgomp 9.3.0 h2828fa1_19 conda-forge
libiconv 1.16 h516909a_0 conda-forge
liblapack 3.8.0 21_mkl conda-forge
libllvm11 11.1.0 hf817b99_2 conda-forge
libopenblas 0.3.12 pthreads_h4812303_1 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libpq 13.1 hfd2b0eb_2 conda-forge
libsodium 1.0.18 h36c2ea0_1 conda-forge
libstdcxx-devel_linux-64 7.5.0 hb016644_19 conda-forge
libstdcxx-ng 9.3.0 h6de172a_19 conda-forge
libtiff 4.2.0 hdc55705_0 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libuv 1.41.0 h7f98852_0 conda-forge
libwebp-base 1.2.0 h7f98852_2 conda-forge
libxcb 1.13 h7f98852_1003 conda-forge
libxkbcommon 1.0.3 he3ba5ed_0 conda-forge
libxml2 2.9.10 h72842e0_4 conda-forge
littleutils 0.2.2 pypi_0 pypi
llvmlite 0.36.0 pypi_0 pypi
lz4-c 1.9.3 h9c3ff4c_0 conda-forge
markupsafe 1.1.1 py38h497a2fe_3 conda-forge
matplotlib 3.4.1 pypi_0 pypi
mistune 0.8.4 py38h497a2fe_1003 conda-forge
mkl 2020.2 256
munch 2.5.0 pypi_0 pypi
mysql-common 8.0.23 ha770c72_1 conda-forge
mysql-libs 8.0.23 h935591d_1 conda-forge
nbclient 0.5.3 pyhd8ed1ab_0 conda-forge
nbconvert 6.0.7 py38h578d9bd_3 conda-forge
nbformat 5.1.3 pyhd8ed1ab_0 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
nest-asyncio 1.5.1 pyhd8ed1ab_0 conda-forge
networkx 2.5.1 pypi_0 pypi
ninja 1.10.2 h4bd325d_0 conda-forge
notebook 6.3.0 pyha770c72_1 conda-forge
nspr 4.30 h9c3ff4c_0 conda-forge
nss 3.64 hb5efdd6_0 conda-forge
numba 0.53.1 pypi_0 pypi
numpy 1.20.2 py38h9894fe3_0 conda-forge
ogb 1.3.1 pypi_0 pypi
olefile 0.46 pyh9f0ad1d_1 conda-forge
openjpeg 2.4.0 hf7af979_0 conda-forge
openssl 1.1.1k h7f98852_0 conda-forge
outdated 0.2.1 pypi_0 pypi
packaging 20.9 pyh44b312d_0 conda-forge
pandas 1.2.4 py38h1abd341_0 conda-forge
pandoc 2.12 h7f98852_0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
parso 0.8.2 pyhd8ed1ab_0 conda-forge
pcre 8.44 he1b5a44_0 conda-forge
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 8.1.2 py38ha0e1e83_1 conda-forge
pip 21.0.1 pyhd8ed1ab_0 conda-forge
prometheus_client 0.10.1 pyhd8ed1ab_0 conda-forge
prompt-toolkit 3.0.18 pyha770c72_0 conda-forge
prompt_toolkit 3.0.18 hd8ed1ab_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
py-cpuinfo 8.0.0 pypi_0 pypi
pycparser 2.20 pyh9f0ad1d_2 conda-forge
pygments 2.8.1 pyhd8ed1ab_0 conda-forge
pymongo 3.11.3 pypi_0 pypi
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyqt 5.12.3 py38h578d9bd_7 conda-forge
pyqt-impl 5.12.3 py38h7400c14_7 conda-forge
pyqt5-sip 4.19.18 py38h709712a_7 conda-forge
pyqtchart 5.12 py38h7400c14_7 conda-forge
pyqtwebengine 5.12.1 py38h7400c14_7 conda-forge
pyrsistent 0.17.3 py38h497a2fe_2 conda-forge
python 3.8.8 hffdb5ce_0_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python-louvain 0.15 pypi_0 pypi
python_abi 3.8 1_cp38 conda-forge
pytorch 1.8.1 py3.8_cuda10.2_cudnn7.6.5_0 pytorch
pytz 2021.1 pyhd8ed1ab_0 conda-forge
pyyaml 5.4.1 pypi_0 pypi
pyzmq 22.0.3 py38h2035c66_1 conda-forge
qt 5.12.9 hda022c4_4 conda-forge
qtconsole 5.0.3 pyhd8ed1ab_0 conda-forge
qtpy 1.9.0 py_0 conda-forge
rdflib 5.0.0 pypi_0 pypi
readline 8.1 h46c0cb4_0 conda-forge
requests 2.25.1 pypi_0 pypi
rgnn-at-scale 1.0.0 dev_0 <develop>
sacred 0.8.2 pypi_0 pypi
scikit-learn 0.24.1 pypi_0 pypi
scipy 1.6.2 pypi_0 pypi
seaborn 0.11.1 pypi_0 pypi
seml 0.3.4 pypi_0 pypi
send2trash 1.5.0 py_0 conda-forge
setuptools 49.6.0 py38h578d9bd_3 conda-forge
six 1.15.0 pyh9f0ad1d_0 conda-forge
smmap 4.0.0 pypi_0 pypi
sqlite 3.35.4 h74cdb3f_0 conda-forge
sysroot_linux-64 2.12 h77966d4_13 conda-forge
tabulate 0.8.9 pypi_0 pypi
terminado 0.9.4 py38h578d9bd_0 conda-forge
testpath 0.4.4 py_0 conda-forge
threadpoolctl 2.1.0 pypi_0 pypi
tinydb 4.4.0 pypi_0 pypi
tinydb-serialization 2.1.0 pypi_0 pypi
tk 8.6.10 h21135ba_1 conda-forge
torch-cluster 1.5.9 pypi_0 pypi
torch-geometric 1.7.0 pypi_0 pypi
torch-scatter 2.0.6 pypi_0 pypi
torch-sparse 0.6.9 pypi_0 pypi
torch-spline-conv 1.2.1 pypi_0 pypi
torchaudio 0.8.1 py38 pytorch
torchvision 0.2.2 py_3 pytorch
tornado 6.1 py38h497a2fe_1 conda-forge
tqdm 4.60.0 pypi_0 pypi
traitlets 5.0.5 py_0 conda-forge
typing_extensions 3.7.4.3 py_0 conda-forge
urllib3 1.26.4 pypi_0 pypi
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
webencodings 0.5.1 py_1 conda-forge
wheel 0.36.2 pyhd3deb0d_0 conda-forge
widgetsnbextension 3.5.1 py38h578d9bd_4 conda-forge
wrapt 1.12.1 pypi_0 pypi
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xz 5.2.5 h516909a_1 conda-forge
zeromq 4.3.4 h9c3ff4c_0 conda-forge
zipp 3.4.1 pyhd8ed1ab_0 conda-forge
zlib 1.2.11 h516909a_1010 conda-forge
zstd 1.4.9 ha95c52a_0 conda-forge
It should be possible to set the TMPDIR from which the jobs are run. Otherwise, jobs fail if there is not sufficient space on /tmp.
The temporary directory is created in this line here; /tmp is hardcoded.
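A possible fix would be to honor the TMPDIR environment variable instead of hardcoding /tmp. A minimal sketch, not the actual seml code (make_job_tmpdir is a hypothetical helper):

```python
import os
import tempfile
import uuid

def make_job_tmpdir():
    """Create the per-job temporary directory under $TMPDIR if set,
    falling back to the system default (usually /tmp) otherwise.

    Hypothetical sketch of a fix; not the actual seml implementation."""
    base = os.environ.get("TMPDIR", tempfile.gettempdir())
    path = os.path.join(base, str(uuid.uuid4()))
    os.makedirs(path, exist_ok=True)
    return path
```

On clusters where /localscratch should be used, users could then simply export TMPDIR=/localscratch before submitting.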
Starting job 5944119
SLURM assigned me the node(s): supergpu08
/var/spool/slurmd/job5944119/slurm_script: line 34: cannot create temp file for here-document: No space left on device
mkdir: cannot create directory ‘/tmp/8e303830-9b96-4a67-87a7-c16a400f6705’: No space left on device
Experiments are running under the following process IDs:
Run on a scientific cluster where /tmp should not be used, but /localscratch instead.
While running the example, I'm getting the following error:
% seml seml_example start
Starting 288 experiments in 288 Slurm jobs in 1 Slurm job array.
Traceback (most recent call last):
File "/path/venv/p3/bin/seml", line 11, in <module>
sys.exit(main())
File "/path/venv/p3/lib64/python3.6/site-packages/seml/main.py", line 210, in main
f(**args.__dict__)
File "/path/venv/p3/lib64/python3.6/site-packages/seml/start.py", line 483, in start_experiments
output_to_file=output_to_file)
File "/path/venv/p3/lib64/python3.6/site-packages/seml/start.py", line 391, in start_jobs
name=job_name, output_dir_path=output_dir_path, **slurm_config)
File "/path/venv/p3/lib64/python3.6/site-packages/seml/start.py", line 135, in start_slurm_job
with open(f"{os.path.dirname(__file__)}/slurm_template.sh", 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/path/venv/p3/lib64/python3.6/site-packages/seml/slurm_template.sh'
Checking start.py, the code appears to expect slurm_template.sh to be present on install:
# Construct Slurm script
with open(f"{os.path.dirname(__file__)}/slurm_template.sh", 'r') as f:
template = f.read()
However, it does not appear to be in the package:
pip show -f seml
Name: seml
Version: 0.2.3
Summary: Slurm Experiment Management Library
Home-page: http://github.com/TUM-DAML/seml
Author: DAML Group @ TUM
Author-email: [email protected], [email protected]
License: UNKNOWN
Location: <cut>
Requires: jsonpickle, munch, sacred, pandas, pyyaml, numpy, pymongo
Required-by:
Files:
../../../bin/seml
seml-0.2.3.dist-info/INSTALLER
seml-0.2.3.dist-info/LICENSE
seml-0.2.3.dist-info/METADATA
seml-0.2.3.dist-info/RECORD
seml-0.2.3.dist-info/WHEEL
seml-0.2.3.dist-info/entry_points.txt
seml-0.2.3.dist-info/top_level.txt
seml/__init__.py
seml/__pycache__/__init__.cpython-36.pyc
seml/__pycache__/config.cpython-36.pyc
seml/__pycache__/database.cpython-36.pyc
seml/__pycache__/evaluation.cpython-36.pyc
seml/__pycache__/experiment.cpython-36.pyc
seml/__pycache__/main.cpython-36.pyc
seml/__pycache__/manage.cpython-36.pyc
seml/__pycache__/observers.cpython-36.pyc
seml/__pycache__/parameters.cpython-36.pyc
seml/__pycache__/prepare_experiment.cpython-36.pyc
seml/__pycache__/queuing.cpython-36.pyc
seml/__pycache__/settings.cpython-36.pyc
seml/__pycache__/sources.cpython-36.pyc
seml/__pycache__/start.cpython-36.pyc
seml/__pycache__/utils.cpython-36.pyc
seml/config.py
seml/database.py
seml/evaluation.py
seml/experiment.py
seml/main.py
seml/manage.py
seml/observers.py
seml/parameters.py
seml/prepare_experiment.py
seml/queuing.py
seml/settings.py
seml/sources.py
seml/start.py
seml/utils.py
When some (SLURM) config params are wrong (e.g. being over max allowed time), I'd expect to get a nice error message.
Non-informative error: the requested time is 3-00:00, whereas the maximum allowed time for one job is 2 days, not 3.
Traceback (most recent call last):
File "/home/icb/marius.lange/.miniconda3/envs/nsode/bin/seml", line 8, in <module>
sys.exit(main())
File "/home/icb/marius.lange/.miniconda3/envs/nsode/lib/python3.8/site-packages/seml/main.py", line 279, in main
f(**vars(command))
File "/home/icb/marius.lange/.miniconda3/envs/nsode/lib/python3.8/site-packages/seml/start.py", line 801, in start_experiments
add_to_slurm_queue(collection=collection, exps_list=staged_experiments, unobserved=unobserved,
File "/home/icb/marius.lange/.miniconda3/envs/nsode/lib/python3.8/site-packages/seml/start.py", line 564, in add_to_slurm_queue
start_sbatch_job(collection, exp_array, unobserved,
File "/home/icb/marius.lange/.miniconda3/envs/nsode/lib/python3.8/site-packages/seml/start.py", line 229, in start_sbatch_job
output = subprocess.run(f'sbatch {path}', shell=True, check=True, capture_output=True).stdout
File "/home/icb/marius.lange/.miniconda3/envs/nsode/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'sbatch /tmp/1ab7cf9b-23a4-4f28-a1ba-99271cda7b60.sh' returned non-zero exit status 1.
I want to create relationships between jobs. To begin with, it should suffice to support for each and once for all relationships. These connected jobs should form a Directed Acyclic Graph (DAG) so that jobs can easily be executed from root nodes to leaf nodes. If I trigger job A, the framework should check whether all preceding jobs exist and, if not, queue them first. The execution order of jobs (e.g., A should start only after all preceding jobs have finished) can be defined in SLURM via the sbatch command (Example). Jobs that can be executed in parallel should use the parallelism determined by the SLURM scheduler.
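The sbatch dependency mechanism mentioned above boils down to constructing the --dependency flag from the parent job IDs. A sketch (sbatch_dependency_flag is a hypothetical helper; the actual submission call is omitted):

```python
def sbatch_dependency_flag(parent_job_ids, dep_type="afterok"):
    """Build Slurm's sbatch --dependency flag so a job starts only after
    all parent jobs have finished successfully (afterok semantics)."""
    if not parent_job_ids:
        return ""
    return f"--dependency={dep_type}:" + ":".join(str(j) for j in parent_job_ids)

print(sbatch_dependency_flag([123, 456]))
# → --dependency=afterok:123:456
```

The resulting flag would be appended to the sbatch invocation for the dependent job, letting Slurm itself enforce the DAG ordering and parallelism.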
It would be easiest to use an existing framework to capture these features; such frameworks often also provide a nice frontend. However, I fear that the current yaml configuration files are hardly compatible with other existing solutions.
If we decide to extend SEML, I suggest:
- If the current config defines n jobs and path/to/a.yaml defines m jobs (for_each_of), then this results in a total of m * n jobs.
- If the current config defines n jobs and path/to/b.yaml defines k jobs (once_for_all_of), then this results in a total of n jobs.
- If the current config defines n jobs, path/to/a.yaml defines m jobs, and path/to/b.yaml defines k jobs (both with relation type for_each_of), then this results in a total of k * m * n jobs.
Example:
seml:
...
dependencies:
- for_each_of: path/to/a.yaml
- once_for_all_of: path/to/b.yaml
Here are some references to other pipeline frameworks (mostly for inspiration):
seml jupyter relies on https://github.com/TUM-DAML/seml/blob/master/seml/jupyter_template.sh#L20 to determine the FQDN. However, the FQDN is neither necessarily unique, nor is the first entry of hostname --all-fqdn a reliable way of determining the hostname (e.g. see https://www.computerhope.com/unix/uhostnam.htm#The-FQDN: "Do not make any assumptions about the order of the output.").
Thus, I suggest sacrificing readability and using the plain IP address.
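Determining the plain IP address could look like the following sketch (node_address is a hypothetical helper; the UDP trick assumes a configured network interface and falls back to hostname resolution):

```python
import socket

def node_address():
    """Return this machine's outward-facing IP address, instead of relying
    on the first entry of `hostname --all-fqdn`, whose order is undefined."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only makes the OS
        # pick the interface it would use to route to this address.
        s.connect(("192.0.2.1", 80))  # TEST-NET-1 address, never contacted
        return s.getsockname()[0]
    except OSError:
        # Fallback: resolve our own hostname (may yield 127.0.0.1).
        try:
            return socket.gethostbyname(socket.gethostname())
        except OSError:
            return "127.0.0.1"
    finally:
        s.close()
```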
Expected behavior: use the Slurm default parameters when the slurm block is missing in the experiment configuration file.
Actual behavior: a TypeError, since slurm_config is None when the slurm block is missing in the experiment configuration file.
To reproduce, remove the slurm block from examples/example_config.yaml and run:
seml seml_example queue examples/example_config.yaml
Traceback (most recent call last):
File "<PATH_TO_CONDA_ENV>/bin/seml", line 8, in <module>
sys.exit(main())
File "<PATH_TO_CONDA_ENV>/lib/python3.8/site-packages/seml/main.py", line 210, in main
f(**args.__dict__)
File "<PATH_TO_CONDA_ENV>/lib/python3.8/site-packages/seml/queuing.py", line 124, in queue_experiments
if k not in slurm_config['sbatch_options']:
TypeError: 'NoneType' object is not subscriptable
At the moment one has to edit the config file if one is interested in changing a single parameter. It would be great if we could do this via the CLI. The case I most often encounter is that I want to name different batches. Currently, I either have to edit the yaml file and set a name there, or put everything into a different MongoDB collection. Since it is much more comfortable to do this via the CLI, I do the latter, but the drawback is that my MongoDB gets polluted with lots of collections.
My proposal would be to overwrite parameters via the CLI, similar to sacred. In sacred one can do:
python script.py with param1=5
For seml I could envision something like:
seml <collection> add <yaml> with param1=5
The syntax with with might be unfitting and is up to discussion. Other points of discussion are the interactions with subconfigs and grid in general.
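Parsing such overrides could be sketched as follows (parse_overrides is a hypothetical helper, not an existing seml function; dotted keys for nested parameters are an assumption):

```python
import ast

def parse_overrides(tokens):
    """Parse CLI tokens like 'param1=5' or 'model.lr=0.01' into a nested
    dict, sketching the proposed `seml ... add ... with param1=5` syntax."""
    result = {}
    for token in tokens:
        key, _, raw = token.partition("=")
        try:
            value = ast.literal_eval(raw)  # numbers, lists, quoted strings
        except (ValueError, SyntaxError):
            value = raw  # keep as a plain string
        node = result
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result

print(parse_overrides(["param1=5", "model.lr=0.01"]))
# → {'param1': 5, 'model': {'lr': 0.01}}
```

The resulting dict could then be merged into the configs generated from the yaml file; how it interacts with grid blocks remains the open question raised above.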
I ran into the following problem using tensorflow 1.15. The collect_exp_stats hook throws an error when serializing the MaxBytesInUse() Tensor.
The post run hook should run without error and store the stats in the database.
The MaxBytesInUse() Tensor cannot be serialized and an error is thrown.
import seml
from sacred import Experiment
import tensorflow
ex = Experiment()
seml.setup_logger(ex)
@ex.post_run_hook
def collect_stats(_run):
seml.collect_exp_stats(_run)
@ex.config
def config():
overwrite = None
db_collection = None
if db_collection is not None:
ex.observers.append(
seml.create_mongodb_observer(db_collection, overwrite=overwrite)
)
@ex.automain
def run():
return dict()
seml:
executable: run.py
project_root_dir: .
output_dir: .
conda_environment: tf15
slurm:
experiments_per_job: 1
sbatch_options:
gres: gpu:1 # num GPUs
mem: 16G # memory
cpus-per-task: 1 # num cores
time: 0-24:00 # max time, D-HH:MM
qos: interactive
Traceback (most recent calls WITHOUT Sacred internals):
File "/tmp/17504/run.py", line 11, in collect_stats
seml.collect_exp_stats(_run)
File "/nfs/homedirs/kibler/miniconda3/envs/tf15/lib/python3.7/site-packages/seml/experiment.py", line 89, in collect_exp_stats
{'$set': {'stats': stats}})
File "/nfs/homedirs/kibler/miniconda3/envs/tf15/lib/python3.7/site-packages/pymongo/collection.py", line 1024, in update_one
hint=hint, session=session),
File "/nfs/homedirs/kibler/miniconda3/envs/tf15/lib/python3.7/site-packages/pymongo/collection.py", line 870, in _update_retryable
_update, session)
File "/nfs/homedirs/kibler/miniconda3/envs/tf15/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1498, in _retryable_write
return self._retry_with_session(retryable, func, s, None)
File "/nfs/homedirs/kibler/miniconda3/envs/tf15/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1384, in _retry_with_session
return self._retry_internal(retryable, func, session, bulk)
File "/nfs/homedirs/kibler/miniconda3/envs/tf15/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1416, in _retry_internal
return func(session, sock_info, retryable)
File "/nfs/homedirs/kibler/miniconda3/envs/tf15/lib/python3.7/site-packages/pymongo/collection.py", line 866, in _update
retryable_write=retryable_write)
File "/nfs/homedirs/kibler/miniconda3/envs/tf15/lib/python3.7/site-packages/pymongo/collection.py", line 836, in _update
retryable_write=retryable_write).copy()
File "/nfs/homedirs/kibler/miniconda3/envs/tf15/lib/python3.7/site-packages/pymongo/pool.py", line 699, in command
self._raise_connection_failure(error)
File "/nfs/homedirs/kibler/miniconda3/envs/tf15/lib/python3.7/site-packages/pymongo/pool.py", line 694, in command
exhaust_allowed=exhaust_allowed)
File "/nfs/homedirs/kibler/miniconda3/envs/tf15/lib/python3.7/site-packages/pymongo/network.py", line 122, in command
codec_options, ctx=compression_ctx)
File "/nfs/homedirs/kibler/miniconda3/envs/tf15/lib/python3.7/site-packages/pymongo/message.py", line 715, in _op_msg
flags, command, identifier, docs, check_keys, opts)
bson.errors.InvalidDocument: cannot encode object: <tf.Tensor 'MaxBytesInUse:0' shape=() dtype=int64>, of type: <class 'tensorflow.python.framework.ops.Tensor'>
Wrapping it in a session, evaluating it, and casting to int doesn't throw an error. I am not sure, though, whether it will return the correct value.
if tf.test.is_gpu_available():
with tf.Session() as sess:
stats['tensorflow']['gpu_max_memory_bytes'] = int(tf.contrib.memory_stats.MaxBytesInUse().eval())
I believe many projects (and many cluster quotas) would profit from having easy access to profiling for their experiments. In my seml
projects I used this code: https://gist.github.com/siboehm/bf69a17cc9bca71c37a2fae0214a1eeb
It profiles the whole experiment using py-spy
. It works well even for long running experiments (many hours). The size of the profile is a few MB and the CPU overhead of the profiler is pretty low. It runs in a subprocess and doesn't require changes to the main experiment.
Would you be interested in getting a PR that integrates profiling via py-spy
into seml directly? If not, should I add the Gist as an example or can we link to it somehow?
When defining a grid search of the type range, seml should follow the behaviour of np.arange and accept float parameters as steps.
The step size is cast to an integer in this code line: https://github.com/TUM-DAML/seml/blob/master/seml/parameters.py#L179
Which in some cases results in the wrong step size and in extreme cases causes a division by zero error.
To reproduce, define a range with a step size of 0.1. This raises a ZeroDivisionError, as pasted below.
(seml) [hborras@ceg-octane sensitivity-metric]$ seml mlp-test3 add mlp_global_noise_experiment.yaml
Traceback (most recent call last):
File "/home/hborras/.conda/envs/seml/bin/seml", line 8, in <module>
sys.exit(main())
File "/home/hborras/.conda/envs/seml/lib/python3.9/site-packages/seml/main.py", line 298, in main
f(**vars(command))
File "/home/hborras/.conda/envs/seml/lib/python3.9/site-packages/seml/add.py", line 153, in add_experiments
configs = generate_configs(experiment_config, overwrite_params=overwrite_params)
File "/home/hborras/.conda/envs/seml/lib/python3.9/site-packages/seml/config.py", line 224, in generate_configs
grids = [generate_grid(v, parent_key=k) for k, v in grid_params.items()]
File "/home/hborras/.conda/envs/seml/lib/python3.9/site-packages/seml/config.py", line 224, in <listcomp>
grids = [generate_grid(v, parent_key=k) for k, v in grid_params.items()]
File "/home/hborras/.conda/envs/seml/lib/python3.9/site-packages/seml/parameters.py", line 181, in generate_grid
values = list(np.arange(min_val, max_val, step))
ZeroDivisionError: float division by zero
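A float-aware expansion of the range block could avoid the integer cast along these lines (a sketch; generate_range is a hypothetical stand-in for the logic in seml/parameters.py, assuming a positive step):

```python
def generate_range(min_val, max_val, step):
    """Expand a grid `range` like np.arange, accepting float steps.

    Hypothetical sketch of a fix for seml/parameters.py, where the step
    is currently cast to int (turning 0.1 into 0 and dividing by zero).
    Assumes a positive step."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Number of points via ceiling division, mirroring np.arange's
    # half-open interval [min_val, max_val).
    n = max(0, int(-(-(max_val - min_val) // step)))
    return [min_val + i * step for i in range(n)]

print(generate_range(0.0, 0.5, 0.1))  # five values from 0.0 up to, but excluding, 0.5
```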
(seml) [hborras@ceg-octane examples]$ uname -a
Linux ceg-octane 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
conda list output:
(seml) [hborras@ceg-octane examples]$ conda list
# packages in environment at /home/hborras/.conda/envs/seml:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
ca-certificates 2021.10.26 h06a4308_2
certifi 2021.10.8 py37h06a4308_2
colorama 0.4.4 pypi_0 pypi
debugpy 1.5.1 pypi_0 pypi
docopt 0.6.2 pypi_0 pypi
gitdb 4.0.9 pypi_0 pypi
gitpython 3.1.27 pypi_0 pypi
importlib-metadata 4.11.2 pypi_0 pypi
jsonpickle 1.5.2 pypi_0 pypi
libedit 3.1.20210910 h7f8727e_0
libffi 3.2.1 hf484d3e_1007
libgcc-ng 9.3.0 h5101ec6_17
libgomp 9.3.0 h5101ec6_17
libstdcxx-ng 9.3.0 hd4cf53a_17
munch 2.5.0 pypi_0 pypi
ncurses 6.3 h7f8727e_2
numpy 1.21.5 pypi_0 pypi
openssl 1.0.2u h7b6447c_0
packaging 21.3 pypi_0 pypi
pandas 1.1.5 pypi_0 pypi
pip 21.2.2 py37h06a4308_0
py-cpuinfo 8.0.0 pypi_0 pypi
pymongo 4.0.1 pypi_0 pypi
pyparsing 3.0.7 pypi_0 pypi
python 3.7.0 h6e4f718_3
python-dateutil 2.8.2 pypi_0 pypi
pytz 2021.3 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
readline 7.0 h7b6447c_5
sacred 0.8.2 pypi_0 pypi
seml 0.3.6 pypi_0 pypi
setuptools 58.0.4 py37h06a4308_0
six 1.16.0 pypi_0 pypi
smmap 5.0.0 pypi_0 pypi
sqlite 3.33.0 h62c20be_0
tk 8.6.11 h1ccaba5_0
tqdm 4.63.0 pypi_0 pypi
typing-extensions 4.1.1 pypi_0 pypi
wheel 0.37.1 pyhd3eb1b0_0
wrapt 1.13.3 pypi_0 pypi
xz 5.2.5 h7b6447c_0
zipp 3.7.0 pypi_0 pypi
zlib 1.2.11 h7f8727e_4
Thanks for the great work on seml!
One thing I am curious about: Is it possible to run seml without starting a sacred experiment?
My use case is the following: in the data preprocessing stage I want to extract intermediate data representations which are varied by multiple parameters. Those steps are compute-intensive, which is why I am using a cluster with Slurm. I want to do a search over the parameter space and then collect all of the intermediate data representations.
When using a central mongodb, there's no information about what user queued/ran an experiment collection. Looks like this is an issue with sacred as well. However, this is a somewhat worse problem with SEML as SLURM is typically used in a multi-user environment. One workaround is for me to ask each user to create their own MongoDB. Another workaround is for me to ask each user to prepend their experiment name with their username. Both of these are problematic.
Perhaps seml could be updated to address this in some way? One idea is to automatically prefix the collection in the DB with the username. That would allow keeping the start_id code unchanged. Another idea is to add the username in the collection itself, though I could see that requiring a lot more changes.
Apologies in advance if there's a simple workaround I'm missing.
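The first idea above could be as simple as prefixing the collection name with the login name (a sketch; user_collection is a hypothetical helper, not part of seml):

```python
import getpass

def user_collection(name):
    """Prefix a MongoDB collection name with the current username so that
    collections from different users on a shared database don't collide."""
    return f"{getpass.getuser()}_{name}"
```

With this, seml_example would become e.g. alice_seml_example for user alice, and the start_id code could remain unchanged.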
I have started using SEML recently, and I have often had to parse through the source code to figure out the behaviour of some functions, what flags are available, etc. It would be nice, and not too hard, to write more detailed documentation (or have I missed that it already exists or is in the works?).
If it's not already being done, and if you agree that it's a reasonable thing to do, I would be willing to do it and send a PR. There are a couple of nice options for the docs which would make them easy to maintain.
My personal preference is readthedocs, but I'm curious to hear your thoughts on this.
The example_config.yaml should generate 192 configs.
for small_datasets: 5(learning_rate) * 3(dropout) * 2(dataset) * 2(hidden_size) * 3(max_epoch)=180
for large_dataset: 3(dropout) * 2(dataset) * 2(hidden_size)=12
Only 72 configs are generated:
for small_datasets: 5(learning_rate) * 3(dropout) * 2(dataset) * 2(hidden_size) * 1(max_epoch)=60
for large_dataset: 3(dropout) * 2(dataset) * 2(hidden_size)=12
seml seml_test add example_config.yaml
Right now, seml only checkpoints your Python code. However, there may be additional files that change during development, e.g., yaml or json files. It would be great to be able to specify a list of files that should be checkpointed in addition to the code.
Passing an api_token to create_neptune_observer should make the function use this argument. Instead, the argument is ignored, and the default setting or the setting defined in the config file is used.
Starting job 68847
SLURM assigned me the node(s): ceg-brook02
Experiments are running under the following process IDs:
Experiment ID: 82 Process ID: 2474
Traceback (most recent calls WITHOUT Sacred internals):
File "/home/hborras/.conda/envs/seml/lib/python3.7/site-packages/neptune/internal/api_clients/credentials.py", line 95, in _api_token_to_dict
return json.loads(base64.b64decode(api_token.encode()).decode("utf-8"))
File "/home/hborras/.conda/envs/seml/lib/python3.7/base64.py", line 87, in b64decode
return binascii.a2b_base64(s)
binascii.Error: Invalid base64-encoded string: length cannot be 1 more than a multiple of 4
During handling of the above exception, another exception occurred:
Traceback (most recent calls WITHOUT Sacred internals):
File "/tmp/e9f90b9f-fc54-4079-a03f-22190d11122d/examples/example_experiment.py", line 24, in <module>
ex.observers.append(seml.create_neptune_observer('hendrikb/'+db_collection, api_token=None))
File "/home/hborras/seml-test2/seml/seml/observers.py", line 163, in create_neptune_observer
neptune_obs = NeptuneObserver(api_token=api_token, project_name=project_name, source_extensions=source_extensions)
File "/home/hborras/.conda/envs/seml/lib/python3.7/site-packages/neptunecontrib/monitoring/sacred.py", line 81, in __init__
neptune.init(project_qualified_name=project_name, api_token=api_token)
File "/home/hborras/.conda/envs/seml/lib/python3.7/site-packages/neptune/__init__.py", line 185, in init
proxies=proxies,
File "/home/hborras/.conda/envs/seml/lib/python3.7/site-packages/neptune/internal/api_clients/backend_factory.py", line 30, in backend_factory
return HostedNeptuneBackendApiClient(api_token, proxies)
File "/home/hborras/.conda/envs/seml/lib/python3.7/site-packages/neptune/utils.py", line 298, in wrapper
return func(*args, **kwargs)
File "/home/hborras/.conda/envs/seml/lib/python3.7/site-packages/neptune/internal/api_clients/hosted_api_clients/hosted_backend_api_client.py", line 69, in __init__
self.credentials = Credentials(api_token)
File "/home/hborras/.conda/envs/seml/lib/python3.7/site-packages/neptune/internal/api_clients/credentials.py", line 76, in __init__
token_dict = self._api_token_to_dict(self.api_token)
File "/home/hborras/.conda/envs/seml/lib/python3.7/site-packages/neptune/internal/api_clients/credentials.py", line 97, in _api_token_to_dict
raise InvalidApiKey()
neptune.api_exceptions.InvalidApiKey:
----InvalidApiKey-----------------------------------------------------------------------
Your API token is invalid.
Learn how to get it in this docs page:
https://docs-legacy.neptune.ai/security-and-privacy/api-tokens/how-to-find-and-set-neptune-api-token.html
There are two options to add it:
- specify it in your code
- set an environment variable in your operating system.
CODE
Pass the token to neptune.init() via api_token argument:
neptune.init(project_qualified_name='WORKSPACE_NAME/PROJECT_NAME', api_token='YOUR_API_TOKEN')
ENVIRONMENT VARIABLE (Recommended option)
or export or set an environment variable depending on your operating system:
Linux/Unix
In your terminal run:
export NEPTUNE_API_TOKEN=YOUR_API_TOKEN
Windows
In your CMD run:
set NEPTUNE_API_TOKEN=YOUR_API_TOKEN
and skip the api_token argument of neptune.init():
neptune.init(project_qualified_name='WORKSPACE_NAME/PROJECT_NAME')
You may also want to check the following docs pages:
- https://docs-legacy.neptune.ai/security-and-privacy/api-tokens/how-to-find-and-set-neptune-api-token.html
- https://docs-legacy.neptune.ai/getting-started/quick-starts/log_first_experiment.html
Need help?-> https://docs-legacy.neptune.ai/getting-started/getting-help.html
(seml) [hborras@ceg-octane examples]$ uname -a
Linux ceg-octane 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
conda list output:
(seml) [hborras@ceg-octane examples]$ conda list
# packages in environment at /home/hborras/.conda/envs/seml:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
ca-certificates 2021.10.26 h06a4308_2
certifi 2021.10.8 py37h06a4308_2
colorama 0.4.4 pypi_0 pypi
debugpy 1.5.1 pypi_0 pypi
docopt 0.6.2 pypi_0 pypi
gitdb 4.0.9 pypi_0 pypi
gitpython 3.1.27 pypi_0 pypi
importlib-metadata 4.11.2 pypi_0 pypi
jsonpickle 1.5.2 pypi_0 pypi
libedit 3.1.20210910 h7f8727e_0
libffi 3.2.1 hf484d3e_1007
libgcc-ng 9.3.0 h5101ec6_17
libgomp 9.3.0 h5101ec6_17
libstdcxx-ng 9.3.0 hd4cf53a_17
munch 2.5.0 pypi_0 pypi
ncurses 6.3 h7f8727e_2
numpy 1.21.5 pypi_0 pypi
openssl 1.0.2u h7b6447c_0
packaging 21.3 pypi_0 pypi
pandas 1.1.5 pypi_0 pypi
pip 21.2.2 py37h06a4308_0
py-cpuinfo 8.0.0 pypi_0 pypi
pymongo 4.0.1 pypi_0 pypi
pyparsing 3.0.7 pypi_0 pypi
python 3.7.0 h6e4f718_3
python-dateutil 2.8.2 pypi_0 pypi
pytz 2021.3 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
readline 7.0 h7b6447c_5
sacred 0.8.2 pypi_0 pypi
seml 0.3.6 pypi_0 pypi
setuptools 58.0.4 py37h06a4308_0
six 1.16.0 pypi_0 pypi
smmap 5.0.0 pypi_0 pypi
sqlite 3.33.0 h62c20be_0
tk 8.6.11 h1ccaba5_0
tqdm 4.63.0 pypi_0 pypi
typing-extensions 4.1.1 pypi_0 pypi
wheel 0.37.1 pyhd3eb1b0_0
wrapt 1.13.3 pypi_0 pypi
xz 5.2.5 h7b6447c_0
zipp 3.7.0 pypi_0 pypi
zlib 1.2.11 h7f8727e_4
seml XXX add YYY --no-hash should do the same thing as seml XXX add YYY.
It doesn't once you have nested entries in your config; specifically, in these cases, it adds duplicate configs to the database.
The offending statement is (https://github.com/TUM-DAML/seml/blob/master/seml/add.py#L43-L45):
lookup_dict = {
f'config.{key}': value for key, value in config.items()
}
Once you have nested dictionaries here, MongoDB no longer correctly parses the query which uses the above dict as filter dict. This can be fixed by replacing those three lines with, for example:
lookup_dict = flatten({'config': config})
where flatten is imported from seml.utils.
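For illustration, the behaviour attributed to seml.utils.flatten can be sketched as follows (an assumption about its semantics based on MongoDB's dotted-key queries, not the actual implementation):

```python
def flatten(d, parent_key=""):
    """Flatten a nested dict into dotted keys, the form MongoDB filter
    queries expect. Sketch of the behaviour attributed to seml.utils.flatten."""
    items = {}
    for key, value in d.items():
        new_key = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict) and value:
            items.update(flatten(value, new_key))
        else:
            items[new_key] = value
    return items

print(flatten({'config': {'lr': 0.01, 'model': {'depth': 3}}}))
# → {'config.lr': 0.01, 'config.model.depth': 3}
```

Because every nested value becomes a single dotted key, MongoDB matches exact leaf values instead of requiring the whole nested subdocument to match byte-for-byte, which is why the duplicate check works again.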
Started Jupyter job in Slurm job with ID 12345.
The logfile of the job is /nfs/homedirs/zuegnerd/libraries/seml/slurm-6322311.out.
Trying to fetch the machine and port of the Jupyter instance once the job is running... (ctrl-C to cancel).
Jupyter instance is starting up...
Startup completed. The Jupyter instance is running at 'gpuxx.kdd.in.tum.de:8889'.
To stop the job, run 'scancel 12345'.
(base) username@fs:~$ seml jupyter
Traceback (most recent call last):
File "/nfs/homedirs/username/miniconda3/bin/seml", line 8, in <module>
sys.exit(main())
File "/nfs/homedirs/username/miniconda3/lib/python3.8/site-packages/seml/main.py", line 223, in main
f(**args.__dict__)
File "/nfs/homedirs/username/miniconda3/lib/python3.8/site-packages/seml/start.py", line 820, in start_jupyter_job
output = subprocess.run(f'sbatch {path}', shell=True, check=True, capture_output=True).stdout
File "/nfs/homedirs/username/miniconda3/lib/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'sbatch /tmp/414481.sh' returned non-zero exit status 1.
(base) username@fs:~$ ls -al /tmp/ | grep 'username'
-rw-r--r-- 1 username user 791 Apr 13 16:52 169206.sh
-rw-r--r-- 1 username user 791 Apr 13 16:50 27284.sh
-rw-r--r-- 1 username user 791 Apr 13 17:02 414481.sh
-rw-r--r-- 1 username user 791 Apr 13 16:52 642607.sh
-rw-r--r-- 1 username user 791 Apr 13 16:50 75671.sh
-rw-r--r-- 1 username user 791 Apr 13 16:59 856739.sh
drwx------ 2 username user 4096 Apr 13 07:42 tracker-extract-files.31921
(base) username@fs:~$ sbatch /tmp/414481.sh
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
Traceback (most recent call last):
File "/nfs/homedirs/username/miniconda3/bin/seml", line 8, in <module>
sys.exit(main())
File "/nfs/homedirs/username/miniconda3/lib/python3.8/site-packages/seml/main.py", line 223, in main
f(**args.__dict__)
File "/nfs/homedirs/username/miniconda3/lib/python3.8/site-packages/seml/start.py", line 820, in start_jupyter_job
output = subprocess.run(f'sbatch {path}', shell=True, check=True, capture_output=True).stdout
File "/nfs/homedirs/username/miniconda3/lib/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'sbatch /tmp/414481.sh' returned non-zero exit status 1.
conda list output:
# packages in environment at <condapath>/miniconda3:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
argon2-cffi 20.1.0 pypi_0 pypi
async-generator 1.10 pypi_0 pypi
attrs 20.3.0 pypi_0 pypi
backcall 0.2.0 pypi_0 pypi
bleach 3.3.0 pypi_0 pypi
brotlipy 0.7.0 py38h27cfd23_1003
ca-certificates 2021.1.19 h06a4308_1
certifi 2020.12.5 py38h06a4308_0
cffi 1.14.3 py38h261ae71_2
chardet 3.0.4 py38h06a4308_1003
colorama 0.4.4 pypi_0 pypi
conda 4.10.0 py38h06a4308_0
conda-package-handling 1.7.2 py38h03888b9_0
cryptography 3.2.1 py38h3c74f83_1
debugpy 1.2.1 pypi_0 pypi
decorator 5.0.6 pypi_0 pypi
defusedxml 0.7.1 pypi_0 pypi
docopt 0.6.2 pypi_0 pypi
entrypoints 0.3 pypi_0 pypi
gitdb 4.0.7 pypi_0 pypi
gitpython 3.1.14 pypi_0 pypi
idna 2.10 py_0
ipykernel 5.5.3 pypi_0 pypi
ipython 7.22.0 pypi_0 pypi
ipython-genutils 0.2.0 pypi_0 pypi
ipywidgets 7.6.3 pypi_0 pypi
jedi 0.18.0 pypi_0 pypi
jinja2 2.11.3 pypi_0 pypi
jsonpickle 1.5.2 pypi_0 pypi
jsonschema 3.2.0 pypi_0 pypi
jupyter 1.0.0 pypi_0 pypi
jupyter-client 6.1.12 pypi_0 pypi
jupyter-console 6.4.0 pypi_0 pypi
jupyter-core 4.7.1 pypi_0 pypi
jupyterlab-pygments 0.1.2 pypi_0 pypi
jupyterlab-widgets 1.0.0 pypi_0 pypi
ld_impl_linux-64 2.33.1 h53a641e_7
libedit 3.1.20191231 h14c3975_1
libffi 3.3 he6710b0_2
libgcc-ng 9.1.0 hdf63c60_0
libstdcxx-ng 9.1.0 hdf63c60_0
markupsafe 1.1.1 pypi_0 pypi
mistune 0.8.4 pypi_0 pypi
munch 2.5.0 pypi_0 pypi
nbclient 0.5.3 pypi_0 pypi
nbconvert 6.0.7 pypi_0 pypi
nbformat 5.1.3 pypi_0 pypi
ncurses 6.2 he6710b0_1
nest-asyncio 1.5.1 pypi_0 pypi
notebook 6.3.0 pypi_0 pypi
numpy 1.20.2 pypi_0 pypi
openssl 1.1.1k h27cfd23_0
packaging 20.9 pypi_0 pypi
pandas 1.2.4 pypi_0 pypi
pandocfilters 1.4.3 pypi_0 pypi
parso 0.8.2 pypi_0 pypi
pexpect 4.8.0 pypi_0 pypi
pickleshare 0.7.5 pypi_0 pypi
pip 20.2.4 py38h06a4308_0
prometheus-client 0.10.1 pypi_0 pypi
prompt-toolkit 3.0.18 pypi_0 pypi
ptyprocess 0.7.0 pypi_0 pypi
py-cpuinfo 7.0.0 pypi_0 pypi
pycosat 0.6.3 py38h7b6447c_1
pycparser 2.20 py_2
pygments 2.8.1 pypi_0 pypi
pymongo 3.11.3 pypi_0 pypi
pyopenssl 19.1.0 pyhd3eb1b0_1
pyparsing 2.4.7 pypi_0 pypi
pyrsistent 0.17.3 pypi_0 pypi
pysocks 1.7.1 py38h06a4308_0
python 3.8.5 h7579374_1
python-dateutil 2.8.1 pypi_0 pypi
pytz 2021.1 pypi_0 pypi
pyyaml 5.4.1 pypi_0 pypi
pyzmq 22.0.3 pypi_0 pypi
qtconsole 5.0.3 pypi_0 pypi
qtpy 1.9.0 pypi_0 pypi
readline 8.0 h7b6447c_0
requests 2.24.0 py_0
ruamel_yaml 0.15.87 py38h7b6447c_1
sacred 0.8.2 pypi_0 pypi
seml 0.3.0 pypi_0 pypi
send2trash 1.5.0 pypi_0 pypi
setuptools 50.3.1 py38h06a4308_1
six 1.15.0 py38h06a4308_0
smmap 4.0.0 pypi_0 pypi
sqlite 3.33.0 h62c20be_0
terminado 0.9.4 pypi_0 pypi
testpath 0.4.4 pypi_0 pypi
tk 8.6.10 hbc83047_0
tornado 6.1 pypi_0 pypi
tqdm 4.51.0 pyhd3eb1b0_0
traitlets 5.0.5 pypi_0 pypi
urllib3 1.25.11 py_0
wcwidth 0.2.5 pypi_0 pypi
webencodings 0.5.1 pypi_0 pypi
wheel 0.35.1 pyhd3eb1b0_0
widgetsnbextension 3.5.1 pypi_0 pypi
wrapt 1.12.1 pypi_0 pypi
xz 5.2.5 h7b6447c_0
yaml 0.2.5 h7b6447c_0
zlib 1.2.11 h7b6447c_3
(base) username@fs:~$ sbatch /tmp/414481.sh
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
Hello, to debug the above error, I tried to find the file under /tmp/, and after confirming that it still exists there, I ran the above command manually.
Any suggestions? Should I edit something in start.py or subprocess.py?
% cat experiment_log.out
Starting job 376
SLURM assigned me the node(s): full node names
...
SLURM assigned me the node(s): node name truncated to 20 char
Looks like squeue -O defaults to 20 characters of output unless "type[:[.]size]" is specified. I'd suggest a larger size for the nodelist parameter; e.g., cloud node names are frequently quite long because they include the shape.
# Print job information
echo "Starting job ${{SLURM_JOBID}}"
echo "SLURM assigned me the node(s): $(squeue -j ${{SLURM_JOBID}} -O nodelist | tail -n +2)"
https://slurm.schedmd.com/squeue.html
-O <output_format>, --Format=<output_format>
Specify the information to be displayed. Also see the -o <output_format>, --format=<output_format> option described below (which supports greater flexibility in formatting, but does not support access to all fields because we ran out of letters). Requests a comma separated list of job information to be displayed.
The format of each field is "type[:[.]size]":
size — the minimum field size. If no size is specified, 20 characters will be allocated to print the information.
. — indicates the output should be right justified and size must be specified. By default, output is left justified.
Calling `seml` without any arguments should print a help dialog with the expected arguments. Instead, running `seml` in the shell raises an AttributeError:
Traceback (most recent call last):
File "/home/icb/simon.boehm/miniconda3/envs/cpa_graphs/bin/seml", line 8, in <module>
sys.exit(main())
File "/home/icb/simon.boehm/miniconda3/envs/cpa_graphs/lib/python3.7/site-packages/seml/main.py", line 220, in main
if args.func in [mongodb_credentials_prompt, start_jupyter_job]:
AttributeError: 'Namespace' object has no attribute 'func'
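The crash happens because argparse only sets `args.func` when a subcommand is chosen. A hedged sketch of one way to handle the no-arguments case (the structure of seml's real `main` may differ):

```python
import argparse


def main(argv=None):
    parser = argparse.ArgumentParser(prog="seml")
    subparsers = parser.add_subparsers(dest="command")
    # Subcommands registered here would attach a handler via
    # set_defaults(func=...); omitted in this sketch.
    args = parser.parse_args(argv)
    if not hasattr(args, "func"):
        # No subcommand given: print help instead of crashing
        # with AttributeError on args.func.
        parser.print_help()
        return 1
    return args.func(args)
```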
conda list:
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_llvm conda-forge
adjusttext 0.7.3.1 py_1 conda-forge
alabaster 0.7.12 py_0 conda-forge
alsa-lib 1.2.3 h516909a_0 conda-forge
anndata 0.7.6 py37h89c1867_0 conda-forge
argcomplete 1.12.3 pyhd8ed1ab_2 conda-forge
argon2-cffi 20.1.0 py37h5e8e339_2 conda-forge
async_generator 1.10 py_0 conda-forge
attrs 21.2.0 pyhd8ed1ab_0 conda-forge
babel 2.9.1 pyh44b312d_0 conda-forge
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge
blas 2.111 mkl conda-forge
blas-devel 3.9.0 11_linux64_mkl conda-forge
bleach 4.1.0 pyhd8ed1ab_0 conda-forge
blosc 1.21.0 h9c3ff4c_0 conda-forge
boost 1.74.0 py37h6dcda5c_3 conda-forge
boost-cpp 1.74.0 h312852a_4 conda-forge
brotlipy 0.7.0 py37h5e8e339_1001 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.17.2 h7f98852_0 conda-forge
ca-certificates 2021.10.8 ha878542_0 conda-forge
cached-property 1.5.2 hd8ed1ab_1 conda-forge
cached_property 1.5.2 pyha770c72_1 conda-forge
cairo 1.16.0 h6cf1ce9_1008 conda-forge
certifi 2021.10.8 py37h89c1867_0 conda-forge
cffi 1.14.6 py37h036bc23_1 conda-forge
chardet 4.0.0 py37h89c1867_1 conda-forge
charset-normalizer 2.0.0 pyhd8ed1ab_0 conda-forge
cloudpickle 2.0.0 pyhd8ed1ab_0 conda-forge
colorama 0.4.4 pyh9f0ad1d_0 conda-forge
cryptography 3.4.7 py37h5d9358c_0 conda-forge
cudatoolkit 10.2.89 h8f6ccaa_9 conda-forge
curl 7.79.1 h2574ce0_1 conda-forge
cycler 0.10.0 py_2 conda-forge
dbus 1.13.6 h48d8840_2 conda-forge
debugpy 1.4.1 py37hcd2ae1e_0 conda-forge
decorator 4.4.2 py_0 conda-forge
defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge
dgl 0.7.1 py37_0 dglteam
dgllife 0.2.6 py37_0 dglteam
docopt 0.6.2 pypi_0 pypi
docutils 0.17.1 py37h89c1867_0 conda-forge
dunamai 1.6.0 pyhd8ed1ab_0 conda-forge
entrypoints 0.3 py37hc8dfbb8_1002 conda-forge
expat 2.4.1 h9c3ff4c_0 conda-forge
fontconfig 2.13.1 hba837de_1005 conda-forge
freetype 2.10.4 h0708190_1 conda-forge
get_version 3.5.3 pyhd8ed1ab_0 conda-forge
gettext 0.19.8.1 h73d1719_1008 conda-forge
git 2.33.0 pl5321hc30692c_2 conda-forge
gitdb 4.0.7 pypi_0 pypi
gitpython 3.1.24 pypi_0 pypi
glib 2.68.4 h9c3ff4c_1 conda-forge
glib-tools 2.68.4 h9c3ff4c_1 conda-forge
greenlet 1.1.2 py37hcd2ae1e_0 conda-forge
gst-plugins-base 1.18.5 hf529b03_0 conda-forge
gstreamer 1.18.5 h76c114f_0 conda-forge
h5py 3.4.0 nompi_py37hd308b1e_101 conda-forge
hdf5 1.12.1 nompi_h2750804_101 conda-forge
icu 68.1 h58526e2_0 conda-forge
idna 3.1 pyhd3deb0d_0 conda-forge
imagesize 1.2.0 py_0 conda-forge
importlib-metadata 4.8.1 py37h89c1867_0 conda-forge
importlib_metadata 4.8.1 hd8ed1ab_0 conda-forge
intel-openmp 2021.3.0 h06a4308_3350
ipykernel 6.4.1 py37h6531663_0 conda-forge
ipython 7.28.0 py37h6531663_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
ipywidgets 7.6.5 pyhd8ed1ab_0 conda-forge
jbig 2.1 h7f98852_2003 conda-forge
jedi 0.18.0 py37h89c1867_2 conda-forge
jinja2 3.0.2 pyhd8ed1ab_0 conda-forge
joblib 1.1.0 pyhd8ed1ab_0 conda-forge
jpeg 9d h36c2ea0_0 conda-forge
jsonpickle 1.5.2 pypi_0 pypi
jsonschema 4.1.0 pyhd8ed1ab_0 conda-forge
jupyter 1.0.0 py37h89c1867_6 conda-forge
jupyter_client 6.1.12 pyhd8ed1ab_0 conda-forge
jupyter_console 6.4.0 pyhd8ed1ab_1 conda-forge
jupyter_core 4.8.1 py37h89c1867_0 conda-forge
jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge
jupyterlab_widgets 1.0.2 pyhd8ed1ab_0 conda-forge
jupytext 1.13.0 pyh6002c4b_0 conda-forge
kiwisolver 1.3.2 py37h2527ec5_0 conda-forge
krb5 1.19.2 hcc1bbae_2 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge
legacy-api-wrap 1.2 py_0 conda-forge
lerc 2.2.1 h9c3ff4c_0 conda-forge
libblas 3.9.0 11_linux64_mkl conda-forge
libcblas 3.9.0 11_linux64_mkl conda-forge
libclang 11.1.0 default_ha53f305_1 conda-forge
libcurl 7.79.1 h2574ce0_1 conda-forge
libdeflate 1.7 h7f98852_5 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libevent 2.1.10 h9b69904_4 conda-forge
libffi 3.4.2 h9c3ff4c_4 conda-forge
libgcc-ng 11.2.0 h1d223b6_10 conda-forge
libgfortran-ng 11.2.0 h69a702a_10 conda-forge
libgfortran5 11.2.0 h5c6108e_10 conda-forge
libglib 2.68.4 h174f98d_1 conda-forge
libiconv 1.16 h516909a_0 conda-forge
liblapack 3.9.0 11_linux64_mkl conda-forge
liblapacke 3.9.0 11_linux64_mkl conda-forge
libllvm10 10.0.1 he513fc3_3 conda-forge
libllvm11 11.1.0 hf817b99_2 conda-forge
libnghttp2 1.43.0 h812cca2_1 conda-forge
libogg 1.3.4 h7f98852_1 conda-forge
libopus 1.3.1 h7f98852_1 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libpq 13.3 hd57d9b9_1 conda-forge
libsodium 1.0.18 h36c2ea0_1 conda-forge
libssh2 1.10.0 ha56f1ee_2 conda-forge
libstdcxx-ng 11.2.0 he4da1e4_10 conda-forge
libtiff 4.3.0 hf544144_1 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libuv 1.42.0 h7f98852_0 conda-forge
libvorbis 1.3.7 h9c3ff4c_0 conda-forge
libwebp-base 1.2.1 h7f98852_0 conda-forge
libxcb 1.13 h7f98852_1003 conda-forge
libxkbcommon 1.0.3 he3ba5ed_0 conda-forge
libxml2 2.9.12 h72842e0_0 conda-forge
libzlib 1.2.11 h36c2ea0_1013 conda-forge
llvm-openmp 12.0.1 h4bd325d_1 conda-forge
llvmlite 0.36.0 py37h9d7f4d0_0 conda-forge
lz4-c 1.9.3 h9c3ff4c_1 conda-forge
lzo 2.10 h516909a_1000 conda-forge
markdown-it-py 1.1.0 pyhd8ed1ab_0 conda-forge
markupsafe 2.0.1 py37h5e8e339_0 conda-forge
matplotlib 3.4.3 py37h89c1867_1 conda-forge
matplotlib-base 3.4.3 py37h1058ff1_1 conda-forge
matplotlib-inline 0.1.3 pyhd8ed1ab_0 conda-forge
mdit-py-plugins 0.2.8 pyhd8ed1ab_0 conda-forge
mistune 0.8.4 py37h5e8e339_1004 conda-forge
mkl 2021.3.0 h06a4308_520
mkl-devel 2021.3.0 h66538d2_520
mkl-include 2021.3.0 h06a4308_520
mock 4.0.3 py37h89c1867_1 conda-forge
munch 2.5.0 pypi_0 pypi
mysql-common 8.0.25 ha770c72_2 conda-forge
mysql-libs 8.0.25 hfa10184_2 conda-forge
natsort 7.1.1 pyhd8ed1ab_0 conda-forge
nbclient 0.5.4 pyhd8ed1ab_0 conda-forge
nbconvert 6.2.0 py37h89c1867_0 conda-forge
nbformat 5.1.3 pyhd8ed1ab_0 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
nest-asyncio 1.5.1 pyhd8ed1ab_0 conda-forge
networkx 2.6.3 pyhd8ed1ab_0 conda-forge
ninja 1.10.2 h4bd325d_1 conda-forge
notebook 6.4.4 pyha770c72_0 conda-forge
nspr 4.30 h9c3ff4c_0 conda-forge
nss 3.69 hb5efdd6_1 conda-forge
numba 0.53.1 py37hb11d6e1_1 conda-forge
numexpr 2.7.3 py37he8f5f7f_0 conda-forge
numpy 1.21.2 py37h31617e3_0 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openjpeg 2.4.0 hb52868f_1 conda-forge
openssl 1.1.1l h7f98852_0 conda-forge
packaging 21.0 pyhd8ed1ab_0 conda-forge
pandas 1.3.3 py37he8f5f7f_0 conda-forge
pandoc 2.14.2 h7f98852_0 conda-forge
pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge
parso 0.8.2 pyhd8ed1ab_0 conda-forge
patsy 0.5.2 pyhd8ed1ab_0 conda-forge
pcre 8.45 h9c3ff4c_0 conda-forge
pcre2 10.37 h032f7d1_0 conda-forge
perl 5.32.1 1_h7f98852_perl5 conda-forge
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py37hc8dfbb8_1002 conda-forge
pillow 8.3.2 py37h0f21c89_0 conda-forge
pip 21.3 pyhd8ed1ab_0 conda-forge
pixman 0.40.0 h36c2ea0_0 conda-forge
prometheus_client 0.11.0 pyhd8ed1ab_0 conda-forge
prompt-toolkit 3.0.20 pyha770c72_0 conda-forge
prompt_toolkit 3.0.20 hd8ed1ab_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
py-cpuinfo 8.0.0 pypi_0 pypi
pycairo 1.20.1 py37hfff247e_0 conda-forge
pycparser 2.20 pyh9f0ad1d_2 conda-forge
pygments 2.10.0 pyhd8ed1ab_0 conda-forge
pymongo 3.12.0 pypi_0 pypi
pynndescent 0.5.4 pyh6c4a22f_0 conda-forge
pyopenssl 21.0.0 pyhd8ed1ab_0 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyqt 5.12.3 py37h89c1867_7 conda-forge
pyqt-impl 5.12.3 py37he336c9b_7 conda-forge
pyqt5-sip 4.19.18 py37hcd2ae1e_7 conda-forge
pyqtchart 5.12 py37he336c9b_7 conda-forge
pyqtwebengine 5.12.1 py37he336c9b_7 conda-forge
pyrsistent 0.17.3 py37h5e8e339_2 conda-forge
pysocks 1.7.1 py37h89c1867_3 conda-forge
pytables 3.6.1 py37h5dea08b_4 conda-forge
python 3.7.10 hb7a2778_102_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python_abi 3.7 2_cp37m conda-forge
pytorch 1.9.1 py3.7_cuda10.2_cudnn7.6.5_0 pytorch
pytz 2021.3 pyhd8ed1ab_0 conda-forge
pyyaml 5.4.1 py37h5e8e339_1 conda-forge
pyzmq 22.3.0 py37h336d617_0 conda-forge
qt 5.12.9 hda022c4_4 conda-forge
qtconsole 5.1.1 pyhd8ed1ab_0 conda-forge
qtpy 1.11.2 pyhd8ed1ab_0 conda-forge
rdkit 2021.03.5 py37h13c2175_0 conda-forge
readline 8.1 h46c0cb4_0 conda-forge
reportlab 3.5.68 py37h69800bb_0 conda-forge
requests 2.26.0 pyhd8ed1ab_0 conda-forge
sacred 0.8.2 pypi_0 pypi
scanpy 1.8.1 pyhd8ed1ab_0 conda-forge
scikit-learn 1.0 py37hf0f1638_1 conda-forge
scipy 1.7.1 py37hf2a6cf1_0 conda-forge
seaborn 0.11.2 hd8ed1ab_0 conda-forge
seaborn-base 0.11.2 pyhd8ed1ab_0 conda-forge
seml 0.3.4 pypi_0 pypi
send2trash 1.8.0 pyhd8ed1ab_0 conda-forge
setuptools 58.2.0 py37h89c1867_0 conda-forge
sinfo 0.3.1 py_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
smmap 4.0.0 pypi_0 pypi
snowballstemmer 2.1.0 pyhd8ed1ab_0 conda-forge
sphinx 4.2.0 pyh6c4a22f_0 conda-forge
sphinxcontrib-applehelp 1.0.2 py_0 conda-forge
sphinxcontrib-devhelp 1.0.2 py_0 conda-forge
sphinxcontrib-htmlhelp 2.0.0 pyhd8ed1ab_0 conda-forge
sphinxcontrib-jsmath 1.0.1 py_0 conda-forge
sphinxcontrib-qthelp 1.0.3 py_0 conda-forge
sphinxcontrib-serializinghtml 1.1.5 pyhd8ed1ab_0 conda-forge
sqlalchemy 1.4.25 py37h5e8e339_0 conda-forge
sqlite 3.36.0 h9cd32fc_2 conda-forge
statsmodels 0.13.0 py37hb1e94ed_0 conda-forge
stdlib-list 0.7.0 py_2 conda-forge
submitit 1.2.1 pyh44b312d_0 conda-forge
tbb 2020.2 h4bd325d_4 conda-forge
terminado 0.12.1 py37h89c1867_0 conda-forge
testpath 0.5.0 pyhd8ed1ab_0 conda-forge
threadpoolctl 3.0.0 pyh8a188c0_0 conda-forge
tk 8.6.11 h27826a3_1 conda-forge
toml 0.10.2 pyhd8ed1ab_0 conda-forge
tornado 6.1 py37h5e8e339_1 conda-forge
tqdm 4.62.3 pyhd8ed1ab_0 conda-forge
traitlets 5.1.0 pyhd8ed1ab_0 conda-forge
typing_extensions 3.10.0.2 pyha770c72_0 conda-forge
tzdata 2021c he74cb21_0 conda-forge
umap-learn 0.5.1 py37h89c1867_1 conda-forge
urllib3 1.26.7 pyhd8ed1ab_0 conda-forge
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
webencodings 0.5.1 py_1 conda-forge
wheel 0.37.0 pyhd8ed1ab_1 conda-forge
widgetsnbextension 3.5.1 py37h89c1867_4 conda-forge
wrapt 1.13.1 pypi_0 pypi
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.0.10 h7f98852_0 conda-forge
xorg-libsm 1.2.3 hd9c2040_1000 conda-forge
xorg-libx11 1.7.2 h7f98852_0 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h7f98852_1 conda-forge
xorg-libxrender 0.9.10 h7f98852_1003 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h7f98852_1002 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xz 5.2.5 h516909a_1 conda-forge
yaml 0.2.5 h516909a_0 conda-forge
zeromq 4.3.4 h9c3ff4c_1 conda-forge
zipp 3.6.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.11 h36c2ea0_1013 conda-forge
zstd 1.5.0 ha95c52a_0 conda-forge
Since `seml [...] start` requires a config file, I would expect that only the queued experiments from this configuration file are started.
Instead, all (?!) queued experiments are started:
seml seml/configs/train_1.yaml queue
2020-04-30 08:57:20 (INFO): Queueing 6 configs into the database (batch-ID 4).
seml seml/configs/train_2.yaml queue
2020-04-30 08:57:40 (INFO): Queueing 3 configs into the database (batch-ID 5).
seml start
ERROR: Please provide a path to the config file.
seml seml/configs/train_1.yaml start
Starting 9 experiments in 9 Slurm jobs in 2 Slurm job arrays.
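The expected filtering could look like this minimal sketch, which selects only the queued experiments belonging to the batches created from the given config file (function and field names are hypothetical, not seml's actual API):

```python
def select_queued(experiments, batch_ids):
    """Return only QUEUED experiments that belong to the given batches.

    `experiments` is a list of dicts as stored in the MongoDB collection;
    `batch_ids` are the batch IDs produced when queueing a config file.
    """
    return [
        e for e in experiments
        if e["status"] == "QUEUED" and e["batch_id"] in batch_ids
    ]


# Mirroring the example above: batch 4 has 6 configs, batch 5 has 3.
exps = [{"status": "QUEUED", "batch_id": 4} for _ in range(6)]
exps += [{"status": "QUEUED", "batch_id": 5} for _ in range(3)]
print(len(select_queued(exps, {4})))  # only train_1.yaml's experiments
```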
The status command gives the status of the running/completed etc. experiments, but it prints a warning for each PENDING experiment and kills them. If the experiments are still QUEUED or COMPLETED, the status command behaves as expected.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_0.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_1.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_2.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_3.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_4.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_5.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_6.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_7.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_8.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_9.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_10.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_11.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_12.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_13.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_14.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_15.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_16.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_17.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_18.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_19.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_20.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_21.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_22.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_23.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_24.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_25.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_26.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_27.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_28.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_29.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_30.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_31.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_32.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_33.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_34.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_35.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_36.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_37.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_38.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_39.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_40.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_41.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_42.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_43.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_44.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_45.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_46.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_47.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_48.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_49.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_50.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_51.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_52.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_53.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_54.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_55.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_56.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_57.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_58.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_59.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_60.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_61.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_62.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_63.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_64.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_65.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_66.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_67.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_68.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_69.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_70.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_71.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_72.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_73.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_74.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_75.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_76.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_77.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_78.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_79.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_80.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_81.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_82.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_83.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_84.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_85.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_86.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_87.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_88.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_89.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_90.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_91.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_92.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_93.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_94.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_95.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_96.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_97.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_98.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_99.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_100.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_101.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_102.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_103.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_104.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_105.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_106.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_107.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_108.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_109.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_110.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_111.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_112.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_113.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_114.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_115.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_116.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_117.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_118.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_119.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_120.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_121.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_122.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_123.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_124.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_125.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_126.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_127.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_128.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_129.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_130.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_131.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_132.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_133.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_134.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_135.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_136.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_137.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_138.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_139.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_140.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_141.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_142.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_143.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_144.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_145.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_146.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_147.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_148.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_149.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_150.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_151.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_152.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_153.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_154.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_155.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_156.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_157.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_158.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_159.out could not be read.
WARNING: File /nfs/homedirs/sirin/seml/examples/slurm/example_experiment_5217208_160.out could not be read.
(the same warning repeats for example_experiment_5217208_161.out through example_experiment_5217208_287.out)
********** Report for database collection 'seml_example' **********
* - 0 queued experiments
* - 0 pending experiments
* - 0 running experiments
* - 0 completed experiments
* - 0 interrupted experiments
* - 0 failed experiments
* - 288 killed experiments
*******************************************************************
Anaconda environment (`conda list`):
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_llvm conda-forge
_tflow_select 2.1.0 gpu
absl-py 0.9.0 py37hc8dfbb8_1 conda-forge
astor 0.7.1 py_0 conda-forge
attrs 19.3.0 pypi_0 pypi
backcall 0.1.0 pypi_0 pypi
bleach 3.1.4 pypi_0 pypi
blinker 1.4 py_1 conda-forge
brotlipy 0.7.0 py37h8f50634_1000 conda-forge
c-ares 1.15.0 h516909a_1001 conda-forge
ca-certificates 2020.4.5.1 hecc5488_0 conda-forge
cachetools 3.1.1 py_0 conda-forge
certifi 2020.4.5.1 py37hc8dfbb8_0 conda-forge
cffi 1.14.0 py37hd463f26_0 conda-forge
chardet 3.0.4 py37hc8dfbb8_1006 conda-forge
click 7.1.2 pyh9f0ad1d_0 conda-forge
colorama 0.4.3 pypi_0 pypi
cryptography 2.9.2 py37hb09aad4_0 conda-forge
cudatoolkit 10.1.243 h6bb024c_0
cudnn 7.6.5 cuda10.1_0
cupti 10.1.168 0
cycler 0.10.0 pypi_0 pypi
decorator 4.4.2 pypi_0 pypi
defusedxml 0.6.0 pypi_0 pypi
entrypoints 0.3 pypi_0 pypi
gast 0.2.2 py_0 conda-forge
gin-config 0.3.0 pypi_0 pypi
gitdb 4.0.4 pypi_0 pypi
gitpython 3.1.1 pypi_0 pypi
google-auth 1.14.1 pyh9f0ad1d_0 conda-forge
google-auth-oauthlib 0.4.1 py_2 conda-forge
google-pasta 0.2.0 pyh8c360ce_0 conda-forge
googledrivedownloader 0.4 pypi_0 pypi
grpcio 1.27.2 py37hf8bcb03_0
h5py 2.10.0 nompi_py37h513d04c_102 conda-forge
hdf5 1.10.5 nompi_h3c11f04_1104 conda-forge
idna 2.9 py_1 conda-forge
imageio 2.8.0 pypi_0 pypi
importlib-metadata 1.6.0 pypi_0 pypi
ipykernel 5.2.1 pypi_0 pypi
ipython 7.13.0 pypi_0 pypi
ipython-genutils 0.2.0 pypi_0 pypi
ipywidgets 7.5.1 pypi_0 pypi
isodate 0.6.0 pypi_0 pypi
jax 0.1.64 pypi_0 pypi
jaxlib 0.1.45 pypi_0 pypi
jedi 0.17.0 pypi_0 pypi
jinja2 2.11.2 pypi_0 pypi
joblib 0.14.1 pypi_0 pypi
jsonpickle 1.4.1 pypi_0 pypi
jsonschema 3.2.0 pypi_0 pypi
jupyter 1.0.0 pypi_0 pypi
jupyter-client 6.1.3 pypi_0 pypi
jupyter-console 6.1.0 pypi_0 pypi
jupyter-core 4.6.3 pypi_0 pypi
keras-applications 1.0.8 py_1 conda-forge
keras-preprocessing 1.1.0 py_0 conda-forge
kiwisolver 1.2.0 pypi_0 pypi
ld_impl_linux-64 2.34 h53a641e_0 conda-forge
libblas 3.8.0 16_openblas conda-forge
libcblas 3.8.0 16_openblas conda-forge
libffi 3.2.1 he1b5a44_1007 conda-forge
libgcc-ng 9.2.0 h24d8f2e_2 conda-forge
libgfortran-ng 7.3.0 hdf63c60_5 conda-forge
liblapack 3.8.0 16_openblas conda-forge
libopenblas 0.3.9 h5ec1e0e_0 conda-forge
libprotobuf 3.11.4 h8b12597_0 conda-forge
libstdcxx-ng 9.2.0 hdf63c60_2 conda-forge
llvm-openmp 10.0.0 hc9558a2_0 conda-forge
llvmlite 0.32.0 pypi_0 pypi
markdown 3.2.1 py_0 conda-forge
markupsafe 1.1.1 pypi_0 pypi
matplotlib 3.2.1 pypi_0 pypi
mistune 0.8.4 pypi_0 pypi
munch 2.5.1.dev12 pypi_0 pypi
nbconvert 5.6.1 pypi_0 pypi
nbformat 5.0.6 pypi_0 pypi
ncurses 6.1 hf484d3e_1002 conda-forge
networkx 2.4 pypi_0 pypi
notebook 6.0.3 pypi_0 pypi
numba 0.49.0 pypi_0 pypi
numpy 1.18.1 py37h8960a57_1 conda-forge
oauthlib 3.0.1 py_0 conda-forge
openssl 1.1.1g h516909a_0 conda-forge
opt-einsum 3.2.1 pypi_0 pypi
packaging 20.3 pypi_0 pypi
pandas 1.0.3 pypi_0 pypi
pandocfilters 1.4.2 pypi_0 pypi
parso 0.7.0 pypi_0 pypi
pexpect 4.8.0 pypi_0 pypi
pickleshare 0.7.5 pypi_0 pypi
pillow 7.1.2 pypi_0 pypi
pip 20.0.2 py_2 conda-forge
plyfile 0.7.2 pypi_0 pypi
prometheus-client 0.7.1 pypi_0 pypi
prompt-toolkit 3.0.5 pypi_0 pypi
protobuf 3.11.4 py37h3340039_1 conda-forge
ptyprocess 0.6.0 pypi_0 pypi
py-cpuinfo 5.0.0 pypi_0 pypi
pyasn1 0.4.8 py_0 conda-forge
pyasn1-modules 0.2.7 py_0 conda-forge
pycparser 2.20 py_0 conda-forge
pygments 2.6.1 pypi_0 pypi
pyjwt 1.7.1 py_0 conda-forge
pymongo 3.10.1 pypi_0 pypi
pyopenssl 19.1.0 py_1 conda-forge
pyparsing 3.0.0a1 pypi_0 pypi
pyrsistent 0.16.0 pypi_0 pypi
pysocks 1.7.1 py37hc8dfbb8_1 conda-forge
python 3.7.6 h8356626_5_cpython conda-forge
python-dateutil 2.8.1 pypi_0 pypi
python_abi 3.7 1_cp37m conda-forge
pytz 2020.1 pypi_0 pypi
pywavelets 1.1.1 pypi_0 pypi
pyzmq 19.0.0 pypi_0 pypi
qtconsole 4.7.3 pypi_0 pypi
qtpy 1.9.0 pypi_0 pypi
rdflib 5.0.0 pypi_0 pypi
readline 8.0 hf8c457e_0 conda-forge
requests 2.23.0 pyh8c360ce_2 conda-forge
requests-oauthlib 1.2.0 py_0 conda-forge
rsa 4.0 py_0 conda-forge
sacred 0.8.1 pypi_0 pypi
scikit-image 0.16.2 pypi_0 pypi
scikit-learn 0.22.2.post1 pypi_0 pypi
scipy 1.4.1 py37ha3d9a3c_3 conda-forge
seaborn 0.10.1 pypi_0 pypi
seml 0.2.2 dev_0
send2trash 1.5.0 pypi_0 pypi
setuptools 46.1.3 py37hc8dfbb8_0 conda-forge
six 1.14.0 py_1 conda-forge
smmap 3.0.2 pypi_0 pypi
sqlite 3.30.1 hcee41ef_0 conda-forge
tensorboard 2.1.1 py_1 conda-forge
tensorflow 2.1.0 gpu_py37h7a4bb67_0
tensorflow-base 2.1.0 gpu_py37h6c5654b_0
tensorflow-estimator 2.1.0 pyhd54b08b_0
tensorflow-gpu 2.1.0 h0d30ee6_0
termcolor 1.1.0 py_2 conda-forge
terminado 0.8.3 pypi_0 pypi
testpath 0.4.4 pypi_0 pypi
tk 8.6.10 hed695b0_0 conda-forge
torch 1.5.0+cpu pypi_0 pypi
torch-cluster 1.5.4 pypi_0 pypi
torch-geometric 1.4.3 pypi_0 pypi
torch-scatter 2.0.4 pypi_0 pypi
torch-sparse 0.6.2 pypi_0 pypi
torch-spline-conv 1.2.0 pypi_0 pypi
torchvision 0.6.0+cpu pypi_0 pypi
tornado 6.0.4 pypi_0 pypi
traitlets 4.3.3 pypi_0 pypi
urllib3 1.25.9 py_0 conda-forge
wcwidth 0.1.9 pypi_0 pypi
webencodings 0.5.1 pypi_0 pypi
werkzeug 1.0.1 pyh9f0ad1d_0 conda-forge
wheel 0.34.2 py_1 conda-forge
widgetsnbextension 3.5.1 pypi_0 pypi
wrapt 1.12.1 py37h8f50634_1 conda-forge
xz 5.2.5 h516909a_0 conda-forge
zipp 3.1.0 pypi_0 pypi
zlib 1.2.11 h516909a_1006 conda-forge
Due to running experiments on a cluster where any job gets killed automatically after 24h, I'm looking for a way to restage killed experiments in order to continue training, without having the info and captured_out fields etc. deleted from the mongodb. Is there any functionality in SEML at the moment that I can look into for this purpose? Any directions would be greatly appreciated.
Hello,
big fan and user of seml here (thanks @MxMstrmn for introducing it and setting up mongodb on the Helmholtz servers).
I'd like to access a unique hash/id for each job in a collection (like the config_hash stored in the mongodb), to use it as a filename for some results that I don't want to save in mongodb but rather write to a file in storage. How can I access it from within the experiment class?
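In the meantime, a stable hash can be derived client-side from the resolved config dict; this is a sketch (`config_hash` is a hypothetical helper, not seml's own implementation):

```python
import hashlib
import json

def config_hash(config: dict) -> str:
    """Stable short hash of a config dict, usable as a filename stem."""
    # sort_keys makes the hash independent of dict insertion order
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.md5(blob).hexdigest()[:12]
```

Each experimental condition then gets its own log directory, e.g. `output_dir / config_hash(config)`.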
Thank you!
Giovanni
#79 follow-up.
At the moment, one must specify sbatch options in the yaml file. However, as discussed in #71, it may be beneficial to override these settings via the CLI. The suggested syntax would be:
seml <collection> add <yaml> -sb mem=25GB partition=gpu_all
The new syntax should also be adopted for the -sb option in seml jupyter. For parsing the key-value pairs we can reuse the parser introduced in #79.
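The key=value parsing could look roughly like this; a minimal sketch, where `parse_overrides` is a hypothetical name for illustration, not seml's actual parser from #79:

```python
def parse_overrides(pairs):
    """Parse CLI tokens like ['mem=25GB', 'partition=gpu_all'] into a dict."""
    out = {}
    for pair in pairs:
        # partition splits on the first '=' only, so values may contain '='
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        out[key] = value
    return out
```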
Expected: seml ... start doesn't crash.
Actual: it does crash.
A new pymongo version was released a day ago; an upper-bound version pin would be nice (or a fix).
Traceback (most recent call last):
File "/home/icb/marius.lange/.miniconda3/envs/nsode/bin/seml", line 8, in <module>
sys.exit(main())
File "/home/icb/marius.lange/.miniconda3/envs/nsode/lib/python3.8/site-packages/seml/main.py", line 231, in main
f(**args.__dict__)
File "/home/icb/marius.lange/.miniconda3/envs/nsode/lib/python3.8/site-packages/seml/start.py", line 768, in start_experiments
collection = get_collection(db_collection_name)
File "/home/icb/marius.lange/.miniconda3/envs/nsode/lib/python3.8/site-packages/seml/database.py", line 15, in get_collection
db = get_database(**mongodb_config)
File "/home/icb/marius.lange/.miniconda3/envs/nsode/lib/python3.8/site-packages/seml/database.py", line 24, in get_database
db.authenticate(name=username, password=password)
File "/home/icb/marius.lange/.miniconda3/envs/nsode/lib/python3.8/site-packages/pymongo/collection.py", line 2579, in __call__
raise TypeError("'Collection' object is not callable. If you "
TypeError: 'Collection' object is not callable. If you meant to call the 'authenticate' method on a 'Database' object it is failing because no such method exists.
Started Jupyter job in Slurm job with ID 12345.
The logfile of the job is /nfs/homedirs/zuegnerd/libraries/seml/slurm-6322311.out.
Trying to fetch the machine and port of the Jupyter instance once the job is running... (ctrl-C to cancel).
Jupyter instance is starting up...
Startup completed. The Jupyter instance is running at 'gpuxx.kdd.in.tum.de:8889'.
To stop the job, run 'scancel 12345'.
Traceback (most recent call last):
File "<condapath>/miniconda3/bin/seml", line 8, in <module>
sys.exit(main())
File "<condapath>/miniconda3/lib/python3.8/site-packages/seml/main.py", line 223, in main
f(**args.__dict__)
File "<condapath>/miniconda3/lib/python3.8/site-packages/seml/start.py", line 803, in start_jupyter_job
template = pkg_resources.resource_string(__name__, "jupyter_template.sh").decode("utf-8")
File "<condapath>/miniconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1140, in resource_string
return get_provider(package_or_requirement).get_resource_string(
File "<condapath>/miniconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1386, in get_resource_string
return self._get(self._fn(self.module_path, resource_name))
File "<condapath>/miniconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1609, in _get
with open(path, 'rb') as stream:
FileNotFoundError: [Errno 2] No such file or directory: '<condapath>/miniconda3/lib/python3.8/site-packages/seml/jupyter_template.sh'
Anaconda environment (`conda list`):
# packages in environment at <condapath>/miniconda3:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
argon2-cffi 20.1.0 pypi_0 pypi
async-generator 1.10 pypi_0 pypi
attrs 20.3.0 pypi_0 pypi
backcall 0.2.0 pypi_0 pypi
bleach 3.3.0 pypi_0 pypi
brotlipy 0.7.0 py38h27cfd23_1003
ca-certificates 2021.1.19 h06a4308_1
certifi 2020.12.5 py38h06a4308_0
cffi 1.14.3 py38h261ae71_2
chardet 3.0.4 py38h06a4308_1003
colorama 0.4.4 pypi_0 pypi
conda 4.10.0 py38h06a4308_0
conda-package-handling 1.7.2 py38h03888b9_0
cryptography 3.2.1 py38h3c74f83_1
debugpy 1.2.1 pypi_0 pypi
decorator 5.0.6 pypi_0 pypi
defusedxml 0.7.1 pypi_0 pypi
docopt 0.6.2 pypi_0 pypi
entrypoints 0.3 pypi_0 pypi
gitdb 4.0.7 pypi_0 pypi
gitpython 3.1.14 pypi_0 pypi
idna 2.10 py_0
ipykernel 5.5.3 pypi_0 pypi
ipython 7.22.0 pypi_0 pypi
ipython-genutils 0.2.0 pypi_0 pypi
ipywidgets 7.6.3 pypi_0 pypi
jedi 0.18.0 pypi_0 pypi
jinja2 2.11.3 pypi_0 pypi
jsonpickle 1.5.2 pypi_0 pypi
jsonschema 3.2.0 pypi_0 pypi
jupyter 1.0.0 pypi_0 pypi
jupyter-client 6.1.12 pypi_0 pypi
jupyter-console 6.4.0 pypi_0 pypi
jupyter-core 4.7.1 pypi_0 pypi
jupyterlab-pygments 0.1.2 pypi_0 pypi
jupyterlab-widgets 1.0.0 pypi_0 pypi
ld_impl_linux-64 2.33.1 h53a641e_7
libedit 3.1.20191231 h14c3975_1
libffi 3.3 he6710b0_2
libgcc-ng 9.1.0 hdf63c60_0
libstdcxx-ng 9.1.0 hdf63c60_0
markupsafe 1.1.1 pypi_0 pypi
mistune 0.8.4 pypi_0 pypi
munch 2.5.0 pypi_0 pypi
nbclient 0.5.3 pypi_0 pypi
nbconvert 6.0.7 pypi_0 pypi
nbformat 5.1.3 pypi_0 pypi
ncurses 6.2 he6710b0_1
nest-asyncio 1.5.1 pypi_0 pypi
notebook 6.3.0 pypi_0 pypi
numpy 1.20.2 pypi_0 pypi
openssl 1.1.1k h27cfd23_0
packaging 20.9 pypi_0 pypi
pandas 1.2.4 pypi_0 pypi
pandocfilters 1.4.3 pypi_0 pypi
parso 0.8.2 pypi_0 pypi
pexpect 4.8.0 pypi_0 pypi
pickleshare 0.7.5 pypi_0 pypi
pip 20.2.4 py38h06a4308_0
prometheus-client 0.10.1 pypi_0 pypi
prompt-toolkit 3.0.18 pypi_0 pypi
ptyprocess 0.7.0 pypi_0 pypi
py-cpuinfo 7.0.0 pypi_0 pypi
pycosat 0.6.3 py38h7b6447c_1
pycparser 2.20 py_2
pygments 2.8.1 pypi_0 pypi
pymongo 3.11.3 pypi_0 pypi
pyopenssl 19.1.0 pyhd3eb1b0_1
pyparsing 2.4.7 pypi_0 pypi
pyrsistent 0.17.3 pypi_0 pypi
pysocks 1.7.1 py38h06a4308_0
python 3.8.5 h7579374_1
python-dateutil 2.8.1 pypi_0 pypi
pytz 2021.1 pypi_0 pypi
pyyaml 5.4.1 pypi_0 pypi
pyzmq 22.0.3 pypi_0 pypi
qtconsole 5.0.3 pypi_0 pypi
qtpy 1.9.0 pypi_0 pypi
readline 8.0 h7b6447c_0
requests 2.24.0 py_0
ruamel_yaml 0.15.87 py38h7b6447c_1
sacred 0.8.2 pypi_0 pypi
seml 0.3.0 pypi_0 pypi
send2trash 1.5.0 pypi_0 pypi
setuptools 50.3.1 py38h06a4308_1
six 1.15.0 py38h06a4308_0
smmap 4.0.0 pypi_0 pypi
sqlite 3.33.0 h62c20be_0
terminado 0.9.4 pypi_0 pypi
testpath 0.4.4 pypi_0 pypi
tk 8.6.10 hbc83047_0
tornado 6.1 pypi_0 pypi
tqdm 4.51.0 pyhd3eb1b0_0
traitlets 5.0.5 pypi_0 pypi
urllib3 1.25.11 py_0
wcwidth 0.2.5 pypi_0 pypi
webencodings 0.5.1 pypi_0 pypi
wheel 0.35.1 pyhd3eb1b0_0
widgetsnbextension 3.5.1 pypi_0 pypi
wrapt 1.12.1 pypi_0 pypi
xz 5.2.5 h7b6447c_0
yaml 0.2.5 h7b6447c_0
zlib 1.2.11 h7b6447c_3
I tried to test SEML with the seml jupyter command, but it seems jupyter_template.sh is missing.
The 'jupyter_template.sh' file exists in the GitHub repository, but it does not exist in the seml folder installed via pip.
I also checked the past versions published on pip, but jupyter_template.sh was not there either.
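For reference, one common way to ship non-Python files like jupyter_template.sh in a pip package is setuptools' package_data; the snippet below is a sketch of what a setup.py could declare (the packaging layout is an assumption, not seml's actual setup.py):

```python
from setuptools import setup, find_packages

# Sketch only: data files inside the package directory must be declared
# explicitly, otherwise pip installs the .py modules but drops the .sh file.
setup(
    name="seml",
    packages=find_packages(),
    package_data={"seml": ["jupyter_template.sh"]},
    include_package_data=True,
)
```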
No error.
Importing generate_configs has thrown an error since the update to seml 0.3.
seml 0.2 + PyYAML 3.13 does not throw this error.
from seml.config import generate_configs
/usr/local/lib/python3.7/dist-packages/seml/config.py in ()
327
328
--> 329 class YamlUniqueLoader(yaml.FullLoader):
330 """
331 Custom YAML loader that disallows duplicate keys
AttributeError: module 'yaml' has no attribute 'FullLoader'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
...
### Specifications
- Version: 0.3
- Python version: 3.7
- Platform: Google Colab
- Anaconda environment (`conda list`):
yaml Version: 3.13
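A possible compatibility sketch: FullLoader exists only in PyYAML >= 5.1, so on older versions (such as the PyYAML 3.13 preinstalled here) one could fall back to SafeLoader. The fallback below is our illustration of a fix, not seml's actual implementation:

```python
import yaml

# getattr avoids the AttributeError on PyYAML < 5.1, where FullLoader
# does not exist; SafeLoader is available in all versions.
BaseLoader = getattr(yaml, "FullLoader", yaml.SafeLoader)

class YamlUniqueLoader(BaseLoader):
    """Custom YAML loader (the duplicate-key check is omitted in this sketch)."""
```

Alternatively, simply requiring `pyyaml>=5.1` in seml's dependencies avoids the issue.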
seml jupyter
should provide an output like:
Queued Jupyter instance in Slurm job with ID 2964998.
The job's log-file is '<some_path>/seml-jupyter-2964998.out'.
Waiting for start-up to fetch the machine and port of the Jupyter instance... (ctrl-C to cancel fetching)
Slurm job is running. Jupyter instance is starting up...
Start-up completed. The Jupyter instance is running at 'gpusrv.scidom.de:8890/?token=<token>'.
To stop the job, run 'scancel 2964998'.
seml exits with an error and does not provide the above info (the job is still set up properly on the Slurm side).
This happens after updating Slurm to 20.11.7.
Traceback (most recent call last):
File "envs/py38/bin/seml", line 8, in <module>
sys.exit(main())
File "envs/py38/lib/python3.8/site-packages/seml/main.py", line 231, in main
f(**args.__dict__)
File "envs/py38/lib/python3.8/site-packages/seml/start.py", line 1065, in start_jupyter_job
job_info_dict = {x.split("=")[0]: x.split("=")[1] for x in job_output_results}
File "py38/lib/python3.8/site-packages/seml/start.py", line 1065, in <dictcomp>
job_info_dict = {x.split("=")[0]: x.split("=")[1] for x in job_output_results}
IndexError: list index out of range
I checked the job_output_results and noticed an entry NtasksPerTRES:0\n, which explains why the .split('=') call fails. Adding an if '=' in x check at lines 828 and 839 (commit 84228b5) fixes this.
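The proposed fix can be sketched as follows; the sample entries below are a hypothetical reconstruction of `scontrol show job` output, including the colon-separated NtasksPerTRES entry that newer Slurm versions emit:

```python
# Hypothetical reconstruction of the parsing in seml/start.py:
job_output_results = ["JobId=12345", "NtasksPerTRES:0", "BatchHost=gpu01"]

job_info_dict = {
    x.split("=")[0]: x.split("=")[1]
    for x in job_output_results
    if "=" in x  # skip entries without '=' that would raise IndexError
}
```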
Output should look like:
Queued Jupyter instance in Slurm job with ID 3912111.
The job's log-file is '/mnt/home/icb/leon.hetzel/seml-jupyter-3912111.out'.
Waiting for start-up to fetch the machine and port of the Jupyter instance... (ctrl-C to cancel fetching)
Slurm job is running. Jupyter instance is starting up...
Start-up completed. The Jupyter instance is running at 'http://supergpu.scidom.de:8889/lab/'.
But the path is not printed correctly:
Queued Jupyter instance in Slurm job with ID 3912111.
The job's log-file is '/mnt/home/icb/leon.hetzel/seml-jupyter-3912111.out'.
Waiting for start-up to fetch the machine and port of the Jupyter instance... (ctrl-C to cancel fetching)
Slurm job is running. Jupyter instance is starting up...
Start-up completed. The Jupyter instance is running at 'ServerApp]'.
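A regex over the log line is one way to pick out the full URL instead of the stray 'ServerApp]' token that naive whitespace splitting produces; the sample log line below is a hypothetical reconstruction of Jupyter Server output:

```python
import re

# Newer Jupyter versions log a '[I ... ServerApp]' prefix before the URL,
# so extract the first http(s) token rather than a fixed whitespace field.
log_line = "[I 2021-06-01 12:00:00.000 ServerApp] http://supergpu.scidom.de:8889/lab/"
match = re.search(r"https?://\S+", log_line)
url = match.group(0) if match else None
```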
I'm sorry to say, but the printed commands are still not correct. Example with all the failure cases we've discussed:
seml:
executable: script.py
name: test
output_dir: ~/slurm-output
conda_environment: seml_test
project_root_dir: .
slurm:
sbatch_options:
cpus-per-task: 1
fixed:
nicholas1: "{ 'a': 'test' }"
marten: marten's data with "quotes" in quotes
nicholas2: True
johannes: {'dataset': "test"}
import seml
from sacred import Experiment
ex = Experiment()
seml.setup_logger(ex)
@ex.config
def config():
    overwrite = None
    db_collection = None
    if db_collection is not None:
        ex.observers.append(seml.create_mongodb_observer(
            db_collection, overwrite=overwrite))


@ex.automain
def run(
    nicholas1: str, marten: str, nicholas2: bool, johannes: dict
):
    return locals()
seml print-command
********** First experiment **********
Executable: command_test.py
Anaconda environment: graph
Arguments for VS Code debugger:
["with", "--debug", "nicholas={'a': 'test'}", "marten='marten\\'s data with \"quotes\" in quotes'", "nicholas2=True", "johannes={'dataset': 'test'}", "db_collection='seml_example'", "--unobserved"]
Arguments for PyCharm debugger:
with --debug 'nicholas={'"'"'a'"'"': '"'"'test'"'"'}' 'marten='"'"'marten\'"'"'s data with "quotes" in quotes'"'"'' nicholas2=True 'johannes={'"'"'dataset'"'"': '"'"'test'"'"'}' 'db_collection='"'"'seml_example'"'"'' --unobserved
Command for post-mortem debugging:
python command_test.py with 'nicholas={'"'"'a'"'"': '"'"'test'"'"'}' 'marten='"'"'marten\'"'"'s data with "quotes" in quotes'"'"'' nicholas2=True 'johannes={'"'"'dataset'"'"': '"'"'test'"'"'}' 'db_collection='"'"'seml_example'"'"'' --unobserved --pdb
Command for remote debugging:
python -m debugpy --listen 172.24.64.14:41913 --wait-for-client command_test.py with 'nicholas={'"'"'a'"'"': '"'"'test'"'"'}' 'marten='"'"'marten\'"'"'s data with "quotes" in quotes'"'"'' nicholas2=True 'johannes={'"'"'dataset'"'"': '"'"'test'"'"'}' 'db_collection='"'"'seml_example'"'"'' --unobserved
********** All raw commands **********
python command_test.py with 'nicholas={'"'"'a'"'"': '"'"'test'"'"'}' 'marten='"'"'marten\'"'"'s data with "quotes" in quotes'"'"'' nicholas2=True 'johannes={'"'"'dataset'"'"': '"'"'test'"'"'}' 'db_collection='"'"'seml_example'"'"'' overwrite=1 --force
Result: {'nicholas1': "{\\'a\\': \\'test\\'}", 'marten': '\\\'marten\\\\\'s data with "quotes" in quotes\\\'', 'nicholas2': True, 'johannes': "{\\'dataset\\': \\'test\\'}"}
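For reference, Python's shlex.quote produces the same kind of POSIX-shell-safe quoting without hand-rolled `'"'"'` escaping; a sketch of how such nested-quote values could be escaped (not what seml currently does):

```python
import shlex

# A value mixing single quotes, double quotes, and spaces, as in the
# 'marten' config entry above.
value = "marten's data with \"quotes\" in quotes"
arg = "marten=" + value

# shlex.quote wraps the whole token so a POSIX shell passes it through
# unchanged as a single argument.
quoted = shlex.quote(arg)
```

The round-trip property `shlex.split(shlex.quote(s)) == [s]` is what makes this approach robust.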
A workaround is setting "console": "internalConsole" in VS Code's launch.json. However, then we don't see anything that is printed to the console. So we should have working commands here.
Calling seml with seml experiment start --debug-server should print the IP address and port to attach to the session on the allocated node.
Calling seml with seml experiment start --debug-server causes an error, as it relies on array.array's tostring method, which is no longer supported in Python 3.9.
From https://docs.python.org/3.9/whatsnew/3.9.html#removed:
array.array: tostring() and fromstring() methods have been removed. They were aliases to tobytes() and frombytes(), deprecated since Python 3.2. (Contributed by Victor Stinner in bpo-38916.)
An easy fix is to replace tostring with tobytes.
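A minimal reproduction of the fix (the byte payload below is an arbitrary stand-in for the interface-name buffer in seml/network.py):

```python
import array

# tostring() was removed in Python 3.9; tobytes() is the drop-in
# replacement and has been the preferred spelling since Python 3.2.
names = array.array("B", b"eth0\x00eth1\x00")
namestr = names.tobytes()
```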
Running seml experiment start --debug-server on Python 3.9:
Traceback (most recent call last):
File "/nfs/homedirs/fuchsgru/miniconda3/bin/seml", line 8, in <module>
sys.exit(main())
File "/nfs/homedirs/fuchsgru/miniconda3/lib/python3.9/site-packages/seml/main.py", line 231, in main
f(**args.__dict__)
File "/nfs/homedirs/fuchsgru/miniconda3/lib/python3.9/site-packages/seml/start.py", line 786, in start_experiments
start_local_worker(collection=collection, num_exps=num_exps, filter_dict=filter_dict, unobserved=unobserved,
File "/nfs/homedirs/fuchsgru/miniconda3/lib/python3.9/site-packages/seml/start.py", line 646, in start_local_worker
success = start_local_job(collection=collection, exp=exp, unobserved=unobserved, post_mortem=post_mortem,
File "/nfs/homedirs/fuchsgru/miniconda3/lib/python3.9/site-packages/seml/start.py", line 290, in start_local_job
interpreter, exe, config = get_command_from_exp(exp, collection.name,
File "/nfs/homedirs/fuchsgru/miniconda3/lib/python3.9/site-packages/seml/start.py", line 46, in get_command_from_exp
ip_address, port = find_free_port()
File "/nfs/homedirs/fuchsgru/miniconda3/lib/python3.9/site-packages/seml/network.py", line 40, in find_free_port
ifaces = get_network_interfaces()
File "/nfs/homedirs/fuchsgru/miniconda3/lib/python3.9/site-packages/seml/network.py", line 29, in get_network_interfaces
namestr = names.tostring()
AttributeError: 'array.array' object has no attribute 'tostring'
Environment (conda list):
# packages in environment at /nfs/homedirs/fuchsgru/miniconda3:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
blas 1.0 mkl
brotlipy 0.7.0 py39h27cfd23_1003
bzip2 1.0.8 h7b6447c_0
ca-certificates 2021.10.8 ha878542_0 conda-forge
certifi 2021.10.8 py39hf3d152e_0 conda-forge
cffi 1.14.6 py39h400218f_0
chardet 4.0.0 py39h06a4308_1003
colorama 0.4.4 pypi_0 pypi
conda 4.10.3 py39hf3d152e_2 conda-forge
conda-package-handling 1.7.3 py39h27cfd23_1
cryptography 3.4.7 py39hd23ed53_0
cudatoolkit 10.2.89 hfd86e86_1
debugpy 1.5.0 pypi_0 pypi
decorator 4.4.2 py_0 conda-forge
docopt 0.6.2 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
freetype 2.10.4 h5ab3b9f_0
gitdb 4.0.7 pypi_0 pypi
gitpython 3.1.24 pypi_0 pypi
gmp 6.2.1 h2531618_2
gnutls 3.6.15 he1e5248_0
googledrivedownloader 0.4 pyhd3deb0d_1 conda-forge
gust 0.1 pypi_0 pypi
idna 2.10 pyhd3eb1b0_0
intel-openmp 2021.3.0 h06a4308_3350
jinja2 3.0.2 pyhd8ed1ab_0 conda-forge
joblib 1.1.0 pyhd8ed1ab_0 conda-forge
jpeg 9b h024ee3a_2
jsonpickle 1.5.2 pypi_0 pypi
lame 3.100 h7b6447c_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.35.1 h7274673_9
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h5101ec6_17
libgfortran-ng 7.5.0 h14aa051_19 conda-forge
libgfortran4 7.5.0 h14aa051_19 conda-forge
libgomp 9.3.0 h5101ec6_17
libiconv 1.15 h63c8f33_5
libidn2 2.3.2 h7f8727e_0
libpng 1.6.37 hbc83047_0
libstdcxx-ng 9.3.0 hd4cf53a_17
libtasn1 4.16.0 h27cfd23_0
libtiff 4.2.0 h85742a9_0
libunistring 0.9.10 h27cfd23_0
libuv 1.40.0 h7b6447c_0
libwebp-base 1.2.0 h27cfd23_0
llvmlite 0.37.0 pypi_0 pypi
lz4-c 1.9.3 h295c915_1
markupsafe 2.0.1 py39h3811e60_0 conda-forge
mkl 2021.3.0 h06a4308_520
mkl-service 2.4.0 py39h7f8727e_0
mkl_fft 1.3.0 py39h42c9631_2
mkl_random 1.2.2 py39h51133e4_0
munch 2.5.0 pypi_0 pypi
ncurses 6.2 he6710b0_1
nettle 3.7.3 hbbd107a_1
networkx 2.5.1 pyhd8ed1ab_0 conda-forge
ninja 1.10.2 hff7bd54_1
numba 0.54.1 pypi_0 pypi
numpy 1.20.3 py39hf144106_0
numpy-base 1.20.3 py39h74d4b33_0
olefile 0.46 pyhd3eb1b0_0
openh264 2.1.0 hd408876_0
openjpeg 2.4.0 h3ad879b_0
openssl 1.1.1l h7f8727e_0
packaging 21.0 pypi_0 pypi
pandas 1.2.5 py39hde0f152_0 conda-forge
pillow 8.3.1 py39h2c7a002_0
pip 21.1.3 py39h06a4308_0
py-cpuinfo 8.0.0 pypi_0 pypi
pycosat 0.6.3 py39h27cfd23_0
pycparser 2.20 py_2
pyg 2.0.1 py39_torch_1.9.0_cu102 pyg
pymongo 3.12.0 pypi_0 pypi
pyopenssl 20.0.1 pyhd3eb1b0_1
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pysocks 1.7.1 py39h06a4308_0
python 3.9.5 h12debd9_4
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-louvain 0.15 pyhd3deb0d_0 conda-forge
python_abi 3.9 2_cp39 conda-forge
pytorch 1.9.1 py3.9_cuda10.2_cudnn7.6.5_0 pytorch
pytorch-cluster 1.5.9 py39_torch_1.9.0_cu102 pyg
pytorch-scatter 2.0.8 py39_torch_1.9.0_cu102 pyg
pytorch-sparse 0.6.12 py39_torch_1.9.0_cu102 pyg
pytorch-spline-conv 1.2.1 py39_torch_1.9.0_cu102 pyg
pytz 2021.3 pyhd8ed1ab_0 conda-forge
pyyaml 5.4.1 py39h3811e60_0 conda-forge
readline 8.1 h27cfd23_0
requests 2.25.1 pyhd3eb1b0_0
ruamel_yaml 0.15.100 py39h27cfd23_0
sacred 0.8.2 pypi_0 pypi
scikit-learn 0.24.2 py39ha9443f7_0
scipy 1.7.1 py39h292c36d_2
seml 0.3.4 pypi_0 pypi
setuptools 52.0.0 py39h06a4308_0
six 1.16.0 pyhd3eb1b0_0
smmap 4.0.0 pypi_0 pypi
sqlite 3.36.0 hc218d9a_0
threadpoolctl 3.0.0 pyh8a188c0_0 conda-forge
tk 8.6.10 hbc83047_0
torchaudio 0.9.1 py39 pytorch
torchvision 0.10.1 py39_cu102 pytorch
tqdm 4.61.2 pyhd3eb1b0_1
typing_extensions 3.10.0.2 pyh06a4308_0
tzdata 2021a h52ac0ba_0
urllib3 1.26.6 pyhd3eb1b0_1
wheel 0.36.2 pyhd3eb1b0_0
wrapt 1.13.2 pypi_0 pypi
xz 5.2.5 h7b6447c_0
yacs 0.1.6 py_0 conda-forge
yaml 0.2.5 h7b6447c_0
zlib 1.2.11 h7b6447c_3
zstd 1.4.9 haebb681_0
When deleting a failed experiment, not all associated source files saved in the MongoDB collections fs.files and fs.chunks are deleted. Only the files stored when staging the experiment are removed, not those stored when starting/running it (i.e. the actual experiment script). The only consequence of the bug is that those two collections get cluttered over time.
Delete all source files associated with an experiment in the MongoDB. This includes the source files saved during staging (listed in the experiment document under seml->source_files) and those saved when running the experiment (listed under experiment->sources).
Only the entries in fs.files and fs.chunks that correspond to the source files saved during staging and listed under seml->source_files are deleted.
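A fix would have to collect the GridFS file ids from both places before removing the experiment document. A hedged sketch of that collection step (the assumed document layout with (name, id) pairs follows the field names mentioned above; collect_source_file_ids and the commented gridfs calls are illustrative, not seml's actual code):

```python
def collect_source_file_ids(exp: dict) -> list:
    """Gather GridFS file ids referenced by an experiment document,
    both from staging (seml -> source_files) and from the run itself
    (experiment -> sources). Entries are assumed to be (name, id) pairs."""
    ids = [file_id for _, file_id in exp.get("seml", {}).get("source_files", [])]
    ids += [file_id for _, file_id in exp.get("experiment", {}).get("sources", [])]
    return ids

# Deleting via gridfs removes the matching fs.files and fs.chunks entries:
# import gridfs
# fs = gridfs.GridFS(db)
# for file_id in collect_source_file_ids(exp):
#     fs.delete(file_id)
```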
To reproduce, inspect the fs.files and fs.chunks collections, run
seml mycollection add myconfig
seml mycollection start
seml mycollection delete
and inspect the fs.files and fs.chunks collections again.