
Comments (4)

matrs commented on July 29, 2024

Thank you very much, I'll check the next steps in the pipeline over the following days and implement one of the solutions you suggested.

Thanks!


franciscozorrilla commented on July 29, 2024

Hi Jose,

Indeed, that should fix the problem in your situation! I suspect that part of the issue also stems from the fact that the scratch/ path in your config.yaml file is likely pointing to a single shared directory, is that correct?

On the clusters I have used to develop metaGEM there is generally a variable called something like $TMPDIR or $SCRATCH that points to a job-specific directory for each submitted job (e.g. this), meaning that you can use the same variable in the Snakefile and each job will be given a unique storage location by the scheduler/cluster.

Does your cluster have such a variable? If so, then you can set your scratch/ path in the config.yaml file as shown below to avoid having to modify other rules that make use of the scratch/ directory.

    scratch: $YOUR_CLUSTER_TMPDIR
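To see what that setting resolves to on a given cluster, a one-line check from inside a submitted job can help. This is a hedged sketch: $TMPDIR and $SCRATCH are common scheduler conventions but are not guaranteed on every cluster, and the $HOME fallback path is illustrative, not part of metaGEM:

```shell
# Resolve a usable scratch base: prefer a job-specific $TMPDIR or
# $SCRATCH if the scheduler sets one, otherwise fall back to a
# directory under $HOME (illustrative fallback, not metaGEM's default).
SCRATCH_BASE="${TMPDIR:-${SCRATCH:-$HOME/metagem_scratch}}"
echo "Using scratch base: $SCRATCH_BASE"
```

Running this inside a batch job (rather than on the login node) shows whether the scheduler actually exports a job-specific path.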

Thanks for reporting, I will update the documentation to elaborate on the usage of the scratch path.

Best wishes,
Francisco


matrs commented on July 29, 2024

Hello Francisco,
I didn't know that a unique directory is created for each job when submitting to a $SCRATCH partition; that explains why nobody has complained about this before. On this particular cluster there is no scratch partition, so no $SCRATCH is defined. The /tmp directory works as on any Linux system but is rather small, so I set tmp to a directory in my $HOME in the JSON config (on this cluster, /home is a local file system).

Thank you for your help.

Jose Luis


franciscozorrilla commented on July 29, 2024

Yes, unfortunately it can be a bit difficult to build readily usable/deployable pipelines when clusters tend to be quite idiosyncratic.

I am slightly concerned about your situation: when you submit jobs in parallel further downstream in the analysis (e.g. see the Snakefile rule crossMap), you will have multiple jobs trying to use the same directory, and this will cause errors. At the moment I see three potential solutions:

  1. The cleanest and easiest solution for you is probably to create a job-specific subdirectory (within the scratch/ directory) at the start of each job.
  2. Alternatively, you could simply remove or comment out the lines of code within the shell section of the Snakefile rules that move files into the scratch directory.
  3. The most annoying option would be to leave everything as is and submit jobs in series, but of course this defeats the purpose of using the cluster.
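Solution 1 could look roughly like the following at the start of a rule's shell section. This is a sketch, not the actual metaGEM code: $SLURM_JOB_ID assumes a SLURM scheduler, the shell PID ($$) is only a weak fallback for clusters without a job ID variable, and the base path is illustrative:

```shell
# Sketch of solution 1: create a job-specific subdirectory inside the
# shared scratch/ path so parallel jobs do not clobber each other.
SCRATCH_BASE="${SCRATCH_BASE:-$HOME/metagem_scratch}"   # shared base (assumed path)
JOB_DIR="$SCRATCH_BASE/job_${SLURM_JOB_ID:-$$}"          # unique per job; PID fallback
mkdir -p "$JOB_DIR"
cd "$JOB_DIR"
echo "Working in: $JOB_DIR"
# ... the rule's actual commands would run here, isolated per job ...
```

On a SLURM cluster each job gets a distinct $SLURM_JOB_ID, so concurrent jobs land in distinct directories even when scratch/ points at a single shared path.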

I will implement solution 1 in the Snakefile as soon as I get the chance. This should fix the problem for users that don't have a job-specific $SCRATCH or $TMPDIR variable, while not causing problems for users that do have such a variable.

