
Comments (4)

matrs commented on July 29, 2024

Thank you very much, I'll check the next steps in the pipeline over the following days and implement one of the solutions you suggested.

Thanks!


franciscozorrilla commented on July 29, 2024

Hi Jose,

Indeed, that should fix the problem in your situation! I suspect that part of the issue also stems from the fact that the scratch/ path in your config.yaml file is likely pointing to a single shared directory, is that correct?

On the clusters I have used to develop metaGEM there is generally a variable called something like $TMPDIR or $SCRATCH that points to a job-specific directory for each submitted job (e.g. this), meaning that you can use the same variable in the Snakefile and each job will be given a unique storage location by the scheduler/cluster.

Does your cluster have such a variable? If so, then you can set your scratch/ path in the config.yaml file as shown below to avoid having to modify other rules that make use of the scratch/ directory.

    scratch: $YOUR_CLUSTER_TMPDIR
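To see what that setting resolves to on a given cluster, a one-line check from inside a submitted job can help. This is a hedged sketch: $TMPDIR and $SCRATCH are common scheduler conventions but are not guaranteed on every cluster, and the $HOME fallback path is illustrative, not part of metaGEM:

```shell
# Resolve a usable scratch base: prefer a job-specific $TMPDIR or
# $SCRATCH if the scheduler sets one, otherwise fall back to a
# directory under $HOME (illustrative fallback, not metaGEM's default).
SCRATCH_BASE="${TMPDIR:-${SCRATCH:-$HOME/metagem_scratch}}"
echo "Using scratch base: $SCRATCH_BASE"
```

Running this inside a batch job (rather than on the login node) shows whether the scheduler actually exports a job-specific path.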

Thanks for reporting, I will update the documentation to elaborate on the usage of the scratch path.

Best wishes,
Francisco


matrs commented on July 29, 2024

Hello Francisco,
I didn't know that a unique directory is created for each job when submitting to a $SCRATCH partition; that explains why nobody has complained about this before. On this particular cluster there is no scratch partition, so no $SCRATCH is defined. The /tmp directory works as on any Linux system but is rather small, so I set tmp to a directory in my $HOME in the JSON config (on this cluster, /home is a local file system).

Thank you for your help.

Jose Luis


franciscozorrilla commented on July 29, 2024

Yes, unfortunately it can be a bit difficult to build readily usable/deployable pipelines when clusters tend to be quite idiosyncratic.

I am slightly concerned about your situation: when you submit jobs in parallel further downstream in the analysis (e.g. see the Snakefile rule crossMap), you will have multiple jobs trying to use the same directory, and this will cause errors. At the moment I see three potential solutions:

  1. The cleanest and easiest solution for you is probably to create a job-specific subdirectory (within the scratch/ directory) at the start of each job.
  2. Alternatively, you could simply remove or comment out the lines of code within the shell section of the Snakefile rules that move files into the scratch directory.
  3. The most annoying option would be to leave everything as is and submit jobs in series, but of course this defeats the purpose of using the cluster.
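Solution 1 could look roughly like the following at the start of a rule's shell section. This is a sketch, not the actual metaGEM code: $SLURM_JOB_ID assumes a SLURM scheduler, the shell PID ($$) is only a weak fallback for clusters without a job ID variable, and the base path is illustrative:

```shell
# Sketch of solution 1: create a job-specific subdirectory inside the
# shared scratch/ path so parallel jobs do not clobber each other.
SCRATCH_BASE="${SCRATCH_BASE:-$HOME/metagem_scratch}"   # shared base (assumed path)
JOB_DIR="$SCRATCH_BASE/job_${SLURM_JOB_ID:-$$}"          # unique per job; PID fallback
mkdir -p "$JOB_DIR"
cd "$JOB_DIR"
echo "Working in: $JOB_DIR"
# ... the rule's actual commands would run here, isolated per job ...
```

On a SLURM cluster each job gets a distinct $SLURM_JOB_ID, so concurrent jobs land in distinct directories even when scratch/ points at a single shared path.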

I will implement solution 1 in the Snakefile as soon as I get the chance. This should fix the problem for users that don't have a job-specific $SCRATCH or $TMPDIR variable, while not causing problems for users that do have such a variable.

