
Comments (5)

franciscozorrilla commented on September 3, 2024

Hey Sam,

Good question! Yes, Snakemake works out which rules need to run for the target rule from which input and output files are present or missing. It will not re-run the jobs for samples that are already assembled, unless you delete or move the output file, or use the Snakemake -R flag.

For example, let's say the target rule is the assembly, so you would run something like bash metaGEM.sh -t megahit -j 43 -c 32. Let's look at the megahit rule in the Snakefile:

metaGEM/Snakefile

Lines 273 to 278 in d81186a

rule megahit:
    input:
        R1 = rules.qfilter.output.R1,
        R2 = rules.qfilter.output.R2
    output:
        f'{config["path"]["root"]}/{config["folder"]["assemblies"]}/{{IDs}}/contigs.fasta.gz'
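The output line mixes two kinds of braces. As a rough sketch (the config values and the sample ID below are hypothetical; metaGEM reads the real ones from its config file), the f-string fills in the config entries when the Snakefile is parsed, while the doubled {{IDs}} survives as a single-brace {IDs} wildcard that Snakemake later fills in once per sample. Python's str.format only stands in here for Snakemake's own wildcard substitution.

```python
# Hypothetical config values; the real ones come from metaGEM's config file.
config = {
    "path": {"root": "/path/to/metaGEM"},
    "folder": {"assemblies": "assemblies"},
}

# Same expression as the rule's output: config values are substituted now,
# and {{IDs}} is left behind as the literal wildcard {IDs}.
pattern = f'{config["path"]["root"]}/{config["folder"]["assemblies"]}/{{IDs}}/contigs.fasta.gz'


def output_for(sample_id: str) -> str:
    """Approximate Snakemake's wildcard filling with str.format."""
    return pattern.format(IDs=sample_id)


print(pattern)                # /path/to/metaGEM/assemblies/{IDs}/contigs.fasta.gz
print(output_for("sample1"))  # /path/to/metaGEM/assemblies/sample1/contigs.fasta.gz
```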

First of all, Snakemake will make sure that the inputs for this rule are present. So if the samples have not been quality filtered, then Snakemake would submit 43 qfilter jobs + 43 assembly jobs.

Now let's say the samples have all been quality filtered in a previous run. Snakemake will then check whether the output of the target rule is present, i.e. it will search the assemblies/ subfolders for files called contigs.fasta.gz. You said you had 10 assemblies completed, so if they are in the specified location, Snakemake would only submit 33 assembly jobs.
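The two scenarios above can be reproduced with a toy model of the dependency check. This is a sketch, not Snakemake's scheduler: the sample IDs are hypothetical, a sample counts as done when its ID is in the corresponding set, and forcerun loosely mimics -R for a rule name. It ignores timestamps and the other re-run triggers Snakemake can apply.

```python
def plan_jobs(samples, qfiltered, assembled, forcerun=frozenset()):
    """Toy model of which (rule, sample) jobs a qfilter -> megahit chain needs.

    megahit runs when its contigs file is missing or the rule is forced;
    qfilter runs only when megahit needs reads that are not there yet, or
    when qfilter itself is forced (which also makes megahit re-run).
    """
    jobs = []
    for s in samples:
        qfilter_forced = "qfilter" in forcerun
        need_megahit = s not in assembled or "megahit" in forcerun or qfilter_forced
        need_qfilter = qfilter_forced or (need_megahit and s not in qfiltered)
        if need_qfilter:
            jobs.append(("qfilter", s))
        if need_megahit:
            jobs.append(("megahit", s))
    return jobs


samples = [f"sample{i}" for i in range(1, 44)]  # 43 hypothetical sample IDs

# Nothing filtered or assembled yet: 43 qfilter + 43 megahit jobs.
print(len(plan_jobs(samples, qfiltered=set(), assembled=set())))  # 86

# All samples filtered, 10 assemblies already present: 33 megahit jobs.
print(len(plan_jobs(samples, qfiltered=set(samples), assembled=set(samples[:10]))))  # 33
```

Keeping the qfilter decision dependent on need_megahit reflects the point above that inputs only matter for jobs that actually have to run.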

Some useful troubleshooting tips:

  • Double-check which jobs will be submitted by running the metaGEM.sh script; it always dry-runs the jobs before asking you whether they look good for submission.
  • Alternatively, check this manually by running snakemake all -n in your metaGEM folder.
  • Sanity-check by using touch to create dummy output files, then dry-run to see whether you have tricked Snakemake into thinking the files have already been generated. Remember to delete the dummy files afterwards!
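The touch-and-dry-run check can be wrapped so the cleanup cannot be forgotten. A minimal Python sketch, assuming you list the dummy paths yourself (the sample1 path in the demo is hypothetical): it creates only files that do not exist yet and deletes exactly those on exit, so existing outputs are left alone. The dummies are empty, so use them for a dry-run only; a real run would hand empty files to downstream rules.

```python
import contextlib
import tempfile
from pathlib import Path


@contextlib.contextmanager
def dummy_outputs(paths):
    """Touch missing output files; on exit, delete only the files created here.

    Parent directories made along the way are left in place.
    """
    created = []
    try:
        for p in map(Path, paths):
            if not p.exists():
                p.parent.mkdir(parents=True, exist_ok=True)
                p.touch()
                created.append(p)
        yield created
    finally:
        for p in created:
            p.unlink(missing_ok=True)


# Demo in a throwaway directory; the path mirrors the rule's output pattern.
with tempfile.TemporaryDirectory() as root:
    target = Path(root, "assemblies", "sample1", "contigs.fasta.gz")
    with dummy_outputs([target]):
        # A dry-run such as `snakemake all -n` would go here.
        print(target.exists())  # True
    print(target.exists())      # False
```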

Best,
Francisco

from metagem.

slambrechts commented on September 3, 2024

Hi Francisco,

Thank you for your answer. Copying the assemblies/ subfolders did not work. metaGEM recognized the samples that were assembled on the same cluster, but not the others that were assembled on the other cluster and copied over. Maybe I should also copy the files in the intermediate results folders?

Also, when metaGEM then starts a job for a sample that was previously run on the other machine, the result folder that is already present for that sample (the one I copied over) seems to get deleted.
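The thread does not pin down why the copied assemblies were not recognized. One thing worth ruling out, as an assumption rather than a confirmed cause, is file timestamps: like make, Snakemake can re-run a job when one of its inputs is newer than its output, and a plain cp gives the copy a fresh modification time (cp -p or rsync -a keep the original). The sketch below flags samples whose output is older than any of its inputs; the mapping it takes and the file names in the demo are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def stale_outputs(jobs):
    """Return sample IDs whose output file is older than any existing input.

    `jobs` maps a sample ID to (output_path, [input_paths]); supply the paths
    from your own config rather than relying on the example names below.
    """
    stale = []
    for sample, (output, inputs) in jobs.items():
        out = Path(output)
        if not out.exists():
            continue  # a missing output is a different problem
        out_mtime = out.stat().st_mtime
        if any(Path(i).exists() and Path(i).stat().st_mtime > out_mtime for i in inputs):
            stale.append(sample)
    return stale


# Demo: simulate an assembly that ends up older than its filtered reads.
with tempfile.TemporaryDirectory() as root:
    reads = Path(root, "qfiltered", "sample1", "sample1_R1.fastq.gz")  # hypothetical name
    contigs = Path(root, "assemblies", "sample1", "contigs.fasta.gz")
    for p in (reads, contigs):
        p.parent.mkdir(parents=True, exist_ok=True)
        p.touch()
    os.utime(contigs, (1_000, 1_000))  # (atime, mtime) in seconds
    os.utime(reads, (2_000, 2_000))
    print(stale_outputs({"sample1": (contigs, [reads])}))  # ['sample1']
```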

Best,
Sam


franciscozorrilla commented on September 3, 2024

Hey Sam,

Did metaGEM try to submit quality-filtering jobs for the samples whose assemblies got deleted? Did you have all the qfiltered/ result files (including the ones for the samples that were assembled on your local machine) on your new cluster? Similarly, does your dataset/ folder contain all your samples? These files need to be present; otherwise Snakemake will try to re-create them before running your target rule.
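The presence check described above can be scripted before a re-run. A minimal sketch, assuming each sample has its own subfolder under dataset/ and qfiltered/ (a hypothetical layout; adjust the folder names to your config): it reports, per sample, which of those folders is missing or empty, i.e. the gaps Snakemake might try to fill before reaching the target rule.

```python
import tempfile
from pathlib import Path


def _has_files(d: Path) -> bool:
    """True if d is a directory with at least one entry."""
    return d.is_dir() and any(d.iterdir())


def missing_inputs(root, samples, folders=("dataset", "qfiltered")):
    """Map each sample ID to the folders where it has no non-empty subfolder.

    Assumes a hypothetical <root>/<folder>/<sample>/ layout.
    """
    report = {}
    for s in samples:
        gaps = [f for f in folders if not _has_files(Path(root, f, s))]
        if gaps:
            report[s] = gaps
    return report


# Demo: sample2 has raw reads but nothing in qfiltered/.
with tempfile.TemporaryDirectory() as root:
    for folder, sample in [("dataset", "sample1"), ("qfiltered", "sample1"), ("dataset", "sample2")]:
        d = Path(root, folder, sample)
        d.mkdir(parents=True)
        (d / "reads.fastq.gz").touch()
    print(missing_inputs(root, ["sample1", "sample2"]))  # {'sample2': ['qfiltered']}
```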


slambrechts commented on September 3, 2024

Hi Francisco,

No, metaGEM did not try to submit QC jobs for those samples. All the samples were quality filtered on both machines, and the dataset folder contained all the samples. As a workaround, I temporarily moved the samples that were already assembled out of the dataset folder and restarted.
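The workaround above can be scripted so the samples are put back afterwards. A sketch under stated assumptions: one subfolder per sample in dataset/, assemblies at assemblies/<sample>/contigs.fasta.gz as in the rule's output, and a hypothetical holding folder named dataset_parked next to dataset/. It only moves folders; nothing is deleted.

```python
import shutil
import tempfile
from pathlib import Path

HOLDING = "dataset_parked"  # hypothetical folder name, kept outside dataset/


def park_assembled(root):
    """Move dataset/<sample> folders that already have contigs.fasta.gz aside.

    Returns the moved sample IDs so they can be put back afterwards.
    """
    root = Path(root)
    moved = []
    # sorted() builds the full listing before anything is moved.
    for sample_dir in sorted((root / "dataset").iterdir()):
        contigs = root / "assemblies" / sample_dir.name / "contigs.fasta.gz"
        if sample_dir.is_dir() and contigs.exists():
            (root / HOLDING).mkdir(exist_ok=True)
            shutil.move(str(sample_dir), str(root / HOLDING / sample_dir.name))
            moved.append(sample_dir.name)
    return moved


def restore(root, moved):
    """Move parked sample folders back into dataset/."""
    root = Path(root)
    for name in moved:
        shutil.move(str(root / HOLDING / name), str(root / "dataset" / name))


# Demo: sample1 is already assembled, sample2 is not.
with tempfile.TemporaryDirectory() as root:
    for sample in ("sample1", "sample2"):
        Path(root, "dataset", sample).mkdir(parents=True)
    contigs = Path(root, "assemblies", "sample1", "contigs.fasta.gz")
    contigs.parent.mkdir(parents=True)
    contigs.touch()
    moved = park_assembled(root)
    print(moved)  # ['sample1']
    restore(root, moved)
    print(sorted(p.name for p in Path(root, "dataset").iterdir()))  # ['sample1', 'sample2']
```

Run park_assembled before starting metaGEM and restore once the remaining assemblies have finished.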


franciscozorrilla commented on September 3, 2024

I see. Sorry to hear you were having trouble with this, but glad that you figured out a workaround!
