Comments (5)

franciscozorrilla commented on September 3, 2024

Hi Hongzhong,

There are a few ways to do this, and the best approach depends on what you are trying to do.
Two important questions to ask yourself are:

  1. Do you just want to use metabat2 for binning, or are you using all 3 binners + metaWRAP? (Using all 3 binners + metaWRAP gives the best performance.)
  2. Are you cross-mapping each set of short reads to each assembly? (Cross-mapping gives the best performance.)

You can look at the tutorial/demo to get an idea of how to use the metaGEM.sh parser to interface with the Snakefile for job submissions; the cross-mapping is shown in this section specifically. I just uncommented 3 lines of the Snakefile in the latest commit, so you should now be able to submit crossMap jobs to generate depth files just as described in the tutorial. For example, to submit 2 jobs, each with 24 cores + 120 GB RAM and a 24 hour max runtime:

bash metaGEM.sh -t crossMap -j 2 -c 24 -m 120 -h 24

Note that by default this will run the Snakefile rule crossMap, which submits one job per sample. Within each of these jobs, a for loop maps each set of paired-end reads in your dataset to the focal sample's assembly, roughly as sketched below. These mapping files are then used to generate your coverage inputs for CONCOCT, MetaBAT2, and MaxBin2.
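
To make that loop concrete, here is a minimal sketch of what one such per-sample job does, assuming bwa and samtools for the mapping and the MetaBAT2 helper script for the depth file; the sample names and paths are hypothetical, and the actual crossMap rule may differ in detail:

# Minimal sketch of a per-sample cross-mapping loop. The tool choice (bwa +
# samtools) and all paths/sample names are assumptions for illustration only.
FOCAL=sample_A                                  # hypothetical focal sample
ASSEMBLY=assemblies/$FOCAL/contigs.fasta        # its assembled contigs

bwa index "$ASSEMBLY"                           # index the focal assembly once

# Map every sample's paired-end reads against the focal assembly
for R1 in reads/*_R1.fastq.gz; do
    R2=${R1/_R1/_R2}
    ID=$(basename "$R1" _R1.fastq.gz)
    bwa mem -t 24 "$ASSEMBLY" "$R1" "$R2" \
        | samtools sort -@ 24 -o "$FOCAL.$ID.sorted.bam" -
done

# Summarize per-contig coverage across all sorted BAMs (depth input for MetaBAT2)
jgi_summarize_bam_contig_depths --outputDepth "$FOCAL.depth.txt" "$FOCAL".*.sorted.bam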

I should mention as a note of caution that this approach works well for small-to-medium datasets (roughly <= 150 medium-sized samples), but may become impractical for large datasets, in terms of both runtime and storage. This is because each job needs to generate N sorted bam files to create the CONCOCT coverage table, where N = number of samples. If you had a dataset of 300 samples and each bam file were ~10GB, you would need around 3TB of temporary storage per job, and up to ~900TB if you ran all jobs in parallel.

In the metaGEM manuscript we processed the TARA Oceans dataset, which was quite large (~246 samples). For datasets of that size we recommend running a slightly modified workflow in which each mapping operation is submitted as its own job and performed with kallisto. I am now working on adding support for this alternative branch of the workflow to the metaGEM.sh parser (issue #22).
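
For intuition, a single mapping operation in that branch might look something like this kallisto index/quant sketch (the file names are hypothetical, and this is not yet the interface exposed by metaGEM.sh):

# Index the focal sample's assembly, then pseudoalign another sample's reads to it
kallisto index -i sample_A.idx assemblies/sample_A/contigs.fasta
kallisto quant -i sample_A.idx -o maps/sample_A.sample_B \
    reads/sample_B_R1.fastq.gz reads/sample_B_R2.fastq.gz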

Please let me know if you have further questions.
Best wishes,
Francisco

hongzhonglu commented on September 3, 2024

Hi Francisco,
Thanks a lot for your kind help! For now I just want to use metabat2 for binning, since this is my first time running a metagenome analysis, so I am starting with the simple things. I found the steps to generate the depth.txt file in your nice pipeline, and I will study how to run it.

Best regards,
Hongzhong

franciscozorrilla commented on September 3, 2024

Hi Hongzhong,

In that case I recommend looking at the metabat rule on line 512 of the Snakefile.
Note that its output is currently commented out, since this is a "backup"/alternative way of running metabat2.
You will need to uncomment the output of the metabat rule on line 518 so that it looks like this:

directory(f'{config["path"]["root"]}/{config["folder"]["metabat"]}/{{IDs}}/{{IDs}}.metabat-bins')

and then comment out the output of the main metabat rule, metabatCross, on line 581 so that it looks like this:

#directory(f'{config["path"]["root"]}/{config["folder"]["metabat"]}/{{IDs}}/{{IDs}}.metabat-bins')

You need to do this so that Snakemake knows exactly which rule to execute to generate your desired files.
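
To double-check which outputs are currently active before submitting, you can grep the Snakefile (this assumes both output lines contain the metabat-bins string, as shown above):

grep -n "metabat-bins" Snakefile
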
Once you have made sure that only your desired metabat2 rule has an uncommented output, you can submit metabat2 jobs to the cluster using:

bash metaGEM.sh -t metabat -j N_JOBS -c N_CORES -m MEMORY -h RUN_TIME
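
For example, mirroring the crossMap submission above, you could submit 2 metabat jobs, each with 24 cores + 120 GB RAM and a 24 hour max runtime:

bash metaGEM.sh -t metabat -j 2 -c 24 -m 120 -h 24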

I have also recently expanded the metaGEM wiki, so please check it out if you want to learn more about usage and implementation of metaGEM.

Also, just so you know: in my personal experience, CONCOCT tends to outperform maxbin2 and metabat2 in most cases. As a reference, you can look at Supplementary Figure 2 of the metaGEM paper:

[screenshot of Supplementary Figure 2 from the metaGEM paper]

Hope this helps and let me know if you have any other questions.
Best wishes,
Francisco

hongzhonglu commented on September 3, 2024

Hi Francisco,
Thanks so much! This is a very good reference for me to study.

Best regards,
Hongzhong

franciscozorrilla commented on September 3, 2024

Closing this due to inactivity but please reopen/comment if you have further questions.
