Comments (5)

franciscozorrilla commented on September 3, 2024

Hi Hongzhong,

There are a few ways to do this, and the best approach depends on what you are trying to do.
Two important questions to ask yourself are:

  1. Do you just want to use metabat2 for binning, or are you using all 3 binners + metaWRAP? (Using all 3 binners + metaWRAP gives the best performance.)
  2. Are you cross-mapping each set of short reads to each assembly? (Cross-mapping gives the best performance.)

You can look at the tutorial/demo to get an idea of how to use the metaGEM.sh parser to interface with the Snakefile for job submissions; the cross-mapping is shown in this section specifically. I just uncommented 3 lines of the Snakefile in the latest commit, so you should now be able to submit crossMap jobs to generate depth files just as described in the tutorial. For example, to submit 2 jobs, each with 24 cores + 120 GB RAM and a 24 hour max runtime:

bash metaGEM.sh -t crossMap -j 2 -c 24 -m 120 -h 24

Note that by default this will run the Snakefile rule crossMap, which submits one job per sample. Within each of these jobs, a for loop maps each set of paired-end reads in your dataset to the focal sample's assembly, roughly as sketched below. These mapping files are then used to generate your coverage inputs for CONCOCT, MetaBAT2, and MaxBin2.
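
To make that loop concrete, here is a minimal sketch of what one such per-sample job does, assuming bwa and samtools for the mapping and the MetaBAT2 helper script for the depth file; the sample names and paths are hypothetical, and the actual crossMap rule may differ in detail:

# Minimal sketch of a per-sample cross-mapping loop. The tool choice (bwa +
# samtools) and all paths/sample names are assumptions for illustration only.
FOCAL=sample_A                                  # hypothetical focal sample
ASSEMBLY=assemblies/$FOCAL/contigs.fasta        # its assembled contigs

bwa index "$ASSEMBLY"                           # index the focal assembly once

# Map every sample's paired-end reads against the focal assembly
for R1 in reads/*_R1.fastq.gz; do
    R2=${R1/_R1/_R2}
    ID=$(basename "$R1" _R1.fastq.gz)
    bwa mem -t 24 "$ASSEMBLY" "$R1" "$R2" \
        | samtools sort -@ 24 -o "$FOCAL.$ID.sorted.bam" -
done

# Summarize per-contig coverage across all sorted BAMs (depth input for MetaBAT2)
jgi_summarize_bam_contig_depths --outputDepth "$FOCAL.depth.txt" "$FOCAL".*.sorted.bam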

I should mention as a note of caution that this approach works well for small-to-medium datasets (roughly <= 150 medium-sized samples), but may become impractical for large datasets, in terms of both runtime and storage. This is because each job needs to generate N sorted bam files to create the CONCOCT coverage table, where N = number of samples. If you had a dataset of 300 samples and each bam file were ~10GB, you would need around 3TB of temporary storage per job, and up to ~900TB if you ran all jobs in parallel.

In the metaGEM manuscript we processed the TARA Oceans dataset, which was quite large (~246 samples). For datasets of that size we recommend running a slightly modified workflow in which each mapping operation is submitted as its own job and performed with kallisto. I am now working on adding support for this alternative branch of the workflow to the metaGEM.sh parser (issue #22).
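
For intuition, a single mapping operation in that branch might look something like this kallisto index/quant sketch (the file names are hypothetical, and this is not yet the interface exposed by metaGEM.sh):

# Index the focal sample's assembly, then pseudoalign another sample's reads to it
kallisto index -i sample_A.idx assemblies/sample_A/contigs.fasta
kallisto quant -i sample_A.idx -o maps/sample_A.sample_B \
    reads/sample_B_R1.fastq.gz reads/sample_B_R2.fastq.gz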

Please let me know if you have further questions.
Best wishes,
Francisco

hongzhonglu commented on September 3, 2024

Hi Francisco,
Thanks a lot for your kind help! For now I just want to use metabat2 for binning, since this is my first time running a metagenome analysis, so I am starting with the simple things. I found the steps to generate the depth.txt file in your nice pipeline, and I will study how to run it.

Best regards,
Hongzhong

franciscozorrilla commented on September 3, 2024

Hi Hongzhong,

In that case I recommend looking at the metabat rule on line 512 of the Snakefile.
Note that its output is currently commented out, since this is a "backup"/alternative way of running metabat2.
You will need to uncomment the output of the metabat rule on line 518 so that it looks like this:

directory(f'{config["path"]["root"]}/{config["folder"]["metabat"]}/{{IDs}}/{{IDs}}.metabat-bins')

and then comment out the output of the main metabat rule, metabatCross, on line 581 so that it looks like this:

#directory(f'{config["path"]["root"]}/{config["folder"]["metabat"]}/{{IDs}}/{{IDs}}.metabat-bins')

You need to do this so that Snakemake knows exactly which rule to execute to generate your desired files.
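
To double-check which outputs are currently active before submitting, you can grep the Snakefile (this assumes both output lines contain the metabat-bins string, as shown above):

grep -n "metabat-bins" Snakefile
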
Once you have made sure that only your desired metabat2 rule has an uncommented output, you can submit metabat2 jobs to the cluster using:

bash metaGEM.sh -t metabat -j N_JOBS -c N_CORES -m MEMORY -h RUN_TIME
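
For example, mirroring the crossMap submission above, you could submit 2 metabat jobs, each with 24 cores + 120 GB RAM and a 24 hour max runtime:

bash metaGEM.sh -t metabat -j 2 -c 24 -m 120 -h 24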

I have also recently expanded the metaGEM wiki, so please check it out if you want to learn more about usage and implementation of metaGEM.

Also, just so you know: in my personal experience, CONCOCT tends to outperform maxbin2 and metabat2 in most cases. As a reference, you can look at Supplementary Figure 2 of the metaGEM paper:

[screenshot of Supplementary Figure 2 from the metaGEM paper]

Hope this helps and let me know if you have any other questions.
Best wishes,
Francisco

hongzhonglu commented on September 3, 2024

Hi Francisco,
Thanks so much! This is a very good reference for me to study.

Best regards,
Hongzhong

franciscozorrilla commented on September 3, 2024

Closing this due to inactivity but please reopen/comment if you have further questions.
