Hi Hongzhong,
There are a few ways to do this, and the best one depends on what you are trying to do.
Two important questions to ask yourself are:
- Do you just want to use metabat2 for binning, or are you using all 3 binners + metaWRAP? Using all 3 binners + metaWRAP gives the best performance.
- Are you cross-mapping each set of short reads to each assembly? Cross-mapping gives the best performance.
You can look at the tutorial/demo to get an idea of how to use the metaGEM.sh parser to interface with the Snakefile for job submissions; that section in particular shows the cross-mapping. I just uncommented 3 lines of the Snakefile in my latest commit, so you should now be able to submit crossMap jobs to generate depth files, just as described in the tutorial. For example, to submit 2 jobs with 24 cores + 120 GB RAM and a 24 hour max runtime:
bash metaGEM.sh -t crossMap -j 2 -c 24 -m 120 -h 24
Note that by default this will run the Snakefile rule crossMap, which submits one job per sample. Within each of these jobs, a for loop maps each set of paired-end reads in your dataset to the focal sample's assembly. These mapping files are then used to generate the coverage inputs for CONCOCT, MetaBAT2, and MaxBin2.
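The per-job loop described above can be sketched as follows. This is a hypothetical illustration of the job layout only, not metaGEM's actual Snakefile code; the sample IDs and the `{sample}.contigs.fa` assembly naming are assumptions.

```python
# Hypothetical sketch of the crossMap job layout: one job per sample, and
# inside each job a loop mapping every sample's paired-end reads to the
# focal sample's assembly. Names are illustrative, not metaGEM's.

def crossmap_jobs(sample_ids):
    """Return {focal_sample: [(assembly, reads_id), ...]} for every job."""
    jobs = {}
    for focal in sample_ids:
        assembly = f"{focal}.contigs.fa"  # hypothetical assembly file name
        # One mapping per read set, including the focal sample's own reads
        jobs[focal] = [(assembly, reads) for reads in sample_ids]
    return jobs

samples = ["S1", "S2", "S3"]
jobs = crossmap_jobs(samples)
print(len(jobs))                           # 3 jobs (one per sample)
print(sum(len(m) for m in jobs.values()))  # 9 mappings in total (N * N)
```

With N samples this gives N jobs, each performing N mappings, which is why the cost grows quickly with dataset size.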
As a note of caution, this approach works well for small-to-medium-sized datasets (up to roughly 150 medium-sized samples), but may become impractical for large datasets in terms of both runtime and computational load. This is because each job needs to generate N sorted BAM files to create the CONCOCT coverage table, where N is the number of samples. For a dataset of 300 samples with BAM files of ~10 GB each, you would need around 3 TB of temporary storage per job, and up to ~900 TB if you ran all jobs in parallel.
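The storage arithmetic above can be written as a quick back-of-the-envelope helper. This is a rough sketch assuming every BAM file has the same size and 1 TB = 1000 GB; the function name is hypothetical.

```python
def crossmap_storage_tb(n_samples, bam_size_gb, parallel_jobs=1):
    """Rough temporary storage estimate (TB) for crossMap.

    Each job keeps one sorted BAM per sample, so a single job needs
    n_samples * bam_size_gb; running several jobs at once multiplies that.
    Uses 1 TB = 1000 GB to match the rough numbers in the text.
    """
    per_job_gb = n_samples * bam_size_gb
    return per_job_gb * parallel_jobs / 1000

print(crossmap_storage_tb(300, 10))                     # 3.0 TB for one job
print(crossmap_storage_tb(300, 10, parallel_jobs=300))  # 900.0 TB for all jobs
```

Because running all N jobs at once needs roughly N × N × (BAM size), doubling the number of samples roughly quadruples the peak storage.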
In the metaGEM manuscript we processed the Tara Oceans dataset, which was quite large (~246 samples). For larger datasets like this we recommend running a slightly modified workflow in which each individual mapping operation is submitted as its own job and mapped using kallisto. I am now working on adding support for this alternative branch of the workflow to the metaGEM.sh parser (issue #22).
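A minimal sketch of how splitting the work changes the job layout, assuming a hypothetical job descriptor with `assembly` and `reads` fields. It does not call kallisto and does not reflect how metaGEM will implement this branch (see issue #22); it only shows that the same N × N mappings become N × N small jobs, so each job handles a single mapping instead of N.

```python
from itertools import product

def per_mapping_jobs(sample_ids):
    """One job per (assembly, reads) pair instead of one job per assembly."""
    return [
        {"assembly": a, "reads": r}  # hypothetical job descriptor
        for a, r in product(sample_ids, repeat=2)
    ]

jobs = per_mapping_jobs(["S1", "S2", "S3"])
print(len(jobs))  # 9 independent jobs
print(jobs[1])    # {'assembly': 'S1', 'reads': 'S2'}
```

Under this layout the temporary output held by any single job no longer grows with the number of samples, at the cost of many more (but smaller) scheduler submissions.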
Please let me know if you have further questions.
Best wishes,
Francisco
Hi Francisco,
Thanks a lot for your kind help! For now I just want to use metabat2 for binning, since this is my first time running a metagenome analysis and I am starting with simple things. I found the steps to generate the depth.txt file in your nice pipeline, and I will study how to run it.
Best regards,
Hongzhong
Hi Hongzhong,
In that case I recommend looking at the metabat rule on line 512 of the Snakefile.
Note that its output is currently commented out, since this is a "backup"/alternative version of running metabat2.
You will need to uncomment the output of the metabat rule on line 518 so that it looks like this:
directory(f'{config["path"]["root"]}/{config["folder"]["metabat"]}/{{IDs}}/{{IDs}}.metabat-bins')
and then comment out the output of the main metabat rule, metabatCross, on line 581 so that it looks like this:
#directory(f'{config["path"]["root"]}/{config["folder"]["metabat"]}/{{IDs}}/{{IDs}}.metabat-bins')
You need to do this so that Snakemake knows exactly which rule to execute to generate your desired files.
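To see what that output line resolves to, and why two rules declaring it is a problem, here is a small standalone Python sketch (Snakefiles are Python-based). The config values are hypothetical stand-ins for the ones metaGEM reads from its config file, `directory()` is omitted so the snippet runs without Snakemake, and `rules_sharing_output` is an illustrative helper, not part of Snakemake.

```python
# Hypothetical config values standing in for metaGEM's real configuration.
config = {
    "path": {"root": "/scratch/metaGEM"},
    "folder": {"metabat": "metabat"},
}

# Inside an f-string, {{IDs}} is an escaped brace pair, so the result keeps a
# literal {IDs}, which Snakemake then treats as a wildcard.
pattern = f'{config["path"]["root"]}/{config["folder"]["metabat"]}/{{IDs}}/{{IDs}}.metabat-bins'
print(pattern)  # /scratch/metaGEM/metabat/{IDs}/{IDs}.metabat-bins

def rules_sharing_output(rule_outputs):
    """Return {output_pattern: [rules]} for patterns declared by >1 rule."""
    seen = {}
    for rule, out in rule_outputs.items():
        seen.setdefault(out, []).append(rule)
    return {out: rules for out, rules in seen.items() if len(rules) > 1}

# Both rules uncommented: the same output is claimed twice.
print(rules_sharing_output({"metabat": pattern, "metabatCross": pattern}))
```

When the check returns an empty dict, only one rule claims that output, which is the state the steps above aim for.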
After making sure that only your desired metabat2 rule has an uncommented output, you can submit metabat2 jobs to the cluster using:
bash metaGEM.sh -t metabat -j N_JOBS -c N_CORES -m MEMORY -h RUN_TIME
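If you prepare submissions from a script, a small helper can assemble the command from the same flags used in this thread (-t task, -j number of jobs, -c cores, -m memory in GB, -h max runtime in hours). The helper name `metagem_cmd` is hypothetical, and it only builds the command string; it does not submit anything.

```python
import shlex

def metagem_cmd(task, jobs, cores, mem_gb, hours):
    """Build a metaGEM.sh submission command from the flags shown above."""
    args = ["bash", "metaGEM.sh",
            "-t", task,         # task / Snakefile rule to run
            "-j", str(jobs),    # number of jobs to submit
            "-c", str(cores),   # cores
            "-m", str(mem_gb),  # memory (GB)
            "-h", str(hours)]   # max runtime (hours)
    return shlex.join(args)     # shell-safe quoting (Python 3.8+)

print(metagem_cmd("crossMap", 2, 24, 120, 24))
# bash metaGEM.sh -t crossMap -j 2 -c 24 -m 120 -h 24
```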
I have also recently expanded the metaGEM wiki, so please check it out if you want to learn more about the usage and implementation of metaGEM.
Also, just so you know, from personal experience I have found that CONCOCT tends to outperform MaxBin2 and MetaBAT2 in most cases. For reference, see Supplementary Figure 2 of the metaGEM paper.
Hope this helps and let me know if you have any other questions.
Best wishes,
Francisco
Hi Francisco,
Thanks so much! This is a very good reference for me to study.
Best regards,
Hongzhong
Closing this due to inactivity but please reopen/comment if you have further questions.