Comments (6)
Ryan, please send the reads to voutcn at gmail dot com. I will try it.
On Jul 3, 2015, at 07:05, Ryan Wick [email protected] wrote:
Thank you for adding FASTG support to MEGAHIT! I gave it a try on a simple dataset (synthetic reads from a plasmid sequence), and used Bandage to compare the results to Velvet at the same k-mer.
Here's the 61mer Velvet graph:
And here's the 61mer MEGAHIT graph:
You can see that while they have the same overall structure, some contigs in the MEGAHIT graph are broken into pieces, resulting in 'dead ends'. It is most notable for one of the long contigs on the left side of the images. This results in contigs that are not as long as they could be.
The data is fairly ideal (error-free reads and decent read depth), so I'm not sure why this would be happening. I could email you the reads if you wanted to try it for yourselves.
from megahit.
Hi Ryan,
Here is the graph of MEGAHIT with parameter --k-list 61, which looks the same as that of Velvet:
Some contigs in MEGAHIT's graph with default parameters are broken because the genome has some closely spaced regions that share very similar patterns, which are merged at k=21. For example, the long contig on the left is broken around the following region:
...TTCTGCCGCCATTG_A_AGCAAATGCTTTTATACAAAAGGCACTTTTCTGCCGCCATTG_G_AGCAAATGCTTTTATACAAAAGGCACTTTTTTGCTCTCCA...
The two similar patterns are very close together and differ by only one base (marked _A_ and _G_ above). A small k “collapses” this region, and one of them is removed because they form a perfect bubble.
Therefore we recommend using a larger starting k when the sequencing depth is high. But because MEGAHIT targets metagenomes, where many species are sequenced at low depth, we use a rather small k=21 by default.
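To make the collapse concrete, here is a minimal Python sketch that counts k-mers occurring more than once in the fragment quoted above (underscore markers removed). It is only an illustration: it looks at one strand of this short fragment and does not build a de Bruijn graph or reproduce MEGAHIT's bubble merging. The names `REGION` and `repeated_kmers` are made up for this example.

```python
from collections import Counter

# The fragment quoted above, with the two differing bases (A and G) kept
# as separate string pieces so they are easy to spot.
REGION = (
    "TTCTGCCGCCATTG" "A"
    "AGCAAATGCTTTTATACAAAAGGCACTTTTCTGCCGCCATTG" "G"
    "AGCAAATGCTTTTATACAAAAGGCACTTTTTTGCTCTCCA"
)


def repeated_kmers(seq, k):
    """Return the set of k-mers that occur more than once in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer for kmer, n in counts.items() if n > 1}


if __name__ == "__main__":
    for k in (21, 61):
        # A repeated k-mer is a single node shared by both copies of the
        # pattern, which is what lets the two copies fold onto each other.
        print(k, len(repeated_kmers(REGION, k)))
```

At k=21 the two copies share k-mers on either side of the differing base, so they can fold into one path with a small bubble at the mismatch. At k=61 no k-mer in this fragment repeats, so the two copies stay distinct.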
from megahit.
I agree. MEGAHIT is designed for metagenomes. But in my applications I need a "good" (but not perfect) genome assembly and I need it "fast". I never use k=21 as the lower limit. For 100x Illumina >100bp PE I would use --k-min 31 or maybe 41 and go as high as 121. I would also set --min-count to >2, usually something like >10.
@voutcn Are there any other parameters you think we should change when assembling a single bacterium (possibly with a multi-copy, higher-coverage plasmid)?
from megahit.
I should probably also use the --no-mercy option?
Mercy k-mer
This is specially designed for metagenomics assembly to recover low coverage sequence. For generic dataset >= 30x, MEGAHIT may generate better results with --no-mercy option.
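For reference, the settings discussed in this thread could be assembled into a command line roughly as follows. This is a sketch under assumptions: the option spellings (-1, -2, -o, --k-min, --k-max, --min-count, --no-mercy) are taken from MEGAHIT's help text as I understand it and may differ between versions, the read file names and output directory are placeholders, and the function `build_megahit_cmd` is hypothetical. The script only prints the command and does not run it.

```python
import shlex


def build_megahit_cmd(reads_1, reads_2, out_dir,
                      k_min=31, k_max=121, min_count=10, no_mercy=True):
    """Build (but do not run) a MEGAHIT command for a high-depth isolate.

    Default values follow the suggestions in this thread and are
    illustrative only; check `megahit --help` for your installed version.
    """
    cmd = [
        "megahit",
        "-1", reads_1,            # placeholder paired-end read files
        "-2", reads_2,
        "-o", out_dir,            # placeholder output directory
        "--k-min", str(k_min),    # larger starting k for high depth
        "--k-max", str(k_max),
        "--min-count", str(min_count),
    ]
    if no_mercy:
        cmd.append("--no-mercy")  # suggested above for datasets >= 30x
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_megahit_cmd("reads_1.fq.gz", "reads_2.fq.gz", "out")))
```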
from megahit.
@tseemann Yes I suggest so.
from megahit.
Okay, I think I understand this better now. I had previously assumed that the graph from a 61mer only assembly would be identical to the 61mer graph from an assembly that used a range of k-mers from 21 to 99. However, it seems that the min k-mer also affects the larger k-mer graphs, due to bubble collapsing. When I tried the same assembly (k=21 to 99) using the --no-bubble option, the graph had no broken contigs.
Thank you for your help!
from megahit.