
Comments (6)

voutcn commented on July 18, 2024

Ryan, please send the reads to voutcn at gmail dot com. I will try it.

On Jul 3, 2015, at 07:05, Ryan Wick wrote:

Thank you for adding FASTG support to MEGAHIT! I gave it a try on a simple dataset (synthetic reads from a plasmid sequence), and used Bandage to compare the results to Velvet at the same k-mer size.

Here's the 61mer Velvet graph:

And here's the 61mer MEGAHIT graph:

You can see that while they have the same overall structure, some contigs in the MEGAHIT graph are broken into pieces, resulting in 'dead ends'. It is most notable for one of the long contigs on the left side of the images. This results in contigs that are not as long as they could be.

The data is fairly ideal (error-free reads and decent read depth), so I'm not sure why this would be happening. I could email you the reads if you wanted to try it for yourselves.



from megahit.

voutcn commented on July 18, 2024

Hi Ryan,

Here is the graph of MEGAHIT with parameter --k-list 61, which looks the same as that of Velvet:

[Image: MEGAHIT graph with --k-list 61]

Some contigs in MEGAHIT's graph with default parameters are broken because the genome has some nearby regions that share very similar patterns, which are merged at k=21. For example, the long contig on the left is broken around the following region:

...TTCTGCCGCCATTG_A_AGCAAATGCTTTTATACAAAAGGCACTTTTCTGCCGCCATTG_G_AGCAAATGCTTTTATACAAAAGGCACTTTTTTGCTCTCCA...

The two similar patterns (the differing base is marked with underscores) are very close together and differ by only one base. A small k "collapses" this region, and one of the patterns is removed because together they form a perfect bubble.
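To make the collapse concrete, here is a minimal Python sketch that counts how many k-mers occur more than once in the region quoted above. It is only an illustration of why a small k merges the two copies, not MEGAHIT's actual graph construction or bubble-removal code; the helper name `repeated_kmers` is made up for this example, and reverse complements are ignored for simplicity.

```python
from collections import Counter

# The region quoted above, with the underscore markers removed.
REGION = (
    "TTCTGCCGCCATTGA"
    "AGCAAATGCTTTTATACAAAAGGCACTTTTCTGCCGCCATTGG"
    "AGCAAATGCTTTTATACAAAAGGCACTTTTTTGCTCTCCA"
)


def repeated_kmers(seq, k):
    """Return the sorted k-mers that occur more than once in seq (forward strand only)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sorted(kmer for kmer, n in counts.items() if n > 1)


if __name__ == "__main__":
    for k in (21, 61):
        shared = repeated_kmers(REGION, k)
        # A repeated k-mer is a single node in a de Bruijn graph, so both
        # copies of the pattern pass through it and their paths merge there.
        print(f"k={k}: {len(shared)} k-mer(s) shared between the two copies")
```

At k=21 some k-mers are shared by both copies, so the two copies run through the same nodes and the single differing base shows up as a bubble. At k=61 no k-mer in this region repeats, so the two copies stay on separate paths.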

Therefore we recommend using a larger starting k when the sequencing depth is high. But because MEGAHIT targets metagenomes, where many species are sequenced at low depth, we use a rather small k=21 by default.


tseemann commented on July 18, 2024

I agree. MEGAHIT is designed for metagenomes. But in my applications I need a "good" (but not perfect) genome assembly and I need it "fast". I never use k=21 as the lower limit. For 100x Illumina >100 bp PE reads I would use --min-k 31 or maybe 41 and go as high as 121. I would also set --min-count to >2, usually something like >10.

@voutcn Are there any other parameters you think we should change when assembling a single bacterium (possibly with a multi-copy, higher-coverage plasmid)?


tseemann commented on July 18, 2024

I should probably also use the --no-mercy option?

Mercy k-mer
This is specially designed for metagenomics assembly to recover low coverage sequence. For generic dataset >= 30x, MEGAHIT may generate better results with --no-mercy option.


voutcn commented on July 18, 2024

@tseemann Yes, I suggest so.
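For readers who want to try the settings discussed in this thread on a single high-depth isolate, the following Python sketch assembles one possible command line. It only builds and prints the command; it does not run MEGAHIT. The `--k-list`, `--min-count`, and `--no-mercy` options are the ones mentioned in this thread. The `-1`, `-2`, and `-o` options, the file names, the step of 10 between k values, and the default `min_count=3` are assumptions made for illustration, so check `megahit --help` for your installed version before relying on them.

```python
def k_list(k_min, k_max, step):
    """Comma-separated k values from k_min up to k_max (inclusive where the step lands on it)."""
    return ",".join(str(k) for k in range(k_min, k_max + 1, step))


def megahit_command(reads_1, reads_2, out_dir,
                    k_min=31, k_max=121, step=10, min_count=3):
    """Build (but do not run) a MEGAHIT command for a high-depth isolate.

    All defaults are illustrative choices based on the discussion above,
    not recommendations from the MEGAHIT authors.
    """
    return [
        "megahit",
        "-1", reads_1,              # assumed option for the first paired-end file
        "-2", reads_2,              # assumed option for the second paired-end file
        "-o", out_dir,              # assumed option for the output directory
        "--k-list", k_list(k_min, k_max, step),
        "--min-count", str(min_count),
        "--no-mercy",
    ]


if __name__ == "__main__":
    # Hypothetical input and output names.
    cmd = megahit_command("reads_1.fq.gz", "reads_2.fq.gz", "isolate_assembly")
    print(" ".join(cmd))
```

Starting from an odd k and stepping by an even number keeps every value in the list odd. Passing the list to `subprocess.run` would execute it, once the options have been confirmed against the installed version.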


rrwick commented on July 18, 2024

Okay, I think I understand this better now. I had previously assumed that the graph from a 61-mer-only assembly would be identical to the 61-mer graph from an assembly that used a range of k-mer sizes from 21 to 99. However, it seems that the minimum k-mer size also affects the larger k-mer graphs, due to bubble collapsing. When I tried the same assembly (k=21 to 99) using the --no-bubble option, the graph had no broken contigs.

Thank you for your help!
