
Comments (2)

KamilSJaron commented on August 26, 2024

Very very cool. I am glad you have such a beautiful sequencing run of diatoms!

It really is beautiful, it can't get much better than this - all your peaks are very clearly separated and there seems to be very little background noise. That makes me quite confident that your true 1n coverage is indeed ~100x. Neither GenomeScope nor Smudgeplot gave you any indication that it could be half of that, so this is something you can be fairly sure about. Let's proceed assuming you got the coverage right.

There are 4 clear peaks in GenomeScope - the 1n and 2n peaks are really large, the 3n and 4n peaks are rather small. Usually, "the ploidy peak" is the highest one, but there are exceptions to this rule - highly heterozygous species have a higher 1n peak, and degenerated tetraploids (usually allotetraploids) can have their 4n peak quite a bit smaller than the 1n and 2n peaks. However, just considering GenomeScope, my naive guess would be diploid with some duplications.
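As a rough illustration of how the peaks relate to each other, here is a minimal Python sketch that takes the ~100x 1n coverage mentioned above and computes where the 1n to 4n peaks are expected to sit, then assigns a k-mer coverage value to the nearest peak. The function names `expected_peaks` and `nearest_peak` are made up for this example and are not part of GenomeScope; the real tool fits a statistical model to the whole k-mer spectrum rather than rounding.

```python
def expected_peaks(haploid_cov, max_copies=4):
    """Expected k-mer coverage of the 1n..max_copies*n peaks.

    Assumes each peak sits at an integer multiple of the 1n coverage.
    """
    return {f"{n}n": n * haploid_cov for n in range(1, max_copies + 1)}


def nearest_peak(kmer_cov, haploid_cov, max_copies=4):
    """Label a k-mer coverage with the closest peak (simple rounding, clamped)."""
    n = round(kmer_cov / haploid_cov)
    n = max(1, min(max_copies, n))
    return f"{n}n"


if __name__ == "__main__":
    # 1n coverage of ~100x, as estimated in this thread
    print(expected_peaks(100))
    print(nearest_peak(210, 100))
```

With a 1n coverage of 100x, a k-mer seen about 210 times would fall into the 2n peak under this simplification.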

I really like to look at Smudgeplot in these cases, it tells us something about the relationship between the individual GenomeScope peaks - are there any similar sequences in any other peaks? Which ones? In your case it seems that the majority of similar sequences are pairing 1n - 1n peaks (expected if the species is diploid). Furthermore, you have a big array of smudges that are weakly pairing all possible peaks (nicely showing up on log scale) - this makes me think it's probably not a single event that generated them (the pattern would be less gradual), so my best bet would be that this is just a genome with loads of duplication (perhaps some lazy transposons?).
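To make the "pairing of peaks" idea concrete, here is a small Python sketch of how a pair of similar k-mers could be assigned to a smudge from their two coverages. It is a simplification for illustration only, not Smudgeplot's actual implementation: the function `smudge_label` is hypothetical, and it assumes the 1n coverage is already known and that each k-mer in the pair sits on an integer number of genome copies.

```python
def smudge_label(cov_a, cov_b, haploid_cov):
    """Assign a k-mer pair to a smudge such as 'AB', 'AAB' or 'AABB'.

    cov_a, cov_b: coverages of the two similar k-mers.
    haploid_cov:  assumed 1n coverage (about 100x in this thread).
    """
    minor, major = sorted((cov_a, cov_b))
    total = minor + major
    # Total copy number of the pair: at least 2, since there are two k-mers.
    n_total = max(2, round(total / haploid_cov))
    # Copies carrying the rarer k-mer: at least 1, at most half of the total.
    n_minor = max(1, min(n_total // 2, round(minor / haploid_cov)))
    return "A" * (n_total - n_minor) + "B" * n_minor


if __name__ == "__main__":
    print(smudge_label(100, 100, 100))  # both k-mers in the 1n peak
    print(smudge_label(200, 100, 100))  # 2n peak paired with 1n peak
```

A pair where both k-mers sit in the 1n peak lands in the AB smudge, which is the dominant pattern expected for a diploid; pairs linking higher peaks (AAB, AABB, AAAB and so on) correspond to the weaker smudges discussed above.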

I think you have a great idea about what to expect from the genome now. I would try to assemble it and run KAT or Merqury. I am sure it will all make sense, and if it is a tetraploid in the end (my intuition can be wrong), you should not have problems assembling the homoeologous sequences separately because they would be substantially diverged.

Hope this helps :-)

P.S. One of my pandemic projects was to make a shirt with Haeckel's illustration of diatoms
image

from smudgeplot.

sarahfrail commented on August 26, 2024

Hi Kamil,

Thanks so much for your detailed response! We've spent the last several months wrapping our heads around the data in lots of different ways.

We're fairly sure that our Merqury results are consistent with a diploid, and now we think we must just have some interesting transposon-related duplication like you suggested. Soon, I will try to make some phylogenies from the repeats to try to get at this question. We may yet try to perform a haplotype-resolved assembly, but we are somewhat limited by sequencing depth and read length, so we've yet to see if this is viable. We also now think our genome is closer to 80% repetitive, which introduces further complications. These organisms are so fun and interesting, but sometimes difficult to work with!

The shirt is beautiful! Thank you for sharing. The diatoms really are living art pieces.

