Comments (3)

KamilSJaron commented on August 26, 2024

Hi M.,

10x is not the best for making smudgeplots; the coverage variation is a bit higher than in other types of libraries (surprisingly, even PacBio HiFi is better for kmer spectra analyses). So I think your analysis will be a bit more sensitive to the L parameter you choose. What did you use? And can you also show us a genomescope plot? That helps a lot in deciding on a meaningful L.

I am a bit puzzled by the 1n coverage estimate; I am not 100% sure how smudgeplot ended up on that number... (the labels do not overlap well with the smudges). But I also suspect your L was too low, so the bottom part of the smudgeplot is simply error kmers paired with unique genomic kmers, and the upper smudge is the real diploid one (which would also put the 1n coverage estimate a bit closer to the expected 30x, assuming that the 60x is what people call genome coverage, not per-haplotype coverage).

mason-linscott commented on August 26, 2024

Hi Kamil,

Thank you for looking into this! Here is the genomescope plot made using 31-mers (a high K was chosen because of genome repetitiveness). I was not sure whether to also apply a higher K for smudgeplot, so I just followed the guide. I hope it helps with your interpretation.

[Image: G3_sup_fig_1_genomscope]

L was set to 15 and U to 2700 based on the output of smudgeplot.py cutoff. I can raise L on the next smudgepairs run (it takes a day and a half).

Thanks again,
Mason

KamilSJaron commented on August 26, 2024

Hi Mason,

That is a giant genome. That must be a huge dataset you are dealing with.

So, first things first. Your sequencing is not the cleanest (which is understandable for giant mollusc genomes): the error peak and the genome peak overlap. The problem is that you don't have a clean cut between errors and real genomic kmers, which is where we want L to be.
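To make the idea of a "clean cut" concrete, here is a minimal sketch of one way to look for it: scan a kmer histogram for the first local minimum between the error peak and the first genomic peak. The helper `first_valley` and the histogram numbers are made up for illustration; this is not what `smudgeplot.py cutoff` does, and with overlapping peaks like the ones here the valley may be shallow or absent.

```python
def first_valley(hist):
    """Return the coverage at the first local minimum of a kmer histogram.

    `hist` is a list of (coverage, count) pairs sorted by coverage.
    Returns None if the counts never start rising again.
    """
    for i in range(1, len(hist) - 1):
        prev_count = hist[i - 1][1]
        count = hist[i][1]
        next_count = hist[i + 1][1]
        # A valley: no higher than the previous bin, strictly lower than the next.
        if count <= prev_count and count < next_count:
            return hist[i][0]
    return None


# Toy histogram (invented numbers): error kmers decay, then a genomic peak near 18x.
toy_hist = [(1, 9000), (2, 4000), (3, 1800), (4, 900), (5, 600),
            (6, 500), (7, 520), (8, 600), (10, 800), (14, 1200),
            (18, 1500), (22, 1100), (30, 400)]

candidate_L = first_valley(toy_hist)
print("candidate L:", candidate_L)
```

A value found this way is only a starting point; it still needs to be checked by eye against the genomescope plot.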

The good news is that the genomescope looks alright (both the heterozygous and homozygous peaks are more or less visible, and the model explains the data relatively well). Also, you have very little evidence of polyploidy (assuming your 1n coverage fit is right), because polyploids usually have some of the higher-coverage peaks relatively high as well. So the only thing I would suggest is to redo the smudgeplot PLOT with a coverage prior from the genomescope model. You can run it as:

smudgeplot.py plot 21kmc_L15_U2700_coverages.tsv -n 18 -o "Land_snail" -t "O. idahoensis" 

I suspect you will get the peaks annotated correctly this time (as AB and AABB). I also suspect the proportion of AB vs AABB kmers will be a bit different, quite possibly favouring AABB. In practice, that means that kmers in the 2n peak very often have a very similar kmer within the same 2n peak. It is a puzzling signature for a genome with high heterozygosity, but perhaps we should not be bothered by it too much, given how much trouble we had separating the 1n and error peaks. Perhaps just keep in mind for downstream analyses that you might have quite a lot of paralogy in your genome (and keep even deeper in your mind that polyploidy was not conclusively rejected in this analysis).
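As a rough illustration of why the 1n coverage prior matters for the labels, the sketch below computes where a kmer pair of a given genotype would be expected to sit, assuming each genome copy contributes the 1n coverage and taking the `-n 18` value from the command above. The function `expected_smudge` is a hypothetical helper for this thread, not part of smudgeplot.

```python
def expected_smudge(genotype, n):
    """Expected (minor coverage fraction, total pair coverage) for a genotype.

    Assumes each letter is one genome copy sequenced at 1n coverage `n`,
    and that the less frequent letter is the minor kmer of the pair.
    """
    copies = len(genotype)
    minor = min(genotype.count("A"), genotype.count("B"))
    return minor / copies, copies * n


# With a 1n coverage of 18 (assumed from the genomescope model fit):
for g in ("AB", "AAB", "AABB"):
    frac, total = expected_smudge(g, 18)
    print(g, round(frac, 2), total)
```

AB and AABB share the same minor coverage fraction and differ only in total pair coverage, so a 1n estimate that is off by a factor of two can shift a smudge from one label to the other.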
