Comments (7)

fritzsedlazeck commented on August 12, 2024

Hi,
thanks for pointing this out.
The -s parameter defines the minimum number of reads that must support an event before it is called. Thus, -s 2 should give the same results as -s 5.

I usually use -s 10 to get a robust set of SV calls. Apart from that, everything looks good. Would you mind sharing the BAM file so that I can take a look and also check whether there is something going on with NGMLR?

The -n parameter might also interest you. It gives you the names of the reads that support the deletion. I often use this to check strange SVs in IGV.

Thanks,
Fritz

from sniffles.

eldariont commented on August 12, 2024

Hi,
sure, here are:

Thanks for the hint about the -n parameter! It turns out that (at least for one big detected deletion I have looked at) there are reads that map to both sides of the SV. That would be fine if the two mapping locations weren't so far apart. But maybe you can find out more with the files :)

Cheers, David

fritzsedlazeck commented on August 12, 2024

Hi David,
yes, I was rather expecting an alignment artifact. NGMLR improves things a lot, but we still sometimes see noisy events... That's why I keep -s at 10 (the default) most of the time...

Thanks for the files.
Fritz

eldariont commented on August 12, 2024

Hi Fritz,

I would like to hear your opinion on the following scenario: Let's say there is an intra-chromosomal duplication (e.g. a LINE) that copied the sequence of chr1:44,000,000-44,007,000 to another place in that same chromosome (e.g. at chr1:100,000). And let's say there is a read covering that insertion location such that the two read tails map left and right of the insertion location but the middle part maps to the source region of the duplication. Would Sniffles call a deletion between chr1:100,000-44,000,000?

I'm asking because I think this might be behind some of the very large deletion calls described above. I'm seeing this not only in simulated data: on the NA12878 PacBio data, Sniffles (with -s 10) reports 175 deletions >1 Mb, the largest being 78 Mb in size.
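For anyone who wants to reproduce a count like this on their own calls, here is a minimal Python sketch that lists deletions above a size cutoff. It assumes a plain-text VCF whose INFO column carries the standard `SVTYPE` key and either `SVLEN` or `END`; the function names `parse_info` and `large_deletions` and the 1 Mb default are illustrative, not part of Sniffles.

```python
def parse_info(info):
    """Split a VCF INFO column into a dict (flag entries get an empty value)."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def large_deletions(vcf_lines, min_size=1_000_000):
    """Return (chrom, pos, size) for every DEL record larger than min_size.

    Assumes tab-separated VCF records with INFO in column 8.
    """
    hits = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])
        if info.get("SVTYPE") != "DEL":
            continue
        if info.get("SVLEN"):
            # SVLEN is typically negative for deletions; use its magnitude.
            size = abs(int(info["SVLEN"].split(",")[0]))
        elif info.get("END"):
            size = int(info["END"]) - int(cols[1])
        else:
            continue  # no size information available
        if size > min_size:
            hits.append((cols[0], int(cols[1]), size))
    return hits


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "chr1\t100000\t1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=44000000;SVLEN=-43900000",
    "chr1\t500\t2\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=900;SVLEN=-400",
    "chr2\t1000\t3\tN\t<INV>\t.\tPASS\tSVTYPE=INV;END=5000000",
]
print(large_deletions(example))
```

With a real file, the same function can be fed an open file handle, e.g. `large_deletions(open("calls.vcf"))`, where `calls.vcf` is a placeholder name.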

Cheers
David

fritzsedlazeck commented on August 12, 2024

Hi David,
thanks for reaching out. Yes, that is indeed a problem. I looked into this for a long time and did not find a satisfying solution. Very large DEL or INV calls can indicate exactly this.

The problem is that there is no difference in the signal for me apart from the length. Thus, I did not want to set an arbitrary threshold.

I hope that helps
Fritz

eldariont commented on August 12, 2024

Thanks for the quick reply, Fritz.
It is indeed hard to distinguish the two from the signal unless you look at all segments of a read at once. But even then, there might be cases that are very hard to tell apart.
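To make the "all segments at once" idea concrete, here is a rough Python sketch, not anything Sniffles does. It assumes each read's alignments have already been reduced to (read_start, read_end, chrom, ref_start, ref_end) tuples; the `Segment` type, the function name and the 1 kb tolerance are hypothetical. It ignores strand and only catches the clean case where both read tails flank a single junction.

```python
from collections import namedtuple

# Hypothetical, simplified view of one aligned piece of a read.
Segment = namedtuple("Segment", "read_start read_end chrom ref_start ref_end")


def looks_like_duplicated_insertion(segments, max_flank_gap=1000):
    """Heuristic: True if the read tails map next to each other on the
    reference while a middle piece maps far away from that junction.

    A split with only two pieces (left tail, distant right part) stays
    consistent with a deletion and returns False.
    """
    segs = sorted(segments, key=lambda s: s.read_start)
    if len(segs) < 3:
        return False
    first, last = segs[0], segs[-1]
    if first.chrom != last.chrom:
        return False
    # The two tails must be (nearly) adjacent on the reference.
    if abs(last.ref_start - first.ref_end) > max_flank_gap:
        return False
    junction = first.ref_end
    for mid in segs[1:-1]:
        if mid.chrom != first.chrom:
            return True
        distance = min(abs(mid.ref_start - junction), abs(mid.ref_end - junction))
        if distance > max_flank_gap:
            return True
    return False


# The scenario from this thread: tails around chr1:100,000, middle piece
# mapping to the source region chr1:44,000,000-44,007,000.
read_with_insertion = [
    Segment(0, 1500, "chr1", 98_500, 100_000),
    Segment(1500, 8500, "chr1", 44_000_000, 44_007_000),
    Segment(8500, 10_000, "chr1", 100_000, 101_500),
]
# A two-piece split that is equally compatible with a real deletion.
read_two_pieces = [
    Segment(0, 1500, "chr1", 98_500, 100_000),
    Segment(1500, 3000, "chr1", 44_000_000, 44_001_500),
]
print(looks_like_duplicated_insertion(read_with_insertion))
print(looks_like_duplicated_insertion(read_two_pieces))
```

A read that ends inside the inserted copy yields only two pieces and cannot be separated from a deletion this way, which is one of the hard cases.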

Best
David

fritzsedlazeck commented on August 12, 2024

No problem.
I hope that helps.
Thanks
Fritz
