Comments (6)

kishorejaganathan commented on August 14, 2024

There are two differences between the current software's output and the one we used to generate the scores provided with the paper:

  • In the paper, for each variant, we looked for splicing changes up to 50 positions on either side of the variant. In the current version, we look for 500 positions on either side. In the CNGB1 example, notice that the 0.81 value was predicted to be 62 positions away from the variant (this was not caught earlier because we stopped looking at 50 positions).
  • In the data we provided, we also filtered out scores that correspond to an increase in the strength of a canonical site, and scores that correspond to a decrease in the strength of a non-canonical site, as these are not likely to cause rare genetic diseases.

We will document this in the README section soon.
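The second difference (the filter) can be illustrated with a minimal sketch. This is not the code used to produce the paper's data; the function name `mask_scores`, the list-based inputs, and the toy numbers are all assumptions made for illustration. It only shows the rule as stated above: gains at canonical (annotated) sites and losses at non-canonical positions are set to zero.

```python
def mask_scores(gain, loss, is_canonical):
    """Hypothetical helper illustrating the filter described in this thread.

    gain, loss   : per-position delta scores (floats), same length
    is_canonical : per-position booleans, True where the position is an
                   annotated (canonical) splice site
    """
    # A gain at a site that is already canonical only strengthens it: drop it.
    masked_gain = [0.0 if canon else g for g, canon in zip(gain, is_canonical)]
    # A loss at a position that is not a canonical site is dropped as well.
    masked_loss = [l if canon else 0.0 for l, canon in zip(loss, is_canonical)]
    return masked_gain, masked_loss


# Toy numbers, invented purely for illustration.
gain = [0.3, 0.1, 0.8]
loss = [0.5, 0.2, 0.0]
is_canonical = [True, False, False]
print(mask_scores(gain, loss, is_canonical))
# ([0.0, 0.1, 0.8], [0.5, 0.0, 0.0])
```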

from spliceai.

zhangtaowhu commented on August 14, 2024

kishorejaganathan, Thanks a lot for your kind help!!!

zhangtaowhu commented on August 14, 2024

Does this mean we might miss some information when dealing with deep intronic variants deeper than 500 bp? [In the current version, we look for 500 positions on either side.]

kishorejaganathan commented on August 14, 2024

What I mean by that statement is the following: a variant can alter splicing either at the position of the variant or at nearby positions. For example, if the variant affects the branch point, the position of the splicing change will be 20-50 nucleotides downstream. What we did earlier was to look for changes only within +/-50 positions of the variant, to alleviate computational issues. The software looks within +/-500 positions.

Even if the variant is deep in the intron, the position of the splicing change is not thousands of positions away from the variant (these variants usually create a new exon nearby). Most of the time it is <50 nt away, so the output of the software and the supplemental data should be consistent with each other. A few examples pop up where the distance is larger, but the number of examples that require looking >500 nt away should be extremely low.
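To make the effect of the search window concrete, here is a minimal sketch of a windowed maximum. It is not the SpliceAI implementation; the function name `max_delta_within` and the toy score array are assumptions. Only the 0.81 value and its distance of 62 positions come from the CNGB1 example mentioned earlier in this thread; every other number is invented.

```python
def max_delta_within(deltas, variant_index, window):
    """Return (largest delta score, offset from the variant), searching
    only positions within +/- `window` of the variant (hypothetical helper)."""
    lo = max(0, variant_index - window)
    hi = min(len(deltas), variant_index + window + 1)
    best = max(range(lo, hi), key=lambda i: deltas[i])
    return deltas[best], best - variant_index


# Toy per-position scores: the variant sits at index 500.
deltas = [0.0] * 1001
deltas[510] = 0.05   # invented small change 10 positions away
deltas[562] = 0.81   # the change 62 positions away, as in the CNGB1 example

print(max_delta_within(deltas, 500, 50))   # (0.05, 10)
print(max_delta_within(deltas, 500, 500))  # (0.81, 62)
```

With a +/-50 window the 0.81 change is never examined, which matches the discrepancy described above.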

kishorejaganathan commented on August 14, 2024

To answer your question more precisely: if you are considering the scenario where a deep intronic variant affects the closest splice site, then yes, we will miss those, because the splice site is too far away from the variant. We're not too worried about this, as we expect the number of such variants to be very low; deep intronic splice-altering variants end up creating new exons most of the time. In the file spliceai/utils.py (line 89), you can change

def get_delta_scores(record, ann, cov=1001):

to

def get_delta_scores(record, ann, cov=10001):

It will slow down the scoring a little, but that's alright if you don't have too many variants.
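As a rough guide to picking a value for cov, the sketch below assumes that cov counts the variant position plus an equal number of positions on each side. That reading is inferred from this thread (cov=1001 alongside "500 positions on either side") rather than taken from the SpliceAI source, and both helper names are hypothetical.

```python
def flank_from_cov(cov):
    """Positions searched on each side of the variant, assuming
    cov = 2 * flank + 1 (an inference from this thread)."""
    if cov < 1 or cov % 2 == 0:
        raise ValueError("cov is expected to be a positive odd number")
    return (cov - 1) // 2


def cov_from_flank(flank):
    """Inverse mapping: the cov value covering `flank` positions per side."""
    if flank < 0:
        raise ValueError("flank must be non-negative")
    return 2 * flank + 1


print(flank_from_cov(1001))   # 500
print(flank_from_cov(10001))  # 5000
print(cov_from_flank(50))     # 101
```

Under that assumption, the suggested change from 1001 to 10001 widens the search from 500 to 5000 positions on either side.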

zhangtaowhu commented on August 14, 2024

kishorejaganathan,
Sincere appreciation for your detailed explanation!
