I run the as spliceai -I input.vcf -O output.vcf -R hg37.genome.fa for my vcf f

Mismatching scores about spliceai HOT 6 CLOSED

illumina commented on August 14, 2024

Mismatching scores

from spliceai.

Comments (6)

kishorejaganathan commented on August 14, 2024

There are two differences between the current software's output and the one we used to generate the scores provided with the paper:

1. In the paper, for each variant, we looked for splicing changes up to 50 positions on either side of the variant. In the current version, we look for 500 positions on either side. In the CNGB1 example, notice that the 0.81 value was predicted to be 62 positions away from the variant (this was not caught earlier because we stopped looking at 50 positions).
2. In the data we provided, we also filter out scores which correspond to increase in the strength of a canonical site and decrease in the strength of a non-canonical site, as these are not likely to cause rare genetic diseases.

We will document this in the README section soon.
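
The filtering described in the second point can be sketched roughly as follows. This is a hypothetical illustration, not the code used to produce the data released with the paper: the function name `mask_scores`, its arguments, and the boolean `is_canonical` flag are all assumptions introduced here.

```python
def mask_scores(gain, loss, is_canonical):
    """Zero out the delta scores that the thread describes as filtered.

    gain / loss: delta scores for a gain or loss of splice-site strength
    at one position. is_canonical: whether that position is an annotated
    (canonical) splice site. All names here are hypothetical.
    """
    if is_canonical:
        # Strengthening an already-canonical site is filtered out.
        gain = 0.0
    else:
        # Weakening a site that is not canonical is filtered out.
        loss = 0.0
    return gain, loss


print(mask_scores(0.30, 0.70, is_canonical=True))   # (0.0, 0.7)
print(mask_scores(0.30, 0.70, is_canonical=False))  # (0.3, 0.0)
```

Under this sketch, only losses at canonical sites and gains at non-canonical sites survive, which matches the two categories the comment says are kept.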

zhangtaowhu commented on August 14, 2024

kishorejaganathan, Thanks a lot for your kind help!!!

zhangtaowhu commented on August 14, 2024

> There are two differences between the current software's output and the one we used to generate the scores provided with the paper:
>
> In the paper, for each variant, we looked for splicing changes up to 50 positions on either side of the variant. In the current version, we look for 500 positions on either side. In the CNGB1 example, notice that the 0.81 value was predicted to be 62 positions away from the variant (this was not caught earlier because we stopped looking at 50 positions).
>
> In the data we provided, we also filter out scores which correspond to increase in the strength of a canonical site and decrease in the strength of a non-canonical site, as these are not likely to cause rare genetic diseases.
>
> We will document this in the README section soon.

Does this mean we might miss some information when dealing with deep intronic variants that lie more than 500 bp into the intron? (I am referring to "In the current version, we look for 500 positions on either side.")

kishorejaganathan commented on August 14, 2024

What I mean by that statement is the following: a variant can alter splicing either at the position of the variant or at nearby positions. For example, if the variant affects the branch point, the position of the splicing change will be 20-50 nucleotides downstream. Earlier, we looked for changes only within +/-50 positions of the variant to alleviate computational issues. The current software looks within +/-500 positions.

Even if the variant is deep in the intron, the position of the splicing change is not thousands of positions away from the variant (these variants usually create a new exon nearby). Most of the time it is <50 nt away, so the output of the software and the supplemental data should be consistent with each other. A few examples pop up where the distance is larger, but the number of examples where the change is >500 nt away should be extremely low.
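
To illustrate why the window size changes the reported score, here is a minimal sketch. The function `max_delta_in_window` and the toy dictionary are assumptions made up for this example; only the 0.81 value at offset 62 comes from the CNGB1 case mentioned earlier in the thread.

```python
def max_delta_in_window(deltas, window):
    """Return (score, offset) of the largest delta within +/-window.

    deltas maps an offset relative to the variant (negative = upstream)
    to a delta score. Returns (0.0, None) if nothing falls in the window.
    """
    in_range = {off: s for off, s in deltas.items() if abs(off) <= window}
    if not in_range:
        return 0.0, None
    best = max(in_range, key=in_range.get)
    return in_range[best], best


# Toy data: weak changes near the variant, a strong one 62 positions away.
toy = {-3: 0.05, 10: 0.12, 62: 0.81}

print(max_delta_in_window(toy, 50))   # (0.12, 10)
print(max_delta_in_window(toy, 500))  # (0.81, 62)
```

With a +/-50 window the strong change at offset 62 is never inspected, so the reported maximum is much lower than with a +/-500 window.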

kishorejaganathan commented on August 14, 2024

To answer your question more precisely: if you are considering the scenario where a deep intronic variant affects the closest splice site, then yes, we will miss those cases because the splice site is too far from the variant. We are not too worried about this, as we expect the number of such variants to be very low; deep intronic splice-altering variants end up creating new exons most of the time. In the file spliceai/utils.py (line 89), you can change

def get_delta_scores(record, ann, cov=1001):

to

def get_delta_scores(record, ann, cov=10001):

It will slow down the scoring by a little bit, but that's alright if you don't have too many variants.
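
Taken together with the earlier statement that the software looks +/-500 positions, the default `cov=1001` suggests that `cov` counts the variant position plus the window on each side. That reading is an inference from this thread, not something confirmed against the source, and the helper names below are hypothetical.

```python
def cov_for_window(dist):
    """Positions covered by a +/-dist window, including the variant itself."""
    return 2 * dist + 1


def window_for_cov(cov):
    """Inverse of cov_for_window: the +/- distance implied by a cov value."""
    return cov // 2


print(cov_for_window(500))    # 1001
print(window_for_cov(10001))  # 5000
```

Under that assumption, the suggested `cov=10001` would widen the search to +/-5000 positions around the variant.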

zhangtaowhu commented on August 14, 2024

kishorejaganathan,
Sincere appreciation for your detailed explanation!
