Comments (6)
There are two differences between the output of the current software and the scores we provided with the paper:
- In the paper, for each variant, we looked for splicing changes up to 50 positions on either side of the variant. The current version looks up to 500 positions on either side. In the CNGB1 example, note that the 0.81 value was predicted 62 positions away from the variant, so it was not caught earlier because we stopped looking at 50 positions.
- In the data we provided, we also filtered out scores that correspond to an increase in the strength of a canonical site, and scores that correspond to a decrease in the strength of a non-canonical site, as these are not likely to cause rare genetic diseases.
We will document this in the README soon.
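The filtering rule in the second point can be sketched roughly as follows. This is only an illustration of the rule as described in the comment, not the code used to produce the released scores; the function name `mask_delta_scores` and the per-position `is_canonical` flags are hypothetical.

```python
def mask_delta_scores(gain, loss, is_canonical):
    """Zero out scores that are unlikely to be disease-relevant.

    gain, loss   : per-position delta scores (strength increase / decrease)
    is_canonical : per-position flags, True where the position is a
                   canonical (annotated) splice site (hypothetical input)
    """
    # A gain at a site that is already canonical is filtered out.
    masked_gain = [0.0 if canon else g for g, canon in zip(gain, is_canonical)]
    # A loss at a site that is not canonical is filtered out.
    masked_loss = [l if canon else 0.0 for l, canon in zip(loss, is_canonical)]
    return masked_gain, masked_loss


# Made-up numbers: position 0 is canonical, position 1 is not.
gain, loss = mask_delta_scores([0.3, 0.8], [0.5, 0.2], [True, False])
print(gain)  # [0.0, 0.8]
print(loss)  # [0.5, 0.0]
```

Only the gain at the non-canonical position and the loss at the canonical position survive, which matches the two cases the comment says are kept.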
from spliceai.
kishorejaganathan, thanks a lot for your kind help!
Does this mean we might miss some information when dealing with deep intronic variants that lie more than 500 bp into the intron? [In the current version, we look for 500 positions on either side.]
What I mean by that statement is the following. A variant can alter splicing either at the position of the variant or at nearby positions. For example, if the variant affects the branch point, the splicing change will occur 20-50 nucleotides downstream. For the paper, we looked for changes within +/-50 positions of the variant to reduce the computational cost. The software looks within +/-500 positions.
Even if the variant is deep in the intron, the position of the splicing change is not thousands of positions away from the variant (these variants usually create a new exon nearby). Most of the time it is <50 nt away, so the output of the software and the supplemental data should be consistent with each other. A few examples come up where the distance is larger, but the number of examples where it exceeds 500 nt should be extremely low.
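To make the effect of the search window concrete, here is a minimal sketch. It is not the SpliceAI implementation; the function `max_delta_in_window` and the dictionary of scores are hypothetical. Only the 0.81 score at 62 positions from the variant comes from the CNGB1 example in this thread; the other values are filler.

```python
def max_delta_in_window(delta_by_offset, window):
    """Return (score, offset) of the largest delta score found within
    +/- `window` positions of the variant (offset 0 is the variant)."""
    candidates = {off: s for off, s in delta_by_offset.items()
                  if abs(off) <= window}
    if not candidates:
        return 0.0, None
    best = max(candidates, key=lambda off: candidates[off])
    return candidates[best], best


# Offsets are relative to the variant; -3 and 0 hold made-up scores.
deltas = {-3: 0.02, 0: 0.05, 62: 0.81}

print(max_delta_in_window(deltas, 50))   # (0.05, 0)   peak at +62 is missed
print(max_delta_in_window(deltas, 500))  # (0.81, 62)  peak is found
```

With a window of 50 the reported maximum is the small score at the variant itself, while a window of 500 picks up the change 62 positions away.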
To answer your question more precisely: if you are considering the scenario where a deep intronic variant affects the closest splice site, then yes, we will miss those cases because the splice site is too far away from the variant. We're not too worried about it, as we expect the number of such variants to be very low; deep intronic splice-altering variants end up creating new exons most of the time. In the file spliceai/utils.py (line 89), you can change
def get_delta_scores(record, ann, cov=1001):
to
def get_delta_scores(record, ann, cov=10001):
It will slow down the scoring a little, but that's alright if you don't have too many variants.
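The relationship between `cov` and the search distance can be worked out with a small sketch. It assumes that `cov` counts the variant position plus an equal number of positions on each side, which is consistent with cov=1001 corresponding to +/-500 positions as stated above; the helper `flank_from_cov` is hypothetical and is not part of spliceai/utils.py.

```python
def flank_from_cov(cov):
    """Positions examined on each side of the variant, assuming `cov`
    is the variant position plus two equal flanks (an assumption)."""
    if cov < 1 or cov % 2 == 0:
        raise ValueError("cov is expected to be a positive odd number")
    return (cov - 1) // 2


print(flank_from_cov(1001))   # 500
print(flank_from_cov(10001))  # 5000
```

Under this assumption, the suggested change widens the search from 500 to 5000 positions on either side of the variant.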
kishorejaganathan,
Sincere appreciation for your detailed explanation!