Comments (4)
Ah, OK, I've found the answer. At first it's not really clear what is what or where to find it...
It is somewhere under a link from the README:
Note: The annotations for all possible SNVs within genes are available here for download.
Following the instructions for the CLI,
I worked out that this command downloads the data:
./bs project download --id 66029966 -o down/
The download contains VCFs with scores for hg19:
down/SpliceAI_supplement_ds.79b22cc932df4db8848c87afd19d78d3$ ls
exome_spliceai_scores.vcf.gz
gencode_gtex_train.tsv
gencode_test.tsv
gencode_train.tsv
gtex_junctions
lincrna.tsv
README
whole_genome_filtered_spliceai_scores.vcf.gz
Can SpliceAI use this data directly, or should I write my own scripts?
from spliceai.
(regarding 64 CPUs) I'm not sure SpliceAI is capable of using multiprocessing to speed things up, unless you've made code changes. On a single CPU it scores around 4K variants per hour; on a single GPU the number is around 25K.
(caching) No, SpliceAI does not cache any variants.
(user warning) No, it is not important - you can ignore it.
(regarding the prescored variants) SpliceAI cannot use this data directly at the moment. That is a good suggestion though, and I will consider adding that functionality in the next release. Right now, what we recommend is to use the tool to score only INDELs and to use the prescored list for all SNV annotations (since we've covered all SNVs). The file you're interested in is whole_genome_filtered_spliceai_scores.vcf.gz. We scored all possible SNVs from TSS start to stop of GENCODE canonical genes. To keep the file size small, we've discarded variants with scores less than 0.1.
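For anyone following the recommendation above, here is a minimal sketch of the first step: splitting your own VCF records into SNVs (to look up in the prescored file) and everything else (to score with the tool). It is not part of SpliceAI. The names `is_snv` and `split_vcf_lines` are made up for illustration, and it assumes plain tab-separated VCF text lines with REF in column 4 and ALT in column 5.

```python
from typing import Iterable, List, Tuple

BASES = {"A", "C", "G", "T"}


def is_snv(ref: str, alt: str) -> bool:
    """Return True when REF and every ALT allele are single bases."""
    alts = alt.split(",")
    return ref.upper() in BASES and all(a.upper() in BASES for a in alts)


def split_vcf_lines(lines: Iterable[str]) -> Tuple[List[str], List[str], List[str]]:
    """Split VCF text lines into (header, snvs, others).

    Hypothetical helper: 'others' holds indels and any record that is not
    a pure SNV, including multi-allelic records mixing SNV and indel alleles.
    """
    header: List[str] = []
    snvs: List[str] = []
    others: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.split("\t")
        ref, alt = fields[3], fields[4]
        (snvs if is_snv(ref, alt) else others).append(line)
    return header, snvs, others


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\t.\t.",
    "1\t200\t.\tAT\tA\t.\t.\t.",
    "2\t300\t.\tC\tCTT\t.\t.\t.",
]
header, snvs, others = split_vcf_lines(example)
print(len(header), len(snvs), len(others))
```

The `others` list, written out with the header lines, would be the input to score with the tool. Keep in mind that the prescored file omits SNVs with scores below 0.1, so an SNV that is missing from it is not necessarily unscored.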
Hi, I see that there are two types of prescored files in the dataset (spliceai_scores.masked.indel.hg19.vcf.gz and spliceai_scores.raw.indel.hg19.vcf.gz). What is the difference between these two files, and can I use these prescored indel files to annotate my own indel variants directly? Many thanks @kishorejaganathan
From FAQ #2:
The raw files also include splicing changes corresponding to strengthening annotated splice sites and weakening unannotated splice sites, which are typically much less pathogenic than weakening annotated splice sites and strengthening unannotated splice sites. The delta scores of such splicing changes are set to 0 in the masked files. We recommend using raw files for alternative splicing analysis and masked files for variant interpretation.
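To make the masking rule in that FAQ answer concrete, here is a small sketch of it as worded. It is not SpliceAI's actual code. The function name `mask_delta_score` and its parameters are hypothetical, and it assumes you already know whether a delta score describes a gain or a loss and whether the affected site is an annotated splice site.

```python
def mask_delta_score(score: float, change: str, site_annotated: bool) -> float:
    """Apply the masking rule described in the FAQ to one delta score.

    change: "gain" (site strengthened) or "loss" (site weakened).
    site_annotated: True if the affected position is an annotated splice site.
    Hypothetical helper for illustration only.
    """
    if change not in ("gain", "loss"):
        raise ValueError("change must be 'gain' or 'loss'")
    # Strengthening an annotated splice site: set to 0 in the masked files.
    if change == "gain" and site_annotated:
        return 0.0
    # Weakening an unannotated splice site: set to 0 in the masked files.
    if change == "loss" and not site_annotated:
        return 0.0
    # Weakening an annotated site or strengthening an unannotated one: kept.
    return score


for change, annotated in [("gain", True), ("gain", False), ("loss", True), ("loss", False)]:
    print(change, annotated, mask_delta_score(0.8, change, annotated))
```

Only two of the four combinations keep their score, which is why the masked files suit variant interpretation while the raw files keep everything for alternative splicing analysis.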