Comments (8)
Could you let me know which variant was being processed when the error occurred? It should be the first variant that has not been annotated by SpliceAI, i.e., the first one whose INFO field doesn't contain the key SpliceAI.
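One way to locate that variant is a small script along these lines. This is only a sketch: it assumes the output file is a tab-separated VCF and that annotated records carry a `SpliceAI` key in the INFO column, as described above. The function name `first_unannotated` is hypothetical and not part of SpliceAI.

```python
def first_unannotated(vcf_lines):
    """Return the first data line whose INFO column lacks a SpliceAI key, or None."""
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record (assumption: 8 fixed columns)
        # INFO is the 8th column; entries are separated by ';' and look like KEY=VALUE.
        keys = {entry.split("=", 1)[0] for entry in fields[7].split(";") if entry}
        if "SpliceAI" not in keys:
            return line.rstrip("\n")
    return None
```

To use it on a file, something like `with open(path) as fh: print(first_unannotated(fh))` would do, where `path` points at the SpliceAI output VCF.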
from spliceai.
I'm running this on ClinVar data in VCF format. Here are the first few lines:
##fileDate=2019-02-11
##source=ClinVar
##reference=GRCh37
##ID=<Description="ClinVar Variation ID">
##INFO=<ID=AF_ESP,Number=1,Type=Float,Description="allele frequencies from GO-ESP">
##INFO=<ID=AF_EXAC,Number=1,Type=Float,Description="allele frequencies from ExAC">
##INFO=<ID=AF_TGP,Number=1,Type=Float,Description="allele frequencies from TGP">
##INFO=<ID=ALLELEID,Number=1,Type=Integer,Description="the ClinVar Allele ID">
##INFO=<ID=CLNDN,Number=.,Type=String,Description="ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNDNINCL,Number=.,Type=String,Description="For included Variant : ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNDISDB,Number=.,Type=String,Description="Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNDISDBINCL,Number=.,Type=String,Description="For included Variant: Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNHGVS,Number=.,Type=String,Description="Top-level (primary assembly, alt, or patch) HGVS expression.">
##INFO=<ID=CLNREVSTAT,Number=.,Type=String,Description="ClinVar review status for the Variation ID">
##INFO=<ID=CLNSIG,Number=.,Type=String,Description="Clinical significance for this single variant">
##INFO=<ID=CLNSIGCONF,Number=.,Type=String,Description="Conflicting clinical significance for this single variant">
##INFO=<ID=CLNSIGINCL,Number=.,Type=String,Description="Clinical significance for a haplotype or genotype that includes this variant. Reported as pairs of VariationID:clinical significance.">
##INFO=<ID=CLNVC,Number=1,Type=String,Description="Variant type">
##INFO=<ID=CLNVCSO,Number=1,Type=String,Description="Sequence Ontology id for variant type">
##INFO=<ID=CLNVI,Number=.,Type=String,Description="the variant's clinical sources reported as tag-value pairs of database and variant identifier">
##INFO=<ID=DBVARID,Number=.,Type=String,Description="nsv accessions from dbVar for the variant">
##INFO=<ID=GENEINFO,Number=1,Type=String,Description="Gene(s) for the variant reported as gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)">
##INFO=<ID=MC,Number=.,Type=String,Description="comma separated list of molecular consequence in the form of Sequence Ontology ID|molecular_consequence">
##INFO=<ID=ORIGIN,Number=.,Type=String,Description="Allele origin. One or more of the following values may be added: 0 - unknown; 1 - germline; 2 - somatic; 4 - inherited; 8 - paternal; 16 - maternal; 32 - de-novo; 64 - biparental; 128 - uniparental; 256 - not-tested; 512 - tested-inconclusive; 1073741824 - other">
##INFO=<ID=RS,Number=.,Type=String,Description="dbSNP ID (i.e. rs number)">
##INFO=<ID=SSR,Number=1,Type=Integer,Description="Variant Suspect Reason Codes. One or more of the following values may be added: 0 - unspecified, 1 - Paralog, 2 - byEST, 4 - oldAlign, 8 - Para_EST, 16 - 1kg_failed, 1024 - other">
#CHROM POS ID REF ALT QUAL FILTER INFO
1 949422 475283 G A . . AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619;ALLELEID=446939;CLNDISDB=MedGen:C4015293,OMIM:616126,Orphanet:ORPHA319563;CLNDN=Immunodeficiency_38_with_basal_ganglia_calcification;CLNHGVS=NC_000001.10:g.949422G>A;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Benign;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=ISG15:9636;MC=SO:0001583|missense_variant;ORIGIN=1;RS=143888043
1 949502 542074 C T . . AF_ESP=0.00015;AF_EXAC=0.00010;ALLELEID=514926;CLNDISDB=MedGen:C4015293,OMIM:616126,Orphanet:ORPHA319563;CLNDN=Immunodeficiency_38_with_basal_ganglia_calcification;CLNHGVS=NC_000001.10:g.949502C>T;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Uncertain_significance;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=ISG15:9636;MC=SO:0001583|missense_variant;ORIGIN=1;RS=150861311
Did SpliceAI annotate a few variants before producing this error, or was the error produced right away? You can check this in the output file.
Ah, I found the error. This is the line it stopped on:
1 7682000 599309 CGTCTTCTCTGGGTGACCTCGTGTGACCTTGCAATCCCGTCAAGTCTTTGGTTCTGCAGAAACAACTGTCAGATCGGGAGTTAGTCACCTCCAAAGACAGAAACCAAAAACCAAAACAGTGACCTTGTTCTTTCTGTTCATGACCCTGGAGGTATGGGGGGAGGAGGAGGAAGCGGAGATTGGTTTTGCTCGTTAACTCATTATCAGTGACTGAGCGGCTACTCATGGCTGGGAACTTGTTGGGTGCCCTCCATTACACTTCCCTTAATCCTCTCAATGGCTTTTGATATCCTCAGGTAGATGGGAGGATCCCCATTTTACCTGGGAGGGATACAAGGCCAAGAGGGGTTAAGTAACTTGCCCAAGATCACACAGCGTTTGAAGTGGCCAAGGTGAAATTTGATCTAACTTCCACTTGGATGTGTGATTCCACTCCACATCCAAGTTCTTTCCATCATCTGGCTGCAGGCAAGATAACAGATTGGTGTATTCTTTCTATCTTATTTGAATTCCAAAGTTAGTATCCGCTATTGAAAACAGCTGACTTCATTATAACCCATCATTGACTCCTTTCTTTAACAAATAACCAGCAAAACAGAGCGTTCTTTCGGAGTAAGCCTTCCACCACCTGAACTTTGACAACTTTCTCCCTAATGGGTTTCTCCCTGTGAGTATTTGATCTTGGCCACCAGCATGAGCCTATCCTGGTGATCTCAAAATGCTGAGGTTTTTCTTCTGTTAAGAGATTGAATCTTCTGTTCCATGTGGAGGCTTATTCAACATTAAAGCGTTCCTGAACCATTTCATTTATTCAACAAATATTTTCCAAGCCCAGCTCTGGGCCAGATGCCAGGCTCAAGGCTTAGCAAGCCAGACAGATGTGGAGCCGCCATCATGGAACTCGGCCCTCACAGGCCATTGTCCTTGGCTTTCCAGACTGAACCTCTGTGCTTTCTTGTTTCACGGCTGTTTTCCTGAATGCTCCCTGTGAGCCCGAGTCTGTCCTGAGGTCTTCGTCCCTATTCCCTGTGCCTGAATCCTTTCCTCTCCCAGAGCTCCCCACAGCTGGCTTCTTTTTATCATCTCAGATACAGCTGAAGGTTCCATCCTCAGAGGCCTTCTCAAGCTCGCTGGCTAAATTCTCCACCCACCCCCATTTTTTATGCTCATACCACTTCATGCCATCCAGAACCACCTTATTCTTTATTCCAATGTTTATAGCTCGTCTCTTCCATGCTCTTTGAGGGCAGAGTCTTAGACTTGGTCACAGTAGACCCCAAATGTAGGCCAGTCCCTGGTATCTTGTGGTTCTTGAGAGCTAATCTGTTGAATGAATGAATGAATGAATGAATGAGGGAATTAATGAACACCCTGCAAAGCAAGTGCCACAGTTACCCCTACTCTATAGAAGAGGCAGCTGTTCCTCTAAAAGGCTGAGAATCCCACCTCAGAGCTACCTGCAGGAGAGTGGGTGAGCTGGGACCCCAGTCATTTCAGGTTAACTGAAGGGAAAGCCTGACTTCCCATCAGCTCCACACACTGAGTCCTTTCTCCTAGAAATGAAGGAAACACAAAGAAGTCACTAAGAAGGCACCTCGCCCTCAAGCATCTGAAACCCTCAGTTGGAAATTTGACTGCAACCTCTGCCTCCTGGGTTCAAGGGATTCTCCTGCCTCAGCCCCCCAAGTAGCTGGGACTACAGGCATGCGCCACCACACCCAGCTAATTCTGTATTTTCAGGAGAGATGGGATTTCACCATGTTGGCCAGGCTGGTCTCAAACTCCTGACCTCAGGTGTTCCGCCCGCCTCGGCCTCCCGAGGTGCTGGGATTACAGAAGCTCTTTTTCAACATTGCCTTTTAGATCCCCTTTCCCATCAGCTTCCTCCAAAGGAGCTGCTGGCTTGATTTACCTGCAGCCACCGCAATTAATTGCTGTCCCCTGGTAGCTAACATCTCTTAACAGCTCCATAGCAACAGTGGT
TTCCAAATGCCAGAAAGTGTTTTTTCCAAGGGGACTCTATCCTCGACGCAAAATAGAACATTTTGAGTGAAATTTAAACACAGTTTTTGAACGGTGCTGTTCTGATTTCTTCACACTTCACAAATGCTTTCTTTGCAAACATGGGTGTCCCAGTTACCCTGGGCTGGTGCCAAGGCCCTCAGCCCAGCTTCTGTTTTCTCAGCTTCAAGGCAGCCGTGGTAAGACCTAAAAACAAACAGGGAGCGAGTGCTGTGGGTCTCCCCGCTAAGGGGTTGCCTTGCGCCTGTGCAGAGATGAGGACCACAGTTTGTCTCCCACACCAGCTGTGATCAATTGGTGATGACAACGCAGAACCCTGCCTATGCGATTTCTGGGACCCCATCTGAGCTTAGTGGGAAAGAGTGCCATCCGCCATGGTTCCAGTGCAGAATTCCAAGGGCCAAGATGGGGAGTAGGGATTTGTGTATGAGTGGTTGGTTGGAACAAATCATCCTGGTATACCACTGCTCCCACAGGTGGAAACAGAGAAACTGGGTTAAAATGAAGACTCCAGTAAATTCGCATGCAGAAGTACCCAGAAAGGCCCTTGCCATGTCCTTCACTCTGCCTTGGTGAAGGTGGAAGAGATGAAGGGAGAGGGTGATGGGTGGGTGGTGGGTGTGCTGGCCACCGACTGGCACAGCCCCCAGCACAGGGGCTGTCTCTAGTGACCGATCCAGGATGAGTGTCCCCATTCAGATGCAGTCAACCCACCACTGATTCAACCAACACTTGCTGACTGCCTATCAGGGAATACATTTCCTTCTGATGCACACTCTGCTCTCAGGAAGTCTGGGGTCCAGAGAGGGAATGGAGGTGCCAGCCAGCCTGAAGATCAGTGGAAAGAGTGAGGCTGGCTGATGCTGCAGGAGAGACACAGAGGACTGTCAGGGGAGTCCCAAAGGGAATGAAATTATTTTGGTTGGAAGGATAAAGGGAGGCTGGAAGGACAGTGTGACGTTGGATCTGGGTCTTGACA C . . ALLELEID=590571;CLNDISDB=MedGen:C3553661,OMIM:614756,Orphanet:ORPHA314647;CLNDN=Cerebellar_ataxia,_nonprogressive,_with_mental_retardation;CLNHGVS=NC_000001.10:g.7682001_7685000del;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Pathogenic;CLNVC=Deletion;CLNVCSO=SO:0000159;GENEINFO=CAMTA1:23261;ORIGIN=32
This makes sense. It turns out that the software cannot handle deletions larger than 500. Thank you for bringing this to our attention; we will fix it in the next commit. For now, if you delete such variants from the input file, you should be fine.
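A rough sketch of that workaround in Python follows. It assumes a tab-separated VCF and treats "larger than 500" as a REF allele longer than 500 bases, which is an interpretation of the comment above rather than a confirmed rule. The function name `drop_long_ref` and the `max_ref_len` parameter are made up for illustration.

```python
def drop_long_ref(vcf_lines, max_ref_len=500):
    """Yield header lines and records whose REF allele is at most max_ref_len bases."""
    for line in vcf_lines:
        # Header and meta lines pass through unchanged.
        if line.startswith("#"):
            yield line
            continue
        fields = line.split("\t")
        # REF is the 4th column (index 3); skip records with an overlong REF.
        # Assumption: deletion size is approximated by the REF length.
        if len(fields) > 3 and len(fields[3]) > max_ref_len:
            continue
        yield line
```

Writing the filtered lines to a new file and passing that file to SpliceAI should then avoid the crash on records like the one above.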
Another issue
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/bin/spliceai", line 11, in <module>
    sys.exit(main())
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/spliceai/__main__.py", line 53, in main
    scores = get_delta_scores(record, ann)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/spliceai/utils.py", line 72, in get_delta_scores
    for j in range(len(record.alts)):
TypeError: object of type 'NoneType' has no len()
The record that triggered it:
1 11854442 187893 C . . . AF_ESP=8e-05;AF_EXAC=2e-05;ALLELEID=185770;CLNDISDB=MedGen:C1856058,OMIM:236250;CLNDN=Homocysteinemia_due_to_MTHFR_deficiency;CLNHGVS=NC_000001.10:g.11854442C>T;CLNREVSTAT=no_assertion_criteria_provided;CLNSIG=Pathogenic;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=MTHFR:4524;MC=SO:0001819|synonymous_variant;ORIGIN=1;RS=367585605;
I am not entirely sure what this variant is. As before, please delete this variant (and any others that have a "." in the ALT column) from the input file before running the software. Do let us know if you run into more edge cases like these, and we will update the annotation software so that it skips such variants automatically instead of crashing.
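For anyone who wants to script this cleanup, a minimal sketch is given below. It assumes a tab-separated VCF in which a missing ALT allele is written as "." in the fifth column. The function name `drop_missing_alt` is hypothetical.

```python
def drop_missing_alt(vcf_lines):
    """Yield header lines and records whose ALT column is not the missing value '.'."""
    for line in vcf_lines:
        if not line.startswith("#"):
            fields = line.rstrip("\n").split("\t")
            # ALT is the 5th column (index 4); "." means no alternate allele is given.
            if len(fields) > 4 and fields[4] in (".", ""):
                continue
        yield line
```

Records such as the MTHFR one above, where ALT is ".", would be dropped, while header lines and all other records are passed through unchanged.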
The current release (v1.2) should ignore these edge cases automatically.