kishorejaganathan avatar kishorejaganathan commented on September 17, 2024

Could you let me know which variant was under consideration? That would be the first variant that has not been annotated by SpliceAI, i.e., the first one whose INFO field doesn't contain the key SpliceAI.
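A minimal sketch of that check, assuming a tab-delimited VCF with INFO in the eighth column. The helper name `first_unannotated` is hypothetical and not part of SpliceAI; whether to scan the input or a partially written output file is left to the caller.

```python
def first_unannotated(vcf_lines):
    """Return the first data line whose INFO column lacks a SpliceAI key, or None."""
    for line in vcf_lines:
        # Skip header lines ("##..." and "#CHROM...") and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record; ignore it
        # INFO is a ";"-separated list of KEY=VALUE pairs or bare flags.
        info_keys = {entry.split("=", 1)[0] for entry in fields[7].split(";")}
        if "SpliceAI" not in info_keys:
            return line.rstrip("\n")
    return None
```

Since an open file iterates line by line, the function can be called directly on a file handle.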

from spliceai.

cnk113 avatar cnk113 commented on September 17, 2024

I'm running this on ClinVar data in VCF format. Here are the first few lines:

##fileDate=2019-02-11
##source=ClinVar
##reference=GRCh37
##ID=<Description="ClinVar Variation ID">
##INFO=<ID=AF_ESP,Number=1,Type=Float,Description="allele frequencies from GO-ESP">
##INFO=<ID=AF_EXAC,Number=1,Type=Float,Description="allele frequencies from ExAC">
##INFO=<ID=AF_TGP,Number=1,Type=Float,Description="allele frequencies from TGP">
##INFO=<ID=ALLELEID,Number=1,Type=Integer,Description="the ClinVar Allele ID">
##INFO=<ID=CLNDN,Number=.,Type=String,Description="ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNDNINCL,Number=.,Type=String,Description="For included Variant : ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNDISDB,Number=.,Type=String,Description="Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNDISDBINCL,Number=.,Type=String,Description="For included Variant: Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNHGVS,Number=.,Type=String,Description="Top-level (primary assembly, alt, or patch) HGVS expression.">
##INFO=<ID=CLNREVSTAT,Number=.,Type=String,Description="ClinVar review status for the Variation ID">
##INFO=<ID=CLNSIG,Number=.,Type=String,Description="Clinical significance for this single variant">
##INFO=<ID=CLNSIGCONF,Number=.,Type=String,Description="Conflicting clinical significance for this single variant">
##INFO=<ID=CLNSIGINCL,Number=.,Type=String,Description="Clinical significance for a haplotype or genotype that includes this variant. Reported as pairs of VariationID:clinical significance.">
##INFO=<ID=CLNVC,Number=1,Type=String,Description="Variant type">
##INFO=<ID=CLNVCSO,Number=1,Type=String,Description="Sequence Ontology id for variant type">
##INFO=<ID=CLNVI,Number=.,Type=String,Description="the variant's clinical sources reported as tag-value pairs of database and variant identifier">
##INFO=<ID=DBVARID,Number=.,Type=String,Description="nsv accessions from dbVar for the variant">
##INFO=<ID=GENEINFO,Number=1,Type=String,Description="Gene(s) for the variant reported as gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)">
##INFO=<ID=MC,Number=.,Type=String,Description="comma separated list of molecular consequence in the form of Sequence Ontology ID|molecular_consequence">
##INFO=<ID=ORIGIN,Number=.,Type=String,Description="Allele origin. One or more of the following values may be added: 0 - unknown; 1 - germline; 2 - somatic; 4 - inherited; 8 - paternal; 16 - maternal; 32 - de-novo; 64 - biparental; 128 - uniparental; 256 - not-tested; 512 - tested-inconclusive; 1073741824 - other">
##INFO=<ID=RS,Number=.,Type=String,Description="dbSNP ID (i.e. rs number)">
##INFO=<ID=SSR,Number=1,Type=Integer,Description="Variant Suspect Reason Codes. One or more of the following values may be added: 0 - unspecified, 1 - Paralog, 2 - byEST, 4 - oldAlign, 8 - Para_EST, 16 - 1kg_failed, 1024 - other">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       949422  475283  G       A       .       .       AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619;ALLELEID=446939;CLNDISDB=MedGen:C4015293,OMIM:616126,Orphanet:ORPHA319563;CLNDN=Immunodeficiency_38_with_basal_ganglia_calcification;CLNHGVS=NC_000001.10:g.949422G>A;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Benign;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=ISG15:9636;MC=SO:0001583|missense_variant;ORIGIN=1;RS=143888043
1       949502  542074  C       T       .       .       AF_ESP=0.00015;AF_EXAC=0.00010;ALLELEID=514926;CLNDISDB=MedGen:C4015293,OMIM:616126,Orphanet:ORPHA319563;CLNDN=Immunodeficiency_38_with_basal_ganglia_calcification;CLNHGVS=NC_000001.10:g.949502C>T;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Uncertain_significance;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=ISG15:9636;MC=SO:0001583|missense_variant;ORIGIN=1;RS=150861311

kishorejaganathan avatar kishorejaganathan commented on September 17, 2024

Did SpliceAI annotate a few variants before producing this error, or was the error produced right away? You can get the answer to this in the output file.

cnk113 avatar cnk113 commented on September 17, 2024

Ah, I found the error. This is the line that it stopped on:
1 7682000 599309 CGTCTTCTCTGGGTGACCTCGTGTGACCTTGCAATCCCGTCAAGTCTTTGGTTCTGCAGAAACAACTGTCAGATCGGGAGTTAGTCACCTCCAAAGACAGAAACCAAAAACCAAAACAGTGACCTTGTTCTTTCTGTTCATGACCCTGGAGGTATGGGGGGAGGAGGAGGAAGCGGAGATTGGTTTTGCTCGTTAACTCATTATCAGTGACTGAGCGGCTACTCATGGCTGGGAACTTGTTGGGTGCCCTCCATTACACTTCCCTTAATCCTCTCAATGGCTTTTGATATCCTCAGGTAGATGGGAGGATCCCCATTTTACCTGGGAGGGATACAAGGCCAAGAGGGGTTAAGTAACTTGCCCAAGATCACACAGCGTTTGAAGTGGCCAAGGTGAAATTTGATCTAACTTCCACTTGGATGTGTGATTCCACTCCACATCCAAGTTCTTTCCATCATCTGGCTGCAGGCAAGATAACAGATTGGTGTATTCTTTCTATCTTATTTGAATTCCAAAGTTAGTATCCGCTATTGAAAACAGCTGACTTCATTATAACCCATCATTGACTCCTTTCTTTAACAAATAACCAGCAAAACAGAGCGTTCTTTCGGAGTAAGCCTTCCACCACCTGAACTTTGACAACTTTCTCCCTAATGGGTTTCTCCCTGTGAGTATTTGATCTTGGCCACCAGCATGAGCCTATCCTGGTGATCTCAAAATGCTGAGGTTTTTCTTCTGTTAAGAGATTGAATCTTCTGTTCCATGTGGAGGCTTATTCAACATTAAAGCGTTCCTGAACCATTTCATTTATTCAACAAATATTTTCCAAGCCCAGCTCTGGGCCAGATGCCAGGCTCAAGGCTTAGCAAGCCAGACAGATGTGGAGCCGCCATCATGGAACTCGGCCCTCACAGGCCATTGTCCTTGGCTTTCCAGACTGAACCTCTGTGCTTTCTTGTTTCACGGCTGTTTTCCTGAATGCTCCCTGTGAGCCCGAGTCTGTCCTGAGGTCTTCGTCCCTATTCCCTGTGCCTGAATCCTTTCCTCTCCCAGAGCTCCCCACAGCTGGCTTCTTTTTATCATCTCAGATACAGCTGAAGGTTCCATCCTCAGAGGCCTTCTCAAGCTCGCTGGCTAAATTCTCCACCCACCCCCATTTTTTATGCTCATACCACTTCATGCCATCCAGAACCACCTTATTCTTTATTCCAATGTTTATAGCTCGTCTCTTCCATGCTCTTTGAGGGCAGAGTCTTAGACTTGGTCACAGTAGACCCCAAATGTAGGCCAGTCCCTGGTATCTTGTGGTTCTTGAGAGCTAATCTGTTGAATGAATGAATGAATGAATGAATGAGGGAATTAATGAACACCCTGCAAAGCAAGTGCCACAGTTACCCCTACTCTATAGAAGAGGCAGCTGTTCCTCTAAAAGGCTGAGAATCCCACCTCAGAGCTACCTGCAGGAGAGTGGGTGAGCTGGGACCCCAGTCATTTCAGGTTAACTGAAGGGAAAGCCTGACTTCCCATCAGCTCCACACACTGAGTCCTTTCTCCTAGAAATGAAGGAAACACAAAGAAGTCACTAAGAAGGCACCTCGCCCTCAAGCATCTGAAACCCTCAGTTGGAAATTTGACTGCAACCTCTGCCTCCTGGGTTCAAGGGATTCTCCTGCCTCAGCCCCCCAAGTAGCTGGGACTACAGGCATGCGCCACCACACCCAGCTAATTCTGTATTTTCAGGAGAGATGGGATTTCACCATGTTGGCCAGGCTGGTCTCAAACTCCTGACCTCAGGTGTTCCGCCCGCCTCGGCCTCCCGAGGTGCTGGGATTACAGAAGCTCTTTTTCAACATTGCCTTTTAGATCCCCTTTCCCATCAGCTTCCTCCAAAGGAGCTGCTGGCTTGATTTACCTGCAGCCACCGCAATTAATTGCTGTCCCCTGGTAGCTAACATCTCTTAACAGCTCCATAGCAACAGTGGT
TTCCAAATGCCAGAAAGTGTTTTTTCCAAGGGGACTCTATCCTCGACGCAAAATAGAACATTTTGAGTGAAATTTAAACACAGTTTTTGAACGGTGCTGTTCTGATTTCTTCACACTTCACAAATGCTTTCTTTGCAAACATGGGTGTCCCAGTTACCCTGGGCTGGTGCCAAGGCCCTCAGCCCAGCTTCTGTTTTCTCAGCTTCAAGGCAGCCGTGGTAAGACCTAAAAACAAACAGGGAGCGAGTGCTGTGGGTCTCCCCGCTAAGGGGTTGCCTTGCGCCTGTGCAGAGATGAGGACCACAGTTTGTCTCCCACACCAGCTGTGATCAATTGGTGATGACAACGCAGAACCCTGCCTATGCGATTTCTGGGACCCCATCTGAGCTTAGTGGGAAAGAGTGCCATCCGCCATGGTTCCAGTGCAGAATTCCAAGGGCCAAGATGGGGAGTAGGGATTTGTGTATGAGTGGTTGGTTGGAACAAATCATCCTGGTATACCACTGCTCCCACAGGTGGAAACAGAGAAACTGGGTTAAAATGAAGACTCCAGTAAATTCGCATGCAGAAGTACCCAGAAAGGCCCTTGCCATGTCCTTCACTCTGCCTTGGTGAAGGTGGAAGAGATGAAGGGAGAGGGTGATGGGTGGGTGGTGGGTGTGCTGGCCACCGACTGGCACAGCCCCCAGCACAGGGGCTGTCTCTAGTGACCGATCCAGGATGAGTGTCCCCATTCAGATGCAGTCAACCCACCACTGATTCAACCAACACTTGCTGACTGCCTATCAGGGAATACATTTCCTTCTGATGCACACTCTGCTCTCAGGAAGTCTGGGGTCCAGAGAGGGAATGGAGGTGCCAGCCAGCCTGAAGATCAGTGGAAAGAGTGAGGCTGGCTGATGCTGCAGGAGAGACACAGAGGACTGTCAGGGGAGTCCCAAAGGGAATGAAATTATTTTGGTTGGAAGGATAAAGGGAGGCTGGAAGGACAGTGTGACGTTGGATCTGGGTCTTGACA C . . ALLELEID=590571;CLNDISDB=MedGen:C3553661,OMIM:614756,Orphanet:ORPHA314647;CLNDN=Cerebellar_ataxia,_nonprogressive,_with_mental_retardation;CLNHGVS=NC_000001.10:g.7682001_7685000del;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Pathogenic;CLNVC=Deletion;CLNVCSO=SO:0000159;GENEINFO=CAMTA1:23261;ORIGIN=32

kishorejaganathan avatar kishorejaganathan commented on September 17, 2024

This makes sense. It turns out that the software cannot handle deletions larger than 500. Thank you for bringing this to our attention; we will fix this in the next commit. For now, if you delete such variants from the input file, you should be fine.
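One way to apply that workaround is sketched below. It assumes a tab-delimited VCF and assumes that "larger than 500" refers to the number of bases removed, measured here as `len(REF) - len(ALT)`; both the helper name `drop_long_deletions` and the `max_deletion` parameter are hypothetical, and SpliceAI's own size check may be defined differently.

```python
def drop_long_deletions(vcf_lines, max_deletion=500):
    """Keep header lines and records whose deletions do not exceed max_deletion."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            kept.append(line)  # not a full record; leave it for the tool to judge
            continue
        ref, alts = fields[3], fields[4].split(",")
        # Assumption: deletion size = bases lost going from REF to ALT.
        if any(len(ref) - len(alt) > max_deletion for alt in alts):
            continue
        kept.append(line)
    return kept
```

The 3000-base CAMTA1 deletion quoted earlier in the thread would be dropped by this filter, while single-nucleotide variants pass through untouched.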

cnk113 avatar cnk113 commented on September 17, 2024

Another issue:

  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/bin/spliceai", line 11, in <module>
    sys.exit(main())
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/spliceai/__main__.py", line 53, in main
    scores = get_delta_scores(record, ann)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/spliceai/utils.py", line 72, in get_delta_scores
    for j in range(len(record.alts)):
TypeError: object of type 'NoneType' has no len()

The value:

1 11854442 187893 C . . . AF_ESP=8e-05;AF_EXAC=2e-05;ALLELEID=185770;CLNDISDB=MedGen:C1856058,OMIM:236250;CLNDN=Homocysteinemia_due_to_MTHFR_deficiency;CLNHGVS=NC_000001.10:g.11854442C>T;CLNREVSTAT=no_assertion_criteria_provided;CLNSIG=Pathogenic;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=MTHFR:4524;MC=SO:0001819|synonymous_variant;ORIGIN=1;RS=367585605;

kishorejaganathan avatar kishorejaganathan commented on September 17, 2024

I am not entirely sure what this variant is. As earlier, please delete this variant (and others which have a "." in the ALT column) in the input file before running the software. Do let us know if you run into more edge cases like these, and we will incorporate them into the annotation software so that it skips such variants automatically instead of crashing.
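A rough sketch of that clean-up step, assuming a tab-delimited VCF with ALT in the fifth column. The helper names `has_missing_alt` and `drop_missing_alt` are hypothetical and not part of SpliceAI.

```python
def has_missing_alt(line):
    """True if a VCF data line carries the missing-value marker "." in ALT."""
    fields = line.rstrip("\n").split("\t")
    return len(fields) >= 5 and fields[4] == "."


def drop_missing_alt(vcf_lines):
    """Keep header lines and every record that has an actual ALT allele."""
    return [
        line
        for line in vcf_lines
        if line.startswith("#") or not has_missing_alt(line)
    ]
```

Records like the MTHFR line quoted above, for which the traceback shows `record.alts` to be `None`, would be removed before they reach `get_delta_scores`.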

kishorejaganathan avatar kishorejaganathan commented on September 17, 2024

The current release (v1.2) should ignore these edge cases automatically.
