
Comments (5)

kishorejaganathan commented on August 14, 2024

Your assumption is correct. The precomputed scores were produced only for GRCh37 and then lifted over to GRCh38. However, the files annotations/grch37.txt and annotations/grch38.txt are not lifted-over versions of each other. We downloaded both from the UCSC Genome Browser and filtered some genes out of annotations/grch38.txt because they had a different number of exons or different transcript lengths in the two builds. So when you use the grch38 option, any score you see will match the score in the precomputed lifted-over files, but the lifted-over files might contain a few more transcripts.

from spliceai.

aerval commented on August 14, 2024

Just so you know, this is not entirely correct. We found that the annotations file includes genes in the pseudoautosomal region of the Y chromosome that are not in the prescored file (which is prescored only for the X coordinates). Furthermore, we found that the GRCh37 annotations place the gene SCGB1C2 on chromosome 11, while on GRCh38 it is on chromosome 17. The latter is in line with GENCODE, but the GRCh37 position is far off and can only be partially explained by the gene SCGB1C1 being at a mostly overlapping position. However, because of the liftover, the SCGB1C2 variants in the GRCh38 prescored file are now all located on chromosome 11 too.

I would recommend removing SCGB1C2 and the pseudoautosomal region genes from the SpliceAI GRCh38 annotations. However, I am not 100% sure that SCGB1C2 is the only case of such a coordinate swap, since I have no idea how it originated.
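One way to look for other genes with the same kind of chromosome swap is to compare the chromosome recorded for each gene name in the two annotation tables. The following is only a rough sketch: it assumes a tab-separated layout with the gene name in the first column and the chromosome in the second, and the toy rows (including the placeholder gene GENE_A and the `chr` prefix) are illustrative rather than copied from the real files.

```python
import csv
import io


def chrom_by_gene(table_text):
    """Map gene name -> set of chromosomes seen for it.

    Assumes a tab-separated table whose first column is the gene name
    and whose second column is the chromosome (an assumed layout).
    """
    chroms = {}
    for row in csv.reader(io.StringIO(table_text), delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank lines, short rows, and the header
        chroms.setdefault(row[0], set()).add(row[1])
    return chroms


def chrom_mismatches(text_a, text_b):
    """Return genes present in both tables whose chromosome sets differ."""
    a, b = chrom_by_gene(text_a), chrom_by_gene(text_b)
    return {
        gene: (sorted(a[gene]), sorted(b[gene]))
        for gene in sorted(a.keys() & b.keys())
        if a[gene] != b[gene]
    }


# Toy tables: only the SCGB1C2 chromosomes come from this thread;
# GENE_A is a made-up placeholder.
GRCH37_TOY = "#NAME\tCHROM\nSCGB1C2\tchr11\nGENE_A\tchr1\n"
GRCH38_TOY = "#NAME\tCHROM\nSCGB1C2\tchr17\nGENE_A\tchr1\n"

mismatches = chrom_mismatches(GRCH37_TOY, GRCH38_TOY)
print(mismatches)
```

Using a set of chromosomes per gene means that a gene listed on both X and Y (as with the pseudoautosomal region) is compared as a whole rather than row by row.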


kishorejaganathan commented on August 14, 2024

Thanks for letting me know. I got to the bottom of this issue. The GENCODE annotations are originally in hg38, and the hg19 version is obtained via hg38ToHg19.over.chain. The SpliceAI scores are originally in hg19, and the hg38 version is obtained via hg19ToHg38.over.chain. For this gene, unfortunately, the two liftovers are not inverses of each other: chr17:137525 in hg38 maps to chr11:193034 in hg19, but chr11:193034 in hg19 seems to stay at chr11:193034 in hg38.
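The round-trip check behind this observation can be sketched as follows. This is a minimal illustration, not the procedure actually used: the two dictionaries are hypothetical stand-ins for the chain files and contain only the single mapping quoted above, and the function names are made up for the example.

```python
def round_trip(pos, forward, backward):
    """Lift `pos` with `forward`, then lift the result with `backward`.

    Returns (intermediate, returned); either may be None if unmapped.
    """
    mid = forward.get(pos)
    back = backward.get(mid) if mid is not None else None
    return mid, back


def is_reversible(pos, forward, backward):
    """True only if the round trip lands on the starting position."""
    _, back = round_trip(pos, forward, backward)
    return back == pos


# Toy lookups standing in for hg38ToHg19.over.chain and
# hg19ToHg38.over.chain; they hold only the mapping from this thread.
HG38_TO_HG19 = {("chr17", 137525): ("chr11", 193034)}
HG19_TO_HG38 = {("chr11", 193034): ("chr11", 193034)}

start = ("chr17", 137525)
print(round_trip(start, HG38_TO_HG19, HG19_TO_HG38))
print(is_reversible(start, HG38_TO_HG19, HG19_TO_HG38))
```

In practice the two lookups would be replaced by calls to a real liftover tool; the comparison logic stays the same.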

This issue seems to affect 17 genes in total:
PPIAL4E
MUC12
OR4C46
OR4A8
OR4A5
IGHV1-69-2
RP11-294C11.4
RP11-294C11.2
POTEB
SCGB1C2
CBSL
U2AF1L5
CH507-152C13.3
SMIM11B
OR11H1
SPANXB1
OPN1MW3

I'll take these out of the annotations file for the sake of consistency.
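For anyone who wants to apply the same exclusion to a local copy, here is a small sketch of dropping the listed genes from an annotation table. It assumes a tab-separated file with the gene name in the first column and a header line starting with `#`; the function name and the toy table (including the placeholder gene GENE_A) are hypothetical.

```python
# The 17 gene names listed above.
AFFECTED = frozenset([
    "PPIAL4E", "MUC12", "OR4C46", "OR4A8", "OR4A5", "IGHV1-69-2",
    "RP11-294C11.4", "RP11-294C11.2", "POTEB", "SCGB1C2", "CBSL",
    "U2AF1L5", "CH507-152C13.3", "SMIM11B", "OR11H1", "SPANXB1",
    "OPN1MW3",
])


def drop_genes(table_text, names=AFFECTED):
    """Return the table without rows whose first column is in `names`.

    Header lines (starting with '#') are always kept.
    """
    kept = []
    for line in table_text.splitlines():
        gene = line.split("\t", 1)[0]
        if line.startswith("#") or gene not in names:
            kept.append(line)
    return "\n".join(kept) + "\n"


# Toy table; GENE_A is a made-up placeholder.
TOY = "#NAME\tCHROM\nSCGB1C2\tchr17\nGENE_A\tchr1\n"
filtered = drop_genes(TOY)
print(filtered, end="")
```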


aerval commented on August 14, 2024

Awesome, that totally makes sense!


smeeta commented on August 14, 2024

Hi!

Any idea why the gene names that ANNOVAR assigns to variants on hg38 do not match TxDb.Hsapiens.UCSC.hg38.knownGene?

I annotated my.vcf with ANNOVAR using hg38. However, when I plotted the variants from the resulting my_filtered.vcf with lolliplot from the trackViewer package, which uses the TxDb.Hsapiens.UCSC.hg38.knownGene database, I got different gene names.
For example, I have 5 variants in the gene PCDHA7, but the plot shows those 5 variants on the gene PCDHB4.

ANNOVAR link - https://annovar.openbioinformatics.org/en/latest/
lolliplot link - https://www.bioconductor.org/packages/devel/bioc/vignettes/trackViewer/inst/doc/lollipopPlot.html#Variant_Call_Format_(VCF)_data (see under VCF input format)

Any help would be great!
Thank you,
Smeeta

