
Comments (4)

thedam commented on August 14, 2024

Ah, OK, I've found the answer. At first it isn't really clear what is what and where to find it.
It is somewhere under a link from the README:

Note: The annotations for all possible SNVs within genes are available here for download.

Following the instructions for the CLI, I worked out that this command downloads the data:
./bs project download --id 66029966 -o down/

The download contains VCFs with scores for hg19:

down/SpliceAI_supplement_ds.79b22cc932df4db8848c87afd19d78d3$ ls
exome_spliceai_scores.vcf.gz  
gencode_gtex_train.tsv 
gencode_test.tsv  
gencode_train.tsv  
gtex_junctions 
lincrna.tsv  
README  
whole_genome_filtered_spliceai_scores.vcf.gz

Can SpliceAI use this data directly, or should I write my own scripts?

from spliceai.

kishorejaganathan commented on August 14, 2024

(regarding 64 CPUs) I'm not sure SpliceAI is capable of using multiprocessing to speed things up, unless you've made code changes. On a single CPU, it scores around 4K variants per hour; on a single GPU, the number is around 25K.
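As a rough illustration of what those rates mean in practice, here is a minimal sketch that turns them into a runtime estimate. The two constants are the approximate figures quoted above; the function name and the 100,000-variant example are only illustrative assumptions, and real throughput will depend on hardware and input.

```python
# Approximate throughput figures quoted in this thread (not benchmarks).
CPU_VARIANTS_PER_HOUR = 4_000
GPU_VARIANTS_PER_HOUR = 25_000


def estimated_hours(n_variants: int, variants_per_hour: int) -> float:
    """Return a back-of-the-envelope runtime in hours for n_variants."""
    if variants_per_hour <= 0:
        raise ValueError("variants_per_hour must be positive")
    return n_variants / variants_per_hour


if __name__ == "__main__":
    n = 100_000  # hypothetical number of variants to score
    for label, rate in (("CPU", CPU_VARIANTS_PER_HOUR), ("GPU", GPU_VARIANTS_PER_HOUR)):
        print(f"{label}: about {estimated_hours(n, rate):.1f} hours for {n} variants")
```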

(caching) No, SpliceAI does not cache any variants.

(user warning) No, it is not important - you can ignore it.

(regarding the prescored variants) SpliceAI cannot use this data directly at the moment. That is a good suggestion though, and I will consider adding that functionality in the next release. Right now, what we recommend is to use the tool to score only INDELs and to use the prescored list for all SNV annotations (since we've covered all SNVs). The file you're interested in is whole_genome_filtered_spliceai_scores.vcf.gz. We scored all possible SNVs from TSS start to stop of GENCODE canonical genes. To keep the file size small, we've discarded variants with scores less than 0.1.
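One way to follow the recommendation above is to split an input VCF into SNVs (to be looked up in the prescored file) and everything else (to be scored with the tool). The sketch below is not part of SpliceAI; the function names are hypothetical, and it assumes plain tab-separated VCF lines with REF in column 4 and ALT in column 5. A multi-allelic record is treated as an SNV only if every ALT allele is a single base.

```python
from typing import Iterable, List, Tuple

BASES = {"A", "C", "G", "T"}


def is_snv(ref: str, alt: str) -> bool:
    """True when both REF and ALT are a single A/C/G/T base."""
    return ref.upper() in BASES and alt.upper() in BASES


def split_vcf_lines(lines: Iterable[str]) -> Tuple[List[str], List[str], List[str]]:
    """Split VCF text lines into (header, snv_records, other_records)."""
    header: List[str] = []
    snvs: List[str] = []
    others: List[str] = []
    for line in lines:
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        # Indels, symbolic alleles and mixed records all go to "others".
        if all(is_snv(ref, alt) for alt in alts):
            snvs.append(line)
        else:
            others.append(line)
    return header, snvs, others
```

The `others` records, written out together with the header, could then be passed to the tool, while the `snvs` records are matched against the prescored file. Since that file omits variants with scores below 0.1, an SNV that is missing from it is not necessarily unscored.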


GuoFengWang commented on August 14, 2024

Hi, I see that there are two types of prescored files in the dataset (spliceai_scores.masked.indel.hg19.vcf.gz and spliceai_scores.raw.indel.hg19.vcf.gz). What is the difference between these two files, and can I use these prescored indel files to annotate my own indel variants directly? Many thanks @kishorejaganathan


kishorejaganathan commented on August 14, 2024

From FAQ #2:
The raw files also include splicing changes corresponding to strengthening annotated splice sites and weakening unannotated splice sites, which are typically much less pathogenic than weakening annotated splice sites and strengthening unannotated splice sites. The delta scores of such splicing changes are set to 0 in the masked files. We recommend using raw files for alternative splicing analysis and masked files for variant interpretation.
