
vep's Introduction

IMPORTANT: This repository is not maintained anymore and has been moved to https://github.com/buschlab/vep

Containerized Variant Effect Predictor (VEP) + Cache


 + Introduction
 + Build image with Singularity
 + Run VEP
    |-- More options
    |-- Examples
 + Post-processing
    |-- Split VEP
    |-- Filtering by VEP annotations
 + VEP plugins
 + Build and run VEP with Docker
 + Acknowledgments

Introduction

This documentation describes the usage of the Docker image at https://hub.docker.com/r/matmu/vep, which contains the bioinformatics tool Ensembl Variant Effect Predictor (VEP) for annotating genetic variants. The image comes with:

  • Merged cache including RefSeq and Ensembl transcripts (VEP parameter --merged required)
  • Reference genome and index
  • Plugins (annotation data is not included)

Available versions

Human

105-GRCh38 105-GRCh37
103-GRCh38-merged 103-GRCh38
101-GRCh38
100-GRCh38 100-GRCh38-merged 100-GRCh37 100-GRCh37-merged
99-GRCh38-merged 99-GRCh37-merged

Mouse

105-GRCm39
103-GRCm39
101-GRCm38
100-GRCm38 100-GRCm38-merged

The term merged refers to the merged Ensembl/RefSeq cache. To be consistent with the Ensembl website, choose the Ensembl-only cache (i.e. a version without the term merged). Examples of versions are 99-GRCh38 (VEP 99 with Ensembl cache for reference GRCh38) or 99-GRCh37-merged (VEP 99 with Ensembl/RefSeq cache for reference GRCh37).

You can also visit https://hub.docker.com/r/matmu/vep/tags to get a list of available versions.

Note: If you require a container for a species not mentioned above, feel free to contact us or, even better, create an issue.

Build image with Singularity

singularity build vep.<version>.simg docker://matmu/vep:<version>

<version> is a tag representing the Ensembl version and the species + version of the reference genome.
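
As a minimal sketch of how such a tag is put together, the snippet below composes a tag from an Ensembl release, an assembly name and an optional merged suffix, then prints the matching build command. The helper name vep_tag is hypothetical and not part of the image; the tag pattern is inferred from the versions listed above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the image): compose an image tag.
# Usage: vep_tag <ensembl_release> <assembly> [merged]
vep_tag() {
  if [ "${3:-}" = "merged" ]; then
    printf '%s-%s-merged\n' "$1" "$2"
  else
    printf '%s-%s\n' "$1" "$2"
  fi
}

tag="$(vep_tag 100 GRCh38 merged)"
# Print the build command rather than running it, so the sketch is safe to try.
echo "singularity build vep.${tag}.simg docker://matmu/vep:${tag}"
```

Check the composed tag against https://hub.docker.com/r/matmu/vep/tags before building, since not every combination of release, assembly and cache type exists.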

Run VEP

To run VEP, execute

singularity exec vep.<version>.simg vep [options]

where <version> is replaced by the respective version (see above), e.g. 99-GRCh38. It is essential to add the VEP option --merged when using an image with the merged Ensembl/RefSeq cache. For species other than Homo sapiens, the parameter --species (e.g. --species mus_musculus) has to be set as well.
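
The two rules above (--merged for merged caches, --species for non-human images) can be sketched as a small helper that derives the extra options from the image tag. The function name vep_extra_opts is hypothetical, and the mapping from a GRCm assembly to mus_musculus is an assumption based on the mouse versions listed above.

```shell
#!/bin/sh
# Hypothetical helper: print the extra VEP options implied by an image tag.
# Assumption: tags ending in "-merged" use the merged cache, and tags
# containing "GRCm" are mouse images.
vep_extra_opts() {
  opts=""
  case "$1" in
    *-merged) opts="--merged" ;;
  esac
  case "$1" in
    *GRCm*) opts="${opts:+$opts }--species mus_musculus" ;;
  esac
  printf '%s\n' "$opts"
}

tag="100-GRCm38-merged"
# Print the resulting command line instead of executing it.
echo "singularity exec vep.${tag}.simg vep $(vep_extra_opts "$tag") [options]"
```

For a human Ensembl-only tag such as 100-GRCh38 the helper prints an empty string, so no extra options are added.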

More options

The options for base cache/plugin directories, species and assembly are set to the right values by default and do not need to be set by the user.

Visit http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html for detailed information about all VEP options. Detailed information about input/output formats can be found at https://www.ensembl.org/info/docs/tools/vep/vep_formats.html#defaultout.

Examples

Minimum (output format: compressed tab delimited)

singularity exec vep.100-GRCh38-merged.simg vep --dir /opt/vep/.vep --merged --offline --cache --input_file <filename>.vcf[.gz] --output_file <filename>.txt.gz --tab --compress_output bgzip
singularity exec vep.100-GRCh38.simg vep --dir /opt/vep/.vep --offline --cache --input_file <filename>.vcf[.gz] --output_file <filename>.txt.gz --tab --compress_output bgzip
singularity exec vep.100-GRCm38.simg vep --dir /opt/vep/.vep --offline --cache --input_file <filename>.vcf[.gz] --output_file <filename>.txt.gz --tab --compress_output bgzip --species mus_musculus

Minimum (output format: compressed vcf)

singularity exec vep.100-GRCh38.simg vep --dir /opt/vep/.vep --offline --cache --input_file <filename>.vcf[.gz] --output_file <filename>.vcf.gz --vcf --compress_output bgzip

Full annotation

singularity exec vep.100-GRCh38.simg vep --dir /opt/vep/.vep --offline --cache --input_file <filename>.vcf[.gz] --output_file <filename>.vcf.gz --vcf --compress_output bgzip --everything --nearest symbol

Post-processing

Split VEP

A bcftools plugin can split the VEP annotations, as well as sample information, in a VCF file and convert them to a text file: http://samtools.github.io/bcftools/howtos/plugin.split-vep.html.

Filtering by VEP annotations

If you choose to output the VEP annotations as a text file, any command line tool (e.g. awk) or even Excel can be used for filtering the results. For VCF files, the image includes a VEP filtering script, which can be executed with:

singularity exec vep.<version>.simg filter_vep [options]

Options

Visit https://www.ensembl.org/info/docs/tools/vep/script/vep_filter.html for detailed info about available options.

Filtering examples

Filter for rare protein-coding variants that are predicted to be damaging:
singularity exec vep.<version>.simg filter_vep --input_file <filename>.vcf --output_file <filename>.filtered.vcf --only_matched --filter "(IMPACT is HIGH or IMPACT is MODERATE or IMPACT is LOW) and (BIOTYPE is protein_coding) and ((PolyPhen > 0.446) or (SIFT < 0.05)) and (EUR_AF < 0.001 or gnomAD_NFE_AF < 0.001 or (not EUR_AF and not gnomAD_NFE_AF))" 
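
For tab-delimited output, filtering with awk (as mentioned above) might look like the following sketch. It reads uncompressed VEP tab output on standard input, looks up the IMPACT column by name in the header line and keeps only rows with the requested value. The function name filter_by_impact is hypothetical, and the sketch assumes the output contains a column named IMPACT.

```shell
#!/bin/sh
# Hypothetical helper: keep rows of VEP tab output with a given IMPACT value.
# Usage: filter_by_impact <IMPACT> < vep_output.txt
filter_by_impact() {
  awk -F'\t' -v want="$1" '
    /^##/ { next }                 # skip meta lines
    /^#/  {                        # column header line: locate IMPACT
      for (i = 1; i <= NF; i++) if ($i == "IMPACT") col = i
      print
      next
    }
    col && $col == want            # print data rows whose IMPACT matches
  '
}

# Example on compressed output (bgzip files can be read with gzip -dc):
#   gzip -dc <filename>.txt.gz | filter_by_impact HIGH > <filename>.high.txt
```

Looking the column up by name rather than by position keeps the filter working when additional annotation columns are present.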

VEP plugins

VEP supports several additional annotation sources (aka plugins). Their respective Perl modules are included in the image; the annotation files, however, have to be added separately. The list of plugins as well as instructions on how to download and pre-process the annotation files can be found at: http://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html. For example, to run the CADD plugin:

singularity exec vep.100-GRCh38-merged.simg vep --dir /opt/vep/.vep --merged --offline --cache --input_file <filename>.vcf[.gz] --output_file <filename>.txt.gz --tab --compress_output bgzip --plugin CADD,/path/to/ALL.TOPMed_freeze5_hg38_dbSNP.tsv.gz

Build and run VEP with Docker

To pull the image and run the container with Docker use

docker run matmu/vep:<version> vep [options]

Unlike with Singularity, the directories containing plugin annotation files (e.g. /path/to/dir) have to be explicitly bound to a target directory (e.g. /opt/data) within the container with the option -v:

docker run -v /path/to/dir:/opt/data matmu/vep:<version> vep [options]
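
As a rough sketch of the binding described above, the helper below assembles a docker run command line for a given host directory and image tag and prints it. The function name docker_vep_cmd is hypothetical. Every path passed to VEP, including input and output files, has to refer to the container side of the mount (here /opt/data), because the container cannot see host paths that are not bound.

```shell
#!/bin/sh
# Hypothetical helper: print a docker run command that binds a host directory
# to /opt/data inside the container.
# Usage: docker_vep_cmd <host_dir> <tag> [vep options...]
docker_vep_cmd() {
  host_dir="$1"
  tag="$2"
  shift 2
  printf 'docker run -v %s:/opt/data matmu/vep:%s vep %s\n' "$host_dir" "$tag" "$*"
}

# Paths given to VEP use the container-side directory /opt/data.
docker_vep_cmd /path/to/dir 100-GRCh38 \
  --dir /opt/vep/.vep --offline --cache \
  --input_file /opt/data/input.vcf.gz \
  --output_file /opt/data/output.txt.gz --tab --compress_output bgzip
```

The file names input.vcf.gz and output.txt.gz are placeholders; the printed command can be reviewed and then pasted into a shell.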

Acknowledgments

This document has been created by Julia Remes & Matthias Munz, University of Lübeck, Germany.


vep's Issues

VEP docker run error

Hi,
I am trying to run VEP using this docker command:

docker run -v /home/dnanexus:/opt/data matmu/vep:105-GRCh37 vep --dir /opt/vep/.vep --offline --cache --input_file /opt/data/NGC00142_01.vcf.gz --vcf -o STDOUT --format vcf --offline --cache --dir_cache /opt/data/snv-master/demo/softwares/vep/grch37 --force_overwrite --species homo_sapiens --assembly GRCh37 --port 3337 --vcf_info_field ANN --sift b --polyphen b --humdiv --regulatory --allele_number --total_length --numbers --domains --hgvs --protein --symbol --ccds --uniprot --canonical --biotype --check_existing --af --af_1kg --pubmed --gene_phenotype --variant_class --plugin CADD,/opt/data/snv-master/demo/resources/cadd/grch37/whole_genome_SNVs.tsv.gz,/opt/data/snv-master/demo/resources/cadd/grch37/InDels.tsv.gz --plugin ExACpLI,/opt/data/snv-master/demo/resources/gnomad/grch37/gnomad.v2.1.1.lof_metrics.by_transcript_forVEP.txt --plugin REVEL,/opt/data/snv-master/demo/resources/revel/grch37/new_tabbed_revel.tsv.gz --plugin SpliceRegion --stats_text --stats_file /opt/data/tmpFile_1_AC0.exon.noGT.vep_stats.txt --output_file /opt/data/tmpFile_1_AC0.exon.noGT.new.vep.vcf

But I am getting the following error:
[Screenshot 2022-05-29 at 6 39 45 pm]

Please let me know how to fix it.
Thanks with regards,
Ravi

Error while building the singularity image

Sometimes an error occurs while building the singularity image:

Downloaded layer sha256:94da6cf6f5118f7b1daab46ec4470d1f17ef914933c005fd963f718c0c4c9424.tar.gz does not match checksum
Error during layer download - one or more layers failed to download correctly.

Adding samtools

Hi @matmu,

Is it possible to include samtools in the Docker image in the next release? The LOFTEE plugin requires samtools, and we would like to include the LOFTEE annotations as well.

Many thanks,
Axel
