
mutserve


Mutserve is a variant caller for mitochondrial DNA (mtDNA) that detects heteroplasmic sites in NGS data. It has been integrated into mtDNA-Server. For scalability, mutserve is parallelized using Hadoop MapReduce, but it is also available as a standalone tool.

Differences to mtDNA-Server

  • mutserve always reports the non-reference level as the heteroplasmy level, while mtDNA-Server reports the minor component.
  • mutserve includes a Bayesian model for homoplasmy detection. It uses the 1000G Phase 3 data as a prior and calculates the posterior probability of each genotype to select the most likely one. mtDNA-Server only outputs homoplasmic variants with a coverage > 30.
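
To make the first difference concrete, the following minimal sketch contrasts the two reporting conventions for a single position. The function names and the base-frequency dictionary are illustrative assumptions and are not part of mutserve or mtDNA-Server. The sketch also assumes that "non-reference level" means the level of the most frequent non-reference base.

```python
def non_reference_level(freqs, ref):
    """Level of the most frequent non-reference base (mutserve-style reporting)."""
    non_ref = {base: f for base, f in freqs.items() if base != ref}
    return max(non_ref.values()) if non_ref else 0.0


def minor_component_level(freqs):
    """Level of the second most frequent base (mtDNA-Server-style reporting)."""
    ordered = sorted(freqs.values(), reverse=True)
    return ordered[1] if len(ordered) > 1 else 0.0


# Hypothetical position: the reference base is A, but 90% of reads show G.
freqs = {"A": 0.10, "G": 0.90}
print(non_reference_level(freqs, ref="A"))  # 0.9
print(minor_component_level(freqs))         # 0.1
```

For this position the two conventions give different numbers even though they describe the same reads, which is worth keeping in mind when comparing results from both tools.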

Standalone Usage

You can run mutserve as a standalone tool, starting from CRAM/BAM files, to detect heteroplasmic and homoplasmic sites. BAQ is enabled by default; pass --noBaq to disable it.

wget https://github.com/seppinho/mutserve/releases/download/v1.3.4/mutserve-1.3.4.jar

java -jar mutserve-1.3.4.jar analyse-local --input <file/folder> --output <filename.vcf / filename.txt> --reference <fasta> --level 0.01

To create a VCF file as output, simply specify --output filename.vcf.gz. Please use this reference file when using BAQ.

BAM Preparation

Best-practice pipelines recommend the following steps for preparing BAM files:

  • Remove duplicates (java -jar picard-tools-2.5.0/picard.jar MarkDuplicates)
  • Local realignment around indels (java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator, then java -jar GenomeAnalysisTK.jar -T IndelRealigner)
  • BQSR (java -jar GenomeAnalysisTK.jar -T BaseRecalibrator)

Default Parameters

| Parameter | Default Value | Command Line Option |
|---|---|---|
| InputFolder | | --input |
| Output File (supported: *.txt, *.vcf, *.vcf.gz) | | --output |
| Output Fasta | | --writeFasta |
| Heteroplasmy Level | 0.01 | --level |
| MappingQuality | 20 | --mapQ |
| BaseQuality | 20 | --baseQ |
| AlignmentQuality | 30 | --alignQ |
| noBaq | false | --noBaq |
| noFreq | false | --noFreq |
| deletions (beta) | false | --deletions |
| insertions (beta) | false | --insertions |
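
As a rough illustration of how these defaults map onto a command line, the sketch below assembles an analyse-local invocation from Python. The helper name, the file names and the jar path are hypothetical. It also assumes that the numeric options take a value and that --noBaq is a bare switch, which the table suggests but does not state.

```python
def build_analyse_local_command(input_path, output_path, reference,
                                level=0.01, map_q=20, base_q=20, align_q=30,
                                no_baq=False, jar="mutserve-1.3.4.jar"):
    """Assemble the argv list for analyse-local from the documented options."""
    cmd = ["java", "-jar", jar, "analyse-local",
           "--input", input_path,
           "--output", output_path,
           "--reference", reference,
           "--level", str(level),
           "--mapQ", str(map_q),
           "--baseQ", str(base_q),
           "--alignQ", str(align_q)]
    if no_baq:
        cmd.append("--noBaq")  # assumption: bare switch without a value
    return cmd


# Hypothetical paths; replace with your own files.
cmd = build_analyse_local_command("bams/", "variants.txt", "reference.fasta")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once Java and the jar are available locally.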

Output Formats

Tab-delimited File

By default (that is, when the --output filename does not end with .vcf or .vcf.gz), mutserve exports a tab-delimited file including ID, Position, Reference, Variant and VariantLevel. Please note that VariantLevel always reports the level of the non-reference variant, so the reported variant can be either the major or the minor component. The output file also includes the most frequent and second most frequent base at each position (MajorBase + MajorLevel, MinorBase + MinorLevel). The last column gives the type of the variant (1: Homoplasmy, 2: Heteroplasmy or Low-Level Variant, 3: Low-Level Deletion, 4: Deletion, 5: Insertion). See here for an example.
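
A minimal sketch of reading such a file in Python could look as follows. The header names used here (ID, Position, VariantLevel, Type, and so on) are assumptions based on the description above and may not match the exact column names written by mutserve, so check them against a real output file. The type codes are taken from the list above.

```python
import csv
import io

# Type codes as listed in the description of the last column.
VARIANT_TYPES = {
    1: "Homoplasmy",
    2: "Heteroplasmy or Low-Level Variant",
    3: "Low-Level Deletion",
    4: "Deletion",
    5: "Insertion",
}


def read_variants(handle):
    """Yield one dict per row, with numeric fields converted."""
    for row in csv.DictReader(handle, delimiter="\t"):
        row["Position"] = int(row["Position"])
        row["VariantLevel"] = float(row["VariantLevel"])
        row["TypeLabel"] = VARIANT_TYPES.get(int(row["Type"]), "Unknown")
        yield row


# Hypothetical two-row file; real header names may differ.
example = (
    "ID\tPosition\tReference\tVariant\tVariantLevel\tType\n"
    "sample1\t73\tA\tG\t1.0\t1\n"
    "sample1\t3010\tG\tA\t0.04\t2\n"
)
for variant in read_variants(io.StringIO(example)):
    print(variant["Position"], variant["VariantLevel"], variant["TypeLabel"])
```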

VCF

If you want a VCF file as output, specify --output filename.vcf.gz. Heteroplasmies are coded as 1/0 genotypes, and the heteroplasmy level is stored in the FORMAT field as the AF attribute (allele frequency) of the first non-reference allele. Please note that indels are currently not included in the VCF. This VCF file can be used as input for https://github.com/seppinho/haplogrep-cmd.
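
The genotype and level can be pulled out of a record with a few lines of Python. This is a sketch under assumptions: the example record is hypothetical, and it assumes a single sample column whose FORMAT keys include GT and AF as described above.

```python
def sample_fields(vcf_line):
    """Map FORMAT keys to the first sample's values for one VCF data line."""
    columns = vcf_line.rstrip("\n").split("\t")
    keys = columns[8].split(":")    # FORMAT column
    values = columns[9].split(":")  # first sample column
    return dict(zip(keys, values))


# Hypothetical record; the actual content written by mutserve may differ.
line = "chrM\t3010\t.\tG\tA\t.\tPASS\t.\tGT:AF\t1/0:0.04"
fields = sample_fields(line)
is_heteroplasmy = fields["GT"] == "1/0"
level = float(fields["AF"])
print(is_heteroplasmy, level)
```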

Current Shortcomings

  • Insertion/deletion calling is currently in beta.

Mixture-Module and Performance - Sensitivity and Specificity

As of v1.3.3, you can generate a gold standard from two variant files derived from the source samples that were used to create mixtures in the lab. Call mutserve on the two samples, then use the resulting files to calculate the expected sites for a given mixture ratio (generate-gold command) and subsequently compare the results from the lab mixture against this gold standard with the performance command.

Generate Gold-Standard

Provide the two txt variant files produced by analyse-local for the source samples (file1 for the major component and file2 for the minor component of the mixture), the level of the mixture (a value between 0 and 1), and the output file. The output is the resulting gold standard, which serves as the input file for the performance calculation described in the Performance section:

java -jar mutserve-1.3.4.jar generate-gold --file1 <variantfileMajorComponent.txt> --file2 <variantfileMinorComponent.txt> --level <mixture levels (e.g. 0.01 for 1%)> --output <expectedvariants.txt>
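
The arithmetic behind such a gold standard can be sketched in Python. This is not the generate-gold implementation, only an illustration of the levels one would expect under the simplifying assumption that both source samples carry homoplasmic variants only. The function name and the example sites are hypothetical.

```python
def expected_levels(major_variants, minor_variants, level):
    """Expected level of each (position, variant) in a two-sample mixture.

    major_variants / minor_variants: sets of (position, variant) pairs that
    are homoplasmic in the respective source sample.
    level: fraction contributed by the minor sample (between 0 and 1).
    """
    expected = {}
    for site in major_variants:
        expected[site] = expected.get(site, 0.0) + (1.0 - level)
    for site in minor_variants:
        expected[site] = expected.get(site, 0.0) + level
    return expected


# Hypothetical source samples mixed at 1% minor component.
major = {(73, "G"), (263, "G")}
minor = {(263, "G"), (3010, "A")}
for site, lvl in sorted(expected_levels(major, minor, 0.01).items()):
    print(site, round(lvl, 4))
```

Under these assumptions, a variant present only in the minor sample is expected at the mixture level, a variant present only in the major sample at one minus that level, and a variant shared by both samples remains homoplasmic.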

Performance

If you have generated a mixture model, you can use mutserve to check precision, specificity and sensitivity. The expected variants (homoplasmic and heteroplasmic) must be provided as a gold standard in the form of a text file with one column containing the expected positions (this file can be calculated with generate-gold, see the previous step). The txt variant file from analyse-local is used as the input file, and the length of the reference must be specified (typically 16569 for human mitochondrial genomes, but this can vary because different reference sequences exist). The value provided in level is the threshold at which heteroplasmic levels are considered in the analysis.

java -jar mutserve-1.3.4.jar performance --in <variantfile.txt> --gold <expectedvariants.txt> --length <size of reference (e.g. 16569)> --level <threshold for heteroplasmic levels (e.g. 0.01)>
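
For orientation, the three metrics can be computed from position sets using their standard definitions, as in the sketch below. It is not mutserve's own code. The example calls are hypothetical, and treating the threshold as inclusive (level >= threshold) is an assumption.

```python
def performance(called, gold, length):
    """Precision, sensitivity and specificity over `length` reference positions."""
    called, gold = set(called), set(gold)
    tp = len(called & gold)      # expected and found
    fp = len(called - gold)      # found but not expected
    fn = len(gold - called)      # expected but missed
    tn = length - tp - fp - fn   # all remaining positions
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical call set: (position, level) pairs, filtered at a 1% threshold.
calls = [(73, 1.0), (263, 1.0), (3010, 0.012), (5000, 0.004), (9000, 0.02)]
called = [pos for pos, lvl in calls if lvl >= 0.01]
gold = [73, 263, 3010, 16519]
metrics = performance(called, gold, length=16569)
print(metrics)
```

Because almost all of the 16569 positions are true negatives, specificity stays close to 1 even with several false positives, so precision and sensitivity are usually the more informative numbers here.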

Citation

If you use this tool, please cite this paper.
