MetaMinimac2

MetaMinimac2 is an efficient tool to combine genotype data imputed against multiple reference panels.

Installation

Prerequisite: cget>=0.1, cmake>=3.2

git clone https://github.com/yukt/MetaMinimac2.git
cd MetaMinimac2
bash install.sh

Usage

The first step is to impute the pre-phased target haplotypes against each reference panel separately, using Minimac4 with the --meta option turned on. In addition to the usual .dose.vcf.gz file, this generates a .empiricalDose.vcf.gz file, which MetaMinimac2 requires. See http://genome.sph.umich.edu/wiki/Minimac4 for detailed Minimac4 documentation.

minimac4 --refHaps refPanelA.m3vcf \
         --haps targetStudy.vcf \
         --prefix PanelA.imputed \
         --meta
         
minimac4 --refHaps refPanelB.m3vcf \
         --haps targetStudy.vcf \
         --prefix PanelB.imputed \
         --meta

The second step is to integrate the imputed results using MetaMinimac2.

MetaMinimac2 -i PanelA.imputed:PanelB.imputed -o A_B.meta.testrun
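Since -i takes colon-separated prefixes, a small wrapper can build that argument and catch missing files before the run starts. The following is a minimal sketch, not part of MetaMinimac2: the function name join_meta_prefixes is hypothetical, and it assumes that both the .dose.vcf.gz and .empiricalDose.vcf.gz files are expected next to each prefix.

```shell
# Hypothetical helper (not shipped with MetaMinimac2): join prefixes with ':'
# for the -i option, after checking that the Minimac4 --meta outputs exist.
# Assumption: both files below are expected for every prefix.
join_meta_prefixes() {
    joined=""
    for prefix in "$@"; do
        for suffix in dose.vcf.gz empiricalDose.vcf.gz; do
            if [ ! -f "$prefix.$suffix" ]; then
                echo "missing input: $prefix.$suffix" >&2
                return 1
            fi
        done
        # Append the prefix, adding ':' only between entries.
        joined="${joined:+$joined:}$prefix"
    done
    printf '%s\n' "$joined"
}

# Usage (prefixes taken from the example above):
# input=$(join_meta_prefixes PanelA.imputed PanelB.imputed) &&
#     MetaMinimac2 -i "$input" -o A_B.meta.testrun
```

The same call works unchanged with three or more prefixes.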

Options

-i, --input  <prefix1:prefix2 ...>  (Required) Colon-separated prefixes of input data to meta-impute
-o, --output <prefix>               (Required) Output prefix [MetaMinimac.Output.Prefix]
-f, --format <string>               Comma-separated output FORMAT tags [GT,DS,HDS]
-p, --skipPhasingCheck              OFF by default. If ON, program will skip phasing consistency check before analysis. 
-s, --skipInfo                      OFF by default. If ON, the INFO fields are removed from the output file
-n, --nobgzip                       OFF by default. If ON, output files will NOT be bgzipped
-w, --weight                        OFF by default. If ON, weights will be saved in [MetaMinimac.Output.Prefix].metaWeights(.gz)
-l, --log                           OFF by default. If ON, log will be written to $prefix.logfile
-h, --help                          OFF by default. If ON, detailed documentation on options and usage will be displayed

Output Files

The meta-imputed result will be saved in [MetaMinimac.Output.Prefix].metaDose.vcf.gz.

When --weight is ON, the weights for meta-imputation will be saved in [MetaMinimac.Output.Prefix].metaWeights.gz. The weight file is also in VCF format, which makes it convenient to filter individuals with vcftools or bcftools. In the FORMAT field, WT1 stands for the weight on reference panel 1 (the panel behind the input files with prefix1), WT2 stands for the weight on reference panel 2, and so on.
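Because the weight file is a VCF, the weights of a single individual can also be pulled out with standard text tools. The sketch below is illustrative only: extract_weight is a hypothetical helper, SAMPLE_ID is a placeholder, and the weight value is printed exactly as stored, without assuming anything about its internal layout.

```shell
# Hypothetical helper: print CHROM, POS and one FORMAT tag (e.g. WT1) for one
# sample, reading an uncompressed VCF-formatted weight file on standard input.
#   $1 = sample ID as it appears in the #CHROM header line
#   $2 = FORMAT tag to extract, such as WT1 or WT2
extract_weight() {
    awk -F '\t' -v sample="$1" -v tag="$2" '
        /^##/ { next }                        # skip meta-information lines
        /^#CHROM/ {                           # header: locate the sample column
            for (i = 10; i <= NF; i++) if ($i == sample) col = i
            next
        }
        col {                                 # data lines, once the sample is found
            n = split($9, keys, ":")          # FORMAT keys of this record
            split($col, vals, ":")            # the sample values, in the same order
            for (k = 1; k <= n; k++)
                if (keys[k] == tag) print $1 "\t" $2 "\t" vals[k]
        }
    '
}

# Usage (output prefix from the example above; SAMPLE_ID is a placeholder):
# gzip -dc A_B.meta.testrun.metaWeights.gz | extract_weight SAMPLE_ID WT1
```

If the sample ID is not present in the header, the function prints nothing.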

Imputation Server

The Michigan Imputation Server offers an option to generate the .empiricalDose.vcf.gz file for downstream meta-imputation with MetaMinimac2.

Imputation steps:

  1. Choose Genotype Imputation (Meta-Imputation Option) in the Run tab;
  2. Set up the reference panel, input files, etc. following the server's instructions;
  3. Tick the "Generate Meta-imputation file" checkbox before clicking Submit Job;
  4. After the job has finished, imputation results can be downloaded from the server. The zip archive contains both .dose.vcf.gz and .empiricalDose.vcf.gz files.

The TOPMed Imputation Server will offer the same option soon.

Important Notes

1. The reference panels must be on the same build.

All reference panels used for meta-imputation must be on the same genome build. For example, for meta-imputation using 1000G and TOPMed, users should impute against the GRCh38 version of 1000G rather than the original GRCh37 build.

2. Phasing must be consistent across input files.

If the phasing differs between the imputed input files, the resulting meta dosage (which is supposed to be a weighted average of imputed dosages on the same haplotype) will be incorrect. Therefore, we highly recommend that users always keep --skipPhasingCheck OFF.

The best practice is to phase first and then use the same pre-phased VCF file for imputation against each reference panel.
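One way to guarantee that every panel sees the same phased input is to drive all Minimac4 runs from a single loop. This is a minimal sketch under stated assumptions: impute_all_panels and the DRY_RUN switch are hypothetical, and the output prefix is derived from the panel file name, which differs from the PanelA.imputed naming used earlier.

```shell
# Hypothetical wrapper: impute ONE pre-phased VCF against several reference
# panels, so that phasing is identical across all imputed outputs.
#   $1    = pre-phased target VCF
#   $2... = reference panels in m3vcf format
# Set DRY_RUN=1 to print the minimac4 commands instead of running them.
impute_all_panels() {
    phased_vcf=$1
    shift
    for panel in "$@"; do
        # Assumed naming scheme: refPanelA.m3vcf -> refPanelA.imputed
        name=$(basename "$panel" .m3vcf)
        # ${DRY_RUN:+echo} expands to "echo" when DRY_RUN is set and non-empty,
        # and to nothing otherwise, in which case minimac4 itself is executed.
        ${DRY_RUN:+echo} minimac4 --refHaps "$panel" \
                                  --haps "$phased_vcf" \
                                  --prefix "$name.imputed" \
                                  --meta || return 1
    done
}

# Usage (file names from the example in the Usage section):
# DRY_RUN=1 impute_all_panels targetStudy.vcf refPanelA.m3vcf refPanelB.m3vcf
```

The resulting prefixes can then be passed to MetaMinimac2 through -i, separated by colons.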

An alternative is to use the .empiricalDose.vcf.gz file from imputation against one reference panel as the input VCF file for imputation against the other reference panels.

For example, when phasing is performed by the Michigan Imputation Server, which does not provide the phased VCF file as output, users can use the output .empiricalDose.vcf.gz file as input for imputation against other reference panels.

Please note that the .empiricalDose.vcf.gz file contains only the sites shared by the genotype array and the reference panel in use; sites present only on the genotype array are not included. Results from imputation using the .empiricalDose.vcf.gz file may therefore differ from those obtained with a pre-phased VCF file that includes all genotyped sites.
