
glimpse_pipeline's Introduction

Genotype imputation using low-depth sequencing data

1. Description

Pipeline for genotype imputation from low-depth sequencing data using GLIMPSE1. The pipeline uses GATK HaplotypeCaller to compute genotype likelihoods at the reference panel sites before imputation. It was developed with Nextflow and tested on the SLURM job scheduler.
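The three GLIMPSE1 executables used by the pipeline (GLIMPSE_chunk, GLIMPSE_phase and GLIMPSE_ligate) fit together roughly as follows: GLIMPSE_chunk splits a chromosome into overlapping windows, GLIMPSE_phase imputes and phases each window, and GLIMPSE_ligate joins the phased windows back together. The helper below is a hypothetical dry-run sketch, not code taken from Imputation.nf: it only prints the per-chunk commands (it does write the list file that GLIMPSE_ligate reads). It assumes the GLIMPSE1 (v1) chunk-file layout, where column 1 is the chunk ID, column 3 the buffered input region and column 4 the output region; all file names are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print GLIMPSE1 commands for every chunk.
# Assumed chunk-file layout (GLIMPSE1 v1): col 1 = chunk ID,
# col 3 = input (buffered) region, col 4 = output region.
# Args: CHUNKS_FILE GL_VCF REFERENCE_BCF GENETIC_MAP OUTPUT_PREFIX
print_glimpse_commands() {
  local chunks=$1 gl_vcf=$2 ref_bcf=$3 gmap=$4 out_prefix=$5
  local id irg org
  local -a phased=()
  # "|| [[ -n $id ]]" also processes a last line without a trailing newline.
  while read -r id _ irg org _ || [[ -n "$id" ]]; do
    [[ -z "$id" ]] && continue            # skip blank lines
    id=$(printf '%02d' "$id")             # zero-padded chunk ID
    echo "GLIMPSE_phase --input $gl_vcf --reference $ref_bcf --map $gmap" \
         "--input-region $irg --output-region $org --output ${out_prefix}.${id}.bcf"
    echo "bcftools index -f ${out_prefix}.${id}.bcf"
    phased+=("${out_prefix}.${id}.bcf")
  done < "$chunks"
  # GLIMPSE_ligate reads a text file listing the phased chunks in order.
  printf '%s\n' "${phased[@]}" > "${out_prefix}.list.txt"
  echo "GLIMPSE_ligate --input ${out_prefix}.list.txt --output ${out_prefix}.merged.bcf"
}
```

For example, print_glimpse_commands chunks.chr22.txt study.chr22.vcf.gz ref.chr22.bcf chr22.b37.gmap.gz imputed.chr22 prints two lines per chunk followed by a single GLIMPSE_ligate line.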

2. Prerequisites

The following software is required:

  • Singularity
  • Nextflow
  • bcftools
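Before launching, it can help to confirm that these tools are actually on PATH (on clusters that use environment modules, after the module load commands shown in the run example). The function name check_prereqs is hypothetical; it relies only on the shell built-in command -v.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are on PATH.
# Found tools go to stdout, missing ones to stderr.
# Returns 0 if every tool is found, 1 otherwise.
check_prereqs() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "MISSING: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_prereqs singularity nextflow bcftools. The GLIMPSE1 executables are not checked here because the pipeline locates them through explicit paths in nextflow.config.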

3. Installation

To run this pipeline, you will need to:

  1. Download and install GLIMPSE1
  2. Build the GATK Singularity container (replace VERSION with a GATK image tag): singularity build gatk_VERSION.sif docker://broadinstitute/gatk:VERSION
  3. Clone this repo: git clone https://github.com/CERC-Genomic-Medicine/glimpse_pipeline.git
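Steps 2 and 3 can be scripted roughly as below; step 1 is left out because it depends on how GLIMPSE1 is obtained on your system. The function name install_pipeline and the DRY_RUN switch are hypothetical, and VERSION must be replaced with an existing broadinstitute/gatk image tag.

```shell
#!/usr/bin/env bash
# Hypothetical helper for installation steps 2 and 3 (GLIMPSE1 is installed separately).
# Usage: install_pipeline VERSION   (VERSION = a broadinstitute/gatk image tag)
# Set DRY_RUN=1 to print the commands instead of running them.
install_pipeline() {
  if [[ $# -ne 1 ]]; then
    echo "usage: install_pipeline GATK_VERSION" >&2
    return 2
  fi
  local version=$1
  local -a run=()
  [[ "${DRY_RUN:-0}" == 1 ]] && run=(echo)
  # Step 2: build the GATK Singularity image from the Docker Hub image.
  "${run[@]}" singularity build "gatk_${version}.sif" \
    "docker://broadinstitute/gatk:${version}" || return
  # Step 3: clone the pipeline repository.
  "${run[@]}" git clone https://github.com/CERC-Genomic-Medicine/glimpse_pipeline.git
}
```

Running DRY_RUN=1 install_pipeline VERSION first shows exactly what will be executed; the resulting gatk_VERSION.sif path is what params.gatkContainer should point to.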

4. Execution

  1. Modify the nextflow.config configuration file:
  • params.reference_vcfs -- path to VCF/BCF files with phased reference panel genotypes. Each VCF/BCF file must have the corresponding tbi/csi index.
  • params.reference_sites_vcfs -- path to sites-only VCF/BCF files of the reference panel. Each VCF/BCF file must have the corresponding tbi/csi index.
  • params.study_bams -- path to BAM/CRAM files. One BAM/CRAM file per study participant. Each BAM/CRAM file must have the corresponding bai/crai index.
  • params.referenceDir -- path to the folder with the reference genome *.fa file.
  • params.referenceGenome -- name of the reference genome *.fa file (e.g. hs37d5.fa).
  • params.gatkContainer -- path to the GATK singularity image file (.sif).
  • params.window_size -- imputation window size in base pairs. This is a parameter to the GLIMPSE_chunk executable. See the GLIMPSE1 documentation for more details.
  • params.buffer_size -- imputation window buffer size in base pairs. This is a parameter to the GLIMPSE_chunk executable. See the GLIMPSE1 documentation for more details.
  • params.chunk_exec -- path to the GLIMPSE_chunk executable.
  • params.phase_exec -- path to the GLIMPSE_phase executable.
  • params.ligate_exec -- path to the GLIMPSE_ligate executable.
  • params.glimpse_maps -- path to the GLIMPSE genetic maps folder for the matching human genome build.
  • process.* and executor.* -- set these options according to your compute cluster configuration.
  2. Run the pipeline. Example using an interactive SLURM job:
salloc --time=12:00:00 --ntasks=1 --mem-per-cpu=16G
module load nextflow
module load singularity
module load bcftools
nextflow run Imputation.nf
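Instead of editing the repository's nextflow.config directly, the site-specific values from step 1 can be kept in a separate file and passed with Nextflow's -c option. The sketch below writes such a file with a heredoc. The helper name write_site_config is hypothetical, every path is a placeholder, the window and buffer sizes are illustrative rather than recommendations (see the GLIMPSE1 documentation), and whether the *_vcfs and study_bams parameters expect glob patterns is an assumption to check against the shipped nextflow.config.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write site-specific settings to a separate Nextflow config.
# Every path and value below is a placeholder to replace for your system.
write_site_config() {
  local out=${1:-my_site.config}
  cat > "$out" <<'EOF'
params.reference_vcfs       = "/path/to/reference_panel/*.bcf"
params.reference_sites_vcfs = "/path/to/reference_sites/*.sites.vcf.gz"
params.study_bams           = "/path/to/crams/*.cram"
params.referenceDir         = "/path/to/reference_genome/"
params.referenceGenome      = "hs37d5.fa"
params.gatkContainer        = "/path/to/gatk_VERSION.sif"
params.window_size          = 2000000   // illustrative value
params.buffer_size          = 200000    // illustrative value
params.chunk_exec           = "/path/to/GLIMPSE_chunk"
params.phase_exec           = "/path/to/GLIMPSE_phase"
params.ligate_exec          = "/path/to/GLIMPSE_ligate"
params.glimpse_maps         = "/path/to/GLIMPSE/maps/genetic_maps.b37/"

process.executor   = "slurm"
process.memory     = "16 GB"
executor.queueSize = 100
EOF
  echo "wrote $out"
}
```

Then run nextflow run Imputation.nf -c my_site.config. Settings in a file passed with -c take precedence over the nextflow.config in the launch directory, so this file only needs the values that differ.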
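Because every reference VCF/BCF and every study BAM/CRAM must have an index, a quick check before submitting can save a failed run. The sketch below is hypothetical (check_indexes is not part of the pipeline) and assumes the index sits next to the data file with the extension appended, as the default tabix, bcftools index and samtools index naming does (x.bcf.csi, x.vcf.gz.tbi, x.bam.bai, x.cram.crai); other naming schemes such as x.bai are not recognized.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that each input file has an index next to it.
# Assumed naming: VCF/BCF -> FILE.tbi or FILE.csi, BAM -> FILE.bai, CRAM -> FILE.crai.
# Prints files without an index and returns 1 if any are missing.
check_indexes() {
  local f ext found missing=0
  local -a exts
  for f in "$@"; do
    case "$f" in
      *.vcf.gz|*.bcf) exts=(tbi csi) ;;
      *.bam)          exts=(bai) ;;
      *.cram)         exts=(crai) ;;
      *) echo "SKIPPED (unknown type): $f" >&2; continue ;;
    esac
    found=0
    for ext in "${exts[@]}"; do
      [[ -e "$f.$ext" ]] && found=1
    done
    if (( found == 0 )); then
      echo "NO INDEX: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_indexes /path/to/reference_panel/*.bcf /path/to/crams/*.cram, using the same placeholder paths as in nextflow.config.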

glimpse_pipeline's People

Contributors

dtaliun

Forkers

fbeghini

glimpse_pipeline's Issues

Stopping at HaplotypeCaller

Thanks @dtaliun for putting this pipeline together!

I was able to get through the first steps of the pipeline. I modified some of the commands since I'm using GLIMPSE v2, whose options are a bit different.
However, for now I only get log files from the chunk step in the result folder. I was able to run the HaplotypeCaller step as well, but then the pipeline stops without any error. It seems that the pipeline is not able to find the input files it needs in order to continue?

I also noticed that when I launch the pipeline again, it can't detect which steps completed successfully, so it just starts again from scratch.
Can you help me with this?
Thanks a lot!
JC
