agg_combininig_queries

Combine genotype and functional annotation queries

This workflow extracts the variants and samples that satisfy both a set of genotype filters and a set of functional annotation filters. It does this by intersecting the genotype VCFs with the functional annotation VCFs.

Pipeline overview

The pipeline has the following main processes:

  • FIND_CHUNK: finds the genotype and functional annotation agg chunks that cover the regions of interest.
  • EXTRACT_VARIANT_VEP: filters the functional annotation agg VCFs.
  • INTERSECT_ANNOTATION_GENOTYPE_VCF: intersects the genotype VCF with the filtered annotation VCF.
  • FIND_SAMPLES: finds the samples of interest.
  • SUMMARISE_OUTPUT: produces the summary tables.
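
To make the intersection step concrete, here is a minimal Python sketch of what INTERSECT_ANNOTATION_GENOTYPE_VCF is assumed to do: keep only the genotype-VCF sites that also survived the annotation filter. It illustrates the idea and is not the pipeline's own implementation, which works on VCF files. The Variant type, the choice of matching key and the made-up positions are assumptions.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """A site identified by CHROM, POS, REF and ALT."""
    chrom: str
    pos: int
    ref: str
    alt: str


def intersect_variants(genotype_sites: Iterable[Variant],
                       annotation_sites: Iterable[Variant]) -> List[Variant]:
    """Keep genotype sites that also passed the annotation filter.

    Sites match only on the full (CHROM, POS, REF, ALT) key, so two different
    ALT alleles at the same position are treated as different variants.
    The order of the genotype sites is preserved.
    """
    passing = set(annotation_sites)
    return [site for site in genotype_sites if site in passing]


# Made-up sites: only chr7:50304900 C>T is present in both inputs.
genotype = [Variant("chr7", 50304800, "A", "G"),
            Variant("chr7", 50304900, "C", "T")]
annotation = [Variant("chr7", 50304900, "C", "T"),
              Variant("chr7", 50304900, "C", "A")]
print(intersect_variants(genotype, annotation))
```

Matching on the full allele key, and not on position alone, keeps multi-allelic sites from leaking an annotation onto the wrong ALT allele.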

Required inputs

input_bed

This is a region file of your genes of interest. It must be a three- or four-column tab-delimited file of chromosome, start, and stop, with an optional fourth column holding an identifier (e.g. a gene name). The file should have the .bed extension.

Example of input_bed file:

chr2	213005363	213151603	IKZF2
chr7	50304716	50405101	IKZF1
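
A minimal sketch of a pre-flight check for input_bed is shown below. It assumes only the rules listed above: the .bed extension, tab-delimited fields, and three or four columns. The function name read_input_bed is hypothetical, and the pipeline itself may validate the file differently.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional, Tuple

# (chromosome, start, stop, optional identifier)
Region = Tuple[str, int, int, Optional[str]]


def read_input_bed(path: str) -> list[Region]:
    """Parse an input_bed file into (chrom, start, stop, name) tuples.

    Raises ValueError if the file breaks the stated rules. The name is None
    when the optional fourth column is absent.
    """
    if not path.endswith(".bed"):
        raise ValueError(f"{path}: input_bed should have the .bed extension")
    regions: list[Region] = []
    for lineno, line in enumerate(Path(path).read_text().splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")  # spaces instead of tabs will fail here
        if len(fields) not in (3, 4):
            raise ValueError(
                f"{path}:{lineno}: expected 3 or 4 tab-separated columns, "
                f"got {len(fields)}")
        chrom, start, stop = fields[0], int(fields[1]), int(fields[2])
        if start > stop:
            raise ValueError(f"{path}:{lineno}: start {start} is after stop {stop}")
        name = fields[3] if len(fields) == 4 else None
        regions.append((chrom, start, stop, name))
    return regions
```

Running it on the two-line example above should return the IKZF2 and IKZF1 regions as tuples.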

agg_chunks_bed

This is the list of chunk names, with the full file paths to both the genotype VCFs and the functional annotation VCFs, for either aggV2 or aggCOVID. For aggV2 it can be found under GEL data resources > aggregate_file_lists > aggV2_chunk_names.bed. For aggCOVID it can be found under GEL data resources > aggregate_file_lists > aggCOVID_4.2_chunk_names.bed.
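
FIND_CHUNK has to match each input_bed region to the chunks it falls in. The sketch below shows that overlap test under two assumptions. First, the chunk list is taken to have the hypothetical six-column layout of the Chunk type. Second, both files are taken to use BED-style half-open [start, end) intervals. The chunk names, paths and chunk coordinates in the usage lines are invented.

```python
from __future__ import annotations

from typing import NamedTuple


class Chunk(NamedTuple):
    # HYPOTHETICAL column layout: check the real columns of
    # aggV2_chunk_names.bed / aggCOVID_4.2_chunk_names.bed before relying on it.
    chrom: str
    start: int
    end: int
    name: str
    genotype_vcf: str
    annotation_vcf: str


def overlapping_chunks(chrom: str, start: int, stop: int,
                       chunks: list[Chunk]) -> list[Chunk]:
    """Return every chunk whose [start, end) interval overlaps [start, stop).

    A gene can straddle a chunk boundary, so more than one chunk may be returned.
    """
    return [c for c in chunks
            if c.chrom == chrom and c.start < stop and start < c.end]


chunks = [
    Chunk("chr7", 50_000_000, 50_350_000, "chunk_a", "a.vcf.gz", "a_vep.vcf.gz"),
    Chunk("chr7", 50_350_000, 50_700_000, "chunk_b", "b.vcf.gz", "b_vep.vcf.gz"),
    Chunk("chr2", 213_000_000, 213_200_000, "chunk_c", "c.vcf.gz", "c_vep.vcf.gz"),
]
# IKZF1 region from the example input_bed; with these invented chunks it spans two.
hits = overlapping_chunks("chr7", 50304716, 50405101, chunks)
print([c.name for c in hits])
```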

include_exclude

This parameter defines whether to include (set to -i) or to exclude (set to -e) the sites selected using the --expression parameter (see below).

expression

This parameter defines the bcftools filter expression for your query. See the bcftools EXPRESSIONS documentation for the accepted filters: https://samtools.github.io/bcftools/bcftools.html#expressions.
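
The difference between -i and -e can be sketched in a few lines of Python, with an ordinary predicate standing in for the bcftools expression. This models only the keep/drop logic. It ignores bcftools details such as missing values and per-sample (FORMAT) expressions. The function name select_sites, the sites and the AF threshold are made up.

```python
from typing import Callable, Iterable, List, TypeVar

Site = TypeVar("Site")


def select_sites(sites: Iterable[Site],
                 matches: Callable[[Site], bool],
                 include_exclude: str) -> List[Site]:
    """Apply include (-i) or exclude (-e) semantics to a filter predicate.

    With -i the sites for which `matches` is true are kept;
    with -e exactly those sites are dropped.
    """
    if include_exclude == "-i":
        return [s for s in sites if matches(s)]
    if include_exclude == "-e":
        return [s for s in sites if not matches(s)]
    raise ValueError("include_exclude must be '-i' or '-e'")


def is_rare(site: dict) -> bool:
    """Python stand-in for a bcftools expression such as 'INFO/AF<0.01'."""
    return site["af"] < 0.01


sites = [{"pos": 100, "af": 0.002}, {"pos": 200, "af": 0.3}]
print(select_sites(sites, is_rare, "-i"))  # keeps the rare site
print(select_sites(sites, is_rare, "-e"))  # keeps the common site
```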

format

This parameter defines the output format of the query; see https://samtools.github.io/bcftools/bcftools.html#query for details. For the pipeline to run, the format must include the fields '[%SAMPLE\t%CHROM\t%POS\t%REF\t%ALT\n]'. You can also specify additional fields after this initial list.
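
The later steps rely on those first five columns, so a check of the format string and a parser for the resulting lines might look like the sketch below. Both functions are hypothetical helpers and are not part of the pipeline. The check only tests that the string starts with the required fields, so extra fields such as %GT (a standard bcftools query tag) can follow them. The sample name in the usage line is made up.

```python
from typing import Dict, List, Union

# The required fields as typed on the command line: "\t" and "\n" are
# two literal characters here, which bcftools turns into tab and newline.
REQUIRED_PREFIX = r"[%SAMPLE\t%CHROM\t%POS\t%REF\t%ALT"


def has_required_fields(fmt: str) -> bool:
    """True if a format string starts with the required per-sample fields."""
    return fmt.startswith(REQUIRED_PREFIX)


def parse_result_line(line: str) -> Dict[str, Union[str, int, List[str]]]:
    """Split one bcftools query output line; columns after ALT go in 'extra'."""
    sample, chrom, pos, ref, alt, *extra = line.rstrip("\n").split("\t")
    return {"sample": sample, "chrom": chrom, "pos": int(pos),
            "ref": ref, "alt": alt, "extra": extra}


print(has_required_fields(r"[%SAMPLE\t%CHROM\t%POS\t%REF\t%ALT\t%GT\n]"))
print(parse_result_line("SAMPLE_1\tchr7\t50304800\tA\tG\t1/1\n"))
```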

cpus

Number of CPUs used by each Nextflow process. The default is 1 CPU per process; when using an input_bed file with more than 5 entries, please set it to a higher value.

memory

Total RAM available to each Nextflow process. The default is 2.GB per process; when using an input_bed file with more than 5 entries, please set it to a higher value.

Optional inputs

severity_scale

This file lists the severity of variant consequences. It can be found under GEL data resources > aggregations > gel_mainProgramme > somAgg > v0.2 > additional data > vep severity scale > VEP_severity_scale_2020.txt. Provide this file if you are only interested in variants with a specific consequence.

severity

This parameter sets the severity of the variants you are interested in for your query. For example, to look only at missense variants or worse, set it to missense. Use it only if the severity_scale parameter is set.
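
Selecting "missense or worse" amounts to taking every consequence ranked at or above the chosen one. Below is a minimal sketch, assuming the scale can be read as a list ordered from most to least severe. The function name consequences_at_least is hypothetical. The short scale in the usage lines is also hypothetical and is not the content of VEP_severity_scale_2020.txt.

```python
from typing import List, Set


def consequences_at_least(scale: List[str], severity: str) -> Set[str]:
    """Terms in `scale` that are as severe as `severity` or more severe.

    Assumes `scale` is ordered from most to least severe.
    """
    if severity not in scale:
        raise ValueError(f"{severity!r} is not in the severity scale")
    return set(scale[: scale.index(severity) + 1])


# HYPOTHETICAL, shortened scale; the real file may differ in terms and layout.
scale = ["stop_gained", "frameshift", "missense", "synonymous", "intron"]
print(sorted(consequences_at_least(scale, "missense")))
```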

Outputs

This workflow produces three outputs for each gene in your input bed file.

  • *_result.tsv file: the tab-delimited output of the bcftools query command.
  • *_platekey_summary.tsv file: a two-column tab-delimited file. The first column lists the platekeys recovered by the query, and the second gives the number of variants per participant that satisfied the query.
  • *_variant_summary.tsv file: a two-column tab-delimited file. The first column lists the variants that satisfied the query, and the second gives the number of participants carrying each variant.
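
The two summary files can be derived from the rows of *_result.tsv. The Python sketch below assumes that each row starts with the five required format fields. It labels variants with a hypothetical CHROM:POS:REF:ALT string and counts each (platekey, variant) pair only once. The pipeline's SUMMARISE_OUTPUT step may make different choices, and the rows in the usage lines are invented.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def summarise(rows: Iterable[Sequence[str]]) -> Tuple[Counter, Counter]:
    """Build platekey and variant summaries from *_result.tsv rows.

    Each row starts with SAMPLE, CHROM, POS, REF, ALT; extra columns are ignored.
    A (platekey, variant) pair is counted once even if it appears on several rows.
    """
    pairs = {(r[0], f"{r[1]}:{r[2]}:{r[3]}:{r[4]}") for r in rows}
    per_platekey = Counter(sample for sample, _ in pairs)
    per_variant = Counter(variant for _, variant in pairs)
    return per_platekey, per_variant


rows = [["S1", "chr7", "100", "A", "G"],
        ["S2", "chr7", "100", "A", "G"],
        ["S1", "chr7", "200", "C", "T", "1/1"]]
per_platekey, per_variant = summarise(rows)
for platekey, n_variants in sorted(per_platekey.items()):
    print(f"{platekey}\t{n_variants}")  # *_platekey_summary.tsv-style lines
for variant, n_participants in sorted(per_variant.items()):
    print(f"{variant}\t{n_participants}")  # *_variant_summary.tsv-style lines
```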

Examples

Example 1

An example question would be: "I want to extract the samples in aggV2 who are homozygous alt for missense (or worse) rare variants within the gene IKZF1".

The final command would look like this:

[Image: Example 1 command]

Example 2

An example question would be: "I want to extract the samples in aggV2 who are homozygous alt for any type of variant within the gene IKZF1".

The final command would look like this:

[Image: Example 2 command]
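
As a rough guide to how the documented parameters fit together, the sketch below assembles a nextflow run argument list for each question. It rests on several assumptions:

  • the entry point is main.nf and parameters are passed as --<name> <value>;
  • the file names are placeholders;
  • INFO/AF with a 0.01 cut-off is a hypothetical stand-in for the "rare" filter in Example 1, and aggV2 may store allele frequency under a different tag.

cpus and memory are left at their defaults. How Nextflow parses a parameter value that starts with a dash, such as -i, is also an assumption worth checking before running it.

```python
from __future__ import annotations

import shlex


def build_command(params: dict[str, str], script: str = "main.nf") -> list[str]:
    """Assemble `nextflow run <script> --<name> <value> ...` as an argument list.

    `script` defaults to main.nf, an ASSUMED entry point for this repository.
    """
    cmd = ["nextflow", "run", script]
    for name, value in params.items():
        cmd += [f"--{name}", value]
    return cmd


# Shared settings; the file paths are placeholders for your own copies.
common = {
    "input_bed": "IKZF1.bed",  # one line: chr7 50304716 50405101 IKZF1
    "agg_chunks_bed": "aggV2_chunk_names.bed",
    "include_exclude": "-i",
    "format": r"[%SAMPLE\t%CHROM\t%POS\t%REF\t%ALT\n]",
}

# Example 1: homozygous alt, rare, missense or worse.
# INFO/AF and the 0.01 cut-off are HYPOTHETICAL stand-ins for a rarity filter.
example_1 = build_command({
    **common,
    "expression": 'GT="AA" & INFO/AF<0.01',
    "severity_scale": "VEP_severity_scale_2020.txt",
    "severity": "missense",
})

# Example 2: homozygous alt, any variant type (no severity filter).
example_2 = build_command({**common, "expression": 'GT="AA"'})

print(shlex.join(example_1))
print(shlex.join(example_2))
```

Building an argument list and quoting it with shlex.join keeps the expression's quotes and the literal \t and \n sequences intact when the command is pasted into a shell.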

Contributors

mcrotti-gel
