Code Monkey home page Code Monkey logo

mirage's Introduction

mirage

An R package for MIxture model based Rare variant Analysis on GEnes.

Description

mirage is a new Bayesian statistical method for rare variant (RV) association testing that better accounts for heterogeneity of variant effects within a gene using external annotation information. It models variants in a gene as a mixture of risk and non-risk variants, with a prior probability of being a risk variant determined by functional annotations of the variant such as conservation score and impact on protein structure. Since in general external annotations alone have limited accuracy in predicting functional effects, a simple filter based on such annotations (as commonly performed in many RV association analysis) may result in both false positive and negatives. Instead, by incorporating such information as prior and using a hierarchical model to pool information across genes, mirage is able to better characterize the inclusion probability of rare variants for different functional categories, thus improving the power to detect an association.

Quick Start

  1. Follow the setup instructions below.

  2. See the Quick Start Example for a toy example to run mirage.

Setup

To install the package,

library("devtools")
install_github('xinhe-lab/mirage')

Developer notes

To build documentation for the package,

setwd("/path/to/package/root")
devtools::document()

Please do not manually edit any .Rd files under man folder!

To add tutorials, you write them as .Rmd files and put them under vignettes folder. Then edit this list simply adding to it the names of your .Rmd file (without .Rmd extension).

To build the documentation website (make sure you are connected to Internet while running these commands):

setwd("/path/to/package/root")
devtools::document()
pkgdown::build_site()

To install locally from source code via devtools,

setwd("/path/to/package/root")
devtools::document()
devtools::install()

To auto-format the code,

setwd("/path/to/package/root")
formatR::tidy_dir('./R', wrap = 120)

mirage's People

Contributors

han16 avatar gaow avatar

Stargazers

Jacob Ulirsch avatar

Watchers

James Cloos avatar  avatar Nicholas Knoblauch avatar Xin He avatar  avatar  avatar

mirage's Issues

Example of gene level FDR & multiple testing in genome-wide scan

@han16 I was asked by @linnanqia offline who has fixed bug in her code and got what seems encouraging results (log(BF) about 5 for some genes that seems to make sense). However in our tutorial we didn't explain how results are interpreted; in particular, how multiple testing is performed -- how gene level posterior probability should be interpreted in terms of FDR, and what threshold to use.

Could you kindly update the tutorial adding a section on interpreting the results? Thanks!

Annotation group documentation

I got this email from a user:

In my data analysis, I got the result by using annovar anova. I found that you annotate with several popular programs including PolyPhen, CADD and SIFT. In my data analysis, I will do this step before mirage analysis? I read your code about mirage, there are four columns( format of input data column 1:variant, 2,NO.variant in cases 3. No variant in control 4 variant group index). but I do not know where these columns come from? can you teach me?

@han16 we wrote on this page "Variant groups can be user defined, usually depending on its annotations." but it seems too vague to be helpful to users. Is it correct if we change it to the following:

"Variant groups can be user defined, usually depending on its annotations. For example, in Han et al (2019+), we label as group 2 those variants with PolyPhen 194 score greater than 0.957, CADD score top 10% or SIFT score < 0.05; other variants are labelled group 1"

Is this correct?

Is it true to obtain the NO.case and NO.control?

Is it true to obtain the NO.case and NO.control?
#NO.case
#grep 0/1
count_case_01 <- data.frame(apply(tmp_vcf_case_data,1,function(x) length(grep('0/1',x))))
rownames(count_case_01) <- tmp_vcf_case_data[,1]
colnames(count_case_01) <- "count_01"
#grep 1/2
count_case_12 <- data.frame(apply(tmp_vcf_case_data,1,function(x) length(grep('1/2',x))))
rownames(count_case_12) <- tmp_vcf_case_data[,1]
colnames(count_case_12) <- "count_12"

#grep 1/1

count_case_11 <- data.frame(apply(tmp_vcf_case_data,1,function(x) length(grep('1/1',x))))
rownames(count_case_11) <- tmp_vcf_case_data[,1]
colnames(count_case_11) <- "count_11"

#grep 2/2
count_case_22 <- data.frame(apply(tmp_vcf_case_data,1,function(x) length(grep('2/2',x))))
rownames(count_case_22) <- tmp_vcf_case_data[,1]
colnames(count_case_22) <- "count_22"

#combine four data for case
count_case <- cbind(count_case_01,count_case_11,count_case_12,count_case_22)
count_case[,5] <- 2rowSums(count_case[,2:4])+1count_case[,1]
colnames(count_case)[5] <- "N.case"

#NO.control
#grep 0/1
count_contro_01 <- data.frame(apply(tmp_vcf_control_data,1,function(x) length(grep('0/1',x))))
rownames(count_contro_01) <- tmp_vcf_control_data[,1]
colnames(count_contro_01) <- "count_01"

#grep 1/2
count_contro_12 <- data.frame(apply(tmp_vcf_control_data,1,function(x) length(grep('1/2',x))))
rownames(count_contro_12) <- tmp_vcf_control_data[,1]
colnames(count_contro_12) <- "count_12"

#grep 1/1

count_contro_11 <- data.frame(apply(tmp_vcf_control_data,1,function(x) length(grep('1/1',x))))
rownames(count_contro_11) <- tmp_vcf_control_data[,1]
colnames(count_contro_11) <- "count_11"

#grep 2/2
count_contro_22 <- data.frame(apply(tmp_vcf_control_data,1,function(x) length(grep('2/2',x))))
rownames(count_contro_22) <- tmp_vcf_control_data[,1]
colnames(count_contro_22) <- "count_22"
#combine four data for control
count_cantro <- cbind(count_contro_01,count_contro_11,count_contro_12,count_contro_22)
count_cantro[,5] <- 2rowSums(count_cantro[,2:4])+1count_cantro[,1]
colnames(count_cantro)[5] <- "N.control"

how to obtain the NO.case and NO.control

Dear,
I still felt confused about how to obtain the NO.case and NO.control. You said that " (No.case, how many times the variant appears in cases, No.contr, how many times the variant appears in controls โ€” you can compute these quantities from your data)". which file I can get this information. Can you give me an example?Thank you!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.