xinhe-lab / mirage Goto Github PK

Mixture model based Rare variant Analysis on Genes

Home Page: https://xinhe-lab.github.io/mirage

R 35.08% HTML 64.92%

mirage's Introduction

`mirage`

An R package for MIxture model based Rare variant Analysis on GEnes.

Description

mirage is a new Bayesian statistical method for rare variant (RV) association testing that better accounts for heterogeneity of variant effects within a gene by using external annotation information. It models the variants in a gene as a mixture of risk and non-risk variants, where the prior probability that a variant is a risk variant is determined by its functional annotations, such as conservation score and impact on protein structure. Because external annotations alone generally have limited accuracy in predicting functional effects, a simple filter based on such annotations (as commonly applied in RV association analyses) may produce both false positives and false negatives. By instead incorporating this information as a prior and using a hierarchical model to pool information across genes, mirage better characterizes the inclusion probability of rare variants in different functional categories, which improves the power to detect an association.

Quick Start

Follow the setup instructions below.
See the Quick Start Example for a toy example of running mirage.

Setup

To install the package from GitHub with devtools:

library("devtools")
install_github('xinhe-lab/mirage')

Developer notes

To build documentation for the package:

setwd("/path/to/package/root")
devtools::document()

Please do not manually edit any .Rd files under the man folder!

To add tutorials, write them as .Rmd files and put them under the vignettes folder. Then edit this list by adding the name of each new .Rmd file (without the .Rmd extension).

To build the documentation website (make sure you are connected to the Internet while running these commands):

setwd("/path/to/package/root")
devtools::document()
pkgdown::build_site()

To install locally from source code via devtools:

setwd("/path/to/package/root")
devtools::document()
devtools::install()

To auto-format the code:

setwd("/path/to/package/root")
formatR::tidy_dir('./R', wrap = 120)

mirage's People

crerecombinase xsun1229

mirage's Issues

Example of gene level FDR & multiple testing in genome-wide scan

@han16 I was asked about this offline by @linnanqia, who has fixed a bug in her code and obtained what seem to be encouraging results (log(BF) of about 5 for some genes, which seems to make sense). However, our tutorial does not explain how the results should be interpreted; in particular, it does not explain how multiple testing is handled: how the gene-level posterior probability should be interpreted in terms of FDR, and what threshold to use.

Could you kindly update the tutorial adding a section on interpreting the results? Thanks!
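
As a possible starting point for such a section, here is a minimal sketch of one common way to turn gene-level posterior probabilities into an FDR-controlled gene list (the so-called Bayesian FDR). It is not taken from the mirage package or tutorial. The gene names, the posterior probabilities, the prior of 0.05 and the reading of log(BF) as a natural logarithm are all assumptions made for illustration.

```r
# Convert a log Bayes factor to a posterior probability under a given prior.
# Assumptions: log_bf is a natural log, and `prior` is the prior probability
# that a gene is a risk gene (hypothetical value chosen by the analyst).
bf_to_posterior <- function(log_bf, prior) {
  # posterior log-odds = log(BF) + prior log-odds
  plogis(log_bf + qlogis(prior))
}

# Bayesian FDR: for each gene, the expected proportion of false discoveries
# if that gene and all genes with a higher posterior probability are reported.
bayesian_fdr <- function(pp) {
  ord <- order(pp, decreasing = TRUE)
  fdr_sorted <- cumsum(1 - pp[ord]) / seq_along(ord)
  fdr <- numeric(length(pp))
  fdr[ord] <- fdr_sorted
  names(fdr) <- names(pp)
  fdr
}

# Made-up posterior probabilities for five hypothetical genes
post_prob <- c(GENE_A = 0.99, GENE_B = 0.95, GENE_C = 0.80,
               GENE_D = 0.40, GENE_E = 0.05)

fdr <- bayesian_fdr(post_prob)

# Report genes whose Bayesian FDR is at most 5%
selected <- names(fdr)[fdr <= 0.05]
print(selected)

# Example: log(BF) = 5 with a hypothetical prior of 0.05
print(bf_to_posterior(5, prior = 0.05))
```

With this approach the threshold is applied to the cumulative average of (1 - posterior probability) rather than to each posterior probability separately, so the reported list has an expected false discovery proportion at or below the chosen level, provided the posterior probabilities are well calibrated.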

Annotation group documentation

I got this email from a user:

In my data analysis, I got my results by using annovar anova. I see that you annotate with several popular programs, including PolyPhen, CADD and SIFT. Should I perform this annotation step before the mirage analysis? From reading the mirage code, the input data has four columns (1: variant; 2: number of variants in cases; 3: number of variants in controls; 4: variant group index), but I do not know where these columns come from. Could you explain?

@han16 we wrote on this page "Variant groups can be user defined, usually depending on its annotations." but this seems too vague to be helpful to users. Should we change it to the following?

"Variant groups can be user defined, usually depending on their annotations. For example, in Han et al (2019+), we label as group 2 those variants with a PolyPhen score greater than 0.957, a CADD score in the top 10%, or a SIFT score < 0.05; all other variants are labelled group 1."

Is this correct?
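
If the rule above is confirmed, it could be illustrated for users along the following lines. This is only a sketch: the annotation table, its column names and its scores are made up, and "CADD score top 10%" is interpreted here as at or above the 90th percentile of the CADD scores in the dataset at hand, which may not be the reference set used in the paper.

```r
# Hypothetical annotation table (column names and scores are assumptions)
annot <- data.frame(
  variant  = c("v1", "v2", "v3", "v4", "v5"),
  polyphen = c(0.99, 0.10, NA, 0.50, 0.20),
  cadd     = c(12, 35, 8, 10, 5),
  sift     = c(0.20, 0.30, 0.01, 0.40, NA)
)

# Assumption: "top 10%" means the 90th percentile within this dataset
cadd_cutoff <- quantile(annot$cadd, 0.90, na.rm = TRUE)

# A variant is put in group 2 if ANY of the three criteria is met;
# missing scores are treated as not meeting that criterion.
is_group2 <- (!is.na(annot$polyphen) & annot$polyphen > 0.957) |
  (!is.na(annot$cadd) & annot$cadd >= cadd_cutoff) |
  (!is.na(annot$sift) & annot$sift < 0.05)

# Group index: 2 for putatively deleterious variants, 1 for all others
annot$group_index <- ifelse(is_group2, 2L, 1L)
print(annot)
```

The resulting group_index column corresponds to the fourth input column described in the user's email (variant group index).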

Is it true to obtain the NO.case and NO.control?

Is this the correct way to obtain NO.case and NO.control?
#NO.case
#grep 0/1
count_case_01 <- data.frame(apply(tmp_vcf_case_data,1,function(x) length(grep('0/1',x))))
rownames(count_case_01) <- tmp_vcf_case_data[,1]
colnames(count_case_01) <- "count_01"
#grep 1/2
count_case_12 <- data.frame(apply(tmp_vcf_case_data,1,function(x) length(grep('1/2',x))))
rownames(count_case_12) <- tmp_vcf_case_data[,1]
colnames(count_case_12) <- "count_12"

#grep 1/1

count_case_11 <- data.frame(apply(tmp_vcf_case_data,1,function(x) length(grep('1/1',x))))
rownames(count_case_11) <- tmp_vcf_case_data[,1]
colnames(count_case_11) <- "count_11"

#grep 2/2
count_case_22 <- data.frame(apply(tmp_vcf_case_data,1,function(x) length(grep('2/2',x))))
rownames(count_case_22) <- tmp_vcf_case_data[,1]
colnames(count_case_22) <- "count_22"

#combine four data for case
count_case <- cbind(count_case_01,count_case_11,count_case_12,count_case_22)
count_case[,5] <- 2*rowSums(count_case[,2:4])+1*count_case[,1]
colnames(count_case)[5] <- "N.case"

#NO.control
#grep 0/1
count_contro_01 <- data.frame(apply(tmp_vcf_control_data,1,function(x) length(grep('0/1',x))))
rownames(count_contro_01) <- tmp_vcf_control_data[,1]
colnames(count_contro_01) <- "count_01"

#grep 1/2
count_contro_12 <- data.frame(apply(tmp_vcf_control_data,1,function(x) length(grep('1/2',x))))
rownames(count_contro_12) <- tmp_vcf_control_data[,1]
colnames(count_contro_12) <- "count_12"

#grep 1/1

count_contro_11 <- data.frame(apply(tmp_vcf_control_data,1,function(x) length(grep('1/1',x))))
rownames(count_contro_11) <- tmp_vcf_control_data[,1]
colnames(count_contro_11) <- "count_11"

#grep 2/2
count_contro_22 <- data.frame(apply(tmp_vcf_control_data,1,function(x) length(grep('2/2',x))))
rownames(count_contro_22) <- tmp_vcf_control_data[,1]
colnames(count_contro_22) <- "count_22"
#combine four data for control
count_cantro <- cbind(count_contro_01,count_contro_11,count_contro_12,count_contro_22)
count_cantro[,5] <- 2*rowSums(count_cantro[,2:4])+1*count_cantro[,1]
colnames(count_cantro)[5] <- "N.control"

how to obtain the NO.case and NO.control

Dear authors,
I am still confused about how to obtain No.case and No.control. You said: "(No.case, how many times the variant appears in cases; No.contr, how many times the variant appears in controls; you can compute these quantities from your data)". From which file can I get this information? Could you give me an example? Thank you!
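
One way to compute these counts, sketched under stated assumptions, is to start from per-sample genotype calls (for example the GT fields of a VCF file) together with each sample's case or control status. The genotype matrix, sample labels, output column names and placeholder group indices below are all hypothetical; the exact input layout mirage expects should be checked against the package tutorial.

```r
# Hypothetical genotype matrix: rows are variants, columns are samples,
# entries are VCF-style GT strings ("./." marks a missing call)
gt <- matrix(
  c("0/0", "0/1", "1/1", "./.",
    "0/1", "0/0", "0/0", "0|1"),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("var1", "var2"), c("s1", "s2", "s3", "s4"))
)

# Hypothetical phenotype labels: TRUE = case, FALSE = control
is_case <- c(s1 = TRUE, s2 = TRUE, s3 = FALSE, s4 = FALSE)

# Number of non-reference alleles in one GT string,
# e.g. "0/1" gives 1, "1/1" gives 2, "./." gives 0
count_alt <- function(g) {
  alleles <- strsplit(g, "[/|]")[[1]]
  sum(!(alleles %in% c("0", ".")))
}

# Variant-by-sample matrix of alternate allele counts
alt <- apply(gt, c(1, 2), count_alt)

case_cols <- is_case[colnames(gt)]

# Four columns: variant, count in cases, count in controls, group index
mirage_input <- data.frame(
  variant     = rownames(gt),
  No.case     = unname(rowSums(alt[, case_cols, drop = FALSE])),
  No.contr    = unname(rowSums(alt[, !case_cols, drop = FALSE])),
  group.index = c(2L, 1L)  # placeholder values; derive from annotations
)
print(mirage_input)
```

Splitting each genotype string into alleles avoids pattern-matching every genotype separately, and it also handles phased calls such as "0|1". Note that this sketch counts every non-reference allele at a site, so multi-allelic sites would need to be split per alternate allele beforehand.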
