
Variational inference of Bayesian polygenic risk score regression with spike-and-slab prior

This project implements a variational inference (VI) algorithm to infer Bayesian linear regression coefficients under a spike-and-slab prior, reparametrized as a Bernoulli-Gaussian prior.

Variational Inference

Variational inference is a technique used in Bayesian statistics and machine learning to approximate complex posterior distributions when exact inference is computationally infeasible. It is particularly useful when dealing with models that involve high-dimensional latent variables or complex dependencies.
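
Formally, VI posits a tractable family of distributions $q(z)$ over the latent variables $z$ and selects the member closest to the exact posterior in Kullback-Leibler divergence:

$$q^* = \arg\min_{q} \mathrm{KL}\big(q(z)\,\big\|\,p(z \mid y)\big)$$

Because this KL term involves the intractable posterior itself, the optimization is carried out indirectly by maximizing the evidence lower bound (ELBO), as described below.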

Polygenic risk score (PRS) regression

Polygenic risk score (PRS) regression is a statistical method used in genetics to predict complex traits or diseases based on the aggregate effects of multiple genetic variants. It involves calculating a polygenic risk score for an individual by summing the effects of genetic variants, each weighted by its effect size.
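
For an individual $i$ with genotype $x_{ij}$ at each of $M$ variants, the score is the familiar weighted sum

$$\mathrm{PRS}_i = \sum_{j=1}^{M} x_{ij}\,\hat{\beta}_j,$$

where $\hat{\beta}_j$ is the estimated effect size of variant $j$.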

Spike-and-slab prior

A spike-and-slab prior is a type of prior distribution commonly used in Bayesian regression to handle variable selection. It consists of two parts:

  • Spike: this part of the prior assigns high probability mass to a single value of the regression coefficient, typically zero, encoding the sparsity assumption that many coefficients are expected to be exactly zero.
  • Slab: this part assigns a broader distribution to the coefficients, allowing them to take non-zero values. Combining the spike and slab components encourages sparsity in the model, effectively performing variable selection by setting some coefficients to exactly zero (one standard way to write this prior is sketched after this list).
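
A common way to write a spike-and-slab prior, together with the Bernoulli-Gaussian reparametrization used here, is sketched below. The notation is mine, chosen to match the hyperparameters $τ_β$ and $π$ that appear later; the repository's figures give the exact specification.

$$\beta_j \sim \pi\,\mathcal{N}\!\big(0,\ \tau_\beta^{-1}\big) + (1-\pi)\,\delta_0 \quad\Longleftrightarrow\quad \beta_j = s_j b_j,\quad s_j \sim \mathrm{Bernoulli}(\pi),\quad b_j \sim \mathcal{N}\!\big(0,\ \tau_\beta^{-1}\big)$$

Here $\delta_0$ is a point mass at zero, $\pi$ is the prior inclusion probability, and $\tau_\beta$ is the precision of the slab.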

Variational inference of Bayesian linear regression

When applying variational inference to Bayesian polygenic risk score regression with a spike-and-slab prior, the main goal is to approximate the posterior distribution over the regression coefficients and other model parameters. This approximation is achieved by transforming the inference problem into an optimization problem. Variational inference optimizes a variational lower bound on the log marginal likelihood (evidence) of the model.
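
Concretely, the log marginal likelihood decomposes into the ELBO plus a KL term, so maximizing the ELBO with respect to the variational distribution $q$ tightens the bound on the evidence:

$$\log p(y) = \underbrace{\mathbb{E}_q\big[\log p(y, \beta, s) - \log q(\beta, s)\big]}_{\text{ELBO}} + \mathrm{KL}\big(q(\beta, s)\,\big\|\,p(\beta, s \mid y)\big) \;\ge\; \mathrm{ELBO}$$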

In particular, the model is specified as below:

*(figure: model specification)*
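
The figure is not reproduced here; a standard specification consistent with the hyperparameters $τ_ϵ$, $τ_β$, and $π$ used in this README would be (my reconstruction, not the repository's figure):

$$y = X\beta + \epsilon, \qquad \epsilon \sim \mathcal{N}\big(0,\ \tau_\epsilon^{-1} I\big),$$

with each coefficient $\beta_j = s_j b_j$ following the Bernoulli-Gaussian prior sketched above.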

The EM algorithm of Bayesian PRS with spike-and-slab prior

*(figure: EM algorithm for Bayesian PRS with spike-and-slab prior)*

Dataset

There are M = 100 SNPs, $N_{train}$ = 439 training patients, and $N_{test}$ = 50 test patients. Both the genotypes $X_{train}, X_{test}$ and phenotypes $y_{train}, y_{test}$ were standardized. The marginal summary statistics $β_{marginal}$ (saved in beta_marginal.csv.gz) and the LD matrix R (saved in LD.csv.gz) were derived from the 439 training patients only: *(figure: dataset overview)*
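
A minimal loading sketch in Python, assuming the compressed CSVs are plain numeric tables with an index column (pandas and the file-layout details are my assumptions, not specified by the repository):

```python
import numpy as np
import pandas as pd

# Marginal effect sizes (one per SNP) and the SNP-by-SNP LD matrix,
# both computed from the 439 training patients only.
beta_marginal = pd.read_csv("beta_marginal.csv.gz", index_col=0).values.ravel()  # shape (M,)
R = pd.read_csv("LD.csv.gz", index_col=0).values                                 # shape (M, M)

M = beta_marginal.shape[0]  # 100 SNPs
N_train = 439               # sample size behind the summary statistics
```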

Expectation step

In the Expectation step, we cycle through one SNP at a time, updating the posterior precision and posterior mean of its effect size together with its posterior inclusion probability (PIP), while holding all other SNPs fixed (a code sketch follows at the end of this subsection).

Set the initial values for the posterior estimates for all SNPs to the following values: *(figure: initial posterior estimates)*

Set the initial values for the hyperparameters to the following values: *(figure: initial hyperparameter values)*

For numerical stability, after a full cycle of the E-step, cap the resulting PIP $γ_j$ of each SNP to the interval [0.01, 0.99]: if $γ_j < 0.01$, set it to 0.01, and if $γ_j > 0.99$, set it to 0.99.
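
A sketch of one E-step sweep. The formulas below are the standard coordinate-ascent updates for summary-statistics-based spike-and-slab regression with standardized genotypes; the repository's exact equations live in the figures above, so treat this as a reconstruction under those assumptions:

```python
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

def e_step(beta_marginal, R, gamma, mu, pi, tau_beta, tau_eps, N):
    """One full coordinate-ascent sweep over all M SNPs.

    gamma: current PIPs; mu: posterior means; both shape (M,).
    Returns updated (gamma, mu, tau), where tau holds posterior precisions.
    """
    gamma, mu = gamma.copy(), mu.copy()
    M = len(beta_marginal)
    tau = np.empty(M)
    for j in range(M):
        # Posterior precision (standardized genotypes imply X_j'X_j = N).
        tau[j] = N * tau_eps + tau_beta
        # Residual marginal effect of SNP j after removing the expected
        # contribution of all other SNPs through the LD matrix R.
        r_j = beta_marginal[j] - R[j] @ (gamma * mu) + R[j, j] * gamma[j] * mu[j]
        mu[j] = (N * tau_eps / tau[j]) * r_j
        # Log-odds that SNP j is included (s_j = 1), then its PIP.
        u = np.log(pi / (1 - pi)) + 0.5 * np.log(tau_beta / tau[j]) + 0.5 * tau[j] * mu[j] ** 2
        gamma[j] = expit(u)
    # Numerical-stability cap after the full sweep, as described above.
    return np.clip(gamma, 0.01, 0.99), mu, tau
```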

Maximization step

In the Maximization step, we do not update $τ_ϵ$; its value is kept fixed at 1. Only the updates for $τ_β$ and $π$ are implemented.
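
A minimal M-step sketch using the usual closed-form updates for a Bernoulli-Gaussian prior (again a reconstruction of the standard formulas rather than the repository's figures):

```python
def m_step(gamma, mu, tau):
    """Closed-form updates for the prior inclusion probability pi and the
    slab precision tau_beta; tau_eps stays fixed at 1 and is not updated."""
    pi = gamma.mean()
    # E[beta_j^2] under q is gamma_j * (mu_j^2 + 1/tau_j).
    tau_beta = gamma.sum() / (gamma * (mu ** 2 + 1.0 / tau)).sum()
    return pi, tau_beta
```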

Evidence lower bound

The evidence lower bound (ELBO) of the model is: *(figure: ELBO)*

More specifically,

*(figure: ELBO expanded term by term)*

where $γ_j^*$, $μ_j^*$, and $τ_j$ are the inferred PIP, mean, and precision of the effect size for SNP $j$ at the E-step, respectively; ◦ is the elementwise product of two vectors.
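
At a high level, the ELBO for this model has the standard three-part shape (the figure above expands each term in full):

$$\mathrm{ELBO}(q) = \mathbb{E}_q\big[\log p(y \mid \beta)\big] + \mathbb{E}_q\big[\log p(\beta, s \mid \pi, \tau_\beta)\big] - \mathbb{E}_q\big[\log q(\beta, s)\big],$$

with all expectations taken under the variational distribution defined by $γ^*$, $μ^*$, and $τ$.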

I ran the EM algorithm for 10 iterations and plotted the ELBO as a function of the iteration number: *(figure: ELBO vs. EM iteration)*
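
Putting the two steps together, a minimal driver loop, continuing from the sketches above and assuming a hypothetical elbo() helper that evaluates the bound in the figure:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder initial values; the actual initializations are in the figures above.
gamma = np.full(M, 0.5)
mu = np.zeros(M)
tau = np.ones(M)
pi, tau_beta, tau_eps = 0.1, 1.0, 1.0  # tau_eps stays fixed at 1 throughout

elbo_trace = []
for _ in range(10):
    gamma, mu, tau = e_step(beta_marginal, R, gamma, mu, pi, tau_beta, tau_eps, N_train)
    pi, tau_beta = m_step(gamma, mu, tau)
    elbo_trace.append(elbo(gamma, mu, tau, pi, tau_beta))  # hypothetical ELBO evaluator

plt.plot(range(1, 11), elbo_trace, marker="o")
plt.xlabel("EM iteration")
plt.ylabel("ELBO")
plt.show()
```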

Evaluating PRS prediction

Load $X_{train}, y_{train}, X_{test}, y_{test}$ from the respective CSV files: X_train.csv.gz, y_train.csv.gz, X_test.csv.gz, y_test.csv.gz.

Then, predict the PRS for both the 439 training patients and the 50 test patients from their genotypes:

$$\hat{y} = X\,(\gamma^* \circ \mu^*)$$

where $γ^∗$ and $μ^∗$ are the inferred PIP and expected effect size at the E-step, respectively. ◦ is the elementwise product of two vectors.

I then calculated the Pearson correlation coefficient (PCC) between the predicted phenotypes and the true phenotypes, and generated the following scatter plots of PRS prediction on the training and test sets:

*(figures: PRS prediction scatter plots for the training and test sets)*
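
A sketch of the prediction and evaluation step, reusing gamma and mu from the E-step sketches above (file-layout assumptions as before; np.corrcoef gives the PCC directly):

```python
import numpy as np
import pandas as pd

X_train = pd.read_csv("X_train.csv.gz", index_col=0).values
y_train = pd.read_csv("y_train.csv.gz", index_col=0).values.ravel()
X_test = pd.read_csv("X_test.csv.gz", index_col=0).values
y_test = pd.read_csv("y_test.csv.gz", index_col=0).values.ravel()

# Posterior-mean effect sizes: elementwise product of PIPs and posterior means.
beta_hat = gamma * mu

for X, y, name in [(X_train, y_train, "train"), (X_test, y_test, "test")]:
    y_pred = X @ beta_hat  # predicted PRS for each patient
    pcc = np.corrcoef(y_pred, y)[0, 1]
    print(f"{name} PCC: {pcc:.3f}")
```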

Fine-mapping

Inferred PIPs γ. The causal SNPs rs9482449, rs7771989, and rs2169092 are colored in red.

*(figure: inferred PIP per SNP)*
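
A plotting sketch for this kind of fine-mapping figure, reusing the PIP vector gamma from the sketches above; matching rsIDs against the index of beta_marginal.csv.gz is an assumption about the file layout:

```python
import pandas as pd
import matplotlib.pyplot as plt

causal = {"rs9482449", "rs7771989", "rs2169092"}
snp_ids = pd.read_csv("beta_marginal.csv.gz", index_col=0).index  # assumed to hold rsIDs
colors = ["red" if snp in causal else "gray" for snp in snp_ids]

plt.bar(range(len(snp_ids)), gamma, color=colors)
plt.xlabel("SNP index")
plt.ylabel("PIP")
plt.title("Inferred PIPs (causal SNPs in red)")
plt.show()
```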
