Code Monkey home page Code Monkey logo

pygemma's Introduction

logo

pyGEMMA: A Fast, User-Friendly Python/R Implementation of Linear Mixed Models for Genome-Wide Association Studies

Table of Contents

Software Requirements

The current implementation of pyGEMMA was tested on the following configuration:

Operating System and Compilers

  • Ubuntu 18.04.6 LST (Bionic Beaver)
  • gcc/g++ 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

Python Environment

Installation

The installation of pyGEMMA is straightforward and can be performed using Python's pip package manager. Here, we detail the installation process using a virtualenv Python enviroment. This has been tested with the configuration listed in the Software Requirements section. While installation may be possible with other configurations, we it has only been tested with the configuration we list.

  1. Create your Python environment and activate it. Using virtualenv, this can be done by running
pip3 install virtualenv
python3 -m virtualenv pygemma_env
source pygemma_env/bin/activate

Note: If pip3 install virtualenv fails because it can't find pip3, you can try running python3 -m pip install virtualenv instead. This looks for the pip module directly if pip3 isn't in your PATH.

  1. Ensure the Numpy and Cython packages are both installed prior to installing pyGEMMA (they will not be installed automatically). This can be done by running
pip install numpy Cython
  1. Clone this repository.
  2. Install pyGEMMA's dependencies. From the pygemma directory, this can be done by running
pip install -r requirements.txt
  1. Ensure that you have a valid C/C++ compiler loaded. pyGEMMA has been tested using gcc/g++.

  2. Install pyGEMMA. From the pygemma directory, this can be done by running

python setup.py install

Usage

The pyGEMMA package contains both high-level and low-level functions for fitting the linear mixed model outlined in the original GEMMA paper by Zhou et al. (Nat Gen 2012).

General Usage

The pyGEMMA package is designed to fit the same model as GEMMA. That is, it fits

$$ \mathbf{y} = \mathbf{W} \mathbf{\alpha} + \mathbf{x} \mathbf{\beta} + \mathbf{Z} \mathbf{u} + \mathbf{\varepsilon} $$

$$ \mathbf{u} \sim \mathcal{\text{MVN}}_{m}(\mathbf{0}, \lambda \tau^{-1} \mathbf{K}) $$

$$ \mathbf{\varepsilon} \sim \mathcal{\text{MVN}}_{n} \left(\mathbf{0},\tau^{-1} \mathbf{I}_n \right) $$

where $\mathbf{y}$ is $n \times 1$ is the vector phenotype, $\mathbf{W}$ is $n \times c$ is the matrix of fixed effect covariates (including the intercept), $\mathbf{\alpha}$ is the $c \times 1$ vector of coefficients for the covariates, $\mathbf{x}$ is the $n \times 1$ vector of genotypes, $beta$ is the effect size of the genotype, $\mathbf{Z}$ is the $n \times m$ loading matrix, $\mathbf{u}$ is the $m \times 1$ vector of random effects, $\mathbf{\varepsilon}$ is the $n \times 1$ vector of errors, $\tau^{-1}$ is the variance of the resitual errors, $\lambda$ is the ratio between the two variances components, and $\mathbf{K}$ is the relatedness matrix. (Description adapted from Zhou et al. (Nat Gen 2012))

This model can be fit using the function pygemma.lmm.pygemma.

from pygemma import lmm
lmm.pygemma(Y, X, W, K, snps=snps, verbose=1)

Note that snps is a list of SNP names that will be used to label the pandas DataFrame returned by the function. verbose controls whether to output run progress.

pyGEMMA in R

We have also developed an R interface for pyGEMMA, enabling its use within the R programming environment. A comprehensive tutorial for this integration can be found here

Examples

Test Script

We provide a test script designed to test almost all basic functions and to run on three GWAS test cases (10,000 SNPs each). Before using the script, the paths for the data should be updated, as they are currently hardcoded (to be changed later). In our tests, this script took around 20 to 25 minutes to run on 32 cores. Below, we show the Q-Q and Manhattan plots for the three test cases.

qq1 manhattan1
qq2 manhattan2
qq3 manhattan3

1000 Genomes Data

We provide a number of scripts to run eQTL analyses for over 7100 genes using data from 1000 Genomes. These scripts are provided here.

The jobs may be launched by batching the run_pyGEMMA.sh script. Note that all files in the directory should be examined and paths modified before running. The scripts were designed to run using the SLURM scheduler.

Contact

@rlangefe - Robert Langefeld (Department of Biostatistics - University of Michigan)

If you have any questions or comments, please feel free to contact me.

pygemma's People

Contributors

borangao avatar rlangefe avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.