pfr's Introduction

Across-platform imputation of DNA methylation levels using penalized functional regression

DNA methylation is a key epigenetic modification involved in both normal development and disease progression. Recent advances in high-throughput technologies have enabled genomewide profiling of DNA methylation. However, DNA methylation profiling often employs different designs and platforms with varying resolution, which hinders joint analysis of methylation data from multiple platforms.

Here we propose a penalized functional regression model that imputes an HM27 (Illumina HumanMethylation27) dataset to HM450 (Illumina HumanMethylation450) resolution. We demonstrate that, by incorporating functional predictors, the model can use information from non-local probes and impute the missing methylation data accurately.

Content

  1. pfr.R: main script used for imputation.
  2. refund_lib.R: library for penalized functional regression.
  3. annotation/GPL8490-65.csv.gz: compressed annotation file of the HM27 array.
  4. annotation/GPL13534-10305.csv.gz: compressed annotation file of the HM450 array.

Input Files

To run the model, place two files in the same directory as the scripts:

  1. meth_450K_QC.txt: HM450 data used to train the penalized functional regression model.
  2. meth_27K_QC.txt: HM27 data to be imputed to HM450 resolution.

Note that the methylation data in both files should come from the same tissue. For simplicity, we assume both are tab-delimited files in which the row names are probe IDs and the column names are sample IDs. For example:

2802	2803	2804	...(more samples)...
cg03586879	0.126	0.247	0.029
cg19378133	0.294	0.578	0.037
cg03490200	0.924	0.955	0.941
...(more probes)...
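
Before running the model, it can help to confirm that an input file matches this layout. The following is a minimal sketch in Python, not part of the pfr scripts. It assumes, based on the example above, that the header row lists sample IDs only (so each data row has one more field than the header) and that every value is a β value in [0, 1]. The function name `read_beta_matrix` is hypothetical.

```python
import csv
import io


def read_beta_matrix(handle):
    """Parse a tab-delimited beta-value matrix laid out like the example above.

    Assumption: the header holds sample IDs only, so each data row carries
    one extra leading field, the probe ID.
    """
    reader = csv.reader(handle, delimiter="\t")
    samples = next(reader)
    # Tolerate a header that starts with an empty corner cell.
    if samples and samples[0] == "":
        samples = samples[1:]
    matrix = {}
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        probe, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(
                f"line {line_no}: expected {len(samples)} values, got {len(values)}"
            )
        betas = [float(v) for v in values]  # non-numeric entries raise ValueError
        if any(not 0.0 <= b <= 1.0 for b in betas):
            raise ValueError(f"line {line_no}: beta value outside [0, 1]")
        matrix[probe] = betas
    return samples, matrix


# The three probes shown in the example above, as an in-memory file.
example = (
    "2802\t2803\t2804\n"
    "cg03586879\t0.126\t0.247\t0.029\n"
    "cg19378133\t0.294\t0.578\t0.037\n"
    "cg03490200\t0.924\t0.955\t0.941\n"
)
samples, matrix = read_beta_matrix(io.StringIO(example))
print(samples)
print(matrix["cg19378133"])
```

To check a real file, pass an open file handle instead, for example `open("meth_27K_QC.txt", newline="")`. The sketch treats any non-numeric entry as an error, so files containing missing-value markers would need extra handling.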

Usage

The main script is pfr.R. You can run it either interactively in R or in batch mode:

R CMD BATCH pfr.R
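
If the batch run is launched from another program, a small pre-flight check can catch missing files early. This is a rough sketch in Python rather than part of the repository. The helper names `missing_inputs` and `run_pfr` are hypothetical, and the list of required files is an assumption drawn from the Content and Input Files sections above.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed requirement: both scripts and both input files share one directory.
REQUIRED = ("pfr.R", "refund_lib.R", "meth_450K_QC.txt", "meth_27K_QC.txt")


def missing_inputs(workdir="."):
    """Return the required file names that are absent from workdir."""
    base = Path(workdir)
    return [name for name in REQUIRED if not (base / name).is_file()]


def run_pfr(workdir="."):
    """Run pfr.R in batch mode after checking that inputs and R are present."""
    missing = missing_inputs(workdir)
    if missing:
        raise FileNotFoundError("missing files: " + ", ".join(missing))
    if shutil.which("R") is None:
        raise RuntimeError("R executable not found on PATH")
    # Same command as above, executed inside workdir.
    subprocess.run(["R", "CMD", "BATCH", "pfr.R"], cwd=workdir, check=True)


# Report what is missing in the current directory without starting R.
print(missing_inputs("."))
```

Calling `run_pfr()` then starts the batch job only when nothing is missing.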

Output Files

There will be two output files:

  1. impute.txt: A tab-delimited file storing the imputed β values, where the row names are probe IDs and the column names are sample IDs.
  2. dispersion.txt: A tab-delimited file storing the under-dispersion measure for each probe, where the row names are probe IDs. This measure can be used to remove low-quality results from impute.txt.
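
The filtering step described for dispersion.txt might look like the following Python sketch. Several things here are assumptions rather than facts about the pfr output: that dispersion.txt holds a probe ID followed by one numeric value per row, the helper names `read_dispersion` and `filter_probes`, and above all the cutoff and its direction, which this README does not specify. The numbers in the usage example are made-up toy values.

```python
import csv
import io


def read_dispersion(handle):
    """Read probe ID -> under-dispersion measure from a tab-delimited file.

    Assumption: each data row is a probe ID followed by one numeric value;
    rows that do not fit (such as a header) are skipped.
    """
    scores = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue
        try:
            scores[row[0]] = float(row[-1])
        except ValueError:
            continue  # non-numeric row, e.g. a header
    return scores


def filter_probes(imputed, scores, keep):
    """Keep imputed probes whose dispersion measure satisfies keep(score)."""
    return {
        probe: betas
        for probe, betas in imputed.items()
        if probe in scores and keep(scores[probe])
    }


# Toy values for illustration only; not real pfr output.
toy_dispersion = "dispersion\nprobe_a\t0.9\nprobe_b\t0.2\n"
toy_imputed = {"probe_a": [0.10, 0.12], "probe_b": [0.80, 0.75]}

scores = read_dispersion(io.StringIO(toy_dispersion))
# Hypothetical rule: keep probes scoring at least 0.5. Both the cutoff and
# its direction must be chosen by the user.
kept = filter_probes(toy_imputed, scores, keep=lambda s: s >= 0.5)
print(sorted(kept))
```

Passing the rule as a function keeps the sketch neutral about whether high or low values of the measure indicate poor imputation quality.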
