DeepLINK: Deep Learning INference using Knockoffs

DeepLINK is a variable selection algorithm that guarantees the false discovery rate (FDR) control in high-dimensional settings. It consists of two major parts: an autoencoder network for the knockoff variable construction and a multilayer perceptron network for feature selection with the FDR control.

Dependencies

DeepLINK requires Python 3 (>= 3.7.6) with the following packages:

numpy >= 1.18.5
keras >= 2.4.3
tensorflow >= 2.3.0

Installation

Clone the github repository and enter DeepLINK directory with

$ git clone https://github.com/zifanzhu/DeepLINK.git
$ cd DeepLINK

Usage

The inputs of DeepLINK are sample-by-feature data matrix and the corresponding response vector. Assume the dimension of the data matrix is n-by-p. Then the response should an n-by-1 vector. Also, they should be in '.npy' format and without row and column names. To convert your data into '.npy' format, you can simply load it to python by numpy.loadtxt, trim the row and column names, and then save it with numpy.save. You need to have 'numpy' installed to do so. Another required input is the output directory. The outputs of DeepLINK are two space delimited files with selected feature indices using knockoff and knockoff+ threshold respectively, with names selected_variable_ko and selected_variable_ko+. Notice that feature index is 0-based, so the first feature (first column of the data matrix) has index 0.

Options

-h, --help            show this help message and exit
-X DATA, --data=DATA  data matrix in '.npy' format (row sample, column
                      feature)
-y RESPONSE, --response=RESPONSE
                      response vector in '.npy' format
-o OUTPUT_DIR, --out=OUTPUT_DIR
                      output directory
-s                    use this flag if you do not want to center and scale
                      data (center/scale makes every feature column have
                      mean 0/sd 1)
-l L1, --l1=L1        l1 regularization coefficient used in the feature
                      selection MLP [default: 0.001]
-r LR, --lr=LR        learning rate used in the feature selection MLP
                      [default: 0.001]
-a ACT, --act=ACT     activation function used in the feature selection MLP
                      [default: elu]
-L LOSS, --loss=LOSS  loss function used in the feature selection MLP
                      [default: mean_squared_error]
-q Q, --fdr_level=Q   fdr level [default: 0.2]

A toy example

Run the following code and the outputs are in folder test/out:

$ python deeplink.py -X test/X.npy -y test/y.npy -o test/out

Replicate studies in the paper

Simulation studies

Codes used in the simulation studies are in folder Simulations/. An example python script, example.py, for the simulation setting with linear factor model and linear link function is provided. Simulation results of other settings can be obtained by running this script with changes of setting parameters: x_design, y_design, etc. Choices of factor model design x_design: linear, add_quad, logistic. Choices of link function design y_design: linear, nonlinear. Note that the l1 regularization factor (l1) and the learning rate (lr) in the mlp training needs to be tuned for optimal performance.

Real data analyses

Data matrices and codes used in the real data analyses are in folder Real_data_analyses/. Preprocessed data matrices are given as csv files under the corresponding data/ folder. For the microbiome data analysis, we used an independent dataset yu_CRC_common.csv for screening. For the other two scRNA-seq datasets, the screening was done using 50% of the dataset itself. The comparison folder contains the codes for a prediction comparison analysis between DeepLINK, IPAD and Random Forests.

Copyright and License Information

Authors: Zifan Zhu, Yingying Fan, Yinfei Kong, Jinchi Lv, Fengzhu Sun

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Commercial users should contact Dr. Fengzhu Sun ([email protected]) or Dr. Yingying Fan ([email protected]), copyright at University of Southern California.

stjordanis / deeplink Goto Github PK

deeplink's Introduction

DeepLINK: Deep Learning INference using Knockoffs

Dependencies

Installation

Usage

Options

A toy example

Replicate studies in the paper

Simulation studies

Real data analyses

Copyright and License Information

deeplink's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent